Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 9087-9091, October 1991
Biochemistry

Sequence effects on local DNA topology

VASILY P. CHUPRINA*†, ANDREI A. LIPANOV‡§, OLEG Yu. FEDOROFF*†, SEONG-GI KIM†, AGUSTIN KINTANAR¶, AND BRIAN R. REID†

*Research Computer Center, U.S.S.R. Academy of Sciences, Pushchino, 142292, Moscow Region, U.S.S.R.; †University of Washington, Seattle, WA 98195; ‡Institute of Molecular Genetics, U.S.S.R. Academy of Sciences, 46 Kurchatov Square, Moscow 123182, U.S.S.R.; and ¶Iowa State University, Ames, IA 50011

Communicated by I. Tinoco, Jr., June 24, 1991

Abbreviation: NOESY, nuclear Overhauser effect spectroscopy.
§Present address: Molecular Biology Institute, University of California, Los Angeles, CA 90024-1570.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT Nuclear Overhauser effect-derived distances between adenine H2 protons and anomeric H1' protons on the same strand or on the complementary strand are presented for several different DNA duplexes. The cross-strand (n)AH2 to (m + 1)H1' distances [designated as x, where (n) and (m) are complementary residues] vary by up to 1 Å depending on the sequence. In all possible A-containing pyrimidine-purine steps (CA, TG, and TA), x is >4.5 Å. In GA steps, x varies within rather wide limits in the range 3.8-4.5 Å, whereas in AA steps the lower limit is 3.7 Å and the upper limit is ≈4.2 Å. In purine-purine steps, x is affected by at least three factors: (i) adjacent pyrimidine-purine steps at the 5' end [e.g., YRA sequences (where Y = T or C and R = G or A)], or a pyrimidine-purine step at the 3' end of the pyrimidine-pyrimidine step on the complementary strand, cause x to increase, (ii) an AT step at the 3' end of a purine-purine step (e.g., RAT) causes x to decrease, and (iii) substitution of bases at the next-nearest neighbor position leads to changes in x at GA and AA steps. The latter factor seems to be due to a cooperative effect arising from formation of the "anomalous" B' structure when the substitution produces an AnTm tract (which always produces a decrease in x). The data indicate that (n)AH2-(n + 1)H1' distances on the same strand (designated as s) are also sequence dependent. Thus on AA steps, neighboring substitutions produce the same effect on s as on the cross-strand x distances. The results lead to the ability to predict changes in AH2-H1' distances depending on the DNA sequence. By using high-resolution x-ray B-type structures as a set of allowable B conformations, a very good correlation was found between x and the minor groove width parameters P-P or H1'-H1'. Thus, the x distances are a direct probe of the minor groove width in B-type DNA, and changes in this distance therefore reflect changes in the minor groove width. Since many of the sequences studied are sites of protein recognition, the observed sequence-structure dependence in DNA probably plays an important role in the process of recognition by proteins and minor groove ligands such as drugs.

Determination of the sequence dependence of the DNA double helix structure is a subject that has attracted attention for several years. Considerable progress has been achieved lately in this area, particularly in DNA sequences containing A/T tracts (1, 2), but many problems remain. Neither x-ray structure analysis nor NMR has been entirely successful in attempts to solve this problem so far. The approach based on single-crystal x-ray structure analysis has been compromised by two problems: first, relatively few DNA sequences have been crystallized and solved to high resolution in the B-form (3, 4), and second, crystal packing forces have been shown to affect the structure (5, 6). Even if a reliable relationship between B-DNA conformation and sequence could be determined in the crystalline state, it remains unclear whether this relationship would hold in solution. Many studies have been devoted to the determination of double-stranded DNA structure by two-dimensional NMR spectroscopy. The usual procedure is to determine the distances between closely located protons from nuclear Overhauser effect spectroscopy (NOESY) cross-peaks, which are then used in distance geometry calculations or as constraints in refining canonical DNA using molecular mechanics or dynamics (for reviews, see refs. 7 and 8). The NMR approach is a relatively recent one that is still undergoing development and improvement; some of the problems include spectral overlap and underdetermination, and the reliability of the method has been called into question (9). In the present study our aim was not to obtain specific three-dimensional structures that would fit the NMR data, but rather to reveal a possible sequence dependence of some particular structural features. Therefore, we used the simpler approach of investigating the correlation between changes in sequence and changes in distances between a few specific proton pairs. In this approach, we avoid the potential errors and structure-distance artifacts that often arise in structure determination using molecular dynamics, molecular mechanics, or distance geometry. We present AH2-H1' distances measured in our laboratory by two-dimensional NOESY spectroscopy for 13 double-stranded oligomers containing adenine residues in various sequence environments. We observe distinct trends between these distances and the DNA sequence, and we also show the DNA structural parameters that correlate with these distances.

MATERIALS AND METHODS

All DNA dodecamers were synthesized and purified as described (10). The DNA samples were dissolved in 0.4 ml of buffer containing 25 mM sodium phosphate (pH 7.0), 0.5 mM EDTA, and 50 mM NaCl. The samples were repeatedly lyophilized to dryness and finally dissolved in 0.4 ml of 99.996% 2H2O. The DNA sample size was typically 20-25 mg, resulting in a final concentration of 6-7 mM. The NMR experiments were performed at 500 MHz, either on a Bruker WM-500 spectrometer or on a home-built NMR spectrometer. Five NOESY spectra with mixing times of 30, 60, 90, 120, and 180 msec were usually collected by using the phase-sensitive method (11). In each NOESY spectrum, the mixing time was randomly varied over ±10% of the corresponding mixing time to eliminate zero-quantum coherence transfer. For all experiments, 1024 or 2048 complex points in the t2 time domain and 400 points in t1 were collected. For each t1 value, 32 or 64 scans were averaged with 4386 Hz of spectral width and 2.0 sec of relaxation delay between transients. The data were collected at 30-37°C. The acquired data were transferred to an IRIS 4-D computer and then processed using FTNMR (Hare Research, Woodinville, WA). Data sets


were zero-filled to 2048 points, apodized with a sine-squared 90° phase-shifted function, and then Fourier transformed. The resolved cross-peaks corresponding to each detectable proton-proton interaction were integrated from the NOESY spectra.
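The processing steps just described (apodization with a 90°-shifted sine-squared window, zero-filling to 2048 points, Fourier transformation) can be sketched for a single one-dimensional trace as follows. This is a minimal illustration using NumPy, not the FTNMR implementation: the function names are hypothetical, and the window is assumed to span only the acquired points and to run from a phase of 90° to 180°.

```python
import numpy as np


def sine_squared_window(n_points, phase_shift_deg=90.0):
    """Sine-squared window starting at `phase_shift_deg` and approaching 180 degrees.

    Assumption: the window covers the acquired points only; with a 90-degree
    shift it is equivalent to a cosine-squared decay from 1 toward 0.
    """
    start = np.deg2rad(phase_shift_deg)
    phase = start + (np.pi - start) * np.arange(n_points) / n_points
    return np.sin(phase) ** 2


def process_fid(fid, final_size=2048):
    """Apodize a complex FID, zero-fill it to `final_size`, and Fourier transform it."""
    fid = np.asarray(fid, dtype=complex)
    if fid.size > final_size:
        raise ValueError("FID is longer than the requested zero-filled size")
    padded = np.zeros(final_size, dtype=complex)
    padded[: fid.size] = fid * sine_squared_window(fid.size)
    # fftshift places zero frequency in the middle of the spectrum
    return np.fft.fftshift(np.fft.fft(padded))
```

A two-dimensional NOESY data set would apply the same steps along each time dimension in turn.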

RESULTS AND DISCUSSION

In this study we provide AH2-H1' distances for 13 duplexes and focus on the variation in these distances with sequence (Table 1). Distances were calculated from the corresponding NOESY cross-peak build-up initial rates using the well-known r^-6 distance dependence of dipolar cross-relaxation and a cytosine H5-H6 distance of 2.5 Å as a fixed reference (Fig. 1 shows an example). It should be pointed out that, regardless of how accurate the calculated distances are, the rates of growth of the AH2-H1' cross-peak volumes in the examples to be presented are strongly sequence dependent and follow the same patterns that we discuss below for distances. The validity of the conversion from rates to distances (accuracy) is discussed in the legend to Table 1.
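The conversion from initial build-up rates to distances can be written compactly. Under the two-spin approximation the initial rate is proportional to r^-6, so an unknown distance follows from the ratio of the reference rate to the measured rate. The sketch below assumes a straight-line fit through the origin for the initial rate, which is one common choice and not necessarily the fitting procedure used here; the function names are illustrative.

```python
def initial_rate(mixing_times, volumes):
    """Least-squares slope of cross-peak volume vs. mixing time, forced through the origin.

    Assumption: only short mixing times, where the build-up is still linear, are passed in.
    """
    if len(mixing_times) != len(volumes) or not mixing_times:
        raise ValueError("need equal-length, non-empty inputs")
    numerator = sum(t * v for t, v in zip(mixing_times, volumes))
    denominator = sum(t * t for t in mixing_times)
    return numerator / denominator


def noe_distance(rate, ref_rate, ref_distance=2.5):
    """Distance (in angstroms) from the r^-6 dependence of the initial NOE build-up rate.

    ref_distance defaults to the cytosine H5-H6 reference of 2.5 angstroms.
    """
    if rate <= 0 or ref_rate <= 0:
        raise ValueError("rates must be positive")
    return ref_distance * (ref_rate / rate) ** (1.0 / 6.0)
```

Because of the sixth-root dependence, a cross-peak that builds up 64 times more slowly than the reference corresponds to a distance only twice as long, which is why moderate errors in cross-peak volume translate into small errors in distance.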

AH2-H1' distances are relatively immune to spin diffusion distance errors and can be determined fairly accurately from the two-spin approximation, at least up to mixing times of 120 msec, since AH2 protons have long spin-lattice relaxation times (T1) and spin-spin relaxation times (T2), and they are quite removed from other protons that could serve as intervening spins, regardless of the DNA conformation. Although the variations in AH2-H1' distances are only around 1 Å and the error in measuring distances of ≈4 Å is ±0.2 Å (see legend to Table 1), the very fact that consistent changes of AH2-H1' distances are observed in all sequences studied (including literature data) indicates that these changes are real and reproducible.

Dependence of the Cross-Strand x Distances on Base Sequence. The cross-strand x distances (as described in Table 1) probe the adenine environment in CA, TA, GA, and AA steps. Fig. 2 shows histograms for all these steps, including data taken from the literature (12-15, 17). Several trends and consistent patterns are observed.

Pyrimidine-purine steps. On all pyrimidine-purine steps (e.g., TA and CA/TG), x is >4.5 Å.

Purine-purine steps. On GA steps (or the complementary TC step), x can be either large (4.5 Å) or small (3.9 Å) (Fig. 2 and Table 1). An inspection of the sequences listed in Tables 1 and 2 suggests several conclusions about the flanking sequence dependence of these distances. (i) A comparison of sequences 1 and 3 with sequences 8, 10, and 11 from Table 2 shows that the presence of a pyrimidine-purine step at the 5' end of the GA step, or a pyrimidine-purine step at the 3' end of the TC step in the complementary strand (e.g., a YGA/TCR sequence), is accompanied by an increase in x at the GA and TC steps. The differences in x at the GA/TC step in sequences 4 and 13, and perhaps 5 and 7, also largely depend on the presence or absence of a pyrimidine-purine step at the 5' end of the GA or at the 3' end of the TC. These data, along with the observation that the x distance is large on all known pyrimidine-purine steps (see above and Fig. 2), suggest not only that x increases at the pyrimidine-purine step but also that its influence can spread to the adjacent steps in both directions. This suggests that at least part of the increase in the x distance at the 5' end of an An tract (sequence 10 from Table 1; refs. 13-15) is due to the presence of a proximal pyrimidine-purine step next to or at the 5' end of YRAn or NYAn sequences. Interestingly, the absence of an increase in x at the 3' end of an An tract correlates with the absence of a proximal pyrimidine-purine step following the An tract. (ii) A comparison of distances in sequences 5, 6, and 12 of Table 2 shows that the purine-pyrimidine AT step in an AnTm tract leads to a decrease in x at a GA step that is

Table 1. Interstrand (n)AH2-(m + 1)H1' and intrastrand (n)AH2-(n + 1)H1' distances

Duplex no.  Sequence           Residue  x, Å   s, Å   Site
1           5'-CGCGAATTCGCG    A5       4.4    4.3    EcoRI
            3'-GCGCTTAAGCGC    A6       4.0    *
2           5'-GCCGTTAACGGC    A7       >4.5   >4.5   HpaI
            3'-CGGCAATTGCCG    A8       4.2    4.2
3           5'-GCGAAATTTCGC    A4       4.3    4.3
            3'-CGCTTTAAAGCG    A5       4.0    4.1
                               A6       3.7    3.6
4           5'-GCGTTTAAACGC    A7       >4.5   >4.5   AhaIII
            3'-CGCAAATTTGCG    A8       4.2    4.2
                               A9       3.9    3.9
5           5'-GCCGTATACGGC    A6       *      4.0    SnaI
            3'-CGGCATATGCCG    A8       >4.5   4.0
6           5'-GCCTGATCAGGC    A6       4.4    3.9    BclI
            3'-CGGACTAGTCCG    A9       *      *
7           5'-GCAGGATCCTGC    A3       *      *      BamHI
            3'-CGTCCTAGGACG    A6       4.0    3.9
8           5'-GCCGGATCCGGC    A6       4.0    3.8    BamHI
            3'-CGGCCTAGGCCG
9           5'-GCCGTGCACGGC    A8       >4.5   3.9    HgiAI
            3'-CGGCACGTGCCG
10          5'-CGAAAAATCGG     A3       4.5    4.5
            3'-GCTTTTTAGCC     A4       *      4.5
                               A5       3.9    4.5
                               A6       3.8    4.1
                               A7       3.8    4.0
                               A15      3.9    4.1
11          5'-CGAAGAATCGG     A3       *      *      MboII
            3'-GCTTCTTAGCC     A4       *      >4.5
                               A6       4.0    >4.5
                               A7       4.0    *
                               A15      3.9    3.9
12          5'-GCGTTCGAACGC    A8       4.4    *      AsuII
            3'-CGCAAGCTTGCG    A9       4.1    4.1
13          5'-GCCGATATCGGC    A5       4.5    3.9    EcoRV
            3'-CGGCTATAGCCG    A7       *      3.9

x is the (n)AH2-(m + 1)H1' interstrand distance and s is the intrastrand (n)AH2-(n + 1)H1' distance. We define (n) and (m) as complementary residues with (n - 1) as the 5' neighbor of (n) and (m + 1) as the 3' neighbor of (m); thus, (n - 1) and (m + 1) are complementary residues, as are (n + 1) and (m - 1). The distance estimates are not significantly distorted by spin diffusion since there is no intervening spin pathway; i.e., in all B-DNA crystal structures, only the H1'(m + 1), H1'(n + 1), and the AH2 of an adjacent adenine are within 4.4 Å from any AH2(n). This is also confirmed by our two-spin estimates of AH2-H1' distances in back-calculated spectra for some B-type structures of DNA (data not shown). Furthermore, the two-spin calculated distances coincide with the multispin calculated distances (full simulations) and with the actual distances in these B-type structures with an accuracy of 0.1 Å. The potential error arising from partial saturation of the longer spin-lattice relaxation time (T1) AH2 protons is minimized by using the above-diagonal H1' to AH2 cross-peak rather than the below-diagonal AH2-H1' cross-peak. For some sequences (e.g., GCGAAATTTCGC), one-dimensional spectra taken with 2- and 20-sec relaxation delays were compared and indicate an error of no more than 0.2 Å at the shorter delay. Differential internal motion could in principle also affect the distances given in Table 1. However, neither the H1'-H2' cross-peak volumes nor the thymine M5-H6 volumes display significant changes along the sequence (<20%). Finally, the presented distances include the inherent signal-to-noise error in cross-peak volume determination. As these distances are rather large (≈4.3 Å) and cross-peaks are not very strong, the error in volume determination can be as much as ≈40%, which results in distance errors of 0.2 Å, making the upper limit distance higher for distances >4.3 Å. The boldface letters indicate A-containing segments. An asterisk indicates that the corresponding cross-peaks were overlapped.


5'-adjacent to the AT step (or at the 3'-adjacent TC step) down to ≈3.9 Å. (iii) The data presented in i and ii indicate that the structure of the GA/TC step is affected by the adjacent 5' or 3' bases. By comparing sequences 1 and 3 with sequences 9 and 12 (or their complementary chains) in Table 2, it is seen that the x distance at the GA/TC step is affected by replacements at even more remote base positions. It seems that here we observe the cooperativity effect in AnTm tracts. These data indicate that the cooperative effect that generates a narrow minor groove structure within the AnTm tract may be achieved with only three AT base pairs provided they are arranged in an ATT or AAT sequence, though here a considerable role may well be played by the flanking bases.

Purine-purine and pyrimidine-pyrimidine steps. For all AA/TT steps, x is <4.2 Å (Fig. 2 and Table 1). However, as seen in Fig. 2 and Table 1, this distance is not fixed but correlates with changes in the flanking bases. It seems that the presence of a pyrimidine (YAA) or a YG step (YGAA) at the 5' end of the AA step increases x at the AA step (Table 1), but the presence of a Tn tract at the 3' end of the AA step decreases x at this step. If there is an A at the 5' end of AA and a T, C, or A at the 3' end, the distance at the AA step decreases to <3.9 Å (Table 1). It seems that this decrease is mainly due to the cooperative transition effect in the AnTm region, forming the altered B' structure discussed above. These comparisons for AA/TT steps show that the variations in x at this step are due to the same factors that influence GA/TC steps.

Table 2. (n)AH2-(m + 1)H1' interstrand distances in GA/TC steps

No.  Sequence        x, Å
1    GCCTGATCAGGC    4.4
2    GCGTTCGAACGC    4.4
3    GCCGATATCGGC    4.5
4    GCGAATTCGC      4.4
5    GCGAAATTTCGC    4.3
6    CGAAAAATCGG     4.5
7    GGAAATTTCC      3.8
8    CTGGATCCAG      4.0
9    CCGATTCTTCG     3.9
10   GCAGGATCCTGC    4.0
11   GCCGGATCCGGC    4.0
12   CCGATTTTTCG     3.9
13   CGAAGAATCGG     4.0

The numbers to the right of each sequence are interstrand distances between (n)AH2 and (m + 1)H1'. All the data have been taken from Table 1 with the exception of sequences 7 and 8, which are taken from the literature (14, 17). Sequences 6 and 12, as well as 9 and 13, represent two strands of the same duplex. The GA step of interest is in boldface type.

FIG. 1. Stack plot of a 120-msec NOESY spectrum of GCGTTTAAACGC in the H8/H6/H2 (7-8 ppm) to H1'/H5 (5-6 ppm) region. The arrows from right to left show the cross-strand (n)AH2-(m + 1)H1' nuclear Overhauser effects (NOEs) for A7, A8, and A9. Note the virtual absence of the cross-strand A7 NOE at the 5' end of the A tract compared with the strong A9H2-T5H1' NOE at the 3' end of the A tract.

Sequence Dependence of the Intrastrand s Distances. The intrastrand distance s between (n)AH2 and (n + 1)H1' [see Table 1 for the definition of (n) and (m)] can be determined for AA, AT, AC, and AG steps. As shown in the histogram in Fig. 3, in AT and AC steps s is <4.2 Å; it is in the range of 3.6-4.2 Å. The corresponding distance in AA steps, and perhaps AG steps, has a higher upper limit, in the range of 4.0 to >4.5 Å. These intrastrand distances, like the cross-strand distances, depend on the flanking sequences. In an AA step,

FIG. 2. Distribution of the number of AA/TT, CA/TG, and TA steps (A) and GA/TC steps (B) versus the interstrand (n)AH2-(m + 1)H1' distances (x) at these steps. Data were taken from the literature (12-15, 17) for seven sequences (open bars) and from Table 1 (hatched bars). Distances for the ref. 15 sequence were increased by 0.2 Å because the authors used a rather short reference cytosine H5-H6 distance (2.39 Å). Interproton distances were calculated with an accuracy of approximately ±0.2 Å. The variations in x can be described as follows. (i) At all pyrimidine-purine steps, x is >4.5 Å. (ii) When purine precedes the GA, or a Tn tract (n ≥ 2) follows the GA, x is 3.8-4.0 Å; in all other cases x increases to >4.3 Å. The same applies to the complementary TC step with the only difference being that the 3' and 5' ends should be exchanged. (iii) At AA steps, x is ≈4.2 Å in a YAA context, decreases to 4.0-4.1 Å in GAA sequences, and decreases further to 3.8 Å inside AnTm tracts; in all cases a Tn tract at the 3' end decreases x by ≈0.2 Å. The effects are the same on TT steps, with the difference being that the 3' and 5' ends should exchange places. All these distances depend to some extent on experimental solution conditions; for instance, lower temperatures may reduce the minor groove width in AnTm tracts.
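The qualitative rules (i)-(iii) in the legend to Fig. 2 can be encoded as a small lookup, sketched below. This is a simplified reading of those rules and not a validated predictor: the function name and the returned labels are illustrative, "inside an AnTm tract" is approximated as "the AA step is preceded by another A", and the additional ≈0.2 Å decrease caused by a 3' Tn tract at AA steps is not modeled. Only steps read 5' to 3' on the given strand whose second base is A are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def predict_x(sequence, i):
    """Qualitative cross-strand x range (angstroms) for the step sequence[i]-sequence[i+1].

    Returns None when the step does not end in A or the context is not covered.
    """
    seq = sequence.upper()
    if not 0 <= i < len(seq) - 1:
        raise IndexError("step index out of range")
    first, second = seq[i], seq[i + 1]
    if second != "A":
        return None
    if first in PYRIMIDINES:
        return ">4.5"  # rule (i): pyrimidine-purine step (CA or TA)
    before = seq[i - 1] if i > 0 else None
    if first == "G":
        # rule (ii): a 5' purine or a following T tract (n >= 2) narrows the groove
        t_tract_follows = seq[i + 2:i + 4] == "TT"
        if before in PURINES or t_tract_follows:
            return "3.8-4.0"
        return ">4.3"
    if first == "A":
        # rule (iii), simplified as described in the lead-in
        if before in PYRIMIDINES:
            return "about 4.2"
        if before == "G":
            return "4.0-4.1"
        if before == "A":
            return "about 3.8"
    return None
```

For example, `predict_x("GCCTGATCAGGC", 4)` evaluates the GA step of that dodecamer, which is preceded by a pyrimidine and not followed by a T tract, and returns `">4.3"`.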


FIG. 3. The histogram is analogous to that represented in Fig. 2 with the exception that interproton distances (s) are between (n)AH2 and (n + 1)H1' on the same strand. (A) AA step. (B) AC step. (C) AT step. Distance variations can be described for an AA step as follows: s is ≥4.5 Å when a pyrimidine is at the 5' end and decreases to 4.1-4.3 Å in AnTm tracts. At AC steps, s is ≈4.1 Å and decreases by 0.2 Å in AnTm tracts. At AT steps, s is ≈4.0 Å and decreases by 0.3 Å in the middle of long AnTm tracts.

the presence of a pyrimidine at the 5' end of the AA step results in s increasing to ≥4.5 Å. The reduction in s when the AA step is followed by Tm in an AnTm tract is apparent from a comparison of sequence 3 and both strands of sequence 10 in Table 1. A comparison of x and s distances in the same AA step shows that x is always smaller than s in all sequences; i.e., the (n)AH2-(m + 1)H1' distance is less than the (n - 1)H2-(n)H1' in the same AA dinucleotide step. For an AT step, s has a tendency to decrease when this step is located in the center of an AnTm tract.

Correlation Between AH2-H1' Distances and Structural Parameters in Crystal Structures of B-Type DNA. We have described some phenomenological sequence-dependent patterns in the changes of AH2-H1' interstrand and intrastrand distances as a function of context. It is of interest to understand how these distances relate to DNA structural parameters. We therefore analyzed seven available B-type x-ray structures solved with resolution of ≈2 Å or better; these sequences are CGCGAATTCGCG, CGCGAATTbrCGCG (br indicates bromine), CGCATATATGCG, CCAACGTTGG, CCAGGCCTGG, CGATCGATCG, and CCAAGATTGG (4). It should be noted that we only consider these x-ray structures as a set of structurally allowable B conformations from which correlations may be derived; therefore, our arguments are not affected by the fact that packing forces or dehydration artifacts might distort the DNA structure. Among the geometric parameters of DNA structure defined by the Cambridge convention (16), only "slide" and "cup" have definite correlations with the cross-strand distance x (coefficients of ≈0.76 and ≈0.4). Thus, when x increases, slide increases and cup decreases. For the other parameters, the correlation coefficient is <0.2. However, we find a very good correlation coefficient (0.95) between x and the distance between the two nearest H1' protons on opposite chains (see legend to Fig. 4; this distance is an excellent measure of the groove width at the bottom of the minor groove); there is also a good correlation with cross-strand P-P distances although the correlation is less when restricted to only the 3.5- to 4.5-Å range (Fig. 4). It seems that one of the reasons for the correlation between AH2-H1' distances and cup and slide is the correlation between minor groove width (H1'-H1' and P-P) and cup and slide in the available crystal structures. The intrachain s distances to the (n + 1)H1' correlate also with the minor groove width (i.e., cross-strand P-P and H1'-H1'), although the correlation coefficient is worse, namely, ≈0.7 for H1'-H1' and ≈0.6 for P-P. We find no other noticeable correlations with any other standard geometric parameters.
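The correlation coefficients quoted above are of the ordinary Pearson type, which can be computed from paired measurements as sketched below. The function is a generic illustration; the paired values in the usage lines are made-up placeholders for demonstration only and are not taken from the crystal structures analyzed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder pairs: cross-strand x vs. cross-strand H1'-H1' distance (angstroms)
x_cross = [3.7, 4.0, 4.3, 4.6]
h1_h1 = [5.2, 5.9, 6.8, 7.4]
r = pearson_r(x_cross, h1_h1)
```

A coefficient near 1 means the two distances rise and fall together across the set of steps, which is the sense in which x serves as a probe of minor groove width.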

Possible Mechanisms for the Structure-Sequence Dependence. It is rather easy to interpret changes in x because, as we have shown, they correlate well with the minor groove

FIG. 4. (A) Correlation between (n)AH2-(m + 1)H1' interstrand distances and the P(n + 2)-P(m + 2) phosphate-phosphate distances (usually the closest P-P). (B) Correlation between (n)AH2-(m + 1)H1' and H1'(m + 1)-H1'(n + 1) distances. We used seven available B-type DNA crystal structures solved with a resolution of ≈2 Å or better (4). G-A mismatch steps were not used. Hydrogens were added using the program X-PLOR (18). Analysis of B-type DNA structures in crystals, fiber, and model calculations shows that the H1'(m + 1)-H1'(n + 1) distance is a probe of the width at the bottom of the minor groove. In A, the correlation coefficient was 0.7%; in B it was 0.949.


width. As shown above, changes in x distances seem to be caused by at least three factors, namely pyrimidine-purine steps, purine-pyrimidine steps (we have data only for AT in an AnTm tract), and cooperative formation of the "anomalous" B' structure in AnTm tracts. The first and second effects were predicted for some of the context sequences by theoretical calculations (refs. 19-21; V.P.C. and O.Y.F., unpublished results), which suggest that pyrimidine-purine steps lead to widening of the minor groove. It is energetically easier to make the minor groove narrower at a purine-pyrimidine step than at a pyrimidine-purine step involving the same two base pairs; this is largely due to stacking energy (19-21). Thus purine-purine cross-strand clashes at pyrimidine-purine steps prevent the minor groove from narrowing. Reduced stacking within the same chain can play an additional role (19-21). Pyrimidine-purine and purine-pyrimidine effects on DNA structure were first proposed by Calladine (22); however, the results and patterns described here are inconsistent with the conclusions and predictions of Calladine. Furthermore, our results also indicate that flanking effects cannot be attributed simply to purines or pyrimidines per se in that G or A residues (or C versus T) at a given position affect the structure in different ways (compare GA and AA steps in Fig. 2). For instance, the data show that replacement of A by G in a given sequence can lead to an increase in x as seen by comparison of sequences 1, 3, 7, and 8 in Table 1. This can be attributed to the presence of the NH2 group in the minor groove and the elimination of the cooperative structural transition that occurs only in AnTm tracts. The third effect (the cooperative transition to B' structure) has been mentioned in reviews (1, 2) and can be explained within the concept of the Dickerson spine of hydration (4), which results in narrowing of the minor groove (19-21, 23).

The present data allow us to explain, and even predict, the effect of sequence on minor groove width in B-type duplexes. In particular, the results indicate that alternating (AT)n sequences, which contain a TA every other step, should be characterized by a rather wide minor groove. A different conclusion has been reported by Suzuki et al. (24), who claim that (AT)n duplexes have wrinkled D-type rather than B-type structure [i.e., while the minor groove cannot be narrow in B-type structures if x is ≥4.5 Å, a narrow minor groove and large x distances are both characteristics of wD helix morphology, as is marked overwinding (8 base pairs per turn)]. However, the proposed overwound wD-helix model for (AT)n duplexes (24) is not supported by solution measurements (25, 26), fiber diffraction data (27), cleavage data (28), and theoretical calculations (19), which together indicate ≈10.5 base pairs per turn, a B-DNA structure at high humidity, large x distances, and a wide minor groove.

Our data do not directly address the question of DNA bending. However, both experimental results (29) and theoretical calculations (19-21, 23) indicate that there is a relationship between changes in the minor groove width and DNA bending. A detailed model of bending in A/T tracts has been proposed (20, 21) in which the AnTm tract itself should be essentially straight, with little or no bending. On the other hand, TmAn tracts are predicted to have a pronounced bend into the major groove and a wide minor groove at the TA region. The behavior of the minor groove in the GTTTAAAC block of sequence 4, and the GAAATTTC segment in sequence 3, supports our predictions and, according to calculations (20), the GTTTAAAC block should be bent (if the GTTT and AAAC segments adopt the B'-type conformation), whereas the GAAATTTC block should be relatively straight. We may speculate even more on the existence of small bends in DNA that do not contain B' AnTm segments (30, 31) if we keep in mind the plausible general correlation between the minor groove width and bending on the one hand and the now demonstrated sequence dependence of the minor groove width on the other.

In the legends to Figs. 2 and 3 we have presented some simple "rules" that qualitatively describe the variations in AH2-H1' distances in the duplexes listed in Table 1 and that may be used to predict such variations in new sequences. These rules should be considered only as preliminary empirical estimates, which may need further refinement when additional experimental data appear. In conclusion, we would like to note that the observed sequence-structure dependence in DNA may play an important role in the process of DNA recognition by proteins because many of the sequences we have analyzed (Table 1) are protein recognition sites and because the structural effects we have described are probably also important for the binding to DNA of minor groove ligands such as drugs.
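As an illustration only, the baseline x ranges quoted in the abstract (CA, TG, and TA steps: x > 4.5 Å; GA steps: 3.8-4.5 Å; AA steps: 3.7 to about 4.2 Å) can be written as a small lookup over the dinucleotide steps of a sequence. The sketch below is not the authors' procedure: the names `X_RANGES`, `x_range_for_step`, and `scan` are hypothetical, and the context effects discussed above (flanking pyrimidine-purine steps, a 3' AT step, and cooperative B' formation) are deliberately not modelled. It assumes only that a step read on one strand is equivalent to its reverse complement read on the other strand, so that TG maps to CA, TT to AA, and TC to GA.

```python
# Baseline x ranges (in angstroms) taken from the abstract, keyed by the step
# written 5'->3' on the strand that carries the adenine at its 3' position.
# None as an upper bound means that only a lower limit is quoted.
X_RANGES = {
    "CA": (4.5, None),  # pyrimidine-purine step: x > 4.5
    "TA": (4.5, None),  # pyrimidine-purine step: x > 4.5
    "GA": (3.8, 4.5),   # purine-purine step
    "AA": (3.7, 4.2),   # purine-purine step; upper limit is approximate
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def x_range_for_step(step):
    """Return (low, high) for a dinucleotide step, or None if the step
    has no range quoted in the abstract (hypothetical helper)."""
    step = step.upper()
    for candidate in (step, reverse_complement(step)):
        if candidate in X_RANGES:
            return X_RANGES[candidate]
    return None


def scan(seq):
    """List every step in seq together with its baseline x range."""
    seq = seq.upper()
    return [(seq[i:i + 2], x_range_for_step(seq[i:i + 2]))
            for i in range(len(seq) - 1)]


if __name__ == "__main__":
    # The two blocks compared in the discussion of bending.
    for block in ("GAAATTTC", "GTTTAAAC"):
        print(block, scan(block))
```

Applied to the two blocks compared in the text, the lookup assigns the wide-groove range only to the central TA step of GTTTAAAC, while the central AT step of GAAATTTC has no entry because neither strand carries an adenine at the 3' position of that step.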

We thank Prof. R. E. Dickerson and Dr. V. Heinemann and their colleagues for providing crystal DNA coordinates; Drs. J. T. Davis, R. Klevit, J. M. Schurr, and E. Sletten for discussion; as well as Drs. P. F. Flynn, D. R. Davis, W. Nerdal, L.-J. Lin, K. M. Banks, J. Orban, P. Rajagopal, M. Salazar, and D. Cheng for access to their NMR data. Thanks are due to Mary Coventry for typing the manuscript, and support from National Institutes of Health Grant GM32681 to B.R.R. is gratefully acknowledged.

1. Crothers, D. M., Haran, T. E. & Nadeau, J. G. (1990) J. Biol. Chem. 265, 7093-7096.
2. Hagerman, P. J. (1990) Annu. Rev. Biochem. 59, 755-781.
3. Kennard, O. & Hunter, W. N. (1989) Q. Rev. Biophys. 22, 327-379.
4. Dickerson, R. E. (1990) in Structure and Methods: DNA and RNA, Proceedings of the Sixth Conversation in Biomolecular Stereodynamics, eds. Sarma, R. M. & Sarma, M. K. (Adenine, Schenectady, NY), Vol. 3, pp. 1-38.
5. Dickerson, R. E., Goodsell, D. S., Kopka, M. L. & Pjura, P. I. (1987) J. Biomol. Struct. Dyn. 5, 557-580.
6. DiGabriele, A. D., Sanderson, M. R. & Steitz, T. A. (1989) Proc. Natl. Acad. Sci. USA 86, 1816-1820.
7. Reid, B. R. (1987) Q. Rev. Biophys. 20, 1-34.
8. Clore, G. M. & Gronenborn, A. M. (1989) Crit. Rev. Biochem. Mol. Biol. 24, 479-564.
9. Metzler, W. J., Wang, C., Kitchen, D. B., Levy, R. M. & Pardi, A. (1990) J. Mol. Biol. 214, 711-736.
10. Hare, D. R. & Reid, B. R. (1986) Biochemistry 25, 5341-5350.
11. States, D. J., Haberkorn, R. A. & Ruben, D. J. (1982) J. Magn. Reson. 48, 286-292.
12. Nilges, M., Clore, G. M., Gronenborn, A. M., Brunger, A. T., Karplus, M. & Nilsson, L. (1987) Biochemistry 26, 3718-3733.
13. Kintanar, A., Klevit, R. E. & Reid, B. R. (1987) Nucleic Acids Res. 15, 5845-5862.
14. Katahira, M., Sugeta, H., Kyogoku, Y., Fujii, S., Fujisawa, R. & Tomita, K. (1988) Nucleic Acids Res. 16, 8619-8632.
15. Nadeau, J. G. & Crothers, D. M. (1989) Proc. Natl. Acad. Sci. USA 86, 2622-2626.
16. Dickerson, R. E. (1989) J. Biomol. Struct. Dyn. 6, 627-634.
17. Nilges, M., Clore, G. M., Gronenborn, A. M., Piel, N. & McLaughlin, L. W. (1987) Biochemistry 26, 3734-3744.
18. Brünger, A. (1988) X-plor Manual, Version 1.5 (Yale Univ. Press, New Haven, CT).
19. Chuprina, V. P. (1987) Nucleic Acids Res. 15, 293-311.
20. Chuprina, V. P. & Abagyan, R. A. (1988) J. Biomol. Struct. Dyn. 6, 121-138.
21. Chuprina, V. P., Fedoroff, O. Yu. & Reid, B. R. (1991) Biochemistry 30, 561-568.
22. Calladine, C. R. (1982) J. Mol. Biol. 161, 343-352.
23. Chuprina, V. P. (1985) FEBS Lett. 186, 98-102, and correction (1986) 195, 363.
24. Suzuki, E., Pattabiraman, N., Zon, G. & James, T. L. (1986) Biochemistry 25, 6854-6865.
25. Rhodes, D. & Klug, A. (1981) Nature (London) 292, 378-380.
26. Peck, L. J. & Wang, J. C. (1981) Nature (London) 292, 375-378.
27. Mahendrasingam, A., Rhodes, N. J., Goodwin, D. C., Nave, C., Pigram, W. J. & Fuller, W. (1983) Nature (London) 301, 535-537.
28. Herrera, J. E. & Chaires, J. B. (1989) Biochemistry 28, 1993-2000.
29. Burkhoff, A. M. & Tullius, T. D. (1987) Cell 48, 935-943.
30. Zhurkin, V. B. (1985) J. Biomol. Struct. Dyn. 2, 785-804.
31. Trifonov, E. N. (1985) CRC Crit. Rev. Biochem. 19, 89-106.

