
doi:10.1006/jmbi.2000.4112 available online at http://www.idealibrary.com on J. Mol. Biol. (2000) 303, 395–403

Crystal Structures of the Chromosomal Proteins Sso7d/Sac7d Bound to DNA Containing T-G Mismatched Base-pairs

Shaoyu Su2, Yi-Gui Gao1, Howard Robinson1, Yen-Chywan Liaw3, Stephen P. Edmondson4, John W. Shriver4 and Andrew H.-J. Wang1,2*

1Department of Biochemistry
2Center for Biophysics and Computational Biology
University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
3Institute of Molecular Biology, Academia Sinica, Nankang, Taipei, Taiwan 11529, ROC
4Department of Biochemistry & Molecular Biology, School of Medicine, Southern Illinois University, Carbondale, IL 62901, USA

Sso7d and Sac7d are two small chromatin proteins from the hyperthermophilic archaebacteria Sulfolobus solfataricus and Sulfolobus acidocaldarius, respectively. The crystal structures of Sso7d-GTGATCGC, Sac7d-GTGATCGC and Sac7d-GTGATCAC have been determined and refined at 1.45 Å, 2.2 Å and 2.2 Å, respectively, to investigate the DNA binding property of Sso7d/Sac7d in the presence of a T-G mismatch base-pair. Detailed structural analysis revealed that the intercalation site includes the T-G mismatch base-pair and Sso7d/Sac7d bind to that mismatch base-pair in a manner similar to regular DNA. In the Sso7d-GTGATCGC complex, a new inter-strand hydrogen bond between T2O4 and C14N4 is formed and well-ordered bridging water molecules are found. The results suggest that the less stable DNA stacking site involving a T-G mismatch may be a preferred site for protein side-chain intercalation.

© 2000 Academic Press

Keywords: DNA binding protein; protein-DNA interactions; protein stability; hyperthermophile; archaebacteria

*Corresponding author

E-mail address of the corresponding author: [email protected]

Abbreviations used: Sac7d, a group of 7 kD DNA binding proteins from Sulfolobus acidocaldarius, individually referred to as Sac7a, Sac7b, Sac7c, Sac7d, and Sac7e in order of increasing basicity; Sso7d, analogous protein of Sac7d from Sulfolobus solfataricus; r.m.s.d., root mean square deviation; PEG400, polyethylene glycol 400.

0022-2836/00/030395–9 $35.00/0

Introduction

In eukaryotic cells, DNA is packed around the histone core of the nucleosome, the three-dimensional structure of which has been elucidated (Luger et al., 1997). However, the mechanism by which prokaryotes and archaebacteria organize their DNA into a compact form is less well understood. Sso7d and Sac7d are two small, abundant and basic proteins from the hyperthermophilic archaebacteria Sulfolobus solfataricus and Sulfolobus acidocaldarius, respectively (Baumann et al., 1994; McAfee et al., 1995). They and other related proteins (e.g. Sac8 and Sac10) are believed to play an important role in DNA packing and maintenance in these archaeons.

Sso7d/Sac7d bind to DNA with micromolar affinity in a non-cooperative manner without strong sequence preference, and they increase the melting temperature (tm) of DNA by ~40 deg. C (McAfee et al., 1996). The structures of several Sac7d/Sso7d-DNA complexes have been studied recently by X-ray crystallography (Robinson et al., 1998; Gao et al., 1998), NMR (Agback et al., 1998) and low-angle X-ray scattering (Krueger et al., 1999). In these protein-DNA complexes, the protein structure is similar to that of the free protein (Baumann et al., 1994; Edmondson et al., 1995), consisting of an incomplete β-barrel made of a triple-stranded β-sheet orthogonal to a β-hairpin. The small β-barrel is capped by an amphiphilic C-terminal α-helix. The triple-stranded β-sheet is placed across the DNA minor groove with the intercalation of the Val26 and Met29 side-chains into DNA base-pairs, causing a sharp kink (~60°) in the DNA duplex (Robinson et al., 1998; Gao et al., 1998).

The intercalation sites in DNA were found at either the CpG or the TpT (equivalently ApA) sequences, likely due to their inherently less favorable base-base stacking energy compared to the GpC or the GpT sequences. It has been suggested previously that the TpG step may be a preferred site for protein side-chain intercalation (Churchill et al., 1995). Therefore it seems possible that certain mismatched base-pairs at the intercalation site, which are likely to introduce even less stable stacking interactions, may provide a favorable binding site for Sac7d/Sso7d, particularly when the mismatch is located at a kinked site induced by amino acid intercalation.

We chose the T-G mismatched base-pair to address this question, since it is a common mismatch occurring in cells. The processes of cytosine methylation and spontaneous deamination continually create T-G mismatches in the DNA genome (Wang et al., 1982). There have been several NMR studies on T-G mismatches in DNA (Kalnik et al., 1988; Allawi & SantaLucia, 1998), but only one crystal structure of a B-form DNA duplex containing a T-G mismatch is available (Hunter et al., 1987). All structural studies suggest that the T-G pair adopts a "wobble" configuration with small and localized structural perturbation with respect to the overall double helix.

Interestingly, T-G mismatches may be recognized specifically by small molecular ligands (e.g. anticancer drugs) (Yang et al., 1999). The structure of the complex of a T-G mismatch repair enzyme bound with DNA has been determined (Barrett et al., 1998).

The goal of this study is to understand the interaction of hyperthermophilic proteins with DNA containing T-G mismatches. Our study may contribute to understanding the question of DNA repair in hyperthermophilic archaea (Grogan, 2000). Toward this end, we have determined the structures of the Sso7d-GTGATCGC and Sac7d-GTGATCGC complexes; each has two T-G mismatches in DNA. The Sac7d-GTGATCAC complex was also crystallized as a control. Our new Sso7d/Sac7d-DNA structures show that Sso7d/Sac7d indeed can bind to DNA with mismatched base-pairs using a mechanism similar to that in the complexes having matched base-pairs.
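The location of the mismatches follows directly from pairing each octamer with an antiparallel copy of itself. The short Python sketch below is only an illustration, not part of the original analysis: it assumes that the duplex is formed by two identical strands and that residues are numbered 1 to 8 on the first strand and 9 to 16 on the second, as in the residue labels used in this paper. The function names are hypothetical.

```python
# Watson-Crick partners; any other combination is reported as a mismatch.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def duplex_pairs(strand):
    """Pair a strand with an identical antiparallel copy of itself.

    Assumed numbering: strand 1 is residues 1..n, strand 2 is residues
    n+1..2n, so residue i pairs with residue 2n+1-i.
    """
    n = len(strand)
    pairs = []
    for i, base in enumerate(strand, start=1):
        j = 2 * n + 1 - i             # partner residue number on strand 2
        partner = strand[j - n - 1]   # strand 2 has the same 5'->3' sequence
        pairs.append((f"{base}{i}", f"{partner}{j}"))
    return pairs


def mismatches(strand):
    """Return the base-pairs that are not Watson-Crick pairs."""
    return [
        (a, b) for a, b in duplex_pairs(strand)
        if (a[0], b[0]) not in WATSON_CRICK
    ]
```

Under these assumptions, `mismatches("GTGATCGC")` gives the pairs T2-G15 and G7-T10, whereas the control sequence GTGATCAC gives no mismatches.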

Results and Discussion

Overall structures of Sac7d/Sso7d-DNA complexes

The crystal structures of Sso7d-GTGATCGC, Sac7d-GTGATCGC and Sac7d-GTGATCAC complexes have been determined and refined at 1.45 Å, 2.2 Å and 2.2 Å, respectively (Table 1). Sso7d/Sac7d bind to the DNA duplex as a monomer. One protein-DNA complex is in the asymmetric unit of all three crystal forms. The overall structure of the Sso7d-GTGATCGC complex is shown in Figure 2(a), with DNA shown with an electrostatic potential surface and the protein as a wire drawing. The electrostatic potential of DNA was calculated in the presence of protein. Similar to other Sso7d/Sac7d-DNA crystal structures, the protein binds a four base-pair track and widens the minor groove significantly.

Table 1. Crystal and refinement data of three Sac7d-DNA/Sso7d-DNA complexes

                                           Sso7d-GTGATCGC        Sac7d-GTGATCAC       Sac7d-GTGATCGC
Crystal data
a (Å)                                      47.48                 49.78                50.47
b (Å)                                      49.80                 76.76                75.89
c (Å)                                      37.68                 35.14                35.00
Resolution (Å)                             1.45                  2.2                  2.2
No. of reflections (>2σ(F))                14,498 (8.0-1.45 Å)   5268 (8.0-2.2 Å)     5954 (8.0-2.2 Å)
Rmerge (%)                                 5.4                   6.3                  5.8
Completeness (%)                           89.1                  73.8                 84.4
Completeness at highest shell (%)          63.0 (1.45-1.52 Å)    60.9 (2.2-2.3 Å)     72.9 (2.2-2.3 Å)
Refinement data
No. of reflections                         17,583 (>0σ(F)) (a)   5268 (>2σ(F)) (b)    5954 (>2σ(F)) (b)
Rworking/Rfree (5 % data)                  0.229/0.287 (a)       0.190/0.277 (b)      0.197/0.257 (b)
r.m.s.d. bond distance (Å) (protein/DNA)   0.009/0.010           0.008/0.007          0.014/0.009
r.m.s.d. bond angle (deg.) (protein/DNA)   2.30/2.35             1.94/1.29            1.79/1.25
No. of atoms (protein/DNA/water)           510/322/159           502/322/100          510/322/76

(a) Refinement and calculation of R-factors are done by SHELX using all reflections.
(b) Refinement and calculation of R-factors are done by X-PLOR using reflections >2σ(F).

The side-chains of five important amino acid residues (Tyr8, Trp24, Val26, Met29 and Arg43) are in contact with the DNA minor groove surface. The Tyr8 hydroxyl does not form hydrogen bonds with other protein residues or DNA bases. However, its aromatic side-chain and the Arg43 side-chain are in van der Waals contact with each other and with the DNA backbone. These amino acid residues help broaden the DNA minor groove.

Protein-DNA contacts

The 1.45 Å resolution structure of Sso7d-GTGATCGC affords an excellent opportunity to study protein-DNA-water interactions in detail. There are several direct intermolecular hydrogen bonds involving Trp24, Ser31, Arg43 and Val26. Arg43 NH1 is close to T5O2 (2.99 Å), C6O2 (2.88 Å) and C6O4′ (3.09 Å). These contacts help define the conformation of the Arg43 side-chain. The hydrogen bond between Trp24 Nε1 and G3N3 is observed in all Sso7d/Sac7d-DNA complexes, consistent with the observation that the mutation from Trp to Ala at site 24 decreases the binding affinity of Sac7d to DNA (J. Bedell & J. W. S., unpublished).

The affinity also comes from the shape complementarity between the protein and DNA, which results in extensive van der Waals interactions. There is a cavity at the protein-DNA interface in the G3-A4-T5 region. Four well-defined water molecules (green spheres in Figure 2(a)) were found at the interface between DNA and the triple-stranded β-sheet of the protein. A schematic diagram summarizing all potential hydrogen bonding for these four water molecules is shown in Figure 1(d). These water molecules are arranged in a diamond shape. At the diamond top, water 1001 is hydrogen bonded to the carbonyl oxygen of Phe32 (2.82 Å) and to Ser31 Oγ (2.86 Å). At the diamond bottom, water 1003 is very close to T13O2 (2.54 Å). The water molecules 1006 and 1002 are 2.81 Å and 3.05 Å away from C14O2 and A4N3, respectively.

It is important to point out that the water arrangements vary in different Sso7d/Sac7d-DNA complexes. In Sso7d-GTAATTAC (Gao et al., 1998), one of the water positions is not fully occupied (as evident from its weak electron density), supporting the idea that these interfacial water molecules may act as "modulators" for protein binding to different DNA sequences, a property required for a sequence-general DNA-binding protein.

Protein structure

The main-chain folding of Sso7d/Sac7d is a β-barrel of the OB-fold topology (Murzin, 1996). Figure 2(b) shows the superposition of the proteins in seven Sso7d/Sac7d-DNA complexes, fitted by their α-carbons. The triple-stranded β-sheet binds on the DNA minor groove surface and contributes most of the contacts with DNA. The core residues' side-chain conformations are conserved, but the surface side-chains are quite variable. The left panel of Figure 2(b) shows that the backbone conformations are nearly identical, except for loops 9-11, 36-39 and 48-52. The center panel displays the conserved plus other core residues. The side-chain conformation of these amino acid residues has small variations. In contrast, the conformations of non-conserved side-chains, having little contact with DNA, are significantly variable.

Figure 1. (a) Amino acid sequence alignment of Sso7d and Sac7d. Except for the C-terminal α-helix, only six amino acid residues are different between the two proteins. Sac7d does not have a glycine at position 39. The secondary structure of the two proteins consists of five β sheets and one α-helix (labeled red). (b) and (c) The (2Fo − Fc) Fourier electron density maps (contoured at 1σ level) of the regions at the interface of protein-DNA for the Sso7d-GTGATCGC complex. The electron densities of DNA, protein and solvent atoms are colored in red, purple and green, respectively. Possible hydrogen-bonds are drawn as broken lines. (d) A schematic diagram depicting the four water molecules hydrogen-bonding with DNA and protein atoms in the Sso7d-GTGATCGC complex. These four water molecules are always present at the interface of Sso7d/Sac7d and DNA, although with different arrangements in different complexes.

Figure 2. (a) Stereoscopic surface representation of the Sso7d-GTGATCGC complex shown with DNA electrostatic potentials which were calculated in the presence of protein. The side-chains of five important amino acid residues, Tyr8, Trp24, Val26, Met29 and Arg43, are also highlighted. (b) Superposition of the Sso7d/Sac7d protein structures obtained from seven available crystallographic structures, fitted by common protein main-chain atoms. The left panel shows the wire diagram of seven protein backbones. The center panel displays the structurally conserved and core residues. The conformations of these amino acid side-chains remain nearly identical. The right panel shows the highly variable non-conserved surface side-chains.

Detailed statistical comparison of the protein conformations in seven Sso7d/Sac7d-DNA complexes is shown in Figure 3. For the backbones, the core residues 20-25, 29-34 and 42-47 are highly conserved. Residues in loops 9-11, 36-39 and 48-52, not unexpectedly, have the largest r.m.s.d. for side-chains and main-chains. We conclude that the protein retains its overall conformation when it binds to different DNA sequences, with conformational adjustments only for surface amino acid side-chains.
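A per-residue r.m.s.d. of this kind can be sketched in a few lines of Python. The text does not state the reference against which each deviation is measured, so the sketch assumes that the models have already been superimposed and that each residue is represented by a single position (for example its α-carbon) whose deviation is taken from the mean position over all models. The function name and the input layout are hypothetical.

```python
import math


def per_residue_rmsd(models):
    """Per-residue r.m.s.d. about the mean position.

    models[m][r] is the (x, y, z) position of residue r in model m.
    All models are assumed to be superimposed already and to contain
    the same residues in the same order.
    """
    n_models = len(models)
    n_res = len(models[0])
    if any(len(model) != n_res for model in models):
        raise ValueError("all models must contain the same number of residues")

    result = []
    for r in range(n_res):
        points = [model[r] for model in models]
        # Mean position of this residue over all models.
        mean = tuple(sum(p[k] for p in points) / n_models for k in range(3))
        # Mean squared distance of each model's position from that mean.
        msd = sum(math.dist(p, mean) ** 2 for p in points) / n_models
        result.append(math.sqrt(msd))
    return result
```

With real coordinates, residues in the loops 9-11, 36-39 and 48-52 would be expected to give the largest values, in line with Figure 3.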

DNA structure

The electrostatic potential surface of DNA is shown in Figure 2(a). The minor groove is neutralized by its contact with protein. The region close to Arg43 becomes positively charged. The major groove is completely accessible, having no contact with the protein. In the complex, four DNA base-pairs are covered by the protein (Figure 4(a)). Superposition of the DNA intercalation sites for four Sso7d/Sac7d-DNA complexes (Sso7d-GCGATCGC, Sso7d-GTGATCGC, Sso7d-GTAATTAC and Sac7d-GTAATTAC) reveals that the intercalation of Val26 and Met29 side-chains from the minor groove direction sharply kinks the DNA octamer by ~60°. Interestingly, the side-chain conformation of Val26 and Met29 varies, depending on the intercalated DNA sequences.

Figure 3. The r.m.s.d. statistics for each residue in the seven structures shown in Figure 2 are shown for the main-chains and the side-chains.

Detailed DNA conformation of the Sso7d-GTGATCGC complex is listed in Table 2. The structures with T-G mismatches have smaller roll angles (average ~7.6°) than other complexes with matched base-pairs (~10°). The average helical twist values are ~30° in all complexes. Values of helical twist can be grouped into two categories, depending on whether a base-pair is at the intercalation site or not. The base-pairs at the intercalation site have the lowest values. This is further evidence that the DNA conformation at the intercalation site mimics A-DNA. The propeller twist ranges from −2.3° at G1-C16 to −13.6° at C6-G11, with an average of −8.2°.

The glycosyl angles (χ) and individual sugar-phosphate torsion angles are listed in Table 3. The average values agree with those of the parent octamer (Robinson et al., 1998) to within 15°. The substitution of two T-G wobble base-pairs for standard Watson-Crick base-pairs appears to have little effect on the conformation of the backbone. Evidently, the sugar phosphate backbone is sufficiently flexible to accommodate the T-G wobble pairs without changing the backbone conformation significantly.

Upon binding to Sso7d/Sac7d protein, the DNA minor groove is widened significantly and many nucleotides surrounding the intercalation site (including T2, A4, T5, C14 and T13) adopt the C3′-endo (N-type) sugar puckers. A superposition of the intercalation site DNA of the complex and A-DNA reveals the similarity of the backbone conformations between them (Figure 4(b)).

Table 2. DNA helical parameters of the Sso7d-GTGATCGC complex

Base-pair   Roll    Tilt    Inclination   Propeller twist (ω)   Buckle (κ)   Helical twist (Ω)   Rise (Å)
G101-C116   3.9     −0.6    −2.7          −2.3                  −1.1         33.2                2.9
T102-G115   50.9    6.1     −3.4          −6.6                  14.4         20.9                5.3
G103-C114   6.7     −7.2    13.3          −7.4                  −24.9        21.8                3.0
A104-T113   −2.9    −0.4    8.5           −9.6                  −4.4         27.3                2.8
T105-A112   −1.1    1.3     10.4          −11.5                 6.6          36.1                3.4
C106-G111   4.5     0.0     11.1          −13.6                 2.9          28.4                3.2
G107-T110   −8.6    −4.8    11.3          −5.7                  −0.3         35.4                3.1
C108-G109   -       -       7.4           −8.3                  4.5          -                   -
Average     7.6     −0.8    7.0           −8.2                  −0.3         29.0                3.4
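The averages quoted for Table 2 and the statement that the helical twist is lowest at the intercalation site can be checked directly from the tabulated values. The Python sketch below is a minimal illustration using the roll, tilt, helical twist and rise columns as printed; each entry is keyed by the base-pair label of its row, and the helper names are hypothetical.

```python
# Values transcribed from Table 2 (angles in degrees, rise in Å).
# The last base-pair (C108-G109) has no step parameters and is omitted.
STEPS = {
    "G101-C116": {"roll": 3.9, "tilt": -0.6, "twist": 33.2, "rise": 2.9},
    "T102-G115": {"roll": 50.9, "tilt": 6.1, "twist": 20.9, "rise": 5.3},
    "G103-C114": {"roll": 6.7, "tilt": -7.2, "twist": 21.8, "rise": 3.0},
    "A104-T113": {"roll": -2.9, "tilt": -0.4, "twist": 27.3, "rise": 2.8},
    "T105-A112": {"roll": -1.1, "tilt": 1.3, "twist": 36.1, "rise": 3.4},
    "C106-G111": {"roll": 4.5, "tilt": 0.0, "twist": 28.4, "rise": 3.2},
    "G107-T110": {"roll": -8.6, "tilt": -4.8, "twist": 35.4, "rise": 3.1},
}


def column_mean(table, key):
    """Arithmetic mean of one parameter over all rows."""
    values = [row[key] for row in table.values()]
    return sum(values) / len(values)


def lowest(table, key, count):
    """Row labels with the smallest values of one parameter."""
    return sorted(table, key=lambda label: table[label][key])[:count]
```

Rounded to one decimal place, `column_mean` reproduces the tabulated averages for these four columns (7.6, −0.8, 29.0 and 3.4), and `lowest(STEPS, "twist", 2)` returns the rows T102-G115 and G103-C114, which also carry the large roll and rise associated with the kink.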

Figure 4. (a) Superposition of DNA intercalation sites for four Sso7d/Sac7d-DNA complexes: Sso7d-GCGATCGC, Sso7d-GTGATCGC, Sso7d-GTAATTAC and Sac7d-GTAATTAC. The models are fitted by the common backbone atoms of protein. (b) The schematic diagram of the DNA backbone conformation in Sso7d-GTGATCGC. The sugars around the intercalation site adopt C3′-endo (N-type) pucker, similar to A-DNA conformation. The right panel shows the intercalation site of DNA, fitted to A-DNA using base-pair atoms.


Hydration around the T-G wobble base-pairs

Figure 5 presents two T-G base-pairs (one at the Sso7d binding site and the other at the open site) and the surrounding solvent molecules observed in the Sso7d-GTGATCGC complex. Both T-G base-pairs are of the wobble type with a normal hydrogen bonding geometry. Interestingly, at the binding site of Sso7d-GTGATCGC, T2O4 is close to C14N4 (3.09 Å). The geometry suggests that this is an inter-strand direct hydrogen bond. In contrast, in the Sso7d-GCGT[br5U]CGC complex, despite the fact that there is a close contact between C14N4 and G15O6 (Figure 5(b)), the unfavorable geometry suggests that it is not a hydrogen bond. Therefore the kinked conformation at the T2pG3 step in Sso7d-GTGATCGC allows a new inter-strand hydrogen bond to form.

Water molecules are found to bridge the thymine O4 with the guanine O6 and N2 atoms, respectively. Water 1007 is within hydrogen-bonding distance of T2O4 and G15O6. On the minor groove side, G15N2 has a distance of 3.10 Å to the carbonyl oxygen of Val26 and 2.98 Å to water 1029, which in turn has a distance of 3.48 Å to T2O2. Water 1029 cannot get closer to T2O2, otherwise it would have a severe close contact with Cγ1 of Val26. Water 1128 is 3.21 Å from the nitrogen of Lys28 and 2.70 Å from G13N3.

Table 3. DNA backbone torsion angles (degree) of the Sso7d-GTGATCGC complex

Nucleotide   α     β     γ     δ     ε     ζ     χ     P
G101         -     -     56    140   185   266   257   166
G109         -     -     356   148   176   268   259   180
T102         282   177   45    95    199   275   249   60
T110         302   177   43    135   225   194   253   135
G103         276   206   51    134   178   275   273   146
G111         294   171   38    151   174   261   270   170
A104         287   167   53    87    163   283   250   81
A112         299   176   54    120   189   272   244   124
T105         307   164   51    92    178   270   229   80
T113         286   168   55    89    208   277   236   79
C106         304   173   59    136   198   271   250   152
C114         283   161   51    82    197   281   254   60
G107         285   167   44    131   197   268   258   127
G115         288   191   54    142   184   274   262   160
C108         258   85    166   247   -     -     197   152
C116         291   165   53    78    -     -     225   69
Average      289   168   77    125   189   267   248   121
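The pseudorotation phase angles P in Table 3 allow the sugar pucker family of each nucleotide to be read off directly. The following Python sketch is an illustration only: it assumes the common convention that the northern half of the pseudorotation cycle (P between 270° and 90°, which contains C3′-endo) is N-type and the remainder is S-type. The boundary values and the function names are assumptions, not taken from the paper.

```python
# Pseudorotation phase angles P (degrees) transcribed from Table 3.
PHASE = {
    "G101": 166, "G109": 180, "T102": 60, "T110": 135,
    "G103": 146, "G111": 170, "A104": 81, "A112": 124,
    "T105": 80, "T113": 79, "C106": 152, "C114": 60,
    "G107": 127, "G115": 160, "C108": 152, "C116": 69,
}


def pucker_family(phase_deg):
    """Classify a sugar as N-type or S-type from its phase angle.

    Assumed convention: N-type for P in [270, 360) or [0, 90],
    S-type otherwise.
    """
    p = phase_deg % 360.0
    return "N" if (p <= 90.0 or p >= 270.0) else "S"


def n_type_residues(phases):
    """Names of the nucleotides classified as N-type, in table order."""
    return [name for name, p in phases.items() if pucker_family(p) == "N"]
```

With this convention, `n_type_residues(PHASE)` returns T102, A104, T105, T113, C114 and C116, which includes the five nucleotides around the intercalation site named in the text.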

Figure 5. Stereo diagram of the three T-G base-pairs observed in the Sso7d-GTGATCGC and Sso7d-GCGTUCGC complexes, and the surrounding water molecules. The hydrogen bonds for base-pairing are labeled as continuous lines and other possible hydrogen bonds are marked as broken lines. The water molecules are displayed as purple balls.


In the non-binding site (Figure 5(c)), there is also a water-bridging network in the major groove. An interesting observation is that water 1017 and 1015, with distances of 2.77 Å and 3.01 Å to G7N7 and G7O6, respectively, are now close to each other with a distance of 3.01 Å.

A well-defined water molecule 1013 in the minor groove is hydrogen bonded to T10O2 (3.19 Å) and to G7N2 (2.98 Å). This characteristic bridging water in a wobble T-G base-pair has been observed in all DNA crystal structures thus far and may be exploited for specific recognition of T-G base-pair using minor groove binders (Yang et al., 1999). No water molecule is found to bind to G7N2/N3, probably due to disorder.
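The hydration analysis in this section relies on two criteria, the donor-acceptor distance and the geometry of the contact, and a short contact alone is not taken as a hydrogen bond. A minimal Python sketch of such a test on heavy-atom coordinates is given below. The cutoffs (3.5 Å and 90°) and the use of the antecedent-donor-acceptor angle are illustrative assumptions, not the criteria used in this work, and the function names are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle at vertex b formed by the points a-b-c, in degrees."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def plausible_hbond(antecedent, donor, acceptor, max_dist=3.5, min_angle=90.0):
    """Heavy-atom hydrogen-bond test (assumed cutoffs).

    antecedent is the atom covalently bonded to the donor; the contact
    is accepted only if the donor-acceptor distance is short enough and
    the antecedent-donor-acceptor angle is wide enough.
    """
    close_enough = math.dist(donor, acceptor) <= max_dist
    wide_enough = angle_deg(antecedent, donor, acceptor) >= min_angle
    return close_enough and wide_enough
```

For example, with made-up coordinates, a donor at the origin with its antecedent at (−1.35, 0, 0) passes the test for an acceptor at (3.09, 0, 0), but fails for an acceptor at (−1.0, 2.9, 0), even though the latter is only about 3.07 Å away.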

Conclusion

To date, seven crystal structures of Sso7d/Sac7d-DNA are available. Two observations are in common. First, the base at the intercalating site's 3′ end is always a purine, with Trp24 Nε1 forming a hydrogen bond to its N3 atom. Second, the third base at the intercalating site's 3′ end is a thymine, with Arg43 NH1 forming a hydrogen bond to thymine's O2 atom. This may suggest that the positioning of the intercalating residues is determined by the alignment of the triple-stranded β-sheet with respect to DNA using hydrogen bonds.

Another important observation is the conservation of the protein conformation among all Sso7d/Sac7d structures derived from three independent crystal lattices. We are confident that the binding mode found in the solid state is likely to be preserved in solution (Krueger et al., 1999).

Our results further show that the binding of Sso7d/Sac7d protein to a mismatched T-G base-pair is structurally similar to the binding to normal Watson-Crick base-pairs. In the structure of Sso7d-GTGATCGC, a T-G mismatch is at the 5′ side of the intercalating site. It should be mentioned that Sso7d/Sac7d formed crystals with GTGATCGC, but not with GCGATTGC and GCGGTCGC. These observations may be explained as follows. In the case of GCGATTGC, Sso7d/Sac7d still binds preferentially to the T-G mismatched DNA base-pair of the T6pG7 step. The movement of the intercalation site toward the 3′-end will prevent the DNA from end-over-end stacking among complexes, which is necessary for crystal packing as seen in all available Sso7d/Sac7d-DNA crystal structures. Additionally, this movement of the intercalation site will make it impossible to form the hydrogen bond between Arg43 and a thymine as the third base after the intercalation site. This hydrogen bonding appears to be important since it is present in all seven Sso7d/Sac7d-DNA complex X-ray structures. In the second case of GCGGTCGC, there is no longer a TpG step in the sequence. The tandem T-G base-pairs at the central G4pT5 step will likely alter the helical twist angle of the duplex, which will interfere with the end-over-end stacking. Further experiments on the binding activity of Sso7d/Sac7d to mismatched DNA might help to address this question.

T-G mismatches occur frequently in genomic DNA. The question of DNA repair in hyperthermophilic organisms is an intriguing one. It has been reported that a mismatch in the middle of the double strand DNA will prevent the annealing activity, but not the binding activity of Sso7d (Guagliardi et al., 1997). It is of interest to ask what the consequence is of Sso7d/Sac7d protein binding to the DNA helix under high temperature in Sulfolobus. Will the Sso7d/Sac7d proteins protect DNA containing a T-G mismatch from being repaired? Additional work on the study of hyperthermophilic proteins (including repair enzymes) binding with mismatched DNA would be useful.

Materials and Methods

Sso7d and Sac7d were purified and stored as lyophilized powder (McAfee et al., 1995). The complexes were crystallized using the sitting drop vapor diffusion method at room temperature. The Sso7d-GTGATCGC complex was crystallized from 1.3 mM Sso7d, 1.3 mM DNA duplex, 2.5 mM Tris-HCl buffer (pH 6.5), 2.5 % (w/v) PEG400 solution, equilibrated with 15 % PEG400. The Sac7d-GTGATCGC and Sac7d-GTGATCAC complexes were crystallized under similar conditions, except 2 % PEG400 was used. All data were collected at −150 °C on a Rigaku R-Axis IIc image plate area detector system to various resolution ranges (Table 1). The crystals are all in the space group P2₁2₁2₁. The data were processed using BioTex v1.1 provided by Molecular Structure Corporation.

The crystal structures were determined by the molecular replacement (MR) method using the AMORE program in the CCP4 suite (CCP4, 1994) and refined by the simulated annealing (with individual temperature factor refinement) procedure incorporated in X-PLOR (Brünger, 1992). The parameters for ideal protein geometry from Engh & Huber (1991) and the parameters for DNA from Parkinson et al. (1996) were used for refinements. Well-ordered water molecules were located and included in the model. For the Sso7d-GTGATCGC complex, the model was further refined by SHELX97 (Sheldrick & Schneider, 1997). Both R-factor and Rfree were used to monitor the progress of the refinement. The stereochemical quality of the refined structures was checked with the program PROCHECK (Laskowski et al., 1993). All amino acid residues in the three structures adopt normal backbone and side-chain conformations, except Glu11, which appears to be disordered. The temperature factors associated with Glu11 are high and the electron density map around it is poor. All model building and modifications were carried out with the programs O (Jones et al., 1991) and MIDAS PLUS (Ferrin et al., 1991). The crystal data and refinement summaries are given in Table 1. All Figures were prepared with the programs O, MIDAS PLUS and GRASP (Nicholls et al., 1991). DNA conformation was calculated by CURVES (Lavery & Sklenar, 1989).
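For readers unfamiliar with the two refinement statistics, the conventional crystallographic R-factor is the sum of the absolute differences between observed and calculated structure factor amplitudes divided by the sum of the observed amplitudes, and Rfree is the same quantity evaluated on a test set excluded from refinement (5 % of the data in Table 1). The Python sketch below illustrates that definition only. It assumes the calculated amplitudes are already on the scale of the observed ones, and it does not reproduce how X-PLOR or SHELX compute these values. The function names and the flag layout are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc is already scaled to f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a test-set flag and return (Rworking, Rfree)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free
```

A refinement that lowers Rworking without a comparable drop in Rfree is usually taken as a sign of over-fitting, which is why both values are monitored.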

RCSB Protein Data Bank accession code

Coordinates for Sso7d-GTGATCGC, Sac7d-GTGATCGC and Sac7d-GTGATCAC, and associated structure factors have been deposited with the Protein Data Bank at the Research Collaboratory for Structural Bioinformatics under the accession codes 1C8C, 1CA6 and 1CA5.

Acknowledgments

The project is supported by grants from NIH to A.H.J.W. (GM41612) and to J.W.S. (GM 49686).

References

Agback, P., Baumann, H., Knapp, S., Ladenstein, R. & Hard, T. (1998). Architecture of nonspecific protein-DNA interactions in the Sso7d-DNA complex. Nature Struct. Biol. 5, 579-584.

Allawi, H. T. & SantaLucia, J., Jr (1998). NMR solution structure of a DNA dodecamer containing single G.T mismatches. Nucl. Acids Res. 26, 4925-4934.

Barrett, T. E., Savva, R., Panayotou, G., Barlow, T., Brown, T., Jiricny, J. & Pearl, L. H. (1998). Crystal structure of a G:T/U mismatch-specific DNA glycosylase: mismatch recognition by complementary-strand interactions. Cell, 92, 117-129.

Baumann, H., Knapp, S., Lundback, T., Ladenstein, R. & Hard, T. (1994). Solution structure and DNA-binding properties of a thermostable protein from the archaeon Sulfolobus solfataricus. Nature Struct. Biol. 1, 808-819.

Brünger, A. T. (1992). X-PLOR 3.1, A System for X-ray Crystallography and NMR, Yale University Press, New Haven, Connecticut.

Churchill, M. E., Jones, D. N., Glaser, T., Hefner, H., Searles, M. A. & Travers, A. A. (1995). HMG-D is an architecture-specific protein that preferentially binds to DNA containing the dinucleotide TG. EMBO J. 14, 1264-1275.

Collaborative Computational Project Number 4 (1994). The CCP4 Suite: Programs for Protein Crystallography. Acta Crystallog. sect. D, 50, 760-763.

Edmondson, S. P., Qiu, L. & Shriver, J. W. (1995). Solution structure of the DNA-binding protein Sac7d from the hyperthermophile Sulfolobus acidocaldarius. Biochemistry, 34, 13289-13304.

Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallog. sect. A, 47, 392-400.

Ferrin, T. E., Couch, G. S., Huang, C. C., Pettersen, E. F. & Langridge, R. (1991). An affordable approach to interactive desktop molecular modeling. J. Mol. Graphics, 9, 27-32.

Gao, Y. G., Su, S., Robinson, H., Padmanabhan, S., Lim, L., McCrary, B. S., Edmondson, S. P., Shriver, J. W. & Wang, A. H.-J. (1998). The crystal structure of the hyperthermophile chromosomal protein Sso7d bound to DNA. Nature Struct. Biol. 5, 782-786.

Grogan, D. W. (2000). The question of DNA repair in hyperthermophilic archaea. Trends Microbiol. 8, 180-184.

Guagliardi, A., Napoli, A., Rossi, M. & Ciaramella, M. (1997). Annealing of complementary DNA strands above the melting point of the duplex promoted by an archaeal protein. J. Mol. Biol. 267, 841-848.

Hunter, W. N., Brown, T., Kneale, G., Anand, N. N., Rabinovich, D. & Kennard, O. (1987). The structure of guanosine-thymidine mismatches in B-DNA at 2.5-A resolution. J. Biol. Chem. 262, 9962-9970.

Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallog. sect. A, 47, 110-119.

Kalnik, M. W., Kouchakdjian, M., Li, B. F., Swann, P. F. & Patel, D. J. (1988). Base pair mismatches and carcinogen-modified bases in DNA: an NMR study of G.T and G.O4meT pairing in dodecanucleotide duplexes. Biochemistry, 27, 108-115.

Krueger, J. K., McCrary, B. S., Wang, A. H.-J., Shriver, J. W., Trewhella, J. & Edmondson, S. P. (1999). The solution structure of the Sac7d/DNA complex: a small-angle X-ray scattering study. Biochemistry, 38, 10247-10255.

Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283-291.

Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dynam. 6, 655-667.

Luger, K., Mader, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature, 389, 251-260.

McAfee, J. G., Edmondson, S. P., Datta, P. K., Shriver, J. W. & Gupta, R. (1995). Gene cloning, expression, and characterization of the Sac7 proteins from the hyperthermophile Sulfolobus acidocaldarius. Biochemistry, 34, 10063-10077.

McAfee, J. G., Edmondson, S. P., Zegar, I. & Shriver, J. W. (1996). Equilibrium DNA binding of Sac7d protein from the hyperthermophile Sulfolobus acidocaldarius: fluorescence and circular dichroism studies. Biochemistry, 35, 4034-4045.

Murzin, A. G. (1996). Structural classification of proteins: new superfamilies. Curr. Opin. Struct. Biol. 6, 386-394.

Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296.

Parkinson, G., Vojtechovsky, J., Clowney, L., Brunger, A. T. & Berman, H. M. (1996). New parameters for the refinement of nucleic acid containing structures. Acta Crystallog. sect. D, 52, 57-64.

Robinson, H., Gao, Y. G., McCrary, B. S., Edmondson, S. P., Shriver, J. W. & Wang, A. H.-J. (1998). The hyperthermophile chromosomal protein Sac7d sharply kinks DNA. Nature, 392, 202-205.

Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high resolution refinement. Methods Enzymol. 277, 319-343.

Wang, R. Y., Kuo, K. C., Gehrke, C. W., Huang, L. H. & Ehrlich, M. (1982). Heat- and alkali-induced deamination of 5-methylcytosine and cytosine residues in DNA. Biochim. Biophys. Acta, 697, 371-377.

Yang, X. L., Hubbard, R. B., IV, Lee, M., Tao, Z. F., Sugiyama, H. & Wang, A. H.-J. (1999). Imidazole-imidazole pair as a minor groove recognition motif for T:G pairs. Nucl. Acids Res. 27, 4183-4190.

Edited by I. Tinoco

(Received 9 May 2000; received in revised form 9 August 2000; accepted 9 August 2000)