5
Proc. Nati. Acad. Sci. USA Vol. 83, pp. 9368-9372, December 1986 Biochemistry Two DNA recognition domains of the specificity polypeptides of a family of type I restriction enzymes (protein-DNA interaction/protein evolution) FRANCES V. FULLER-PACE* AND NOREEN E. MuRRAYt Department of Molecular Biology, University of Edinburgh, King's Buildings, Mayfield Road, Edinburgh EH9 3JR, United Kingdom Communicated by F. Sanger, August 25, 1986 ABSTRACT The hsd genes of Salmonella typhimurium and Salmonella potsdam encode related type I restriction and modification systems designated SB and SP, respectively; the polypeptide encoded by the hsdS gene dictates the DNA sequence recognized. The hsdS genes of the SB and SP systems have a conserved sequence of around 100 base pairs flanked by two nonhomologous (variable) regions of around 500 base pairs. Recombination between the hsdS genes of SB and SP generated a system (SQ) with a different recognition specificity. We have localized the position of the crossover in the central conserved region by analysis of nucleotide sequences. Concom- itant with the generation of a new combination of flanking variable regions is the recombination of minor differences in the central conserved region. A polypeptide domain encoded on the 5' side of the crossover dictates recognition of the trinucleotide component of the target sequence, and a second domain, encoded on the 3' side of the crossover, similarly governs recognition of the tetra- or penta-nucleotide compo- nent. Our analysis implicates at least parts of the variable regions in the determination of the specificity of interaction between protein and DNA. Furthermore, the trinucleotide components of the recognition sequences of S. typhimurium and Escherichia coli K-12 are identical, and the 5' segments of their hsdS genes are strikingly homologous rather than variable. Some strains of Escherichia coli and Salmonella spp have type I restriction systems that are related to that of E. coli K-12 (K system). The concept of a family of enzymes, members of which carry out essentially the same reactions following the recognition of different DNA sequences, poses questions concerning both the specific interaction of the enzyme with its target sequence and the evolution of different specificities of interaction. Type I restriction endonucleases are encoded by three chromosomal genes: hsdR, hsdM, and hsdS. Genetic experiments show that the corresponding polypeptides (R, M, and S) can be interchanged between related systems and identify the S subunit as the determinant of the specificity of recognition (1-3). The various S poly- peptides interact with related polypeptides to produce mod- ification enzymes and with M and R subunits in the formation of restriction endonucleases. Related S subunits must retain a basic similarity for their interactions with other M and R polypeptides but have diverged to enable recognition of different nucleotide sequences. The hsdS genes of related E. coli systems (K, B, and D) have only localized DNA sequence homology (4). The genes vary in length from 1335 to 1425 base pairs (bp), and the regions of homology are -100 bp in the middle and 250 bp at the 3' end. Although these two regions are highly conserved, the remainder of each hsdS gene shares little or no homology with either of the other two. The K family includes the SB system found in Salmonella typhimurium and the SP system of Salmonella potsdam. When the SP hsd genes were transferred by phage P1 transduction to an SB recipient, one transductant was found to have a new specificity, designated SQ (5). Heteroduplex analyses of the hsd genes of the SB, SP, and SQ systems (6) showed that their organization parallels that of E. coli K-12 and B (4), with two nonhomologous (variable) regions of about 500 bp flanking a conserved core of around 100 bp. Furthermore, the hsdS gene of SQ contains one variable region from SP and the other from SB, confirming that the SQ specificity was generated by recombination between the two parental specificity genes (6). The DNA sequences recognized by type I restriction enzymes comprise specific trinucleotide and tetranucleotide (or pentanucleotide) segments separated by a spacer of defined length but of nonspecific sequence (7-9). One inter- pretation of the heteroduplex analyses correlates the two variable regions of the hsdS gene, and hence of the S polypeptide, with the two specific segments of the DNA target sequence (4). Alternatively, reassortment of minor differences within the conserved regions might suffice to generate a new specificity (4). Models in which recombina- tion results in the reassortment of two domains of the S polypeptide predict the recognition of a recombinant target sequence. The recognition sequence of SQ does indeed comprise the trinucleotide element of SP and the pentanu- cleotide component of SB (10). In this paper we define the site of the crossover that segregated the parts of the gene encoding the polypeptide domains recognizing the trinucleotide and tetranucleotide components of the target sequences and, thereby, generated the SQ specificity. We report the nucleotide sequence of the specificity gene of the SP system, of particular interest because one of the two target sequences of SP is identical to that of K (9). The K, SP, and SQ specificity systems all recognize 5' AAC; hence, an amino acid sequence common to their S polypeptides should identify the relevant recogni- tion domain. MATERIALS AND METHODS Strains, Vectors, and Media. The X hsd phages (Fig. 1) were propagated in the hsd-deletion strain NM477 (4). The respec- tive hsd regions were subcloned in pBR322, propagated in the recA hsdS host HB101 (1), and used as probes and as sources of small fragments of DNA for cloning in M13. Vectors MplO, Mphl, Mpl8, and Mp19 (11) were used, and recombinants were grown in the hsd-deletion host NM522 (4). Enzymes and Chemicals. DNA polymerase (Klenow frag- ment) was purchased from Boehringer Mannheim; DNA Abbreviation: bp, base pair(s). *Present address: Research Institute of Scripps Clinic, La Jolla, CA 92037. tTo whom reprint requests should be addressed. 9368 The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

family of type I restriction enzymes

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Proc. Nati. Acad. Sci. USAVol. 83, pp. 9368-9372, December 1986Biochemistry

Two DNA recognition domains of the specificity polypeptides of afamily of type I restriction enzymes

(protein-DNA interaction/protein evolution)

FRANCES V. FULLER-PACE* AND NOREEN E. MuRRAYtDepartment of Molecular Biology, University of Edinburgh, King's Buildings, Mayfield Road, Edinburgh EH9 3JR, United Kingdom

Communicated by F. Sanger, August 25, 1986

ABSTRACT The hsd genes of Salmonella typhimurium andSalmonella potsdam encode related type I restriction andmodification systems designated SB and SP, respectively; thepolypeptide encoded by the hsdS gene dictates the DNAsequence recognized. The hsdS genes of the SB and SP systemshave a conserved sequence of around 100 base pairs flanked bytwo nonhomologous (variable) regions of around 500 basepairs. Recombination between the hsdS genes of SB and SPgenerated a system (SQ) with a different recognition specificity.We have localized the position of the crossover in the centralconserved region by analysis of nucleotide sequences. Concom-itant with the generation of a new combination of flankingvariable regions is the recombination of minor differences inthe central conserved region. A polypeptide domain encoded onthe 5' side of the crossover dictates recognition of thetrinucleotide component of the target sequence, and a seconddomain, encoded on the 3' side of the crossover, similarlygoverns recognition of the tetra- or penta-nucleotide compo-nent. Our analysis implicates at least parts of the variableregions in the determination of the specificity of interactionbetween protein and DNA. Furthermore, the trinucleotidecomponents of the recognition sequences of S. typhimurium andEscherichia coli K-12 are identical, and the 5' segments of theirhsdS genes are strikingly homologous rather than variable.

Some strains of Escherichia coli and Salmonella spp havetype I restriction systems that are related to that of E. coliK-12 (K system). The concept of a family of enzymes,members of which carry out essentially the same reactionsfollowing the recognition of different DNA sequences, posesquestions concerning both the specific interaction of theenzyme with its target sequence and the evolution ofdifferentspecificities of interaction. Type I restriction endonucleasesare encoded by three chromosomal genes: hsdR, hsdM, andhsdS. Genetic experiments show that the correspondingpolypeptides (R, M, and S) can be interchanged betweenrelated systems and identify the S subunit as the determinantof the specificity of recognition (1-3). The various S poly-peptides interact with related polypeptides to produce mod-ification enzymes and withM and R subunits in the formationof restriction endonucleases. Related S subunits must retaina basic similarity for their interactions with other M and Rpolypeptides but have diverged to enable recognition ofdifferent nucleotide sequences.The hsdS genes of related E. coli systems (K, B, and D)

have only localized DNA sequence homology (4). The genesvary in length from 1335 to 1425 base pairs (bp), and theregions of homology are -100 bp in the middle and 250 bp atthe 3' end. Although these two regions are highly conserved,the remainder of each hsdS gene shares little or no homologywith either of the other two.

The K family includes the SB system found in Salmonellatyphimurium and the SP system of Salmonella potsdam.When the SP hsd genes were transferred by phage P1transduction to an SB recipient, one transductant was foundto have a new specificity, designated SQ (5). Heteroduplexanalyses of the hsd genes of the SB, SP, and SQ systems (6)showed that their organization parallels that of E. coli K-12and B (4), with two nonhomologous (variable) regions ofabout 500 bp flanking a conserved core of around 100 bp.Furthermore, the hsdS gene of SQ contains one variableregion from SP and the other from SB, confirming that the SQspecificity was generated by recombination between the twoparental specificity genes (6).The DNA sequences recognized by type I restriction

enzymes comprise specific trinucleotide and tetranucleotide(or pentanucleotide) segments separated by a spacer ofdefined length but of nonspecific sequence (7-9). One inter-pretation of the heteroduplex analyses correlates the twovariable regions of the hsdS gene, and hence of the Spolypeptide, with the two specific segments of the DNAtarget sequence (4). Alternatively, reassortment of minordifferences within the conserved regions might suffice togenerate a new specificity (4). Models in which recombina-tion results in the reassortment of two domains of the Spolypeptide predict the recognition of a recombinant targetsequence. The recognition sequence of SQ does indeedcomprise the trinucleotide element of SP and the pentanu-cleotide component of SB (10).

In this paper we define the site of the crossover thatsegregated the parts of the gene encoding the polypeptidedomains recognizing the trinucleotide and tetranucleotidecomponents of the target sequences and, thereby, generatedthe SQ specificity. We report the nucleotide sequence of thespecificity gene of the SP system, of particular interestbecause one of the two target sequences of SP is identical tothat of K (9). The K, SP, and SQ specificity systems allrecognize 5' AAC; hence, an amino acid sequence commonto their S polypeptides should identify the relevant recogni-tion domain.

MATERIALS AND METHODSStrains, Vectors, and Media. The X hsd phages (Fig. 1) were

propagated in the hsd-deletion strain NM477 (4). The respec-tive hsd regions were subcloned in pBR322, propagated in therecA hsdS host HB101 (1), and used as probes and as sourcesof small fragments ofDNA for cloning in M13. Vectors MplO,Mphl, Mpl8, and Mp19 (11) were used, and recombinantswere grown in the hsd-deletion host NM522 (4).Enzymes and Chemicals. DNA polymerase (Klenow frag-

ment) was purchased from Boehringer Mannheim; DNA

Abbreviation: bp, base pair(s).*Present address: Research Institute of Scripps Clinic, La Jolla, CA92037.tTo whom reprint requests should be addressed.

9368

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement"in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Proc. Natl. Acad. Sci. USA 83 (1986) 9369

polymerase I was from Northumbria Biologicals LimitedEnzymes (Cramlington, United Kingdom); T4 DNA ligase,restriction enzymes, phage M13 17-mer primer, and hybrid-ization probe primer were from the New England Biolabs;deoxynucleoside triphosphates (dNTPs) and dideoxynucle-oside triphosphates (ddNTPs) were from P-L Biochemicals;and deoxycytidine 5'-[a-32P]triphosphate (110 TBq/mmol)and deoxyadenosine 5'-[a-(35S)thio]triphosphate (15.2 TBq/mmol) were from Amersham.DNA Preparation, Analysis, and Cloning. Fragments of

DNA separated in 0.7% agarose gels in TBE buffer, pH 8.2(12), were isolated by electroelution into a well lined withdialysis tubing. Small DNA fragments generated by digestionwith Alu I or Sau3a or by sonication (13) were cloned in phageM13 vectors for sequencing. DNA from plaques was trans-ferred to nitrocellulose for hybridization (14); single-strandedphage M13 DNA was added from cleared lysates (15).DNA Sequencing. Template DNA was sequenced by the

dideoxy chain-termination method (16) with deoxyadenosine5'-[a-(35S)thio]triphosphate and was analyzed on buffer-gradient gels (12). The sequences of DNA fragments werecompiled with computer programs (17, 18). Sections ofsequence, initially obtained on only one strand, were selectedfrom libraries of recombinants by using strand-specificprobes (15). A few areas were completed by using syntheticoligonucleotide primers.

RESULTS AND DISCUSSIONThe Crossover That Generated the SQ Specificity. The hsdS

gene of the SQ system derives one variable region from SPand the other from the SB gene (6). Hybridization of DNAfrom X hsd SB, SP, and SQ phages with a probe covering thecentral conserved region of the hsdS gene of E. coli K-12showed that, in these hsd phages, the central conservedregion is to the right of the EcoRI and HindIII targetsidentified by an asterisk in Fig. 1 (6). Hybridization to anhsdM-specific probe from E. coli K-12 (data not shown)located the hsdM gene to the left of these same targets. Thisestablishes the orientation of hsdS with respect to hsdM, and,

2*

a

A hsd SB

A hsd SP

A hsd SQ

hsd genes

aaiI i I II

2.0 0.6 4.5 3.4 0.5

E*E

F -II I I I2.0 0.7 4.4 3.4 0.5

*OCz oo = co w

2. I0 I I I2.0 0.7 4.4 3.4 0.5

since transcription is initiated upstream of hsdM (19), thehsdS gene of SQ specificity derives its proximal variableregion from SP and its distal variable region from SB.The nucleotide sequences of the central conserved regions

of the three hsdS genes show that the crossover between theSP and SB specificity genes occurred within the longestsection of perfectly conserved sequence (Fig. 2). The distalconserved regions of the hsdS genes of SB and SQ systemshave identical sequences but differ from SP (data not shown,but for amino acid sequences, see Fig. 3C). This is consistentwith the origin of SQ by a single crossover.The SQ and SP systems recognize the same trinucleotide

component of the target sequence (5' AAC). Therefore, thedomain of the S polypeptide conferring recognition of AACis encoded by the segment of the hsdS gene located proximalto the region within which the crossover event occurred. Thisincludes the proximal variable region and the first 33 nucle-otides of the central region. The domain of the SB (or SQ)specificity polypeptide conferring recognition of RTAYG(where R = unspecified purine nucleoside and Y = unspec-ified pyrimidine nucleoside) must be encoded by the DNAdistal to the crossover (see Fig. 3A). These conclusions referto polypeptide domains that discriminate between specificnucleotides. Adenine residues are common to the recognitionsequences, and other domains could contribute to theirrecognition; for example, an active site for methylationencoded by the hsdM gene may also dictate specificity foradenine residues as opposed to cytidine residues.Amino Acid Sequences of the Conserved Regions of the

Salmonella Specificity Polypeptides. There is only one possibleopen reading frame in each central conserved region, trans-lation of which yields amino acid sequences that are con-served with respect to each other and the S polypeptides ofE. coli (4). These sequences identify minor differencesbetween each system and show where SQ differs from itsparents (Fig. 3B). The crossover that generated SQ separatesinformation that specifies recognition of each of the twocomponents of the target sequence. Assuming that theorganization of the domains of the E. coli polypeptidesparallels that in Salmonella, we conclude that the informationfor recognition of the complete target sequence is not entirelywithin the central conserved region. While the variabilitywithin the first 11 amino acids of this region could correlatewith differences in the trimeric component of the targetsequence, there is insufficient variability distal to the cross-over. In particular, for E. coli B and Salmonella SP, whoserecognition sequences are quite different, there is completeidentity after the first five amino acids (Fig. 3B).

In an alternative hypothesis (20), the two components ofthe target sequence are recognized by two conserved do-mains of the S polypeptide. Argos (20) found that the Spolypeptides of E. coli contain repeated domains, and theserepeated domains overlap the central and distal conservedSP ATACCAATCCCGTCACTTGCTGAACAAAAAATCATCGCCGAAAAACTCGATACGCTGCTGGCGCAGGTAG

SB GTTCCTGTCCCACCTCTTGCCGAACAAAAAGTCATCGCCGAAAAACTCGATACGCTGCTGGCGCAGGTAG

soI M I S I

FIG. 1. The X hsd SB, SP, and SQ specificity phages. The EcoRI(Eco), HindIII (Hin), and BamHI (Bam) targets within and flankingthe li-kilobase insert of each X hsd phage are indicated. A probe forthe hsdM gene hybridizes to DNA to the left of the targets markedwith an asterisk, while a 400-bp probe spanning the central conservedregion of the hsdS gene ofE. coli K-12 hybridizes to DNA to the rightof the same targets. The deduced orientation of hsdM and S, thereverse of that suggested previously (6), is confirmed by the nu-cleotide sequence (Fig. 4). The arrow identifies the direction oftranscription.

SP ACAGCACCAAAGCACGTCTTGAGCAAATCCCGCAAATCCTGAAACGTTTTCGTCAGGCGGTGTTA

SB ACAGCACCAAAGCACGTCTTGAGCAAATCCCACAAATCCTGAAACGTTTTCGCCAATCAGTGATA

SO ACAGCACCAAAGCACGTCTTGAGCAAATCCCACAAATCCTGAAACGTTrTCGCCAATcAGTCATA

FIG. 2. Localization of the crossover that generated SQ. Thenucleotide sequences of the central conserved regions of the SP, SB,and SQ hsdS genes are given. This includes base pairs 469-603 of theSP specificity gene (base pairs 695-829 in Fig. 4). The 70-bp regionunderlined is common to all three sequences and identifies the regionin which the crossover occurred.

Ft'rAt..t-.AA'i"L.L.'.LiTUAU'r'rUCT(.iAACAAAAAATCATCGCCGAAAAACTCGATACGCTGCTGGCGCAGGTAG

Biochemistry: Fuller-Pace and Murray

9370 Biochemistry: Fuller-Pace and Murray

Ad. ----proximal --+central+ distaL --1

Argos repeats

IPIPPLAEQKIIAEKLDTLLAQVDSTKARLEQIPQILKRFRQAVL

D VAL SPS TL EB L LK

SPSB VVSQ

SS IS IS

EQHEIVRRVEQLFAYADYIEQVNNALARVNNLTQSILAKARFRGELTAQWRAENPDLISGENSAAALLEKIKAERAASGGKKASRKKS

DBK A

SPSB/SQ

TS

T S

T T S EK V T

T A

FIG. 3. The specificity polypeptides. (A) A schematic diagram of an hsdS polypeptide. The regions encoded by the conserved nucleotidesequence (4) are stippled. The nonconserved regions are indicated as open segments. The positions of the repeated domains identified by Argos(20) are shown. The proximal variable region is the amino-terminal segment of the polypeptide. (B) Amino acid sequences (in single-letter code)of the central conserved regions of the E. coli and Salmonella specificity polypeptides. The first two amino acids (numbers 157 and 158 for SPand K) are within the first Argos repeat but are outside the conserved region identified by nucleotide sequence (4). The uppermost line is a

consensus sequence deduced from sequences of the hsdS genes of the five natural systems. Occasional ambiguities are arbitrarily resolved byusing the archetypal K-12 sequence. (C) Amino acid sequences (in single-letter code) of the distal conserved regions of the same polypeptideswith a consensus sequence on the uppermost line. This sequence begins at amino acid 377 for the SP specificity polypeptide (see Fig. 5).

regions identified when nucleotide sequences were compared(4). Consequently, when the amino acid sequences of thepolypeptides are aligned, the repeated domains are found tobe conserved (see Fig. 3A). Similar repeated domains havebeen identified in the amino acid sequence of the SP speci-ficity polypeptide (A. F. W. Coulson and J. F. Collins,personal communication). Secondary structure predictionsindicate that the conserved domains contain a-helical re-gions. Since a-helical regions within some repressor proteinsrecognize specific DNA sequences (see ref. 21), Argosargued that the S polypeptide acts as a pseudo-dimer in DNAsequence recognition, with each putative helical regioninteracting with one component of the target sequence.The first repeated domain within each S polypeptide (20)

may span as many as 64 amino acids and includes the centralconserved region (Fig. 3A) but extends 2 amino acids up-stream and about 20 amino acids downstream. The aminoacid sequences of the corresponding regions of the SB, SP,and SQ polypeptides (Fig. 3B) show that those differencesbetween SQ and SB that derive from SP reside in the first 11of this domain of 64 amino acids. These amino acid changes,according to the model of Argos, would alter the specificityfor the trinucleotide element ofthe target sequence. We note,however, that all but one of these amino acid changes areoutside the helical region of the first repeated domain; for SB,K, and B, 3 of the first 11 amino acids are prolines. If part ofthe first repeated domain were to dictate recognition ofAAC,this nonhelical region of 11 amino acids rich in prolines wouldbe implicated. However, when SP and K, which recognizethe same trinucleotide sequence, are compared, one of theseprolines is replaced by serine. Most of the other differencesin this region involve three isoleucines that are common to Kand SP but are variously changed to valine or leucine (Fig.3B). The side chains of these amino acids cannot formhydrogen bonds and so are unlikely to interact with nucleo-tides in the major groove.The second of the domains identified by Argos (Fig. 3A)

includes much of the distal conserved region and is impli-

cated in interaction with the longer component of the DNAtarget sequence. When the predicted amino acid sequences ofthe distal conserved regions are compared, none (excludingSQ) is identical with another, although for the three E. colipolypeptides (B, D, and K), the differences are confined topositions 3, 18, and the carboxyl terminus (see Fig. 3C). Infact, B and D differ only at position 18 and the carboxylterminus, although the tetranucleotide elements of theirtarget sequences are very different (7, 8): TGCT for B andGTCY for D. Gough and Murray (4) concluded that there wasinsufficient variability here for the different specificities ofinteraction. The inclusion of the sequences for the Salmo-nella polypeptides increases the diversity, but three of thedifferences are common to both Salmonella specificities.Indeed, SP and SB differ in only 1 of the first 50 amino acids,and in this same region, the two differences common to SBand SP distinguish SB from E. coli D. The variation withinany substantial segment of the distal conserved region seemsinsufficient to account for the five different specificities. It is

unlikely that the first part of the distal conserved regiondictates the specificity of one system while the second partconfers the specificity of another; consequently, it seems

necessary to consider the distal variable regions. This couldimplicate that part of the distal variable region that overlapsthe start of the distal repeat identified by Argos (see Fig. 3A),a sequence which, like its homologue in the first repeat, isrich in prolines and lacks a-helical content. The consensusamino acid sequence for the first of these (the junction of the

proximal variable and central conserved regions) is Val-Pro-Ile-Pro-Pro-Leu (Fig. 3B), that of the second (the end of the

distal variable region) is Val-jPo-Leu-Pro-Pro-Leu (aminoacids 370-375 in Fig. 5). If the repeated domains noted byArgos (20) include the regions of the polypeptide involved inrecognition of DNA sequences, these nonhelical regionsalone seem unlikely to dictate the recognition sequence.A different role for these two well-conserved sequences at

A

B

C

I ................... .....................L....... -.....J.

Proc. Natl. Acad. Sci. USA 83 (1986)

Biochemistry: Fuller-Pace and Murray

the beginning of each repeat could be interaction with the Mpolypeptides, for type I restriction enzymes are believed toinclude one S and twc M subunits (see ref. 7).The Nucleotide Sequence of the hsdS Gene of SP. A contig-

uous sequence of 1731 bp has been determined (Fig. 4),extending from the BamHI site in hsdM (see Fig. 1) throughthe HindIII site to an Alu I target 4100 bp downstream of thehsdS gene of SP. An open reading frame beginning at bp 226and terminating at bp 1617 encodes a polypeptide of 463amino acids, 1 amino acid smaller than the S polypeptide ofthe K system (4). Comparison of this nucleotide sequencewith that for E. coli K-12 shows good conservation in theregions flanking the hsdS gene, including the 3' end of hsdM.In addition to the homology already discussed (Fig. 3 B andC), there is strong homology within the proximal regions ofthe hsdS genes where 421 of the first 474 nucleotides areconserved. This contrasts with the complete lack of homol-ogy between the corresponding regions of the hsdS genes ofthe K, B, and D systems. The predicted sequences of the Spolypeptides (Fig. 5) show that 143 of the 158 residues of theproximal variable regions of SP and K are identical, with 5 ofthe 15 changes being conservative. This proportion ofchanged amino acids parallels that found in the distal con-served regions (6 of 88 amino acids). No homology wasdetected when the DNA sequences of the distal conservedregions were compared, but similarity of amino acid se-quences has been found (A. F. W. Coulson, personal com-munication).The hsdS genes for four natural representatives of the K

family have now been sequenced, but only when SP and Kare compared is there extensive homology. In this case, the5' segments of the genes, referred to as proximal variableregions for both the E. coli and Salmonella systems, are wellconserved. The SP and K systems are also distinguished fromthe remainder by their very similar target sequences; while K

Proc. Natl. Acad. Sci. USA 83 (1986) 9371

recognizes 5' AAC(N)6GTGC, the SP enzyme recognizes thedegenerate version, 5' AAC(N)6GTRC. The present analysisof the SB, SP, and SQ systems taken together with theirrecognition sequences (10) localizes the coding sequence forthe domain of the SP specificity polypeptide conferringrecognition of 5' AAC as proximal to the region in whichcrossing-over occurred to generate SQ. The domains of theK specificity polypeptide have not been separated, but thesimplest prediction is one in which the organization parallelsthat deduced for SP-namely, that recognition of the 5'trimeric sequence is dictated by a domain encoded by theamino-terminal part of the polypeptide and by the longercomponent by a domain within the carboxyl-terminal seg-ment of the polypeptide. The conservation of nucleotidesequence in the 5' regions of the hsdS genes ofK and SP thencorrelates with the identity of one component (AAC) of theirtarget sequence.

CONCLUSIONSHelix-turn-helix motifs, characteristic of many proteins thatbind DNA (see ref. 21), have not been detected in thespecificity polypeptides of type I restriction systems (4, ?0).Recombination between different specificity genes can reas-sort two domains of the S polypeptide, each of whichspecifies recognition of one component of the target nucle-otide sequence (6, 10). A comparison of the predicted aminoacid sequences places constraints on the localization of thesetwo domains. Sequences in each of the two conserveddomains could be involved but appear to show too littlediversification for the recognition of different target se-quences. Argos (20) proposed that the two recognitiondomains are within repeated sequences that overlap thecentral and distal conserved regions, respectively (see Fig.3A). This allows further diversity because it includes some of

GGAT CCGCACGGCCAAATCCGA TTCCC TGGATATC TCC TGGCTGAAGGATAAGGACAGCA TCG~ACGCCGACAGCC TGCCGGAGCCGGACGTGC TGGCGGCAGAAGCGA TGGGT GAG T GG

GGATCCGCACCGCAAAATCCGATTCGCTGGATATCTCCTGGCtGAAAGA TAAAGACAGTA TTGA TGCCGACAGCCTGCCGGAGCCGGA TGTAT TAGCGGC AGAAGCGA TGGGCGAA CTGGTACAGGCGC TAGGCGAA^C TGGA TGCGC TGA TGCGCGAGC TGGGCGCGGGCGA TGA GGCGGA TGCGC AGCGT CAGT TGC TGGAA^GA AGC AT TTGGT GGGGTGA AGGC ATGAA^T AGGGGGAA^

721 llIIIlllllIlIIllllIIIIIIIIlllllllHill IIIIIIII III llillII IlllII~~ii111IIII Mill1

T ACAGGCGC TGTC TGAA CTGGA TGCGC TGA TGCGTGAAC TGGGGGCGAGCGA TGAGGC CGAT T TGCAGCGTC AGT TGC TGGAAGA AGC GTT TCGTGGCGGTrGAAGGAA TGAGT GCGGUGAAAC TGCCGGAGGGGTGGGCTACAGCTCCAGTATCTACAGTCACAACCCTAATCCGAGGAGTCA CA TATAAAAAAGAGCAGGCAC TC ^AATTA TC TACAAGA TGA TTA TTTGCC TATTA TA CG

96 1 IIII II I I IIIiiIIII I I III1111 11

1201 I 1 1 1 1 II 1I1 II11 111 11IIIII1111 11 111 11 1 IIIllllll II

ATTGCCGGAGGGGTGGGT TATCGCCCCAGTATCTACGGTCACAACTC TAA^TCCGAGGAGTAACGTATAAAAAAGiAGCAGGCAA TAAA TTA TCTAAAAGA TGA TT AT T TGCCT C TTA TCCG

TGCAAACAATA TTCAAAATGGCAAGT TTGAtACTACAGACTTAGTT T TTGTGCC TAAAAAT C TTGT TAAAGAAAGTC AGAAAA TAT CT C CTGAAGiATA TTGTAA TTGCGA TGT CT T CAUG36 1 III IIIII IIIII 11111 IIIIIIII H ill TAGCCCOTGCGAACAATATTCAGAATGCCATGTTTGATACTACGGACTTGGTTTTTGTTCCTArAATCTTGTTAAAGAAAGTCAAAAAATATCTCCTGAAGATATTGTTATTGCAArGTCATCAGGGAGTAAATCTGTAGTCGGTAAATCCGCACATCAACGTCTACCATTTGAATGTAGTT TCGGCGCA TTT TGCGGGGCAT TCCGC C CTGAGAAA TTCA TA T CTC CAAA TTA CA TTGC TCA TT T

48 1 HGAULAAATCCGTAGT TGGTAAATCLC~LACATCAGCATLTALCATTTfiAATGT AUrT TTL6(iL ATTTTbtG>jTfiTAT TALGTLCTUAAAAALTTA TA T T T TtT(T T TTATr UC TCA rT T

CACAAAATCATCCTTTTATCGGAACAAAATTTCATCACTTTCTGCTGGTGCAAATAT TAATAATATTAAACCAGCAAGCTTTGA TT TAATAAATATACCAATCCCGTCAC TTGCTGAACA601 1111 1 ii i llii III 1111111 1731 Hill

CACAAAATCTTCTCTTTATCGTAACAAAATTTCATCACTTTCTGCTGGTGCAAATATTAATAATATTAAGCCGTCAAGCTTTTATTTCATAAATATACCAATCCCACCACTTGCCGAACAAAAAATCA TCGC CGAAAAALTCGATACGCTGCTGGCGCAGGTAGALAGCACCAAAGCAC.T C TTGAGCAAA TCCCGCAAA TCC TGAAA CGTT T TtG TCAGGCGGTGT TAGC CGC TGCAGT721 tIHill|&||III|||AAAAATCATCGCTGAAAAACTCGATACGCTGCTGGCGCAGGTAGACAGCACCAAAGCACGT T TTGAGCAAATCCCACAAATCC TGAAACGTTTT CGTCAAGCGGTATTGGGGrGCGCAGT

GAGTGGAACAC TGACAACGGCCC TCAGGAACAGTCATAGCCTCATTGGC TGGCA TAGTACAAATC TGGGTGCC TTAA TTGT CGA TT C CTGTAA TGGGC TGGC $TAAAAGACAAGGAT TAA84 1 I1TAATGGAAAA TTGACAGAAAAATGGCGTAATTTTGAGCCGCAACATTC TGTAT T T AGAAGT TAAAT T TTGAA TC TAT CT TAAC TG^AATTACGTAATGGXTC T TTCA TCAAAUC( AAA T(A

A*$TGGTAATGAAATTACTATTTTGAGATTGGCTGATTTTAAAGATGCTCAGCGTATAA TTGGCAA TGAAAGAAAAATAAAA TTAGAC TC TAAGGAAGAAAATAAGTATTCA T TAGAAAA T96 1 ll

AAGTGGTGTTGGTCATCCAATACTACGCATTAGTTCTGTACGTGCTGGCCA TGTAGATCAAAACGATATTCGGTT TCTAGAA TGTTCAGAAAGTGAAC TAAACCGCCACAAAT T ACAAGA

GA TGATA TTTT AGTTATAAGAGTGAATGGAAGTGCGGACT TGGC TGGC CGA T TTA TTG^AATA TAAGTCAAACGGCGA TA TTGAAGGTT T TTGCGA TCAT T TTA TACGTTTACG T T AGA T1 08

TGGAGATCTTTTATTTACTCGCTATAACGGAAGTTTAGAATTTGTTGGTGT TTGTGGGT TATTGAAAAAA TTACAACA TCAAAA TTTGC TATATCC TGA TAAAC TTA TT(:GAG(: TCGAT T

T CAAATAAAAT $CA TGTCTAGA TTTTTAACTTATATTGCAAATGAAGGCGAGGGTAGA TTT TATC TACGCAA TAGTTTA TCAACAAGTGCAGGACAAAATACT A TTAACCAAA CA TCAA T1 2U

AACCAAAGATGCTTTACCAGAATATATCGAAATATTTTTTTCATCCCC T CAGCACGAAATGCAATGATGAACTGCGTGAAAACAACTTC TGGTCAA^AAGGTATT TCAGGAAAAGATAT

AAAAGGAC TTAGTTTTTTACTTCC TCCATTAAAAGAACAAGCCGAAATCGTCCGCCGCGTCGAACAACTC TTCGC CTACGC CGACAC CA TTG^AAAAAC AGGT TA ACAACGC CCTGA CCCG

CAAATC CCAAGT TGT TTTA TTACC TCCAGTAAAAGAACAAGCCGAAA TCGT TCGCCGCGTCGAGCAACTC TTCGC CTACGC CGACACCA TAG^AAAACAGGT CAACAACGCtT T AGCC CG

CGTCAACAGCCTCAC CCAGTCGA TCC TGGCGAAGGCCTTCCGCGGAGAGC TTACCGCCCAGTGGCGTGCGGAAAACCC TGA TC TCA TC AGCGGTAAAAA CAGCGCCGC CGCC CT.C TGGAL:GTCAACAACCTGACGCAA TCCA TCC TGGCAAAAGCGTTCCGTGGTGAACT TACCGC CCAGTGGCGGiGCCGAAAACCCGGA T TTGA TCAGCGGAGAAAACAGCGCCGC CGCGTrTUC TGGAAAAAATCAAGGCCGAACGCGCCGTCAGCGGCGGTAAAAAAACCTCGCGTAAAAAAGCCTGACGC TTATTT TTC TGCGCACCT TCCCGG C TT CGT TTTA TTA TT TCCCCGCC ATCA TAACl561 IIIIIIIII ,,1IIIIII l lIIIII llIIII III 1lll lll,,mI1 111111 IIIIIIIII l ll 111111111111 11111111AAAAA TCAAAGC TGAACGCGCAGC TAGCCGGGGTA^AAAAGCC TCACGTA^AAAAATCCTGAACA T TTT TTC TGGCGCACC T TTCCG(iTGCGCT TT TT A TTAT T TCACGCLAAT C ATAAC

CATCAAGAATATATCCAATTCCTTCCTGAACATTCCCATTACAGTCTATTT1 68 1 3

CCACAT^AATATATTTAAATCATTCCAGAAATTGCCCATTTTATTCTATTT

120

240

360

480

bUU

720

840

9bU

108 0

200

11320

1440

1560

1080

FIG. 4. The nucleotide sequence of the hsdS gene of SP. The sequence (upper line) is aligned with that of E. coli K-12. The hsdS gene ofSP begins at bp 226 and ends at bp 1617; the initiation and termination codons are underlined.

9372 Biochemistry: Fuller-Pace and Murray

1 MNRGKLPEGWATAPVSTVTTLIRGVTYKKEQALNYLQDDYLPI IRANNIQ 501111111 11111111111111111111 111 11111 1111111

1 MSAGKLPEGWVIAPVSTVTTLIRGVTYKKEQAINYLKDDYLPLIRANNIQ 50

51 NGKFDTTDLVFVPKNLVKESQKISPEDIVIAAMSSGSKSWGKSAHQRLPF 100I'I 11111111 111111111111111111 111111111 1111111 111

51 NGKFDTTDLVFVPKNLVKESQKISPEDIVIAMSSGSKSVVGKSAHQHLPF 100

101 ECSVGAFCGALRPEKFISPNYIAHFTKSSFYRNKISSLSAGANINNIKPA 150111111111 11111 1 11111111 1111111111

101 ECSFGAFCGVLRPERLIFSGFIAHFTKSSLYRNKISSLSAGANINNIKPA 150

151 SFDLINIPIPSLAEQKI IAEKLDTLLAQVDSTKARLEQIPQILKRFRQAV 2001111.1 111111111111111111111111 11111111111111

151 SFDLINIPIPPLAEQKI IAEKLDTLLAQVDSTKARFEQIPQILKRFRQAV 200

201 LAAAVSGTLTTALRN. SHSLIGWHSTNLGALIVDSCNGLAKRQGLNGNEI 249

201 LGGAVNGKLTEKWRNFEPQHSVFKKLNFESILTELRNGLSSKPNESGVGH 250

250 TILRLADFKDAQRI IGNERKIKLDSKEENKYSLENDDILVIRVNGSADLA 299

251 PILRISSVRAGHVDQNDIRFLECSESELNRHKLQDGDLLFTRYNGSLEFV 300

300 GRFIEYKSNGDIEGFCDHFIRLRLDSNKIMSRFLTYIANEGEGRFYLRNS 349

301 GVCGLLKKLQHQNLLYPDKLIRARLTKDALPEYIZIFFSSPSARNAMMNC 350

350 LSTSAGQNTINQTSIKGLSFLLPPLKEQABEIVRRVEQLFAYADTIEKQVN 399

351 VKTTSGQKGISGKDIKSQWLLPPVKEQAEIVRRVEQLFAYADTIEKQVN 400

400 MALTRVKSLTQSILAKAFRGELTAQWRAENPDLISGKNSAAALLEKIKAE 449

401 NALARVNNLTOSILAKArRGZLTAQWRAENPDLISGENSAAALLEKIKAE 450

450 RAVSGGKKTSRKKA 46311 111111111i

451 RAASGGKASRKKS 464

FIG. 5. The predicted amino acid sequence (in single-letter code)of the specificity polypeptide of SP. The sequence (upper line) isaligned with that of E. coli K-12 (lower line). The proximal variableregion, as defined by Gough and Murray (4), ends at amino acid 158;the central conserved region, at amino acid 201; and the distalvariable region, at amino acid 377 (see Fig. 3A for a diagrammaticrepresentation).

the adjacent variable regions. Although these sequences arenot predicted to be helical, they are adjacent to prominenta-helical domains, and they are within the most conservedparts of the repeats identified by Argos (20).No experimental evidence argues against a correlation of

the variable regions with the recognition domains-a conceptthat now receives circumstantial support from the nearidentity of the amino acid sequences of the amino-terminalsegments of the K and SP polypeptides. It is tempting toassume that the variable regions contribute to the specificitymost simply by including the recognition domains. Alterna-tively, they, or more particularly the parts adjacent to the

a-helical regions, could alter the presentation of the actualDNA binding domains within the relatively conserved re-gions. If single-base changes alone are sufficient to dictate adifferent specificity of recognition, then it should be possibleto isolate mutants with novel specificities. Such mutants havenot been reported, and we have failed to select mutations thatrelax the specificity of K to that of SP. We suggest that thevariable regions provide diversity of recognition, and naturalrecombination can add to this diversity by reassortment ofexisting domains.

We thank A. Campbell and A. Daniel for the nucleotide sequenceofthe SB and SQ specificity genes; E. Kawashima (Biogen, Geneva)for synthetic oligonucleotides; T. Bickle,J. Collins, A. Coulson, A.Gann, D. Meek, and K. Murray for constructive criticism of themanuscript; K. Harris and A. Wilson for help in the preparation ofthe manuscript; and the Medical Research Council for support.

1. Boyer, H. W. & Roulland-Dussoix, D. (1969) J. Mol. Biol. 41,459-472.

2. Hubacek, J. & Glover, S. W. (1970) J. Mol. Biol. 50, 111-127.3. Van Pel, A. & Colson, C. (1974) Mol. Gen. Genet. 135, 51-60.4. Gough, J. A. & Murray, N. E. (1983) J. Mol. Biol. 166, 1-19.5. Bullas, L. R., Colson, C. & Van Pel, A. (1976) J. Gen.

Microbiol. 95, 166-172.6. Fuller-Pace, F. V., Bullas, L. R., Delius, H. & Murray, N. E.

(1984) Proc. Nat!. Acad. Sci. USA 81, 6095-6099.7. Bickle, T. A. (1982) in Nucleases, eds. Linn, S. M. & Roberts,

R. J. (Cold Spring Harbor Laboratory, Cold Spring Harbor,NY), pp. 85-108.

8. Nagaraja, V., Steiger, M., Nager, C., Hadi, S. M. & Bickle,T. A. (1985) Nucleic Acids Res. 13, 389-399.

9. Nagaraja, V., Shepherd, J. C. W., Pripfl, T. & Bickle, T. A.(1985) J. Mol. Biol. 182, 579-587.

10. Nagaraja, V., Shepherd, J. C. W. & Bickle, T. A. (1985)Nature (London) 316, 371-372.

11. Messing, J. (1983) Methods Enzymol. 101, 20-78.12. Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983) Proc. Nat!.

Acad. Sci. USA 80, 3963-3965.13. Deininger, P. L. (1983) Anal. Biochem. 129, 216-223.14. Benton, W. D. & Davies, R. W. (1977) Science 196, 180-182.15. Hu, N. & Messing, J. (1982) Gene 17, 271-277.16. Sanger, F., Coulson, A. R., Barell, B. G., Smith, A. J. H. &

Roe, B. A. (1980) J. Mol. Biol. 43, 161-178.17. Devereux, J., Haeberli, P. & Smithies, 0. (1984) Nucleic Acids

Res. 12, 387-395.18. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751.19. Sain, B. & Murray, N. E. (1980) Mol. Gen. Genet. 180, 35-46.20. Argos, P. (1985) EMBO J. 4, 1351-1355.21. Pabo, C. 0. & Sauer, R. T. (1984) Annu. Rev. Biochem. 53,

293-321.

Proc. Natl. Acad Sci. USA 83 (1986)