Quadruplet Codons: Implications for Code Expansion and the Specification of Translation Step Size

Barry Moore, Britt C. Persson, Chad C. Nelson, Raymond F. Gesteland and John F. Atkins*

Department of Human Genetics, University of Utah, 15N 2030E Rm 7410, Salt Lake City, UT 84112-5330, USA

*Corresponding author

Present address: B. Persson, Department of Microbiology, Umeå University, S-901 87 Umeå, Sweden.

E-mail address of the corresponding author: [email protected]

doi:10.1006/jmbi.2000.3658, available online at http://www.idealibrary.com
J. Mol. Biol. (2000) 298, 195–209
0022-2836/00/020195–15 $35.00/0

One of the requirements for engineering expansion of the genetic code is a unique codon which is available for specifying the new amino acid. The potential of the quadruplet UAGA in Escherichia coli to specify a single amino acid residue in the presence of a mutant tRNALeu molecule containing the extra nucleotide, U, at position 33.5 of its anticodon loop has been examined. With this mRNA-tRNA combination and at least partial inactivation of release factor 1, the UAGA quadruplet specifies a leucine residue with an efficiency of 13 to 26 %. The decoding properties of tRNALeu with U at position 33.5 of its eight-membered anticodon loop, and a counterpart with A at position 33.5, strongly suggest that in both cases their anticodon loop bases stack in alternative conformations. The identity of the codon immediately 5′ of the UAGA quadruplet influences the efficiency of quadruplet translation via the properties of its cognate tRNA. When there is the potential for the anticodon of this tRNA to dissociate from pairing with its codon and to re-pair to mRNA at a nearby 3′ closely matched codon, the efficiency of quadruplet translation at UAGA is reduced. Evidence is presented which suggests that when there is a purine base at position 32 of this 5′ flanking tRNA, it influences decoding of the UAGA quadruplet.

© 2000 Academic Press

Keywords: quadruplet codon; code expansion; frameshift suppressors; anticodon; frameshifting

Introduction

Artificially expanding the genetic code to increase the repertoire of co-translationally incorporated amino acids in vivo is a challenging task but worthwhile, since novel amino acid residues in proteins can provide molecular beacons for structural studies, and can lead to novel functional specificity. In addition, re-engineering the code can be instructive for understanding the mechanisms of standard decoding. Two hurdles for co-translational incorporation of new amino acid residues need to be overcome. The first, specific aminoacylation of tRNA with novel amino acids, has recently seen dramatic advances (Saks et al., 1996; Liu et al., 1997; Liu & Schultz, 1999). Here, we deal with the second, the issue of unique codons to specify the novel amino acids. The most challenging approach is to expand the genetic alphabet with non-standard bases (Benner et al., 1999). A second approach is to use unassigned codons in an organism, such as Micrococcus luteus, in which not all triplet codons are used (Kowal & Oliver, 1997). In Escherichia coli the stop codon UAG is used to terminate a minority of genes, but when positioned internally within coding sequences in the presence of different suppressor tRNA, it can be used to specify a variety of amino acid residues. Some 4000 standard amino acid residues have been specified in the E. coli lac repressor by this means (Markiewicz et al., 1994). tRNA species that read the triplet UAG have been acylated in vitro with non-standard amino acid residues and used in in vitro protein synthesizing systems (Short et al., 1999) or after injection into Xenopus oocytes (Saks et al., 1996), to generate proteins with a single non-standard amino acid residue. Instead of using UAG, some studies have focused on depletion of a particular standard
amino acid residue in stationary phase E. coli cells and manipulated the system to incorporate a derivative amino acid into a protein whose synthesis was turned on at the same time (Budisa et al., 1998). A different type of approach would be to manipulate a naturally occurring recoding system such as the mechanism for the incorporation of selenocysteine (Baron & Böck, 1995). The use of quadruplet codons has aroused interest because of the possibility of incorporating multiple non-standard amino acid residues in a single protein. Success has been achieved in vitro with the incorporation of two non-standard amino acid residues at different positions of streptavidin, specified by ACCU and CGGG (Hohsaka et al., 1999). Although the potential usefulness of one in vitro system for producing substantial quantities of product has recently become more realistic (Madin et al., 2000), the experiments reported here were performed in vivo. Here, we use UAGN quadruplets in vivo for specifying standard amino acid residue(s) to assess their potential for systems in which aminoacylation is manipulated.

Figure 1. The cloverleaf structure of Su6 tRNA as deduced from the DNA sequence is shown. Boxed letters indicate locations where Su6 deviates from the product of the wild-type leuX gene. The N shows the location of the nucleotide (designated as position 33.5) inserted to create an eight-base anticodon loop. The Figure is adapted from Yoshimura (1984). tRNA nucleotide numbering follows Sprinzl et al. (1998).

Four bases can specify an amino acid residue in response to a mutant tRNA species with an extra nucleotide in its anticodon loop (eight nucleotides as opposed to the standard seven). This was initially demonstrated by Yourno (1972) and Riddle & Carbon (1973) based on the earlier work of Riddle & Roth (1970). At least some eight-base anticodon loop mutants mediate quadruplet decoding by detachment from the initial anticodon:codon pairing followed by re-pairing of the tRNA molecule to the message via an overlapping codon (O'Connor et al., 1989). Whether decoding of a single quadruplet by eight-base anticodon loops occurs by "once only" pairing of the anticodon to the mRNA is uncertain (Atkins et al., 2000), and has been proposed not to occur (Farabaugh & Björk, 1999). In either case, decoding of extended sequences by detachment and re-pairing needs to be constrained for a code expansion system based on quadruplets to be efficient. To study accurately the variety of potential translation events, and determine the extent of detachment and re-pairing, here we utilized mass spectrometry for identification of the products. The high level of resolution and specificity obtained by mass spectrometry reveals detailed insights into the specific translational products present.

To investigate the potential for quadruplet decoding we have made tRNA derivatives with an extra nucleotide placed immediately 5′ of the standard anticodon. Since the extra nucleotide is 5′ rather than 3′, the anticodon is not shifted (Murgola et al., 1983; Tuohy et al., 1992). The anticodon 3′AUC5′ is that of an amber suppressor complementary to the stop codon UAG. The absence of a competing tRNA species for UAG reading and inactivation of a temperature-sensitive release factor 1 optimizes the chance for detection of doublet, triplet and quadruplet decoding due to the mutant tRNA. This system is similar to that used by Curran & Yarus (1987) where a single nucleotide was inserted at the 5′ side of the anticodon loop of an E. coli glutamine-inserting amber suppressor, which was then tested for triplet and quadruplet decoding of UAGN in competition with a release factor. In contrast to the pioneers of this particular approach, we have used a different tRNA backbone (Figure 1), monitored all three reading frames, inactivated release factor 1 in some of the experiments and analyzed the products by protein sequencing and mass spectrometry.

Here, we firstly identify the combination of a UAGN message quadruplet and the 3′AUCN5′ expanded anticodon loop that has the highest potential for promoting quadruplet reading. The translation products of this combination were then investigated. In addition, translation products were analyzed for a number of derivative constructs with the UAGN quadruplet and a variety of 5′ codons, due to the potential influence on frameshifting of the tRNA decoding this codon.


Results

A modified tRNALeu and a UAGN quadruplet in mRNA to probe for quadruplet reading

Amber suppressor tRNA Su6 is encoded by the supP allele of the leuX gene (Yoshimura et al., 1984). Wild-type (WT) leuX codes for a leucine tRNA with a 3′AAC5′ anticodon. Su6 differs from the WT product of leuX by two mutations, A26G and A35U (Figure 1). The latter mutation gives tRNA Su6 a 3′AUC5′ anticodon, allowing it to suppress UAG stop codons. This tRNA has a 30 to 100 % efficiency of suppression at UAG codons (Miller & Albertini, 1983), and was chosen as a likely candidate to have activity after its anticodon loop had been modified to contain eight nucleotides. Oligonucleotides designed to create variants of the Su6 tRNA, each with one extra nucleotide (either A, C, G or U) inserted immediately 5′ of the anticodon, were placed into pACYC184 under control of the inducible tac promoter (Figure 1). The mutant tRNA will be written as 3′AUCN5′ tRNALeu to readily identify the extra nucleotide and its position in relation to the anticodon of its parent.

Assays to determine the most efficient combination of UAGN message quadruplet:3′AUCN5′ tRNALeu

To measure reading by the set of the mutant tRNA species in different frames at a UAGN quadruplet, oligonucleotides (see Table 1) were inserted between a gene encoding glutathione-S-transferase (GST) and β-galactosidase (lacZ) producing a GST-lacZ fusion. A set of 12 plasmids was constructed that encoded mRNA with the UAG codon, followed by either A, C, G or U and with the lacZ portion in the −1, 0 or +1 frame. In this way, decoding in all three reading frames could be monitored for all four UAGN quadruplets by chemiluminescent immunoblotting.

Figure 2. Assays of the products for combinations of a UAGN quadruplet in the mRNA in the presence of 3′AUCN5′ tRNALeu, compared in all three reading frames are shown. Data represent the percentage of full-length fusion relative to full length and terminated fusion. Data points are the mean of seven separate experiments. Error bars indicate average deviation from the mean. UAGN quadruplets are indicated on the x-axis and are followed by symbols (+, 0, −) showing the reading frame of the fusion 3′ of the UAGN quadruplet relative to the 5′ fusion. The x-axis subheadings indicate anticodon loop sequences with the extra nucleotide shown in parentheses.

Table 1.

Results from these assays (Figure 2) were analyzed for tRNA/message combinations that might represent quadruplet codon reading (with a consequent shift to the +1 frame). In strain MRA8 with a temperature-sensitive RF1 at non-permissive temperatures (42 °C) (Zhang et al., 1994), the levels of readthrough and/or frameshifting were two to four times higher than those found in SU1675, which is WT for RF1 (data not shown). The combination of 3′AUCU5′ tRNALeu with the complementary UAGA quadruplet and 3′AUCC5′ tRNALeu with the complementary UAGG quadruplet result in a high level of +1 frameshifting, while shifting to the −1 frame or zero-frame readthrough were at very low levels for these same tRNA/message combinations. The finding that four-base translation is more efficient with an anticodon loop that provides potential four-base complementarity is consistent with the previous reports by Gaber & Culbertson (1984) with a different system, and by Curran & Yarus (1987) using a similar system. Additionally, four tRNA/message combinations showed zero-frame readthrough of UAG at high levels; 3′AUCU5′ tRNALeu with quadruplets UAGC, UAGG, and UAGU, and 3′AUCC5′ tRNALeu with quadruplet UAGC. In WT cells without a mutant tRNA species, UAG followed by A or G gives a lower level of natural readthrough than does UAG followed by C (Poole et al., 1995). The combination of 3′AUCU5′ tRNALeu with the quadruplet UAGA, which yields a high level of +1 frameshifting, provides the most promising results for possible quadruplet translation and was chosen for further investigation.
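The pairing logic behind the screen can be illustrated with a short sketch. It enumerates which 3′AUCN5′ variant offers Watson-Crick complementarity to the fourth base of each UAGN quadruplet, and lists the 12 reporter constructs (four quadruplets in three frames). The function and variable names are illustrative only, and restricting the check to Watson-Crick pairs (no wobble) is an assumption of this sketch rather than a statement about the ribosome.

```python
# Watson-Crick partners for RNA; wobble pairing is deliberately ignored here.
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}

def four_base_complementary(extra_nt: str, fourth_base: str) -> bool:
    """True if the extra loop nucleotide (position 33.5) can Watson-Crick
    pair with the fourth base of a UAGN quadruplet."""
    return WC[extra_nt] == fourth_base

# 3'AUC5' pairs with UAG in every variant, so only the extra nucleotide matters.
pairs = [
    (f"3'AUC{n}5'", f"UAG{b}")
    for n in "ACGU"
    for b in "ACGU"
    if four_base_complementary(n, b)
]

# The 12 reporter constructs: four quadruplets x three lacZ reading frames.
reporters = [(f"UAG{b}", frame) for b in "ACGU" for frame in ("-1", "0", "+1")]

for trna, quad in pairs:
    print(trna, "<->", quad)
print(len(reporters), "reporter constructs")
```

Under these assumptions the sketch pairs 3′AUCU5′ with UAGA and 3′AUCC5′ with UAGG, the two combinations reported above to give a high level of +1 frameshifting.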

Characterization of protein products

To investigate the specific products of translation of the UAGA quadruplet, oligonucleotides were inserted between a gene encoding glutathione-S-transferase and maltose-binding protein (malE) in the vector GM1 (see Materials and Methods). Proteins were isolated from strains with these constructs containing UAGA in several contexts and with either 3′AUCU5′ tRNALeu (four-base complementarity) or 3′AUCA5′ tRNALeu (fourth base non-complementary). The products were analyzed by both mass spectrometry and Edman degradation protein sequencing.

GCC was chosen as the codon immediately 5′ of the UAGA quadruplet because the studies by Gallant and colleagues showed that in E. coli this creates a good 5′ context for promoting +1 frameshifting at an adjacent 3′ codon (Peter et al., 1992; Lindsley & Gallant, 1993). GCC is decoded by a minor tRNAAla, one of two E. coli tRNA species with a purine base rather than a pyrimidine base at position 32. Perhaps this purine base interacts with an adjacent A-site tRNA (Peter et al., 1992), in a manner equivalent to the interaction demonstrated by Smith & Yarus (1989). Döring et al. (1994) presented a cross-linking study for position 32. The constructs used for the assays from the previous section also had a GCC codon immediately 5′ of the UAGA quadruplet. After this work was initiated, O'Connor (1998) reported that C at position 32 in a tRNAGly also has a direct effect on the ability of the tRNA containing it to cause frameshifting, although in this case, the frameshifting did not involve detachment and re-pairing.

The naming scheme for the constructs encoding the mRNAs is pMNNN#X. The first letter p designates all constructs as plasmids. The subscripted NNN represents the codon immediately 5′ of the UAGA quadruplet(s). The #, which is always 1 here, designates the number of UAGA quadruplets present, and the superscripted X, when present, represents other plasmid details as described.

As dictated by the mRNA sequence, interpretation of the specific amino acid residues incorporated at the quadruplet region was based on accurate measurement of the mass difference (or mass shift) of the molecular mass of the intact proteins. Measured molecular masses of proteins agreed with theoretical values to within 0.01 %. The mass measurements of large molecules (e.g. 68,000 Da), however, are limited by the inherent accuracy of the mass spectrometer used, which does not permit unambiguous distinction of certain amino acid residue substitutions that differ in mass by only a few daltons (i.e. leucine, 113 Da; asparagine, 114 Da; aspartic acid, 115 Da).
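The arithmetic behind this limitation can be sketched as follows. The example treats the quoted 0.01 % agreement as an effective absolute tolerance on a 68,000 Da protein (an assumption of the sketch, since agreement and resolving power are not strictly the same thing) and asks which single-residue substitutions exceed it. The residue masses are standard average values; the names are illustrative.

```python
# Average residue masses in Da (standard values, rounded to two decimals).
RESIDUE_MASS = {
    "Leu": 113.16, "Asn": 114.10, "Asp": 115.09,
    "Gln": 128.13, "Arg": 156.19,
}

PROTEIN_MASS = 68_000.0     # Da, approximate size of the full-length fusion
RELATIVE_ACCURACY = 0.0001  # 0.01 %, the agreement quoted in the text

# Absolute uncertainty in Da implied by the relative accuracy (about 6.8 Da).
tolerance = PROTEIN_MASS * RELATIVE_ACCURACY

def resolvable(a: str, b: str, tol: float = tolerance) -> bool:
    """True if swapping residue a for b shifts the mass by more than tol."""
    return abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]) > tol

for other in ("Asn", "Asp", "Gln", "Arg"):
    shift = RESIDUE_MASS[other] - RESIDUE_MASS["Leu"]
    print(f"Leu -> {other}: {shift:+.2f} Da, resolvable: {resolvable('Leu', other)}")
```

On this estimate, leucine cannot be separated from asparagine or aspartic acid by intact mass alone, whereas leucine versus glutamine (about 15 Da) or arginine (about 43 Da) can, which is why protein sequencing is used alongside the mass data.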

Most ribosomes encountering the UAGA quadruplet terminated translation. It is not known, however, if all terminated peptides actually resulted from standard termination. A portion of this terminated product may be due to release of the engineered tRNA from the ribosomal P-site followed by hydrolysis of the peptidyl-tRNA bond by peptidyl-tRNA hydrolase (this would generate an identical sized product). Other translating ribosomes may be lost as a result of tmRNA function due to a pause at the (U)AGA codon (Roche & Sauer, 1999) and this terminated product would be lost due to tmRNA-mediated proteolysis.

UAGA quadruplet

GCC codon 5′ of a UAGA quadruplet

The construct pMGCC1 has a single UAGA quadruplet and is framed such that expression of the full-length fusion requires a +1 frameshift (Figure 3(a)). The molecular mass spectrum of proteins generated from pMGCC1 in the presence of 3′AUCU5′ tRNALeu shows a peak at 26,665 Da which corresponds to termination at UAG of the UAGA quadruplet (26,667 Da theoretical mass). The theoretical mass of the interpreted protein will be given throughout the text in parentheses. A peak at 26,534 Da (26,536), labeled −Met in Figure 3(a), is the result of in vivo N-terminal methionine removal from the protein represented by the 26,665 Da peak. These minus-methionine group derivatives are indicated in the Figures throughout, and are not discussed below. A third product of 28,235 Da represents zero-frame, triplet readthrough of the UAG with insertion of a leucine residue and 12 additional amino acid residues (28,237 Da) (generation of the product is dependent on the engineered tRNALeu, and the amino acid residues with similar masses to leucine are not candidates as there is no codon for aspartic acid in the vicinity of the UAG, and skipping the UAG to directly decode the 3′ AAC asparagine codon would give 11 additional amino acid residues, not 12). The leucine residue is inferred to be derived from the 3′AUCU5′ tRNALeu reading the UAG, with termination occurring at the next in-frame stop codon, UAA. The other constructs presented below with a different codon immediately 5′ of the UAGA quadruplet did not show in-frame UAG readthrough. This prompted an investigation of readthrough with a construct pMGCC1Z, which is the same as above, but with the 3′ fusion in the zero-frame. With 3′AUCU5′ tRNALeu (Figure 4(a)), leucine is the predominant amino acid residue encoded by UAG in the UAGA quadruplet, as seen in both the mass spectrum and the protein sequence data. The protein sequence data further indicates that a small number of glutamine residues is also encoded by the same UAG codon, and a small peak on the right shoulder of the 68,201 Da peak is compatible with this minor glutamine incorporation. In the presence of 3′AUCA5′ tRNALeu (Figure 4(b)) which is non-complementary to the fourth position of the UAGA quadruplet, the relative orders are reversed and glutamine is predominant, again confirmed by both the mass spectrometric and protein sequence data. The amino acid residue most commonly "misacylated" on mutant tRNA molecules is glutamine (Yaniv et al., 1974), raising the possibility that 3′AUCA5′ tRNALeu is misacylated with glutamine. However, as shown below, the 3′AUCA5′ tRNALeu inserts a leucine residue when mediating quadruplet decoding; we deduce the encoding of glutamine by UAG in the above experiment is due to misreading by the WT CAG decoding tRNAGln, especially since glutamine is the amino acid residue most commonly found to be encoded by leaky UAG stop codons.

Figure 3. Mass spectra and protein sequencing data shown for translation products of pMGCC1 translated in the presence of (a) 3′AUCU5′ tRNALeu with four-base complementarity to the UAGA quadruplet or (b) 3′AUCA5′ tRNALeu which is non-complementary in the fourth quadruplet position. Molecular mass spectra are shown for termination products and full-length fusion products. Peaks labeled −Met are the result of in vivo N-terminal methionine removal from the parent protein about 131 Da higher in mass. Peaks labeled with daggers (†) represent masses for which no protein interpretation was found; double daggers (‡) indicate glutathione group derivatives as described in the text. Interpretation of the products associated with the major peaks are shown by alignment of interpreted amino acid sequence below the mRNA sequence for the construct except for products terminating at UAG of the UAGA quadruplet, which are not shown. Interpreted sequence for peaks representing standard termination and minus methionine proteins are not shown. Measured mass and the corresponding theoretical mass (in parentheses) are shown for each major peak. Theoretical mass values were calculated using average isotopes. For peptide N-terminal sequencing, vertical bars correspond to amino acid recovery for each cycle, and are represented in the order AEGIKLNQRSTVW, indicating all significant amino acids recovered. Bars with upper arrows have been truncated due to very high amino acid residue recovery. The major amino acid residue recovered in each cycle is labeled in large black letters; minor amino acid residues are labeled with small italic letters. Large gray letters label major secondary amino acid residue recovery where appropriate. Interpretations of the protein sequence data are shown superimposed on the mRNA three-frame translated sequence with gray shaded boxes indicating the major sequence and outlined boxes indicating the secondary sequences where appropriate. The plasmids generating the mRNA and the overexpressed modified tRNALeu are indicated in the Figure.

Figure 4. Mass spectra and protein sequencing data shown for translation products of pMGCC1Z translated in the presence of (a) 3′AUCU5′ tRNALeu with four-base complementarity to the UAGA quadruplet or (b) 3′AUCA5′ tRNALeu which is non-complementary in the fourth quadruplet position. See the legend to Figure 3 for further details.

Returning to the protein products of pMGCC1 (Figure 3(a)), which monitors the +1 frame, two peaks predominate in the mass spectrum of full-length products. One peak, with a mass assignment of 68,206 Da, corresponds to a full-length fusion with the +1 frameshift accomplished by the UAGA quadruplet specifying a leucine residue shown by the amino acid sequence GIRALTVWNSQ (68,203 Da). In this case we believe that it is the engineered tRNA molecule that mediates UAGA decoding and not the WT UUG decoding tRNALeu, which would have a second position A–A mismatch when decoding UAG. The second peak of mass 68,249 Da corresponds to the mass of a full-length fusion with a +1 frameshift achieved by the UAGA quadruplet specifying an arginine residue, GIRARTVWNSQ (68,246 Da), presumably encoded by the AGA arginine codon found
within the UAGA quadruplet. Protein sequencing confirms these amino acid sequence predictions. The sequence data show two predominant sequences; the primary sequence is GIRALTVWNSQ. As can be seen from the mRNA/protein alignment in Figure 3(a), a single leucine residue corresponds to the UAGA quadruplet. The secondary sequence, TVWNSQLKIEE, requires further explanation. Peptides were prepared for N-terminal protein sequencing by digestion with factor Xa protease which cleaves after the arginine residue in its preferred cleavage site I(E/D)GR. However, it is also known to cleave after an arginine residue in other contexts, and after other basic residues (Ile, 1993; Nagai, 1987). In this study, the factor Xa digestion used to liberate the peptide of interest also cleaved efficiently (~100 %) after the arginine residue at the secondary cleavage site, GIRARXXX. A weak protein sequence signal, SLTVWNSQ (Figure 5(a)), suggests minor cleavage after the arginine in the sequence GIRSLTVWNSQ. No other non-standard factor Xa cleavages were observed. Given this evidence and the corroboration of the mass spectral data of uncleaved material, we conclude that the secondary sequence, TVWNSQLKIEE, represents arginine encoded by the UAGA quadruplet. A possible explanation for this arginine incorporation is that GCC decoding tRNAAla detaches from the GCC codon and re-pairs to mRNA via the overlapping CCU forming G·C and G·U base-pairs. This detachment and re-pairing is likely influenced by the 3′ adjacent UAG stop codon. In this event, the next available codon is the AGA arginine codon.

The results in the previous section show that translation of a UAGA quadruplet in the presence of 3′AUCU5′ tRNALeu (with potential four-base complementarity in its anticodon loop to the UAGA quadruplet) predominantly encodes leucine residues, with some arginine residues also encoded. To determine if this leucine residue incorporation is dependent on four-base complementarity, 3′AUCA5′ tRNALeu was tested.

Figure 5. Mass spectra and protein sequencing data shown for translation products of pMUCG1 translated in the presence of (a) 3′AUCU5′ tRNALeu with four-base complementarity to the UAGA quadruplet or (b) 3′AUCA5′ tRNALeu which is non-complementary in the fourth quadruplet position. See legend to Figure 3 for further details.


In the presence of 3′AUCA5′ tRNALeu (non-complementary in the fourth position), a primary peak at 26,666 Da due to termination at UAG of the UAGA quadruplet is found in the spectrum of termination products (26,667 Da) (Figure 3(b)). The minor peak at 28,245 Da is within range of triplet readthrough of UAG by leucine (28,237 Da) or triplet readthrough by glutamine (28,252 Da). In this case a conclusive sequence prediction is not made. The higher range mass spectrum shows a peak at 68,244 Da that is mass-compatible with the UAGA quadruplet encoding arginine, GIRARTVWNSQ (68,246 Da). The protein sequence TVWNSQ predominates, and as discussed above, this represents specification of an arginine residue by the UAGA quadruplet, and confirms the sequence predictions of the mass data. With GCC as the 5′ codon, decoding of the UAGA as a leucine residue is deduced to be dependent on four-base complementarity of the engineered tRNA. To reduce the possibility of detachment from the 5′ flanking codon followed by re-pairing to mRNA at a nearby codon, the GCC was substituted by UCG.
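The three outcomes described for the GCC context (termination at UAG, quadruplet decoding of UAGA as leucine, and a one-base shift that exposes AGA) can be modelled with a toy decoder. The mRNA window below is hypothetical: it was chosen only so that it reproduces the reported peptides GIRALTVWNSQ and GIRARTVWNSQ and is not the sequence of the actual construct. The "slip" mode simply advances one base at UAGA and does not distinguish between P-site re-pairing and base skipping.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional UCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical window: GGU AUC CGU GCC UAGA ACG GUU UGG AAU UCU CAG
MRNA = "GGUAUCCGUGCCUAGAACGGUUUGGAAUUCUCAG"

def translate(mrna: str, mode: str = "triplet") -> str:
    """Decode mrna from position 0.

    mode 'triplet'    : standard decoding, stops at the first stop codon
    mode 'quadruplet' : an in-frame UAGA is read as one Leu (four-base step)
    mode 'slip'       : at an in-frame UAGA the frame moves one base 3',
                        so AGA (Arg) is the next codon read
    """
    peptide, i = [], 0
    while i + 3 <= len(mrna):
        if mode != "triplet" and mrna[i:i + 4] == "UAGA":
            if mode == "quadruplet":
                peptide.append("L")
                i += 4
            else:
                i += 1
            continue
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
        i += 3
    return "".join(peptide)

for mode in ("triplet", "quadruplet", "slip"):
    print(mode, translate(MRNA, mode))
```

In this toy model, both the four-base step and the one-base shift restore the +1 frame downstream, which is why the two full-length products differ only at the residue assigned to the quadruplet.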

UCG codon 5′ of a UAGA quadruplet

The construct pMUCG1 has a UCG codon immediately 5′ of the UAGA quadruplet (Figure 5(a)). In the presence of complementary 3′AUCU5′ tRNALeu, mass spectrometry showed only termination protein product in the lower mass range. A peak at 68,223 Da matches the mass of a full-length product with leucine encoded by the UAGA quadruplet, GIRSLTVWNSQ (68,219 Da); protein sequencing supports this assignment. A minor sequence, SLTVWNSQ, in the protein sequence data is also found from cleavage by factor Xa protease after the arginine residue in the sequence GIRSLTVWNSQ, as discussed above. Since an arginine residue was not inserted at the UAGA quadruplet in the presence of a 5′ UCG codon, it seems likely that slippage at the ribosomal P-site, stimulated by the 3′ flanking stop codon in conjunction with the 5′ GCC codon, is responsible for the arginine encoded by the UAGA quadruplet (Figure 3). An alternative interpretation, base skipping, cannot be definitively eliminated however, because ignoring the first base of an A-site codon is likely to be influenced by the nature of the P-site codon:anticodon interaction. A minor peak present at 68,527 Da represents the 68,223 Da protein with a glutathione molecule covalently bound (68,524 Da). Glutathione is the competing reagent used to displace fusion proteins from the GST affinity column and can form a derivative with proteins by means of a disulfide bond between a glutathione and a cysteine residue. This glutathione derivative is present in other samples, as seen in the Figures. In summary, with UCG 5′ of the UAGA quadruplet, leucine incorporation occurs in the presence of both 3′AUCU5′ tRNALeu and 3′AUCA5′ tRNALeu. The ability to exclude significantly the insertion of arginine residues brings us closer to the goal of specifying a single amino acid residue from an engineered tRNA species at a message quadruplet.
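The two modification-related mass shifts mentioned in this section, N-terminal methionine removal and the glutathione derivative, can be checked with simple arithmetic. The sketch assumes standard average masses (methionine residue 131.19 Da, reduced glutathione 307.32 Da) and that forming the disulfide bond removes two hydrogen atoms; the variable names are illustrative.

```python
MET_RESIDUE = 131.19   # Da, average mass of a methionine residue
GLUTATHIONE = 307.32   # Da, average molecular mass of reduced glutathione
HYDROGEN = 1.008       # Da

# A disulfide bond forms with loss of one hydrogen from each thiol group.
glutathione_shift = GLUTATHIONE - 2 * HYDROGEN  # about +305.3 Da

# Theoretical masses quoted in the text for the parent proteins.
termination_product = 26_667   # Da, termination at UAG of UAGA
full_length_leu = 68_219       # Da, GIRSLTVWNSQ product of pMUCG1

minus_met = termination_product - MET_RESIDUE
with_glutathione = full_length_leu + glutathione_shift

print(round(minus_met))         # compare with the quoted 26,536 Da
print(round(with_glutathione))  # compare with the quoted 68,524 Da
```

Both values agree with the theoretical masses given in parentheses in the text, consistent with the −Met and glutathione interpretations of those peaks.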

With 3′AUCA5′ tRNALeu (non-complementary in the fourth position) the protein resulting from termination at the UAG of the UAGA quadruplet was the only protein product found in the spectrum of termination products (Figure 5(b)). A peak at 68,222 Da shows that a full-length protein is produced by inserting leucine for the UAGA quadruplet, GIRSLTVWNSQ (68,222 Da). This sequence is supported by the protein sequence data. The mass spectrometric data also shows a minor peak at mass 68,010 Da. Given the earlier precedents for hops over stop codons (Weiss et al., 1987, 1990; O'Connor et al., 1989), this likely represents a four-base hop over the quadruplet from the serine codon (UCG) to the threonine codon (ACG), producing the sequence, GIRSVWNSQ (68,005 Da). The tRNASer with anticodon 3′AGC5′ could form two Watson-Crick base-pairs on re-pairing at the ACG threonine codon.

To gain insight into the constraints affecting the choice of codons 5' of the UAGA quadruplet, constructs were made with a 5' codon whose tRNA also could not re-pair at the +1 AGA codon. The codon chosen, CGA, is unusual as its cognate tRNAArg2 is the only E. coli tRNA with an inosine residue at position 34 in apposition to the third codon base. The purine:purine apposition of I in the anticodon with A in the codon destabilizes the codon:anticodon interaction (Curran, 1995; Carter et al., 1997) and in some cases influences frameshifting (Mejlhede et al., 1999).

CGA codon 5' of a UAGA quadruplet

The construct pMCGA1 has a CGA codon immediately 5' of the UAGA quadruplet (Figure 6(a)). It is framed to require a +1 frameshift for full-length fusion expression. The molecular mass spectrum of the termination products shows termination only at UAG of the UAGA quadruplet, and several full-length products. The principal peak at 68,161 Da corresponds to the sequence GIHRTVWNSQ (68,156 Da), which could be encoded by tRNAArg2 (anticodon 3'GCI5') hopping over one base from CGA to AGA without pairing of the first base of the AGA. In the presence of 3'AUCA5' tRNALeu, this same encoding of arginine is the exclusive event (Figure 6(b)), and again only the termination product is found in the lower-range spectrum. The protein sequence data support these interpretations. The secondary peak in Figure 6(a) at 68,274 Da represents a product with the sequence GIHRLTVWNSQ (68,269 Da), where leucine is encoded by the UAGA quadruplet. The protein sequence data show a leucine residue in cycle 5.
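The predicted masses of the alternative full-length products differ by the masses of the residues gained or lost, which gives a quick consistency check. The sketch below is a rough illustration rather than the authors' procedure. It assumes average residue masses of 113.16 Da for leucine and 101.10 Da for threonine, takes 68,219 Da (the value given for GIRSLTVWNSQ with pMUCG1 and 3'AUCU5' tRNALeu) as the parent mass for the four-base hop, and uses a hypothetical function name.

```python
# Assumed average residue masses in Da (amino acid mass minus one water).
RESIDUE_MASS = {"L": 113.16, "T": 101.10}


def mass_after_removing(parent_mass, residues):
    """Predicted mass after deleting the given residues (one-letter codes)."""
    return parent_mass - sum(RESIDUE_MASS[r] for r in residues)


# pMCGA1: GIHRLTVWNSQ (68,269 Da) versus GIHRTVWNSQ, which lacks the leucine.
one_base_hop = mass_after_removing(68269.0, "L")

# pMUCG1: GIRSLTVWNSQ (68,219 Da) versus GIRSVWNSQ, which lacks Leu and Thr.
four_base_hop = mass_after_removing(68219.0, "LT")

print(round(one_base_hop))   # compare with the 68,156 Da quoted for GIHRTVWNSQ
print(round(four_base_hop))  # compare with the 68,005 Da quoted for GIRSVWNSQ
```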


Figure 6. Mass spectra and protein sequencing data shown for translation products of pMCGA1 translated in the presence of (a) 3'AUCU5' tRNALeu with four-base complementarity to the UAGA quadruplet or (b) 3'AUCA5' tRNALeu which is non-complementary in the fourth quadruplet position. See the legend to Figure 3 for further details.


Discussion

Quadruplet decoding with an efficiency in the range of 20-40 % by an engineered 3'AUCU5' tRNALeu has been achieved. Analysis of the +1 shifted products revealed that as much as two-thirds is accounted for by leucine being encoded by the UAGA quadruplet, suggesting that there is perhaps a 13-26 % rate of leucine residue insertion relative to termination at the UAGA quadruplet. In this range it is potentially useful for code expansion experiments. The current system uses a temperature-sensitive release factor to limit termination. However, context features in the mRNA molecule might be massaged to improve the efficiency, such as a preceding Shine-Dalgarno sequence (Weiss et al., 1987) or a downstream stem-loop structure in the mRNA (Tsuchihashi, 1991; Larsen et al., 1997).
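The relationship between the two efficiency ranges quoted above is a simple product. It is sketched here under the assumption that the two-thirds fraction applies uniformly across the 20-40 % range; the symbols f_shift (overall quadruplet decoding efficiency) and f_Leu (fraction of shifted product carrying leucine) are introduced only for this illustration.

```latex
\[
E_{\mathrm{Leu}} = f_{\mathrm{shift}} \times f_{\mathrm{Leu}}, \qquad
0.20 \times \tfrac{2}{3} \approx 0.13, \qquad
0.40 \times \tfrac{2}{3} \approx 0.27
\]
```

To within rounding, this reproduces the 13-26 % range given in the text. Because two-thirds is described as an upper bound ("as much as"), the true leucine insertion rate could be lower.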

The results obtained provide information relevant to the possibilities for +1 frameshifting: (1) the potential for the tRNA decoding the codon 5' of the UAGA quadruplet to re-pair in the new frame; (2) the effect of a purine base at position 32 in that same tRNA; (3) the nature of stacking of the eight-base anticodon loop in the engineered tRNA and the effect of its conformation on which anticodon is presented; and (4) competition between frameshifting and recognition of UAG (of the UAGA quadruplet) in the A-site. Other possible influences might include some special ribosome recognition of stop codons and possibly the nature of the E-site tRNA.

How does the tRNA specify which bases function as the anticodon? In the standard case, with a seven-base anticodon loop, the first loop base, 32, is stacked on the conserved base U33, forming a two-base 5' stack. An abrupt turn in the backbone leads to bases 34-38 being stacked on each other, forming a five-base 3' stack to complete the loop. This 5' 2:5 3' stack configuration is thought to be the key for correct presentation of the anticodon. In seven-base loops, the anticodon is always the first three bases (34-36) of the 3' stack, as if its location is delineated by the U33 turn, although interconversion between two conformations of seven-membered anticodon loops under certain conditions has been described by Labuda et al. (1985) and Gollnick et al. (1987). With an eight-base loop, anticodon specification is less obvious, partly because of the alternate stacking opportunities. However, we will assume that in this case too, the abrupt turn between the 5' and 3' stack delineates the 5' end of the anticodon. The following discussion on stacking is based on this assumption, and if the anticodon can be presented from further up the stack, many of the deductions made about stacking would be invalid.

Early work from this laboratory (Bossi & Smith, 1984; O'Connor et al., 1989) and by others (Curran & Yarus, 1987) focussed on the most obvious stacking possibilities, 3:5 or 2:6, for tRNA species with eight-base anticodon loops (Figure 7). A likely equilibrium between these stack conformations is almost certainly influenced by the identity of the extra base at position 33.5. Because of its propensity to mediate the turn between stacks, a uridine residue at position 33.5 is likely to result in a relatively higher proportion of 3:5 compared to 2:6 than when A is at position 33.5. Based on studies with seven-base anticodon loops with an A-turn instead of a U-turn between the stacks (Bare et al., 1983; Ayer & Yarus, 1986; Ashraf et al., 1999), molecules with a 3:5 stack and A at position 33.5 are likely to be functional for initial pairing although less so than those with U at 33.5.
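The working assumption that the turn between the 5' and 3' stacks marks the 5' end of the anticodon can be made concrete with a small sketch. It is purely illustrative. The function names and data layout are hypothetical, only loop positions 33.5 to 36 of the engineered 3'AUCU5' tRNALeu are taken from the text, and the remaining loop positions are filled with the placeholder N.

```python
# Loop positions listed 5'->3'; bases not given in the text are left as "N".
LOOP_POSITIONS = ["32", "33", "33.5", "34", "35", "36", "37", "38"]
# The anticodon 3'AUCU5' reads 5'UCUA3', occupying positions 33.5 to 36.
LOOP_AUCU = dict(zip(LOOP_POSITIONS, "NNUCUANN"))

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def presented_anticodon(loop, five_prime_stack_size, length=3):
    """Return (positions, bases 5'->3') of the first `length` bases of the 3' stack."""
    start = five_prime_stack_size
    positions = LOOP_POSITIONS[start:start + length]
    return positions, "".join(loop[p] for p in positions)


def codon_read(anticodon_5_to_3):
    """Watson-Crick complement of the anticodon, written 5'->3' as a codon."""
    return "".join(COMPLEMENT[b] for b in reversed(anticodon_5_to_3))


for stack, size in (("3:5", 3), ("2:6", 2)):
    positions, anticodon = presented_anticodon(LOOP_AUCU, size)
    print(stack, positions, anticodon, "reads", codon_read(anticodon))
```

Under this assumption the 3:5 stack presents bases 34-36 (reading UAG) and the 2:6 stack presents bases 33.5-35 (reading AGA). Passing `length=4` with the 2:6 stack gives the four-base pairing with UAGA.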

These inferences about stacking interactions may help to explain the identity of the amino acid residue incorporated during in-frame readthrough of UAG in the UAGA quadruplet, which is dependent on whether U or A was present at position 33.5 in the engineered tRNALeu. The engineered tRNALeu with U33.5 predominates over competitor tRNA, whereas with A33.5 it loses out to WT tRNAGln, which normally decodes CAG. We interpret this to be due to the U-containing tRNA molecule having a higher proportion of 3:5 stacked molecules with their greater efficiency of decoding than the A-containing counterpart. An effect on the proportion of molecules in the 3:5 stack configuration is one way to understand the striking influence of the identity of the upstream codon on readthrough at UAG(A). Of the three codons tested 5' of the UAGA quadruplet, only GCC permitted significant readthrough. Perhaps the purine base at position 32 of the GCC decoding tRNAAla discriminates, to some extent, in favor of the incoming, engineered tRNA molecules in the 3:5 configuration. The bulky purine base in the 5' stack of the peptidyl group tRNA species may sterically hinder an incoming tRNA with an extra base in the 3' stack. This would have the effect of discriminating against the U-containing engineered tRNA in the 2:6 conformation and would permit the minor 3:5 form to enter and mediate readthrough. Only in the rare case of a tRNA with a purine at position 32 would this discrimination occur; in most cases readthrough would not be expected. For the discussion on frameshifting below, note that this argument implies that the proportion of the A-containing tRNA that is in the 3:5 stack conformation is expected to be even less than with the U-containing form, because of the inefficiency of A at 33.5 to create the turn between the stacks.

Figure 7. The potential base stacking configurations of an eight-base anticodon loop for 3'AUCU5' tRNALeu and 3'AUCA5' tRNALeu are shown. The inserted nucleotide is labeled as position 33.5.

Base stacking considerations may also provide the explanation for the frameshifting results. Consistent with previous work, having the potential for the fourth codon base to form Watson-Crick pairs with the extra base at position 33.5 in the tRNA increases the chance for four-base translation. Either of two models could explain this result. In model A, four bases from the 3' side of a 5' 2:6 3' stack conformation are simultaneously involved in codon:anticodon pairing in the A-site as a prelude to a four-base translocation. In model P, three bases from the 3' side of the 5' 3:5 3' stack conformation are involved in initial A-site pairing and, after triplet translocation to the P-site and a shift to a 2:6 stack, three bases from the alternative 2:6 stack conformation pair with bases 2, 3 and 4 of the UAGA to effect a +1 frameshift. Both models would predict a lower efficiency of four-base translation when there is no complementarity with the fourth codon base. Four-base translocation with adenosine at position 33.5 would involve three out of four pairing, whereas with three-base translocation, re-pairing in the P-site would involve two out of three pairing. No readthrough by leucine incorporation was detected with 3'AUCU5' tRNALeu in the presence of mRNA with 5' codons other than GCC or with 3'AUCA5' tRNALeu in the presence of any 5' codons tested. This finding suggests that if dissociation and re-pairing are involved in quadruplet decoding, they must occur with high efficiency following the initial pairing, otherwise readthrough products would be expected.
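The pairing counts used in this argument can be reproduced with a short sketch. It counts only canonical Watson-Crick pairs (no wobble or inosine pairing, a simplification adopted here), and the function name is hypothetical. Codons are written 5' to 3' and anticodons 3' to 5', as in the text.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def watson_crick_pairs(codon_5_to_3, anticodon_3_to_5):
    """Count canonical pairs between aligned codon and anticodon bases."""
    if len(codon_5_to_3) != len(anticodon_3_to_5):
        raise ValueError("codon and anticodon must have the same length")
    return sum(
        (c, a) in WATSON_CRICK for c, a in zip(codon_5_to_3, anticodon_3_to_5)
    )


# Model A: four-base pairing with UAGA in the A-site.
print(watson_crick_pairs("UAGA", "AUCU"))  # U at position 33.5
print(watson_crick_pairs("UAGA", "AUCA"))  # A at position 33.5

# Model P: re-pairing of the 2:6 stack triplet with bases 2, 3 and 4 (AGA).
print(watson_crick_pairs("AGA", "UCU"))    # U at position 33.5
print(watson_crick_pairs("AGA", "UCA"))    # A at position 33.5

# tRNASer (anticodon 3'AGC5') re-pairing at the ACG threonine codon.
print(watson_crick_pairs("ACG", "AGC"))
```

The counts for A at position 33.5 (three of four, and two of three) match those stated for the two models, and the tRNASer case gives the two Watson-Crick pairs invoked for the four-base hop.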

When GCC is the 5' codon, the data suggest that GCC decoding tRNAAla is involved in some detachment and re-pairing to mRNA via the +1 overlapping CCU codon, leaving an AGA codon in the A-site. Triplet reading of this AGA by an incoming form of 3'AUCU5' tRNALeu in the 5' 2:6 3' stack conformation could account for a proportion of the leucine residues incorporated due to quadruplet reading of UAGA. However, its significance is minor, since this tRNA also mediates leucine incorporation when the possibility of re-pairing by the prior tRNA is removed. With the same 5' codon before UAGA but with 3'AUCA5' tRNALeu, the predominant amino acid residue incorporated is arginine and not leucine. Again, exposure of the AGA arginine codon in the A-site following tRNAAla re-pairing is probably part of the explanation; the rest stems from the fact that the engineered tRNALeu can form only two out of three base-pairs with AGA. The interesting facet here is whether poor competition for UAG(A) by the engineered 3'AUCA5' tRNALeu is because of inefficient molecules with an A-turn in a 3:5 stack, or three-out-of-four pairing in a 2:6 stack; either could contribute to the level of re-pairing by GCC decoding tRNAAla. Rapid pairing of an incoming tRNA with A-site bases 1, 2 and 3 might have curtailed P-site realignment.

Following the considerations above for triplet translation of UAG, another factor is likely: the purine base at position 32 of the GCC decoding tRNAAla discriminates at the UAG(A) against 2:6 stacked 3'AUCA5' tRNALeu (with the subset in the 3:5 stack being inefficient). Such discrimination could allow incoming, scarce AGA decoding tRNAArg to compete effectively by pairing with bases 2, 3 and 4 in the A-site, a base skipping model. Since the only two WT E. coli tRNA species with a purine base at position 32, the tRNAPro2 with anticodon GGG (Sroga et al., 1992) and the minor GCC decoding tRNAAla (Mims et al., 1985), have the potential for +1 re-pairing with UAGA, synthetic tRNA molecules for the 5' codon are needed to distinguish the relative importance of the two features.

With UCG as the 5' codon and a single UAGA there is no possibility of UCG decoding tRNASer forming Watson-Crick pairs with the overlapping +1 frame CGU, and it does not have a purine base at position 32. In this case, the engineered 3'AUCA5' tRNA mediates some leucine residue incorporation. The finding that with 3'AUCA5' tRNALeu decoding CGG UAGA UAGA the sequence Arg-Arg-Leu is present, but not Arg-Leu-Leu, could merely reflect the inefficiency of the 3'AUCA5' tRNA in mediating even single quadruplet decoding or, in addition, if dissociation and re-pairing is involved, an influence of the purine A-turn in the initial pairing. Future work will doubtless reveal whether purine bases at positions 32 or 33 play a role in mediating programmed +1 frameshifting for gene expression via their effect on an adjacent tRNA with a standard seven-base anticodon loop.

Intriguingly, S. cerevisiae mitochondrial tRNA that decodes CUN naturally has eight bases in its anticodon loop (Li & Tzagoloff, 1979; Silber et al., 1981), and incidentally inserts threonine rather than a leucine residue, as would be predicted from the standard genetic code. Silber et al. (1981) suggested that for reading CUN the 3'GAU5' anticodon would require a 3:5 stack conformation. However, as the second base from the 5' end of the loop is U, some proportion of the tRNA is likely to have a 2:6 stack with a potential anticodon sequence of 3'(G)AUU5' (in such mitochondria a U just after the U turn in tRNA can "decode" any base (Heckman et al., 1980)). At some level this 2:6 stack form may compete for reading UAU/C tyrosine codons and possibly even UAG/UAA. This dual codon meaning would have some analogies to Candida CUG codons which encode serine and leucine, although in that case the key tRNA feature is G rather than U at position 33 to give a G-turn (Santos et al., 1997; Perreau et al., 1999). In addition, the yeast mitochondrial CUN decoding tRNA could mediate frameshifting. The majority of antisense RNA regulated gene systems in prokaryotes employ loop:loop interactions with an anticodon-like U-turn in RNA recognition loops which can have 7, 8 or 9 nucleotides (Franch et al., 1999). Whether alternative stacking is ever relevant in a case such as srnB, which has UU at the corresponding positions to 33 and 34 in a nine-membered anticodon loop, is unknown.

Materials and Methods

Strains and growth conditions

Strain Su1675 ara, thi, Δlac-pro, recA/F' lacIq, proAB+, KmR is a Rec− derivative of E. coli CSH26 made by Weiss et al. (1987) and used in our earlier experiments. MRA8 is a prfA1 derivative of E. coli MG1655 (Zhang et al., 1994). Strains were grown in LB broth and antibiotics were added to 100 µg/ml for ampicillin, 50 µg/ml for kanamycin and 25 µg/ml for chloramphenicol. Unless stated otherwise, Su1675 was grown at 37 °C, and MRA8 was grown at 30 °C.

Construction of plasmids carrying tRNA genes under the tac promoter

Oligonucleotides which form a Su6-like tRNALeu gene with an extra base 5' to the anticodon were annealed by heating to 95 °C for five minutes and then ligated to plasmid pUC19 cut with HindIII and BamHI. The ligation mix was used to transform strain Su1675 by electroporation, and transformants were screened for correct inserts by sequencing of isolated plasmids. Correct inserts were cut out again with HindIII and BamHI, purified by separation on agarose gel and ligated together with linker oligonucleotides into HindIII-SalI cut pACYC184. Strain Su1675 was transformed by electroporation, and plasmids were isolated and sequenced.

Construction of GST-lacZ gene fusion plasmids for measurement of doublet, triplet and quadruplet decoding

UAGN quadruplet cassettes in all reading frames were constructed by inserting oligonucleotides (Table 1) into HindIII-ApaI cut SKAGS vector (Wills et al., 1997). Plasmids were screened by sequencing to find all four different bases after the stop codon and to make certain that the constructs were in the correct reading frame.

Construction of GST-malE vector, GM-1

The malE coding region was amplified from plasmid pMAL-C2 (New England Biolabs, Beverly, MA) by PCR using the following primers: ATATTAGTTAACTGAAAATCGAAGAAGGTAAACTGG and TTATTACTCGAGTTACGAGCTCGAATTAGTCTGCGCGT.

HpaI and XhoI restriction endonuclease sites are in italics and malE sequences underlined. The 1110 bp PCR product was digested with HpaI and XhoI and ligated with vector pGEX-5X (Amersham Pharmacia Biotech, Piscataway, NJ) previously digested with SmaI and XhoI. The SmaI site was destroyed by ligation to the HpaI blunt end of the PCR fragment, leaving the BamHI and EcoRI sites from the pGEX-5X polylinker available for cloning. Following electroporation into E. coli cells, several ampicillin-resistant clones were selected and screened by PCR for the appropriately sized insert. One positive clone was selected and the malE sequence determined by DNA sequencing. There were two nucleotide changes compared to the wild-type malE sequence: one was a third position change of A→G that did not affect the protein sequence, and the second was a C→T change that resulted in a valine residue substitution for an alanine residue. This single amino acid residue substitution does not interfere with malE binding to amylose resin.

Construction of GST-malE gene fusion plasmids for protein characterization

Oligonucleotides containing UAGA quadruplets in various frames and with various 3' and 5' contexts (Table 1) were inserted into BamHI-EcoRI cut GM-1 vector. Isolated plasmids were screened by sequencing to ensure correct insert and reading frame.

Plasmid isolation and DNA sequencing

Plasmids were isolated by a modified mini alkaline lysis/PEG precipitation method from the Perkin-Elmer Corporation. DNA sequencing was performed on an ABI 373 instrument (PE Biosystems, Foster City, CA).

Western blotting and protein quantification

Cell extracts were made by boiling cells in cracking buffer (6 M urea, 1 % (w/v) SDS, 125 mM Tris-HCl, pH 7.2) and applied onto SDS-10 % polyacrylamide gels. Separated proteins were blotted onto Immobilon-P membranes (Millipore Corp., Bedford, MA) using a Trans-Blot electrophoretic transfer cell (Bio-Rad Laboratories, Hercules, CA). The blotting buffer was 39 mM glycine, 48 mM Tris base, 0.05 % SDS, 20 % (v/v) methanol, and proteins were blotted at 0.5 A for three hours. The GST termination products and GST-lacZ full-length products were visualized by chemiluminescence with a BM Chemiluminescence Western Blotting Kit (Roche Molecular Biochemicals, Indianapolis, IN) as per manufacturer's instructions. Primary antibody was anti-glutathione-S-transferase (Sigma, St. Louis, MO). Chemiluminescence signals were quantified with a Lumi-Imager digital camera system (Roche Molecular Biochemicals).

Protein purification

MRA8 cells transformed with various GST-malE fusions (Table 1) and pACYC184-based supP genes modified to have either a 3'AUCU5' or 3'AUCA5' extended anticodon loop were grown in 1 l multiples in Terrific broth supplemented with 100 µg/ml ampicillin and 50 µg/ml chloramphenicol at 30 °C for approximately two to three hours. Growth temperature was raised to 40 °C and growth was continued for six to eight hours longer. Cells were harvested and resuspended in 15 ml of phosphate-buffered saline (PBS) per liter of culture and stored at −20 °C overnight. Cells were disrupted by sonication, and cell debris was removed by centrifugation at 10,000 rpm for 20 minutes at 4 °C in a Sorvall SS34 rotor. The supernatant was recovered and centrifuged at 45,000 rpm for two hours at 4 °C in a Beckman VTi50 rotor. Cleared lysate was applied to two glutathione-Sepharose 4B (Amersham Pharmacia Biotech) columns prepared with 2 ml bed volumes as per manufacturer's instructions. The columns were washed with 25 ml of PBS and bound protein was eluted with 3 ml of 10 mM glutathione (Roche Molecular Biochemicals), 50 mM Tris (pH 8.0). The eluate was diluted to 25 ml in maltose-binding protein buffer (MBPB; 20 mM Tris (pH 7.4), 100 mM NaCl, 1 mM EDTA, 1 mM DTT) and applied to one amylose resin column (New England Biolabs) prepared with a 2 ml bed volume as per manufacturer's instructions. The column was washed with 25 ml of MBPB and bound protein was eluted with 10 mM maltose in MBPB. Eluate was concentrated and washed with 6 ml of HPLC-grade water in a Centricon 30 microconcentrator (Amicon, Beverly, MA).

Electrospray mass spectrometry

Molecular masses of proteins were determined using positive-ion electrospray mass spectrometry. The electrospray ionization process generates a series of multiply charged molecular ions from which very accurate (better than 0.01 %) mass assignments are derived for each protein product. The amino acid residue composition of each protein was determined based on the exact molecular mass measured for the intact protein along with requirements dictated by the known sequence of the corresponding mRNA molecule.

Following affinity purification, protein samples were prepared for electrospray ionization by removing salts, buffers, glutathione, and other contaminants by trapping the proteins on a 1 mm C8 reverse-phase guard column (Optiguard, Optimize Technologies, Oregon City, OR) and washing the protein extensively with HPLC-grade water. The proteins were then eluted with a 67 % (v/v) methanol solution containing 0.9 % (v/v) formic acid directly into the electrospray interface of a Quattro-II mass spectrometer (Micromass, Beverly, MA). Samples were infused at a rate of 4 µl/minute using a syringe pump for solvent delivery. Mass spectra were obtained from 5 to 50 accumulated spectra using continuum-data storage in the positive-ion mode while scanning 900 to 1400 Da in four seconds. A cone voltage of 60 eV and a spray voltage of 3.2 kV were used. Molecular mass spectra (Figures 3-7) show measured protein molecular masses that were generated by deconvolution of the multiply charged molecular-ion series using MaxEnt software (Micromass). Mass spectra were background subtracted prior to MaxEnt processing using a 3 % threshold level. Relative normalization scales were not included in the Figures of the molecular mass spectra, since protein purification procedures, electrospray ionization, and processing of mass spectra do not necessarily represent accurate relative amounts of termination and full-length proteins.
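As a rough illustration of why a protein of about 68 kDa appears as a series of multiply charged ions within the 900 to 1400 scan window, the sketch below lists the charge states whose m/z falls in that window. It assumes simple protonation, with m/z = (M + z x 1.00728)/z, and treats the scan limits as m/z values. The function name is hypothetical, and this is not the MaxEnt deconvolution used for the Figures.

```python
PROTON_MASS = 1.00728  # Da, mass added per positive charge (protonation assumed)


def charge_states_in_window(mass, mz_low, mz_high, max_charge=200):
    """Return [(z, m/z)] for protonated ions whose m/z lies within the window."""
    states = []
    for z in range(1, max_charge + 1):
        mz = (mass + z * PROTON_MASS) / z
        if mz_low <= mz <= mz_high:
            states.append((z, round(mz, 2)))
    return states


# Predicted mass of the GIRSLTVWNSQ full-length product, from the text.
states = charge_states_in_window(68219.0, 900.0, 1400.0)
print("lowest charge state in window:", states[0])
print("highest charge state in window:", states[-1])
print("number of charge states:", len(states))
```

Each charge state gives an independent estimate of the intact mass, M = z x (m/z - 1.00728), so a window containing many charge states supports a well-constrained deconvoluted mass.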

Protein sequencing

Fusion protein intended for protein sequencing was digested with Factor Xa protease (New England Biolabs) as per manufacturer's recommended conditions, and further purified by electrophoresis through SDS/10 % (w/v) polyacrylamide gels and electrophoretic transfer to PVDF membranes (Immobilon-P, Millipore Corp.) as previously described (Matsudaira, 1987). N-terminal sequencing of peptides was carried out on either an Applied Biosystems model 477 or model 492 (PE Biosystems, Foster City, CA). Lag-corrected data were tabulated for Figure graphs.

Acknowledgments

We thank Peter Schultz for suggesting that we test new codon possibilities, Michael O'Connor and Alan Herr for continuing discussion, Ed Meenen for protein sequencing, and Monica Rydén-Aulin for sending us the MRA8 strain. This work was supported by Department of Energy grant DEFG03-99ER62732 to R.F.G. and NIH grant R01-GM48152 to J.F.A.

References

Ashraf, S. S., Ansari, G., Guenther, R., Sochacka, E., Malkiewicz, A. & Agris, P. F. (1999). The uridine in "U-turn": contributions to tRNA-ribosomal binding. RNA, 5, 503-511.

Atkins, J. F., Herr, A. J., Massire, C., O'Connor, M., Ivanov, I. & Gesteland, R. F. (2000). Poking a hole in the sanctity of the triplet code: inferences for framing. In The Ribosome. Structure, Function, Antibiotics and Cellular Interactions (Garrett, R. A., Douthwaite, S. R., Liljas, A., Matheson, A. T., Moore, P. B. & Noller, H. F., eds), pp. 369-383, ASM Press, Washington DC.

Ayer, D. & Yarus, M. (1986). The context effect does not require a fourth base pair. Science, 231, 393-395.

Bare, L., Bruce, A. G., Gesteland, R. F. & Uhlenbeck, O. C. (1983). Uridine-33 in yeast tRNA is not essential for amber suppression. Nature, 305, 554-556.

Baron, C. & Böck, A. (1995). The selenocysteine-inserting tRNA species: structure and function. In tRNA Structure, Biosynthesis and Function (Söll, D. & RajBhandary, U. L., eds), pp. 529-544, ASM Press, Washington DC.

Benner, S. A., Burgstaller, P., Battersby, T. R. & Jurczyk, S. (1999). Did the RNA world exploit an expanded genetic alphabet? In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), 2nd edit., pp. 163-181, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Bossi, L. & Smith, D. M. (1984). Suppressor sufJ: a novel type of tRNA mutant that induces translational frameshifting. Proc. Natl Acad. Sci. USA, 81, 6105-6109.

Budisa, N., Minks, C., Medrano, F. J., Lutz, J., Huber, R. & Moroder, L. (1998). Residue-specific bioincorporation of non-natural, biologically active amino acids into proteins as possible drug carriers: structure and stability of the per-thiaproline mutant of annexin V. Proc. Natl Acad. Sci. USA, 95, 455-459.

Carter, R. J., Baeyens, K. J., SantaLucia, J., Turner, D. H. & Holbrook, S. R. (1997). The crystal structure of an RNA oligomer incorporating tandem adenosine-inosine mismatches. Nucl. Acids Res. 25, 4117-4122.

Curran, J. F. (1995). Decoding with the A:I wobble pair is inefficient. Nucl. Acids Res. 23, 683-688.

Curran, J. F. & Yarus, M. (1987). Reading frame selection and transfer RNA anticodon loop stacking. Science, 238, 1545-1550.

Döring, T., Mitchell, P., Osswald, M., Bochkariov, D. & Brimacombe, R. (1994). The decoding region of 16 S RNA: a cross-linking study of the ribosomal A, P and E sites using tRNA derivatized at position 32 in the anticodon loop. EMBO J. 13, 2677-2685.

Farabaugh, P. J. & Björk, G. R. (1999). How translational accuracy influences reading frame maintenance. EMBO J. 18, 1427-1434.


Franch, T., Petersen, M., Wagner, E. G. H., Jacobsen, J. P. & Gerdes, K. (1999). Antisense RNA regulation in prokaryotes: rapid RNA/RNA interaction facilitated by a general U-turn loop structure. J. Mol. Biol. 294, 1115-1125.

Gaber, R. F. & Culbertson, M. R. (1984). Codon recognition during frameshift suppression in Saccharomyces cerevisiae. Mol. Cell. Biol. 4, 2052-2061.

Gollnick, P., Hardin, C. C. & Horowitz, J. (1987). 19F nuclear magnetic resonance as a probe of anticodon structure in 5-fluorouracil-substituted Escherichia coli transfer RNA. J. Mol. Biol. 197, 571-584.

Heckman, J. E., Sarnoff, J., Alzner-DeWeerd, B., Yin, S. & RajBhandary, U. L. (1980). Novel features in the genetic code and codon reading patterns in Neurospora crassa mitochondria based on sequences of six mitochondrial tRNAs. Proc. Natl Acad. Sci. USA, 77, 3159-3163.

Hohsaka, T., Ashizuka, Y., Sasaki, H., Murakami, H. & Sisido, M. (1999). Incorporation of two different nonnatural amino acids independently into a single protein through extension of the genetic code. J. Am. Chem. Soc. 121, 12194-12195.

Ile, M., Jin, L. & Austen, B. (1993). Specificity of factor Xa in the cleavage of fusion proteins. J. Protein Chem. 12, 1-5.

Kowal, S. & Oliver, J. (1997). Exploiting unassigned codons in Micrococcus luteus for tRNA-based amino acid mutagenesis. Nucl. Acids Res. 25, 4685-4689.

Labuda, D., Stricker, G., Grosjean, H. & Pörschke, D. (1985). Mechanism of codon recognition by transfer RNA studied with oligonucleotides larger than triplets. Nucl. Acids Res. 13, 3667-3683.

Larsen, B., Gesteland, R. F. & Atkins, J. F. (1997). Structural probing and mutagenic analysis of the stem-loop required for Escherichia coli dnaX ribosomal frameshifting: programmed efficiency of 50 %. J. Mol. Biol. 271, 47-60.

Li, M. & Tzagoloff, A. (1979). Assembly of the mitochondrial membrane system: sequences of yeast mitochondrial valine and an unusual threonine tRNA gene. Cell, 18, 47-53.

Lindsley, D. & Gallant, J. (1993). On the directional specificity of ribosome frameshifting at a "hungry" codon. Proc. Natl Acad. Sci. USA, 90, 5469-5473.

Liu, D. & Schultz, P. (1999). Progress toward the evolution of an organism with an expanded genetic code. Proc. Natl Acad. Sci. USA, 96, 4780-4785.

Liu, D. R., Magliery, T. J., Pastrnak, M. & Schultz, P. G. (1997). Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo. Proc. Natl Acad. Sci. USA, 94, 10092-10097.

Madin, K., Sawasaki, T., Ogasawara, T. & Endo, Y. (2000). A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: plants apparently contain a suicide system directed at ribosomes. Proc. Natl Acad. Sci. USA, 97, 559-564.

Markiewicz, P., Kleina, L. G., Cruz, C., Ehret, S. & Miller, J. H. (1994). Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence. J. Mol. Biol. 240, 421-433.

Matsudaira, P. (1987). Sequence from picomole quantities of proteins electroblotted onto polyvinylidene difluoride membranes. J. Biol. Chem. 25, 10035-10038.

Mejlhede, N., Atkins, J. F. & Neuhard, J. (1999). Ribosomal −1 frameshifting during decoding of Bacillus subtilis cdd occurs at the sequence CGA AAG. J. Bacteriol. 181, 2930-2937.

Miller, J. H. & Albertini, A. M. (1983). Effects of surrounding sequence on the suppression of nonsense codons. J. Mol. Biol. 164, 59-71.

Mims, B. H., Prather, N. E. & Murgola, E. J. (1985). Isolation and nucleotide sequence analysis of tRNAAlaGGC from Escherichia coli K-12. J. Bacteriol. 162, 837-839.

Murgola, E. J., Prather, N. E., Mims, B. H., Pagel, F. T. & Hijazi, K. A. (1983). Anticodon shift in tRNA: a novel mechanism in missense and nonsense suppression. Proc. Natl Acad. Sci. USA, 80, 4936-4939.

Nagai, K. & Thøgersen, H. C. (1987). Synthesis and sequence-specific proteolysis of hybrid proteins produced in Escherichia coli. Methods Enzymol. 153, 461-481.

O'Connor, M. (1998). tRNA imbalance promotes −1 frameshifting via near-cognate decoding. J. Mol. Biol. 279, 727-736.

O'Connor, M., Gesteland, R. F. & Atkins, J. F. (1989). tRNA hopping: enhancement by an expanded anticodon. EMBO J. 8, 4315-4323.

Perreau, V. M., Keith, G., Holmes, W. M., Przykorska, A., Santos, M. A. S. & Tuite, M. F. (1999). The Candida albicans CUG-decoding ser-tRNA has an atypical stem-loop structure. J. Mol. Biol. 293, 1039-1053.

Peter, K., Lindsley, D., Peng, L. & Gallant, J. A. (1992). Context rules of rightward overlapping reading. Nature New Biol. 4, 520-526.

Poole, E. S., Brown, C. M. & Tate, W. P. (1995). The identity of the base following the stop codon determines the efficiency of in vivo translational termination in Escherichia coli. EMBO J. 14, 151-158.

Riddle, D. L. & Carbon, J. (1973). Frameshift suppression: a nucleotide addition in the anticodon of a glycine transfer RNA. Nature New Biol. 242, 230-234.

Riddle, D. L. & Roth, J. R. (1970). Suppressors of frameshift mutations in Salmonella typhimurium. J. Mol. Biol. 54, 131-144.

Roche, E. D. & Sauer, R. T. (1999). SsrA-mediated peptide tagging caused by rare codons and tRNA scarcity. EMBO J. 18, 4579-4589.

Saks, M. E., Sampson, J. R., Nowak, M. W., Kearney, P. C., Du, F., Abelson, J. N., Lester, H. A. & Dougherty, D. A. (1996). An engineered Tetrahymena tRNAGln for in vivo incorporation of unnatural amino acids into proteins by nonsense suppression. J. Biol. Chem. 271, 23169-23175.

Santos, M. A. S., Ueda, T., Watanabe, K. & Tuite, M. F. (1997). The non-standard genetic code of Candida spp.: an evolving genetic code or a novel mechanism for adaptation? Mol. Microbiol. 26, 431.

Silber, A.-P., Dirheimer, G. & Martin, R. P. (1981). Nucleotide sequence of a yeast mitochondrial threonine-tRNA able to decode the C-U-N leucine codons. FEBS Letters, 132, 344-348.

Smith, D. & Yarus, M. (1989). tRNA-tRNA interactions within cellular ribosomes. Proc. Natl Acad. Sci. USA, 86, 4397-4401.

Short, G. T., III, Golovine, S. Y. & Hecht, S. M. (1999). Effect of release factor 1 on in vitro protein translation and the elaboration of proteins containing unnatural amino acids. Biochemistry, 38, 8808-8819.


Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A. & Steinberg, S. (1998). Compilation of tRNA sequences and sequences from tRNA genes. Nucl. Acids Res. 26, 148-153.

Sroga, G. E., Nemoto, F., Kuchino, Y. & Björk, G. R. (1992). Insertion (sufB) in the anticodon loop or base substitution (sufC) in the anticodon stem of tRNAPro2 from Salmonella typhimurium induces suppression of frameshift mutations. Nucl. Acids Res. 20, 3463-3469.

Sundararajan, A., Michaud, W. A., Qian, Q., Stahl, G. & Farabaugh, P. J. (1999). Near-cognate peptidyl tRNAs promote programmed translational frameshifting in yeast. Mol. Cell, 4, 1005-1015.

Tsuchihashi, Z. (1991). Translational frameshifting in the Escherichia coli dnaX gene in vitro. Nucl. Acids Res. 19, 2457-2462.

Yourno, J. (1972). Externally suppressible +1 "glycine" frameshift: possible quadruplet isomers for glycine and proline. Nature New Biol. 239, 219-221.

Zhang, S., Rydén-Aulin, M., Kirsebom, L. A. & Isaksson, L. A. (1994). Genetic implication for an interaction between release factor one and ribosomal protein L7/L12 in vivo. J. Mol. Biol. 242, 614-618.

Edited by J. H. Miller

(Received 4 November 1999; received in revised form 29 January 2000; accepted 24 February 2000)