
Cytochrome Oxidase Genes from Thermus thermophilus

THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 268, No. 8, Issue of March 15, pp. 5395-5408, 1993 Printed in U.S.A.

Cytochrome Oxidase Genes from Thermus thermophilus
NUCLEOTIDE SEQUENCE OF THE FUSED GENE AND ANALYSIS OF THE DEDUCED PRIMARY STRUCTURES FOR SUBUNITS I AND III OF CYTOCHROME caa3*

(Received for publication, August 12, 1992)

Michael W. Mather‡, Penelope Springer, Sieghard Hensel§, Gerhard Buse§, and James A. Fee¶

From the Biochemistry Section and Stable Isotope Resource, Spectroscopy and Biochemistry Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 and the §Institut für Biochemie, Rheinisch-Westfälische Technische Hochschule Aachen, Klinikum Pauwelsstrasse 30, D-5100 Aachen, Germany

Cytochrome caa3, a cytochrome c oxidase from Thermus thermophilus, has been purified and extensively characterized as a two-subunit enzyme containing the metal centers characteristic of cytochrome c oxidases (cytochromes a and a3; copper centers CuA and CuB) and an additional cytochrome c (Fee, J. A., Kuila, D., Mather, M. W., and Yoshida, T. (1986) Biochim. Biophys. Acta 853, 153-185). We have now cloned and sequenced the genes encoding the subunits of this enzyme. The smaller subunit consists of a typical oxidase subunit II sequence fused to a cytochrome c domain (Mather, M. W., Springer, P., and Fee, J. A. (1991) J. Biol. Chem. 266, 5025-5035). The larger subunit, the A-protein, is encoded by a fusion gene lying immediately downstream of the subunit IIc gene. The 5′ portion of this gene encodes an oxidase subunit I homolog, whereas the 3′ portion is homologous to oxidase subunits III. The A-protein from the purified enzyme appears too small from SDS-polyacrylamide gel electrophoresis and quantitative amino acid analyses to be a complete subunit I/III fusion, but it is currently not known if proteolytic processing occurs. Analyses of the sequences of oxidase subunits are presented which clearly identify T. thermophilus cytochrome caa3 as a bona fide member of the greater family of heme- and copper-requiring oxidases. As one consequence, it is confirmed that the set of invariant histidine residues (potential ligands of the metal centers) in cytochrome c oxidase subunits I and II is reduced to 8. Possible topological and helix packing models are developed based on considerations of homology, hydropathy, and variability.

* This work was supported by National Institutes of Health Grant GM35342 and Sonderforschungsbereich 160 Grant Bu 463-1 of the Deutsche Forschungsgemeinschaft. Work at Los Alamos was performed under the auspices of the United States Department of Energy. This is the third in a series of articles describing the cloning and sequencing of genes encoding subunits of cytochrome oxidases from T. thermophilus. The first article described the cloning and some preliminary sequence data for subunit I of the cytochrome caa3 oxidase (Fee et al., 1988a). The second described the sequence of the subunit IIc gene of cytochrome caa3 and analysis of the deduced primary structure of the subunit (Mather et al., 1991). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M84341.

‡ Present address: Dept. of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078.

¶ To whom requests for reprints should be addressed. Tel.: 505-667-0774; Fax: 505-665-3166.

Cytochrome c oxidase (cytochrome aa3) is the terminal enzyme in the respiratory electron transport system of mitochondria and many bacteria. It couples one-electron oxidation of cytochrome c to four-electron reduction of dioxygen and extrusion of protons across the membrane (for recent reviews, see Palmer, 1987; Capaldi, 1990; Chan and Li, 1990; Malmström, 1990). Bacterial enzymes possess similar spectroscopic and functional properties to, but contain fewer protein subunits than, their mitochondrial counterparts (Fee et al., 1986; Ludwig, 1987). They are composed of from two to four subunits (designated I, II, III, and IVB), which are homologous to the corresponding proteins in mitochondria (Saraste, 1990).

The cytochrome caa3 from Thermus thermophilus was isolated as a two-subunit enzyme (apparent molecular masses of ~55 and ~33 kDa) containing the canonical metal centers (heme centers a and a3, copper centers CuA and CuB) and an additional heme C (Fee et al., 1980; Hon-nami and Oshima, 1984; Yoshida et al., 1984). The enzyme has been characterized by a variety of methods and shown to be essentially identical to a combined cytochrome aa3 plus cytochrome c (reviewed in Fee et al. (1986, 1988b, 1993)). Peptide and DNA sequence work has established that the smaller subunit, originally termed the C-protein, consists of an amino-terminal portion homologous to oxidase subunits II and a carboxyl-terminal domain homologous to soluble cytochromes c (Buse et al., 1989; Mather et al., 1991). This polypeptide, now also referred to as subunit IIc, contains a characteristic stretch of amino acid sequence having conserved histidine and cysteine residues which are widely thought to form the CuA site (see Steffens and Buse, 1979; Yasunobu et al., 1980; Holm et al., 1987; Palmer, 1987; Chan and Li, 1990; Mather et al., 1991 for discussion). Results from the present work indicate the A-protein gene encodes a fusion protein consisting of a typical subunit I amino acid sequence, containing the probable binding sites for the binuclear cytochrome a3/CuB center and cytochrome a, and a typical subunit III sequence. Some preliminary results were presented at the 41st Mosbach Colloquium (Mather et al., 1990).

EXPERIMENTAL PROCEDURES¹

RESULTS AND DISCUSSION

Isolation and Sequencing of Genomic Clones

Based on the sequence of a cyanogen bromide peptide fragment isolated from purified cytochrome caa3 A-protein (Buse et al., 1989), a unique 38-base pair oligodeoxynucleotide probe was designed (see Fee et al. (1988a) for the rationale of probe design) and hybridized in-gel to restriction digests of T. thermophilus HB8 genomic DNA. Only a single hybridizing band was observed in each restriction digest of the genomic DNA (data not shown, but see Fig. 1 of Mather (1988)), suggesting that there is only a single copy of the gene encoding cytochrome caa3 A-protein in the T. thermophilus genome. Three overlapping restriction fragments were extracted from dried agarose gels as described (Mather, 1988) and cloned into pUC8: a 12-kilobase pair BclI fragment, a 2.7-kilobase pair PstI fragment, and a 2.6-kilobase pair HindIII fragment. The relationship of the cloned restriction fragments and the genes for subunits of cytochrome caa3 on the T. thermophilus chromosome is shown in Fig. 1, which also depicts the positions of the overlapping subclones used for sequencing. The sequence of the caaA gene, encoding cytochrome caa3 subunit IIc, has been described previously (Mather et al., 1991). It lies immediately upstream of an open reading frame, designated caaB. The sequence of the caaB region of the cloned DNA was determined using the dideoxynucleotide chain termination method (Sanger et al. (1977); see “Experimental Procedures”). The nucleotide sequence is shown in Fig. 2. Except for the final 47 nucleotides in the 3′-untranslated region, the sequence of both strands of DNA was determined. There is a single mismatch between the hybridization probe and its target sequence (positions 1363-1391).

¹ Portions of this paper (including “Experimental Procedures,” Table I, and Figs. 1-3, 6, 7, and 9) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

This is an Open Access article under the CC BY license.

Nucleotide Sequence

The coding region of caaB is 65% guanine + cytosine, just slightly lower than the overall base content of T. thermophilus DNA, ~69% (Oshima and Imahori, 1974). The codon usage within caaB is very similar to that of most previously sequenced protein genes from T. thermophilus, which maintain high G + C content primarily by utilizing guanine or cytosine in the third position of each codon (95.5% in caaB) (Kagawa et al., 1984; Seidler et al., 1987; Bowen et al., 1988; Sato et al., 1988; Koyama and Furukawa, 1990; Yakhnin et al., 1990; Dekker et al., 1991; Itaya and Kondo, 1991; Lauer et al., 1991; Nureki et al., 1991; Xu et al., 1991). As shown in Fig. 3, this codon usage is essentially uniform throughout caaB. The uniform codon usage, together with the large number of matching peptide sequences (Buse et al., 1989; see Fig. 2) and the extensive homology to other cytochrome aa3 subunit I and subunit III amino acid sequences (see below), suggested this open reading frame was a continuous gene encoding subunit I and subunit III. However, we were concerned that the observed fusion might be the result of a deletion or other alteration that occurred during cloning. To test this, subclones of the independently isolated BclI genomic clone were used to verify the sequence of the contiguous region from about nucleotide position 1300 through 1940 (Fig. 2) that was initially determined using subclones of the HindIII and PstI genomic clones. As gauged by analogy to other oxidases, this region might be expected to contain the end of a subunit I gene. No differences were observed between the sequence obtained with BclI subclones and that from the HindIII/PstI subclones. The initial and terminal portions of this “doubly sequenced” region include matching peptide sequences determined by Buse et al. (1989) (see Fig. 2), establishing the correctness of the reading frame. Other examples of bacterial gene fusions include the cytochrome b:cytochrome c1 cistron of Bradyrhizobium japonicum (Thöny-Meyer et al., 1989) and the oxidase subunit II:cytochrome c cistron found in Bacilli (Ishizuka et al., 1990; Saraste, 1990) and T. thermophilus (Mather et al., 1991).
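The two base-composition figures quoted in this section (overall G + C of the coding region, and G or C in the third codon position) are simple counts over the reading frame. The following is a minimal sketch of how such values could be computed; the function names and the toy sequence are illustrative assumptions and are not taken from the caaB sequence or from the authors' software.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(coding_seq: str) -> float:
    """Fraction of codons whose third base is G or C."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third_bases = seq[2:usable:3]     # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical 5-codon reading frame (ATG GCC CTG GAG TAG), not caaB.
toy_orf = "ATGGCCCTGGAGTAG"
print(f"G+C overall: {gc_fraction(toy_orf):.1%}")          # 60.0%
print(f"G+C at third position: {gc3_fraction(toy_orf):.1%}")  # 100.0%
```

Applied to windows along a gene rather than to the whole reading frame, the same third-position count gives a rough picture of how uniform the codon bias is.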

Deduced Amino Acid Sequence

The correct reading frame for translation (also shown in Fig. 2) was confirmed by alignment to peptide sequences (Buse et al., 1989), by comparison of the translated sequence to previously reported sequences of cytochrome oxidase subunits I and subunits III (see Figs. 4 and 5 below), and by codon usage analysis (Gribskov et al., 1984; see Fig. 3). The NH2 terminus of the protein is blocked (Yoshida et al., 1984); attempts to chemically remove a possible N-formyl group were unsuccessful, thus indicating another blocking group (Buse et al., 1989). The initiation codon appears to be the ATG beginning at nucleotide position 31, since it follows the termination codon of the gene for subunit IIc by only 12 base pairs and is preceded by a possible ribosome binding site (GGAGGTG; underlined in Fig. 2). In addition, homology to other oxidases appears to begin about 8 residues downstream from this putative initiator methionine in the deduced amino acid sequence (see below). The open reading frame is terminated by the TAG codon beginning at nucleotide position 2404. An inverted repeat pattern follows that could form a stem-loop structure resembling a rho-independent terminator (Platt, 1986) (marked by arrows below the DNA sequence in Fig. 2). Although this indicates that transcription of the cytochrome caa3 genes may end at this point, the extent of the message has not been determined.
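Two of the statements above lend themselves to a quick check: the number of codons between the ATG at position 31 and the TAG at position 2404, and the search for an inverted repeat that could fold into a stem-loop. The sketch below is illustrative only. The helper names, the stem and loop lengths, and the toy sequence are assumptions; it looks for perfect repeats and makes no attempt to model terminator stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codon_count(start_base: int, stop_codon_base: int) -> int:
    """Sense codons from the first base of the start codon up to the stop codon."""
    span = stop_codon_base - start_base
    if span <= 0 or span % 3:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, min_loop=3, max_loop=8):
    """Return (left_start, right_start, left_arm) for perfect inverted repeats (0-based)."""
    seq = seq.upper()
    hits = []
    for left_start in range(len(seq) - stem + 1):
        left_arm = seq[left_start:left_start + stem]
        right_arm = reverse_complement(left_arm)
        for loop in range(min_loop, max_loop + 1):
            right_start = left_start + stem + loop
            # A slice running past the end is shorter than the stem and cannot match.
            if seq[right_start:right_start + stem] == right_arm:
                hits.append((left_start, right_start, left_arm))
    return hits


# Positions quoted in the text: ATG at 31, TAG at 2404.
print(codon_count(31, 2404))  # 791

# Hypothetical terminator-like sequence, not the caaB 3' region.
toy = "AAGGCTCCTTTTGGAGCCTTTTTT"
for left_start, right_start, arm in find_inverted_repeats(toy):
    print(left_start, right_start, arm)
```

The codon count of 791 agrees with the polypeptide length stated for the open reading frame.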

Compositions and Possible Subunit Structure. Translation of the open reading frame predicts a polypeptide of 791 residues, which would have a mass of 89,200 Da. This is considerably larger than the values determined by Ferguson analysis of SDS-polyacrylamide gel electrophoresis results and by quantitative amino acid analysis (which range from ~54,000 to 71,000 Da) for the large subunit of cytochrome caa3 (Hon-nami and Oshima, 1984; Yoshida et al., 1984). The putative 791-residue protein could thus represent a precursor form that is processed to yield both the mature subunit I (found in the purified cytochrome) and a mature subunit III (perhaps lost during purification). Therefore, an attempt was made to determine the COOH-terminal sequence of the purified subunit I polypeptide using timed carboxypeptidase digestion, but an uninterpretable burst of amino acids was observed.² Subsequently, an attempt was made to accurately determine the molecular masses of the subunits of the purified cytochrome caa3 by laser desorption mass spectrometry (Chait and Kent, 1992). A protein with a molecular mass very close to that predicted for subunit IIc (cf. Mather et al., 1991) was observed, but no larger species was observed.³ This likely means that the A-protein was not dispersed by the organic medium used to form the matrix for laser desorption.
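A predicted mass such as the 89,200 Da quoted above is obtained by summing residue masses over the deduced sequence and adding one water for the free termini. The sketch below shows that calculation under stated assumptions: it uses standard average residue masses, ignores the blocked NH2 terminus and any bound cofactors, and the function name and toy peptide are hypothetical.

```python
# Average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153


def polypeptide_mass(seq: str) -> float:
    """Average mass (Da) of an unmodified linear polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER_MASS


# Hypothetical 10-residue peptide, not part of the caaB product.
toy_peptide = "MKTAYIAKQR"
print(f"{polypeptide_mass(toy_peptide):.1f} Da")

# Mean residue mass implied by the figures in the text.
print(round(89_200 / 791, 1))  # 112.8
```

The implied mean of about 113 Da per residue is in the range expected for a protein rich in hydrophobic residues, so the quoted mass and length are mutually consistent.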

Comparison of observed and theoretical amino acid compositions also fails to provide guidance regarding the size of the A-protein. Table I shows previously reported amino acid compositions along with the compositions predicted if cleavage of the complete I/III protein were to occur at different points. In general, there are only minor differences with the exception of a decreased proportion of glycine as one includes residues beyond 533 and an increase in histidine when the last sequence segment is included. Overall, the agreement of deduced and experimentally determined compositions appears reasonable.
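The predicted compositions for different hypothetical cleavage points can be tabulated by truncating the deduced sequence and counting residues. The following is a small sketch of that bookkeeping; the function names, the toy sequence, and the cut positions are assumptions chosen only to show the calculation, and the values are not those of Table I.

```python
from collections import Counter


def mole_percent(seq: str) -> dict:
    """Amino acid composition of a sequence as mole percent."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def composition_by_cleavage(seq: str, cut_points) -> dict:
    """Composition of the NH2-terminal fragment ending at each cut point (1-based length)."""
    return {cut: mole_percent(seq[:cut]) for cut in cut_points}


# Hypothetical sequence: glycine-rich at the start, histidine-rich at the end.
toy_protein = "GGAGGLGGAL" + "ALHHLAHHAH"
for cut, composition in composition_by_cleavage(toy_protein, [10, 20]).items():
    gly = composition.get("G", 0.0)
    his = composition.get("H", 0.0)
    print(f"first {cut} residues: Gly {gly:.1f}%, His {his:.1f}%")
```

In this toy case the glycine proportion falls and the histidine proportion rises as the fragment is extended, which is the kind of trend the text describes for the real sequence.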

Partial amino acid sequences were determined previously for a number of cyanogen bromide peptide fragments isolated from purified cytochrome caa3 A-protein (Buse et al., 1989). These are now seen to be spaced throughout much of the protein, the first beginning at residue 98 and the last ending at residue 650 (Fig. 2). However, Buse and co-workers² have recently determined that the cyanogen bromide fragment beginning with PIY-- (position 192) and ending with position 256 (see Buse et al., 1989) also contains approximately equal amounts of the peptide beginning with AWF-- (position 622) and extending to --AS (position 772) (Fig. 2). If confirmed, the peptide sequence data would suggest that the A-protein includes nearly the complete translation product of caaB. Ongoing efforts with the protein’s chemistry, including mass spectrometry, should clarify the nature of the A-protein.

² S. Hensel, M. Dewor, and G. Buse, unpublished results.

³ B. T. Chait and J. A. Fee, unpublished results.

Sequence Comparisons. The amino acid sequence deduced from caaB contains regions of evident homology to other subunits I and III. The amino acid sequence of the NH2-terminal 533 residues (the “subunit I-homologous” portion of the deduced sequence) was aligned to 36 previously published cytochrome aa3 subunit I sequences using the profile analysis procedure of Gribskov et al. (1987). The alignment of a representative subset of the 37 sequences is shown in Fig. 4 (note that minor corrections/modifications were made to some of the sequences, as described in the caption). The statistical significance of the alignment of the Thermus sequence to the alignment of the other subunits I was tested by the randomization or “jumble” test (Doolittle, 1981, 1986), and the probability that an alignment of similar quality could have occurred by chance proved to be infinitesimal (the similarity score was 65 standard deviations above the average alignment score for randomized Thermus subunit I-homologous sequences). The percent identity of the Thermus subunit I-homologous sequence with other subunits I ranges from a high of 46.1% (Bacillus PS3) down to 37.1% (Caenorhabditis elegans); this would decrease to 29.7% if DNA-deduced trypanosome subunit I sequences were added to the alignment.⁴ We note that the similarity extends over most of the sequence length, except for the very NH2 terminus and the COOH-terminal region; however, the segments of highest conservation generally occur in or immediately adjacent to several of the putative helical transmembrane segments predicted by hydropathy analysis (see below), which are thought to be important structural elements of this membrane protein (Wikstrom et al., 1985; such transmembrane segments are overmarked with “===” in Fig. 4).⁵
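The logic of the “jumble” test can be illustrated in a few lines: score the real alignment, score alignments of many shuffled copies of the same sequence (which preserve length and composition), and express the real score in standard deviations above the shuffled mean. The sketch below makes several loud assumptions. It substitutes a plain Needleman-Wunsch pairwise score with fixed match, mismatch, and gap values for the profile analysis used in the paper, and the function names, scoring parameters, trial count, and toy sequence are all hypothetical.

```python
import random
import statistics


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (two-row dynamic programme)."""
    previous = [j * gap for j in range(len(b) + 1)]
    for i, residue_a in enumerate(a, start=1):
        current = [i * gap]
        for j, residue_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if residue_a == residue_b else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


def jumble_z_score(query, target, trials=200, seed=0):
    """Standard deviations by which the real score exceeds shuffled-query scores."""
    rng = random.Random(seed)
    real_score = global_alignment_score(query, target)
    residues = list(query)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, randomized order
        shuffled_scores.append(global_alignment_score("".join(residues), target))
    spread = statistics.stdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores do not vary; z-score is undefined")
    return (real_score - statistics.mean(shuffled_scores)) / spread


# Hypothetical sequence compared with itself, for illustration only.
toy = "MKTAYIAKQRQISFVKSHFSRQ"
print(f"z = {jumble_z_score(toy, toy):.1f}")
```

Shuffling rather than drawing random residues is the point of the test: it controls for composition, so a high score cannot be explained by both sequences simply being rich in the same hydrophobic residues.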

As with subunit I, the subunit III-homologous portion of the Thermus I/III sequence (the COOH-terminal 253 residues) was aligned to a large number (35) of published aa3 subunit III sequences. The alignment of a representative subset is shown in Fig. 5. The statistical significance of the alignment of the Thermus sequence to the alignment of the other subunits III was also tested by the randomization or “jumble” test (Doolittle, 1981, 1986), and the probability that an alignment of similar quality could have occurred by chance proved to be highly unlikely (the similarity score was 22 standard deviations above the average alignment score for randomized Thermus subunit III-homologous sequences). The percent identity of the Thermus subunit III-homologous sequence with other subunits III ranges from 31.7% (mouse) to 22.5% (Leishmania tarentolae).⁴

Cytochromes caa3 from the Bacilli, and also the Escherichia coli cytochrome bo, have a shortened subunit III lacking the first two putative transmembrane segments found in subunits III from mitochondrial oxidases (Wikstrom et al., 1985; Chepuri et al., 1990; Ishizuka et al., 1990; Saraste et al., 1991). These same oxidases have an elongated subunit I with an extra COOH-terminal region containing two putative transmembrane segments. It has been proposed that in these bacterial oxidases the COOH-terminal extension of subunit I contains a region homologous, or at least structurally analogous, to the NH2-terminal region of mitochondrial subunits III, including the first two putative transmembrane segments. If this proves to be generally true, the bacterial oxidase complexes are not missing any significant portion of subunit III (or of subunit I) relative to their mitochondrial counterparts (Mather et al., 1990). We carried out a careful statistical analysis of the NH2-terminal sequences of bacterial subunits III (results not shown); this revealed significant similarities among Paracoccus, Bacillus, and Thermus sequences. The alignments presented here support the hypothesis that the COOH-terminal extensions found in subunits I from Bacilli are related to the NH2-terminal portions of subunits III in other oxidases and may serve the same structural or functional roles.

⁴ Trypanosome subunits I were not formally included in the alignment and data analyses discussed here since complete studies of trypanosome subunit I mRNAs are not available. However, amino acid sequence comparisons, sequencing of the 5′ and 3′ ends of the mRNAs, and Northern blot studies suggest that there is little or no RNA editing within the coding sequences of trypanosome subunit I messages (Shaw et al., 1988; van der Spek et al., 1990), in contrast to the extensive editing of some trypanosome subunit III messages. Data on the alterations introduced by RNA editing are available for the subunit II and subunit III genes of several trypanosome species.
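Percent identities such as those quoted for the subunit I and subunit III comparisons are computed from a pair of rows taken out of an alignment. The sketch below shows one common convention, counting only columns in which neither sequence has a gap. The text does not state which denominator was used, so that choice, the function name, and the toy rows are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        identical += residue_a == residue_b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned rows, not taken from Fig. 4 or Fig. 5.
print(percent_identity("AC-DE", "ACFDQ"))  # 75.0
```

Dividing instead by the full alignment length or by the length of the shorter sequence gives lower values, which is worth keeping in mind when comparing identities across studies.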

Toward a Working Model of Cytochrome caa3

In the remainder of the Discussion, we build on the similarity of the Thermus sequences to those of other heme-copper oxidases by combining results from hydropathy and variability studies with results from site-directed mutagenesis (see references below) to arrive at a rough structural model of the Thermus enzyme.

Hydropathy Analyses. The hydropathy profile of the subunit I-homologous part of the Thermus I/III sequence is compared with the average hydropathy of other cytochrome oxidase subunits I in Fig. 6A. The Thermus profile is very similar to that of the other subunits I and contains 12 extended hydrophobic regions (see Fig. 4) which probably form helical segments that traverse the membrane bilayer; these are separated by hydrophilic segments of varying length. Solid bars in Fig. 6A indicate the assigned extent of each of these putative transmembrane helical segments in the T. thermophilus sequence. We note that some of these hydrophobic segments are not consistently predicted to be transmembrane helical segments by our hydropathy analysis, primarily segments III, VI, VIII, and XI, since they often do not have a large enough total hydropathy score or are not long enough according to the criteria proposed by Degli Esposti et al. (1990). Thus, subunit I could be composed of fewer than 12 transmembrane segments, as suggested previously (Capaldi et al., 1983; Bisson and Montecucco, 1985; Lundeen, 1986; Degli Esposti et al., 1989). However, recent studies using alkaline phosphatase gene fusions suggest that, in the case of the E. coli cytochrome bo, there are 12 membrane spanning segments in the region homologous to subunits I of cytochromes aa3 (Chepuri and Gennis, 1990).
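A hydropathy profile of the kind discussed above is a sliding-window average of per-residue hydrophobicity values, and candidate membrane-spanning segments are the stretches where that average stays high. The sketch below is a generic illustration and not the procedure used in the paper: it assumes the Kyte-Doolittle scale, a 19-residue window, and a threshold of 1.6, none of which is stated in this section, and the function names and toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window > len(values):
        raise ValueError("window is longer than the sequence")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_runs(profile, threshold: float = 1.6) -> list:
    """Return (first, last) window indices of consecutive windows above threshold."""
    runs = []
    start = None
    for index, value in enumerate(profile):
        if value > threshold and start is None:
            start = index
        elif value <= threshold and start is not None:
            runs.append((start, index - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


# Hypothetical sequence: charged flanks around a 19-residue hydrophobic core.
toy = "DKDKD" + "LIVLAVLLIAVLGLIVAAL" + "KDKDK"
profile = hydropathy_profile(toy)
print(hydrophobic_runs(profile))
```

A segment that is hydrophobic but short, or only marginally above the threshold, will be called inconsistently as the window and cutoff are varied, which is the situation the text describes for segments III, VI, VIII, and XI.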

The hydropathy profile of the subunit III-homologous part of the Thermus I/III sequence is presented in Fig. 7A. It is

⁵ Profiles calculated from the alignments of the oxidase subunit I and III sequences were used to search the National Biomedical Research Foundation Protein Identification Resource (release 31.0) and the Swiss Protein Data Base (release 21.0) for similar proteins. Only cytochrome oxidase subunits were found to have significant matches to these profiles, suggesting that there were no other proteins in the data bases with significant similarity to the oxidase subunits.


[Fig. 4: sequence alignment of cytochrome c oxidase subunits I (rows Bos, Wht, Sc, Pd, PS3, Tt), with putative transmembrane segments overmarked and conservation symbols below each block; alignment body not reproduced here.]

FIG. 4. Comparison of the amino acid sequences of cytochrome c oxidase subunits I. An alignment of 37 cytochrome c oxidase (cytochrome aa3) subunit I amino acid sequences was prepared (including the deduced T. thermophilus sequence; the individual sequence references are listed under "Experimental Procedures"); an annotated subset is presented here. The complete alignment can be obtained upon request. The subunit sequences shown are from oxidases that have been purified and characterized (to varying extents) and also represent a wide phylogenetic range: Bos, Bos taurus (bovine); Wht, Triticum aestivum (wheat); Sc, Saccharomyces cerevisiae (yeast); Pd, Paracoccus denitrificans; PS3, Bacillus PS3; and Tt, T. thermophilus. Amino acids are depicted using the standard one-letter code. Gaps are indicated by dashes in the sequences. For simplicity, gaps of any size present in all six displayed sequences (i.e. an insert in one or more omitted sequences) are shown as a gap of length one marked below with a number: 1, the amino terminus of Bradyrhizobium japonicum subunit I is 7 residues longer than any other sequence; 2, an insert of 10 residues in the R. sphaeroides subunit I; 3, 1 additional residue in Schizosaccharomyces pombe subunit I; 4, 1 additional residue in S. pombe subunit I; 5, 1 additional residue in echinoderm subunits I (the total length of the alignment is 620 positions, extending to the end of the Neurospora crassa subunit I sequence, which is nine positions beyond the arbitrary ends chosen for the Bacillus and Thermus subunit I-homologous sequences). Positions containing identical residues in all the cytochrome aa3 subunits I are marked with a star below the sequences. A caret marks positions identical in all but one subunit I sequence, unless the one sequence which differs is that of T. thermophilus, in which case a lowercase t is placed below the position.
Positions at which a small group of closely related sequences differs from all the others are marked with a lowercase r ("closely related sequences" here means subunit


highly similar to the profiles of other subunits III, which are often described as consisting of seven hydrophobic segments that are probably transmembrane helices (e.g. see Wikström et al. (1985) or Saraste (1990)). As in the case of subunit I, not all of the hydrophobic segments satisfy the criteria for the prediction of transmembrane helical segments in many of the subunit III sequences. The first two hydrophobic segments generally have a very short connecting sequence and do not appear to have a sufficient combined length of hydrophobic residues to traverse the membrane bilayer twice as helices (2 × 21 residues; hence the bars below segments I′ and II′ in Fig. 7 are shorter than those below the others, which represent 21 residues). Segment III′ contains an invariant glutamate residue that is reactive toward N,N′-dicyclohexylcarbodiimide in the bovine oxidase (Casey et al., 1980), yet overall is fairly hydrophobic; in comparing different species, segment IV′ also frequently has a rather low signature in the hydropathy plot. Nevertheless, genetic fusion experiments with subunit III of the E. coli cytochrome bo are best interpreted in terms of a model with seven membrane-spanning segments (Chepuri and Gennis, 1990).

The similarity of the various subunit I and subunit III hydropathy plots, including those from Thermus cytochrome caa3, together with the functional, spectroscopic, and sequence similarities, suggests a cognate secondary and tertiary structure among the family of heme/copper oxidases. The hydropathy analysis and sequence similarities suggested the topological presentation of the Thermus cytochrome caa3 I/III protein shown in Fig. 8. In this model, the NH2-terminal two-thirds of the protein is homologous to subunit I of mitochondrial cytochrome aa3 oxidases and contains 12 transmembrane helices, whereas the COOH-terminal region is homologous to subunits III of other oxidases and contains seven transmembrane helices. These two regions are separated by a relatively long and very hydrophilic connecting segment. The orientation shown, with the NH2 terminus and the long hydrophilic segment on the cytoplasmic (negative) side of the membrane, is analogous to the orientation deduced for the corresponding subunits of the E. coli cytochrome bo (Chepuri et al., 1990) and for subunit III of the bovine cytochrome aa3 (Malatesta and Capaldi, 1982). All the helices are presented in standard 21-residue helical net format except helices I′ and II′ in subunit III, as noted above; for these we have drawn an intermediate structure. Considerations of mass and charge distribution, as shown in Fig. 8, are discussed elsewhere (Fee et al., 1993).
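The hydropathy reasoning used in this section can be illustrated with a minimal sketch that averages Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982) over a sliding window and lists the windows whose mean is strongly hydrophobic. The 21-residue window mirrors the segment length assumed in the text, but the threshold of 1.6 and the function names are assumptions made for illustration; they are not the parameters or programs used in this work.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=21):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    values = [KD[res] for res in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(seq, window=21, threshold=1.6):
    """0-based start positions of windows whose mean exceeds the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


if __name__ == "__main__":
    # Synthetic example: a hydrophobic stretch followed by a charged stretch.
    toy = "L" * 21 + "D" * 21
    print(hydrophobic_windows(toy))
```

Runs of consecutive start positions returned by `hydrophobic_windows` would correspond to candidate membrane-spanning segments; whether a given segment really forms a helix still has to be judged from other evidence, as discussed in the text.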

Metal Binding and Conserved Residues-The preponderance of evidence now supports the idea that the CuA site is associated with subunit II(c) and involves coordination by two conserved histidine residues (cf. Mather et al., 1991).⁶ The remaining three canonical metals are evidently associated with subunit I. Elegant isotopic substitution work demonstrated that cytochrome a3 coordinates one histidine (Stevens and Chan, 1981), cytochrome a coordinates at least one and probably 2 histidines (Peisach, 1978; Martin et al., 1985), whereas other studies indicate three or four nitrogen atoms coordinated to CuB (Cline et al., 1983; Shapleigh et al., 1992; Surerus et al., 1992; cf. Palmer (1987) for review). Sequence data for subunits I presently reveal 6 invariant histidine residues (Fig. 4; their positions in the Thermus sequence are indicated by bold boxes in Fig. 8).⁷ These histidines are all predicted to lie within the membrane-embedded section of the oxidase closer to the outer (positive) surface by our interpretation of the hydropathy analysis (Fig. 8). A 7th histidine, predicted to be in the loop between transmembrane helices IX and X (Thermus position 377, enclosed by a hexagon in Fig. 8), is conserved in all published sequences except that of B. japonicum, where it is replaced by a glutamine. Recent site-directed mutagenesis studies from the laboratories of Gennis, Anraku, and Ferguson-Miller (Lemieux et al., 1992; Minagawa et al., 1992; Shapleigh et al., 1992) suggest that all the conserved histidine residues except His-377 (His-411 in the Rhodobacter sphaeroides and E. coli sequences) are essential for synthesis of a fully functional enzyme. His-377 may also be important, as some substitutions for this residue are active whereas others are not. The interpretation given was that the invariant histidines, but not His-377, are ligands of the metal centers in subunit I. This appears to be the simplest interpre-

⁶ R. J. Gurbiel, M. M. Werst, K. K. Surerus, J. A. Fee, and B. M. Hoffman, submitted for publication.

⁷ The sequences of all other cytochrome c oxidases published to date contain 8 invariant histidines in subunit I. The Thermus subunit I sequence contains a glutamine at position 243, corresponding to His-233 in the bovine sequence, and a tyrosine at position 438, which corresponds to His-426 in the bovine sequence. These observations reduce the number of conserved histidine residues in known cytochrome aa3 subunit I sequences to six (see Fig. 4).

sequences from clearly related organisms that are at least 75% identical; the relevant cases here are the vertebrates, the yeasts S. cerevisiae and Kluyveromyces lactis, the bacterial species P. denitrificans and Rhodobacter sphaeroides, and the bacterial species Bacillus PS3 and Bacillus subtilis). The addition of a lowercase o underneath a marked position indicates that the sequence of subunit I of the E. coli ubiquinol oxidase cytochrome bo differs from the conserved or consensus residue found at that position of the cytochrome c oxidase alignment (i.e. the status of that position in the alignment would be altered by inclusion of cytochrome bo in the alignment). Twelve extended hydrophobic segments of amino acids (potential transmembrane segments), as predicted by hydropathy analysis (see Fig. 6A), are delineated by ==== above the aligned sequences; they are arbitrarily assumed to extend for 21 residues (as found in the transmembrane segments of the bacterial photosynthetic reaction centers). Positions at which normally charged residues are present throughout the alignment are marked above that position with a minus for conservation of Asp/Glu or a plus for conservation of Arg/Lys. The positions of conserved histidine residues are marked above with a lowercase h. Some mitochondrial DNA sequences for cytochrome oxidase subunit genes from protozoan species have been reported but were not included in the alignment because the messages transcribed from these genes may undergo RNA editing (so far mRNA editing is established in three varieties of mitochondria: trypanosomes (Benne et al., 1986), plants (Covello and Gray, 1989; Gualberto et al., 1989), and the slime mold Physarum (Mahendran et al., 1991)). Although higher plant mitochondrial transcripts are also edited, the number of changes introduced is comparatively small, and many of the substitutions can be predicted (Covello and Gray, 1990; Gualberto et al., 1990).
Therefore, modified plant sequences were included in the alignment; these contain changes that we predicted to occur at some of the potential RNA editing sites and that make the deduced plant sequence more similar to the other oxidase subunit I sequences. For the wheat sequence shown, these alterations, relative to the sequence predicted from the DNA, are as follows: position 468, Cys for Arg; position 469, Cys for Arg; position 478, Leu for Ser; and position 497, Ser for Pro. The S. cerevisiae subunit I sequence shown differs from that found in the Protein Identification Resource data base and recent oxidase literature due to the inclusion of additional residues in the carboxyl-terminal part of the subunit as a result of the recognition that the last two introns predicted in the initial report of the sequence (Bonitz et al., 1980) probably do not exist (Burger et al., 1982; Lang, 1984; Vahrenholz et al., 1985; Hardy and Clark-Walker, 1991; Pel et al., 1992). Finally, a Gly to Trp correction at position 237 in the K. lactis subunit I sequence (bovine position 236), preserving the conservation of Trp at that position, results from the adjustment of the proposed intron boundaries of K. lactis coxI intron 2 (Hardy and Clark-Walker, 1991) by four nucleotides so that the boundaries more closely match the group I intron consensus.
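The star and caret annotation described in this legend can be sketched in a few lines. The sketch below is an illustration only: the function name and the toy alignment are hypothetical, and the additional t, r, and o marks defined in the legend are deliberately left out.

```python
from collections import Counter


def annotate_columns(alignment, gap="-"):
    """Return one mark per alignment column.

    '*' : the same residue in every sequence
    '^' : the same residue in all sequences but one
    ' ' : anything else (including columns dominated by gaps)
    """
    marks = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        if residue == gap:
            marks.append(" ")
        elif count == len(column):
            marks.append("*")
        elif count == len(column) - 1:
            marks.append("^")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    # Hypothetical four-sequence alignment, not taken from Fig. 4.
    toy = ["HHAG", "HHAG", "HYAG", "HHSG"]
    for row in toy:
        print(row)
    print(annotate_columns(toy))
```

Counting the starred columns that contain histidine in a real alignment is the kind of tally that underlies the statement that six histidines are invariant in subunit I.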


[Sequence alignment panel of Fig. 5: aligned subunit III sequences of Bos, Wht, Sc, Pd, PS3, and Tt, with hydrophobic segments marked above and conservation marks below; the aligned residues are not legible in this copy.]

FIG. 5. Comparison of the amino acid sequences of cytochrome c oxidase subunits III. An alignment of 36 cytochrome c oxidase subunit III sequences was prepared (including the deduced T. thermophilus sequence; the individual sequence references are listed under "Experimental Procedures"); an annotated subset is presented here. The complete alignment is not shown but can be obtained upon request. The subunit sequences shown are from oxidases that have been purified and characterized and also represent a wide phylogenetic range: Bos, B. taurus (bovine); Wht, T. aestivum (wheat); Sc, S. cerevisiae (yeast); Pd, P. denitrificans; PS3, Bacillus PS3; and Tt, T. thermophilus. Annotation is as described in Fig. 4. Gaps present in all six displayed sequences (i.e. an insert in one or more omitted sequences) are marked below with a number: 1, 13 additional residues in trypanosome subunits III and 2 additional residues in S. pombe subunit III; 2, 1 additional residue in B. subtilis (also the trypanosomes are 5 residues longer at the carboxyl terminus; the total length of the alignment is 307). Corrections to the sequence of wheat subunit III for RNA editing were as determined by Gualberto et al. (1990); corrections to other plant subunit III sequences were made as described in the legend to Fig. 4 and by analogy to the wheat changes.

tation of the data and forms the core of the model presented below.

As a perusal of Fig. 4 will reveal, there are a number of fully conserved residues both within the putative transmembrane segments and in the loops between them. Many of these are glycine, proline, or hydrophobic residues likely to play a role in structure. Summarized in Table II are positions that either bear a charge or are able to participate in hydrogen bonding in all cytochromes aa3; they are presented according to the topological model of Fig. 8. A number of these might play a role in enzyme function.⁸ For example, others have suggested that the phenolic group of tyrosine might play a role in proton translocation (Babcock and Callahan, 1983; Chan and Li, 1990), and it is possible that such residues act to guide protons over long distances within the molecule.

Functional studies indicate subunit III is not required for either electron transferring or proton-translocating activities (Puettner et al., 1985; Haltia et al., 1991). Examination of Fig. 5 reveals that it nevertheless contains a number of conserved residues that are also maintained in the Thermus sequence.

⁸ Recently, the sequences of the subunits of a quinol oxidase from the archaebacterium Sulfolobus acidocaldarius, which contains four A-type hemes, were reported (Lübben et al., 1992). This oxidase includes subunits described as homologs of subunits I and II of mitochondrial cytochromes aa3, although they have the most divergent sequences reported to date. Alignment of the Sulfolobus subunit I to other cytochrome oxidases (which is problematic in some regions) indicates that a number of the conserved residues/functionalities discussed here are altered or missing from the Sulfolobus oxidase, including Asn-92, Asn-110, Asn-173, Arg-182, Arg-223, Gln-243, Lys-274, Lys-328, Asp-103, Asp-154, Glu-222, Asp-373, Tyr-31, Thr-21, and Trp-196 (Thermus numbering) (see Table II).

These include 2 histidines (positions 736 and 775) and glutamate 628 in segment III′, which may be involved in the reaction with N,N′-dicyclohexylcarbodiimide.⁹

Variability Analyses-The number of different amino acids that occur at each sequence position within a group of aligned, homologous protein domains is defined as the variability. Because amino acids on the surface of proteins vary more than those in the interior (Chothia and Lesk, 1986; Yeates et al., 1987), it has been proposed that analysis of this variability within a group of homologous membrane proteins may provide insight into the spatial arrangement of the transmembrane helices (Rees et al., 1989). From our alignments of oxidase subunits I (37 species) and subunit III (36 species), we determined the variabilities and calculated helical variability indices for both subunits according to the method of Rees et al. (1989). A plot of the variabilities, averaged over a span of 7 residues, is shown in Fig. 6B (subunit I) and Fig. 7B (subunit III). Regions of highest conservation (low variability) are generally associated with the putative transmembrane segments, whereas regions of higher variability (and gaps) occur in the connecting sequences between these segments. Segment III in subunit I is an exception, having a relatively high variability, whereas the latter half of segment XII also has a high variability. (Note that the COOH-terminal region of subunit I has relatively high variability.) Segments VI, VII, VIII, and X are associated with particularly low variability, suggesting that these segments participate in forming the

⁹ We have demonstrated that N,N′-dicyclohexylcarbodiimide binds exclusively to the A-protein under the same experimental conditions that it labels glutamate 90 in subunit III of bovine aa3 (P. Springer and J. A. Fee, unpublished results).


TABLE II
Conserved polar/H-bonding residues in the subunit I sequence of Thermus cytochrome caa3

In this Table, conserved indicates the presence of chemically similar residues at the equivalent positions in all aa3-type oxidases.

Helix/Loop      Conserved residues            Equivalent residues in bovine aa3
NH2 terminus    Lys-25, Thr-21                Lys-13, Ser-9
I               Tyr-31                        Tyr-19
Loop I/II       Arg-50                        Arg-38
II              His-73                        His-61
Loop II/III     Asn-92, Asn-110, Asp-103      Asn-80, Asn-98, Asp-91
III             Trp-115                       Trp-103
Loop III/IV     Asp-154, Tyr-142, Trp-139     Asp-144, Tyr-129, Trp-126
IV              -                             -

Loop IV/V

Loop V/VI

Loop VI/VII

V

VI

VII Loop VII/VIII VIII Loop VIII/IX

Loop IX/X IX

X Loop X/XI XI Loop XI/XII XII COOH terminus

Asn-173, Thr-177, Arg-182, Trp-196 Asn-163, Thr-167, Lys-172, Trp-186

Arg-223, Gln-243, Glu-222 Arg-213, His-233, Asp-212 His-250, Tyr-254, Trp-246 His-240, Tyr-244, Trp-236

His-299, His-300, Trp-297 His-290, His-291, Trp-288

- -

Lys-274 Lys-264

Thr-318, Thr-325, Lys-328 -

-

-

His-377, Asp-373, Ser-379 His-385, His-387

Arg-447 -

Glu-505 -

Thr-309, Thr-316, Lys-319 -

-

- His-368, Asp-364, Thr-370 His-376, His-378

Arg-438 -

Glu-493 -

The dash indicates there are no conserved residues in this structural region.

structural core of the enzyme and/or contain essential functional residues. Indeed, segments VI, VII, and X contain conserved histidines likely to coordinate the heme and copper centers associated with subunit I; however, segment II also contains a conserved histidine but does not have a particularly low variability. In subunit III, the segment II′ region has high variability, whereas the variability is lowest in the COOH-terminal hydrophobic segments VI′ and VII′. Segments I′ and V′ also have a fairly high variability, whereas the variability of segments III′ and IV′ is intermediate.
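The variability measure defined under Variability Analyses (the number of different amino acids at each aligned position, then averaged over a span of 7 residues) can be sketched as follows. How gaps and the ends of the alignment were handled is not stated in the text, so ignoring gap characters and truncating the averaging window at the ends are assumptions of this sketch, as are the function names.

```python
def column_variability(alignment, gap="-"):
    """Number of different amino acids at each alignment position.

    Gap characters are not counted as a residue type (an assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [len({seq[i] for seq in alignment} - {gap}) for i in range(length)]


def running_mean(values, span=7):
    """Average over a centered span; the window is truncated at both ends."""
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # Hypothetical three-sequence alignment used only to exercise the code.
    toy = ["HHA-", "HYAG", "HFAG"]
    variability = column_variability(toy)
    print(variability)
    print(running_mean(variability, span=3))
```

Applied to a full alignment, low values in the smoothed profile would mark the conserved stretches that the text associates with the putative transmembrane segments.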

The helical variability index, as used here, is a theoretical measure of the difference in variability on opposing faces of an α-helix (Rees et al., 1989). The helical variability index plots for the aligned subunits I and subunits III are shown in Fig. 6C and Fig. 7C, respectively. Peaks in the index plots indicate segments in the sequence at which a (hypothetical) helix has a variable face and an opposing relatively conserved face (see Fig. 9). Such a helix might be predicted to lie on the surface of the protein with the variable face directed out toward the lipid, since the most variable residues in a protein tend to be found on the surface (Yeates et al., 1987).¹⁰ Helices not having opposing variable and conserved faces might be predicted to be largely exposed if the overall variability is high or essentially buried in the protein interior if the overall variability is low. In the plot for subunit I (Fig. 6C), there are substantial peaks associated with hydrophobic segments I (offset toward the COOH-terminal side of the segment), IV (COOH-terminal side), VII (NH2-terminal side), and IX, and marginal peaks associated with II (NH2-terminal side), V

¹⁰ Of course, if the segment in question is not actually part of a helix in the protein, the prediction is meaningless. Note further that a relatively long window (19 residues) is used in the calculation of the helical index, presumably because the method was developed for use with integral membrane proteins expected to have long transmembrane helices, and therefore, surface helices much shorter than 19 residues will generally not produce a peak in the plot. See footnotes in our previous paper (Mather et al., 1991) for other caveats in the use of this analysis.

(NH2-terminal side), VIII, XII (NH2-terminal side), and XII (COOH-terminal side). There are also significant peaks located between hydrophobic segments II and III, immediately following (and overlapping slightly) segment V, and between segments X and XI; this may indicate (long) extramembrane surface helices in these connecting regions between the transmembrane segments. In the case of subunit III (Fig. 7C), there are peaks in the helical index plot associated with all of the hydrophobic segments except II′ and VII′, the most variable and most conserved segments, respectively. We have prepared helical wheel diagrams for each of the hydrophobic segments of subunits I and III (Fig. 9) and indicated thereon an interpretation (subjective and likely oversimplified) in terms of possible surface (variable) and internal (conserved) areas. This information should be helpful in placing the helices into hypothetical three-dimensional models.
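The idea behind a helical variability index can be illustrated with a simple periodic moment: within a 19-residue window, each position is placed around an ideal α-helix at 100° per residue, and the variabilities are summed as vectors. A large resultant means that the variable positions cluster on one face. This is only a sketch of the concept; the normalization, the mean subtraction, and the function name are assumptions and should not be read as the published formula of Rees et al. (1989).

```python
import cmath
import math


def helical_moment(values, window=19, degrees_per_residue=100.0):
    """Magnitude of the periodic moment of `values` in each full window.

    Entry i of the result covers values[i:i + window]. The window mean is
    subtracted so that a uniformly variable helix gives a moment near zero.
    """
    step = math.radians(degrees_per_residue)
    moments = []
    for start in range(len(values) - window + 1):
        segment = values[start:start + window]
        mean = sum(segment) / window
        resultant = sum(
            (v - mean) * cmath.exp(1j * step * k) for k, v in enumerate(segment)
        )
        moments.append(abs(resultant) / window)
    return moments


if __name__ == "__main__":
    step = math.radians(100.0)
    # Synthetic variabilities: one helix face variable, the opposite face conserved.
    amphipathic = [1.0 + math.cos(step * k) for k in range(19)]
    uniform = [3.0] * 19
    print(helical_moment(amphipathic))
    print(helical_moment(uniform))
```

In this toy case the two-faced profile gives a clearly non-zero moment and the uniform profile gives essentially zero, which corresponds to the distinction drawn in the text between helices with opposing variable and conserved faces and those without.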

We have attempted to combine information from our hydropathy and variability analyses with site-directed mutagenesis results from other laboratories to arrive at a packing model for the structural core of subunit I. Fig. 10 shows a cross-section taken parallel to the membrane bilayer just below the outer surface of the membrane. The model juxtaposes the putative helices of subunit I with the lowest variability (VI, VII, VIII, and X) to form the structural core of the model, with the variable surfaces of helices VII and VIII oriented toward the outer surface. This orientation of helix VII brings its two conserved histidine residues into the interior of the protein. Helices VI and X contain 3 more conserved histidines thought to coordinate the metal centers. Within the constraints suggested by the variability analysis, and consistent with the results of mutagenesis reported for oxidases from E. coli and Rhodobacter sphaeroides (Lemieux et al., 1992; Minagawa et al., 1992; Shapleigh et al., 1992), each conserved histidine residue in helices VI and X can be oriented toward the 2 conserved histidines of helix VII in the interior of the protein, thus creating a pocket in which to coordinate the a3-CuB binuclear center. There are a limited number of


FIG. 10. Possible packing of the transmembrane helices of the core subunits of cytochrome aa3. A hypothetical cross-section of the membrane bilayer-embedded core of the oxidase, including the putative transmembrane helices of subunits I, II, and III, is shown. The putative transmembrane helices are represented as circles, and arcs outside the circles indicate the potential surface areas suggested by the variability analysis (see Fig. 9). Roman numerals identify the specific hydrophobic segments from subunit I. The putative helices from subunit II are cross-hatched, whereas those from subunit III are striped. The particular packing shown for subunit III is essentially an arbitrarily chosen close-packed structure. The gray loop between helix XII of subunit I and the NH2 terminus of subunit III represents the continuation of subunit I, via a long hydrophilic connecting sequence, to include the NH2-terminal part of subunit III.

ways to position the remaining helices of subunit I about this core; we suggest that shown in Fig. 10 as the basis of a working model. Here helix II, which contains the 6th conserved histidine residue, is positioned to bracket a heme between 2 conserved histidine residues, thereby forming cytochrome a. This is in accord with mutagenesis results suggesting that His-73 (Thermus sequence numbering) in helix II and His-387 in helix X are the axial ligands of cytochrome a (Lemieux et al., 1992; Minagawa et al., 1992; Shapleigh et al., 1992).¹¹ To complete the arrangement, helices III and XII are given the most peripheral positions, in accord with their relatively high variabilities. One obvious shortcoming of this arrangement is the possibility that the distance between helix III and helix IV is too great, but this loop contains 24 amino acids, which are probably sufficient to span the distance. An additional apparent defect is that the variable side of helix X is not exposed to the outside of the molecule; however, helix X has an overall low variability, with only two positions near the COOH terminus showing much variation (Fig. 6); its placement is thus reasonable. In general, the ability of a model, based on hydropathy and variability considerations, to accommodate, with very little forcing, the results of site-directed mutagenesis studies suggests considerable parsimony in the merging of the two arguments and lends credence to the picture shown in Fig. 10. Nevertheless, the model is not unique and is only intended as a working hypothesis.¹²

There is no published experimental information regarding the relative locations of subunits II and III with respect to

¹¹ S. Ferguson-Miller and R. Gennis, Biophysical Society Meeting, February, 1992, Houston, TX.

¹² We have constructed a very rough three-dimensional model of the helix bundle for subunit I shown in Fig. 10 (Fee et al., 1993). Although little more can be said at this time, it is noteworthy that the conserved polar residues in helix VIII (see Table II) face into the space containing the cytochrome a3-CuB pair that is formed by helices VI, VII, VIII, and X; if real, this may be of functional significance.

subunit I. In accord with the view that the subunits I and III are joined in Thermus, we have placed subunit III helices in proximity to the COOH-terminal region of subunit I. Subunit II helices are placed near the NH2 terminus of subunit I, relatively close to the hypothetical cytochrome a site, to allow for proximity of the CuA site to cytochrome a (cf. Goodman and Leigh, 1985).

In summary, the sequencing of the genes encoding the core polypeptides of cytochrome caa3 from T. thermophilus has been completed (Mather et al., 1991 and this work), and the deduced amino acid sequences have been analyzed in conjunction with those of other cytochrome c oxidases. The adjacent caaA and caaB genes encode four components of the respiratory chain that are clearly homologous to mitochondrial counterparts: subunits I, II, and III of cytochrome c oxidase plus a cytochrome c. The fusion of the subunit II and cytochrome c genes is maintained at the level of the mature protein, whereas the connectivity between the subunit I and subunit III polypeptides remains to be established. These and other results have been incorporated into a simple model for the packing of the transmembrane helices that appear to be the core of the heme/copper oxidases.

Acknowledgments-We thank M. K. Jones and K. Beaudrie, University of Alabama, for the WHEEL program used to generate Fig. 9; Mark Wilder of the Life Sciences Division of LANL for making their VAX computers available and for assisting us in their use; M. Dewor for assistance with protein chemistry; and Andy Keightley for valuable discussions and preliminary experimentation in identifying the caa3 gene organization.

REFERENCES
Babcock, G. T., and Callahan, P. M. (1983) Biochemistry 22, 2314-2318
Benne, R., Van den Burg, J., Brakenhoff, J. P. J., Sloof, P., Van Boom, J. H., and Tromp, M. C. (1986) Cell 46, 819-826
Bisson, R., and Montecucco, C. (1985) J. Inorg. Biochem. 23, 177-182
Bonitz, S. G., Coruzzi, F., Thalenfeld, B. E., Tzagoloff, A., and Macino, G. (1980) J. Biol. Chem. 255, 11927-11941
Bowen, D., Littlechild, J. A., Fothergill, J. E., Watson, H. C., and L. H. (1988) Biochem. J. 254, 509-517
Burger, G., Scriven, C., Machleidt, W., and Werner, S. (1982) EMBO J. 1, 1385-1391
Buse, G., Hensel, S., and Fee, J. A. (1989) Eur. J. Biochem. 181, 261-268
Capaldi, R. A. (1990) Arch. Biochem. Biophys. 280, 252-262
Capaldi, R. A., Malatesta, F., and Darley-Usmar, V. M. (1983) Biochim. Biophys. Acta 726, 135-148
Casey, R. P., Thelen, M., and Azzi, A. (1980) J. Biol. Chem. 255, 3994-4000
Chait, B. T., and Kent, S. B. H. (1992) Science 257, 1885-1894
Chan, S. I., and Li, P. M. (1990) Biochemistry 29, 1-12
Chepuri, V., and Gennis, R. B. (1990) J. Biol. Chem. 265, 12978-12986
Chepuri, V., Lemieux, L., Au, D. C.-T., and Gennis, R. B. (1990) J. Biol. Chem. 265, 11185-11192
Chothia, C., and Lesk, A. M. (1986) EMBO J. 5, 823-826
Cline, J., Reinhammar, B., Jensen, P., Venters, R., and Hoffman, B. M. (1983) J. Biol. Chem. 258, 5124-5128
Covello, P. S., and Gray, M. W. (1989) Nature 341, 662-666
Covello, P. S., and Gray, M. W. (1990) Nucleic Acids Res. 18, 5189-5196
Degli Esposti, M., Ghelli, A., Luchetti, R., Crimi, M., and Lenaz, G. (1989) Ital. J. Biochem. 38, 1-22
Degli Esposti, M., Crimi, M., and Venturoli, G. (1990) Eur. J. Biochem. 190, 207-219
Dekker, K., Yamagata, H., Sakaguchi, K., and Udaka, S. (1991) J. Bacteriol. 173, 3078-3083
Devereux, J., Haeberli, P., and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
Doolittle, R. F. (1981) Science 214, 149-159
Doolittle, R. F. (1986) Of URFS and ORFS: A Primer on How to Analyze Derived Amino Acid Sequences, University Science Books, Mill Valley, CA
Fawcett, T. W., and Bartlett, S. G. (1990) BioTechniques 9, 46-49
Fee, J. A., Choc, M. G., Findling, K. L., Lorence, R., and Yoshida, T. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 147-151
Fee, J. A., Kuila, D., Mather, M. W., and Yoshida, T. (1986) Biochim. Biophys. Acta 853, 153-185
Fee, J. A., Mather, M. W., Springer, P., Hensel, S., and Buse, G. (1988a) Ann. N. Y. Acad. Sci. 550, 33-38
Fee, J. A., Zimmermann, B. H., Nitsche, C. I., Rusnak, F., and Münck, E. (1988b) Chem. Scr. 28A, 75-78
Fee, J. A., Yoshida, T., Surerus, K. K., and Mather, M. W. (1993) J. Bioenerg. Biomembr., in press
Goodman, G., and Leigh, J. S., Jr. (1985) Biochemistry 24, 2310-2317
Gribskov, M., Devereux, J., and Burgess, R. T. (1984) Nucleic Acids Res. 12, 539-549
Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 4355-4358
Gribskov, M., Homyak, M., Edenfield, J., and Eisenberg, D. (1988) Comp. Applic. Biol. Sci. 4, 61-66

5404 Cytochrome caa3 Subunits Gualberto, J. M., Lamattina, L., Bonnard, G., Weil, J.-H., and Grienenberger,

Gualberto, J. M., Weil, J.-H., and Grienenberger, J.-M. (1990) Nucleic Acids

Haltia, T., Saraste, M., and Wikstrom, M. (1991) EMBO J. 10 , 2015-2021

Henikoff, S. (1984) Gene (Amst.) 2 8 , 351-359 Hardy, C. M., and Clark-Walker, G. D. (1991) Curr. Genet. 2 0 , 99-114

Holm, L., Saraste, M., and Wikstrom, M. (1987) EMBO J. 6 , 2819-2823 Hon-nami, K., and Oshima, T. (1984) Biochemistry 23,454-460 Ishizuka, M., Machida, K., Shimada, S., Mogi, A,, Tsuchiya, T., Ohmori, T.,

Souma, Y., Gonda, M., and Sone, N. (1990) J. Biochem. (Tokyo) 108,866- 873

J.". (1989) Nature 3 4 1 , 660-662

Res. 18 , 3771-3776

Itaya, M., and Kondo, K. (1991) Nucleic Acids Res. 19,4443-4449 Kagawa, Y., Nojima, H., Nukiwa, N., Ishizuka, M., Nakajima, T., Yasuhara,

Koyama, Y., and Furukawa, K. (1990) J. Bacteriol. 172,3490-3495 Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157 , 105-132 Lang, B. F. (1984) EMBOJ. 3,2129-2136 Lauer, G., Rudd, E. A., McKay, D. L., Ally, A., Ally, D., and Backman, K. C.

Lemieux, L. J., Calhoun, M. W., Thomas, J. W., Ingledew, W. J., and Gennis,

Lubben, M., Kolmerer, B., and Saraste, M. (1992) EMBO J. 11,805-812 Ludwig, B. (1987) FEMS Microbiol. Reu. 46,41-56 Lundeen, M. (1986) Znorg. Chem. 25,4852-4856 Mahendran, R., Spottswood, M. R., and Miller, D. L. (1991) Nature 349,434-

Malatesta, F., and Capaldi, R. A. (1982) Biochem. Biophys. Res. Commun. 1 0 9 ,

Malmstrom, B. G. (1990) Chem. Reu. 90,1247-1260 Martin, C. T., Scholes, C. P., and Chan, S. I. (1985) J. Biol. Chem. 260,2857-

Mather, M. W. (1988) BioTechniques 6,444-447 Mather, M. W., Springer, P., and Fee, J. A. (1990) in 41st Mosbach Colloquium

(Hauska, G., and Thauer, R., eds) pp. 94-104, Springer-Verlag, Mosbach,

Mather, M. W., Springer, P., and Fee, J. A. (1991) J. Biol. Chem. 2 6 6 , 5025- Germany

5035 Minagawa, J., Mogi, T., Gennis, R. B., and Anraku, Y. (1992) J. Biol. Chem.

267,2096-2104 Nureki, O., Muramatsu, T., Suzuki, K., Kohda, D., Matsuzawa, H., Ohta, T.,

Miyazawa, T., and Yokoyama, S. (1991) J. Biol. Chem. 266,3268-3277 Oshima, T., and Imahora, K. (1974) Int. J. Syst. Bacteriol. 24,102-112 Palmer, G. (1987) Pure Appl. Chern. 69,749-758 Peisach, J. (1978) in Frontiers of Biological Energetics. Volume IZ: Electrons to

T., Tanaka, T., and Oshima, T. (1984) J. Biol. Chem. 259,2956-2960

(1991) J. Bacteriol. 173,5047-5053

R. B. (1992) J. Biol. Chem. 267,2105-2113

438

1180-1185

2861

Tissues (Dutton, P. L., Leich, J. S., and Scarpa, A,, eds) pp. 873-881, Academic Press, New York

Pel, H. J., Tzagaloff, A,, and Grivell, L. A. (1992) Curr. Genet. 2 1 , 139-146 Platt, T. (1986) Annu. Rev. Biochem. 65,339-372 Puettner, I., Carafoli, E., and Malatesta, F. (1985) J. Biol. Chem. 2 6 0 , 3719-

Raitio, M., Jalli, T., and Saraste, M. (1987) EMBO J. 6 , 2825-2833 Rees, D. C., Komiya, H., Yeates, T. O., Allen, J. P., and Feher, G. (1989) Annu.

Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A.

Saraste, M. (1990) Quart. Reu. Biophys. 23,331-366 Saraste, M., Metso, T., Nakari, T., Jalli, T., Lauraeus, M., and van der Oost, J.

(1991) Eur. J. Biochem. 195,517-525 Sato, S., Nakada, Y., Kanaya, S., and Tanaka, T. (1988) Biochim. Biophys.

Acta 950,303-312 Seidler, L., Peter, M., Meissner, F., and Sprinzl, M. (1987) Nucleic Acids Res.

15,9263-9277 Shapleigh, J. P., Hosler, J. P., Tecklenbur , M. M. J., Kim, Y., Babcock, G. T.,

Gennis, R. B., and Ferguson-Miller, S. f1992) Proc. Natl. Acad. Sci. U. S. A.

Shaw, J. M., Feagin J. E. Stuart K. and Sim son, L. (1988) Cell 53,401-411 89,4786-4790

Steffens, G. J., and Bus;, G. (lb79j Hoppe-Lfqler's Z . Physiol. Chem. 360 ,

Stevens, T., and Chan, S. I. (1981) J. Biol. Chem. 2 6 6 , 1069-1071 Surerus, K. K., Oertling, W. A,, Fan, C., Gurbiel, R. J., Einarsdbttir, 6.,

Antholine, W. E., Dyer, R. B., Hoffman, B. M., Woodruff, W. H., and Fee, J. A. (1992) Proc. Natl. Acad. Sci. U. S. A. 89,3195-3199

3723

Reu. Biochem. 58,607-633

74,5463-5467

613-619

Thony-Meyer, L., Stax, D., and Hennecke, H. (1989) Cell 67,683-697 Vahrenholz, C., Pratje, E., Michaelis, G., and Dujon, B. (1985) Mol. & Gen.

van der Spek, H. Speller D., Arts, G.-J., Van den Burg, J., Van Steeg, H., Genet. 2 0 1 , 213-224..

Vieira, J., and Messing, J. (1982) Gene (Amst.) 19,259-268 Sloof, P., and Benne, R.'(1990) EMBO J. 9,257-262

Wikstrom, M., Saraste, M., and Penttila, T. (1985) in The Enzymes ofBiolo ical Membranes (Martonosi, A. N., ed) pp. 111-148, Plenum Press, New Yo$

Xu, J., Seki, M., Denda, K., and Yoshida, M. (1991) Biochem. Biophys. Res. Commun. 1 7 6 , 1313-1318

Yakhnin. A. V.. Vorozhevkina. D. P.. and Matvienko. N. I. (1990) Nucleic Acids " . , Res. 16 , 3659-3660

Res. (India) 39 , 796-801

Natl. Acad. Sci. U. S. A. 84,6438-6442

J. A. (1984) J. Biol. Chem. 269,112-123

. .

Yasunobu, K. T., Tanaka, M., Wei, Y.-H., and King, T. E. (1980) J. Sci. Znd.

Yeates, T. O., Komiya, H., Rees, D. C., Allen, J. P., and Feher, G. (1987) Proc.

Yoshida, T., Lorence, R. M., Choc, M. G., Tarr, G. E., Findling, K. L., and Fee,


Supplementary Material to

Cytochrome Oxidase Genes from Thermus thermophilus. Nucleotide Sequence of the Fused Gene and Analysis of the Deduced Primary Structures for Subunits I and III of Cytochrome caa3

Michael W. Mather, Penelope Springer, Sieghard Hensel, Gerhard Buse, and James A. Fee

Experimental Procedures

Materials - Restriction endonucleases, terminal deoxynucleotidyl transferase, and T4 DNA ligase were obtained from New England Biolabs, Pharmacia LKB, Bethesda Research Laboratory or U. S. Biochemicals. Modified T7 DNA Polymerase (Sequenase) was purchased from U. S. Biochemicals. An exonuclease III deletion kit (Erase-a-Base) was obtained from Promega. Ultrapure/electrophoresis grades of tris(hydroxymethyl)aminomethane, acrylamide, and urea were used in the preparation of sequencing gels. Spectrophotometric grade formamide (Aldrich) was deionized by treatment with Amberlite MB-1 (ICN Biomedicals).

Cloning of the gene for subunits I and III - T. thermophilus was grown and total DNA prepared as described (Mather and Fee, 1990). DNA fragments containing all or part of the subunit I/III gene were cloned into pUC8 (Vieira and Messing, 1982) by extraction of chromosomal DNA restriction fragments from a dried agarose gel using the strategy and procedures described (Mather, 1988; Fee et al., 1988).

DNA Sequencing - Overlapping subclones were generated for sequencing both DNA strands (see Fig. 1) by subcloning restriction enzyme fragments and by unidirectional digestion with exonuclease III according to the procedure of Henikoff (Henikoff, 1984). Nucleotide sequences were determined using dideoxy chain terminating sequencing reactions (Sanger et al., 1977) employing modified T7 DNA polymerase (Sequenase) according to the manufacturer's instructions with modifications as described previously (Mather et al., 1991) to improve the results obtained with high guanine-cytosine DNA. In addition, the sequence obscured by some strong polymerase terminations was resolved by repeating the reactions with a terminal deoxynucleotidyl transferase chase step (Fawcett and Bartlett, 1990).

Analysis of Sequence Data - DNA and protein sequences were analyzed using software from the University of Wisconsin Genetics Computer Group (UWGCG package) (Devereux et al., 1984) and additional programs written by one of the authors (MWM). Procedures used to construct sequence alignments (Gribskov et al., 1987; Gribskov et al., 1988; Doolittle, 1986), hydropathy profiles (Kyte and Doolittle, 1982; Degli Esposti et al., 1990) and variability analysis (Rees et al., 1989) were implemented as described previously (Mather et al., 1991).

The alignments shown in Figs. 4 and 5 were made by comparing the sequences of cytochrome oxidase subunits I and III that were known to us by December, 1991. The sequences were obtained from GenBank, the Protein Identification Resource (PIR), or directly from the primary publication. Reference to the literature is made as follows: Species; Subunit(s); Accession Number(s); Reference(s). Homo sapiens; I and III; V00662; [1]. Bos taurus; I and III; V00662; [2]. Balaenoptera physalus; I and III; X61145; [3]. Rattus norvegicus; I and III; J01435, X14848; [4], [5]. Mus musculus; I and III; V00711; [6]. Gallus gallus; I and III; X52392; [7]. Xenopus laevis; I and III; M10217; [8]. Cyprinus carpio; I and III; X61010, X17006; [9]. Salmo gairdneri; III; not available; [10]. Oncorhynchus nerka; III; not available; [10]. Drosophila melanogaster; I and III; J01404; [11]. Drosophila yakuba; X03240; [12], [13]. Apis mellifera; I; M23409; [14]. Locusta migratoria; III; X13975; [15]. Caenorhabditis elegans; I and III; X54254; [16]. Ascaris suum; I and III; X54254; [16]. Paracentrotus lividus; I and III; J04815; [17]. Strongylocentrotus purpuratus; I and III; X12631; [18]. Pisaster ochraceus; I and III; X55514; [19]. Triticum aestivum; I; Y00417; [20]; III; X52539; [21, 22]. Oryza sativa; I; X15990; [23]. Vicia faba; III; not available; [24]. Zea mays; I; X02660; [25]; III; X12728; [26]. Sorghum bicolor Milo; I; M14453; [27]. Glycine max; I and III; M16884, X15131; [28], [29]. Oenothera berteriana; I and III; X05465, X04764; [30]. Pisum sativum; I; X14409; [31]. Beta vulgaris; I; M57645*; [32]. Chlamydomonas reinhardtii; I; X04381; [33]. Podospora anserina; I and III; M61734; [34, 35]. Neurospora crassa; I and III; M36958, X01850, J01430; [36], [37], [38]. Aspergillus nidulans; I and III; X00790, X069M); [39], [40]. Saccharomyces cerevisiae; I and III; J01481, J01478; [41], [42]. Kluyveromyces lactis; I; X57546; [43]. Schizosaccharomyces pombe; I and III; X54421, X16868; [44], [45]. Schizophyllum commune; III; M36270; [46]. Leishmania tarentolae; III; M10126*; [47], [48]. Crithidia fasciculata; III; X05063*; [49], [48]. Paracoccus denitrificans; I and III; Y05733, X05828; [50], [51], [52]. Rhodobacter sphaeroides; I; X62645; [53]. Bradyrhizobium japonicum; I; X54800, X54318; [54], [55]. Bacillus subtilis; I and III; X54140; [56]. Bacillus PS3; I and III; not available; [57]. Thermus thermophilus HB8; I and III; M84341; this work. In addition, subunit I and III sequences from the ubiquinol oxidase of Escherichia coli [58] and oxidases from Paracoccus denitrificans (subunit Ia) [52], Halobacterium halobium (subunit I) [59], and Sulfolobus acidocaldarius (subunit I) [60] were aligned but not included in other analyses presented herein (cf. Table III). The starred accession numbers indicate that the sequences obtained from GenBank were corrected as detailed in the accompanying reference(s). See Fig. 4 regarding corrections to exon boundaries for S. cerevisiae and K. lactis subunits I.

References to the sequences follow: [1] Anderson et al. (1981) Nature 290, 458 - 460. [2] Anderson et al. (1982) J. Mol. Biol. 156, 683 - 717. [3] Arnason et al. (1991) J. Mol. Evol. 33, 356 - 368. [4] Gadaleta et al. (1989) J. Mol. Evol. 28, 497 - 516. [5] Grosskopf et al. (1981) Curr. Genet. 4, 151 - 158. [6] Bibb et al. (1981) Cell 26, 167 - 180. [7] Desjardins et al. (1990) J. Mol. Biol. 212, 599 - 634. [8] Roe et al. (1985) J. Biol. Chem. 260, 9759 - 9774. [9] Chang et al. (1991) Submitted to GenBank in 1991. [10] Thomas et al. (1989) J. Mol. Evol. 29, 233 - 245. [11] de Bruijn (1983) Nature (London) 304, 234 - 240. [12] Clary et al. (1982) Nucleic Acids Res. 10, 6619 - 6637. [13] Clary et al. (1985) J. Mol. Evolution 22, 252 - 271. [14] Crozier et al. (1989) Mol. and Biol. Evol. 6, 399 - 411. [15] Haucke et al. (1988) Curr. Genet. 14, 471 - 476. [16] Okimoto et al. (1990) Nucleic Acids Res. 18, 6113 - 6118. [17] Cantatore et al. (1989) J. Biol. Chem. 264, 10965 - 10975. [18] Jacobs et al. (1988) J. Mol. Biol. 201, 185 - 217. [19] Smith et al. (1990) J. Mol. Evolution 31, 195 - 204. [20] Bonen et al. (1987) Nucleic Acids Res. 15, 6734. [21] Gualberto et al. (1990) Nucl. Acids Res. 18, 3771 - 3776. [22] Gualberto et al. (1990) Curr. Genetics 17, 41 - 47. [23] Kadowaki et al. (1989) Nucleic Acids Res. 17, 7519. [24] Macfarlane et al. (1990) Curr. Genetics 17, 33 - 40. [25] Isaac et al. (1985) EMBO J. 4, 1617 - 1623. [26] McCarty et al. (1988) Nucleic Acids Res. 16, 9873. [27] Bailey-Serres et al. (1986) Cell 47, 567 - 576. [28] Grabau (1986) Plant Mol. Biol. 7, 377 - 384. [29] Grabau et al. (1989) Plant Mol. Biol. 13, 595 - 597. [30] Hiesel et al. (1987) EMBO J. 6, 29 - 34. [31] Kemmerer et al. (1989) Plant Mol. Biol. 13, 121 - 124. [32] Senda et al. (1991) Curr. Genet. 19, 175 - 181. [33] Boer et al. (1986) Nucl. Acids Res. 14, 7506 - 7507. [34] Cummings et al. (1989) Curr. Genet. 16, 381 - 406. [35] Cummings et al. (1990) Curr. Genet. 17, 375 - 402. [36] Burger et al. (1982) EMBO J. 1, 1385 - 1391. [37] de Jonge et al. (1983) Curr. Genet. 7, 21 - 28. [38] Browning et al. (1982) J. Biol. Chem. 257, 5253 - 5256. [39] Waring et al. (1984) EMBO J. 3, 2121 - 2128. [40] Netzker et al. (1982) Nucleic Acids Res. 10, 4783 - 4794. [41] Bonitz et al. (1980) J. Biol. Chem. 255, 11927 - 11941. [42] Thalenfeld et al. (1980) J. Biol. Chem. 255, 6173 - 6180. [43] Hardy et al. (1991) Curr. Genet. 20, 99 - 114. [44] Lang (1984) EMBO J. 3, 2129 - 2136. [45] Trinkl et al. (1989) Nucleic Acids Res. 17, 10104. [46] Phelps et al. (1988) Curr. Genetics 14, 401 - 403. [47] de la Cruz et al. (1984) J. Biol. Chem. 259, 15136 - 15147. [48] Shaw et al. (1988) Cell 53, 401 - 411. [49] Sloof et al. (1987) Nucleic Acids Res. 15, 51 - 65. [50] Raitio et al. (1990) FEBS Letters 261, 431 - 435. [51] Saraste et al. (1986) FEBS Letters 206, 154 - 156. [52] Raitio et al. (1987) EMBO J. 6, 2825 - 2833. [53] Shapleigh et al. (1992) Mol. Microbiol. 6, 635 - 642. [54] Bott et al. (1990) Mol. Microbiol. 4, 2147 - 2157. [55] Gabel et al. (1990) Nucl. Acids Res. 18, 6143. [56] Saraste et al. (1991) Eur. J. Biochem. 195, 517 - 525. [57] Ishizuka et al. (1990) J. Biochem. (Tokyo) 108, 866 - 873. [58] Chepuri et al. (1990) J. Biol. Chem. 265, 11185 - 11192. [59] Denda et al. (1991) Biochem. Biophys. Res. Commun. 181, 316 - 322. [60] Lübben et al. (1992) EMBO J. 11, 805 - 812.

Fig. 1. Physical location of and cloning and sequencing strategies for the T. thermophilus cytochrome caa3 subunit I/III gene. The upper bar shows the relationship of the cloned Bcl I, Hind III, and Pst I genomic DNA fragments and the cytochrome caa3 subunit genes. The striped area marked caaA indicates the gene for subunit IIc (Mather et al., 1991) and that marked caaB indicates the gene for subunits I and III. Sequence upstream from caaA, designated orf1?, appears to be the 3'-portion of a gene encoding a protein homologous to the predicted P. denitrificans orf1 gene product (Raitio et al., 1987) and the B. subtilis CtaB protein (Saraste et al., 1991) (M. Saraste, personal communication). The location of the DNA site which hybridizes with the oligonucleotide probe is marked with * (see also Fig. 2). The lower bar and arrows show the sequencing strategy for the caaB gene. The extent of sequence data obtained from restriction fragments subcloned into M13 or phagemid vectors is designated by I+, from fragments generated by timed exonuclease III digestion is designated by +, and from sequencing reactions primed with internal oligonucleotide primers is designated by H. Due to the high GC content of Thermus DNA, each arrow indicates only the extent of high quality sequence data confirmed by the use of inosine and/or 7-deazaguanosine in the sequencing reactions to remove compressions; additional data of lower confidence is available for most subclones, further verifying the indicated overlaps.


GCC CTG CCC AAG TTC TAA GGGAGGTGCGGG ATG GCG ATC ACG GCA SD

A L P K F . - M A I T A

AAG CCG AAA GCG GGC GTT TGG GCG GTC CTT TGG GAC CTG CTC ACT K P K A G V W A V L W D L L T

ACG GTG GAC CAC AAG AAG ATC GGC CTC ATG TAC ACG GCC ACG GCC T V D H K K I G L M V T A T A

TTC TTC GCC TTC GCC CTG GCG GGG GTC TTC TCC CTC CTC ATC CGC F F A F A L A G V F S L L I R

ACC CAG CTG GCC GTG CCC AAC AAC CAG TTC CTC ACC GGG GAG CAG T O L A V P N N O F L T G E O

TIC AAC CAG ATC CTC ACC CTG CAC G G G GCC ACC ATG CTC TTC TTC V N O I L T L H G A T H L F F

TTC ATC ATC CAG GCC GGG CTC ACC GGC TTC GGT AAC TTC GTG GTG F I I O A G L T G F G N F V V

CCC CTG ATG CTG G G G GCG CGG GAC GTG GCC CTC CCC CGG GTG AAC L M L -5 _A_ -R- -D- -v- -A- - L _ - p _ _ v _ _ N _

GCC TTC AGC TAC TGG GCC TTC TTG GGG GCC ATC GTC CTC GCC CTC A F S V W A F L G A I V L A L

ATG AGC TAC TTC TTC CCC GGC GGC GCC CCC AGC GTA GGC TGG ACC M -S- V F F P G G A _ P _ _ S _ _ V _ G -W_ _ T _

TTC TAC TAC CCC TTC TCC GCC CAG TCG GAA AGC GGG GTG GAC TTC F V V P F S A 0 S E -S- -G- -V- _D- _ F _

TAC CTG GCG GCC ATC CTC CTT CTG GGC TTC TCC AGC CTC CTT GGT

""""""""

AAC GCC AAC TTC GTG GCC ACC ATC TAC AAC CTC CGG GCC CAG GGG N A N F V A T I V N L R A O G

ATG AGC CTC TGG AAG ATG CCC ATC TIC GTC TGG AGC GTC TTC GCC M S L W K M P I V V W S V F A

GCC AGC GTC CTC AAC CTC TTC TCC CTG GCG GGG CTC ACC GCA GCC A S V L N L F S L A G L T A A

ACC CTC CTG GTG CTT CTG GAG CGG AAG ATC GGC CTC TCC TGG TTC T L L V L L E R K I G L S W F

AAC CCG GCC GTG GGC GGG GAC CCC GTT CTC TTC CAG CAG TTC TTC N P A V G G D P V L F O O F F

TGG TTC TAC TCC CAC CCC ACG GTC TAC GTG ATG CTC CTC CCC TAC W F V S H P T V V V M L L P V

CTC GGC ATC CTC GCC GAG GTG GCC TCC ACC TTC GCC CGG AAG CCC L G I L A E V A S T F A R K P

CTC TTC GGC TIC CGG CAG ATG GTC TGG GCC CAG ATG GGG ATC GTG

_ v _ _ L _ -A- -A- _ I _ _ L _ _ L _ _ L - G -F- -s- s - L _ _ L _ -G-

""

L F G V R ~ M V W A ~ M G I V

GTC CTG GGG ACC ATG GTC TGG GCC CAC CAC ATG TTC ACC GTG GGC V L G T M V W A H H M F T V G

GAG TCC ACC CTC TTC CAG ATC GCC TTC GCC TTC TTC ACC GCC CTC E S T L F O I A F A F F T A L

ATC GCC GTG CCC ACG GGG GTC AAG CTC TTC AAC ATC ATC GGC ACC I A V P T G V K L F N I I G T

CTC TGG GGC GGG AAG CTG CAG ATG AAG ACC CCC CTC TAC TGG GTT L W G G K L O M K T P L V W V

TTG GGC TTC ATC TTC AAC TTC CTC CTC GGG GGG ATC ACC G G G GTC L G F I F N F L L G G I T G V

-

ATG CTC TCC ATG ACC CCC CTG GAC TAC CAG TTC CAC GAC TCC TAC M L S M T P L D V O F H D S V

TTC GTG GTG GCC CAC TTC CAC AAC GTC CTC ATG GCG GGC TCC GGC F V V A H F H N V L M A G S G

Positions at the end of each of the 27 sequence rows above (nucleotide/residue): 45/5, 90/20, 135/35, 180/50, 225/65, 270/80, 315/95, 360/110, 405/125, 450/140, 495/155, 540/170, 585/185, 630/200, 675/215, 720/230, 765/245, 810/260, 855/275, 900/290, 945/305, 990/320, 1035/335, 1080/350, 1125/365, 1170/380, 1215/395.

Fig. 2. Nucleotide sequence of the T. thermophilus gene encoding subunit I and subunit III of cytochrome caa3 and the deduced amino acid sequence of the possible precursor form of the proteins. The amino acid sequence is given below the nucleotide sequence in the single letter code, numbered from the initial methionine of the putative precursor. The last 5 codons of the subunit IIc gene (Mather et al., 1991) and twelve apparently non-transcribed bases are shown preceding the gene (nucleotide position 1 of the sequence shown here equals position 1186 of the sequence previously reported (Mather et al., 1991)). A putative ribosome binding site preceding the gene is underlined in the nucleotide sequence and labeled SD. Peptide sequences previously obtained for cyanogen bromide fragments (Buse et al., 1989) are underlined in the amino acid sequence; residues that were present but indeterminate in the peptide sequences are indicated by an interruption of the underline. The single discrepancy

TTC GGG GCC TTC GCC GGC CTT TIC TAC TGG TGG CCC AAG ATG ACG 1 2 6 0 F G A F A G L Y V W W P K M T I l O

GGC CGG ATG TAC GAC GAG AGG CTG GGC CGG CTC CAC TTC TGG CTC 1 3 0 5 G R M V D E R L G R L H F W L 4 2 5

TTC CTC GTG GGC TAC CTC CTC ACC TTC CTG CCC CAG TIC GCC CTG 1 3 5 0 F L V G V L L T F L P O V A L I P O

GGC TAC CTG GGG ATG CCC CGG CGC TIC TIC ACC TAC AAC GCC GAC 1 3 9 5

G V L G M P R R V V T V N A D 4 5 5

ATC GCC GGC TGG CCC GAG CTC AAC CTC CTC TCC ACC ATC GGC GCC 1 4 4 0 I A G W P E ~ L L S T I G A 4 7 0

TAC ATC CTG GGC CTG GGC GGG CTG GTC TGG ATC TAC ACC ATG TGG 1 4 8 5 V I L G L G G L V W I V T M W 4 8 5

AAA AGC CTC CGC TCC CGC CCC AAG GCC CCC GAC AAC CCT TGG GGC 1 5 3 0 K S L R S G P K A P D N P W G S O O

(probe 3 ' - TAC GGG GCC GCC ATG ATG TGG ATG TTG CG - 5 ' )

GGT TAC ACC CTG GAG TGG CTC ACC GCC TCG CCT CCC AAG GCC CAC 1 5 7 5 G V T L E W L T A S P P K A H ~ I S

AAC TTT GAC GTG AAG CTT CCC ACC GAG TTC CCC TCC CAI AGG CCC 1 6 2 0 N F D V K L P T E F P S E R P 5 3 0

CTT TAC GAC TGG AAG AAG AAG GGG GTG GAG CTC AAG CCC GAG GAC 1 6 6 5 L V D W K K K G V E L K P E D 5 4 5

CCG GCC CAC ATC CAC CTG CCC AAC AGC TCC TTC TGG CCC TTC TAC 1 7 1 0 P A H I H L P N S S F W P F V 5 6 0

TCG GCA GCC ACC CTC TTC GCC TTC TTC GTG GCG GTG GCG GCC CTC 1 7 5 5 S A A T L F A F F V A V A A L 5 7 5

CCC GTG CCC AAC GTC TGG ATG TGG GTC TTC CTC GCC CTC TTC GCC 1 8 0 0 P V P N V W M W V F L A L F A 5 9 0

TAC GGC CTG GTG CGC TGG GCC CTG GAG GAC GAG TAC AGC CAC CCG 1 8 4 5 V G L V R W A L E D E V S H ~ ~ ~ ~

GTG GAG CAC CAC ACC GTC ACG GGC AAA TCC AAC GCC TGG ATG GGG 1 8 9 0 V E H H T V T G K S N A W M G 6 2 0

ATG GCC TGG TTC ATC GTT TCC GAG GTG GGC CTC TTC GCC ATC CTC 1 9 3 5 M A W F I V S E V G L F A I L 6 3 5

ATC GCG GGC TAC CTC TIC CTG CGC CTC TCC GGG GCG GCC ACG CCC 1 9 8 0 I A G Y L V L R L S G A A T P 6 5 0

CCT GAG GAA AGG CCC GCC CTG TGG CTT GCC CTC CTC AAC ACC TTC 2 0 2 5 P E E R P A L W L A L L N T F 6 6 5

L -

CTC CTG GTG AGC TCC TCC TTC ACC GTG CAC TTC GCC C A C C A C GAC 2 0 7 0 L L V S S S F T V H F A H H D 6 8 0

CTC CGC CGG GGC CGG TTC AAC CCC TTC CGC TTC G G G CTT CTC GTC 2 1 1 5 L R R G R F N P F R F G L L V 6 9 5

ACC ATC ATT CTC GGC GTC CTC TTC TTC CTG GTG CAG TCC TGG GAG 2 1 6 0 T I I L G V L F F L V O S W E 7 1 0

TTC TAC CAG TTC TAC CAC CAC TCC AGC TGG CAG GAG AAC CTC TGG 2 2 0 5 F V O F V H H S S W O E N L W 7 2 5

ACC GCG GCC TTC TTC ACC ATC GTG GGC CTC CAC GGC CTG CAC GTG 2 2 5 0 T A A F F T I V G L H G L H V 7 4 0

GTG ATC GGA GGC TTC GGC CTG ATC CTC GCT TAC C T C CAG GCC cTA 2 2 9 5 V I G G F G L I L A V L O A L 7 5 5

AGG GGC A A G A T C ACC CTC CAT AAC CAC GGC ACC CTC GAG GCC GCC 2 3 4 0 R G K I T L H N H G T L E A A 7 7 0

AGC ATG TIC TGG CAC CTG GTG G A C GCC GTC TGG CTG GTG ATC GTC 2 3 8 5 S M V W H L V D A V W L V I V 7 8 5

ACC ATC TTC TAC GTC TGG TAG TCAAGAAGGCCGCAAAGCCCCCCGCTTAG 2 4 3 5

T I F V V W * A 7 9 1

GCGGGGGGCTTTGCTATGAAAAGGGAAGGGATGCG 2 4 7 0

between the nucleotide sequence and the most C-terminal peptide sequence is marked with a ~ below the sequence (nucleotide position 1931). In addition, peptide number 6 in table 3 in (Buse et al., 1989) is actually a mixture of two peptides, and all amino acid residues reported for this "peptide" can be recognized in the translated DNA sequence following either Met98 or Met126 (residues with a broken underline). The oligonucleotide probe used to identify restriction fragments from T. thermophilus genomic DNA for cloning is shown in parentheses below the complementary portion of the sequence. An inverted repeat sequence following the 3'-end of the gene (possible transcriptional terminator) is marked by arrows under the nucleotide sequence.
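The relationship between the nucleotide rows and the single-letter amino acid rows in Fig. 2 can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the table construction are illustrative choices, not the software used by the authors.

```python
# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA string (whitespace ignored) codon by codon.

    Stop codons are written as '*'; a trailing partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# End of the subunit IIc gene and start of the fused gene, as printed in Fig. 2.
print(translate("GCC CTG CCC AAG TTC TAA"))
print(translate("ATG GCG ATC ACG GCA"))
```

Running a row of the figure through such a routine is a quick way to spot residues that were misread in the printed single-letter line.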


Fig. 3. Codon preference analysis of the caaB gene region. The codon preference statistic (Gribskov et al., 1984) for each of the three possible forward reading frames is plotted; a window of 21 bases was used. The positions of rare codons (codons with a frequency of less than 0.1 among synonymous codons) are indicated by short vertical lines below each codon preference curve. The dashed lines indicate the calculated codon preference statistic for a random sequence of the same composition as the codon frequency table. The codon frequency table for this analysis was constructed from the DNA sequences of 21 T. thermophilus genes. The solid bar below the preference curve for the first reading frame shows the extent of the long open reading frame proposed to be the caaB gene.
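The rare-codon marks described in the legend to Fig. 3 can be illustrated as follows. This is a sketch under stated assumptions: it builds a codon frequency table from a set of reference genes and flags codons whose frequency among synonymous codons is below 0.1. It does not compute the codon preference statistic of Gribskov et al. (1984) itself, the function names are hypothetical, and treating codons of an amino acid that is absent from the reference set as rare is a choice made here for simplicity.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def synonymous_fractions(reference_genes):
    """Fraction of each codon among the codons for the same amino acid."""
    counts = Counter()
    for gene in reference_genes:
        for i in range(0, len(gene) - len(gene) % 3, 3):
            counts[gene[i:i + 3]] += 1
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    return {
        codon: counts[codon] / totals[aa]
        for codon, aa in CODE.items()
        if totals[aa] > 0
    }

def rare_codon_positions(seq, fractions, frame=0, cutoff=0.1):
    """Start positions (0-based) of codons in the given frame below the cutoff."""
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if fractions.get(seq[i:i + 3], 0.0) < cutoff  # unseen family: assumed rare
    ]

# Toy reference set: CTG is strongly preferred over CTA for leucine.
fractions = synonymous_fractions(["CTG" * 19 + "CTA"])
print(rare_codon_positions("CTGCTACTG", fractions))
```

With a table built from real T. thermophilus genes, a true coding frame would be expected to show few marks, whereas the other frames would show many.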

TABLE I

Amino Acid Composition of Subunit I of cytochrome caa3

Previously reported compositions from amino acid analyses of T. thermophilus Subunit I are compared to some possible compositions deduced from the DNA sequence. "-" indicates not determined.

Column heads: Amino acid; Amino acid analyses (a): mol% (b), mol% (c), mol% (d); DNA-deduced compositions: 533-residue (e), 607-residue (f), 653-residue (g), 791-residue (h), each in mol%.

Gly 9 . 0 10.8 9 . 7 PT. - 5 . 6 6.3

9 . 6 8 . 7 8 . 9 8 . 6

c y * 0 . 2 0 . 9 5 . 3 5 . 8 5 . 7 4 . 9

Ala 10.0 9 . 1 1 0 . 0 0 . 0 0 . 0 0 0 0 . 0 9 . 6 10.0 1 0 . 3 9 . 7

11. 3 . 2 4 . 2 3.0 4 . 3 4 . 0 4 . 1 4 . 4 L.u 14.3 1 2 . 5 1 2 . 6 14.3 13.8 1 3 . 6 1 4 . 2 Y.t 3 . 0 2 . 1 2 . 9 3 . 8 3 . 5 3 . 5 3 . 0 V a l 7 . 4 7 . 0 6 . 4 6 . 9 7 . 4 7 . 4 7 . 7 Ph. 8 . 4 7 . 7 7 . 7 Trp 2 . 8

8 . 8 8 . 9 8 . 6 8 . 8 3.8 4 . 1 4 . 1 4 . 3 5 . 4 5 . 3 5 . 2 4 . 9

5.r 5 . 0 6.1 5 . 1 4 . 7 4 . 8 4 . 9 4 . 9 Thr 5 . 4 4 . 4 5 . 8 6 . 2 5 . 6 5 . 7 5 . 7 A q 3.0 3 . 1 3 . 5 2 . 6 2 . 5 2 . 5 2 . 8 Lye 2 . 8 2 . 4 2 . 9 Eis 2 . 8 3 . 0 3.1

2 . 8 3 . 1 3 . 1 2 . 7

AS" 1.9 2 . 1 2 . 3 3 . 2

Gln 3 . 2 3 . 1 3 . 1 3 . 0 2 . 8 2 . 5 2 . 3 2 . 4

Asp - Glu

2 . 3 2 . 3 2 . 1 2 . 0 1 . 9 2 . 5 2 . 1 2 . 7

As= 5 . 2 5 . 4 6.1 Glx 5 . 6 7 . 1 6 . 5

5 . 4 5 . 4 5 . 2 5 . 1 4 . 7 4 . 9 5 . 1 5 . 1

Tyr 4 . 8 4 . 5 4 . 4

(a) Those analyses that were incomplete have been recalculated after inclusion of an average value for Trp (4.1 mol%) so that the values are more directly comparable to the DNA-deduced compositions.
(b) From Buse et al. (1989), conversion to mol% based on 498 estimated residues.
(c) From Yoshida et al. (1984), conversion to mol% based on 574 estimated residues.
(d) From Hon-nami and Oshima (1984), data provided in mol%.
(e) Corresponds to a "mitochondrial-type" subunit I, deduced Mr = 59,700.
(f) Corresponds to a "bacillus-type" subunit I, deduced Mr = 68,200.
(g) Corresponds to an extended subunit I, deduced Mr = 73,200.
(h) The putative subunit I + III precursor protein, deduced Mr = 89,200.
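The DNA-deduced columns of Table I are mole percent compositions of N-terminal portions of the translated sequence. A minimal sketch of that arithmetic is given below; the helper name `mol_percent` is hypothetical, and the 20-residue example is simply the first 20 residues shown in Fig. 2, not one of the 533-, 607-, 653-, or 791-residue forms tabulated.

```python
from collections import Counter

def mol_percent(protein, length=None):
    """Mole percent of each amino acid in the first `length` residues."""
    segment = protein if length is None else protein[:length]
    counts = Counter(segment)
    return {aa: 100.0 * n / len(segment) for aa, n in counts.items()}

# First 20 deduced residues from Fig. 2 (illustration only).
composition = mol_percent("MAITAKPKAGVWAVLWDLLT")
for aa in sorted(composition):
    print(aa, round(composition[aa], 1))
```

Calling the function with different `length` values on the full deduced sequence would give compositions for alternative C-terminal end points, which is the comparison the table makes.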


Fig. 6. Hydropathy and variability analyses of cytochrome aa3 subunits I. In A, the hydropathy plot for the portion of the deduced Thermus sequence homologous to subunits I is shown as the lower curve, and the combined average hydropathy plot for the other 36 oxidase subunits I in the alignment described in Fig. 4 is given by the upper curve. The abscissa, or position scale, corresponds to the sequence positions of the full subunit I alignment (see Fig. 4). The ordinate, or hydropathy (HP) score scale, for the upper curve is on the right side of the figure, and that for the lower curve on the left. The curves were offset by 0.5 units for clarity. Hydropathy profiles (Kyte and Doolittle, 1982) were calculated using the statistical AMP07 scale of Degli Esposti et al. (1990) with the recommended window of 7 residues. The average hydropathy plot (upper curve) was calculated by averaging the AMP07 hydrophobic propensity values for the up to 36 amino acids at each position of the alignment (not including the Thermus sequence) and using the resulting average hydropathy value at each position in calculating the hydropathy profile. The calculation was interrupted when gaps were encountered, and this results in blank spaces in the plot (for this purpose, gaps are defined as positions in the alignment at which fewer than half the aligned sequences contribute a residue; the gaps in the plot are extended by one-half window length on each side of each gap to avoid including gap positions in the window averaging). Appropriate spaces were also inserted in the Thermus hydropathy plot of subunit I after calculation to maintain alignment.

In B, the variability at each position in the alignment, averaged over a window of seven residues, is displayed. Variability, as defined in Rees et al. (1989), is the number of different amino acids that are found at each position of an alignment. Blanks in the plot are the result of gaps in the alignment (see above).

In C, the helical index plot resulting from the variability analysis of 37 aligned cytochrome oxidase subunit I sequences is presented. The full alignment is described in Fig. 4. The variability index plot was calculated using a slight modification of the algorithm of Rees et al. (1989) (using an α-helical range of 85° - 115° and a window of 19 residues). Gaps in an individual aligned sequence were treated as mismatches in computing the variability; gaps in the alignment as a whole (see above) were not included in the calculation. Peaks in the plot indicate stretches of amino acids in the aligned protein sequences which may form, in each of the homologous proteins, helical structures that have a comparatively conserved face and a more variable face (Rees et al., 1989). A variability index value of 2 was suggested as the approximate division between probable surface helices and internal helices or statistical fluctuations (Rees et al., 1989). Bars below the plot indicate the extent of the hydrophobic regions predicted by comparative hydropathy analysis of the aligned subunits I.
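The averaged hydropathy profile described for panel A can be sketched as below. This is an illustration under explicit assumptions, not the authors' program: the Kyte and Doolittle (1982) values are substituted for the AMP07 scale of Degli Esposti et al. (1990), whose values are not reproduced in this text, and the function names are hypothetical. Columns to which fewer than half of the sequences contribute a residue are treated as gaps, and any window touching a gap is left blank, which extends each gap by half a window on either side.

```python
# Kyte-Doolittle hydropathy values (stand-in for the AMP07 scale; assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def column_means(alignment, scale=KD, gap="-"):
    """Mean scale value per alignment column; None marks a gap column."""
    n = len(alignment)
    means = []
    for column in zip(*alignment):
        values = [scale[aa] for aa in column if aa != gap]
        # Gap column: fewer than half of the sequences contribute a residue.
        means.append(sum(values) / len(values) if 2 * len(values) >= n else None)
    return means

def windowed_mean(values, window=7):
    """Centered moving average; blank (None) near the ends and near gaps."""
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = i - half, i + half + 1
        if lo < 0 or hi > len(values):
            profile.append(None)
            continue
        chunk = values[lo:hi]
        profile.append(None if any(v is None for v in chunk) else sum(chunk) / window)
    return profile

# A single sequence is the one-row special case of an alignment.
print(windowed_mean(column_means(["FFAFALAGVFSLLIR"])))
```

Applying the same two functions to a one-sequence "alignment" gives the individual Thermus profile, so both curves of panel A come from the same code path.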


FIQ 7 Hydropathy and variability analyses 01 cytochrome aa3subunits 111. In A , the

hydropathy plot lor the ponlon 01 the deduced Jhermus sequence homologous lo SubUnltS 111 IS Shown

as the lower CuWB, and the comblned average hydropathy plot lor 33 Other Oxidase wbunlts 111 15 given

by the upper curve The abscissa. or posttion scale, corresponds to the sequence pas~mns 01 the

allgnment In Flg 5 Th0 ordinate, or hydropathy (HP) score scale for the upper curve 15 on the rlght

sde 01 Ihe tlgurs: and that lor the lower curve on the len. The CUNBS were ott6et by 0 5 units tor clarlty

Hydropathy proflles (Kyte and Doolittle. 1982) were calculated using the rtat8stical AMP07 scale 01

degll Espost1, et a i (1990) Wlth a window 01 7 restdues The average hydropathy plot (upper curve)

was calculated by averaglng the AMP07 hydrophablc prOpenSlIy values lor the up 10 33 amino acids at

each posltlon and usmg the resultlng average hydropathy value at each posll~on ln calculating the

hydropathy proflle. The Calculation was interrupted when gaps were encountered, and thls results ~n

blank spaces 10 the plot (lor thls purpose, gaps are dellned as posdtlons ~n the allgnment at whch fewer

than hall the allgned sequences contribute a restdue. the gaps In the plot are extended by one-halt

wlndow length on each side 01 each gap to wold mcluding gap pos~tlons in the wlndow ~ e r a g l n g )

Appropriate spaces were also lnsened the JhermuS Subunll 111 hydropathy plot aner Calculation to

mamtain allgnmenl

In B, the variability at each position in the alignment, averaged over a window of seven residues, is displayed. Variability, as defined by Rees et al. (1989), is the number of different amino acids that are found at each position of an alignment. Blanks in the plot are the result of gaps in the alignment (see above).
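The variability count and its seven-residue smoothing might be computed as follows. This is a sketch under stated assumptions: gap characters are simply not counted here, and the helper names `column_variability` and `smooth` are illustrative rather than taken from the original work.

```python
def column_variability(alignment, gap="-"):
    """Number of different amino acids found in each alignment column.

    ASSUMPTION: gap characters are ignored when counting; the caption does
    not say how individual gaps were handled for this panel.
    """
    return [len({aa for aa in column if aa != gap}) for column in zip(*alignment)]


def smooth(values, window=7):
    """Centered moving average; positions without a full window are None."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out
```

A fully conserved column has a variability of 1, and the maximum for the standard amino acids is 20.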

In C, the helical index plot resulting from the variability analysis of 36 aligned cytochrome oxidase subunit III sequences is presented. The full alignment is described in Fig. 5. The variability index plot was calculated using a slight modification of the algorithm of Rees et al. (1989) (using an α-helical range of 85°-115° and a window of 19 residues). Gaps in an individual aligned sequence were treated as mismatches in computing the variability; gaps in the alignment as a whole (see above) were not included in the calculation. Peaks in the plot indicate stretches of amino acids in the aligned protein sequences which may form, in each of the homologous proteins, helical structures that have a comparatively conserved face and a more variable face (Rees et al., 1989). A variability index value of 2 was suggested as the approximate division between probable surface helices and internal helices or statistical fluctuations (Rees et al., 1989). Bars below the plot indicate the extent of the hydrophobic regions predicted by comparative hydropathy analysis of the aligned subunits III.
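A periodicity index of this kind can be illustrated with a short Fourier calculation. The sketch below uses one common formulation (mean spectral power in the 85°-115° band divided by mean power over 0°-180°); whether this matches the authors' "slight modification" of the Rees et al. algorithm is not stated, so the formula, the 1° sampling step, and the function names are all assumptions.

```python
import cmath
import math


def helical_index(variability, lo=85.0, hi=115.0, step=1.0):
    """Ratio of mean Fourier power in the helical band to mean power overall.

    ASSUMPTION: this is a generic periodicity index, not necessarily the
    exact quantity plotted in the figure.
    """
    n = len(variability)
    mean = sum(variability) / n
    dev = [v - mean for v in variability]  # remove the constant component

    def power(angle_deg):
        w = math.radians(angle_deg)
        return abs(sum(d * cmath.exp(1j * k * w) for k, d in enumerate(dev))) ** 2

    angles = [i * step for i in range(int(180 / step) + 1)]
    powers = [power(a) for a in angles]
    total = sum(powers) / len(powers)
    if total == 0:
        return 0.0  # flat variability: no periodicity to measure
    band = [p for a, p in zip(angles, powers) if lo <= a <= hi]
    return (sum(band) / len(band)) / total


def index_profile(variability, window=19):
    """Helical index for each full window along the variability profile."""
    half = window // 2
    return [
        helical_index(variability[i - half:i + half + 1])
        for i in range(half, len(variability) - half)
    ]
```

A variability profile that repeats every 100° (one residue per step of an ideal α-helix) gives an index well above 2 with this definition, whereas a profile with no variation gives 0.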


Fig. 9. Helical wheel diagrams displaying the variabilities of each of the putative transmembrane hydrophobic segments as if viewed from the top of a helix are shown. A roman numeral in the center of each wheel identifies each putative helix. "Top" is defined as being toward the outer (positive) side of the membrane in which the protein is embedded. The numbers in the circles surrounding the wheel are the variabilities at each corresponding sequence position. The first position is at the top (0°) of the wheel, and bars connect successive circles in the wheels and are filled in with darker shading as they progress down through the helix (each succeeding residue is spaced from its predecessor by 100°, as in an ideal α-helix). The smaller letters indicate the amino acid found at that position in the Thermus sequence; the position number is given for the N-terminal residue in each putative helix (note that the N-terminal residue is on the opposite side of the helix for even- and odd-numbered segments). The last three circles are offset away from the helix center, as they would otherwise be obscured below the first three circles. Solid arcs covering regions with a relatively high variability suggest possible surface-exposed portions of the helices; hatched arcs cover regions with a more ambiguous variability pattern; regions with a relatively low variability are uncovered.
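The wheel geometry follows directly from the 100° spacing: the angles repeat after 18 residues, so in a segment of 21 residues the last three would fall on top of the first three. The sketch below illustrates this. The 21-residue length, the clockwise drawing direction, the radial offset of 0.25, and the function name `wheel_positions` are assumptions made for illustration.

```python
import math


def wheel_positions(n_residues, step_deg=100.0, radius=1.0, offset=0.25):
    """Return (angle, x, y) for each residue on a helical wheel.

    The first residue sits at the top (0 degrees). Angles increase
    clockwise (ASSUMPTION). A residue whose angle is already occupied is
    pushed outward by `offset` so that it is not hidden.
    """
    positions = []
    seen = set()
    for k in range(n_residues):
        angle = (k * step_deg) % 360.0
        r = radius
        if angle in seen:
            r += offset  # move away from the helix center
        seen.add(angle)
        theta = math.radians(angle)
        positions.append((angle, r * math.sin(theta), r * math.cos(theta)))
    return positions


if __name__ == "__main__":
    for k, (angle, x, y) in enumerate(wheel_positions(21), start=1):
        print(f"residue {k:2d}: {angle:5.0f} deg  x={x:+.2f}  y={y:+.2f}")
```

With 21 residues, positions 19 to 21 land at 0°, 100°, and 200°, the same angles as positions 1 to 3, which is why those circles need to be displaced outward.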