11
Vol. 173, No. 11 DNA Sequences of Three 3-1,4-Endoglucanase Genes from Thermomonospora fusca GUIFANG LAO, GURDEV S. GHANGAS, ELYSE D. JUNG, AND DAVID B. WILSON* Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, New York 14853 Received 26 December 1990/Accepted 21 March 1991 The DNA sequences of the Thermomonosporafusca genes encoding cellulases E2 and E5 and the N-terminal end of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of the initiation codon. There were no significant homologies between the coding regions of the three genes. The E2 gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found with other cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA from Cellulomonasfimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichoderma reesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. There were strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and these regions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains. An important step toward understanding the mechanism of action of an enzyme is the determination of its amino acid sequence. In recent years, this usually has been done by determining the DNA sequence of the structural gene that encodes the protein, as DNA sequencing is simpler and more precise than protein sequencing. The sequences of a number of cellulase genes have been determined, and this work has been reviewed by Beguin et al. (1). We have been studying the cellulases of a thermophilic, filamentous soil bacterium, Thermomonospora fusca, and have purified five antigenically distinct cellulases, designated E1 to E5, from the culture supernatant of an extracellular- protease-negative mutant of T. fusca (34). All five enzymes are P-1,4-endoglucanases, but they show considerable vari- ation in their specific activities on several substrates and in their physical properties. The enzymes from T. fusca are heat stable and active over a broad pH range with an optimum centered at pH 6.5. While no complex formation between the cellulases has been seen, enzyme E3 acts synergistically with E2 and E5. Evidence for coordinate regulation (20, 22) and the recent identification of a regula- tory protein that binds near the translational start of the E5 gene (21) show that there are probably regulatory sequences present in each gene. As part of our study of enzymatic mechanisms of cellulose degradation, we determined the DNA sequences of the structural genes encoding three (E2, E4, E5) of the five purified T. fusca cellulases. Comparisons of the amino acid sequences of these cellulases with each other and with other cellulases yielded information about the similarities and differences among cellulases. Such comparisons may pro- vide insight into the catalytic and regulatory mechanisms of these enzymes. MATERIALS AND METHODS Bacterial strains and plasmids. The host strain for all transformations and transfections was Escherichia coli JM101 (rK' mK' supE thi A(/ac-proAB) [F' traD36 proAB lacl"ZAM15]) (36), except for the subcloning of the E4 gene, for which E. coli HB101 (F- hsdS20 [rB- mB-] supE44 * Corresponding author. ara-14 ga/K2 lacYl proA2 rspL20 xyl-5 mtl-l recA13) was used. The cellulase genes were cloned from T. fiusca YX, acquired from Dexter Bellamy, Cornell University (3). The gene for E2 was subcloned from plasmid pGG91 (10), that for E4 was subcloned from plasmid pGG76 (see below), and that for EB was subcloned from plasmid pD318 (5). Cloning of the E4 gene. The E4 gene was isolated from a T. fusca DNA library constructed in Streptomyces lividans by using plasmid pIJ702 (18) as described previously (10). A cellulase-excreting S. /ividans clone was identified by a carboxymethyl cellulose overlay procedure (34). This strain contained a plasmid (pGG75) that consists of a 1.8-kb BglII fragment of T. fusca DNA inserted into the BglII site of plasmid pIJ702. This plasmid was introduced into Strepto- myces albus J1074 by protoplast transformation (10). The BglII insert from pGG75 was subcloned into the BamHI site of pBR322 (pGG76) and pBR327 (pGG77), and these plas- mids were introduced into E. coli HB101 by transformation; both classes of transformants were cellulase positive. Subcloning of cellulase genes. The genes for E2, E4, and E5 were subcloned into M13mpl8 and M13mpl9 bacteriophage vectors (Bethesda Research Laboratories), allowing se- quencing in both orientations. Transformation of JM101 and preparation of single-stranded DNA were performed by methods described in the M13 CloninglDideoxy Sequencing Instruction Manual from Bethesda Research Laboratories. Restriction maps of the subclones are shown in Fig. 1. Isolation and purification of E4 protein. The E4 gene product was isolated from S. albus J1074 containing pGG75, as this strain produced higher levels of E4 than did S. /ividans transformants. Mycelia harvested from a 15-ml culture were transferred to 1.5 liters of tryptone soya broth containing 10 pLg of thiostrepton per ml and 300 Vig of phenylmethylsulfonyl fluoride per ml and grown for 40 to 50 h at 28°C. This level of phenylmethylsulfonyl fluoride does not affect S. albus growth or cellulase production, which is 1.5 + 0.5 U/ml. The supernatant was recovered after filtra- tion through glass wool and adjusted to 35% (NH4)2SO4 saturation (208 g added to each liter). After 30 min at 4°C, the suspension was centrifuged at 10,000 x g for 20 min and the supernatant was adjusted to 65% (NH4)2SO4 saturation, allowed to sit for 30 min, and centrifuged as above. The precipitate was dissolved in a minimal volume of 5 mM 3397 JOURNAL OF BACTERIOLOGY, June 1991, p. 3397-3407 0021-9193/91/113397-11$02.00/0 Copyright C 1991, American Society for Microbiology on March 16, 2018 by guest http://jb.asm.org/ Downloaded from

Thermomonospora fusca

  • Upload
    voquynh

  • View
    261

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Thermomonospora fusca

Vol. 173, No. 11

DNA Sequences of Three 3-1,4-Endoglucanase Genes fromThermomonospora fusca

GUIFANG LAO, GURDEV S. GHANGAS, ELYSE D. JUNG, AND DAVID B. WILSON*Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, New York 14853

Received 26 December 1990/Accepted 21 March 1991

The DNA sequences of the Thermomonosporafusca genes encoding cellulases E2 and E5 and the N-terminalend of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of theinitiation codon. There were no significant homologies between the coding regions of the three genes. The E2gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found withother cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA fromCellulomonasfimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichodermareesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. Therewere strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and theseregions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains.

An important step toward understanding the mechanismof action of an enzyme is the determination of its amino acidsequence. In recent years, this usually has been done bydetermining the DNA sequence of the structural gene thatencodes the protein, as DNA sequencing is simpler and more

precise than protein sequencing. The sequences of a numberof cellulase genes have been determined, and this work hasbeen reviewed by Beguin et al. (1).We have been studying the cellulases of a thermophilic,

filamentous soil bacterium, Thermomonospora fusca, andhave purified five antigenically distinct cellulases, designatedE1 to E5, from the culture supernatant of an extracellular-protease-negative mutant of T. fusca (34). All five enzymesare P-1,4-endoglucanases, but they show considerable vari-ation in their specific activities on several substrates and intheir physical properties. The enzymes from T. fusca areheat stable and active over a broad pH range with anoptimum centered at pH 6.5. While no complex formationbetween the cellulases has been seen, enzyme E3 actssynergistically with E2 and E5. Evidence for coordinateregulation (20, 22) and the recent identification of a regula-tory protein that binds near the translational start of the E5gene (21) show that there are probably regulatory sequencespresent in each gene.As part of our study of enzymatic mechanisms of cellulose

degradation, we determined the DNA sequences of thestructural genes encoding three (E2, E4, E5) of the fivepurified T. fusca cellulases. Comparisons of the amino acidsequences of these cellulases with each other and with othercellulases yielded information about the similarities anddifferences among cellulases. Such comparisons may pro-vide insight into the catalytic and regulatory mechanisms ofthese enzymes.

MATERIALS AND METHODS

Bacterial strains and plasmids. The host strain for alltransformations and transfections was Escherichia coliJM101 (rK' mK' supE thi A(/ac-proAB) [F' traD36 proABlacl"ZAM15]) (36), except for the subcloning of the E4 gene,for which E. coli HB101 (F- hsdS20 [rB- mB-] supE44

* Corresponding author.

ara-14 ga/K2 lacYl proA2 rspL20 xyl-5 mtl-l recA13) wasused. The cellulase genes were cloned from T. fiusca YX,acquired from Dexter Bellamy, Cornell University (3). Thegene for E2 was subcloned from plasmid pGG91 (10), that forE4 was subcloned from plasmid pGG76 (see below), and thatfor EB was subcloned from plasmid pD318 (5).

Cloning of the E4 gene. The E4 gene was isolated from a T.fusca DNA library constructed in Streptomyces lividans byusing plasmid pIJ702 (18) as described previously (10). Acellulase-excreting S. /ividans clone was identified by acarboxymethyl cellulose overlay procedure (34). This straincontained a plasmid (pGG75) that consists of a 1.8-kb BglIIfragment of T. fusca DNA inserted into the BglII site ofplasmid pIJ702. This plasmid was introduced into Strepto-myces albus J1074 by protoplast transformation (10). TheBglII insert from pGG75 was subcloned into the BamHI siteof pBR322 (pGG76) and pBR327 (pGG77), and these plas-mids were introduced into E. coli HB101 by transformation;both classes of transformants were cellulase positive.

Subcloning of cellulase genes. The genes for E2, E4, and E5were subcloned into M13mpl8 and M13mpl9 bacteriophagevectors (Bethesda Research Laboratories), allowing se-quencing in both orientations. Transformation of JM101 andpreparation of single-stranded DNA were performed bymethods described in the M13 CloninglDideoxy SequencingInstruction Manual from Bethesda Research Laboratories.Restriction maps of the subclones are shown in Fig. 1.

Isolation and purification of E4 protein. The E4 geneproduct was isolated from S. albus J1074 containing pGG75,as this strain produced higher levels of E4 than did S./ividans transformants. Mycelia harvested from a 15-mlculture were transferred to 1.5 liters of tryptone soya brothcontaining 10 pLg of thiostrepton per ml and 300 Vig ofphenylmethylsulfonyl fluoride per ml and grown for 40 to 50h at 28°C. This level of phenylmethylsulfonyl fluoride doesnot affect S. albus growth or cellulase production, which is1.5 + 0.5 U/ml. The supernatant was recovered after filtra-tion through glass wool and adjusted to 35% (NH4)2SO4saturation (208 g added to each liter). After 30 min at 4°C, thesuspension was centrifuged at 10,000 x g for 20 min and thesupernatant was adjusted to 65% (NH4)2SO4 saturation,

allowed to sit for 30 min, and centrifuged as above. Theprecipitate was dissolved in a minimal volume of 5 mM

3397

JOURNAL OF BACTERIOLOGY, June 1991, p. 3397-34070021-9193/91/113397-11$02.00/0Copyright C 1991, American Society for Microbiology

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 2: Thermomonospora fusca

3398 LAO ET AL.

Cla I Sal1 8ph 1E2 1 I I I

ooRl

(Acc ) -(EcoRl )

(EcoRil)

PVU II

Oma 1)

E4ZCoRl Sgl IXI SphL. - I I

L-0 - 1(EcoRl) (Sphl)

(Sphl)

boRl B311 NoolE5 I I I I I

Sil Xhol

I(Eoi(EcoRl )

Aval Ipal Bgi I I SphlI . I I

(Kpnl)

II ,1(Kpnl) Bgl II (Sphl)

s8i Stul 3gll Pvu III I I I

Pv II

(Smal) (Pvu II)

(Smal)

(Sal) tmal,I OWL I - I I I ,

(Sall) (Smal) 800 bp

FIG. 1. Subclones used in the determination of the DNA sequences of E2, E4, and E5. The T. fusca DNA that was sequenced is shownon the top line of each section; subclones, drawn below each main line, contain arrows to indicate the direction in which the DNA wassequenced. T. fusca DNA was subcloned into M13mpl8 and M13mpl9 phage vectors. pBR322 DNA is indicated by dashed lines.

potassium phosphate (pH 6.5) and applied to a 350-mlSephadex G-150 column (height, 90 cm) which was equili-brated and eluted with the same buffer. The active fractionswere analyzed by sodium dodecyl sulfate (SDS) gel electro-phoresis. Fractions of the highest purity were combined,precipitated with 80% (NH4)2SO4, and dissolved in 1 ml ofH20. The proteins were then precipitated with 10% tri-chloroacetic acid. The trichloroacetic acid was removedfrom the precipitate by washing with ice-cold acetone, andthe pellet was immediately dissolved in 200 ,ul of SDSloading dye containing 0.1 M Tris base. The sample waselectrophoresed on a 10% polyacrylamide gel and stainedwith Coomassie blue. The 55-kDa band was excised from thegel and electroeluted as described by Hunkapiller et al. (16)using half-strength elution buffer. The purified protein gave asingle band on SDS gel electrophoresis. Antibody raisedagainst this purified protein in a rabbit reacted strongly withT. fusca E4 protein and did not react with the other purifiedT. fusca enzymes (E1, E2, E3, E5) when tested on a Westernimmunoblot gel (32). Proteins that reacted with the E4antibody were detected by the alkaline phosphatase kit fromBio-Rad Laboratories.DNA sequencing. DNA was sequenced by the dideoxy-

chain termination method (30) using modified T7 DNApolymerase. A commercially available kit (Sequenase;United States Biochemical Corp., Cleveland, Ohio) was

used to sequence single-stranded templates. Both dGTP anddITP labeling mixes were used to resolve areas of secondarystructure that produced compressions in sequencing gels.Because the DNA of T. fusca has a high G+C content, bandcompressions occurred frequently. Formamide was added tothe 6% polyacrylamide sequencing gels (to 20%, vol/vol) toreduce problems arising from secondary structure in thetemplates. Synthetic primers, usually 15 bases, were ob-tained from the Oligonucleotide Synthesis Facility, CornellUniversity. The subclones used for sequencing each geneare shown in Fig. 1.Amino acid compositions. Amino acid compositions were

determined on E2 and E5 that had been purified from thesupernatants of S. lividans strains containing the appropriateplasmids. Samples were hydrolyzed in 6 N HCl for 95 min at150°C, and the hydrolysates were fractionated on a WatersPico Tag analyzer. The N-terminal sequences of E2, E4, andE5 have been reported previously (34). The N-terminalsequence of the cellulase purified from an S. albus straincarrying the cloned E4 gene was determined on an AppliedBiosystems 470A gas-phase protein sequencer and wasidentical to the published sequence of E4 purified from T.fusca except that the order of the residues in positions 5 and6 and in positions 8 and 9 was reversed. The DNA sequencedetermined for the E4 clone predicts the amino acid se-quence found in the protein purified from S. albus.

J. BACTERIOL.

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 3: Thermomonospora fusca

.Accl /Cl a1 ACIBC ACGATATGGATGATCTGACGTCTGAATCCCCTTGTCACC 60

61 CTAGACATTCACCCATTTTGTCGCTTTTACGGCTTTCTTTGGGAGTTCTCCGTTTCACCA 120

121 AGGAACAAAACCGCAACGGAGAGTAGGCGCGGTCTTTACAGCTCCCTTGCCAATGGTTAT 180

181 CGTCCGAACGGAAAACGATCTCCGAGCCCTCCCAGCCATGCGCTCCTCTTCGTGCCCCTC 240

241 ACTTCTTTTGAGCCTTGTGCTCGTTAGGAGCCCCGAATGTCCCCCAGACCTCTTCGCGCT 300MetSerProArgProLeuArgAla

301 CTTCTGGGCGCCGCGGCGGCGGCCTTGGTCAGCGCGGCTGCTCTGGCCTTCCCGTCGCAA 360LeuLeuGlyAlaAlaAlaAlaAlaLeuValSerAlaAlaAlaLeuAlaPheProSerGln

361 GCGGCGGCCAATGATTCTCCGTTCTACGTCAACCCCAACATGTCCTCCGCCGAATGGGTG 420AlaAlaAlaAsnAsoSerProPheTvrValAsnProAsnMetSerSerAlaGl uTrpVal

421 CGGAACAACCCCAACGACCCGCGTACCCCGGTAATCCGCGACCGGATCGCCAGCGTGCCG 480ArgAsnAsnProAsnAspProArgThrProValIleArgAspArgIleAlaSerValPro

Sall481 CAGGGCACCTGGTTCGCCCACCACAACCCCGGGCAGATCACCGGCCAGGTCGACGCGCTC 540

GlnGlyThrTrpPheAlaHisHisAsnProGlyGlnIleThrGlyGlnValAspAlaLeu

541 ATGAGCGCCGCCCAGGCCGCCGGCAAGATCCCGATCCTGGTCGTGTACAACGCCCCGGGC 600MetSerAlaAlaGlnAlaAlaGlyLysIleProIleLeuValValTyrAsnAlaProGly

601 CGCGACTGCGGCAACCACAGCAGCGGCGGCGCCCCCAGTCACAGCGCCTACCGGTCCTGG 660ArgAspCysGlyAsnHisSerSerGlyGlyAlaProSerHisSerAlaTyrArgSerTrp

661 ATCGACGAATTCGCTGCCGGACTGAAGAACCGTCCCGCCTACATCATCGTCGAACCGGAC 720IleAspGluPheAlaAlaGlyLeuLysAsnArgProAlaTyrIleIleValGluProAsp

721 CTGATCTCGCTGATGTCGAGCTGCATGCAGCACGTCCAGCAGGAAGTCCTGGAGACGATG 780LeuIleSerLeuMetSerSerCysMetGlnHisValGlnGlnGluValLeuGluThrMet

aphI781 GCGTACGCGGGCAAGGCCCTCAAGGCCGGGTCCTCGCAGGCGCGGATCTACTTCGACGCC 840

AlaTyrAlaGlyLysAlaLeuLysAlaGlySerSerGlnAlaArgIleTyrPheAspAla

841 GGCCACTCCGCGTCGGACTCGCCCCAGCAGATGGCTTCCTGGCTCCAGCAGGCCGACATC 900GlyHisSerAlaSerAspSerProGlnGlnMetAlaSerTrpLeuGlnGlnAlaAspIle

901 TCCAACAGCGCGCACGGTATCGCCACCAACACCTCCAACTACCGGTGGACCGCTGACGAG 960SerAsnSerAlaHisGlyIleAlaThrAsnThrSerAsnTyrArgTrpThrAlaAspGlu

961 GTCGCCTACGCCAAGGCGGTGCTCTCGGCCATCGGCAACCCGTCCCTGCGCGCGGTCATC 1020ValAlaTyrAlaLysAlaValLeuSerAlaIleGlyAsnProSerLeuArgAlaValIle

1021 GACACCAGCCGCAACGGCAACGGCCCCGCCGGTAACGAGTGGTGCGACCCCAGCGGACGC 1080AspThrSerArgAsnGlyAsnGlyProAlaGlyAsnGluTrpCysAspProSerGlyArg

1081 GCCATCGGCACGCCCAGCACCACCAACACCGGCGACCCGATGATCGACGCCTTCCTGTGG 1140AlaIleGlyThrProSerThrThrAsnThrGlyAspProMetIleAspAlaPheLeuTrp

1141 ATCAAGCTGCCGGGTGAGGCCGACGGCTGCATCGCCGGCGCCGGCCAGTTCGTCCCGCAG 1200IleLysLeuProGlyGluAlaAspGlyCysIleAlaGlyAlaGlyGlnPheValProGln

1201 GCGGCCTACGAGATGGCGATCGCCGCGGGCGGGCACCAACCCCAACCCGAACCCCAACCC 1260AlaAlaTyrGluMetAlaIleAlaAlaGlyGlyHisGlnProGlnProGluProGlnPro

1261 GACGCCCACCCCCACTCCGACCCCCACGCCGCCTCCCGGCTCCTCGGGGGCGTGCACGGC 1320AspAlaHisProHisSerAspProHisAlaAlaSerArgLeuLeuGlyGlyValHisGly

1321 GACGTACACGATGCCAACGAGTGGAACGACGGCTTCCAGGCGACCGTGACGGTCACCGCG 1380AspValHisAspAlaAsnGluTrpAsnAspGlyPheGlnAlaThrValThrValThrAla

1381 AACCAGAACATCACCGGCTGGACCGTGACGTGGACCTTCACCGACGGCCAGACCATCACC 1440AsnGlnAsnIleThrGlyTrpThrValThrTrpThrPheThrAspGlyGlnThrIleThr

1441 AACGCCTGGAACGCCGACGTGTCCACCAGCGGCTCCTCGGTGACCGCGCGGAACGTCGGC 1500AsnAlaTrpAsnAlaAspValSerThrSerGlySerSerValThrAlaArgAsnValGly

1501 CACAACGGAACGCTCTCCCAGGGAGCCTCCACAGAGTTCGGCTTCGTCGGGCTCTAAGGG 1560HisAsnGlyThrLeuSerGlnGlyAlaSerThrGluPheGlyPheValGlyLeuEnd

1561 CAACTCCAACTCTGTTCCGACCCTTACCTG CGCCAGGGGATCCTCTAGAGTCGACC 1620

1621 CCA&CrAMrC 1630

FIG. 2. DNA sequence and corresponding protein sequence of cellulase E2. A 14-base sequence, present in the regulatory binding site ofthe Es gene of T. fusca, is underlined. It is located 61 nucleotides upstream from the translational start codon. The N-terminal amino acidsequence (underlined) is preceded by a 31-amino-acid signal sequence. The vector sequence (M13mpl8) is italicized and underlined.

3399

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 4: Thermomonospora fusca

3400 LAO ET AL.

Computer analysis of DNA sequences. DNA Inspector lIe(TEXTCO, West Lebanon, N.H.) was used to determinereading frames, codon usage, and predicted molecularweights of peptides. The WORDSEARCH program (Genet-ics Computer Group, University of Wisconsin) (6), whichuses the search method of Wilbur and Lipman (33a), wasused to identify protein sequences that showed potentialhomology with E2, E4, and E5. Cellulase proteins thatappeared to be similar to the T. fusca cellulases were initiallyaligned by using the BESTFIT program (Genetics ComputerGroup). Protein pairs that showed the greatest degree ofsimilarity were subjected to a statistical analysis of thealignment, using the Statcom2 program (Kansas State Uni-versity) (28). The purpose of the statistical analysis was todetermine whether the best alignment of two proteins wassignificantly better than that which could be obtained fromaligning random -sequences generated from the chosen pro-teins. Using 100 randomizations as a basis of comparison, anoptimum number of gaps was determined for a pair ofproteins. The number of gaps that gave the highest score inthe statistical analysis was then used in conjunction with anMDM78 scoring matrix to generate the best alignment of thetwo protein sequences.

RESULTS

Protein size and composition. The DNA sequences deter-mined for E2, E4, and E5 (Fig. 2 to 4) were analyzed, and areadi'ng 'fratne that coded for the N-terminal sequence of thepurified protein was located in each gene. The amino acidcompositions predicted by the selected open reading frameswere in good agreement with the experimentally determinedamino acid compositions (Table 1). Furthermore, the loca-tions of the start and stop codons in the E2 and E5 sequencesdefiied regions coding for proteins with molecular weightssinilar to those determined by SDS gel electrophoresis (34).The large difference between the predicted and experimentalmolecular weights of E4 shows that only a portion of the E4gene is contained in pGG76. The fact that no stop codon wasfound in the E4 insert of pGG91 is further evidence that onlya part of the gene was cloned. A stop codon in pBR327probably is used during translation of the E4 clone in E. coli.

Additional confirmation of the E5 gene sequence comesfrom comparisons with the peptides generated by CNBrcleavage (23a). Cyanogen bromide cleavage of E5 generatedpeptides with molecular weights of 15,500 and 11,000 and abroad band (presumably a doublet) at 5,000. This correlateswell with 'the peptides with molecular weights of 15,500,11,100, 5,100, and 4,700 predicted from the positions of themethionine codons in the E5 gene DNA sequence.G+C content. The DNA sequencing gels revealed a G+C

content of- approximately 65% in the cloned DNA. A highG+C content in the third position would be expected for thecorrect reading frame of each cellulase, and computer anal-ysis of the G+C content in the third positions of the codonswas used to determine the correct reading frame. Thereading frames that code for E2 and E5 were found to haveG+C contents of 93% in the third positions of the codons,while the reading frame of E4 had 92% G+C. The DNA

sequences of the E2, E4, and E5 inserts had G+C contents of68, 67, and 67%, respectively. This is very close to the valuefor the chromosomal DNA of a related organism, Thermo-monospora curvata (27), which has a G+C content of 67%.

Signal sequences and ribosome binding sites. The regioncoding for the N terminus of each cellulase is preceded by aregion characteristic of a signal sequence. The typical rangeof actinomycete signal sequence lengths is 24 to 56 residues,and they usually contain multiple Arg residues preceding thehydrophobic core (17). The lengths of the signal sequencesfor E2, E4, and E5 are 31, 46, and 36 amino acids, respec-tively. The signal sequences contain all the features charac-teristic of such regions, including a long hydrophobic region,a leader peptidase cleavage site, and positively chargedamino acids near the N terminus. Arginine is found inresidues 4 and 7 of E2 and in residues 9 to 11, 13, 16, 18, and19 of E4, while E5 has lysine in residues 3 and 9 and anarginine in residue 8 (Fig. 2 to 4). The amino acid sequencesof the three signal peptides were all different from eachother. Each gene is preceded by a potential ribosome bindingsite showing strong homology to the 3' end of S. lividans 16SRNA. For E2, the sequence GGAG, which is 6 bases beforethe initiating AUG, is perfectly complementary to the 3' 16SRNA sequence, while for E4, the complementary sequenceTGGAG is 5 bases before the AUG and for E5 the comple-mentary sequence TGGAGGAA starts 2 bases before theAUG. We found that E5 is expressed in S. lividans at a levelthat is 10 times higher than that of E2, but at this time we donot know whether this is due to differences in their promot-ers, their ribosome binding sites, or both.

Sequence homology among T. fusca celiulases E2, E4, andEs. All three DNA sequences contain an A+T-rich region,characteristic of promoter sequences, that precedes thetranslation initiation site. This region has been shown tocontain three closely linked promoters in the E5 gene by S1mapping of mRNA isolated from T. fusca (22). It is likelythat the promoter sequences of E2 and E4 are also locatedwithin these regions, but they do not show homology witheach other or the E5 promoters. Within the A+T-rich regionof each gene is an identical 14-bp inverted repeat sequence,underlined in Fig. 2 to 4. In the E5 gene, this region is thebinding site for a protein that is involved in the induction ofE5 gene transcription (21). Gel retardation assays haveshown that a protein, also present in T. fusca celtextracts,binds to-both the E2 and E4 genes (18a). The binding regionis about 50 bp downstream of the putative promoters of E5(21). A downstream location for a binding site is unusual, asmost regulatory proteins bind upstream of the transcriptioninitiation site. This binding site was not found to be homol-ogous to any known regulatory sequences.A computer-assisted homology search indicated no other

areas of homology between the three T. fusca DNA se-quences. However, computer analysis of the E2 and E5amino acid sequences revealed that the C-terminal region ofE2 an'd'the N-terminal region of E5 show some similarities.Both regions show homology to the cellulose binding do-mains that have been identified in endoglucanase A and theexoglucanase from Cellulomonasfimi (11). For E5, there was40% identity in a 94-amino-acid overlap with the endogluca-

FIG. 3. DNA sequence and corresponding protein sequence of cellulase E4. A 14-base sequence, almost identical to the putativeregulatory protein binding site in E5, is underlined. The translational start codon is located 35 nucleotides downstream from this 14-basesequence. The N-terminal amino acid sequence of the protein (underlined) is preceded by a 46-amino-acid signal peptide. The vector sequence(pBR322) is italicized and underlined.

J. BACTERIOL.

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 5: Thermomonospora fusca

* * P D Pl . * .1 GATCTCCCCCGCTGCTCCGTGTCAGCGCACGCGAA 60

61 GGGCGACTACTTAAGGTTCACCCATTGAACGCAATCGGTCACCACGTCGCACCATGCTGG 120

121 AGACGTCACTCACAGAGAGCCAACACCAAACTGGo rACACCGGTGTGTTCG 180

181 CACAATCCCCCTGGAGACCCCATGTCCGTCACTGAACCTCCTCCCCGCCGACGCGGCCGT 240MetSerValThrGluProProProArgArgArqGlyArg

241 CACAGCCGGGCGCGCCGCTTCCTCACTTCCCTGGGAGCTACCGCGGCCCTCACCGCGGGC 300HisSerArgAlaArgArgPheLeuThrSerLeuGlyAlaThrAlaAlaLeuThrAlaGly

301 ATGCTGGGCGTCCCCTTGGCCACGGGAACCGCCCACGCCGAACCGGCGTTCAACTACGCC 360MetLeuGlyValProLeuAlaThrGlyThrAlaHi2AlaCluuPro aPheAsnTYrAlm

361 GAAGCCCTCCAGAAGTCGATGTTCTTCTACGAGGCCCAACGCTCCGGGAAACTCCCGGAG 420CluAlaLeuGlnLysSerMetPh.PheTyrCluAlaGln SerGlyLysLeuProGlu

421 AACAACCGGGTCTCCTGGCGCGGCGACTCCGGGCTCAACGACGGCGCGGACGTGGGACTC 480AsnAsnArgValSerTrpArgGlyAspSerGlyLeuAsnAspGlyAlaAspValGlyLeu

481 GACCTCACCGGCGGCTGGTACGACGCCGGCGACCACGTGAAATTCGGCTTCCCCATGGCC 540AspLeuThrGlyGlyTrpTyrAspAlaGlyAspHisValLysPheGlyPheProMetAla

541 TTCACCGCGACCATGCTCGCCTGGGGCGCATCGAAGCCCGGAAGGCTACATCGCTCCGGC 600PheThrAlaThrMetLeuAlaTrpGlyAlaSerLysProGlyArgLeuHisArgSerGly

601 CAGATGCCCTACCTCAAGGACAACCTGCGCTGGGTCAACGACTACTTCATCAAAGCCCAC 660GlnMetProTyrLeuLysAspAsnLeuArgTrpValAsnAspTyrPheIleLysAlaHis

661 CCCTCGCCCAACGTGCTGTACGTGCAGGTCGGCGACGGCGACGCCGACCACAAGTGGTGG 720ProSerProAsnValLeuTyrValGlnValGlyAspGlyAspAlaAspHisLysTrpTrp

721 GGTCCGGCCGAAGTCATGCCGATGGAGCGGCCCAGCTTCAAAGTGGACCCCTCCTGCCCG 780GlyProAlaGluValMetProMetGluArgProSerPheLysValAspProSerCysProAval .....

781 GGCAGCGACGTCGCAGCCGAAACCGCCGCGGCCATGGCCGCGTCCTCCATCGTGTTCGCC 840GlySerAspValAlaAlaGluThrAlaAlaAlaMetAlaAlaSerSerIleValPheAla

841 GACGACGACCCTGCGTACGCGGCCACCCTCGTGCAGCACGCCAAGCAGCTCTACACGTTC 900AspAspAspProAlaTyrAlaAlaThrLeuValGlnHisAlaLysGlnLeuTyrThrPhe

901 GCCGACACCTACCGCGGCGTGTACTCCGACTGCGTGCCGCCGGAGCGTTCTACAACTCCT 960AlaAspThrTyrArgGlyValTyrSerAspCysValProProGluArgSerThrThrPro

961 GGTCGGGCTACCAGGACGAGCTCGTCTGGGGCGCCTACTGGCTGTACAAGGCCACCGGGG 1020GlyArgAlaThrArgThrSerSerSerGlyAlaProThrGlyCysThrArgProProGly

1021 ACGACTCCTACTTGGCGAAGGCCGAGTACGAGTACGACTTCCTCTCCACCGAGCAGCAGA 1080ThrThrProThrTrpArgArgProSerThrSerThrThrSerSerProProSerSerArg

1081 CCGACCTCCGCAGCTACCGGTGACCATCGCCTGGACGACAAGTCCTACGGCACCTACGTG 1140ProThrSerAlaAlaThrGlyAspHisArgLeuAspAspLysSerTyrGlyThrTyrVal

1141 CTGCTCGCCAAGGAAACCGGCAAGCAAAAATACATCGACGACGCCAACCGGTGGCTCGAC 1200LeuLeuAlaLysGluThrGlyLysGlnLysTyrIleAspAspAlaAsnArgTrpLeuAsp

1201 TACTGGACGGTCGGCGTCAACGGCCAGCGCGTGCCCTACTCCCCCGGCGGGATGGCTGTG 1260TyrTrpThrValGlyValAsnGlyGlnArgValProTyrSerProGlyGlyMetAlaVal

1261 CTCGACACCTGGGGAGCCCTGCGCTACGCCGCTAACACCGCGTTCGTCGCCCTCGTCTAC 1320LeuAspThrTrpGlyAlaLeuArgTyrAlaAlaAsnThrAlaPheValAlaLeuValTyr

1321 GCCAAGGTGATCGACGACCCCGTCCGCAAGCAGCGGTACCACGACTTCGCGGTGCGGCAG 1380AlaLysValIleAspAspProValArgLysGlnArgTyrHisAspPheAlaValArgGln

1381 ATCAACTACGCGCTCGGCGACAACCCGCGGAACTCCAGCTACGTGGTGGGCTTCGGCAAC 1440IleAsnTyrAlaLeuGlyAspAsnProArgAsnSerSerTyrValValGlyPheGlyAsn

1441 AACCCGCCGCGCAACCCCCACCACCGCACCGCGCACGGGTCGTGGACCGACAGCATCGCC 1500AsnProProArgAsnProHisHisArgThrAlaHisGlySerTrpThrAspSerIleAla

1501 TCGCCCGCGGAGAACCGGCACGTCCTCTACGGCGCCCTCGTCGGCGGTCCCGGCTCCCCG 1560SerProAlaGluAsnArgHisValLeuTyrGlyAlaLeuValGlyGlyProGlySerPro

1561 AACGACGCCTACACCGACGACCGGCAGGACTACGTCGCCAACGAAGTCGCCACCGACTAC 1620AsnAspAlaTyrThrAspAspArgGlnAspTyrValAlaAsnGluValAlaThrAspTyr

1621 AACGCCGGATTCTCCAGCGCGCTGGCCATGCTGGTCGAAGAGTACGGCGGCACCCCGCTG 1680AsnAlaGlyPheSerSerAlaLeuAlaMetLeuValGluGluTyrGlyGlyThrProLeu

1681 GCGGACTTCCCGCCCACCGAGGAGCCCGACGGACCGGAGATCC TCTACGCCGACGCATC 1740AlaAspPheProProThrGluGluProAspGlyProGluIle

1741 TZrQQQErCA 1753

3401

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 6: Thermomonospora fusca

3402 LAO ET AL.

1

61

121

181

241

301

361

421

481

541

601

EcoRlTATGACCATGATTACGAATTCCAGGATGGTCGTGTGGACTGCGGCCAAGCGGGGGTCGGT

CGACGTGCTCGGGGGTGATCCCGTGGACGCGATCTTCTCCTGCTCGAACTCCTCGACGGG

GGCCGGGGGACGGACCAGGGTGGTGTAGCGGTCGACGATCCGCCCGTCGGTGACTCGCAC

GAGTCCCACCGCGCACACGCTGCCCTGGTTGCGGTTGGCGGTCTCGAAGTCGATAGCGGT

CCAGGTGTGCTCGATCGCCGGTCGTCCTCCCTTGGGCCGCACTGGCCAGGCGTCGGGGTT

CCTCCCATGCTCCCGGAACTTCACCTCATCCAGCACCCCTCCTTCTTTGAGTGACCTAGA* * . 1i 12 . 13

TCACTTATTGCTCTTGACTTTGTGCTACGGTCTCACTACATAACGAGAACGGCGCATTCC

TCCTTCCCCGATTCTGTCACGGAATGCGCATCCCTATAGGGAGCGCTCCCATAGCCGTC

TTCCTCCCCTCTCCCCTTGGAGGAACCATGGCGAAATCCCCCGCCGCCCGGAAGGGCXGCMetAlaLysSerProAlaAlaArgLysGlyXxx

CCTCCGGTCGCTGTCGCGGTGACCGCGGCCCTCGCCCTGCTGATCGCGCTCCTCTCCCCCProProValAlaValAlaValThrAlaAlaLeuAlaLeuLeuIleAlaLeuLeuSerPro

GGAGTCGCGCAGGCCGCCGGTCTCACCGCCACAGTCACCAAAGAATCCTCGTGGGACAACGlyValAlaGlnAlaAlaGlyLeuThrAlaThrValThrLysGluSerSerTrpAspAsn

.Xhol661 GGCTACTCCGCGTCCGTCACCGTCCGCAACGACACCTCGAGCACCGTCTCCCAGTGGGAG

GlyTyrSerAlaSerValThrValArgAsnAspThrSerSerThrValSerGlnTrpGlu

721 GTCGTCCTCACCCTGCCCGGCGGCACTACAGTGGCCCAGGTGTGGAACGCCCAGCACACCValValLeuThrLeuProGlyGlyThrThrValAlaGlnValTrpAsnAlaGlnHisThr

781 AGCAGCGGCAACTCCCACACCTTCACCGGGGTTTCCTGGAACAGCACCATCCCGCCCGGASerSerGlyAsnSerHisThrPheThrGlyValSerTrpAsnSerThrIleProProGly

841 GGCACCGCCTCTTCCGGCTTCATCGCTTCCGGCAGCGGCGAACCCACCCACTGCACCATCGlyThrAlaSerSerGlyPheIleAlaSerGlySerGlyGluProThrHisCysThrIle

901 AACGGCGCCCCCTGCGACGAAGGCTCCGAGCCGGGCGGCCCCGGCGGTCCCGGAACCCCCAsnGlyAlaProCysAspGluGlySerGluProGlyGlyProGlyGlyProGlyThrPro

961 TCCCCCGACCCCGGCACGCAGCCCGGCACCGGCACCCCGGTCGAGCGGTACGGCAAAGTCSerProAspProGlyThrGlnProGlyThrGlyThrProValGluArgTyrGlyLysVal

1021 CAGGTCTGCGGCACCCAGCTCTGCGACGAGCACGGCAACCCGGTCCAACTGCGCGGCATGGlnValCysGlyThrGlnLeuCysAspGluHisGlyAsnProValGlnLeuArgGlyMet

1081

1141

AGCACCCACGGCATCCAGTGGTTCGACCACTGCCTGACCGACAGCTCGCTGGACGCCCTGSerThrHisGlyIleGlnTrpPheAspHisCysLeuThrAspSerSerLeuAspAlaLeu

AlaTyrAspTrpLysAlaAspIleIleArgLeuSerMetTyrIleGlnGluAspGlyTyr

1201 GAGACCAACCCGCGCGGCTTCACCGACCGGATCGACCAGCTCATCGACATGGCCACGGCGGluThrAsnProArgGlyPheThrAspArgIleAspGlnLeuIleAspMetAlaThrAla

1261 CGCGGCCTGTACGTGATCGTGGACTGGCACATCCTCACCCCGGGCGATCCCCACTACAACArgGlyLeuTyrValIleValAspTrpHisIleLeuThrProGlyAspProHisTyrAsn

1321 CTGGACCGGGCCAAGACCTTCTTCGCGGAAATCGCCCAGCGCCACGCCAGCAAGACCAACLeuAspArgAlaLysThrPhePheAlaGluIleAlaGlnArgHisAlaSerLysThrAsn

60

120

180

240

300

360

420

480

540

600

660

720

780

840

900

960

1020

1080

1140

1200

1260

1320

1380

FIG. 4. DNA sequence and corresponding protein sequence of cellulase E5. The 5' ends of the three mRNAs, indicated by scroll marks,are located near the end of an A+T-rich region that begins at nucleotide 343. A 14-base sequence (underlined) located 35 nucleotides upstreamfrom the start codon has been identified as the binding site for a regulatory protein (18a, 21). The N-terminal end of the mature protein isunderlined. The vector DNA (M13) is italicized and underlined.

nase and there was 32% identity in a 115-amino-acid overlapwith the exoglucanase. For E2, there was 37% identity in an82-amino-acid overlap with the endoglucanase and 35%identity in a 123-amino-acid overlap with the exoglucanase(11). Both regions also show homology to a region at the Cterminus of a Pseudomonas fluorescens carboxymethyl cel-lulase, with E2 showing 40% identity to a 75-amino-acidsequence and E5 showing 32% identity to a 90-amino-acidsequence that includes the 75 amino acids showing homol-

ogy to E2 (14). Furthermore, a proteolytic derivative of E2that is missing about 110 residues from its C-terminal enddoes not bind to cellulose (10). No significant homology wasobserved when the protein sequences of E2, E4, and E5 werealigned with one another and subjected to statistical analy-S1S.Sequence homology with other cellulases. The DNA se-

quences of E2, E4, and E5 were compared with the se-quences of cellulase genes in the computer data base Gen-

J. BACTERIOL.

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 7: Thermomonospora fusca

ENDOGLUCANASE GENES FROM THERMOMONOSPORA FUSCA 3403

1381 GTGCTCTACGAGATCGCCAACGAACCCAACGGAGTGAGCTGGGCCTCCATCAAGAGCTACValLeuTyrGluIleAlaAsnGluProAsnGlyValSerTrpAlaSerIleLysSerTyr

1441 GCCGAAGAGGTCATCCCGGTGATCCGCCAGCGCGACCCCGACTCGGTGATCATCGTGGGCAlaGluGluValIleProValIleArgGlnArgAspProAspSerValIleIleValGly

1501 ACCCGCGGCTGGTCGTCGCTCGGCGTCTCCGAAGGCTCCGGCCCCGCCGAGATCGCGGCCThrArgGlyTrpSerSerLeuGlyValSerGluGlySerGlyProAlaGluIleAlaAla

1561 AACCCGGTCAACGCCTCCAACATCATGTACGCCTTCCACTTCTACGCGGCCTCGCACCGCAsnProValAsnAlaSerAsnIleMetTyrAlaPheHisPheTyrAlaAlaSerHisArg

*~~~~~~~~~~~1t* 1

1621AspAsnTyrLeuAsnAlaLeuArgGluAlaSerGluLeuPheProValPheValThrGlu

1681 TTCGGCACCGAGACCTACACCGGTGACGGCGCCAACGACTTCCAGATGGCCGACCGCTACPheGlyThrGluThrTyrThrGlyAspGlyAlaAsnAspPheGlnMetAlaAspArgTyr

1741 ATCGACCTGATGGCGGAACGGAAGATCGGGTGGACCAAGTGGAACTACTCGGACGACTTCIleAspLeuMetAlaGluArgLysIleGlyTrpThrLysTrpAsnTyrSerAspAspPhe

1801 CGTTCCGGCGCGGTCTTCCAGCCGGGCACCTGCGCGTCCGGCGGCCCGTGGAGCGGTTCGArgSerGlyAlaValPheGlnProGlyThrCysAlaSerGlyGlyProTrpSerGlySer

1861 TCGCTGAAGGCGTCCGGACAGTGGGTGCGGAGCAAGCTCCAGTCCTGATGTGCAGCGCGGSerLeuLysAlaSerGlyGlnTrpValArgSerLysLeuGlnSerEnd

1921 TGGCTGCGCGCCCCTGAGCACAGCAAGCAGCCCGCGAGGCTCAGCCCCGCGGGCCGCTTC

1981

2041

2101

TCTGTCCTTCCTCCCCCGTL1GGGATGAGCAuGGA ATCAGCGAGCGGTGCAGCTCAG

CGTCAGCGAGCTAGGCACGTCGCCGAGCCGATGAAGCCGAACGTGGCGGACTGTCCCGCA

CCCAGGGGGATCCTCTACAGTCGACCTGCAGGCATGCAA 2139

FIG. 4-Continued.

Bank. No notable similarities were seen in the codingregions for these structural genes, with the exception ofstriking homology between the E2 gene and the DNA se-quence of a cellulase gene (celA) of Microbispora bispora

TABLE 1. Predicted and experimentally determined amino acidcompositions for E2 and E5a

Composition (mol%)Amino E Eacid E5

Analysis Prediction Analysis Prediction

Ala 13.8 13 8.8 8Arg 3.4 4 4.0 5Asx 11.9 14 10.5 10Cys 1.1 1 1.0 1Gly 9.9 9 8.7 4Glx 10.8 9 11.7 16His 2.4 3 2.9 3Ile 5.8 5 3.9 5Leu 3.9 4 3.9 5Lys 1.7 2 2.4 2Met 2.0 3 1.7 1Pro 8.5 7 6.7 6Phe 2.7 2 3.6 3Ser 8.4 10 10.6 10Thr 2.4 2 8.8 8Tyr 2.4 2 3.8 3Val 6.0 6 5.8 6Trp 2 3

a The predicted values were obtained from the DNA sequences of themature proteins. Standard procedures for amino acid analysis were used toidentify amino acids in proteins isolated from culture supernatants of T.fusca.Tryptophan was destroyed during the amino acid analysis.

(6a). The two sequences were 73% identical when analyzedby the BESTFIT computer program. The E5 promoterregion was not homologous to the promoter sequences of thecellulases of Cellulomonas fimi (35) or Clostridium thermo-cellum (2).Computer analysis was used to determine the degree of

relatedness between the amino acid sequences of the T.fusca cellulases and cellulases from other organisms. Proteinsequences were considered to be significantly related whenthe score of the optimally aligned matrix for each pair ofsequences was at least 4 standard deviations greater than themean score for sequences randomly generated from the pair.This statistical analysis revealed a striking similarity be-tween the protein sequences of E2 and a cellulase from M.bispora and one from Cellulomonas fimi (35). The compari-son matrix score for E2IM. bispora was 67 standard devia-tions above the mean and that for E2/Cellulomonas fimi was17 standard deviations above the mean. The E5 proteinsequence was observed to be similar to cellulases fromBacillus subtilis (25, 29) and several Bacillus spp. (7-9). ForE5/Bacillus sp., the comparison matrix score was 27 stan-dard deviations above the mean.Optimum alignment of the E2 and Cellulomonasfimi cenA

protein sequences, using nine gaps, resulted in 108 of the 279aligned positions (39%) being conserved. A total of 217positions (78%) were either identical or similar in bothsequences. Sequences are generally considered to be evolu-tionarily related when at least 25% identity is found. Evengreater identity was observed for aligned positions of E2 andthe celA endoglucanase from M. bispora, in which 54% of allaligned positions were conserved and 82% were identical orsimilar. When all three cellulases were aligned, the percent

1440

1500

1560

1620

1680

1740

1800

1860

1920

1980

2040

2100

VOL. 173, 1991

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 8: Thermomonospora fusca

3404 LAO ET AL.

A: 13 AAAALVSAAALAFPSQAAANDSPFYVNPNMSSAE-WVRNNPNDPRTPVIRB: 16 TAGVGAIVTAIASAGPAHAYDSPFYVDPQSNAAK-WVAANPNDPRTPVIRC :151 T P T P T P T P T P T P T P T V T P Q P T S G F Y V D P T T Q G Y R A W Q A A S G T D - - - K A L L

+ + + + * ***+ * + + +++ * ++ + +* + +

A :63 D R I A S V P Q G T W F A H H - N P G Q I T G Q V D A L M S A A Q A A G K I P I L V V Y N A P G R DB : 66 DRIAAVPTGRWFANY-NPSTVRAEVDAYVGAAAAAGKIPIMVVYAMPNRDC :201 EKIALTPQAYWVGNWADASHAQAEVADYTGRAVAAGKTPMLVVYAIPGRD

*+ +**_+ **++_*_++++* ++*+ +*+* + ++ + ++_*+_** * *+*+ + ***++* +**A :113 C G N H S S G G A P S H S A Y R S W I D E F A A G L K N R P A Y I I V E P D L I S L M S S C M Q H VB :116 C G G P S A G G A P N H T A Y R A W I D E I A A G L R N R P A V I I L E P D A L P I M T N C M S P S

B :166 EQAEVQASMAYAGKKFKAASSQAKVYFDAGHDAWVPADEMASRLRGADIAC :301 GQ G D R V GF L K Y AG V VL TLKG- A RVY I DAG H A KW L S V D T P V NRL N QV G F E

A :213 N S A H G I A T N T SNY RW T AD E VAY A KAV AG SA S L R A V I DT SR N G NGPA G

B :2166 ASSDGIALN S YYTSGGLKF IAASYSQA KSVLYSDAIGHDASWHVLPAVIDTEMAS RNLRGNAGDPI

B :216 N S A D G I A L N V S N Y R Y T S G L I S Y A K S V L S A I G A S H L R A V I D T S R N G N G P L GC :351 Y A V - G F A L N T S N Y Q T T A D S K A Y G Q Q I S Q R L G G K K F--V I D T SR N G N G S N G

A :263 N E W C D P S G R A I G T P S T T N T G D P M I D A F L W I K L P G E A D G C I A G A G Q F V P Q AB :266 S E W C D P P G R A T G T W S T T D T G D P A I D A F L W I K P P G E A D G C I A T P G V F V P D RC :401 - E W C N P R G R A L G E R P V A V N D G S G L D A L L W V K L P G E S D G A C - - - - - - - - - -

FIG. 5. Comparison of the protein sequences of cellulases from T. fusca, M. bispora (6a), and Cellulomonasfimi (35). Line A, E2 fromT. fusca- line B, CelA from M. bispora; line C, CenA from Cellulomonasfimi. Identical matches are indicated by *; chemically similar aminoacids by +, and dissimilar residues by -. Of 258 aligned positions, 33%t are conserved in all family members, and 84% are identical or similar.Numbering of the amino acids is based on sequences that contain a signal peptide.

conserved identity decreased to 33%, but a high degree ofsimilarity (86%) was maintained. The most homologoussections of the aligned sequences are shown in Fig. 5. Thisanalysis shows that E2 belongs to family B of Henrissat et al.(15). This family also includes casA from an alkophilicStreptomyces strain (24) and cellobiohydrolase II from Tri-choderma reesei (4, 31). Two strongly conserved regionscorresponding to amino acids 218 to 228 and 315 to 324 in E2are shown in Fig. 6. In these regions, at least one negativelycharged amino acid is conserved in all sequences. Thenegatively charged amino acids aspartic acid and glutamic

1. l22. raJA3. £a.A

4. CadaA5. CBHL.II

VVYNAPGRDCGVVYAMPNRDCGVVYAIPGRDCGVPYMIPFRDCGVVYDLPDRDCA

110 (D)247 (D)

acid are capable of stabilizing the positive charge that isproposed to be generated during cellulase-catalyzed hydro-lysis. Sites containing such conserved amino acids maycontribute to the active sites of these cellulases. CasA,endoglucanase A, and cellobiohydrolase II were comparedpreviously (15), and five potential active site residues werenoted. Only four of these are conserved when E2 and celAare added to the family.Optimum alignment of E5 and the 3-1,4-endoglucanase

from Bacillus sp. revealed that 38% of the aligned positionswere identical, while 51% of the positions were conserved or

YFDAGHSAWHYFDAGEDAWVYIDAGHAKWL

YYDVGHSAWHYLDAGNADWL

187 (D) 190 (3)318 (D) 321 (H)

IDAFLWI

IDAFLWI

LDALLWV

VDAFLWI

LDSFVWV

280 (D)411 (D)

FIG. 6. Highly conserved regions of cellulases that contain amino acid residues that may be involved in a potential active site. Theboldface residues are also conserved in other cellulases in the same family (15). Line 1, E2 from T. fusca; line 2, CelA from M. bispora (6a);line 3, endoglucanase A (EG A) from Cellulomonas fimi (35); line 4, CasA from an alkalophilic Streptomyces strain (24); line 5,cellobiohydrolase II (CBH II) from T. reesei (31). The positions of the amino acids that are conserved are given for E2 and endoglucanase A;numbering is based on sequences that include a signal peptide.

J. BACTERIOL.

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 9: Thermomonospora fusca

ENDOGLUCANASE GENES FROM THERMOMONOSPORA FUSCA 3405

A : 18 FIIGNTTAADDYSVVEEHGQB: 16 LFVGNSTSANNGSVVEQNGQC: 22 LLPSPASAAGTKTPVAKNGQD : 151 P S P D P G T Q P G T G T P V E R Y G K

L S I S N GL S I Q N GL S I K G TVQVCGT

A : 68 QFVNYESMKWLRDDWGITVF'RAAMYTSSGB: 66 QFVNYDSIKWLRDDWGITVFRAAMYTSSGC : 72 DFVNKDSLKWLRDDWGITVFRAAMYTADGD :201 HCLTDSSLDALAYDWKADIIRLSMYIQED

E L V N D R G E P V Q L K G M S S H G L Q W Y GQLVNEHGDPVQLKGMS SHGLQWYGQLVNRDGKAVQLKG I SSHGLQWYGQ L C D E H G N P V Q L R G M S T H G I Q W F D

G Y I E DG Y I E DG Y I D NG Y E T NGYIED+

P - s vP - s vP - s vP R G F* + _

KEKVKEAVEAAIKEKVKEAVEAAIKNKVKEAVEAAKTDR I DQLIDMAT

A : 118 DLGIYVIIDW H I L S D N D PB : 116 DLGIYVIIDW H I L S D N D PC : 122 ELGIYVIIDW H I L N D G N PD : 251 ARGLYVIVDW I L T P G D P

N I Y K E E A K D F F DN I Y K E EA K E FN Q N K E K A K E FH Y N L D RA K T F

E M S E L Y G D Y P NF D E MF K E MF A E I

V I Y E I AS A L Y G D Y P N V I Y E I AS S L Y G N T P N V I Y E I AA QRH AS KT NV LY EI A

A : 168 N G S D V T W D N QB : 166 N G H N V R W D S HC : 172 NG-DVNWKRDD : 301 N G V S - - - W A S

I K P Y A EI K P Y A EI K P Y A EI K S Y A E

E V I P VE V I P VE V I S VE V I P V

I R N N DI R A N DI R K N DI R Q R DIR*NN+D

A : 218 H A A D N Q L T D P N V M Y A F H F YA G T H GQ NB : 216 EAADNQLDDPNVMYAFHFYAGTHGQQC : 222 DAADDQLKDANVMYALHFYAGTHGQSD : 351 E I A A N P V N A S N I M Y A F H F Y A A S H R D N

A : 268 ATGDGGVFLDEAQVWIDFMDERN:B : 266 ATGDGGVFLDEAQVWIDFMDERN:C :272 A S GN GG V F L DQ S RE W L N Y L DS K ND :401 Y T G D G A ND FQMADRY I D L M AE R K

A :318 - W T E AE L S P SG T FV R E K IRES A TB :316 - W T A AE L S P SG A FV R E K IR ES A SC :322 - W P L TD L T A SG T FV R E N IR GT K DD :451 P W S G SS L K A SG Q WV R Q A QS----

P N N I I IP N N I V IP D N I I IP D S V I I

V G T G T W S - -VGTATWS--V G T G T W S - -

V G T R G W S S L*VGTGTWS--

G - - - G

G V S E G

Q D V HQ D V HQ D V NS G P A

L R D Q V D Y A L D Q G A A I F V S E W G T S EL R N Q V D Y A L S R G A A I F V S E W G T S AL R D K A N Y A L S K G A P I F V T E W G T S DY L N A L R - E A S E L F P V F V T E F G T E T

SWANWSSWANWSSWVNWNGWTKWN

L T H K D E S S A A L M P G A S P T G GL T H K D E S S A A L M P G A N P T G GL S D K F E S GS A L K P G A S K T G GY S D D F R S G A V F Q P G T C A S G G_ . . . . . * + + + + + * * + _ + + * *

T P P S D P T P P S D P D P G E P - - - - - - - - - -

I P P S D P T P P S D P D P G E P D P T P P S D P G ES T K D V P E T P A Q D N P T Q E K G V S V Q Y K A G-_ - _ _ _ __- __- __- __- __- __- _- __- _ _- _ _- _ _-

FIG. 7. Comparison of the protein sequences of cellulases from T. fusca, Bacillus sp., and B. subtilis. Line A, CeIA from Bacillus sp. (9);line B, CelB from Bacillus sp. (9); line C, cellulase from B. subtilis (29); line D, E. from T. fusca. Identical matches are indicated by *,chemically similar amino acids by +, and dissimilar residues by -. Of the 307 aligned positions shown, 29% were conserved in all familymembers and 79% were identical or similar. The numbering of the above sequences includes the signal peptide sequences of the genes. Aminoacids that may be involved in a potential active site are enclosed by a box. These residues are conserved in most cellulases of the Bacillusfamily, including Bacillus strain N-4 (8), CelF from Bacillus strain 1139 (7), CeIZ from Erwinia chrysanthemi (13), and a cellulase fromClostridium acetobutylicum (37). Numbering of the amino acids is based on sequences that include a signal peptide.

showed a high degree of similarity. The region of greatesthomology (amino acids 128 to 434) is located approximatelymidway between the N-terminal end and the C-terminal endof E5. The cellulases from several strains of B. subtilis alsoshowed significant similarity to E5, indicating that E. be-longs to family A of Henrissat et al. (15).A statistical determination of the appropriate number of

gaps and a subsequent alignment of the sequences of twocellulases from Bacillus sp., one from B. subtilis, and E5were performed. The results indicate that out of 336 alignedpositions, 29% were conserved and 79% were identical orsimilar in all sequences (Fig. 7). The most strongly con-served region includes several stretches near the C terminusof E5 and the middle of the Bacillus cellulases. All enzymescompared were from 409 to 508 amino acids long. Moderatesimilarity was found when E5 was compared with the cellu-lases from alkalophilic Bacillus strain 1139 (7). When thecellulase from alkalophilic Bacillus strain 1139 was alignedwith the above-mentioned family of cellulases, the percent

identity decreased to 20%. However, two regions (255 to 263and 292 to 302 in E5) are strongly conserved. Within theseregions, a histidine residue (amino acid 260 in E5) and aglutamic acid residue (amino acid 299 in E5) are found in allsequences (Fig. 7). Conservation in such a large number ofcellulases is suggestive of the involvement of these aminoacids in the active site. E5 also shows homology to acellulase (celA) present in Clostridium acetobutylicum, with39% identity in a 299-amino-acid overlap region (37).The protein sequence of E4 showed some similarity to that

of the P-1,4-endoglucanase from Bacillus sp. However, aBESTFIT alignment of the two protein sequences indicatedonly 33% similarity and 14% identity when a gap weight of6.0 was used. (E5 showed 60% similarity and 39% identityunder the same analysis conditions.) Therefore, E4 cannotbe considered to be homologous to the Bacillus sp. cellulase.Homology between the N terminus of the E4 protein se-quence and that of a cellulase from avocado (33) wasrevealed by a BESTFIT alignment, which showed 44%

N E .N E PN E PN EP*IEI*

VOL. 173, 1991

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 10: Thermomonospora fusca

3406 LAO ET AL.

1 EPAFNYAEALQKSMFYEAQRSGKLPENNRVSWRSGLINGADVGDLT 50..:1:1:11I::1 :1:1111111.1.1:.1 111111.11.. :11.

1 ASDLHYSDALEKSILFFEGQRSGKLPTNQRLTKRMSGLSDGSSYHVDLV 50

51 GGWYDAGDHVKGFPMAFTATLJASKPGRLHRSGOYLKDNLRWVN 100

51 GGYYDAGDNIXFGIPMFTTTALAGIIEFGCL.QEQVENARAALRNST 99

101 DYFIKA. HPSPNVLYVQVGDGDADHKNWGPAEVhGEPSFKVDPSCPGS 14911I::I11 :..1 111111:.:111I: 1:.:1 1. .1 :11.. 111

100 DYILKASTATSNSLYVQVGEPNADHRCWERPEDMDTPRNVYKVSTQNPGS 149

.150 DVAAETAAAMAASSIVFADDDPAYAATLVQHAKQLYTFADTYRGVYSDCV 199

150 DVAAETAAALAAASIVFGDSDSSYSTKLLHTAVKVFEFADQYRGSYSDSL 199

200 PA..GAFYNSWSGYQDELVWGAYWLYKATGDDSYLAKAEYEYDFLSTEQQ 247

200 GSVVWPFYCSYSGYNDELLWGASWLHRASQNASYMTYIQSNGHTLGADDD 249

FIG. 8. Alignment of the N-terminal regions of E4 from T. fuscaand a cellulase from avocado (33). The sequence of E4 appears onthe top line. Identity between amino acids (50%) is indicated by asolid line; strong chemical similarity is represented by a colon, andmoderate similarity is shown by a period. Numbering does notinclude the signal peptide.

overall identity with 10 gaps in the entire sequence and 50%identity with 3 gaps in a 247-residue sequence (Fig. 8). E4also showed strong homology (45% identity) to a Dictyostel-ium discoideum spore-germinating protein (12).

DISCUSSION

The results of the protein sequence comparisons showedthat T. fusca contains cellulases which belong to at leastthree different families. E2 belongs to family B and E4 tofamily D, while E5 belongs to family A of Henrissat et al.(15). These results suggest that the T. fusca cellulases haveall originated from different organisms, rather than from asingle source. While T. fusca E2 and Cellulomonas fimiCenA show strong homology in their catalytic domains, theE2 cellulose binding domain is at its C terminus and theCenA cellulose binding domain is at its N terminus. Thesefindings suggest that the location of the cellulose bindingdomain is not important for the function of these enzymes.A surprising result is that while the DNA sequences of the

E2 gene and the M. bispora celA gene showed 73% identity,the protein sequences of the cellulases showed only 54%identity. An analysis of the differences between these twocellulases by the method of Li et al. (19) showed that thegenes diverged about 1.4 x 108 years ago, assuming a rate of5 x 10' nucleotide substitutions per year. Furthermore, theratio of synonymous to nonsynonymous substitutions is only2.3, while the ratio for most genes is 10 to 100 (26). Thissuggests that the constraints maintaining the amino acidsequence were quite low in these genes.

ACKNOWLEDGMENTS

We gratefully acknowledge the assistance of Mike Spezio with thehomology studies reported here and Charles Aquadro with thesequence divergence studies.

This work was supported by grant 84ER13233 from the Depart-ment of Energy and by a grant from the Cornell BiotechnologyProgram, which is sponsored by the New York State Science andTechnology Foundation, a consortium of industries, the U.S. ArmyResearch Office, and the National Science Foundation.

REFERENCES1. Beguin, P., N. R. Gilkes, D. G. Kilburn, R. C. Miller, Jr., P.

O'Neill, and R. A. J. Warren. 1987. Cloning of cellulase genes.Crit. Rev. Biotechnol. 6:129-162.

2. Beguin, P., M. Rocancourt, M. Chebrou, and J. P. Aubert. 1986.Mapping ofmRNA encoding endoglucanase A from Clostridiumthermocellum. Mol. Gen. Genet. 202:251-254.

3. Calza, R. E., D. C. Irwin, and D. B. Wilson. 1985. Purificationand characterization of two ,3-1,4-endoglucanases from Thermo-monospora fusca. Biochemistry 24:7797-7804.

4. Chen, C. M., M. Gritzall, and D. W. Stafford. 1987. Nucleotidesequence and deduced primary structure of cellobiohydrolase IIfrom Trichoderma reesei. Bio/Technology 5:274-278.

5. Colimer, A., and D. B. Wilson. 1983. Cloning and expression ofa Thermomonospora YX endocellulase gene in E. coli. Bio/Technology 1:594-601.

6. Devereux, J., P. Haeberli, and 0. Smithies. 1984. A comprehen-sive set of sequence analysis programs for the VAX. NucleicAcids Res. 12:387-395.

6a.Eveleigh, D. Unpublished data.7. Fukumori, F., T. Kudo, Y. Narahashi, and K. Horikoshi. 1986.

Molecular cloning and nucleotide sequence of the alkalinecellulase gene from the alkalophilic Bacillus sp. strain 1139. J.Gen. Microbiol. 132:2329-2335.

8. Fukumori, F., T. Kudo, N. Sashihara, Y. Nagata, K. Ito, and K.Horikoshi. 1989. The third cellulase of alkalophilic Bacillus sp.strain N-4: evolutionary relationships within the cel gene family.Gene 76:289-298.

9. Fukumori, F., N. Sashihara, T. Kudo, and K. Horikoshi. 1986.Nucleotide sequences of two cellulase genes from alkophilicBacillus sp. strain N-4 and their strong homology. J. Bacteriol.168:479-485.

10. Ghanagas, G. S., and D. B. Wilson. 1988. Cloning of theThermomonospora fusca endoglucanase E2 gene in Streptomy-ces lividans: affinity purification and functional domains of thecloned product. Appl. Environ. Microbiol. 54:2521-2526.

11. Gilkes, N. R., R. A. J. Warren, R. C. Miller, Jr., and D. J.Kilburn. 1988. Precise excision of the cellulose binding domainsfrom two Cellulomonas fimi cellulases by a homologous prote-ase and the effect on catalysis. J. Biol. Chem. 263:10401-10407.

12. Giorda, R., T. Ohmachi, D. R. Shaw, and H. L. Ennis. Biochem-istry, in press.

13. Giuseppi, A., B. Cami, J. L. Aymerie, Y. Ball, and N. Creuzet.1988. Homology between endoglucanase Z of Erwinia chrysan-themi and endoglucanases of Bacillus subtilis and alkophilicBacillus. Mol. Microbiol. 2:159-164.

14. Hall, J., and H. J. Gilbert. 1988. The nucleotide sequence of acarboxymethylcellulase gene from Pseudomonas fluorescenssubsp. cellulosa. Mol. Gen. Genet. 213:112-117.

15. Henrissat, B., M. Claeyssens, P. Tomme, L. Lemesie, and J.Mornon. 1989. Cellulase families revealed by hydrophobic clus-ter analysis. Gene 81:83-95.

16. HunkapilHer, M. W., E. Lujan, F. Ostrander, and L. E. Hood.1983. Isolation of microgram quantities of proteins from poly-acrylamide gels for amino acid sequence analysis. MethodsEnzymol. 91:227-326.

17. Hutter, R., and T. Eckhardt. 1988. Genetic manipulation, p.89-184. In M. Goodfellow, S. T. Williams, and M. Mordarski(ed.), Actinomycetes in biotechnology. Academic Press, Inc.,New York.

18. Katz, E., C. J. Thompson, and D. A. Hopwood. 1983. Cloningand expression of the tyrosinase gene from Streptomyces anti-bioticus in Streptomyces lividans. J. Gen. Microbiol. 129:2703-2714.

18a.Lao, G. Unpublished data.19. Li, W., C. Wu, and C. Luo. 1985. A new method for estimating

synonymous and nonsynonymous rates of nucleotide substitu-tion considering the relative likelihood of nucleotide and codonchanges. Mol. Biol. Evol. 2:150-174.

20. Lin, E., and D. B. Wilson. 1987. Regulation of P-1,4-endogluca-nase synthesis in Thermomonospora fusca. Appl. Environ.Microbiol. 53:1352-1357.

21. Lin, E., and D. B. Wilson. 1988. Identification of a celE binding

J. BACTERIOL.

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from

Page 11: Thermomonospora fusca

ENDOGLtJCANASE GENES FROM THERMOMONOSPORA FUSCA 3407

protein and its potential role in induction of the celE gene inThermomonospora fusca. J. Bacteriol. 170:3843-3846.

22. Lin, E., and D. B. Wilson. 1988. Transcription of the celE genein Thermomonosporafusca. J. Bacteriol. 170:3838-3842.

23. McGinnis, K. Unpublished data.24. Nakai, R., and S. Durand. 1988. Cloning and nucleotide se-

quence of a cellulase gene casA, from an alkalophilic Strepto-myces strain. Gene 65:229-238.

25. Nakamura, A., T. Uozumi, and T. Beppt. 1987. Nucleotidesequence of,a cellulase gene of Bacillus subtilis. Eur. J. Bio-chem. 164:317-320.

26. Nei, M. 1987. Molecular evolutionary genetics. Columbia Uni-versity Press, New York.

27. Petricek, M., K. Stajner, and P. Tichy. 1989. Cloning of athermostable amylase gene from Thermomonospora curvataand its expressiop in Streptomyces lividans. J. Gen. Microbiol.135:3303-3309.

28. Reeck, G. R., P. J. Isackson, and D. C. Teller. 1982. Domainstructure in high molecular weight high mobility group nonhis-tone chromatin proteins. Nature (London) 300:76-78.

29. Robson, L. M., and G. H. Chambliss. 1987. Endo-,B-1,4-gluca-nase gene of Bacillus subtilis DLG. J. Bacteriol. 169:2017-2025.

30. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequenc-ing with chain-terminating inhibitors. Proc. Natl. Acad. Sci.USA 74:5463-5467.

31. Teern, T. T., P. Lehtovaara, S. Kauppenen, I. Salovuori, and J.Knowles. 1987. Homologous domains in Trichoderma reesei

cellulolytic enzymes: gene sequence and expression of cellobio-hydrolase II. Gene 51:43-52.

32. Towbin, H., T. Staehelin, and J. Gordon. 1979. Electrophoretictransfer of proteins from polyacrylamide gels to nitrocellulosesheets: procedure and some applications. Proc. Natl. Acad. Sci.USA 76:4350-4354.

33. Tucker, M. C., M. L. Durben, M. T. Clegg, and L. N. Lewis.1987. Avocado cellulase: nucleotide sequence of a putativefull-length cDNA clone and evidence for a small gene family.Plant Mol. Biol. 9:197-203.

33a.Wilbur, W. J., and D. J. Lipman. 1983. Rapid similaritysearches of nucleic acid and protein data banks. Proc. Natl.Acad. Sci. USA 80:726-730.

34. Wilson, D. B. 1988. Cellulases of Thermomonospora fusca.Methods Enzymol. 160:314-323.

35. Wong, W. K. R., B. Gerhard, Z. M. Guo, D. G. Kilburn,R. A. J. Warren, and R. C. Miller, Jr. 1986. Characterizationand structure of an endoglucanase gene cenA of Cellulomonasfimi. Gene 44:315-324.

36. Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. ImprovedM13 phage cloning vectors and host strains: nucleotide se-quences of the M13mpl8 and pUC19 vectors. Gene 33:103-119.

37. Zappe, H., W. A. Jones, D. T. Jones, and D. R. Woods. 1988.Structure of an endo-beta-1,4-glucanase gene from Clostridiumacetobutylicum P262 showing homology with endoglucanasegenes from Bacillus spp. Appl. Environ. Microbiol. 54:1289-1292.

VOL. 173, 1991

on March 16, 2018 by guest

http://jb.asm.org/

Dow

nloaded from