

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1991 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 266, No. 16, Issue of June 5, pp. 10461-10469, 1991 Printed in U.S.A.

Homologies between Members of the Germin Gene Family in Hexaploid Wheat and Similarities between These Wheat Germins and Certain Physarum Spherulins*

(Received for publication, November 14, 1990)

Byron G. Lane‡, François Bernier§, Ella Dratewka-Kos, Roshan Shafai, Theresa D. Kennedy, Caron Pyne, J. Ronald Munro, Tristan Vaughan, Dawn Walters, and Filiberto Altomare

From the Biochemistry Department, University of Toronto, Toronto, Ontario, Canada M5S 1A8 and the §Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Ste.-Foy, Québec G1K 7P4, Canada

By screening ~10⁶ plaques in a wheat DNA library with a “full-length” germin cDNA probe, two genomic clones were detected. When digested with EcoRI, one clone yielded a 2.8-kilobase pair fragment (gf-2.8) and the other yielded a 3.8-kilobase pair fragment (gf-3.8). By nucleotide sequencing, each of gf-2.8 and gf-3.8 was found to encode a complete sequence for germin and germin mRNA, and to contain appreciable amounts of 5'- and 3'-flanking sequences. The “cap” site in gf-2.8 was determined by primer extension and the corresponding site in gf-3.8 was deduced by analogy. The mRNA coding sequences in gf-2.8 and gf-3.8 are intronless and 87% homologous with one another. The 5'-flanking regions in gf-2.8 and gf-3.8 contain recognizable sites of what are probably cis-acting elements but there is otherwise little if any significant similarity between them. In addition to putative TATA and CAAT boxes in the 5'-flanking regions of gf-2.8 and gf-3.8, there are AT-rich inverted-repeats, GC boxes, long purine-rich sequences, two 19-base pair direct-repeat sequences in gf-2.8, and a remarkably long (200-base pair) inverted-repeat sequence (~90% homology) in gf-3.8. An 8% difference between the mature-protein coding regions in gf-2.8 and gf-3.8 is reflected by a corresponding 7% difference between the corresponding 201-residue proteins. Most significantly, the same 8% difference between the mature-protein coding regions in gf-2.8 and gf-3.8 is allied with no change whatever in a central part (61-151) of the encoded polypeptide sequences. It seems likely that this central, strongly conserved core in the germins is of first importance in the biochemical involvements of the proteins. When an equivalence is assumed between like amino acids, the gf-2.8 and gf-3.8 germins show significant (~44%) similarity to spherulins 1a and 1b of Physarum polycephalum, a similarity that increases to ~50% in the conserved core of germin. Near the middle (87-96) of the conserved core in the germins is a rare PH(I/T)HPRATEI decapeptide sequence which is shared by spherulins (1a and 1b) and germins (gf-2.8 and gf-3.8). These similarities are discussed in the context of evidence which can be interpreted to suggest that the biochemistry of germins and spherulins is involved with cellular, perhaps cell-wall responses to desiccation, hydration, and osmotic stress. Of special interest in this regard, the 5'-flanking region in the gf-2.8 gene contains two sequences which are characteristic of auxin-responsive genes.

* Continuous financial support, solely by Grant MRC-MT-1226 from the Medical Research Council of Canada over the past 30 years, is warmly appreciated. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M63223 and M63224.

‡ To whom correspondence should be addressed.

Mature embryos (~5% water) can be isolated en masse from dry grains of field-ripened wheat (Johnston and Stern, 1957). When cultured in water, these embryos imbibe water (~1 h) but the resulting, partially hydrated embryos (~60% water) resume growth only after a lag period (~4 h). The lag interval between the end of the period of partial hydration and the resumption of growth has been called a period of “germination” (Marcus, 1969). During germination, the translatable mRNA population undergoes significant change (Thompson and Lane, 1980): a conserved mRNA population in the mature embryo is replaced by a newly synthesized mRNA population which supports renewed growth of the germinated embryo (reviewed in Lane (1988)).

Although change in the translatable mRNA population is largely completed during germination, it is not finished; only in concert with the onset of renewed growth, at ~5 h postimbibition, is there nascent synthesis of a translatable mRNA that encodes a novel protein we initially called g (Thompson and Lane, 1980) and later named germin (Lane, 1985). The resumption of growth leads, by ~24 h postimbibition (Lane et al., 1986), to full hydration of the embryos (~85% water), and this “water growth” (Jaikaran et al., 1990) occurs in alliance with a selective accumulation of germin mRNA (Rahman et al., 1988) and germin itself (Lane and Kennedy, 1981; Grzelczak et al., 1982, 1985; Grzelczak and Lane, 1983, 1984; Lane et al., 1986, 1987; Lane, 1988).

Germin is a rather rare water-soluble homopentameric protein (~0.1% of the soluble proteins). It is refractory to digestion by broad-specificity proteases and to dissociation in SDS-containing reducing environments (Grzelczak and Lane, 1984). Germin is made in antigenically related isoforms during germination of all of the economically important cereals examined: barley, oat, rye, wheat (Grzelczak et al., 1985), corn, and rice.² Because of its peculiar temporal expression, it is possible that a significant part of the changes that occur during cereal germination is directed toward expression of the germin gene.

¹ The abbreviations used are: SDS, sodium dodecyl sulfate; bp, base pair; kbp, kilobase pair.

² D. Walters, E. Dratewka-Kos, T. D. Kennedy, and B. G. Lane, unpublished results.


A virtually full-length germin cDNA was isolated (Rahman et al., 1988) and its polynucleotide sequence was determined (Dratewka-Kos et al., 1989). This germin cDNA has been used as a probe to show that germin is encoded in a multigene family which maps primarily to chromosomes 4A (~5 copies), 4B (~3 copies), and 4D (~9 copies) in hexaploid wheat.³ In this study, germin cDNA has been used to detect genomic clones by screening ~10⁶ plaques in a wheat DNA library (Murray et al., 1984). The nucleotide sequences of a 2.8-kbp fragment (gf-2.8) from one genomic clone, and of a 3.8-kbp fragment (gf-3.8) from another clone, have been determined; each encodes a germin mRNA and contains 5'- and 3'-flanking sequences.

Structural-protein (224 amino acids) coding regions in gf-2.8 and gf-3.8 are intronless and 90% similar, but aside from some scattered sites of what are likely to be cis-acting elements, there is no similarity between the 5'-flanking sequences in gf-2.8 and gf-3.8. The degree of similarity (92%) is fairly constant throughout the mature-protein (201 amino acids) coding regions in gf-2.8 and gf-3.8 but, significantly, a central part (91 amino acids) of the corresponding proteins is fully conserved. This conserved core in the wheat germins is ~50% similar to the corresponding region in two of the slime-mold spherulins (1a and 1b) that have been shown to accumulate, specifically, during spherulation of the Physarum polycephalum plasmodium (Bernier et al., 1986, 1987).

Spherulation is a process that is induced if the Physarum polycephalum plasmodium is subjected to starvation, osmotic stress, extremes of temperature, or other forms of environmental stress, and it leads to encystment, desiccation, and developmental arrest (Jump, 1954; Chet and Rusch, 1969) (see also Gorman and Wilkins (1980) and Raub and Aldrich (1982)). The principal spherulation-specific mRNAs (for spherulins 1a, 1b, 2a, 3a) are not present in encysting amoebae, sporulating plasmodia, or vegetative plasmodia but, by 24 h after the onset of plasmodium starvation, they account for ~10% of the total mRNA in the organism (Bernier et al., 1986). Full nucleotide sequences have been determined for the principal spherulin-specific mRNAs (Bernier et al., 1987). Similarities between germins (gf-2.8 and gf-3.8) and spherulins (1a and 1b) imply, not only that these proteins may have related biochemical involvements, but that there may also be a previously unsuspected affinity, at the molecular level, between the biology of cereal germination and Physarum spherulation. These and other findings and their implications are subjects of this report.

EXPERIMENTAL PROCEDURES

Materials-The wheat DNA library used in this work was generously supplied by Dr. Michael G. Murray of the Advanced Research Division at Agrigenetics Corporation (Madison, WI). Dr. Murray also supplied the host organism (ED8767) used to prepare his Charon 32 library (Murray et al., 1984). The library was constructed in November, 1982, using 15-23-kbp fragments, which had been derived by partial EcoRI digestion of wheat DNA (Murray and Thompson, 1980). The library originally contained 1.5 × 10⁶ different clones, enough to ensure 99% probability of detecting a single-copy sequence in the wheat genome (6 × 10⁹ bp/haploid genome), and when it was used in this investigation, in January, 1987, it had a titer of ~0.5 × 10⁸ plaque-forming units/ml. The virtually full-length cDNA used to screen the wheat DNA library was prepared as described (Rahman et al., 1988). The polynucleotide sequence of this cDNA has been reported (Dratewka-Kos et al., 1989). Deoxyribonucleotide primers used for primer-extension studies were prepared by and purchased from The Biotechnology Service Centre of The Hospital For Sick Children Research Institute (Toronto). [³⁵S]DNA molecular weight markers were purchased from Amersham Life Science Products (catalog No. SJ.5000) and unlabeled DNA (DRIgest) markers (catalog No. 27-4056-01) were purchased from Pharmacia LKB Biotechnology Inc.

³ M. D. Gale, unpublished results.
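The statement that 1.5 × 10⁶ clones suffice for a 99% chance of recovering a single-copy sequence can be checked with the usual library-size estimate, N = ln(1 - P)/ln(1 - f), where f is the fraction of the genome carried by one insert. The following is a minimal sketch only: the 19-kbp mean insert size is an assumption (the midpoint of the stated 15-23-kbp range), and the function names are illustrative.

```python
import math

GENOME_BP = 6e9          # haploid wheat genome size quoted in the text
MEAN_INSERT_BP = 19_000  # assumption: midpoint of the 15-23-kbp insert range


def clones_needed(prob, insert_bp, genome_bp):
    """Number of independent clones N = ln(1 - P) / ln(1 - f), f = insert/genome."""
    fraction = insert_bp / genome_bp
    return math.log(1.0 - prob) / math.log(1.0 - fraction)


def coverage_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given single-copy sequence is present among n_clones."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


needed = clones_needed(0.99, MEAN_INSERT_BP, GENOME_BP)
covered = coverage_probability(1.5e6, MEAN_INSERT_BP, GENOME_BP)
print(f"clones needed for P = 0.99: {needed:.3g}")
print(f"P with 1.5e6 clones:        {covered:.4f}")
```

Under these assumptions the estimate comes out a little below 1.5 × 10⁶ clones, in line with the library size quoted above; a smaller assumed insert size would raise the requirement.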

Procedures Used to Screen the Wheat DNA Library-The library was diluted 50-fold with SM buffer (0.58 g of NaCl, 0.2 g of MgSO₄·7H₂O, 5 ml of 1 M Tris chloride (pH 7.5), 0.5 ml of 2% gelatin in 100 ml of water) and the diluted library (50 µl) was mixed with 300 µl of ED8767 in 10-ml Falcon tubes for incubation at 37 °C (2 h) before being mixed with 6.5 ml of top-agarose (0.7%) and then poured over bottom-agar (1.5%) in a 137-mm Petri plate (150 cm²). The top-agarose and bottom-agar were prepared in NZC medium and the liquid culture of ED8767 was prepared by inoculating a single colony (from an overnight NZC plate) in 50 ml of NZC broth (supplemented with 500 µl of 20% maltose); after shaking overnight to obtain A600 nm ~1.5, the culture was centrifuged and the pellet was suspended in 10 mM MgSO₄ for storage at 4 °C before use. After solidification of the top-agarose, 20 such plates were inverted and incubated at 37 °C for ~6 h to obtain an appropriate plaque density (50,000 plaque-forming units/plate).
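The plating figures in this paragraph fix both the number of plaques screened and the titer of the undiluted library. A short sketch of that arithmetic, using only the quantities stated above (variable names are illustrative):

```python
DILUTION_FACTOR = 50     # library diluted 50-fold with SM buffer
PLATED_VOLUME_UL = 50    # diluted library plated per Petri plate (µl)
PFU_PER_PLATE = 50_000   # plaque density reached on each plate
N_PLATES = 20            # plates per screen

# Total plaques examined in one screen.
total_plaques = PFU_PER_PLATE * N_PLATES

# Titer of the diluted library, then of the undiluted stock (pfu/ml).
diluted_titer = PFU_PER_PLATE * 1000 // PLATED_VOLUME_UL
stock_titer = diluted_titer * DILUTION_FACTOR

print(f"plaques screened: {total_plaques:.1e}")
print(f"diluted titer:    {diluted_titer:.1e} pfu/ml")
print(f"stock titer:      {stock_titer:.1e} pfu/ml")
```

Twenty plates at 50,000 plaques each give the ~10⁶ plaques referred to elsewhere in the text, and the implied stock titer is about 5 × 10⁷ plaque-forming units/ml.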

In order to screen the plaques, duplicate lifts (first 2 min; second 3 min) were prepared using either Schleicher and Schuell BA85 or Millipore HATF nitrocellulose. The filters were placed, in succession, in each of the following solutions: 1.5 M NaCl, 0.5 M NaOH (1 min); 1.5 M NaCl, 0.5 M Tris chloride (pH 8) (5 min); and 2 × SSC (5 min) before air-drying on filter paper. The filters were prewashed (first 0.5 h; second 1.5 h) at ~45 °C in a solution that was prepared by mixing 100 ml of 1 M Tris chloride (pH 8), 116.8 g of NaCl, 4 ml of 0.5 M EDTA, 10 ml of 20% SDS, and diluting to a final volume of 2 liters; 1 liter of this prewash solution was used for 40 filters. The filters were then “prehybridized” (3 h) at ~60 °C in a solution that was made by mixing 250 ml of 20 × SSC, 100 ml of 5% Blotto (5 g of Carnation milk powder, 10 ml of phosphate-buffered saline, 90 ml of sterile water), 6.9 g of NaH₂PO₄·H₂O, 5 ml of 20% SDS, 10 g of glycine, and 200 mg of yeast tRNA (in 40 ml of 1 M Tris (pH 7.5)), and diluting to a final volume of 1 liter; 1 liter of prehybridization solution was used for 40 filters.

FIG. 1. Sites at which scissions were made with restriction enzymes and overlapping sequences that were determined in arriving at the deoxynucleotide sequences of the genomic fragments gf-2.8 and gf-3.8. Restriction sites are oriented with respect to the 5'-end of the noncoding strands of DNA.

FIG. 2. Deoxynucleotide sequence of the noncoding strand in gf-2.8. The parts of the sequence which correspond to possible regulatory and mRNA sequences in the germin gene are in upper-case lettering and the remainder of the sequence is in lower case. Annotations in the figure mark the polypeptide initiation codon (1799), the polypeptide termination codon (2471), and the mRNA termination signal (2729); the sequence runs to position 2822.
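The prewash and prehybridization recipes are given as volumes and masses of stock reagents; converting them to final concentrations makes the solutions easier to compare with standard formulations. The sketch below assumes molar masses of 58.44 g/mol for NaCl and 137.99 g/mol for NaH₂PO₄·H₂O (values not given in the text), and the helper names are illustrative.

```python
NACL_G_PER_MOL = 58.44          # assumed molar mass of NaCl
NAH2PO4_H2O_G_PER_MOL = 137.99  # assumed molar mass of NaH2PO4·H2O


def diluted(stock_conc, stock_ml, final_ml):
    """Final concentration after diluting stock_ml of stock to final_ml."""
    return stock_conc * stock_ml / final_ml


def molar_from_mass(grams, g_per_mol, final_ml):
    """Molarity of a solid dissolved to a final volume of final_ml."""
    return grams / g_per_mol / (final_ml / 1000.0)


# Prewash solution, final volume 2 liters.
prewash = {
    "Tris chloride (M)": diluted(1.0, 100, 2000),
    "NaCl (M)": molar_from_mass(116.8, NACL_G_PER_MOL, 2000),
    "EDTA (M)": diluted(0.5, 4, 2000),
    "SDS (%)": diluted(20, 10, 2000),
}

# Prehybridization solution, final volume 1 liter.
prehyb = {
    "SSC (x)": diluted(20, 250, 1000),
    "Blotto (%)": diluted(5, 100, 1000),
    "phosphate (M)": molar_from_mass(6.9, NAH2PO4_H2O_G_PER_MOL, 1000),
    "SDS (%)": diluted(20, 5, 1000),
    "glycine (% w/v)": 10 / 1000 * 100,
    "Tris (M)": diluted(1.0, 40, 1000),
}

for name, recipe in (("prewash", prewash), ("prehybridization", prehyb)):
    for component, value in recipe.items():
        print(f"{name:17s} {component:18s} {value:.3g}")
```

On these assumptions the prewash works out to roughly 50 mM Tris, 1 M NaCl, 1 mM EDTA, and 0.1% SDS, and the prehybridization solution to 5 × SSC, 0.5% Blotto, 50 mM phosphate, 0.1% SDS, and 1% glycine.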

For hybridization, 40 filters (20 duplicates), in 230 ml of hybridization medium, were shaken ~16 h at 65 °C at a speed setting of 3.25 (Brunswick Shaker) in a FridgeOSeal bowl. The hybridization medium contained 6 × 10⁶ cpm (5 ng) of cDNA probe per ml of prehybridization buffer, the probe having been made by the random-hexamer procedure (Feinberg and Vogelstein, 1983): 12.5 µl of cDNA (0.1 µg/µl), 12.5 µl of random hexamers (0.1 unit/µl), and 55 µl of sterile water were mixed and the resulting solution was heated for 3 min at 100 °C before being mixed with 100 µl of 2.5 × random-hexamer buffer, 10 µl of bovine serum albumin (10 µg/µl), 10 µl of Klenow fragment, and 50 µl of [α-³²P]dCTP (3000 Ci/mmol; 10 µCi/µl), in a total volume of 250 µl. The probe was freed of unreacted nucleoside triphosphates by passage through a Sephadex G-50 column before use. After hybridization, the 40 filters were washed in the following solutions: 1 liter of 2 × SSC/0.1% SDS at room temperature (10 min), 1 liter of 2 × SSC/0.1% SDS at room temperature (15 min), twice in 1 liter of 2 × SSC/0.1% SDS at 65 °C (1 h), and twice with 1 liter of 1 × SSC/0.1% SDS at 65 °C (0.5 h). After air-drying, the filters were exposed to X-Omat AR film overnight in order to detect signals.
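The component volumes of the labeling reaction can be tallied against the stated total, which also confirms that the 2.5 × buffer ends up at 1 ×. The sketch below reads the volume and mass units in the protocol as µl and µg (an assumption about the typeset units), and the dictionary keys are illustrative labels.

```python
# Component volumes of the random-hexamer labeling reaction (µl).
components_ul = {
    "cDNA (0.1 µg/µl)": 12.5,
    "random hexamers": 12.5,
    "sterile water": 55.0,
    "2.5x random-hexamer buffer": 100.0,
    "bovine serum albumin": 10.0,
    "Klenow fragment": 10.0,
    "[alpha-32P]dCTP (10 µCi/µl)": 50.0,
}

total_ul = sum(components_ul.values())
buffer_final_x = 2.5 * components_ul["2.5x random-hexamer buffer"] / total_ul
cdna_ug = components_ul["cDNA (0.1 µg/µl)"] * 0.1
label_uci = components_ul["[alpha-32P]dCTP (10 µCi/µl)"] * 10

# Probe mass present in the hybridization medium: 5 ng/ml in 230 ml.
probe_in_medium_ug = 5 * 230 / 1000

print(f"total reaction volume: {total_ul} µl")
print(f"final buffer strength: {buffer_final_x}x")
print(f"cDNA template:         {cdna_ug:.2f} µg")
print(f"label added:           {label_uci:.0f} µCi")
print(f"probe in medium:       {probe_in_medium_ug:.2f} µg")
```

The volumes sum to the stated 250 µl, and the ~1.15 µg of probe implied by 5 ng/ml in 230 ml is close to the 1.25 µg of cDNA template that went into the reaction.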

Procedures Used for Nucleotide Sequencing-Standard procedures used for deoxyribonucleotide sequencing have been described elsewhere (Dratewka-Kos et al., 1989); in addition, for this study, M13mp18(19) phage (Messing, 1988) as well as pEMBL18(19) plasmids (Dente et al., 1983) were used to generate single-strand templates, and Sequenase (U. S. Biochemical, catalog No. 70700) and Taq polymerase (Promega, catalog No. PRQ5530) were used to transcribe single-strand templates. Full sequences were deduced from the overlaps obtained when deoxyribonucleotide sequences were determined for (~70% of) each strand of the gf-2.8 and gf-3.8 clones.

RESULTS

Isolation of Genomic Clones-A virtually full-length clone of germin cDNA (Rahman et al., 1988) was labeled with ³²P in order to screen ~10⁶ plaques in a wheat DNA library. The “stuffer fragments” in the library were derived by partial EcoRI digestion of wheat DNA and were incorporated between the 17-kbp and 13-kbp “lambda arms” of Charon 32 (Murray et al., 1984). Seven possibly positive signals were detected in the primary screen and two of these proved to be bona fide positive plaques after rescreening. After full digestion with EcoRI and separation of the resulting products by electrophoresis in 0.8% agarose gel, each genomic clone yielded a series of discrete fragments, some of which gave positive signals when DNA blots of the digestion products were screened with ³²P-labeled germin cDNA. One genomic clone (stuffer fragment ~11 kbp) yielded three such fragments (~0.6, ~2.8, and ~7 kbp), and the other (stuffer fragment ~16 kbp) yielded two such fragments (~3.8 and ~11 kbp) which hybridized with germin cDNA. The strongest signals were obtained with the 2.8-kbp (gf-2.8) and 3.8-kbp (gf-3.8) fragments and therefore these were chosen for detailed study. Each of gf-2.8 and gf-3.8 was found to encode a full sequence for germin and germin mRNA.

Nucleotide Sequencing of gf-2.8 and gf-3.8-Strategies used to obtain overlaps from which the full deoxyribonucleotide sequences of gf-2.8 and gf-3.8 could be deduced are shown in Fig. 1. The sequence of the putative noncoding strand in gf-2.8 is shown in Fig. 2, where segments corresponding to the mRNA sequence and possible 5'- and 3'-flanking regulatory sequences are depicted in upper-case lettering, whereas the remainder of the sequence is shown in lower case. The mRNA sequence of gf-2.8 is identical with that previously determined for a virtually full-length germin cDNA (Dratewka-Kos et al., 1989), excepting only that the results of primer-extension mapping (see below) have shown that the “cap” site is at a position which is displaced 19 nucleotide residues 5'- to the 5'-end of the virtually (1075 bp) full-length (1094 bp) cDNA previously prepared (Rahman et al., 1988) by the technique of Gubler and Hoffman (1983). The sequence of the putative noncoding strand in gf-3.8 is shown in Fig. 3, and again, segments corresponding to the mRNA sequence and possible 5'- and 3'-flanking regulatory sequences are depicted in upper case and the remainder of the sequence is in lower case.

FIG. 3. Deoxynucleotide sequence of the noncoding strand in gf-3.8. The parts of the sequence which correspond to possible regulatory and mRNA sequences in the germin gene are in upper-case lettering and the remainder of the sequence is in lower case.
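The lengths quoted for the cDNA and the mRNA can be reconciled by simple bookkeeping. In the sketch below, the 69-, 603-, and 318-nucleotide segment lengths and the codon positions 1799 and 2471 are read from the annotations of Figs. 2 and 4; the 104-nucleotide 5'-untranslated length is an assumption inferred from the initiation codon being numbered 105 in Fig. 4, and the 318-nucleotide 3'-untranslated segment is assumed to include the termination codon.

```python
FIVE_PRIME_UTR = 104   # assumption: ATG is numbered 105 in Fig. 4
SIGNAL_PEPTIDE = 69    # signal-peptide coding sequence (Fig. 4 label)
MATURE_PROTEIN = 603   # mature-protein coding sequence (Fig. 4 label)
THREE_PRIME_UTR = 318  # Fig. 4 label; assumed to include the stop codon
CAP_EXTENSION = 19     # 5' extension found by primer-extension mapping

coding = SIGNAL_PEPTIDE + MATURE_PROTEIN
mrna_length = FIVE_PRIME_UTR + coding + THREE_PRIME_UTR
cdna_length = mrna_length - CAP_EXTENSION

# Genomic coordinates in gf-2.8, from the annotations of Fig. 2.
INITIATION_CODON = 1799
termination_codon = INITIATION_CODON + coding
cap_site = INITIATION_CODON - FIVE_PRIME_UTR

print(f"coding region:     {coding} nt ({coding // 3} codons)")
print(f"full-length mRNA:  {mrna_length} nt")
print(f"earlier cDNA:      {cdna_length} bp")
print(f"termination codon: position {termination_codon}")
print(f"inferred cap site: position {cap_site}")
```

With these readings the segments sum to the 1094 nucleotides given for the full-length sequence, removing the 19-nucleotide extension gives the 1075-bp cDNA, and the termination codon falls at position 2471 as annotated in Fig. 2. The cap-site position printed at the end is an inference from these assumptions, not a value stated in the text.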

Comparison of the mRNA Coding Sequences in gf-2.8 and gf-3.8-The parts of gf-2.8 and gf-3.8 that correspond to germin mRNA, and germin itself, are elaborated in Fig. 4 and subdivided into (i) 5'-untranslated regions (lower case), (ii) structural-protein coding regions that correspond to signal-peptide and mature-protein sequences (upper case), and (iii) 3'-untranslated regions (lower case). The 5'- and 3'-boundaries in gf-3.8 mRNA were deduced from the structure of gf-2.8 mRNA, the sequence of which, excepting only a 5'-terminal extension of 19 nucleotides (see the primer-extension study, below), is identical with that reported for germin cDNA (Dratewka-Kos et al., 1989). Hollow symbols are used for gf-3.8 mRNA in Fig. 4 if nucleotide residues in gf-3.8 mRNA differ from those in gf-2.8 mRNA (Dratewka-Kos et al., 1989), or if amino acid residues encoded by gf-3.8 differ from those encoded by gf-2.8. Deletion (-) and insertion (underlining) of nucleotides in gf-3.8, relative to gf-2.8, are also indicated in Fig. 4. There are no amino acid deletions or insertions between the gf-2.8 and gf-3.8 germin sequences.

There is extensive similarity between the 5'-UTR (80%) regions in gf-2.8 and gf-3.8 mRNA, as there is between their 3'-UTR (77%) regions. There is also impressive similarity between the 672-residue intronless nucleotide sequences which encode the signal-peptide (75%) and mature-protein (92%) sequences in gf-2.8 and gf-3.8, and accordingly, there is extensive similarity between the 224-residue amino acid sequences of the signal peptide (70%) and mature protein (93%) encoded in gf-2.8 and gf-3.8. A large (91 amino acid) central domain of identity exists between the mature-protein sequences encoded in gf-2.8 and gf-3.8 (see “Discussion”).
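The per-segment similarities can be pooled into a single figure for the whole coding region by weighting each segment by its length. A minimal sketch, assuming signal-peptide and mature-protein lengths of 69 and 603 nucleotides (23 and 201 residues), which together make up the 672 nucleotides and 224 residues quoted above; the function name is illustrative.

```python
def pooled_similarity(segments):
    """Length-weighted mean similarity over (length, similarity) pairs."""
    total_length = sum(length for length, _ in segments)
    return sum(length * sim for length, sim in segments) / total_length


# (length, fractional similarity) for signal peptide and mature protein.
nucleotide_level = pooled_similarity([(69, 0.75), (603, 0.92)])
amino_acid_level = pooled_similarity([(23, 0.70), (201, 0.93)])

# Length of the fully conserved core, residues 61-151 inclusive.
core_length = 151 - 61 + 1

print(f"coding-region nucleotide similarity: {nucleotide_level:.1%}")
print(f"coding-region amino acid similarity: {amino_acid_level:.1%}")
print(f"conserved core length: {core_length} residues")
```

Both pooled values come out close to 90%, which agrees with the overall figure given for the structural-protein coding regions, and the 61-151 interval accounts for the 91-residue conserved core. Because the published percentages are rounded, the pooled values are approximate.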

Possible Regulatory Elements in the 5'- and 3'-Flanking Regions of gf-2.8 and gf-3.8-Aside from some recognizable sites of possible cis-acting elements in the 5'-flanking regions of both sequences, there is little similarity between the 5'-flanking regions in gf-2.8 and gf-3.8 (Fig. 5A). The sites of possible regulatory sequences in the 5'- (Efstratiadis et al., 1980) and 3'- (Birnstiel et al., 1985) flanking regions of gf-2.8 are indicated by lower case lettering in Fig. 2: upstream of the cap site, at -31 bp, a TATA box flanked by G + C rich sequences; at approximately -50 bp, two CAAT boxes; at approximately -110 through -150 bp, three AT-rich inverted repeat sequences; at approximately -180 bp, two GC-rich boxes; at approximately -600 bp, a 45-bp purine-rich sequence; at approximately -1500 bp, two direct repeat (19 bp) sequences; and downstream of the poly(A) addition site, at +4 bp, a T-rich sequence that is probably involved with transcription termination. Possible proximal sites of cis-acting elements in the 5'-flanking region of gf-3.8 are less recognizable, but upstream of the cap site, at -26 bp, there is a possible TATA box, and at approximately -50 bp, a possible CAAT box (Fig. 3). The most unusual 5'-flanking sequence in gf-3.8 is a long inverted-repeat (~200 bp) sequence at approximately -400 bp.


[Fig. 4 sequence panels for gf-2.8 and gf-3.8. Each panel lists the 5'-untranslated sequence (101 nucleotides in gf-3.8), the signal-peptide coding sequence (69 nucleotides), the mature-protein coding sequence (603 nucleotides), and the 3'-untranslated sequence (318 nucleotides in gf-2.8; 312 nucleotides in gf-3.8), together with the encoded amino acid sequences in one-letter code.]

FIG. 4. Deoxynucleotide sequences of noncoding strands in mRNA regions of gf-2.8 and gf-3.8. The nucleotide sequences are subdivided into four parts: 5'-UTR (lower case), 3'-UTR (lower case) as well as protein-coding regions for the signal-peptide (upper case) and mature-protein (upper case) parts of germin. The corresponding sequences of the proteins are shown in the one-letter code for amino acids. Differences between nucleotides and amino acids in gf-2.8 and gf-3.8 are indicated, in the case of gf-3.8, by hollow lettering, and sites of deletion (-) or insertion (underlining), used to maximize homology between gf-2.8 and gf-3.8, are likewise indicated in the structure of gf-3.8. Boldface lettering is used in the 5'-UTR to indicate the cap site (position 1) as well as position 20 (gf-2.8) or 17 (gf-3.8), which correspond to the 5'-end of a virtually full-length cDNA (G¹⁷¹⁴ in gf-2.8 (Fig. 2), or T¹²³³ in gf-3.8 (Fig. 3)) (see text).

Determination of Cap Sites in gf-2.8 and gf-3.8 by Primer Extension-The primer-extension approach to the determination of cap sites (Qu et al., 1983; Shelness and Williams, 1985; Fouser and Friesen, 1986; Kunz et al., 1989; Ham et al., 1989) was used in this study (Shafai, 1989). Primers (20-mers) were chosen in order to maximize differences between gf-2.8 and gf-3.8 mRNA. The primers were complementary to residues ¹⁸¹⁷CTA...CGC¹⁸³⁶ and ¹⁸⁶²TTG...CCC¹⁸⁸¹ in gf-2.8 (Fig. 2) and they were 65-70% homologous with the corresponding primers prepared for gf-3.8: ¹³³⁶ATA...TGC¹³⁵⁵ and ¹³⁸¹...CCC¹⁴⁰⁰ (Fig. 3). These primers discriminated between the respective single-strand templates: SphI/KpnI:1594-2062 (Fig. 2) in M13mp19 for gf-2.8, and PstI/SacI2:1153-1693 (Fig. 3) in pEMBL19 for gf-3.8, i.e. the primers only gave discrete sequencing ladders with their 100% homologous partners.

As illustrated in Fig. 6, when bulk mRNA from germinated wheat embryos (isolated at 35 h postimbibition) was used as a source of templates, the gf-2.8 primer, in this case ¹⁸¹⁷CTA...CGC¹⁸³⁶, gave a number of bands, most of which were between residues ~1690 and 1720. This is the expected neighborhood of the cap site since residue 1714 corresponds to the 5'-end of the virtually full-length cDNA that was previously sequenced (Dratewka-Kos et al., 1989). The strongest band, at residue A¹⁶⁹⁵, was also strongest when the other gf-2.8 primer, ¹⁸⁶²TTG...CCC¹⁸⁸¹, was used as primer with the same mRNA (data not shown), and it is therefore assumed to be the cap site in the mRNA that is encoded by gf-2.8.
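The template discrimination described for the primers rests on each 20-mer being fully complementary to only one of the two genes. A minimal Python sketch of that check follows. The helper names and both template strings are hypothetical placeholders (the text gives only the terminal residues of each primer target), and the 1-based, inclusive coordinate convention is an assumption that is consistent with residues 1817-1836 defining a 20-mer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def span_length(start, end):
    """Length of a 1-based, inclusive residue range."""
    return end - start + 1


def primer_for(template, start, end):
    """Primer complementary to template residues start..end (1-based, inclusive)."""
    return reverse_complement(template[start - 1:end])


def percent_match(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def anneals_perfectly(primer, template):
    """True if the primer is 100% complementary to some stretch of the template."""
    return reverse_complement(primer) in template.upper()


if __name__ == "__main__":
    # Hypothetical 20-nucleotide target regions (mRNA sense) differing at 7 sites.
    target_a = "CTAGGATCCAAGCTTGACGC"
    target_b = "ATACGTTCAAAGCATGTCTC"
    primer_a = primer_for(target_a, 1, 20)
    primer_b = primer_for(target_b, 1, 20)
    print(percent_match(primer_a, primer_b))       # homology between the primers
    print(anneals_perfectly(primer_a, target_a))   # matched template
    print(anneals_perfectly(primer_a, target_b))   # mismatched template
```

The exact-match test is a deliberate simplification: real primer-template annealing tolerates some mismatches depending on the reaction conditions, which this sketch does not attempt to model.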

In accord with the results of Northern analyses, which have indicated that the amount of gf-3.8 mRNA is at least 10-fold smaller than the amount of gf-2.8 mRNA in bulk mRNA from germinated wheat embryos,⁴ prominent bands were not observed when either of the gf-3.8 primers (e.g. see Fig. 6) was used to prime synthesis with the same bulk mRNA specimen. The 5'-UTR domains are the most highly conserved parts of the regions which are 5' to the structural-protein coding regions in homologous genes (Efstratiadis et al., 1980). Accordingly, because strong similarity between gf-2.8 (Fig. 2) and gf-3.8 (Fig. 3) begins, very abruptly (Fig. 5A), at sites which correspond to A¹⁶⁹⁵ (the cap site) in gf-2.8, and to A¹²¹⁷ in gf-3.8, it seems highly probable that the cap site can be assigned to residue A¹²¹⁷ in gf-3.8. It is relevant to note that this interpretation conforms well with the conclusion (see above) that the putative sites of TATA and CAAT boxes in gf-2.8 and gf-3.8 are about the same distances upstream of the putative cap sites in both mRNA molecules.
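The relation between cap-relative positions and gene coordinates can be written out explicitly. The Python sketch below assumes the common convention that the cap site is position +1 and that there is no position 0, so the residue immediately upstream of the cap site is -1; the function names are hypothetical.

```python
def mrna_to_gene(cap_site, position):
    """Convert a cap-relative position (+1 = cap site, no 0) to a gene coordinate."""
    if position == 0:
        raise ValueError("position 0 is not defined in cap-relative numbering")
    if position > 0:
        return cap_site + position - 1
    return cap_site + position


def gene_to_mrna(cap_site, coordinate):
    """Convert a gene coordinate to a cap-relative position (+1 = cap site)."""
    offset = coordinate - cap_site
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    # Cap sites assigned in the text: residue 1695 in gf-2.8, 1217 in gf-3.8.
    print(mrna_to_gene(1695, 20))   # 5'-end of the cDNA in gf-2.8 numbering
    print(mrna_to_gene(1217, 17))   # corresponding residue in gf-3.8
    print(mrna_to_gene(1695, -31))  # a site 31 bp upstream of the gf-2.8 cap site
```

Under this convention, position 20 in gf-2.8 mRNA falls at residue 1714 and position 17 in gf-3.8 mRNA at residue 1233, in agreement with the 19-nucleotide 5'-terminal extension reported for gf-2.8.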

DISCUSSION

Studies of germin and its allied coding elements (mRNA and DNA) were initiated in order to broaden perspectives about the molecular basis of developmental change in germinating wheat (Lane, 1988). It is now apparent that investigations of the biochemistry and molecular biology of germin have expanding consequence for studies of other cereals, other organisms, and disparate biological phenomena. Accordingly, this discussion of our current findings will be subdivided into two parts, one that deals with the molecular biology of cereal development, and another which deals with what appear to be germin-related processes in closely and distantly related organisms.

⁴ T. Vaughan, unpublished results.

[Fig. 5 plot axes: gf-3.8, gf-2.8 cDNA, spherulin 1b cDNA.]

FIG. 5. Homology matrix comparisons of gf-2.8 and gf-3.8 (A) and of germin and spherulin 1b cDNAs (B). The search element was 75 nucleotides and the number of allowable mismatches was 35, using the Inspector II DNA program. In A, the upward-pointing arrowhead indicates the positions of the cap sites in gf-2.8 and gf-3.8 and the downward-pointing arrowhead indicates the positions of the 3'-ends of the "mRNA sequences" in gf-2.8 and gf-3.8. In B, the arrowheads define the 5'- (left) and 3'- (right) extremities of the sequence which encodes the conserved core in the germins.
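A homology-matrix comparison of the kind shown in Fig. 5 can be approximated with a simple windowed dot plot: a point is recorded wherever a window from one sequence matches a window from the other with no more than a set number of mismatches. The Python sketch below is a generic illustration of that idea and is not the Inspector II algorithm; the function name and test sequences are hypothetical, and the default parameters (a 75-nucleotide search element, at most 35 mismatches) are those stated in the figure legend.

```python
def homology_matrix(seq_a, seq_b, window=75, max_mismatches=35):
    """Return 1-based (i, j) window start positions that pass the mismatch limit.

    Every window of `seq_a` is compared with every window of `seq_b`, so the
    cost grows with len(seq_a) * len(seq_b) * window; this is adequate for
    short sequences but is not an optimized implementation.
    """
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(len(a) - window + 1):
        window_a = a[i:i + window]
        for j in range(len(b) - window + 1):
            mismatches = sum(x != y for x, y in zip(window_a, b[j:j + window]))
            if mismatches <= max_mismatches:
                hits.append((i + 1, j + 1))
    return hits


if __name__ == "__main__":
    # Short hypothetical sequences sharing a central 8-nucleotide stretch.
    seq_a = "AAACGTACGTTT"
    seq_b = "GGACGTACGTCC"
    print(homology_matrix(seq_a, seq_b, window=8, max_mismatches=0))
    print(homology_matrix(seq_a, seq_b, window=8, max_mismatches=1))
```

Plotting the returned pairs as points gives diagonal runs where the two sequences are similar; relaxing the mismatch limit lengthens those runs and also admits more background points.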

Molecular Biology of Cereal Development-The previously reported sequence (Dratewka-Kos et al., 1989) of a virtually full-length germin cDNA (Rahman et al., 1988) was precisely the same as the mRNA part of the gf-2.8 germin-gene sequence determined in this study (Fig. 2), excepting only that the 5'-end of the virtually full-length germin cDNA corresponded to residue G¹⁷¹⁴ (rather than A¹⁶⁹⁵) in the mRNA part of the gf-2.8 germin gene. It is interesting that 15 of the first 16 residues in the putative mRNA domains of gf-2.8 and gf-3.8 are identical (Fig. 4) but that the three residues which intervene between this sequence and G¹⁷¹⁴ in gf-2.8, or T¹²³³ in gf-3.8, are putative sites of deletion in gf-3.8 (Fig. 4). This suggests that termination of reverse transcription at G¹⁷¹⁴, during preparation of the cDNA, may be related to a structural idiosyncrasy at, or immediately adjacent to, G¹⁷¹⁴. It is not surprising that A¹⁶⁹⁵ (Fig. 2) is the principal cap site in germin mRNA since the (type 0) m⁷GpppA cap structure is the principal (4040%) form found in bulk mRNA from germinating (Kennedy and Lane, 1979) and mature (Lane, 1981) wheat embryos.

FIG. 6. Autoradiogram (%day exposure) of a dried sequencing gel that was used to determine the cap sites in gf-2.8 and gf-3.8. The sequencing ladder for gf-2.8 was generated using as template SphI/KpnI:1594-2062 (from gf-2.8, Fig. 2) in single-stranded M13mp19, and using as primer a synthetic oligonucleotide complementary to residues 1817-1836 of gf-2.8 (Fig. 2). The sequencing ladder for gf-3.8 was generated using as template PstI/SacI2:1153-1693 (from gf-3.8, Fig. 3) in single-stranded pEMBL19 and using as primer a synthetic oligonucleotide which was 65% homologous with the gf-2.8 primer, but 100% homologous to the corresponding region in gf-3.8. Primer-extension reaction mixtures contained gf-2.8 (a, b, c) or gf-3.8 (a', b', c') primers and mRNA (1 µg) (a, a'), or bulk NaCl-insoluble RNA (10 µg) from germinated (b, b') or mature (c, c') embryos. The reference residues (T¹⁶⁹², T¹⁶⁹³, A¹⁶⁹⁵, G¹⁷¹⁴), indicated by arrowheads in the gf-2.8 ladder, denote sites in the noncoding strand of gf-2.8 (mRNA sense) and are complementary to the corresponding residues (A, A, T, C) in the ladder (for the coding strand of gf-2.8). Bands were not seen in primer-extension experiments when mRNA from ungerminated embryos was used as template (data not shown). Accordingly, most bands obtained with bulk NaCl-insoluble RNA are not related to germin mRNA since they are identical for the bulk NaCl-insoluble RNA of germinated and ungerminated embryos. It is in fact doubtful if a number of the bands seen in primer-extension experiments, most particularly any which correspond to T¹⁶⁹² and T¹⁶⁹³ in the gf-2.8 ladder, are related to cap sites, since m⁷GpppU cap structures are not present in the bulk mRNA of wheat embryos (Kennedy and Lane, 1979; Lane, 1981).

[Fig. 7 alignment: amino acid sequences of spherulin 1b (upper rows) and germin gf-2.8 (lower rows).]

FIG. 7. Comparison of the amino acid sequences of spherulin 1b and gf-2.8 germin. The one-letter code for amino acids is used and the signal-peptide and mature-protein sequences are numbered, separately, each beginning with the numeral one. For simplicity, a single numbering system (the one for germin) is used for both proteins; this overlooks five deletions (-) and two insertions which were entered in the spherulin-1b sequence in order to maximize similarities with germin gf-2.8. The two insertions in spherulin 1b, one a 15-member amino acid sequence between A¹⁸ and P¹⁹ in the signal-peptide sequence, and the other, an asparagine residue between L”’ and in the mature-protein sequence, have been omitted. The larger font is used to show similarities detected by the "Simplify" program (see text), and boldface is used to emphasize identities. The rectangle defines the conserved-core region of germin (residues 61-151) and the asterisks indicate F = L = I = M sites in the core which, if included in the comparison, increase the similarity between spherulin 1b and germin gf-2.8 to ~60% in this region (see text).

This investigation has shown that there is strong (87%) similarity between the mRNA parts of the gf-2.8 and gf-3.8 germin genes (Fig. 4). An 8% difference between their mature-protein coding regions (174-776) is generally reflected in a 7% difference between the corresponding polypeptide sequences of the mature proteins (1-201) but most significantly, in a central region of the mature protein, the same 8% difference between their mature-protein coding regions (354-627) results in no change whatever between the corresponding polypeptide sequences (61-151). Constraint against change between residues 61 and 151 of the mature protein (Fig. 4) strongly suggests that this is a biochemically important part of germin.

Two lines of evidence suggest that there is selective expression of gf-2.8 mRNA during germination: first, the only two germin cDNA clones detected among 2000 colonies in a "full-length" wheat cDNA library (Rahman et al., 1988) were identical and corresponded to the mRNA part of gf-2.8; second, the amount of gf-2.8 mRNA greatly exceeded the amount of gf-3.8 mRNA in Northern analyses of bulk mRNA from germinated wheat embryos (see "Results"). It therefore seems not unlikely that cis-acting elements in the 5'-flanking region of the gf-2.8 gene (see "Results" and Fig. 2) are selectively responsive to trans-acting factors which are present in germinated wheat embryos. Moreover, because chromosome mapping of hexaploid wheat shows that gf-2.8 derives from chromosome 4D, there may even be selective expression, during germination, of the germin gene of chromosome 4D, which derives from the Tauschii/Aegilops (weed) progenitor of hexaploid wheat (Sears, 1974). Additionally, since the isoform of germin that is peculiar to mature embryos (pseudogermin) is present in germinated embryos in only small proportion (relative to the amount of germin) (Lane, 1988), as is gf-3.8 mRNA (relative to gf-2.8 mRNA), the gf-3.8 and gf-2.8 genes may encode the different isoforms.

EcoRI digests of the parent genomic fragment (~11 kbp) from which gf-2.8 was prepared yielded two other fragments (~7 and ~0.6 kbp) that gave hybridization signals when Southern blots were probed with germin cDNA. Similarly, EcoRI digests of the parent genomic fragment (~16 kbp) from which gf-3.8 was prepared also yielded an ~11-kbp fragment that gave a hybridization signal when Southern blots were probed with germin cDNA (see "Results"). Since each of gf-2.8 and gf-3.8 contains a full mRNA sequence, these findings indicate that (at least part of) the germin-mRNA sequence (or a closely related sequence) is repeated in each of the parent ~11- and ~16-kbp fragments. When Southern blots of EcoRI digests of either of the parent genomic fragments (~11 or ~16 kbp) were probed with a fragment made from the 5'-flanking region of gf-2.8 (i.e. EcoRI/SphI1:1-1600 in Fig. 2), only gf-2.8 (from the ~11-kbp genomic fragment) gave a hybridization signal. Similarly, when Southern blots of EcoRI digests of either of the parent genomic fragments (~11 or ~16 kbp) were probed with a fragment made from the 5'-flanking region of gf-3.8 (i.e. EcoRI/PstI:1-1153 in Fig. 3), only gf-3.8 (from the ~16-kbp genomic fragment) gave a hybridization signal. Summarily then, unlike the structural-protein coding sequences in each of gf-2.8 and gf-3.8, the bulk of the 5'-flanking sequences in each of gf-2.8 and gf-3.8 are not repeated in the parent genomic fragments.

Information about sequence divergence among structural-protein coding alleles in the A, B, and D homeologues of hexaploid wheat (Sears, 1974) is sparse. Since divergence of the A, B, and D progenitors of hexaploid wheat ~4 × 10⁶ years ago⁵ would not be expected to lead to extreme divergence between allelic genes in hexaploid wheat (Smith and Raikhel, 1989), the absence of similarity between their 5'-flanking sequences indicates that gf-2.8 and gf-3.8 are unlikely to be allelic. Relevant information will emerge as information from chromosome mapping of gf-2.8 and gf-3.8 becomes available, and as nucleotide sequences are determined for the linked structural-protein coding regions in the parent fragments (~11 and ~16 kbp) from which gf-2.8 and gf-3.8 are derived. In the meantime, it is encouraging that comparable information about linked (Futers et al., 1990) and possibly unlinked (Guiltinan et al., 1990) genes for the Em protein of hexaploid wheat may soon become available.

Implications of These and Other Investigations for Understanding the Biochemical Involvements of Germin and Related Proteins in Cereals and Other Organisms-A protein which had previously been implicated in the osmotic-stress response of salt-resistant barley cultivars (Hurkman, 1990) was recently identified as germin (Hurkman et al., 1990). Our early studies had shown that standard barley cultivars synthesize germin during germinative growth (Grzelczak et al., 1985) and our more recent studies (Jaikaran et al., 1990) led us, also, to relate germin to the osmotic properties of cells. To explain the "water growth" that follows germination of wheat embryos (see the Introduction), we proposed (Jaikaran et al., 1990) that germin may play a role in altering the properties of cell walls during germinative growth. In this context, it is of interest that the 5'-flanking region in gf-2.8 contains sequences (¹⁶¹²GCACATGCA¹⁶²⁰ and ¹⁶³²GCTCCATGCA¹⁶⁴⁰) which are very similar to ones which are known to occur in auxin-responsive genes (McClure et al., 1989).

⁵ J. Dvorak, personal communication.

An important similarity between the structural-protein coding regions in germin mRNAs (gf-2.8 and gf-3.8) (Fig. 4) and spherulin (1a and 1b) mRNAs (Bernier et al., 1987), all of which may encode cell-wall proteins, is detectable under conditions of reduced stringency that fail to detect similarities between the 5'-flanking regions in gf-2.8 and gf-3.8 (Fig. 5A). This is shown in the comparison of the mRNA sequences that encode germin (gf-2.8) and spherulin 1b (Fig. 5B). Most significantly, the similarity between germin and spherulin mRNAs is greatest in a region in which the gf-2.8 and gf-3.8 mRNAs differ by 8% but still encode the same, absolutely conserved protein sequence: amino acids 61-151 (see Fig. 4). Overall, there is 44% similarity between the germin gf-2.8 and spherulin 1b sequences when they are compared using the "Bestfit" program, and the default "grouping" of amino acids, in the "Simplify" program of the sequence-analysis package from the University of Wisconsin Genetics Computer Group (UWGCG) (Fig. 7): Pro = Ala = Gly = Ser = Thr; Gln = Asn = Glu = Asp; His = Lys = Arg; Leu = Ile = Val = Met; Phe = Tyr = Trp. The similarity between germins (gf-2.8 and gf-3.8) and spherulins (1a and 1b) increases to 50% in the region of protein sequence that is absolutely conserved between the gf-2.8 and gf-3.8 germins, and if F = L = I = M equivalence is allowed (to emphasize hydrophobicity distribution), similarity between the germins and spherulins increases to ~60% in the same conserved core of germin (residues 61-151). In a more restricted domain (63-123), one which encompasses ~60% of the conserved germin core and includes a decapeptide that is 90% homologous with a sequence in Escherichia coli glycerophosphate acyl transferase (Dratewka-Kos et al., 1989), the Bestfit program also shows ~50% similarity between gf-2.8 germin and spherulin 1b (Fig. 7). In a still more restricted domain (81-96), one that encompasses ~20% of the conserved germin core and contains a unique decapeptide sequence (PH(T/I)HPRATEI) that is 90% conserved between the germins (gf-2.8 and gf-3.8) and the spherulins (1a and 1b), similarity increases to ~70% even without allowance for equivalences between different amino acids.
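The effect of amino acid groupings on a similarity score can be reproduced in outline. The Python sketch below is not the UWGCG "Bestfit" or "Simplify" code; it scores two pre-aligned, equal-length sequences position by position, counting a match when the residues are identical or fall in the same group. The function names are hypothetical, the default groups are the five listed in the text, and the two test decapeptides are the two variants of the PH(T/I)HPRATEI sequence named there.

```python
# Default groups as listed in the text (one-letter code).
SIMPLIFY_GROUPS = ("PAGST", "QNED", "HKR", "LIVM", "FYW")


def make_group_map(groups):
    """Map each residue to the index of the group that contains it."""
    return {aa: index for index, group in enumerate(groups) for aa in group}


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions with identical residues."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def percent_similarity(seq_a, seq_b, groups=SIMPLIFY_GROUPS):
    """Percentage of aligned positions that are identical or in the same group.

    Gap characters and ungrouped residues (e.g. Cys) only match themselves;
    alignment positions containing a gap are still counted in the length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    group_of = make_group_map(groups)
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        gx, gy = group_of.get(x), group_of.get(y)
        if (x == y and x != "-") or (gx is not None and gx == gy):
            matches += 1
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(percent_identity("PHIHPRATEI", "PHTHPRATEI"))
    print(percent_similarity("PHIHPRATEI", "PHTHPRATEI"))
    # Allowing F = L = I = M equivalence, as discussed for the conserved core.
    hydrophobic = ("PAGST", "QNED", "HKR", "LIVMF", "YW")
    print(percent_similarity("FLIM", "MILF", groups=hydrophobic))
```

Because Ile and Thr fall in different groups, the grouped score for the decapeptide stays at the identity value; the grouping only raises the score where the substituted residues share a group.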

These structural relations between germins and spherulins 1a and 1b are especially interesting in the context of a recently reported evolutionary relation between vertebrate-lens crystallins and spherulin 3a (Bernier et al., 1987). Molecular modeling has shown that spherulin 3a (unrelated, structurally, to spherulins 1a and 1b) can adopt the tertiary structure which is characteristic of a single γ-crystallin domain. It has been suggested that spherulin 3a and γ-crystallins, together with protein S, which is found in bacterial spores, are part of a superfamily whose members share a similar three-dimensional architecture. It is posited (Wistow, 1990) that the earliest members of the family predate the prokaryote/eukaryote separation and that they originated in cellular responses to environmental stress, including osmotic stress (e.g. during spore and spherule formation). Accordingly, it may be that spherulins generally, and germins in particular, emerged and evolved in a common or related biological context: cellular desiccation/hydration.

In connection with the desiccation of developing wheat embryos, we once suggested (Hofmann et al., 1984) and later adduced evidence in support of (McCubbin et al., 1985) an "anhydrobiosis" role for a protein that we initially designated "spot 7" (Cuming and Lane, 1979), later called the E protein (Grzelczak et al., 1982) and finally named the Em protein in order to distinguish it from the Ec protein (Hanley-Bowdoin and Lane, 1983), which latter has since been shown to be a zinc metallothionein (Lane et al., 1987). As the most abundant protein in mature wheat embryos (Grzelczak et al., 1982), the Em protein is well-equipped to "infiltrate" and conserve cytoplasmic structures in the desiccated cytoplasm by virtue of its high content of hydrophilic amino acids (Grzelczak et al., 1982) and its random-coil conformation (McCubbin et al., 1985).

A similar role has since been proposed for a variety of other glycine-rich proteins in mature seed-embryos (Galau et al., 1987; Chandler et al., 1988; Gomez et al., 1988; Mundy and Chua, 1988; Close et al., 1989), including an Em analogue (D19 protein) among the so-called Lea (Baker et al., 1988) or WSP (water-stress proteins) (Dure et al., 1989) proteins in cotton-seed embryos (for review, see Morris et al. (1990)). There is a reciprocal relation between the degradation of the Em protein (Thompson and Lane, 1980; Grzelczak et al., 1982; Cuming, 1984) and the emergence of germin (Thompson and Lane, 1980; Grzelczak and Lane, 1983; Rahman et al., 1988) in germinating wheat embryos: degradation of Em and its mRNA is completed just as the nascent synthesis of germin and its mRNA begins at ~5 h postimbibition. The roles of Em and germin in desiccation and hydration during wheat-embryo development, maturation and germination will be subjects of continuing study in the laboratory, as will the relation between germins and spherulins.

Acknowledgments-It is a pleasure to express our profound gratitude to Dr. Michael G. Murray (Agrigenetics Corporation, Madison, WI) who kindly provided the wheat DNA library that was used in this investigation. The counsel of Prof. Robert Dunn and Dr. Anthony Rafalski in connection with procedures used to screen the DNA library is warmly acknowledged, as is the counsel of Dr. Jan Dvorak and Dr. Ken Armstrong in connection with the estimated divergence of the A, B, and D genomes in hexaploid wheat, and of Dr. Wenyan Shen in connection with the use of M13mp18(19) phage for deoxynucleotide sequencing. We are also pleased to thank Prof. P. N. Lewis for assistance in connection with the Write Now Computer Program.

REFERENCES

Baker, J., Steele, C., and Dure, L. S., III (1988) Plant Mol. Biol. 11, 277-291

Bernier, F., Seligy, V. L., Pallotta, D., and Lemieux, G. (1986) Biochem. Cell Biol. 64, 337-343

Bernier, F., Lemieux, G., and Pallotta, D. (1987) Gene (Amst.) 59, 265-277

Birnstiel, M. L., Busslinger, M., and Strub, K. (1985) Cell 41, 349-359

Chandler, P. M., Walker-Simmons, M., King, R. W., Crouch, M., and Close, T. J. (1988) J. Cell. Biochem. Suppl. 12C, 143

Chet, I., and Rusch, H. P. (1969) J. Bacteriol. 100, 674-678

Close, T. J., Kortt, A. A., and Chandler, P. M. (1989) Plant Mol. Biol. 13, 95-108

Cuming, A. C. (1984) Eur. J. Biochem. 145, 351-357

Cuming, A. C., and Lane, B. G. (1979) Eur. J. Biochem. 99, 217-224

Dente, L., Cesareni, G., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655

Dratewka-Kos, E., Rahman, S., Grzelczak, Z. F., Kennedy, T. D., Murray, R. K., and Lane, B. G. (1989) J. Biol. Chem. 264, 4896-4900

Dure, L. S., III, Crouch, M., Harada, J., Ho, T.-H. D., Mundy, J., Quatrano, R. S., Thomas, T., and Sung, Z. R. (1989) Plant Mol. Biol. 12, 475-486

Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C., and Proudfoot, N. J. (1980) Cell 21, 653-668

Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13

Fouser, L. A., and Friesen, J. D. (1986) Cell 45, 81-93

Futers, S., Vaughan, T. J., Sharp, P. J., and Cuming, A. C. (1990) Theoret. Appl. Genet. 80, 43-48

Galau, G. A., Bijaisoradat, N., and Hughes, D. W. (1987) Dev. Biol. 123, 198-212

Gomez, J., Sanchez-Martinez, D., Stiefel, V., Rigau, J., Puigdomenech, P., and Pages, M. (1988) Nature 334, 262-264

Gorman, J. A., and Wilkins, A. S. (1980) in Growth and Differentiation in Physarum polycephalum (Dove, W. F., and Rusch, H. P., eds) pp. 157-202, Princeton University Press, Princeton, NJ

Grzelczak, Z. F., and Lane, B. G. (1983) Can. J. Biochem. Cell Biol. 61, 1233-1243

Grzelczak, Z. F., and Lane, B. G. (1984) Can. J. Biochem. Cell Biol. 62, 1351-1353

Grzelczak, Z. F., Sattolo, M. H., Hanley-Bowdoin, L. K., Kennedy, T. D., and Lane, B. G. (1982) Can. J. Biochem. 60, 389-397

Grzelczak, Z. F., Rahman, S., Kennedy, T. D., and Lane, B. G. (1985) Can. J. Biochem. Cell Biol. 63, 1003-1013

Gubler, U., and Hoffman, B. J. (1983) Gene (Amst.) 25, 263-269

Guiltinan, M. J., Marcotte, W. R., and Quatrano, R. S. (1990) Science 250, 267-271

Ham, J., Moore, D., Rosamond, J., and Johnston, I. R. (1989) Nucleic Acids Res. 17, 5781-5792

Hanley-Bowdoin, L., and Lane, B. G. (1983) Eur. J. Biochem. 135, 9-15

Hofmann, T., Kells, D. I. C., and Lane, B. G. (1984) Can. J. Biochem. Cell Biol. 62, 908-913

Hurkman, W. J. (1990) in Environmental Injury to Plants (Katterman, F., ed) pp. 205-229, Academic Press, San Diego, CA

Hurkman, W. J., Tao, H. P., and Tanaka, C. K. (1990) Plant Physiol. 93, (suppl.) 108

Jaikaran, A. S. I., Kennedy, T. D., Dratewka-Kos, E., and Lane, B. G. (1990) J. Biol. Chem. 265, 12503-12512

Johnston, F. B., and Stern, H. (1957) Nature 179, 160-161

Jump, J. A. (1954) Am. J. Bot. 41, 561-567

Kennedy, T. D., and Lane, B. G. (1979) Can. J. Biochem. 57, 927-931

Kunz, D., Zimmermann, R., Heisig, M., and Heinrich, P. C. (1989) Nucleic Acids Res. 17, 1121-1138

Lane, B. G. (1981) Can. J. Biochem. 59, 868-870

Lane, B. G. (1985) in Lipmann Symposium: Cellular Regulation and Malignant Growth (Ebashi, S., ed) pp. 311-319, Japan Scientific Societies Press, Tokyo

Lane, B. G. (1988) in The Roots of Modern Biochemistry (Kleinkauf, H., von Dohren, H., and Jaenicke, L., eds) pp. 457-476, Walter de Gruyter and Co., New York

Lane, B. G., and Tumaitis-Kennedy, T. D. (1981) Eur. J. Biochem. 114, 457-463

Lane, B. G., Grzelczak, Z. F., Kennedy, T. D., Kajioka, R., Orr, J., D'Agostino, S., and Jaikaran, A. (1986) Biochem. Cell Biol. 64, 1025-1037

Lane, B., Grzelczak, Z., Kennedy, T. D., Hew, C., and Joshi, S. (1987) Biochem. Cell Biol. 65, 354-362

Lane, B., Kajioka, R., and Kennedy, T. (1987) Biochem. Cell Biol. 65, 1001-1005

Marcus, A. (1969) Symp. Soc. Exp. Biol. 23, 143-160

McClure, B. A., Hagen, G., Brown, C. S., Gee, M. A., and Guilfoyle, T. J. (1989) Plant Cell 1, 229-239

McCubbin, W. D., Kay, C. M., and Lane, B. G. (1985) Can. J. Biochem. Cell Biol. 63, 803-811

Messing, J. (1988) Focus (Bethesda Research Laboratories) 10, 21-26

Morris, P. C., Kumar, A., Bowles, D. J., and Cuming, A. C. (1990) Eur. J. Biochem. 190, 625-630

Mundy, J., and Chua, N. H. (1988) EMBO J. 7, 2279-2286

Murray, M. G., and Thompson, W. F. (1980) Nucleic Acids Res. 8, 4321-4325

Murray, M. G., Kennard, W. C., Drong, R. F., and Slightom, J. L. (1984) Gene (Amst.) 30, 237-240

Qu, L. H., Michot, B., and Bachellerie, J-P. (1983) Nucleic Acids Res. 11, 5903-5920

Rahman, S., Grzelczak, Z., Kennedy, T., and Lane, B. (1988) Biochem. Cell Biol. 66, 100-106

Raub, T. J., and Aldrich, H. C. (1982) in Cell Biology of Physarum and Didymium (Aldrich, H. C., and Daniel, J. W., eds) Vol. 2, pp. 21-75, Academic Press, New York

Sears, E. R. (1974) in Handbook of Genetics, Vol. 2, Plants, Plant Viruses, and Protists (King, R. C., ed) pp. 59-91, Plenum Press, New York

Shafai, R. (1989) The Polynucleotide Structure of a Germin Gene. MSc. thesis, University of Toronto

Shelness, G. S., and Williams, D. L. (1985) J. Biol. Chem. 260, 8637-8646

Smith, J. J., and Raikhel, N. V. (1989) Plant Mol. Biol. 13, 601-603

Thompson, E. W., and Lane, B. G. (1980) J. Biol. Chem. 255, 5965-5970

Wistow, G. (1990) J. Mol. Evol. 30, 140-145