Sequence analysis of the cupin gene family in Synechocystis PCC6803 Article Published Version Dunwell, J. (1998) Sequence analysis of the cupin gene family in Synechocystis PCC6803. Microbial & comparative genomics, 3 (2). pp. 141-148. ISSN 1090-6592 doi: https://doi.org/10.1089/omi.1.1998.3.141 Available at http://centaur.reading.ac.uk/7956/ It is advisable to refer to the publisher’s version if you intend to cite from the work. To link to this article DOI: http://dx.doi.org/10.1089/omi.1.1998.3.141 Publisher: Mary Ann Liebert, Inc. All outputs in CentAUR are protected by Intellectual Property Rights law, including copyright law. Copyright and IPR is retained by the creators or other copyright holders. Terms and conditions for use of this material are defined in the End User Agreement  www.reading.ac.uk/centaur   CentAUR Central Archive at the University of Reading 


MICROBIAL & COMPARATIVE GENOMICS Volume 3, Number 2, 1998 Mary Ann Liebert, Inc.

Sequence Analysis of the Cupin Gene Family in Synechocystis PCC6803

JIM M. DUNWELL

ABSTRACT

The recently described cupin superfamily of proteins includes the germin and germinlike proteins, of which the cereal oxalate oxidase is the best characterized. This superfamily also includes seed storage proteins, in addition to several microbial enzymes and proteins with unknown function. All these proteins are characterized by the conservation of two central motifs, usually containing two or three histidine residues presumed to be involved with metal binding in the catalytic active site. The present study on the coding regions of Synechocystis PCC6803 identifies a previously unknown group of 12 related cupins, each containing the characteristic two-motif signature. This group comprises 11 single-domain proteins, ranging in length from 104 to 289 residues, and includes two phosphomannose isomerases and two epimerases involved in cell wall synthesis, a member of the pirin group of nuclear proteins, a possible transcriptional regulator, and a close relative of a cytochrome c551 from Rhodococcus. Additionally, there is a duplicated, two-domain protein that has close similarity to an oxalate decarboxylase from the fungus Collybia velutipes and that is a putative progenitor of the storage proteins of land plants.

INTRODUCTION

The cupin superfamily (Dunwell, 1998) of functionally diverse proteins has been designated recently to include the germin and germinlike proteins from plants, their duplicated two-domain relatives, including the seed storage proteins (Bäumlein et al., 1995), and a wide range of other enzymes and binding proteins from microbes, plants, and animals (Dunwell and Gane, 1998). The name cupin (from the Latin cupa, a small barrel or cask) is derived from the tertiary β-barrel element, which comprises either the central core of these proteins (e.g., oxalate oxidase) or one of a number of discrete domains (e.g., AraC transcription factors). Characteristically, the cupin element of these proteins has two histidine-containing motifs, which together with other conserved proline and glycine residues make up the structural framework and the putative metal-binding active site (Gane et al., 1998). The two conserved motifs are separated by a variable region, usually 15-20 residues in length.

The aim of the present study was to identify and categorize all cupin sequences within a single bacterial genome, namely, that of the unicellular cyanobacterium Synechocystis PCC6803 (Kaneko et al., 1996). This organism serves as the prokaryotic model for studying plantlike oxygenic photosynthesis, and the intention of the present study was to provide a basis from which the proliferation of related plant cupins could be examined systematically. For example, it is estimated that the Arabidopsis genome contains at least 12 germinlike proteins (GLPs) (Dunwell, 1998), in addition to a large number of other related cupins.

Department of Agricultural Botany, School of Plant Sciences, The University of Reading, Reading RG6 6AS, UK.

METHODS AND RESULTS

A series of detailed database searches was conducted using the gapped BLAST (Altschul et al., 1997) and BLOCKS (Henikoff and Henikoff, 1991) programs to identify those Synechocystis protein sequences that contain the two conserved histidine-containing motifs described by Dunwell and Gane (1998). These two motifs are part of the β-strands designated, respectively, C/D and G/H within the two β-barrel elements of the bean storage protein phaseolin (Lawrence et al., 1994). A major theme in the present analysis, and the probable reason that this gene family had not been identified previously, is that the region between the two motifs is variable in length, with a minimum distance of 15 residues for many of the bacterial proteins, increasing to around 20 for the germins and GLPs and >20 for the storage proteins. The maximum of 64 residues is found in a barley globulin (gi|421978). This variable region, which can tolerate a range of insertions, is equivalent to the E/F loop of the β-barrel structure.

The main result achieved in the present analysis was identification of a total of 12 cupin sequences, of which 11 are single-domain proteins, with one example of a two-domain structure (Fig. 1). Each of these 12 has the characteristic cupin two-motif arrangement, with the consensus of motif 1 being PG(X)5HXH(X)4E(X)6G and that of motif 2 being G(X)5PXG(X)2H(X)3N.
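The two-motif signature lends itself to a simple pattern scan. The following is a minimal sketch, not the BLAST/BLOCKS procedure used in the study: it assumes the consensus reads PG(X)5HXH(X)4E(X)6G for motif 1 and G(X)5PXG(X)2H(X)3N for motif 2, treats the consensus as strict (real cupins deviate from it), and uses loop bounds of 13 and 64 residues taken from the spacings mentioned in the text. The function name `find_cupin_domains` and the toy sequences are hypothetical.

```python
import re

# Assumed reading of the two consensus motifs as regular expressions.
MOTIF1 = r"PG.{5}H.H.{4}E.{6}G"   # PG(X)5HXH(X)4E(X)6G
MOTIF2 = r"G.{5}P.G.{2}H.{3}N"    # G(X)5PXG(X)2H(X)3N


def find_cupin_domains(seq, min_loop=13, max_loop=64):
    """Return (start, loop_length) for each motif 1 / loop / motif 2 match.

    The loop is matched lazily, so the shortest spacing that allows
    motif 2 to follow motif 1 is reported.
    """
    pattern = re.compile(
        f"(?:{MOTIF1})(?P<loop>.{{{min_loop},{max_loop}}}?)(?:{MOTIF2})"
    )
    return [(m.start(), len(m.group("loop"))) for m in pattern.finditer(seq)]


# Hypothetical toy sequences built to satisfy the strict consensus.
m1 = "PG" + "AAAAA" + "HAH" + "AAAA" + "E" + "AAAAAA" + "G"
m2 = "G" + "AAAAA" + "PAG" + "AA" + "H" + "AAA" + "N"
single_domain = "MK" + m1 + "S" * 15 + m2 + "KK"
two_domain = m1 + "S" * 20 + m2 + "T" * 30 + m1 + "S" * 20 + m2

print(find_cupin_domains(single_domain))  # one domain, 15-residue loop
print(find_cupin_domains(two_domain))     # two domains, 20-residue loops
```

A strict regular expression of this kind would miss divergent family members, which is one reason profile-based tools such as BLOCKS are better suited to the real search.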

The sequences can be subdivided into several classes on the basis of a range of criteria and an analysis of their nearest neighbors according to a BLASTP search (Table 1). In an assessment of potential function, and considering first the 11 single-domain proteins, two of the sequences (gi|1001180, gi|1652486) are thought to be phosphomannose isomerases (PMIs), one (gi|1653678) to be a dTDP-4-dehydrorhamnose 3,5-epimerase, and one (gi|1651977) a dTDP-6-deoxy-L-mannose-dehydrogenase. The latter two are very similar in sequence and should probably both be considered as epimerases.

It should be noted that residues 61-129 of the PMI sequence gi|1652486 are identical to the sequence encoded by nucleotides 3-208 (with a frameshift correction at position 104) in the upstream region of gi|287460, a sequence including the Synechocystis groES and groEL genes (Lehel et al., 1993). Presumably, this partial PMI coding region was accidentally ligated to the other coding regions during the cloning procedure and then was not identified because of the frameshift introduced by a sequencing error.

Another sequence from which a function can be deduced with some reliability is gi|1652717. This has as its closest neighbor the Escherichia coli sequence gi|1176281, a member of the recently designated group of nuclear proteins, the pirins (Fig. 2) (Wendler et al., 1997). Additionally, gi|1653078 is very similar to the Rhodococcus sequence gi|347174, a putative C551 cytochrome, and gi|1652237 is similar to the E. coli transcriptional regulator gi|1657504. A lower degree of similarity, with gaps, is found between gi|1652906 and its nearest neighbor, the GLP (gi|1169942) from Mesembryanthemum crystallinum.

FIG. 1. Sequence alignment of the central core of cupin sequences from Synechocystis PCC6803 (columns: sequence, motif 1, loop, motif 2). Sequences are denoted according to the GenBank gi identifier. The number followed by a and b denotes the first and second domains in this two-domain protein.

TABLE 1
Column headings: sequence; closest neighbor (gi, species); length (AA); identity (%); similarity (%); gaps (%); function.
Closest-neighbor entries, in row order:
  gi|1657504  E. coli
  gi|1001180  Synechocystis
  gi|1652486  Synechocystis
  gi|1169942  M. crystallinum
  gi|347174   Rhodococcus
  gi|1176281  E. coli
  gi|141363   S. enterica
  gi|1361427  S. glaucescens
  gi|1652724  Synechocystis
  gi|1652726  Synechocystis
  gi|1652724  Synechocystis
  gi|1604990  C. velutipes
Values were estimated by use of the gapped BLASTP program, showing the length of the region of greatest similarity, the percentages of identical and similar residues, and the percentages of gaps needed to give maximum similarity (TBLASTN was used for the two-domain gi|1652630 sequence, which has a DNA sequence as its neighbor).
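The identity, similarity, and gap percentages that BLASTP reports for a nearest neighbor can be tallied directly from a pairwise alignment. The sketch below assumes a BLAST-style three-line alignment in which the middle line carries a letter for an identical pair, "+" for a conservative (positive) substitution, and a space otherwise. The function name `alignment_stats` and the ten-column toy alignment are hypothetical and are not taken from Table 1.

```python
def alignment_stats(query, midline, subject):
    """Count identities, positives, and gapped columns in an alignment.

    Positives include identities, following the usual BLAST convention.
    Percentages are relative to the alignment length (number of columns).
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three alignment lines must have equal length")
    length = len(query)
    identities = sum(1 for m in midline if m.isalpha())
    positives = sum(1 for m in midline if m != " ")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return {
        "length": length,
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "identity_pct": 100.0 * identities / length,
        "positive_pct": 100.0 * positives / length,
        "gap_pct": 100.0 * gaps / length,
    }


# Hypothetical toy alignment (10 columns, one gap in the subject).
query = "HGFEHRGAET"
midline = "HG+EH G+ET"
subject = "HGYEH-GSET"

stats = alignment_stats(query, midline, subject)
print(stats["identities"], stats["positives"], stats["gaps"])
```

BLAST prints these percentages as whole numbers, so values recomputed from raw counts may differ by a point from a printed report.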

Although no function can be assigned yet to the remaining single-domain sequences, the two-domain 394-AA protein (gi|1652630) may be an oxalate decarboxylase, as predicted from its similarity (Fig. 3) to the 447-AA oxalate-degrading enzyme (Mehta and Datta, 1991) from the wood-rotting fungus Collybia velutipes. Its sequence (gi|1604990) has been published recently (Datta et al., 1996).

A detailed quantitative analysis of the cupin sequences shows a number of interesting features. If the single-domain proteins are considered first, in general terms the distance between motifs increases in line with the increase in overall size of the protein (Fig. 1), from a minimum of 13 residues in the 129-AA gi|1652486 to 36 residues in the 289-AA gi|1652717. The one notable exception to this trend is the 232-AA gi|1652749, which has a 16-residue intermotif loop. As reported, the two-domain sequence is related to a fungal decarboxylase. Both have 20 residues between motifs, a spacing characteristic of many of the GLPs from higher plants.

FIG. 2. Sequence alignment of the central core of pirins from a range of microbes, plants, and animals (columns: motif 1, loop, motif 2, with a consensus line). The sequences used include Mycobacterium tuberculosis (gi|2213518), Escherichia coli, Streptomyces lividans (translation of part of the negative strand of the bacteriophage phi C31 attachment site, gi|48953), Homo sapiens, Mus musculus (EST), Synechocystis (gi|1652749), Arabidopsis thaliana (EST gi|1950773), and Oryza sativa (EST), together with further sequences translated from upstream regions of other genes or from unfinished genomic fragments.

FIG. 3. Protein sequences of the two-domain proteins from Synechocystis PCC6803 (gi|1652630) and Collybia velutipes (I25120), aligned by BLASTP. Score = 210 bits, expect = 1e-53, identities = 123/330 (37%), positives (+) = 179/330 (53%), gaps = 11/330 (3%). The labels denote motifs 1 and 2 within each domain.

DISCUSSION

Before the present study, only four of the Synechocystis sequences had been identified as cupins (Dunwell and Gane, 1998). Three of these were single-domain proteins, namely, the PMI gi|1001180 and its two related sequences, gi|1652906 and gi|1653708. The other was the two-domain protein gi|1652630. The total number in this paralogous family is now increased to 12 (Fig. 1 and Table 1), although not all of these proteins fulfil the definition used in the recent analysis of the E. coli genome (Blattner et al., 1997). These authors used the term to include all those ORFs that share at least 30% sequence identity over more than 60% of their lengths. However, it is probably not appropriate to apply this strict criterion to the present group of 12 sequences, which vary in length from 104 to 394 AAs and in which the two most conserved motifs are separated by a nonconserved region of variable length.

Four of the proteins identified in the present study have a possible functional connection in that they are concerned with cell wall synthesis. The best known are the PMIs (EC 5.3.1.8), enzymes that catalyze the interconversion of mannose-6-phosphate and fructose-6-phosphate. Although PMIs are usually regarded as zinc-containing metalloproteins, an Fe(III)-dihydroxyphenylalanine site has been identified recently in the PMI from Candida albicans when expressed in E. coli (Proudfoot et al., 1996; Smith et al., 1997). On the basis of sequence comparison, PMIs have been divided into three classes (Proudfoot et al., 1994), within which the class II enzymes (those described here) are involved in a variety of pathways, including capsular polysaccharide biosynthesis and D-mannose metabolism. Interestingly, in Synechocystis there seems to be no example of the bifunctional GDP-mannose pyrophosphorylase/PMI enzyme (the PMI domain of about 130 AAs is located at the C-terminus of the protein) found in some Archaea (e.g., the 448-AA gi|2649495 from Archaeoglobus fulgidus) and many eubacteria (e.g., the 428-AA gi|1230580 from Vibrio cholerae and the 470-AA gi|2313118 from Helicobacter pylori) and thought to be involved, for example, in the polymerization of alginate, a viscous mucoid exopolysaccharide. Instead, the two functions in Synechocystis, as in Methanococcus jannaschii (J.M. Dunwell, unpublished observations), are carried out by two separate enzymes, the phosphorylase function by the 367-AA gi|1653346 and the isomerase function by the two PMIs (approximately 128 AA), gi|1001180 and gi|1652486, identified previously. Other bacteria, such as E. coli, contain a family of related enzymes, including PMIs (e.g., the 152-AA gi|47164), as well as several bifunctional enzymes (e.g., the 471-AA gi|585853, the 478-AA gi|1155018, and the 483-AA gi|1584629).

Enzymes, such as the dTDP-4-dehydrorhamnose 3,5-epimerase (gi|1653678) identified here, are involved in the synthesis of bacterial exopolysaccharides. These enzymes include the ExpA8 protein recently shown to be involved in the synthesis of galactoglucan (exopolysaccharide II) in Rhizobium meliloti (Becker et al., 1997) and the TDP-deoxyglucose epimerase (U31811) that is part of the biosynthetic pathway for ascarylose, a lipopolysaccharide component in Yersinia pseudotuberculosis (Thorson et al., 1994). The related dTDP-6-deoxy-L-mannose-dehydrogenase (gi|1651977), identified in this study as a cupin, is also involved in cell wall synthesis.
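The paralogy criterion of Blattner et al. (1997) discussed earlier in this section can be written as a short predicate, which makes it clear why it sits poorly with a family whose members range from 104 to 394 AAs. This is a sketch under one stated assumption: "over more than 60% of their lengths" is read as requiring the aligned region to cover more than 60% of both sequences. The function name and parameter names are hypothetical.

```python
def meets_paralog_criterion(identity_pct, aligned_len, len_a, len_b,
                            min_identity=30.0, min_coverage=0.60):
    """Apply a 30% identity / 60% coverage rule to one pairwise comparison.

    identity_pct : percent identity over the aligned region
    aligned_len  : number of residues in the aligned region
    len_a, len_b : full lengths of the two proteins
    Coverage is assumed to be required of BOTH sequences.
    """
    coverage = min(aligned_len / len_a, aligned_len / len_b)
    return identity_pct >= min_identity and coverage > min_coverage


# Two proteins of similar size pass when identity is high enough.
print(meets_paralog_criterion(35.0, 120, 128, 129))

# The shortest (104 AA) and longest (394 AA) family members cannot pass,
# even if the shorter one aligned perfectly over its full length.
print(meets_paralog_criterion(100.0, 104, 104, 394))
```

Under this reading, a 104-AA protein can cover at most about 26% of a 394-AA protein, so the pair fails the rule whatever the identity.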

In contrast to the presumed function in cell wall synthesis assigned to the previous four sequences considered, the identity of sequence gi|1652749 seems to be as the nuclear protein pirin, one of a highly conserved group of proteins (Fig. 2) thought to interact with the nuclear factor I/CCAAT box transcription factor (NFI/CTF1) (Wendler et al., 1997). Of the other sequences, gi|1652237 has the transcriptional regulator gi|1657504 from E. coli as its closest neighbor, and gi|1653078 is similar to gi|347174, a cytochrome C551 gene from Rhodococcus.

Unfortunately, no function can be assigned to any of the other single-domain cupins listed in Figure 1, although it is hoped that as this superfamily is analyzed in more detail, the subgroups will be identified systematically by their homology to proteins of known function, a process that enabled the oxalate oxidase identity of the wheat gf-2.8 germin to be established (Lane et al., 1993).

The detailed analysis summarized in Figure 1 confirmed that all the cupin sequences found in Synechocystis contain the characteristic two-motif structure, with the intermotif distance varying in length from 13 (a uniquely short spacing for this superfamily) to 34 residues. Despite the range of spacing found in this study, it is interesting to note that none of the single-domain cupins in Synechocystis has an intermotif distance of 20 residues, the spacing found in the two-domain sequence gi|1652630. There are a number of possible explanations for this peculiarity. First, it may be that the duplication occurred in a protein with a 16-residue spacing, followed by the insertion of 4 residues in each domain. This seems unlikely. Alternatively, the 20-residue progenitor may have been lost through natural selection or may simply not have been identified in this study. Again, it seems unlikely that any other cupin sequences in this genome remain undiscovered. In a further attempt to find close relatives to the two-domain 20-residue protein from other species, the 57-AA sequence spanning the two motifs of domain 1 was used as a probe in a BLASTP search. This revealed an Arabidopsis GLP (U75207) as the closest neighbor, with the closest nonplant sequence being the hypothetical protein gi|2128971 from the archaeon M. jannaschii (37% identical, 48% similar over a distance of 51 AAs). However, this 125-AA protein has only a 16-residue gap. Similarly, use of an equivalent sequence from domain 2 revealed the 113-AA sequence gi|1881251 from Bacillus subtilis as the nearest neighbor (37% identical, 62% similar over 51 residues). This sequence also has a 16-AA gap. Therefore, to date, there is no known example of a single-domain, 20-AA gap protein from a prokaryote.

This apparent lack of any 20-residue progenitor protein, allied to the multitude of prokaryotic two-domain proteins with the 20-residue spacing, suggests that there was only one 20-residue protocupin, which underwent a duplication to produce gi|1652630 and its equivalents in other species and left no extant progenitor (or at least none identified to date).
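The length of a probe spanning both motifs follows from the consensus patterns and the loop length. The sketch below rests on two assumptions that are not stated in the text: the motif subscripts are read as G(X)5HXH(X)4E(X)6G and G(X)5PXG(X)2H(X)3N, and the span is counted from the first glycine of motif 1 to the final asparagine of motif 2. Under those assumptions a 20-residue loop gives a 57-residue span. The helper names are hypothetical.

```python
import re


def consensus_length(pattern):
    """Number of residues described by a consensus such as 'G(X)5HXH'.

    A bare letter counts as one position; '(X)n' counts as n positions.
    """
    total = 0
    for _residue, count in re.findall(r"\(?([A-Z])\)?(\d*)", pattern):
        total += int(count) if count else 1
    return total


# Assumed consensus readings; motif 1 is counted from its first glycine.
MOTIF1_FROM_G = "G(X)5HXH(X)4E(X)6G"
MOTIF2 = "G(X)5PXG(X)2H(X)3N"


def probe_span(loop_length):
    """Residues from the start of motif 1 to the end of motif 2."""
    return (consensus_length(MOTIF1_FROM_G)
            + loop_length
            + consensus_length(MOTIF2))


print(consensus_length(MOTIF1_FROM_G), consensus_length(MOTIF2))
print(probe_span(20))
```

The same function gives the span for the 16-residue spacing of the nearest single-domain relatives by calling `probe_span(16)`.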

Comparison of the alignment of the two-domain sequences from Synechocystis and C. velutipes (Fig. 3) confirms that they are probably derived from the same progenitor and that the former sequence may be the direct progenitor of the latter. This view is reinforced by analysis not only of the conserved motif regions but also of the intervening intermotif regions. Specifically, there are several identical residues in the two 20-AA intermotif regions of the first domain of each protein (consensus XTXIXXXXXEGXXXIXXVXX), and a different set of identical residues (consensus XXXXPXFAXXXXASXXXXQX) in the intermotif regions of the second domain. This suggests strongly that there was divergence of the two domains of a precursor protein after duplication of a 20-residue protocupin, followed by further minor divergence during the postduplication phase, leading eventually to the present day sequences.
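A consensus of this kind, with X at every position where two aligned loops differ, can be derived mechanically. The following is a minimal sketch. The function name `identity_consensus` is hypothetical, and the two 20-residue strings are illustrative placeholders, not the actual Synechocystis or C. velutipes intermotif sequences.

```python
def identity_consensus(seq_a, seq_b, unknown="X"):
    """Keep residues that are identical in both sequences; mask the rest.

    Both sequences must already be aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(a if a == b else unknown for a, b in zip(seq_a, seq_b))


# Hypothetical 20-residue loops that agree at six positions.
loop_a = "MKLTPDFAGHIKASLMNRQS"
loop_b = "ARNDPEFACQEGASHILKQT"

consensus = identity_consensus(loop_a, loop_b)
print(consensus)
print(sum(1 for c in consensus if c != "X"), "identical positions")
```

Comparing the consensus obtained for the first-domain loops with that for the second-domain loops is what shows the two domains retaining different sets of shared residues.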

In terms of both their number and their range of size and the presence of a two-domain sequence, the evidence presented suggests that the spectrum of cupins found in Synechocystis is closer to that of higher plants rather than to that found in more primitive bacteria (to date, only two cupins have been identified in M. jannaschii) (Dunwell, 1997). Presumably, there was a rapid expansion of cupin diversity during the evolution of Synechocystis. This conclusion may be related to the observation that the genome of this organism contains 99 ORFs with similarity to transposases (Kaneko et al., 1996). It was suggested that this high frequency could be linked to frequent rearrangement of the genome during and after establishment of this species. More recently, Cassier-Chauvat et al. (1997) characterized three specific insertion sequences from Synechocystis and suggested, on the basis of homology, that they were spread through horizontal transfer between evolutionarily distinct organisms.

Identification of the prokaryotic two-domain sequence in the present study also complicates the conclusion reached by Bäumlein et al. (1995) that "the extant spherulins and germins might represent a stage of seed globulin gene evolution before [my emphasis] the domain duplication event had occurred."

If the linear structure of these 12 Synechocystis cupin sequences is considered, it can be seen that there is an overall increase in spacing between the motifs, coincident with the increase in length of the proteins (Fig. 1). If it is assumed that the progenitor of these proteins was the smallest and most simple of the protocupins, as the overall sequence grew in length by addition of residues at each end, this must have been accompanied by insertion of residues into the variable region, namely, the E/F loop at the end of the β-barrel (Gane et al., 1997). This gradual increase in complexity led eventually to addition of α-helical regions at each end of the protein, duplication into two long α + β domains, and finally to assembly of the protein subunits to give the trimeric quaternary structure found in the abundant storage proteins of land plants (Lawrence et al., 1994) (Table 2). In addition, by the present stage of the evolutionary process, two of the three conserved histidines have been lost in most storage proteins (all three in some examples), and as these residues are implicated as being catalytically active (Gane et al., 1998), it is likely that these multimeric proteins no longer have any enzyme activity (none is known). In this regard, it is also not known if the Synechocystis two-domain protein or any of the more primitive two-domain storage proteins show oxalate decarboxylase activity (cf. I25120).

TABLE 2
Protein                 Species        Length (AA)  Conserved His  Intermotif loop  Domains  Subunits
PMI                     Synechocystis  128          3              13               1        1
GLP                     A. thaliana    2W           3              20               1        L
Oxalate oxidase         T. aestivum    201          3              23               1        5
Oxalate decarboxylase   C. velutipes   447          3              20               2        1
Vicilin                 P. vulgaris    397          1              27               2        3
The proteins are classified according to their name, species of origin, total length, number of conserved histidine residues in the two motifs, length of the intermotif loop, number of domains, and number of subunits in the mature protein.

Duplication of domains, followed by their subsequent divergence during evolution, is known to occur in other proteins, such as the zinc metalloenzyme glyoxalase I, which catalyzes the glutathione-dependent inactivation of toxic methylglyoxal (Cameron et al., 1997). In this example, sequence alignment showed that 13 of 74 residues were identical in domain 1 and 13 of 59 in domain 2. Such relatively low levels of identity occur presumably because of the small number of residues required to provide the conserved structural elements in such proteins. Undoubtedly, as analysis techniques become more sophisticated, more cases of this type of evolutionary process will be revealed, and the number of basic underlying 3D structures will be found to be restricted (Godzik, 1997).
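The identity levels quoted for the glyoxalase I domains are easier to judge as percentages. The short calculation below uses only the counts given in the text (13 of 74 and 13 of 59); the function name `percent_identity` is hypothetical.

```python
def percent_identity(identical, aligned):
    """Percent of aligned positions occupied by identical residues."""
    return 100.0 * identical / aligned


# Counts quoted in the text for the two glyoxalase I domains.
domains = [("domain 1", 13, 74), ("domain 2", 13, 59)]
for name, identical, aligned in domains:
    print(f"{name}: {percent_identity(identical, aligned):.1f}% identical")
```

Both values come out at roughly 18% and 22%, below the 30% identity threshold cited earlier from Blattner et al. (1997).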

ACKNOWLEDGMENTS

I would like to acknowledge the financial support of Zeneca plc and the Biotechnology and Biological Sciences Research Council and the expertise of Paul Gane for preparation of the figures.

REFERENCES

ALTSCHUL, S.F., MADDEN, T.L., SCHÄFFER, A.A., ZHANG, J., ZHANG, Z., MILLER, W., et al. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

BÄUMLEIN, H., BRAUN, H., KAKHOVSKAYA, I.A., and SHUTOV, A.D. (1995). Seed storage proteins of spermatophytes share a common ancestor with desiccation proteins of fungi. J Mol Evol 41, 1070-1075.

BECKER, A., RÜBERG, S., KÜSTER, H., ROXLAU, A.A., KELLER, M., IVASHINA, T., et al. (1997). The 32-kilobase exp gene cluster of Rhizobium meliloti directing the biosynthesis of galactoglucan: Genetic organization and properties of the encoded gene products. J Bacteriol 179, 1375-1384.

BLATTNER, F.R., PLUNKETT, G., BLOCH, C.A., PERNA, N.T., BURLAND, V., RILEY, M., et al. (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453-1462.

CAMERON, A.D., OLIN, B., RIDDERSTRÖM, M., MANNERVIK, B., and JONES, T.A. (1997). Crystal structure of human glyoxalase I: Evidence for gene duplication and 3D domain swapping. EMBO J 16, 3386-3395.

CASSIER-CHAUVAT, C., PONCELET, M., and CHAUVAT, F. (1997). Three insertion sequences from the cyanobacterium Synechocystis PCC6803 support the occurrence of horizontal DNA transfer among bacteria. Gene 195, 257-266.

DATTA, A., MEHTA, A., and NATARAJAN, K. (1996). Oxalate Decarboxylase. US Patent 5547870.

DUNWELL, J.M. (1997). Cupins: A new superfamily of functionally diverse proteins that include germins and plant storage proteins. Biotech Genet Eng Rev 15, 1-32.

DUNWELL, J.M., and GANE, P.J. (1997). Microbial relatives of seed storage proteins: Conservation of motifs in a functionally diverse superfamily of enzymes. J Mol Evol 45, 147-154.

GANE, P.J., DUNWELL, J.M., and WARWICKER, J. (1997). Modeling based on the structure of vicilins predicts a histidine cluster in the active site of oxalate oxidase. J Mol Evol 45, 488-493.

GODZIK, A. (1997). Counting and classifying possible protein folds. Tibtech 15, 147-151.

HENIKOFF, S., and HENIKOFF, J.G. (1991). Automated assembly of protein blocks for database searching. Nucleic Acids Res 19, 6565-6572.

KANEKO, T., SATO, S., KOTANI, H., TANAKA, A., ASAMIZU, E., NAKAMURA, Y., et al. (1996). Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3, 109-136.

LANE, B.G., DUNWELL, J.M., RAY, J.A., SCHMITT, M.R., and CUMING, A.C. (1993). Germin, a protein marker of early plant development, is an oxalate oxidase. J Biol Chem 268, 12239-12242.

LAWRENCE, M.C., IZARD, T., BEUCHAT, M., BLAGROVE, R.J., and COLMAN, P.M. (1994). Structure of phaseolin at 2.2 Å resolution: Implications for a common vicilin/legumin structure and the genetic engineering of seed storage proteins. J Mol Biol 238, 748-776.

LEHEL, C., LOS, D., WADA, H., GYORGYEI, J., HORVATH, I., KOVACS, E., et al. (1993). A second groEL-like gene, organized in a groESL operon, is present in the genome of Synechocystis sp. PCC 6803. J Biol Chem 268, 1799-1804.

MEHTA, A., and DATTA, A. (1991). Oxalate decarboxylase from Collybia velutipes. Purification, characterization and cloning. J Biol Chem 266, 23548-23553.

PROUDFOOT, A.E., GOFFIN, L., PAYTON, M.A., WELLS, T.N., and BERNARD, A.R. (1996). In vivo and in vitro folding of a recombinant metalloenzyme, phosphomannose isomerase. Biochem J 318, 437-442.

PROUDFOOT, A.E.I., TURCATTI, G., WELLS, T.N.C., PAYTON, M.A., and SMITH, D.J. (1994). Purification, cDNA cloning and heterologous expression of human phosphomannose isomerase. Eur J Biochem 219, 415-423.

SMITH, J.J., THOMSON, A.J., PROUDFOOT, A.E., and WELLS, T.N. (1997). Identification of an Fe(III)-dihydroxyphenylalanine site in recombinant phosphomannose isomerase from Candida albicans. Eur J Biochem 244, 325-333.

THORSON, J.S., LO, S.F., PLOUX, O., HE, X., and LIU, H.W. (1994). Studies on the biosynthesis of 3,6-dideoxyhexoses: Molecular cloning and characterization of the asc (ascarylose) region from Yersinia pseudotuberculosis serogroup VA. J Bacteriol 176, 5483-5493.

WENDLER, W.M.F., KREMMER, E., FOERSTER, R., and WINNACKER, E.L. (1997). Identification of pirin, a novel highly conserved nuclear protein. J Biol Chem 272, 8482-8489.

Address reprint requests to: Jim M. Dunwell

Department of Agricultural Botany, School of Plant Sciences

The University of Reading, Whiteknights, PO Box 221

Reading RG6 6AS, UK