
29 March 1999

ELSEVIER

PHYSICS LETTERS A

Physics Letters A 253 (1999) 354-357

Molecular content relations in the genetic code

Gerald Rosen

Department of Physics, Drexel University, Philadelphia, PA 19104, USA

Received 8 October 1998; accepted for publication 20 January 1999

Communicated by A.P. Fordy

Abstract

The codons are numbered from 1 to 64 by a simple formula based on the number of carbon, nitrogen and oxygen atoms in the nucleotides of each triplet. The codon range numbers that follow for the 20 amino acids are shown to be given by linear Diophantine and explicit molecular content equations in the number of carbon, nitrogen, oxygen and sulfur atoms in each amino acid. Thus the universal genetic code that associates codons and amino acids is expressed in a precise way by purely physical molecular content relations. © 1999 Published by Elsevier Science B.V.

PACS: 87.15.-v; 87.10.+e

Keywords: Genetic code; Codons; Atomic content

1. Introduction

Ever since the decisive resolution of the universal genetic code by Brenner and Crick in 1966, the codon-amino acid association (see Tables 1 and 2) has been the intriguing subject of many research efforts. To account for this codon-amino acid correspondence, research studies have pursued Crick's "frozen accident" conjecture and elaborated upon Woese's amino acid-codon physicochemical interaction proposal (expanded in the past decade to include biosynthetic relationships between precursor and product amino acids) [1-7]. However, recent analyses by Tate and Mannering [8] and also by Amirnovin [9] suggest that the codon-amino acid correlations investigated in former works are simply statistical in essence and are not manifestations of an underlying precise theoretical association.

Hence it is of considerable interest that precise molecular content relations of a purely physical nature (without reference to chemical interaction properties) exist between the codons and the amino acids. As shown in the present communication, these molecular content relations depend in a straightforward manner on the number of carbon, nitrogen, oxygen and sulfur atoms in the amino acid and associated codon nucleotides. The physically based codon number [10] is required in order to formulate the molecular content relations, and thus the codon number prescription is reviewed here first as a necessary preliminary.

0375-9601/99/$ - see front matter © 1999 Published by Elsevier Science B.V. All rights reserved. PII S0375-9601(99)00075-4

Table 1
The universal genetic code that associates 61 RNA codons (triplets of the RNA nucleotides U, C, A, G) with 20 different amino acids; in particular, AUG for methionine starts a protein, while UAA, UAG and UGA signal "stop" by not being associated with any amino acid.

Codon  Amino acid          Codon  Amino acid   Codon  Amino acid     Codon  Amino acid
UUU    Phenylalanine       UCU    Serine       UAU    Tyrosine       UGU    Cysteine
UUC    Phenylalanine       UCC    Serine       UAC    Tyrosine       UGC    Cysteine
UUA    Leucine             UCA    Serine       UAA    stop           UGA    stop
UUG    Leucine             UCG    Serine       UAG    stop           UGG    Tryptophan
CUU    Leucine             CCU    Proline      CAU    Histidine      CGU    Arginine
CUC    Leucine             CCC    Proline      CAC    Histidine      CGC    Arginine
CUA    Leucine             CCA    Proline      CAA    Glutamine      CGA    Arginine
CUG    Leucine             CCG    Proline      CAG    Glutamine      CGG    Arginine
AUU    Isoleucine          ACU    Threonine    AAU    Asparagine     AGU    Serine
AUC    Isoleucine          ACC    Threonine    AAC    Asparagine     AGC    Serine
AUA    Isoleucine          ACA    Threonine    AAA    Lysine         AGA    Arginine
AUG    Methionine (start)  ACG    Threonine    AAG    Lysine         AGG    Arginine
GUU    Valine              GCU    Alanine      GAU    Aspartic acid  GGU    Glycine
GUC    Valine              GCC    Alanine      GAC    Aspartic acid  GGC    Glycine
GUA    Valine              GCA    Alanine      GAA    Glutamic acid  GGA    Glycine
GUG    Valine              GCG    Alanine      GAG    Glutamic acid  GGG    Glycine

2. Codon number

The bases contained in U, C, A and G, namely the pyrimidines uracil and cytosine and the purines adenine and guanine, respectively, are progressively larger in molecular size. This monotonic size gradation is essentially a consequence of the relative number of carbon, nitrogen and oxygen atoms in the bases and does not depend on the number of small peripheral hydrogen atoms; the size index is expressed by

$$\Delta \equiv 2(n_N - n_C) + n_O + 2$$

and has the values

$$
\begin{aligned}
&\text{uracil, } \mathrm{C_4N_2O_2}: && \Delta = 0,\\
&\text{cytosine, } \mathrm{C_4N_3O}: && \Delta = 1,\\
&\text{adenine, } \mathrm{C_5N_5}: && \Delta = 2,\\
&\text{guanine, } \mathrm{C_5N_5O}: && \Delta = 3,
\end{aligned}
$$

where the hydrogen content is ignored in writing the molecular formulas. For uracil, for instance, $\Delta = 2(2 - 4) + 2 + 2 = 0$. Let these size indices 0, 1, 2, 3 for the bases be assigned to the associated nucleotides U, C, A, G,

$$\Delta(\mathrm{U}) = 0, \quad \Delta(\mathrm{C}) = 1, \quad \Delta(\mathrm{A}) = 2, \quad \Delta(\mathrm{G}) = 3.$$

Then the generic codon 5'(XYZ)3' can be given the unique codon number

$$n(XYZ) \equiv 4\Delta(X) + 16\Delta(Y) + \Delta(Z) + 1. \qquad (1)$$

For example, the codon AUG is assigned the codon number $n(\mathrm{AUG}) = 4\Delta(\mathrm{A}) + 16\Delta(\mathrm{U}) + \Delta(\mathrm{G}) + 1 = 12$ by formula (1). Table 2 shows the sixty-four codon numbers as annotations on the standard representation. Clearly, the correspondence between codons and codon numbers is one-to-one and simply matched to the hierarchical column and row progression in the standard representation.
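As a quick consistency check, the size index and formula (1) can be sketched in a few lines of Python. The atom counts are the ones listed above; the helper names (`size_index`, `codon_number`, `codon_from_number`) are illustrative only. The inverse reads $n - 1$ as base-4 digits, which is one convenient way (assumed here, not taken from the text) to see that the numbering is one-to-one onto 1 to 64.

```python
from itertools import product

# (n_C, n_N, n_O) for each base, hydrogen ignored, as listed in the text.
BASE_ATOMS = {
    "U": (4, 2, 2),  # uracil   C4N2O2
    "C": (4, 3, 1),  # cytosine C4N3O
    "A": (5, 5, 0),  # adenine  C5N5
    "G": (5, 5, 1),  # guanine  C5N5O
}


def size_index(n_c: int, n_n: int, n_o: int) -> int:
    """Size index Delta = 2(n_N - n_C) + n_O + 2."""
    return 2 * (n_n - n_c) + n_o + 2


DELTA = {base: size_index(*atoms) for base, atoms in BASE_ATOMS.items()}


def codon_number(codon: str) -> int:
    """Formula (1): n(XYZ) = 4 Delta(X) + 16 Delta(Y) + Delta(Z) + 1."""
    x, y, z = codon
    return 4 * DELTA[x] + 16 * DELTA[y] + DELTA[z] + 1


def codon_from_number(n: int) -> str:
    """Invert (1): n - 1 has base-4 digits Delta(Y), Delta(X), Delta(Z)."""
    base_of = {d: b for b, d in DELTA.items()}
    y, rest = divmod(n - 1, 16)
    x, z = divmod(rest, 4)
    return base_of[x] + base_of[y] + base_of[z]


if __name__ == "__main__":
    print(DELTA)  # {'U': 0, 'C': 1, 'A': 2, 'G': 3}
    print(codon_number("AUG"))  # 12
    numbers = sorted(codon_number("".join(t)) for t in product("UCAG", repeat=3))
    assert numbers == list(range(1, 65))  # one-to-one onto 1..64
```

For example, `codon_from_number(52)` returns `'UGG'`, the tryptophan codon numbered 52 in Table 2.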

3. Codon range numbers and molecular content of the amino acids

Table 3 displays the codon range numbers $n_i$ and $n_f$, the smallest and largest codon numbers (such that $n_i \le n \le n_f$) for each amino acid, and the number of carbon, nitrogen, oxygen and sulfur atoms ($n_C$, $n_N$, $n_O$ and $n_S$, respectively) contained in each amino acid.

The five amino acids with a ring in their molecular structure (viz. Phe, Pro, Tyr, His, Trp) appear with their symbols enclosed in square brackets. Also note that the amino acids are grouped in four subsets determined exclusively by their molecular content,

$$\mathcal{A}_{\rm I} = \{[\mathrm{Phe}], \mathrm{Leu}, \mathrm{Ile}, \mathrm{Met}, \mathrm{Val}, [\mathrm{Pro}]\}, \qquad (2)$$

$$(n_N, n_O) = (1, 3): \quad \mathcal{A}_{\rm II} = \{\mathrm{Ser}, \mathrm{Thr}, [\mathrm{Tyr}]\}, \qquad (3)$$

$$(n_C, n_N, n_O) = (3, 1, 2) \ \text{or} \ 9 \le (n_C + n_N + n_O) \le 11 \ \text{with} \ (n_N, n_O) \ne (1, 2):$$

$$\mathcal{A}_{\rm III} = \{\mathrm{Ala}, [\mathrm{His}], \mathrm{Gln}, \mathrm{Asn}, \mathrm{Lys}, \mathrm{Asp}, \mathrm{Glu}, \mathrm{Cys}\}, \qquad (4)$$

$$(n_C + 2n_N) \ge 14 \ \text{or} \ n_C = n_O \le 3: \quad \mathcal{A}_{\rm IV} = \{[\mathrm{Trp}], \mathrm{Arg}, \mathrm{Ser}, \mathrm{Gly}\}. \qquad (5)$$

Observe that only Ser satisfies the entrance requirements for more than one subset (viz., $\mathcal{A}_{\rm II}$ and $\mathcal{A}_{\rm IV}$), while only Arg has a two-interval composite range in its subset ($\mathcal{A}_{\rm IV}$).

Table 2
Contemporary textbook representation of the genetic code, annotated with the codon numbers given by formula (1). An amino acid or the stop signal is associated with each of the 64 codons, i.e. ordered triplets of the RNA nucleotides U, C, A, G. Thus, for example, the codon AUG prescribes the amino acid methionine (Met). However, the U, C, A, G row and column labels are actually superfluous here and can be deleted without loss of information, because the codon number itself implies a uniquely associated codon by formula (1).

First position   Second position                         Third position
(5' end)         U         C         A         G         (3' end)
U                1 Phe     17 Ser    33 Tyr    49 Cys    U
U                2 Phe     18 Ser    34 Tyr    50 Cys    C
U                3 Leu     19 Ser    35 stop   51 stop   A
U                4 Leu     20 Ser    36 stop   52 Trp    G
C                5 Leu     21 Pro    37 His    53 Arg    U
C                6 Leu     22 Pro    38 His    54 Arg    C
C                7 Leu     23 Pro    39 Gln    55 Arg    A
C                8 Leu     24 Pro    40 Gln    56 Arg    G
A                9 Ile     25 Thr    41 Asn    57 Ser    U
A                10 Ile    26 Thr    42 Asn    58 Ser    C
A                11 Ile    27 Thr    43 Lys    59 Arg    A
A                12 Met    28 Thr    44 Lys    60 Arg    G
G                13 Val    29 Ala    45 Asp    61 Gly    U
G                14 Val    30 Ala    46 Asp    62 Gly    C
G                15 Val    31 Ala    47 Glu    63 Gly    A
G                16 Val    32 Ala    48 Glu    64 Gly    G

Table 3
Codon range numbers and molecular content of the amino acids. Bracketed amino acid symbols indicate molecular ring structure, and the vertical groupings in the table correspond to the amino acid subsets (2)-(5). Notice that the stop signals (not associated with any amino acid and given by codons 35, 36 and 51 in Table 2) occur at the breaks between A_II, A_III and A_III, A_IV.

Subset  n_i     n_f     Amino acid  n_C  n_N  n_O  n_S
A_I     1       2       [Phe]       9    1    2    0
A_I     3       8       Leu         6    1    2    0
A_I     9       11      Ile         6    1    2    0
A_I     12      12      Met         5    1    2    1
A_I     13      16      Val         5    1    2    0
A_I     21      24      [Pro]       5    1    2    0
A_II    17      20      Ser         3    1    3    0
A_II    25      28      Thr         4    1    3    0
A_II    33      34      [Tyr]       9    1    3    0
A_III   29      32      Ala         3    1    2    0
A_III   37      38      [His]       6    3    2    0
A_III   39      40      Gln         5    2    3    0
A_III   41      42      Asn         4    2    3    0
A_III   43      44      Lys         6    2    2    0
A_III   45      46      Asp         4    1    4    0
A_III   47      48      Glu         5    1    4    0
A_III   49      50      Cys         3    1    2    1
A_IV    52      52      [Trp]       11   2    2    0
A_IV    53, 59  56, 60  Arg         6    4    2    0
A_IV    57      58      Ser         3    1    3    0
A_IV    61      64      Gly         2    1    2    0

4. Molecular content relations

For the amino acids in each subset given by (2)-(5) there is an essentially linear relation between the codon range numbers $n_i$, $n_f$ and the molecular content numbers $n_C$, $n_N$, $n_O$, $n_S$. These relations take the form of linear Diophantine equations [11] for both positive integers $n_i$ and $n_f$ in the case of $\mathcal{A}_{\rm I}$ and $\mathcal{A}_{\rm IV}$, and of explicit equations for $n_i$, $n_f$ in the case of $\mathcal{A}_{\rm II}$ and $\mathcal{A}_{\rm III}$:

$$\text{For } \mathcal{A}_{\rm I}: \quad 2n_f - n_i = 49 - 6n_C - 7n_S + 8r, \qquad (6)$$

$$\text{For } \mathcal{A}_{\rm II}: \quad n_f - n_i = 2 + \operatorname{sgn}(33 - n_f), \quad n_f = 2(2 - r)(2n_C - 1), \qquad (7)$$

$$\text{For } \mathcal{A}_{\rm III}: \quad n_f - n_i = 2 + \operatorname{sgn}(33 - n_f), \quad n_f = 14 + 6(n_N + n_O + 3n_S) \pm 2(n_C - 3), \qquad (8)$$

$$\text{For } \mathcal{A}_{\rm IV}: \quad 3n_f - 2n_i = 90 - 2n_C - 8n_O. \qquad (9)$$

In Eqs. (6)-(8) there appear

$$r = \begin{cases} 0 & \text{for amino acids without a molecular ring},\\ 1 & \text{for amino acids with a molecular ring}, \end{cases} \qquad (10)$$

$$\operatorname{sgn}(33 - n_f) = \begin{cases} +1 & \text{for } n_f < 33,\\ -1 & \text{for } n_f > 33, \end{cases} \qquad (11)$$

and the $\pm$ in the second member of (8) is defined by

$$\pm = \begin{cases} +1 & \text{for } n_N \ne 3 \ne n_O,\\ -1 & \text{for } n_N = 3 \text{ or } n_O = 3. \end{cases} \qquad (12)$$
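Relations (6)-(12) and the entrance conditions (3)-(5) lend themselves to a mechanical check against Table 3. The sketch below is illustrative only: `TABLE3` is a hand transcription of Table 3, Arg's composite range is entered as two separate rows (an assumption of this sketch), and all function names are hypothetical. Because no entrance condition for (2) is written out above, the sketch only reports which of (3)-(5) each amino acid satisfies.

```python
# Sketch: check relations (6)-(9), with r, sgn and +/- from (10)-(12), and the
# entrance conditions (3)-(5), against Table 3 values transcribed by hand.
# Arg's composite range is entered as two rows (an assumption of this sketch).

# (subset, n_i, n_f, amino acid, n_C, n_N, n_O, n_S, has_ring)
TABLE3 = [
    ("I", 1, 2, "Phe", 9, 1, 2, 0, True),
    ("I", 3, 8, "Leu", 6, 1, 2, 0, False),
    ("I", 9, 11, "Ile", 6, 1, 2, 0, False),
    ("I", 12, 12, "Met", 5, 1, 2, 1, False),
    ("I", 13, 16, "Val", 5, 1, 2, 0, False),
    ("I", 21, 24, "Pro", 5, 1, 2, 0, True),
    ("II", 17, 20, "Ser", 3, 1, 3, 0, False),
    ("II", 25, 28, "Thr", 4, 1, 3, 0, False),
    ("II", 33, 34, "Tyr", 9, 1, 3, 0, True),
    ("III", 29, 32, "Ala", 3, 1, 2, 0, False),
    ("III", 37, 38, "His", 6, 3, 2, 0, True),
    ("III", 39, 40, "Gln", 5, 2, 3, 0, False),
    ("III", 41, 42, "Asn", 4, 2, 3, 0, False),
    ("III", 43, 44, "Lys", 6, 2, 2, 0, False),
    ("III", 45, 46, "Asp", 4, 1, 4, 0, False),
    ("III", 47, 48, "Glu", 5, 1, 4, 0, False),
    ("III", 49, 50, "Cys", 3, 1, 2, 1, False),
    ("IV", 52, 52, "Trp", 11, 2, 2, 0, True),
    ("IV", 53, 56, "Arg", 6, 4, 2, 0, False),
    ("IV", 59, 60, "Arg", 6, 4, 2, 0, False),
    ("IV", 57, 58, "Ser", 3, 1, 3, 0, False),
    ("IV", 61, 64, "Gly", 2, 1, 2, 0, False),
]


def sgn33(n_f):
    """Eq. (11); n_f = 33 does not occur in A_II or A_III."""
    return 1 if n_f < 33 else -1


def pm(n_n, n_o):
    """Eq. (12): -1 if n_N = 3 or n_O = 3, else +1."""
    return -1 if 3 in (n_n, n_o) else 1


def relation_holds(subset, n_i, n_f, n_c, n_n, n_o, n_s, ring):
    r = 1 if ring else 0  # Eq. (10)
    if subset == "I":  # Eq. (6)
        return 2 * n_f - n_i == 49 - 6 * n_c - 7 * n_s + 8 * r
    if subset == "II":  # Eq. (7)
        return (n_f == 2 * (2 - r) * (2 * n_c - 1)
                and n_f - n_i == 2 + sgn33(n_f))
    if subset == "III":  # Eq. (8)
        return (n_f == 14 + 6 * (n_n + n_o + 3 * n_s) + pm(n_n, n_o) * 2 * (n_c - 3)
                and n_f - n_i == 2 + sgn33(n_f))
    return 3 * n_f - 2 * n_i == 90 - 2 * n_c - 8 * n_o  # Eq. (9), subset IV


def subsets_met(n_c, n_n, n_o):
    """Which of the entrance conditions (3)-(5) an amino acid satisfies."""
    met = set()
    if (n_n, n_o) == (1, 3):
        met.add("II")
    if (n_c, n_n, n_o) == (3, 1, 2) or (
            9 <= n_c + n_n + n_o <= 11 and (n_n, n_o) != (1, 2)):
        met.add("III")
    if n_c + 2 * n_n >= 14 or (n_c == n_o and n_c <= 3):
        met.add("IV")
    return met


def failures():
    """Names of rows that violate their relation (empty if all hold)."""
    return [row[3] for row in TABLE3
            if not relation_holds(row[0], row[1], row[2], *row[4:])]


if __name__ == "__main__":
    print("relation failures:", failures())
    for subset, _, _, name, n_c, n_n, n_o, _, _ in TABLE3:
        print(subset, name, sorted(subsets_met(n_c, n_n, n_o)))
```

With these transcribed values every row satisfies its relation, Ser alone meets two of the conditions (3)-(5), and the six amino acids of $\mathcal{A}_{\rm I}$ turn out to be exactly those meeting none of them.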

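To see how these relations work, first consider (6) for $\mathcal{A}_{\rm I}$. By evaluating the right side of (6) for the $\mathcal{A}_{\rm I}$ amino acids in Table 3, one obtains

$$
\begin{aligned}
2n_f - n_i &= 3 && \text{for [Phe]},\\
2n_f - n_i &= 13 && \text{for Leu and Ile},\\
2n_f - n_i &= 12 && \text{for Met},\\
2n_f - n_i &= 19 && \text{for Val},\\
2n_f - n_i &= 27 && \text{for [Pro]}.
\end{aligned}
\qquad (13)
$$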
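Each member of (13) admits several positive solutions, which is why supplementary postulates are needed to single out the tabulated ranges. A short enumeration makes this concrete; it is an illustrative sketch in which `solutions` and `TABLE3_A_I` are hypothetical names, the bound $1 \le n_i \le n_f \le 64$ is taken from the codon numbering, and the Table 3 ranges are transcribed by hand.

```python
def solutions(c, hi=64):
    """All (n_i, n_f) with 2*n_f - n_i = c and 1 <= n_i <= n_f <= hi."""
    result = []
    for n_i in range(1, hi + 1):
        if (c + n_i) % 2 == 0:          # n_f must be an integer
            n_f = (c + n_i) // 2
            if n_i <= n_f <= hi:
                result.append((n_i, n_f))
    return result


# Right-hand side of each member of (13) and the Table 3 range for that amino acid.
TABLE3_A_I = {
    "Phe": (3, (1, 2)),
    "Leu": (13, (3, 8)),
    "Ile": (13, (9, 11)),
    "Met": (12, (12, 12)),
    "Val": (19, (13, 16)),
    "Pro": (27, (21, 24)),
}

if __name__ == "__main__":
    for name, (c, observed) in TABLE3_A_I.items():
        sols = solutions(c)
        print(f"{name}: {len(sols)} solutions; Table 3 range {observed} among them: {observed in sols}")
    print(solutions(3))  # [(1, 2), (3, 3)]
```

For [Phe], for instance, the only solutions are (1, 2) and (3, 3), and it is the postulate of maximal codon utilization that selects (1, 2).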

Eqs. (13) are all satisfied by the $n_i$, $n_f$ numbers for the amino acids of the $\mathcal{A}_{\rm I}$ subset in Table 3. Conversely, the linear Diophantine equations (13) yield the correct $n_i$, $n_f$ numbers successively for the six amino acids in $\mathcal{A}_{\rm I}$, subject to maximal codon utilization (requiring, for example, $n_i = 1$ and not 3 for [Phe], since the first member of (13) is satisfied by both $(n_i, n_f) = (1, 2)$ and $(3, 3)$) and to the precedence of Leu over its isomer Ile in assigning the codon range number solutions $(n_i, n_f) = (3, 8)$ and $(n_i, n_f) = (9, 11)$ generated by the second member of (13). With these rather modest supplementary postulates, Eq. (6) is both necessary and sufficient for the codon ranges featured by the six amino acids in $\mathcal{A}_{\rm I}$. The explicit equations (7) and (8) are easily verified to be necessary and sufficient for the amino acids in $\mathcal{A}_{\rm II}$ and $\mathcal{A}_{\rm III}$ by direct substitution of the appropriate $n_C$, $n_N$, $n_O$, $n_S$ numbers shown in Table 3. Finally, the linear Diophantine equations (9) for $\mathcal{A}_{\rm IV}$ produce

$$
\begin{aligned}
3n_f - 2n_i &= 52 && \text{for [Trp]},\\
3n_f - 2n_i &= 62 && \text{for Arg},\\
3n_f - 2n_i &= 60 && \text{for Ser},\\
3n_f - 2n_i &= 70 && \text{for Gly}.
\end{aligned}
\qquad (14)
$$

The first member of (14) and the general requirement $n_i \le n_f$ imply that $n_i = n_f = 52$ for [Trp], because Cys in $\mathcal{A}_{\rm III}$ already has $n_f = 50$: combined with $n_i \le n_f$, the first member of (14) gives $n_i \le 52$, and $n_i = 51$ would make $n_f = 154/3$ non-integral. Similarly, the second, third and fourth members of (14) and the postulate of maximal codon utilization imply the correct $n_i$, $n_f$ codon range numbers for Arg, Ser and Gly successively. The two-interval composite range for Arg, $53 \le n \le 56$ and $59 \le n \le 60$, emerges with a characteristic degree of Diophantine economy [11,12], while the solution to the fourth member of (14) for Gly follows immediately from $n_f \le 64$.
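The successive assignment described above can be sketched as a greedy procedure. This is one possible reading, not the author's stated algorithm: "maximal codon utilization" is taken here to mean choosing, among the solutions of (13) or (14) whose whole interval is still unassigned, the widest one (ties going to the smallest $n_i$); the $\mathcal{A}_{\rm II}$ and $\mathcal{A}_{\rm III}$ ranges fixed by (7) and (8) and the stop codons 35, 36 and 51 are treated as already occupied; and the amino acids are processed in Table 3 order, with Leu before Ile. All names in the code are hypothetical.

```python
# Hypothetical sketch: greedy reading of "maximal codon utilization" for the
# Diophantine relations (13) and (14). Not the author's stated procedure.

STOP_CODONS = {35, 36, 51}

# Ranges fixed by the explicit equations (7) and (8), copied from Table 3.
EXPLICIT_RANGES = [(17, 20), (25, 28), (33, 34),           # A_II: Ser, Thr, Tyr
                   (29, 32), (37, 38), (39, 40), (41, 42),  # A_III: Ala, His, Gln, Asn
                   (43, 44), (45, 46), (47, 48), (49, 50)]  # A_III: Lys, Asp, Glu, Cys

# (amino acid, a, b, c) for a*n_f - b*n_i = c, in Table 3 order (Leu before Ile).
A_I = [("Phe", 2, 1, 3), ("Leu", 2, 1, 13), ("Ile", 2, 1, 13),
       ("Met", 2, 1, 12), ("Val", 2, 1, 19), ("Pro", 2, 1, 27)]
A_IV = [("Trp", 3, 2, 52), ("Arg", 3, 2, 62), ("Ser", 3, 2, 60), ("Gly", 3, 2, 70)]


def free_solutions(a, b, c, used):
    """(n_i, n_f) with a*n_f - b*n_i = c, 1 <= n_i <= n_f <= 64, all codons unused."""
    found = []
    for n_i in range(1, 65):
        num = c + b * n_i
        if num % a != 0:
            continue
        n_f = num // a
        if n_i <= n_f <= 64 and used.isdisjoint(range(n_i, n_f + 1)):
            found.append((n_i, n_f))
    return found


def assign():
    """Give each A_I and A_IV amino acid its widest free solution, in order."""
    used = set(STOP_CODONS)
    for lo, hi in EXPLICIT_RANGES:
        used.update(range(lo, hi + 1))
    ranges = {}
    for name, a, b, c in A_I + A_IV:
        # Widest interval first; ties broken by the smallest n_i (an assumption).
        best = max(free_solutions(a, b, c, used), key=lambda s: (s[1] - s[0], -s[0]))
        ranges[name] = best
        used.update(range(best[0], best[1] + 1))
    return ranges, used


if __name__ == "__main__":
    ranges, used = assign()
    print(ranges)
    leftover = sorted(set(range(1, 65)) - used)
    print("unassigned:", leftover)
    # Check whether the leftover interval also solves Arg's member of (14).
    print("fits Arg:", 3 * leftover[-1] - 2 * leftover[0] == 62)
```

Under these assumptions the procedure reproduces the Table 3 ranges for [Phe] through [Pro], and for [Trp], Arg (first interval), Ser and Gly, leaving only codons 59 and 60 unassigned; these satisfy the second member of (14), since $3 \cdot 60 - 2 \cdot 59 = 62$, which is consistent with Arg's second interval.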

5. Concluding remarks

The molecular content relations (6)-(9) are necessary and sufficient for all 20 amino acids with the supplementary postulates of maximal admissible codon utilization and the precedence of Leu (leucine) over Ile (isoleucine) in assigning the two solutions obtained from the second member of (13). Therefore, the universal genetic code is essentially expressed in a precise manner by the purely physical molecular content relations (6)-(9), equations linear in $n_i$, $n_f$. The number of carbon, nitrogen, oxygen and sulfur atoms in an amino acid thus relates directly to the number of carbon, nitrogen and oxygen atoms in the three bases of an associated codon.

References

[1] O.V. Davydov, Dok. Akad. Nauk. Belarusi 38 (1994) 80.

[2] T. Avager, G. Graham, D. Hutchison, J. Westbgard, J. Chem. Info. Comput. Sci. 34 (1994) 820.

[3] M. Digiulio, M.R. Capobianco, M. Medugno, J. Theor. Biol. 168 (1994) 43.

[4] T.H. Jukes, Cell. Molec. Biol. Res. 39 (1994) 685.

[5] A. Jimenez-Sanchez, J. Molec. Evol. 41 (1995) 712.

[6] S.N. Rodin, S. Ohno, Proc. Nat. Acad. Sci. 94 (1997) 5183.

[7] R. Ferreira, A.R.D. Cavalcanti, Orig. Life 27 (1997) 397.

[8] W.P. Tate, S.A. Mannering, Molec. Microbiol. 21 (1996) 213.

[9] R. Amirnovin, J. Molec. Evol. 44 (1997) 473.

[10] G. Rosen, Bull. Math. Biol. 53 (1991) 845.

[11] L.J. Mordell, Diophantine Equations (Academic Press, London, 1969), in particular pp. 30-33.

[12] S. Lang, The Beauty of Doing Mathematics (Springer, New York, 1985) pp. 31-69.