29 March 1999
ELSEVIER
PHYSICS LETTERS A
Physics Letters A 253 (1999) 354-357
Molecular content relations in the genetic code
Gerald Rosen
Department of Physics, Drexel University, Philadelphia, PA 19104, USA
Received 8 October 1998; accepted for publication 20 January 1999
Communicated by A.P. Fordy
Abstract
The codons are numbered from 1 to 64 by a simple formula based on the number of carbon, nitrogen and oxygen atoms in the nucleotides of each triplet. The codon range numbers that follow for the 20 amino acids are shown to be given by linear Diophantine and explicit molecular content equations in the number of carbon, nitrogen, oxygen and sulfur atoms in each amino acid. Thus the universal genetic code that associates codons and amino acids is expressed in a precise way by purely physical molecular content relations. © 1999 Published by Elsevier Science B.V.
PACS: 87.15.-v; 87.10.+e
Keywords: Genetic code; Codons; Atomic content
1. Introduction
Ever since the decisive resolution of the universal genetic code by Brenner and Crick in 1966, the codon-amino acid association (see Tables 1 and 2) has been the intriguing subject of many research efforts. To account for this codon-amino acid correspondence, research studies have pursued Crick's "frozen accident" conjecture and elaborated upon Woese's amino acid-codon physicochemical interaction proposal (expanded in the past decade to include biosynthetic relationships between precursor and product amino acids) [1-7]. However, recent analyses by Tate and Mannering [8] and also by Amirnovin [9] suggest that the codon-amino acid correlations investigated in former works are simply statistical in essence and are not manifestations of an underlying precise theoretical association.
Hence it is of considerable interest that precise molecular content relations of a purely physical nature (without reference to chemical interaction properties) exist between the codons and the amino acids. As shown in the present communication, these molecular content relations depend in a straightforward manner on the number of carbon, nitrogen, oxygen and sulfur atoms in the amino acid and associated codon nucleotides. The physically-based codon number [10] is required in order to formulate the molecular content relations, and thus the codon number prescription is reviewed here first as a necessary preliminary.
2. Codon number
Contained in U, C, A and G, respectively, the pyrimidines uracil and cytosine and the purines adenine and guanine are progressively larger in molecular size. This monotonic size gradation is essentially a consequence of the relative number of carbon, nitrogen and oxygen atoms in the bases and does not depend on the number
of small peripheral hydrogen atoms; the size index is expressed by $\Lambda \equiv 2(n_N - n_C) + n_O + 2$ and has the values

uracil, $\mathrm{C_4N_2O_2}$: $\Lambda = 0$,
cytosine, $\mathrm{C_4N_3O}$: $\Lambda = 1$,
adenine, $\mathrm{C_5N_5}$: $\Lambda = 2$,
guanine, $\mathrm{C_5N_5O}$: $\Lambda = 3$,

where the hydrogen content is ignored in writing the molecular formulas. Let these size indices 0, 1, 2, 3 for the bases be assigned to the associated nucleotides U, C, A, G,

$$\Lambda(\mathrm{U}) = 0, \quad \Lambda(\mathrm{C}) = 1, \quad \Lambda(\mathrm{A}) = 2, \quad \Lambda(\mathrm{G}) = 3.$$

Then the generic codon 5'(XYZ)3' can be given the unique codon number

$$n(\mathrm{XYZ}) \equiv 4\Lambda(\mathrm{X}) + 16\Lambda(\mathrm{Y}) + \Lambda(\mathrm{Z}) + 1. \tag{1}$$

For example, the codon AUG is assigned the codon number $n(\mathrm{AUG}) = 4\Lambda(\mathrm{A}) + 16\Lambda(\mathrm{U}) + \Lambda(\mathrm{G}) + 1 = 12$ by formula (1). Table 2 shows the sixty-four codon numbers as annotations on the standard representation. Clearly, the correspondence between codons and codon numbers is one-to-one and simply matched to the hierarchical column and row progression in the standard representation.

0375-9601/99/$ - see front matter © 1999 Published by Elsevier Science B.V. All rights reserved. PII S0375-9601(99)00075-4

Table 1
The universal genetic code that associates 61 RNA codons - triplets of the RNA nucleotides U, C, A, G - with 20 different amino acids; in particular, AUG for methionine starts a protein while UAA, UAG and UGA signal "stop" by not being associated with any amino acid

Codon  Amino acid          Codon  Amino acid  Codon  Amino acid     Codon  Amino acid
UUU    Phenylalanine       UCU    Serine      UAU    Tyrosine       UGU    Cysteine
UUC    Phenylalanine       UCC    Serine      UAC    Tyrosine       UGC    Cysteine
UUA    Leucine             UCA    Serine      UAA    stop           UGA    stop
UUG    Leucine             UCG    Serine      UAG    stop           UGG    Tryptophan
CUU    Leucine             CCU    Proline     CAU    Histidine      CGU    Arginine
CUC    Leucine             CCC    Proline     CAC    Histidine      CGC    Arginine
CUA    Leucine             CCA    Proline     CAA    Glutamine      CGA    Arginine
CUG    Leucine             CCG    Proline     CAG    Glutamine      CGG    Arginine
AUU    Isoleucine          ACU    Threonine   AAU    Asparagine     AGU    Serine
AUC    Isoleucine          ACC    Threonine   AAC    Asparagine     AGC    Serine
AUA    Isoleucine          ACA    Threonine   AAA    Lysine         AGA    Arginine
AUG    Methionine (start)  ACG    Threonine   AAG    Lysine         AGG    Arginine
GUU    Valine              GCU    Alanine     GAU    Aspartic acid  GGU    Glycine
GUC    Valine              GCC    Alanine     GAC    Aspartic acid  GGC    Glycine
GUA    Valine              GCA    Alanine     GAA    Glutamic acid  GGA    Glycine
GUG    Valine              GCG    Alanine     GAG    Glutamic acid  GGG    Glycine
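The codon number prescription of this section can be checked numerically. The following Python sketch assumes that the size index reads Λ = 2(n_N - n_C) + n_O + 2, the form that reproduces the four values 0, 1, 2, 3 quoted for the bases, and that formula (1) is n(XYZ) = 4Λ(X) + 16Λ(Y) + Λ(Z) + 1. The names `BASE_ATOMS`, `size_index`, `codon_number`, `codon_from_number` and `all_codon_numbers` are illustrative and do not come from the paper.

```python
from itertools import product

# (n_C, n_N, n_O) of each base with hydrogen ignored, as quoted in the text.
BASE_ATOMS = {
    "U": (4, 2, 2),  # uracil,   C4N2O2
    "C": (4, 3, 1),  # cytosine, C4N3O
    "A": (5, 5, 0),  # adenine,  C5N5
    "G": (5, 5, 1),  # guanine,  C5N5O
}


def size_index(base):
    """Size index of a base computed from its atom counts (assumed form)."""
    n_c, n_n, n_o = BASE_ATOMS[base]
    return 2 * (n_n - n_c) + n_o + 2


def codon_number(codon):
    """Codon number of the triplet XYZ according to formula (1)."""
    x, y, z = codon
    return 4 * size_index(x) + 16 * size_index(y) + size_index(z) + 1


def codon_from_number(n):
    """Invert formula (1): recover the triplet XYZ from a number in 1..64."""
    bases = "UCAG"  # position in this string equals the size index
    y, rest = divmod(n - 1, 16)
    x, z = divmod(rest, 4)
    return bases[x] + bases[y] + bases[z]


def all_codon_numbers():
    """Sorted codon numbers of all 64 triplets."""
    return sorted(codon_number("".join(t)) for t in product("UCAG", repeat=3))
```

With these definitions `codon_number("AUG")` evaluates to 12, as in the worked example of the text, and `all_codon_numbers()` covers every integer from 1 to 64 exactly once, which is the one-to-one correspondence asserted above.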
3. Codon range numbers and molecular content of the amino acids
Table 3 displays the codon range numbers $n_i$ and $n_f$, the smallest and largest codon numbers (such that $n_i \le n \le n_f$) for each amino acid, and the number of carbon, nitrogen, oxygen and sulfur atoms ($n_C$, $n_N$, $n_O$ and $n_S$, respectively) contained in each amino acid.
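The codon range numbers can be derived mechanically from the genetic code and formula (1). The Python sketch below is one way to do so under stated assumptions: the genetic code is entered as the standard 64-letter string of one-letter amino acid symbols ("*" for stop), a notation the paper itself does not use, and the size indices are taken as U = 0, C = 1, A = 2, G = 3. The names `BASES`, `AMINO_ACIDS`, `codon_number` and `codon_ranges` are hypothetical.

```python
from itertools import product

BASES = "UCAG"  # position in this string equals the size index (U=0, C=1, A=2, G=3)

# Standard genetic code in one-letter symbols ("*" = stop), listed with the
# first base varying slowest and the third base fastest, each in U, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_number(codon):
    """Codon number n(XYZ) = 4*L(X) + 16*L(Y) + L(Z) + 1, formula (1)."""
    x, y, z = (BASES.index(b) for b in codon)
    return 4 * x + 16 * y + z + 1


def codon_ranges():
    """Map each symbol to its list of (n_i, n_f) runs of consecutive codon numbers."""
    numbers = {}
    for aa, triplet in zip(AMINO_ACIDS, product(BASES, repeat=3)):
        numbers.setdefault(aa, []).append(codon_number(triplet))
    ranges = {}
    for aa, ns in numbers.items():
        ns.sort()
        runs = [[ns[0], ns[0]]]
        for n in ns[1:]:
            if n == runs[-1][1] + 1:
                runs[-1][1] = n        # extend the current run
            else:
                runs.append([n, n])    # start a new run
        ranges[aa] = [tuple(run) for run in runs]
    return ranges
```

Under these assumptions `codon_ranges()["L"]` is the single run (3, 8) for leucine, while serine ("S") and arginine ("R") each split into two runs, in line with the special roles of Ser and Arg in Table 3.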
The five amino acids with a ring in their molecular structure (viz. Phe, Pro, Tyr, His, Trp) appear with their symbols enclosed in square brackets. Also note that the amino acids are grouped in four subsets determined exclusively by their molecular content,

$$\mathcal{A}_{\mathrm{I}} = \{[\mathrm{Phe}], \mathrm{Leu}, \mathrm{Ile}, \mathrm{Met}, \mathrm{Val}, [\mathrm{Pro}]\}, \tag{2}$$

$$(n_N, n_O) = (1, 3): \quad \mathcal{A}_{\mathrm{II}} = \{\mathrm{Ser}, \mathrm{Thr}, [\mathrm{Tyr}]\}, \tag{3}$$

$$(n_C, n_N, n_O) = (3, 1, 2) \ \text{or} \ 9 \le (n_C + n_N + n_O) \le 11 \ \text{with} \ (n_N, n_O) \ne (1, 2): \quad \mathcal{A}_{\mathrm{III}} = \{\mathrm{Ala}, [\mathrm{His}], \mathrm{Gln}, \mathrm{Asn}, \mathrm{Lys}, \mathrm{Asp}, \mathrm{Glu}, \mathrm{Cys}\}, \tag{4}$$

$$(n_C + 2n_N) \ge 14 \ \text{or} \ n_C = n_O \le 3: \quad \mathcal{A}_{\mathrm{IV}} = \{[\mathrm{Trp}], \mathrm{Arg}, \mathrm{Ser}, \mathrm{Gly}\}. \tag{5}$$

Observe that only Ser satisfies the entrance requirements for more than one subset (viz., $\mathcal{A}_{\mathrm{II}}$ and $\mathcal{A}_{\mathrm{IV}}$) while only Arg has a two-interval composite range in its subset ($\mathcal{A}_{\mathrm{IV}}$).

Table 2
Contemporary textbook representation of the genetic code, annotated with the codon numbers given by formula (1). An amino acid or the stop signal is associated with each of the 64 codons - ordered triplets of the RNA nucleotides U, C, A, G. Thus, for example, the codon AUG prescribes the amino acid methionine (Met). However, the U, C, A, G row and column labels are actually superfluous here and can be deleted without loss of information because the codon number itself implies a uniquely associated codon by formula (1).

First position  Second position                          Third position
(5' end)        U         C         A         G          (3' end)
U                1 Phe    17 Ser    33 Tyr    49 Cys     U
                 2 Phe    18 Ser    34 Tyr    50 Cys     C
                 3 Leu    19 Ser    35 stop   51 stop    A
                 4 Leu    20 Ser    36 stop   52 Trp     G
C                5 Leu    21 Pro    37 His    53 Arg     U
                 6 Leu    22 Pro    38 His    54 Arg     C
                 7 Leu    23 Pro    39 Gln    55 Arg     A
                 8 Leu    24 Pro    40 Gln    56 Arg     G
A                9 Ile    25 Thr    41 Asn    57 Ser     U
                10 Ile    26 Thr    42 Asn    58 Ser     C
                11 Ile    27 Thr    43 Lys    59 Arg     A
                12 Met    28 Thr    44 Lys    60 Arg     G
G               13 Val    29 Ala    45 Asp    61 Gly     U
                14 Val    30 Ala    46 Asp    62 Gly     C
                15 Val    31 Ala    47 Glu    63 Gly     A
                16 Val    32 Ala    48 Glu    64 Gly     G

Table 3
Codon range numbers and molecular content of the amino acids. Bracketed amino acid symbols indicate molecular ring structure, and the vertical groupings in the table correspond to the amino acid subsets (2)-(5). Notice that the stop signals (not associated with any amino acid and given by codons 35, 36 and 51 in Table 2) occur at the breaks between A_II, A_III and A_III, A_IV.

Subset  n_i     n_f     Amino acid  n_C  n_N  n_O  n_S
A_I     1       2       [Phe]       9    1    2    0
        3       8       Leu         6    1    2    0
        9       11      Ile         6    1    2    0
        12      12      Met         5    1    2    1
        13      16      Val         5    1    2    0
        21      24      [Pro]       5    1    2    0
A_II    17      20      Ser         3    1    3    0
        25      28      Thr         4    1    3    0
        33      34      [Tyr]       9    1    3    0
A_III   29      32      Ala         3    1    2    0
        37      38      [His]       6    3    2    0
        39      40      Gln         5    2    3    0
        41      42      Asn         4    2    3    0
        43      44      Lys         6    2    2    0
        45      46      Asp         4    1    4    0
        47      48      Glu         5    1    4    0
        49      50      Cys         3    1    2    1
A_IV    52      52      [Trp]       11   2    2    0
        53, 59  56, 60  Arg         6    4    2    0
        57      58      Ser         3    1    3    0
        61      64      Gly         2    1    2    0

4. Molecular content relations
For the amino acids in each subset given by (2)-(5) there is an essentially linear relation between the codon range numbers $n_i$, $n_f$ and the molecular content numbers $n_C$, $n_N$, $n_O$, $n_S$. These relations take the form of linear Diophantine equations [11] for both positive integers $n_i$ and $n_f$ in the case of $\mathcal{A}_{\mathrm{I}}$ and $\mathcal{A}_{\mathrm{IV}}$ and explicit equations for $n_i$, $n_f$ in the case of $\mathcal{A}_{\mathrm{II}}$ and $\mathcal{A}_{\mathrm{III}}$,

$$\text{For } \mathcal{A}_{\mathrm{I}}: \quad 2n_f - n_i = 49 - 6n_C - 7n_S + 8r, \tag{6}$$

$$\text{For } \mathcal{A}_{\mathrm{II}}: \quad n_f - n_i = 2 + \operatorname{sgn}(33 - n_f), \quad n_f = 2(2 - r)(2n_C - 1), \tag{7}$$

$$\text{For } \mathcal{A}_{\mathrm{III}}: \quad n_f - n_i = 2 + \operatorname{sgn}(33 - n_f), \quad n_f = 14 + 6(n_N + n_O + 3n_S) \pm 2(n_C - 3), \tag{8}$$

$$\text{For } \mathcal{A}_{\mathrm{IV}}: \quad 3n_f - 2n_i = 90 - 2n_C - 8n_O. \tag{9}$$

In Eqs. (6)-(8) there appear

$$r = \begin{cases} 0 & \text{for amino acids without a molecular ring}, \\ 1 & \text{for amino acids with a molecular ring}, \end{cases} \tag{10}$$

$$\operatorname{sgn}(33 - n_f) \equiv \begin{cases} +1 & \text{for } n_f < 33, \\ -1 & \text{for } n_f > 33, \end{cases} \tag{11}$$

and the $\pm$ in the second member of (8) is defined by

$$\pm \equiv \begin{cases} +1 & \text{for } n_N \ne 3 \ne n_O, \\ -1 & \text{for } n_N = 3 \text{ or } n_O = 3. \end{cases} \tag{12}$$
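Because (7) and (8) are explicit, they can be evaluated directly. The Python sketch below assumes that (7) reads n_f = 2(2 - r)(2n_C - 1), that (8) reads n_f = 14 + 6(n_N + n_O + 3n_S) ± 2(n_C - 3) with the sign rule (12), and that n_f - n_i = 2 + sgn(33 - n_f) in both cases. Atom counts and ring flags are transcribed from Table 3. The names `SUBSET_II`, `SUBSET_III`, `sgn33`, `range_from_n_f`, `n_f_subset_ii` and `n_f_subset_iii` are illustrative, not the author's.

```python
# (n_C, n_N, n_O, n_S, r) per amino acid, transcribed from Table 3;
# r = 1 marks a molecular ring, as in definition (10).
SUBSET_II = {"Ser": (3, 1, 3, 0, 0), "Thr": (4, 1, 3, 0, 0), "Tyr": (9, 1, 3, 0, 1)}
SUBSET_III = {
    "Ala": (3, 1, 2, 0, 0), "His": (6, 3, 2, 0, 1), "Gln": (5, 2, 3, 0, 0),
    "Asn": (4, 2, 3, 0, 0), "Lys": (6, 2, 2, 0, 0), "Asp": (4, 1, 4, 0, 0),
    "Glu": (5, 1, 4, 0, 0), "Cys": (3, 1, 2, 1, 0),
}


def sgn33(n_f):
    """sgn(33 - n_f) as in definition (11); the case n_f = 33 is left undefined."""
    if n_f == 33:
        raise ValueError("definition (11) does not cover n_f = 33")
    return 1 if n_f < 33 else -1


def range_from_n_f(n_f):
    """(n_i, n_f) from the first member of (7) or (8): n_f - n_i = 2 + sgn(33 - n_f)."""
    return n_f - (2 + sgn33(n_f)), n_f


def n_f_subset_ii(n_c, n_n, n_o, n_s, r):
    """Second member of (7); only n_C and r enter."""
    return 2 * (2 - r) * (2 * n_c - 1)


def n_f_subset_iii(n_c, n_n, n_o, n_s, r):
    """Second member of (8), with the sign chosen by rule (12)."""
    sign = -1 if (n_n == 3 or n_o == 3) else 1
    return 14 + 6 * (n_n + n_o + 3 * n_s) + sign * 2 * (n_c - 3)


ranges_ii = {aa: range_from_n_f(n_f_subset_ii(*v)) for aa, v in SUBSET_II.items()}
ranges_iii = {aa: range_from_n_f(n_f_subset_iii(*v)) for aa, v in SUBSET_III.items()}
```

Under these assumptions `ranges_ii` and `ranges_iii` reproduce the (n_i, n_f) entries listed for the second and third subsets in Table 3, for example (17, 20) for Ser and (37, 38) for His.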
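To see how these relations work, first consider (6) for $\mathcal{A}_{\mathrm{I}}$. By evaluating the right side of (6) for the $\mathcal{A}_{\mathrm{I}}$ amino acids in Table 3, one obtains

$$\begin{aligned}
2n_f - n_i &= 3 \ \text{for [Phe]}, \\
2n_f - n_i &= 13 \ \text{for Leu and Ile}, \\
2n_f - n_i &= 12 \ \text{for Met}, \\
2n_f - n_i &= 19 \ \text{for Val}, \\
2n_f - n_i &= 27 \ \text{for [Pro]}.
\end{aligned} \tag{13}$$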
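The arithmetic behind (13) can be reproduced with a short Python sketch. It assumes that Eq. (6) reads 2n_f - n_i = 49 - 6n_C - 7n_S + 8r and takes the atom counts, ring flags and codon ranges of the first subset from Table 3. The names `SUBSET_I`, `rhs_eq6` and `lhs_eq6` are illustrative only.

```python
# (n_C, n_S, r, n_i, n_f) for the first subset, transcribed from Table 3;
# r = 1 marks a molecular ring.
SUBSET_I = {
    "Phe": (9, 0, 1, 1, 2),
    "Leu": (6, 0, 0, 3, 8),
    "Ile": (6, 0, 0, 9, 11),
    "Met": (5, 1, 0, 12, 12),
    "Val": (5, 0, 0, 13, 16),
    "Pro": (5, 0, 1, 21, 24),
}


def rhs_eq6(n_c, n_s, r):
    """Right side of (6), built from molecular content only."""
    return 49 - 6 * n_c - 7 * n_s + 8 * r


def lhs_eq6(n_i, n_f):
    """Left side of (6), built from the codon range numbers only."""
    return 2 * n_f - n_i


rhs_values = {aa: rhs_eq6(n_c, n_s, r) for aa, (n_c, n_s, r, _, _) in SUBSET_I.items()}
lhs_values = {aa: lhs_eq6(n_i, n_f) for aa, (_, _, _, n_i, n_f) in SUBSET_I.items()}
```

Here `rhs_values` holds the six constants of (13), and comparing it with `lhs_values` tests whether the tabulated codon ranges satisfy them.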
Eqs. (13) are all satisfied by the $n_i$, $n_f$ numbers for the amino acids of the $\mathcal{A}_{\mathrm{I}}$ subset in Table 3. Conversely, the linear Diophantine equations (13) yield the correct $n_i$, $n_f$ numbers successively for the six amino acids in $\mathcal{A}_{\mathrm{I}}$ subject to maximal codon utilization (requiring, for example, $n_i = 1$ and not 3 for [Phe]), and the precedence of Leu over its isomer Ile in assigning the codon range number solutions $(n_i, n_f) = (3, 8)$ and $(n_i, n_f) = (9, 11)$ generated by the second member of (13). With the latter rather modest supplementary postulates, Eq. (6) is both necessary and sufficient for the codon ranges featured by the six amino acids in $\mathcal{A}_{\mathrm{I}}$. The explicit equations (7) and (8) are easily verified to be necessary and sufficient for the amino acids in $\mathcal{A}_{\mathrm{II}}$ and $\mathcal{A}_{\mathrm{III}}$ by direct substitution of the appropriate $n_C$, $n_N$, $n_O$, $n_S$ numbers shown in Table 3. Finally, the linear Diophantine equations (9) for $\mathcal{A}_{\mathrm{IV}}$ produce

$$\begin{aligned}
3n_f - 2n_i &= 52 \ \text{for [Trp]}, \\
3n_f - 2n_i &= 62 \ \text{for Arg}, \\
3n_f - 2n_i &= 60 \ \text{for Ser}, \\
3n_f - 2n_i &= 70 \ \text{for Gly}.
\end{aligned} \tag{14}$$

The first member in (14) and the general requirement $n_i \le n_f$ imply that $n_i = n_f = 52$ for [Trp], because Cys in $\mathcal{A}_{\mathrm{III}}$ already has $n_f = 50$. Similarly, the second, third and fourth members of (14) and the postulate of maximal codon utilization imply the correct $n_i$, $n_f$ codon range numbers for Arg, Ser and Gly successively. The two-interval composite range for Arg, $53 \le n \le 56$ and $59 \le n \le 60$, emerges with a characteristic degree of Diophantine economy [11,12] while the solution to the fourth member of (14) for Gly follows immediately from $n_f \le 64$.
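The Diophantine character of (14) can be made concrete by listing every integer pair that satisfies each member. The Python sketch below assumes the form 3n_f - 2n_i = k and the bounds 51 ≤ n_i ≤ n_f ≤ 64, where the lower bound reflects that Cys already ends at 50. It only enumerates candidate pairs and does not attempt to encode the postulate of maximal codon utilization, which the text applies to select among them. The names `solutions` and `CANDIDATES` are hypothetical.

```python
def solutions(k, lo=51, hi=64):
    """All (n_i, n_f) with lo <= n_i <= n_f <= hi and 3*n_f - 2*n_i == k."""
    found = []
    for n_f in range(lo, hi + 1):
        twice_n_i = 3 * n_f - k          # equals 2*n_i when a solution exists
        if twice_n_i % 2 == 0 and lo <= twice_n_i // 2 <= n_f:
            found.append((twice_n_i // 2, n_f))
    return found


# Right-hand constants of (14), keyed by amino acid.
CANDIDATES = {aa: solutions(k) for aa, k in
              {"Trp": 52, "Arg": 62, "Ser": 60, "Gly": 70}.items()}
```

Within these bounds the first member of (14) admits only the pair (52, 52), while the Arg equation admits four pairs, among them both intervals (53, 56) and (59, 60) of the composite range.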
5. Concluding remarks
The molecular content relations (6)-(9) are necessary and sufficient for all 20 amino acids with the supplementary postulates of maximal admissible codon utilization and the precedence of Leu (leucine) over Ile (isoleucine) in assigning the two solutions obtained from the second member of (13). Therefore, the universal genetic code is essentially expressed in a precise manner by the purely physical molecular content relations (6)-(9), equations linear in $n_i$, $n_f$. The number of carbon, nitrogen, oxygen and sulfur atoms in an amino acid thus relates directly to the number of carbon, nitrogen and oxygen atoms in the three bases of an associated codon.
References
[1] O.V. Davydov, Dok. Akad. Nauk. Belarusi 38 (1994) 80.
[2] T. Alvager, G. Graham, D. Hutchison, J. Westgard, J. Chem. Info. Comput. Sci. 34 (1994) 820.
[3] M. Digiulio, M.R. Capobianco, M. Medugno, J. Theor. Biol. 168 (1994) 43.
[4] T.H. Jukes, Cell. Molec. Biol. Res. 39 (1994) 685.
[5] A. Jimenez-Sanchez, J. Molec. Evol. 41 (1995) 712.
[6] S.N. Rodin, S. Ohno, Proc. Nat. Acad. Sci. 94 (1997) 5183.
[7] R. Ferreira, A.R.D. Cavalcanti, Orig. Life 27 (1997) 397.
[8] W.P. Tate, S.A. Mannering, Molec. Microbiol. 21 (1996) 213.
[9] R. Amirnovin, J. Molec. Evol. 44 (1997) 473.
[10] G. Rosen, Bull. Math. Biol. 53 (1991) 845.
[11] L.J. Mordell, Diophantine Equations (Academic Press, London, 1969), in particular pp. 30-33.
[12] S. Lang, The Beauty of Doing Mathematics (Springer, New York, 1985) pp. 31-69.