The genetic code as a clue to understanding of molecular evolution


<ul><li><p>J. theor. Biol. (1989) 141, 379-389 </p><p>The Genetic Code as a Clue to Understanding of Molecular Evolution </p><p>VITALY V. SUKHODOLETS </p><p>Molecular Genetics Division, Institute of Genetics and Selection of Industrial Microorganisms, Moscow 113545, U.S.S.R. </p><p>(Received 15 December 1988, and accepted in revised form 19 June 1989) </p><p>The genetic code contains a system governing the distribution of doublets of the first two codon bases among amino acids. According to this system, a definite order in the relative distribution of the first and the second codon bases coincides with a definite order among the common amino acids when they are distributed by the number of hydrogen atoms per molecule (an unexpected parameter). The pattern of the relative distribution of the first and the second codon bases suggests that it originated from a crystalline-like structure in which the set of bases AUGC served as an elementary structural unit and the base doublets played the role of structural analogs to the amino acids. These hypothetical crystalline-like aggregates are composed of the free molecules of amino acids and bases and, although different in their composition, should have an even number of hydrogen atoms per standard structural module. </p><p>Introduction </p><p>A recurring theme in many reports on the genetic code is the search for a stereochemical affinity between the amino acids and their codons. Recently, Hendry et al. (1981a, b) drew attention to a structural similarity between the amino acid radicals and the second bases of their codons: according to these data almost all amino acids fit "cavities" formed by their second codon bases in the B-DNA helix. Another stereochemical approach to the genetic code involves studies on the specific interaction between the anticodon dinucleotides (complementary to the first two codon nucleotides) and their cognate amino acids (Weber &amp; Lacey, 1978; Jungck, 1978; Shimizu, 1987). </p><p>Hendry et al. 
(1981a, b) have cited more than 40 references in which different ideas on a stereochemical rationale for the genetic code were put forward. However, in the present paper, we do not intend to analyse the extensive literature devoted to the genetic code. A suitable review was given in the paper of Root-Bernstein (1982b). This author argued that three chemical criteria determined the evolution of the genetic code: codon-anticodon pairing, codon-amino acid pairing and amino acid pairing. This approach implies that "the code was always specific" due to "stringent constraints of stereochemistry and thermodynamics". Such a notion seems to be fairly acceptable in the context of biological evolution. Yet our own work suggests that the evolutionary basis of the code should be treated as a structure rather than as a pairing of molecules. </p><p>0022-5193/89/230379+11 $03.00/0 1989 Academic Press Limited </p></li><li><p>380 V. V. SUKHODOLETS </p><p>The present paper deals with the genetic code itself. We feel that to solve the problem of the origin of the genetic code it is necessary to perceive the existence of a certain system in which the relative distribution of the first two codon bases and the distribution of the common amino acids are based on a seemingly unimportant criterion: the number of hydrogen atoms per molecule (Sukhodolets, 1980, 1985). Such a system may exist because life possibly arose from "homologous" crystalline-like composites of amino acids and bases which had an even number of hydrogen atoms per standard structural module. </p><p>This system in the distribution of the common amino acids and their cognate base doublets does not extend to some codons. Table 1 presents the genetic code, with such codons indicated by braces; they possibly result from a "fortuitous" evolution (Sukhodolets, 1982). Firstly, there are the minor codon domains for the amino acids Leu, Ser and Arg, each of which has six codons. 
It should also be said that the order, or system, mentioned does not extend to the third codon base and, in the case of the amino acids Met, Tyr, His and Glu, does not extend to the second codon base either (see below). </p><p>TABLE 1 </p><p>The genetic code. The braces indicate those codons in which doublets of the first and second bases are believed to change their sense during evolution </p>
<table>
<tr><td>UUU Phe</td><td>UCU Ser</td><td>UAU Tyr</td><td>UGU Cys</td></tr>
<tr><td>UUC Phe</td><td>UCC Ser</td><td>UAC Tyr</td><td>UGC Cys</td></tr>
<tr><td>{UUA} Leu</td><td>UCA Ser</td><td>UAA Ter</td><td>{UGA} Ter</td></tr>
<tr><td>{UUG} Leu</td><td>UCG Ser</td><td>UAG Ter</td><td>UGG Trp</td></tr>
<tr><td>CUU Leu</td><td>CCU Pro</td><td>CAU His</td><td>CGU Arg</td></tr>
<tr><td>CUC Leu</td><td>CCC Pro</td><td>CAC His</td><td>CGC Arg</td></tr>
<tr><td>CUA Leu</td><td>CCA Pro</td><td>CAA Gln</td><td>CGA Arg</td></tr>
<tr><td>CUG Leu</td><td>CCG Pro</td><td>CAG Gln</td><td>CGG Arg</td></tr>
<tr><td>AUU Ile</td><td>ACU Thr</td><td>AAU Asn</td><td>{AGU} Ser</td></tr>
<tr><td>AUC Ile</td><td>ACC Thr</td><td>AAC Asn</td><td>{AGC} Ser</td></tr>
<tr><td>AUA Ile</td><td>ACA Thr</td><td>AAA Lys</td><td>{AGA} Arg</td></tr>
<tr><td>AUG Met</td><td>ACG Thr</td><td>AAG Lys</td><td>{AGG} Arg</td></tr>
<tr><td>GUU Val</td><td>GCU Ala</td><td>GAU Asp</td><td>GGU Gly</td></tr>
<tr><td>GUC Val</td><td>GCC Ala</td><td>GAC Asp</td><td>GGC Gly</td></tr>
<tr><td>GUA Val</td><td>GCA Ala</td><td>GAA Glu</td><td>GGA Gly</td></tr>
<tr><td>GUG Val</td><td>GCG Ala</td><td>GAG Glu</td><td>GGG Gly</td></tr>
</table>
</li><li><p>THE GENETIC CODE AND MOLECULAR EVOLUTION 381 </p><p>The aim of this work is to shed light on the problem of where a particular order for the hydrogen atoms among the common amino acids and bases comes from. </p><p>A System of Distribution of the 20 Common Amino Acids and the Distribution of their Cognate Base Doublets in the Families </p><p>In Table 2 the 20 common amino acids are distributed into groups according to the number of hydrogen atoms in their molecules. In this distribution the separate groups contain, as a rule, either one or four amino acids. To follow this regularity we have arbitrarily combined, into a single group, the amino acid pair Leu and Ile, each containing 13 hydrogen atoms, and the pair Arg and Lys, each containing 14 hydrogen atoms. As a result we have obtained four large groups of four amino acids, containing 7, 9, 11, and 13 or 14 hydrogen atoms per molecule, respectively. 
Within these large groups we have arranged the amino acids in an order which makes it easier to notice a certain pattern in the parallel arrangement of the first and the second codon bases (Table 2). </p><p>TABLE 2 </p><p>An order in the genetic code revealed in the distribution of amino acids into groups according to number of hydrogen atoms per molecule (see text) </p>
<table>
<tr><th>No. of group</th><th>Amino acids</th><th>First base of codon</th><th>Second base of codon</th></tr>
<tr><td>7</td><td>Ala Ser Asp Cys</td><td>G U G U</td><td>C C A G</td></tr>
<tr><td>8</td><td>Asn</td><td>A</td><td>A</td></tr>
<tr><td>9</td><td>Pro Thr Glu His</td><td>C A G C</td><td>C C A A</td></tr>
<tr><td>10</td><td>Gln</td><td>C</td><td>A</td></tr>
<tr><td>11</td><td>Val Phe Met Tyr</td><td>G U A U</td><td>U U U A</td></tr>
<tr><td>12</td><td>Trp</td><td>U</td><td>G</td></tr>
<tr><td>13-14</td><td>Leu Ile Arg Lys</td><td>C A C A</td><td>U U G A</td></tr>
</table>
<p>The amino acid families inferred and corresponding base doublets: </p>
<table>
<tr><th>I</th><th>II</th><th>III</th><th>IV</th><th>V</th></tr>
<tr><td>Ala Ser</td><td>Val Phe</td><td>Asp Cys</td><td>Gly Trp</td><td>Glu His</td></tr>
<tr><td>GC UC</td><td>GU UU</td><td>GA UG</td><td>GG UG</td><td>G(A) C(A)</td></tr>
<tr><td>CU AU</td><td>CC AC</td><td>CG AA</td><td>CA AA</td><td>A(U) U(A)</td></tr>
<tr><td>Leu Ile</td><td>Pro Thr</td><td>Arg Lys</td><td>Gln Asn</td><td>Met Tyr</td></tr>
</table>
<p>In fact, the amino acid groups mentioned show repeating sets of the first codon bases (G-U or C-A) and of the second codon bases (C-C, U-U or A-G) for the amino acid pairs. Thus, for example, the set of first codon bases G-U is repeated for the amino acid pairs Ala-Ser, Asp-Cys and Val-Phe, while the set of second codon bases C-C is repeated for the amino acid pairs Ala-Ser and Pro-Thr, and so on. </p></li><li><p>From these repetitions one can pick out the amino acid pairs Ala-Ser, Asp-Cys, Pro-Thr, Val-Phe, Leu-Ile and Arg-Lys, and the 20 common amino acids can then be divided into five particular sets or families, each containing four amino acids. </p><p>In Table 2, the repeating sets of the bases are boxed, and for the first codon bases their grouping into families is indicated by arrows. The amino acid families and the corresponding base doublets formed by the first two codon letters are given in the lower part of Table 2. 
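</p><p>The grouping by hydrogen count described above can be reproduced with a short script. The following is a minimal Python sketch and not part of the original analysis: the hydrogen counts are taken from the standard molecular formulas of the free amino acids (an assumption about how the counts are defined), and the names HYDROGENS and group_by_hydrogens are hypothetical. </p>

```python
from collections import defaultdict

# Hydrogen atoms per free amino acid molecule, from the standard molecular
# formulas (for example, alanine C3H7NO2 has 7 hydrogens). Assumed input,
# not copied from the paper's tables.
HYDROGENS = {
    "Gly": 5,
    "Ala": 7, "Ser": 7, "Asp": 7, "Cys": 7,
    "Asn": 8,
    "Pro": 9, "Thr": 9, "Glu": 9, "His": 9,
    "Gln": 10,
    "Val": 11, "Phe": 11, "Met": 11, "Tyr": 11,
    "Trp": 12,
    "Leu": 13, "Ile": 13,
    "Arg": 14, "Lys": 14,
}


def group_by_hydrogens(counts):
    """Return {hydrogen count: [amino acids]} sorted by hydrogen count."""
    groups = defaultdict(list)
    for amino_acid, n_hydrogens in counts.items():
        groups[n_hydrogens].append(amino_acid)
    return dict(sorted(groups.items()))


groups = group_by_hydrogens(HYDROGENS)
for n_hydrogens, members in groups.items():
    print(n_hydrogens, " ".join(members))
```

<p>Under these assumptions the groups with 7, 9 and 11 hydrogen atoms each contain four amino acids, and merging the groups with 13 and 14 hydrogen atoms gives the fourth large group of four. 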
</p><p>Formal criteria which permit the isolation of the amino acid families are the following: </p><p>(1) The first codon bases of amino acids which belong to a single family form the set AUGC. </p><p>(2) The second codon bases of amino acids in a single family are either purines or pyrimidines. Moreover, if two amino acids in the same family have first codon bases showing Watson-Crick complementarity, i.e. A-U or G-C, then their second codon bases give the pairs C-U or A-G. Hence, the sets of the second codon bases in families are UUCC or AAGG. (An exception is family V, in which there are no regularities at all relative to the second codon bases.) </p><p>(3) If two amino acids of a single family have first codon bases showing Watson-Crick complementarity (as, e.g., the pairs Ala-Leu, Ser-Ile, etc.; see Table 2), such amino acids usually contain a total of 20 hydrogen atoms. Hence, the amino acids of one family contain a total of 40 hydrogen atoms. This rule would not seem to be observed in family III, since the pairs Asp-Arg and Cys-Lys each contain 21 hydrogen atoms. It will be shown below that family III is actually composed of amino acids from two different families and, in effect, this "exception" proves a more general basic rule. The 15 hydrogen atoms contained in the pair Gly-Gln in family IV are another exception; it could be explained by the necessity of taking into account two glycine molecules. </p><p>It should be stressed that the amino acid groups or families inferred in Table 2 are the only possible ones. The criteria mentioned leave no freedom in choosing which amino acids to group together. For instance, Ala and Ser cannot be placed together with Arg and Lys because in this case a proper set of the second codon bases (i.e. UUCC or AAGG) would not be obtained. 
Likewise, Tyr from family V cannot replace Phe in family II, and His cannot replace Pro, because in these cases the proper combinations of the second codon bases in family II would not be maintained. </p><p>Thus, the above-mentioned criteria represent stringent rules for selecting, in each family, the amino acids on the one hand and their cognate codon bases on the other. Such a parallelism could arise because the amino acids did serve as analogs to base doublets: in this case a definite order for the amino acids should be consistent with some order for the codon bases as well. What does this overall order mean? </p><p>Hypothetical Crystalline-like Structures Composed of Amino Acids and Bases </p><p>Bearing in mind that the set AUGC conforms to the first codon bases in all families and that two sets of the second codon bases from different families (namely, UUCC and AAGG) might be equivalent in their arrangement to the double AUGC, one arrives at the possibility that the set AUGC served as a standard unit or block in crystalline-like aggregates in which the amino acids and the base doublets might alternate as structural analogs. </p><p>In this case different amino acid families could correspond to different structures in the relative packing of the bases. Although the actual relative arrangement of the bases within the hypothetical association AUGC is not yet known, one could assume that there are Watson-Crick interactions in the pairs A-U and G-C, and one could depict some conditional mutual orientation for these two base pairs. Then, such conditional structures as "x-associates" (Sukhodolets, 1980) could be arranged relative to each other in such a way that the combinations of neighbouring bases correspond to the base doublets in family I or to those in family II (Fig. 1). 
</p><p>FIG. 1. Two different variants for the relative orientation of x-associates; the doublets formed by their neighbouring bases correspond to those in families I and II. Sets of the bases forming the four doublets of a single family are isolated by rectangles. </p><p>The second codon bases in families I and II are pyrimidines and for this reason the "external" x-associates depicted in Fig. 1 are oriented in such a way that purine bases are "facing" the outside. Hence, the complete regular structure would consist of alternating compartments with the x-associates surrounded either by purines or pyrimidines. Structures such as A and B would be composed of the base doublets in families I and II, respectively (Fig. 2). </p><p>The most important inference from examining structures A and B is that one amino acid pair in family III, namely Arg-Asp, fits structure A, whereas the other amino acid pair in this family, Lys-Cys, conforms to structure B (Fig. 2). Moreover, in structure A the pair Arg-Asp should have as its neighbours (within the "purine" compartment, Fig. 2) the base doublets UA and AG, which have no amino acid analogs in our system (Table 1). </p><p>Therefore, family III, as it is depicted in Table 2, seems to include amino acid pairs representing the two different families designated here as III A and III B (Table 3). As hypothetical elementary constituents of the prebiological structures, </p></li><li><p>[FIG. 2. Structures A and B, composed of the base doublets in families I and II, respectively.] </p></li></ul>
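<p>The formal criteria (1) to (3) for the amino acid families can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the author's procedure: the doublets are those listed in the lower part of Table 2 (for family V the parenthesised second base is used), the hydrogen counts come from the standard molecular formulas of the free amino acids, and the names FAMILIES, check_family and complementary_pair_sums are hypothetical. </p>

```python
# Hydrogen atoms per free amino acid molecule (standard molecular formulas;
# assumed input, not copied from the paper's tables).
HYDROGENS = {
    "Gly": 5, "Ala": 7, "Ser": 7, "Asp": 7, "Cys": 7, "Asn": 8,
    "Pro": 9, "Thr": 9, "Glu": 9, "His": 9, "Gln": 10,
    "Val": 11, "Phe": 11, "Met": 11, "Tyr": 11, "Trp": 12,
    "Leu": 13, "Ile": 13, "Arg": 14, "Lys": 14,
}

# Families and doublets of the first two codon bases, as in Table 2.
# Family V uses the second bases that the table gives in parentheses.
FAMILIES = {
    "I": {"Ala": "GC", "Ser": "UC", "Leu": "CU", "Ile": "AU"},
    "II": {"Val": "GU", "Phe": "UU", "Pro": "CC", "Thr": "AC"},
    "III": {"Asp": "GA", "Cys": "UG", "Arg": "CG", "Lys": "AA"},
    "IV": {"Gly": "GG", "Trp": "UG", "Gln": "CA", "Asn": "AA"},
    "V": {"Glu": "GA", "His": "CA", "Met": "AU", "Tyr": "UA"},
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def check_family(doublets):
    """Evaluate criteria (1) and (2) and total the hydrogens of a family."""
    first = sorted(d[0] for d in doublets.values())
    second = sorted(d[1] for d in doublets.values())
    return {
        # Criterion (1): the first bases form the set AUGC.
        "first_bases_form_AUGC": first == ["A", "C", "G", "U"],
        # Criterion (2): the second bases form UUCC or AAGG.
        "second_bases_UUCC_or_AAGG": second
        in (["C", "C", "U", "U"], ["A", "A", "G", "G"]),
        # Input to criterion (3): hydrogen total over the four amino acids.
        "total_hydrogens": sum(HYDROGENS[aa] for aa in doublets),
    }


def complementary_pair_sums(doublets):
    """Hydrogen totals for pairs whose first bases are A-U or G-C."""
    by_first = {d[0]: aa for aa, d in doublets.items()}
    sums = {}
    for base in ("A", "G"):
        a, b = by_first[base], by_first[COMPLEMENT[base]]
        sums[f"{a}-{b}"] = HYDROGENS[a] + HYDROGENS[b]
    return sums


for name, doublets in FAMILIES.items():
    print(name, check_family(doublets), complementary_pair_sums(doublets))
```

<p>With these inputs every family satisfies criterion (1), family V alone fails criterion (2), and the complementary pairs of family III sum to 21 hydrogen atoms each, in line with the exceptions noted in the text. </p>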