
J. theor. Biol. (1989) 141, 379-389

The Genetic Code as a Clue to Understanding of Molecular Evolution

VITALY V. SUKHODOLETS

Molecular Genetics Division, Institute of Genetics and Selection of Industrial Microorganisms, Moscow 113545, U.S.S.R.

(Received 15 December 1988, and accepted in revised form 19 June 1989)

The genetic code contains a system governing the distribution of the doublets formed by the first two codon bases among the amino acids. According to this system, a definite order in the relative distribution of the first and second codon bases coincides with a definite order among the common amino acids in their distribution by the number of hydrogen atoms per molecule (an unexpected parameter). The pattern of the relative distribution of the first and second codon bases suggests that the code originated from a crystalline-like structure in which the set of bases AUGC served as an elementary structural unit and the base doublets played the role of structural analogs of the amino acids. These hypothetical crystalline-like aggregates are composed of free molecules of amino acids and bases and, although different in their composition, should have an even number of hydrogen atoms per standard structural module.

Introduction

A recurring theme in many reports on the genetic code is the search for a stereochemical affinity between the amino acids and their codons. Recently, Hendry et al. (1981a, b) drew attention to a structural similarity between the amino acid side chains and the second bases of their codons: according to these data, almost all amino acids fit "cavities" formed by their second codon bases in the B-DNA helix. Another stereochemical approach to the genetic code involves studies of the specific interaction between the anticodon dinucleotides (complementary to the first two codon nucleotides) and their cognate amino acids (Weber & Lacey, 1978; Jungck, 1978; Shimizu, 1987).

Hendry et al. (1981a, b) cited more than 40 references in which different ideas on a stereochemical rationale for the genetic code were put forward. In the present paper, however, we do not intend to analyse the extensive literature devoted to the genetic code; a suitable review was given by Root-Bernstein (1982b). This author argued that three chemical criteria determined the evolution of the genetic code: codon-anticodon pairing, codon-amino acid pairing and amino acid pairing. This approach implies that "the code was always specific" owing to "stringent constraints of stereochemistry and thermodynamics". Such a notion seems fairly acceptable in the context of biological evolution. Yet our own work suggests that the evolutionary basis of the code should be treated as a structure rather than as a pairing of molecules.


0022-5193/89/230379+ 11 $03.00/0 © 1989 Academic Press Limited


The present paper deals with the genetic code itself. We believe that solving the problem of the origin of the genetic code requires recognizing a certain system in which the relative distribution of the first two codon bases and the distribution of the common amino acids are based on a seemingly unimportant criterion, the number of hydrogen atoms per molecule (Sukhodolets, 1980, 1985). Such a system may be real because life may have arisen from "homologous" crystalline-like composites of amino acids and bases that had an even number of hydrogen atoms per standard structural module.

This system in the distribution of the common amino acids and their cognate base doublets does not extend to some codons. Table 1 shows the genetic code, with such codons indicated by braces; possibly, they result from a "fortuitous" evolution (Sukhodolets, 1982). First, there are the minor codon domains of the amino acids Leu, Ser and Arg, each of which has six codons. It should also be noted that the order, or system, does not extend to the third codon base and, in the case of the amino acids Met, Tyr, His and Glu, it also does not extend to the second codon base (see below).

TABLE 1

The genetic code. The braces indicate those codons in which the doublets of the first and second bases are believed to have changed their sense during evolution

UUU Phe     UCU Ser     UAU Tyr     UGU Cys
UUC Phe     UCC Ser     UAC Tyr     UGC Cys
{UUA Leu}   UCA Ser     UAA Ter     {UGA Ter}
{UUG Leu}   UCG Ser     UAG Ter     UGG Trp

CUU Leu     CCU Pro     CAU His     CGU Arg
CUC Leu     CCC Pro     CAC His     CGC Arg
CUA Leu     CCA Pro     CAA Gln     CGA Arg
CUG Leu     CCG Pro     CAG Gln     CGG Arg

AUU Ile     ACU Thr     AAU Asn     {AGU Ser}
AUC Ile     ACC Thr     AAC Asn     {AGC Ser}
AUA Ile     ACA Thr     AAA Lys     {AGA Arg}
AUG Met     ACG Thr     AAG Lys     {AGG Arg}

GUU Val     GCU Ala     GAU Asp     GGU Gly
GUC Val     GCC Ala     GAC Asp     GGC Gly
GUA Val     GCA Ala     GAA Glu     GGA Gly
GUG Val     GCG Ala     GAG Glu     GGG Gly
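The braced codons of Leu, Ser and Arg can be picked out mechanically. The Python sketch below is an illustration added here, not part of the original analysis: it builds the standard genetic code from the usual 64-letter one-letter string (first base varying slowest, bases in U, C, A, G order) and, for each amino acid with six codons, returns the smaller block of codons sharing a first-two-base doublet. The helper name `minor_domains` is hypothetical, and the braced stop codon UGA falls outside this simple rule.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter symbols ("*" = stop). Codons are enumerated
# with the first base varying slowest, each position in U, C, A, G order.
CODE_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Ter", "C": "Cys",
    "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
    "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
    "D": "Asp", "E": "Glu", "G": "Gly",
}
CODE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), CODE_STRING)
}


def minor_domains(code):
    """For each six-codon amino acid, return the codons of its smaller doublet block."""
    by_aa = defaultdict(lambda: defaultdict(list))
    for codon, aa in code.items():
        by_aa[aa][codon[:2]].append(codon)  # group codons by first-two-base doublet
    result = {}
    for aa, blocks in by_aa.items():
        if aa == "Ter" or sum(len(c) for c in blocks.values()) != 6:
            continue  # only the six-codon amino acids have a 4 + 2 split
        result[aa] = min(blocks.values(), key=len)  # the two-codon minor domain
    return result


print(minor_domains(CODE))
# {'Leu': ['UUA', 'UUG'], 'Ser': ['AGU', 'AGC'], 'Arg': ['AGA', 'AGG']}
```

The result reproduces the braced Leu, Ser and Arg codons of Table 1.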


The aim of this work is to shed light on where the particular order in the numbers of hydrogen atoms among the common amino acids and bases comes from.

A System of Distribution of the 20 Common Amino Acids and the Distribution of their Cognate Base Doublets in the Families

In Table 2 the 20 common amino acids are distributed into groups according to the number of hydrogen atoms per molecule. In this distribution, separate groups contain, as a rule, either one or four amino acids. To follow this regularity we have arbitrarily combined into a single group the amino acid pair Leu and Ile, each containing 13 hydrogen atoms, and the pair Arg and Lys, each containing 14 hydrogen atoms. As a result we obtained four large groups of four amino acids each, containing 7, 9, 11, and 13 or 14 hydrogen atoms, respectively. Within these large groups we have arranged the amino acids in an order that makes it easier to notice a certain pattern in the parallel arrangement of the first and second codon bases (Table 2).

TABLE 2

An order in the genetic code revealed in the distribution of amino acids into groups according to the number of hydrogen atoms per molecule (see text)

No. of H atoms (group)   Amino acids           First codon base   Second codon base
7                        Ala  Ser  Asp  Cys    G  U  G  U         C  C  A  G
8                        Asn                   A                  A
9                        Pro  Thr  Glu  His    C  A  G  C         C  C  A  A
10                       Gln                   C                  A
11                       Val  Phe  Met  Tyr    G  U  A  U         U  U  U  A
12                       Trp                   U                  G
13-14                    Leu  Ile  Arg  Lys    C  A  C  A         U  U  G  A

The amino acid families inferred and the corresponding base doublets:

I           II          III         IV          V
Ala  GC     Val  GU     Asp  GA     Gly  GG     Glu  G(A)
Ser  UC     Phe  UU     Cys  UG     Trp  UG     His  C(A)
Leu  CU     Pro  CC     Arg  CG     Gln  CA     Met  A(U)
Ile  AU     Thr  AC     Lys  AA     Asn  AA     Tyr  U(A)
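The grouping in Table 2 is easy to reproduce. The sketch below, an illustration rather than the author's procedure, reads the hydrogen count from each amino acid's molecular formula in the un-ionized form and pools the 13- and 14-hydrogen amino acids as the text does. `hydrogen_count` and `group_by_hydrogens` are hypothetical helper names, and the simple regular expression assumes formulas written with an explicit H count, as below.

```python
import re
from collections import defaultdict

# Molecular formulas of the 20 common amino acids (un-ionized form).
FORMULAS = {
    "Gly": "C2H5NO2", "Ala": "C3H7NO2", "Ser": "C3H7NO3", "Cys": "C3H7NO2S",
    "Asp": "C4H7NO4", "Asn": "C4H8N2O3", "Pro": "C5H9NO2", "Thr": "C4H9NO3",
    "Glu": "C5H9NO4", "His": "C6H9N3O2", "Gln": "C5H10N2O3", "Val": "C5H11NO2",
    "Phe": "C9H11NO2", "Met": "C5H11NO2S", "Tyr": "C9H11NO3", "Trp": "C11H12N2O2",
    "Leu": "C6H13NO2", "Ile": "C6H13NO2", "Arg": "C6H14N4O2", "Lys": "C6H14N2O2",
}


def hydrogen_count(formula):
    """Number of H atoms in a simple formula such as 'C3H7NO2'."""
    return int(re.search(r"H(\d+)", formula).group(1))


def group_by_hydrogens(formulas, merged=(13, 14)):
    """Group amino acids by H count, pooling 13 and 14 H atoms as in Table 2."""
    groups = defaultdict(list)
    for aa, formula in formulas.items():
        n = hydrogen_count(formula)
        key = "13-14" if n in merged else str(n)
        groups[key].append(aa)
    return dict(groups)


groups = group_by_hydrogens(FORMULAS)
for key in sorted(groups, key=lambda k: int(k.split("-")[0])):
    print(key, groups[key])
```

Apart from Gly (5 hydrogen atoms), which forms a group of its own, the printed groups match the rows of Table 2.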

In fact, the amino acid groups contain repeating sets of the first codon bases (G-U or C-A) and of the second codon bases (C-C, U-U or A-G) for the same amino acid pairs. For example, the set of first codon bases G-U is repeated for the amino acid pairs Ala-Ser, Asp-Cys and Val-Phe, while the set of second codon bases C-C is repeated for the pairs Ala-Ser and Pro-Thr, and so on.


From these repetitions one can distinguish the amino acid pairs Ala-Ser, Asp-Cys, Pro-Thr, Val-Phe, Leu-Ile and Arg-Lys; the 20 common amino acids can then be divided into five particular sets, or families, of four amino acids each.

In Table 2, the repeating sets of the bases are boxed, and for the first codon bases their grouping into families is indicated by arrows. The amino acid families and the corresponding base doublets comprised of the first two codon letters are given in the lower part of Table 2.

Formal criteria which permit the isolation of the amino acid families are the following:

(1) The first codon bases of the amino acids belonging to a single family form the set AUGC.

(2) The second codon bases of the amino acids in a single family are either all purines or all pyrimidines. Moreover, if two amino acids in the same family have first codon bases showing Watson-Crick complementarity, i.e. A-U or G-C, then their second codon bases form the pairs C-U or A-G. Hence, the sets of second codon bases in the families are UUCC or AAGG. (An exception is family V, in which there are no regularities at all in the second codon bases.)

(3) If two amino acids of a single family have first codon bases showing Watson-Crick complementarity (e.g. the pairs Ala-Leu, Ser-Ile, etc.; see Table 2), such amino acids usually contain a total of 20 hydrogen atoms. Hence, the amino acids of one family contain a total of 40 hydrogen atoms. This rule would not seem to be observed in family III, since the pairs Asp-Arg and Cys-Lys each contain 21 hydrogen atoms. It will be shown below that family III actually comprises amino acids from two different families and that, in effect, this "exception" proves a more general basic rule. The 15 hydrogen atoms contained in the pair Gly-Gln in family IV are another exception; this could be explained by the need to take two glycine molecules into account.
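Criteria (1) and (3) lend themselves to a direct check. The sketch below is an illustration added here, not the author's own procedure: it encodes families I-IV from Table 2 as mappings from amino acid to first-two-base doublet, verifies that each family's first bases form the set AUGC, and sums the hydrogen atoms of every pair whose first bases are Watson-Crick partners. The function names are hypothetical.

```python
# Hydrogen atoms per molecule (un-ionized amino acids), as in Table 2.
H = {"Ala": 7, "Ser": 7, "Leu": 13, "Ile": 13, "Val": 11, "Phe": 11, "Pro": 9,
     "Thr": 9, "Asp": 7, "Cys": 7, "Arg": 14, "Lys": 14, "Gly": 5, "Trp": 12,
     "Gln": 10, "Asn": 8}
# Families I-IV with the cognate first-two-base doublets.
FAMILIES = {
    "I": {"Ala": "GC", "Ser": "UC", "Leu": "CU", "Ile": "AU"},
    "II": {"Val": "GU", "Phe": "UU", "Pro": "CC", "Thr": "AC"},
    "III": {"Asp": "GA", "Cys": "UG", "Arg": "CG", "Lys": "AA"},
    "IV": {"Gly": "GG", "Trp": "UG", "Gln": "CA", "Asn": "AA"},
}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick partners


def first_bases_form_augc(family):
    """Criterion (1): the first codon bases are exactly A, U, G and C."""
    return sorted(d[0] for d in family.values()) == sorted("AUGC")


def complementary_pairs(family):
    """Criterion (3): pairs with complementary first bases and their summed H atoms."""
    members = list(family)
    pairs = []
    for i, a in enumerate(members):
        for b in members[i + 1:]:
            if PAIR[family[a][0]] == family[b][0]:
                pairs.append((a, b, H[a] + H[b]))
    return pairs


for name, fam in FAMILIES.items():
    print(name, first_bases_form_augc(fam), complementary_pairs(fam))
```

It reports 20 + 20 hydrogen atoms for families I and II, 21 + 21 for family III and 15 + 20 for family IV; counting Gly twice, as suggested above, would bring the Gly-Gln pair to 20.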

It should be stressed that the amino acid groups, or families, inferred in Table 2 are the only possible ones. The criteria above leave no freedom in choosing which amino acids to group together. For instance, Ala and Ser cannot be placed together with Arg and Lys, because a proper set of second codon bases (i.e. UUCC or AAGG) would then not be obtained. Likewise, Tyr from family V cannot replace Phe in family II, nor can His replace Pro, because in these cases the proper combination of second codon bases in family II would not be maintained.
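The exclusion argument can likewise be sketched in code. The hypothetical `second_base_rule` below is an illustration only: it tests a candidate family against criterion (2) and shows that each substitution mentioned in the text breaks the rule.

```python
PURINES, PYRIMIDINES = set("AG"), set("UC")
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick partners


def second_base_rule(family):
    """Criterion (2) for a candidate family {amino acid: first-two-base doublet}.

    The second bases must be all purines or all pyrimidines, and whenever two
    first bases are Watson-Crick partners, their second bases must form C-U or A-G.
    """
    seconds = {d[1] for d in family.values()}
    if not (seconds <= PURINES or seconds <= PYRIMIDINES):
        return False
    doublets = list(family.values())
    for i, x in enumerate(doublets):
        for y in doublets[i + 1:]:
            if PAIR[x[0]] == y[0] and {x[1], y[1]} not in ({"C", "U"}, {"A", "G"}):
                return False
    return True


family_ii = {"Val": "GU", "Phe": "UU", "Pro": "CC", "Thr": "AC"}
tyr_for_phe = {"Val": "GU", "Tyr": "UA", "Pro": "CC", "Thr": "AC"}
his_for_pro = {"Val": "GU", "Phe": "UU", "His": "CA", "Thr": "AC"}
ala_ser_arg_lys = {"Ala": "GC", "Ser": "UC", "Arg": "CG", "Lys": "AA"}

for label, fam in [("family II", family_ii), ("Tyr for Phe", tyr_for_phe),
                   ("His for Pro", his_for_pro), ("Ala, Ser, Arg, Lys", ala_ser_arg_lys)]:
    print(label, second_base_rule(fam))
```

Only the original family II passes; each substituted set mixes purines and pyrimidines in the second position.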

Thus, the criteria above represent stringent rules for selecting, within each family, the amino acids on the one hand and their cognate codon bases on the other. Such parallelism could arise if the amino acids served as analogs of base doublets; in that case a definite order among the amino acids should be matched by an order among the codon bases as well. What does this overall order mean?

Hypothetical Crystalline-like Structures Composed of Amino Acids and Bases

Bearing in mind that the set AUGC corresponds to the first codon bases in all families, and that two sets of second codon bases from different families (namely, UUCC and AAGG) might be equivalent in their arrangement to a doubled AUGC set, it is possible that the set AUGC served as a standard unit, or block, in crystalline-like aggregates in which the amino acids and the base doublets alternated as structural analogs.

In this case different amino acid families could correspond to different structures in the relative packing of the bases. Although the actual relative arrangement of the bases within the hypothetical association AUGC is not yet known, one could assume that there are Watson-Crick interactions in the pairs A-U and G-C and depict some conditional mutual orientation for these two base pairs. Such conditional structures, termed "x-associates" (Sukhodolets, 1980), could then be arranged relative to each other so as to obtain combinations of neighbouring bases corresponding to the base doublets in family I or to those in family II (Fig. 1).

FIG. 1. Two different variants for the relative orientation of x-associates; the doublets formed by their neighbouring bases correspond to those in families I and II. Sets of the bases forming the four doublets of a single family are isolated by rectangles.

The second codon bases in families I and II are pyrimidines, and for this reason the "external" x-associates depicted in Fig. 1 are oriented so that their purine bases face the outside. Hence, the complete regular structure would consist of alternating compartments in which the x-associates are surrounded either by purines or by pyrimidines. Structures A and B would be composed of the base doublets of families I and II, respectively (Fig. 2).

The most important inference from examining structures A and B is that the one amino acid pair in family III, namely, Arg-Asp fits structure A, whereas another amino acid pair in this family, Lys-Cys, conforms to structure B (Fig. 2). Moreover, in structure A the pair Arg-Asp should have as its neighbours (within the "purine" compartment, Fig. 2) the base doublets UA and AG which have no amino acid analogs in our system (Table 1).

Therefore, family III as depicted in Table 2 seems to include amino acid pairs representing two different families, designated here as III A and III B (Table 3). As hypothetical elementary constituents of the prebiological structures, families III A and III B would be unusual in simultaneously containing both amino acids and bases.

FIG. 2. Two putative crystalline structural patterns including the amino acids and the bases. Conditional representations show the structures composed of amino acids and, simultaneously, their analogs composed of bases. The amino acids are depicted as arrows directed towards the first codon base. The rectangles isolate conditional structural compartments, or modules. A and A', and B and B', represent neighbouring "layers" alternating in the direction perpendicular to the plane of the figure.

TABLE 3

Families of the amino acids and base doublets representing elements of the prebiological crystalline associations. The figures in parentheses indicate numbers of hydrogen atoms. The dotted lines separate structural modules each containing 20 hydrogen atoms (see also text)

I         Ala (7)    Ile (13)
          Leu (13)   Ser (7)

II        Val (11)   Thr (9)
          Pro (9)    Phe (11)

III A     Asp (7)    AG (10)
          Arg (14)   UA (9)

III B     GG (10)    Lys (14)
          CA (10)    Cys (7)

IV        Gly (5)    Asn (8)
          Gln (10)   Trp (12)

V         Glu (9)    His (9)
          Met (11)   Tyr (11)

At this point the exception mentioned above, concerning the number of hydrogen atoms (21) in the pair Arg-Asp, is elucidated. Coupled with the doublets UA and AG (19 hydrogen atoms), this pair gives the necessary 40 hydrogen atoms per compartment. (Free molecules of the bases A, G, C, U and T contain 5, 5, 5, 4 and 6 hydrogen atoms, respectively.) As for the amino acid pair Cys-Lys, which also contains 21 hydrogen atoms, the discrepancy seems to be explained by the fact that two neighbouring cysteine molecules readily lose two protons on being oxidized to cystine. Therefore, the pair Cys-Lys also gives 40 hydrogen atoms per compartment in the complex with the doublets GG and CA (20 hydrogen atoms).
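The hydrogen bookkeeping for families III A and III B is simple arithmetic, sketched below as an illustration. Following the text, each doublet is counted as two free bases; the loss of one hydrogen per cysteine (two per pair of neighbouring cysteines oxidized to cystine) is our reading of the argument and is marked as an assumption in the code. The helper names are hypothetical.

```python
# Hydrogen atoms per free, un-ionized molecule, as used in the text.
BASE_H = {"A": 5, "G": 5, "C": 5, "U": 4, "T": 6}
AA_H = {"Arg": 14, "Asp": 7, "Cys": 7, "Lys": 14}


def doublet_h(doublet):
    """H atoms in a base doublet, counted as two free bases."""
    return sum(BASE_H[b] for b in doublet)


def compartment_h(amino_acids, doublets, h_lost=0):
    """Total H atoms in a compartment of amino acids plus base doublets, minus any H lost."""
    return sum(AA_H[a] for a in amino_acids) + sum(doublet_h(d) for d in doublets) - h_lost


# Family III A: Arg-Asp (21 H) with the doublets UA and AG (19 H).
iii_a = compartment_h(["Arg", "Asp"], ["UA", "AG"])
# Family III B: Cys-Lys (21 H) with GG and CA (20 H). Assumption: one H lost per
# cysteine, i.e. two per pair of neighbouring cysteines oxidized to cystine.
iii_b = compartment_h(["Cys", "Lys"], ["GG", "CA"], h_lost=1)
print(iii_a, iii_b)  # 40 40
```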

One may ask what the number of hydrogen atoms has to do with crystalline aggregates. Although an even distribution of all kinds of atoms would occur in any ordered molecular aggregate of the crystalline type, hydrogen atoms appear to give no useful pattern in X-ray crystallography. Yet the distribution of hydrogen atoms may reflect the state of the structure with respect to its local affinity for a proton. In other words, an even distribution of hydrogen atoms seems to reflect the kind of equilibrium established within the crystalline-like molecular aggregate in relation to the acidic-basic properties of its constituents.

In ionic crystals, anions and cations alternate in a regular fashion so as to counterbalance each other. It is therefore natural to assume that in crystalline-like aggregates composed of amino acids, the equilibrium extends to the amino acids' affinities for a proton. A further point is that amino acids such as Asp, Cys and Glu, or Arg, Lys and His, have the numbers of hydrogen atoms indicated above only in the un-ionized form, i.e. only at restricted (too low or too high) pH values. However, the amino acid families described above allow electrochemical counterbalancing of the acidic-basic properties of the amino acids in the pairs Arg-Asp, Cys-Lys and His-Glu, in which the un-ionized state, and therefore the "necessary" proton numbers, are expected to be restored.

Now let us turn to the fact that the association AUGC contains only 19 hydrogen atoms. For this reason, in a crystalline-like structure of single bases, the elementary complex was probably a set of eight bases comprising the associates AUGC and ATGC. In other words, the bases U (4 H atoms) and T (6 H atoms) might alternate in a regular manner, so that 40 hydrogen atoms would correspond to the eight bases representing the standard structural compartment. Thus, if structures A and B consisted of single bases, they would contain thymine and uracil in equal numbers (not shown in Fig. 2). A fragment of a crystalline-like aggregate of the bases is depicted schematically in Fig. 3. Within this fragment, the pairs C-G and T-A alternating from top to bottom could be considered a structure that preceded DNA in evolution.
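The arithmetic behind the eight-base compartment can be checked in a few lines. This is an illustrative sketch using the hydrogen counts for free bases given in the text; `module_h` is a hypothetical helper name.

```python
from collections import Counter

# Hydrogen atoms per free base molecule, as given in the text.
BASE_H = {"A": 5, "G": 5, "C": 5, "U": 4, "T": 6}


def module_h(bases):
    """Total H atoms in a set of free bases written as a string, e.g. 'AUGC'."""
    return sum(BASE_H[b] for b in bases)


augc, atgc = "AUGC", "ATGC"
module = augc + atgc  # the proposed eight-base standard compartment
print(module_h(augc), module_h(atgc), module_h(module))  # 19 21 40
print(sorted(Counter(module).items()))  # [('A', 2), ('C', 2), ('G', 2), ('T', 1), ('U', 1)]
```

The composition 2A, 2G, 2C, 1U, 1T is the same eight-base module referred to in the Conclusion.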

The structures A and B depicted in Fig. 2 represent "infinite" crystals in which complex molecular patterns are repeated. Crystalline aggregates that served as precursors of organisms, however, should have had a rather finite organization. The possibility that such a finite crystalline structure formed by combining half-compartments of the "purine" modules of structures A and B has been discussed previously (Sukhodolets, 1985); such a combination was provided for by the participation of the amino acids of family IV (Sukhodolets, 1985).

FIG. 3. Schematic representation of a general plan of the crystalline structure that is supposed to have preceded nucleic acids in evolution.

The amino acids of family V, which show no definite order in their set of second codon bases, also seem to have served some special function in organizing the prebiological crystalline forms. Possibly, the amino acids of this family were attached to the putative crystalline aggregate along its periphery.

Structural Analogy of the Amino Acids and Bases and the Origin of the Genetic Code

The codon base doublets in family IV correspond to combinations of the bases in a purine compartment of structure B (Fig. 2). Hence, the amino acid pair Trp-Asn from family IV appears to duplicate the pair Cys-Lys from family III B, i.e. these amino acids have the same base doublets as their analogs, namely UG and AA. This coincidence indirectly demonstrates that the structural analogy between amino acids and base doublets is relative: two amino acids corresponding to the same base doublet (e.g. Trp and Cys) need not be alike from a chemical standpoint.

With regard to structures A and B (Fig. 2), the notion of structural analogy between amino acids and bases in effect means that an amino acid and its cognate base doublet perform the same function within a standard structural module. Yet in structures A and B they would seem to be mutually replaceable only within relatively large compartments, or modules, that include several molecules of amino acids or base doublets. There appears to be a similarity, or homology, at the level of whole modules containing the same number of hydrogen atoms, between the different structural compounds that initiated life.

Possibly, the crystalline-like aggregates of amino acids and bases that initiated life were selected from among many such aggregates on the basis of their structural homology. A structural homology of different crystalline forms makes it possible for them to combine in the same pattern, as occurred during the very first stages of molecular evolution. The same number of hydrogen atoms per standard structural module, which may serve as an indicator of local proton affinity, would appear to bear witness to the structural homology of the crystalline forms.

If the amino acids and the bases are compared directly, two analogies in the behaviour of the molecules of these two classes can be suggested. In the putative crystalline-like structures (Figs 2 and 3) there are alternating 6-amino and 6-carbonyl bases, both purines and pyrimidines, i.e. the alternating bases are A and G as well as C and U(T). Such an arrangement of the bases seems to be the most probable for a crystalline-like aggregate, because the 6-amino and 6-carbonyl bases, purines and pyrimidines alike, represent alternative structural chains from the standpoint of their electrochemical, or acidic-basic, properties (Sukhodolets, 1980). It is possible that certain amino acid pairs within families and crystalline aggregates, just like the 6-amino and 6-carbonyl bases, play the role of alternative structural chains with respect to their electrochemical properties. For example, in structure A-A' (Fig. 2) the amino acid pairs alternating in two dimensions are Ile-Ala and Ser-Leu. In structure B-B' the amino acid pairs of this type are Val-Thr and Pro-Phe (Fig. 2).

Another analogy in the behaviour of the amino acids and the bases in crystalline-like aggregates may concern the specific Watson-Crick interactions in the base pairs A-U(T) and G-C. It could be supposed that the amino acids themselves behave as if they duplicated such an interaction (namely, the one expected between the first bases of their cognate base doublets). The corresponding amino acid pairings in structure A would be Ile-Ser, Ala-Leu and Arg-Asp, whereas those in structure B are Thr-Phe, Val-Pro, Cys-Lys and also Gly (×2)-Gln and Trp-Asn (Fig. 2). The mutual orientation of the molecules in these pairs presumably allows complementary hydrogen bonding between the α-amino and carbonyl groups of the two interacting amino acids. This suggestion implies that the α-carbon atom of an amino acid occupies the position corresponding to that of the first codon base, whereas the amino acid side chain is directed towards the position of the second codon base. It should be stressed that the amino acid pairings expected in prebiological crystalline aggregates do not coincide with those proposed to occur in proteins between amino acids coded for by complementary codons (see Mekler, 1969; Blalock & Smith, 1984) or by "parallel" complementary codons (Root-Bernstein, 1982a).

How does coding arise, and what does it mean with respect to crystalline forms? Evidently, at this level of biological organization, coding could appear in a process of heterogeneous crystallization in which amino acids as well as bases might equally be used as templates for the reproduction of a definite crystalline pattern. In this view, coding in the process of translation in present-day organisms represents a kind of "distant" crystallization by means of special adapters, i.e. the transfer RNA molecules.

Thus, it is plausible that at an early stage of prebiological evolution, multifarious (though similarly organized) crystalline-like aggregates comprising amino acids and bases combined. Evidently, in spite of subsequent evolutionary alterations, the genetic code has retained the specific combinations of bases that reveal the organization of the original crystalline forms.

Conclusion

A definite order among the common amino acids in the number of hydrogen atoms per molecule, and the corresponding order in the relative distribution of the first and second codon bases, would seem to be evidence that the genetic code originated from orderly crystalline-like aggregates of amino acids and bases. The genetic code itself suggests that two main types of crystalline-like structures (types A and B) existed and included various amino acids. In both structures A and B, the standard compartments, or modules, include either sets of four amino acids (corresponding to families I, II, IV and, perhaps, V) or sets of two amino acids and two base doublets (corresponding to families III A and III B). Every set of molecules forming a standard module in structures A and B contains a total of 40 hydrogen atoms. This seems to reflect a kind of even electrochemical state along the structure.


However, besides the crystalline-like structures composed mainly of amino acids, their structural analogs composed of single bases also existed. In these derivatives of the A and B types there were eight bases (2A, 2G, 2C, 1U and 1T) per standard module. The modules composed of bases, although different in their arrangement, each contained 40 hydrogen atoms, like an amino acid module. Such homology predetermined the amalgamation of different molecular classes, i.e. amino acids and bases, within the same structure. The subsequent functional isolation of bases as the molecules forming templates (today, the constituents of nucleic acids) originates from a prebiological heterogeneous crystallization in which a definite structure made up of bases could serve as a template for the production of a homologous structure composed of amino acids.

REFERENCES

BLALOCK, J. E. & SMITH, E. M. (1984). Biochim. biophys. Res. Communs 121, 203.
HENDRY, L. B., BRANSOME, E. D., Jr. & PETERSHEIM, M. (1981a). Orig. Life 11, 203.
HENDRY, L. B., BRANSOME, E. D., Jr., HUTSON, M. S. & CAMPBELL, L. K. (1981b). Proc. natn. Acad. Sci. U.S.A. 78, 7440.
JUNGCK, J. R. (1978). J. molec. Evol. 11, 211.
MEKLER, L. B. (1969). Biofizika 14, 581 (in Russian).
ROOT-BERNSTEIN, R. S. (1982a). J. theor. Biol. 94, 885.
ROOT-BERNSTEIN, R. S. (1982b). J. theor. Biol. 94, 895.
SHIMIZU, M. (1987). J. Phys. Soc. Japan 56, 43.
SUKHODOLETS, V. V. (1980). Genetika 16, 759 (in Russian).
SUKHODOLETS, V. V. (1982). Genetika 18, 499 (in Russian).
SUKHODOLETS, V. V. (1985). Genetika 21, 1589 (in Russian).
WEBER, A. L. & LACEY, J. C., Jr. (1978). J. molec. Evol. 11, 199.

