

Molecular Phylogenetics and Evolution 33 (2004) 615–625

www.elsevier.com/locate/ympev

Ribosomal protein-sequence block structure suggests complex prokaryotic evolution with implications for the origin of eukaryotes

Prashanth Vishwanath a, Paola Favaretto a, Hyman Hartman b, Scott C. Mohr c, Temple F. Smith a,*

a BioMolecular Engineering Research Center, Boston University, 36 Cummington St., Boston, MA 02215, USA

b Center for Biomedical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA

c Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, MA 02215, USA

Received 15 March 2004; revised 29 June 2004

Available online 12 September 2004

Abstract

Amino acid sequence alignments of orthologous ribosomal proteins found in Bacteria, Archaea, and Eukaryota display, relative to one another, an unusual segment or block structure, with major evolutionary implications. Within each of the prokaryotic phylodomains the sequences exhibit substantial similarity, but cross-domain alignments break up into (a) universal blocks (conserved in both phylodomains), (b) bacterial blocks (unalignable with any archaeal counterparts), and (c) archaeal blocks (unalignable with any bacterial counterparts). Sequences of those eukaryotic cytoplasmic riboproteins that have orthologs in both Bacteria and Archaea, exclusively match the archaeal block structure. The distinct blocks do not correlate consistently with any identifiable functional or structural feature including RNA and protein contacts. This phylodomain-specific block pattern also exists in a number of other proteins associated with protein synthesis, but not among enzymes of intermediary metabolism. While the universal blocks imply that modern Bacteria and Archaea (as defined by their translational machinery) clearly have had a common ancestor, the phylodomain-specific blocks imply that these two groups derive from single, phylodomain-specific types that came into existence at some point long after that common ancestor. The simplest explanation for this pattern would be a major evolutionary bottleneck, or other scenario that drastically limited the progenitors of modern prokaryotic diversity at a time considerably after the evolution of a fully functional translation apparatus. The vast range of habitats and metabolisms that prokaryotes occupy today would thus reflect divergent evolution after such a restricting event. Interestingly, phylogenetic analysis places the origin of eukaryotes at about the same time and shows a closer relationship of the eukaryotic ribosome-associated proteins to crenarchaeal rather than euryarchaeal counterparts.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Ribosomal proteins; Ribosome phylogeny; Molecular evolution; Amino acid sequence-alignment blocks; Prokaryotic phylogeny; Eukaryote origin(s)

1055-7903/$ - see front matter © 2004 Elsevier Inc. All rights reserved.

doi:10.1016/j.ympev.2004.07.003

* Corresponding author. Fax: +617 353 7020.

E-mail address: [email protected] (T.F. Smith).

1. Introduction

The ribosome, with its conserved central role in protein synthesis, has long constituted a prime subject for phylogenetic analysis. The study of small-subunit (SSU) ribosomal RNA sequences led Woese to his seminal recognition of two extant prokaryotic phylodomains, Bacteria and Archaea (Woese et al., 1990). Recent rapid growth of genomic and structural information on ribosomes (Ramakrishnan and Moore, 2001) has opened the way to broad comparisons, particularly of the ribosomal proteins (Caetano-Anolles, 2002; Lecompte et al., 2002; Mears et al., 2002; Tung et al., 2002; Wuyts et al., 2001). Here we report an unusual multisequence alignment block structure for these proteins with important evolutionary implications. Within each of the two prokaryotic phylodomains, Bacteria and Archaea, the ribosomal proteins align over nearly their entire length with high conservation and statistical significance. Yet cross-phylodomain alignments break up into well-defined segments or blocks: those alignable across both prokaryotic phylodomains, bacterial-specific blocks (unalignable with any archaeal counterpart) and archaeal-specific blocks (unalignable with any bacterial counterpart). The prokaryotic phylodomain-specific blocks fall into two categories: class I, those that appear as alignment insertions (or deletions); and class II, those that have similar placement within the overall protein sequences, but show unique sequence characteristics within each phylodomain and no cross-phylodomain similarity even at the level of general hydrophobicity profile. The eukaryotic ribosomal proteins that have prokaryotic homologs display the block structure of the Archaea, but with numerous N- and C-terminal extensions.
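The class I / class II distinction can be made concrete with a small sketch. It assumes that a phylodomain-specific block has already been located and that the corresponding alignment columns from the other phylodomain are available as strings; the function name and the 0.8 gap cutoff are illustrative assumptions, not values taken from this study.

```python
def classify_specific_block(other_domain_rows, gap_char="-", gap_threshold=0.8):
    """Label a phylodomain-specific block by what the other phylodomain shows there.

    other_domain_rows: aligned strings from the *other* phylodomain, cut to the
    block's column range. The 0.8 cutoff is an illustrative assumption.
    """
    total = sum(len(row) for row in other_domain_rows)
    if total == 0:
        raise ValueError("empty block")
    gaps = sum(row.count(gap_char) for row in other_domain_rows)
    # Mostly gaps opposite the block: it behaves as an insertion/deletion (class I).
    # Residues present but unalignable: same placement, different sequence (class II).
    return "class I" if gaps / total >= gap_threshold else "class II"
```

A block facing only gap characters in the other phylodomain is labeled class I; a block facing residues of comparable length is labeled class II. Whether those residues are truly unalignable still has to be judged from the alignment itself.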

Such taxonomic division-specific alignment segments, particularly as deletions or insertions, have been used to resolve major evolutionary questions in the past (de Jong et al., 2003; Gupta and Golding, 1996). The very basis of molecular taxonomy rests on the extreme likelihood that sets of species sharing multiple common sequence deletions/insertions and/or long regions of uniquely conserved subsequences are monophyletic. Clearly various forms of horizontal gene transfer may add complexity to such analyses. However, if all the members of a phylodomain contain a specific protein insertion or deletion, independent horizontal transfers are an unlikely explanation (Nikaido et al., 2001).

The riboprotein alignment blocks identified here are continuous runs of 8–70 amino acids unique to each protein. They have very high phylodomain-specific amino acid conservation within the Bacteria and/or the Archaea and display characteristic patterns of hydrophobicity, charge and probable turn induction. Yet the blocks do not correlate with any clear function or structure, or even, in most cases, with any phylodomain-specific RNA contacts or RNA sequence differences. These blocks are shorter than typical protein domains, yet longer than the conserved segments associated with enzyme active-site patterns. In fact, while some metabolic enzymes show taxon-specific sequence deletions/insertions, they do not have large class II blocks (data not shown). Further, these deletions/insertions are either short and confined to surface loops or are large enough to encode a biochemical function linked to that of the rest of the enzyme, as in multifunctional domain proteins. The class II blocks identified in the ribosomal proteins are unique. They are sequence-distinct and specific for either the Bacteria or the Archaea, but appear to have identical functions (see Discussion on S8 and L4). Finally, as discussed in the Methods section below, the identified block boundaries are well defined by clear transitions from regions alignable across both prokaryotic phylodomains to regions alignable only within one or the other phylodomain. Similar prokaryotic phylodomain-specific block structure is observed in some of the other protein synthesis-associated proteins, including the initiation and elongation factors, and at least one aminoacyl-tRNA synthetase.

2. Methods

Of the canonical set of 21 SSU ribosomal proteins identified in the best-studied prokaryote (Escherichia coli), 15 orthologs appear consistently in the fully sequenced genomes of Bacteria, Archaea, and Eukaryota (Harris et al., 2003). In addition, there are 19 large-subunit (LSU) ribosomal proteins identified as shared among the three phylodomains (Harris et al., 2003) (cf. 31 total LSU proteins in E. coli). These, together with recent detailed structural information (Ban et al., 2000; Wimberly et al., 2000), form the central basis of our study. Searches of GenBank using BLAST (Altschul and Koonin, 1998; Altschul et al., 1997), PsiBLAST (Altschul and Koonin, 1998), and the profile method of Das and Smith (Das, 1998; Das and Smith, 2000) identified representative taxa clearly containing all the known ribosomal proteins shared between Bacteria and Archaea. From those we selected the proteins from 13 bacterial and 11 archaeal species, choosing organisms so as to encompass the widest possible taxonomic and habitat ranges and to include those for which important structural and functional information is available. The legend to Fig. 1 lists the species names.

We used ClustalW (Thompson et al., 1994), PsiBLAST and Smith-Waterman profile alignments (Das and Smith, 2000) to generate initial multiple sequence alignments. These were compared among themselves and with published alignments of SSU and LSU ribosomal proteins (Lecompte et al., 2002) (see also http://igweb.integratedgenomics.com/Bioinformatics/Nikos/Archaeal_Information/Translation/ABE/rPROT/rPROT.html). For a number of proteins, the process resulted in more than one potential multisequence alignment. And, for nearly all, the overall length of regions that could be aligned with statistical confidence across the Bacteria and Archaea was severely limited. However, treating bacterial and archaeal proteins separately produced nearly full-length, consistent and highly significant alignments for all proteins. Using preservation of patterns of hydrophobicity, polarity, and probable structural and critical fold-related features, minor adjustments were made in gap placements, including those within the phylodomain-specific alignments. This was done using the available structural data [Thermus thermophilus (Wimberly et al., 2000), Haloarcula marismortui (Ban et al., 2000), E. coli (Yusupov et al., 2001) and Deinococcus radiodurans (Harms et al., 2001; Schluenzen et al., 2000)] and including the secondary structure assignments given by Brodersen et al. (2001). These alignment adjustments were constrained insofar as possible (given the limited structure information) by placing alignment gaps only in surface loops. When possible, residues known to make rRNA contacts (Ban et al., 2000; Brodersen et al., 2001) in the determined crystal structures were aligned so as to conserve the amino acid type within each prokaryotic phylodomain separately. In addition, careful examination was made to identify positions corresponding to significant protein–protein contacts or involvement with the ribosomal assembly sequence (Held et al., 1973) and again, if possible, these were aligned as conserved.

Fig. 1. Multiple sequence alignment for ribosomal protein S8. Labels are Swiss-Prot codes for Archaea (top set): AERPE (Aeropyrum pernix), ARCFU (Archaeoglobus fulgidus), HALMA (Haloarcula marismortui), METJA (Methanococcus jannaschii), METKA (Methanopyrus kandleri), METMA (Methanosarcina mazei), METTH (Methanobacterium thermoautotrophicum), PYRAB (Pyrococcus abyssi), PYRAE (Pyrobaculum aerophilum), SULSO (Sulfolobus solfataricus), THEAC (Thermoplasma acidophilum); and for Bacteria (bottom set): BACSU (Bacillus subtilis), CAUCR (Caulobacter crescentus), CHLTR (Chlamydia trachomatis), CHLTE (Chlorobium tepidum), ECOLI (Escherichia coli), HELPY (Helicobacter pylori), FUSNN (Fusobacterium nucleatum), STRCO (Streptomyces coelicolor), SYNY3 (Synechocystis sp., strain PCC 6803), THEMA (Thermotoga maritima), THETH (Thermus thermophilus), TREPA (Treponema pallidum). The bars between the two sets mark conserved blocks (cf. Fig. 2). Open bars mark segments showing cross-domain (universal) conservation, the black bar marks an archaeal-specific segment, and the cross-hatched bar marks a bacterial-specific segment. Secondary structures are from PDB 1J5E.

Finally, the proteins separately multialigned within the Bacteria and Archaea were aligned across the two prokaryotic phylodomains to optimize cross-phylodomain conservation. Again, using the same criteria as for the phylodomain-specific alignments, but without readjusting these, we manually shifted a few gaps to attain the best cross-phylodomain alignments. The full three-phylodomain alignment was performed by aligning the cross-phylodomain prokaryotic alignments to the eukaryotic phylodomain-specific alignments by using the same approach as before. The minimal-size criterion for a block was eight positions, including at least two absolutely conserved amino acids. Many of the block boundaries are obvious in that they correspond to alignment gaps in one phylodomain or the other, or are bracketed by phylodomain-specific highly conserved amino acid clusters. In other cases the boundaries are more arbitrary, but only to within two or three positions. For these, the boundaries were placed at the edges of secondary structures or alignment gaps as implied by the known crystal structures if possible. We searched the database of all known prokaryotic protein sequences for potential non-ribosomal homologs to each of the identified blocks. In no cases were any convincing non-ribosomal-protein matches found. Fig. 1 gives an example of a finished alignment (for S8). Fig. 2 gives a summary of the proposed alignment-block structures. The complete set of all finished alignments is available on the internet (http://bmerc-www.bu.edu/RRP/RRP_home).
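The minimal-size criterion stated above (at least eight positions, at least two of them absolutely conserved) is easy to express as a check on a candidate column range. The following is a minimal sketch only: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings, and the manual, structure-guided boundary placement described in the text is not modeled.

```python
def absolutely_conserved_columns(rows, gap_char="-"):
    """Return indices of columns where every sequence has the same residue."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    conserved = []
    for i, column in enumerate(zip(*rows)):
        residues = set(column)
        # A column of identical gap characters does not count as conserved.
        if len(residues) == 1 and gap_char not in residues:
            conserved.append(i)
    return conserved


def meets_block_criterion(rows, start, end, min_len=8, min_conserved=2):
    """Check the stated minimum: >= 8 positions, >= 2 absolutely conserved."""
    if end - start < min_len:
        return False
    segment = [r[start:end] for r in rows]
    return len(absolutely_conserved_columns(segment)) >= min_conserved
```

For example, with three hypothetical eight-residue rows that agree at four columns, `meets_block_criterion(rows, 0, 8)` holds, while the seven-column range `(0, 7)` fails on length alone.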

Fig. 2. Schematic summary of multiple sequence alignment block patterns of prokaryotic ribosomal proteins: (A) LSU riboproteins. (B) SSU riboproteins. For each protein designated at the left, the upper line corresponds to the set of archaeal sequences and the lower line to the bacterial set. Open bars mark segments alignable across both sets, black bars indicate archaeal-specific segments and cross-hatched bars indicate bacterial-specific segments. Black lines indicate variable length segments for members of that division. Extended blank spaces indicate large alignment gaps. (C) Preliminary work in aligning riboprotein sequences from Archaea and eukaryotes. The upper line corresponds to the archaeal set and lower line to the eukaryotic set. Gray bars denote eukaryotic-specific segments; black bars indicate archaeal/eukaryotic-specific segments.

Finally, we carried out a wide range of phylogenetic analyses based on these alignments. In particular, the sequence variations within numerous combinations of riboproteins and block types were combined and processed using both maximum-likelihood and maximum-parsimony approaches. Phylogenetic trees were constructed from the positional variation with maximum parsimony bootstrapped 300 times (Swofford, 1998) and maximum likelihood with quartet puzzling (Jones et al., 1992; Strimmer and von Haeseler, 1997). Eukaryotic parasites like Encephalitozoon cuniculi and Plasmodium falciparum were excluded from these analyses. The analyses were extended to include a number of additional ribosome-associated proteins as well. The predicted likelihood and/or bootstrap value of each inferred taxonomic split was recorded along with all inferred branch lengths.
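Bootstrapping an alignment means resampling its columns with replacement and rebuilding a tree from each pseudo-replicate. The sketch below shows only the resampling step, with 300 replicates as in the text. It is not the procedure of the phylogenetics package cited above; the function name, the dictionary input format and the fixed seed are assumptions made for illustration.

```python
import random


def bootstrap_columns(alignment, n_replicates=300, seed=0):
    """Yield pseudo-replicate alignments by resampling columns with replacement.

    alignment: dict mapping taxon label -> aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    for _ in range(n_replicates):
        # The same column indices are applied to every taxon, so each sampled
        # column stays intact across the alignment.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}
```

Each replicate would then be passed to a tree-building method, and the bootstrap value of a split is the fraction of replicate trees that contain it.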

3. Results

Immediately obvious from inspection of the finished cross-phylodomain prokaryote riboprotein alignments is the amalgamated block structure. These blocks fall into three types: (U) universal blocks (conserved in both phylodomains), (B) bacterial blocks (conserved in Bacteria, but unalignable with any archaeal counterparts), and (A) archaeal blocks (conserved in Archaea, but unalignable with any bacterial counterparts). Types B and A can be subdivided into the Class I and Class II blocks (as defined in Introduction). Out of 6597 aligned positions in the prokaryotic SSU and LSU proteins, 3834 occur in universal blocks. There are 1303 alignable positions in the riboprotein blocks specific to both Archaea and eukaryotes, and 787 positions in bacterial-specific blocks. For comparisons between archaeal and eukaryotic sequences, we also identified blocks conserved in eukaryotes, but unalignable with any prokaryotic counterpart.
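The position counts just quoted can be turned into shares of the alignment with a few lines of arithmetic. The counts are taken from the text; treating the positions not covered by the three quoted categories as a single remainder is an assumption of this sketch, since the text does not describe them.

```python
TOTAL_ALIGNED = 6597  # aligned positions, prokaryotic SSU + LSU proteins
block_positions = {
    "universal": 3834,
    "archaeal/eukaryotic-specific": 1303,
    "bacterial-specific": 787,
}


def block_shares(counts, total):
    """Return each category's share of the total, plus the unassigned remainder."""
    shares = {name: n / total for name, n in counts.items()}
    remainder = total - sum(counts.values())
    shares["outside the three categories"] = remainder / total
    return shares


for name, share in block_shares(block_positions, TOTAL_ALIGNED).items():
    print(f"{name}: {share:.1%}")
```

Under these numbers roughly 58% of aligned positions fall in universal blocks, about 20% in archaeal/eukaryotic-specific blocks and about 12% in bacterial-specific blocks, leaving 673 positions (about 10%) outside the three quoted categories.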

Fig. 2 schematically summarizes the block patterns found in all 15 SSU and 19 LSU proteins with complete sets of prokaryotic homologs. With the exception of S9, L29 and possibly S10 and L6, all ribosomal proteins have conserved phylodomain-specific blocks. Proteins L4, L15, L18, S10 and S19 show similar cross-phylodomain regions that are totally unalignable yet occupy equivalent positions. Proteins S2, S4, S7, S8, S12, S15, L2, L3, L5, L10, L12, L13 and L14 feature phylodomain-specific blocks that are absolutely unalignable across both prokaryotic phylodomains. Only S5, S9, S11, S13, S17, L7Ae, L22, L29 and L30 do not contain clear bacterial-specific blocks. While many of the ribosomal proteins have N- or C-terminal extensions, some like S4, S15, L10 and L30 display N- or C-terminal extensions uniquely characteristic of one entire phylodomain. Fig. 2C provides a display of the block structure for a small representative set of eukaryotic ribosomal proteins, illustrating their archaeal nature. Protein S8 illustrates the general features of many alignments (Fig. 1). Note that both the N- and C-terminal regions have clearly alignable hydrophobicity patterns and similar conservation of glycine and proline residues across the two prokaryotic phylodomains. However, in the central region not even the hydrophobicity pattern is conserved.
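One way to quantify whether two segments share a hydrophobicity pattern is to compare sliding-window hydropathy profiles. The sketch below is an illustration under stated assumptions: the text does not say which hydrophobicity scale or window was used, so the Kyte-Doolittle scale, the window length and the Pearson correlation are choices made here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Sliding-window mean hydropathy; gap or unknown characters raise KeyError."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence shorter than window")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / (sxx * syy) ** 0.5
```

Profiles from ungapped segments of equal length can be compared directly with `pearson`; a correlation near zero for a central segment, alongside high values for the terminal regions, would correspond to the pattern described for S8.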

Comparison of the relationships between the positions of cross- and within-phylodomain sequence conservation with known SSU rRNA and/or protein contacts (Brodersen et al., 2001) failed to show any overall correlations. In S4 and S7, for example, major RNA contacts (as seen in the T. thermophilus structure) are in regions unique to each prokaryotic phylodomain separately, though of similar base composition, while in S3, S9, and S13 cross-phylodomain conserved protein regions make major contacts to RNA but in regions that differ in base sequence between the two phylodomains. With the exception of S12 and all but the very center of S10, the amino acid sequences in RNA-contacting regions are rather variable even within each phylodomain. A similar situation exists for protein–protein contacts. For example, the small region in the very center of S10 which makes contact with S14 is particularly interesting in that the pattern PXGXG is absolutely conserved in Archaea, but only the proline is (partly) conserved in Bacteria.
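A pattern such as PXGXG, where X stands for any residue, can be located in a sequence with a regular expression. This is a minimal sketch: the function name is hypothetical and the example sequences in the usage note are invented for illustration, not taken from any S10 protein.

```python
import re

# P, any residue, G, any residue, G. The lookahead lets overlapping hits be
# reported; [A-Z] rather than "." keeps alignment gap characters from matching.
PXGXG = re.compile(r"(?=(P[A-Z]G[A-Z]G))")


def find_pxgxg(seq):
    """Return (0-based start, matched text) for each P-x-G-x-G occurrence."""
    return [(m.start(), m.group(1)) for m in PXGXG.finditer(seq.upper())]
```

For the invented sequence `MKPAGTGLV` this reports one hit, `PAGTG`, starting at index 2; for `MKPAQTGLV` it reports none.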

Like many other riboproteins, L4 contains an extended loop region (positions 43–74 in E. coli) that penetrates deep into the RNA core structure (Ban et al., 2000; Jenni and Ban, 2003; Yusupov et al., 2001) and can be assumed to contribute to RNA folding and/or stabilization. This L4 loop reaches through the structure towards the back of the peptidyl transferase active site (Jenni and Ban, 2003; Nissen et al., 2000). In both prokaryotic phylodomains the three-dimensional structure shows a similar extended loop with extensive RNA contacts, but for 23 residues in this loop the bacterial and archaeal sequences have no meaningful alignment similarity. In fact even the structural details and sequence lengths are different (Fig. 3). Only the GXG subsequence that comprises part of the inner surface of the polypeptide exit tunnel (Jenni and Ban, 2003) is found in both sequences, but not at equivalent alignment positions. However, L4 aligns very well across both phylodomains over nearly its entire remaining length (cf. Fig. 2A). A similar cross-phylodomain difference in the sequence context of an apparently common feature is seen in the PXGXG region of the archaeal S10, which, though of similar length and alignment position, has no sequence similarity to bacterial S10 in the same region. In that case no structural information is available for the archaeal homolog, and thus whether or not there is some structural equivalence, as in L4, is not yet known.

Fig. 3. Comparative sequence and structural conservation of the extended loop of L4 in Archaea (left) and Bacteria (right). (A) Sequence alignment of the relevant archaeal segment with the bacterial segment. The sequence region shown for Archaea (Haloarcula marismortui numbering) is for residues 38–102, and for Bacteria (Deinococcus radiodurans numbering) residues 35–95. Arrows mark the GXG segment discussed in the text. (B) Structure of the L4 loop in H. marismortui (left) and D. radiodurans (right). Red balls represent the glycine residues of the GXG segment.

To explore phylogenetic patterns, we investigated a variety of concatenated aligned informative position sets: individual proteins, all SSU universal blocks, all LSU universal blocks, as well as the archaeal-specific blocks and bacterial-specific blocks from the SSU and LSU proteins separately. In addition, data were obtained from multialigned protein synthesis initiation and elongation factors as well as the signal recognition complex proteins. Fig. 4 presents a phylogenetic tree built from the multiple alignment of the SSU protein universal blocks. An identical core branch topology [Bacteria, (Euryarchaeota, (Crenarchaeota, Eukaryota))] was obtained for trees made from the concatenation of the LSU protein universal blocks, and from the multialignment of the elongation (EF2/EFG) and initiation (IF2P/IF2) factors common to all three phylodomains (data not shown). Among all of the maximum likelihood generated trees this core branching was observed, with the exception of the branch between the Euryarchaeota and the Crenarchaeota, which was of zero length in one case. Addition of the Nanoarchaea (Huber et al., 2002) confuses this branch as well. While the maximum parsimony bootstrapped trees also produced this same branching topology as a consensus, there was somewhat more variation. The universal-block trees for the SSU and LSU and other proteins clearly resolve the four phylogenetic taxa with bootstrap values from 76 to 100% and likelihoods of 90–100%, but there is little consistent resolution within the bacterial phylodomain. Such star-like patterns with limited phylogenetic resolving power have been seen before among the bacterial rRNAs (Pace, 1997). On the other hand, there is generally a clean division between the Crenarchaeota and the Euryarchaeota. Whether or not there is sufficient information in these sequences to provide more details between the two divisions of Archaea is as yet unclear given the fewer completed archaeal genomes as compared to the Bacteria.
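The bracketed core topology can be read as a nested grouping, and it is sometimes useful to check which sets of taxa form a clade under it. The sketch below encodes the topology as nested Python tuples and treats it as rooted with Bacteria as the outgroup; that rooting is an assumption of the sketch, and the helper names are hypothetical.

```python
# [Bacteria, (Euryarchaeota, (Crenarchaeota, Eukaryota))] as nested tuples.
CORE_TOPOLOGY = ("Bacteria", ("Euryarchaeota", ("Crenarchaeota", "Eukaryota")))


def leaves(node):
    """Set of taxon names below a node (a string is a leaf)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """All leaf sets defined by subtrees of a rooted nested-tuple tree."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found


def is_clade(tree, taxa):
    """True if exactly these taxa make up one subtree of the tree."""
    return frozenset(taxa) in clades(tree)
```

Under this reading, Crenarchaeota plus Eukaryota form a clade, whereas Euryarchaeota plus Crenarchaeota alone do not, because the eukaryotic branch arises from within that grouping.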

The average block divergences, as defined by the implied maximum-likelihood branch lengths in residue replacements per thousand, are given in the legend to Fig. 4, and the number of absolutely conserved positions per thousand is presented in Table 1 and Fig. 5. The LSU proteins have a higher implied variation than the SSU proteins. For the LSU proteins there is also a higher implied extent of multiple replacements in the archaeal-eukaryotic clade, with the greatest difference leading to the eukaryotes. Among the prokaryotic-specific blocks there is also a greater implied variation among the Archaea than among Bacteria, as inferred from the archaeal-specific blocks. In the eukaryotes, the total variation in non-conserved positions is even more pronounced though, at the same time, there are more absolutely conserved positions as compared to the Bacteria and the Archaea.

Fig. 4. Maximum-likelihood phylogenetic tree for the concatenated, universal small-subunit block sequences. This is the same as that for the elongation factors EF2/EFG and initiation factors IF2P/IF2 as well as for the signal recognition complex. The tree was constructed using the TREE-PUZZLE tool on the concatenated multialignment of all the universal blocks in the SSU proteins. The tree search procedure was Quartet Puzzling with 1000 steps and the JTT (Jones et al., 1992) model of substitution. The topology of this tree is representative of the topology of all the other trees built using concatenated universal blocks in LSU ribosomal proteins, elongation/initiation factors, signal recognition complex proteins or universal blocks in single proteins that have a sufficient degree of variability. Average branch lengths for the labeled branches considering SSU, LSU and factor trees (expressed in terms of maximum-likelihood substitutions per residue) are (a) 0.667 ± 0.19 with 93% confidence, (b) 0.09 ± 0.1 with 100% confidence, and (c) 1.03 ± 0.2 with 100% confidence. Species labels are Swiss-Prot codes as defined in the legend to Fig. 1 with the addition of eukaryotes: ARATH (Arabidopsis thaliana), CAEEL (Caenorhabditis elegans), DROME (Drosophila melanogaster), HUMAN (Homo sapiens), ORYSA (Oryza sativa), and SCHPO (Schizosaccharomyces pombe).

4. Discussion

Clearly the large number of aligned positions that show either absolute identity or conservation of hydrophobicity profile across both prokaryotic phylodomains implies common ancestry for all homologous prokaryotic riboproteins. It also implies that they were carrying out their ribosomal functions long before the bacterial/archaeal split. Given the large number of distinct prokaryotic phylodomain-specific ribosomal protein features, however, one must assume that considerable time elapsed after that split and before the divergence that produced their extant forms. Sequences within each of the three block categories show maximum divergence in the universal blocks and approximately one-third that much divergence within the two sets of phylodomain-specific blocks. This implies that the most recent common ancestor (cenancestor) of all prokaryotes existed much earlier than the separate bacterial and archaeal cenancestors, and that these may have coexisted at about the same time.

There are at least four potential explanations forthese observations: a purely statistical effect resulting

in both prokaryotic phylodomains having similar coa-

lescence times; massive horizontal gene transfer within

each phylodomain; explicit phylodomain cenancestors

having clear selective advantages; and finally a physical

bottleneck well after the last common ancestor of the

Bacteria and Archaea. Coalescence theory (Felsenstein,

2003; Zhaxybayeva and Gogarten, 2004) concludes that

in a large population of near-constant size, there will be

a single cenancestor of all extant members of that pop-

ulation that existed closer to the present day than the

true ancestor of the species. Extant prokaryotes, how-ever, have occupied highly dispersed and diverse niches

since the bacterial/archaeal split. This makes the likeli-

hood of coalescence back to a single chance last com-

mon ancestor for each phylodomain very small. The

derivation of the phylodomain–specific blocks, with

their well-defined boundaries, from horizontal gene

transfer also is unlikely as it would have to involve continuous

events within each separate phylodomain globally across diverse niches. The hypothesis that the last

two cenancestors had certain selective phylodomain-spe-

cific advantages also seems improbable, but cannot be

ruled out (see later discussion on blocks in ribosomal

proteins S8 and L4). Given the data and the similar se-

quence variation seen in the phylodomain-specific

blocks, a physical bottleneck seems to be the simplest

explanation for our results. It should be noted that a physical bottleneck event is phylogenetically equivalent

to two distinct cenancestors having clear and compara-

ble selective advantages.
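The coalescence argument above can be made concrete with a small simulation. The following is a minimal sketch, not part of the original analysis: it assumes an idealized haploid Wright-Fisher population of constant size, and the function names and parameter values are illustrative only. It traces all lineages backward until they merge into a single cenancestor and reports the average number of generations required.

```python
import random


def generations_to_cenancestor(pop_size, rng):
    """Trace every lineage of a constant-size haploid population backward
    in time until all lineages share one ancestor (the cenancestor)."""
    lineages = set(range(pop_size))
    generations = 0
    while len(lineages) > 1:
        # Each surviving lineage picks its parent uniformly at random;
        # lineages that pick the same parent coalesce.
        lineages = {rng.randrange(pop_size) for _ in lineages}
        generations += 1
    return generations


def mean_coalescence_time(pop_size, replicates=200, seed=0):
    """Average the coalescence time over independent replicate populations."""
    rng = random.Random(seed)
    total = sum(generations_to_cenancestor(pop_size, rng)
                for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    for n in (25, 50, 100):  # hypothetical population sizes
        print(n, mean_coalescence_time(n))
```

In this idealized setting the mean coalescence time grows roughly in proportion to the population size. The model treats the population as a single well-mixed unit, which is the condition that the dispersed and diverse niches of extant prokaryotes call into question.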

A survey of metabolic enzymes from prokaryotes

does not reveal the type of bacterial/archaeal phylodo-

main-specific block structure we have observed among

the riboproteins, particularly class II blocks that have

similar placement within the overall protein sequences but no cross-phylodomain similarity. This could come

about as the result of numerous and possibly parallel

adaptations to new habitats following the proposed bot-

tleneck. Clearly one would expect many metabolic adap-

tations to habitats with novel environmental

parameters, including very different metabolites. Also,

metabolic enzymes (in many cases consisting of a single

or only a few structural domains) may have evolved to optimum forms long before the ribosome system. Furthermore,

horizontal transfers would be vastly easier

in the case of such proteins. Interestingly, however, the

block structure—though it was not so characterized—

does appear to occur in one set of published partial ami-

no acid sequence alignments from prokaryotes (Eichler,

2003). The proteins in question (SecDF) participate in

translocation of secreted proteins across the bacterial or archaeal plasma membrane, and as such are indirectly

ribosome-associated (Pohlschroder et al., 2004). They

have no known enzymatic activity.

The idea that there was some strong selective/compet-

itive advantage for the two prokaryotic phylodomain-

specific blocks requires more detailed investigation.

For example, the central blocks in S8 and L4, if they

are truly homologs (as their similar structural placements imply), would seem to require strong

phylodomain-specific selective events along the line of descent,

Table 1

Frequency (‰) of absolutely conserved positions in aligned blocks for universally conserved riboproteins

Protein       ABE    EA    AB    EB     E     A     B
S2P            79    56   101    17   112    96   157
S3P            77    64    90    13   134    25   121
S4P            49   118    88    29   206    49    79
S5P            76    91   104    35   229    76   118
S7P            53   159    68    53   174   114   106
S8P            95    52   126    10   201    43    53
S9P            57    72    79    64   100    49    50
S10P           54    44   120    11    87   162   152
S11P          108   134   150    50   266    83    75
S12P          116   132   198    82   166    83   208
S13P          100    17   133    42   224    58   125
S14P          122    82   184     0   143    81    81
S15P           34    68    34    51   186    51   152
S17P           49    61    61    73   244    24    86
S19P          123    62   173    37    99    12    99
L1P            41    10    86     5   117    41   127
L2P           108    55   145    31   186    35   149
L3P            77    90    83    32   224   103    77
L4P            35    51    45    31   176    76     4
L5P            55    61    89    27   289    49   172
L6P            42    47    53    11   100    58   120
L10P           30    10    51     0   112    30    10
L11P           63    64    85    36   274     6   168
L12P           37    49    74     0    25   186   198
L13P           83    37   120    19   120    65   130
L14P          156    65   189    57   173    49   147
L15P           40    81    56     8   210    32    73
L18P           54   126    54     9   189    18   135
L22P           27    82    36     9   182    64    55
L23P           52    52    91    26   169    78    39
L24P          101    13   101    13    76    51    76
L29P           18    17    18    17   141   105    35
SSU (Avg.)     79    81    34    38   173    69   112
LSU (Avg.)     61    53    22    20   170    58   101
IF2           137    73     4    35   171    84    87
EF2           134    70    17    33   181    76   157

See legend to Fig. 5 for definition of block categories.

given that there has been very minimal divergence within the extant representatives of the two prokaryotic

phylodomains. Why might phylodomain-specific selection

act in such a block-like fashion on the L4 loop region

when today there appear to be no ribosomal function

or structural characteristics to distinguish its role in

the two groups of organisms? In fact there is some

experimental evidence that the ribosome can still function

with this L4 loop deleted (Zengel et al., 2003).

If one assumes that the phylogenetic mean-likelihood

branch lengths estimated from the ribosomal and associ-

ated proteins are even crudely proportional to time

(Harris et al., 2003), a bottleneck event would have oc-

curred at somewhat more than half the distance back

to the last common prokaryotic ancestor, or about

two billion years ago. In addition, the failure of the sequence

variation information to resolve the bacterial and archaeal phylogenies suggests an ancient event (or

events) followed by rapid niche divergence (Hartman,

2002). Gogarten-Boekels et al. (1995) have proposed a similar extinction-based explanation for the absence of

deep-branching lineages in other molecular phylogenies.

Candidates for such an event would be the proposed

Paleoproterozoic ‘‘snowball earth’’ (Kirschvink et al.,

2000); major atmospheric change, resulting from the ra-

pid introduction of oxygen, as suggested recently by

Hedges et al. (2001b); and surely there are others.
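The dating argument above rests on simple proportionality between branch length and time. As a hedged illustration (the function name and the trial fractions 0.5 and 0.6 are assumptions standing in for "somewhat more than half"; they are not values reported in this paper), the sketch below shows what age for the last common prokaryotic ancestor would be implied by a bottleneck about two billion years ago.

```python
def implied_root_age(bottleneck_age_ga, fraction_of_root_distance):
    """Root age implied by strict branch-length/time proportionality.

    bottleneck_age_ga: age of the bottleneck in billions of years (Ga).
    fraction_of_root_distance: depth of the bottleneck as a fraction of the
        distance from the present back to the root (0 < fraction <= 1).
    """
    if not 0 < fraction_of_root_distance <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    return bottleneck_age_ga / fraction_of_root_distance


if __name__ == "__main__":
    for fraction in (0.5, 0.6):  # hypothetical trial values
        age = implied_root_age(2.0, fraction)
        print(f"fraction {fraction:.1f}: root at about {age:.1f} Ga")
```

Under these assumptions the implied age of the root falls between roughly 3.3 and 4 Ga. The estimate is only as good as the clock-like assumption itself.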

A possible bottleneck of this age is interesting given that 2.2–2.5 Ga is the approximate time thought to correspond

to the rise of the first true eukaryotes (Hedges et

al., 2001a)—an event thought to coincide with signifi-

cant increase in atmospheric oxygen levels (Bekker

et al., 2004). This apparent correlation is most intriguing

given the relationship between the eukaryotes and the

two subclasses of Archaea. All of the eukaryotic ribosomal

proteins possessed in common with the prokaryotes have the archaeal-specific block structures while

containing no bacterial-specific blocks. There are five

Fig. 5. Venn diagram representing the frequency distribution (‰) of absolutely conserved amino acid residue positions in representative alignments

of homologous ribosomal proteins from all three phylodomains. (a) Concatenated SSU proteins. (b) Concatenated LSU proteins. In both cases, the

three-circle diagram reports the number of absolutely conserved residues per 1000 residues in each category for positions in aligned universal (three

domain) blocks, the single circle reports on the bacterial-specific blocks, and the double circle reports on the blocks specific to Archaea and

eukaryotes. (The numbers for purely archaeal-specific and eukaryote-specific blocks were too small to be statistically meaningful, so these categories

have been omitted.) Upper-case letters within the diagrams have the following meanings: E—positions absolutely conserved only within the

eukaryotic sequences, A—positions absolutely conserved only within the archaeal sequences, and B—positions absolutely conserved only within the

bacterial sequences. Combinations of these have the obvious meanings, e.g., EA refers to positions absolutely conserved within eukaryotic and

archaeal, but not bacterial sequences. As an illustration, the total frequency of absolutely conserved positions among the archaeal LSU riboproteins

(in A + AB + EA + ABE blocks) is 58 + 22 + 53 + 61 = 194‰ (corresponding to 420 out of 2163 positions).

riboproteins, S25e, S26e, S30e, L13e and L38e, com-

mon only to the crenarchaea and the eukaryotes. In

addition the eukaryotic ribosome-associated proteins,

such as the initiation, termination and elongation fac-

tors, are most similar to those in the Crenarchaeota.

This contrasts with the known nuclear histone structures

(Malik and Henikoff, 2003) that are common to the Euryarchaeota and the Eukaryota, but not found

in either Bacteria or the Crenarchaeota. If one assumes

that the major proto-eukaryotic adaptation was the

ability to phagocytose, then the inclusion of the crenar-

chaeal protein-synthesizing system and the euryarchaeal

chromosomal packing system along with the later inclusion

of mitochondria and chloroplasts may all have resulted

from similar events. Given the accepted bacterial endosymbiotic origin of mitochondria and chloroplasts,

whose ribosomal protein genes were transferred to the

eukaryotic nucleus, one can ask why there has been

no recombination between those genes and the genes

of apparent crenarchaeal origin that encode nuclear

eukaryotic riboproteins. Thus while our analysis here

clearly lays out the relationships among many parts

of the protein synthesis machinery of these four major

taxa (Bacteria, Euryarchaeota, Crenarchaeota, and Eu-

karyota), it does not resolve the very complex origin of

modern eukaryotes. How the times of bacterial and

archaeal divergence and of the origin of the eukaryotic

ribosome are related remains unknown (Feng et al., 1997; Gupta and Golding, 1996; Hedges and Kumar,

2003).
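The per-mille bookkeeping used in Table 1 and in the legend to Fig. 5 can be checked with a short sketch. The values are the LSU (Avg.) row of Table 1; the helper names are illustrative and not taken from the paper.

```python
def per_mille(conserved_positions, total_positions):
    """Frequency of conserved positions per 1000 aligned positions."""
    return 1000.0 * conserved_positions / total_positions


def domain_total(frequencies, domain_letter):
    """Sum every block category whose label includes the given domain
    letter (E = eukaryotic, A = archaeal, B = bacterial)."""
    return sum(value for label, value in frequencies.items()
               if domain_letter in label)


# LSU (Avg.) row of Table 1, in per-mille units.
lsu_avg = {"ABE": 61, "EA": 53, "AB": 22, "EB": 20, "E": 170, "A": 58, "B": 101}

if __name__ == "__main__":
    # Archaeal total: A + AB + EA + ABE, as in the legend to Fig. 5.
    print(domain_total(lsu_avg, "A"))
    # The same quantity from raw counts quoted in the legend (420 of 2163).
    print(round(per_mille(420, 2163)))
```

Both routes give the archaeal LSU total of 194 per 1000 positions quoted in the legend to Fig. 5.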

The firm conclusion that emerges from our analysis is

that the wide range of habitats occupied by (and metabolic

systems contained in) today's Bacteria and Archaea

does not necessarily represent those occupied by

their last common ancestors—neither the last common

ancestor of all extant prokaryotes nor those of the two phylodomains. This is true since all of those habitats

and most of the metabolic systems necessarily represent

re-adaptations following diversification some time after

the complete fixation of the riboprotein phylodomain-

specific block structures. Whether this was the result

of a long selective reduction, statistical coalescence effect

or a classical bottleneck does not seem to change this

conclusion.

Note added in proof

Klein et al. (2004) recently identified six large subunit

proteins making equivalent ribosomal contacts in bacte-

ria and archaea yet showing no sequence or significant

structural similarity. Thus like the L4 loop block they

are phylodomain-specific, suggesting there may have

been many alternate prokaryotic rRNA protein solutions,

only two sets of which survived the bottleneck.

Acknowledgments

This work was supported by NSF Grant No.

00205512. We thank Yuki Moriya of Kyoto University

for preparing the sequence alignments used to construct

the elongation and initiation factor tree.

References

Altschul, S.F., Koonin, E.V., 1998. Iterated profile searches with PSI-

BLAST—a tool for discovery in protein databases. Trends

Biochem. Sci. 23, 444–447.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z.,

Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST:

a new generation of protein database search programs. Nucleic

Acids Res. 25, 3389–3402.

Ban, N., Nissen, P., Hansen, J., Moore, P.B., Steitz, T.A., 2000. The

complete atomic structure of the large ribosomal subunit at 2.4 Å

resolution. Science 289, 905–920.

Bekker, A., Holland, H.D., Wang, P.L., Rumble III, D., Stein, H.J.,

Hannah, J.L., Coetzee, L.L., Beukes, N.J., 2004. Dating the rise of

atmospheric oxygen. Nature 427, 117–120.

Brodersen, D.E., Clemons Jr., W.E., Carter, A.P., Wimberly, B.T.,

Ramakrishnan, V., 2001. Crystal structure of the 30S ribosomal

subunit from Thermus thermophilus: structure of the proteins and

their interactions with 16S RNA. J. Mol. Biol. 316, 725–768.

Caetano-Anolles, G., 2002. Tracing the evolution of RNA structure in

ribosomes. Nucleic Acids Res. 30, 2575–2587.

Das, S., 1998. Protein function identification using prior-based profiles

to represent protein domains biomolecular engineering. Boston

University, Boston.

Das, S., Smith, T.F., 2000. Identifying nature's protein lego set. In:

Kim, P.S. (Ed.), Advances in Protein Chemistry. Academic Press,

San Diego, pp. 159–183.

de Jong, W.W., van Dijk, M.A., Poux, C., Kappe, G., van Rheede, T.,

Madsen, O., 2003. Indels in protein-coding sequences of Euarch-

ontoglires constrain the rooting of the eutherian tree. Mol.

Phylogenet. Evol. 28, 328–340.

Eichler, J., 2003. Evolution of the prokaryotic protein translocation

complex: a comparison of archaeal and bacterial versions of

SecDF. Mol. Phylogenet. Evol. 27, 504–509.

Felsenstein, J., 2003. Coalescent Trees Inferring Phylogenies. Sinauer

Associates, Sunderland, MA (Chapter 26).

Feng, D.F., Cho, G., Doolittle, R.F., 1997. Determining divergence

times with a protein clock: update and reevaluation. Proc. Natl.

Acad. Sci. USA 94, 13028–13033.

Gogarten-Boekels, M., Hilario, E., Gogarten, J., 1995. The effects of

heavy meteorite bombardment on the early evolution—the emer-

gence of the three domains of life. Origins Life Evolution Biosphere

25, 251–264.

Gupta, R.S., Golding, G.B., 1996. The origin of the eukaryotic cell.

Trends Biochem. Sci. 21, 166–171.

Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon,

I., Bartels, H., Franceschi, F., Yonath, A., 2001. High resolution

structure of the large ribosomal subunit from a mesophilic

eubacterium. Cell 107, 679–688.

Harris, J.K., Kelley, S.T., Spiegelman, G.B., Pace, N.R., 2003. The

genetic core of the universal ancestor. Genome Res. 13, 407–412.

Hartman, H., 2002. Macroevolution, catastrophe and horizontal

transfer. In: Kado, C.I. (Ed.), Horizontal Gene Transfer. Chapman

& Hall, London, pp. 411–415.

Hedges, S.B., Chen, H., Kumar, S., Wang, D.Y., Thompson, A.S.,

Watanabe, H., 2001a. A genomic timescale for the origin of

eukaryotes. BMC Evol. Biol. 1, 4.

Hedges, S.B., Chen, H., Kumar, S., Wang, D.Y.-C., Thompson, A.S.,

Watanabe, H., 2001b. A genomic timescale for the origin of

eukaryotes. BMC Evol. Biol. Available from: <http://www.bio-

medcentral.com/1471-2148/1/4>.

Hedges, S.B., Kumar, S., 2003. Genomic clocks and evolutionary

timescales. Trends Genet. 19, 200–206.

Held, W.A., Mizushima, S., Nomura, M., 1973. Reconstitution of

Escherichia coli 30 S ribosomal subunits from purified molecular

components. J. Biol. Chem. 248, 5720–5730.

Huber, H., Hohn, M.J., Rachel, R., Fuchs, T., Wimmer, V.C., Stetter,

K.O., 2002. A new phylum of Archaea represented by a nanosized

hyperthermophilic symbiont. Nature 417, 63–67.

Jenni, S., Ban, N., 2003. The chemistry of protein synthesis and voyage

through the ribosomal tunnel. Curr. Opin. Struct. Biol. 13, 533.

Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid genera-

tion of mutation data matrices from protein sequences. Comput.

Appl. Biosci. 8, 275–282.

Kirschvink, J.L., Gaidos, E.J., Bertani, L.E., Beukes, N.J., Gutzmer,

J., Maepa, L.N., Steinberger, R.E., 2000. Paleoproterozoic snow-

ball earth: extreme climatic and geochemical global change and its

biological consequences. Proc. Natl. Acad. Sci. USA 97, 1400–

1405.

Klein, D.J., Moore, P.B., Steitz, T.A., 2004. The roles of ribosomal

proteins in the structure, assembly, and evolution of the large

ribosomal subunit. J. Mol. Biol. 340, 141–177.

Lecompte, O., Ripp, R., Thierry, J.-C., Moras, D., Poch, P., 2002.

Comparative analysis of ribosomal proteins in complete genomes:

an example of reductive evolution at the domain scale. Nucleic

Acids Res. 30, 5382–5390.

Malik, H.S., Henikoff, S., 2003. Phylogenomics of the nucleosome.

Nat. Struct. Biol. 10, 882–891.

Mears, J.A., Cannone, J.J., Stagg, S.M., Gutell, R.R., Agrawal, R.K.,

Harvey, S.C., 2002. Modeling a minimal ribosome based on

comparative sequence analysis. J. Mol. Biol. 321, 215–234.

Nikaido, M., Matsuno, F., Hamilton, H., Brownell Jr., R.L., Cao, Y.,

Ding, W., Zuoyan, Z., Shedlock, A.M., Fordyce, R.E., Hasegawa,

M., Okada, N., 2001. Retroposon analysis of major cetacean

lineages: the monophyly of toothed whales and the paraphyly of

river dolphins. Proc. Natl. Acad. Sci. USA 98, 7384–7389.

Nissen, P., Hansen, J., Ban, N., Moore, P.B., Steitz, T.A., 2000. The

structural basis of ribosome activity in peptide bond synthesis.

Science 289, 920–930.

Pace, N.R., 1997. A molecular view of microbial diversity and the

biosphere. Science 276, 734–740.

Pohlschroder, M., Dilks, K., Hand, N.J., Wesley Rose, R., 2004.

Translocation of proteins across archaeal cytoplasmic membranes.

FEMS Microbiol. Rev. 28, 3–24.

Ramakrishnan, V., Moore, P.B., 2001. Atomic structures at last: the

ribosome in 2000. Curr. Opin. Struct. Biol. 11, 144–154.

Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M.,

Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F.,

Yonath, A., 2000. Structure of functionally activated small

ribosomal subunit at 3.3 angstroms resolution. Cell 102, 615–623.

Strimmer, K., von Haeseler, A., 1997. Likelihood-mapping: a simple

method to visualize phylogenetic content of a sequence alignment.

Proc. Natl. Acad. Sci. USA 94, 6815–6819.

Swofford, D.L., 1998. PAUP*. Phylogenetic Analysis Using Parsimony

(* and Other Methods). Sinauer Associates, Sunderland, MA.

Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. Clustal W:

improving the sensitivity of progressive multiple sequence align-

ment through sequence weighting, position-specific gap penalties

and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Tung, C.-S., Joseph, S., Sanbonmatsu, K.Y., 2002. All-atom homology

model of the Escherichia coli 30S ribosomal subunit. Nat. Struct.

Biol. 9, 750–755.

Wimberly, B.T., Brodersen, D.E., Clemons Jr., W.M., Morgan-

Warren, R.J., Carter, A.P., Vonrhein, C., Hartsch, T., Ramakrishnan,

V., 2000. Structure of the 30S ribosomal subunit. Nature 407, 327–

339.

Woese, C.R., Kandler, O., Wheelis, M.L., 1990. Towards a natural

system of organisms: proposal for the domains Archaea, Bacteria,

and Eucarya. Proc. Natl. Acad. Sci. USA 87, 4576–4579.

Wuyts, J., van de Peer, Y., de Wachter, R., 2001. Distribution of

substitution rates and location of insertion sites in the tertiary

structure of ribosomal RNA. Nucleic Acids Res. 29, 5017–5028.

Yusupov, M.M., Yusupova, G.Z., Baucom, A., Lieberman, K.,

Earnest, T.N., Cate, J.H.D., Noller, H.F., 2001. Crystal structure

of the ribosome at 5.5 Å resolution. Science 292, 883–896.

Zengel, J.M., Jerauld, A., Walker, A., Wahl, M.C., Lindahl, L., 2003.

The extended loops of ribosomal proteins L4 and L22 are not

required for ribosome assembly or L4-mediated autogenous

control. RNA 9, 1188–1197.

Zhaxybayeva, O., Gogarten, J., 2004. Cladogenesis, coalescence and

the evolution of the three domains of life. Trends Genet. 20, 182–

187.