Molecular Phylogenetics and Evolution 33 (2004) 615–625
www.elsevier.com/locate/ympev
Ribosomal protein-sequence block structure suggests complex prokaryotic evolution with implications for the origin of eukaryotes
Prashanth Vishwanath (a), Paola Favaretto (a), Hyman Hartman (b), Scott C. Mohr (c), Temple F. Smith (a,*)
(a) BioMolecular Engineering Research Center, Boston University, 36 Cummington St., Boston, MA 02215, USA
(b) Center for Biomedical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA
(c) Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, MA 02215, USA
Received 15 March 2004; revised 29 June 2004
Available online 12 September 2004
Abstract
Amino acid sequence alignments of orthologous ribosomal proteins found in Bacteria, Archaea, and Eukaryota display, relative
to one another, an unusual segment or block structure, with major evolutionary implications. Within each of the prokaryotic
phylodomains the sequences exhibit substantial similarity, but cross-domain alignments break up into (a) universal blocks
(conserved in both phylodomains), (b) bacterial blocks (unalignable with any archaeal counterparts), and (c) archaeal blocks
(unalignable with any bacterial counterparts). Sequences of those eukaryotic cytoplasmic riboproteins that have orthologs in
both Bacteria and Archaea exclusively match the archaeal block structure. The distinct blocks do not correlate consistently
with any identifiable functional or structural feature, including RNA and protein contacts. This phylodomain-specific block
pattern also exists in a number of other proteins associated with protein synthesis, but not among enzymes of intermediary
metabolism. While the universal blocks imply that modern Bacteria and Archaea (as defined by their translational machinery)
clearly have had a common ancestor, the phylodomain-specific blocks imply that these two groups derive from single,
phylodomain-specific types that came into existence at some point long after that common ancestor. The simplest explanation
for this pattern would be a major evolutionary bottleneck, or other scenario that drastically limited the progenitors of
modern prokaryotic diversity at a time considerably after the evolution of a fully functional translation apparatus. The
vast range of habitats and metabolisms that prokaryotes occupy today would thus reflect divergent evolution after such a
restricting event. Interestingly, phylogenetic analysis places the origin of eukaryotes at about the same time and shows a
closer relationship of the eukaryotic ribosome-associated proteins to crenarchaeal rather than euryarchaeal counterparts.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Ribosomal proteins; Ribosome phylogeny; Molecular evolution; Amino acid sequence-alignment blocks; Prokaryotic phylogeny;
Eukaryote origin(s)
1055-7903/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.ympev.2004.07.003
* Corresponding author. Fax: +617 353 7020.
E-mail address: [email protected] (T.F. Smith).
1. Introduction
The ribosome, with its conserved central role in protein synthesis, has long constituted a prime subject for
phylogenetic analysis. The study of small-subunit (SSU) ribosomal RNA sequences led Woese to his seminal
recognition of two extant prokaryotic phylodomains, Bacteria and Archaea (Woese et al., 1990).
Recent rapid growth of genomic and structural informa-
tion on ribosomes (Ramakrishnan and Moore, 2001)
has opened the way to broad comparisons, particularly
of the ribosomal proteins (Caetano-Anolles, 2002; Lecompte et al., 2002; Mears et al., 2002; Tung et al.,
2002; Wuyts et al., 2001). Here we report an unusual
multisequence alignment block structure for these
proteins with important evolutionary implications.
Within each of the two prokaryotic phylodomains, Bac-
teria and Archaea, the ribosomal proteins align over
nearly their entire length with high conservation and sta-
tistical significance. Yet cross-phylodomain alignments
break up into well-defined segments or blocks: those alignable across both prokaryotic phylodomains, bacterial-specific
blocks (unalignable with any archaeal counterpart) and archaeal-specific blocks (unalignable with
any bacterial counterpart). The prokaryotic phylodo-
main-specific blocks fall into two categories: class I,
those that appear as alignment insertions (or deletions);
and class II, those that have similar placement within
the overall protein sequences, but show unique sequence characteristics within each phylodomain and no
cross-phylodomain similarity even at the level of general
hydrophobicity profile. The eukaryotic ribosomal pro-
teins that have prokaryotic homologs display the block
structure of the Archaea, but with numerous N- and
C-terminal extensions.
Such taxonomic division-specific alignment segments,
particularly as deletions or insertions, have been used to resolve major evolutionary questions in the past (de
Jong et al., 2003; Gupta and Golding, 1996). The very
basis of molecular taxonomy rests on the extreme likeli-
hood that sets of species sharing multiple common se-
quence deletions/insertions and/or long regions of
uniquely conserved subsequences are monophyletic.
Clearly various forms of horizontal gene transfer may
add complexity to such analyses. However, if all the members of a phylodomain contain a specific protein
insertion or deletion, independent horizontal transfers
are an unlikely explanation (Nikaido et al., 2001).
The riboprotein alignment blocks identified here are
continuous runs of 8–70 amino acids unique to each
protein. They have very high phylodomain-specific ami-
no acid conservation within either the Bacteria and/or
the Archaea and display characteristic patterns of hydrophobicity, charge and probable turn induction.
Yet the blocks do not correlate with any clear function,
structure or even in most cases with any phylodomain-
specific RNA contacts or RNA sequence differences.
These blocks are shorter than typical protein domains,
yet longer than the conserved segments associated with
enzyme active-site patterns. In fact, while some metabolic enzymes show taxon-specific sequence deletions/insertions,
they do not have large class II blocks (data not
shown). Further, these deletions/insertions are either
short and confined to surface loops or are large enough
to encode a biochemical function linked to that of the
rest of the enzyme, as in multifunctional domain pro-
teins. The class II blocks identified in the ribosomal
proteins are unique. They are sequence-distinct and specific for either the Bacteria or the Archaea, but appear to
have identical functions (see Discussion on S8 and L4).
Finally, as discussed in the Methods section below, the
identified block boundaries are well defined by clear
transitions between regions of alignability across both
prokaryotic phylodomains to those alignable only in
one or the other phylodomain. Similar prokaryotic
phylodomain-specific block structure is observed in
some of the other protein synthesis-associated proteins, including the initiation and elongation factors, and at
least one aminoacyl-tRNA synthetase.
2. Methods
Of the canonical set of 21 SSU ribosomal proteins
identified in the best-studied prokaryote (Escherichia coli), 15 orthologs appear consistently in the fully
sequenced genomes of Bacteria, Archaea, and Eukaryota
(Harris et al., 2003). In addition, there are 19 large-sub-
unit (LSU) ribosomal proteins identified as shared
among the three phylodomains (Harris et al., 2003)
(cf. 31 total LSU proteins in E. coli). These, together
with recent detailed structural information (Ban et al.,
2000; Wimberly et al., 2000), form the central basis of our study. Searches of GenBank using BLAST (Altschul
and Koonin, 1998; Altschul et al., 1997), PsiBLAST
(Altschul and Koonin, 1998), and the profile method
of Das and Smith (Das, 1998; Das and Smith, 2000)
identified representative taxa clearly containing all the
known ribosomal proteins shared between Bacteria
and Archaea. From those we selected the proteins from
13 bacterial and 11 archaeal species, choosing organisms so as to encompass the widest possible taxonomic and
habitat ranges and to include those for which important
structural and functional information is available. The
legend to Fig. 1 lists the species names.
We used ClustalW (Thompson et al., 1994), Psi-
BLAST and Smith-Waterman profile alignments (Das
and Smith, 2000) to generate initial multiple sequence
alignments. These were compared among themselves and with published alignments of SSU and LSU
ribosomal proteins (Lecompte et al., 2002) (see also
http://igweb.integratedgenomics.com/Bioinformatics/Nikos/Archaeal_Information/Translation/ABE/rPROT/rPROT.html).
For a number of proteins, the process resulted in more than one potential multisequence alignment.
And, for nearly all, the overall length of regions that could be aligned with statistical confidence across
the Bacteria and Archaea was severely limited. However, treating bacterial and archaeal proteins separately
produced nearly full-length, consistent and highly sig-
nificant alignments for all proteins. Using preservation
of patterns of hydrophobicity, polarity, and probable
structural and critical fold-related features, minor
adjustments were made in gap placements including
those within the phylodomain-specific alignments. This was done using the available structural data [Thermus
thermophilus (Wimberly et al., 2000), Haloarcula marismortui (Ban et al., 2000), E. coli (Yusupov et al., 2001)
and Deinococcus radiodurans (Harms et al., 2001; Schluenzen et al., 2000)] and including the secondary structure
assignments given by Brodersen et al. (2001).
Fig. 1. Multiple sequence alignment for ribosomal protein S8. Labels are Swiss-Prot codes for Archaea (top set): AERPE—Aeropyrum pernix,
ARCFU—Archaeoglobus fulgidus, HALMA—Haloarcula marismortui, METJA—Methanococcus jannaschii, METKA—Methanopyrus kandleri,
METMA—Methanosarcina mazei, METTH—Methanobacterium thermoautotrophicum, PYRAB—Pyrococcus abyssi, PYRAE—Pyrobaculum
aerophilum, SULSO—Sulfolobus solfataricus, THEAC—Thermoplasma acidophilum, and for Bacteria (bottom set): BACSU—Bacillus subtilis,
CAUCR—Caulobacter crescentus, CHLTR—Chlamydia trachomatis, CHLTE—Chlorobium tepidum, ECOLI—Escherichia coli, HELPY—Helicobacter
pylori, FUSNN—Fusobacterium nucleatum, STRCO—Streptomyces coelicolor, SYNY3—Synechocystis sp. (strain PCC 6803), THEMA—Thermotoga
maritima, THETH—Thermus thermophilus, TREPA—Treponema pallidum. The bars between the two sets mark conserved blocks (cf.
Fig. 2). Open bars mark segments showing cross-domain (universal) conservation, the black bar marks an archaeal-specific segment, and the
cross-hatched bar marks a bacterial-specific segment. Secondary structures are from PDB 1J5E.
These alignment adjustments were constrained insofar
as possible—given the limited structure information—
by placing alignment gaps only in surface loops. When
possible, residues known to make rRNA (Ban et al.,
2000; Brodersen et al., 2001) contacts in the determined
crystal structures were aligned so as to conserve the amino acid type within each prokaryotic phylodomain
separately. In addition careful examination was made
to identify positions corresponding to significant pro-
tein–protein contacts or involvement with the ribosomal
assembly sequence (Held et al., 1973) and again, if pos-
sible, these were aligned as conserved.
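The gap adjustments above rely on preserved hydrophobicity patterns, but the text does not name the hydrophobicity scale or smoothing window that was used. As a minimal sketch only, the following assumes the Kyte-Doolittle hydropathy scale and a sliding window of seven residues (both are illustrative choices, not taken from the paper), and the function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values.
# ASSUMPTION: the paper does not state which scale was used; this is one common choice.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy over every full window of an ungapped sequence.

    ASSUMPTION: window=7 is illustrative; the paper gives no window size.
    """
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Profiles computed this way for two aligned segments can be compared position by position; segments whose profiles do not track each other would correspond to the case described in the Introduction as lacking similarity "even at the level of general hydrophobicity profile".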
Finally, the proteins separately multialigned within
the Bacteria and Archaea were aligned across the two prokaryotic phylodomains to optimize cross-phylodomain
conservation. Again, using the same criteria as
for the phylodomain-specific alignments, but without
readjusting these, we manually shifted a few gaps to at-
tain the best cross-phylodomain alignments. The full
three-phylodomain alignment was performed by align-
ing the cross-phylodomain prokaryotic alignments to
the eukaryotic phylodomain-specific alignments by using the same approach as before. The minimal-size
criterion for a block was eight positions, including at least
two absolutely conserved amino acids. Many of the
block boundaries are obvious in that they correspond
to alignment gaps in one phylodomain or the other, or
are bracketed by phylodomain-specific highly conserved
amino acid clusters. In other cases the boundaries are
more arbitrary, but only to within two or three positions. For these, the boundaries were placed at the edges
of secondary structures or alignment gaps as implied by
the known crystal structures if possible. We searched the
database of all known prokaryotic protein sequences for
potential non-ribosomal homologs to each of the identi-
fied blocks. In no cases were any convincing non-ribo-
somal-protein matches found. Fig. 1 gives an example
of a finished alignment (for S8). Fig. 2 gives a summary of the proposed alignment-block structures. The
complete set of all finished alignments is available on the
internet (http://bmerc-www.bu.edu/RRP/RRP_home).
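The minimal-size criterion stated above (at least eight positions, at least two of them absolutely conserved) can be written as a simple check on a candidate run of alignment columns. The sketch below is illustrative only: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings with "-" for gaps, and the manual boundary placement described in the text is not modeled.

```python
def is_absolutely_conserved(column):
    """True if every sequence carries the same residue (and no gap) at this column."""
    return len(set(column)) == 1 and column[0] != "-"


def meets_block_criterion(rows, start, end, min_len=8, min_conserved=2):
    """Apply the stated minimum-size rule to alignment columns [start, end).

    rows: aligned sequences of one phylodomain (equal-length strings).
    ASSUMPTION: conservation is judged over exactly the rows passed in.
    """
    if end - start < min_len:
        return False
    conserved = sum(
        is_absolutely_conserved([row[i] for row in rows])
        for i in range(start, end)
    )
    return conserved >= min_conserved
```

For a universal block the rows would presumably include both bacterial and archaeal sequences, and for a phylodomain-specific block only the sequences of that phylodomain; that reading is an interpretation of the text, not something it states in these terms.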
Finally, we carried out a wide range of phylogenetic
analyses based on these alignments. In particular, the se-
quence variations within numerous combinations of
Fig. 2. Schematic summary of multiple sequence alignment block patterns of prokaryotic ribosomal proteins: (A) LSU riboproteins. (B) SSU
riboproteins. For each protein designated at the left, the upper line corresponds to the set of archaeal sequences and the lower line to the bacterial set.
Open bars mark segments alignable across both sets, black bars indicate archaeal-specific segments and cross-hatched bars indicate bacterial-specific
segments. Black lines indicate variable length segments for members of that division. Extended blank spaces indicate large alignment gaps. (C)
Preliminary work in aligning riboprotein sequences from Archaea and eukaryotes. The upper line corresponds to the archaeal set and lower line to
the eukaryotic set. Gray bars denote eukaryotic-specific segments; black bars indicate archaeal/eukaryotic-specific segments.
riboproteins and block types were combined and pro-
cessed using both maximum-likelihood and -parsimony
approaches. Phylogenetic trees were constructed from
the positional variation with maximum parsimony bootstrapped 300 times (Swofford, 1998) and maximum
likelihood with quartet puzzling (Jones et al., 1992; Strimmer and von Haeseler, 1997). Eukaryotic parasites
like Encephalitozoon cuniculi and Plasmodium falciparum
were excluded from these analyses. The analyses were
extended to include a number of additional ribosome-as-
sociated proteins as well. The predicted likelihood and/
or bootstrap value of each inferred taxonomic split
was recorded along with all inferred branch lengths.
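Bootstrapping an alignment, as in the 300 maximum-parsimony replicates mentioned above, conventionally means resampling alignment columns with replacement and rebuilding a tree from each pseudo-alignment. The sketch below shows only the resampling step under that standard assumption; tree inference itself was done with the cited programs and is not reproduced here. The function name and the `seed` parameter are hypothetical.

```python
import random


def bootstrap_alignments(rows, replicates=300, seed=0):
    """Yield pseudo-alignments built by sampling columns with replacement.

    rows: aligned sequences as equal-length strings.
    Each replicate has the same number of columns as the input, and every
    sampled column is taken whole, so residues stay paired across sequences.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    for _ in range(replicates):
        picked = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield ["".join(row[c] for c in picked) for row in rows]
```

The bootstrap value of a split is then the percentage of replicate trees in which that split appears.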
3. Results
Immediately obvious from inspection of the finished
cross-phylodomain prokaryote riboprotein alignments
is the amalgamated block structure. These blocks fall
into three types: (U) universal blocks (conserved in both
phylodomains), (B) bacterial blocks (conserved in Bacteria, but unalignable with any archaeal counterparts),
and (A) archaeal blocks (conserved in Archaea, but una-
lignable with any bacterial counterparts). Types B and A
can be subdivided into the Class I and Class II blocks
(as defined in Introduction). Out of 6597 aligned posi-
tions in the prokaryotic SSU and LSU proteins, 3834
occur in universal blocks. There are 1303 alignable positions in the riboprotein blocks specific to both Archaea
and eukaryotes, and 787 positions in bacterial-specific
blocks. For comparisons between archaeal and eukary-
otic sequences, we also identified blocks conserved in
eukaryotes, but unalignable with any prokaryotic
counterpart.
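To put the position counts above on a common footing, they can be expressed as percentages of the 6597 aligned positions. This is a small worked calculation, assuming (the text does not say so explicitly) that all three counts are drawn from the same 6597 positions; the label for the remainder is hypothetical.

```python
TOTAL_ALIGNED = 6597  # aligned positions in the prokaryotic SSU and LSU proteins

counts = {
    "universal": 3834,
    "archaeal/eukaryotic-specific": 1303,
    "bacterial-specific": 787,
}


def percentage_shares(counts, total):
    """Return each category's share of `total` in percent, plus the unassigned rest."""
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    # ASSUMPTION: positions not in any listed category are lumped as 'unassigned'.
    shares["unassigned"] = 100.0 * (total - sum(counts.values())) / total
    return shares


shares = percentage_shares(counts, TOTAL_ALIGNED)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Under that assumption, universal blocks account for somewhat more than half of the aligned positions, and 673 positions fall outside the three listed categories.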
Fig. 2 schematically summarizes the block patterns
found in all 15 SSU and 19 LSU proteins with complete
sets of prokaryotic homologs. With the exception of S9, L29 and possibly S10 and L6, all ribosomal proteins
have conserved phylodomain-specific blocks. Proteins
L4, L15, L18, S10 and S19 show similar cross-phylodo-
main regions that are totally unalignable yet occupy
equivalent positions. Proteins S2, S4, S7, S8, S12, S15,
L2, L3, L5, L10, L12, L13 and L14 feature phylodo-
main-specific blocks that are absolutely unalignable
across both prokaryotic phylodomains. Only S5, S9, S11, S13, S17, L7Ae, L22, L29 and L30 do not contain
clear bacterial-specific blocks. While many of the ribosomal proteins have N- or C-terminal extensions, some
like S4, S15, L10 and L30 display N- or C-terminal extensions uniquely characteristic of one entire phylodomain.
Fig. 2C provides a display of the block structure for a small representative set of eukaryotic
ribosomal proteins, illustrating their archaeal nature.
Protein S8 illustrates the general features of many alignments (Fig. 1). Note that both the N- and C-terminal
regions have clearly alignable hydrophobicity patterns
and similar conservation of glycine and proline residues
across the two prokaryotic phylodomains. However, in
the central region not even the hydrophobicity pattern
is conserved.
Comparison of the relationships between the positions of cross- and within-phylodomain sequence
conservation with known SSU rRNA and/or protein
contacts (Brodersen et al., 2001) failed to show any over-
all correlations. In S4 and S7, for example, major RNA
contacts (as seen in the T. thermophilus structure) are in
regions unique to each prokaryotic phylodomain sepa-
rately, though of similar base composition, while in
S3, S9, and S13 cross-phylodomain conserved protein regions make major contacts to RNA but in regions that
differ in base sequence between the two phylodomains.
With the exception of S12 and all but the very center
of S10, the amino acid sequences in RNA-contacting re-
gions are rather variable even within each phylodomain.
A similar situation exists for protein–protein contacts.
For example, the small region in the very center of
S10 which makes contact with S14 is particularly interesting in that the pattern PXGXG is absolutely
conserved in Archaea, but only the proline is (partly)
conserved in Bacteria.
Like many other riboproteins, L4 contains an ex-
tended loop region (positions 43–74 in E. coli) that pen-
etrates deep into the RNA core structure (Ban et al.,
2000; Jenni and Ban, 2003; Yusupov et al., 2001) and
can be assumed to contribute to RNA folding and/or stabilization. This L4 loop reaches through the structure
towards the back of the peptidyl transferase active site
(Jenni and Ban, 2003; Nissen et al., 2000). In both pro-
karyotic phylodomains the three-dimensional structure
shows a similar extended loop with extensive RNA con-
tacts, but for 23 residues in this loop the bacterial and
archaeal sequences have no meaningful alignment similarity. In fact even the structural details and sequence
lengths are different (Fig. 3). Only the GXG subsequence that comprises part of the inner surface of the
polypeptide exit tunnel (Jenni and Ban, 2003) is found
in both sequences, but not at equivalent alignment posi-
tions. However, L4 aligns very well across both phylod-
omains over nearly its entire remaining length (cf. Fig.
2A). Such a cross-phylodomain distinctive sequence context for an apparently common feature is also seen in the
PXGXG region of the archaeal S10, which though of
similar length and alignment position has no sequence
similarity to bacterial S10 in the same region. In that
case no structural information is available for the archaeal homolog, and thus whether there is some
structural equivalence, as in L4, is not yet known.
To explore phylogenetic patterns, we investigated a
variety of concatenated aligned informative position sets: individual proteins, all SSU universal blocks, all
LSU universal blocks, as well as the archaeal-specific
Fig. 3. Comparative sequence and structural conservation of the extended loop of L4 in Archaea (left) and Bacteria (right). (A) Sequence alignment
of the relevant archaeal segment with the bacterial segment. The sequence region shown for Archaea (Haloarcula marismortui numbering) is for
residues 38–102, and for Bacteria (Deinococcus radiodurans numbering) residues 35-95. Arrows mark the GXG segment discussed in the text. (B)
Structure of the L4 loop in H. marismortui (left) and D. radiodurans (right). Red balls represent the glycine residues of the GXG segment.
blocks and bacterial-specific blocks from the SSU and
LSU proteins separately. In addition, data was obtained
from multialigned protein synthesis, initiation and elon-
gation factors as well as the signal recognition complex
proteins. Fig. 4 presents a phylogenetic tree built from
the multiple alignment of the SSU protein universal
blocks. The identical core branch topology [Bacteria, (Euryarchaeota, (Crenarchaeota, Eukaryota))] was
obtained for trees made by the concatenation of the LSU protein universal blocks, and by the multialignment of
the elongation (EF2/EFG) and initiation (IF2P/IF2) factors common to all three phylodomains (data not
shown). Among all of the maximum likelihood gener-
ated trees this core branching was observed with the
exception of the branch between the Euryarchaeota and the Crenarchaeota, which was of zero length in
one case. Addition of the Nanoarchaea (Huber et al.,
2002) confuses this branch as well. While the maximum
parsimony bootstrapped trees also produced this same
branching topology as a consensus, there was a bit more
variation. The universal-block trees for the SSU and
LSU and other proteins clearly resolve the four phylogenetic taxa with bootstrap values from 76 to 100% and
likelihoods of 90–100%, but there is little consistent resolution within the bacterial phylodomain. Such star-like
patterns with limited phylogenetic resolving power have
been seen before among the bacterial rRNAs (Pace,
1997). On the other hand, there is generally a clean divi-
sion between the Crenarchaeota and the Euryarchaeota.
Whether or not there is sufficient information in these
sequences to provide more details between the two divi-
sions of Archaea is as yet unclear given the fewer com-
pleted archaeal genomes as compared to the Bacteria.
The average block divergences, as defined by the implied maximum-likelihood branch lengths in residue
replacements per thousand are given in the legend to
Fig. 4 and the number of absolutely conserved positions
per thousand is presented in Table 1 and Fig. 5. The
LSU proteins have a higher implied variation than the
SSU proteins. For the LSU proteins there is also a high-
er implied extent of multiple replacements in the archa-
eal-eukaryotic clade, with the greatest difference leading
to the eukaryotes. Among the prokaryotic-specific blocks there is also a greater implied variation among
the Archaea than among Bacteria as inferred from the
archaeal-specific blocks. In the eukaryotes, the total var-
iation in non-conserved positions is even more pro-
nounced though, at the same time, there are more
absolutely conserved positions as compared to the Bac-
teria and the Archaea.
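The per-thousand frequencies of absolutely conserved positions referred to above (Table 1 and Fig. 5) can be illustrated with a short tally over alignment columns. The sketch below is one possible reading, not the authors' procedure: it assumes a combined category such as EA requires the same residue to be fixed in both groups, that the denominator is the total number of columns supplied, and that a column may contribute to more than one category when different groups fix different residues. The function names are hypothetical.

```python
from collections import Counter


def conserved_residue(column):
    """Return the shared residue if the column is absolutely conserved, else None."""
    residues = set(column)
    if len(residues) == 1 and "-" not in residues:
        return column[0]
    return None


def category_per_thousand(groups):
    """Tally conserved positions per 1000 columns for each combination of groups.

    groups: dict mapping a one-letter label (e.g. 'E', 'A', 'B') to that
    phylodomain's aligned rows; all rows must have the same length.
    Category names join the labels in the dict's insertion order.
    """
    n_cols = len(next(iter(groups.values()))[0])
    counts = Counter()
    for i in range(n_cols):
        labels_by_residue = {}
        for label, rows in groups.items():
            residue = conserved_residue([row[i] for row in rows])
            if residue is not None:
                labels_by_residue.setdefault(residue, []).append(label)
        # ASSUMPTION: groups are merged into one category only if they fix the SAME residue.
        for labels in labels_by_residue.values():
            counts["".join(labels)] += 1
    return {category: 1000 * n / n_cols for category, n in counts.items()}
```

For example, a column that is M in every archaeal and every bacterial sequence would count toward the combined category, while a column fixed only among the Bacteria would count toward B alone.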
Fig. 4. Maximum-likelihood phylogenetic tree for the concatenated,
universal small-subunit block sequences. This is the same as that for
the elongation factors EF2/EFG and initiation factors IF2P/IF2 as
well as for the signal recognition complex. The tree was constructed
using the TREE-PUZZLE tool on the concatenated multialignment of
all the universal blocks in the SSU proteins. The tree search procedure
was Quartet Puzzling with 1000 steps and the JTT (Jones et al., 1992)
model of substitution. The topology of this tree is representative of the
topology of all the other trees built using concatenated universal
blocks in LSU ribosomal proteins, elongation/initiation factors, signal
recognition complex proteins or universal blocks in single proteins that
have a sufficient degree of variability. Average branch lengths for the
labeled branches considering SSU, LSU and factor trees (expressed in
terms of maximum-likelihood substitutions per residue) are (a)
0.667 ± 0.19 with 93% confidence, (b) 0.09 ± 0.1 with 100% confidence,
and (c) 1.03 ± 0.2 with 100% confidence. Species labels are Swiss-Prot
codes as defined in the legend to Fig. 1 with the addition of eukaryotes:
ARATH—Arabidopsis thaliana, CAEEL—Caenorhabditis elegans,
DROME—Drosophila melanogaster, HUMAN—Homo sapiens,
ORYSA—Oryza sativa, and SCHPO—Schizosaccharomyces pombe.
4. Discussion
Clearly the large number of aligned positions that
show either absolute identity or conservation of hydrophobicity
profile across both prokaryotic phylodomains
implies common ancestry for all homologous prokaryotic riboproteins. It also implies that they were carrying
out their ribosomal functions long before the bacterial/
archaeal split. Given the large number of distinct pro-
karyotic phylodomain-specific ribosomal protein fea-
tures, however, one must assume that considerable
time elapsed after that split and before the divergence
that produced their extant forms. Sequences within each
of the three block categories show maximum divergence in the universal blocks and approximately one-third that
much divergence within the two sets of phylodomain-
specific blocks. This implies that the most recent com-
mon ancestor (cenancestor) of all prokaryotes existed
much earlier than the separate bacterial and archaeal
cenancestors, and that these may have coexisted at
about the same time.
There are at least four potential explanations for these observations: a purely statistical effect resulting
in both prokaryotic phylodomains having similar coa-
lescence times; massive horizontal gene transfer within
each phylodomain; explicit phylodomain cenancestors
having clear selective advantages; and finally a physical
bottleneck well after the last common ancestor of the
Bacteria and Archaea. Coalescence theory (Felsenstein,
2003; Zhaxybayeva and Gogarten, 2004) concludes that
in a large population of near-constant size, there will be
a single cenancestor of all extant members of that pop-
ulation that existed closer to the present day than the
true ancestor of the species. Extant prokaryotes, however, have occupied highly dispersed and diverse niches
since the bacterial/archaeal split. This makes the likeli-
hood of coalescence back to a single chance last com-
mon ancestor for each phylodomain very small. The
derivation of the phylodomain-specific blocks, with
their well-defined boundaries, from horizontal gene
transfer also is unlikely as it would have to involve continuous
events within each separate phylodomain globally across diverse niches. The hypothesis that the last
two cenancestors had certain selective phylodomain-spe-
cific advantages also seems improbable, but cannot be
ruled out (see later discussion on blocks in ribosomal
proteins S8 and L4). Given the data and the similar se-
quence variation seen in the phylodomain-specific
blocks, a physical bottleneck seems to be the simplest
explanation for our results. It should be noted that a physical bottleneck event is phylogenetically equivalent
to two distinct cenancestors having clear and compara-
ble selective advantages.
A survey of metabolic enzymes from prokaryotes
does not reveal the type of bacterial/archaeal phylodo-
main-specific block structure we have observed among
the riboproteins, particularly class II blocks that have
similar placement within the overall protein sequences but no cross-phylodomain similarity. This could come
about as the result of numerous and possibly parallel
adaptations to new habitats following the proposed bot-
tleneck. Clearly one would expect many metabolic adap-
tations to habitats with novel environmental
parameters, including very different metabolites. Also,
metabolic enzymes—in many cases consisting of a single
or only a few structural domains—may have evolved to optimum forms long before the ribosome system.
Furthermore, horizontal transfers would be vastly easier
in the case of such proteins. Interestingly, however, the
block structure—though it was not so characterized—
does appear to occur in one set of published partial ami-
no acid sequence alignments from prokaryotes (Eichler,
2003). The proteins in question (SecDF) participate in
translocation of secreted proteins across the bacterial or archaeal plasma membrane, and as such are indirectly
ribosome-associated (Pohlschroder et al., 2004). They
have no known enzymatic activity.
The idea that there was some strong selective/compet-
itive advantage for the two prokaryotic phylodomain-
specific blocks requires more detailed investigation.
For example, the central blocks in S8 and L4, if they
are truly homologs (as their similar structural placements imply), would seem to require strong
phylodomain-specific selective events along the line of descent,
Table 1
Frequency (‰) of absolutely conserved positions in aligned blocks for universally conserved riboproteins
Protein ABE EA AB EB E A B
S2P 79 56 101 17 112 96 157
S3P 77 64 90 13 134 25 121
S4P 49 118 88 29 206 49 79
S5P 76 91 104 35 229 76 118
S7P 53 159 68 53 174 114 106
S8P 95 52 126 10 201 43 53
S9P 57 72 79 64 100 49 50
S10P 54 44 120 11 87 162 152
S11P 108 134 150 50 266 83 75
S12P 116 132 198 82 166 83 208
S13P 100 17 133 42 224 58 125
S14P 122 82 184 0 143 81 81
S15P 34 68 34 51 186 51 152
S17P 49 61 61 73 244 24 86
S19P 123 62 173 37 99 12 99
L1P 41 10 86 5 117 41 127
L2P 108 55 145 31 186 35 149
L3P 77 90 83 32 224 103 77
L4P 35 51 45 31 176 76 4
L5P 55 61 89 27 289 49 172
L6P 42 47 53 11 100 58 120
L10P 30 10 51 0 112 30 10
L11P 63 64 85 36 274 6 168
L12P 37 49 74 0 25 186 198
L13P 83 37 120 19 120 65 130
L14P 156 65 189 57 173 49 147
L15P 40 81 56 8 210 32 73
L18P 54 126 54 9 189 18 135
L22P 27 82 36 9 182 64 55
L23P 52 52 91 26 169 78 39
L24P 101 13 101 13 76 51 76
L29P 18 17 18 17 141 105 35
SSU (Avg.) 79 81 34 38 173 69 112
LSU (Avg.) 61 53 22 20 170 58 101
IF2 137 73 4 35 171 84 87
EF2 134 70 17 33 181 76 157
See legend to Fig. 5 for definition of block categories.
given that there has been very minimal divergence within the extant representatives of the two prokaryotic
phylodomains. Why might phylodomain-specific selection
act in such a block-like fashion on the L4 loop region
when today there appear to be no ribosomal function
or structural characteristics to distinguish its role in
the two groups of organisms? In fact there is some
experimental evidence that the ribosome can still function with this L4 loop deleted (Zengel et al., 2003).
If one assumes that the phylogenetic mean-likelihood branch lengths estimated from the ribosomal and
associated proteins are even crudely proportional to time
ated proteins are even crudely proportional to time
(Harris et al., 2003), a bottleneck event would have oc-
curred at somewhat more than half the distance back
to the last common prokaryotic ancestor, or about
two billion years ago. In addition, the failure of the sequence
variation information to resolve the bacterial and archaeal phylogenies suggests an ancient event (or
events) followed by rapid niche divergence (Hartman,
2002). Gogarten-Boekels et al. (1995) have proposed a similar extinction-based explanation for the absence of
deep-branching lineages in other molecular phylogenies.
Candidates for such an event would be the proposed
Paleoproterozoic “snowball earth” (Kirschvink et al.,
2000); major atmospheric change, resulting from the ra-
pid introduction of oxygen, as suggested recently by
Hedges et al. (2001b); and surely there are others.
A possible bottleneck of this age is interesting given that 2.2–2.5 Ga is the approximate time thought to correspond
to the rise of the first true eukaryotes (Hedges et
al., 2001a)—an event thought to coincide with signifi-
cant increase in atmospheric oxygen levels (Bekker
et al., 2004). This apparent correlation is most intriguing
given the relationship between the eukaryotes and the
two subclasses of Archaea. All of the eukaryotic ribosomal
proteins possessed in common with the prokaryotes have the archaeal-specific block structures while
containing no bacterial-specific blocks. There are five
Fig. 5. Venn diagram representing the frequency distribution (‰) of absolutely conserved amino acid residue positions in representative alignments
of homologous ribosomal proteins from all three phylodomains. (a) Concatenated SSU proteins. (b) Concatenated LSU proteins. In both cases, the
three-circle diagram reports the number of absolutely conserved residues per 1000 residues in each category for positions in aligned universal (three
domain) blocks, the single circle reports on the bacterial-specific blocks, and the double circle reports on the blocks specific to Archaea and
eukaryotes. (The numbers for purely archaeal-specific and eukaryote-specific blocks were too small to be statistically meaningful, so these categories
have been omitted.) Upper-case letters within the diagrams have the following meanings: E—positions absolutely conserved only within the
eukaryotic sequences, A—positions absolutely conserved only within the archaeal sequences, and B—positions absolutely conserved only within the
bacterial sequences. Combinations of these have the obvious meanings, e.g., EA refers to positions absolutely conserved within eukaryotic and
archaeal, but not bacterial sequences. As an illustration, the total frequency of absolutely conserved positions among the archaeal LSU riboproteins
(in A + AB + EA + ABE blocks) is 58 + 22 + 53 + 61 = 194‰ (corresponding to 420 out of 2163 positions).
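The arithmetic in the caption's illustration can be checked with a short sketch. The helper name `per_mille` and the dictionary layout are illustrative assumptions, not part of the original analysis; only the four category values and the 420 of 2163 count come from the caption.

```python
# Hypothetical helper (not from the paper): express a count of absolutely
# conserved positions as a frequency per 1000 aligned positions.
def per_mille(conserved: int, total: int) -> float:
    return 1000.0 * conserved / total

# Per-category frequencies quoted in the Fig. 5 caption for the
# archaeal LSU riboproteins (values per 1000 residues).
archaeal_lsu = {"A": 58, "AB": 22, "EA": 53, "ABE": 61}

# Route 1: add the four categories in which archaeal positions are conserved.
total_from_categories = sum(archaeal_lsu.values())

# Route 2: convert the raw count given in the caption (420 of 2163 positions).
total_from_counts = per_mille(420, 2163)

print(total_from_categories)
print(round(total_from_counts))
```

Both routes give the same rounded per-mille value, which is the consistency the caption's example relies on.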
riboproteins, S25e, S26e, S30e, L13e and L38e, com-
mon only to the crenarchaea and the eukaryotes. In
addition the eukaryotic ribosome-associated proteins,
such as the initiation, termination and elongation fac-
tors, are most similar to those in the Crenarchaeota.
This contrasts with the known nuclear histone structures
(Malik and Henikoff, 2003) that are common to the Euryarchaeota and the Eukaryota, but not found
in either Bacteria or the Crenarchaeota. If one assumes
that the major proto-eukaryotic adaptation was the
ability to phagocytose, then the inclusion of the crenar-
chaeal protein-synthesizing system and the euryarchaeal
chromosomal packing system along with the later inclusion
of mitochondria and chloroplasts may all have resulted
from similar events. Given the accepted bacterial endosymbiotic origin of mitochondria and chloroplasts,
whose ribosomal protein genes were transferred to the
eukaryotic nucleus, one can ask why there has been
no recombination between those genes and the genes
of apparent crenarchaeal origin that encode nuclear
eukaryotic riboproteins. Thus while our analysis here
clearly lays out the relationships among many parts
of the protein synthesis machinery of these four major
taxa (Bacteria, Euryarchaeota, Crenarchaeota, and Eu-
karyota), it does not resolve the very complex origin of
modern eukaryotes. How the times of bacterial and
archaeal divergence and of the origin of the eukaryotic
ribosome are related remains unknown (Feng et al., 1997; Gupta and Golding, 1996; Hedges and Kumar,
2003).
The firm conclusion that emerges from our analysis is
that the wide range of habitats occupied by (and metabolic
systems contained in) today's Bacteria and Archaea
does not necessarily represent those occupied by
their last common ancestors: neither the last common
ancestor of all extant prokaryotes nor those of the two phylodomains. This is true since all of those habitats
and most of the metabolic systems necessarily represent
re-adaptations following diversification some time after
the complete fixation of the riboprotein phylodomain-
specific block structures. Whether this was the result
of a long selective reduction, statistical coalescence effect
or a classical bottleneck does not seem to change this
conclusion.
Note added in proof
Klein et al. (2004) recently identified six large subunit
proteins making equivalent ribosomal contacts in bacte-
ria and archaea yet showing no sequence or significant
structural similarity. Thus like the L4 loop block they
are phylodomain-specific, suggesting there may have
been many alternate prokaryotic rRNA protein solutions,
only two sets of which survived the bottleneck.
Acknowledgments
This work was supported by NSF Grant No.
00205512. We thank Yuki Moriya of Kyoto University
for preparing the sequence alignments used to construct
the elongation and initiation factor tree.
References
Altschul, S.F., Koonin, E.V., 1998. Iterated profile searches with PSI-
BLAST—a tool for discovery in protein databases. Trends
Biochem. Sci. 23, 444–447.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z.,
Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST:
a new generation of protein database search programs. Nucleic
Acids Res. 25, 3389–3402.
Ban, N., Nissen, P., Hansen, J., Moore, P.B., Steitz, T.A., 2000. The
complete atomic structure of the large ribosomal subunit at 2.4 Å
resolution. Science 289, 905–920.
Bekker, A., Holland, H.D., Wang, P.L., Rumble III, D., Stein, H.J.,
Hannah, J.L., Coetzee, L.L., Beukes, N.J., 2004. Dating the rise of
atmospheric oxygen. Nature 427, 117–120.
Brodersen, D.E., Clemons Jr., W.M., Carter, A.P., Wimberly, B.T.,
Ramakrishnan, V., 2001. Crystal structure of the 30S ribosomal
subunit from Thermus thermophilus: structure of the proteins and
their interactions with 16S RNA. J. Mol. Biol. 316, 725–768.
Caetano-Anolles, G., 2002. Tracing the evolution of RNA structure in
ribosomes. Nucleic Acids Res. 30, 2575–2587.
Das, S., 1998. Protein function identification using prior-based profiles
to represent protein domains. Biomolecular Engineering, Boston
University, Boston.
Das, S., Smith, T.F., 2000. Identifying nature's protein lego set. In:
Kim, P.S. (Ed.), Advances in Protein Chemistry. Academic Press,
San Diego, pp. 159–183.
de Jong, W.W., van Dijk, M.A., Poux, C., Kappe, G., van Rheede, T.,
Madsen, O., 2003. Indels in protein-coding sequences of Euarch-
ontoglires constrain the rooting of the eutherian tree. Mol.
Phylogenet. Evol. 28, 328–340.
Eichler, J., 2003. Evolution of the prokaryotic protein translocation
complex: a comparison of archaeal and bacterial versions of
SecDF. Mol. Phylogenet. Evol. 27, 504–509.
Felsenstein, J., 2003. Coalescent trees. In: Inferring Phylogenies. Sinauer
Associates, Sunderland, MA (Chapter 26).
Feng, D.F., Cho, G., Doolittle, R.F., 1997. Determining divergence
times with a protein clock: update and reevaluation. Proc. Natl.
Acad. Sci. USA 94, 13028–13033.
Gogarten-Boekels, M., Hilario, E., Gogarten, J., 1995. The effects of
heavy meteorite bombardment on the early evolution—the emer-
gence of the three domains of life. Origins Life Evolution Biosphere
25, 251–264.
Gupta, R.S., Golding, G.B., 1996. The origin of the eukaryotic cell.
Trends Biochem. Sci. 21, 166–171.
Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon,
I., Bartels, H., Franceschi, F., Yonath, A., 2001. High resolution
structure of the large ribosomal subunit from a mesophilic
eubacterium. Cell 107, 679–688.
Harris, J.K., Kelley, S.T., Spiegelman, G.B., Pace, N.R., 2003. The
genetic core of the universal ancestor. Genome Res. 13, 407–412.
Hartman, H., 2002. Macroevolution, catastrophe and horizontal
transfer. In: Kado, C.I. (Ed.), Horizontal Gene Transfer. Chapman
& Hall, London, pp. 411–415.
Hedges, S.B., Chen, H., Kumar, S., Wang, D.Y., Thompson, A.S.,
Watanabe, H., 2001a. A genomic timescale for the origin of
eukaryotes. BMC Evol. Biol. 1, 4.
Hedges, S.B., Chen, H., Kumar, S., Wang, D.Y.-C., Thompson, A.S.,
Watanabe, H., 2001b. A genomic timescale for the origin of
eukaryotes. BMC Evol. Biol. Available from: <http://www.biomedcentral.com/1471-2148/1/4>.
Hedges, S.B., Kumar, S., 2003. Genomic clocks and evolutionary
timescales. Trends Genet. 19, 200–206.
Held, W.A., Mizushima, S., Nomura, M., 1973. Reconstitution of
Escherichia coli 30 S ribosomal subunits from purified molecular
components. J. Biol. Chem. 248, 5720–5730.
Huber, H., Hohn, M.J., Rachel, R., Fuchs, T., Wimmer, V.C., Stetter,
K.O., 2002. A new phylum of Archaea represented by a nanosized
hyperthermophilic symbiont. Nature 417, 63–67.
Jenni, S., Ban, N., 2003. The chemistry of protein synthesis and voyage
through the ribosomal tunnel. Curr. Opin. Struct. Biol. 13, 533.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid genera-
tion of mutation data matrices from protein sequences. Comput.
Appl. Biosci. 8, 275–282.
Kirschvink, J.L., Gaidos, E.J., Bertani, L.E., Beukes, N.J., Gutzmer,
J., Maepa, L.N., Steinberger, R.E., 2000. Paleoproterozoic snow-
ball earth: extreme climatic and geochemical global change and its
biological consequences. Proc. Natl. Acad. Sci. USA 97, 1400–
1405.
Klein, D.J., Moore, P.B., Steitz, T.A., 2004. The roles of ribosomal
proteins in the structure, assembly, and evolution of the large
ribosomal subunit. J. Mol. Biol. 340, 141–177.
Lecompte, O., Ripp, R., Thierry, J.-C., Moras, D., Poch, O., 2002.
Comparative analysis of ribosomal proteins in complete genomes:
an example of reductive evolution at the domain scale. Nucleic
Acids Res. 30, 5382–5390.
Malik, H.S., Henikoff, S., 2003. Phylogenomics of the nucleosome.
Nat. Struct. Biol. 10, 882–891.
Mears, J.A., Cannone, J.J., Stagg, S.M., Gutell, R.R., Agrawal, R.K.,
Harvey, S.C., 2002. Modeling a minimal ribosome based on
comparative sequence analysis. J. Mol. Biol. 321, 215–234.
Nikaido, M., Matsuno, F., Hamilton, H., Brownell Jr., R.L., Cao, Y.,
Ding, W., Zuoyan, Z., Shedlock, A.M., Fordyce, R.E., Hasegawa,
M., Okada, N., 2001. Retroposon analysis of major cetacean
lineages: the monophyly of toothed whales and the paraphyly of
river dolphins. Proc. Natl. Acad. Sci. USA 98, 7384–7389.
Nissen, P., Hansen, J., Ban, N., Moore, P.B., Steitz, T.A., 2000. The
structural basis of ribosome activity in peptide bond synthesis.
Science 289, 920–930.
Pace, N.R., 1997. A molecular view of microbial diversity and the
biosphere. Science 276, 734–740.
Pohlschroder, M., Dilks, K., Hand, N.J., Wesley Rose, R., 2004.
Translocation of proteins across archaeal cytoplasmic membranes.
FEMS Microbiol. Rev. 28, 3–24.
Ramakrishnan, V., Moore, P.B., 2001. Atomic structures at last: the
ribosome in 2000. Curr. Opin. Struct. Biol. 11, 144–154.
Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M.,
Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F.,
Yonath, A., 2000. Structure of functionally activated small
ribosomal subunit at 3.3 angstroms resolution. Cell 102, 615–623.
Strimmer, K., von Haeseler, A., 1997. Likelihood-mapping: a simple
method to visualize phylogenetic content of a sequence alignment.
Proc. Natl. Acad. Sci. USA 94, 6815–6819.
Swofford, D.L., 1998. PAUP*. Phylogenetic Analysis Using Parsimony
(* and Other Methods). Sinauer Associates, Sunderland, MA.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. Clustal W:
improving the sensitivity of progressive multiple sequence align-
ment through sequence weighting, position-specific gap penalties
and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Tung, C.-S., Joseph, S., Sanbonmatsu, K.Y., 2002. All-atom homology
model of the Escherichia coli 30S ribosomal subunit. Nat. Struct.
Biol. 9, 750–755.
Wimberly, B.T., Brodersen, D.E., Clemons Jr., W.M., Morgan-Warren,
R.J., Carter, A.P., Vonrhein, C., Hartsch, T., Ramakrishnan, V.,
2000. Structure of the 30S ribosomal subunit. Nature 407, 327–339.
Woese, C.R., Kandler, O., Wheelis, M.L., 1990. Towards a natural
system of organisms: proposal for the domains Archaea, Bacteria,
and Eucarya. Proc. Natl. Acad. Sci. USA 87, 4576–4579.
Wuyts, J., van de Peer, Y., de Wachter, R., 2001. Distribution of
substitution rates and location of insertion sites in the tertiary
structure of ribosomal RNA. Nucleic Acids Res. 29, 5017–5028.
Yusupov, M.M., Yusupova, G.Z., Baucom, A., Lieberman, K.,
Earnest, T.N., Cate, J.H.D., Noller, H.F., 2001. Crystal structure
of the ribosome at 5.5 Å resolution. Science 292, 883–896.
Zengel, J.M., Jerauld, A., Walker, A., Wahl, M.C., Lindahl, L., 2003.
The extended loops of ribosomal proteins L4 and L22 are not
required for ribosome assembly or L4-mediated autogenous
control. RNA 9, 1188–1197.
Zhaxybayeva, O., Gogarten, J., 2004. Cladogenesis, coalescence and
the evolution of the three domains of life. Trends Genet. 20, 182–
187.