23
Center for Biological Sequence Analysis Department of Systems Biology Introduction to Microbial Genomics Alignments & sequence information Dave Ussery DTU course 27105 - Comparative Genomics Chapter 2 in textbook Wednesday, 4 June, 2014

Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

Embed Size (px)

Citation preview

Page 1: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

Center  for  Biological  Sequence  Analysis

Department  of  Systems  Biology

Introduction to Microbial Genomics

Alignments & sequence information

Dave Ussery !DTU course 27105 - Comparative Genomics Chapter 2 in textbook !Wednesday, 4 June, 2014

Page 2: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 2DTU course 27105 Comparative Genomics

Page 3: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

Query. The complete sequence of the genome of ..... has been determined by

Subject. The complete sequence of the genome of ..... has been determined by

!Query. the whole genome shotgun method .... The entire length of the genome was

Subject. assembling the sequences .................... The entire length of the genome was

!Query. 1,669,695 bp. The authenticity of the entire_______sequence was supported by

Subject. 1,738,505 bp. The authenticity of the entire genome sequence was supported by

!Query. restriction analysis of long PCR products, which were directly amplified from the

Subject. restriction analysis of long PCR products, which were directly amplified from the

!Query. genomic DNA. As the potential protein-coding regions, a total of 2,694 open

Subject. genomic DNA. As the potential protein-coding regions, a total of 2 061 open

!Query. reading frames (ORFs) were assigned. DNA Res., 6:83-101, (1999).

3DTU course 27105 Comparative Genomics

Page 4: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

Query. 5’ GGCCTAGTAGCCCATAGACTATACACCCGGATA 3

: : : :

Subject. 5’ TAACCGGGTTTATAGGCTATGGGGTAGTAGGCC 3

!!Query. 5’ GGCCTAGTAGCCCATAGACTATACACCCGGATA 3’

:::::: :: ::::::: ::::: ::: :: ::

Subject. 5’ GGCCTACTACCCCATAGCCTATAAACCCGGTTA 3’

! Subject (ds) 5’ TAACCGGGTTTATAGGCTATGGGGTAGTAGGCC 3’

3’ ATTGGCCCAAATATCCGATACCCCATCATCCGG 5’ !Figure 2.2. Alignment of two DNA sequences at the top does not display similarity. When the complementary strand of the subject is used (the second alignment) the similarity is apparent. At the bottom both strands of the subject are given.

Aligning two sequences........

4DTU course 27105 Comparative Genomics

Page 5: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

Query AGGCAAAGTCGCCATGCTGCCATCTGTATG Query. RQSRHAAICM

: : :: : :: :: :: : ::: ::::: :: :

Subject CGTGAGAGCAGGCACTACGCAATAAGCATG Subject. RQSRHYAISM

!! Identity: 15 in 30 positions (50%) 8 in 10 positions (80%)

Subject ArgGluSerArgHisTyrAlaIleSerMet

Subject R Q S R H Y A I S M

Subject R Q S R H Y A I S M

Subject R Q S R H Y A I S M

Query GlyLysValAlaMetLeuProSerVal (Frame 2)

:

Subject ArgGluSerArgHisTyrAlaIleSerMet (Frame 1)

!! Identity: 1 in 10 positions (10%)

Query AlaLysSerProCysCysHisLeuTyr (Frame 3)

:

Subject ArgGluSerArgHisTyrAlaIleSer (Frame 1)

!! Identity: 1 in 10 positions (10%)

5DTU course 27105 Comparative Genomics

Page 6: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

!Query DADTGTKAAAGVELTTKVVGAPVSALVGAKVLGNVKGISQYIIAQRAAQGQTLTGQPNTQ

:: : :::::.: ..:. ..:.:::.. : .: ..: ::.:

Subj DAPAGQKAAAGAEPALQLLGGGIAAAVGARLTGGTELASSIALALAAARGKTAALLGGLT

Query length: 60 aa

Identity over query length: 33%

Similarity over query length: 55%

Alignment length: 50 aa

Identity over alignment length: 40%

Similarity over alignment length: 66%

6DTU course 27105 Comparative Genomics

Page 7: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

B! ! ! ! ! ! ! ! Score E!

Sequences producing significant alignments: (Bits) Value!

ref|ZP_01810188.1| putative oxidoreductase subunit [Campyloba... 154 1e-36!

ref|ZP_01070613.1| oxidoreductase, putative [Campylobacter je... 154 1e-36!

ref|NP_281605.1| putative oxidoreductase subunit [Campylobact... 154 1e-36 !

ref|YP_178483.1| oxidoreductase, putative [Campylobacter jeju... 153 3e-36!

ref|YP_001000126.1| oxidoreductase, putative [Campylobacter j... 152 5e-36 !

gb|AAX99091.1| unknown [Campylobacter jejuni subsp. jejuni 81-17 133 3e-30!

gb|AAF25214.1|AF202168_1 chimeric flagellin A/B [Campylobacter j 125 8e-28!

gb|AAC25637.1| flagellin A [Campylobacter jejuni] >gb|ABS8917... 125 8e-28!

sp|P22251|FLA2_CAMJE Flagellin A >gb|AAA23024.1| flagellin A 125 8e-28!

gb|AAC25644.1| flagellin A [Campylobacter jejuni] 124 2e-27!

ref|YP_001000997.1| flagellin B [Campylobacter jejuni subsp.... 124 2e-27 !

gb|AAC25648.1| flagellin A [Campylobacter jejuni] 123 3e-27

7DTU course 27105 Comparative Genomics

Page 8: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

NNNNNNNACGGAGATTGCAATATATTGAATTTGCAAAGTTTTNNNNNN NNNNNNNATGAATTCTGCAATTTATTGAATTTGCAAACTTTTNNNNNN NNNNNNNACGTATCCTGCAATATATTGAATTTGCAAATTTTTNNNNNN NNNNNNNGCAGCTCCTGCAATATATTGAATTTTCATGCTTTTNNNNNN NNNNNNNTCGACTCCTGCAATGTATTGAATTTGCATGATTTTNNNNNN NNNNNNNGCCATCTCTGCAATATATTGAATTTGCGTGCTTTTNNNNNN NNNNNNNATCATCTCTGCAATATATTGAATTTGCCTGCTTTTNNNNNN NNNNNNNATTTTCTCTGCAATATATTGAATTTGCGTGGTTTTNNNNNN NNNNNNNGTTATTTCAGCAATATATTGAATTTGCCTGCTTTTNNNNNN NNNNNNNGTTAATTCAGCAATATATTGAATCTGCATGCTTTTNNNNNN NNNNNNNGTGTTTCCAGCAATATATTGAATTTGCTCACTTTTNNNNNN NNNNNNNGTGGCTCATGCAATATATTGAATTTGCACGATCTTNNNNNN NNNNNNNGTGGTTTATGCAATATATTGAATTTGCATGGTCTTNNNNNN NNNNNNNATGATTTCTGCAATATATTGAATTTGGAAGAATTTNNNNNN NNNNNNNATCGTTCCTGCAATTTATTGATTTTGCAAGACTTTNNNNNN NNNNNNNAAGGTTCCTGCAATATATTGAATTAGCACGATTTTNNNNNN NNNNNNNCAGAACGCTGCAATTTATTGAATTTGCACGATCATNNNNNN NNNNNNNTATGACGTTGCAATTTATTGAATATGAATGATTTTNNNNNN NNNNNNNCGTCATATTGCAATTTATTGAATTTGCACGAACTTNNNNNN NNNNNNNCATGGTCCTGCAATATATTGAATTGGCAAGATTTTNNNNNN TGCAATWTATTGAATTTGC

Figure 2.6. Multiple sequence alignment of 20 DNA sequences for IHF binding sites. Nucleotides in the region that was experimentally proven to contain the binding site are color-coded. Nucleotides outside the binding site are not defined (N). Below is a consensus sequence given for this alignment.

Aligning multiple sequences........

8DTU course 27105 Comparative Genomics

TGCAATWTATTGAATTTGC

Page 9: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

Aligning multiple sequences........How can one visualize an alignment of >100 sequences?

9DTU course 27105 Comparative Genomics

T G C A A T W T A T T G A A T T T G C

TGCAATWTATTGAATTTGC

Page 10: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 10DTU course 27105 Comparative Genomics

Page 11: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 11DTU course 27105 Comparative Genomics

Page 12: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

see "wikipedia entry" for Phylogenetic Tree

http://en.wikipedia.org/wiki/Phylogenetic_tree

12DTU course 27105 Comparative Genomics

Page 13: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 13DTU course 27105 Comparative Genomics

Page 14: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 14DTU course 27105 Comparative Genomics

Page 15: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

Humans

Rice

15

Page 16: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014

kat

46

Universal phylogenetic tree showing the relationships among Bacteria (e.g., most bacteria and blue-green algae), Archaea (e.g., methanogens and halophiles) andEucarya (e.g., protists, plants, animals, and fungi).

rRNA tree

corn

HUMANS

16DTU course 27105 Comparative Genomics

Page 17: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 17DTU course 27105 Comparative Genomics

���

���

�����

�����

����������������

��

���������

��� ���������

���� �

� ���������������� ���

��

��

��

����������

������

��

���� �

� ����

��

��

��������

���������

��

������������

�������

����

������������

�������

��

��

�������

������������

���

���������

� ����������

��

������������

������������

���������

��

��

��

��� �� !������"�� ��#�� !�� �� �$$��

����������

����

��������

������

����

���� �

�����������

���������

������

���� �

�����%��

�� �

�& �����

��'��(

)

���������

�!" ##

��

��

��

����������

��

���������#�

��������$�������

�����%����

�����%�������

��

��������%��

��

&�������

������������

�*� � � � �+��#��� ,��

��((

-�����.���

��

/���.����

������������

�����

���

����

��

���

��

��

���

Figure 4.1: 23S rRNA tree with NJ method and 1000 bootstrap resamplings from ClustalW alignments.The tree is viewed and colored with MEGA5. Each phyla is collapsed when possible, except orders ofProteobacteriawas collapsed instead of phyla. See Fig. 4.17 in Appendix for uncollapsed version.

strap values for the orders of Firmicutes is 94.7% and 73% for the family levels. The first

four groups are then clustered with Actinobacteria (42%) and then to Chloroflexi(71%),

where these two forms the group V. Group VI consists mostly of the thermophilic bacteria

which are Deinococcus -Thermus, Synergistetes and Dictyoglomi cluster is one level further

from the first five groups. One level further, Thermotogae is seen with 99% bootstrap.

Aquificae is between the Archaea group(VII) and the rest of the Bacteria. Group VII is all

the archaeal phyla, where Crenarchaeota and Korarchaeota are closest to each other with

84%. Nanoarchaeota clusters with M. fervidus DSM 2088 and the outgroup in archaea is

Euryarchaeota, and the archaeal root has 100% bootstrap value.

The di�erences in the order of phyla inside the designated groups in 23S rRNA tree with

ML (Fig 4.2) is shown in the second column of Table ??. The di�erences are as follows.

In group I, N. meningitidis Z2491 seems outside the Betaproeobacteria. Spirochaetes are

closer to the Bacteroidetesand Chlorobi cluster, so they are in group II. Actinobacteria

clusters with group III with 31%. The order of other phyla are the same, with some

di�erent bootstrap values.

37

Aslı İsmihan Özen!August, 2011

16S rRNA tree

Page 18: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

18

Veill

onel

la a

typi

ca A

CS04

9VSc

h6Ve

illon

ella

disp

ar A

TCC

1774

8Ve

illon

ella

aty

pica

ACS

134V

Col7

aVe

illon

ella

sp. 1

58 st

r F04

1241

2Ve

illon

ella

par

vula

ATC

C 17

745

Veill

onel

la sp

. 3_1

_44

Veill

onel

la p

arvu

la D

SM 2

008

Veill

onel

la sp

. 6_1

_27

Meg

asph

aera

mic

ronu

cifo

rmis

F035

9

Meg

asph

aera

gen

omos

p ty

pe 1

str 2

8L

Dial

ister

invi

sus D

SM 1

5470

Dialist

er m

icrae

roph

ilus D

SM 1

9965

Dialist

er m

icroa

erop

hilu

s UPI

I 345

E

Acidam

inoc

occu

s fer

men

tans

DSM

2073

1

Acidam

inococc

us sp D

21

Phasco

larcto

bacte

rium

sp YIT 12067

Thermosin

us carb

oxydivorans N

or1

Megamonas hyperm

egale

Mitsuokella

multa

cida DSM 20544

Selenomonas sputig

ena ATCC 35185

Selenomonas artemidis F0399

Selenomonas sp oral taxon 137 str F

0430

Selenomonas sp oral taxon 149 str 6

7H29BP

Selenomonas flueggei ATCC 43531

Selenomonas noxia ATCC 43541

Clostridium botulinum F str. Langeland

Clostridium cellulolyticum H10

Eubacterium cylindroides T2-87

Lawsonia intracellularis PHE/MN1-00

Firmicutes I

Firmicutes II

Fusobacteria

TenericutesBacteroidetesProteobacteria - Epsilon

Deferribacteres

Proteobacteria - Delta

Proteobacteria - Alpha

Proteobacteria- Beta

Proteobacteria - Gam

ma

Cya

noba

cter

ia

Fibr

obac

ter s

ucci

noge

nes S

85

Spiro

chae

tes

Chla

myd

ia

Elus

imicr

obiu

m m

inut

um P

ei 1

01

Acidob

acter

ia

Dictyo

glomus t

urgidum

DSM

6724

Thermoanaero

bacter s

p. X514

Synerg

istete

s

Chlorof

lexi

Actinobact

eriaVerrucomicrobia

Chlorobium

Chloroflexi

Deinococcus-Thermus

Thermotogae

Aquificiae

Plantomyces

EuryarchaeotaKorarchaeota

Nanoarchaeota

Crenarchaeota

0.05

Veilonellaceae

Negativicutes

Bootstrap >80

Bootstrap 41 - 80

Bootstrap < 40

16S rRNA tree

Page 19: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 19DTU course 27105 Comparative Genomics

Cambrian

Page 20: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

20

352 J. W. SCHOPF

and course of the evolutionary progression-the fossil record. record nevertheless remains meager. Even Precambrian Much progress has been made toward uncovering the early cyanobacteria, the best documented branch of early life, record, but much more remains to be accomplished. have a scanty fossil record: For chroococcacean (sphe-

How common the fossils of any biologic group are can roidal) species, the record amounts to fewer than 50 be measured by what are called taxonomic occurrences- taxonomic occurrences per 50-Ma-long interval; and for the number of species belonging to a given group that are filament-forming oscillatoriaceans, fewer than two dozen known to be present in formally recognized geologic (11). Moreover, the record is uneven. As shown in Figure units. For example, 10 species of fossil cyanobacteria in 1, it extends to nearly 3500 Ma ago but is woefully each of three named geologic formations adds up to 30 inadequate in rock units older than about 2200 Ma- taxonomic occurrences. More than 4000 such occur- mainly because rather few rocks from this time have rences of fossil microbes are known from Precambrian survived to the present, and most of these have experi- rocks (11, 12, 13), but the Precambrian encompasses such enced the fossil-destroying heat and pressure of meta- an enormous span of Earth history that the known fossil morphism (but also because most workers have chosen to

Fossil Evidence of Early Life

4500 I 1 Figure 1. Timeline of prokaryotic history inferred from the known Precambrian paleobiologic record. Fossil

representatives of three families of cyanobacteria (Chroococcaceae, Entophysalidaceae, and Oscillatoriaceae) and diverse noncyanobacterial prokaryotes (“bacteria”) have been identified in the listed 38 geologic units older than 1600 Ma (13, 14, 21).

Reference: Biol. Bull. 196: 351-355. (June 1999)

Deep Divisions in the Tree of Life-What Does the Fossil Record Reveal?

J. WILLIAM SCHOPF

IGPP Center for the Study of Evolution and the Origin of Life, Department of Earth and Space Sciences, and Molecular Biology Institute, University of California, Los Angeles, California 9009.5-1.567

Dating early evolution by the fossil record

Accurate dating of the antiquity of the major biologic lin- eages has been a long-standing goal in Precambrian paleobi- ology (1, 2, 3). This is a young field, however, and the early fossil record is too incompletely known to provide precise answers. Moreover, even in the best circumstances, fossils and associated geological and geochemical indicators of biologic activity (that is, “paleobiologic” evidence) can establish only a minimum age for a lineage, not how much earlier the lineage actually existed. Bearing these caveats in mind, data currently available indicate that stromatolitic microbial ecosystems, ev- idently including cyanobacteria and other members of the bacterial domain, were extant -3500 Ma ago (4, 5); metha- nogenic archaeans by -2800 Ma ago (6, 7); and gram-nega- tive sulfate-reducing bacteria as early as -2700 Ma ago (8) and possibly earlier (9).

Timing early evolution by the amino acid “clock”

Another way to address this problem has been proposed by Doolittle et al. (10). They reasoned that a molecular “clock” that would date the times of lineage emergence might be established by comparing the sequences of amino acids in proteins of the same families in diverse groups of organisms. Their study was thorough: It took into account fast and slowly evolving lineages and compared 53 1 amino acid sequences in 57 families in enzymatic proteins from 15 major groups of organisms. Their analysis yielded an internally consistent, amino acid-based tree in which the branches are all more or

This paper was originally presented at a workshop titled Evolution: A Molecular Point of View. The workshop, which was held at the Marine Biological Laboratory, Woods Hole, Massachusetts, from 24-26 October 1997, was sponsored by the Center for Advanced Studies in the Space Life Sciences at MBL and funded by the National Aeronautics and Space Administration under Cooperative Agreement NCC 2-896.

less the same length and have a branching order in good agreement with the 16s rRNA Tree of Life.

The discrepancy and the challenge

Strangely, however, the Doolittle et al. (10) clock meshes not at all well with the known Precambrian fossil record. Because paleobiologic evidence can yield only a minimum date for the emergence of a biologic lineage, the molecular clock would be expected to have yielded ages that are older than those documented by fossils. Yet the Doolittle et al. study concluded that all early evolving lineages originated at younger, not older, ages than those indicated by paleobiology, and the differences are substantial. For example, the amino acid data suggest that cyanobacteria originated about 1500 Ma ago, two billion years later than the time inferred from fossil evidence. The origins of gram-negative bacteria and methane- producing archaeans are placed a billion or more years later than indicated by paleobiology. And the “last common ances- tor” is dated at only -2000 Ma ago, whereas cellular fossils having morphologies indistinguishable from modem microbes are present in rocks nearly 3500 Ma old (4, 5).

Something is amiss. Perhaps the ancient fossils have been misinterpreted and are actually unrelated to present-day lin- eages. Or perhaps, like rRNA tress, the amino acid data yield reliable evidence only of the order of branching of evolution, not the timing of that branching. Whatever the answer, the discrepancies between the molecular clock and evidence from the rock record are too great and too consistent to be ignored. The challenge at hand is to unify these competing lines of evidence. Toward that end, assessment is made here of what now seems known, and what is not, from the available fossil data.

How can the challenge be met?

As Darwin pointed out in On The Origin of Species, there is only one court of last resort by which to decipher the timing

351

Reference: Biol. Bull. 196: 351-355. (June 1999)

Deep Divisions in the Tree of Life-What Does the Fossil Record Reveal?

J. WILLIAM SCHOPF

IGPP Center for the Study of Evolution and the Origin of Life, Department of Earth and Space Sciences, and Molecular Biology Institute, University of California, Los Angeles, California 9009.5-1.567

Dating early evolution by the fossil record

Accurate dating of the antiquity of the major biologic lin- eages has been a long-standing goal in Precambrian paleobi- ology (1, 2, 3). This is a young field, however, and the early fossil record is too incompletely known to provide precise answers. Moreover, even in the best circumstances, fossils and associated geological and geochemical indicators of biologic activity (that is, “paleobiologic” evidence) can establish only a minimum age for a lineage, not how much earlier the lineage actually existed. Bearing these caveats in mind, data currently available indicate that stromatolitic microbial ecosystems, ev- idently including cyanobacteria and other members of the bacterial domain, were extant -3500 Ma ago (4, 5); metha- nogenic archaeans by -2800 Ma ago (6, 7); and gram-nega- tive sulfate-reducing bacteria as early as -2700 Ma ago (8) and possibly earlier (9).

Timing early evolution by the amino acid “clock”

Another way to address this problem has been proposed by Doolittle et al. (10). They reasoned that a molecular “clock” that would date the times of lineage emergence might be established by comparing the sequences of amino acids in proteins of the same families in diverse groups of organisms. Their study was thorough: It took into account fast and slowly evolving lineages and compared 53 1 amino acid sequences in 57 families in enzymatic proteins from 15 major groups of organisms. Their analysis yielded an internally consistent, amino acid-based tree in which the branches are all more or

This paper was originally presented at a workshop titled Evolution: A Molecular Point of View. The workshop, which was held at the Marine Biological Laboratory, Woods Hole, Massachusetts, from 24-26 October 1997, was sponsored by the Center for Advanced Studies in the Space Life Sciences at MBL and funded by the National Aeronautics and Space Administration under Cooperative Agreement NCC 2-896.

less the same length and have a branching order in good agreement with the 16s rRNA Tree of Life.

The discrepancy and the challenge

Strangely, however, the Doolittle et al. (10) clock meshes not at all well with the known Precambrian fossil record. Because paleobiologic evidence can yield only a minimum date for the emergence of a biologic lineage, the molecular clock would be expected to have yielded ages that are older than those documented by fossils. Yet the Doolittle et al. study concluded that all early evolving lineages originated at younger, not older, ages than those indicated by paleobiology, and the differences are substantial. For example, the amino acid data suggest that cyanobacteria originated about 1500 Ma ago, two billion years later than the time inferred from fossil evidence. The origins of gram-negative bacteria and methane- producing archaeans are placed a billion or more years later than indicated by paleobiology. And the “last common ances- tor” is dated at only -2000 Ma ago, whereas cellular fossils having morphologies indistinguishable from modem microbes are present in rocks nearly 3500 Ma old (4, 5).

Something is amiss. Perhaps the ancient fossils have been misinterpreted and are actually unrelated to present-day lin- eages. Or perhaps, like rRNA tress, the amino acid data yield reliable evidence only of the order of branching of evolution, not the timing of that branching. Whatever the answer, the discrepancies between the molecular clock and evidence from the rock record are too great and too consistent to be ignored. The challenge at hand is to unify these competing lines of evidence. Toward that end, assessment is made here of what now seems known, and what is not, from the available fossil data.

How can the challenge be met?

As Darwin pointed out in On The Origin of Species, there is only one court of last resort by which to decipher the timing

351

“molecular clock” evidence for split between Gram+ and Gram-

Proc. Natl. Acad. Sci. USAVol. 94, pp. 13028–13033, November 1997Evolution

Determining divergence times with a protein clock: Updateand reevaluation

(animals!plants!fungi!eukaryotes!eubacteria!archaebacteria)

DA-FEI FENG, GLEN CHO, AND RUSSELL F. DOOLITTLE*Center for Molecular Genetics, University of California at San Diego, La Jolla, CA 92093-0634

Contributed by Russell F. Doolittle, September 23, 1997

ABSTRACT A recent study of the divergence times of themajor groups of organisms as gauged by amino acid sequencecomparison has been expanded and the data have beenreanalyzed with a distance measure that corrects for bothconstraints on amino acid interchange and variation in sub-stitution rate at different sites. Beyond that, the availability ofcomplete genome sequences for several eubacteria and anarchaebacterium has had a great impact on the interpretationof certain aspects of the data. Thus, the majority of thearchaebacterial sequences are not consistent with currentlyaccepted views of the Tree of Life which cluster the archae-bacteria with eukaryotes. Instead, they are either outliers ormixed in with eubacterial orthologs. The simplest resolutionof the problem is to postulate that many of these sequenceswere carried into eukaryotes by early eubacterial endosym-bionts about 2 billion years ago, only very shortly after or evencoincident with the divergence of eukaryotes and archaebac-teria. The strong resemblances of these same enzymes amongthe major eubacterial groups suggest that the cyanobacteriaand Gram-positive and Gram-negative eubacteria also di-verged at about this same time, whereas the much greaterdifferences between archaebacterial and eubacterial se-quences indicate these two groups may have diverged between3 and 4 billion years ago.

In theory, past biological events can be reconstructed on thebasis of present-day macromolecular sequences. Certainly, theassignment of organisms to various biological groups on thebasis of ribosomal or protein sequences has been largelysuccessful. Attaching absolute time scales to phylogenetic treeshas proved more troublesome, however. As a case in point, werecently reported the results of an analysis of 531 amino acidsequences from 57 different sets of enzymes drawn from 15different biological groups (1). We had aligned the 57 sets ofsequences, determined how similar they were from group togroup, and calculated evolutionary distances based on thosesimilarities. The distance data were calibrated on the basis ofdivergence times drawn from the fossil record, and extrapo-lations were made to estimate the divergence times of moredistantly related groups. The data were also used to computean overall phylogeny for these same groups. The most pro-vocative finding was that the divergence time between eu-karyotes and eubacteria was, after various corrections, onlyslightly more than 2 billion years ago.

The results were harshly criticized by others on a number ofcounts. Some felt that it was unreasonable to extrapolate so farbackwards in time on the basis of, mostly, the vertebrate fossilrecord (G. Olsen, quoted in ref. 2). Others felt that the distancevalue calculations did not take sufficient account of variations

in the rate of change at different amino acid locations (3–5).Concern was also expressed that the data were corrupted bythe presence of sequences imported during the endosymbioticacquisition of organelles (4). A pervading thought in all thecriticism was that the divergence time between eukaryotes andeubacteria had to be greater than 3.5 billion years because ofthe occurrence of microfossils that have been undisputedlydated to 3.45 billion years ago and that reportedly resemblemodern cyanobacteria (6).

In a recent response to these criticisms, we showed that evenwith the use of rigorous methods for correcting for sitevariation, the data yielded divergence times for eubacteria andeukaryotes of 2.5 billion years or less, if it was assumed thatthat the plant–animal divergence was 1 billion years (7). Insupport of this finding, Adkins and Li (8), using a subset of thesame data and an algorithm that takes account of site variation,found a value of 2.2 billion years, again presuming the plant–animal divergence was 1 billion years ago.

We have now completed a comprehensive updating of theenzyme sequence data set, the emphasis being on increasingthe number of sequences from the more under-representedbiological groups, including those used in the fossil record-based calibration. At the same time we employed a method ofcalculation that rigorously corrects for site variation (9).Indeed, the additional data and the improved methods ofcalculation have led to some significant changes. The greatestimpact was the result of having access to the total genomicsequences of several eubacteria and particularly an archae-bacterium (10). In our initial effort (1), archaebacteria wererepresented in only 9 of the enzyme sets, and at the time wenoted there were anomalies among them. In the updated dataset archaebacteria are represented in 34 of the 64 enzyme sets.Remarkably, the majority of phylogenies calculated with thesenewly available sequences are not consistent with currentnotions about the Tree of Life in that the archaebacterialsequences are either outliers or mixed in with eubacterialorthologs. Clearly, the matter of finding divergence times foreukaryotes, archaebacteria, and eubacteria is dependent onthe assumed branching order of these groups.

We have addressed the problem by considering the data invarious sets and subsets, assignment being strictly dependenton the phylogenetic trees generated for each enzyme. In thisregard, the enzyme sequence sets were categorized accordingto the presence or absence of an archaebacterial representa-tive, and, if present, on its position in the sequence-basedphylogeny. These groups were employed judiciously accordingto the particular task at hand. Thus, all 64 sets were used fordetermining divergence times of the various groups of eu-karyotes, but only certain sets were useful for determining thedivergence times of eukaryotes and archaebacteria.

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked ‘‘advertisement’’ inaccordance with 18 U.S.C. §1734 solely to indicate this fact.

© 1997 by The National Academy of Sciences 0027-8424!97!9413028-6$2.00!0PNAS is available online at http:!!www.pnas.org.

Abbreviations: My, millions of years; Mya, milllions of years ago.A commentary on this article begins on page 12751.*To whom reprint requests should be addressed at: Center for

Molecular Genetics, Room 206, University of California at SanDiego, La Jolla, CA 92093-0634. e-mail: [email protected].

13028

Cambrian

Page 21: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 21DTU course 27105 Comparative Genomics

4000 3000 2000 10004500 3500 2500 1500 500

0

mybp

Cyanobacteria

Algae

Plants

Gram-negatives

Prokaryotes

Protozoa

Fungi

Animals

Gram-positives

Veillonella

Archaea

Stromatolites

Multicellular

eukaryotes (?)

Land plants

Oldest fossil recordsoxygen increase

Ediacara biota

Snowball earth

Eukaryotes

?

Page 22: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 22DTU course 27105 Comparative Genomics

LETTERSPUBLISHED ONLINE: 21 AUGUST 2011 | DOI: 10.1038/NGEO1238

Microfossils of sulphur-metabolizing cells in3.4-billion-year-old rocks of Western AustraliaDavidWacey1,2*, Matt R. Kilburn1*, Martin Saunders1, John Cliff1 and Martin D. Brasier3

Sulphur isotope data from early Archaean rocks suggest thatmicrobes with metabolisms based on sulphur existed almost3.5 billion years ago, leading to suggestions that the earliestmicrobial ecosystems were sulphur-based1–5. However, mor-phological evidence for these sulphur-metabolizing bacteriahas been elusive. Here we report the presence of microstruc-tures from the 3.4-billion-year-old Strelley Pool Formation inWestern Australia that are associated with micrometre-sizedpyrite crystals. The microstructures we identify exhibit indica-tors of biological affinity, including hollow cell lumens, carbona-ceous cell walls enriched in nitrogen, taphonomic degradation,organization into chains and clusters, and �13C values of �33to �46h Vienna PeeDee Belemnite (VPDB). We thereforeidentify them as microfossils of spheroidal and ellipsoidalcells and tubular sheaths demonstrating the organization ofmultiple cells. The associated pyrite crystals have ⇥33S valuesbetween �1.65 and +1.43h and �34S values ranging from�12 to +6h Vienna Canyon Diablo Troilite (VCDT)5. Weinterpret the pyrite crystals as the metabolic by-products ofthese cells, which would have employed sulphate-reductionand sulphur-disproportionation pathways. These microfossilsare about 200 million years older than previously described6

microfossils fromPalaeoarchaean siliciclastic environments.Evidence of cellular organization would represent one of the

strongest lines of evidence for a Palaeoarchean biosphere, butthis has been beset with controversy7,8. At present, microbialmats9–11 together with sulphur isotope analysis1–5 provide the bestinsights into Palaeoarcheanmicrobial metabolisms and ecosystems,with evidence reported for phototrophs9–11 plus hydrogen-based10and sulphur-based1–5 metabolisms. However, these reports lackevidence for accompanying cellular morphology. Here, we providesuch evidence in the form of well-preserved cells closely associatedwith pyrite in the basal sandstone member of the �3,400Myr-oldStrelley Pool Formation (SPF),Western Australia.

The SPF crops out across eleven greenstone belts within theEast Pilbara Terrane, spanning a �75Myr hiatus in volcanismbetween the 3,520–3,427Myr-old Warrawoona Group and the3,350–3,315Myr-old Kelly Group12. Our microfossils come fromthe East Strelley greenstone belt (Supplementary Fig. S1), wherethe SPF lies above an unconformity on top of eroded �3,515Myr-old volcanics13. Here, the SPF records a marine transgressionacross one of Earth’s earliest preserved shorelines, with the basalsandstone deposited in a shallow-water beach or estuarine setting14,and the overlying carbonates deposited in a marine carbonateplatform setting11,15. Early silica cements in the sandstonesinclude isopachous phreatic cements (Supplementary Fig. S2),

1Centre for Microscopy, Characterization and Analysis, The University of Western Australia, 35 Stirling Highway, Crawley, Western Australia 6009,Australia, 2School of Earth and Environment, The University of Western Australia, 35 Stirling Highway, Crawley, Western Australia 6009, Australia,3Department of Earth Sciences, University of Oxford, South Parks Road, Oxford OX1 3AN, UK. *e-mail: [email protected]; [email protected].

accompanied near the base by dripstone and meniscus fabrics(Supplementary Fig. S3) formed in the vadose zone, indicatingpartially gas-filled pore spaces (see ref. 16), probably in the photiczone. Associated density concentrates of rounded detrital pyritewith mass-independently fractionated sulphur isotope signaturesshow this gas was low in oxygen5. The microfossils reported hereare slightly older than both the stromatolites11 and microfossils ofunknown metabolic affinity from stratiform chert higher up in theSPF (ref. 17; see Supplementary Table S1 for comparison).

Candidate microfossils (Figs 1, 2) are restricted to stratiformblack sandstone at the base of the member, or to rounded,reworked clasts of this black sandstone found �0.5–2m above(Supplementary Fig. S1). Their syngenicity is constrained by field,petrographic and geochemical data. First-order Raman spectrafrom the microfossils possess disordered ‘D1’ and ordered ‘G’carbon peaks (Supplementary Fig. S4) with D1/G peak heights andareas consistent with thermally mature disordered carbonaceousmaterial that has experienced approximately lower greenschistfacies metamorphism18. High-resolution transmission electronmicroscopy (HRTEM) reinforces the Raman data, revealing mostlydisordered carbon plus small domains where ordered lattice fringeshave the 0.34 nm interplane spacing of graphite (see ref. 19).Together, these data rule out a post-metamorphic origin forthe candidate microfossils. However, as the SPF experiencedgreenschist facies metamorphism over an extended time period20,these data cannot prove a syn-depositional age for themicrofossils.

Instead, syngenicity is confirmed by the spatial occurrence ofthe microfossils and the nature of silica cementation. Microfossilsare restricted either to beds of carbonaceous/pyritic sandstone orto reworked intraclasts eroded from that lithology; they are absentfrom surrounding pale sandstone that lacks carbon and pyrite(Supplementary Fig. S5). Both the bedded black sandstone andthe black clasts have similar successions of quartz cements andsimilar distributions of microfossils within those cements. At leasttwo generations of microfossils are present; an earlier generation,found within the earliest isopachous silica cements, coating sandgrains directly (Fig. 1c,f,i,j); and a later generation found withinremaining pore spaces, in places coexistent with meniscus anddripstone silica cements (Supplementary Fig. S3). The presence ofcemented intraclasts requires those cements to have formed withinreach of erosion close to the sediment–water interface, and the high‘minus-cement’ porosities indicate all inter-granular cements werein place before significant compaction took place21. Given that thesandstonewas buried by�20mof carbonate sediment by 3,350Myrago and�2 km of basalt by 3,325Myr ago (ref. 20), the microfossilsand their enclosing cements must be >3,350Myr in age. No

NATURE GEOSCIENCE | ADVANCE ONLINE PUBLICATION | www.nature.com/naturegeoscience 1

LETTERS NATURE GEOSCIENCE DOI: 10.1038/NGEO1238

Silica cement

Detrital quartz grain

Detrital quartz grain

20 µm

20 µm

20 µm 10 µm

10 µm

10 µm

10 µm

10 µm

50 µm

a

g h

c eb

d

ij

f

10 µm

Figure 1 | Examples of spheroidal/ellipsoidal microfossils from the SPF (samples SP9D2, SPE1, SPV3a–c). a,b,e, Clusters of cells, some showing cell wallrupturing (arrows in a,b), folding or invagination (arrow in e). c,d,h, Chains of cells with cellular divisions (arrows). f,i–j, Cells attached to detrital quartzgrains, exhibiting cell wall rupturing and putative escape of cell contents (arrow in f), preferred alignment of cells parallel to the surface of the quartz grain(arrows in i), and constriction or folding between two compartments (arrow in j). g, Large cellular compartment with folded walls (arrows).

evidence exists for the later introduction of carbonaceous materialbecause the small silica-filled cracks are not lined with carbon.Early silica cementation would have been facilitated by the highsilica content of the Archaean oceans, and potentially enhanced bylocal hot-spring activity (compare with Gunflint biota preservationstyle22). The ultrastructure of themicrofossil walls (Fig. 3b) suggeststhree-dimensional permineralization by nano-grains of silica, andthat small gaps within microfossil walls (Fig. 3a,b) probably aidedthe permeation of silica-rich fluids into the microfossil interiors toprecipitatemicro-quartz (comparewithGunflint biota22; Fig. 3c).

Determining the biogenicity of putative Archaeanmicrofossils isnotoriously difficult23. No single line of evidence provides adequateproof of biological processing, instead, multiple lines of mutuallysupportingmorphological and geochemical evidence are required.

The SPF microfossils exhibit distinct cell-like morphologies.Co-occurring clusters of hollow spheroids (compare with coccoidalcells), chains of hollow ellipsoids (compare with cell filaments;Fig. 1), and hollow cylindrical tubes (compare with sheaths; Fig. 2)are directly comparable with extant and fossil prokaryote assem-blages. The size range is also typical of such assemblages, with smallspheres and ellipsoids 5–25 µm in diameter, rare examples (<10)of larger cellular envelopes up to 80 µm in diameter, and tubes7–20 µm across (see ref. 24). Frequency size distributions from sin-gle petrographic thin sections (Supplementary Fig. S6) have smallstandard deviations and closely resemble those from younger mi-crofossil assemblages (for example, Gunflint Formation)25. In con-trast, abiotic artefacts (Supplementary Fig. S6e) have much largerstandard deviations andmorphotypes that grade into one another25.Examination of the microfossil walls in three dimensions (Supple-mentary Fig. S7a,b) shows their uniform thickness (mean=658 nm;� = 109 nm; n= 32), hollow interior, and partially preserved septaseparating adjacent cellular compartments. This contrasts with theextremely variable thickness and dispersed particulate nature ofnon-cellular carbonaceous coatings of silica grains or botryoids(mean=3,700 nm;� =1,900 nm;n=8; Supplementary Fig. S7c,d).

The SPF microfossils also demonstrate biological behaviour.The candidate fossils are abundant (100+ cells in some thinsections) but have an environmental restriction to carbonaceoussandstones rich in pyrite. They do not occur randomly, but as

clusters and chains (Fig. 1) comparable to extant prokaryote cellcolonies. Putative carbonaceous septa arise from constrictionsbetween adjacent cellular compartments (Fig. 1d,e,j), althoughsimilar features could also be caused by folding of microfossilwalls. Candidate cells often adhere to the detrital quartz grainsof their host sediment (Fig. 1f,i,j), similar to biofilms on sandgrains in modern siliciclastic settings26. The hollow tubes can beclosely packed together (Fig. 2b,d,e),much like the sheaths of extantprokaryotes within a modern biofilm. Finally, the microfossilsexhibit biological taphonomic degradational features, includingsplitting and folding of walls (Figs 1a,b,f,g, 2c) and the escape ofcellular contents (Figs 1a,f, 4d).

Biology-like morphology and behaviour can seldom completelyexclude an origin from abiotic artefacts, because abiotic carbona-ceous coatings of grain margins may mimic cellular morphology25.However, for the SPF microfossils, TEM mapping of ultrathinsections (Fig. 3) reveals carbonaceous walls set within micro-grainsof quartz, with the carbon restricted to distinct domains of a curved,semi-continuous nature that match optical images of cell walls(Figs 1f, 3b). Moreover, the hollow cell interiors contain multiplemicro-grains of quartz (Fig. 3a), but these are never coated withcarbon. Notably, identical three-dimensional microfossil preser-vation is found in the 1,878Myr-old Gunflint Formation22 wherebiogenicity is universally accepted. A Gunflint coccoidal microfossil(Fig. 3c) shares several features with our microfossils: (1) a curveddomain with semi-continuous carbon making up the microfossilwall; (2) blocky micro-quartz without carbon coatings comprisingthe interior of the hollow microfossil; (3) sub-micrometre quartzgrains intermixedwith carbon in themicrofossil wall domain.

The biogenicity of the SPF microfossils is further strengthenedby geochemical signals consistent with altered biological remains.Laser Raman, NanoSIMS and TEM of multiple examples of SPFmicrofossils all confirm a one-to-one correlation between carbona-ceous chemistry and candidate microfossil walls (Figs 3 and 4;Supplementary Fig. S4). The carbon has a thermally mature butdisordered structure (Supplementary Fig. S4), consistent with a bi-ological precursor but inconsistent with abiotic graphite. Nitrogen,a critical biological marker element, co-occurs with carbon withinthe microfossil walls (Supplementary Fig. S8). Sulphur, another

2 NATURE GEOSCIENCE | ADVANCE ONLINE PUBLICATION | www.nature.com/naturegeoscience

Page 23: Introduction to Microbial Genomics - CBS€¦ · Center&for&Biological&Sequence&Analysis Department&of&Systems&Biology Introduction to Microbial Genomics Alignments & sequence information

4 June, 2014 23DTU course 27105 Comparative Genomics