Upload
gerodi
View
31
Download
0
Embed Size (px)
DESCRIPTION
MCB 372 #12: Tree, Quartets and Supermatrix Approaches. J. Peter Gogarten University of Connecticut Dept. of Molecular and Cell Biology. Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre (UConn) Greg Fournier (UConn). - PowerPoint PPT Presentation
Citation preview
MCB 372 #12: Tree, Quartets and Supermatrix Approaches
Collaborators:
Olga Zhaxybayeva (Dalhousie)
Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre (UConn) Greg Fournier (UConn)
Funded through the NASA Exobiology and AISR Programs, and NSF Microbial Genetics
Edvard Munch, The Dance of Life (1900)
J. Peter Gogarten
University of ConnecticutDept. of Molecular and Cell Biology
In the Felsenstein Zone“long branches attract”
0.8
0.1
0.8
0.10.1
C
B
D
A
A B
C D“true” tree
inferred treeA
B
C
D
0
10
20
30
40
50
60
70
80
90
100
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Long Branch Length (substitutions per site)
Reconstructions
LBR
Correct
LBA
Protpars reconstructions
B
A C
D
D
A C
B
ML reconstructions with alignment step
((A:x,B:0.1):0.1,C:0.1,D:x) With Alignment
0
10
20
30
40
50
60
70
80
90
100
0.1 5.1 10.1 15.1 20.1 25.1 30.1 35.1 40.1 45.1
LBA LBR Tie (Non-original) Tie (Original) Original
long branch attraction artifact
100% bootstrap support for bipartition (AD)(CB)
the two longest branches join together
What could you do to investigate if this is a possible explanation? use only slow positions, use an algorithm that corrects for ASRV
seq. from B
seq. from A
seq. from Cseq. from D
seq. from B
seq. from Aseq. from C
seq. from D
True Tree:
Consensus of all trees from all bootstrap samples
2 P_mobilis3 Thermosipho1 F_nodosum4 T_lettingae5 T_maritima7 T_RQ26 T_petrophila
+----------------------------------2 P_mobilis | | +------6 T_petrophila | +134.0-| | +675.0-| +------7 T_RQ2 | | | | +-60.0-| +-------------5 T_maritima | | | +------| +--------------------4 T_lettingae | | +------3 Thermosipho +--------------206.0-| +------1 F_nodosum
Consensus of all consensus trees
2 P_mobilis3 Thermosipho1 F_nodosum4 T_lettingae5 T_maritima7 T_RQ26 T_petrophila
+------6 T_petrophila +318.0-| +745.0-| +------7T_RQ2 | | +330.0-| +-------------5T_maritima | | +------| +--------------------4 T_lettingae | | | | +------3 Thermosipho | +--------------562.0-| | +------1 F_nodosum | +----------------------------------2 P_mobilis
Consensus of all collapsed (<95%) consensus trees
+----------------------------------2 P_mobilis | | +------6 T_petrophila | +134.0-| | +675.0-| +------7 T_RQ2 | | | | +-60.0-| +-------------5 T_maritima | | | +------| +--------------------4 T_lettingae | | +------3 Thermosipho +--------------206.0-| +------1 F_nodosum
If you still have difficulties with tree do the tree tests at
http://www.sciencemag.org/cgi/content/full/310/5750/979/DC1
METAGENOME
Welch et al, 2002
A.W
.F. E
dwar
ds 1
998
Edw
ards
-Ve
nn c
ogw
hee
l
core
Strain-specific
Pan-genome+ +
+
Genomic Islands
Binnewies, Motro et al., Funct. Integr. Genomics (2006) 6: 165–185
Gene frequency in a typical genome
-Pick a random gene from any of the 293 genomes
-Search in how many genomes this gene is present
-Sampling of 15,000 genes
F(x) = sum [ An*exp(Kn*x)]
(Character genes)(Accessory pool) (Extended Core)
Kézdy-Swinbourne Plot If f(x)=K+A • exp(-k•x), then
f(x+∆x)=K+A • exp(-k•(x+∆x)).
Through elimination of A:
f(x+∆x)=exp(-k • ∆x) • f(x) + K’
And for x, f(x)K, f(x+∆x)K
0
50
100
150
200
250
300
350
400
450
500
-100 0 100 200 300 400 500 600
delta x = 10
2030405060708090100110120130
Novel genes after looking in x genomes
No
vel g
en
es
aft
er
loo
kin
g in
x +
∆x
ge
no
me
s
only values with x ≥ 80 genomes were included
Even after comparing to a very large (infinite) number of bacterial genomes, on average, each new genome will contain about 230 genes that do not have a homolog in the other genomes.
~230 novel genes
per genome
ca.70%
ca. 5%
ca. 25%
Gene frequency in individual genomes
Extended Core
Character Genes
Accessory Pool(mainly genes acquired form the mobilome)
Approximate number of genes sampled in 200 bacterial genomes:
25,160 core genes453,781 extended core genes156,259 accessory genes
The Phylogenetic Position of Thermotoga
(a) concordant genes, (b) all genes & according to 16S (c) according to phylogenetically discordant genes
Gophna, U., Doolittle, W.F. & Charlebois, R.L.:
Weighted genome trees: refinements and applications. J. Bacteriol. (2005)
From
: D
elsuc F, B
rinkmann H
, Philippe H
.P
hylogenomics and the reconstruction of the tree of life.
Nat R
ev Genet. 2005 M
ay;6(5):361-75.
QuickTime™ and a decompressor
are needed to see this picture.
Supertree vs. Supermatrix
Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree.
QuickTime™ and a decompressor
are needed to see this picture.
From
: A
lan de Queiroz John G
atesy: T
he supermatrix approach to system
aticsT
rends Ecol E
vol. 2007 Jan;22(1):34-41
A) Template tree
B) Generate 100 datasets using Evolver with certain amount of HGTs
C) Calculate 1 tree using the concatenated dataset or 100 individual trees
D) Calculate Quartet based tree using Quartet Suite Repeated 100 times…
Supermatrix versus Quartet based Supertree
inset: simulated phylogeny