Phylogenies
Preliminaries
Distance-based methods
Parsimony Methods
CS/BIO 271 - Introduction to Bioinformatics 2
Phylogenetic Trees
Hypothesis about the relationship between organisms
Can be rooted or unrooted
[Figure: a rooted tree of species A through E with a root and a time axis, alongside an unrooted tree of the same species]
Tree proliferation
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
Species Number of Rooted Trees Number of Unrooted Trees
2 1 1
3 3 1
4 15 3
5 105 15
10 34,459,425 2,027,025
15 213,458,046,676,875 7,905,853,580,625
20 8,200,794,532,637,891,559,375 221,643,095,476,699,771,875
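
The two counting formulas are easy to evaluate directly. The following is a minimal sketch; the function names `rooted_trees` and `unrooted_trees` are illustrative, and the guard for fewer than three species is an assumption added so that the unrooted formula is never asked for the factorial of a negative number.

```python
from math import factorial

def rooted_trees(n):
    """Number of rooted binary trees on n >= 2 labeled species."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def unrooted_trees(n):
    """Number of unrooted binary trees on n labeled species."""
    if n < 3:
        return 1  # assumption: the closed form needs n >= 3; two species give one tree
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

# Print species count, rooted count, unrooted count.
for n in (2, 3, 4, 5, 10, 15, 20):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Integer division is exact here because both quotients are products of odd numbers, so no floating-point rounding is involved even for 20 species.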
Molecular phylogenetics
Specific genomic sequence variations (alleles) are much more reliable than phenotypic characteristics
More than one gene should be considered
An ongoing didactic
Pheneticists tend to prefer distance-based metrics, as they emphasize relationships among data sets rather than the paths they have taken to arrive at their current states.
Cladists are generally more interested in evolutionary pathways and tend to prefer more evolutionarily based approaches such as maximum parsimony.
Distance matrix methods
Species A B C D
B 9 – – –
C 8 11 – –
D 12 15 10 –
E 15 18 13 5
UPGMA
Similar to average-link clustering
Merge the closest two groups
• Replace the distances for the new, merged group with the average of the distances for the previous two groups
Repeat until all species are joined
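
The merge-and-average loop above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `upgma_simple_average` and the input layout (a dictionary keyed by species pairs) are assumptions, and the update rule is the plain average of the two merged groups' distances, exactly as the slide states it. Many formal definitions of UPGMA instead weight that average by the number of species in each group, which can give different values once groups of unequal size are merged.

```python
from itertools import combinations

def upgma_simple_average(names, distances):
    """Cluster `names` by repeatedly merging the closest pair of groups.

    `distances` maps (x, y) pairs to a distance. Returns a parenthesized
    tree string and the list of merges as (group_a, group_b, distance).
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    clusters = list(names)
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current groups with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        merges.append((a, b, d[frozenset((a, b))]))
        clusters.remove(a)
        clusters.remove(b)
        # Distance from the new group to every remaining group:
        # the average of the two merged groups' distances (the slide's rule).
        for other in clusters:
            d[frozenset((merged, other))] = (
                d[frozenset((a, other))] + d[frozenset((b, other))]
            ) / 2
        clusters.append(merged)
    return clusters[0], merges

# Distance matrix for species A through E from the slides.
distances = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
    ("A", "E"): 15, ("B", "E"): 18, ("C", "E"): 13, ("D", "E"): 5,
}
tree, merges = upgma_simple_average("ABCDE", distances)
print(tree)
for a, b, dist in merges:
    print(f"merge {a} and {b} at distance {dist}")
```

Building the group label as a parenthesized string keeps the bookkeeping short and yields the tree directly; a fuller version would also track branch lengths.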
UPGMA Step 1
Species A B C D
B 9 – – –
C 8 11 – –
D 12 15 10 –
E 15 18 13 5
Merge D & E (the closest pair, distance 5)
Species A B C
B 9 – –
C 8 11 –
DE 13.5 16.5 11.5
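
As a worked check of the averaging rule, the DE row can be reproduced from the D and E rows of the original matrix. The notation $d(X,Y)$ for the distance between groups X and Y is an assumption introduced here for illustration.

$$d(DE,A) = \frac{d(D,A) + d(E,A)}{2} = \frac{12 + 15}{2} = 13.5$$
$$d(DE,B) = \frac{15 + 18}{2} = 16.5, \qquad d(DE,C) = \frac{10 + 13}{2} = 11.5$$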
UPGMA Step 2
Merge A & C (the closest remaining pair, distance 8)
Species A B C
B 9 – –
C 8 11 –
DE 13.5 16.5 11.5
Species B AC
AC 10 –
DE 16.5 12.5
UPGMA Steps 3 & 4
Merge B & AC (the closest remaining pair, distance 10)
Species B AC
AC 10 –
DE 16.5 12.5
Merge ABC & DE
Resulting tree: (((A,C),B),(D,E))
Parsimony approaches
Belong to the broader class of character-based methods of phylogenetics
Emphasize simpler, and thus more likely, evolutionary pathways
I: GCGGACG
II: GTGGACG
[Figure: two candidate pathways from an ancestral base (C or T) to the C in sequence I and the T in sequence II; the second pathway passes through an intermediate A]
Informative and uninformative sites
Position
Seq 1 2 3 4 5 6
1 G G G G G G
2 G G G A G T
3 G G A T A G
4 G A T C A T
Positions 5 & 6 are informative: for these it is possible to select more parsimonious trees, those that invoke fewer substitutions. Positions 1 through 4 are uninformative, since every tree requires the same number of substitutions there.
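
Whether a site is informative can be checked mechanically. The sketch below assumes the usual criterion that a site is informative when at least two different bases each occur in at least two sequences; the function name `informative_positions` is illustrative.

```python
from collections import Counter

def informative_positions(seqs):
    """Return the 1-based positions at which at least two different
    bases each appear in at least two of the aligned sequences."""
    result = []
    for position, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        shared_bases = sum(1 for count in counts.values() if count >= 2)
        if shared_bases >= 2:
            result.append(position)
    return result

# The four aligned sequences from the table above.
seqs = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]
print(informative_positions(seqs))  # [5, 6]
```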
Parsimony methods
Enumerate all possible trees
Note the number of substitution events invoked by each possible tree
• Can be weighted by transition/transversion probabilities, etc.
Select the most parsimonious
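
For four sequences there are only three unrooted trees, so the enumerate-and-count procedure can be written out in full. The sketch below is one possible way to do the counting, not necessarily the method intended by the slides: it uses Fitch's algorithm (an assumption, since the slides do not name a counting method), gives every substitution the same weight, and writes each unrooted tree as a pair of nested tuples. The sequences are the four from the informative-sites table.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one site.

    `tree` is a leaf name or a pair of subtrees; `states` maps leaf
    names to bases. Returns (possible ancestral bases, substitutions).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared base: one more substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1

def tree_cost(tree, seqs):
    """Total substitutions over all sites (unweighted)."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )

seqs = {"1": "GGGGGG", "2": "GGGAGT", "3": "GGATAG", "4": "GATCAT"}

# The three possible unrooted trees for four sequences.
topologies = {
    "((1,2),(3,4))": (("1", "2"), ("3", "4")),
    "((1,3),(2,4))": (("1", "3"), ("2", "4")),
    "((1,4),(2,3))": (("1", "4"), ("2", "3")),
}
costs = {label: tree_cost(tree, seqs) for label, tree in topologies.items()}
for label, cost in costs.items():
    print(label, cost)
```

With these sequences the first two trees tie at 9 substitutions and the third needs 10; the difference comes entirely from positions 5 and 6.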
Branch and Bound methods
Key problem – the number of possible trees grows enormously as the number of species gets large
Branch and bound – a technique that allows large numbers of candidate trees to be rapidly disregarded
Requires a “good guess” at the cost of the best tree
Branch and Bound for TSP
Find a minimum cost round-trip path that visits each intermediate city exactly once
NP-hard (the decision version is NP-complete)
Greedy approach: A,G,E,F,B,D,C,A = 251
[Figure: weighted graph connecting the seven cities A through G]
Search all possible paths
[Figure: the weighted graph of cities A through G, shown twice]
All paths
  AG (20)
    AGF (88): AGFB, AGFE, AGFC
    AGE (55)
  AB (46)
  AC (93)
    ACB (175)
      ACBE (257)
    ACD
    ACF
Best estimate: 251 (the partial path ACBE already costs 257, so that branch can be abandoned)
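
The pruning idea can be sketched as follows. This is a minimal illustration under stated assumptions: the 5-city distance matrix is hypothetical (it is not the graph from the slides), the function names are illustrative, and the greedy nearest-neighbour tour supplies the initial "good guess". Any partial path whose cost already reaches the best known tour cost is dropped without being extended.

```python
from itertools import permutations

def greedy_tour_cost(dist, start=0):
    """Cost of the nearest-neighbour round trip from `start`."""
    unvisited = set(range(len(dist))) - {start}
    cost, here = 0, start
    while unvisited:
        nxt = min(unvisited, key=lambda city: dist[here][city])
        cost += dist[here][nxt]
        unvisited.remove(nxt)
        here = nxt
    return cost + dist[here][start]

def branch_and_bound_tsp(dist, start=0):
    """Cheapest round trip, pruning partial paths that cannot win."""
    n = len(dist)
    best_cost = greedy_tour_cost(dist, start)  # initial best estimate

    def extend(here, visited, cost_so_far):
        nonlocal best_cost
        if cost_so_far >= best_cost:
            return  # bound: this branch cannot beat the best estimate
        if len(visited) == n:
            best_cost = min(best_cost, cost_so_far + dist[here][start])
            return
        for nxt in range(n):
            if nxt not in visited:
                extend(nxt, visited | {nxt}, cost_so_far + dist[here][nxt])

    extend(start, {start}, 0)
    return best_cost

def brute_force_tsp(dist, start=0):
    """Cheapest round trip by trying every ordering (for comparison)."""
    others = [city for city in range(len(dist)) if city != start]
    return min(
        sum(dist[a][b] for a, b in zip((start, *order), (*order, start)))
        for order in permutations(others)
    )

# Hypothetical symmetric distances between five cities (illustrative only).
dist = [
    [0, 12, 10, 19, 8],
    [12, 0, 3, 7, 2],
    [10, 3, 0, 6, 20],
    [19, 7, 6, 0, 4],
    [8, 2, 20, 4, 0],
]
greedy = greedy_tour_cost(dist)
optimal = branch_and_bound_tsp(dist)
print(greedy, optimal, brute_force_tsp(dist))
```

The pruning test relies on edge weights being non-negative, so that extending a path never lowers its cost.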
Parsimony – Branch and Bound
Use the UPGMA tree for an initial best estimate of the minimum cost (most parsimonious) tree
Use branch and bound to explore all feasible trees
Replace the best estimate as better trees are found
Choose the most parsimonious
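
A sketch of how these steps might fit together is given below. Several choices are assumptions made for illustration rather than details from the slides: substitutions are counted with Fitch's algorithm, trees are grown by attaching one sequence at a time to every edge of a partial tree, and the starting tree passed in as `start_tree` stands in for a UPGMA result (the tree used in the example is hypothetical). The bound works because attaching further sequences to a partial tree can never reduce its substitution count.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one site: (possible bases, substitutions)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_cost(tree, seqs):
    """Total substitutions over all sites for the leaves in `tree`."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )

def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # the edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for grown in insertions(left, leaf):
            yield (grown, right)
        for grown in insertions(right, leaf):
            yield (left, grown)

def parsimony_branch_and_bound(seqs, start_tree):
    """Return (lowest cost, trees achieving it); needs >= 3 sequences."""
    names = list(seqs)
    best_cost = tree_cost(start_tree, seqs)  # initial best estimate
    best_trees = []

    def grow(tree, remaining):
        nonlocal best_cost, best_trees
        cost = tree_cost(tree, seqs)
        if cost > best_cost:
            return  # bound: no completion of this partial tree can win
        if not remaining:
            if cost < best_cost:
                best_cost, best_trees = cost, []
            best_trees.append(tree)
            return
        # The first sequence acts as a fixed outgroup; attach the next
        # sequence to every edge of the rest of the tree.
        for grown in insertions(tree[1], remaining[0]):
            grow((tree[0], grown), remaining[1:])

    grow((names[0], (names[1], names[2])), names[3:])
    return best_cost, best_trees

seqs = {"1": "GGGGGG", "2": "GGGAGT", "3": "GGATAG", "4": "GATCAT"}
start_tree = (("1", "2"), ("3", "4"))  # hypothetical stand-in for a UPGMA tree
best_cost, best_trees = parsimony_branch_and_bound(seqs, start_tree)
print(best_cost, best_trees)
```

With only four sequences the pruning removes a single candidate, but the same bound discards whole families of trees as the number of sequences grows.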
Parsimony example
Position
Seq 1 2 3 4 5 6
1 G G G G G G
2 G G G A G T
3 G G A T A G
4 G A T C A T
Position 5:
All trees
(1,2) [0] (1,3) [1] (1,4) [1]
Etc.