Building Phylogenies Parsimony 1



Methods

• Distance-based
• Parsimony
• Maximum likelihood


Note

• Some of the following figures come from:
  – [S05] Swofford http://www.csit.fsu.edu/~swofford/bioinformatics_spring05
  – [F05] Felsenstein http://evolution.gs.washington.edu/gs541/2005/


Parsimony methods

• Goal: Find the tree that allows evolution of the sequences with the fewest changes.

• This is called a most parsimonious (MP) tree

• Parsimony is implemented in PAUP* http://paup.csit.fsu.edu/

• Compatibility methods are closely related to parsimony:
  – Goal: Find the tree that perfectly fits the most characters.


Evolutionary Steps

[Figure: tree with nodes labeled G and A, illustrating evolutionary steps (changes of state along branches)]

Steps can have weights


Parsimony

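Characters a–f (rows) scored for taxa A–D (columns):

     A  B  C  D
a    0  1  1  1
b    0  1  1  1
c    0  0  1  1
d    0  1  1  0
e    0  0  0  1
f    1  0  0  0

[Figure: tree with tips A, B, C, D; character changes are marked on its branches with the labels f; a, b; d, c; e, d]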

Typically, each site is treated separately


Some numbers

Number of unrooted trees on n > 2 species:

$$U_n = (2n-5)(2n-7)(2n-9) \cdots (3)(1)$$

Number of rooted trees on n ≥ 3 species:

$$R_n = (2n-3)\, U_n$$
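
To get a feel for how quickly these counts grow, here is a minimal Python sketch. It assumes bifurcating trees with labeled tips and uses the relation R_n = (2n − 3) U_n; the function names are illustrative and not taken from any package.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees on n >= 3 labeled species."""
    if n < 3:
        raise ValueError("the product formula is used here for n >= 3")
    count = 1
    # Multiply the odd factors (2n-5), (2n-7), ..., 3, 1.
    for factor in range(2 * n - 5, 0, -2):
        count *= factor
    return count


def num_rooted_trees(n):
    """Number of rooted bifurcating trees: one extra factor (2n-3)."""
    return (2 * n - 3) * num_unrooted_trees(n)


for n in (3, 4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Already at n = 10 there are about two million unrooted trees, which is why exhaustive search is only practical for small data sets.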


The number of rooted trees

[F05]


Small versus Large Parsimony

• Parsimony score of a tree: The smallest (weighted) number of steps required by the tree

• (Large) Parsimony: Find the tree with the lowest parsimony score

• Small Parsimony: Given a tree, find its parsimony score

• Small parsimony is by far the easier problem.
  – Used to solve large parsimony


A DNA data set

[F05]


An example tree

[F05]


Most parsimonious states for site 1


Most parsimonious states for site 2


Most parsimonious states for site 3


Most parsimonious states for sites 4 and 5


Most parsimonious states for site 6


Evolutionary steps on tree

Only one choice of reconstruction at each site is shown.

9 steps in all.


Algorithms for Small Parsimony

• Fitch’s algorithm:
  – Based on set operations
  – Evolutionary steps have the same weight

• Sankoff’s algorithm:
  – Based on dynamic programming
  – Allows steps to have different weights

• Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site.


Fitch’s Algorithm

• Each node v in the tree has a set X(v)

• If v is a leaf (tip), X(v) is the nucleotide observed at v
  – If there is ambiguity, X(v) contains all possible nucleotides at v

• If v is a node with descendants u and w,
  – Let Y = X(u) ∩ X(w)
  – If Y ≠ ∅, make X(v) = Y
  – If Y = ∅, make X(v) = X(u) ∪ X(w) and count one step.
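
The set rules above can be sketched in a few lines of Python. This is a minimal illustration for a single site, assuming a rooted binary tree written as nested tuples with leaf names as strings; the tree, the taxon names t1 to t5, and the observed states are hypothetical and are not the example from [F05].

```python
def fitch(tree, leaf_states):
    """Return (X(v), steps) for the node at the root of `tree`.

    `tree` is a leaf name (str) or a pair (left_subtree, right_subtree).
    `leaf_states` maps each leaf name to a string of possible nucleotides,
    e.g. "C" for an observed C or "AG" for an ambiguous purine.
    """
    if isinstance(tree, str):
        # Leaf: X(v) holds the observed (or all possible) nucleotides.
        return set(leaf_states[tree]), 0
    left, right = tree
    x_u, steps_u = fitch(left, leaf_states)
    x_w, steps_w = fitch(right, leaf_states)
    y = x_u & x_w                      # Y = X(u) ∩ X(w)
    if y:                              # Y non-empty: X(v) = Y, no extra step
        return y, steps_u + steps_w
    # Y empty: X(v) = X(u) ∪ X(w), and one step is counted
    return x_u | x_w, steps_u + steps_w + 1


# Hypothetical five-tip tree and states at one site.
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
observed = {"t1": "C", "t2": "A", "t3": "C", "t4": "A", "t5": "G"}
root_set, steps = fitch(tree, observed)
print(sorted(root_set), steps)

# Same site, but the state at t5 is ambiguous (A or G).
ambiguous = dict(observed, t5="AG")
print(fitch(tree, ambiguous))
```

Summing the step counts over all sites gives the (unweighted) parsimony score of the tree.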


Fitch’s Algorithm: Example

[F05]


Sankoff’s Algorithm

• Let $c_{ij}$ be the cost of going from state i to state j.

• E.g., transitions (A↔G or C↔T) are more probable than transversions, so give lower weight to transitions

• Let $S_v(k)$ be the smallest (weighted) number of steps needed to evolve the subtree at or above node v, given that node v is in state k.


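Sankoff’s Algorithm

• If v is a leaf (tip):

$$S_v(k) = \begin{cases} 0 & \text{if node } v \text{ has (or could have) state } k \\ \infty & \text{otherwise} \end{cases}$$

• If v is a node with descendants u and w:

$$S_v(k) = \min_i \left[ c_{ki} + S_u(i) \right] + \min_j \left[ c_{kj} + S_w(j) \right]$$

• The minimum number of (weighted) steps is:

$$S^* = \min_k S_{\text{root}}(k)$$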
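
The recurrence can be sketched as a short recursive function. This is a minimal illustration for one site, not a reference implementation: the cost values (1 for a transition, 2 for a transversion), the nested-tuple tree, and the taxon names t1 to t5 are all assumptions made for the example.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def cost(i, j):
    """Assumed weights c_ij: 0 for no change, 1 transition, 2 transversion."""
    if i == j:
        return 0
    is_transition = (i in PURINES) == (j in PURINES)
    return 1 if is_transition else 2


def sankoff(tree, leaf_states):
    """Return the table {k: S_v(k)} for the node at the root of `tree`."""
    if isinstance(tree, str):
        # Leaf: 0 for states the tip has (or could have), infinity otherwise.
        allowed = set(leaf_states[tree])
        return {k: 0 if k in allowed else math.inf for k in STATES}
    s_u = sankoff(tree[0], leaf_states)
    s_w = sankoff(tree[1], leaf_states)
    # Internal node: best state in each descendant, chosen independently.
    return {
        k: min(cost(k, i) + s_u[i] for i in STATES)
        + min(cost(k, j) + s_w[j] for j in STATES)
        for k in STATES
    }


# Hypothetical five-tip tree and states at one site.
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
observed = {"t1": "C", "t2": "A", "t3": "C", "t4": "A", "t5": "G"}
root_table = sankoff(tree, observed)
s_star = min(root_table.values())      # S* = min over k of S_root(k)
print(root_table, s_star)
```

Replacing `cost` with a function that returns 1 for every change makes the result agree with the step count from Fitch's algorithm on the same tree and site.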


Sankoff’s Algorithm: Example


Sankoff’s Algorithm: Traceback


Searching for an MP tree

• Exhaustive search (exact)
• Branch-and-bound search (exact)
• Heuristic search methods
  – Stepwise addition
  – Branch swapping
  – Star decomposition
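
Of the strategies listed above, exhaustive search is the simplest to sketch: score every tree and keep the best. The following minimal Python example assumes unit step weights (scored with Fitch's set rules) and a hypothetical four-species alignment; trees are nested tuples and all names are illustrative.

```python
def fitch_steps(tree, states):
    """Return (state set, steps) for one site on a rooted binary tree."""
    if isinstance(tree, str):                      # leaf: observed nucleotide
        return {states[tree]}, 0
    (x_u, n_u), (x_w, n_w) = (fitch_steps(child, states) for child in tree)
    common = x_u & x_w
    if common:
        return common, n_u + n_w
    return x_u | x_w, n_u + n_w + 1


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)                             # attach above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """Enumerate all rooted bifurcating trees by adding one taxon at a time."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in insertions(tree, leaf)]
    return trees


def tree_score(tree, alignment):
    """Parsimony score: sum of per-site step counts."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {t: seq[s] for t, seq in alignment.items()})[1]
        for s in range(n_sites)
    )


# Hypothetical alignment: sites 1-2 group sp1 with sp2, site 4 conflicts.
alignment = {"sp1": "AACG", "sp2": "AACT", "sp3": "GGCG", "sp4": "GGCT"}
trees = all_rooted_trees(list(alignment))
scores = {tree: tree_score(tree, alignment) for tree in trees}
best_score = min(scores.values())
best_trees = [tree for tree, score in scores.items() if score == best_score]
print(len(trees), "trees searched; best score:", best_score)
for tree in best_trees:
    print(tree)
```

With unit weights the score does not depend on where the root is placed, so enumerating rooted trees visits each unrooted topology several times. That redundancy keeps the sketch short; branch-and-bound and the heuristic methods exist precisely because this enumeration grows too fast for more than a handful of species.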


Homology, orthology, and paralogy

• Homology: Similarity attributed to descent from a common ancestor.

• Orthologous sequences: Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.

• Paralogous sequences: Homologous sequences within a single species that arose by gene duplication.


Orthology and Paralogy

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html