Phylogenies: Preliminaries, Distance-based methods, Parsimony Methods


Phylogenies

Preliminaries

Distance-based methods

Parsimony Methods


CS/BIO 271 - Introduction to Bioinformatics 2

Phylogenetic Trees
• Hypothesis about the relationship between organisms
• Can be rooted or unrooted

[Figure: two example trees on species A, B, C, D and E; the rooted tree is drawn against a Time axis with the Root labeled.]


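Tree proliferation

For n species, the number of rooted trees N_R and the number of unrooted trees N_U are:

$$N_R = \frac{(2n-3)!}{2^{\,n-2}\,(n-2)!}$$

$$N_U = \frac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$$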

Species   Number of Rooted Trees            Number of Unrooted Trees
2         1                                 1
3         3                                 1
4         15                                3
5         105                               15
10        34,459,425                        2,027,025
15        213,458,046,676,875               7,905,853,580,625
20        8,200,794,532,637,891,559,375     221,643,095,476,699,771,875
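The counts in the table can be checked directly against the two formulas. The following is a minimal sketch; the function names `rooted_trees` and `unrooted_trees` are illustrative and not from the slides, and the n = 2 case of the unrooted count is handled separately because the formula only applies for n ≥ 3.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n species (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two species")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees for n species (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two species")
    if n == 2:
        return 1  # a single branch joins the two species
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 10, 15, 20):
        print(f"{n:>2}  {rooted_trees(n):>31,}  {unrooted_trees(n):>28,}")
```

Note that the unrooted count for n species equals the rooted count for n - 1 species, which is why each value in the right-hand column reappears one step later in the sequence of rooted counts.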


Molecular phylogenetics
• Specific genomic sequence variations (alleles) are much more reliable than phenotypic characteristics
• More than one gene should be considered


An ongoing didactic
• Pheneticists tend to prefer distance-based metrics, as they emphasize relationships among data sets rather than the paths they have taken to arrive at their current states.
• Cladists are generally more interested in evolutionary pathways, and tend to prefer more evolutionarily based approaches such as maximum parsimony.


Distance matrix methods

Species   A     B     C     D
B         9     –     –     –
C         8     11    –     –
D         12    15    10    –
E         15    18    13    5


UPGMA
• Similar to average-link clustering
• Merge the closest two groups
  • Replace the distances for the new, merged group with the average of the distances for the previous two groups
• Repeat until all species are joined


UPGMA Step 1

Species   A     B     C     D
B         9     –     –     –
C         8     11    –     –
D         12    15    10    –
E         15    18    13    5

Merge D & E (the closest pair, at distance 5). Tree so far: (D,E)

Species   A      B      C
B         9      –      –
C         8      11     –
DE        13.5   16.5   11.5
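The DE row is the average of the old D and E rows, entry by entry. Written out, with d(X, Y) as an assumed notation for the distance between groups X and Y:

$$
\begin{aligned}
d(DE, A) &= \frac{d(D,A) + d(E,A)}{2} = \frac{12 + 15}{2} = 13.5 \\
d(DE, B) &= \frac{d(D,B) + d(E,B)}{2} = \frac{15 + 18}{2} = 16.5 \\
d(DE, C) &= \frac{d(D,C) + d(E,C)}{2} = \frac{10 + 13}{2} = 11.5
\end{aligned}
$$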


UPGMA Step 2

Species   A      B      C
B         9      –      –
C         8      11     –
DE        13.5   16.5   11.5

Merge A & C (now the closest pair, at distance 8). Tree so far: (A,C) and (D,E)

Species   B      AC
AC        10     –
DE        16.5   12.5


UPGMA Steps 3 & 4

Species   B      AC
AC        10     –
DE        16.5   12.5

Merge B & AC (the closest pair, at distance 10). Tree so far: ((A,C),B) and (D,E)

Merge ABC & DE, the only two groups left.

Final tree: (((A,C),B),(D,E))
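The four merge steps above can be reproduced with a short script. This is a minimal sketch that follows the rule exactly as stated on the UPGMA slide (the new group's distance to any other group is the plain average of the two merged groups' distances); textbook UPGMA weights that average by group size, which gives the same merge order for this matrix. The names `DIST` and `average_link_tree` are illustrative, not from the slides.

```python
from itertools import combinations

# Pairwise distances from the distance-matrix slide.
RAW = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
    ("A", "E"): 15, ("B", "E"): 18, ("C", "E"): 13, ("D", "E"): 5,
}
DIST = {frozenset(pair): d for pair, d in RAW.items()}


def average_link_tree(dist):
    """Repeatedly merge the closest two groups.

    dist maps frozenset({x, y}) to the distance between x and y.
    Returns the final parenthesized tree and the list of merges made.
    """
    d = dict(dist)
    groups = sorted({name for pair in d for name in pair})
    merges = []
    while len(groups) > 1:
        # Pick the pair of current groups with the smallest distance.
        a, b = min(combinations(groups, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        merges.append((a, b, d[frozenset((a, b))]))
        rest = [g for g in groups if g not in (a, b)]
        # Distance from the merged group to every other group:
        # the average of the two old distances.
        for g in rest:
            d[frozenset((merged, g))] = (
                d[frozenset((a, g))] + d[frozenset((b, g))]
            ) / 2
        groups = rest + [merged]
    return groups[0], merges


if __name__ == "__main__":
    tree, merges = average_link_tree(DIST)
    for a, b, distance in merges:
        print(f"merge {a} and {b} at distance {distance}")
    print(tree)
```

The string it returns, ((D,E),(B,(A,C))), lists the groups in a different order from the slide but describes the same tree.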


Parsimony approaches
• Belong to the broader class of character-based methods of phylogenetics
• Emphasize simpler, and thus more likely, evolutionary pathways

I:  GCGGACG
II: GTGGACG

[Figure: two trees relating sequence I (C) and sequence II (T) to an ancestor labeled "(C or T)"; the second tree also shows an intermediate A.]


Informative and uninformative sites

        Position
Seq   1   2   3   4   5   6
1     G   G   G   G   G   G
2     G   G   G   A   G   T
3     G   G   A   T   A   G
4     G   A   T   C   A   T

For positions 5 & 6, it is possible to select more parsimonious trees: those that invoke fewer substitutions.
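One common way to test whether a site is informative is to require at least two different bases that each occur in at least two sequences. The sketch below applies that criterion to the table above; the criterion is the standard one for parsimony rather than something stated on the slide, and the names `SEQS` and `informative_positions` are illustrative.

```python
from collections import Counter

# The four aligned sequences from the table, one string per sequence.
SEQS = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]


def informative_positions(seqs):
    """Return the 1-based positions that are parsimony-informative."""
    positions = []
    for i, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        # Bases that occur in two or more sequences at this position.
        shared = [base for base, k in counts.items() if k >= 2]
        if len(shared) >= 2:
            positions.append(i)
    return positions


if __name__ == "__main__":
    print(informative_positions(SEQS))
```

Positions 1 to 4 fail the test because every tree requires the same number of substitutions there, so they cannot favor one tree over another.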


Parsimony methods
• Enumerate all possible trees
• Note the number of substitution events invoked by each possible tree
  • Can be weighted by transition/transversion probabilities, etc.
• Select the most parsimonious
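For four sequences there are only three unrooted trees, so the enumerate-and-count procedure fits in a few lines. The sketch below counts substitutions with Fitch's algorithm, which the slides do not name; it is used here as one standard way to get the minimum number of unweighted substitutions for a fixed tree. The sequences are the four from the informative-sites table, and `SEQS`, `TREES`, `fitch` and `tree_cost` are illustrative names.

```python
# Sequences from the informative-sites table, keyed by sequence number.
SEQS = {1: "GGGGGG", 2: "GGGAGT", 3: "GGATAG", 4: "GATCAT"}

# The three unrooted trees on four sequences, written as nested pairs.
TREES = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]


def fitch(tree, states):
    """Return (possible ancestral bases, substitutions) for one position.

    tree is a leaf label or a pair of subtrees; states maps leaf -> base.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The subtrees can agree on a base: no new substitution.
        return common, left_cost + right_cost
    # No shared base: one substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_cost(tree, seqs):
    """Total substitutions over all positions for one tree."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in seqs.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    costs = {tree: tree_cost(tree, SEQS) for tree in TREES}
    best = min(costs.values())
    for tree, cost in costs.items():
        marker = "  <- most parsimonious" if cost == best else ""
        print(tree, cost, marker)
```

Weighting by transition/transversion probabilities, as the slide suggests, would replace the fixed cost of 1 per substitution; this sketch leaves every substitution unweighted.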


Branch and Bound methods
• Key problem: the number of possible trees grows enormous as the number of species gets large
• Branch and bound: a technique that allows large numbers of candidate trees to be rapidly disregarded
• Requires a “good guess” at the cost of the best tree


Branch and Bound for TSP
• Find a minimum-cost round-trip path that visits each intermediate city exactly once
• NP-complete
• Greedy approach: A,G,E,F,B,D,C,A = 251

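[Figure: weighted graph on cities A, B, C, D, E, F and G. Edge labels as extracted: 93, 46, 20, 35, 68, 1257, 31, 15, 82, 17, 8259 (some labels are fused); which edge carries which weight is not recoverable.]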


Search all possible paths

[Figure: the same weighted graph on cities A to G as on the previous slide.]

All paths
  AG (20)
    AGF (88)
      AGFB
      AGFE
      AGFC
    AGE (55)
  AB (46)
  AC (93)
    ACB (175)
      ACBE (257): already above the best estimate, so this branch is dropped
    ACD
    ACF

Best estimate: 251
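The search above can be sketched in code: start from a greedy tour as the best estimate, extend partial paths one city at a time, and abandon any partial path whose cost already reaches the best estimate. Because the slide's edge weights cannot be recovered, the 5-city weight matrix `W` below is entirely hypothetical, and `greedy_tour` and `branch_and_bound_tsp` are illustrative names.

```python
def greedy_tour(w, start=0):
    """Nearest-neighbour round trip; returns (cost, tour)."""
    n = len(w)
    tour, cost = [start], 0
    while len(tour) < n:
        here = tour[-1]
        nxt = min(
            (c for c in range(n) if c not in tour),
            key=lambda c: w[here][c],
        )
        cost += w[here][nxt]
        tour.append(nxt)
    return cost + w[tour[-1]][start], tour


def branch_and_bound_tsp(w, start=0):
    """Exact minimum-cost round trip, assuming non-negative weights."""
    n = len(w)
    best_cost, best_tour = greedy_tour(w, start)  # the "good guess"

    def extend(path, cost):
        nonlocal best_cost, best_tour
        if cost >= best_cost:
            return  # bound: this partial path cannot beat the best tour
        if len(path) == n:
            total = cost + w[path[-1]][start]  # close the round trip
            if total < best_cost:
                best_cost, best_tour = total, path
            return
        for city in range(n):
            if city not in path:  # branch on each unvisited city
                extend(path + [city], cost + w[path[-1]][city])

    extend([start], 0)
    return best_cost, best_tour


# Hypothetical symmetric weights for five cities (not the slide's graph).
W = [
    [0, 20, 46, 93, 35],
    [20, 0, 68, 12, 57],
    [46, 68, 0, 82, 31],
    [93, 12, 82, 0, 15],
    [35, 57, 31, 15, 0],
]

if __name__ == "__main__":
    print("greedy:", greedy_tour(W))
    print("branch and bound:", branch_and_bound_tsp(W))
```

The better the initial estimate, the earlier partial paths are cut off, which is why the slides stress starting from a good guess.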


Parsimony – Branch and Bound
• Use the UPGMA tree for an initial best estimate of the minimum cost (most parsimonious) tree
• Use branch and bound to explore all feasible trees
• Replace the best estimate as better trees are found
• Choose the most parsimonious


Parsimony example

        Position
Seq   1   2   3   4   5   6
1     G   G   G   G   G   G
2     G   G   G   A   G   T
3     G   G   A   T   A   G
4     G   A   T   C   A   T

Position 5:

All trees
  (1,2) [0]
  (1,3) [1]
  (1,4) [1]

Etc.