Processing & Testing Phylogenetic Trees

Page 1: Processing & Testing Phylogenetic Trees

Processing & Testing Phylogenetic Trees

Page 2: Processing & Testing Phylogenetic Trees

Rooting

Page 3: Processing & Testing Phylogenetic Trees
Page 4: Processing & Testing Phylogenetic Trees
Page 5: Processing & Testing Phylogenetic Trees
Page 6: Processing & Testing Phylogenetic Trees

Rooting

1. Outgroup Rooting: Based on external information.

2. Midpoint Rooting: Direct a posteriori use of the ultrametricity assumption.

3. Largest-Genetic-Variability-Group Rooting: Indirect a posteriori use of the ultrametricity assumption.

Page 7: Processing & Testing Phylogenetic Trees

Rooting with outgroup

[Figure: an unrooted tree and the corresponding rooted tree of plant and animal taxa with a bacterial outgroup. Labels: root, monophyletic group (plants), monophyletic group (animals).]

Page 8: Processing & Testing Phylogenetic Trees

Midpoint rooting
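A minimal sketch of midpoint rooting, assuming the tree is stored as a dictionary of branch lengths. The tree, its node names, and the helper functions are hypothetical and not taken from the slides. The root is placed halfway along the longest leaf-to-leaf path, which is only justified if rates are roughly equal across lineages.

```python
from itertools import combinations

# Hypothetical unrooted tree: node -> {neighbor: branch length}.
TREE = {
    "A": {"x": 2.0},
    "B": {"x": 3.0},
    "C": {"y": 2.0},
    "D": {"y": 7.0},
    "x": {"A": 2.0, "B": 3.0, "y": 1.0},
    "y": {"C": 2.0, "D": 7.0, "x": 1.0},
}


def path_between(tree, start, goal):
    """Return the unique path from start to goal as a list of nodes."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in tree[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def path_length(tree, path):
    """Sum the branch lengths along a path of adjacent nodes."""
    return sum(tree[a][b] for a, b in zip(path, path[1:]))


def midpoint_root(tree):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (u, v, offset): the root lies on branch u-v, `offset` away from u.
    """
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    longest = max(
        (path_between(tree, a, b) for a, b in combinations(leaves, 2)),
        key=lambda p: path_length(tree, p),
    )
    half = path_length(tree, longest) / 2.0
    walked = 0.0
    for u, v in zip(longest, longest[1:]):
        if walked + tree[u][v] >= half:
            return u, v, half - walked
        walked += tree[u][v]


root_edge = midpoint_root(TREE)
print(root_edge)
```

In this toy tree the longest path runs from B to D (length 11), so the root falls on the branch leading to D, equidistant (5.5) from both ends of the path.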

Page 9: Processing & Testing Phylogenetic Trees

Largest variation = Most ancient

Page 10: Processing & Testing Phylogenetic Trees

Estimating Branch Length

From pairwise distances to branch lengths: maximum likelihood, least squares, etc.
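As a small illustration, the simplest case can be worked by hand: with three taxa joined at one internal node there are three pairwise distances and three branch lengths, so the fit is exact. The function name and the distances below are made up for the example. With more taxa there are more distances than branches, and a criterion such as least squares is needed.

```python
def three_taxon_branch_lengths(d_ab, d_ac, d_bc):
    """Branch lengths a, b, c of the star tree joining taxa A, B and C.

    Solves d_ab = a + b, d_ac = a + c, d_bc = b + c.
    """
    a = (d_ab + d_ac - d_bc) / 2.0
    b = (d_ab + d_bc - d_ac) / 2.0
    c = (d_ac + d_bc - d_ab) / 2.0
    return a, b, c


# Hypothetical pairwise distances (substitutions per site).
a, b, c = three_taxon_branch_lengths(0.30, 0.50, 0.60)
print(a, b, c)
```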

Page 11: Processing & Testing Phylogenetic Trees

Estimating Divergence Times

Page 12: Processing & Testing Phylogenetic Trees

Topological comparisons

Page 13: Processing & Testing Phylogenetic Trees

Penny and Hendy's topological distance (dT)

A commonly used measure of dissimilarity between two tree topologies. The measure is based on tree partitioning.

dT = 2c

c = the number of partitions resulting in different divisions of the OTUs in the two tree topologies under consideration.
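The partition count can be sketched in a few lines, assuming trees are written as nested tuples of OTU names (a representation chosen here for convenience, not one used in the slides). Each internal branch splits the OTUs into two sets. The sketch counts the splits found in one tree but not the other, which for two fully bifurcating trees on the same OTUs equals 2c.

```python
def leaf_set(tree):
    """All OTU names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def partitions(tree):
    """Nontrivial bipartitions of the OTUs induced by internal branches."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        other = all_leaves - clade
        if len(clade) >= 2 and len(other) >= 2:
            # Key each split by the side that lacks the anchor OTU,
            # so the same split always gets the same key.
            found.add(other if anchor in clade else clade)
        for child in node:
            visit(child)

    visit(tree)
    return found


def topological_distance(tree1, tree2):
    """Number of partitions present in exactly one of the two trees."""
    return len(partitions(tree1) ^ partitions(tree2))


# Hypothetical six-OTU trees.
t1 = ((("A", "B"), "C"), ("D", ("E", "F")))
t2 = ((("A", "C"), "B"), ("D", ("E", "F")))
t3 = ((("A", "D"), "B"), ("C", ("E", "F")))
print(topological_distance(t1, t2), topological_distance(t1, t3))
```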

Page 14: Processing & Testing Phylogenetic Trees

Trees inferred from the analysis of a particular data set are called fundamental trees, i.e., they summarize the phylogenetic information in a data set.

Page 15: Processing & Testing Phylogenetic Trees

Sometimes we have many fundamental trees pertaining to the same question. For example, we may have trees derived from different genes for the same taxa, or trees derived through different methods, or different runs in a simulation. In these cases we need to be able to summarize the data.

Page 16: Processing & Testing Phylogenetic Trees

Consensus trees are trees that summarize the phylogenetic information in a set of fundamental trees.

Page 17: Processing & Testing Phylogenetic Trees

In a strict consensus tree, all conflicting branching patterns are collapsed into multifurcations.

In an X% majority-rule consensus tree, a branching pattern that occurs with a frequency of X% or more is adopted.

When X = 100%, the majority-rule consensus tree is identical to the strict consensus tree.
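The two consensus rules can be sketched as a single frequency filter, assuming each fundamental tree is reduced to the set of groups (clades) it contains. The trees and helper names below are hypothetical. Groups kept at a threshold above 50% are always mutually compatible; lower thresholds may need an extra compatibility check that this sketch omits.

```python
from collections import Counter


def group(names):
    """A group of taxa, e.g. group("AB") for the clade (A, B)."""
    return frozenset(names)


def consensus_groups(trees, threshold):
    """Groups that occur in at least `threshold` (a fraction) of the trees."""
    counts = Counter(g for tree in trees for g in tree)
    return {g for g, k in counts.items() if k / len(trees) >= threshold}


# Three hypothetical fundamental trees on taxa A-E.
trees = [
    {group("AB"), group("ABC"), group("DE")},
    {group("AC"), group("ABC"), group("DE")},
    {group("AB"), group("ABC"), group("DE")},
]

strict = consensus_groups(trees, 1.0)    # X = 100%
majority = consensus_groups(trees, 0.5)  # X = 50%
print(sorted("".join(sorted(g)) for g in strict))
print(sorted("".join(sorted(g)) for g in majority))
```

Here the conflict between (A, B) and (A, C) is collapsed in the strict consensus, while the 50% majority-rule consensus keeps (A, B) because it appears in two of the three trees.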

Page 18: Processing & Testing Phylogenetic Trees
Page 19: Processing & Testing Phylogenetic Trees

A tree is an evolutionary hypothesis

Page 20: Processing & Testing Phylogenetic Trees

How do we know that the inferred tree is correct?

Page 21: Processing & Testing Phylogenetic Trees

Joseph H. Camin (1922-1979)

Page 22: Processing & Testing Phylogenetic Trees
Page 23: Processing & Testing Phylogenetic Trees
Page 24: Processing & Testing Phylogenetic Trees

Assessing tree reliability

Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts.

Questions:

(1) How reliable is the tree?
(2) Which parts of the tree are reliable?
(3) Is this tree significantly better than another one?

Page 25: Processing & Testing Phylogenetic Trees

Bootstrapping

• A statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown.

Page 26: Processing & Testing Phylogenetic Trees

Bootstrapping

• Characters are resampled with replacement to create many bootstrap replicate data sets (pseudosamples).

• Each bootstrap replicate data set is analyzed.

• The frequency of occurrence of a group (the bootstrap proportion) is a measure of support for the group.
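The resampling step can be sketched as follows. The alignment is made-up toy data, and the "analysis" is a deliberately crude stand-in (the pair of sequences with the fewest differences) rather than a real tree-building method. Only the column resampling and the counting of group frequencies are meant to mirror the procedure.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment (one string per taxon, equal lengths).
ALIGNMENT = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACCTTG",
    "D": "TCGAACCTTG",
}


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to build a pseudosample."""
    n = len(next(iter(alignment.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in analysis: the pair of taxa with the fewest differences."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    return min(
        combinations(sorted(alignment), 2),
        key=lambda p: hamming(alignment[p[0]], alignment[p[1]]),
    )


def bootstrap_proportions(alignment, replicates=200, seed=1):
    """Fraction of replicates in which each group is recovered."""
    rng = random.Random(seed)
    counts = Counter(
        closest_pair(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return {pair: k / replicates for pair, k in counts.items()}


proportions = bootstrap_proportions(ALIGNMENT)
for pair, freq in sorted(proportions.items()):
    print(pair, freq)
```

In a real analysis `closest_pair` would be replaced by a full phylogenetic reconstruction of each pseudosample, and the proportions would be tallied for every group in the resulting trees.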

Page 27: Processing & Testing Phylogenetic Trees
Page 28: Processing & Testing Phylogenetic Trees

Bootstrapping - an example

Ciliate SSU rDNA - parsimony bootstrap

Partition table

    123456789    Freq (%)
    ---------------------
    .**......    100.00
    ...**....    100.00
    .....**..    100.00
    ...****..    100.00
    ...******     95.50
    .......**     84.33
    ...****.*     11.83
    ...*****.      3.83
    .*******.      2.50
    .**....*.      1.00
    .**.....*      1.00

Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)

[Figure: bootstrap tree of the nine taxa, with branch support values 100, 96, 84, 100, 100, 100.]

Page 29: Processing & Testing Phylogenetic Trees

Reduction of a phylogenetic tree by the collapsing of internal branches associated with bootstrap values that are lower than a critical value (C).

(a) Gene tree for -tubulin (b) C = 50% (c) C = 90%
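The collapsing rule amounts to a threshold filter on bootstrap proportions. As a sketch, the frequencies below are copied from the ciliate SSU rDNA partition table earlier in this deck (not from the tubulin tree in the figure, whose values are not given in the text). The function name is hypothetical.

```python
# Partition -> bootstrap frequency (%), from the ciliate partition table.
PARTITIONS = {
    ".**......": 100.00,
    "...**....": 100.00,
    ".....**..": 100.00,
    "...****..": 100.00,
    "...******": 95.50,
    ".......**": 84.33,
    "...****.*": 11.83,
    "...*****.": 3.83,
    ".*******.": 2.50,
    ".**....*.": 1.00,
    ".**.....*": 1.00,
}


def retained(partitions, critical):
    """Partitions whose support is at least the critical value C (in %).

    All other internal branches would be collapsed into multifurcations.
    """
    return [p for p, freq in partitions.items() if freq >= critical]


kept_50 = retained(PARTITIONS, 50)
kept_90 = retained(PARTITIONS, 90)
print(len(kept_50), len(kept_90))
```

Raising C from 50% to 90% drops the group with 84.33% support, so the tree becomes less resolved but every remaining branch is better supported.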

Page 30: Processing & Testing Phylogenetic Trees

Tests for two competing trees

• All these tests use the null hypothesis that the differences between two trees (A and B) are no greater than expected from the sampling error.

Page 31: Processing & Testing Phylogenetic Trees

Under the null hypothesis the mean of the differences in parsimony steps at each site is expected to be zero.

[Figure: distribution of differences at each site, centered on 0; values on one side favor tree A, values on the other side favor tree B.]

Page 32: Processing & Testing Phylogenetic Trees

Tests for two competing trees

A parametric test for comparing two trees under the assumption that all nucleotide sites are independent and equivalent.

D_i = difference in the minimum number of substitutions between the two trees at the ith informative site.

\[ D = \sum_{i=1}^{n} D_i \]

n = number of informative sites.

V(D) = sample variance of D:

\[ V(D) = \frac{n}{n-1} \sum_{i=1}^{n} \left( D_i - \frac{1}{n} \sum_{k=1}^{n} D_k \right)^2 \]

Page 33: Processing & Testing Phylogenetic Trees

The null hypothesis, D = 0, is tested with the Student paired t-test with n – 1 degrees of freedom:

\[ t = \frac{D/n}{\sqrt{V(D)}/n} = \frac{D}{\sqrt{V(D)}} \]
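A minimal numerical sketch of this test, assuming V(D) is the estimated variance of the total D (that is, n times the sample variance of the per-site differences), in which case the usual paired t statistic reduces to D divided by the square root of V(D). The per-site differences are invented for illustration.

```python
import math


def paired_site_test(site_differences):
    """Return (D, V(D), t, degrees of freedom) for per-site differences D_i.

    Assumption: V(D) = n/(n-1) * sum((D_i - mean)^2), the variance of the sum D.
    """
    n = len(site_differences)
    d_total = sum(site_differences)
    mean = d_total / n
    v_d = n / (n - 1) * sum((d - mean) ** 2 for d in site_differences)
    t = d_total / math.sqrt(v_d)
    return d_total, v_d, t, n - 1


# Hypothetical differences in steps between trees A and B at 8 informative sites.
diffs = [1, 0, -1, 2, 1, 0, 1, 1]
d_total, v_d, t, df = paired_site_test(diffs)
print(d_total, round(v_d, 4), round(t, 4), df)
```

The resulting t would then be compared with the critical value of Student's t distribution with n – 1 degrees of freedom.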

Page 34: Processing & Testing Phylogenetic Trees

Likelihood Ratio Test

• Likelihood of Hypothesis 1 = L1

• Likelihood of Hypothesis 2 = L2

• Test statistic = 2(ln L1 – ln L2)

• Compare the statistic to a χ² distribution or to a simulated distribution.
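The comparison with a χ² distribution can be sketched with the standard library alone. This assumes the two hypotheses are nested and that the degrees of freedom equal the difference in the number of free parameters. The log-likelihood values and function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2:  # odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:       # even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(ln_l1, ln_l2, df):
    """Return the statistic 2(ln L1 - ln L2) and its chi-square p-value."""
    stat = 2.0 * (ln_l1 - ln_l2)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods; hypothesis 1 has one extra free parameter.
stat, p_value = likelihood_ratio_test(-1000.0, -1003.2, df=1)
print(stat, p_value)
```

When the χ² approximation is doubtful, for example when comparing tree topologies, the same statistic can instead be compared with a distribution obtained by simulation, as the slide notes.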

Page 35: Processing & Testing Phylogenetic Trees

Reliability of Phylogenetic Methods

• Phylogenetic methods can also be evaluated in terms of their general performance, particularly their:

  consistency - approach the truth with more data
  efficiency - how quickly can they handle how much data
  robustness - how sensitive to violations of assumptions

• Studies of these properties can be analytical or by simulation.

Page 36: Processing & Testing Phylogenetic Trees

Problems with long branches

With long branches most methods may yield erroneous trees. For example, the maximum-parsimony method tends to cluster long branches together. This phenomenon is called long-branch attraction; the region of branch-length combinations in which it occurs is known as the Felsenstein zone.

Page 37: Processing & Testing Phylogenetic Trees
Page 38: Processing & Testing Phylogenetic Trees
Page 39: Processing & Testing Phylogenetic Trees
Page 40: Processing & Testing Phylogenetic Trees

[Figure: four-taxon trees for A, B, C and D, labeled TRUE TREE and WRONG TREE, with branch lengths p and q, where p >> q.]

Page 41: Processing & Testing Phylogenetic Trees

Chaperonin Maximum Likelihood Tree (Roger et al. 1998. PNAS 95: 229)

Longest branches

Page 42: Processing & Testing Phylogenetic Trees

Trees: Pectinate (a) versus Symmetrical (b)

Page 43: Processing & Testing Phylogenetic Trees

Recommendations

Page 44: Processing & Testing Phylogenetic Trees

Avoid the “Black Box”

• Researchers invest considerable resources in producing molecular sequence data.

• They should also invest the time and effort needed to get the most out of their data.

• Modern phylogenetic software makes it easy to produce trees from aligned sequences, but phylogenetic inference should not be treated as a “black box.”

Page 45: Processing & Testing Phylogenetic Trees

Choices are Unavoidable

• There are many phylogenetic methods.

• Thus, the investigator is confronted with unavoidable choices.

• Not all methods are equally good for all data.

• An understanding of the basic properties of the various phylogenetic methods is essential for informed choice of method and interpretation of results.

Page 46: Processing & Testing Phylogenetic Trees

Data are not Perfect

• Most data include misleading evidence, and we need to have a cautious attitude to the quality of data and trees.

• Data may have both systematic biases and unbiased noise that affect our chances of getting the correct tree.

• Different methods may be more or less sensitive to some problems.

Page 47: Processing & Testing Phylogenetic Trees

Alignment

• The data determine the results.

• The alignment determines the data.

• Be aware of alignment artefacts.

• If using multiple alignment software, explore the sensitivity of the alignment to the parameters used.

• Eliminate regions that cannot be aligned with confidence.

Page 48: Processing & Testing Phylogenetic Trees

Models

• The data should fit the assumptions of the model.

• Explore the data for potential biases and deviations from the assumptions of the model.

Page 49: Processing & Testing Phylogenetic Trees

Choice of Models

• Complex models may better approximate the evolution of the sequences and, therefore, might be expected to give more accurate results.

• More complex models require the estimation of more parameters, each of which is subject to some error.

• There is a trade-off between more realistic and complex models and their power to discriminate between alternative hypotheses.

Page 50: Processing & Testing Phylogenetic Trees

Not all methods are good for all problems.

Page 51: Processing & Testing Phylogenetic Trees