The Genome Access Course Phylogenetic Analysis

Page 1: The Genome Access Course Phylogenetic Analysis

The Genome Access Course

Phylogenetic Analysis

Page 2: The Genome Access Course Phylogenetic Analysis

Phylogenetics

• Developed by Willi Hennig (Grundzüge einer Theorie der phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

Page 3: The Genome Access Course Phylogenetic Analysis

What is the ancestral sequence?

• pfeffer

• pepper

• (pf/p)e(ff/pp)er

Page 4: The Genome Access Course Phylogenetic Analysis

Evolutionary Trees

• A tree is a connected, acyclic graph

• Leaf: Taxon

• Node: Vertex

• Branch: Edge

• Tree length = sum of all branch lengths

• Phylogenetic trees are usually treated as binary (bifurcating) trees
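
The terms above can be illustrated with a minimal sketch. It assumes a rooted binary tree stored as nested tuples of (child, branch length) pairs; this representation, the taxon names, and the branch lengths are invented for the example and do not come from the course.

```python
# A node is either a leaf name (str) or a tuple of (child, branch_length) pairs.
# Hypothetical example tree in Newick notation: ((A:0.1,B:0.2):0.05,C:0.3)

def tree_length(node):
    """Sum of all branch lengths below `node`."""
    if isinstance(node, str):          # a leaf has no branches beneath it
        return 0.0
    return sum(length + tree_length(child) for child, length in node)

def leaves(node):
    """Taxon names at the tips of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child, _ in node for name in leaves(child)]

inner = (("A", 0.1), ("B", 0.2))       # internal node joining A and B
tree = ((inner, 0.05), ("C", 0.3))     # root joining that node and C

print(leaves(tree))
print(round(tree_length(tree), 3))
```

Each internal node here has exactly two children, which is what makes the tree binary.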

Page 5: The Genome Access Course Phylogenetic Analysis

A Generic Tree

Page 6: The Genome Access Course Phylogenetic Analysis

Evolutionary Trees

• Rooted

– common ancestor

– unique path to any leaf

– directed

• Unrooted

– root could be placed anywhere

– fewer possible than rooted

Page 7: The Genome Access Course Phylogenetic Analysis

Rooted Tree generated by DRAWGRAM (PHYLIP)

Page 8: The Genome Access Course Phylogenetic Analysis

Unrooted Tree generated by DRAWTREE (PHYLIP)

Page 9: The Genome Access Course Phylogenetic Analysis

Possible Evolutionary Trees

Rooted trees: $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$

Unrooted trees: $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

Taxa (n)   Rooted      Unrooted
2          1           1
3          3           1
4          15          3
5          105         15
6          945         105
7          10395       945
8          135135      10395
9          2027025     135135
10         34459425    2027025
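
The two counting formulas can be evaluated directly. The sketch below is one way to do so; the function names are made up for this example, and the value 1 for unrooted trees with two taxa is taken as a convention, since the formula itself is undefined there.

```python
from math import factorial

def rooted_trees(n):
    """Number of rooted binary trees: (2n-3)! / (2^(n-2) * (n-2)!), n >= 2."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def unrooted_trees(n):
    """Number of unrooted binary trees: (2n-5)! / (2^(n-3) * (n-3)!), n >= 3."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:                 # convention: a single branch joins two taxa
        return 1
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

The count of unrooted trees for n taxa equals the count of rooted trees for n-1 taxa, and both grow so quickly that exhaustive search is only practical for a handful of taxa.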

Page 10: The Genome Access Course Phylogenetic Analysis

Genes vs. Species

• Sequences show gene relationships, but phylogenetic histories may be different for gene and species

• Genes evolve at different speeds

• Horizontal gene transfer

Page 11: The Genome Access Course Phylogenetic Analysis

Methods for Phylogenetic Analysis

• Character-State

– Maximum Parsimony

– Maximum Likelihood

• Genetic Distance

– Fitch & Margoliash

– Neighbor-Joining

– Unweighted Pair Group

Page 12: The Genome Access Course Phylogenetic Analysis

Phylogenetic Software

• PHYLIP

• PAUP (Available in GCG)

• TREE-PUZZLE

• PhyloBLAST

• Felsenstein maintains an extensive list of programs on the PHYLIP site

Page 13: The Genome Access Course Phylogenetic Analysis

PHYLIP Programs

• dnapars/protpars

• dnadist/protdist

• dnaml (use fastDNAml instead)

• neighbor

• fitch/kitsch

• drawtree/drawgram

Page 14: The Genome Access Course Phylogenetic Analysis

Maximum Parsimony

• Most common method

• Allows use of all evolutionary information

• Build and score all possible trees

• Each node is a transformation in a character state

• Minimize tree length

• Best tree requires the fewest changes to derive all sequences
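
Scoring a candidate tree by its number of changes can be sketched with Fitch's small-parsimony algorithm. The slides do not name a scoring algorithm, so Fitch's method is an assumption here, and the alignment and the two candidate topologies are invented for illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a leaf name or a pair (left, right); `states` maps leaf -> base.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:                               # children agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1    # children disagree: one change

def parsimony_score(tree, alignment):
    """Total number of changes the tree needs across all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total

# Invented four-taxon alignment and two candidate rooted topologies.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree1, alignment))  # 3 changes
print(parsimony_score(tree2, alignment))  # 4 changes
```

Under this criterion tree1 is preferred because it explains the same sequences with fewer changes. A full parsimony search would repeat this scoring over every candidate topology.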

Page 15: The Genome Access Course Phylogenetic Analysis

Which is the more parsimonious tree?

[Figure: two candidate trees, each with 3 nodes; one requires 9 node crossings, the other 8]

Page 16: The Genome Access Course Phylogenetic Analysis

Maximum Likelihood

• Reconstruction using an explicit evolutionary model

• The likelihood of a tree is calculated separately for each nucleotide site. The product of the per-site likelihoods gives the overall likelihood of the observed data.

• Demanding computationally

• Slowest method

• Use to test (or improve) an existing tree
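
The idea of multiplying per-site likelihoods can be shown in the simplest possible case: two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor substitution model, which the slides do not specify, and uses invented sequences. Real programs apply the same principle to full trees with far more machinery.

```python
from math import exp, log

def site_likelihood(x, y, t):
    """Jukes-Cantor likelihood of seeing bases x and y at one site.

    `t` is the branch length in expected substitutions per site.
    """
    e = exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e      # probability a base is unchanged after t
    p_diff = 0.25 - 0.25 * e      # probability of one specific other base
    return 0.25 * (p_same if x == y else p_diff)   # 0.25 = base frequency

def log_likelihood(seq1, seq2, t):
    """Log of the product of per-site likelihoods (a sum of logs)."""
    return sum(log(site_likelihood(x, y, t)) for x, y in zip(seq1, seq2))

# Invented aligned sequences that differ at 2 of 10 sites.
seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTTT"

# Crude grid search for the branch length that maximizes the likelihood.
grid = [i / 1000 for i in range(1, 2001)]
best_t = max(grid, key=lambda t: log_likelihood(seq1, seq2, t))
print(best_t)
```

Logs are summed rather than multiplying raw likelihoods, because a product over many sites quickly underflows floating-point precision.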

Page 17: The Genome Access Course Phylogenetic Analysis

Clustering Algorithms

• Use distances to calculate phylogenetic trees

• Trees are based on the relative numbers of similarities and differences between sequences

• A distance matrix is constructed by computing pairwise distances for all sequences

• Clustering links successively more distant taxa
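
Building the distance matrix can be sketched as follows. The example assumes the sequences are already aligned to equal length and uses the plain count of differing positions as the distance; the taxon names and sequences are invented.

```python
def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(seqs):
    """Nested dict d[a][b] of pairwise difference counts."""
    return {a: {b: count_differences(sa, sb) for b, sb in seqs.items()}
            for a, sa in seqs.items()}

# Invented toy alignment.
seqs = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTTT",
    "taxon3": "ACGAACGTTT",
    "taxon4": "TCGAACCTTT",
}

d = distance_matrix(seqs)
for name, row in d.items():
    print(name, [row[other] for other in seqs])
```

A clustering method then works only from `d`; the sequences themselves are no longer consulted.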

Page 18: The Genome Access Course Phylogenetic Analysis

DNA Distances

• Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences

• Can only work for pairs of sequences that are similar enough to be aligned

• All base changes are considered equal

• Insertion/deletions are generally given a larger weight than replacements (gap penalties).

• Possible to correct for multiple substitutions at a single site, which are common in distant relationships and at rapidly evolving sites.
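
A correction for multiple substitutions can be sketched with the Jukes-Cantor formula, which is one common choice; the slides do not say which correction the course used. As a simplification, columns containing a gap are skipped here rather than weighted, and the sequences are invented.

```python
from math import log

def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)

# Invented aligned sequences that differ at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTACGTTT")
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as sequences diverge, because more substitutions are hidden by later changes at the same site.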

Page 19: The Genome Access Course Phylogenetic Analysis

Amino Acid Distances

• More difficult to compute

• Substitutions have differing effects on structure

• Some substitutions require more than one DNA mutation

• Use replacement frequencies (PAM, BLOSUM)

Page 20: The Genome Access Course Phylogenetic Analysis

Fitch & Margoliash

• 3 sequences are combined at a time to define branches and calculate their length

• Additive branch lengths

• Accurate for short branches
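
The three-sequence step rests on additivity: if taxa A, B and C meet at one internal node with branch lengths a, b and c, then d(A,B) = a + b, d(A,C) = a + c and d(B,C) = b + c, which can be solved for the three lengths. The sketch below shows only that calculation, with invented distances; it is not the full Fitch & Margoliash procedure.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Branch lengths (a, b, c) from the internal node to taxa A, B and C."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c

# Invented pairwise distances.
a, b, c = three_point_branches(d_ab=22, d_ac=39, d_bc=41)
print(a, b, c)            # 10.0 12.0 29.0
print(a + b, a + c, b + c)  # recovers 22.0 39.0 41.0
```

With more than three sequences, the method treats the remaining taxa as a single composite group and repeats this calculation.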

Page 21: The Genome Access Course Phylogenetic Analysis

Neighbor Joining

• Most common method of tree construction

• Distance matrix adjusted for each taxon depending on its rate of evolution

• Good for simulation studies

• Most efficient computationally
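
The rate adjustment can be sketched as a single neighbor-joining step. It uses the standard criterion Q(i,j) = (n-2) d(i,j) - r(i) - r(j), where r(i) is the sum of distances from taxon i to all others. The five-taxon matrix is an invented example and the function name is hypothetical.

```python
from itertools import combinations

def nj_pick_pair(names, d):
    """One neighbor-joining step: choose the pair to join and its branch lengths."""
    n = len(names)
    r = {i: sum(d[i][k] for k in names) for i in names}   # total distance per taxon
    q = {(i, j): (n - 2) * d[i][j] - r[i] - r[j]
         for i, j in combinations(names, 2)}
    i, j = min(q, key=q.get)                              # smallest Q is joined
    length_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return (i, j), (length_i, length_j), q

names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
d = {x: dict(zip(names, row)) for x, row in zip(names, rows)}

pair, lengths, q = nj_pick_pair(names, d)
print(pair, lengths)      # ('a', 'b') (2.0, 3.0)
```

Taxa d and e have the smallest raw distance (3), yet a and b are joined first because the adjustment accounts for how far each taxon is from everything else. The full algorithm replaces the joined pair with a new node, recomputes distances, and repeats.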

Page 22: The Genome Access Course Phylogenetic Analysis

UPGMA – Unweighted Pair Group Method Using Arithmetic Averages

• Simplest method

• Calculates branch lengths between most closely related sequences

• Averages distance to next sequence or cluster

• Predicts a position for the root
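
The steps above can be sketched in a compact UPGMA routine. Clusters are represented as nested tuples and distances as a dictionary keyed by unordered pairs; both choices, and the example distances, are assumptions made for this illustration.

```python
def upgma(dist):
    """dist: {(a, b): distance} for every unordered pair of taxon names.

    Returns (root, height), where root is a nested tuple and height maps
    each cluster to its distance above the tips.
    """
    d = {frozenset(pair): v for pair, v in dist.items()}
    size = {name: 1 for pair in dist for name in pair}
    height = {name: 0.0 for name in size}
    while len(size) > 1:
        pair = min(d, key=d.get)              # most closely related clusters
        a, b = sorted(pair, key=str)          # fixed order for reproducibility
        new = (a, b)
        height[new] = d.pop(pair) / 2         # node sits halfway between them
        for c in list(size):
            if c == a or c == b:
                continue
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            # size-weighted average = plain average over all member taxa
            d[frozenset((new, c))] = (size[a] * d_ac + size[b] * d_bc) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
    (root,) = size
    return root, height

# Invented pairwise distances for five taxa.
dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
root, height = upgma(dist)
print(root)
print(height[root])
```

The last cluster formed is the root, which is how UPGMA predicts a root position. That prediction relies on the assumption that all lineages evolve at the same rate.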

Page 23: The Genome Access Course Phylogenetic Analysis

Phylogenetic Complications

• Errors

• Loss of function

• Convergent evolution

• Lateral gene transfer

Page 24: The Genome Access Course Phylogenetic Analysis

Validation

• Use several different algorithms and data sets

• NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood

• Bootstrapping

– Perturb data and note effect on tree

– Repeat many times

– If the tree is unchanged in ~90% of replicates, its correctness is supported
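
Bootstrapping can be sketched as resampling alignment columns with replacement and counting how often a grouping recurs. To stay short, the example below uses "which two taxa are closest" as a stand-in for building a full tree per replicate; that shortcut, the function names, and the alignment are all assumptions for illustration.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n = len(next(iter(alignment.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def closest_pair(alignment):
    """Pair of taxa with the fewest differences (stand-in for tree building)."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: sum(
        x != y for x, y in zip(alignment[p[0]], alignment[p[1]])))

def bootstrap_support(alignment, group, replicates=200, seed=0):
    """Fraction of replicates in which `group` is recovered."""
    rng = random.Random(seed)              # seeded so runs are repeatable
    hits = sum(closest_pair(resample_columns(alignment, rng)) == group
               for _ in range(replicates))
    return hits / replicates

# Invented toy alignment.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTT",
    "C": "ACGAACGTTT",
    "D": "TCGAACCTTT",
}
print(bootstrap_support(alignment, ("B", "C")))
```

In practice each replicate would be passed to the same tree-building method used on the original data, and support would be reported per branch.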

Page 25: The Genome Access Course Phylogenetic Analysis

Are there bugs in our genome?

N-acetylneuraminate lyase

Page 26: The Genome Access Course Phylogenetic Analysis

The End