The Genome Access Course Phylogenetic Analysis

Phylogenetics

• Developed by Willi Hennig (Grundzüge einer Theorie der phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

What is the ancestral sequence?

• pfeffer

• pepper

• (pf/p)e(ff/pp)er

Evolutionary Trees

• A tree is a connected, acyclic 2D graph

• Leaf: Taxon

• Node: Vertex

• Branch: Edge

• Tree length = sum of all branch lengths

• Phylogenetic trees are usually binary (bifurcating) trees
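The definitions above can be made concrete with a small sketch. It assumes a rooted binary tree stored as nested tuples of the form (name, branch_length, children); the taxon names and branch lengths are made-up illustration values, not data from the course.

```python
# A tiny rooted binary tree as nested tuples: (name, branch_length, children).
# Names and branch lengths are hypothetical illustration values.
tree = ("root", 0.0, [
    ("n1", 0.5, [
        ("A", 1.0, []),
        ("B", 2.0, []),
    ]),
    ("C", 1.5, []),
])


def tree_length(node):
    """Tree length: the branch above this node plus all branches below it."""
    _name, length, children = node
    return length + sum(tree_length(child) for child in children)


def leaves(node):
    """Collect the taxa (leaves) in left-to-right order."""
    name, _length, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


print(tree_length(tree))  # 5.0
print(leaves(tree))       # ['A', 'B', 'C']
```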

A Generic Tree

Evolutionary Trees

• Rooted

– common ancestor

– unique path to any leaf

– directed

• Unrooted

– root could be placed anywhere

– fewer possible than rooted

Rooted Tree generated by DRAWGRAM (PHYLIP)

Unrooted Tree generated by DRAWTREE (PHYLIP)

Possible Evolutionary Trees

Rooted: $(2n-3)! / (2^{n-2}\,(n-2)!)$

Unrooted: $(2n-5)! / (2^{n-3}\,(n-3)!)$

Taxa (n)   Rooted      Unrooted
2          1           1
3          3           1
4          15          3
5          105         15
6          945         105
7          10395       945
8          135135      10395
9          2027025     135135
10         34459425    2027025
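The two counting formulas can be checked with a few lines of Python. This is a minimal sketch: the function names are our own, and returning 1 for the unrooted case with n = 2 is a convention (the formula itself is undefined there).

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted binary trees on n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted binary trees on n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # by convention: two leaves joined by a single branch
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Note that the number of unrooted trees for n taxa equals the number of rooted trees for n - 1 taxa, which is why the two columns are shifted copies of each other.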

Genes vs. Species

• Sequences show gene relationships, but phylogenetic histories may be different for gene and species

• Genes evolve at different speeds

• Horizontal gene transfer

Methods for Phylogenetic Analysis

• Character-State

– Maximum Parsimony

– Maximum Likelihood

• Genetic Distance

– Fitch & Margoliash

– Neighbor-Joining

– Unweighted Pair Group

Phylogenetic Software

• PHYLIP

• PAUP (Available in GCG)

• TREE-PUZZLE

• PhyloBLAST

• Felsenstein maintains an extensive list of programs on the PHYLIP site

PHYLIP Programs

• dnapars/protpars

• dnadist/protdist

• dnaml (use fastDNAml instead)

• neighbor

• fitch/kitsch

• drawtree/drawgram

Maximum Parsimony

• Most common method

• Allows use of all evolutionary information

• Build and score all possible trees

• Each node is a transformation in a character state

• Minimize tree length

• Best tree requires the fewest changes to derive all sequences
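Scoring a single candidate tree by parsimony can be sketched with Fitch's small-parsimony counting rule. This is an illustration under stated assumptions, not how protpars or dnapars are implemented: the tree is a rooted binary tree written as nested tuples of taxon names, all changes cost the same, and the four sequences are made up for the example.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree   -- a taxon name (leaf) or a (left, right) tuple (internal node)
    states -- dict mapping taxon name to its character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change at this node.
        return common, left_cost + right_cost
    # No shared state: one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site minimum number of changes over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical aligned sequences for four taxa.
alignment = {"A": "AAGT", "B": "AAGT", "C": "ACGC", "D": "ACGC"}

print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```

The first tree needs fewer changes, so it is the more parsimonious of the two. A full search would repeat this scoring over every possible tree, which is why the number of trees in the earlier table matters.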

Which is the more parsimonious tree?

[Figure: two candidate trees, one with 9 node crossings and 3 nodes, the other with 8 node crossings and 3 nodes]

Maximum Likelihood

• Reconstruction using an explicit evolutionary model

• The likelihood of a tree is calculated separately for each nucleotide site. The product of the per-site likelihoods gives the overall likelihood of the observed data.

• Computationally demanding

• Slowest method

• Use to test (or improve) an existing tree
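The idea of multiplying per-site likelihoods can be shown on the smallest possible case: two aligned sequences separated by a single branch. The sketch below assumes the Jukes-Cantor model (equal base frequencies, equal substitution rates), which the slides do not name, and uses a crude grid search in place of a real optimizer; programs such as dnaml work on whole trees and are far more involved.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by distance d
    (expected substitutions per site) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # Per-site likelihood: base frequency (1/4) times transition probability.
        site_likelihood = 0.25 * (p_same if a == b else p_diff)
        # Summing logs is the same as multiplying the per-site likelihoods.
        log_l += math.log(site_likelihood)
    return log_l


# Hypothetical sequences differing at 1 of 10 sites.
seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTTC"

# Grid search over candidate distances 0.001, 0.002, ..., 1.000.
candidates = [i / 1000 for i in range(1, 1001)]
best_d = max(candidates, key=lambda d: jc69_log_likelihood(seq1, seq2, d))
print(best_d)
```

The distance that maximizes the likelihood should sit close to the Jukes-Cantor corrected distance for 10% observed differences, about 0.107.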

Clustering Algorithms

• Use distances to calculate phylogenetic trees

• Trees are based on the relative numbers of similarities and differences between sequences

• A distance matrix is constructed by computing pairwise distances for all sequences

• Clustering links successively more distant taxa

DNA Distances

• Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences

• Can only work for pairs of sequences that are similar enough to be aligned

• All base changes are considered equal

• Insertion/deletions are generally given a larger weight than replacements (gap penalties).

• Possible to correct for multiple substitutions at a single site, which is common in distant relationships and for rapidly evolving sites.
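A simple version of these ideas: count the proportion of differing sites (the p-distance), then correct it for multiple substitutions at the same site. The sketch assumes the Jukes-Cantor correction, one common choice that the slides do not name, and it simply skips gapped columns instead of applying a gap penalty. The sequences are made up.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jukes_cantor(p):
    """Correct an observed proportion p for multiple hits: -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTTC")
print(p)                # 0.1
print(jukes_cantor(p))  # slightly larger than p
```

The corrected distance is always at least as large as the observed one, and the gap between them grows as sequences diverge, which is why the correction matters most for distant relationships.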

Amino Acid Distances

• More difficult to compute

• Substitutions have differing effects on structure

• Some substitutions require more than one DNA mutation

• Use replacement frequencies (PAM, BLOSUM)

Fitch & Margoliash

• 3 sequences are combined at a time to define branches and calculate their length

• Additive branch lengths

• Accurate for short branches
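The three-sequence step can be sketched directly. With three taxa joined at one internal node and additive branch lengths, each pairwise distance is the sum of two branches, so the three branch lengths follow from three equations. The function name and the distances are illustrative assumptions.

```python
def three_taxon_branches(d_ab, d_ac, d_bc):
    """Branch lengths a, b, c of a three-taxon star tree, solved from
    a + b = d_ab, a + c = d_ac, b + c = d_bc."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Hypothetical pairwise distances between sequences A, B and C.
a, b, c = three_taxon_branches(22, 39, 41)
print(a, b, c)  # 10.0 12.0 29.0
```

With more than three sequences, the method treats the remaining sequences as a single averaged group so that the same three-way calculation can be reused at each step.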

Neighbor Joining

• Most common method of tree construction

• Distance matrix adjusted for each taxon depending on its rate of evolution

• Good for simulation studies

• Most efficient computationally
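The rate adjustment is usually written as a Q matrix: Q(i, j) = (n - 2) d(i, j) - sum of row i - sum of row j, and the pair with the smallest Q is joined first. The sketch below covers only that first selection step, not the full algorithm in PHYLIP's neighbor; the five-taxon distance matrix is an illustrative example.

```python
from itertools import combinations


def nj_q_matrix(d):
    """Q values for every pair (i, j) of a symmetric distance matrix d."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    return {
        (i, j): (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        for i, j in combinations(range(n), 2)
    }


def nj_first_join(names, d):
    """Return the pair of taxa neighbor joining would merge first, with its Q."""
    q = nj_q_matrix(d)
    i, j = min(q, key=q.get)
    return names[i], names[j], q[(i, j)]


# Illustrative distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_first_join(names, d))  # ('a', 'b', -50)
```

Here d and e have the smallest raw distance (3), yet a and b are joined first. That is the adjustment at work: a pair is favored when its members are close to each other and far from everything else.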

UPGMA – Unweighted Pair Group Method with Arithmetic Mean

• Simplest method

• Calculates branch lengths between most closely related sequences

• Averages distance to next sequence or cluster

• Predicts a position for the root
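The steps above can be sketched as a short loop: join the closest pair, place the new node at half their distance, and average the distances from the merged cluster to every other cluster, weighted by cluster size. This is a minimal illustration with made-up distances, not the PHYLIP implementation; the returned tree is a nested tuple of taxon names.

```python
from itertools import combinations


def upgma(names, d):
    """Cluster taxa by UPGMA. Returns (nested tuple tree, height of the root)."""
    def key(a, b):
        return (a, b) if a < b else (b, a)

    # cluster id -> (subtree, number of taxa in it)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    dist = {(i, j): d[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)          # closest pair of clusters
        height = dist.pop((i, j)) / 2           # new node sits at half the distance
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        for k in clusters:
            # Size-weighted arithmetic average of distances to the merged cluster.
            d_ik = dist.pop(key(i, k))
            d_jk = dist.pop(key(j, k))
            dist[key(next_id, k)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, height


# Hypothetical clock-like (ultrametric) distances for four taxa.
names = ["A", "B", "C", "D"]
d = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]

tree, root_height = upgma(names, d)
print(tree)         # ('D', ('C', ('A', 'B')))
print(root_height)  # 5.0
```

The last join defines the root, which is how UPGMA predicts a root position. That prediction relies on all lineages evolving at the same rate, an assumption that real data often violate.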

Phylogenetic Complications

• Errors

• Loss of function

• Convergent evolution

• Lateral gene transfer

Validation

• Use several different algorithms and data sets

• NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood

• Bootstrapping

– Perturb data and note effect on tree

– Repeat many times

– If the tree is unchanged in ~90% of replicates, its correctness is supported
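One common way to perturb the data is to resample the alignment columns with replacement and rebuild the tree from each replicate. The sketch below assumes that scheme. Its build_tree argument is a hypothetical callable standing in for any tree-building method; the closest_pair stand-in used here only reports the most similar pair of sequences, and the alignment is made up.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, build_tree, n_replicates=100, seed=0):
    """Fraction of replicates whose result equals the result on the full data."""
    rng = random.Random(seed)
    reference = build_tree(alignment)
    unchanged = sum(
        build_tree(bootstrap_alignment(alignment, rng)) == reference
        for _ in range(n_replicates)
    )
    return unchanged / n_replicates


def closest_pair(alignment):
    """Toy stand-in for a tree builder: the pair of taxa with fewest differences."""
    def differences(pair):
        return sum(x != y for x, y in zip(alignment[pair[0]], alignment[pair[1]]))
    return frozenset(min(combinations(sorted(alignment), 2), key=differences))


# Hypothetical aligned sequences.
alignment = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGAACGT",
    "C": "ACTTACGAACCT",
    "D": "TCTTAGGAACCA",
}

print(bootstrap_support(alignment, closest_pair, n_replicates=200))
```

In practice the comparison is usually made per grouping (clade), not for the whole tree at once, so each branch gets its own support value.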

Are there bugs in our genome?

N-acetylneuraminate lyase

The End
