BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011


Page 1: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

BINF6201/8201

Molecular phylogenetic methods 1

11-01-2011

Page 2: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Phylogenetics

According to evolutionary theory, all life forms on this planet are related to one another by descent.

Traditionally, phylogenetics is the study of the evolutionary relationships of a group of organisms.

The evolutionary relationships of organisms are usually described by means of a phylogenetic tree.

Charles Darwin's tree of life: the first conceptual evolutionary tree of life.

Page 3: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Molecular phylogenetics

Ernst Haeckel's tree of organisms was based on the similarity of the morphological features of organisms.

Ernst Haeckel, Stem-Tree of Organisms, 1866, courtesy Robert J. Richards

With the availability of protein and DNA sequences, phylogenetic trees are now constructed based on these sequences.

The study that uses molecular sequences to deduce the evolutionary relationships among organisms and genes is called molecular phylogenetics.

There are good reasons for using molecular sequence data to study phylogenetics:

1. Sequences evolve in a more regular manner than do other features.

2. Sequences are more amenable to quantitative treatment.

3. Sequence data are more abundant.

Page 4: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Phylogenetic trees

A phylogenetic tree is an acyclic graph (no loops are allowed) in which nodes denote taxonomic units (organisms or molecules), and branches (edges) connecting the nodes denote the relationships of the taxonomic units in terms of descent and ancestry.

The length of an edge usually reflects the number of evolutionary changes or time of divergence between the two taxonomic units.

Phylogenetic trees are usually binary trees: no node has more than three edges connected to it.

If a node has more than one edge connected to it, it is called an internal node; otherwise it is called an external node.

Internal nodes represent ancestral taxonomic units. External nodes represent the extant taxonomic units and are referred to as operational taxonomic units (OTUs).
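The node definitions above can be sketched in a few lines of Python. This is only an illustration: the edge list, the node names and the `classify_nodes` helper are assumptions made for the example, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical unrooted binary tree, given as (node, node, branch_length) edges.
edges = [("A", "F", 2.0), ("B", "F", 1.0), ("F", "G", 1.0),
         ("C", "G", 3.0), ("D", "G", 2.0)]

def classify_nodes(edges):
    """Split nodes into internal (more than one edge) and external (one edge)."""
    degree = defaultdict(int)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    internal = sorted(n for n, deg in degree.items() if deg > 1)
    external = sorted(n for n, deg in degree.items() if deg == 1)
    return internal, external

internal, external = classify_nodes(edges)
print("internal nodes:", internal)
print("external nodes (OTUs):", external)
```

In this toy tree every internal node has exactly three edges, which is the binary-tree condition stated above.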

Page 5: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Phylogenetic trees

Edges that connect to external nodes are called external or peripheral edges, and those that connect internal nodes are called internal edges.

If we know the earliest-branching lineage, which is called an outgroup, we can place a root on the edge connecting the outgroup to the rest of the tree; the tree is then a rooted tree.

If we know or can deduce the times of divergence between organisms, then the OTUs line up on the same level, which represents the current time, and the vertical distance between two nodes represents the time of their divergence.

The root represents the common ancestor of the OTUs. From the root we can find an evolutionary path to each OTU.

The branching pattern of a tree is called its topology. Rotating a tree around its internal nodes does not change its topology.

[Figure: a rooted tree with an outgroup, and the same tree after rotating around the root by 180°]

Page 6: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Phylogenetic trees

A tree is said to be additive if the distance between any two OTUs is equal to the sum of the lengths of all the branches connecting them.

[Figure: example tree with OTUs A, B, C, D and E, further nodes F, G, H and I, and numeric branch lengths on the edges]

For example, if additivity holds, then the distance between A and C is 2 + 1 + 3 + 2 = 8.

The distances between OTUs are computed from the molecular sequences, while the branch lengths are estimated from those distances according to certain rules, so additivity does not necessarily hold for the trees produced by some algorithms.
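Additivity can be illustrated by summing branch lengths along the path between two OTUs. The sketch below assumes a hypothetical topology in which the path from A to C crosses branches of length 2, 1, 3 and 2; the figure's actual topology is not recoverable from the transcript, so the edge list and the `path_length` helper are illustrative only.

```python
from collections import defaultdict

# Hypothetical tree: (node, node, branch_length). The path A-F-G-H-C is an
# assumption chosen so that it reproduces the sum 2 + 1 + 3 + 2.
edges = [("A", "F", 2), ("B", "F", 1), ("F", "G", 1),
         ("G", "H", 3), ("H", "C", 2), ("H", "D", 1), ("G", "E", 6)]

def path_length(edges, start, goal):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    adjacency = defaultdict(list)
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    stack = [(start, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, w in adjacency[node]:
            if neighbor != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((neighbor, node, dist + w))
    raise ValueError("nodes are not connected")

print(path_length(edges, "A", "C"))
```

If the distances computed from the sequences all equal these path sums, the tree is additive for that distance matrix.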

Page 7: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Phylogenetic trees

If we use the number of changes (or evolutionary distance) to scale the edges, the OTUs may not line up on the same level, because the rates of evolution of different taxonomic units may differ. However, if the tree is rooted, we still know the order of branching during the course of evolution.

If the evolutionary rates are the same, then the OTUs will line up level on the same line, and the evolutionary distances can be converted to time of divergence using the same scaling factor. In this case, the tree is ultrametric.

If the branch lengths on a rooted tree are times of divergence, then for any three OTUs, two of the three pairwise distances are equal and at least as long as the third, i.e., the two longest distances are the same. This property of such a tree is called ultrametricity.

An ultrametric tree is also additive, but the reverse is not necessarily true.
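The three-point condition above can be checked directly on a distance matrix. The following is a minimal sketch; the dictionary layout (distances keyed by `frozenset` pairs), the helper names and the two example matrices are assumptions made for illustration.

```python
from itertools import combinations

def make_dist(pairs):
    """Turn {(a, b): distance} into a lookup keyed by unordered pairs."""
    return {frozenset(pair): value for pair, value in pairs.items()}

def is_ultrametric(dist, tol=1e-9):
    """True if, for every three OTUs, the two largest pairwise distances are equal."""
    names = sorted({name for pair in dist for name in pair})
    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted([dist[frozenset((a, b))],
                                     dist[frozenset((a, c))],
                                     dist[frozenset((b, c))]])
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical matrices: the first is ultrametric, the second is additive only.
ultra = make_dist({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6})
additive_only = make_dist({("A", "B"): 3, ("A", "C"): 8, ("B", "C"): 7})
print(is_ultrametric(ultra), is_ultrametric(additive_only))
```

The second matrix fits a tree with unequal branch lengths, so it is additive but fails the ultrametric test.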

Page 8: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Unrooted trees

If we do not know the earliest branch in the tree, the tree is unrooted. An unrooted tree can only specify the relationships among the OTUs; it does not define the evolutionary path of an OTU.

Most phylogenetic tree construction methods produce unrooted trees.

To convert an unrooted tree to a rooted one, we usually include an outgroup for the analysis.

An outgroup can be identified from other information, such as paleontological or morphological evidence. The root is placed on the edge that connects the outgroup to the other OTUs.

[Figure: an unrooted tree, and the corresponding rooted tree with the outgroup marked]

Page 9: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Monophyletic groups and clades

The collection of all the descendants of an ancestor is called a monophyletic group or a clade.

A group of OTUs that does not include all the descendants of a common ancestor is called a paraphyletic group.

[Figure: example tree with OTUs A, B, C, D and E, further nodes F, G, H and I, and numeric branch lengths on the edges]

The OTUs A and B form a monophyletic group; A, B, C, D and E form a monophyletic group; C and D form a monophyletic group.

A and C form a paraphyletic group; A, B and E form a paraphyletic group; B and C form a paraphyletic group, etc.
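A monophyly test follows directly from the definition: a group is monophyletic if it equals the full set of OTUs descending from some node. The sketch below assumes a rooted topology that is consistent with the groups listed above, namely F = (A, B), G = (C, D), H = (F, G) and root I = (H, E); the figure's real topology is not recoverable from the transcript, so treat the `children` table and helper names as hypothetical.

```python
# Hypothetical rooted tree: each internal node maps to its two children.
children = {"I": ["H", "E"], "H": ["F", "G"], "F": ["A", "B"], "G": ["C", "D"]}

def leaves_under(node, children):
    """Set of OTUs that descend from the given node."""
    if node not in children:  # an external node is its own only descendant
        return {node}
    result = set()
    for child in children[node]:
        result |= leaves_under(child, children)
    return result

def is_monophyletic(group, children):
    """True if the group is exactly the set of OTUs below some node."""
    group = set(group)
    nodes = set(children) | {c for kids in children.values() for c in kids}
    return any(leaves_under(node, children) == group for node in nodes)

print(is_monophyletic({"A", "B"}, children))       # a clade in this tree
print(is_monophyletic({"A", "B", "E"}, children))  # omits C and D
```

The function only tests monophyly; it does not distinguish between the different kinds of non-monophyletic groups.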

Page 10: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Gene trees and species trees

When we construct a phylogenetic tree of a group of genes, the tree reflects the evolutionary relationships of the genes, and it is called a gene tree.

The phylogenetic tree that describes the evolutionary relationships of a group of organisms is called a species tree.

When we want to infer the species tree of a group of organisms using molecular sequence data, we pick the gene or genes that are most informative about the evolutionary history of the organisms, and use the constructed gene tree to represent the species tree.

• The first gene trees were based on cytochrome c and hemoglobin sequences.

• The widely accepted species trees of organisms are based on small subunit ribosomal RNA (rRNA) gene sequences, since all organisms have these genes and they tend to evolve slowly.

Page 11: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Methods for phylogenetic tree reconstruction

Numerous tree-construction methods have been proposed, because no method performs well under all circumstances.

Most of these methods depend on a multiple alignment of the sequences.

The gaps in the original multiple alignment are removed, and sometimes manual adjustments are necessary to remove uncertain alignment in the variable regions.

A high quality alignment is necessary for the correct inference of a phylogenetic tree.

[Figure: a multiple alignment of mitochondrial rRNA genes of some mammals]

Page 12: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Methods for phylogenetic tree reconstruction

These alignment-based tree-construction methods can be divided into three categories:

1. Distance matrix methods: the evolutionary distance (number of substitutions per site) between each pair of sequences is computed based on a substitution model of sequence evolution; a tree is then constructed by an algorithm based on some functional relationships among the distance values.

2. Maximum parsimony methods: the tree that requires the smallest number of changes to explain the aligned sequences is identified and selected as the inferred tree.

3. Maximum likelihood methods: likelihood values for possible trees are computed, and the tree that has the maximal likelihood value is selected as the inferred tree.

Page 13: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

Unweighted pair-group method with an arithmetic mean (UPGMA)

This is the simplest method for tree construction. In order for this method to work correctly, we have to assume that the rate of evolution (rate of substitution) is the same for all lineages, so that a linear relation exists between the evolutionary distance and the time of divergence.

After obtaining a multiple alignment, we first compute the evolutionary distance d between every pair of sequences based on a sequence substitution model, e.g., the J-C or K2P model.

[Table: the J-C distance matrix of the small subunit mitochondrial rRNA genes in a group of Catarrhini]
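As a sketch of how one entry of such a matrix could be computed, the code below applies the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), to the proportion p of differing sites between two aligned, gap-free sequences. The function names and the two short sequences are made up for illustration; they are not the data behind the table.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same aligned length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jc_distance(p):
    """Jukes-Cantor distance (substitutions per site) from the p-distance."""
    if p >= 0.75:
        raise ValueError("J-C distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences differing at 1 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
d = jc_distance(p)
print(f"p = {p:.3f}, J-C distance = {d:.4f}")
```

The corrected distance is slightly larger than p because it accounts for multiple substitutions at the same site.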

Page 14: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

The UPGMA algorithm

We have discussed the UPGMA algorithm earlier; here is its adaptation to constructing a phylogenetic tree:

Step 1: Assign each OTU to a distinct cluster.

Step 2: Join the two clusters that have the shortest distance. The length of the branch connecting them equals the distance between the two clusters; put the branching point at the middle of the branch.

Step 3: Recompute the distances between the new cluster and each remaining cluster as the average pairwise distance between their members.

Step 4: Repeat steps 2 and 3 until all OTUs have been linked into a single cluster.

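The average pairwise distance between clusters A and B, which contain $N_A$ and $N_B$ OTUs, is

$$d_{AB} = \frac{1}{N_A N_B} \sum_{i=1}^{N_A} \sum_{j=1}^{N_B} d_{ij}$$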
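The average-distance formula translates almost literally into code. This is a minimal sketch: the `cluster_distance` name, the `frozenset`-keyed distance table and the numbers are assumptions made for the example.

```python
def cluster_distance(cluster_a, cluster_b, d):
    """Average of d[i, j] over all i in cluster_a and j in cluster_b."""
    total = sum(d[frozenset((i, j))] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical pairwise distances between three OTUs.
d = {frozenset(("A", "B")): 2.0,
     frozenset(("A", "C")): 6.0,
     frozenset(("B", "C")): 8.0}

# Distance between the cluster (AB) and the single OTU C: (6 + 8) / 2.
print(cluster_distance(["A", "B"], ["C"], d))
```

Because every OTU pair contributes with equal weight, larger clusters are not down-weighted, which is what "unweighted" refers to in the method's name.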

Page 15: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

The UPGMA algorithm

Let's first look at a toy example:

        A      B      C      D
A              dAB    dAC    dAD
B       dAB           dBC    dBD
C       dAC    dBC           dCD
D       dAD    dBD    dCD

If dAB is the smallest, join A and B, and put a branching point at the middle of the edge.

        (AB)     C        D
(AB)             d(AB)C   d(AB)D
C       d(AB)C            dCD
D       d(AB)D   dCD

$$d_{(AB)C} = \tfrac{1}{2}(d_{AC} + d_{BC}), \qquad d_{(AB)D} = \tfrac{1}{2}(d_{AD} + d_{BD})$$

If d(AB)C is the smallest, join (AB) and C, and put a branching point at the middle of the edge.

[Figure: A and B joined at height dAB/2; C joined to (AB) at height d(AB)C/2]

Page 16: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

The UPGMA algorithm

        (AB)     C        D
(AB)             d(AB)C   d(AB)D
C       d(AB)C            dCD
D       d(AB)D   dCD

If d(AB)C is the smallest, join (AB) and C, and put a branching point at the middle of the edge.

[Figure: C joined to (AB) at height d(AB)C/2]

$$d_{(ABC)D} = \tfrac{1}{3}(d_{AD} + d_{BD} + d_{CD})$$

         (ABC)     D
(ABC)              d(ABC)D
D        d(ABC)D

Join (ABC) and D, and put a root at the middle of the edge.

[Figure: the final tree of A, B, C and D, with the root at height d(ABC)D/2]
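The toy example can be traced with a compact UPGMA sketch. It is written for clarity rather than speed: instead of updating a reduced matrix, it recomputes each cluster-to-cluster average from the original pairwise distances, which gives the same values as the formulas above. The function name, the data layout and the four-OTU matrix are assumptions made for illustration.

```python
from itertools import combinations

def upgma(names, d):
    """Return the list of merges as (members, node_height) pairs.

    names: OTU labels; d: dict mapping frozenset({a, b}) to a distance.
    """
    def average(ca, cb):
        total = sum(d[frozenset((i, j))] for i in ca for j in cb)
        return total / (len(ca) * len(cb))

    clusters = [(name,) for name in names]  # step 1: one cluster per OTU
    merges = []
    while len(clusters) > 1:
        # Step 2: join the two closest clusters; the node sits at half their distance.
        ca, cb = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        height = average(ca, cb) / 2
        clusters.remove(ca)
        clusters.remove(cb)
        merged = tuple(sorted(ca + cb))
        merges.append((merged, height))
        # Step 3 is implicit: average() is evaluated on the new cluster next round.
        clusters.append(merged)
    return merges

# Hypothetical ultrametric distances for OTUs A, B, C and D.
d = {frozenset(pair): value for pair, value in {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}.items()}

for members, height in upgma(["A", "B", "C", "D"], d):
    print(members, height)
```

With these distances the merges follow the order of the slides: A with B, then (AB) with C, then (ABC) with D, with the last merge giving the root.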

Page 17: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

The UPGMA algorithm

The tree constructed by the UPGMA method is rooted: the root is placed at the midpoint of the edge joining the last two clusters.

The UPGMA tree is ultrametric, i.e., given any three OTUs, the two longest distances among them are the same.

A distance matrix is said to be ultrametric if an ultrametric tree can be constructed such that the distance between any two OTUs is the same as that specified in the matrix.

If the distance matrix is ultrametric, then UPGMA is guaranteed to produce the correct tree, i.e., $d^{\text{tree}}_{ij} = d_{ij}$.

However, in reality ultrametricity may not hold, as sequences may evolve at different rates.

Even if the sequences evolve at the same rate, the distances among them will be only approximately ultrametric, because of the random nature of nucleotide substitutions.

When the distance matrix is far from ultrametric, the resulting tree can be very different from the true tree, and $d^{\text{tree}}_{ij} \neq d_{ij}$.

Page 18: BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

The UPGMA algorithm

A real-world example: mitochondrial rRNA genes of a group of closely related Catarrhini.

The algorithm first joins chimpanzee and pygmy chimpanzee, then adds human, and so on.

The tree is ultrametric and hence additive, but the matrix is only approximately ultrametric.

dAB = 0.0865 × 2 = 0.173

[Figure: UPGMA tree of the Catarrhini sequences, with two OTUs labeled A and B]