Phylogenetics

Phylogenetics. Phylogenetic Trees time NODE BRANCH ROOT Operational Taxonomic Unit (OTU) Hypothetical Taxonomic Unit


Phylogenetic Trees

[Figure: a phylogenetic tree drawn along a time axis, with labels ROOT, NODE, BRANCH, Operational Taxonomic Unit (OTU) and Hypothetical Taxonomic Unit.]

Information

• Branching order (topology)

– Relative closeness of different taxa

• Branch length

– Amount of divergence

Rooted and unrooted trees

[Figure: taxa A, B, C, D and E drawn as a ROOTED tree and as an UNROOTED tree.]

[Figure: all possible UNROOTED and ROOTED tree topologies for 3 OTUs (A, B, C) and for 4 OTUs (A, B, C, D); there are 15 rooted trees of 4 OTUs.]
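The counts in the figure follow a standard closed form that the slides do not state: for n OTUs there are (2n-3)!! rooted and (2n-5)!! unrooted bifurcating topologies. The sketch below is a minimal illustration of that formula; the function names are chosen here and are not from the slides.

```python
def double_factorial_odd(k: int) -> int:
    """Product k * (k - 2) * ... * 3 * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted_trees(n_otus: int) -> int:
    """Rooted bifurcating topologies for n_otus >= 2: (2n - 3)!!"""
    return double_factorial_odd(2 * n_otus - 3)


def count_unrooted_trees(n_otus: int) -> int:
    """Unrooted bifurcating topologies for n_otus >= 3: (2n - 5)!!"""
    return double_factorial_odd(2 * n_otus - 5)


# Columns: number of OTUs, unrooted trees, rooted trees.
for n in (3, 4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

For 4 OTUs this gives 3 unrooted and 15 rooted trees, matching the figure. The rapid growth with n is why exhaustive tree searches become impractical for many taxa.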


Monophyletic & Paraphyletic

[Figure: tree of Mammals, Turtles and tortoises, Snakes and lizards, Crocodiles, and Birds, with the group REPTILES marked.]

• Monophyletic

– Natural clade: a common ancestor together with all of the taxa derived from it

• Paraphyletic

– Taxonomic group that leaves out some descendants of its most recent common ancestor, so that ancestor is shared by another taxon (e.g. reptiles without birds)


Reconstruct phylogeny from molecular data

[Figure: five DNA sequences, each shown as ACTGTTACCGA, joined by an unknown tree marked "?".]

Types of phylogenetic analysis methods

• Phenetic: trees are constructed based on observed characteristics, not on evolutionary history

– Distance methods

• Cladistic: trees are constructed based on fitting observed characteristics to some model of evolutionary history

– Parsimony and Maximum Likelihood methods

Methods of Tree reconstruction

• Distance

• Maximum Parsimony

• Maximum Likelihood

• Bayesian

Reference: Phylogeny Estimation: Traditional and Bayesian Approaches. Nature Reviews Genetics (2003) 4:275

Genetic distance

• Distance from one sequence to another

• Hamming Distance

– Count number of differences

• Multiple hits

– Number of events is greater than number of differences

– Estimate number of events

• Infer tree from genetic distance using Neighbour-joining (NJ) method
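A minimal sketch of the two ideas above: the Hamming distance counts observed differences, and a multiple-hits correction estimates the number of events. The slides do not name a correction, so the Jukes-Cantor formula is used here as an assumption. The function names are illustrative, and the two example sequences are sequences a and b from the "Informative sites" slide.

```python
import math


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites (Hamming distance per site)."""
    return hamming_distance(seq_a, seq_b) / len(seq_a)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimated substitutions per site, correcting for multiple hits.

    Assumption: the Jukes-Cantor model (all substitutions equally likely).
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: the Jukes-Cantor correction is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


seq_a = "AAGAGTTCA"
seq_b = "AGCCGTTCT"
print(hamming_distance(seq_a, seq_b))       # observed differences
print(p_distance(seq_a, seq_b))             # differences per site
print(jukes_cantor_distance(seq_a, seq_b))  # estimated events per site
```

The corrected distance is never smaller than the observed proportion of differences, which reflects the point that some sites have changed more than once.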

UPGMA is shown for illustrative purposes. Neighbour-joining is the preferred method.

• The algorithm in the text means: find the closest distance between two sequences and cluster those; then find the next closest distance and cluster those; as sequences are added to existing clusters, find the average distance between existing clusters

• Work through the notation!

• UPGMA assumes a molecular clock mechanism of evolution
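The clustering loop described above can be sketched as follows. This is a minimal illustration, not the algorithm "in the text" that the slide refers to. The distance table is a hypothetical toy example, cluster labels are built as nested strings, and node heights are taken as half the merge distance, which is what the molecular clock assumption implies.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster by average distance.

    dist maps frozenset({a, b}) -> distance between leaves a and b.
    Returns the merges in order as (cluster_label, node_height) pairs.
    """
    size = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    merges = []
    while len(size) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(size), 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2.0
        new = f"({a},{b})"
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the distances from its two parts.
        for k in size:
            if k not in (a, b):
                d[frozenset((new, k))] = (
                    size[a] * d[frozenset((a, k))] + size[b] * d[frozenset((b, k))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        merges.append((new, height))
    return merges


# Hypothetical toy distances, not data from the slides.
table = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
dist = {frozenset(pair): value for pair, value in table.items()}
for label, height in upgma(["A", "B", "C", "D"], dist):
    print(label, height)
```

The toy distances are clock-like, so UPGMA recovers the nesting ((A,B),C),D. When rates differ between lineages, the closest pair need not be true neighbours, which is the weakness neighbour-joining addresses.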


• Neighbor-joining: corrects for UPGMA’s assumption of the same rate of evolution for each branch by modifying the distance matrix to reflect different rates of change.

• The net difference between sequence i and all other sequences is

• $r_i = \sum_k d_{ik}$

• The rate-corrected distance matrix is then

• $M_{ij} = d_{ij} - (r_i + r_j)/(n - 2)$, where n is the number of sequences

• Join the two sequences i and j whose $M_{ij}$ is minimal; then calculate the distance from this new node k to every other sequence m using

• $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$

• Again correct for rates and join nodes.
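The three formulas above can be turned into a short loop. The sketch below recovers only the joining order (topology) and skips branch lengths. The distance table is a hypothetical example built from a tree in which B and D sit on long branches, and the function and variable names are illustrative.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the joins, in order, made by neighbour-joining.

    dist maps frozenset({a, b}) -> distance between sequences a and b.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r_i: net difference between node i and all other nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}

        def rate_corrected(pair):
            # M_ij = d_ij - (r_i + r_j) / (n - 2)
            i, j = pair
            return d[frozenset(pair)] - (r[i] + r[j]) / (n - 2)

        i, j = min(combinations(nodes, 2), key=rate_corrected)
        new = f"({i},{j})"
        # d_km = (d_im + d_jm - d_ij) / 2 for every remaining node m.
        for m in nodes:
            if m not in (i, j):
                d[frozenset((new, m))] = (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - d[frozenset((i, j))]
                ) / 2.0
        nodes = [m for m in nodes if m not in (i, j)] + [new]
        joins.append(new)
    joins.append(f"({nodes[0]},{nodes[1]})")
    return joins


# Hypothetical distances from a tree where A,B and C,D are neighbours
# but B and D have evolved much faster than A and C.
table = {
    ("A", "B"): 6.0, ("A", "C"): 3.0, ("A", "D"): 7.0,
    ("B", "C"): 7.0, ("B", "D"): 11.0, ("C", "D"): 6.0,
}
dist = {frozenset(pair): value for pair, value in table.items()}
print(neighbor_joining(["A", "B", "C", "D"], dist))
```

In this example the smallest raw distance is between A and C, so a method that simply clusters the closest pair would group them; the rate correction instead joins A with B and C with D.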


Maximum Parsimony (MP)

• Find topology requiring smallest number of evolutionary changes

• Consider each position (site) in the sequence alignment independently

• Not all sites are informative

• Informative

– Favours one topology over others

Informative sites

a. A A G A G T T C A

b. A G C C G T T C T

c. A G A T A T C C A

d. A G A G A T C C T

[Figure: the three possible unrooted trees for sequences a, b, c and d.]
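A short sketch of how informative sites can be picked out of the four sequences on this slide. It assumes the usual parsimony criterion, which the slides do not spell out: a site is informative when at least two different nucleotides each occur in at least two sequences.

```python
from collections import Counter


def is_informative(column: str) -> bool:
    """True if at least two states each appear at least twice in the column."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


# Sequences a-d from the slide.
alignment = {
    "a": "AAGAGTTCA",
    "b": "AGCCGTTCT",
    "c": "AGATATCCA",
    "d": "AGAGATCCT",
}

# Transpose the alignment into columns and keep the informative positions.
columns = ["".join(col) for col in zip(*alignment.values())]
informative = [pos for pos, col in enumerate(columns, start=1) if is_informative(col)]
print(informative)  # [5, 7, 9]
```

Sites 5 and 7 group a with b and c with d, while site 9 groups a with c and b with d, so two informative sites favour the first topology and one favours the second.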

Maximum Likelihood (ML)

• Likelihood L of a tree is the probability of observing the data given the tree: L = P(data|tree)

• Find the tree with the highest L value

• Results depend on the model of nucleotide substitution

• Computationally time-consuming


• Actually, all the other methods discussed implicitly use a simple model of evolution similar to the typical model made explicit in maximum likelihood:

• All sites selectively neutral

• All mutate independently, forward and reverse rates equal, given by $\mu$

• Also assume discrete generations and sites change independently

• Given this model, can calculate the probability that a site with initial nucleotide i will change to nucleotide j within time t:

• $P^{t}_{ij} = \delta_{ij} e^{-\mu t} + (1 - e^{-\mu t}) g_j$, where $\delta_{ij} = 1$ if i = j and $\delta_{ij} = 0$ otherwise, and where $g_j$ is the equilibrium frequency of nucleotide j
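The transition probability above is easy to check numerically. The sketch below is a direct transcription of the formula; the rate `mu` and the equilibrium frequencies are hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"


def transition_probability(i: str, j: str, t: float, mu: float, freqs: dict) -> float:
    """Probability that nucleotide i has become nucleotide j after time t."""
    decay = math.exp(-mu * t)
    delta = 1.0 if i == j else 0.0
    return delta * decay + (1.0 - decay) * freqs[j]


# Hypothetical parameters, not taken from the slides.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
mu = 1.0

for t in (0.0, 0.1, 1.0, 10.0):
    row = [transition_probability("A", j, t, mu, freqs) for j in BASES]
    print(t, [round(p, 4) for p in row], "sum =", round(sum(row), 6))
```

At t = 0 the site is certain to be unchanged. As t grows, the probabilities approach the equilibrium frequencies regardless of the starting nucleotide, and each row sums to 1 at every t.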

• The likelihood that some site is in state i at the kth node of a tree is $L_i^{(k)}$

• The likelihoods for all states for each site for each node are calculated separately; the product of the likelihoods for each site gives the overall likelihood for the observed data

• Different tree topologies are searched to find the highest overall likelihood
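One common way to carry out this calculation is the recursive scheme usually called Felsenstein's pruning algorithm, which the slides do not name. The sketch below is a minimal version under stated assumptions: equal equilibrium frequencies of 1/4, a hypothetical branch length shared by every branch, and a tree encoded as nested tuples of (child, branch_length) pairs. The four sequences are a to d from the "Informative sites" slide.

```python
import math

BASES = "ACGT"


def transition_probability(i, j, t, mu=1.0):
    """P_ij(t) under the simple model, assuming equilibrium frequencies of 1/4."""
    decay = math.exp(-mu * t)
    return (decay if i == j else 0.0) + (1.0 - decay) * 0.25


def conditional_likelihoods(node, column, mu=1.0):
    """Likelihood of the data below `node` for each possible state at `node`.

    A leaf is a sequence name (str); an internal node is a tuple of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if column[node] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, length in node:
        below = conditional_likelihoods(child, column, mu)
        for i in BASES:
            result[i] *= sum(
                transition_probability(i, j, length, mu) * below[j] for j in BASES
            )
    return result


def log_likelihood(tree, alignment, mu=1.0):
    """Sum of per-site log-likelihoods, i.e. the log of the product over sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for pos in range(n_sites):
        column = {name: seq[pos] for name, seq in alignment.items()}
        at_root = conditional_likelihoods(tree, column, mu)
        total += math.log(sum(0.25 * at_root[b] for b in BASES))
    return total


alignment = {"a": "AAGAGTTCA", "b": "AGCCGTTCT", "c": "AGATATCCA", "d": "AGAGATCCT"}
t = 0.1  # hypothetical branch length, identical on every branch
tree_ab_cd = (((("a", t), ("b", t)), t), ((("c", t), ("d", t)), t))
tree_ac_bd = (((("a", t), ("c", t)), t), ((("b", t), ("d", t)), t))
for label, tree in (("(a,b),(c,d)", tree_ab_cd), ("(a,c),(b,d)", tree_ac_bd)):
    print(label, log_likelihood(tree, alignment))
```

Comparing the two printed values shows how candidate topologies are ranked. A real analysis would also optimise branch lengths and model parameters for each topology, which is a large part of the computational cost.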

• Maximum likelihood is arguably the “gold standard” for phylogenetic analysis, but because it is computationally intensive it can be used only for select data, and only after much initial fine-tuning of the many parameters of sequence alignments

• Often used to distinguish between several already generated trees


Bayesian (B) Phylogeny Estimation

• Searches for best trees consistent with both model and data

• Incorporates prior knowledge (prior probability)

• B maximises probability of tree given data and model

• Searches for best set of trees
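The probability of a tree given the data and model can be written with Bayes' theorem. The notation below (T for a tree, D for the data, M for the model) is chosen here for illustration and does not come from the slides.

```latex
P(T \mid D, M) = \frac{P(D \mid T, M)\, P(T \mid M)}{\sum_{T'} P(D \mid T', M)\, P(T' \mid M)}
```

Here P(D | T, M) is the likelihood used in ML, P(T | M) is the prior probability of the tree, and the denominator sums over all candidate trees. This form treats each T as fully specified; in practice branch lengths and model parameters are integrated over as well.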


Comparison of methods

How much information are they using?

• MP, ML, B use actual DNA whereas NJ summarises information into distance matrix

• BUT, not all sites are used by MP (“informative” sites only)

How can the nature of the data affect the methods?

• NJ better for recent divergences

• MP works well for a high number of informative sites

How do they cope with lots of sequences?

• MP requires comparison of all possible trees

– Not possible for large number of taxa

• ML is computationally intensive and very slow for large number of taxa

• NJ efficient for large number of taxa

Anything else?

• ML requires explicit assumptions about rate and pattern of substitution (model)

– ML may perform poorly if model is incorrect

• ML or B may get stuck on local maxima


Outgroup rooting of unrooted trees

• Outgroup – related sequence that definitely diverged earlier (paleontological evidence)

[Figure: an unrooted tree of human, mouse and rat, and the same tree rooted by adding chicken as the outgroup.]

Rate (r) of evolution

• K = number of substitutions per site

• T = time since divergence

• r = K/2T

• Rate is expressed as substitutions per site per year

[Figure: Species A and Species B diverging from a common ancestor at time T.]
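As a quick worked example of r = K/2T: the factor 2 is there because substitutions accumulate along both lineages after the split, so the K substitutions per site between the two species are spread over a total time of 2T. The numbers below are hypothetical.

```python
def substitution_rate(k_subs_per_site: float, t_years: float) -> float:
    """Rate in substitutions per site per year: r = K / (2T)."""
    return k_subs_per_site / (2.0 * t_years)


# Hypothetical values: K = 0.1 substitutions per site, T = 50 million years.
print(substitution_rate(0.1, 50e6))
```

With these inputs the rate is about one substitution per site per billion years.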


Estimating species divergence times

• Fossil evidence shows that T1 = 310 mya

• What is T2 ?

• Only need to have sequences and information on one divergence time

[Figure: tree of Rat (A), Human (B) and Chicken (C), with divergence times T2 and T1 marked.]
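A sketch of how T2 could be estimated from r = K/2T, assuming a molecular clock (the same rate on every branch). It also assumes that T1 is the split between chicken and the rat/human lineage and T2 is the rat/human split. The rate is calibrated on T1 using the average of the two distances to chicken, then applied to the rat-human distance. The K values are hypothetical.

```python
def estimate_t2(k_ab: float, k_ac: float, k_bc: float, t1_mya: float = 310.0) -> float:
    """Estimate the A/B divergence time (mya) from a calibrated outgroup split.

    k_ab, k_ac, k_bc: substitutions per site between the pairs of species
    A (rat), B (human) and C (chicken).
    """
    # Calibrate the rate on the known split: r = K / (2 * T1).
    k_outgroup = (k_ac + k_bc) / 2.0
    rate = k_outgroup / (2.0 * t1_mya)
    # Apply the same rate to the unknown split: T2 = K_AB / (2r).
    return k_ab / (2.0 * rate)


# Hypothetical distances, not data from the slides.
print(estimate_t2(k_ab=0.2, k_ac=0.62, k_bc=0.62))
```

Because the rate cancels, the estimate reduces to T2 = T1 x K_AB / K_outgroup, which is why sequences plus a single dated divergence are enough.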


True tree and inferred tree

• There is only one true tree of species relationships

• Inferred tree may not be correct

1. Some genes may not be representative

2. Tree inference method may have produced an incorrect tree

– e.g. parsimony method: may get several equally parsimonious results


How credible is the tree?

• The tree is a hypothesis of the true relationship

• Need some measure of the support for that hypothesis

• Note: Bayesian methods simultaneously estimate tree and measures of uncertainty for each branch


Standard Error of branches

[Figure: tree of Human, Chimp, Gorilla and Orangutan.]

• The bootstrap: randomly sample all positions (columns in an alignment) with replacement -- meaning some columns can be repeated -- but conserving the number of positions; build a large dataset of these randomized samples

[Figure: Bootstrap.]

• Then use your method (distance, parsimony, likelihood) to generate another tree

• Do this a thousand or so times

• Note that if the assumptions the method is based on hold, you should always get the same tree from the bootstrapped alignments as you did originally

• The frequency of some feature of your phylogeny in the bootstrapped set gives some measure of the confidence you can have for this feature
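The resampling procedure can be sketched in a few lines. To keep the example short, a full tree-building step is replaced by a stand-in feature, "c and d are the closest pair by Hamming distance"; this substitution is an assumption made here, and in practice the chosen tree method would be rerun on every replicate. The alignment is the four-sequence example from the "Informative sites" slide, and all function names are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the length fixed."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def hamming(x, y):
    """Number of differing positions between two aligned sequences."""
    return sum(1 for p, q in zip(x, y) if p != q)


def closest_pair(alignment):
    """Stand-in for tree building: the pair of sequences with fewest differences."""
    names = sorted(alignment)
    pairs = [(a, b) for idx, a in enumerate(names) for b in names[idx + 1:]]
    return min(pairs, key=lambda p: hamming(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, feature, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `feature` holds."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(replicates) if feature(bootstrap_alignment(alignment, rng))
    )
    return hits / replicates


alignment = {"a": "AAGAGTTCA", "b": "AGCCGTTCT", "c": "AGATATCCA", "d": "AGAGATCCT"}
support = bootstrap_support(alignment, lambda aln: closest_pair(aln) == ("c", "d"))
print(support)
```

The printed value is the proportion of resampled alignments that still show the feature, which is how bootstrap percentages on tree branches are read.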

Applications of phylogenetics

• Detection of orthology and paralogy

• Estimation of divergence times

• Reconstruction of ancient proteins

• Identifying residues important to selection

• Detecting recombination points

• Identifying mutations likely to be associated with disease

• Determining the identity of new pathogens

The time will come, I believe, though I shall not live to see it, when we shall have fairly true genealogical trees of each great kingdom of Nature.

Charles Darwin

The Tree of Life

• Traditional classification of life into five kingdoms

– Bacteria (inc. cyanobacteria)

– Protista (inc. ciliates, flagellates, amoebae)

– Fungi

– Plantae

– Animalia

Archaebacteria

• Carl Woese and colleagues

• Study relationships by comparing rRNAs

• Methanogens were expected to group with other bacteria

• BUT, found to be equally distant from bacteria and eukaryotes

• Made new taxon - Archaebacteria

• Includes many extremophiles

– thermophiles

– hyperthermophiles

– halophiles (salt dependent)

[Figure: The Tree of Life.]

Where is the root of the Tree of Life?

• No possible outgroup (by definition)

• Iwabe et al. (1989)

• Examined phylogenetic tree of pairs of genes that exist in all organisms

– derived from gene duplication that predates lineage divergences

[Figure: Gene A duplicates into Gene A1 and Gene A2; each copy is then shown for lineage 1, lineage 2 and lineage 3.]


• Homologous elongation factor genes EF-Tu and EF-G present in all prokaryotes and eukaryotes

• Both genes show the same topology

[Figure: EF-Tu and EF-G trees, each containing Archaea, Eucarya and Bacteria.]

Changing view of The Tree of Life … (Gaucher et al, 2010)

• based on morphological characteristics (Chatton, 1925)

• based on DNA sequence analysis (Woese & Fox, 1977)

• based on ancient gene duplication

• based on phylogenies of hundreds of genes

• based on membrane architecture & gene indels

Most modern view …


Phylogeny of humans and apes

• Darwin – Gorilla and Chimpanzee our closest relatives and human evolutionary origins in Africa

• Many people preferred anthropocentric idea that humans were special

[Figure: "Traditional view" tree of Human, Chimp, Gorilla, Orangutan and Gibbon.]

So what is the evidence?

• Serological precipitation (Goodman 1962) – H, G, C constitute a natural clade, orangutans & gibbons earlier diverging

• However, H,G,C relative relationships remained unclear

• Most DNA sequence data support ((H,C),G)

• Some genes show different relationship

[Figure: tree of Human, Chimp, Gorilla, Orangutan and Gibbon.]

Conservation biology – the dusky seaside sparrow

• Last one died June 1987 (DisneyWorld)

• Discovered 1872

• Ammodramus maritimus nigrescens

• Geographically confined to small salt marsh in Florida

• 2000 individuals in 1900

• 6 individuals (all male) in 1980

• Conservation program

– artificial breeding


Conservation genetics

• Mating of remaining males with females from closest subspecies available

• Female hybrids of first generation then “back-crossed” to original males

• Continue as long as original males live

• Which species to choose to take the females from??


• 8 other A. maritimus subspecies

• Geographically dispersed along coast

• Artificial breeding with Scott’s seaside sparrow (A. m. peninsulae)

• Chosen based on morphological and behavioural similarities

• Was this the best choice?

[Figure: tree of seaside sparrow subspecies in two groups, Atlantic Coast and Gulf Coast, with nigrescens and peninsulae marked.]

Woops!

• Two subspecies diverged about 250,000 – 500,000 years ago

• A. m. nigrescens almost indistinguishable molecularly from other Atlantic Coast subspecies

• Any Atlantic Coast subspecies would have been a better choice

• Created a new species instead of saving old

• Dusky seaside sparrow officially declared extinct in 1990

Origin of angiosperms

• Flowering plants: carpel-enclosed ovules and seed

• Fossils

– began to radiate mid-Cretaceous (~115 mya)

– Dominant land plants 90 mya

• 275,000 species described


• Probably arose from gymnosperm-like ancestor up to 370-380 mya

• Gymnosperm = “naked seed” (e.g. conifers)

• Long time span of possible origin

• Why no fossils?

– Didn’t exist prior to Cretaceous?

– Lived in habitats not conducive to fossilisation?

Monocot and Dicot divergence

• Monocotyledons

• Dicotyledons

• Two major classes of angiosperm

• Date of their divergence gives minimum estimate for age of angiosperms

• Phylogenetic analysis of DNA sequences

Monocot – Dicot divergence

• Initial estimate of 300-320 mya (Martin et al. 1989)

– Glyceraldehyde-3-phosphate dehydrogenase from plants, animals and fungi

• Implied origin close (within 100 myr) to the time of origin of earliest land plants

– seems too ancient

– implies all vascular plants arose within 100 myr

• Alternative study (Wolfe et al., 1989)

• Calibrated molecular clock with maize-wheat divergence (50-70 mya)

• Monocot-dicot divergence estimated as 200 mya

• Existed long before prominence in paleoflora


Cetaceans

• Link to ungulates (hoofed mammals) suggested by comparative anatomy

• Early protein and mtDNA phylogenetic studies indicated that Cetaceans are closely related to Artiodactyls

[Figure: tree of Artiodactyls: Cow, Deer, Hippo, Pig, Peccary and Camel.]

• Graur and Higgins (1994)

• Protein and DNA sequence from several cetaceans and from three suborders of artiodactyls

• Showed cetaceans are within artiodactyls

• Confirmed by analysis of distribution of SINE elements


Cetartiodactyls