Principles of Phylogeny Basic Terminology Reconstruction ...bio336/Bio336... · different models may yield differen t answers ÐComputationally intensive (most data sets not fully

336-16 1

Principles of Phylogeny Reconstruction

How do we reconstruct the tree of life?

Outline:

Terminology

Methods

parsimony

maximum likelihood

bootstrapping

Problems

homoplasy

hybridization

Wayne Maddison, Sean Graham

336-16 2

Basic Terminology

Phylogeny:

Phylogenetic tree:

Tips represent taxa (usually extant)

Nodes represent hypothesized common ancestors

Root is the oldest common ancestor on a rooted tree

Branches represent time or amount of change between nodes, or between nodes and tips (but length is often arbitrary)

336-16 3

Basic Terminology

Rooted trees typically have one or more outgroups.

An outgroup represents a group that diverged before the diversification of the group of interest.

Outgroups tell us about the direction of change within the ingroup (the ingroup is the group under study).

Rooted trees have a root, and nodes closer to the root represent older divergences than nodes near the tips.

The groups on either side of a node (sister taxa) are considered to be of equal age.

336-16 4

Looking at Trees

These two trees show the same relationships, but the unrooted tree makes no claims about which of the divergences is oldest. An unrooted tree could potentially be rooted at any of its nodes.

Can you draw a rooted tree using one of the roots within the red group?

Note: there is more than one way to depict a set of relationships. Be careful not to overinterpret the orientation of the branches.

336-16 5

Looking at Trees

[Figure: two trees. Tip labels of the first tree: Birds, Crocodiles, Turtles, Amphibians, Mammals, Lizards, Snakes. Tip labels of the second tree: Turtles, Amphibians, Mammals, Lizards, Snakes, Crocodiles, Birds]

336-16 6

Do these phylogenies show the same relationships?

Figure 14.17

336-16 7

Interpreting Branch Lengths

[Figure: two trees of taxa A, B, C and D, one labeled Cladogram and one labeled Phylogram, with a scale bar marked "1 nucleotide change"]

The branch lengths on phylogenetic trees may or may not be proportional to the amount of change along their length.

If branch lengths are proportional to change, the tips will not be neatly lined up, and a scale should be included.

336-16 8

Interpreting Groupings

[Figure: three trees of taxa A, B, C, D and E, highlighting a monophyletic group (or clade), a paraphyletic group, and a polyphyletic group]

These terms are used to compare named entities (e.g. fishes, mammals, etc.) to groupings found in phylogenetic trees.
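As a minimal sketch of how a named group can be compared with the groupings on a tree, the Python snippet below stores a rooted tree as nested tuples and tests whether a group corresponds exactly to one clade. The topology in `TREE` is an assumption made for illustration (a commonly drawn vertebrate arrangement), not a transcription of the slide figures, and `is_monophyletic` is a hypothetical helper name. The test only separates monophyletic groups from the rest; telling paraphyletic from polyphyletic groups also requires information about the ancestors.

```python
# Rooted tree as nested tuples; tips are strings.
# Assumed topology, for illustration only.
TREE = ("Amphibians",
        ("Mammals",
         ("Turtles",
          (("Lizards", "Snakes"), ("Crocodiles", "Birds")))))

def tips(node):
    """Return the set of tip names at or below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result

def clades(node):
    """Yield the tip set of every clade in the tree (single tips included)."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(group, tree=TREE):
    """A group is monophyletic if it equals the full tip set of some clade."""
    return set(group) in list(clades(tree))

reptiles = {"Turtles", "Lizards", "Snakes", "Crocodiles"}
amniotes = reptiles | {"Birds", "Mammals"}

print(is_monophyletic(amniotes))              # True
print(is_monophyletic(reptiles))              # False: birds are left out
print(is_monophyletic(reptiles | {"Birds"}))  # True
```

Under the assumed topology, "reptiles" fails the test because the smallest clade containing all of them also contains birds.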

336-16 9

What is the relationship between taxonomic names and phylogenetic groups?

[Figure: tree with tips Birds, Crocodiles, Turtles, Amphibians, Mammals, Lizards, Snakes; labels: Amnion, Amniotes]

336-16 10

What is the relationship between taxonomic names and phylogenetic groups?

[Figure: tree with tips Birds, Crocodiles, Turtles, Lizards, Snakes; labels: Cold Blooded, Reptiles]

336-16 11

What is the relationship between taxonomic names and phylogenetic groups?

[Figure: tree with tips Birds, Crocodiles, Turtles, Amphibians, Rodents, Lizards, Snakes; labels: Wings, Bats]

336-16 12

An example of a polyphyletic group: Amentiferae

Willow

Alder

Walnut

All of these trees have highly reduced male flowers clustered into structures called catkins.

These specialized structures were previously thought to reflect close relationships among the trees that have them. Therefore, the families of trees with catkins were grouped into the Amentiferae.

However, it turns out that catkins are adaptations to wind pollination that reflect common selection, not common history…

336-16 13

An example of a polyphyletic group: Amentiferae

[Figure: tree with tips Willows, Walnuts and Oaks; labels: "Ancestor with separate flowers", "Evolution of catkins"]

336-16 14

Are these groups monophyletic, paraphyletic or polyphyletic?

fish?

tetrapods? (= four limbed)

amphibians?

mammals?

ectotherms (= cold blooded)?

What is the relationship between taxonomic names and phylogenetic groups?

Vertebrate Phylogeny

336-16 15

Reconstructing Evolutionary Trees

The development of methods:

I. distance methods (UPGMA, Neighbor joining)

II. parsimony methods

III. maximum likelihood

(IV.) Bayesian inference

336-16 16

I. Distance Methods (phenetics)

• Distance methods grew out of the school of ‘numerical taxonomy’, which had its heyday in the 1960s.

• Taxonomists were looking for more rigorous methods of developing classifications and inferring relationships.

• The idea was to use ‘total information’, measuring many characters and producing a summary of what the characters suggest about groupings based on overall similarity.

• These approaches were also practical when molecular datasets started to get very large, and for a time outpaced computer processing power.

336-16 17

I. Distance Methods (phenetics)

Example 1: morphology

[Figure: taxa A, B, C and D plotted on axes Trait 1 and Trait 2]

Overall distance matrix

      A      B      C      D
A            1.0    3.0    4.9
B                   3.3    3.0
C                          3.0
D

336-16 18

I. Distance Methods (phenetics)

Example 1: morphology

[Figure: the Trait 1 / Trait 2 plot and distance matrix from slide 17, with the tree started by joining A and B (AB)]

336-16 19

I. Distance Methods (phenetics)

Example 1: morphology

[Figure: the Trait 1 / Trait 2 plot and distance matrix from slide 17, with tree labels AB, C and D]

336-16 20

Distance methods with sequence data

A: ATTGCAATCGG

B: ATTACGATCGG

C: GTTACAACCGG

D: CTCGTAGTCGA

Distance matrix

      A    B    C    D
A          1    3    5
B               3    7
C                    7
D

[Figure: tree under construction, with A and B joined (AB)]

336-16 21

Distance methods with sequence data

Distance matrix

      A    B    C    D
A          1    3    5
B               3    7
C                    7
D

New distance matrix: take averages

      AB   C    D
AB         3    6
C               7
D

[Figure: tree under construction, with A and B joined (AB)]
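The "take averages" step can be sketched in Python as follows. This is a minimal illustration that starts from the slide's distance matrix as given and uses UPGMA-style averaging (the mean over all original taxon pairs between two clusters); the function name `upgma` and the dictionary layout are assumptions made for this sketch, not part of the lecture.

```python
from itertools import combinations

def upgma(dist, names):
    """Repeatedly join the closest pair of clusters and average the distances.

    dist maps frozenset({x, y}) -> distance between original taxa x and y.
    Returns (list of merges as (cluster_a, cluster_b, distance), all distances).
    """
    clusters = {name: (name,) for name in names}   # cluster label -> member taxa
    d = dict(dist)                                 # distances between clusters
    merges = []
    while len(clusters) > 1:
        # pick the closest pair among the current clusters
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        merges.append((a, b, d[frozenset((a, b))]))
        members = clusters.pop(a) + clusters.pop(b)
        new = a + b                                # e.g. "A" + "B" -> "AB"
        # distance from the new cluster to every remaining cluster:
        # mean of the original distances over all member pairs
        for other, other_members in clusters.items():
            total = sum(dist[frozenset((x, y))]
                        for x in members for y in other_members)
            d[frozenset((new, other))] = total / (len(members) * len(other_members))
        clusters[new] = members
    return merges, d

names = ["A", "B", "C", "D"]
dist = {
    frozenset("AB"): 1, frozenset("AC"): 3, frozenset("AD"): 5,
    frozenset("BC"): 3, frozenset("BD"): 7, frozenset("CD"): 7,
}
merges, d = upgma(dist, names)
for a, b, height in merges:
    print(a, "+", b, "joined at distance", height)
print(d[frozenset(("AB", "C"))], d[frozenset(("AB", "D"))])  # 3.0 6.0
```

The first merge joins A and B, and the averaged distances from AB to C and D (3 and 6) match the new matrix on the slide. The later values depend on the averaging scheme chosen, which the slide does not specify.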

336-16 22

Distance methods with sequence data

[Figure: the original and averaged distance matrices from slide 21, with the tree extended: tree labels AB, C, ABC and D]

336-16 23

Distance methods with sequence data

[Figure: the original and averaged distance matrices from slide 21, with tree labels AB, C, ABC and D]

336-16 24

Strengths and weaknesses of distance methods

Advantages
- Intuitive, easy to understand
- Works on all sorts of data, alone or in combination
- Fast implementation on large data sets
- Can handle very large data sets easily

Disadvantages
- Must assume that similarity reflects shared evolutionary history (when is this most problematic?)

336-16 25

II. Parsimony Methods (Cladistics)

Methods originally developed by Willi Hennig (German entomologist), presented in a book published in German in 1950

Translated into English in 1966; very influential

Originally important in analysis of small morphological data sets, including those from fossils

These methods came to the forefront with the application of DNA sequencing technology to systematics (early 1990s). In the early days, the methods were tough to implement because of limitations in computer processor speed (still somewhat limiting at times, because data sets keep getting larger).

336-16 26

Applying parsimony

• Consider four taxa (1-4) and four characters (A-D)

• Ancestral state: abcd

         Trait
Taxon    A     B     C     D
1        a’    b     c     d
2        a’    b’    c’    d’
3        a’    b’    c’    d
4        a’    b’    c     d

336-16 27

Applying parsimony

• Consider four taxa (1-4) and four characters (A-D)
• Ancestral state: abcd

[Figure: tree rooted on the ancestral state abcd, with tips in the order 1 (a’bcd), 2 (a’b’c’d’), 3 (a’b’c’d), 4 (a’b’cd). Changes marked on the branches: a’, b’, c’, c, d’. The legend distinguishes unique changes from convergences or reversals. Total: 5 steps]

336-16 28

Applying parsimony

• Consider four taxa (1-4) and four characters (A-D)
• Ancestral state: abcd

[Figure: tree rooted on the ancestral state abcd, with tips in the order 1 (a’bcd), 4 (a’b’cd), 3 (a’b’c’d), 2 (a’b’c’d’). Changes marked on the branches: a’, b’, c’, d’. The legend distinguishes unique changes from convergences or reversals. Total: 4 steps]
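The step counts on the two trees can be reproduced with a small Python sketch of Fitch's counting procedure, which finds the minimum number of changes per character on a given tree. The tree shapes below are assumptions: the figures did not survive, so both trees are taken to be ladder-shaped, with the tips in the order shown on each slide and the ancestral state abcd attached at the root as a tip named `anc`. States are coded 0 (ancestral) and 1 (derived, the primed letters); `fitch_steps` and `tree_length` are hypothetical helper names.

```python
def fitch_steps(tree, states):
    """Return (possible states at this node, minimum changes below it).

    tree is a nested pair of tip names; states maps tip name -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    steps = left_steps + right_steps
    shared = left_set & right_set
    if shared:
        return shared, steps                 # children agree: no extra change
    return left_set | right_set, steps + 1   # children disagree: one change

def tree_length(tree, matrix):
    """Sum the minimum number of changes over all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

# Characters A-D; 0 = a, b, c, d and 1 = a', b', c', d'
matrix = {
    "anc": "0000",
    "1": "1000",
    "2": "1111",
    "3": "1110",
    "4": "1100",
}
tree_a = ("anc", ("1", ("2", ("3", "4"))))   # assumed shape, tip order 1 2 3 4
tree_b = ("anc", ("1", ("4", ("3", "2"))))   # assumed shape, tip order 1 4 3 2

print(tree_length(tree_a, matrix))  # 5
print(tree_length(tree_b, matrix))  # 4
```

Under these assumed shapes the counts agree with the slides (5 steps and 4 steps), and the extra step on the first tree comes from character C, which must change twice.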

336-16 29

Strengths and weaknesses of parsimony

Strengths
- Straightforward to calculate the length of the tree (number of steps)
- Simulation studies have shown that parsimony algorithms are reliable under a range of conditions
- Conceptually simple; satisfying

Weaknesses
- Cannot easily accommodate complex models of evolutionary change (e.g. in which rates of evolutionary change differ among branches)
- Under certain circumstances, can be positively misleading

336-16 30

Parsimony practice

Which unrooted tree is most parsimonious?

[Figure: three unrooted trees of taxa K, L, M and N, each with a change labeled 2 already marked]

Plot each change on each tree. Positions 1 and 2 are done.

Which positions help to determine relationships?

        Characters
Taxa    1  2  3  4  5  6  7
K       A  G  T  A  C  C  G
L       A  A  G  A  C  T  A
M       A  A  C  C  T  T  A
N       A  A  A  G  T  T  A
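One way to approach the last question is to flag the positions that can favor one tree over another. As a hedged sketch, the Python below applies the usual rule of thumb for parsimony: a position is informative only if at least two different states each occur in at least two taxa. The sequences are the rows of the table read in character order 1 to 7, and `is_informative` is a hypothetical helper name.

```python
from collections import Counter

# Rows of the character table, characters 1-7 from left to right
matrix = {
    "K": "AGTACCG",
    "L": "AAGACTA",
    "M": "AACCTTA",
    "N": "AAAGTTA",
}

def is_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

columns = list(zip(*matrix.values()))
informative = [i + 1 for i, col in enumerate(columns) if is_informative(col)]
print(informative)  # [5]
```

Positions where every taxon is identical, where a single taxon differs, or where all taxa differ require the same number of steps on every tree, so they cannot separate the three candidates.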

336-16 31

Inferring the direction of evolution

[Figure: tree with tips Chimp, Human, Gorilla, Bonobo, Orangutan and Mouse (outgroup). Aligned sequences at the tips: two read ACGCTAGCTACG and four read ACGCTAGCTAGG. Labels: G, C]

Where did the mutation occur, and what was the change?

336-16 32

III. Maximum Likelihood Methods (and Bayesian analysis as currently used)

Maximum likelihood approaches involve using a specific model to determine the probability that a particular base substitution will occur along a particular branch on a tree.

In effect the question being addressed is: “what is the probability of the observed data given a particular tree and a particular model of substitution?”

The best tree is the one with the highest probability of explaining the observed data, given the model.

336-16 33

Maximum likelihood: a simple model

Probabilities: transition: 0.2, transversion: 0.1, no change: 0.7

[Figure: diagram of the four bases A, G, C and T with arrows labeled Transitions and Transversions, next to a tree with nucleotides at its tips and nodes]

TASK: Find the tree with the highest probability

336-16 34

Maximum likelihood: a simple model

[Figure: the transition/transversion diagram and tree from slide 33, with the first probability worked out]

P1 = (.7)(.1)(.2)(.7)(.7)

Probabilities: transition: 0.2, transversion: 0.1, no change: 0.7

TASK: Find the tree with the highest probability

336-16 35

Maximum likelihood: a simple model

• Probabilities
– transition: 0.2, transversion: 0.1, no change: 0.7

[Figure: the transition/transversion diagram and tree from slide 33, with all three probabilities worked out]

P1 = (.7)(.1)(.2)(.7)(.7)

P2 = (.7)(.1)(.7)(.7)(.7)

P3 = (.1)(.2)(.7)(.7)(.2)

TASK: Find the tree with the highest probability = P1 x P2 x P3
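The arithmetic behind P1, P2 and P3 can be sketched in Python. The three probabilities come from the slide; everything else is an assumption for illustration: `branch_probability` and `site_probability` are hypothetical helpers, and the example site (tips G, G, A, T with internal nodes G and A) is invented so that its five branches give the same factors as P1, because the original tree figure did not survive. As on the slide, the node states are fixed rather than summed over.

```python
from math import prod

# A<->G and C<->T are transitions; every other change is a transversion
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def branch_probability(x, y):
    """Probability of the bases x and y at the two ends of one branch."""
    if x == y:
        return 0.7          # no change
    if frozenset((x, y)) in TRANSITIONS:
        return 0.2          # transition
    return 0.1              # transversion

def site_probability(branches):
    """Multiply the branch probabilities for one site."""
    return prod(branch_probability(x, y) for x, y in branches)

# Hypothetical site on a four-taxon tree: five branches as (end, end) pairs
site = [("G", "G"), ("G", "G"), ("G", "A"), ("A", "A"), ("A", "T")]
p_site = site_probability(site)

# The slide's three products, combined into the probability for the tree
p1 = 0.7 * 0.1 * 0.2 * 0.7 * 0.7
p2 = 0.7 * 0.1 * 0.7 * 0.7 * 0.7
p3 = 0.1 * 0.2 * 0.7 * 0.7 * 0.2
print(p_site, p1 * p2 * p3)   # the product is roughly 3.2e-07
```

Repeating the same calculation for each candidate tree and keeping the tree with the largest product is the task posed on the slide. In practice the products become very small, so implementations usually add logarithms instead of multiplying.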

336-16 36

More complex likelihood models..

Likelihood models can be quite complex, and different models assign different probabilities to changes, including:
– Relative probabilities of transitions and transversions
– Variation in mutation rates across sites (e.g. by codon position in protein coding genes) or regions (intron versus exon versus spacers)
– Variation in mutation rates across lineages.

336-16 37

Assessment of Maximum Likelihood (also Bayesian)

• Strengths
– Highly flexible (any model can be used)
– Statistically justifiable
– Given enough data (and the right model), will converge on the correct tree (as shown by simulation studies).

• Weaknesses
– Impossible to know that the model is correct, and different models may yield different answers
– Computationally intensive (most data sets not fully analyzable)

336-16 38

Characters to use in phylogeny

• Morphology

• DNA sequence

336-16 39

Characters to use in phylogeny

What are the desirable qualities of characters used for phylogeny reconstruction?

1.

2.

3.

4.

How are these qualities met by DNA sequence data?

336-16 40

The problem of homology with DNA… the good, the bad and the ugly

Alignment (= HOMOLOGY assessment) can be very challenging!

Taxon 1 AATGCGC

Taxon 2 AATCGCT

Taxon 1 AATGCGC

Taxon 2
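To see why alignment matters, the sketch below counts mismatches between the two sequences with and without gaps. The gapped rows are a hypothetical alignment chosen for illustration (the second Taxon 2 row is not given above), and `compare` is a hypothetical helper name.

```python
def compare(seq1, seq2, gap="-"):
    """Count mismatches and gap columns between two equal-length aligned rows."""
    mismatches = gaps = 0
    for x, y in zip(seq1, seq2):
        if gap in (x, y):
            gaps += 1            # one sequence has an insertion or deletion here
        elif x != y:
            mismatches += 1      # both have a base, and the bases differ
    return mismatches, gaps

# Sequences compared position by position, with no gaps
print(compare("AATGCGC", "AATCGCT"))    # (4, 0)

# Hypothetical alignment with one gap inserted in each row
print(compare("AATGCGC-", "AAT-CGCT"))  # (0, 2)
```

The same two sequences imply four substitutions under one hypothesis of homology and two insertion or deletion events under another, so the choice of alignment changes the characters that go into the tree.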

336-16 41

The problems of locus choice: Getting the right rate of evolution

• Too slow?
– not enough variation
– Taxon 1 AATGCGC
– Taxon 2 AATGCGC
– Taxon 3 AATGCGC

Polytomy

336-16 42

Example of insufficient evidence: metazoan phylogeny

Fungi

Metazoans

336-16 43

Challenges: sunflower phylogeny

= 15 spp! = 12 spp!

• Recent radiation (200,000 years)
• Many species, much hybridization
• Need more rapidly evolving markers!!

336-16 44

The problems of locus choice: Getting the right rate of evolution

• Too fast?
– homoplasy likely
– “saturation” – only 4 possible states for DNA
– Taxon 1 ATTCTGA
– Taxon 2 GTAGTGG
– Taxon 3 CGTGCTG

Polytomy

336-16 45

Saturation

• Imagine changing one nucleotide every hour to a random nucleotide
• Split the ancestral population in 2.

Ancestral sequence: ACGTGCT

Time          Lineage 1    Lineage 2
One hour      ACTTGCT      ACGAGCT
Four hours    ACCTGAA      GCGATCC
8 hours       ACCAGAA      AGCCTCC
12 hours      AGCGGAA      GAGCTCC

Red indicates multiple mutations at a site

24 hours?

336-16 46

Saturation: mammalian mitochondrial DNA

This line is what we would expect if we had an infinite number of bases, so that every mutation could be seen.
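The thought experiment from the saturation slide (one random change per hour in each of two lineages) can be simulated with a short Python sketch. The function names and the fixed random seed are assumptions for illustration; the point is only to compare the number of mutation events with the number of differences that remain visible between the two sequences.

```python
import random

BASES = "ACGT"

def evolve(seq, hours, rng):
    """Each hour, overwrite one randomly chosen site with a random base."""
    seq = list(seq)
    for _ in range(hours):
        site = rng.randrange(len(seq))
        seq[site] = rng.choice(BASES)   # may hit a site that already changed
    return "".join(seq)

def observed_differences(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(1)                  # fixed seed so a run is repeatable
ancestor = "ACGTGCT"
for hours in (1, 4, 8, 12, 24, 100):
    lineage1 = evolve(ancestor, hours, rng)
    lineage2 = evolve(ancestor, hours, rng)
    visible = observed_differences(lineage1, lineage2)
    # columns: hours since the split, total mutation events, visible differences
    print(hours, 2 * hours, visible)
```

With only seven sites, the visible differences can never exceed seven no matter how many mutation events occur, which is the flattening that a saturation plot shows relative to the straight line expected for an unlimited number of bases.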

336-16 47

Forces of evolution and phylogeny reconstruction

How does each force affect the ability to reconstruct phylogeny?

mutation?

drift?

selection?

non-random mating?

migration?

336-16 48

Phylogeny case study I: whales

Are whales ungulates (hoofed mammals)? Figure 4.8

336-16 49

Whales: DNA sequence data

Hillis, D. A. 1999.

How reliable is this tree? Bootstrapping.

336-16 50

How consistent are the data?

• Take the dataset (5 taxa, 10 characters)

• Create a new data set by sampling characters at random, with replacement

Taxon      1   2   3   4   5   6   7   8   9   10
Human      A   C   G   T   T   G   T   A   C   T
Chimp      A   G   G   T   T   C   T   A   T   T
Bonobo     A   G   G   T   T   C   T   A   T   G
Gorilla    A   C   T   T   G   C   T   G   T   C
Orang      T   C   G   T   G   T   A   C   C   C

Resampled data set (column headers give the original character number):

Taxon      3   8   2   6   10  10  5   8   8   7   3
Human      G   A   C   G   T   T   T   A   A   T   G
Chimp      G   A   G   C   T   T   T   A   A   T   G
Bonobo     G   A   G   C   G   G   T   A   A   T   G
Gorilla    T   G   C   C   C   C   G   G   G   T   T
Orang      G   C   C   T   C   C   G   C   C   A   G
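The resampling step can be sketched in Python as follows. The sequences are the five rows of the data set read in character order 1 to 10; the function name `bootstrap_replicate` and the fixed seed are assumptions made for this illustration. In a full bootstrap analysis, a tree would be built from each of many replicates and the proportion of replicates recovering each grouping would be reported as its support.

```python
import random

# Rows of the data set, characters 1-10 from left to right
data = {
    "Human":   "ACGTTGTACT",
    "Chimp":   "AGGTTCTATT",
    "Bonobo":  "AGGTTCTATG",
    "Gorilla": "ACTTGCTGTC",
    "Orang":   "TCGTGTACCC",
}

def bootstrap_replicate(data, rng):
    """Resample characters (columns) with replacement, keeping the length."""
    n_chars = len(next(iter(data.values())))
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    resampled = {taxon: "".join(seq[i] for i in picks)
                 for taxon, seq in data.items()}
    return picks, resampled

rng = random.Random(0)                  # fixed seed so a run is repeatable
picks, replicate = bootstrap_replicate(data, rng)
print([i + 1 for i in picks])           # original character numbers drawn
for taxon, seq in replicate.items():
    print(taxon, seq)
```

Because the same column index is applied to every taxon, each resampled character keeps its pattern across taxa; only the weighting of the characters changes from replicate to replicate.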

336-16 51

Whales: DNA sequence data

Hillis, D. A. 1999.

336-16 52

Molecular clocks

336-16 53

Basic idea of molecular clocks

[Figure: tree with tips chimps, humans, whales and hippos; labels: 56 mya, 60 substitutions, 6 substitutions]
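The arithmetic behind a molecular clock can be sketched in a few lines of Python. The reading of the figure is an assumption: the whale-hippo split is taken to be dated at 56 mya with 60 substitutions separating the two, and chimps and humans are taken to differ by 6 substitutions at the same locus. `clock_date` is a hypothetical helper, and the calculation assumes a constant substitution rate across lineages.

```python
def clock_date(calib_age_mya, calib_substitutions, substitutions):
    """Date a split by scaling from a calibrated split (constant-rate assumption)."""
    rate = calib_substitutions / calib_age_mya   # substitutions per million years
    return substitutions / rate                  # age of the undated split, in mya

print(clock_date(56, 60, 6))   # about 5.6 (million years ago)
```

Under these assumptions, one tenth as many substitutions implies a split one tenth as old as the calibration point.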

336-16 54

Challenges for phylogeny: gene flow

336-16 55

Sunflower annuals

336-16 56

Different genes may have different histories!

Wayne Maddison (UBC) has emphasized that genes and species are not expected to always have the same evolutionary history. As such, gene trees and species trees will not always match each other, as shown in this diagram from the computer package MESQUITE (Maddison and Maddison), developed to tackle some of these complexities.

336-16 57

Phylogeny study questions

1) Explain in words the difference between monophyletic, paraphyletic, and polyphyletic groups. Draw a hypothetical phylogeny representing each type. Give an actual example of a commonly recognized paraphyletic taxon in both animals and in plants (use your text for sources).

2) How can a phylogenetic tree be used to determine if a similar character in two taxa is due to homoplasy?

3) Whales are classified as cetaceans, not artiodactyl ungulates. This makes artiodactyls paraphyletic – why? What is the evidence that whales belong in the artiodactyls?

4) Phenetics (distance methods) and cladistics (parsimony) differ in the ways they recognize and use similarities among taxa to form phylogenetic groupings. What types of similarity does each school recognize, and how useful is each type of similarity considered to be for identifying groups?

336-16 58

Phylogeny study questions

5) What is “bootstrapping” in the context of phylogenetic analysis, and why is this procedure performed?

6) Why are maximum likelihood methods increasing in popularity for reconstructing phylogenies? In your answer, include a short description of how this method identifies the best phylogeny.

7) Integrative question: Draw a pair of axes with “Time since divergence” on the x axis and “percent of sites that are the same” on the y axis. Draw a line that shows the expected pattern for third codon sites in protein coding genes: is your graph linear? Explain why or why not. How and why would the graph of first codon positions differ from this?

8) You are studying a group of species that lives in two very different environments. You build two phylogenies: one is based on a locus that is probably under divergent selection in the two environments, while the other phylogeny is based on a neutral locus. Which phylogeny would be more likely to represent the species history? Why?

336-16 59

Phylogeny study questions

9) It has been known for a number of years that Anolis lizards found in similar microhabitats on many separate islands in the Caribbean are very similar to each other (for example, large lizards that feed on the ground, smaller lizards that feed on tree trunks, and very small lizards that feed at the tops of branches). Two different historical explanations have been proposed to explain this pattern: each morph has evolved repeatedly on each island, or each morph has evolved just once, then dispersed. Sketch a phylogeny that would support each hypothesis.

10) Integrative question: the Cameroon lake cichlid phylogeny, showing that the lake species were monophyletic, was based on mitochondrial DNA. Explain why this might not reflect the species history. How could you be more certain about the phylogeny?