336-16 1
Principles of Phylogeny Reconstruction
How do we reconstruct the tree of life?
Outline:
Terminology
Methods
parsimony
maximum likelihood
bootstrapping
Problems
homoplasy
hybridization
Wayne Maddison, Sean Graham
Basic Terminology
Phylogeny:
Phylogenetic tree:
Tips represent taxa (usually extant)
Nodes represent hypothesized common ancestors
Root is the oldest common ancestor on a rooted tree
Branches represent time or amount of change between nodes, or between nodes and tips (but length is often arbitrary)
Basic Terminology
Rooted trees typically have one or more outgroups. An outgroup represents a group that diverged before the diversification of the group of interest. Outgroups tell us about the direction of change within the ingroup (the ingroup is the group under study).
Rooted trees have a root, and nodes closer to the root represent older divergences than nodes near the tips. The groups on either side of a node (sister taxa) are considered of equal age.
Looking at Trees
These two trees show the same relationships, but the unrooted tree makes no claims about which of the divergences is oldest. An unrooted tree could potentially be rooted on any of its branches.
Can you draw a rooted tree using one of the roots within the red group?
Note: there is more than one way to depict a set of relationships. Be careful not to overinterpret the orientation of the branches.
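Looking at Trees
[Figure: two trees of vertebrate groups with their tips in different orders. Tips of one tree: Birds, Crocodiles, Turtles, Amphibians, Mammals, Lizards, Snakes. Tips of the other: Turtles, Amphibians, Mammals, Lizards, Snakes, Crocodiles, Birds]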
Do these phylogenies show the same relationships?
Figure 14.17
Interpreting Branch Lengths
[Figure: taxa A, B, C and D drawn as a cladogram and as a phylogram; scale bar: 1 nucleotide change]
The branch lengths on phylogenetic trees may or may not be proportional to the amount of change along their length.
If branch lengths are proportional to change, the tips will not be neatly lined up, and a scale should be included.
Interpreting Groupings
[Figure: three trees with tips A, B, C, D and E, illustrating a monophyletic group (or clade), a paraphyletic group and a polyphyletic group]
These terms are used to compare named entities (e.g. fishes, mammals, etc.) to groupings found in phylogenetic trees.
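The three kinds of grouping can be checked mechanically on a tree. The sketch below is a simplified, hypothetical illustration: the tree ((A,B),(C,(D,E))) and the function names are assumptions, not taken from the slides. It finds the smallest clade containing every member of a group and looks at which tips of that clade the group leaves out. Nothing left out means monophyletic. Under the simplifying rule used here (an assumption), the group is paraphyletic if the left-out tips form a single clade, and polyphyletic otherwise. Textbook definitions allow a paraphyletic group to exclude more than one subclade, so treat the rule as a teaching shortcut.

```python
# A minimal sketch: classify a group of tips on a rooted tree.
# The tree and all names here are hypothetical, chosen for illustration.
# Trees are nested tuples; a tip is a string.
TREE = (("A", "B"), ("C", ("D", "E")))


def tips(node):
    """Return the set of tip names at or below `node`."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def clades(node):
    """Yield the tip set of every clade (every node, tips included) in the tree."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def classify(tree, group):
    """Label `group` as monophyletic, paraphyletic or polyphyletic.

    Simplified rule (an assumption): a group is paraphyletic only when the
    tips it leaves out of its smallest enclosing clade form one clade.
    """
    group = set(group)
    # The smallest clade containing every member is the one descended
    # from the group's most recent common ancestor.
    enclosing = min((c for c in clades(tree) if group <= c), key=len)
    left_out = enclosing - group
    if not left_out:
        return "monophyletic"
    if any(left_out == c for c in clades(tree)):
        return "paraphyletic"
    return "polyphyletic"


if __name__ == "__main__":
    for group in [{"A", "B"}, {"C", "D"}, {"B", "C"}]:
        print(sorted(group), classify(TREE, group))
```

On this tree, {C, D} comes out paraphyletic because its smallest enclosing clade also contains E, and {B, C} comes out polyphyletic because the tips it leaves out (A, D and E) do not form a clade.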
What is the relationship between taxonomic names and phylogenetic groups?
[Figure: tree of Birds, Crocodiles, Turtles, Amphibians, Mammals, Lizards and Snakes, with the amnion marked on the tree and the group Amniotes labelled]
What is the relationship between taxonomic names and phylogenetic groups?
[Figure: tree of Birds, Crocodiles, Turtles, Lizards and Snakes, with the labels "Cold Blooded" and "Reptiles"]
What is the relationship between taxonomic names and phylogenetic groups?
[Figure: tree of Birds, Crocodiles, Turtles, Amphibians, Rodents, Lizards and Snakes, with the labels "Wings" and "Bats"]
An example of a polyphyletic group: Amentiferae
[Images: willow, alder, walnut]
All of these trees have highly reduced male flowers clustered into structures called catkins.
These specialized structures were previously thought to reflect close relationships among the trees that have them. Therefore, the families of trees with catkins were grouped into the Amentiferae.
However, it turns out that catkins are adaptations to wind pollination that reflect common selection, not common history…
An example of a polyphyletic group: Amentiferae
[Figure: tree leading from an ancestor with separate flowers to willows, walnuts and oaks, with the evolution of catkins marked]
What is the relationship between taxonomic names and phylogenetic groups?
[Figure: Vertebrate phylogeny]
Are these groups monophyletic, paraphyletic or polyphyletic?
fish?
tetrapods? (= four limbed)
amphibians?
mammals?
endotherms (= warm blooded)?
Reconstructing Evolutionary Trees
The development of methods:
I. distance methods (UPGMA, Neighbor joining)
II. parsimony methods
III. maximum likelihood
(IV.) Bayesian inference
I. Distance Methods (phenetics)
• Distance methods grew out of the school of ‘numerical taxonomy’, which had its heyday in the 1960s.
• Taxonomists were looking for more rigorous methods of developing classifications and inferring relationships.
• The idea was to use ‘total information’: measuring many characters and producing a summary of what the characters suggest about groupings based on overall similarity.
• These approaches also proved practical when molecular datasets started to get very large and, for a time, outpaced computer processing power.
I. Distance Methods (phenetics)
Example 1: morphology
[Figure: taxa A, B, C and D plotted against Trait 1 and Trait 2]
Overall distance matrix
      A     B     C     D
A     -
B    1.0    -
C    3.0   3.3    -
D    4.9   3.0   3.0    -
I. Distance Methods (phenetics)
Example 1: morphology (continued)
[Figure: the same plot and distance matrix; A and B, the closest pair (distance 1.0), are joined first]
I. Distance Methods (phenetics)
Example 1: morphology (continued)
[Figure: the same plot and distance matrix; the tree now includes A, B, C and D]
Distance methods with sequence data
A: ATTGCAATCGG
B: ATTACGATCGG
C: GTTACAACCGG
D: CTCGTAGTCGA
Distance matrix (number of sites that differ)
      A     B     C     D
A     -
B     2     -
C     3     3     -
D     5     7     7     -
[Figure: A and B, the closest pair, are joined first]
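Counting the sites at which each pair of aligned sequences differs gives the entries of a distance matrix. A minimal Python sketch using the four sequences above (the function names are illustrative, not from the slides). Counted this way, A and B differ at two sites (positions 4 and 6).

```python
from itertools import combinations

# The four aligned sequences from the slide.
SEQS = {
    "A": "ATTGCAATCGG",
    "B": "ATTACGATCGG",
    "C": "GTTACAACCGG",
    "D": "CTCGTAGTCGA",
}


def differences(s1, s2):
    """Number of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


def distance_matrix(seqs):
    """Map each pair of taxa (in alphabetical order) to its number of differences."""
    return {
        (x, y): differences(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    for (x, y), d in distance_matrix(SEQS).items():
        print(f"{x}-{y}: {d}")
```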
Distance methods with sequence data
New distance matrix: take averages
      AB    C     D
AB    -
C     3     -
D     6     7     -
[Figure: the tree so far, with A and B joined]
Distance methods with sequence data
[Figure: the same matrices; C, the taxon closest to AB (distance 3), joins next to form ABC, and D is added last]
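The join-the-closest-pair, then-average procedure on these slides is essentially UPGMA (unweighted pair group method with arithmetic mean). The sketch below is one minimal Python version, not the lecture's own code. It recomputes distances from the four sequences and joins the closest pair of clusters. It then sets the distance from the new cluster to every other cluster to the average over all pairs of tips, so larger clusters carry more weight. The nested-tuple tree representation and the names are assumptions.

```python
from itertools import combinations

SEQS = {
    "A": "ATTGCAATCGG",
    "B": "ATTACGATCGG",
    "C": "GTTACAACCGG",
    "D": "CTCGTAGTCGA",
}

# Pairwise number of differing sites, keyed by frozenset({x, y}).
DIST = {
    frozenset((x, y)): sum(a != b for a, b in zip(SEQS[x], SEQS[y]))
    for x, y in combinations(SEQS, 2)
}


def upgma(dist):
    """Minimal UPGMA sketch.

    dist maps frozenset({x, y}) to the distance between tips x and y.
    Returns (tree, joins): the tree as nested tuples, and each join
    paired with the distance at which it was made.
    """
    size = {tip: 1 for pair in dist for tip in pair}  # tips per cluster
    d = dict(dist)
    joins = []
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        x, y = sorted(pair, key=str)  # fixed order, for readable output
        new = (x, y)
        joins.append((new, d[pair]))
        for z in size:
            if z not in (x, y):
                # Size-weighted average of the two old distances.
                d[frozenset((new, z))] = (
                    d[frozenset((x, z))] * size[x] + d[frozenset((y, z))] * size[y]
                ) / (size[x] + size[y])
        size[new] = size.pop(x) + size.pop(y)
        # Drop distances that involve the clusters just merged.
        d = {p: v for p, v in d.items() if x not in p and y not in p}
    (tree,) = size
    return tree, joins


if __name__ == "__main__":
    tree, joins = upgma(DIST)
    for cluster, distance in joins:
        print(cluster, distance)
    print(tree)
```

Here A and B join first (distance 2), C joins AB at an average distance of 3, and D joins last at (2 × 6 + 7) / 3 ≈ 6.33, because UPGMA counts the AB-to-D distance once for each of the two tips in AB. A simple average of the two cluster distances (WPGMA) would give 6.5 instead.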
Strengths and weaknesses of distance methods
Advantages
- Intuitive, easy to understand
- Works on all sorts of data, alone or in combination
- Fast implementation on large data sets
- Can handle very large data sets easily
Disadvantages
- Must assume that similarity reflects shared evolutionary history (when is this most problematic?)
II. Parsimony Methods (Cladistics)
Methods originally developed by Willi Hennig (German entomologist), presented in a book published in German in 1950
Translated into English in 1966; very influential
Originally important in analysis of small morphological datasets, including those from fossils
These methods came to the forefront with the application of DNA sequencing technology to systematics (early 1990s). In the early days, the methods were tough to implement because of limitations in computer processor speed (still somewhat limiting at times, because datasets keep getting larger).
Applying parsimony
• Consider four taxa (1-4) and four characters (A-D)
• Ancestral state: abcd
Taxon    A    B    C    D
1        a’   b    c    d
2        a’   b’   c’   d’
3        a’   b’   c’   d
4        a’   b’   c    d
Applying parsimony
• Consider four taxa (1-4) and four characters (A-D)
• Ancestral state: abcd
[Figure: tree with tips in the order 1 (a’bcd), 2 (a’b’c’d’), 3 (a’b’c’d), 4 (a’b’cd), rooted on the ancestor abcd; the changes to a’, b’, c’ and d’ and a change back to c are mapped on the branches and marked as unique changes or as convergences or reversals. Tree length: 5 steps]
Applying parsimony
• Consider four taxa (1-4) and four characters (A-D)
• Ancestral state: abcd
[Figure: tree with tips in the order 1 (a’bcd), 4 (a’b’cd), 3 (a’b’c’d), 2 (a’b’c’d’), rooted on the ancestor abcd; the changes to a’, b’, c’ and d’ are mapped on the branches, each as a unique change. Tree length: 4 steps]
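Counting steps by hand generalizes to Fitch's algorithm, which finds the minimum number of changes for each character on a given tree. The sketch below assumes something the slides do not state: that the two trees are ladders in the tip orders shown (1 2 3 4 and 1 4 3 2), rooted with the ancestor abcd as an outgroup. States are coded 0 for the ancestral state (a, b, c, d) and 1 for the primed state; the names are illustrative.

```python
# Fitch parsimony sketch. States per taxon for characters A-D:
# 0 = ancestral (a, b, c, d), 1 = derived (a', b', c', d').
STATES = {
    "anc": "0000",  # ancestral state abcd, used as the outgroup
    "1": "1000",
    "2": "1111",
    "3": "1110",
    "4": "1100",
}

# Assumed tree shapes (ladders in the slides' tip orders), as nested tuples.
TREE_1 = ("anc", ("1", ("2", ("3", "4"))))
TREE_2 = ("anc", ("1", ("4", ("3", "2"))))


def fitch(node, site):
    """Return (possible states, minimum changes) for one character below `node`."""
    if isinstance(node, str):
        return {STATES[node][site]}, 0
    (s1, c1), (s2, c2) = (fitch(child, site) for child in node)
    shared = s1 & s2
    if shared:
        return shared, c1 + c2
    # No state in common: one extra change is needed at this node.
    return s1 | s2, c1 + c2 + 1


def tree_length(tree):
    """Total number of steps, summed over all characters."""
    n_chars = len(STATES["anc"])
    return sum(fitch(tree, site)[1] for site in range(n_chars))


if __name__ == "__main__":
    print("tree 1 (1 2 3 4):", tree_length(TREE_1), "steps")
    print("tree 2 (1 4 3 2):", tree_length(TREE_2), "steps")
```

Under these assumptions the counts agree with the slides: character C needs two steps on the first tree, because taxa 2 and 3 do not form a group there, and one step on the second.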
Strengths and weaknesses of parsimony
Strengths
- Straightforward to calculate the length of the tree (number of steps)
- Simulation studies have shown that parsimony algorithms are reliable under a range of conditions
- Conceptually simple; satisfying
Weaknesses
- Cannot easily accommodate complex models of evolutionary change (e.g. models in which rates of evolutionary change differ among branches)
- Under certain circumstances, can be positively misleading (e.g. long-branch attraction, where rapidly evolving lineages are wrongly grouped together)
Parsimony practice
Which unrooted tree is most parsimonious?
[Figure: the three possible unrooted trees for taxa K, L, M and N]
Plot each change on each tree. Positions 1 and 2 are done.
Which positions help to determine relationships?
Taxa   Characters
       1 2 3 4 5 6 7
K      A G T A C C G
L      A A G A C T A
M      A A C C T T A
N      A A A G T T A
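The same Fitch count can score the three unrooted trees. A minimal sketch, assuming the character table reads, for characters 1 to 7: K = AGTACCG, L = AAGACTA, M = AACCTTA and N = AAAGTTA. Each unrooted tree is given an arbitrary root on its central branch, which does not change its number of steps. The helper `informative_sites` (a hypothetical name) lists the parsimony-informative positions: those where at least two states each occur in at least two taxa.

```python
from collections import Counter

# Assumed reading of the slide's table: characters 1-7 for taxa K-N.
SEQS = {"K": "AGTACCG", "L": "AAGACTA", "M": "AACCTTA", "N": "AAAGTTA"}

# The three possible unrooted trees for four taxa, each rooted arbitrarily
# on its central branch (rooting does not change a tree's length).
TREES = {
    "KL|MN": (("K", "L"), ("M", "N")),
    "KM|LN": (("K", "M"), ("L", "N")),
    "KN|LM": (("K", "N"), ("L", "M")),
}


def fitch(node, site):
    """Return (possible states, minimum changes) for one character below `node`."""
    if isinstance(node, str):
        return {SEQS[node][site]}, 0
    (s1, c1), (s2, c2) = (fitch(child, site) for child in node)
    shared = s1 & s2
    return (shared, c1 + c2) if shared else (s1 | s2, c1 + c2 + 1)


def steps_per_site(tree):
    """Minimum number of changes for each character on `tree`."""
    return [fitch(tree, site)[1] for site in range(len(SEQS["K"]))]


def informative_sites(seqs):
    """1-based positions where at least two states each occur in two or more taxa."""
    n_sites = len(next(iter(seqs.values())))
    informative = []
    for site in range(n_sites):
        counts = Counter(seq[site] for seq in seqs.values())
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(site + 1)
    return informative


if __name__ == "__main__":
    for name, tree in TREES.items():
        steps = steps_per_site(tree)
        print(name, steps, "total:", sum(steps))
    print("informative positions:", informative_sites(SEQS))
```

Under this reading only position 5 is informative. It favours the tree that groups K with L and M with N (9 steps versus 10 for the other two trees); every other position costs the same number of steps on all three trees.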
Inferring the direction of evolution
Chimp
Human
Gorilla
Bonobo
Orangutan
Mouse (outgroup)
ACGCTAGCTACG
ACGCTAGCTACG
ACGCTAGCTAGG
ACGCTAGCTAGG
ACGCTAGCTAGG
ACGCTAGCTAGG
Where did the mutation occur, and what was the change?
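Outgroup comparison can be written as a small routine: at each site, the outgroup's base is taken as the ancestral state and any other base found in the ingroup as a derived state. A minimal sketch, assuming (as the slide's listing order suggests) that the last sequence belongs to the mouse outgroup and the other five to the ingroup. Relying on a single outgroup is a simplification.

```python
# Ingroup sequences, in the order listed on the slide.
INGROUP = [
    "ACGCTAGCTACG",
    "ACGCTAGCTACG",
    "ACGCTAGCTAGG",
    "ACGCTAGCTAGG",
    "ACGCTAGCTAGG",
]
OUTGROUP = "ACGCTAGCTAGG"  # assumed: the last sequence listed (mouse)


def polarize(ingroup, outgroup):
    """List (position, ancestral base, derived base) for every variable site.

    The outgroup's base at each position is treated as the ancestral state.
    """
    changes = []
    for i, ancestral in enumerate(outgroup):
        derived = {seq[i] for seq in ingroup} - {ancestral}
        for base in sorted(derived):
            changes.append((i + 1, ancestral, base))
    return changes


if __name__ == "__main__":
    for position, old, new in polarize(INGROUP, OUTGROUP):
        carriers = [n for n, seq in enumerate(INGROUP, start=1) if seq[position - 1] == new]
        print(f"position {position}: {old} -> {new} in ingroup sequences {carriers}")
```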
III. Maximum Likelihood Methods (and Bayesian analysis as currently used)
Maximum likelihood approaches involve using a specific model to determine the probability that a particular base substitution will occur along a particular branch on a tree.
In effect the question being addressed is: “what is the probability of the observed data given a particular tree and a particular model of substitution?”
The best tree is the one under which the observed data have the highest probability, given the model.
Maximum likelihood: a simple model
Probabilities: transition: 0.2; transversion: 0.1; no change: 0.7
[Figure: the four bases, with transitions (A↔G, C↔T) and transversions marked, and a tree with the observed bases at its tips]
TASK: Find the tree with the highest probability
Maximum likelihood: a simple model
Probabilities: transition: 0.2; transversion: 0.1; no change: 0.7
[Figure: the same tree, with the probability of each branch's change (or no change) shown]
P1 = (.7)(.1)(.2)(.7)(.7)
TASK: Find the tree with the highest probability
Maximum likelihood: a simple model
• Probabilities: transition: 0.2; transversion: 0.1; no change: 0.7
[Figure: the same tree, with branch probabilities shown for each of three sites]
P1 = (.7)(.1)(.2)(.7)(.7)
P2 = (.7)(.1)(.7)(.7)(.7)
P3 = (.1)(.2)(.7)(.7)(.2)
TASK: Find the tree with the highest probability, where the probability of a tree = P1 × P2 × P3
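The slide's arithmetic can be reproduced directly. Each site's probability is the product of the probabilities of the changes (or non-changes) on the tree's five branches, and the probability of the whole data set is the product over sites. The factor lists below are copied from P1 to P3. In practice log probabilities are summed instead, because products of many small numbers underflow. These are the slide's illustrative values rather than a full substitution model. A real model makes the probabilities of all outcomes from a given base sum to 1, and it sums over all possible ancestral states (Felsenstein's pruning algorithm) instead of fixing them.

```python
import math

# Per-branch factors for each site, copied from P1-P3 on the slide
# (0.7 = no change, 0.2 = transition, 0.1 = transversion).
SITES = {
    "P1": [0.7, 0.1, 0.2, 0.7, 0.7],
    "P2": [0.7, 0.1, 0.7, 0.7, 0.7],
    "P3": [0.1, 0.2, 0.7, 0.7, 0.2],
}

# Probability of each site: product over the tree's branches.
site_probs = {name: math.prod(factors) for name, factors in SITES.items()}
# Probability of the whole data set on this tree: product over sites.
tree_prob = math.prod(site_probs.values())
# Summing log probabilities avoids underflow on real data sets.
log_likelihood = sum(math.log(p) for p in site_probs.values())

if __name__ == "__main__":
    for name, p in site_probs.items():
        print(f"{name} = {p:.5f}")
    print(f"P1 x P2 x P3 = {tree_prob:.3e}")
    print(f"log-likelihood = {log_likelihood:.3f}")
```

Comparing candidate trees means repeating this calculation for each tree and keeping the one with the largest product (equivalently, the largest log-likelihood).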
More complex likelihood models
Likelihood models can be quite complex, and different models assign different probabilities to changes, including:
– Relative probabilities of transitions and transversions
– Variation in mutation rates across sites (e.g. by codon position in protein coding genes) or regions (intron versus exon versus spacers)
– Variation in mutation rates across lineages
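One widely used way to give transitions and transversions different probabilities is the Kimura two-parameter (K80) model. A minimal sketch, assuming a rate alpha for transitions, a rate beta for each specific transversion, and a branch length t; the numeric rates in the example are hypothetical. Unlike the toy model above, the probabilities of all four outcomes from a base sum to 1 and change smoothly with branch length.

```python
import math


def k80_probabilities(alpha, beta, t):
    """Kimura two-parameter substitution probabilities along one branch.

    alpha: rate of transitions (A<->G, C<->T); beta: rate of each specific
    transversion; t: branch length (time). Returns the probabilities that a
    site ends up unchanged, changed by its transition, or changed to one
    particular transversion partner (there are two such partners).
    """
    e_tv = math.exp(-4 * beta * t)
    e_all = math.exp(-2 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e_tv + 0.5 * e_all
    transition = 0.25 + 0.25 * e_tv - 0.5 * e_all
    transversion = 0.25 - 0.25 * e_tv
    return same, transition, transversion


if __name__ == "__main__":
    # Hypothetical rates: transitions four times as frequent as each transversion.
    for t in (0.0, 0.1, 1.0, 10.0):
        same, ts, tv = k80_probabilities(alpha=0.4, beta=0.1, t=t)
        print(f"t={t:5.1f} same={same:.3f} transition={ts:.3f} transversion={tv:.3f}")
```

As t grows, all three probabilities approach 1/4, which is one way of seeing why rapidly evolving sites become saturated.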
Assessment of Maximum Likelihood (also Bayesian)
• Strengths
– Highly flexible (any model can be used)
– Statistically justifiable
– Given enough data (and the right model), will always infer the correct tree (as shown by simulation studies)
• Weaknesses
– Impossible to know that the model is correct, and different models may yield different answers
– Computationally intensive (for most data sets, not every possible tree can be evaluated)
Characters to use in phylogeny
• Morphology
• DNA sequence
Characters to use in phylogeny
What are the desirable qualities of characters used for phylogeny reconstruction?
1.
2.
3.
4.
How are these qualities met by DNA sequence data?
The problem of homology with DNA… the good, the bad and the ugly
Alignment (= HOMOLOGY assessment) can be very challenging!
Taxon 1 AATGCGC
Taxon 2 AATCGCT
Taxon 1 AATGCGC
Taxon 2
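Alignment programs search for the arrangement of gaps that maximizes a score. A minimal Needleman–Wunsch global alignment sketch, applied to the two sequences above. The scoring scheme (match +1, mismatch -1, gap -1) is an arbitrary assumption, and different scores can favour different alignments, which is part of why homology assessment is hard.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_s1, aligned_s2) for one optimal alignment;
    the scoring values are illustrative defaults, not standard ones.
    """
    n, m = len(s1), len(s2)

    def pair_score(i, j):
        return match if s1[i - 1] == s2[j - 1] else mismatch

    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two bases
                score[i - 1][j] + gap,  # gap in s2
                score[i][j - 1] + gap,  # gap in s1
            )

    # Trace back from the bottom-right cell to recover one best alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a1)), "".join(reversed(a2))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("AATGCGC", "AATCGCT")
    print("score:", best)
    print("Taxon 1", top)
    print("Taxon 2", bottom)
```

With these scores the best alignment places one gap in each sequence (score 4), whereas lining the sequences up without gaps scores -1.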
The problems of locus choice: Getting the right rate of evolution
• Too slow? Not enough variation
   Taxon 1 AATGCGC
   Taxon 2 AATGCGC
   Taxon 3 AATGCGC
[Figure: the resulting tree is a polytomy]
Example of insufficient evidence: metazoan phylogeny
[Figure: phylogeny with the labels Fungi and Metazoans]
Challenges: sunflower phylogeny
[Figure: sunflower phylogeny with the labels “= 15 spp!” and “= 12 spp!”]
• Recent radiation (200,000 years)
• Many species, much hybridization
• Need more rapidly evolving markers!!
The problems of locus choice: Getting the right rate of evolution
• Too fast? Homoplasy likely
– “saturation”: only 4 possible states for DNA
   Taxon 1 ATTCTGA
   Taxon 2 GTAGTGG
   Taxon 3 CGTGCTG
[Figure: the resulting tree is a polytomy]
Saturation
• Imagine changing one nucleotide every hour to a random nucleotide
• Split the ancestral population in 2.
[Figure: the ancestral sequence ACGTGCT splits into two lineages, shown after one, four, 8 and 12 hours (sequences shown: ACTTGCT, ACGAGCT, ACCTGAA, GCGATCC, ACCAGAA, AGCCTCC, AGCGGAA, GAGCTCC); red indicates multiple mutations at a site]
24 hours?
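The thought experiment above can be simulated. A minimal sketch under the slide's assumptions: each hour, one randomly chosen site in each lineage is replaced by a randomly chosen base, which may by chance be the base already there. The ancestral sequence ACGTGCT comes from the slide; the random seed and function names are arbitrary. The number of visible differences between the two lineages soon stops keeping pace with the number of mutations, because later mutations overwrite earlier ones.

```python
import random

BASES = "ACGT"


def mutate(seq, rng):
    """Replace one randomly chosen site with a randomly chosen base."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(BASES) + seq[i + 1:]


def simulate(ancestor, hours, seed=1):
    """Evolve two lineages from `ancestor`; record mutations versus visible differences."""
    rng = random.Random(seed)
    lineage1 = lineage2 = ancestor
    history = []
    for hour in range(1, hours + 1):
        lineage1 = mutate(lineage1, rng)
        lineage2 = mutate(lineage2, rng)
        observed = sum(a != b for a, b in zip(lineage1, lineage2))
        # Two mutation events per hour: one in each lineage.
        history.append((hour, 2 * hour, observed))
    return history


if __name__ == "__main__":
    for hour, mutations, observed in simulate("ACGTGCT", 24):
        print(f"hour {hour:2d}: {mutations:2d} mutations, {observed} sites differ")
```

With only 7 sites, the visible differences can never exceed 7, however many mutations have occurred.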
Saturation: mammalian mitochondrial DNA
This line is what we would expect if we had an infinite number of bases, so that every mutation could be seen.
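Under the simplest substitution model (Jukes–Cantor, 1969), the expected proportion of differing sites after d substitutions per site is p = 3/4 (1 - e^(-4d/3)). This curve levels off at 3/4 instead of following the straight line, and inverting it gives the standard correction d = -3/4 ln(1 - 4p/3). A minimal sketch; the function names are illustrative.

```python
import math


def expected_p(d):
    """Expected proportion of differing sites after d substitutions per site (Jukes-Cantor)."""
    return 0.75 * (1 - math.exp(-4 * d / 3))


def jc_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); at 0.75 the sites are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0, 5.0):
        p = expected_p(d)
        print(f"true d = {d:.1f}  observed p = {p:.3f}  corrected d = {jc_distance(p):.3f}")
```

The correction recovers the true distance only if the model is right, and it becomes very uncertain as p approaches 3/4.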
Forces of evolution and phylogeny reconstruction
How does each force affect the ability to reconstruct phylogeny?
mutation?
drift?
selection?
non-random mating?
migration?
Phylogeny case study I: whales
Are whales ungulates (hoofed mammals)? Figure 4.8
Whales: DNA sequence data
Hillis, D. A. 1999.
How reliable is this tree? Bootstrapping.
How consistent are the data?
• Take the dataset (5 taxa, 10 characters)
• Create a new data set by sampling characters at random, with replacement
Taxon     1 2 3 4 5 6 7 8 9 10
Human     A C G T T G T A C T
Chimp     A G G T T C T A T T
Bonobo    A G G T T C T A T G
Gorilla   A C T T G C T G T C
Orang     T C G T G T A C C C
[Figure: a new data set built by resampling these characters; some (e.g. characters 8 and 10) appear more than once and others (e.g. characters 1, 4 and 9) are missing]
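Drawing a bootstrap replicate is a single resampling step. A minimal sketch; the matrix below is one reading of the slide's 5-taxon, 10-character table (Human ACGTTGTACT, Chimp AGGTTCTATT, Bonobo AGGTTCTATG, Gorilla ACTTGCTGTC, Orang TCGTGTACCC), and the seed is arbitrary. In a full analysis, hundreds or thousands of replicates would each be analysed with the same tree-building method, and a group's bootstrap value is the percentage of replicate trees that contain it. That step is omitted here.

```python
import random

# Data matrix: 5 taxa x 10 characters (assumed reading of the slide's table).
MATRIX = {
    "Human": "ACGTTGTACT",
    "Chimp": "AGGTTCTATT",
    "Bonobo": "AGGTTCTATG",
    "Gorilla": "ACTTGCTGTC",
    "Orang": "TCGTGTACCC",
}


def bootstrap_replicate(matrix, rng):
    """Resample characters (columns) with replacement, keeping the same number of characters.

    Returns the 0-based column indices drawn and the resampled matrix.
    """
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    replicate = {taxon: "".join(seq[c] for c in columns) for taxon, seq in matrix.items()}
    return columns, replicate


if __name__ == "__main__":
    rng = random.Random(42)
    columns, replicate = bootstrap_replicate(MATRIX, rng)
    print("characters drawn:", [c + 1 for c in columns])
    for taxon, seq in replicate.items():
        print(f"{taxon:8s} {seq}")
```

Because sampling is with replacement, a replicate typically repeats some characters and leaves others out, just as in the slide's example.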
Whales: DNA sequence data
Hillis, D. A. 1999.
Molecular clocks
Basic idea of molecular clocks
[Figure: tree of chimps, humans, whales and hippos, with the labels 56 mya, 60 substitutions and 6 substitutions]
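The clock calculation is a simple proportion. A minimal sketch, assuming a reading of the figure in which whales and hippos diverged 56 mya and differ by 60 substitutions, while chimps and humans differ by 6. The rate is calibrated on the dated split and then applied to the undated one, which presumes the same constant rate in all of these lineages.

```python
def clock_rate(substitutions, million_years):
    """Substitutions per million years, calibrated on a dated divergence."""
    return substitutions / million_years


def divergence_time(substitutions, rate):
    """Estimated time (million years) for a pair separated by `substitutions`."""
    return substitutions / rate


if __name__ == "__main__":
    rate = clock_rate(60, 56)  # whales vs hippos (assumed reading of the figure)
    estimate = divergence_time(6, rate)  # chimps vs humans
    print(f"rate = {rate:.3f} substitutions per million years")
    print(f"chimp-human split is roughly {estimate:.1f} million years ago")
```

Under these assumptions the estimate is 6 × 56 / 60 = 5.6 million years. Any departure from a constant rate shifts it proportionally.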
Challenges for phylogeny: gene flow
Sunflower annuals
Different genes may have different histories!
Wayne Maddison (UBC) has emphasized that genes and species are not expected to always have the same evolutionary history. As such, gene trees and species trees will not always match each other, as shown in this diagram from the computer package MESQUITE (Maddison and Maddison), developed to tackle some of these complexities.
Phylogeny study questions
1) Explain in words the difference between monophyletic, paraphyletic, and polyphyletic groups. Draw a hypothetical phylogeny representing each type. Give an actual example of a commonly recognized paraphyletic taxon in both animals and plants (use your text for sources).
2) How can a phylogenetic tree be used to determine if a similar character in two taxa is due to homoplasy?
3) Whales are classified as cetaceans, not artiodactyl ungulates. This makes artiodactyls paraphyletic – why? What is the evidence that whales belong in the artiodactyls?
4) Phenetics (distance methods) and cladistics (parsimony) differ in the ways they recognize and use similarities among taxa to form phylogenetic groupings. What types of similarity does each school recognize, and how useful is each type of similarity considered to be for identifying groups?
Phylogeny study questions
5) What is “bootstrapping” in the context of phylogenetic analysis, and why is this procedure performed?
6) Why are maximum likelihood methods increasing in popularity for reconstructing phylogenies? In your answer, include a short description of how this method identifies the best phylogeny.
7) Integrative question: Draw a pair of axes with “Time since divergence” on the x axis and “percent of sites that are the same” on the y axis. Draw a line that shows the expected pattern for third codon sites in protein coding genes: is your graph linear? Explain why or why not. How and why would the graph of first codon positions differ from this?
8) You are studying a group of species that lives in two very different environments. You build two phylogenies: one is based on a locus that is probably under divergent selection in the two environments, while the other phylogeny is based on a neutral locus. Which phylogeny would be more likely to represent the species history? Why?
Phylogeny study questions
9) It has been known for a number of years that Anolis lizards on many separate islands in the Caribbean occupy similar microhabitats, and that the lizards in each microhabitat are very similar from island to island (for example, large lizards that feed on the ground, smaller lizards that feed on tree trunks, and very small lizards that feed at the tops of branches). Two different historical explanations have been proposed for this pattern: each morph has evolved independently on each island, or each morph has evolved just once, then dispersed. Sketch a phylogeny that would support each hypothesis.
10) Integrative question: the Cameroon lake cichlid phylogeny, showing that the lake species were monophyletic, was based on mitochondrial DNA. Explain why this might not reflect the species history. How could you be more certain about the phylogeny?