Interpreting molecular phylogenetic trees
Aidan Budd
Structural and Computational Biology Unit
EMBL Heidelberg, Germany
EMBO Practical Course on Computational Molecular Evolution
IMBG-HCMR, Heraklion, Greece
Monday 3rd - Tuesday 4th May 2010
Part 1
Session Homepage:
http://tinyurl.com/interpretPhyloMolEvol2010
Session Aims
1. Highlight key aspects of molecular evolutionary studies
• common concepts
  • clearly it's important to understand these as well as possible
• typical questions
  • helps planning your own analyses
• applications to other fields
  • helps identifying potential collaborations and applying for funding
2. Review basic concepts and terminology
• Provide a common background for later sessions
• Demonstrate selected important tools and resources
• Place different components of an analysis in context
It's not just you who thinks molecular evolution/phylogenetics is important...
Number of Publications Per Year
[Figure: number of publications per year for two queries, labelled "molecular evolution" and "phylogenies"]
"phylogenies" query: ((phylogeny OR phylogenetic OR phylogenies OR phylogenetics))
"molecular evolution" query: (((evolution OR evolutionary OR evolves OR evolve) AND (molecule OR molecular OR molecules)))
total since 1975: +100,000
total since 1975: 72638
Source: ISI Web of Knowledge as of 28.03.2010
• Many molecular evolution/phylogenetics articles are published each year
• Number of articles published increases year-by-year
Highly Cited Articles
Method/Software | Publication Year | Original Citation | # of Citations
MEGA3 | 2004 | Kumar et al. 2004, Brief Bioinform.;5(2):150-63. PMID: 15260895 | 6630
MRBAYES | 2001 | Huelsenbeck and Ronquist 2001, Bioinformatics;17(8):754-5. PMID: 11524383 | 5707
CLUSTALW ** | 1994 | Thompson et al. 1994, Nucleic Acids Res.;22(22):4673-80. PMID: 7984417 | 29658
BLAST * | 1990 | Altschul et al. 1990, J Mol Biol.;215(3):403-10. PMID: 2231712 | 27660
Neighbor-Joining Algorithm | 1987 | Saitou and Nei 1987, Mol Biol Evol.;4(4):406-25. PMID: 3447015 | 20523
Non-Parametric Bootstrap in Phylogenetics | 1985 | Felsenstein 1985, Evolution;39(4):783-91. PMID: N/A | 14566
Source: ISI Web of Knowledge, as of 29.03.2010
As of 2006 in the ISI Web of Knowledge:
* most cited paper that year, 26th most cited in the entirety of science
** second most cited paper that year, 31st most cited paper in the entirety of science
•Some molecular-evolution-related articles are VERY highly cited
Applications of Phylogenies
Applications of Phylogenetics
•Epidemiology
•Forensics
•Medical treatment selection
•Selecting conservation targets
•Monitoring trade in illegal organisms
•Bioinformatics tools - in particular:
•building MSAs
•predicting function
Applications of Phylogenetics: Epidemiology
Clonal origin and evolution of a transmissible cancer. Murgia et al. PMID: 16901782
Characterise evolutionary history of a pathogen
Dog cancer known to be infectious - is the infectious agent:
•a virus (cf Human Papilloma Virus)?
•dog cancer cells themselves?
Tumor and host-dog DNA compared and used to draw a tree
Root assumed on this branch
All tumor sequences more closely related to each other than any are to host-dogs
Tumor itself is the infectious agent!
Applications of Phylogenetics: Forensics
Analysis of a rape case by direct sequencing of the human immunodeficiency virus type 1 pol and gag genes. Albert et al., PMID: 7520096
Using HIV pol and gag genes to estimate phylogeny of viruses from
•male rape suspect
•female rape victim
•random individuals
Female victim’s virus is clearly more closely related to male suspect’s viruses than to any other sequences
Supports guilt of male suspect
Conclusion depends on determining order of lineage divergence events
Applications of Phylogenetics: Medical Treatment Selection
Nocardia cyriacigeorgica, an emerging pathogen in the United States. Schlaberg R, Huard RC, Della-Latta P. J Clin Microbiol. 2008 Jan;46(1):265-73. PMID: 18003809. Figure 2 (Nocardia cyriacigeorgica)
Characters such as drug resistance can vary across the phylogeny
Drug selection for a novel strain can be informed using phylogeny and knowledge of these character distributions
Applications of Phylogenetics: Selecting Conservation Targets
Prioritise organisms for inclusion in conservation programs, taking into account
•phylogenetic diversity
•conservation costs
•probability of extinction
Resource-aware taxon selection for maximizing phylogenetic diversity. Pardi F, Goldman N. Syst Biol. 2007 Jun;56(3):431-44. PMID: 17558965. Figure 4
Applications of Phylogenies: Monitoring Trade in Illegal Organisms
Genetic evidence of illegal trade in protected whales links Japan with the US and South Korea. Baker CS, Steel D, Choi Y, Lee H, Kim KS, Choi SK, Ma YU, Hambleton C, Psihoyos L, Brownell RL, Funahashi N. Biol Lett. 2010 Apr 14. Figure 1
Determining the source animals for unlabelled meats
Used here to track illegal trade in protected whale and dolphin species
Applications of Phylogenetics: Bioinformatics Tools
STRING
STRING 8--a global view on proteins and their functional interactions in 630 organisms. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. Nucleic Acids Res. 2009 Jan;37(Database issue):D412-6. PMID: 18940858
Further example
A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences. Chica C, Labarga A, Gould CM, López R, Gibson TJ. BMC Bioinformatics. 2008 May 6;9:229. PMID: 18460207
Applications of Phylogenetics: Building Progressive Multiple Alignments
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Thompson JD, Higgins DG, Gibson TJ. Nucleic Acids Res. 1994 Nov 11;22(22):4673-80. PMID: 7984417
An algorithm for progressive multiple alignment of sequences with insertions. Löytynoja A, Goldman N. Proc Natl Acad Sci U S A. 2005 Jul 26;102(30):10557-62. PMID: 16000407
Applications of Phylogenetics
•Relatedness
•Timing historical events
Common themes
•Are my sequences related to each other?
NOT!
Rooted Phylogenies
Terminology and Concepts
Alternative Tree-Related Terminologies
leaf / tip / terminal node / external node
branch / arc / edge
Trees: Branches and Nodes
Trees consist of:
branches
nodes (ends of branches)
Internal/External Nodes/Branches
Branches and Nodes are either:
external/terminal
•Node - associated with an extant sequence/OTU (operational taxonomic unit)
•Branch - links an external and an internal node
internal/interior
•Node - at the intersection of two or more branches
•Branch - links two internal nodes
Branches
•represent successive generations of “taxa”
•“later” taxa have “earlier” taxa as their ancestors
•i.e. a lineage
•time flows from the base of the tree to the tips
Internal Nodes
•represent hypothetical ancestral taxa/sequences/organisms i.e. HTUs - hypothetical taxonomic units
Root (Root Node)
•A "special" internal node
•The most recent common ancestor of all OTUs
•Usually implies many other less recent common ancestors
Parent/Daughter Branches
Parental/ancestral lineages/branches diverge into multiple daughter lineages/branches
Polytomies
Internal nodes with two daughter branches are bifurcations
Internal nodes associated with more than two daughter branches are polytomies
How many bifurcations on the tree? (a) 4 (b) 5 (c) 6
Interpreting Polytomies
Soft Polytomy
Lineages only bifurcate, but some internal lineages are so short that no identifiable change/evolution occurred along them
Thus the true pattern of lineage divergence cannot be resolved
Hard Polytomy
Ancestral lineage diverged into 3+ lineages simultaneously
NB: Some software only accepts bifurcating trees
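The distinction between bifurcations and polytomies can be checked mechanically. The sketch below is only an illustration: it assumes a rooted tree written as nested Python tuples (a representation chosen here, not one used in the slides), and the five-OTU example tree is hypothetical.

```python
def count_internal_nodes(tree):
    """Return (bifurcations, polytomies) for a rooted tree given as nested tuples.

    A string is an OTU (tip); a tuple is an internal node whose items are
    its daughter branches.
    """
    if isinstance(tree, str):
        return 0, 0
    bifurcations = 1 if len(tree) == 2 else 0
    polytomies = 1 if len(tree) > 2 else 0
    for daughter in tree:
        b, p = count_internal_nodes(daughter)
        bifurcations += b
        polytomies += p
    return bifurcations, polytomies


# Hypothetical example: the root and (D, E) are bifurcations, (A, B, C) is a polytomy.
example_tree = (("A", "B", "C"), ("D", "E"))
print(count_internal_nodes(example_tree))  # (2, 1)
```

A check like this could be run before handing a tree to software that only accepts bifurcating trees.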
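Relatedness
[Figure: rooted tree with OTUs a to i and internal nodes t, u, v, w, x, y, z (root z); time runs from the root to the tips]
OTU | Branches
a | zy yt tu uv va
b | zy yt tu uv vb
c | zy yt tu uc
d | zy yt td
e | zy ye
f | zf
g | zx xw wg
h | zx xw wh
i | zx xi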
We can list the set of branches in the complete lineage of each OTU
a and c are more closely related to each other than either is to e
because their complete lineages share (at least one) more recent common ancestor(s) [ t and u ] / branch(es) [ yt and tu ] that neither shares with e
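The comparison of complete lineages can be sketched in a few lines of Python. The lineages are copied from the OTU/Branches table; the function names are illustrative choices made here, not part of the course material.

```python
# Complete lineage of each OTU, copied from the OTU/Branches table.
# A branch "pq" runs from node p (older) to node q (younger).
LINEAGES = {
    "a": ["zy", "yt", "tu", "uv", "va"],
    "b": ["zy", "yt", "tu", "uv", "vb"],
    "c": ["zy", "yt", "tu", "uc"],
    "d": ["zy", "yt", "td"],
    "e": ["zy", "ye"],
    "f": ["zf"],
    "g": ["zx", "xw", "wg"],
    "h": ["zx", "xw", "wh"],
    "i": ["zx", "xi"],
}


def shared_branches(otu1, otu2):
    """Branches common to both complete lineages, in root-to-tip order."""
    shared = []
    for b1, b2 in zip(LINEAGES[otu1], LINEAGES[otu2]):
        if b1 != b2:
            break
        shared.append(b1)
    return shared


def more_closely_related(x, y, z):
    """True if x and y share more of their lineages with each other than either shares with z."""
    xy = len(shared_branches(x, y))
    return xy > len(shared_branches(x, z)) and xy > len(shared_branches(y, z))


print(shared_branches("a", "c"))            # ['zy', 'yt', 'tu']
print(shared_branches("a", "e"))            # ['zy']
print(more_closely_related("a", "c", "e"))  # True
```

The branches shared by a and c but not by e are yt and tu, matching the statement above.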
Relatedness Statements: 'X is more closely related to Y than to Z'
•a is more closely related to b than either is to g
•b, d and e are more closely related to each other than they are to f
A set of OTUs are more closely related to each other than any are to other OTUs if they share more recent common ancestor(s)/lineage(s) not shared with other OTUs
Relatedness Statements: 'X is equally distantly related to Y and Z'
•g is equally distantly related to f as (g is related) to d
•i is equally distantly related to f, e, and a
An OTU is equally distantly related to two (or more) other OTUs if it shares the same most recent common ancestor with these two (or more) OTUs
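This definition can be tested by finding the most recent common ancestor (MRCA) of each pair. The following is a minimal sketch using the lineages from the OTU/Branches table; the helper names are assumptions made for illustration.

```python
# Complete lineage of each OTU, copied from the OTU/Branches table.
LINEAGES = {
    "a": ["zy", "yt", "tu", "uv", "va"],
    "b": ["zy", "yt", "tu", "uv", "vb"],
    "c": ["zy", "yt", "tu", "uc"],
    "d": ["zy", "yt", "td"],
    "e": ["zy", "ye"],
    "f": ["zf"],
    "g": ["zx", "xw", "wg"],
    "h": ["zx", "xw", "wh"],
    "i": ["zx", "xi"],
}


def ancestors(otu):
    """Ancestral nodes of an OTU, ordered from the root to its most recent ancestor."""
    return [branch[0] for branch in LINEAGES[otu]]


def mrca(otu1, otu2):
    """Most recent common ancestor: the last node on the shared part of both lineages."""
    common = None
    for n1, n2 in zip(ancestors(otu1), ancestors(otu2)):
        if n1 != n2:
            break
        common = n1
    return common


def equally_distant(x, others):
    """True if x shares the same MRCA with every OTU in `others`."""
    return len({mrca(x, other) for other in others}) == 1


print(mrca("g", "f"), mrca("g", "d"))         # z z
print(equally_distant("g", ["f", "d"]))       # True
print(equally_distant("i", ["f", "e", "a"]))  # True
print(equally_distant("a", ["b", "c"]))       # False
```

The last call is False because a shares node v with b but only node u with c.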
Relatedness Statements: 'X is most closely related to Y'
•g is most closely related to h
•a, b, and c are most closely related to d
An OTU (or a set of OTUs) are most closely related to another OTU (or set of OTUs) if they share one (or more) most recent common ancestor(s)/lineage(s) not shared by any other OTU (or group of OTUs)
Relatedness Statements: 'X is the sister group of Y'
•d is the sister group of the group of taxa [a, b, c]
•the group of taxa [g, h] and the taxon i are sister groups
OTUs (or groups of OTUs) most closely related to each other are sometimes referred to as sister groups
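Sister groups can also be read off the lineage table programmatically. The sketch below is one possible approach, not a standard algorithm from any package: it walks from a group towards the root and stops at the first ancestor that has descendants outside the group. If that ancestor is a polytomy, the result is the union of all the other daughter groups.

```python
# Complete lineage of each OTU, copied from the OTU/Branches table.
LINEAGES = {
    "a": ["zy", "yt", "tu", "uv", "va"],
    "b": ["zy", "yt", "tu", "uv", "vb"],
    "c": ["zy", "yt", "tu", "uc"],
    "d": ["zy", "yt", "td"],
    "e": ["zy", "ye"],
    "f": ["zf"],
    "g": ["zx", "xw", "wg"],
    "h": ["zx", "xw", "wh"],
    "i": ["zx", "xi"],
}


def descendants(node):
    """All OTUs whose complete lineage passes through the given internal node."""
    return {
        otu
        for otu, branches in LINEAGES.items()
        if any(branch[0] == node for branch in branches)
    }


def sister_group(group):
    """OTUs most closely related to `group` (a set of OTU names forming one lineage)."""
    member = sorted(group)[0]
    # Walk from the most recent ancestor of one member back towards the root.
    for branch in reversed(LINEAGES[member]):
        below = descendants(branch[0])
        if group < below:  # first ancestor with descendants outside the group
            return below - group
    return set()


print(sorted(sister_group({"d"})))       # ['a', 'b', 'c']
print(sorted(sister_group({"g", "h"})))  # ['i']
print(sorted(sister_group({"i"})))       # ['g', 'h']
```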
Relatedness
c is most closely related to...?
1. a
2. d
3. d and b
4. a and b
5. b
Tree Topology
Trees “rotated” around internal branches have the same topology
For rooted trees, different topologies describe different patterns of relatedness of OTUs
[Figure: trees with identical topologies; internal nodes labelled w, x, y and z]
The branch intersections (i.e. internal nodes) of a tree specify its topology
Node | Branches
z | yz za zb
y | xy yw yz
w | yw wc wd
x | xy xe
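One way to check whether two rooted tree drawings have the same topology is to put each into an order-independent canonical form and compare. The sketch below assumes trees written as nested tuples, and it assumes that x is the root of the tree in the Node/Branches table, because x is the only internal node listed with just two branches.

```python
def canonical(tree):
    """Order-independent form of a rooted tree given as nested tuples of OTU names."""
    if isinstance(tree, str):
        return tree
    # Sorting the daughters removes any effect of rotating around internal branches.
    return tuple(sorted((canonical(daughter) for daughter in tree), key=repr))


# Tree from the Node/Branches table, rooted at x (an assumption, see above).
tree_1 = ((("c", "d"), ("a", "b")), "e")
# The same tree after rotating around several internal branches.
tree_2 = ("e", (("b", "a"), ("d", "c")))
# A hypothetical tree with a different pattern of relatedness.
tree_3 = ((("c", "a"), ("d", "b")), "e")

print(canonical(tree_1) == canonical(tree_2))  # True
print(canonical(tree_1) == canonical(tree_3))  # False
```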
Tree Representations
Most rooted tree figures use a “rectangular” rather than a “diagonal” representation
Diagonal
Rectangular
Rectangular trees represent internal nodes with lines perpendicular to lines representing the branches
Unrooted Phylogenies
Unrooted Trees
There’s no root on the tree
Thus, there is no statement about the DIRECTION of time (i.e. of direction of divergence of lineages in the tree)
Thus we can’t distinguish daughter from parent branches
Many applications of phylogenies require a rooted tree
But many tree estimation tools yield only unrooted trees!
There are multiple rooted trees possible from a given unrooted tree
The number of possible rooted tree topologies is the number of branches on the unrooted tree (assuming a bifurcating root)
[Figure: an unrooted tree and the rooted trees that can be derived from it]
If you allow a polytomy at the root, there is one additional rooted tree possible for each internal node in the unrooted tree
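These counts are easy to verify on a small case. The sketch below uses a hypothetical unrooted four-OTU tree stored as an edge list; the internal node labels p and q are invented for illustration. For a fully bifurcating unrooted tree with n OTUs there are 2n - 3 branches and n - 2 internal nodes, which the second function encodes.

```python
from collections import defaultdict

# Hypothetical unrooted tree: a and b join at internal node p, c and d at q.
EDGES = [("a", "p"), ("b", "p"), ("p", "q"), ("c", "q"), ("d", "q")]


def rooting_counts(edges):
    """Return (rootings on a branch, extra rootings on an internal node)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    internal_nodes = [node for node, d in degree.items() if d > 1]
    return len(edges), len(internal_nodes)


def bifurcating_counts(n_otus):
    """Same counts for any fully bifurcating unrooted tree with n_otus tips (n_otus >= 3)."""
    return 2 * n_otus - 3, n_otus - 2


print(rooting_counts(EDGES))  # (5, 2)
print(bifurcating_counts(4))  # (5, 2)
```

So this four-OTU tree can be rooted in 5 ways with a bifurcating root, plus 2 more if a polytomy at the root is allowed.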
Quiz
[Figure: unrooted tree with OTUs a, b, c and d]
Which sequence is d most closely related to?
•a
•b
•c
None of them - It depends where the root is!
However, if d is most closely related to a single OTU and not to a group, then that OTU must be c (see previous slide)
i.e. d is certainly NOT most closely related to a or to b
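The dependence on root position can be made explicit by trying every possible root branch. This is a sketch under assumptions: the unrooted tree is taken to group a with b and c with d (consistent with the answer above), and the internal node labels p and q are invented.

```python
from collections import defaultdict

# Assumed unrooted tree: a and b join at internal node p, c and d at q.
EDGES = [("a", "p"), ("b", "p"), ("p", "q"), ("c", "q"), ("d", "q")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def tips_below(adj, node, parent):
    """OTUs reached from `node` when walking away from `parent`."""
    daughters = adj[node] - {parent}
    if not daughters:
        return {node}
    tips = set()
    for daughter in daughters:
        tips |= tips_below(adj, daughter, node)
    return tips


def sister_of(adj, otu, root_edge):
    """Sister group of `otu` when a bifurcating root is placed on `root_edge`."""
    u, v = root_edge
    # For the two ends of the root branch, record the opposite end, so that
    # "neighbours minus parent" gives the daughters of every node.
    parent = {u: v, v: u}
    stack = [u, v]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - {parent[node]}:
            parent[nxt] = node
            stack.append(nxt)
    if otu in root_edge:
        # The OTU hangs directly off the root: everything else is its sister group.
        return tips_below(adj, parent[otu], otu)
    p = parent[otu]
    sisters = set()
    for sibling in adj[p] - {otu, parent[p]}:
        sisters |= tips_below(adj, sibling, p)
    return sisters


adj = adjacency(EDGES)
for edge in EDGES:
    print(edge, sorted(sister_of(adj, "d", edge)))
# ('a', 'p') ['c']
# ('b', 'p') ['c']
# ('p', 'q') ['c']
# ('c', 'q') ['a', 'b']
# ('d', 'q') ['a', 'b', 'c']
```

Whenever the sister group of d is a single OTU it is c; the other rootings make d sister to a group.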
Visualising Trees
Demonstration and Exercise
Viewing and manipulating unscaled trees with NJplot
•Rotating around internal branches
•Re-rooting