Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational Molecular Evolution IMBG-HCMR, Heraklion, Greece Monday 3rd - Tuesday 4th May 2010




Part 1


Session Homepage:

http://tinyurl.com/interpretPhyloMolEvol2010


Session Aims

1. Highlight key aspects of molecular evolutionary studies
  • common concepts: clearly it's important to understand these as well as possible
  • typical questions: helps in planning your own analyses
  • applications to other fields: helps in identifying potential collaborations and applying for funding

2. Review basic concepts and terminology
  • Provide a common background for later sessions
  • Demonstrate selected important tools and resources
  • Place different components of an analysis in context


It's not just you who thinks molecular evolution/phylogenetics is important...


Number of Publications Per Year

(Figure: publications per year for two ISI Web of Knowledge searches, labelled "molecular evolution" and "phylogenies")

query for "phylogenies": ((phylogeny OR phylogenetic OR phylogenies OR phylogenetics))

query for "molecular evolution": (((evolution OR evolutionary OR evolves OR evolve) AND (molecule OR molecular OR molecules)))

Totals since 1975 shown on the two panels: +100,000 and 72638

Source: ISI Web of Knowledge as of 28.03.2010

• Many molecular evolution/phylogenetics articles are published each year

• The number of articles published increases year on year


Highly Cited Articles

Method/Software | Publication Year | Original Citation | # of Citations
MEGA3 | 2004 | Kumar et al., Brief Bioinform. 2004;5(2):150-63. PMID: 15260895 | 6630
MRBAYES | 2001 | Huelsenbeck and Ronquist, Bioinformatics 2001;17(8):754-5. PMID: 11524383 | 5707
CLUSTALW ** | 1994 | Thompson et al., Nucleic Acids Res. 1994;22(22):4673-80. PMID: 7984417 | 29658
BLAST * | 1990 | Altschul et al., J Mol Biol. 1990;215(3):403-10. PMID: 2231712 | 27660
Neighbor-Joining Algorithm | 1987 | Saitou and Nei, Mol Biol Evol. 1987;4(4):406-25. PMID: 3447015 | 20523
Non-Parametric Bootstrap in Phylogenetics | 1985 | Felsenstein, Evolution 1985;39(4):783-91. PMID: N/A | 14566

Source: ISI Web of Knowledge, as of 29.03.2010

As of 2006 in the ISI Web of Knowledge:
* most cited paper that year; 26th most cited paper in the entirety of science
** second most cited paper that year; 31st most cited paper in the entirety of science

• Some molecular-evolution-related articles are VERY highly cited


Applications of Phylogenies


Applications of Phylogenetics

• Epidemiology
• Forensics
• Medical treatment selection
• Selecting conservation targets
• Monitoring trade in illegal organisms
• Bioinformatics tools, in particular:
  • building MSAs
  • predicting function


Applications of Phylogenetics: Epidemiology

Clonal origin and evolution of a transmissible cancer. Murgia et al. PMID: 16901782

Characterise the evolutionary history of a pathogen
• A dog cancer is known to be infectious; is the infectious agent:
  • a virus (cf. Human Papilloma Virus)?
  • the dog cancer cells themselves?
• Tumor and host-dog DNA were compared and used to draw a tree (root assumed on the branch marked in the figure)
• All tumor sequences are more closely related to each other than any are to the host dogs

Tumor itself is the infectious agent!


Applications of Phylogenetics: Forensics

Analysis of a rape case by direct sequencing of the human immunodeficiency virus type 1 pol and gag genes. Albert et al. PMID: 7520096

HIV pol and gag genes were used to estimate the phylogeny of viruses from:
• the male rape suspect
• the female rape victim
• random individuals

The female victim's virus is clearly more closely related to the male suspect's viruses than to any other sequences
• Supports the guilt of the male suspect
• The conclusion depends on determining the order of lineage divergence events


Applications of Phylogenetics: Medical Treatment Selection

Nocardia cyriacigeorgica, an emerging pathogen in the United States. Schlaberg R, Huard RC, Della-Latta P. J Clin Microbiol. 2008 Jan;46(1):265-73. PMID: 18003809 (Figure 2: Nocardia cyriacigeorgica)

Characters such as drug resistance can vary across the phylogeny

Drug selection for a novel strain can be informed using phylogeny and knowledge of these character distributions


Applications of Phylogenetics: Selecting Conservation Targets

Prioritise organisms for inclusion in conservation programs, taking into account:
• phylogenetic diversity
• conservation costs
• probability of extinction

Resource-aware taxon selection for maximizing phylogenetic diversity. Pardi F, Goldman N. Syst Biol. 2007 Jun;56(3):431-44. PMID: 17558965 (Figure 4)


Applications of Phylogenies: Monitoring Trade in Illegal Organisms

Genetic evidence of illegal trade in protected whales links Japan with the US and South Korea. Baker CS, Steel D, Choi Y, Lee H, Kim KS, Choi SK, Ma YU, Hambleton C, Psihoyos L, Brownell RL, Funahashi N. Biol Lett. 2010 Apr 14. (Figure 1)

Determining the source animals for unlabelled meats

Used here to track illegal trade in protected whale and dolphin species


Applications of Phylogenetics: Bioinformatics Tools

STRING

STRING 8--a global view on proteins and their functional interactions in 630 organisms. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. Nucleic Acids Res. 2009 Jan;37(Database issue):D412-6. PMID: 18940858

Further example: A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences. Chica C, Labarga A, Gould CM, López R, Gibson TJ. BMC Bioinformatics. 2008 May 6;9:229. PMID: 18460207


Applications of Phylogenetics: Building Progressive Multiple Alignments

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Thompson JD, Higgins DG, Gibson TJ. Nucleic Acids Res. 1994 Nov 11;22(22):4673-80. PMID: 7984417

An algorithm for progressive multiple alignment of sequences with insertions. Löytynoja A, Goldman N. Proc Natl Acad Sci U S A. 2005 Jul 26;102(30):10557-62. PMID: 16000407


Applications of Phylogenetics

Common themes:
• Relatedness
• Timing historical events

NOT a common theme: "Are my sequences related to each other?"


Rooted Phylogenies

Terminology and Concepts


Alternative Tree-Related Terminologies

leaf / tip / terminal node / external node

branch / arc / edge


Trees: Branches and Nodes

Trees consist of:

branches

nodes (ends of branches)


Internal/External Nodes/Branches

Branches and Nodes are either:

external/terminal
• Node: associated with an extant sequence/OTU (operational taxonomic unit)
• Branch: links an external and an internal node

internal/interior
• Node: at the intersection of two or more branches
• Branch: links two internal nodes
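These definitions can be applied mechanically once a tree is stored in memory. The sketch below assumes a rooted tree held as a child -> parent dictionary (a hypothetical representation chosen for brevity, not one used on the slides) and uses a made-up tree ((a,b),c) with internal nodes r (the root) and p.

```python
# Hypothetical rooted tree ((a,b),c) stored as a child -> parent mapping.
# r is the root; p is the internal node joining a and b.
PARENT = {"p": "r", "c": "r", "a": "p", "b": "p"}


def all_nodes(parent):
    """Every node appears either as a child or as a parent."""
    return set(parent) | set(parent.values())


def external_nodes(parent):
    """External (terminal) nodes are never a parent: these are the OTUs."""
    return all_nodes(parent) - set(parent.values())


def internal_nodes(parent):
    """Internal nodes have at least one daughter branch (HTUs and the root)."""
    return set(parent.values())


def classify_branches(parent):
    """Label each branch (parent, child) as external or internal.

    An external branch ends in an external node; an internal branch
    links two internal nodes (the parent end is always internal)."""
    leaves = external_nodes(parent)
    return {
        (p, c): "external" if c in leaves else "internal"
        for c, p in parent.items()
    }


print(sorted(external_nodes(PARENT)))  # ['a', 'b', 'c']
print(sorted(internal_nodes(PARENT)))  # ['p', 'r']
print(classify_branches(PARENT))
```

In this tree only the branch r-p links two internal nodes, so it is the single internal branch.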


Branches
• represent successive generations of "taxa"
• "later" taxa have "earlier" taxa as their ancestors
• i.e. a lineage
• time flows from the base of the tree to the tips


Internal Nodes
• represent hypothetical ancestral taxa/sequences/organisms, i.e. HTUs (hypothetical taxonomic units)

Root (Root Node)
• A "special" internal node
• The most recent common ancestor of all OTUs
• Usually implies many other, less recent, common ancestors


Parent/Daughter Branches

Parental/ancestral lineages/branches diverge into multiple daughter lineages/branches


Polytomies

Internal nodes with two daughter branches are bifurcations

Internal nodes associated with more than two daughter branches are polytomies

How many bifurcations on the tree? (a) 4 (b) 5 (c) 6
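Classifying internal nodes only requires counting their daughter branches. A minimal sketch, assuming a rooted tree stored as a child -> parent dictionary; the tree here is a made-up example, not the one in the quiz figure.

```python
from collections import Counter

# Hypothetical rooted tree: root r has three daughters (p, c, d),
# and p has two daughters (a, b).
PARENT = {"p": "r", "c": "r", "d": "r", "a": "p", "b": "p"}


def daughter_counts(parent):
    """Number of daughter branches leaving each internal node."""
    return Counter(parent.values())


def bifurcations(parent):
    """Internal nodes with exactly two daughter branches."""
    return sorted(n for n, k in daughter_counts(parent).items() if k == 2)


def polytomies(parent):
    """Internal nodes with more than two daughter branches."""
    return sorted(n for n, k in daughter_counts(parent).items() if k > 2)


print(bifurcations(PARENT))  # ['p']
print(polytomies(PARENT))    # ['r']
```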


Interpreting Polytomies

(Figure: a polytomy joining OTUs A, B and C, shown under each interpretation)

Soft Polytomy
• Lineages only bifurcate, but the internal lineages are so short that no identifiable change/evolution occurred along them
• Thus the true pattern of lineage divergence cannot be resolved

Hard Polytomy
• The ancestral lineage diverged into 3+ lineages simultaneously

NB: Some software only accepts bifurcating trees
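When software requires bifurcating trees, one common workaround (an assumption here, not something the slide prescribes) is to resolve a polytomy arbitrarily into bifurcations joined by zero-length internal branches, which matches the soft-polytomy reading; R's ape package provides multi2di for this. The sketch below writes the result as a Newick string; the function name is hypothetical.

```python
def resolve_polytomy(children):
    """Resolve a polytomy whose daughter subtrees are given as Newick
    strings into nested bifurcations, returning a Newick tree.

    Each new internal node gets a zero-length branch (":0"). The left-to-
    right order used here is arbitrary: any other resolution is equally
    valid, because a zero-length branch implies no identifiable change."""
    if len(children) < 2:
        raise ValueError("a divergence needs at least two daughter lineages")
    resolved = children[0]
    for child in children[1:-1]:
        # join the next lineage via a new node on a zero-length branch
        resolved = f"({resolved},{child}):0"
    return f"({resolved},{children[-1]});"


print(resolve_polytomy(["A", "B", "C"]))       # ((A,B):0,C);
print(resolve_polytomy(["A", "B", "C", "D"]))  # (((A,B):0,C):0,D);
```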


Relatedness

(Figure: rooted tree with OTUs a to i and internal nodes t, u, v, w, x, y and z; time runs from the root, z, towards the tips)

We can list the set of branches in the complete lineage of each OTU:

OTU | Branches
a | zy yt tu uv va
b | zy yt tu uv vb
c | zy yt tu uc
d | zy yt td
e | zy ye
f | zf
g | zx xw wg
h | zx xw wh
i | zx xi

a and c are more closely related to each other than either is to e, because their complete lineages share (at least one) more recent common ancestor(s) [t and u] and branch(es) [yt and tu] that neither shares with e.
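The lineage table can be generated from the tree rather than written by hand. This sketch assumes the tree is stored as a child -> parent dictionary transcribed from the lineage table (f attached directly to the root z); the helper names are hypothetical.

```python
# Child -> parent mapping for the example tree (root z).
PARENT = {
    "y": "z", "f": "z", "x": "z",
    "t": "y", "e": "y",
    "u": "t", "d": "t",
    "v": "u", "c": "u",
    "a": "v", "b": "v",
    "w": "x", "i": "x",
    "g": "w", "h": "w",
}


def lineage(otu, parent):
    """Branches on the path from the root down to `otu`, e.g. 'zy', 'yt'."""
    path = [otu]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    path.reverse()  # root first
    return [up + down for up, down in zip(path, path[1:])]


def shared_not_with(x, y, outsider, parent):
    """Branches in the lineages of both x and y that outsider's lineage lacks."""
    common = set(lineage(x, parent)) & set(lineage(y, parent))
    return sorted(common - set(lineage(outsider, parent)))


for otu in "abcdefghi":
    print(otu, " ".join(lineage(otu, PARENT)))

print(shared_not_with("a", "c", "e", PARENT))  # ['tu', 'yt']
```

A non-empty result from shared_not_with is exactly the evidence used above: a and c share branches yt and tu that e lacks.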


Relatedness Statements: 'X is more closely related to Y than to Z'


• a is more closely related to b than either is to g
• b, d and e are more closely related to each other than they are to f

A set of OTUs are more closely related to each other than any are to other OTUs if they share more recent common ancestor(s)/lineage(s) not shared with other OTUs
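The rule can be phrased via the most recent common ancestor (MRCA): the group shares an ancestor not shared with an outsider exactly when the group's MRCA is not an ancestor of that outsider. A sketch, assuming the example tree stored as a child -> parent dictionary (f attached directly to the root z); the function names are hypothetical.

```python
# Child -> parent mapping for the example tree (root z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x",
    "g": "w", "h": "w",
}


def ancestors(node, parent):
    """`node` followed by each of its ancestors, ending at the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mrca(otus, parent):
    """Most recent common ancestor of a list of OTUs."""
    shared = set(ancestors(otus[0], parent))
    for otu in otus[1:]:
        shared &= set(ancestors(otu, parent))
    # walking up from any one OTU, the first shared node is the most recent
    return next(n for n in ancestors(otus[0], parent) if n in shared)


def more_closely_related(group, others, parent):
    """True if the OTUs in `group` share a common ancestor that is not an
    ancestor of any OTU in `others`."""
    m = mrca(group, parent)
    return all(m not in ancestors(o, parent) for o in others)


print(more_closely_related(["a", "b"], ["g"], PARENT))       # True
print(more_closely_related(["b", "d", "e"], ["f"], PARENT))  # True
print(more_closely_related(["a", "e"], ["c"], PARENT))       # False
```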


Relatedness Statements: 'X is equally distantly related to Y and Z'

• g is equally distantly related to f and to d
• i is equally distantly related to f, e, and a


An OTU is equally distantly related to two (or more) other OTUs if it shares the same most recent common ancestor with these two (or more) OTUs
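Equal distance reduces to comparing MRCAs: X is equally distantly related to every OTU in a list when all the pairwise MRCAs coincide. A sketch, assuming the example tree as a child -> parent dictionary (f attached directly to z); helper names are hypothetical.

```python
# Child -> parent mapping for the example tree (root z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x",
    "g": "w", "h": "w",
}


def ancestors(node, parent):
    """`node` followed by each of its ancestors, ending at the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mrca(otus, parent):
    """Most recent common ancestor of a list of OTUs."""
    shared = set(ancestors(otus[0], parent))
    for otu in otus[1:]:
        shared &= set(ancestors(otu, parent))
    return next(n for n in ancestors(otus[0], parent) if n in shared)


def equally_distant(x, ys, parent):
    """True if x shares the same most recent common ancestor with every y."""
    return len({mrca([x, y], parent) for y in ys}) == 1


print(equally_distant("g", ["f", "d"], PARENT))       # True (MRCA is z)
print(equally_distant("i", ["f", "e", "a"], PARENT))  # True (MRCA is z)
print(equally_distant("a", ["b", "e"], PARENT))       # False (v vs y)
```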


Relatedness Statements: 'X is most closely related to Y'


• g is most closely related to h
• a, b, and c are most closely related to d

An OTU (or a set of OTUs) is most closely related to another OTU (or set of OTUs) if they share one (or more) most recent common ancestor(s)/lineage(s) not shared by any other OTU (or group of OTUs)
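One way to make this rule testable (an interpretation assumed here) is via clades: a set is a clade when it contains every OTU descended from its MRCA. X is then most closely related to Y when X and Y are each clades and together they form a clade, so no other OTU shares their common ancestor. The sketch assumes the example tree as a child -> parent dictionary; helper names are hypothetical.

```python
# Child -> parent mapping for the example tree (root z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x",
    "g": "w", "h": "w",
}


def ancestors(node, parent):
    """`node` followed by each of its ancestors, ending at the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mrca(otus, parent):
    """Most recent common ancestor of a list of OTUs."""
    shared = set(ancestors(otus[0], parent))
    for otu in otus[1:]:
        shared &= set(ancestors(otu, parent))
    return next(n for n in ancestors(otus[0], parent) if n in shared)


def leaves_below(node, parent):
    """All OTUs (external nodes) descended from `node` (or `node` itself)."""
    internal = set(parent.values())
    return {n for n in parent if n not in internal and node in ancestors(n, parent)}


def is_clade(group, parent):
    return leaves_below(mrca(sorted(group), parent), parent) == set(group)


def most_closely_related(xs, ys, parent):
    """True if xs and ys are disjoint clades whose union is also a clade."""
    return (set(xs).isdisjoint(ys)
            and is_clade(xs, parent) and is_clade(ys, parent)
            and is_clade(set(xs) | set(ys), parent))


print(most_closely_related(["g"], ["h"], PARENT))            # True
print(most_closely_related(["a", "b", "c"], ["d"], PARENT))  # True
print(most_closely_related(["a"], ["c"], PARENT))            # False (b shares u)
```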


Relatedness Statements: 'X is the sister group of Y'


• d is the sister group of the group of taxa [a, b, c]
• the group of taxa [g, h] and the taxon i are sister groups

OTUs (or groups of OTUs) most closely related to each other are sometimes referred to as sister groups
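Given a clade, its sister group(s) are the OTUs descended from the other daughters of the clade's parent node. A sketch, assuming the example tree as a child -> parent dictionary (f attached directly to the root z); function names are hypothetical. At a polytomy more than one sister lineage is returned.

```python
# Child -> parent mapping for the example tree (root z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x",
    "g": "w", "h": "w",
}


def ancestors(node, parent):
    """`node` followed by each of its ancestors, ending at the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mrca(otus, parent):
    """Most recent common ancestor of a list of OTUs."""
    shared = set(ancestors(otus[0], parent))
    for otu in otus[1:]:
        shared &= set(ancestors(otu, parent))
    return next(n for n in ancestors(otus[0], parent) if n in shared)


def leaves_below(node, parent):
    """All OTUs descended from `node` (or `node` itself if it is an OTU)."""
    internal = set(parent.values())
    return {n for n in parent if n not in internal and node in ancestors(n, parent)}


def sister_groups(group, parent):
    """OTU sets of the lineages diverging from the same node as `group`.

    Raises ValueError if `group` is not a clade; a clade whose MRCA is the
    root has no sister and raises KeyError."""
    node = mrca(sorted(group), parent)
    if leaves_below(node, parent) != set(group):
        raise ValueError(f"{sorted(group)} is not a clade in this tree")
    p = parent[node]
    return [leaves_below(sib, parent)
            for sib, par in parent.items() if par == p and sib != node]


print(sister_groups({"d"}, PARENT))       # [{'a', 'b', 'c'}] (set order may vary)
print(sister_groups({"g", "h"}, PARENT))  # [{'i'}]
```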



Relatedness

c is most closely related to...?

1. a
2. d
3. d and b
4. a and b
5. b


Tree Topology

Trees “rotated” around internal branches have the same topology

For rooted trees, different topologies describe different patterns of relatedness of OTUs

(Figure: trees with identical topologies)

The branch intersections (i.e. internal nodes) of a tree specify its topology

Node | Branches
z | yz za zb
y | xy yw yz
w | yw wc wd
x | xy xe
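Because rotation only changes the order in which daughters are drawn, a rooted topology can be given a canonical form by sorting the daughters at every internal node. The sketch below assumes rooted trees written as nested tuples of leaf names (a hypothetical representation) and compares three drawings of a five-taxon tree.

```python
def canonical(tree):
    """Canonical string for a rooted tree given as nested tuples of leaf names.

    Sorting the daughters at every internal node removes the effect of
    rotating around internal branches, so two drawings share a topology
    exactly when their canonical strings are equal."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


drawing_1 = ((("a", "b"), ("c", "d")), "e")
drawing_2 = ("e", (("d", "c"), ("b", "a")))  # drawing_1 rotated at every node
drawing_3 = ((("a", "c"), ("b", "d")), "e")  # a genuinely different topology

print(canonical(drawing_1))  # (((a,b),(c,d)),e)
print(canonical(drawing_1) == canonical(drawing_2))  # True
print(canonical(drawing_1) == canonical(drawing_3))  # False
```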


Tree Representations

Most rooted tree figures use a “rectangular” rather than a “diagonal” representation

(Figure: the same tree drawn in "diagonal" and "rectangular" representations)

Rectangular trees represent internal nodes with lines perpendicular to lines representing the branches


Unrooted Phylogenies


Unrooted Trees

There's no root on the tree

Thus, there is no statement about the DIRECTION of time (i.e. the direction of divergence of lineages in the tree)

Thus, we can't distinguish daughter from parent branches

Many applications of phylogenies require a rooted tree, but many tree estimation tools yield only unrooted trees!


There are multiple rooted trees possible from a given unrooted tree

The number of possible rooted tree topologies is the number of branches on the unrooted tree (assuming a bifurcating root)

(Figure: an unrooted tree and the rooted trees that can be derived from it)
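The count can be checked by placing a bifurcating root on each branch in turn and comparing the resulting rooted topologies. A sketch, assuming a hypothetical unrooted four-taxon tree ((a,b),(c,d)) stored as an edge list with internal nodes p and q. For n OTUs an unrooted bifurcating tree has 2n - 3 branches, so 4 OTUs give 5 rooted trees.

```python
# Hypothetical unrooted bifurcating tree ((a,b),(c,d)); p and q are internal.
EDGES = [("a", "p"), ("b", "p"), ("p", "q"), ("q", "c"), ("q", "d")]


def neighbours(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def subtree(node, came_from, adj):
    """Canonical string for the part of the tree hanging off `node`,
    looking away from `came_from`; daughters are sorted so that
    rotated drawings give the same string."""
    kids = [n for n in adj[node] if n != came_from]
    if not kids:
        return node  # a leaf (OTU)
    return "(" + ",".join(sorted(subtree(k, node, adj) for k in kids)) + ")"


def rootings_on_branches(edges):
    """One rooted topology per branch: place a bifurcating root on it."""
    adj = neighbours(edges)
    return {
        "(" + ",".join(sorted([subtree(u, v, adj), subtree(v, u, adj)])) + ")"
        for u, v in edges
    }


trees = rootings_on_branches(EDGES)
print(len(trees), len(EDGES))  # 5 5
for t in sorted(trees):
    print(t)
```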


There are multiple rooted trees possible from a given unrooted tree

If you allow a polytomy at the root, there is one additional rooted tree possible for each internal node in the unrooted tree

(Figure: an unrooted tree with OTUs a, b, c and d, and rooted trees derived from it)
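Extending the count to multifurcating roots only requires also hanging the tree from each internal node. A sketch, assuming the same hypothetical unrooted tree ((a,b),(c,d)) as an edge list with internal nodes p and q: 5 branch rootings plus 2 node rootings give 7 rooted trees.

```python
# Hypothetical unrooted bifurcating tree ((a,b),(c,d)); p and q are internal.
EDGES = [("a", "p"), ("b", "p"), ("p", "q"), ("q", "c"), ("q", "d")]


def neighbours(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def subtree(node, came_from, adj):
    """Canonical string for the tree hanging off `node`, away from
    `came_from` (pass None to include every neighbour)."""
    kids = [n for n in adj[node] if n != came_from]
    if not kids:
        return node
    return "(" + ",".join(sorted(subtree(k, node, adj) for k in kids)) + ")"


def rooted_topologies(edges, allow_root_polytomy=False):
    """All rooted topologies: a bifurcating root on each branch and,
    optionally, a polytomous root at each internal node."""
    adj = neighbours(edges)
    trees = {
        "(" + ",".join(sorted([subtree(u, v, adj), subtree(v, u, adj)])) + ")"
        for u, v in edges
    }
    if allow_root_polytomy:
        for node, nbs in adj.items():
            if len(nbs) > 1:  # internal node becomes a multifurcating root
                trees.add(subtree(node, None, adj))
    return trees


print(len(rooted_topologies(EDGES)))                            # 5
print(len(rooted_topologies(EDGES, allow_root_polytomy=True)))  # 7
```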


Quiz

Which sequence is d most closely related to?

a

b

c

None of them; it depends on where the root is!

However, if d is most closely related to a single OTU and not to a group, then that OTU must be c (see previous slide), i.e. d is certainly NOT most closely related to a or to b.
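The answer can be confirmed by brute force: root the unrooted tree in every possible way and record d's closest relatives each time. A sketch, assuming the quiz tree is the unrooted tree ((a,b),(c,d)) from the previous slide, stored as an edge list with internal nodes p and q; "R" is a made-up name for an added root node.

```python
# Assumed quiz tree: unrooted ((a,b),(c,d)) with internal nodes p and q.
EDGES = [("a", "p"), ("b", "p"), ("p", "q"), ("q", "c"), ("q", "d")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def orient(root, adj):
    """Child -> parent map obtained by hanging the tree from `root`."""
    parent, stack, seen = {}, [root], {root}
    while stack:
        node = stack.pop()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                parent[nb] = node
                stack.append(nb)
    return parent


def all_rootings(edges):
    """Yield a child -> parent map for every possible rooting."""
    adj = adjacency(edges)
    for u, v in edges:  # bifurcating root "R" inserted on each branch
        split = {n: list(nbs) for n, nbs in adj.items()}
        split[u].remove(v)
        split[v].remove(u)
        split[u].append("R")
        split[v].append("R")
        split["R"] = [u, v]
        yield orient("R", split)
    for node, nbs in adj.items():  # polytomous root at each internal node
        if len(nbs) > 1:
            yield orient(node, adj)


def closest_relatives(otu, parent):
    """OTUs descended from the parent node of `otu`, excluding `otu`."""
    internal = set(parent.values())
    p = parent[otu]

    def descends_from(n, anc):
        while n in parent:
            n = parent[n]
            if n == anc:
                return True
        return False

    return frozenset(n for n in parent
                     if n not in internal and n != otu and descends_from(n, p))


results = {closest_relatives("d", pm) for pm in all_rootings(EDGES)}
print(sorted(sorted(r) for r in results))  # [['a', 'b'], ['a', 'b', 'c'], ['c']]
```

Across all rootings, the only single OTU that ever turns up as d's closest relative is c.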


Visualising Trees

Demonstration and Exercise

Viewing and manipulating unscaled trees with NJplot:
• Rotating around internal branches
• Re-rooting