CS 581 Paper Presentation Muhammad Samir Khan Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis by Sebastien Roch and Sagi Snir


CS 581 Paper Presentation

Muhammad Samir Khan

Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis

by

Sebastien Roch and Sagi Snir


Overview

• Introduction (what is LGT?)

• Notation

• Model

  • Bounded Rates Model

  • Yule Process

• Quartet Based Approach

  • Bounded Rates Model

  • Yule Process

  • Preferential LGT

• Further Results


What is LGT?

• Non-vertical transfer of genes

• Overall evolution is tree-like

• Particularly common in bacteria

• Primary reason for the spread of antibiotic resistance [1]

1. https://en.wikipedia.org/wiki/Horizontal_gene_transfer

2. http://www.nature.com/nrmicro/journal/v3/n9/images/nrmicro1253-f1.gif


Species Phylogeny

• 𝑇𝑠 = (𝑉𝑠, 𝐸𝑠, 𝐿𝑠: 𝑟, 𝜏)

• 𝑉𝑠 vertices

• 𝐸𝑠 edges

• 𝐿𝑠 leaves

• 𝑟 root

• 𝜏(𝑒) interspeciation times

• Number of leaves 𝑛 = 𝑛+ + 𝑛−

• 𝑛+ > 0 extant species

• 𝑛− ≥ 0 extinct species

[Figure: species phylogeny with root 𝑟, extinct and extant leaves, and an edge labeled 𝜏(𝑒)]


Extant Phylogeny

• Denoted $T_s^+ = (V_s^+, E_s^+, L_s^+ : r^+, \tau^+)$

  • Restrict to extant leaves: $T_s|_{L_s^+}$

  • Suppress vertices of degree 2 (add up the branch lengths)

  • Root at the most recent common ancestor of $L_s^+$

• $T_s^+$ is ultrametric

• Want to recover the extant phylogeny

[Figure: extant phylogeny with root 𝑟, drawn against a time axis]
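The restriction step can be sketched in a few lines of Python. This is only an illustration under assumed conventions, not code from the paper: a node is a hypothetical tuple `(name, length_to_parent, children)`, and the function names `restrict` and `extant_phylogeny` are made up for this sketch.

```python
def restrict(node, keep):
    """Restrict a rooted tree to the leaves in `keep`, suppressing degree-2 vertices.

    A node is (name, length_to_parent, children); this layout is an assumption
    of the sketch. Returns None if no kept leaf lies below `node`.
    """
    name, length, children = node
    if not children:                        # leaf: keep it only if it is extant
        return node if name in keep else None
    kept = [c for c in (restrict(ch, keep) for ch in children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:                      # degree-2 vertex: add up the branch lengths
        child_name, child_length, grandchildren = kept[0]
        return (child_name, length + child_length, grandchildren)
    return (name, length, kept)


def extant_phylogeny(root, extant):
    """Restrict to the extant leaves and root at their most recent common ancestor.

    Assumes at least one extant leaf, as in the slides (n+ > 0).
    """
    name, _, children = restrict(root, extant)
    return (name, 0.0, children)            # the new root has no parent edge
```

For example, with extant leaves `a` and `b` and extinct leaves `x` and `y`, the tree `("r", 0.0, [("v", 1.0, [("a", 2.0, []), ("u", 1.0, [("b", 1.0, []), ("x", 0.5, [])])]), ("y", 1.5, [])])` reduces to a cherry rooted at `v` in which both `a` and `b` sit at distance 2.0 from the root.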


Gene Trees

• 𝑇𝑔 = (𝑉𝑔, 𝐸𝑔, 𝐿𝑔: 𝜔𝑔) for a gene 𝑔 is an unrooted tree

  • 𝑉𝑔 vertices

  • 𝐸𝑔 edges

  • 𝐿𝑔 leaves, a subset of 𝐿𝑠

  • 𝜔𝑔(𝑒) branch lengths (expected number of substitutions)

• Each vertex of degree 2 or 3

• 𝒯𝑔 = 𝒯[𝑇𝑔] is the topology of 𝑇𝑔 with degree 2 vertices suppressed

• Not ultrametric


LGT Transfer – Subtree Prune and Regraft

• LGT events take place at locations along the edges

• Recipient location: where the subtree is pruned

• Donor location: where the pruned subtree is regrafted

• A new node is created at the donor location

1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.


Contemporaneous Locations

• Two locations $x, y$ are contemporaneous if their $\tau$-distance to the root is identical:

$$\tau(x, r) = \tau(y, r)$$

• For $R > 0$, $C_x^{(R)}$ is the set of locations contemporaneous to $x$ and with MRCA at $\tau$-distance at most $R$ from $x$:

$$C_x^{(R)} = \{\, y : \tau(x, r) = \tau(y, r),\ \tau(x, y) \le 2R \,\}$$
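The factor 2 in the last condition follows from contemporaneity. A short derivation, writing $m$ for the MRCA of $x$ and $y$ (the symbol $m$ is introduced here only for illustration):

```latex
\begin{align*}
\tau(x, y) &= \tau(x, m) + \tau(m, y), \\
\tau(x, m) &= \tau(x, r) - \tau(m, r) = \tau(y, r) - \tau(m, r) = \tau(m, y), \\
\text{so}\quad \tau(x, y) &= 2\,\tau(x, m) \le 2R
  \iff \tau(x, m) \le R .
\end{align*}
```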


Random LGT

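• Species phylogeny fixed: $T_s = (V_s, E_s, L_s : r, \tau)$

• $0 < R \le \infty$ (possibly depending on $n$)

• Each edge has a rate of LGT $\lambda(e)$: $0 < \lambda(e) < +\infty$

  • $\Lambda(e) = \lambda(e)\,\tau(e)$

  • $\Lambda_{tot} = \sum_{e \in E_s} \Lambda(e)$

  • $\Lambda = \sum_{e \in E(T_s|_{L_s^+})} \Lambda(e)$

• Taxon sampling probability $p$: $0 < p \le 1$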
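As a small worked example of the two sums, the sketch below computes the total LGT weight over all edges and the weight over the edges of the tree restricted to extant leaves. The edge representation (a rate, a length, and a flag saying whether the edge belongs to the restricted tree) and the name `lgt_weights` are assumptions made for illustration.

```python
def lgt_weights(edges):
    """Return (total weight, weight on the extant restriction).

    `edges` is a list of (rate, length, in_extant_restriction) triples; this
    layout is a hypothetical choice for the sketch. Each edge contributes
    rate * length, i.e. its LGT rate times its interspeciation time.
    """
    total = sum(rate * length for rate, length, _ in edges)
    extant = sum(rate * length for rate, length, in_extant in edges if in_extant)
    return total, extant
```

Since recipient locations on an edge are drawn from a Poisson process with the edge's rate, each edge's product of rate and length is the expected number of LGT events on that edge.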


Random LGT

• LGT locations:

  • Start from root (chronologically)

  • Along each edge $e \in E_s$, select a recipient location according to a continuous-time Poisson process with rate $\lambda(e)$

  • If $x$ is selected as a recipient location, the donor location is selected uniformly at random from $C_x^{(R)}$

• Keep each extant leaf independently with probability 𝑝, to get 𝐿𝑔

• Gene tree 𝑇𝑔 is obtained by keeping the subtree restricted to 𝐿𝑔
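Two of these steps are easy to sketch in Python: drawing recipient locations along one edge, and sampling the extant leaves. This is an illustrative sketch, not the authors' code. The function names are hypothetical, locations are measured as distances from the parent end of the edge (an assumption), and the donor-selection step is left out because it needs the full tree.

```python
import random


def recipient_locations(rate, length, rng):
    """Points of a Poisson process with the given rate on an edge of the given length.

    Uses the fact that gaps between consecutive points are exponential with
    that rate. Locations are distances from the parent end of the edge.
    """
    points = []
    position = rng.expovariate(rate)
    while position < length:
        points.append(position)
        position += rng.expovariate(rate)
    return points


def sample_leaves(extant_leaves, p, rng):
    """Keep each extant leaf independently with probability p."""
    return [leaf for leaf in extant_leaves if rng.random() < p]
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps a simulation reproducible.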


Bounded Rates Model

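• Constants:

  • $\rho_\lambda$: $0 < \rho_\lambda < 1$

  • $\rho_\tau$: $0 < \rho_\tau < 1$

  • $\bar\tau$: $0 < \bar\tau < +\infty$

• $\bar\lambda$ (possibly depending on $n^+$): $0 < \bar\lambda < +\infty$

  • Used to control the amount of LGT

• Under the bounded rates model:

$$\rho_\lambda \bar\lambda \le \lambda(e) \le \bar\lambda \quad \forall e \in E_s$$

$$\rho_\tau \bar\tau \le \tau^+(e^+) \le \bar\tau \quad \forall e^+ \in E_s^+$$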


Yule Process

• Branching process that starts with two species

• Each species generates a new offspring at rate $\nu$: $0 < \nu < +\infty$

  • No extinct species

• Stop when number of species = $n + 1$ (ignore the last species)

• $\rho_\lambda \bar\lambda \le \lambda(e) \le \bar\lambda$ for every edge $e \in E_s$

  • $\rho_\lambda$ constant: $0 < \rho_\lambda < 1$

  • $\bar\lambda$ possibly depending on $n$: $0 < \bar\lambda < +\infty$
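The timing of the branching process can be sketched as follows. With k species alive, each speciating independently at rate ν, the waiting time to the next speciation is exponential with rate kν. The sketch records only the speciation times, not which lineage splits; the function name and this simplification are assumptions made for illustration.

```python
import random


def yule_speciation_times(n, nu, rng):
    """Speciation times of a Yule process stopped when n + 1 species exist.

    Starts with two species at time 0. The last returned time is the stopping
    time, at which the (n + 1)-th species would appear and is ignored.
    """
    times = []
    now = 0.0
    for k in range(2, n + 1):               # k species alive before the next split
        now += rng.expovariate(k * nu)      # minimum of k Exp(nu) clocks is Exp(k * nu)
        times.append(now)
    return times
```

For n species this yields n - 1 times: one for each step from 2 to 3 species, up to the step from n to n + 1.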


Quartet Based Approach

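• Input: gene trees $T_{g_1}, \dots, T_{g_N}$

• Output: estimated extant species phylogeny $T$

• Let $X = \{a, b, c, d\}$ be a four-tuple of extant species

  • Three possible quartets:

    • $q_1 = ab|cd$

    • $q_2 = ac|bd$

    • $q_3 = ad|bc$

• Frequency of quartet:

$$f_X(q_i) = \frac{\left|\{\, g_j : X \subseteq L_{g_j},\ \mathcal{T}_{g_j}|_X = q_i \,\}\right|}{\left|\{\, g_j : X \subseteq L_{g_j} \,\}\right|}$$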
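The quartet frequency can be computed directly from its definition. The sketch below assumes, purely for illustration, that each gene tree topology is given as a nested tuple of leaf labels (any rooting of the unrooted tree works, since only its splits matter) and that a quartet is written as a pair of pairs. All function names are hypothetical.

```python
from fractions import Fraction


def leaves(tree):
    """Leaf set of a nested-tuple tree; a leaf is any non-tuple label."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Leaf sets below every internal node of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def induced_quartet(tree, X):
    """Quartet induced on the 4-set X, or None if X is not resolved by the tree."""
    X = frozenset(X)
    for clade in clades(tree):
        side = clade & X
        if len(side) == 2:                  # this split separates X two against two
            return frozenset([side, X - side])
    return None


def quartet_frequency(gene_trees, X, q):
    """Fraction of the gene trees containing all of X whose induced quartet is q."""
    X = frozenset(X)
    q = frozenset(frozenset(pair) for pair in q)
    containing = [t for t in gene_trees if X <= leaves(t)]
    if not containing:
        return None                         # no gene tree contains all four species
    matches = sum(1 for t in containing if induced_quartet(t, X) == q)
    return Fraction(matches, len(containing))
```

Gene trees that are missing any of the four species are excluded from both the numerator and the denominator, matching the condition that X is a subset of the gene tree's leaf set.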


Quartet Based Approach

1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.


Bounded Rates Model

1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.


Yule Process

1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.


Preferential LGT

1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.


Further Results

• Highways of LGT

  • The same model as before with additional “highways”

  • Highways are pairs of edges where LGT occurs deterministically

  • Highways can be different for different genes

  • Same result holds under the bounded rates model

    • Assuming no extinctions

    • Frequency of genes affected by highways is low

• Distance Based Approach under the GTR model

  • Compute the distance matrix by using the median of distances

  • Use any statistically consistent distance based method
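The median step of the distance-based approach can be sketched as below. The input format (one dictionary per gene, keyed by unordered species pairs) and the function name are assumptions for illustration. How the per-gene distances are estimated under GTR, and which distance method is applied afterwards, are outside this sketch.

```python
from statistics import median


def median_distance_matrix(gene_distances):
    """Entry-wise median of per-gene pairwise distances.

    `gene_distances` is a list of dicts mapping frozenset({a, b}) to the
    distance between species a and b estimated from one gene. A pair is
    summarized over the genes in which both species are present.
    """
    pooled = {}
    for distances in gene_distances:
        for pair, d in distances.items():
            pooled.setdefault(pair, []).append(d)
    return {pair: median(values) for pair, values in pooled.items()}
```

A median, unlike a mean, is not moved by a minority of genes whose distances are distorted, which is presumably why it is preferred here when only some genes are affected by LGT.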


Questions?