kGEM: An EM-based Algorithm for Local Reconstruction of Viral Quasispecies


Alexander Artyomenko

ICCABS 2013

Introduction

• Reconstructing the spectrum of a viral population

• Challenges:
  – Assembling short reads to span the entire genome
  – Distinguishing sequencing errors from mutations

• Avoid assembling:
  – Identify sequences via a highly variable region

Previous Work

• KEC (k-mer Error Correction) [Skums et al.]
  – Incorporates counts (frequencies) of k-mers (substrings of length k)

• QuasiRecomb (Quasispecies Recombination) [Töpfer et al.]
  – Hidden Markov Model-based approach
  – Accounts for the possibility of recombinant progeny
  – Parameter: k generators (ancestor haplotypes)

Problem Formulation

• Given: a set of reads R emitted by a set of unknown haplotypes H′

• Find: a set of haplotypes H = {H1, …, Hk} maximizing Pr(R|H)
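The slide does not spell out Pr(R|H). A common reading, assumed here rather than taken from the slides, treats reads as independent draws from a mixture of the haplotypes with frequencies f_i, so that log Pr(R|H) = Σ_j log Σ_i f_i·Pr(r_j|H_i). The sketch takes a precomputed table `emission[j][i]` = Pr(r_j|H_i); the function name `log_likelihood` is hypothetical.

```python
import math

def log_likelihood(emission, freqs):
    """Log Pr(R|H) under an assumed mixture model.

    emission[j][i] is Pr(r_j | H_i) and freqs[i] is the frequency of H_i.
    Reads are treated as independent, so the log-likelihood sums over reads.
    """
    total = 0.0
    for row in emission:
        # Probability of read j, marginalized over which haplotype emitted it.
        p_read = sum(f * p for f, p in zip(freqs, row))
        total += math.log(p_read)
    return total

# Two reads, two haplotypes with equal frequencies.
print(log_likelihood([[0.9, 0.1], [0.2, 0.8]], [0.5, 0.5]))
```

Working in log space keeps the product over many reads from underflowing.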

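Fractional Haplotype

Fractional haplotype: a string of 5-tuples of probabilities, one tuple per position, for each possible symbol a, c, t, g, d = '-' (deletion). In the example below, each column is one position and the header row shows the most probable symbol at that position.

      a     c     -     t     c     t     g     c
a  0.71  0.06  0.00  0.13  0.00  0.27  0.10  0.03
c  0.13  0.94  0.00  0.00  0.64  0.00  0.14  0.58
t  0.16  0.00  0.01  0.87  0.11  0.73  0.00  0.09
g  0.00  0.00  0.21  0.00  0.25  0.00  0.76  0.09
d  0.00  0.00  0.78  0.00  0.00  0.00  0.00  0.21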
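One way to hold a fractional haplotype in code is a list of per-position dictionaries mapping each symbol to its probability. The sketch below (names `fractional_haplotype` and `most_probable` are hypothetical) checks that each position sums to 1 and recovers the header row of the example above by taking the most probable symbol at each position.

```python
# Symbols in the order used on the slide; '-' stands for a deletion (d).
ALPHABET = "actg-"

def fractional_haplotype(columns):
    """Build a fractional haplotype from per-position probability 5-tuples.

    columns[m] lists the probabilities of a, c, t, g, '-' at position m.
    """
    hap = []
    for m, probs in enumerate(columns):
        if len(probs) != len(ALPHABET):
            raise ValueError(f"position {m}: expected 5 probabilities")
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError(f"position {m}: probabilities sum to {sum(probs)}")
        hap.append(dict(zip(ALPHABET, probs)))
    return hap

def most_probable(hap):
    """Integral string formed by the most probable symbol at each position."""
    return "".join(max(col, key=col.get) for col in hap)

# The slide's example, one (a, c, t, g, d) tuple per position.
example = fractional_haplotype([
    (0.71, 0.13, 0.16, 0.00, 0.00),
    (0.06, 0.94, 0.00, 0.00, 0.00),
    (0.00, 0.00, 0.01, 0.21, 0.78),
    (0.13, 0.00, 0.87, 0.00, 0.00),
    (0.00, 0.64, 0.11, 0.25, 0.00),
    (0.27, 0.00, 0.73, 0.00, 0.00),
    (0.10, 0.14, 0.00, 0.76, 0.00),
    (0.03, 0.58, 0.09, 0.09, 0.21),
])
print(most_probable(example))  # -> ac-tctgc
```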

Algorithm Overview

• Initialize (fractional) haplotypes

• Repeat until the haplotypes are unchanged:
  – Estimate Pr(r|Hi), the probability of read r being emitted by haplotype Hi
  – Estimate haplotype frequencies
  – Update and round haplotypes

• Collapse identical haplotypes and drop rare ones

• Output haplotypes

Initialization

• Find a set of reads representing the haplotype population:
  – Start with a random read
  – Each next read maximizes the minimum distance to the previously chosen reads
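The selection rule above is a farthest-first (max-min) traversal. A minimal sketch, assuming the reads are already aligned to equal length and that "distance" is Hamming distance (the slide does not name the metric); `select_seeds` and `hamming` are hypothetical names.

```python
import random

def hamming(a, b):
    """Mismatches between two aligned reads of equal length."""
    return sum(x != y for x, y in zip(a, b))

def select_seeds(reads, k, rng=random):
    """Choose up to k mutually distant reads by farthest-first traversal."""
    seeds = [rng.choice(reads)]  # start with a random read
    # Distance from each read to its closest seed chosen so far.
    min_dist = [hamming(r, seeds[0]) for r in reads]
    while len(seeds) < k:
        # Next seed: the read whose nearest seed is farthest away.
        j = max(range(len(reads)), key=min_dist.__getitem__)
        if min_dist[j] == 0:
            break  # every read already duplicates a seed
        seeds.append(reads[j])
        min_dist = [min(d, hamming(r, reads[j])) for d, r in zip(min_dist, reads)]
    return seeds

reads = ["acgt", "acga", "tttt", "acgt"]
print(select_seeds(reads, 2, random.Random(0)))
```

Keeping the running `min_dist` list means each new seed costs one pass over the reads instead of recomputing distances to every earlier seed.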

Initialization

Transform the selected reads into fractional haplotypes using the formula

$$h_m(x) = \begin{cases} 1 - 4\varepsilon, & x = s_m \\ \varepsilon, & x \neq s_m \end{cases}$$

where s_m is the m-th nucleotide of the selected read s and x ranges over a, c, t, g, d. Example for the read a c - t g - g a - c with ε = 0.01:

      a     c     -     t     g     -     g     a     -     c
a  0.96  0.01  0.01  0.01  0.01  0.01  0.01  0.96  0.01  0.01
c  0.01  0.96  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.96
t  0.01  0.01  0.01  0.96  0.01  0.01  0.01  0.01  0.01  0.01
g  0.01  0.01  0.01  0.01  0.96  0.01  0.96  0.01  0.01  0.01
d  0.01  0.01  0.96  0.01  0.01  0.96  0.01  0.01  0.96  0.01
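The transformation above can be sketched directly: the read's own symbol gets 1 − 4ε and the other four symbols get ε, so every position still sums to 1. The function name `read_to_fractional` and the dictionary-per-position layout are illustrative choices, not taken from the slides.

```python
ALPHABET = "actg-"  # '-' marks a deletion (d)

def read_to_fractional(read, eps=0.01):
    """Turn a selected read into a fractional haplotype.

    At each position the read's symbol gets 1 - 4*eps and every other
    symbol gets eps, so the five probabilities sum to 1.
    """
    return [
        {x: (1 - 4 * eps) if x == s else eps for x in ALPHABET}
        for s in read
    ]

hap = read_to_fractional("ac-tg-ga-c")
print(round(hap[0]["a"], 2), hap[0]["c"])  # -> 0.96 0.01
```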

Read Emission Probability

For each i = 1, …, k and for each read rj in R, compute the value Pr(rj|Hi).

[Figure: matrix of these values with reads and haplotypes as axes, entries h1,1, …]
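The slide omits the formula for this value. A minimal sketch, assuming positions are independent so that Pr(r|Hi) is the product, over the positions the read covers, of the haplotype's probability for the read's symbol. The function `log_emission` and the convention that a read is given with its start offset in the haplotype are hypothetical.

```python
import math

def log_emission(read, start, hap):
    """log Pr(r | H), assuming positions are independent.

    read: aligned read over "actg-", starting at position `start` of the
    haplotype; hap: list of dicts mapping symbol -> probability.
    """
    logp = 0.0
    for offset, symbol in enumerate(read):
        p = hap[start + offset][symbol]
        if p == 0.0:
            return float("-inf")  # this haplotype cannot emit the read
        logp += math.log(p)
    return logp

# A sharpened haplotype "act" with eps = 0.01.
hap = [dict.fromkeys("actg-", 0.01) for _ in range(3)]
for m, s in enumerate("act"):
    hap[m][s] = 0.96
print(math.exp(log_emission("ct", 1, hap)))  # about 0.9216 (= 0.96 * 0.96)
```

Summing logs instead of multiplying probabilities avoids underflow on long reads; exponentiate only when a plain probability is needed.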

Estimate Frequencies

Estimate haplotype frequencies via the Expectation Maximization (EM) method.

• Repeat two steps until the change is < σ:
  – E-step: expected portion of each read r emitted by Hi
  – M-step: updated frequency of haplotype Hi
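The E- and M-step formulas are not shown on the slide. The sketch below uses the standard EM update for mixture weights, which is assumed (not confirmed) to match: E-step p[j][i] = f_i·Pr(r_j|H_i) / Σ_l f_l·Pr(r_j|H_l); M-step f_i = average of p[j][i] over reads. The uniform starting point, `max_iter`, and the name `estimate_frequencies` are illustrative.

```python
def estimate_frequencies(emission, sigma=1e-6, max_iter=1000):
    """EM for haplotype frequencies given emission[j][i] = Pr(r_j | H_i).

    Returns (freqs, portions), where portions[j][i] is the expected portion
    of read j emitted by haplotype i.
    """
    k = len(emission[0])
    freqs = [1.0 / k] * k  # uniform starting point (an assumption)
    portions = []
    for _ in range(max_iter):
        # E-step: expected portion of each read emitted by each haplotype.
        portions = []
        for row in emission:
            weighted = [f * p for f, p in zip(freqs, row)]
            total = sum(weighted)
            # Reads that no haplotype can emit contribute nothing.
            portions.append([w / total for w in weighted] if total > 0 else [0.0] * k)
        n_used = sum(1 for p in portions if sum(p) > 0)
        if n_used == 0:
            raise ValueError("no read can be emitted by any haplotype")
        # M-step: new frequency of H_i is its average portion over the reads.
        new_freqs = [sum(p[i] for p in portions) / n_used for i in range(k)]
        change = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < sigma:  # stop once the frequencies barely move
            break
    return freqs, portions

freqs, portions = estimate_frequencies([[0.9, 0.1]] * 3 + [[0.1, 0.9]], sigma=1e-12)
print(freqs)
```

For this toy input the likelihood is maximized at f_1 = 0.8125, which the loop approaches geometrically.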

Update Haplotypes

• Update allele frequencies for each haplotype according to each read's contribution:

      1     2     3     4     5     6     7     8
a  0.71  0.06  0.00  0.13  0.00  0.27  0.10  0.03
c  0.13  0.94  0.00  0.00  0.64  0.00  0.14  0.58
t  0.16  0.00  0.01  0.87  0.11  0.73  0.00  0.09
g  0.00  0.00  0.21  0.00  0.25  0.00  0.76  0.09
d  0.00  0.00  0.78  0.00  0.00  0.00  0.00  0.21

• Round each position of each haplotype to its most probable allele:

      1     2     3     4     5     6     7     8     9
a  0.76  0.00  0.01  0.06  0.77  0.00  0.29  0.14  0.09
c  0.11  0.89  0.01  0.01  0.23  0.68  0.00  0.06  0.50
t  0.13  0.00  0.11  0.93  0.00  0.14  0.71  0.00  0.04
g  0.01  0.00  0.21  0.00  0.00  0.18  0.00  0.80  0.23
d  0.01  0.11  0.68  0.00  0.00  0.00  0.00  0.00  0.14

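A sketch of the update step, assuming "according to read's contribution" means that each read votes for its own symbols with weight equal to its E-step portion for that haplotype, after which each position is renormalized. Keeping the old probabilities at positions no read covers is a further assumption; `update_haplotype` is a hypothetical name.

```python
ALPHABET = "actg-"  # '-' marks a deletion (d)

def update_haplotype(hap, reads, portions_i):
    """Re-estimate one fractional haplotype from the reads' contributions.

    reads: list of (start, aligned_read) pairs; portions_i[j] is the
    expected portion of read j emitted by this haplotype (E-step output).
    Positions that no read covers keep their previous probabilities.
    """
    counts = [dict.fromkeys(ALPHABET, 0.0) for _ in hap]
    for (start, read), w in zip(reads, portions_i):
        for offset, symbol in enumerate(read):
            counts[start + offset][symbol] += w  # weighted vote for this symbol
    updated = []
    for old, col in zip(hap, counts):
        total = sum(col.values())
        updated.append({x: c / total for x, c in col.items()} if total > 0 else dict(old))
    return updated

hap = [dict.fromkeys(ALPHABET, 0.2) for _ in range(3)]
reads = [(0, "ac"), (0, "tc"), (1, "g")]
new = update_haplotype(hap, reads, [0.75, 0.25, 1.0])
print(new[0]["a"], new[0]["t"])  # -> 0.75 0.25
```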

Round Haplotypes

      a     c     -     t     a     c     t     g     c
a  0.96  0.01  0.01  0.01  0.96  0.01  0.01  0.01  0.01
c  0.01  0.96  0.01  0.01  0.01  0.96  0.01  0.01  0.96
t  0.01  0.01  0.01  0.96  0.01  0.01  0.96  0.01  0.01
g  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.96  0.01
d  0.01  0.01  0.96  0.01  0.01  0.01  0.01  0.01  0.01
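Rounding, as sketched below, picks the most probable symbol at each position and then re-sharpens the haplotype to 1 − 4ε for the chosen symbol and ε elsewhere, which reproduces the 0.96/0.01 pattern of the rounded table. Reusing the initialization transform for the re-sharpening is an assumption inferred from those values; `round_haplotype` is a hypothetical name.

```python
ALPHABET = "actg-"  # '-' marks a deletion (d)

def round_haplotype(hap, eps=0.01):
    """Round each position to its most probable symbol.

    Returns the integral string and a re-sharpened fractional haplotype
    (1 - 4*eps for the chosen symbol, eps for every other symbol).
    """
    string = "".join(max(col, key=col.get) for col in hap)
    sharpened = [{x: (1 - 4 * eps) if x == s else eps for x in ALPHABET} for s in string]
    return string, sharpened

# Updated haplotype from the slide, one (a, c, t, g, d) tuple per position.
columns = [
    (0.76, 0.11, 0.13, 0.01, 0.01), (0.00, 0.89, 0.00, 0.00, 0.11),
    (0.01, 0.01, 0.11, 0.21, 0.68), (0.06, 0.01, 0.93, 0.00, 0.00),
    (0.77, 0.23, 0.00, 0.00, 0.00), (0.00, 0.68, 0.14, 0.18, 0.00),
    (0.29, 0.00, 0.71, 0.00, 0.00), (0.14, 0.06, 0.00, 0.80, 0.00),
    (0.09, 0.50, 0.04, 0.23, 0.14),
]
string, sharpened = round_haplotype([dict(zip(ALPHABET, c)) for c in columns])
print(string)  # -> ac-tactgc
```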

Collapse and Drop Rare

• Collapse haplotypes that have the same integral strings

• Drop haplotypes with coverage ≤ δ
  – Empirically, δ < 5 lowers PPV without improving sensitivity
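A minimal sketch of this final step. Haplotypes with identical integral strings pool their frequencies; "coverage" is taken here to mean the expected number of reads assigned to a haplotype (frequency × number of reads), which is an assumption since the slide does not define it. Survivors are renormalized, also an illustrative choice; `collapse_and_drop` is a hypothetical name.

```python
from collections import defaultdict

def collapse_and_drop(haplotypes, freqs, n_reads, delta):
    """Merge identical integral strings, then drop haplotypes with coverage <= delta.

    Coverage is assumed to be frequency * n_reads (expected assigned reads).
    Returns a dict mapping each surviving string to its renormalized frequency.
    """
    merged = defaultdict(float)
    for string, f in zip(haplotypes, freqs):
        merged[string] += f  # identical strings pool their frequency
    kept = {s: f for s, f in merged.items() if f * n_reads > delta}
    total = sum(kept.values())
    return {s: f / total for s, f in kept.items()} if total > 0 else {}

result = collapse_and_drop(["acgt", "acgt", "tcgt", "gggg"],
                           [0.40, 0.30, 0.29, 0.01], n_reads=100, delta=5)
print(result)  # "gggg" (about 1 read) is dropped; the two "acgt" copies merge
```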


Experimental Setup

• HCV E1E2 sub-region (315 bp)

• 20 simulated data sets of 10 variants

• 100,000 reads simulated with Grinder 0.5

• 10 data sets with homopolymer errors

• Frequency distribution: uniform and power-law model with parameter α = 2.0
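The slide does not give Grinder's exact power-law parameterization. The sketch below assumes the common rank-based form f_i ∝ i^(−α) for the 10 variants, alongside the uniform case, purely to illustrate how skewed α = 2.0 is; it is not Grinder's code, and both function names are hypothetical.

```python
def power_law_frequencies(n_variants, alpha=2.0):
    """Relative frequencies with f_i proportional to i**(-alpha), i = 1..n_variants."""
    weights = [i ** -alpha for i in range(1, n_variants + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def uniform_frequencies(n_variants):
    """Every variant equally frequent."""
    return [1.0 / n_variants] * n_variants

freqs = power_law_frequencies(10)
print([round(f, 3) for f in freqs])
```

Under this assumed form, the most frequent variant is four times as common as the second and a hundred times as common as the tenth, so the rare variants are the ones most exposed to the coverage threshold δ.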

Acknowledgements

Nicholas Mancuso, Alex Zelikovsky, Pavel Skums, Ion Măndoiu

Thank you! Questions?
