Reconstruction of Haplotype Spectra from NGS Data
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
University of Connecticut
Haplotype Spectra Reconstruction
• Given NGS reads, reconstruct:
  – Full length sequences
  – Sequence frequencies
• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction
Single Individual Haplotyping
• Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
  – Heterozygous loci found by mapping reads to reference genome
  – Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]
RefHap Algorithm [Duitama et al. 12]
• Reduce the problem to Max-Cut
• Solve Max-Cut
• Build haplotypes according to the cut
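Example fragment matrix (* = locus not covered by the fragment):
Locus  1 2 3 4 5
f1     * 0 1 1 0
f2     1 1 0 * 1
f3     1 * * 0 *
f4     * 0 0 * 1
[Figure: fragment graph on f1, f2, f3, f4 with edge weights 3, 1, 1, -1, -1]
h1 00110
h2 11001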
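The three RefHap steps can be traced on the example fragments with a short Python sketch. It assumes the edge weight between two fragments is the number of mismatching loci minus the number of matching loci among loci both fragments cover; that assumption reproduces the weights 3, 1, 1, -1, -1 shown in the figure. The exhaustive Max-Cut search is a stand-in that only works for tiny instances, not the heuristic used by RefHap, and all function names are illustrative.

```python
from itertools import product

# Fragment matrix from the example; '*' marks a locus the fragment does not cover.
fragments = {
    "f1": "*0110",
    "f2": "110*1",
    "f3": "1**0*",
    "f4": "*00*1",
}

def edge_weight(a, b):
    """Mismatches minus matches over loci covered by both fragments (assumed scoring)."""
    score = 0
    for x, y in zip(a, b):
        if x != "*" and y != "*":
            score += 1 if x != y else -1
    return score

names = sorted(fragments)
weights = {
    (u, v): edge_weight(fragments[u], fragments[v])
    for i, u in enumerate(names)
    for v in names[i + 1:]
}

def cut_value(side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])

# Exhaustive Max-Cut; f1 is pinned to side 0 because swapping sides gives the same cut.
best_side = max(
    (dict(zip(names, (0,) + rest)) for rest in product((0, 1), repeat=len(names) - 1)),
    key=cut_value,
)

def build_haplotypes(side):
    """Majority vote per locus; fragments on side 1 vote with their allele flipped."""
    n_loci = len(fragments[names[0]])
    h1 = []
    for locus in range(n_loci):
        votes = [0, 0]  # votes[a] = support for allele a on haplotype h1
        for name in names:
            allele = fragments[name][locus]
            if allele == "*":
                continue
            a = int(allele)
            votes[a if side[name] == 0 else 1 - a] += 1
        h1.append("1" if votes[1] > votes[0] else "0")
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    return h1, h2

h1, h2 = build_haplotypes(best_side)
print(weights)
print(best_side, cut_value(best_side))
print(h1, h2)  # 00110 11001
```

On this instance the best cut separates f1 from f2, f3, f4 with value 5. Fragment f4 disagrees with the consensus at one locus, which the majority vote absorbs.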
Chr. 22, 32k SNPs, 14k fragments
Haplotype Spectra Reconstruction
• Given short sequence fragments, reconstruct:
  – Full length sequences
  – Sequence frequencies
• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction
Transcriptome Reconstruction Challenge: Alternative Splicing
[Griffith and Marra 07]
[Figure: four alternatively spliced transcripts of a gene with exons 1-7]
t1 : 1 2 3 4 5 6 7
t2 : 1 3 4 5 6 7
t3 : 1 2 3 4 5 7
t4 : 1 3 4 5 7
TRIP: Transcriptome Reconstruction using Integer Programming
• Map the RNA-Seq reads to the genome
• Construct the splice graph G(V,E)
  – V: exons
  – E: splicing events
• Generate candidate transcripts
  – Depth-first search (DFS)
• Filter candidate transcripts
  – Fragment length distribution (FLD)
  – Integer programming
[Figure: genome with exons 1-7]
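The candidate generation step can be sketched as a depth-first enumeration of source-to-sink paths in the splice graph. The graph below is an assumed toy instance (a gene with exons 1-7 in which exons 2 and 6 may be skipped), and `enumerate_transcripts` is an illustrative name, not part of TRIP.

```python
# Hypothetical splice graph: each exon maps to the exons it can be spliced to.
splice_graph = {
    1: [2, 3],
    2: [3],
    3: [4],
    4: [5],
    5: [6, 7],
    6: [7],
    7: [],
}

def enumerate_transcripts(graph, source, sink):
    """Yield every source-to-sink exon path using an iterative depth-first search."""
    stack = [(source, [source])]
    while stack:
        exon, path = stack.pop()
        if exon == sink:
            yield path
            continue
        # Push successors in reverse so lower-numbered successors are explored first.
        for nxt in reversed(graph[exon]):
            stack.append((nxt, path + [nxt]))

candidates = list(enumerate_transcripts(splice_graph, 1, 7))
for path in candidates:
    print(" ".join(map(str, path)))
```

The number of paths can grow exponentially with the number of skippable exons, which is why a filtering step follows.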
How to filter?
• Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distributions
  – empirically determined during library preparation
  – implied by “mapping” read pairs
[Figure: read pairs mapped to candidate transcripts with exons 1, 3 and exons 1, 2, 3; labeled lengths 500, 300, and 200; fragment length distribution with mean 500, std. dev. 50; transcripts t1, t2, t3]
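The idea behind the filter can be illustrated by computing the fragment length a single read pair implies under each candidate transcript and comparing it with the empirical distribution. This is a sketch under stated assumptions, not TRIP's integer program: the exon lengths, read positions, and transcript names are invented for illustration, and only the mean of 500 and standard deviation of 50 come from the slide.

```python
from statistics import NormalDist

# Assumed toy instance: three exons of length 200 and two candidate transcripts.
EXON_LENGTHS = {1: 200, 2: 200, 3: 200}
CANDIDATES = {"exons_1_2_3": [1, 2, 3], "exons_1_3": [1, 3]}  # hypothetical names
FLD = NormalDist(mu=500, sigma=50)  # mean and std. dev. as on the slide

def implied_fragment_length(exons, start, end):
    """Fragment length on a transcript given (exon, offset) start and end points.

    Returns None when the transcript lacks one of the two exons.
    """
    (start_exon, start_off), (end_exon, end_off) = start, end
    if start_exon not in exons or end_exon not in exons:
        return None
    i, j = exons.index(start_exon), exons.index(end_exon)
    if i == j:
        return end_off - start_off
    inner = sum(EXON_LENGTHS[e] for e in exons[i + 1:j])
    return (EXON_LENGTHS[start_exon] - start_off) + inner + end_off

# Assumed read pair: starts 50 bases into exon 1 and ends 150 bases into exon 3.
read_pair = ((1, 50), (3, 150))

lengths = {
    name: implied_fragment_length(exons, *read_pair)
    for name, exons in CANDIDATES.items()
}
z_scores = {name: (length - FLD.mean) / FLD.stdev for name, length in lengths.items()}

for name in CANDIDATES:
    print(name, lengths[name], z_scores[name])
```

Here the transcript that includes exon 2 implies a length of 500 (z-score 0), while the one that skips it implies 300 (z-score -4), so this read pair is much better explained by the longer transcript.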
Allele Specific Expression
RNA Virus Replication
• High mutation rate (~10^-4)
Lauring & Andino, PLoS Pathogens 2011
Shotgun vs. Amplicon Reads
• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: have predefined start/end positions covering fixed overlapping windows
Reconstruction from Shotgun Reads: ViSpA
• Input: shotgun reads
• Read Error Correction
• Read Alignment
• Preprocessing of Aligned Reads
• Read Graph Construction
• Contig Assembly
• Frequency Estimation
• Output: quasispecies sequences w/ frequencies
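The slides do not say which estimator the frequency estimation stage uses. One common choice, assumed here, is expectation-maximization over reads that may be consistent with more than one candidate sequence. The sketch below ignores read positions and sequence lengths; `em_frequencies` and the toy compatibility data are illustrative.

```python
def em_frequencies(compat, n_candidates, n_iter=200):
    """Estimate candidate frequencies by EM.

    compat[r] is the non-empty set of candidate indices read r is consistent with.
    """
    freq = [1.0 / n_candidates] * n_candidates
    for _ in range(n_iter):
        expected = [0.0] * n_candidates
        for row in compat:
            total = sum(freq[j] for j in row)
            for j in row:
                # E-step: split the read among compatible candidates by current frequency.
                expected[j] += freq[j] / total
        # M-step: new frequency is the expected share of reads.
        freq = [e / len(compat) for e in expected]
    return freq

# Assumed toy data: 6 reads match only candidate 0, 2 match only candidate 1,
# and 4 are ambiguous between the two.
compat = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 4
freq = em_frequencies(compat, n_candidates=2)
print([round(f, 4) for f in freq])  # [0.75, 0.25]
```

The ambiguous reads end up split 3:1, matching the ratio of the unambiguous evidence.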
Reconstruction from Amplicon Reads: VirA
• Input: reference in FASTA format; error-corrected read data in SAM/BAM format
• Estimate Amplicons
• Amplicon Read Graph
• Max-Bandwidth Paths
• Frequency Estimation
• Output: viral population variants with frequencies
Amplicon Read Graph
• K amplicons represented by K-layer read graph
• Vertices ⇔ distinct reads
• Edges ⇔ reads with consistent overlap
• Vertices have count function c(v)
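A K-layer read graph and a max-bandwidth path through it can be sketched as follows. The sketch assumes that adjacent amplicons overlap by a fixed number of bases, that two reads overlap consistently when the suffix of one equals the prefix of the next over that overlap, and that the bandwidth of a path is the smallest count c(v) along it. The reads, counts, and function names are invented for illustration and are not taken from VirA.

```python
def build_read_graph(layers, overlap):
    """layers[k] maps each distinct read of amplicon k to its count c(v).

    Returns edges[(k, read)] = reads in layer k+1 with a consistent overlap.
    """
    edges = {}
    for k in range(len(layers) - 1):
        for u in layers[k]:
            edges[(k, u)] = [v for v in layers[k + 1] if u[-overlap:] == v[:overlap]]
    return edges

def max_bandwidth_path(layers, edges):
    """Path through all layers that maximizes the minimum vertex count."""
    # best[(k, read)] = (bandwidth, path) of the best path ending at that vertex
    best = {(0, r): (c, [r]) for r, c in layers[0].items()}
    for k in range(len(layers) - 1):
        for u in layers[k]:
            if (k, u) not in best:
                continue
            bandwidth, path = best[(k, u)]
            for v in edges[(k, u)]:
                cand = (min(bandwidth, layers[k + 1][v]), path + [v])
                if (k + 1, v) not in best or cand[0] > best[(k + 1, v)][0]:
                    best[(k + 1, v)] = cand
    last = len(layers) - 1
    finals = [best[(last, r)] for r in layers[last] if (last, r) in best]
    return max(finals, key=lambda t: t[0]) if finals else None

# Assumed toy data: K = 3 amplicons, adjacent amplicons overlap by 2 bases.
layers = [
    {"ACGT": 60, "ACGA": 40},
    {"GTCC": 55, "GACC": 45},
    {"CCAA": 70, "CCTA": 30},
]
edges = build_read_graph(layers, overlap=2)
result = max_bandwidth_path(layers, edges)
print(result)  # (55, ['ACGT', 'GTCC', 'CCAA'])
```

The dynamic program works layer by layer because edges only connect consecutive amplicons, so each vertex needs to remember only its best incoming path.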
Read Graph Transformation
• Heuristic to reduce edges in dense graphs
• Replace bipartite cliques with star subgraphs
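The edge saving can be illustrated with a small sketch. It assumes bipartite cliques arise when several reads in one amplicon end with the same overlap string and several reads in the next amplicon start with it; the slides do not describe how cliques are actually detected. Each such clique is replaced by a star centered on a new hub vertex, turning |A|·|B| edges into |A|+|B|. All names and reads are illustrative.

```python
from collections import defaultdict

def direct_edges(left_reads, right_reads, overlap):
    """All read-to-read edges between two adjacent amplicons."""
    return [
        (u, v)
        for u in left_reads
        for v in right_reads
        if u[-overlap:] == v[:overlap]
    ]

def star_edges(left_reads, right_reads, overlap):
    """Same connectivity routed through one hub vertex per shared overlap string."""
    suffix_groups = defaultdict(list)
    prefix_groups = defaultdict(list)
    for u in left_reads:
        suffix_groups[u[-overlap:]].append(u)
    for v in right_reads:
        prefix_groups[v[:overlap]].append(v)
    edges = []
    for key in sorted(suffix_groups.keys() & prefix_groups.keys()):
        hub = ("hub", key)
        edges += [(u, hub) for u in suffix_groups[key]]
        edges += [(hub, v) for v in prefix_groups[key]]
    return edges

# Assumed toy data: four reads ending in "CC" and four reads starting with "CC".
left = ["AACC", "ATCC", "AGCC", "TTCC"]
right = ["CCAA", "CCAT", "CCAG", "CCTT"]

dense = direct_edges(left, right, overlap=2)
sparse = star_edges(left, right, overlap=2)
print(len(dense), len(sparse))  # 16 8
```

A path from a left read to a right read now passes through the hub, so algorithms running on the transformed graph must treat hubs as zero-cost pass-through vertices.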
Challenges
• Scalability
  – Exploit inherent sparsity of biological instances
  – E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees
• Flexibility
  – Long (noisy) reads + short
  – Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq
• Quantifying reconstruction uncertainty
  – Compute intensive, e.g., bootstrapping
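A percentile bootstrap gives a feel for why uncertainty quantification is compute intensive. The sketch below resamples read-to-haplotype assignments and recomputes one frequency per replicate; in a real pipeline each replicate would rerun the whole reconstruction, which is where the cost comes from. The data, the function name, and the parameter defaults are assumptions for illustration.

```python
import random

def bootstrap_interval(assignments, target, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the frequency of `target`."""
    rng = random.Random(seed)
    n = len(assignments)
    estimates = []
    for _ in range(n_boot):
        # Resample reads with replacement and recompute the frequency.
        sample = rng.choices(assignments, k=n)
        estimates.append(sample.count(target) / n)
    estimates.sort()
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Assumed toy data: 70 reads assigned to haplotype "A" and 30 to haplotype "B".
assignments = ["A"] * 70 + ["B"] * 30
point_estimate = assignments.count("A") / len(assignments)
lo, hi = bootstrap_interval(assignments, "A")
print(point_estimate, (lo, hi))
```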
Acknowledgements
Jorge Duitama
Sahar Al Seesi
Mazhar Kahn
Rachel O’Neill
Alexander Artyomenko
Adrian Caciula
Nicholas Mancuso
Serghei Mangul
Bassam Tork
Alex Zelikovsky
Irina Astrovskaya
Pavel Skums