How will new sequencing technologies enable the HMP?
Elaine Mardis, Ph.D.
Associate Professor of Genetics
Co-Director, Genome Sequencing Center
Washington University School of
Advantages of Next Gen Platforms
• No sub-cloning, no use of E. coli as host
- cloning bias abolished
- one FTE can keep several instruments busy
• Each sequence is from a unique DNA molecule
- quantitation is possible through “counting”
- enhanced dynamic range
- detection of rare variants
• Multiple sequence-based assays on one platform
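The "counting" point above can be illustrated with a short sketch: because each read comes from a single DNA molecule, tallying reads per assigned source gives an estimate of relative abundance. This is a minimal Python example, assuming each read has already been assigned a label; the labels and the read list are hypothetical illustration data, not results from the talk.

```python
from collections import Counter

def relative_abundance(assignments):
    """Return {label: fraction of reads} from a list of per-read labels."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Hypothetical per-read assignments (illustration only, not real data).
reads = ["taxon_A"] * 6 + ["taxon_B"] * 3 + ["taxon_C"] * 1
abundance = relative_abundance(reads)

for label, fraction in sorted(abundance.items()):
    print(f"{label}: {fraction:.2f}")
```

Deeper sequencing adds more counts per label, which is what extends the dynamic range and makes rare variants detectable.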
New Sequencing Platforms
• Roche FLX Sequencer
• Illumina 1G Analyzer
• ABI SOLiD Sequencer
• Helicos single-molecule sequencer
Roche FLX: Vital Statistics
Currently:
• >100 Mb data/7 hours/$16K
• Read lengths average 250 bp
• Accuracy is hindered by homopolymer run in/dels
• Coverage model is higher than for 3730 data
By year’s end:
• Improved pipeline and read assembly software
• Paired end reads
• 400 bp read lengths
• Bar-code tagging of libraries
© Elaine Mardis, Ph.D.
Illumina 1G Analyzer: Vitals
Currently:
• 1 Gb/4 days/$3,000-5,000
• 40 bp read lengths, 8 channel flow cell
• Read accuracy is highest in 1st 25 bp, ~1% overall error rate
• Biased representation of high AT regions
By year’s end:
• Paired end read capability
• 50 bp read lengths
• Improved short read mapping, assembly algorithms (?)
Cross-Platform Comparisons
Criterion          3730        Roche                             Illumina
Platform cost      $350K       $500K                             $395K
Read length        650 bp +    250 bp                            40-50 bp
Cost/run           $55         $16,000                           $3-5,000
Mbp/day            1.4         200                               333
Cost/Mbp           $880        $160                              $5
Accuracy           high        No subs, indels at homopolymers   high
Paired end reads   Yes         Coming                            Yes*
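The Cost/Mbp row follows from dividing the cost per run by the yield per run. The sketch below reproduces that arithmetic in Python. The Roche and Illumina yields (100 Mb and 1 Gb per run) come from the vital-statistics slides; the 3730 yield is an assumption not stated in the slides (96 reads of 650 bp per run).

```python
def cost_per_mbp(cost_per_run_usd, mbp_per_run):
    """Cost in USD to produce one megabase of sequence."""
    return cost_per_run_usd / mbp_per_run

# Assumption (not in the slides): one 3730 run yields 96 reads of 650 bp.
mbp_3730 = 96 * 650 / 1_000_000

cost_3730 = cost_per_mbp(55, mbp_3730)
cost_roche = cost_per_mbp(16_000, 100)        # >100 Mb per run
cost_illumina_low = cost_per_mbp(3_000, 1_000)   # 1 Gb per run, low end
cost_illumina_high = cost_per_mbp(5_000, 1_000)  # 1 Gb per run, high end

print(f"3730:     ${cost_3730:,.0f}/Mbp")
print(f"Roche:    ${cost_roche:,.0f}/Mbp")
print(f"Illumina: ${cost_illumina_low:,.0f}-{cost_illumina_high:,.0f}/Mbp")
```

Under these assumptions the 3730 figure comes out near the $880 in the table, and the Illumina range tops out at the $5 shown.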
AB SOLiD™: Vital Statistics
• 500 Mb-1 Gb/5 days/?$$
• 50 base pair read lengths/paired end or fragment reads
• Ligation based sequencing with high accuracy due to 2-base encoding
• Analysis software is unknown
• Early access platform due Q3 of ‘07
HeliScope sequencer
• Single molecule detection obviates PCR amplification step
• >25 Mbp/hour initial data rate, 1000 Mbp/hour ultimately with <1% error rate
• Short read lengths, single molecule sequencing with high fidelity
• Two 25 channel flow cells
• Read mapping/assembly capability (?)
Comparative metagenomics: Cecal contents of obese mice (ob/ob) and lean littermates
• EXPERIMENTAL DESIGN:
1) Remove cecal contents of 2 ob/ob, 2 +/+, and 1 ob/+ C57BL/6J mice and isolate DNA.
2) 454 pyrosequencing of total DNA - 350,000 reads/mouse (one ob/ob, one +/+ mouse).
3) Compare data from each mouse to all known bacterial sequences.
4) Use data clustering methods to examine similarities and differences between all 5 mice that were sequenced.
5) Perform microbiota transplantation to test for ability to transfer phenotype to gnotobiotic mice.
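Step 4 does not name a specific clustering method. As one possible illustration, the sketch below computes pairwise Bray-Curtis dissimilarity between per-mouse count profiles, a measure that could feed a clustering step. The choice of Bray-Curtis, the mouse labels, and the counts are all assumptions made for illustration; they are not the method or data from the talk.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {feature: count} profiles."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total

# Hypothetical read counts per group for three samples (illustration only).
profiles = {
    "mouse_1": {"group_X": 3, "group_Y": 1},
    "mouse_2": {"group_X": 1, "group_Y": 3},
    "mouse_3": {"group_X": 3, "group_Y": 1},
}

# Pairwise dissimilarities; 0.0 means identical profiles, 1.0 means no overlap.
distances = {
    (s, t): bray_curtis(profiles[s], profiles[t])
    for s, t in combinations(sorted(profiles), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Samples with small pairwise values would group together under any distance-based clustering method.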
Next Gen RNA Sequencing
• Our laboratory has developed a robust full-length cDNA process for 454-based sequencing of eukaryotic transcriptomes that features low input of total RNA, enzyme-based normalization and the ability to preferentially sequence the 5’ ends of cDNAs.
• We are presently working to modify this approach for sequencing microbial transcriptomes and clinical isolates likely to contain viral RNA genomes (e.g. nasal lavage samples).
Illumina ‘Mockagenomics’ Experiment
• We created two mock metagenomic samples by combining known bacterial and human genomic DNAs and sequenced them on the Illumina platform to generate short (30 bp) reads.
• We plan to compare the relative strengths of classification by assembly and alignment to those of “signature” characterization (GC content, k-mer analysis) for short read data.
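The two "signature" features named above, GC content and k-mer composition, are simple to compute per read. The sketch below shows one way to do so in Python. The 30 bp read is a made-up sequence, and the choice of k = 4 is an assumption for illustration, not a parameter from the experiment.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Hypothetical 30 bp read (illustration only).
read = "ACGTACGTACGTACGTACGTACGTACGTAC"

print(f"GC content: {gc_content(read):.2f}")
print(kmer_counts(read, 4).most_common(4))
```

A 30 bp read contains only 27 overlapping 4-mers, so signatures from single short reads are sparse; that is part of what the comparison with alignment-based classification would test.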
Practical Issues
• DNA quality and quantity
• Value of paired end vs. fragment reads
• Normalization vs. quantitation
• Depth of “search space”
Sample prep
Fragment reads:
• Evaluate DNA
• Fragment (2-500bp)
• Repair ends
• Adapter ligate
• Enrich
• Amplify on bead (Roche/AB) or on glass slide (Illumina)
Paired end reads:
• Evaluate DNA
• Fragment (2.5kb)
• Repair ends
• Adapter ligate
• Methylate
• Restrict adapters
• Circularize
• 2° restriction with type IIS enzyme
• Purify tags+adapter
• Amplify
Paired End Libraries
[Figure: mate pair library. Labels: Tag #1 (25 base), Internal Adapter, Tag #2 (25 base); EcoP15I or fragmentation.]
[Figure: sequencing scheme. Labels: PESP#1, PESP#2, NaIO4, U.S.E.R.]
Sequencing:
• Read 1: 25 to 40 cycles
• Read 2: 25 to 40 cycles
• Total: 50-80 cycles
3-primer PE method
Graft: P7 : P7diol : 9TUP5, with [P7 + P7diol] = [9TUP5]
• P7diol & 9TUP5 linearisable
• P7 non-linearisable
Cluster formation: heterogeneous clusters containing:
• P7/9TUP5 bridges
• P7diol/9TUP5 bridges
[Figure: cluster linearisation and sequencing. Labels: SBS8, SBS3, NaIO4, USER, P7diol/9TUP5, P7/9TUP5.]
What are the issues?
• Consented sample availability!!
• Read length and accuracy
• Sample complexity
• Sensitivity to detect
• Coverage and cost
• DNA vs. RNA
• Bioinformatics-based analyses
Bioinformatics Challenges
• Most daunting issue: the ability to analyze enormous data sets intelligently and efficiently
• Metagenomic analysis tools are now emerging for next gen sequence data
• Testing and implementation into analysis pipelines will follow
• Output is only as good as the depth of the search space and the depth of coverage for any given combination of sample & sequencer
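The dependence on depth of coverage can be made concrete with the standard average-coverage estimate: reads × read length × (fraction of the sample from the target genome) ÷ genome size. The sketch below applies it in Python to the read count from the mouse experiment (350,000 reads) and the Roche read length (250 bp). The 5 Mbp genome size and the 10% abundance are hypothetical values chosen only for illustration.

```python
def expected_coverage(n_reads, read_len_bp, genome_size_bp, fraction=1.0):
    """Average fold coverage of one genome.

    fraction is the share of reads assumed to come from that genome
    (1.0 for a pure isolate, lower for a member of a mixed sample).
    """
    return n_reads * read_len_bp * fraction / genome_size_bp

# From the slides: 350,000 reads per mouse, ~250 bp average read length.
# Hypothetical: a 5 Mbp bacterial genome.
isolate = expected_coverage(350_000, 250, 5_000_000)
# Hypothetical: the same genome making up 10% of a mixed sample.
community_member = expected_coverage(350_000, 250, 5_000_000, fraction=0.1)

print(f"Pure isolate:       {isolate:.2f}x")
print(f"10% of a community: {community_member:.2f}x")
```

The same run that covers an isolate many times over gives thin coverage of a minority community member, which is why sample complexity and sequencer yield have to be considered together.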