How will new sequencing technologies enable the HMP? Elaine Mardis, Ph.D. Associate Professor of Genetics Co-Director, Genome Sequencing Center Washington

How will new sequencing technologies enable the HMP?

Elaine Mardis, Ph.D.

Associate Professor of Genetics

Co-Director, Genome Sequencing Center

Washington University School of

[email protected]

Advantages of Next Gen Platforms

• No sub-cloning, no use of E. coli as host

- cloning bias abolished

- one FTE can keep several instruments busy

• Each sequence is from a unique DNA molecule

- quantitation is possible through “counting”

- enhanced dynamic range

- detection of rare variants

• Multiple sequence-based assays on one platform
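Because each read derives from a single DNA molecule, relative abundance can be estimated by tallying reads per source. The following is a minimal sketch of that "counting" idea, assuming each read has already been assigned a label by some aligner or classifier. The labels and the `relative_abundance` helper are hypothetical and do not come from the talk.

```python
from collections import Counter

def relative_abundance(assignments):
    """Return the fraction of reads assigned to each label.

    `assignments` holds one label per read (hypothetical input; in practice
    it would come from an aligner or classifier).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Illustrative labels only, not data from the talk.
reads = ["taxon_A"] * 6 + ["taxon_B"] * 3 + ["taxon_C"]
abundance = relative_abundance(reads)
```

With more reads per sample, rarer labels accumulate enough counts to be distinguished from zero, which is the sense in which dynamic range and rare-variant detection improve.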


New Sequencing Platforms

• Roche FLX Sequencer

• Illumina 1G Analyzer

• ABI SOLiD Sequencer

• Helicos Single-molecule Sequencer

Roche FLX: Vital Statistics

Currently:

• >100 Mb data/7 hours/$16K

• Read lengths average 250 bp

• Accuracy is hindered by homopolymer run in/dels

• Coverage model is higher than for 3730 data

By year’s end:

• Improved pipeline and read assembly software

• Paired end reads

• 400 bp read lengths

• Bar-code tagging of libraries

© Elaine Mardis, Ph.D.

Illumina 1G Analyzer: Vitals

Currently:

• 1 Gb/4 days/$3-5,000

• 40 bp read lengths, 8 channel flow cell

• Read accuracy is highest in 1st 25 bp, ~1% overall error rate

• Biased representation of high AT regions

By year’s end:

• Paired end read capability

• 50 bp read lengths

• Improved short read mapping, assembly algorithms (?)


Cross-Platform Comparisons

Criterion        | 3730     | Roche                           | Illumina
Platform cost    | $350K    | $500K                           | $395K
Read length      | 650 bp + | 250 bp                          | 40-50 bp
Cost/run         | $55      | $16,000                         | $3-5,000
Mbp/day          | 1.4      | 200                             | 333
Cost/Mbp         | $880     | $160                            | $5
Accuracy         | high     | No subs, indels at homopolymers | high
Paired end reads | Yes      | Coming                          | Yes*
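The Cost/Mbp figures in the comparison can be reproduced by dividing the cost of a run by the megabases it yields. The sketch below uses the run costs and yields quoted on the platform slides. The 3730 yield per run (96 reads of about 650 bp) is an assumption, since the slides do not state it, and `cost_per_mbp` is a hypothetical helper name.

```python
def cost_per_mbp(cost_per_run_usd, mbp_per_run):
    """Cost per megabase = run cost / megabases produced per run."""
    return cost_per_run_usd / mbp_per_run

# Roche FLX: $16K per run, taking the ">100 Mb" yield as 100 Mb.
roche = cost_per_mbp(16_000, 100)        # 160.0

# Illumina 1G: upper end of the $3-5,000 range, 1 Gb = 1000 Mb per run.
illumina = cost_per_mbp(5_000, 1_000)    # 5.0

# 3730: yield per run is an ASSUMPTION (96 reads x 650 bp), not from the slides.
mbp_3730 = 96 * 650 / 1_000_000          # 0.0624 Mbp
abi_3730 = cost_per_mbp(55, mbp_3730)    # about 881, close to the $880 quoted
```

Under these assumptions the arithmetic agrees with the table to within rounding.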



AB SOLiD™: Vital Statistics

• 500 Mb-1 Gb/5 days/?$$

• 50 base pair read lengths/paired end or fragment reads

• Ligation based sequencing with high accuracy due to 2-base encoding

• Analysis software is unknown

• Early access platform due Q3 of '07
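The slide says only that accuracy comes from 2-base encoding. As an illustration, the sketch below uses the commonly described color-space scheme, in which each color is determined by a pair of adjacent bases. Representing the four colors as the XOR of 2-bit base codes is an assumption made here for compactness, and `encode` and `decode` are hypothetical helper names.

```python
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}  # A=0, C=1, G=2, T=3

def encode(seq):
    """Encode a sequence as its first base plus one color per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode(first_base, colors):
    """Rebuild the base sequence from the first base and the color calls."""
    seq = [first_base]
    for c in colors:
        seq.append(BASES[CODE[seq[-1]] ^ c])
    return "".join(seq)

first, colors = encode("ATGGCA")          # colors: [3, 1, 0, 3, 1]
roundtrip = decode(first, colors)         # "ATGGCA"

# A single-base substitution (G -> C at position 3) changes two adjacent colors.
_, snp_colors = encode("ATGCCA")          # colors: [3, 1, 3, 0, 1]
changed = [i for i, (x, y) in enumerate(zip(colors, snp_colors)) if x != y]
```

Because every base is interrogated by two colors, a true substitution alters two adjacent colors, whereas an isolated single-color change is more likely a measurement error. That redundancy is the usual explanation for the accuracy gain.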


HeliScope sequencer

• Single molecule detection obviates PCR amplification step

• >25 Mbp/hour initial data rate, 1000 Mbp/hour ultimately with <1% error rate

• Short read lengths, single molecule sequencing with high fidelity

• Two 25 channel flow cells

• Read mapping/assembly capability (?)


Comparative metagenomics: Cecal contents of obese mice (ob/ob) and lean littermates

• EXPERIMENTAL DESIGN:

1) Remove cecal contents of 2 ob/ob, 2 +/+, and 1 ob/+ C57BL/6J mice and isolate DNA.

2) 454 pyrosequencing of total DNA - 350,000 reads/mouse (one ob/ob, one +/+ mouse).

3) Compare data from each mouse to all known bacterial sequences.

4) Use data clustering methods to examine similarities and differences between all 5 mice that were sequenced.

5) Perform microbiota transplantation to test for ability to transfer phenotype to gnotobiotic mice.
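Steps 3 and 4 amount to building a per-mouse profile of read counts and then comparing profiles across mice. The slides do not name the clustering method, so the sketch below swaps in Bray-Curtis dissimilarity as one common choice. The counts are made up for illustration, and `bray_curtis` is a hypothetical helper.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles (dict: taxon -> count)."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0

# Hypothetical read counts per taxon; NOT results from the study.
profiles = {
    "mouse_1": {"taxon_A": 70, "taxon_B": 30},
    "mouse_2": {"taxon_A": 65, "taxon_B": 35},
    "mouse_3": {"taxon_A": 30, "taxon_B": 70},
}

# Pairwise dissimilarities between all samples.
names = sorted(profiles)
distances = {
    (x, y): bray_curtis(profiles[x], profiles[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

# The most similar pair would be merged first in an agglomerative clustering.
closest_pair = min(distances, key=distances.get)
```

A full analysis would feed the distance matrix to a hierarchical clustering or ordination routine. The pairwise distance is the part that depends on the sequencing counts.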



Next Gen RNA Sequencing

• Our laboratory has developed a robust full-length cDNA process for 454-based sequencing of eukaryotic transcriptomes that features low input of total RNA, enzyme-based normalization and the ability to preferentially sequence the 5’ ends of cDNAs.

• We presently are working to modify this approach for sequencing microbiotal transcriptomes and clinical isolates likely to contain viral RNA genomes (e.g. nasal lavage samples).



Illumina ‘Mockagenomics’ Experiment


• We created two mock metagenomic samples by combining known bacterial and human genomic DNAs and sequenced them on the Illumina platform to generate short (30 bp) reads.

• We plan to compare the relative strengths of classification by assembly and alignment to those of “signature” characterization (GC content, kmer analysis) for short read data
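The "signature" features mentioned above (GC content and k-mer composition) can be computed per read without any alignment. This is a minimal sketch of both. The read is made up, the choice of k=4 is arbitrary, and `gc_content` and `kmer_counts` are hypothetical helper names rather than part of the experiment's pipeline.

```python
from collections import Counter

def gc_content(read):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    read = read.upper()
    if not read:
        return 0.0
    return sum(base in "GC" for base in read) / len(read)

def kmer_counts(read, k=4):
    """Count overlapping k-mers in a read; k=4 is an arbitrary choice."""
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

# A made-up 30 bp read for illustration only.
read = "ATGCGCGCTTAGGCATCGATCGGCTAAGCT"
gc = gc_content(read)          # 17 of 30 bases are G or C
kmers = kmer_counts(read, k=4) # 27 overlapping 4-mers in a 30 bp read
```

A 30 bp read yields only 27 overlapping 4-mers, so per-read signatures are noisy. Pooling reads before comparing signatures is one way to work around this.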

Practical Issues

• DNA quality and quantity

• Value of paired end vs. fragment reads

• Normalization vs. quantitation

• Depth of “search space”


Sample prep

Fragment reads

• Evaluate DNA

• Fragment (2-500bp)

• Repair ends

• Adapter ligate

• Enrich

• Amplify on bead (Roche/AB) or on glass slide (Illumina)

Paired end reads

• Evaluate DNA

• Fragment (2.5kb)

• Repair ends

• Adapter ligate

• Methylate

• Restrict adapters

• Circularize

• 2° restriction with type IIS enzyme

• Purify tags+adapter

• Amplify


Paired End Libraries

[Diagram: Mate Pair Library. Labels: Internal Adapter; 25 base Tag #1; 25 base Tag #2; EcoP15I or fragmentation]


Sequencing:

• Read 1 (25 to 40 cycles), Read 2 (25-40 cycles); total 50-80 cycles

[Diagram labels: PESP#1, PESP#2; NaIO4, U.S.E.R.]

3-primer PE method

• Graft: P7 : P7diol : 9TUP5, with [P7+P7diol] = [9TUP5]

• P7diol & 9TUP5 linearisable

• P7 non-linearisable

Cluster formation: heterogeneous clusters containing:

• P7/9TUP5 bridges

• P7diol/9TUP5 bridges

[Diagram labels: SBS8, SBS3; NaIO4, USER; P7diol/9TUP5, P7/9TUP5]

What are the issues?

• Consented sample availability!!

• Read length and accuracy

• Sample complexity

• Sensitivity to detect

• Coverage and cost

• DNA vs. RNA

• Bioinformatics-based analyses


Bioinformatics Challenges

• Most daunting issue: the ability to analyze enormous data sets intelligently and efficiently

• Metagenomic analysis tools are now emerging for next gen sequence data

• Testing and implementation into analysis pipelines will follow

• Output is only as good as the depth of the search space and the depth of coverage for any given combination of sample & sequencer
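The dependence on depth of coverage can be made concrete with a back-of-the-envelope calculation: mean coverage is the number of sequenced bases divided by the target size. The sketch below uses 350,000 reads of about 250 bp, figures quoted earlier in the talk. The 5 Mbp genome size and the 1% abundance are hypothetical, and the uncovered-fraction estimate assumes reads sample the target uniformly at random (a Poisson approximation).

```python
import math

def mean_coverage(n_reads, read_length_bp, target_size_bp):
    """Average depth of coverage: total sequenced bases / target size."""
    return n_reads * read_length_bp / target_size_bp

def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, under a Poisson approximation."""
    return math.exp(-coverage)

# All reads drawn from a single HYPOTHETICAL 5 Mbp genome.
c_single = mean_coverage(350_000, 250, 5_000_000)        # 17.5x
missed_single = fraction_uncovered(c_single)

# The same genome at a HYPOTHETICAL 1% of a mixed sample gets about 1% of the reads.
c_rare = mean_coverage(350_000 * 0.01, 250, 5_000_000)   # about 0.175x
missed_rare = fraction_uncovered(c_rare)                 # most bases unseen
```

The same run that covers a dominant genome deeply leaves a rare community member mostly unsampled, which is why output quality depends on the particular combination of sample and sequencer.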

