Journal of Biotechnology 136 (2008) 3–10
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/jbiotec

Review

The Genome Sequencer FLX™ System: Longer reads, more applications, straightforward bioinformatics and more complete data sets

Marcus Droege a,*, Brendon Hill b

a Roche Applied Science, Global Marketing, 82372 Penzberg, Germany
b 454 Life Sciences, Branford, USA

* Corresponding author. Tel.: +49 88566 06888. E-mail address: [email protected] (M. Droege).

Article history: Received 3 January 2008; received in revised form 17 March 2008; accepted 31 March 2008

Keywords: 2nd generation sequencing technology; Human re-sequencing; De novo sequencing; Transcriptome analysis; Microbial genome sequencing; Metagenomics; Plants sequencing; Viral variants

Abstract

The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in more than seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research: it has recently been used to re-sequence the genome of an individual human, is currently being used to re-sequence the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and has detected low-frequency somatic mutations linked to cancer.

© 2008 Elsevier B.V. All rights reserved.
0168-1656/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jbiotec.2008.03.021

Contents

1. Introduction
2. The 454 Sequencing technology: an overview
3. Long reads, high throughput and superior single-read accuracy without filtering against reference sequences
3.1. Average read length of 200–300 bases
3.2. Single-read accuracy of more than 99.5%, substitution errors are exceedingly rare
3.3. High throughput and cost-efficient split up of runs: up to 2304 samples per large run
4. Long reads and superior single-read accuracy ensures broadest application versatility
4.1. De novo sequencing (e.g. plant BACs, microbes, viruses)
4.2. Whole genome re-sequencing (e.g. targeted human genomic regions, structural variations)
4.3. Amplicon sequencing (e.g. exon re-sequencing, virus variant detection, DNA methylation)
4.4. Metagenomics and microbial diversity
4.5. EST sequencing (e.g. transcriptome survey of organisms with unknown genomes)
4.6. Full length/shotgun sequencing of the transcriptome
4.7. ncRNAs (all classes of ncRNA, from miRNA to >200 nt transcripts of unknown function)
5. Bioinformatics without large investment in enterprise scale infrastructure and people
5.1. GS Reference Mapper software
5.2. GS De Novo Assembler software
5.3. The Amplicon Variant Analyzer (AVA)
6. Recent breakthroughs achieved using the Genome Sequencer System
6.1. Re-sequencing of the human genome
6.2. Re-sequencing of the human exome and targeted gene regions


6.3. Analysis of structural variations in the human genome
6.4. Studying the molecular basis for eusociality using expression profiling
6.5. Metagenomics and microbial diversity
6.6. Ancient DNA
7. Even longer reads, considerably higher throughput and significant reductions in the costs per base in 2008
References

1. Introduction

For the past 30 years, the Sanger sequencing process has been the standard method for sequencing DNA. Despite continued advances such as the introduction of capillary electrophoresis systems, and a continuing decrease in costs, this method has proven prohibitively expensive and time-consuming for routine high-throughput work such as the sequencing of human genomes. Demand for faster, more affordable DNA sequencing has led to the development of so-called "next-generation" sequencing technologies, which have the potential to sequence the human genome for several thousand dollars in the coming years.

In October 2005, 454 Life Sciences, a member of the Roche Group, was the first company to introduce such a next-generation sequencing system into the life science market. The 454 Sequencing technology has been enthusiastically adopted by researchers worldwide and has already been used to achieve several recent breakthrough discoveries in genomic research. This article provides an overview of the 454 Sequencing technology and summarizes selected breakthroughs achieved using the system.

2. The 454 Sequencing technology: an overview

The 454 Sequencing System supports the analysis of samples from a wide variety of starting materials including genomic DNA, PCR products, BACs, and cDNA. Samples such as genomic DNA and BACs are fractionated into small, 300–800-basepair fragments using a mechanical shearing process (nebulization). For smaller samples, such as small non-coding RNA or PCR amplicons, fragmentation is not required. Using a series of standard molecular biology techniques (Fig. 1), short adaptors (A and B) are added to each fragment. The adaptors are used for purification, amplification, and sequencing steps. Single-stranded fragments (sstDNA) with A and B adaptors compose the sample library used for subsequent workflow steps.

During the following emulsion PCR procedure (Fig. 2) the sstDNA library is first mixed with an excess of sepharose beads carrying oligonucleotides complementary to, e.g., the B-adaptor sequence of the library fragments. As a result, most of these beads carry a unique single-stranded DNA library fragment. The bead-bound library is then emulsified with amplification reagents in a water-in-oil mixture. Each bead is now captured within its own microreactor, where clonal amplification of the single-stranded DNA fragments occurs. This results in bead-immobilized, clonally amplified DNA fragments (ca. 10 million identical DNA molecules per bead).

As preparation for the sequencing reaction, sstDNA library beads are added to the DNA Bead Incubation Mix (containing DNA polymerase) and are layered with Enzyme Beads (containing sulfurylase and luciferase) onto the 454 PicoTiterPlate™ device (Fig. 3). This 70 mm × 75 mm plate is an optical device containing 1.6 million wells at a diameter of 44 µm per well. Only one library bead (around 30 µm) fits into one well. The layer of Enzyme Beads ensures that the DNA beads remain positioned in the wells during the sequencing reaction. The bead-deposition process maximizes the number of wells that contain a single amplified library bead (avoiding more than one sstDNA library bead per well).

The loaded PicoTiterPlate device is placed into the Genome Sequencer FLX™ Instrument (Fig. 4). The fluidics sub-system flows sequencing reagents (containing buffers and nucleotides) across the wells of the plate. Nucleotides are flowed sequentially in a fixed order across the PicoTiterPlate device during a sequencing run. During the nucleotide flow, hundreds of thousands of beads, each carrying millions of copies of a unique single-stranded DNA molecule, are sequenced in parallel. If a nucleotide complementary to the template strand is flowed into a well, the polymerase extends the existing DNA strand by adding nucleotide(s). Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded by the CCD camera in the instrument (Fig. 5). In a limited range, the signal strength is proportional to the number of nucleotides incorporated in a single-nucleotide flow (Fig. 6).
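The relationship between signal strength and the number of incorporated nucleotides can be illustrated with a minimal Python sketch. It is not the instrument's base-calling algorithm: the flow order `TACG`, the normalization of signals so that 1.0 means one incorporated nucleotide, and the simple rounding rule are all assumptions made for illustration, and the function name `decode_flowgram` is hypothetical.

```python
# Illustrative sketch only; not the GS FLX base caller.
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Translate normalized flow signals into a base sequence.

    Each signal is assumed to be normalized so that 1.0 corresponds to a
    single incorporated nucleotide. Rounding to the nearest integer gives
    the estimated homopolymer length for that flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


signals = [1.02, 0.05, 2.10, 0.98, 0.03, 1.04, 0.00, 2.95]
print(decode_flowgram(signals))  # TCCGAGGG
```

Because the signal is proportional to homopolymer length only in a limited range, a rounding rule like this one becomes less reliable for long homopolymers, which is consistent with the homopolymer overcalls and undercalls discussed in Section 3.2.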

3. Long reads, high throughput and superior single-read accuracy without filtering against reference sequences

3.1. Average read length of 200–300 bases

Read length is one of the most important factors in high-throughput sequencing. Gaps in the consensus sequence, caused by repeats, can be covered by longer reads, leading to a more comprehensive result. Furthermore, long reads enable haplotyping, the identification of low-frequency viral quasispecies, annotation of fragments isolated from the transcriptome, the more accurate quantification of microbial diversity and so forth. In fact, only the read lengths and single-read accuracies provided by conventional Sanger and the new 454 Sequencing technology allow high-quality de novo sequencing of genomes and transcriptomes. Fig. 7 shows the average read length generated using the Genome Sequencer FLX System.

3.2. Single-read accuracy of more than 99.5%, substitution errors are exceedingly rare

Compared to the Genome Sequencer 20 System, significant enhancements in single-read accuracy were integrated into the Genome Sequencer FLX System (Fig. 8). Currently, single-read accuracies of >99.5% over the first 200 bases are typically achieved. Most notably, this error rate already includes small insertions and deletions caused by the presence of homopolymers. Substitution errors are exceedingly rare (down to 10⁻⁶). The Genome Sequencer FLX™ System offers high-throughput sequencing with single-read accuracies equivalent to, or better than, traditional Sanger sequencing (99.5%). The vast majority of errors that make up the 0.5% single-read error rate are overcalls or undercalls of the length of homopolymeric stretches. The magnitude of this error mode can be put into perspective with an example that comes from researchers at the University of Bielefeld (Tauch et al., 2006). A Corynebacterium urealyticum strain containing 451 oligonucleotides in the range of 6–13 nucleotides was sequenced using the 454 Sequencing technology (GS20). The lengths of 6 out of the 461 homopolymers were called incorrectly, demonstrating the high accuracy of the system, even in homopolymeric stretches.

Most importantly, 454 Sequencing differentiates itself from all other 2nd generation technologies by using quality filtering algorithms that are independent of the availability of reference sequences. This is of paramount importance when no reference sequence is available or when a comparison to the reference sequence limits data comprehensiveness. One simple example is microbial genome sequencing. Bacteria constantly swap genetic material with the environment. Haemophilus influenzae, for example, has 5000 genes as a species, though only ∼1400 genes are conserved among all strains (Hogg et al., 2007). Hence, due to this enormous plasticity of microbial genomes, only de novo sequencing and subsequent comparison to the reference gives the complete picture of the genome, including information on additional or deleted genes, structural variations, etc. Even if a reference sequence is available, a complete comparative genomics study requires de novo sequencing first, then the comparison against the reference sequence.

Fig. 1. Generation of single-stranded fragments (sstDNA) with A and B adaptors, which compose the sample library used for subsequent workflow steps.

Fig. 2. Clonal amplification of single sstDNA on beads during the so-called emulsion PCR.

Fig. 3. sstDNA library beads are added to the DNA Bead Incubation Mix (containing DNA polymerase) and are layered with Enzyme Beads (containing sulfurylase and luciferase) onto the 454 PicoTiterPlate™ device.

Fig. 4. The Genome Sequencer FLX™ Instrument.

Another reason to avoid filtering against the reference is the need to detect insertions and deletions in the genome in a straightforward way. The read filtering software of other next-generation technologies excludes from further analysis any reads that differ from the reference sequence by more than 2 or 3 bases. Not considering these reads during data analysis makes the discovery of insertions and deletions challenging, especially in large, complex genomes. Detecting those variations with micro-reads requires the development of new bioinformatics tools and the availability of enterprise-scale computing power (due to the specificity issues that accompany micro-reads). But even then the detection frequency is generally significantly lower compared to long-read technologies, particularly in complex genomes.

3.3. High throughput and cost-efficient split up of runs: up to 2304 samples per large run

A large run on the Genome Sequencer FLX System typically yields more than 400,000 reads at an average read length of 250 bases. Since two sequencing runs can be easily started in a standard workday, 1 gigabase of sequence information can be generated in 5 days. For smaller projects, application development, or feasibility testing, a small run yielding 20 megabases is available. A 100-base read length kit, generating 400,000 reads, has been optimized for short-read applications, such as expression profiling. In addition to multiple sequencing kit formats, the GS FLX offers sample-specific multiplex identifiers (MIDs). MIDs are short nucleotide adaptors added to the ends of DNA fragments in a sample, which allow for the efficient pooling of samples that require less total sequence data, such as BACs or amplicons. Using this approach, between 1 and 2304 samples can be sequenced per large format run.

Fig. 5. Graphical representation of the 454 Sequencing technology (advanced pyrosequencing technology).

Fig. 6. The 454 Flowgram. The signal strength provided on the y-axis is proportional to the number of nucleotides incorporated in a single-nucleotide flow.

Fig. 7. Read length provided by the system. Depending on the organism, read lengths are in the range of 200–300 high-quality bases. Genomes that are more AT rich (Campylobacter jejuni) or GC rich (Thermus thermophilus) typically yield a longer read length distribution as compared to an AT/GC neutral genome.

Fig. 8. Enhancements in the single-read accuracy of Genome Sequencer Systems over time. When the technology was made public in 2005, a single-read accuracy of approximately 96% was reported. Based on several changes in chemistry and computer algorithms, an accuracy of at least 99% was achieved in 2006. Meanwhile, a superior accuracy of >99.5%, including errors caused by the presence of homopolymers, can be reported (colored lines). Excluding homopolymers, single-read accuracy is >99.9% in, e.g., E. coli. For comparison, the single-read accuracy of the traditional Sanger method is in the range of 99.3–99.6%.
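The throughput figures and the MID pooling described in Section 3.3 can be checked and illustrated with a short Python sketch. The run parameters are taken from the text; everything else is an assumption for illustration: the MID tag sequences and sample names are hypothetical, the tags are assumed to sit at the very start of each read, and exact prefix matching stands in for whatever error-tolerant matching real software would use.

```python
# Run parameters quoted in the text.
READS_PER_LARGE_RUN = 400_000
AVG_READ_LENGTH = 250  # bases
RUNS_PER_DAY = 2


def bases_per_run(reads=READS_PER_LARGE_RUN, read_length=AVG_READ_LENGTH):
    """Approximate sequence yield of one large run, in bases."""
    return reads * read_length


def days_for(target_bases, runs_per_day=RUNS_PER_DAY):
    """Whole days needed to reach a target yield (ceiling division)."""
    per_day = bases_per_run() * runs_per_day
    return -(-target_bases // per_day)


# Hypothetical MID tags and sample names, for illustration only.
MIDS = {"ACGAGTGCGT": "sample_1", "ACGCTCGACA": "sample_2"}


def demultiplex(reads, mids=MIDS):
    """Group reads by their leading MID tag and strip the tag."""
    by_sample = {}
    for read in reads:
        for tag, sample in mids.items():
            if read.startswith(tag):
                by_sample.setdefault(sample, []).append(read[len(tag):])
                break
        else:
            # No known tag at the start of the read.
            by_sample.setdefault("unassigned", []).append(read)
    return by_sample


print(bases_per_run())      # 100000000 bases, i.e. 100 megabases per run
print(days_for(10**9))      # 5 days for 1 gigabase at two runs per day
print(demultiplex(["ACGAGTGCGTTTGA", "ACGCTCGACAGGCC", "TTTTTTTTTTAAAA"]))
```

At 400,000 reads of 250 bases each, one run yields about 100 megabases, so two runs per day reach 1 gigabase in 5 days, which matches the figure quoted in the text.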

4. Long reads and superior single-read accuracy ensure broadest application versatility

With the unique combination of read length and throughput (number of reads) per run, the Genome Sequencer FLX™ is the most versatile of all 2nd generation sequencing technologies, offering applications from whole genome de novo sequencing to expression profiling. The following paragraphs summarize why researchers have used the Genome Sequencer System for their applications.

4.1. De novo sequencing (e.g. plant BACs, microbes, viruses)

Researchers have chosen 454 Sequencing because highly accurate 250 bp reads yield assemblies that deliver significantly better genome coverage, ∼5–10-fold fewer contigs, and minimal misassemblies compared to shorter read lengths of approximately 35 bp (so-called micro-reads). Long paired-end reads align specifically throughout the entire genome, allowing for scaffolding of contigs. These benefits produce more comprehensive discoveries and minimize the need for costly follow-up experiments such as gap closure. Selected examples for de novo sequencing on the Genome Sequencer System: Velasco et al. (2007), Pol et al. (2007), Pearson et al. (2007), Andries et al. (2005), and Oh et al. (2006).

4.2. Whole genome re-sequencing (e.g. targeted human genomic regions, structural variations)

Researchers have chosen 454 Sequencing because 250 bp reads effectively span many short repeats and 99.5% single-read accuracy eliminates the micro-read requirement for filtering reads against the reference sequence. The ability to sequence bias-free at high single-read accuracy, to resolve many repeats and to eliminate tedious and difficult filtering against a reference sequence results in a genome coverage of close to 99%, and considerably more efficient identification of more mutations, including insertions and deletions. Long paired-end reads of 100 nt, only available on the GS FLX, align specifically throughout the entire human genome, yielding a comprehensive list of structural variations. Selected examples for re-sequencing using the 454 Sequencing technology: Albert et al. (2007) and Korbel et al. (2007).

4.3. Amplicon sequencing (e.g. exon re-sequencing, virus variant detection, DNA methylation)

Researchers have chosen 454 Sequencing because 250 bp reads span typical amplicons such as exons (which in humans have an average size of 200 bp), and enable real haplotyping over the full exon length, preventing the inaccurate assessment of variation patterns typical of micro-read technologies. High single-read accuracy of 99.5% means each read can be used to determine low-frequency variations such as somatic mutations or rare genotypes in complex samples. Selected examples of amplicon sequencing using the 454 Sequencing technology: Dahl et al. (2007), Pettersson et al. (2007), Thomas et al. (2005), Korshunova et al. (2007), and Taylor et al. (2007).

4.4. Metagenomics and microbial diversity

Researchers have chosen 454 Sequencing because highly accurate 250 bp reads provide the uniqueness required to unambiguously identify an organism or gene in an unknown complex environmental sample by mapping or assembly of single reads. The uniqueness of 250 bp reads ensures a more accurate prediction of diversity than micro-reads, whose use often results in an underestimation of the microbial diversity (micro-reads with several hits against the database are not specific and therefore need to be excluded from the analysis). Selected examples for metagenomics using the 454 Sequencing technology: Cox-Foster et al. (2007), Huber et al. (2007), Leininger et al. (2006), Sogin et al. (2006), Turnbaugh et al. (2006), Warnecke et al. (2007), and Wegley et al. (2007).

4.5. EST sequencing (e.g. transcriptome survey of organisms with unknown genomes)

Researchers have chosen highly accurate 250 bp reads for EST sequencing because they offer better annotations of unknown genes and more accurate resolution of different expression levels of alleles and gene family members, in both sequenced and non-sequenced genomes. Selected examples for EST sequencing using the Genome Sequencer System: Eveland et al. (2007) and Gowda et al. (2006).

4.6. Full length/shotgun sequencing of the transcriptome

Researchers choose 454 Sequencing because highly accurate 250 bp reads provide enough contiguous sequence to map and assemble known and unknown transcriptomes, identify splice variants and annotate transcripts of unknown function. Selected examples for full-length sequencing using the Genome Sequencer System: Bainbridge et al. (2006), Weber et al. (2007), and Emrich et al. (2007).


4.7. ncRNAs (all classes of ncRNA, from miRNA to >200 nt transcripts of unknown function)

Several different classes of non-coding RNAs (ncRNAs) in the range of 20 to >200 bp have been identified (Gingeras, 2007). Researchers have chosen 454 clonal sequencing because it is a cheap, fast and bias-free tool to discover these ncRNA species on a genome-wide basis. 250 bp reads cover the broadest range of these known and unknown ncRNA species and the 99.5% single-read accuracy enables the differentiation of highly similar sequences. The combination of high accuracy and comprehensive coverage provides a more complete picture of the existence and putative function of ncRNAs in the genome than micro-read technologies. Furthermore, the read lengths are long enough to completely read through ncRNA cloning adaptors (known sequence), providing an ideal quality control of each read. Selected examples of miRNA discovery using the 454 Sequencing technology: Ruby et al. (2007), Ruby et al. (2006), Yao et al. (2007), Axtell et al. (2007), Girard et al. (2006), and Berezikov et al. (2006).

5. Bioinformatics without large investment in enterprise scale infrastructure and people

Data analysis is a rate-limiting step in many labs. To help researchers make discoveries faster, the Genome Sequencer FLX™ System comes with a suite of state-of-the-art analysis tools that integrate seamlessly with the instrument and are optimized for 454 Sequence data analysis. Unlike other systems which leave you to fend for yourself, the Genome Sequencer FLX™ provides you with the tools needed to make discoveries.

5.1. GS Reference Mapper software

The GS Reference Mapper software maps shotgun reads against a given reference sequence and assembles them into a consensus sequence. In addition, it generates a list of high-confidence mutations (e.g. SNPs, insertions, deletions) by identification of individual bases that differ between the generated consensus DNA sequence and the reference sequence. Currently, the GS Reference Mapper software will map reads against genomes up to 3 gigabases in size and is capable of co-assembling reads generated with both the traditional Sanger and the 454 technology.
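The idea of listing bases that differ between a consensus and a reference can be sketched in a few lines of Python. This is not the GS Reference Mapper algorithm, and it says nothing about how the software assigns confidence. It assumes the two sequences have already been aligned to equal length, with `-` marking gaps; the function name `list_differences` is hypothetical.

```python
def list_differences(reference, consensus):
    """List differences between two pre-aligned sequences.

    Both strings are assumed to be the same length, with '-' marking an
    alignment gap. Positions are 1-based coordinates on the ungapped
    reference; an insertion is reported at the reference position that
    precedes it.
    """
    if len(reference) != len(consensus):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    ref_pos = 0
    for ref_base, con_base in zip(reference, consensus):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == con_base:
            continue
        if ref_base == "-":
            kind = "insertion"
        elif con_base == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        differences.append((ref_pos, kind, ref_base, con_base))
    return differences


for diff in list_differences("ACGT-ACGT", "ACCTGAC-T"):
    print(diff)
# (3, 'substitution', 'G', 'C')
# (4, 'insertion', '-', 'G')
# (7, 'deletion', 'G', '-')
```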

5.2. GS De Novo Assembler software

The GS De Novo Assembler software generates a consensus sequence by de novo assembly of the shotgun sequencing reads into contigs. Subsequent ordering of these contigs into scaffolds is achieved with the addition of paired-end reads. Currently, the GS De Novo Assembler software allows for routine de novo assembly of genomes up to 500 megabases in size and is capable of co-assembling reads generated with both the traditional Sanger and the 454 technology.

5.3. The Amplicon Variant Analyzer (AVA)

The AVA software application computes the alignment of reads from amplicon libraries sequenced using the Genome Sequencer FLX System, and identifies differences between the reads and a reference sequence. The frequency of variants can be detected and quantified by examination of the read alignments. This tool has been shown to be well suited for the identification and quantification of somatic mutations in cancer samples (Thomas et al., 2005) or for the detection of mutations conferring resistance in HIV quasispecies (Hoffmann et al., 2007; Wang et al., 2007).
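Quantifying variant frequencies from read alignments can be sketched as follows. This is a simplified illustration and not the AVA implementation: the function name `variant_frequencies`, the `.` symbol for positions a read does not cover, and the restriction to reads padded to the reference length (no insertions) are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Variant = Tuple[int, str, str]  # (1-based position, reference base, read base)


def variant_frequencies(reference: str,
                        aligned_reads: List[str],
                        min_fraction: float = 0.0) -> Dict[Variant, float]:
    """Return the fraction of covering reads that carry each variant.

    Every read must be padded to the reference length; '.' marks a
    position the read does not cover and '-' marks a deleted base.
    """
    for read in aligned_reads:
        if len(read) != len(reference):
            raise ValueError("reads must be padded to the reference length")

    frequencies: Dict[Variant, float] = {}
    for index, ref_base in enumerate(reference):
        # Only reads that actually cover this position count as depth.
        column = [read[index] for read in aligned_reads if read[index] != "."]
        if not column:
            continue
        for base, count in Counter(column).items():
            if base == ref_base:
                continue
            fraction = count / len(column)
            if fraction >= min_fraction:
                frequencies[(index + 1, ref_base, base)] = fraction
    return frequencies


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ATGT", "ACG."]
    print(variant_frequencies("ACGT", reads))
```

The `min_fraction` threshold is a stand-in for whatever criterion separates genuine low-frequency variants from sequencing errors; choosing it in practice depends on read depth and the error profile of the data.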

6. Recent breakthroughs achieved using the Genome Sequencer System

In October of 2006, Roche Applied Science and 454 Life Sciences announced the publication of the 100th peer-reviewed study using the Genome Sequencer System, including 12 papers in Nature, 11 papers in Science, 10 papers in Nucleic Acids Research, and 4 papers in Cell. These first 100 studies span a diverse group of DNA sequencing applications, including de novo sequencing and re-sequencing of whole genomes, metagenomics, RNA analysis, and targeted sequencing of DNA regions of interest. The following paragraphs provide a short overview of some selected breakthroughs achieved using the system.

6.1. Re-sequencing of the human genome

The 454 Sequencing technology was the first 2nd generation sequencing technology used to decipher the genome of an individual human being, Dr. James D. Watson. The data were initially analyzed by the Human Genome Sequencing Center at the Baylor College of Medicine, Houston, USA, and the publicly available sequence is expected to provide many new insights into human genetics.

6.2. Re-sequencing of the human exome and targeted gene regions

One of the most recent breakthroughs in human genome re-sequencing has also been achieved by the Human Genome Sequencing Center at Baylor College of Medicine, in cooperation with Roche NimbleGen (Albert et al., 2007). Based on the Roche NimbleGen capturing and enrichment procedure, 6726 approximately 500-base ‘exon’ segments, and ‘locus-specific’ regions ranging in size from 200 kb to 5 Mb, were extracted from the human genome and quickly sequenced on a GS FLX System. The direct enrichment method avoids the need for the tedious and expensive generation of thousands of small and long-range PCR products and can, in combination with the 454 Sequencing technology, be considered a milestone towards routine sequencing of the human genome.

6.3. Analysis of structural variations in the human genome

Recently, a group of researchers at Yale University used 454’s new 100 nt paired-end sequencing strategy to determine structural differences between human genomes. In the two human genomes analyzed, more than 1000 different structural variations were found, suggesting that structural variation is responsible for a larger number of differences between the genomes of two individuals than single-nucleotide polymorphisms (Korbel et al., 2007).

6.4. Studying the molecular basis for eusociality using expression profiling

In order to study the molecular mechanisms underlying eusociality in wasp colonies, researchers from the University of Illinois in the US recently analyzed differences in the gene expression profiles in the brains of wasp queens, gynes, workers, and foundresses (Toth et al., 2007). For the first time it could be shown that gene expression in worker wasp brains was more similar to that of foundresses, which show maternal care, than to that of queens and gynes, which do not. Insulin-related genes were among the differentially regulated genes, suggesting that the evolution of eusociality involved major nutritional and reproductive pathways.

6.5. Metagenomics and microbial diversity

Mitchell Sogin and team from Woods Hole Oceanographic Institution analyzed the microbial diversity of deep sea samples using the amplicon sequencing procedure of the Genome Sequencer System (Sogin et al., 2006). They found that the microbial diversity in their samples was 1–2 orders of magnitude more complex than previously reported.

A study published by a team at Washington University suggests that the microflora in obese individuals differs significantly from that of lean individuals (Turnbaugh et al., 2006). Furthermore, the injection of microbiota isolated from obese mice into the gut of a germ-free lean mouse resulted in symptoms of obesity.

Using a breakthrough transcriptome metagenomics approach, a team from Columbia University found a significant connection between an RNA virus identified in honey bees and honey bee colony collapse disorder (Cox-Foster et al., 2007).

6.6. Ancient DNA

Researchers from Pennsylvania State University in the US used the Genome Sequencer System to re-sequence the woolly mammoth genome (Poinar et al., 2006). At the MPI in Leipzig, Germany, Svante Pääbo and colleagues are sequencing the Neanderthal genome, with the aim of better understanding the evolutionary background of variations in the human genome (Green et al., 2006).

7. Even longer reads, considerably higher throughput and significant reductions in the costs per base in 2008

Based upon the exciting research of customers worldwide, one can expect that many more groundbreaking studies are pending. Planned improvements to the GS FLX will include an increase in sequence read length beyond 400 base pairs and throughput in excess of 1 billion bases per day (>5 gigabases in 5 working days). These system improvements can be used on the same GS FLX hardware configuration that is currently available and in use in labs around the world today.

References

Albert, T.J., Molla, M.N., Muzny, D.M., Nazareth, L., Wheeler, D., Song, X., Richmond, T.A., Middle, C.M., Rodesch, M.J., Packard, C.J., Weinstock, G.M., Gibbs, R.A., 2007. Direct selection of human genomic loci by microarray hybridization. Nature Methods 4 (11), 903–905.

Andries, K., Verhasselt, P., Guillemont, J., Göhlmann, H.W.H., Neefs, J.-M., Winkler, H., Van Gestel, J., Timmerman, P., Zhu, M., Lee, E., Williams, P., de Chaffoy, D., Huitric, E., Hoffner, S., Cambau, E., Truffot-Pernot, C., Lounis, N., Jarlier, V., 2005. A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. Science 307 (5707), 223–227.

Axtell, M.J., Snyder, J.A., Bartel, D.P., 2007. Common functions for diverse small RNAs of land plants. Plant Cell 19 (6), 1750–1769.

Bainbridge, M.N., Warren, R.L., Hirst, M., Romanuik, T., Zeng, T., Go, A., Delaney, A., Griffith, M., Hickenbotham, M., Magrini, V., Mardis, E.R., Sadar, M.D., Siddiqui, A.S., Marra, M.A., Jones, S.J.M., 2006. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics 7, 246.

Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E., Plasterk, R.H., 2006. Diversity of microRNAs in human and chimpanzee brain. Nature Genetics 38 (12), 1375–1377.

Cox-Foster, D.L., Conlan, S., Holmes, E.C., Palacios, G., Evans, J.D., Moran, N.A., Quan, P.-L., Briese, T., Hornig, M., Geiser, D.M., Martinson, V., vanEngelsdorp, D., Kalkstein, A.L., Drysdale, A., Hui, J., Zhai, J., Cui, L., Hutchison, S.K., Simons, J.F., Egholm, M., Pettis, J.S., Lipkin, W.I., 2007. A metagenomic survey of microbes in honey bee colony collapse disorder. Science 318 (5848), 283–287.

Dahl, F., Stenberg, J., Fredriksson, S., Welch, K., Zhang, M., Nilsson, M., Bicknell, D., Bodmer, W.F., Davis, R.W., Ji, H., 2007. Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proceedings of the National Academy of Sciences U.S.A. 104 (22), 9387–9392.

Emrich, S.J., Barbazuk, W.B., Li, L., Schnable, P.S., 2007. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Research 17 (1), 69–73.

Eveland, A.L., McCarty, D.R., Koch, K.E., 2008. Transcript profiling by 3′-UTR sequencing resolves expression of gene families. Plant Physiology 146, 32–44.

Gingeras, T.R., 2007. Origin of phenotypes: genes and transcripts. Genome Research 17, 682–690.

Girard, A., Sachidanandam, R., Hannon, G.J., Carmell, M.A., 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442 (7099), 199–202.

Gowda, M., Li, H., Alessi, J., Chen, F., Pratt, R., Wang, G.-L., 2006. Robust analysis of 5′-transcript ends (5′-RATE): a novel technique for transcriptome analysis and genome annotation. Nucleic Acids Research 34 (19), e126.

Green, R.E., Krause, J., Ptak, S.E., Briggs, A.W., Ronan, M.T., Simons, J.F., Du, L., Egholm, M., Rothberg, J.M., Paunovic, M., Pääbo, S., 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444 (7117), 330–336.

Hoffmann, C., Minkah, N., Leipzig, J., Wang, G., Arens, M.Q., Tebas, P., Bushman, F.D., 2007. DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Research 35 (13), e91.

Hogg, J.S., Hu, F.Z., Janto, B., Boissy, R., Hayes, J., Keefe, R., Post, J.C., Ehrlich, G.D., 2007. Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. Genome Biology 8, R103, doi:10.1186/gb-2007-8-6-r103.

Huber, J.A., Welch, D.B.M., Morrison, H.G., Huse, S.M., Neal, P.R., Butterfield, D.A., Sogin, M.L., 2007. Microbial population structures in the deep marine biosphere. Science 318 (5847), 97–100.

Korbel, J.O., Urban, A.E., Affourtit, J.P., Godwin, B., Grubert, F., Simons, J.F., Kim, P.M., Palejev, D., Carriero, N.J., Du, L., Taillon, B.E., Chen, Z., Tanzer, A., Saunders, A.C.E., Chi, J., Yang, F., Carter, N.P., Hurles, M.E., Weissman, S.M., Harkins, T.T., Gerstein, M.B., Egholm, M., Snyder, M., 2007. Paired-end mapping reveals extensive structural variation in the human genome. Science 318 (5849), 420–426.

Korshunova, Y., Maloney, R.K., Lakey, N., Citek, R.W., Bacher, B., Budiman, A., Ordway, J.M., McCombie, W.R., Leon, J., Jeddeloh, J.A., McPherson, J.D., 2008. Massively parallel bisulphite pyrosequencing reveals the molecular complexity of breast cancer-associated cytosine-methylation patterns obtained from tissue and serum DNA. Genome Research 18, 19–29.

Leininger, S., Urich, T., Schloter, M., Schwark, L., Qi, J., Nicol, G.W., Prosser, J.I., Schuster, S.C., Schleper, C., 2006. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 442 (7104), 806–809.

Oh, J.D., Kling-Bäckhed, H., Giannakis, M., Xu, J., Fulton, R.S., Fulton, L.A., Cordum, H.S., Wang, C., Elliott, G., Edwards, J., Mardis, E.R., Engstrand, L.G., Gordon, J.I., 2006. The complete genome sequence of a chronic atrophic gastritis Helicobacter pylori strain: evolution during disease progression. Proceedings of the National Academy of Sciences U.S.A. 103 (26), 9999–10004.

Pearson, B.M., Gaskin, D.J.H., Segers, R.P.A.M., Wells, J.M., Nuijten, P.J.M., van Vliet, A.H.M., 2007. The complete genome sequence of Campylobacter jejuni strain 81116 (NCTC11828). Journal of Bacteriology 189 (22), 8402–8403.

Pettersson, E., Zajac, P., Ståhl, P.L., Jacobsson, J.A., Fredriksson, R., Marcus, C., Schiöth, H.B., Lundeberg, J., Ahmadian, A., 2007. Allelotyping by massively parallel pyrosequencing of SNP-carrying trinucleotide threads. Human Mutation 29 (2), 323–329.

Poinar, H.N., Schwarz, C., Qi, J., Shapiro, B., Macphee, R.D., Buigues, B., Tikhonov, A., Huson, D.H., Tomsho, L.P., Auch, A., Rampp, M., Miller, W., Schuster, S.C., 2006. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311 (5759), 392–394.

Pol, A., Heijmans, K., Harhangi, H.R., Tedesco, D., Jetten, M.S.M., Op den Camp, H.J.M., 2007. Methanotrophy below pH 1 by a new Verrucomicrobia species. Nature 450 (7171), 874–878.

Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., Bartel, D.P., 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127 (6), 1193–1207.

Ruby, J.G., Jan, C.H., Bartel, D.P., 2007. Intronic microRNA precursors that bypass Drosha processing. Nature 448 (7149), 83–86.

Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M., Huse, S.H., Neal, P.R., Arrieta, J.M., Herndl, G.J., 2006. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’. Proceedings of the National Academy of Sciences U.S.A. 103 (32), 12115–12120.

Tauch, A., Trost, E., Bekel, T., Goesmann, A., Ludewig, U., Pühler, A., 2006. Ultrafast de novo sequencing of the human pathogen Corynebacterium urealyticum with the Genome Sequencer System. Roche Application note; www.genome-sequencing.com.

Taylor, K.H., Kramer, R.S., Davis, J.W., Guo, J., Duff, D.J., Xu, D., Caldwell, C.W., Shi, H., 2007. Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 Sequencing. Cancer Research 67 (18), 8511–8518.

Thomas, R.K., Nickerson, E., Simons, J.F., Jänne, P.A., Tengs, T., Yuza, Y., Garraway, L.A., LaFramboise, T., Lee, J.C., Shah, K., O’Neill, K., Sasaki, H., Lindeman, N., Wong, K.K., Borras, A.M., Gutmann, E.J., Dragnev, K.H., DeBiasi, R., Chen, T.H., Glatt, K.A., Greulich, H., Desany, B., Lubeski, C.K., Brockman, W., Alvarez, P., Hutchison, S.K., Leamon, J.H., Ronan, M.T., Turenchalk, G.S., Egholm, M., Sellers, W.R., Rothberg, J.M., Meyerson, M., 2005. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nature Medicine 12 (7), 852–855.

Toth, A.L., Varala, K., Newman, T.C., Miguez, F.E., Hutchison, S.K., Willoughby, D.A., Simons, J.F., Egholm, M., Hunt, J.H., Hudson, M.E., Robinson, G.E., 2007. Wasp gene expression supports an evolutionary link between maternal behavior and eusociality. Science 318 (5849), 441–444.

Turnbaugh, P.J., Ley, R.E., Mahowald, M.A., Magrini, V., Mardis, E.R., Gordon, J.I., 2006. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444 (7122), 1027–1031.

Velasco, R., Zharkikh, A., Troggio, A., Cartwright, D.A., Cestaro, A., Pruss, D., Pindo, M., FitzGerald, L.M., Vezzulli, S., Reid, J., Malacarne, G., Iliev, D., Coppola, G., Wardell, B., Micheletti, D., Macalma, T., Facci, M., Mitchell, J.T., Perazzolli, M., Eldredge, G., Gatto, P., Oyzerski, R., Moretto, M., Gutin, N., Stefanini, M., Chen, Y., Segala, C., Davenport, C., Demattè, L., Mraz, A., Battilana, J., Stormo, K., Costa, F., Tao, Q., Si-Ammour, A., Harkins, T., Lackey, A., Perbost, C., Taillon, B., Stella, A., Solovyev, V., Fawcett, J.A., Sterck, L., Vandepoele, K., Grando, S.M., Toppo, S., Moser, C., Lanchbury, J., Bogden, R., Skolnick, M., Sgaramella, V., Bhatnagar, S.K., Fontana, P., Gutin, A., Van de Peer, Y., Salamini, F., Viola, R., 2007. A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS One 2 (12), e1326.

Wang, C., Mitsuya, Y., Gharizadeh, B., Ronaghi, M., Shafer, R.W., 2007. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Research 17 (8), 1195–1201.

Warnecke, F., Luginbühl, P., Ivanova, N., Ghassemian, M., Richardson, T.H., Stege, J.T., Cayouette, M., McHardy, A.C., Djordjevic, G., Aboushadi, N., Sorek, R., Tringe, S.G., Podar, M., Martin, H.G., Kunin, V., Dalevi, D., Madejska, J., Kirton, E., Platt, D., Szeto, E., Salamov, A., Barry, K., Mikhailova, N., Kyrpides, N.C., Matson, E.G., Ottesen, E.A., Zhang, X., Hernández, M., Murillo, C., Acosta, L.G., Rigoutsos, I., Tamayo, G., Green, B.D., Chang, C., Rubin, E.M., Mathur, E.J., Robertson, D.E., Hugenholtz, P., Leadbetter, J.R., 2007. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450 (7169), 560–565.

Weber, A.P., Weber, K.L., Carr, K., Wilkerson, C., Ohlrogge, J.B., 2007. Sampling the Arabidopsis transcriptome with massively-parallel pyrosequencing. Plant Physiology 144 (1), 32–42.

Wegley, L., Edwards, R., Rodriguez-Brito, B., Liu, H., Rohwer, F., 2007. Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environmental Microbiology 9 (11), 2707–2719.

Yao, Y., Guo, G., Ni, Z., Sunkar, R., Du, J., Zhu, J.-K., Sun, Q., 2007. Cloning and characterization of microRNAs from wheat (Triticum aestivum L.). Genome Biology 8 (6), R96.