Genome sequencing on nanoballs

Nature Biotechnology, volume 28, number 1, January 2010, pp. 43–44

Gregory J Porreca

Advances in technology deliver cheaper human genome sequencing.

Gregory J. Porreca is at Good Start Genetics, Boston, MA. e-mail: [email protected]

The rapid pace of innovation in the field of genome sequencing continues with a recent publication in Science by Drmanac et al.¹. The authors resequenced three full human genomes using a next-generation technology that combines highly efficient imaging on ordered arrays with an inexpensive ligation-based chemistry. These technological improvements further reduce the cost of human genome sequencing.

Next-generation sequencing technologies generate up to billions of short reads in a run. All of these approaches use either polymerase or ligase to identify each base with a fluorescent signal that is read by a microscope and a digital camera. Development of these systems has focused on manipulating and arraying DNA such that it can be seen by the camera and sequenced, and on devising a sequencing chemistry with sufficient accuracy and read-length. For effective human genome sequencing, the individual DNA spots should be small (≤1 µm) and present at high density (approaching 1 million spots per mm²). Furthermore, the reads must be long enough (>30 bp) to allow unambiguous alignment to the reference sequence, which is the first step in identifying variants.

Drmanac et al.¹ have met these goals with a platform that integrates several technologies: (i) a library-generation protocol that transforms fragments of genomic DNA into highly engineered molecules; (ii) a method for generating spots, called DNA nanoballs, and arraying them in highly dense grids for efficient imaging; and (iii) a nonprogressive chemistry (that is, errors do not accumulate because each base is read from a fresh sequencing primer) that uses ligation with partially degenerate sequencing primers¹⁻³ to yield accurate ~70-bp reads split across eight priming sites (Fig. 1).

What is most intriguing is how the platform approaches several technical optima that in concert drive down cost. First, the amount of reagent used is dictated by the area and height of the instrument’s flow-cell chamber. Drmanac et al.¹ recognized that the chamber’s height does not affect sequencing performance because the submicron-sized DNA features are attached to its surface. So they devised a process to manufacture thin chambers and perfected a way to flow liquid through them, potentially enabling significant cost savings over current systems with thicker flow-cells.

Second, throughput is driven both by the speed of the camera and by how many spots can be packed into a single image. The authors used the electron-multiplied charge-coupled device (CCD) present in several other sequencing systems² (http://www.polonator.org/), which is faster and more sensitive than what is found in the most popular next-generation platforms on the market. They combined this camera with one of their key innovations, a patterned array of DNA nanoballs. These compact chains of amplified DNA assemble into a densely packed grid of spots on the flow-cell surface, maximizing the yield of useful sequenced bases from camera pixels. Nanoballs offer a higher array packing density than bridge amplification⁴, because they physically exclude other DNA molecules from their spot on the grid, and a much easier and cheaper workflow than emulsion PCR⁵, because they are prepared in a simple reaction that does not waste most of the amplification reagents on empty emulsion bubbles. The result is an instrument capable of maximal throughput, given today’s camera technology, and therefore minimal capital cost per base pair.

It is difficult to directly compare sequencing cost between different platforms. This is because, for genomic resequencing, cost is driven by the coverage required to achieve the desired accuracy. Different platforms may require different levels of coverage to achieve the same accuracy, so comparisons have to be made by fixing either coverage, to measure differences in cost and accuracy, or accuracy, to measure differences in coverage and cost⁶. There are other considerations as well. Different mutations (e.g., homozygous versus heterozygous substitutions, insertions or deletions) are generally sequenced with different accuracy. Moreover, the reference standard used to verify mutations must be more accurate than the sequence in question, and this is difficult to achieve with the genome-wide single-nucleotide polymorphism chips that are often used. All of these factors combine to confound a simple bases-per-dollar comparison.

So what can be said about the relative cost of this approach? Drmanac et al.¹ sequenced three genomes at a coverage of 45–87× and at an average reagent cost of $4,400 per genome. By their estimates, one false-positive sequencing error occurred every 100,000 bases, an accuracy on par with, or better than, that of other popular sequencing platforms. Complete Genomics, the company associated with the study by Drmanac et al.¹, has positioned itself as a service provider of full human genome sequences rather than as a vendor of sequencing instruments and reagents. Of course, the cost of reagents does not include equipment, labor and data-handling, and is not the same as the price charged to customers for a genome sequence. One appropriate comparison is therefore with Illumina’s Personal Genome Sequencing Service, which delivers a full human genome sequence (on an iMac computer) for $48,000. Complete Genomics currently charges $20,000 for a sequence of similar accuracy¹,⁴,⁷.

With time, it is certain that prices across all vendors will fall. Complete Genomics has a target price of $5,000 per genome for ‘bulk’ orders⁷, a substantial drop that is certainly possible in the near term. Reducing prices significantly beyond that will likely require further innovation. For instance, substantial increases in instrument throughput could be achieved by switching from CCD cameras to the much faster and cheaper complementary metal oxide semiconductor (CMOS) technology. But CMOS is less sensitive, so the DNA nanoballs would have to be made brighter, which may require considerable research and development.

As Complete Genomics makes progress in process automation and robustness, they may be able to address applications beyond human genome sequencing, including gene expression analysis, chromatin immunoprecipitation and metagenomics. For these, part of the difficulty will be in the process scaling and multiplexing required to accommodate the ultra-high throughput of their machines. In addition, for quantitative applications, it will be important to ensure they can calibrate for biases introduced by the library- and nanoball-generation protocols.
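The cost and accuracy figures quoted above for Drmanac et al. can be put side by side with some back-of-envelope arithmetic. The sketch below is only illustrative: the genome size of 3 billion bases is an assumption that does not come from the article, the function names are hypothetical, and pairing the average reagent cost with each end of the coverage range merely brackets the bases-per-dollar figure, since the per-genome costs are not given.

```python
# Back-of-envelope arithmetic on figures quoted in the article.
# GENOME_SIZE_BP is an assumption added for illustration; it is not from the text.

GENOME_SIZE_BP = 3_000_000_000        # assumed round haploid human genome size
REAGENT_COST_USD = 4_400              # average reagent cost per genome (from the text)
COVERAGE_RANGE = (45, 87)             # fold coverage of the three genomes (from the text)
BASES_PER_FALSE_POSITIVE = 100_000    # one false-positive error per 100,000 bases (from the text)
SERVICE_PRICES_USD = {                # per-genome service prices quoted in the text
    "Illumina Personal Genome Sequencing Service": 48_000,
    "Complete Genomics (current)": 20_000,
    "Complete Genomics (bulk target)": 5_000,
}


def expected_false_positives(genome_size_bp, bases_per_error):
    """Expected false positives if the quoted rate held uniformly across the genome."""
    return genome_size_bp / bases_per_error


def raw_bases_per_dollar(genome_size_bp, coverage, cost_usd):
    """Raw sequenced bases bought per reagent dollar at a given fold coverage."""
    return genome_size_bp * coverage / cost_usd


if __name__ == "__main__":
    fp = expected_false_positives(GENOME_SIZE_BP, BASES_PER_FALSE_POSITIVE)
    print(f"Expected false positives per genome: {fp:,.0f}")

    for coverage in COVERAGE_RANGE:
        bpd = raw_bases_per_dollar(GENOME_SIZE_BP, coverage, REAGENT_COST_USD)
        print(f"{coverage}x coverage: {bpd / 1e6:.1f} million raw bases per reagent dollar")

    baseline = SERVICE_PRICES_USD["Complete Genomics (current)"]
    for name, price in SERVICE_PRICES_USD.items():
        print(f"{name}: ${price:,} ({price / baseline:.2f}x the current Complete Genomics price)")
```

Under the assumed genome size, an error every 100,000 bases corresponds to about 30,000 false positives per genome. The raw bases-per-dollar figure deliberately ignores accuracy, which is the reason the article cautions against using it to compare platforms.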

News and Views. © 2010 Nature America, Inc. All rights reserved.

As costs continue to drop, the use of sequencing in diagnostics is expected to increase dramatically. This large market imposes significant requirements on any technology it adopts. Costs must be very low to displace existing technologies, and accuracy must be extremely high. False-positive mutation calls drive up assay cost by requiring expensive and time-intensive verification, and false-negative calls (which generally cannot be verified) are a source of diagnostic error. Thus, accuracy must be high, quantifiable and thoroughly measured in advance of releasing an assay into the clinic. What’s more, continual monitoring of assay performance and compliance with clinical laboratory best-practices are imperative if a sequence is to be considered actionable medical advice.

In the span of a few short years, the mature technology of capillary sequencing has been supplanted by new sequencing approaches that offer tremendous increases in how much we can afford to sequence and how quickly we can do it. As the technology advances, focus will shift from the initial feats of sequencing single genomes to the ongoing challenge of producing lots of sequence accurately and efficiently. In this endeavour, the platform of Drmanac et al.¹ is sure to remain in the mix.

COMPETING INTERESTS STATEMENT

The author declares competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

1. Drmanac, R. et al. Science, published online, doi:10.1126/science.1181498 (5 November 2009).

2. Shendure, J.A. et al. Science 309, 1728–1732 (2005).

3. Church, G.M. et al. US appl. no. 2007/0207482 (2007).

4. Bentley, D.R. et al. Nature 456, 53–59 (2008).

5. McKernan, K.J. et al. Genome Res. 19, 1527–1541 (2009).

6. Fuller, C.W. et al. Nat. Biotechnol. 27, 1013–1023 (2009).

7. Karow, J. Complete Genomics details low-cost sequencing tech in paper; collaborators encouraged by results. InSequence <http://www.genomeweb.com/sequencing/complete-genomics-details-low-cost-sequencing-tech-paper-collaborators-encourage> (10 November 2009).

Figure 1 Comparison of the sequencing process in Drmanac et al.¹ (left) and in a previous sequencing-by-ligation method² (right). (a) Genomic DNA is converted into library molecules. Each molecule contains four segments of genomic DNA (i–iv), flanked by priming sites (shown in purple). Each library molecule is converted into a linear concatemer of itself to become a DNA nanoball. (b) Billions of DNA nanoballs are added to a silicon slide that contains a grid-like pattern of binding sites, which causes the nanoballs to self-assemble into a dense grid of spots for sequencing, maximizing the number of useful sequenced bases in each image (see d). (c) Ligation-based sequencing chemistry is used to interrogate bases of genomic DNA in the library molecules. Each cycle of sequencing tags the DNA nanoballs with a fluorophore whose color identifies the base (A, C, T or G) present at a specific position. The chemistry allows 5–10 contiguous bases to be read from each of the eight priming sites in the library molecule. (d) Digital images of the patterned arrays are taken after each sequencing reaction. The images are computationally analyzed to generate billions of raw sequence reads. These reads are then processed with assembly and analysis software to accurately identify mutations.

[Figure not reproduced. Panel titles: a, Library generation; b, Arraying; c, Ligation-based sequencing; d, Imaging. Labels for Drmanac et al.¹: library molecule, rolling circle amplification, DNA nanoballs; patterned array in low-volume flow cell; degenerate anchors allow 62–70 bases per spot; more spots per image. Labels for the previous approach²: emulsion PCR, DNA captured with beads; random array in larger flow cell; standard anchors allow 26 bases per spot; fewer spots per image. Panel c shows genomic DNA, anchor, probes and ligase.]

Illustration credit: Kim Caesar
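The read structure described in the text and in Figure 1 lends itself to a quick consistency check. The sketch below uses only numbers quoted there (eight priming sites, 5–10 bases per site, 62–70 versus 26 bases per spot, and a density approaching 1 million spots per mm²). The function names are hypothetical, and treating every spot as yielding a full read at exactly that density is a simplifying assumption, so the per-area figures should be read as upper bounds rather than measured yields.

```python
# Arithmetic on the read structure described in the text and in Figure 1.
# Treating every array spot as yielding a full read is a simplifying assumption.

PRIMING_SITES = 8                     # priming sites per library molecule (from the text)
BASES_PER_SITE = (5, 10)              # contiguous bases readable per site (from the caption)
REPORTED_BASES_PER_SPOT = (62, 70)    # degenerate anchors (figure label)
STANDARD_BASES_PER_SPOT = 26          # standard anchors, previous approach (figure label)
SPOTS_PER_MM2 = 1_000_000             # density the text describes as being approached


def bases_per_spot_bounds(sites, per_site):
    """Bounds on bases per spot if every site gave its minimum or its maximum."""
    low, high = per_site
    return sites * low, sites * high


def bases_per_mm2(spots_per_mm2, bases_per_spot):
    """Sequenced bases per square millimetre, assuming every spot yields a full read."""
    return spots_per_mm2 * bases_per_spot


if __name__ == "__main__":
    low, high = bases_per_spot_bounds(PRIMING_SITES, BASES_PER_SITE)
    print(f"Bounds from 8 sites x 5-10 bases: {low}-{high} bases per spot")

    reported_low, reported_high = REPORTED_BASES_PER_SPOT
    print(f"Reported range inside bounds: {low <= reported_low <= reported_high <= high}")

    for reported in REPORTED_BASES_PER_SPOT:
        gain = reported / STANDARD_BASES_PER_SPOT
        density = bases_per_mm2(SPOTS_PER_MM2, reported)
        print(f"{reported} bases per spot: {gain:.1f}x standard anchors, "
              f"up to {density:,} bases per mm2")
```

The reported 62–70 bases per spot sits inside the 40–80 range implied by eight sites read for 5–10 bases each, which is consistent with the ~70-bp reads mentioned in the text.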
