
Solexa Sequencing: Decoding Genomes on a Population Scale

Shankar Balasubramanian1,2*

It has now been some 17 years since David Klenerman and I (Fig. 1) conceived a project that would subsequently enable rapid whole human genome sequencing. The core thinking that led to Solexa Sequencing was sparked by observations made during basic exploratory research carried out in our laboratories during the mid to late 1990s, and the redirection of these observations toward DNA sequencing was an unintended consequence. The founding ideas and proof-of-concept experiments were carried out at the University of Cambridge. The first commercial sequencing system was subsequently developed at Solexa Limited, a company we founded in 1998. Illumina Inc. acquired Solexa in early 2007 and made further improvements that led to several new sequencing systems. Today, the technology is used routinely to decode human genomes for medical research, clinical decision-making, and basic science, all at a cost and speed that make population-scale sequencing practical. The journey from concept to reduction to practice has already gone beyond my expectations in terms of performance, adoption, and early insights, yet the era of clinical whole-genome sequencing is perhaps only just beginning.

In the mid-1990s, in the University Chemical Laboratories, Cambridge, David Klenerman and I were using fluorescence single-molecule spectroscopy to observe the synthesis of DNA by a polymerase enzyme using fluorescently encoded nucleotides. The work itself, and the grant proposal that supported it, were fundamental in nature and said nothing of sequencing. Our attempts to improve the experimental design, so that we could better observe what we wished to see, suggested to us a means of decoding a strand of immobilized DNA by single-molecule fluorescence imaging. It was not immediately clear what the benefits of decoding DNA on a surface would be, until our awareness of DNA microarrays prompted the realization that the decoding of immobilized DNA could be parallelized in an array format, giving the potential to sequence DNA on a massive scale. These purely technical ideas were stimulating, but a greater purpose was needed to drive us to commit them to a serious effort. Klenerman and I were aware of the Human Genome Project activities at the nearby Sanger Institute at Hinxton Hall, and in early 1998 we visited 3 of the key scientists there, David Bentley, Richard Durbin, and Jane Rogers. We were most impressed by the scale and organization of the sequencing activities at the Sanger Institute. It was also striking that even with all this capacity (along with that of the other participating genome centers around the world), it would take about a decade to produce the first human reference genome. In tearoom discussions with our 3 hosts, we felt confident that the reference human genome would emerge during the subsequent 10 years. We described our strong desire to create a new method of decoding human genomes that would improve on the methods then in use by many orders of magnitude. The Human Genome Project was, after all, going to produce only 1 human genome, and it was evident that very many genomes would be needed to pin down the genetic basis of biological function, dysfunction, and disease. Equipped with new insights and an enthusiastic vote of confidence from our hosts, we returned to Lensfield Road and to the task of reducing our ideas to practice.
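The idea of decoding many immobilized strands in parallel can be illustrated with a toy simulation. It assumes, for illustration only, that in each cycle every spot on the array incorporates exactly one fluorescently labeled nucleotide complementary to its template, that one image records one colour per spot, and that the label is then removed before the next cycle. The dye-to-base assignment and the function names `image_cycle` and `sequence_array` are hypothetical; this is not Solexa's chemistry or software.

```python
"""Toy model of cyclic, array-parallel decoding of immobilized DNA.

Hypothetical illustration only: the dye colours, function names, and the
one-base-per-cycle assumption are not taken from Solexa's actual system.
"""

# Base pairing used by the polymerase when copying a template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical assignment of one fluorescent colour per nucleotide type.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def image_cycle(templates: list[str], cycle: int) -> list[str]:
    """Return the colour seen at every spot after one incorporation step.

    Each spot adds the nucleotide complementary to the template base at
    position `cycle`, so a single image reports one base per spot.
    """
    return [DYE[COMPLEMENT[template[cycle]]] for template in templates]


def sequence_array(templates: list[str], n_cycles: int) -> list[str]:
    """Decode n_cycles bases of the synthesized strand at every spot in parallel."""
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        colours = image_cycle(templates, cycle)  # one image covers the whole array
        for spot, colour in enumerate(colours):
            reads[spot] += COLOUR_TO_BASE[colour]
        # In the assumed chemistry, the label would be removed here so that
        # the next nucleotide can be added in the following cycle.
    return reads


if __name__ == "__main__":
    array = ["ACGT", "TTGA", "GGCA"]
    print(sequence_array(array, 4))
```

Each read is the complement of its template (for example, `ACGT` yields `TGCA`), with templates written in the order the polymerase encounters their bases. The number of imaging cycles depends on read length, not on the number of spots, which is where the parallelism of an array format comes from.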

1 Professor, Department of Chemistry, University of Cambridge; 2 CRUK Cambridge Institute, University of Cambridge, Cambridge, UK.
* Address correspondence to this author at: Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK CB2 1EW. E-mail: [email protected].
Received August 22, 2014; accepted September 12, 2014.
Previously published online at DOI: 10.1373/clinchem.2014.221747

Clinical Chemistry 61:1 000–000 (2015) Reflections

http://hwmaint.clinchem.org/cgi/doi/10.1373/clinchem.2014.221747
The latest version is at Papers in Press. Published October 20, 2014 as doi:10.1373/clinchem.2014.221747

Copyright (C) 2014 by The American Association for Clinical Chemistry

The early work at Lensfield Road focused on chemically adapting fluorescently tagged nucleotide triphosphates so that they could be incorporated one at a time with complete chemical control (Fig. 2). This also involved screening every DNA polymerase we could lay our hands on, to understand which classes of polymerases would tolerate the types of changes we wanted to make to our nucleotides. In parallel, we investigated methods for immobilizing DNA on surfaces that would allow stable single-molecule fluorescence imaging. Our early attempts to address these key questions delivered sufficient proof of concept to warrant a more serious effort to integrate all the necessary elements into a sequencing system. I thought the best vehicle for this would be a start-up company, which would provide the impetus to raise the necessary funds and to build an interdisciplinary team that could be scaled up rapidly to develop a robust commercial system. We raised venture capital investment and started Solexa Limited in the summer of 1998. Initially, Solexa operated in virtual mode, with all of the experimental work being incubated in our university laboratories; success and further proof of concept led to further investment and, in 2000, a move to external premises near the Sanger Institute. Details of the technical aspects have been described elsewhere (1–3). The first whole genome to be sequenced by the Solexa approach was that of the bacteriophage φX174, in 2005. An important change from our original technical vision was to move away from single-molecule to multimolecule sequencing by forming clusters from an array seeded by single molecules of DNA. This decision was prompted by a conversation I had with Sydney Brenner in December 2002, just a few days before he received his Nobel Prize in Stockholm, when he convinced me that clusters would be a more pragmatic way of reducing the stochastic errors generated in single-molecule sequencing. He was right! Furthermore, clusters provided stronger signals, leading to systems that could use relatively inexpensive cameras compared with those required for single-molecule detection.
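Why clusters reduce stochastic errors can be sketched with a simple probability model. Assume, purely for illustration, that each molecule in a cluster independently reports a wrong base with probability `e` in a given cycle, and that the cluster is called by plurality vote (a real instrument works from summed fluorescence intensities rather than a vote). If fewer than half of the copies are wrong, the true base holds a strict majority, so the miscall probability is at most the binomial tail for "at least ceil(n/2) copies wrong". The function name `cluster_miscall_bound` and the 10% error rate are hypothetical.

```python
"""Toy comparison of single-molecule and cluster base calling.

Assumes independent per-molecule errors and a plurality-vote call; this
illustrates the statistics only and is not Solexa's or Illumina's base caller.
"""
from math import comb


def cluster_miscall_bound(e: float, n: int) -> float:
    """Upper bound on the chance that a cluster of n copies is miscalled.

    If fewer than half of the n copies report a wrong base, the correct base
    has a strict majority and wins the vote, so a miscall requires at least
    ceil(n / 2) wrong copies. The bound is the binomial tail for that event.
    """
    k_min = (n + 1) // 2  # ceil(n / 2) for positive integers
    return sum(comb(n, k) * e**k * (1 - e) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    e = 0.10  # hypothetical per-molecule error rate
    for n in (1, 3, 11, 101):
        print(f"copies={n:4d}  miscall bound={cluster_miscall_bound(e, n):.2e}")
```

With `e = 0.10`, the bound equals the single-molecule error rate of 0.1 for one copy, falls to 0.028 for three copies, and drops below 0.001 for eleven. The model only captures independent, random errors; an error shared by every copy in a cluster would not be removed by voting or averaging.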

In 2006, Solexa released its first commercial sequencing system, the Genome Analyzer. It could accurately decode a billion bases of human DNA sequence in a single run, a milestone that was proudly announced in early January 2007. Shortly after that announcement, Illumina completed the acquisition of Solexa and its technology. At Illumina, the technology underwent continuous improvement, culminating in the HiSeq platform, released in 2011, which initially delivered 600 billion bases per experimental run and was later improved to a trillion bases. Other formats of the technology included the first desktop version, the MiSeq, which premiered in 2012; the desktop whole genome sequencer, the NextSeq 500, released in 2014; and the HiSeq X Ten system for population-scale sequencing, also released in 2014.

In November 1997, Klenerman and I had claimed (to investors) that our method would be scalable to a billion bases of DNA per experiment (a calculation originally done on the back of a beer mat in a pub). This capacity was realized in a commercial system in early January 2007 and then exceeded by 3 orders of magnitude by 2014. The cost of accurate whole genome sequencing fell from about 1 billion US dollars for the Human Genome Project to about 1000 US dollars today. This is about a million-fold improvement in speed and cost achieved over 17 years.
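The orders of magnitude quoted above can be checked directly from the figures in the text. The sketch below also derives one number the text does not state: the average year-on-year cost reduction implied by a million-fold fall over 17 years, under the simplifying assumption of an even (geometric) decline, which the real decline need not have followed. Treat that derived figure as rough; the constant and function names are illustrative.

```python
"""Back-of-the-envelope check of the scale-up figures quoted in the text."""
import math

# Figures as quoted in the text (approximate).
CLAIMED_BASES_PER_RUN = 1e9   # 1997 claim, realized commercially in January 2007
BASES_PER_RUN_2014 = 1e12     # "a trillion bases" per run
HGP_COST_USD = 1e9            # Human Genome Project
GENOME_COST_2014_USD = 1e3    # accurate whole genome in 2014
YEARS = 17                    # 1997 to 2014


def orders_of_magnitude(ratio: float) -> float:
    """Base-10 logarithm of a fold change."""
    return math.log10(ratio)


throughput_gain = BASES_PER_RUN_2014 / CLAIMED_BASES_PER_RUN
cost_fold = HGP_COST_USD / GENOME_COST_2014_USD
# Assumption: an even (geometric) decline, i.e. the same fold drop every year.
annual_fold = cost_fold ** (1 / YEARS)

if __name__ == "__main__":
    print(f"throughput: {orders_of_magnitude(throughput_gain):.1f} orders of magnitude")
    print(f"cost: {cost_fold:,.0f}-fold ({orders_of_magnitude(cost_fold):.1f} orders)")
    print(f"implied average yearly cost reduction: {annual_fold:.2f}-fold")
```

This reproduces the 3 orders of magnitude in throughput and the million-fold (6 orders) fall in cost, and implies an average cost reduction of roughly 2.25-fold per year if the decline had been spread evenly.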

I will now reflect on where clinical whole human genome sequencing stands and where it may be heading in the coming years. There are a good number of clinical areas where genome (or high-depth) sequencing has demonstrable potential to alter the course of clinical management, and I will mention just a few. Perhaps the most obvious case for whole genome sequencing is cancer, given the genetic causation and genetic uniqueness of every human cancer. Huge strides have been made toward understanding the genetic signatures of cancer. Large-scale efforts such as the International Cancer Genome Consortium have compiled huge and important data sets, from which common genetic signatures of cancers have been extracted (4). Real-time monitoring of the evolution of a patient’s cancer genome under a particular drug treatment can provide guidance to optimize and monitor the effectiveness of therapy (5). I had not, until recently, appreciated the potential impact of whole genome sequencing on rare diseases, which collectively afflict 1 in 17 people and are mostly genetic in origin, with most cases manifesting early in life, during childhood (6). There are already some published, well-documented examples of the diagnosis of childhood rare diseases by whole genome sequencing of the child and both parents (trio), and some pediatric clinics use whole genome sequencing routinely as part of their standard of care (7, 8). The third area where whole genome sequencing is beginning to make an impact is infectious disease. Whole genome sequencing of pathogens in a clinical setting can provide early diagnosis and help prevent outbreaks, and it is likely to become a routine part of clinical practice (9). Noninvasive analysis of cell-free DNA circulating in plasma, by genome-wide or high-depth targeted sequencing, has huge potential for prenatal diagnosis of fetal genetic disorders (10) and also for early detection and diagnosis of cancers (11).

Fig. 1. David Klenerman and me at our “local,” where we have enjoyed much creative discussion (and beer).

It has been extraordinary to experience the transformation from concepts that stemmed from basic research to a widely used technology in less than 20 years. It has also been remarkable to observe the rapid adoption of genome sequencing, and early signs of promise, in various segments of the clinical sector. Although one must be cautious not to overstate how far implementation in routine clinical practice will proceed, to me it is now beyond question that genome sequencing will be a lasting part of medicine. The recent launch by the UK National Health Service of a project to sequence the whole genomes of 100 000 UK patients (approximately 0.2% of the population) and to integrate the resulting data with their conventional clinical records (12) is a pioneering step toward the implementation of genomic medicine on a population scale.

I wish to acknowledge my collaborator David Klenerman, with whom I share the founding inventions and initiation of the Solexa project. I thank coworkers who were courageous enough to embark on this journey, particularly at the early stages when the risks were high. I also acknowledge the talented and dedicated people of Solexa and Illumina for the commercialization and continued improvement of this technology. I thank the Biotechnology and Biological Sciences Research Council of the UK for funding the basic science that provided the foundation for Solexa sequencing.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:

Employment or Leadership: S. Balasubramanian, Solexa.
Consultant or Advisory Role: S. Balasubramanian, Illumina.
Stock Ownership: S. Balasubramanian, Illumina.
Honoraria: None declared.
Research Funding: S. Balasubramanian, Biotechnology and Biological Sciences Research Council of the UK.
Expert Testimony: None declared.
Patents: US 8,158,346, US 7,772,384, US 7,427,673, US 7,057,026, US 8,623,628, US 8,394,586, US 8,148,064, US 7,785,796, US 7,566,537, and US 6,787,308.

Role of Sponsor: No sponsor was declared.

Fig. 2. Concept for solid-phase DNA sequencing drawn August 18, 1997.


References

1. Balasubramanian S. Decoding genomes at high speed: implications for science and medicine. Angew Chem Int Ed 2011;50:12406.

2. Balasubramanian S. Sequencing nucleic acids: from chemistry to medicine. RSC Chem Commun 2011;47:7281.

3. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008;456:53–9.

4. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415–21.

5. Jones SJM, Laskin J, Li YY, Griffith OL, An J, Bilenky M, et al. Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol 2010;11:R82.

6. Rare Disease UK [home page]. http://www.raredisease.org.uk/ (Accessed October 2014).

7. Saunders CJ, Miller NA, Soden SE, Dinwiddie DL, Noll A, Alnadi N, et al. Rapid whole-genome sequencing for genetic-disease diagnosis in neonatal intensive care units. Sci Transl Med 2012;4:154ra135.

8. Jacob HJ, Abrams K, Bick DP, Brodiel K, Dimmock DP, Farrell M, et al. Genomics in clinical practice: lessons from the front lines. Sci Transl Med 2013;5:194cm5.

9. Köser CU, Bryant JM, Becq J, Török ME, Ellington MJ, Marti-Renom MA, et al. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. N Engl J Med 2013;369:290–2.

10. Lo YM, Chan KC, Sun H, Chen EZ, Jiang P, Lun FM, et al. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci Transl Med 2010;2:61ra91.

11. Forshew T, Murtaza M, Parkinson C, Gale D, Tsui DW, Kaper F, et al. Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Sci Transl Med 2012;4:136ra68.

12. Genomics England [home page]. http://www.genomicsengland.co.uk/ (Accessed October 2014).
