Next generation sequencing technologies
Thomas Jarvie
454 Life Sciences, 20 Commercial Street, Branford, CT 06405, USA
Drug Discovery Today: Technologies Vol. 2, No. 3 2005
Editors-in-Chief
Kelvin Lam – Pfizer, Inc., USA
Henk Timmerman – Vrije Universiteit, The Netherlands
Emerging technologies
Section Editors:
Steve Gullans – RxGen, Inc., New Haven, CT, USA
Robert Zivin – Johnson and Johnson, New Brunswick, NJ, USA
From the investigation of disease-associated loci in
humans, to monitoring the changing genomes of
pathogenic viruses and bacteria, sequencing is a power-
ful and versatile tool. A new generation of sequencing
technologies will increase the speed and lower the cost
of sequencing, and promises to expand the utility of
sequencing in drug discovery and development.
Introduction
DNA sequencing is a central technology in our understanding
of biology and plays a significant, supporting role in drug
discovery and development. The Human Genome Project
and the resequencing of selected regions of the human
genome in disease association studies have contributed to
a refined understanding of the molecular basis of many
diseases. Sequencing of pathogenic microbes and drug resis-
tant strains has aided in our understanding of drug resistance,
the development of drug resistance over time and the
mechanism of action of new drugs. Additionally, the sequen-
cing of full viral genomes, or a subset of the genomes, derived
from clinical samples provides a picture of the course of
infection over time, response to antiviral therapies and an
insight into possible strategies for further drug development.
This review focuses on the next generation of sequencing
technologies and the potential for these technologies to
revolutionize pharmaceutical development.
E-mail address: T. Jarvie ([email protected])
1740-6749/$ © 2005 Elsevier Ltd. All rights reserved. DOI: 10.1016/j.ddtec.2005.08.003
The need for new sequencing methods
Electrophoresis-based Sanger sequencing is the most commonly used sequencing technology and was the mainstay of the Human Genome Project. A look into the GOLD database (http://www.genomesonline.org/) shows that Sanger-based sequencing has built a solid foundation of genomic sequence that the next generation of technologies can build upon through resequencing and comparative genomics studies. In addition to whole genome sequencing, Sanger-based sequencing has been used to sequence countless amplicons for applications such as verification of clones, searching for SNPs, forensic analysis and resequencing. Over the past 10 years, significant improvements in Sanger technology have cut the cost of sequencing from ~$10/kb to ~$1/kb. Over the same period, the throughput of a state-of-the-art instrument has increased from <10 kb/h to ~100 kb/h. The standard method of sequencing, however, might be nearing the end of the line for dramatic cost reductions and throughput increases.
The most high profile example of the drive to lower
sequencing cost is the goal of a $1000 human genome, a
goal that would enable the sequencing of individual human
genomes as a component of diagnostic and preventative
medicine, or personalized medicine. In addition to human
genome resequencing, several other applications for low cost,
high-throughput sequencing are discussed in the literature. A
few of the applications with pharmaceutical relevance are
sequencing of multiple strains of pathogenic bacteria to
monitor drug resistance and pathogenicity in bacteria, rese-
quencing selected regions to search for human variation in
populations [1,2], monitoring the onset of drug resistance in
HIV [3,4] or HCV, profiling tumors to guide cancer therapies
[5] and discerning the mechanism of action of antibiotics [6].
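To give a sense of the gap between current costs and the $1000 genome, the arithmetic can be sketched as follows. This is a rough illustration only: the genome size of about 3 Gb is an assumption that does not come from this article, the cost per kilobase is the approximate Sanger figure quoted earlier, and a single pass over the genome ignores the redundant coverage, assembly and labor that a real project requires.

```python
# Rough cost arithmetic. GENOME_SIZE_BP is an assumption, not a figure from the article.
GENOME_SIZE_BP = 3_000_000_000   # assumed ~3 Gb haploid human genome
COST_PER_KB_USD = 1.0            # approximate Sanger cost quoted in the text
TARGET_COST_USD = 1_000.0        # the "$1000 genome" goal


def single_pass_cost(genome_bp: int, cost_per_kb: float) -> float:
    """Cost of reading every base once (1x coverage), nothing else included."""
    return genome_bp / 1_000 * cost_per_kb


def required_fold_reduction(current_cost: float, target_cost: float) -> float:
    """How many times cheaper sequencing must become to reach the target."""
    return current_cost / target_cost


cost = single_pass_cost(GENOME_SIZE_BP, COST_PER_KB_USD)
fold = required_fold_reduction(cost, TARGET_COST_USD)
print(f"Single pass at ${COST_PER_KB_USD:.0f}/kb: ${cost:,.0f}")
print(f"Fold reduction needed for a ${TARGET_COST_USD:,.0f} genome: {fold:,.0f}x")
```

Under these assumptions a single pass costs on the order of millions of dollars, so the goal implies a reduction of several thousand-fold even before redundant coverage is considered.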
For sequence-based studies to play a more central role in
pharmaceutical research, the cost and time associated with
sequencing must be reduced. The flexibility in experimental design afforded by quicker and more cost-efficient sequencing promises not only to make current experiments more feasible, but also to enable new sequence-based experiments.
Several academic labs, start-up companies and large instru-
ment companies are all developing a variety of technologies
aimed at lowering the cost and increasing the throughput of
sequencing. In October 2004, the US National Human Genome Research Institute (NHGRI) awarded $38 million to 18 companies and academic laboratories to develop the next generation sequencing technologies (http://www.genome.gov/12513162). This government money along with private
investment has led to a range of technologies, some of them
usable today and some still in the R&D phase of their devel-
opment. Some of the new sequencing technologies are well
suited to resequencing, whereas others are more flexible and
suitable for resequencing and de novo sequencing. In rese-
quencing, one is performing a sequence-based comparison of
an entire genome or a subset of the genome and looking for
differences as compared to a previously determined sequence.
The known sequence is either used as a reference, or in
sequencing by hybridization, is used as the basis of the
resequencing technology. In de novo sequencing a new gen-
ome (or other sequence) is sequenced and assembled without
direct comparison against a known sequence. As a result, de
novo sequencing technologies are suitable for new genetic
material and genetic material that differs markedly from a
previously sequenced strain [7,8]. All de novo technologies
can be used for resequencing. This paper discusses four broad
classes of new sequencing technologies that are all capable of
de novo sequencing: microelectrophoretic methods, sequen-
cing by hybridization, real time detection of single molecules
and cyclic-array sequencing.
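The resequencing idea described above, comparing a newly read sequence against a previously determined one and reporting the differences, can be illustrated with a minimal sketch. It assumes the two sequences are already aligned and of equal length, so it finds substitutions only; the function name and the example sequences are hypothetical.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are pre-aligned and equal in length, so insertions
    and deletions are out of scope for this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up sequences for illustration only.
reference = "ATGGCGTACGTTAGC"
sample = "ATGGCGTACATTAGC"
print(find_substitutions(reference, sample))  # [(10, 'G', 'A')]
```

De novo sequencing has no such reference to lean on, which is why it additionally requires assembly of the reads.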
New methods for sequencing
Microelectrophoretic methods
Microelectrophoretic methods have the advantage of employing and building upon the existing capillary electrophoresis Sanger sequencing technologies. The advance of microelectrophoretic technology over the commercially available capillary sequencing technologies comes from scaling down the size of the electrophoresis platform (and therefore the cost of reagents and, potentially, capital equipment) and, frequently, scaling up the number of lanes used in the electrophoresis [9–11]. Additional efforts have integrated the sample preparation and sequencing processes onto a single microfabricated device [12,13]. The
majority of the work to date has been in academic labs, although Shimadzu Biotech (http://www.shimadzu-biotech.net/) has plans to introduce a commercially available instrument in the near future, and Microchip Biotechnologies (http://microchipbiotech.com/) received a large NHGRI grant to develop an instrument.
Sequencing by hybridization
Sequencing by hybridization utilizes the microarray technol-
ogies that are the basis of much of the gene expression work
commonly performed as a part of drug development. Hybri-
dization sequencing works by hybridizing single-stranded
sample DNA to a microfabricated array of DNA oligonucleo-
tide probes. Each base in a sequence is queried by changing
the middle base in the oligonucleotide probe to all four
possibilities (A, C, G and T) while keeping the remaining
sequence unchanged. The sequence of the DNA is determined by which of the four probe oligonucleotides yields the strongest hybridization signal. Although the amount of sequence that can be generated is high, the readlength is limited to the length of the oligonucleotide probe. Additionally, although investigation of single nucleotide changes is straightforward, more complex changes in a genome, such as insertion or deletion of a codon (or codons), multiple point mutations within close proximity of one another, and insertion or deletion of large segments of genetic material (such as entire ORFs), are challenging. Although this technology has been applied to both resequencing and de novo sequencing [14–19], the strength of the technology and its greatest potential is in the massive resequencing of a limited number of genomic positions. Several companies, such as Illumina (http://www.illumina.com/), Perlegen (http://perlegen.com/), Nimblegen (http://nimblegen.com/) and ParAllele (http://www.parallelebio.com/) (recently purchased by Affymetrix, http://affymetrix.com/index.affx), offer instruments and/or services.
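The base-calling step described above, in which each position is queried by four probes and the strongest signal wins, can be sketched as follows. This is an illustration under stated assumptions rather than any vendor's algorithm: the intensity values are made up, each dictionary is assumed to be keyed by the sample base that the corresponding probe variant is designed to detect, and real arrays need normalization and quality thresholds that are omitted here.

```python
from typing import Dict, List

BASES = "ACGT"


def call_base(intensities: Dict[str, float]) -> str:
    """Pick the base whose probe variant gave the strongest hybridization signal."""
    return max(BASES, key=lambda base: intensities[base])


def call_sequence(probe_sets: List[Dict[str, float]]) -> str:
    """Call one base per queried position and join the calls into a sequence."""
    return "".join(call_base(intensities) for intensities in probe_sets)


# Hypothetical, made-up intensities for three consecutive positions.
signals = [
    {"A": 0.91, "C": 0.12, "G": 0.08, "T": 0.10},
    {"A": 0.15, "C": 0.11, "G": 0.87, "T": 0.09},
    {"A": 0.07, "C": 0.14, "G": 0.10, "T": 0.95},
]
print(call_sequence(signals))  # AGT
```

Because every probe set assumes the flanking sequence is unchanged, a nearby insertion, deletion or second mutation weakens all four signals at once, which is one way to see why such changes are harder to call than isolated point mutations.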
Real-time detection of single molecules
The most elegant sequencing technology, should it ever
become viable, would be the direct detection of single molecules.
This technology would allow for fast sequencing of small
quantities of DNA. Nanopore sequencing and directly mon-
itoring the incorporation of nucleotides by a polymerase are
two fundamental approaches that are under consideration for
direct single molecule detection.
Direct monitoring of nucleotide incorporation operates on the principle of watching an engineered polymerase synthesize the second strand of DNA. The nucleotides are distinguished from one another by differing fluorescent labels. Among the challenges in the direct monitoring technology are achieving sufficient signal from single nucleotide incorporation events in a background of labeled nucleotides and capturing all of the nucleotide incorporation events. Visigen (http://www.visigenbio.com/) and LI-COR (http://www.licor.com/) are two companies working on this technology.
In the nanopore sequencing methodology, DNA is mon-
itored as it passes through a nanometer scale surface pore. The
sequencing process relies upon the ability to translate the
differing chemical and physical properties of each base into
electrical signals as the nucleotides pass through the nano-
pore [20,21]. To date, the promise of this technology is still
speculative. Various academic groups have reported the abil-
ity to monitor fragments as they pass through the pores,
although single base sequencing is still elusive. Extensive
work is underway on improved nanopores and detection
schemes [22,23]. Agilent (http://www.chem.agilent.com/Scripts/Phome.asp) is developing the technology.
Cyclic-array sequencing
The category of cyclic-array sequencing is composed of sev-
eral different approaches. All of the various approaches are
broadly classified into either methods that sequence ampli-
fied molecules or those that sequence from single molecules.
Regardless of whether the sequencing will occur on a single
molecule or amplified molecules, all of the methods utilize
the physical separation of the DNA fragments to be
sequenced in an array and the multiple cycles of reagent
addition/enzymatic manipulation that are responsible for the
sequence generation. The set-up of the array can be either
ordered or random: the important point is the physical
separation of distinct fragments from one another. The
majority of the cyclic-array sequencing technologies are
based on a stepwise build-up of the sequence by a polymerase
(sequencing-by-synthesis, Fig. 1) coupled with a detection
mechanism, although one method, ‘massively parallel signature sequencing’ (MPSS) [24] from Lynx (http://www.lynxgen.com), employs cyclic restriction digestion and ligation. Several companies are working on amplified molecule cyclic-array sequencing-by-synthesis.
Figure 1. Sequencing-by-synthesis is the underlying method used by many of the next generation sequencing technologies. The companies built upon sequencing-by-synthesis are 454 Life Sciences, Agencourt, Genovoxx, Helicos, Nanofluidics, Solexa and Visigen. Although the specifics of copy number (either clonally amplified molecules or single molecules) and detection schemes vary, all methods rely on polymerase-mediated addition of nucleotides to a primed, single strand of DNA. The specific example in this figure is clonally amplified DNA fragments attached to a bead.
The first of the amplified molecule cyclic-array technologies to be commercialized is the sequencing-by-synthesis method from 454 Life Sciences (http://www.454.com/) [25]. This technology relies on the clonal amplification of single molecules (either single-stranded or double-stranded DNA) on capture beads isolated in an emulsion, and the subsequent highly parallel sequencing of the clonally amplified DNA on beads deposited into the picoliter scale wells of a PicoTiterPlate™. The 454 sequencing instrument is currently capable of sequencing a minimum of 20 Mb, or 200,000 fragments with a median 100 base-pair readlength, in a 4.5-h run. The detection scheme for the 454 Life Sciences instrument is based on the conversion of pyrophosphate, which is released by the polymerase-mediated addition of a nucleotide to the complementary strand, into light via an enzyme cascade. In May 2005, 454 Life Sciences entered into a worldwide distribution deal for the sequencing instrument and reagents with Roche Diagnostics (http://www.roche-applied-science.com/). An overview of the sequencing process is presented in Fig. 2.
Figure 2. Schematic overview of the 454 Life Sciences instrument and sequencing process. The process begins with large DNA molecules (such as genomic DNA) that are fragmented and subsequently ligated with universal adaptors before clonal amplification on beads, deposition on a PicoTiterPlate™, and pyrophosphate based sequencing-by-synthesis. A CCD camera captures the light generated from the sequencing reaction. The resulting signals are converted into sequence. An alternative input to the process is small fragments, such as exons, that are amplified with tailed-primers containing the 454 universal adaptors. These tailed-primer amplicons enter the process at the clonal amplification step.
Two other companies, Agencourt and Solexa, are prominent in the amplified molecule cyclic-array field. Agencourt (http://www.agencourt.com/), a sequencing service company, was purchased by Beckman Coulter (http://www.beckmancoulter.com/) in late April 2005. As part of the purchase, a new company, Agencourt Personal Genomics, was spun off to accelerate development of a new platform. The platform is based upon the fluorescent detection of single-nucleotide extensions of DNA fragments attached to beads. Solexa (http://www.solexa.com) (merged in early 2005 with Lynx) is also working on fluorescent-based detection of amplified DNA. Solexa plans to release an instrument in early 2006.
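The cyclic nature of the pyrophosphate-based sequencing-by-synthesis described above for the 454 instrument can be illustrated with an idealized simulation. This is a sketch under stated assumptions, not the instrument's actual signal processing: nucleotides are assumed to be flowed one at a time in a fixed repeating order (the order `TACG` is chosen here purely for illustration), the light signal in each flow is assumed to be exactly proportional to the number of bases incorporated, and noise is ignored.

```python
from typing import List, Tuple

FLOW_ORDER = "TACG"  # assumed cyclic flow order, for illustration only


def simulate_flows(synthesized: str, flow_order: str = FLOW_ORDER) -> List[Tuple[str, int]]:
    """Ideal, noise-free signal per flow: how many bases were incorporated.

    `synthesized` is the strand being built by the polymerase, 5' to 3'.
    """
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    flows: List[Tuple[str, int]] = []
    pos = 0
    while pos < len(synthesized):
        for nucleotide in flow_order:
            if pos >= len(synthesized):
                break
            count = 0
            # A run of identical bases is incorporated within a single flow.
            while pos < len(synthesized) and synthesized[pos] == nucleotide:
                count += 1
                pos += 1
            flows.append((nucleotide, count))
    return flows


def decode_flows(flows: List[Tuple[str, int]]) -> str:
    """Convert per-flow incorporation counts back into sequence."""
    return "".join(nucleotide * count for nucleotide, count in flows)


flows = simulate_flows("TTACGGA")
print(flows)                # [('T', 2), ('A', 1), ('C', 1), ('G', 2), ('T', 0), ('A', 1)]
print(decode_flows(flows))  # TTACGGA
```

In this idealized picture a flow with no matching base gives a zero signal, and a run of identical bases gives a single proportionally larger signal rather than several separate ones.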
The cyclic-array, amplified molecule methods all rely upon
clonal amplification of the fragments before sequencing [26].
The clonal amplification is achieved either by isolation of the
molecules by means such as an emulsion or an acrylamide
matrix [27–29], or through tagging and subsequent separa-
tion of molecules. As a result, these methods, although not
strictly single-molecule detection methods, have the ability
to sequence from single molecules that originated in a com-
plex mixture.
A second class of cyclic array technologies is aimed at single
molecule detection. These methods directly sequence from
single molecules and thus avoid the cost associated with
either cloning or PCR amplification. The sequencing approach to cyclic-array single molecule sequencing is similar to the amplified molecule sequencing-by-synthesis methods; the difference is a more sensitive detection scheme that avoids the need for multiple molecules to provide sufficient,
detectable signal. All of the single molecule methods, includ-
ing those being worked on by several academic labs and
companies such as Genovoxx (http://www.genovoxx.com/),
Nanofluidics (http://www.nanofluidics.com/) [30] and
Helicos (http://www.helicosbio.com/) [31], proceed by using
the step-wise incorporation of fluorescent nucleotides. The
difference between the methods lies in the different signal
detection schemes and the details of the biochemistry.
Conclusion
Several promising technologies will revolutionize the role of sequencing in pharmaceutical research over the next few years (Table 1). The first of the new generation instruments is commercially available from one cyclic-array sequencing based company. Microarray companies are introducing an increasing variety of microarrays for sequencing. Other manufacturers plan the commercial release of instruments over the next few years. Additionally, access to several of the low-cost, high-throughput technologies is already available as a service from some companies (454 Life Sciences, ParAllele, Perlegen and Solexa/Lynx).
Table 1. Overview of next generation de novo sequencing technologies
| Technology | Microelectrophoretic sequencing | Sequencing by hybridization | Single molecule, real-time detection | Cyclic-array sequencing |
|---|---|---|---|---|
| Company | Shimadzu Biotech, Microchip Biotechnologies | Perlegen, ParAllele, Affymetrix, Nimblegen, Illumina | Visigen, LI-COR, Agilent | 454 Life Sciences, Agencourt, Solexa, Genovoxx, Nanofluidics, Helicos |
| Pros | Sequencing by electrophoresis is well established; long readlengths; low error rate | Ideal for resequencing of known point mutations; commercially available | Ability to detect single molecules in complex mixture; suitable for resequencing and de novo | Ability to detect single molecules in complex mixture; suitable for resequencing and de novo; commercially available |
| Cons | Potentially not as high-throughput as other methods; potentially not as low cost as other methods | Current readlengths less than electrophoretic methods; de novo sequencing is slow | Still not a commercially viable technology | Current readlengths less than electrophoretic methods |
| References | [4,9–13] | [14–19] | [20–23] | [6,24,26–32] |
The idea behind the drive for low cost, high-throughput sequencing has been to lower the cost of sequencing enough to enable personalized human genome sequencing. Along the way to this lofty goal, many potential applications of the technology are already available or will be enabled soon as bioinformatics development races to keep up with the large quantities of data and the new possibilities that inexpensive and quick sequencing allow. For example, affordable microbial sequencing, either resequencing for SNP identification or de novo sequencing of more variable strains, enables comparative genomics on strains of varying virulence, drug resistance and host species preference.
The molecular based cloning employed in the cyclic-array methods and the real-time, single molecule methods, with their inherent ability to sequence from single molecules in a complex mixture, open up the possibility of massive oversampling of specific regions of interest or tagged sequences, and do so in a quick and cost-effective manner. The first demonstration of sequencing from complex mixtures achieved a sensitivity below 1% in a complex mixture of HIV quasispecies [32]. This level of sensitivity is achievable by the Sanger based methods only by cloning of fragments into bacteria. Microarray based sequencing methods are not as sensitive or as quantitative as the direct sequencing of clonally amplified single molecules. Examples of the utility of this technology are applications such as sequencing disease-associated exons within a population, investigation of viral quasispecies present within a patient as a function of time and drug response, deep sequencing of microRNAs, querying somatic mutations in tumor samples, and monitoring pathogens for changes in their genome as a function of drug resistance or changing virulence.
The applications enabled by the next generation sequencing technologies and their usefulness to the drug discovery and development process are only beginning to be discovered. Once the technology is widely available and the power/promise of the technology is known, additional applications will be developed. The ability of many of the technologies to sequence de novo opens up a wide array of possibilities for new discovery and creative approaches to important and unaddressed problems in pharmaceutical research and development.
Related articles
Shendure, J. et al. (2004) Advanced sequencing technologies: methods and goals. Nat. Rev. 5, 335–344
Marziali, A. and Akeson, M. (2001) New DNA sequencing methods. Ann. Rev. Biomed. Eng. 3, 195–223
Outstanding issues
• Long readlengths, comparable to traditional electrophoretic methods, are a challenge for some of the new technologies.
• The high volume of sequencing data that will result from low-cost high-throughput technology presents demands on data handling and bioinformatics/interpretation infrastructure.
• All new technologies are still too expensive and too time consuming to enter into the range of personalized human genome sequencing (the $1000 genome).
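The conclusion's point about massive oversampling and sub-1% sensitivity in mixtures can be made concrete with a simple probability sketch. It rests on assumptions that are not from the article: each read is treated as an independent draw from the mixture, sequencing errors are ignored, and the read depths shown are arbitrary illustrative values.

```python
from math import comb


def prob_detect(variant_fraction: float, reads: int, min_observations: int = 1) -> float:
    """Binomial probability that a variant appears in at least `min_observations` reads.

    Assumes independent reads and no sequencing error (a simplification).
    """
    p_fewer = sum(
        comb(reads, k) * variant_fraction**k * (1.0 - variant_fraction) ** (reads - k)
        for k in range(min_observations)
    )
    return 1.0 - p_fewer


# Chance of seeing a variant present at 1% of the mixture at several read depths.
for depth in (50, 100, 300, 1000):
    print(f"{depth:>5} reads: P(seen at least once) = {prob_detect(0.01, depth):.3f}")
```

Under this simplified model a few hundred reads covering the same region are needed before a 1% variant is likely to be observed even once, and more are needed to observe it often enough to separate it from error, which is why highly parallel sequencing of clonally amplified single molecules suits this kind of application.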
References
1 Hinds, D.A. et al. (2005) Whole-genome patterns of common DNA
variation in three human populations. Science 307, 1072–1079
2 Hardenbol, P. et al. (2005) Highly multiplexed molecular inversion probe
genotyping: over 10,000 targeted SNPs genotyped in a single tube array.
Genome Res. 15, 269–275
3 Gerhardt, M. et al. (2005) In-depth, longitudinal analysis of viral
quasispecies from an individual triply infected with late-stage human
immunodeficiency virus type 1, using a multiple PCR primer approach. J.
Virol. 79, 8249–8261
4 Kapoor, A. et al. (2004) Sequencing-based detection of low-frequency
human immunodeficiency virus type 1 drug-resistant mutants by
an RNA/DNA heteroduplex generator-tracking assay. J. Virol. 78,
7112–7123
5 Kwak, E.L. et al. (2005) Irreversible inhibitors of the EGF receptor may
circumvent acquired resistance to gefitinib. Proc. Natl. Acad. Sci. 102, 7665–
7670
6 Andries, K. et al. (2005) A diarylquinoline drug active on the ATP synthase
of Mycobacterium tuberculosis. Science 307, 223–227
7 Bhattacharyya, A. et al. (2002) Draft sequencing and comparative
genomics of Xylella fastidiosa strains reveal novel biological insights.
Genome Res. 12, 1556–1563
8 Goo, Y.A. et al. (2004) Low-pass sequencing for microbial comparative
genomics. BMC Genomics 5, 3
9 Emrich, C.A. et al. (2002) Microfabricated 384-lane capillary array
electrophoresis bioanalyzer for ultrahigh-throughput genetic analysis.
Anal. Chem. 74, 5076–5083
10 Koutny, L. et al. (2000) Eight hundred-base sequencing in a
microfabricated electrophoresis device. Anal. Chem. 72, 3388–3391
11 Aborn, J.H. et al. (2005) A 768-lane microfabricated system for high-
throughput DNA sequencing. Lab. Chip. 5, 669–674
12 Lagally, E.T. and Mathies, R.A. (2004) Integrated genetic analysis
microsystems. J. Phys. D: Appl. Phys. 37, R245–R261
13 Paegel, B.M. et al. (2003) Microfluidic devices for DNA sequencing:
sample preparation and electrophoretic analysis. Curr. Opin. Biotechnol. 14,
42–50
14 Zwick, M.E. et al. (2004) Microarray-based resequencing of multiple
Bacillus anthracis isolates. Genome Biol. 6, R10
15 Sougakoff, W. et al. (2004) Use of a high-density DNA probe array for
detection of mutations involved in rifampicin resistance in Mycobacterium
tuberculosis. Clin. Microbiol. Infect. 10, 289–294
16 Maitra, A. et al. (2004) The Human MitoChip: a high-throughput
sequencing microarray for mitochondrial mutation detection. Genome
Res. 14, 812–819
17 Gonzalez, R. et al. (2004) Detection of human immunodeficiency virus
type 1 antiretroviral resistance mutations by high-density DNA probe
arrays. J. Clin. Microbiol. 42, 2907–2912
18 Read, T.D. et al. (2002) Comparative genome sequencing for discovery of
novel polymorphisms in Bacillus anthracis. Science 296, 2028–2033
19 Poly, F. et al. (2005) Genomic diversity in Campylobacter jejuni:
identification of C. jejuni 81-176-specific genes. J. Clin. Microbiol. 43, 2330–
2338
20 Deamer, D.W. and Akeson, M. (2000) Nanopores and nucleic acids:
prospects for ultrarapid sequencing. Trends Biotechnol. 18, 147–151
21 Meller, A. et al. (2003) Dynamics of polynucleotide transport through
nanometer pores. J. Phys. Condens. Matter 15, R581–R607
22 Chen, P. et al. (2004) Atomic layer deposition to fine-tune the surface
properties and diameters of fabricated nanopores. Nano Lett. 4, 1333–1337
23 Karhanek, M. et al. (2004) Single molecule detection using nanopipettes
and nanoparticles. Nano Lett. 5, 403–407
24 Brenner, S. et al. (2000) Gene expression analysis by massively parallel
signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 18,
630–634
25 Margulies, M. et al. (2005) Genome sequencing in open microfabricated
high density picoliter reactors. Nature 10.1038/nature03959
26 Dressman, D. et al. (2003) Transforming DNA molecules into fluorescent
magnetic particles for detection and enumeration of genetic variations.
Proc. Natl. Acad. Sci. 100, 8817–8822
27 Mitra, R.D. and Church, G.M. (1999) In situ localized amplification and
contact replication of many individual DNA molecules. Nucleic Acids Res.
27, e34
28 Mitra, R.D. et al. (2003) Digital genotyping and haplotyping with
polymerase colonies. Proc. Natl. Acad. Sci. 100, 5926–5931
29 Mitra, R.D. et al. (2003) Fluorescent in situ sequencing on polymerase
colonies. Anal. Biochem. 320, 55–65
30 Levene, M.J. et al. (2003) Zero-mode waveguides for single-molecule
analysis at high concentrations. Science 299, 682–685
31 Braslavsky, I. et al. (2003) Sequence information can be obtained from
single DNA molecules. Proc. Natl. Acad. Sci. 100, 3960–3964
32 Simons, J. et al. (2005) Ultra-deep sequencing of HIV from drug resistant
patients. XIV International HIV Drug Resistance Workshop, June 7–11,
Quebec City, Canada