Drug discovery is a prolonged process that uses a variety of tools from diverse fields. To accelerate the process, a number of biotechnologies, including genomics, proteomics and a number of cellular and organismic methodologies, have been developed. Proteomics development faces interdisciplinary challenges, including both the traditional (biology and chemistry) and the emerging (high-throughput automation and bioinformatics). Emergent technologies include two-dimensional gel electrophoresis, mass spectrometry, protein arrays, isotope-encoding, two-hybrid systems, information technology and activity-based assays. These technologies, as part of the arsenal of proteomics techniques, are advancing the utility of proteomics in the drug-discovery process.
Addresses
ActivX Biosciences, Inc., 11025 North Torrey Pines Road, Suite 120, La Jolla, CA 92037, USA
*e-mail: firstname.lastname@example.org
Current Opinion in Chemical Biology 2002, 6:427–433
1367-5931/02/$ see front matter © 2002 Elsevier Science Ltd. All rights reserved.
Published online 6 June 2002
Abbreviations
2DGE two-dimensional gel electrophoresis
ABP activity-based probe
ESI electrospray ionization
ICAT isotope-coded affinity tagging
MALDI matrix-assisted laser desorption ionization
MudPIT multidimensional protein identification technology
PCR polymerase chain reaction
Introduction
The drug-discovery process involves many phases, including target identification, lead identification, small-molecule optimization, and pre-clinical/clinical development. Efficiency in this process relies on timely knowledge of biological cause-and-effect in the course of disease and treatment, which ultimately rests on knowledge of protein function and regulation. One of the key steps, target identification, has been fostered through applied genomics, primarily because both high-throughput methods and tools that allow nucleic acid amplification have enabled large-scale profiling of expressed genes. However, analysis of the information produced by genomics, when measured against comparable information regarding protein expression, has led to the conclusion that message abundance fails to correlate with protein quantity. Further, post-translational processes such as protein modifications or protein degradation remain unaccounted for in genomic analysis. Because both cell function and its biochemical regulation depend on protein activity, and because the correlation between message level and protein activity is low, the measurement of message expression alone has proven to be inadequate. Consequently, the development of drug-discovery technologies has begun to shift from genomics to proteomics. This shift has occurred not only in target discovery but also in many other areas of the process, including patient treatment and care. This review focuses on the burgeoning field of proteomics as it applies to drug discovery, which relies upon the determination of cellular function and regulation through large-scale measurement of protein function and interaction.
Proteomics techniques
Proteomics, as a scientific field, is defined as the study of the protein products of the genome, and their interactions and functions. Similarly, the proteins expressed at a given time in a given environment constitute a proteome. From a technology viewpoint, traditional proteomics involves separation of proteins in a proteome, coupled to a means of identification. Until recently, the tools of choice were two-dimensional gel electrophoresis (2DGE) for separation, and mass spectrometry (MS) for protein identification. However, 2DGE is limited because it fails to detect proteins at the extremes of separation either by size or by isoelectric point, and because it is insufficiently sensitive for low-abundance proteins. From the perspective of drug discovery, 2DGE fails in two important ways. First, 2DGE is ineffective for the separation of membrane proteins, which represent nearly 50% of important drug targets. Second, low-abundance proteins are under-represented in a 2DGE analysis, yet often represent key sites of biological regulation. Specifically, it has been estimated that more than 50% of proteins in cells are of low abundance [10,11]. Therefore, although 2DGE is powerful, researchers wishing to apply proteomics to drug discovery must seek innovative ways to measure both protein abundance and activity.
Proteomics presents researchers with a formidable challenge for a number of reasons. First, protein levels vary widely with both cell type and environment. Second, unlike genomics, which benefits from the amplification of single genes using the polymerase chain reaction (PCR), protein science has no comparable amplification method. Third, proteomics is complicated by the fact that the absolute quantity of protein is of limited interest to drug discovery, because protein activities are highly regulated post-translationally. Therefore, proteins can be abundant, yet possess little activity. Finally, because proteins interact functionally in vivo, protein–protein and protein–small-molecule interactions need to be evaluated in processes of interest.
Proteomics in drug discovery
Jonathan Burbaum* and Gabriela M Tobal

For drug discovery, the ideal proteomics method would be one that is:
1. Sensitive enough to detect low-abundance proteins.
2. Able to detect activity in addition to abundance.
3. Able to detect protein–protein and protein–small-molecule interactions.
4. Easily implemented and performed quickly.
Research in proteomics seeks to satisfy all, or some, of these conditions by developing new methods to understand
Table 1. Properties of various proteomics techniques.

Analytical technique: 2DGE | MudPIT | Protein chips | 2-Hybrid systems | ICAT | ABPs

Polypeptide chain size
Polypeptide chain size
Surface affinity
Protein interaction potential
Isoelectric point
Active site peptide
Abundance
Polypeptide chain size

Identity established by: MS | MS | MS | DNA | MS | MS

Applications in drug discovery:
Target selection | Target selection | Target selection | Target selection | Target selection | Target selection
Protein expression profile
Profiling for diagnostics
Drug screening
Drug screening
Profiling for diagnostics
Drug screening and pan selectivity
Profiling for diagnostics
Detection of protein–protein and protein–drug interactions

Pros:
Can detect 1000s of proteins at once
Can detect 1000s of proteins at once
Can detect 1000s of proteins at once
Can detect potential protein interactions
Circumvents the proteome coverage problems of 2DGE
Circumvents the proteome coverage problems of 2DGE
Circumvents the proteome coverage problems of 2DGE
Circumvents the proteome coverage problems of 2DGE
Is easily automated
Is easily automated
Can detect 1000s of proteins at once
Is easily automated
Is easily automated
Results in cloned genes for all proteins interrogated
Detects protein activity, rather than protein abundance

Cons:
Cannot detect proteins that are very small, large, acidic or basic, poorly soluble and of low abundance
Does not detect abundance, activity, or interactions
At a proteomic scale, will require the cloning of 100s of 1000s of proteins
False negatives and positives
Does not detect post-translational modifications or interactions
Probes needed for all protein families, hence proteomic-scale coverage difficult to ascertain
Difficult to automate
Does not detect interactions in physiologically relevant situations
Limited to proteins localized in the nucleus
protein function and interactions in a biological context. Recent technological advances in this area include developments in separation and identification technologies (i.e. MS, protein-chip technologies, and phage display), bioinformatics, and technologies that detect protein interactions and activities (i.e. activity-based assays, and two-hybrid assays) (Table 1).
Mass spectrometry
Advances in MS have allowed the rapid sequencing of proteins. In particular, techniques that enable the charging of large molecules such as proteins and peptides, as well as their transfer into a gaseous phase (e.g. electrospray ionization [ESI] and matrix-assisted laser desorption ionization [MALDI]), have allowed proteins to be analyzed by MS. ESI produces a fine spray of charged particles through a charged needle, whereas MALDI involves crystallizing the sample of interest within a matrix that can be vaporized quickly using a laser pulse. Two general methods are employed for protein identification using MS. The first, peptide-mass fingerprinting, compares the pattern of molecular weights of peptides generated from a proteolytic digestion to theoretical fingerprints derived from protein databases. The second method, tandem mass spectrometry (MSn), selects peptides of interest and uses a secondary fragmentation process to determine the peptide sequence, which is then identified using protein sequence databases. Further technology improvements, including ion-source miniaturization (for sensitivity) and detectors (for mass accuracy), have greatly expanded the method. The central importance of MS in proteomics is attributable to the large amounts of structural data that the method can create at great speed, and is demonstrated by the fact that most separation and detection methods rely on MS for protein identification.
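The peptide-mass fingerprinting comparison described above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the review: the protease is taken to be trypsin (cleavage after K or R unless the next residue is P), no missed cleavages or modifications are considered, the 0.2 Da tolerance is arbitrary, and the two database sequences are invented toy examples. The function names (`tryptic_digest`, `peptide_mass`, `count_matches`, `best_match`) are hypothetical and do not correspond to any particular search engine.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free (unmodified) peptide


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, theoretical, tolerance=0.2):
    """Number of observed masses lying within `tolerance` Da of any theoretical mass."""
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed
    )


def best_match(observed, database, tolerance=0.2):
    """Return (protein name, match count) for the best-scoring database entry."""
    scores = {
        name: count_matches(
            observed, [peptide_mass(p) for p in tryptic_digest(seq)], tolerance
        )
        for name, seq in database.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy database; real searches use full protein sequence databases.
TOY_DATABASE = {"protA": "MKTAYIAKQR", "protB": "GASPVTCLNDK"}

if __name__ == "__main__":
    # Simulate a measured fingerprint by digesting protA in silico.
    measured = [peptide_mass(p) for p in tryptic_digest(TOY_DATABASE["protA"])]
    print(best_match(measured, TOY_DATABASE))
```

A simple match count is used here only to keep the idea visible; practical tools typically use probabilistic scoring and account for missed cleavages, modifications, and instrument mass accuracy.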
Bioinformatics
As a method of analysis, MS is vital to proteomics researchers because it identifies their data, playing the same role as nucleotide sequencing in genomics. Interpreting the roles of the identified proteins, however, is equally important. Bioinformatics provides important tools that systematize the data produced by genomics and proteomics to enable computer-aided data interpretation. These technologies have increased the speed and thoroughness of data analysis to a degree that would otherwise be impossible. Bioinformatics methods not only catalog experiments, but also provide algorithms for data analysis and comparisons in numerous contexts, including protein and gene identification, protein structure–function relationship predictions, and functional connections between proteins. This type of organization and analysis is crucial to all areas of the drug-discovery process from the identification of novel drug targets, in which informatics technology enables the mining of DNA and protein sequence databases for analysis of similarities, to screening of active compounds in silico by virtual screening and/or docking of compound collections. Further along in the drug-development process, informatics enables the facile optimization of leads in drug design, and the selection of pre-clinical candidates. The need for such automated analyses of large data sets has been met by growth in both computing power and systematic databases.
Microfabrication and miniaturization
Microfabrication has also played an important role in the development of proteomics technology. Miniaturization has two advantages. First, it provides miniaturized instruments that can improve the development and automation of current techniques through reduction of sample amount, increased sample throughput, and increased sensitivity. Second, it provides miniaturized containers to segregate and identify samples. In the broadest sense, such microfabricated containers encompass both protein-array technology (discussed later in this review) and lab-on-a-chip technology. The latter refers to enclosed fluidics devices that create channels and reaction chambers on substrates such as glass. Although microfabrication of analytical devices is an emerging field in proteomics, it has made important progress towards developing lab-on-a-chip MS technologies and protein separation techniques, and is the basis of protein-array technologies. In the future, proteomics will use many techniques that were established on a macroscopic level, and become streamlined by microfabricated instrumentation.
Types of information from proteomics
The methods for producing proteomic data can be divided into two major categories: the classical approach that strives to catalogue all proteins, and the functional approach that seeks to classify proteins to be studied by properties such as activity or affinity. The classical approach tends to provide information about identity and, in some cases, abundance. The functional approach provides identity and abundance, as well as information about protein function and, in some cases, protein interactions.
Figure 1. The proteomics information pyramid describes the escalating complexity of different types of information collected using proteomics techniques.
Both approaches provide valuable information that can be integrated into a pyramid of proteomics information (Figure 1), with the classical approach providing the base of the pyramid. The most basic, necessary information about a proteome is the identity of its constituent proteins. The next level of information is abundance. The classical approach tends to concentrate on detecting those two foundation layers of the proteomics information pyramid. The next two layers of the pyramid involve information produced by more functional techniques that either measure the activity of the proteins directly or interrogate the interactions of proteins, or both. Although activity, the next layer in the pyramid, may be related to abundance, proteins are often post-translationally regulated. Therefore, functional techniques that clarify the relationship between abundance and activity, and determine how active proteins are in biologically relevant circumstances, are important in understanding the complexities of the proteome. The next layer of the pyramid involves information produced by techniques that determine protein interactions. This information is the most complex and difficult to acquire, as proteins may interact with many different proteins under various circumstances. Information from all areas of the proteomics information pyramid is important for a well-rounded proteomics approach to drug discovery.
For classical proteome analysis, a chemical modification strategy called isotope-coded affinity tagging (ICAT) has been developed to catalogue and quantify all the proteins in a proteome. ICAT uses a reagent with three components: a reactive group to covalently bind amino acids (e.g. cysteines), an isotopically light or heavy linker, and an affinity tag (e.g. biotin). The isotopic differences permit protein abundance comparisons between two samples. In this process, samples for comparison are alkylated with either light or heavy reagent. The two samples are then combined, undergo proteolytic digestion, and the labeled peptides are isolated using the affinity tag. The sample is then analyzed by LC/MS, with quantitation (not normally a feature of MS m...