Proteomics: drug target discovery on an industrial scale



  • Terence E. Ryan and Scott D. Patterson*, Celera Genomics Group,

    45 West Gude Drive,

    Rockville, MD 20850, USA.

    *e-mail: scott.patterson@

    Trends in Biotechnology Vol. 20 No. 12 (Suppl.), 2002 A TRENDS Guide to Proteomics | Review

    0167-7799/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0167-7799(02)02089-9

    Proteomics represents the systematic and broad application of technologies that have traditionally supported the field of protein biochemistry. In its most common application, proteomics is used to characterize differences in protein expression between biological specimens. Although proteomics technologies can be used to catalog protein differences after metabolic perturbation, their greatest therapeutic value lies in the comparison of cells from normal tissue with those representing a disease state (e.g. [1]). Such comparisons could enable the identification of disease-specific biomarkers that could be used for diagnostic or prognostic tests, or of target proteins that have the potential for drug intervention.

    Owing to the variability of natural protein expression in the same tissue between individuals (arising from inherent genetic, metabolic, diurnal, environmental and nutritional differences, among others), the disease specificity of an observed protein-expression differential needs to be rigorously demonstrated. This can be achieved by characterizing the frequency of a differential expression across a range of samples taken from many individuals with the disease, as well as by the relative absence of the differential expression in other normal tissues in the same individual. These requirements, which demonstrate the specificity for the disease state, call for an experimental design that can encompass large numbers of experimental samples and controls, and effective interassay comparisons between individual samples and among sample groups.

    The number of individual samples required to generate statistical confidence results from a complex mixture of biological and laboratory process considerations. For example, a single disease can be represented by various degrees of disease progression or characteristic phenotype. Patients with acute myeloid leukemia (AML) can generally be classified into one of seven French-American-British (FAB)-AML classification disease groups [2,3]. Examination of AML samples requires that they be grouped accordingly to produce meaningful results, or that a larger number of AML samples be examined to identify pan-disease patterns of differential protein expression. In addition to relevant disease subtypes, samples need to be grouped according to tumor staging, degree of metastasis, and known genetic lesions. The reproducibility (variability in replicate processes) and sensitivity of laboratory processes also contribute to the number of samples needed for examination; differentials at the limits of signal-to-noise identification will require a greater number of samples to achieve statistical significance. Statistical evaluation of the differentially expressed proteins will establish appropriate levels of confidence for each observation; for the processes outlined here, we have found that 20 samples per study point is usually sufficient. In general, however, the frequency with which a particular protein-expression differential appears across a range of samples correlates not only with the degree of statistical significance, but also with the level of interest in that differential as representative of the disease process under study.

    The rigor required for these comparisons suggests that proteomics approaches need to become standardized within each laboratory and, in addition, that the laboratory should be able to process the number of samples required to provide statistical confidence in the results. These requirements make a factory approach to proteomic discovery essential: a facility where standard protocols are applied to large numbers of samples, with the product being information with a high statistical confidence.
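
    The link between the frequency of a differential across samples and its statistical weight can be illustrated with a minimal Python sketch. It is not the authors' statistical procedure: the 2-fold threshold, the 20 ratios, and the background rate P_NULL (the assumed chance that a normal-versus-normal comparison shows the same fold change through natural variability alone) are all hypothetical values chosen for illustration, and a one-sided binomial test is only one of several reasonable choices.

```python
from math import comb


def differential_frequency(ratios, fold=2.0):
    """Count samples whose disease/normal ratio is at least `fold`."""
    hits = sum(1 for r in ratios if r >= fold)
    return hits, hits / len(ratios)


def binomial_tail(hits, n, p_null):
    """One-sided P(X >= hits) for X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(hits, n + 1)
    )


# Hypothetical disease/normal expression ratios for one protein in 20 samples.
ratios = [3.1, 2.6, 0.9, 4.0, 2.2, 1.1, 2.8, 3.5, 2.0, 1.4,
          2.9, 5.2, 0.8, 2.3, 3.0, 1.7, 2.5, 2.1, 3.3, 1.0]

hits, freq = differential_frequency(ratios, fold=2.0)

# Assumed background rate of a >= 2-fold difference without disease (hypothetical).
P_NULL = 0.10
p_value = binomial_tail(hits, len(ratios), P_NULL)

print(f"{hits}/{len(ratios)} samples ({freq:.0%}) show the differential; p = {p_value:.2e}")
```

    With a fixed number of samples per study point, a differential seen in more samples gives a smaller tail probability, which is the sense in which frequency of representation and statistical significance move together.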

    The discovery of targets that are sufficiently robust to yield marketable therapeutics is an enormous challenge. Through the years, several approaches have been used with varying degrees of success. These include target-independent screening of tumor-derived cell lines (disease-dependent), reductionist approaches to identifying crucial elements of disease-affected pathways, disease-independent screening of homologs of previously drugged targets, disease-dependent global examination of gene transcript levels, and disease-dependent global examination of protein expression levels. These endeavors have been enabled by several major advancements in technology, most recently, the sequencing of the human genome. This review identifies the technical issues to be addressed for industrial-scale protein-based discovery in the identification of targets for therapeutic (or diagnostic) intervention. Such approaches aim to direct discovery in a way that increases the probability of robust target identification, and decreases the probability of failure owing to variable expression in this emerging field.


    Standardization and methodology

    The need for standard protocols that are reproducible in both their execution and data output heightens the importance of methodology in large-scale proteomics. The complexity of biological samples, as well as the capabilities of the current-generation mass spectrometers, necessitates the separation of proteins or peptides into discrete, analyzable entities. Traditionally, this high-resolution separation step has been achieved using 2D gel electrophoresis (Fig. 1). Comparisons between samples must therefore be made on the basis of separate 2D gel experiments; this requires an extraordinary level of care to ensure that protocols for gel preparation, sample preparation, sample loading, electrophoresis conditions, and protein spot staining and identification match precisely [4]. Chromatographic approaches are increasingly used for proteomic studies because they provide this precision, are relatively easy to automate, and have robust instrument software (Fig. 1).

    The complexity of protein mixtures from cellular lysates or fractions can be reduced only to a limited extent by ion exchange, molecular sieving, or affinity chromatography. However, mixtures of proteins from limited chromatographic fractionation can be proteolyzed as a group, and the resulting peptides separated by reverse-phase chromatography with online mass spectrometric detection [5–9]. This complex-mixture method of generating peptides for tandem mass spectrometric identification has been widely used in academia and industry because of its reproducibility and ease of automation [10]. It has gained further favor over gel-based methods because it can detect low-abundance peptides [11], and also gives a more complete representation of cellular proteins (particularly membrane proteins). This review discusses issues surrounding the large-scale application of complex-mixture proteomic analysis for drug target discovery: the first step in the drug discovery and development pipeline (Fig. 2). However, it should be noted that the platform described here can be applied not only to the initial stages of the pipeline, but also to all subsequent steps (except filing and marketing).

    [Figure 1 schematic. Labels recovered from the image: normal and disease samples; enrichment of cell type, subcellular organelle or protein class. Gel-based route: image analysis; selected spot excision; digestion and/or MALDI-MS. Chromatographic route: stable isotope labeling (e.g. ICAT) of samples separately (d0-ICAT, d8-ICAT), then combine; digestion of proteins; peptide capture (e.g. ICAT-peptides on avidin); LC-MS (quantitative analysis).]

    Figure 1. The two most commonly used analytical approaches in proteomics

    2D gel electrophoresis, and liquid chromatography with isotope-coded affinity tag (ICAT) reagents, are the current standards for complex mixture analysis of protein expression levels on a broad scale.
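
    The quantitative step of the ICAT route in Figure 1 rests on finding peptide peak pairs separated by the mass difference between the d0 and d8 labels and comparing their intensities. The Python sketch below illustrates that idea only; it is not the software used by the authors. The peak list, the tolerance, the assumption of one labeled cysteine per peptide, and the function name pair_icat_peaks are hypothetical, and which sample carries which label is left to the experimental design.

```python
# Mass shift between d8- and d0-labeled peptides: eight hydrogens replaced by deuterium.
D8_SHIFT = 8 * (2.014102 - 1.007825)  # about 8.0502 Da per labeled cysteine


def pair_icat_peaks(peaks, charge=1, n_cys=1, tol=0.02):
    """Pair light (d0) and heavy (d8) peaks and return heavy/light intensity ratios.

    peaks  : list of (m/z, intensity) tuples from one LC-MS scan (hypothetical input)
    charge : assumed charge state of the peptide ions
    n_cys  : assumed number of labeled cysteines per peptide
    tol    : m/z tolerance for accepting a pair (hypothetical value)
    """
    delta = D8_SHIFT * n_cys / charge
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        if int_light <= 0:
            continue  # a ratio cannot be formed against an empty light peak
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - delta) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical singly charged peaks: one d0/d8 pair plus an unrelated peak.
scan = [(1000.50, 200.0), (1008.55, 400.0), (1200.00, 50.0)]
for mz_light, mz_heavy, ratio in pair_icat_peaks(scan):
    print(f"{mz_light:.2f} / {mz_heavy:.2f}: heavy/light = {ratio:.2f}")
```

    Because both members of a pair are measured in the same run, the ratio is insensitive to run-to-run variation, which is one reason label-based chromatographic methods are easier to compare across samples than separate gels.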

    [Figure 2 schematic. Labels recovered from the image: upper flow chart (drug discovery pipeline) including target discovery; lead ID and optimization; filing, sales and marketing. Proteomics workflow: preparation for analysis; sample processing; data analysis; separation; fractionation; quantitation; identification; data capture; data management and analysis.]

    Figure 2. Workflow for large-scale proteomics approach for target discovery within a pharmaceutical setting

    Although the schematic implies that proteomics is applied only in target discovery, the platform can also be used for all additional parts of the traditional drug discovery pipeline (the uppermost flow chart) except the filing and subsequent sales and marketing components. Of note is the increasing use of proteomics in the toxicology aspects of pre-clinical drug development.


    Biological material: a variable start to a constant process

    To begin the discovery process, samples must first be acquired or generated in-house; this is the point at which the variability of biological material could potentially confound experimental analysis, and it therefore demands careful experimental design.

    In a large-scale proteomics factory, the process design should account for this variability by minimizing all controllable variables once the sample has entered the factory process (Fig. 2, Table 1). When established cell lines are used, all elements of sample preparation can be controlled, from growth medium through culture conditions to cell fractionation. Additional data can be collected from the cell line that are useful in making the most informed and instructive comparisons. For example, part of the sample preparation process could include the evaluation of cell lines for their rates of apoptosis and DNA synthesis, among other measures of physiology (Table 1). Applying such standardized culture conditions minimizes extraneous differences in cell line comparisons.

    For human clinical material, sample-collection specification can play only a limited role, because such materials are usually obtained as an adjunct to a necessary medical procedure. Collection of biofluids, particularly serum, is considerably simpler to control because the collection procedure is non-invasive and relatively routine. However, diverse elements ranging from the collection vessel to the posture of the patient, and even the rank order of sample draw in a multiple-tube phlebotomy, can affect the quality and protein content of serum [12,13]. In addition, the elapsed time before centrifugation, the storage temperature, and the serum thawing method have all been shown to play a role in the reproducibility of clinical chemistry profiles. Attention to these seemingly trivial but known issues can be crucial in the evaluation of proteomic data from serum.

    Human (as well as animal) tissue provides an additional series of challenges to the proteomic researcher, particularly those analyzing a large number of samples. In addition to the obvious variability in tissue harvesting procedures, and inherent patient variability, it is important to recognize that tissue is generally heterogeneous, comprising many different cell types. In some cases, the cell type under study will comprise only a minor portion of the tissue, and would therefore be difficult to isolate in sufficient quantities without resorting to mechanical disaggregation and/or enzymatic digestion of extracellular matrix and adhesion molecules. Both procedures result in some degree of cell death and damage; methods must therefore account for these effects, and quality standards need to be set for cell preparations derived from disaggregated tissue (Table 1).
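
    One way to make such quality standards operational is to record a few measurements per cell preparation and reject preparations that fall outside preset limits before they enter the process. The Python sketch below is a minimal illustration under stated assumptions: the record fields, the threshold values and the function name qc_failures are hypothetical and are not taken from the authors' QA/QC procedures.

```python
from dataclasses import dataclass


@dataclass
class CellPrepQC:
    """QC measurements for one disaggregated-tissue cell preparation (hypothetical fields)."""
    sample_id: str
    viability: float       # fraction of viable cells, 0 to 1
    apoptotic_rate: float  # fraction of apoptotic cells, 0 to 1
    target_purity: float   # fraction of cells positive for the target-cell marker


# Hypothetical acceptance limits; real limits would be set per tissue and protocol.
MIN_VIABILITY = 0.80
MAX_APOPTOTIC_RATE = 0.10
MIN_TARGET_PURITY = 0.90


def qc_failures(prep):
    """Return the list of reasons a preparation fails QC (empty list means it passes)."""
    reasons = []
    if prep.viability < MIN_VIABILITY:
        reasons.append("viability below limit")
    if prep.apoptotic_rate > MAX_APOPTOTIC_RATE:
        reasons.append("apoptotic rate above limit")
    if prep.target_purity < MIN_TARGET_PURITY:
        reasons.append("target-cell purity below limit")
    return reasons


preps = [
    CellPrepQC("prep-001", viability=0.93, apoptotic_rate=0.04, target_purity=0.95),
    CellPrepQC("prep-002", viability=0.71, apoptotic_rate=0.18, target_purity=0.92),
]
for prep in preps:
    failures = qc_failures(prep)
    status = "pass" if not failures else "fail: " + "; ".join(failures)
    print(f"{prep.sample_id}: {status}")
```

    Recording the reasons for rejection, and not only the pass or fail outcome, keeps the QC data available for later correlation with the proteomic results.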

    Table 1. Summary of processes used at each stage of protein-based target discovery and QA/QC approaches used to monitor these processes [a]

    Stage headings: Preparation for analysis; Sample processing; Data analysis

    Samples (first column, no sub-heading in the source)
        Processes: In vivo samples (human tissues; human fluids; model organisms, tissues and fluids; xenograft tissues). In vitro samples (cultured cells; conditioned media).
        Monitoring QA/QC: Flow cytometry for markers; apoptotic rate; proliferative rate.

    Separation
        Processes: Dissociation into desired cell type (e.g. flow cytometry, LCM); subcellular fractionation.
        Monitoring QA/QC: Flow cytometry for markers; markers for subcellular organelles.

    Fractionation
        Processes: Enrichment for proteins of specific classes; reduction of protein mixture complexity through separation; enzymatic digestion of protein mixture, with or without reduction of peptide complexity.
        Monitoring QA/QC: Protein quantitation; data and protein qualitative data analysis; chromatographic data analysis.

    Quantitation
        Processes: Quantitative protein analysis by 2D gel electrophoresis or LC-MS using ICAT reagents.
        Monitoring QA/QC: Image analysis; chromatographic data analysis.

    Identification
        Processes: 2D gel electrophoresis spot excision and enzymatic digestion, followed by MS; or online LC-MS-MS for simultaneous quantitation and identification, or quantitation followed by identification.
        Monitoring QA/QC: MS instrument calibration; chromatographic data analysis.

    Data capture
        Processes: Sample data; biological data; fractionation data; MS data; QA and/or QC data; pipeline (data capture) software.
        Monitoring QA/QC: Data integrity; LIMS systems.

    Data management and analysis
        Processes: Project management; correlation of data obtained with clinical data and experimental state; candidates for evaluation.
        Monitoring QA/QC: LIMS systems; data review; confirmation of results.

    [a] Abbreviations: ICAT, isotope-coded affinity tag; LCM, laser capture microscopy; LC-MS, liquid chromatography-mass spectrometry; LC-MS-MS, liquid chromatography-tandem mass spectrometry; LIMS, Laboratory Information Management System; MS, mass spectrometry.


    Laser capture microscopy (LCM) has been used to isolate specific cells from sections but yields only a very small number of cells. mRNA-expression level analysis of such material has been successful, but only limited proteomic studies have been published [14–16]. Cell separation using fluorescently or magnetically tagged antibodies to obtain only the desired population of cells is a powerful tool that enables a focused study and, in addition, can generate sufficient material for the proteomic analysis. However, each enrichment step involved in isolating the desired population raises the question: What have I lost? In the proteomic analysis of tumor disaggregation, sorting for the cancer cells alone will enable a direct comparison with normal cells of the same type from a non-m...