
Protein and Peptide Mass Spectrometry in Drug Discovery (Gross/Protein Mass Spec Drug Discovery) || Target Proteins: Bottom-Up and Top-Down Proteomics


CHAPTER 3

Target Proteins: Bottom-up and Top-down Proteomics

MICHAEL BOYNE and RON BOSE

3.1 MASS SPECTRAL APPROACHES TO TARGETED PROTEIN IDENTIFICATION

The ability to use mass as a feature to identify proteins and peptides has undergone a revolution over the past 20 years. The development of electrospray ionization (ESI) [1], matrix-assisted laser desorption/ionization (MALDI) [2,3], and related methods (see Chapter 1 by Coffee-Rodriguez, Zhang, Miao, and Chen in this volume) in the late 1980s made direct measurement of the masses of proteins and peptides routinely possible. Mass spectrometers with increasing mass accuracy, higher sensitivity, and faster duty cycles, coupled to protein and peptide separation techniques, have produced a number of highly sophisticated approaches for the identification and characterization of proteins. Mass spectrometry (MS) has become the dominant analytical tool for identifying proteins, whether in purified form or within a complex mixture, displacing Edman sequencing owing to the greater sensitivity (sub-femtomole amounts of material) and throughput (100–1000 proteins/day) of MS techniques.

The two main mass spectral approaches to proteomics and targeted protein identification are termed “bottom-up” and “top-down” [4,5]. Bottom-up commonly refers to exhaustive proteolytic digestion of proteins, either after pre-fractionation (commonly by one- or two-dimensional polyacrylamide gel electrophoresis [6,7]) or en masse in a whole-cell extract (shotgun digestion) [6,8], followed by mass measurement of the peptides in the resulting mixture (Figure 3.1). Top-down proteomics omits the proteolytic digestion step, focusing instead on sequence coverage (routinely 100%) and complete protein characterization to tease out more biologically relevant information by capturing the full complexity of the array of protein forms arising from a single gene (Figure 3.1) [9]. The appropriate technique depends on the capabilities of the available mass spectrometer, the limitations of each approach, and ultimately the biological question to be answered.

Protein and Peptide Mass Spectrometry in Drug Discovery, Edited by Michael L. Gross, Guodong Chen, and Birendra N. Pramanik. © 2012 John Wiley & Sons, Inc. Published 2012 by John Wiley & Sons, Inc.

3.2 BOTTOM-UP PROTEOMICS

Bottom-up proteomics is predicated on the generation of 1- to 3-kDa peptides from the protein in question. These peptides can easily be measured in many types of mass spectrometers, including ion traps, triple quadrupoles, time-of-flight (TOF), and Fourier-transform instruments. Most often the sequence-specific endopeptidase trypsin, which cleaves on the C-terminal side of lysine and arginine residues, is employed to generate peptides of the desired size. These peptides are then identified, and the identifications are used to infer the precursor proteins. Peptide mass fingerprinting, GeLC-MS/MS, and shotgun digestion are the three most widely used experimental approaches in bottom-up protein identification.
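The trypsin cleavage rule lends itself to a short in silico digestion, the same step a search engine performs on a sequence database. The sketch below is a minimal Python illustration, assuming the commonly used simplification that trypsin cuts after K or R unless the next residue is proline, and that no cleavages are missed; the function name `tryptic_digest` is hypothetical.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Return fully cleaved tryptic peptides (no missed cleavages).

    Assumed rule: cut on the C-terminal side of K or R, except when the
    next residue is P.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A C-terminal K/R produces an empty final piece; drop it.
    return [p for p in pieces if p]


if __name__ == "__main__":
    # N-terminal stretch of the BSA sequence shown in Figure 3.2B.
    print(tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"))
    # → ['MK', 'WVTFISLLLLFSSAYSR', 'GVFR', 'R']
```

A regular-expression split keeps the rule in one place; a real search engine would also enumerate missed cleavages and apply length or mass filters to keep peptides in the useful 1- to 3-kDa range.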

[Figure 3.1 schematic. Bottom Up: Cellular Proteins, 2D Gel (Charge, MW) or Shotgun Digest, Separate, MS/MS (5–90%). Top Down: Extract, Fractionate, MS, MS/MS (~100%).]

FIGURE 3.1 Comparison of bottom-up and top-down approaches to protein identification. Bottom-up approaches use proteolytic digestions to create a mixture of peptides, which are then introduced to the mass spectrometer. Protein identification is inferred from two or more confident peptide identifications. Top-down approaches differ in that proteins are fractionated and then introduced into the MS instrument while still intact. Subsequent fragmentation produces a series of fragment masses, which are then used in combination with the intact mass to identify the protein.


3.2.1 Peptide Mass Fingerprinting

Peptide mass fingerprinting (PMF) is a protein identification process whereby a single protein or a simple mixture of proteins is digested into peptides whose absolute masses are subsequently measured in a mass spectrometer, typically an ESI-TOF, ESI-trap, ESI-FTMS, or MALDI-TOF instrument. Developed in the early 1990s, this method derives its analytical identification power from the generated list of peptide masses. PMF is based on the idea that each protein in an organism will generate, upon proteolytic digestion, a unique set of peptides whose masses provide a molecular signature (i.e., a fingerprint) identifying that particular protein. In practice, these peptide masses are searched against a protein or translated sequence database that has been digested in silico (in a computer) according to the known specificity of the digesting enzyme. Proteins are identified based on the number of observed peptide masses that match the theoretically generated list within a mass tolerance.
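The matching step can be sketched as follows. This is a simplified illustration, assuming unmodified cysteines, singly protonated [M+H]+ ions (as typically observed by MALDI-TOF), monoisotopic residue masses, and a raw count of matched masses as the score; production tools such as Mascot use probability-based scores instead. The names `mh_plus`, `count_matches`, and `rank_proteins`, and the toy two-protein database, are hypothetical.

```python
# Monoisotopic residue masses in Da (standard 20 amino acids, unmodified).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic mass [M+H]+ of a peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed: list[float], peptides: list[str], tol_ppm: float) -> int:
    """Number of observed masses within tol_ppm of any in silico peptide."""
    theoretical = [mh_plus(p) for p in peptides]
    return sum(
        1
        for m in observed
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


def rank_proteins(observed, database, tol_ppm=50.0):
    """Sort proteins by matched-mass count, best first."""
    scores = {name: count_matches(observed, peps, tol_ppm) for name, peps in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical in silico digests for two toy proteins.
    db = {"protein_1": ["MK", "WVTFISLLLLFSSAYSR", "GVFR"], "protein_2": ["AAK", "GGR"]}
    spectrum = [mh_plus("WVTFISLLLLFSSAYSR") + 0.01, mh_plus("GVFR") - 0.005, 1234.5]
    print(rank_proteins(spectrum, db))  # → [('protein_1', 2), ('protein_2', 0)]
```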

In a typical experiment, a single protein or protein mixture is loaded onto an SDS-PAGE or 2D gel and resolved. An individual band or spot is excised from the gel, and the isolated protein is reduced in the gel to break disulfide bonds and alkylated to prevent the cysteines from re-forming disulfide crosslinks between peptides. The protein is then digested for 12 to 18 hours (50:1 protein-to-enzyme ratio), after which the peptides are extracted from the gel with acetonitrile and evaporated to dryness. The dried peptides are subsequently resuspended and desalted to prepare them for MS analysis.

This method's advantages are speed and low performance requirements for the MS instrument, but it often cannot handle mixtures of proteins. In the tryptic peptide mass range of 600 to 2500 Da, a relative mass error of 0.2 to 0.3 ppm is needed to determine the amino-acid composition with 99% confidence [10]. This performance is beyond the capabilities of all but the most expensive, highest resolving power instruments and restricts PMF to only the simplest protein mixtures. Database search algorithms that support this technique are available from Mascot (http://www.matrixscience.com) and ProFound (http://prowl.rockefeller.edu/), among others.
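To put the 0.2- to 0.3-ppm figure in perspective, the relative (ppm) error and its absolute equivalent in daltons can be computed directly. This minimal Python sketch uses the standard parts-per-million definition; the function names are hypothetical, and the lysine/glutamine comparison simply uses their monoisotopic residue masses (128.09496 and 128.05858 Da).

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ppm_to_da(mass: float, ppm: float) -> float:
    """Absolute mass window (Da) corresponding to a ppm tolerance at a given mass."""
    return mass * ppm / 1e6


if __name__ == "__main__":
    # A 0.3-ppm error at 1500 Da is less than half a millidalton.
    print(f"{ppm_to_da(1500.0, 0.3):.5f} Da")  # → 0.00045 Da
    # Swapping one Q for one K shifts a peptide by ~0.036 Da,
    # i.e. about 24 ppm at 1500 Da.
    k_minus_q = 128.09496 - 128.05858
    print(f"{ppm_error(1500.0 + k_minus_q, 1500.0):.1f} ppm")
```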

Figure 3.2A shows a PMF analysis, collected on a MALDI-TOF instrument, of bovine serum albumin (BSA), which we commonly use as a positive control. The resulting mass peak list from the TOF spectrum was searched with Mascot, and the only significant identification (by Mowse score [11]) was BSA. Sheep serum albumin was the next closest match but did not reach the confidence threshold. Approximately 80% sequence coverage was obtained from the matching peptide masses (Figure 3.2B).
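Sequence coverage, as quoted here and for Figure 3.4, is the fraction of residues in the protein that fall within at least one identified peptide. A minimal Python sketch, assuming exact string matching (so isobaric I/L substitutions and modified residues are not handled); `sequence_coverage` is a hypothetical helper name.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of protein residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    # N-terminal stretch of the BSA sequence shown in Figure 3.2B.
    bsa_start = "MKWVTFISLLLLFSSAYSRGVFRR"
    print(f"{sequence_coverage(bsa_start, ['WVTFISLLLLFSSAYSR', 'GVFR']):.0%}")
```

Marking residues in a boolean array, rather than summing peptide lengths, keeps overlapping peptides (for example, those arising from missed cleavages) from being counted twice.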

Tandem MS instruments, together with computer algorithms that automate the interpretation of peptide product-ion (MS/MS) spectra, have largely replaced this technique. As this example shows, however, selective and careful use of PMF makes it an adequate approach to protein identification, particularly when speed and ease of sample preparation are of utmost importance or when tandem MS is not available.

3.2.2 Bottom-up Proteomics Using Tandem MS: GeLC-MS/MS and Shotgun Digests

The evolution of sensitive ion traps and fast hybrid instruments helped tandem MS experiments replace peptide mass fingerprinting as the default technique for protein identification. In a tandem MS experiment, a precursor scan is acquired, and all of the peptide masses in the spectrum are recorded. Individual peptide ion populations are then isolated and fragmented within the mass spectrometer, and a second mass scan is taken to measure the m/z values of the resulting fragments. Threshold dissociation of peptides (collision-induced or collisionally activated dissociation, CID or CAD; infrared multiphoton dissociation, IRMPD) yields predominantly b- and y-ions, whereas electron-based dissociation (electron capture dissociation, ECD, or electron transfer dissociation, ETD) yields predominantly c- and z-ions (Figure 3.3). These two sets of data (the precursor mass and its corresponding fragment-ion masses) can then be used to identify a peptide, and thereby a protein, with greater confidence.

[Figure 3.2 panels: (A) MALDI-TOF mass spectrum (x-axis m/z, roughly 900 to 2600); (B) BSA sequence coverage map]

FIGURE 3.2 Peptide fingerprint mapping of bovine serum albumin. (A) MALDI-TOF spectrum of digested, desalted BSA. (B) Sequence coverage map after a Mascot search against the SwissProt "other mammalia" database. The underlined text highlights the observed sequence coverage (~80%). (See the color version of this figure in Color Plates section.)

[Figure 3.3 schematic: Fragmentation Nomenclature. Peptide backbone from N-terminus to C-terminus, with b/y cleavage labeled Collisional- and Photo-Dissociation and c/z cleavage labeled Electron Capture Dissociation.]

FIGURE 3.3 Fragmentation of peptides. The Roepstorff nomenclature [27] of peptide fragment ions is shown, with b- and c-ions representing fragments on the N-terminal side of the fragmented bond and y- and z-ions representing those on the C-terminal side. Cleavage of the Cα–C(carbonyl) bond would produce a- and x-ions, but these are rarely used in peptide identification.
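The ion series in Figure 3.3 follow directly from residue masses. The Python sketch below computes singly charged b, y, c, and z• ladders for an unmodified peptide. It is an illustration only, assuming monoisotopic masses, the common conventions c = b + NH3 and z• = y − NH3 + H, and no neutral losses; `fragment_ladders` is a hypothetical name.

```python
# Monoisotopic residue masses in Da (unmodified residues).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549
H_ATOM = 1.007825


def fragment_ladders(peptide: str) -> dict[str, list[float]]:
    """Singly charged fragment m/z values; list index k-1 holds ion number k."""
    masses = [RESIDUE[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:k]) + PROTON for k in range(1, n)]              # N-terminal
    y = [sum(masses[n - k:]) + WATER + PROTON for k in range(1, n)]  # C-terminal
    return {
        "b": b,
        "y": y,
        "c": [m + AMMONIA for m in b],               # b + NH3 (ECD/ETD)
        "z_dot": [m - AMMONIA + H_ATOM for m in y],  # y - NH3 + H (ECD/ETD)
    }


if __name__ == "__main__":
    ions = fragment_ladders("PEPTIDE")
    for series in ("b", "y"):
        print(series, [round(m, 3) for m in ions[series]])
```

A useful sanity check is complementarity: b_k + y_(n−k) equals the neutral peptide mass plus two protons for every cleavage site.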

These complex product-ion (MS/MS) spectra can be interpreted in three ways: (1) by statistical matching of observed fragment ions to fragment-ion spectra or masses predicted from a genomic database (e.g., Sequest, Mascot, OMSSA); (2) by de novo sequencing, which relies on identifying a ladder of sequential fragment ions and deducing the amino acid residue masses from it; or (3) by the sequence tag approach, in which a short stretch of amino acid sequence obtained by de novo sequencing is combined with the mass of the peptide and the location of the “tag” within the peptide to conduct a database search. In practice, the first approach, statistically matching observed fragmentation data to predicted data, is the dominant methodology in bottom-up proteomics, given the extensive amount of genome sequence information now publicly available. The statistical considerations involved in these database search approaches are discussed in detail in Chapter 9. The advantages of this approach include its speed and relatively high throughput, as well as the standardization and robustness of the methods involved.
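At its simplest, approach (1) asks how many observed fragment m/z values agree with each candidate peptide's predicted fragments. The sketch below uses a bare shared-peak count, assuming predicted fragment lists are already available (for example, from a b/y ladder calculation) and a fixed tolerance in Da. It is only a stand-in for the cross-correlation (Sequest) and probability-based (Mascot) scores that real search engines compute; the function names and toy data are hypothetical.

```python
def shared_peaks(observed: list[float], predicted: list[float], tol_da: float) -> int:
    """Count observed fragment m/z values within tol_da of any predicted one."""
    return sum(1 for mz in observed if any(abs(mz - p) <= tol_da for p in predicted))


def rank_candidates(observed, candidates, tol_da=0.5):
    """Return (peptide, shared-peak count) pairs, best match first."""
    scored = [(pep, shared_peaks(observed, frags, tol_da)) for pep, frags in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    spectrum = [175.12, 262.15, 333.19, 432.26]  # hypothetical product-ion m/z values
    candidates = {
        "candidate_A": [175.119, 262.151, 333.188, 446.272],
        "candidate_B": [147.113, 248.160, 361.244],
    }
    print(rank_candidates(spectrum, candidates))  # → [('candidate_A', 3), ('candidate_B', 0)]
```

Raw counts favor long peptides with many predicted fragments, which is one reason practical scoring schemes normalize against chance matches.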

3.2.3 GeLC-MS/MS

In a GeLC-MS/MS experiment, the protein or proteins of interest are separated on an SDS-PAGE gel, and individual protein bands or gel slices are digested with trypsin, typically after a reduction and alkylation step. The resulting peptides are most often analyzed by liquid chromatography coupled to a mass analyzer via ESI (LC-MS/MS). The masses of the eluting peptides are measured with at least unit resolution, followed by a fragmentation step in which ions are activated by threshold dissociation techniques; the resulting fragment ions are also measured at unit resolution, in an automated, data-dependent manner. Usually, ion traps or hybrid time-of-flight instruments are used to collect tandem mass spectral data. These data are then searched against curated sequence databases created from the target organism's genome sequence to retrieve a list of observed peptides and thereby a list of identified proteins.
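Data-dependent acquisition is commonly implemented as a "top-N" loop: after each survey scan, the most intense precursors are selected for MS/MS, skipping ions that were recently fragmented (dynamic exclusion). The Python sketch below assumes each survey peak is an (m/z, intensity, charge) tuple and, as many methods do, skips singly charged ions; the function name, tolerance, and data are hypothetical, and real instrument control software exposes these choices as vendor-specific settings.

```python
def select_precursors(survey, top_n, excluded, tol_da=0.01):
    """Pick up to top_n precursor m/z values for MS/MS from one survey scan.

    survey:   list of (mz, intensity, charge) tuples
    excluded: m/z values currently on the dynamic-exclusion list
    """
    chosen = []
    for mz, _intensity, charge in sorted(survey, key=lambda p: p[1], reverse=True):
        if charge < 2:
            continue  # skip singly charged (or unassigned) ions
        if any(abs(mz - ex) <= tol_da for ex in excluded):
            continue  # fragmented recently; give other ions a chance
        chosen.append(mz)
        if len(chosen) == top_n:
            break
    return chosen


if __name__ == "__main__":
    survey = [(500.27, 1e5, 2), (600.31, 5e5, 1), (700.35, 3e5, 2), (801.40, 2e5, 3)]
    print(select_precursors(survey, top_n=2, excluded=[700.35]))  # → [801.4, 500.27]
```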

GeLC-MS/MS approaches are particularly suited to targeted protein analysis. The gel matrix traps the proteins of interest while detergents and other buffer contaminants are removed from the sample, eliminating a common limitation in MS analysis. Moreover, the gel acts as a stage of fractionation at the protein level, which can greatly reduce sample complexity and thereby aid the MS analysis.

Many robust methods are available in core proteomic facilities that routinely perform this experiment, but a GeLC-MS/MS experiment has a few limitations. A GeLC-MS/MS approach is often less sensitive than other bottom-up strategies, typically requiring enough material (>50 ng protein) to be visualized by Coomassie Blue staining. Although it is possible to analyze smaller quantities of material that can be visualized with silver or SYPRO Ruby staining, the chances of success are greatly diminished owing to incomplete extraction and the sensitivity limits of the instruments. GeLC-MS/MS is also time-consuming. Each band must be excised, reduced, alkylated, washed, swelled with trypsin, digested, and extracted prior to MS analysis. The published versions of this protocol take a minimum of 4 h and can take up to 24 h [12], depending on the number of bands to excise and the length of digestion employed. The multiple steps and long duration also increase the likelihood of keratin contamination, which confounds analysis and may mask low-abundance species. Even with all of these caveats, GeLC-MS/MS is often the fastest way for nonexperts to confidently identify a targeted protein.

As an example, Figure 3.4A shows results for a recombinant construct of the Her4 kinase domain (~40 kDa) that was analyzed by GeLC-MS/MS. Panel A shows a base peak chromatogram, in which the signal intensity of the most abundant species in the mass spectrometer is plotted against time; it is a measure of chromatographic performance and instrument sensitivity. In total, around 40% of the construct was sequenced using this method.

3.2.4 Shotgun Digest

For targeted protein analysis, the gel separation step may be unnecessary, particularly for recombinant or immunoprecipitated proteins. In a shotgun digest approach, the target protein is digested along with any and all contaminating proteins, and the resulting peptide mixture is loaded onto a reverse-phase column coupled to a mass spectrometer. The eluting peptides are analyzed as described above. This setup provides greater sensitivity and often improves sequence coverage because there is no gel extraction step. In Figure 3.4B, the same recombinant construct of the Her4 kinase domain was shotgun digested and analyzed by LC-MS/MS. Panel B shows a base peak chromatogram. In total, around 60% of the construct was sequenced using this method, demonstrating the enhanced sensitivity of this approach (Figure 3.4).

[Figure 3.4 panels: (A) GeLC-MS/MS and (B) shotgun digest; each shows a base peak chromatogram (x-axis time, 20 to 60 min) and the Her4 kinase domain construct sequence with observed coverage]

FIGURE 3.4 GeLC-MS/MS versus a shotgun digest. Recombinant Her4 kinase domain was analyzed by GeLC-MS/MS (panel A) and shotgun digest (panel B) approaches. The greater sequence coverage and denser base peak chromatogram emphasize the enhanced sensitivity of a shotgun approach (~60% vs. 40% sequence coverage). (See the color version of this figure in Color Plates section.)

If the target protein is isolated in a mixture, however, the increased complexity of the sample may offset the gains in sensitivity and may require more extensive fractionation. MudPIT (Multidimensional Protein Identification Technology) is a multiphase chromatography-based approach to protein identification in which two orthogonal stationary phases (usually strong cation exchange and reverse phase) are used to enhance binding capacity, increasing both the resolution of the separation and the sensitivity of the analysis. A mixture of proteins is precipitated, and the pellet is washed and then digested, typically with trypsin. After digestion, the entire mixture is loaded onto the first-dimension column, which is then sequentially eluted with increasing amounts of salt. After each salt “bump,” the eluted peptides are separated by reverse-phase chromatography coupled directly to a mass spectrometer. In general, this setup has the highest sensitivity and greatest dynamic range but also demands the most instrument time and expertise. For targeted proteomics, a MudPIT-based approach may not be warranted except in certain cases, such as the isolation and identification of members of protein complexes.

Despite the success, growth, and expansion of mass spectrometry-based approaches, laboratories employing these methods should exercise great technical care and a cautious scientific approach to both the methodological and broader conceptual issues involved. Major protein contaminants such as keratins, which are commonly introduced by the operator during sample handling, and chemical contaminants, including plasticizers, detergents, and trypsin autolysis products, can obscure the spectra of the proteins to be characterized. Limitations of the bottom-up approach include mixture complexity, a stochastic data-acquisition process (discussed below), and the problems of false negatives (the peptide is detected, but no identification can be made) and false positives (the assigned peptide sequence or protein identification is incorrect).

Proteolytic digestion dramatically increases the complexity of a protein mixture. The median size of a tryptic peptide is 10 amino-acid residues, so a single 40-kDa protein can give rise to over 40 possible tryptic peptides. When missed cleavage events and protein modifications are considered, the peptide mixture arising from a single protein is even more complex. On a proteomic scale, an in silico digest of the human proteome database from the International Protein Index (May 2006), containing 58,099 protein sequences, produces 15.5 million theoretical peptides. Even after the two dimensions of separation commonly used in bottom-up experiments, the sheer number of peptides eluting during an LC-MS/MS experiment overwhelms all currently available mass spectrometers [13] and leads to a stochastic data-acquisition process [14,15], in which multiple injections of the same sample overlap by only around 30% in the peptides identified.
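The effect of missed cleavages on complexity is easy to quantify. If a protein yields n fully cleaved tryptic fragments and up to m consecutive missed cleavages are allowed, each peptide is a run of k + 1 adjacent fragments (k = 0, ..., m), giving n + (n − 1) + ... + (n − m) candidate peptides. The Python sketch below computes that count and cross-checks it by enumeration; the names are hypothetical, and it assumes every fragment boundary can be missed while ignoring duplicate sequences and modifications.

```python
def count_peptides(n_fragments: int, max_missed: int) -> int:
    """Closed-form count of peptides with up to max_missed missed cleavages."""
    return sum(max(n_fragments - k, 0) for k in range(max_missed + 1))


def enumerate_peptides(fragments: list[str], max_missed: int) -> list[str]:
    """Join runs of 1..max_missed+1 adjacent fully cleaved fragments."""
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[i : j + 1]))
    return peptides


if __name__ == "__main__":
    # Roughly the 40-fragment protein described in the text.
    for missed in range(3):
        print(missed, count_peptides(40, missed))  # 40, 79, 117
```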


Other significant limitations exist in a shotgun digestion analysis, particularly in the context of a targeted protein. The sequence coverage (i.e., the fraction of an identified protein's sequence actually detected) is usually less than 100%, limiting the ability to detect and characterize post-translational modifications, such as phosphorylation, or biological variation, such as coding SNPs. Moreover, shotgun digestion methods turn an already complex mixture of proteins [16] into an even more complex mixture of peptides.

3.3 TOP-DOWN APPROACHES

Traditionally, top-down experiments involve high resolving power (>50,000), accurate measurement of the intact protein mass, followed by its isolation and fragmentation within the mass spectrometer [17]. The fragmentation data are then used to retrieve the correct protein isoform from an annotated database of predicted protein forms [18]. In cases where a mass discrepancy (Δm) is observed between the predicted and observed forms, the MS/MS data can localize the Δm to a specific region (or even a single amino acid) of the protein. This data analysis logic, the use of high resolving power, accurate-mass MS and MS/MS, becomes increasingly important for multicellular eukaryotes, in which many protein-modifying events (nonsynonymous coding polymorphisms, alternative splicing, post-translational modifications, etc.) can cause the mass of a protein to differ from that predicted by the gene sequence. By incorporating known and predicted modifying events into a single organism database, one can greatly reduce the difficulty of identifying, characterizing, and distinguishing multiple protein isoforms from one another; these isoforms would otherwise be collapsed into a single protein identification in a bottom-up experiment.
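The localization logic can be sketched for a b-ion series. If the observed b ions match the predicted ones up to b_i and are offset by a constant Δm from b_j onward, the modification must lie on residues i + 1 through j. The Python sketch below assumes singly charged b ions, a fixed tolerance in Da, and a single mass shift; the function name and toy numbers are hypothetical, and this is not the algorithm used by any particular search tool.

```python
def localize_mass_shift(predicted_b, observed_b, tol_da=0.02):
    """Localize a single mass shift from a b-ion ladder.

    predicted_b: theoretical b-ion m/z values; index k-1 holds b_k
    observed_b:  dict mapping ion number k -> observed b_k m/z (gaps allowed)
    Returns (delta_m, (first_residue, last_residue)) with 1-based residue
    positions, or None if no single consistent shift is found.
    """
    last_unshifted = 0  # b_0 (no residues) is trivially unshifted
    first_shifted = None
    shifts = []
    for k in sorted(observed_b):
        delta = observed_b[k] - predicted_b[k - 1]
        if abs(delta) <= tol_da:
            if first_shifted is not None:
                return None  # unshifted ion after a shifted one: not one shift
            last_unshifted = k
        else:
            if first_shifted is None:
                first_shifted = k
            shifts.append(delta)
    if first_shifted is None:
        return None  # observed ions agree with the prediction
    delta_m = sum(shifts) / len(shifts)
    if any(abs(d - delta_m) > tol_da for d in shifts):
        return None  # shifted ions disagree with one another
    # The modified residue lies after b_last_unshifted and by b_first_shifted.
    return delta_m, (last_unshifted + 1, first_shifted)


if __name__ == "__main__":
    predicted = [100.0, 200.0, 300.0, 400.0, 500.0]  # toy b-ion ladder
    observed = {1: 100.0, 2: 200.005, 4: 442.01, 5: 542.0}
    delta_m, (start, end) = localize_mass_shift(predicted, observed)
    print(f"delta_m = {delta_m:.3f} Da on residues {start}-{end}")
```

Gaps in the observed ladder widen the localized span, which is why dense fragmentation coverage matters for pinning a modification to a single residue.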

Top-down approaches are not nearly as widespread as bottom-up approaches, owing to the lack of available software, the difficulty of obtaining robust, automated MS/MS fragmentation of proteins, and the higher cost and lower availability of high resolving power mass spectrometers. For protein targets below 60 kDa that are highly modified or that contain other biological variations, however, the potential 100% sequence coverage of top-down often yields information that bottom-up misses.

In a typical top-down experiment, a mixture of proteins is fractionated using one or more dimensions of separation, and individual fractions are then desalted and infused directly into the mass spectrometer. Although online approaches to top-down are available [9], these methods are limited to the highest performing mass spectrometers and often require nonstandard methods to ensure detection of targeted species. An offline approach to top-down is highlighted in Figure 3.5, which demonstrates the significant differences between top-down and bottom-up. Multiple scans are typically summed to reach the signal-to-noise level necessary to obtain the high-quality isotopic distributions needed for accurate mass measurement. This also means that a single MS/MS experiment takes much longer than a corresponding bottom-up experiment. For targeted species, this time difference is often inconsequential, but for proteome-wide studies, the decrease in speed is often prohibitive. The ability to sum scans, however, allows the collection of robust MS/MS data, which improves identification confidence.
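Why summing scans helps can be shown with a toy simulation. When noise is random and independent from scan to scan, the summed signal grows in proportion to the number of scans N while the noise grows only as √N, so S/N improves roughly as √N. The Python sketch below assumes Gaussian noise and a single fixed-intensity peak, which real spectra only approximate; all names and parameters are hypothetical.

```python
import random
import statistics


def summed_snr(n_scans: int, n_points: int = 2000, signal: float = 5.0,
               noise_sd: float = 1.0, seed: int = 0) -> float:
    """S/N of a peak at index 0 after summing n_scans noisy scans."""
    rng = random.Random(seed)
    total = [0.0] * n_points
    for _ in range(n_scans):
        for i in range(n_points):
            total[i] += rng.gauss(0.0, noise_sd)
        total[0] += signal  # the same peak appears in every scan
    noise = statistics.pstdev(total[1:])  # noise estimated from peak-free points
    return total[0] / noise


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(summed_snr(n), 1))
```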

Figure 3.5 shows the identification and characterization of brain acid-soluble protein 1 (BASP1, P80723) from a HeLa cell lysate. The component corresponding to the peak shown in red was isolated and accumulated in the mass spectrometer, and fragmentation data were collected. The resulting intact mass of 22759.3 Da and the corresponding fragment-ion list were searched with ProSightPC. The identified protein was myristoylated, and fragment ions partially localized the modification to the N-terminus. Bottom-up LC-MS/MS approaches might have identified the protein but would have missed the N-terminal modification, because myristoylation is not a frequently searched modification and a three-residue peptide is often hard to identify, given the scarcity of fragmentation data and its loss during liquid chromatography.

While the common distinction between top-down and bottom-up is the digestion step, the more fundamental difference lies in their approaches to data acquisition and analysis. Bottom-up experiments use fast, sensitive, lower resolution mass spectrometers to increase proteome coverage and measure quantitative dynamics, at the expense of peptide and protein characterization and of confidence in the identifications. Top-down experiments use the highest performing mass spectrometers available, Fourier-transform instruments whose mass resolving power (>50,000) and mass accuracy (routinely <5 ppm) dramatically increase confidence in identifications. When top-down data are combined with well-annotated databases, the output often characterizes biological variation at the same time, greatly clarifying the outcome for researchers who are not mass spectrometrists. This depth of knowledge comes at the expense of throughput and sensitivity. Fourier-transform mass spectrometers are an order of magnitude slower and less sensitive than ion traps and time-of-flight instruments, because they measure a frequency rather than an electron multiplier response, but the gap in speed and sensitivity is narrowing.

FIGURE 3.5 Top-down identification of brain acid soluble protein 1 (BASP1). A single charge state of BASP1 was isolated (inset) and fragmented on a 12 Tesla LTQ-FT. The intact mass and fragmentation data were searched against an annotated human protein database, and BASP1 was identified with high confidence (expectation value 8E-20) and characterized as myristoylated.
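To see why high resolving power matters for intact proteins, consider the isotopic peaks of a protein such as the 22,759-Da BASP1 form above. Adjacent isotopic peaks are about 1.00336 Da apart in mass (the ¹³C–¹²C difference), or 1.00336/z apart in m/z for charge state z, so the resolving power m/Δm needed merely to separate them is roughly M/1.00336, nearly independent of charge. A minimal Python sketch, assuming resolving power is defined as m/Δm for adjacent peaks and ignoring peak shape; clean, baseline-separated isotopic distributions call for values comfortably above this bare minimum. The function names and chosen charge states are hypothetical.

```python
C13_SPACING = 1.003355  # 13C - 12C mass difference, Da
PROTON = 1.007276


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def resolving_power_needed(neutral_mass: float, charge: int) -> float:
    """m/delta-m at which adjacent isotopic peaks (1.003355/z apart) are separated."""
    return mz(neutral_mass, charge) / (C13_SPACING / charge)


if __name__ == "__main__":
    for z in (10, 20, 30):
        print(z, round(mz(22759.3, z), 2), round(resolving_power_needed(22759.3, z)))
```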

3.4 NEXT-GENERATION APPROACHES

The evolution of higher performance hybrid instruments (Q-FTMS, LTQ-FT, LTQ-Orbitrap) has spawned a new generation of data acquisition and data analysis techniques [19,20] that blur the distinction between bottom-up and top-down. The higher mass resolving power and greater mass accuracy that these instruments confer, at both the MS and MS/MS level, allow researchers to identify more proteins, faster and with greater confidence [16].

In PMF experiments, where a list of measured masses is compared against the in silico digests of a given database, the number of masses matching within a given tolerance and the number of masses searched are the key components in obtaining confident scores [21]. Higher mass resolving power and mass accuracy help in two ways. First, more accurate mass measurement eliminates a significant portion of the peptides with the same nominal mass but different amino-acid compositions [22]. Eliminating candidate peptides increases the confidence in an identification while correspondingly decreasing the incidence of false positives [23]. Second, search speed increases because the identification software must consider fewer candidates. In the extreme, accurate mass measurements can be used to discriminate between nonpeptide and peptide signals in the mass spectrometer, since the monoisotopic masses of all peptides must fall within a predictable range of values. By submitting only those species that can arise from peptides, one can exponentially increase the confidence with which a protein is identified while gaining a linear increase in search speed and specificity [24].
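The "predictable range" reflects the mass defect of peptides: because peptides share a broadly similar elemental composition, their monoisotopic masses cluster near multiples of roughly 1.0005 Da per nominal mass unit. The Python sketch below turns that idea into a crude filter. The slope 1.000495 is an approximate average value used here as an assumption, and the fixed ±0.2-Da window is a hypothetical simplification (the real spread of peptide mass defects widens with mass); the function names are hypothetical.

```python
PEPTIDE_MASS_SLOPE = 1.000495  # assumed average peptide mass per nominal mass unit


def mass_defect_residual(mass: float) -> float:
    """Distance (Da) from the nearest point on the assumed peptide mass grid."""
    nearest = round(mass / PEPTIDE_MASS_SLOPE)
    return mass - nearest * PEPTIDE_MASS_SLOPE


def plausible_peptide(mass: float, window_da: float = 0.2) -> bool:
    """True if the mass defect is consistent with a peptide (crude heuristic)."""
    return abs(mass_defect_residual(mass)) <= window_da


if __name__ == "__main__":
    # 2003.10 Da is close to the [M+H]+ of the BSA peptide WVTFISLLLLFSSAYSR;
    # 2002.60 Da falls between peptide mass clusters.
    for m in (2003.10, 2002.60):
        print(m, plausible_peptide(m))
```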

The benefits to data-dependent MS/MS experiments are also significant. Acquiring a high-resolving-power, accurate-mass precursor scan followed by lower-resolution data-dependent MS/MS scans has been shown to provide more identifications at higher confidence levels than traditional low-resolution experiments, with increased characterization rates for post-translational modifications in pull-down experiments [25,26]. This increase in data depth and quality can be attributed to several factors: the exclusive selection of multiply charged peptides for fragmentation, improved identification power when spectrum quality is poor, and the reduction in the number of peptides considered by the search algorithm [25]. Moreover, mass accuracy can be used independently to validate peptide-spectrum matches: the mass deviations of true positives tend to cluster together, whereas those of false positives are distributed evenly across the mass-tolerance window [25].
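The clustering of true-positive mass errors suggests a simple validation step, sketched below in Python. The robust center (median) and width (scaled median absolute deviation, MAD) are an illustrative choice, not the statistics used in [25]. The approach also assumes that correct matches make up the majority of the list, so that the median falls inside their cluster; the window parameters are hypothetical.

```python
"""Sketch: using precursor mass errors to flag doubtful peptide-spectrum matches.

Assumption: the median/MAD window is an illustrative choice, not the
procedure used in [25].
"""
from statistics import median


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def consistent_matches(errors_ppm: list[float], n_mad: float = 3.0,
                       min_width_ppm: float = 1.0) -> list[int]:
    """Indices of matches whose mass error lies inside the true-positive cluster.

    Correct matches share the instrument's calibration offset and cluster
    tightly, so the median locates the cluster and the scaled MAD sets its
    width; random matches spread across the tolerance window fall outside it.
    """
    center = median(errors_ppm)
    mad = median([abs(e - center) for e in errors_ppm])
    # 1.4826 * MAD estimates the standard deviation for normally distributed errors;
    # min_width_ppm keeps the window from collapsing when the MAD is tiny.
    width = max(n_mad * 1.4826 * mad, min_width_ppm)
    return [i for i, e in enumerate(errors_ppm) if abs(e - center) <= width]
```

A match whose error lies far from the cluster is not necessarily wrong, but it is a candidate for closer inspection before it is accepted.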

Even with these increases in performance, targeted protein studies are also benefiting from more focused MS techniques that attempt to minimize the stochastic nature of data-dependent acquisition and direct the mass spectrometer to “look” for the target proteins and their peptides. For data-dependent LC/MS-MS, this means using inclusion lists and some form of quantification (see Chapter 4), where the mass spectrometer is programmed to fragment peptides of a particular m/z (or mass, in some acquisition software) eluting during a particular time window. The precursor mass, fragment-ion masses, and elution time are then used to confirm the identity of the peptide. Combining targeted analysis with quantitative information produces a data set that monitors changes in the target protein during the experiment, which is often the goal of biological studies. In addition to inclusion lists, selected reaction monitoring (SRM) and multiple reaction monitoring (MRM) on triple quadrupole and Q-trap instruments are being developed as focused MS approaches. These techniques move mass spectrometry-based identification into the realm of specific assays, in which a target protein can be both detected and quantified in one experiment; they offer a fast and sensitive alternative when the target protein is known and well described and multiple samples need to be tested.
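Both focused strategies can be illustrated with a small Python sketch that converts a known target peptide into an inclusion-list entry (precursor m/z plus an elution window) and into SRM/MRM transitions (precursor m/z paired with fragment m/z). The sequence, retention time, window, and 10 ppm tolerance are hypothetical; the `InclusionEntry` record does not correspond to any vendor's method format; and selecting singly charged y ions (nomenclature of [27]) whose m/z exceeds the precursor m/z is only a common rule of thumb. Practical transition lists are usually chosen from observed fragment intensities.

```python
"""Sketch: a known target peptide as an inclusion-list entry and SRM transitions.

Hypothetical structure; real instruments use vendor-specific method formats.
"""
from dataclasses import dataclass

PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence: str, charge: int) -> float:
    """m/z of the [M+zH]z+ precursor ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


@dataclass
class InclusionEntry:
    """Hypothetical inclusion-list row: target m/z, charge, elution window (min)."""
    mz: float
    charge: int
    rt_start: float
    rt_end: float

    def matches(self, mz: float, rt: float, tol_ppm: float = 10.0) -> bool:
        """Should a precursor observed at (mz, rt) be selected for fragmentation?"""
        within_mass = abs(mz - self.mz) / self.mz * 1e6 <= tol_ppm
        within_time = self.rt_start <= rt <= self.rt_end
        return within_mass and within_time


def inclusion_entry(sequence: str, charge: int, rt_expected: float,
                    rt_half_window: float = 2.0) -> InclusionEntry:
    """Build an entry centred on the expected retention time."""
    return InclusionEntry(precursor_mz(sequence, charge), charge,
                          rt_expected - rt_half_window, rt_expected + rt_half_window)


def y_ion_mz(sequence: str, n: int) -> float:
    """Singly charged y_n ion: C-terminal n residues plus water plus a proton."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[-n:]) + WATER + PROTON


def srm_transitions(sequence: str, charge: int) -> list[tuple[float, float]]:
    """(precursor m/z, fragment m/z) pairs using y ions heavier than the precursor m/z."""
    prec = precursor_mz(sequence, charge)
    fragments = (y_ion_mz(sequence, n) for n in range(1, len(sequence)))
    return [(prec, frag) for frag in fragments if frag > prec]
```

For PEPTIDEK at charge 2+, the precursor m/z is about 464.73 and the transitions use y4 through y7; the inclusion-list entry accepts a precursor near that m/z only while the peptide is expected to elute.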

This evolution of mass spectrometry techniques means that both bottom-up and top-down approaches for target proteins are being continually optimized. In this chapter we have provided an overview of the most established practices in the field (peptide mass fingerprinting, GeLC-MS/MS, and shotgun digestion for bottom-up analysis, and direct infusion for top-down analysis), while keeping an eye toward the future and the use of increasingly accurate mass data and focused mass spectrometry techniques. Bottom-up and top-down mass spectrometry-based analyses, combined with accurate mass measurement and focused acquisition, will remain valuable tools for identifying and characterizing target proteins.

REFERENCES

1. Fenn, J. B., et al. (1989). Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64–71.
2. Karas, M., Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 Daltons. Anal Chem 60, 2299–2301.
3. Tanaka, K., et al. (1988). Protein and polymer analyses up to m/z 100 000 by laser ionization time-of-flight mass spectrometry. Rapid Commun Mass Spectrom 2, 151–153.
4. Kelleher, N. L. (2004). Top down proteomics. Anal Chem 76, 197A–203A.
5. Kelleher, N. L., et al. (1999). Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry. J Am Chem Soc 121, 806–812.
6. McCormack, A. L., et al. (1997). Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the low-femtomole level. Anal Chem 69, 767–776.
7. O’Farrell, P. H. (1975). High resolution two-dimensional electrophoresis of proteins. J Biol Chem 250, 4007–4021.


8. Washburn, M. P., Wolters, D., Yates, J. R. (2001). Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19, 242–247.
9. Roth, M. J., et al. (2008). “Proteotyping”: Population proteomics of human leukocytes using Top Down mass spectrometry. Anal Chem 80, 2857–2866.
10. Zubarev, R., Mann, M. (2007). On the proper use of mass accuracy in proteomics. Mol Cell Proteomics 6, 377–381.
11. Pappin, D. J. C., Hojrup, P., Bleasby, A. J. (1993). Rapid identification of proteins by peptide-mass fingerprinting. Curr Biol 3, 327–332.
12. Shevchenko, A., et al. (2007). In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc 1, 2856–2860.
13. MacCoss, M. J. (2005). Computational analysis of shotgun proteomics data. Curr Opin Chem Biol 9, 88–94.
14. Elias, J. E., et al. (2005). Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods 2, 667–675.
15. Liu, H., Sadygov, R. G., Yates, J. R. (2004). A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem 76, 4193–4201.
16. Liu, T., et al. (2007). Accurate mass measurements in proteomics. Chem Rev 107, 3621–3653.
17. Meng, F., et al. (2002). Processing complex mixtures of intact proteins for direct analysis by mass spectrometry. Anal Chem 74, 2923–2929.
18. Pesavento, J. J., et al. (2004). Shotgun annotation of histone modifications: A new approach for streamlined characterization of proteins by top down mass spectrometry. J Am Chem Soc 126, 3386–3387.
19. Gorshkov, M. V., Zubarev, R. A. (2005). On the accuracy of polypeptide masses measured in a linear ion trap. Rapid Commun Mass Spectrom 19, 3755–3758.
20. Frank, A. M., et al. (2007). De novo peptide sequencing and identification with precision mass spectrometry. J Proteome Res 6, 114–123.
21. Perkins, D. N., et al. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567.
22. He, F., et al. (2004). Theoretical and experimental prospects for protein identification based solely on accurate mass measurement. J Proteome Res 3, 61–67.
23. Dodds, E. D., et al. (2007). Systematic characterization of high mass accuracy influence on false discovery and probability scoring in peptide mass fingerprinting. Anal Biochem 372, 156–166.
24. Dodds, E. D., et al. (2006). Enhanced peptide mass fingerprinting through high mass accuracy: Exclusion of non-peptide signals based on residual mass. J Proteome Res 5, 1195–1203.
25. Bakalarski, C. E., et al. (2007). The effects of mass accuracy, data acquisition speed, and search algorithm choice on peptide identification rates in phosphoproteomics. Anal Bioanal Chem 389, 1409–1419.
26. Wu, S. L., et al. (2005). Extended range proteomic analysis (ERPA): A new and sensitive LC-MS platform for high sequence coverage of complex proteins with extensive post-translational modifications-comprehensive analysis of beta-casein and epidermal growth factor receptor (EGFR). J Proteome Res 4, 1155–1170.
27. Roepstorff, P., Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.
