CHAPTER 3
Target Proteins: Bottom-up and Top-down Proteomics
MICHAEL BOYNE and RON BOSE
3.1 MASS SPECTRAL APPROACHES TO TARGETED PROTEIN IDENTIFICATION
The ability to use mass as a feature to identify proteins and peptides has undergone a
revolution over the past 20 years. The development of electrospray ionization
(ESI) [1], matrix-assisted laser desorption/ionization (MALDI) [2,3], and related
methods (see Chapter 1 by Coffee-Rodriguez, Zhang, Miao, and Chen in this volume)
in the late 1980s made direct measurement of the mass of proteins and peptides
routinely possible. The development of mass spectrometers with increasing mass
accuracy, higher sensitivity, and faster duty cycles combined with the coupling of
these instruments to protein and peptide separation techniques has produced a number
of highly sophisticated approaches for the identification and characterization of
proteins. Mass spectrometry (MS) has become the dominant analytical tool for
identifying proteins, whether in their purified form or within a complex mixture,
pushing aside Edman sequencing owing to the increased sensitivity (<femtomoles of
material) and throughput (100–1000 proteins/day) of the MS techniques.
The two main mass spectral approaches to proteomics and targeted protein
identification are termed “bottom-up” and “top-down” [4,5]. Bottom-up commonly
refers to exhaustive digestion of proteins, either pre-fractionated (commonly by one-
or two-dimensional polyacrylamide gel electrophoresis [6,7]) or proteolyzed en masse
in a whole-cell extract (shotgun digestion) [6,8], followed by mass measurement
of the peptides in the resulting mixture (Figure 3.1). Top-down proteomics removes
the proteolytic digestion step described above, focusing instead on sequence coverage
(routinely 100%) and complete protein characterization to tease out more biologically
relevant information by integrating all the protein complexity into an array of forms
Protein and Peptide Mass Spectrometry in Drug Discovery, Edited by Michael L. Gross, Guodong Chen, and Birendra N. Pramanik. © 2012 John Wiley & Sons, Inc. Published 2012 by John Wiley & Sons, Inc.
from a single gene (Figure 3.1) [9]. The appropriate technique to use depends on the
capabilities of the mass spectrometer available, the limitations of each approach, and
ultimately the biological question to be answered.
3.2 BOTTOM-UP PROTEOMICS
Bottom-up proteomics is predicated on the generation of 1- to 3-kDa peptides from the
protein in question. These peptides can easily be measured in many types of mass
spectrometers, including ion traps, triple quadrupoles, time-of-flight (TOF), and
Fourier-transform instruments. Most often the regioselective endopeptidase trypsin,
which cleaves C-terminal to lysine and arginine residues, is employed to generate
peptides of the desired size. These peptides are then identified, and these identifications
are used to infer the protein precursors. Peptide mass fingerprinting, GeLC-MS,
and shotgun digestion are the three most widely used experimental approaches in
bottom-up protein identification.
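The trypsin cleavage rule described above is simple enough to sketch in code. The following is a minimal illustration rather than a production digester: the function name `tryptic_digest` is hypothetical, and the sketch assumes complete cleavage after every lysine (K) and arginine (R), ignoring missed cleavages and the commonly applied exception for a following proline.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (idealized trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            # Trypsin cleaves C-terminal to K/R, so the K/R stays on this peptide.
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Whatever follows the last K/R is the C-terminal peptide.
        peptides.append(sequence[start:])
    return peptides


# First 28 residues of the bovine serum albumin precursor, used only as an example.
example = "MKWVTFISLLLLFSSAYSRGVFRRDTHK"
print(tryptic_digest(example))
# ['MK', 'WVTFISLLLLFSSAYSR', 'GVFR', 'R', 'DTHK']
```

Note that several of the products are shorter than the 1- to 3-kDa range mentioned above; in practice such very small peptides are often not observed.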
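[Figure 3.1 schematic. Bottom-up path: cellular proteins are either separated on a 2D gel (by charge and MW), then digested and extracted, or shotgun digested and then separated, followed by MS/MS; 5-90% sequence coverage. Top-down path: cellular proteins are fractionated and analyzed by MS and MS/MS; ~100% sequence coverage.]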
FIGURE 3.1 Comparison of bottom-up and top-down approaches to protein identification.
Bottom-up approaches use proteolytic digestions to create a mixture of peptides, which are then
introduced to the mass spectrometer. Protein identification is inferred from two or more
confident peptide identifications. Top-down approaches differ in that proteins are fractionated
and then introduced into the MS instrument while still intact. Subsequent fragmentation
produces a series of fragment masses, which are then used in combination with the intact mass to
identify the protein.
3.2.1 Peptide Mass Fingerprinting
Peptide mass fingerprinting (PMF) is a protein identification process whereby a single
protein or simple mixture of proteins is digested into peptides whose absolute masses
are subsequently measured in a mass spectrometer, typically an ESI-TOF, ESI-trap,
ESI-FTMS, or MALDI-TOF. Developed in the early 1990s, this method’s analytical
identification power comes from the generated list of peptide masses. PMF is based on the idea
that each protein in an organism will generate a unique set of peptides upon proteolytic
digestion, whose masses will provide a molecular signature (i.e., a fingerprint) identifying
that particular protein. In practice, these peptide masses are searched against a protein
or translated sequence database that has been digested in silico (in a computer), knowing
the specificity of the digesting enzyme. Proteins are identified based on the number of
peptide masses matching the theoretically generated list within a mass tolerance.
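The matching step can be illustrated with a short sketch. It is a deliberately simplified stand-in for what search engines do: it only counts observed masses that fall within a ppm tolerance of an in silico tryptic digest, whereas real tools apply probability-based scoring. The function names, the default tolerance of 50 ppm, and the example sequence are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da; one water is added per peptide.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER


def digest(sequence):
    """Idealized trypsin: cleave after every K or R, no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def count_matches(observed_masses, sequence, tol_ppm=50.0):
    """Count observed masses within tol_ppm of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in digest(sequence)]
    hits = 0
    for mass in observed_masses:
        if any(abs(mass - t) / t * 1e6 <= tol_ppm for t in theoretical):
            hits += 1
    return hits


# Hypothetical peak list searched against a hypothetical 8-residue "protein".
print(count_matches([477.2700, 499.2390, 600.0], "GVFRDTHK"))  # 2
```

Ranking every database entry by such a count, and then asking how likely the top count is to arise by chance, is the essence of the scoring step.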
In a typical experiment a single protein or protein mixture is loaded onto an
SDS-PAGE or 2D gel and resolved. An individual band or spot is excised from the gel,
and the isolated protein is reduced in the gel to break disulfide bonds and alkylated to
prevent crosslinking of peptides. This protein is then digested for 12 to 18 hours (50:1
protein-to-enzyme ratio) before the peptides are extracted from the gel with
acetonitrile and evaporated to dryness. The dried peptides are subsequently
resuspended and desalted to prepare them for MS analysis.
This method’s advantages are speed and low performance requirements for the MS
instruments, but it is often unable to handle mixtures of proteins. In the tryptic peptide
mass range of 600 to 2500 Da, 0.2- to 0.3-ppm relative mass error is needed to determine
the amino-acid composition with 99% confidence [10]. This performance is beyond
the capabilities of all but the most expensive, highest resolving power instruments and
limits PMF to only the simplest of protein mixtures. Database search algorithms to
support this technique are available from Mascot (http://www.matrixscience.com) and
ProFound (http://prowl.rockefeller.edu/), as well as others.
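To put the quoted relative mass errors in absolute terms, a relative error in parts per million can be converted to daltons by multiplying by the mass and dividing by one million. The helper name `ppm_to_da` below is hypothetical; the arithmetic is standard.

```python
def ppm_to_da(mass_da, ppm):
    """Convert a relative mass error in ppm to an absolute error in Da."""
    return mass_da * ppm / 1e6


# Absolute error allowed at the edges of the tryptic peptide mass range.
for mass in (600.0, 2500.0):
    for ppm in (0.2, 0.3):
        print(f"{mass:7.1f} Da at {ppm} ppm -> {ppm_to_da(mass, ppm) * 1000:.3f} mDa")
```

At 2500 Da, for example, 0.3 ppm corresponds to an absolute error of only 0.75 mDa, which helps explain why this level of performance is restricted to high-end instruments.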
Figure 3.2A shows PMF analysis of bovine serum albumin (BSA), which we commonly
use as a positive control, collected on a MALDI-TOF instrument. The resulting
mass peak list from the TOF spectrum was searched with Mascot, and the only
significant identification was BSA (Mowse Score [11]). Sheep serum albumin was the
next closest match but did not reach the confidence threshold. Approximately 80%
sequence coverage was obtained (Figure 3.2B) from matching peptide masses.
The growing capabilities of tandem MS instruments and of computer algorithms that
automate the interpretation of peptide product-ion (MS/MS) spectra have largely
replaced this technique. As this example shows, however, selective and careful use of
PMF makes it an adequate approach to the identification of proteins, particularly
when speed and ease of sample preparation are of utmost importance or when tandem
MS is not available.
3.2.2 Bottom-up Proteomics Using Tandem MS: GeLC-MS/MS and Shotgun Digests
The evolution of sensitive ion traps and fast hybrid instruments helped tandem MS
experiments replace peptide mass fingerprinting as the default technique for protein
identification. In a tandem MS experiment, a precursor scan is acquired, and all of the
peptide masses in the spectrum are recorded. Subsequently, individual peptide ion
populations are isolated and fragmented within the mass spectrometer, and a second
mass scan is taken to measure the m/z values of the generated fragments. Threshold
dissociation of peptides (collision-induced or collisionally activated dissociation, CID or CAD;
infrared multiphoton dissociation, IRMPD) yields a mixture of predominantly b-
and y-ions, whereas electron-based dissociation (electron capture dissociation, ECD,
or electron transfer dissociation, ETD) yields a mixture of predominantly c- and
z-ions (Figure 3.3). These two sets of data (the precursor mass and its corresponding
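[Figure 3.2. Panel A: MALDI-TOF spectrum; m/z axis ticks at 899, 1321, 1734, 2166, and 2588, with a region marked ~2X. Panel B: the BSA sequence below; the underlining that marked the observed coverage is not reproduced.]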
MKWVTFISLL LLFSSAYSRG VFRRDTHKSE IAHRFKDLGE EHFKGLVLIA FSQYLQQCPF DEHVKLVNEL TEFAKTCVAD ESHAGCEKSL HTLFGDELCK VASLRETYGD MADCCEKQEP ERNECFLSHK DDSPDLPKLK PDPNTLCDEF KADEKKFWGK YLYEIARRHP YFYAPELLYY ANKYNGVFQE CCQAEDKGAC LLPKIETMRE KVLASSARQR LRCASIQKFG ERALKAWSVA RLSQKFPKAE FVEVTKLVTD LTKVHKECCH GDLLECADDR ADLAKYICDN QDTISSKLKE CCDKPLLEKS HCIAEVEKDA IPENLPPLTA DFAEDKDVCK NYQEAKDAFL GSFLYEYSRR HPEYAVSVLL RLAKEYEATL EECCAKDDPH ACYSTVFDKL KHLVDEPQNL IKQNCDQFEK LGEYGFQNAL IVRYTRKVPQ VSTPTLVEVS RSLGKVGTRC CTKPESERMP CTEDYLSLIL NRLCVLHEKT PVSEKVTKCC TESLVNRRPC FSALTPDETY VPKAFDEKLF TFHADICTLP DTEKQIKKQT ALVELLKHKP KATEEQLKTV MENFVAFVDK CCAADDKEAC FAVEGPKLVV STQTALA
FIGURE 3.2 Peptide fingerprint mapping of bovine serum albumin. (A) MALDI-TOF
spectrum of digested, desalted BSA. (B) Sequence coverage map after Mascot search against
the Swiss-Prot “other mammalia” database. The underlined text highlights the observed sequence
coverage (~80%). (See the color version of this figure in Color Plates section.)
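[Figure 3.3 diagram, "Fragmentation Nomenclature": a peptide backbone segment with cleavage sites labeled b and c toward the N-terminus and y and z toward the C-terminus. Collisional- and photo-dissociation are associated with b/y cleavage; electron capture dissociation is associated with c/z cleavage.]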
FIGURE 3.3 Fragmentation of peptides. The Roepstorff nomenclature [27] of peptide
fragment ions is shown, with b- and c-ions representing fragments to the N-terminal side of the
fragmented bond and y- and z-ions representing those to the C-terminal side. Cleavage of the C–C
carbonyl bond would result in a- and x-ions, but these are rarely used in peptide identifications.
fragment ion masses) can then be used to identify a peptide, and thereby a protein,
more confidently.
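As an illustration of the fragment masses involved, the sketch below computes singly protonated b- and y-ion m/z values for an unmodified peptide. It is a simplified example under stated assumptions: the function name `b_y_ions` and the example peptide are hypothetical, only the residues needed for the example are tabulated, and the residue, water, and proton masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (subset sufficient for the example peptide).
MONO = {"G": 57.02146, "V": 99.06841, "F": 147.06841, "R": 156.10111}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b_i holds the first i residues; y_i holds the last i residues plus water.
    """
    masses = [MONO[residue] for residue in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = b_y_ions("GVFR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

A useful consistency check is that each complementary pair (b_i and y_(n-i)) sums to the neutral peptide mass plus two protons, which is how the precursor mass and the fragment masses constrain one another.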
These complex product-ion (MS/MS) spectra can be interpreted in three ways:
(1) by statistical matching of observed fragment-ions to predicted fragment-ion
spectra or masses generated from a genomic database (e.g., Sequest, Mascot,
OMSSA), (2) by de novo sequencing, which relies on identifying a ladder of
sequential fragment ions and deducing the amino acid residue masses from it, or
(3) by the sequence tag approach, where a limited amount of the amino acid
sequence in the peptide is obtained by de novo sequencing and is coupled with
the mass of the peptide and the location of the “tag” within the peptide to conduct
a database search. In practice, the first approach of statistically matching observed
fragmentation data to predicted data is the dominant methodology in bottom-up
proteomics, given the extensive amount of genome sequence information that is
now publicly available. The statistical issues involved in these database
search approaches require careful consideration and are discussed in greater
detail in Chapter 9. The advantages of this technique include its speed and
relatively high throughput as well as the standardization and robustness of the
methods involved.
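The idea behind de novo sequencing, reading residues from the spacing of a fragment-ion ladder, can be sketched as follows. This is an idealized illustration: the function name `read_ladder`, the 0.01-Da tolerance, and the example m/z values are assumptions, and real spectra contain gaps, noise, and mixed ion series that this sketch does not handle.

```python
# Monoisotopic residue masses in Da. Leucine and isoleucine are isobaric.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ion_mz, tol=0.01):
    """Assign a residue to each gap between consecutive same-series, 1+ ions.

    Ambiguous gaps (e.g. L vs. I) are reported as "L/I"; unexplained gaps as "?".
    """
    calls = []
    for lighter, heavier in zip(ion_mz, ion_mz[1:]):
        delta = heavier - lighter
        candidates = [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]
        calls.append("/".join(candidates) if candidates else "?")
    return calls


# Hypothetical y1..y3 ions; a y-series read from low to high m/z spells the
# sequence from the C-terminus inward.
print(read_ladder([175.11895, 322.18736, 421.25577]))  # ['F', 'V']
```

A sequence tag search uses a short stretch recovered this way, together with the peptide mass and the position of the tag, instead of attempting to read the whole peptide.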
3.2.3 GeLC-MS/MS
In a GeLC-MS/MS experiment the protein or proteins of interest are separated on an
SDS-PAGE gel, and individual protein bands or gel slices are digested with trypsin,
typically after a reduction and alkylation step. The resulting peptides are most often
analyzed via liquid chromatography coupled to a mass analyzer via ESI (LC-MS/MS).
The infused peptides’ masses are measured with at least unit resolution,
followed by a fragmentation step in which ions are activated by threshold dissociation
techniques, and the subsequent fragment ions are also measured at unit resolution in
an automated data-dependent manner. Usually, ion traps or hybrid time-of-flight
instruments are used to collect tandem mass spectral data. These data are then
searched against curated peptide databases created from the target organism’s
genome sequence to retrieve a list of observed peptides and thereby a list of
identified proteins.
GeLC-MS/MS approaches are particularly suited for targeted protein analysis. The
use of a gel matrix simultaneously traps the proteins of interest while removing
detergents and other buffer contaminants from the sample, eliminating a common
limitation in MS analysis. Moreover, the gel acts as a stage of fractionation at
the protein level, which can greatly reduce the sample complexity and thereby aid
the MS analysis.
Many robust methods are available in core proteomic facilities that routinely
perform this experiment, but a GeLC-MS/MS experiment suffers from a few limitations. A
GeLC-MS/MS approach is often less sensitive than other bottom-up strategies, often
requiring enough material (>50 ng protein) to be visualized using Coomassie Blue
staining. Although analysis of smaller quantities of material that can be visualized by
using silver or SYPRO Ruby stain is possible, the chances of success are greatly
diminished owing to incomplete extraction and the sensitivity limits of the instruments.
GeLC-MS/MS is also time-consuming. Each band must be excised, reduced,
alkylated, washed, swelled with trypsin, digested, and extracted prior to MS analysis.
The published versions of this protocol take a minimum of 4 h and can range up to
24 h [12], depending on the number of bands to excise and the length of digestion
employed. The multiple steps and long time also increase the likelihood of keratin
contamination, which confounds analysis and may mask low-abundance species.
Even with all of these caveats, GeLC-MS/MS is often the fastest way for nonexperts to
confidently identify a targeted protein.
As an example, Figure 3.4A shows a recombinant construct of the Her4 kinase
domain (~40 kDa) that was subjected to a GeLC-MS/MS approach. Panel A shows a
base peak chromatogram, in which the intensity of the signal for the most abundant species in
the mass spectrometer is plotted against time; it is a measure of the chromatography
performance and instrument sensitivity. In total, around 40% of the construct was
sequenced using this method.
3.2.4 Shotgun Digest
For targeted protein analysis, the gel separation step may be unnecessary, particularly
for recombinant proteins or immunoprecipitated proteins. In a shotgun digest
approach, the target protein is digested along with any and all contaminating proteins
and the resulting peptide mixture is loaded onto a reverse phase column coupled to a
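[Figure 3.4. Panels (A) and (B): base peak chromatograms, time axis 20 to 60 min, each with a sequence coverage map of the Her4 construct. The construct sequence follows; the coverage highlighting is not reproduced.]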
MEQKLISEED LASWSHPQFE KNDYDIPTTE NLYFQGTAPN QAQLRILKETELKRVKVLGS XGAFGTVYKG IWVPEGETVK IPVAIKILNE TTGPKANVEFMDEALIMASM DHPHLVRLLG VXCLSPTIQL VTQLMPHGCL LEYVHEHKDNIGSQLLLNWC VQIAKGMMYL EERRLVHRDL AAXRNVLVKS PNHVKITDFGLARLLEGDEK EYNADGGKMP IKWMALECIH YRKFTHQSDV WSYXGVTIWELMTFGGKPYD GIPTREIPDL LEKGERLPQP PICTIDVYMV MVKCWMIDADSRPKXFKELA AEFSRMARDP QRYLVIQGDD RMKLPSPNDS KFFQNLLDEEDLEDMMDAEE YLVPQXAFN
FIGURE 3.4 GeLC-MS/MS versus a shotgun digest. Recombinant Her4 kinase domain was
analyzed by GeLC-MS/MS (panel A) and shotgun digest (panel B) approaches. The greater
sequence coverage and denser base peak chromatogram emphasize the enhanced sensitivity of
a shotgun approach (~60% sequence coverage vs. 40% sequence coverage). (See the color
version of this figure in Color Plates section.)
mass spectrometer. The eluting peptides are analyzed as described above. This setup
provides greater sensitivity and often improves sequence coverage because there
is no gel extraction step. In Figure 3.4B, the same recombinant construct of the
Her4 kinase domain was shotgun digested and analyzed by LC-MS/MS. Panel B
shows a base peak chromatogram. In total, around 60% of the construct was
sequenced using this method, demonstrating the enhanced sensitivity of this
approach (Figure 3.4).
If the target protein is isolated in a mixture, however, the increase in complexity of
the sample may offset the gains in sensitivity and may require more extensive
fractionation. MudPIT (Multidimensional Protein Identification Technology) is a
multi-phase chromatography-based approach to protein identification where two
orthogonal stationary phases (usually strong cation exchange and reverse phase) are
used to enhance binding capacity, increasing both the resolution of the separation and
the sensitivity of the analysis. A mixture of proteins is precipitated, and the pellet is
washed and then digested, typically with trypsin. After digestion, the entire mixture is
loaded onto the first-dimension column, which is then sequentially eluted with
increasing amounts of salt. After each salt “bump,” the eluted peptides are separated
by reverse-phase chromatography coupled directly to a mass spectrometer. In general,
this setup has the highest sensitivity and greatest dynamic range but also demands the
greatest amount of instrument time and expertise. For targeted proteomics, a
MudPIT-based approach may not be warranted except in certain cases, such as the isolation and
identification of members of protein complexes.
Despite the success, growth, and expansion of mass spectrometry-based approaches,
scientists and laboratories using these methods should exercise great
technical care and take a cautious scientific approach to both the methodological and
broader conceptual issues involved. Major protein contaminants such as keratins,
which are commonly operator-introduced during sample handling, and chemical
contaminants, including plasticizers, detergents, and trypsin autolysis products, can
obscure the spectra of the proteins to be characterized. Limitations of the
bottom-up approach include mixture complexity, a stochastic data-acquisition process
(discussed below), and the problems of false negatives (the peptide is detected, but
no identification can be made) and false positives (the peptide sequence identification
or protein identification assigned is incorrect).
Proteolytic digestion dramatically increases the complexity of a protein
mixture. The median size of a tryptic peptide is 10 amino-acid
residues, so a single protein of 40 kDa can give rise to over 40 possible tryptic
peptides. When missed cleavage events and protein modifications are considered,
the peptide mixture arising from a single protein is even more complex. Translating
this to a proteomic scale, we see that an in silico digest of the human proteome database from the
International Protein Index (May 2006), containing 58,099 protein sequences,
produces 15.5 million theoretical peptides. Even after the two dimensions of separation
commonly associated with bottom-up, the sheer number of peptides eluting during an
LC-MS/MS experiment overwhelms all currently available mass spectrometers [13]
and leads to a stochastic data-acquisition process [14,15] where multiple injections of
the same sample only have around 30% overlap in the peptides identified.
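The effect of missed cleavages on mixture complexity can be made concrete with a small counting sketch. If complete digestion of a protein yields n peptides, then allowing up to k missed cleavages adds every run of 2, 3, ..., k+1 adjacent peptides. The function name `peptide_count` is hypothetical, and the sketch counts unmodified peptides only.

```python
def peptide_count(n_fully_cleaved, max_missed):
    """Number of distinct peptides when up to max_missed cleavages are skipped.

    With n fully cleaved peptides in a row, there are (n - j) ways to pick a
    run of j + 1 adjacent peptides, i.e. a product with j missed cleavages.
    """
    return sum(max(n_fully_cleaved - j, 0) for j in range(max_missed + 1))


# A protein that yields 40 peptides on complete digestion.
for k in range(3):
    print(f"up to {k} missed cleavages: {peptide_count(40, k)} peptides")
```

For 40 fully cleaved peptides, allowing up to two missed cleavages already gives 40 + 39 + 38 = 117 candidate peptides, before any modifications are considered.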
Other significant limitations exist in a shotgun digestion analysis, particularly in
the context of a targeted protein. The sequence coverage (i.e., the fraction of an
identified protein actually detected) is usually less than 100%, limiting the ability to
detect and characterize post-translational modifications, such as phosphorylation, or
biological variation, such as coding SNPs. Moreover, shotgun digestion methods
make an already complex mixture of proteins [16] into an even more complex mixture
of peptides.
3.3 TOP-DOWN APPROACHES
Traditionally, top-down experiments involve high resolving power (>50,000),
accurate measurement of the intact protein mass followed by its isolation and
fragmentation within the mass spectrometer [17]. The fragmentation data are then
used to retrieve the correct protein isoform from an annotated database of predicted
protein forms [18]. In those cases where a mass discrepancy (Δm) is observed between
the predicted and observed forms, the MS/MS data can localize the Δm to a specific
region (or even a single amino acid) of the protein. This data analysis logic (the use of
high resolving power, mass-accurate MS and MS/MS) becomes increasingly
important for multicellular eukaryotes, where there are a large number of protein
modifying events (nonsynonymous coding polymorphisms, alternative splicing,
post-translational modifications, etc.) that can cause the mass of a protein to differ from that
predicted by the gene sequence. By incorporating known and predicted modifying
events into a single organism database, one can greatly reduce the difficulty in
identifying, characterizing, and distinguishing multiple protein isoforms from one
another; these isoforms would otherwise be collapsed into a single protein
identification in a bottom-up experiment.
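The first step of interpreting a mass discrepancy, asking which known modification could account for it, can be sketched as a simple lookup. This is an illustration only: the function name `explain_delta`, the 0.05-Da tolerance, the short modification table, and the example masses are assumptions, and the sketch considers a single modification on monoisotopic masses. The mass shifts themselves are standard monoisotopic values.

```python
# Monoisotopic mass shifts in Da for a few common modifications.
MOD_SHIFTS = {
    "methylation": 14.01565,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
    "myristoylation": 210.19837,
}


def explain_delta(observed_mass, predicted_mass, tol_da=0.05):
    """Return modifications whose mass shift matches observed - predicted."""
    delta = observed_mass - predicted_mass
    return [name for name, shift in MOD_SHIFTS.items()
            if abs(delta - shift) <= tol_da]


# Hypothetical intact masses: the observed form is about 79.97 Da heavier.
print(explain_delta(10079.97, 10000.00))  # ['phosphorylation']
```

The lookup only proposes a candidate; it is the fragment ions that confirm the assignment and localize the modification along the sequence.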
Top-down approaches are not nearly as widespread as bottom-up, owing to the lack
of available software, the difficulty in obtaining robust, automated MS/MS fragmentation
from proteins, and the increased cost and decreased availability of high
resolving power mass spectrometers. For protein targets below 60 kDa that are
highly modified or that contain other biological variations, however, top-down’s potential
100% sequence coverage often yields information that is missed by bottom-up.
In a typical top-down experiment, a mixture of proteins is fractionated by using one
or more dimensions of separation, and then individual fractions are desalted and
infused directly into the mass spectrometer. Although online approaches to top-down
are available [9], these methods are limited to the highest performing mass
spectrometers and often require nonstandard methods to ensure detection of targeted species.
An offline approach to top-down is highlighted in Figure 3.5, which demonstrates the
significant differences between top-down and bottom-up. Multiple scans are typically
summed to reach the signal-to-noise level necessary to obtain the high-quality
isotopic distributions needed to make accurate mass measurements. This also means
a single MS/MS experiment takes much longer than in the corresponding bottom-up
experiments. For targeted species, this time gap is often inconsequential, but for
proteome-wide studies, this decrease in speed is often prohibitive. The ability to sum
scans, however, allows for the collection of robust MS/MS data, which improves
identification confidence.
Figure 3.5 shows the identification and characterization of brain acid-soluble
protein (P80723) from a HeLa cell lysate. The component corresponding to the peak
shown in red was isolated and accumulated in the mass spectrometer, and fragmentation
data were collected. The resulting intact mass of 22759.3 Da and its corresponding
fragment-ion list were searched with ProSightPC. The identified protein
was myristoylated, with fragment ions partially localizing the modification to the N-terminus.
Bottom-up LC-MS/MS approaches might have identified the protein but would have
missed the N-terminal modification because myristoylation is not a frequently
searched modification and a three-residue peptide is often hard to identify, given
the lack of fragmentation data and loss in the liquid chromatography.
While the common distinction between top-down and bottom-up is the digestion
step, the more fundamental difference lies in their approaches to data acquisition and
analysis. Bottom-up experiments use fast, sensitive, lower resolution mass
spectrometers in an effort to increase proteome coverage and measure quantitative dynamics
at the expense of peptide and protein characterization and confidence in the
FIGURE 3.5 Top-down identification of brain acid soluble protein 1 (BASP1). A single
charge state of BASP1 was isolated (inset) and fragmented on a 12 Tesla LTQ-FT. The intact
mass and fragmentation data were searched against an annotated human protein database and
BASP1 was identified with high confidence (8E-20 expectation value) and characterized as
myristoylated.
identifications. Top-down experiments use the highest performing mass
spectrometers available, Fourier-transform instruments whose mass resolving power
(>50,000) and mass accuracy (routinely <5 ppm) dramatically increase the
confidence in identifications. When top-down data are combined with well-annotated
databases, the output often simultaneously characterizes biological variation, greatly
clarifying the outcome for researchers who are not mass spectrometrists. This depth of
knowledge comes at the expense of throughput and sensitivity. Fourier-transform
mass spectrometers are an order of magnitude slower and less sensitive than ion traps
and time-of-flight instruments, since they measure a frequency rather than an electron
multiplier response, but the gap in speed and sensitivity is narrowing.
3.4 NEXT-GENERATION APPROACHES
The evolution of higher performance hybrid instruments (Q-FTMS, LTQ-FT,
LTQ-Orbitrap) has spawned a new generation of data acquisition and data analysis
techniques [19,20] that blur the distinction between bottom-up and top-down. The
higher mass resolving power and greater mass accuracy that these instruments confer,
at both the MS and MS/MS level, allow researchers to identify more proteins, faster
and with greater confidence [16].
In PMF experiments, where a list of measured masses is compared against the
in silico digests of a given database, the number of masses matching within a given
tolerance and the number of masses searched are the key components in obtaining
confident scores [21]. Higher mass resolving power and mass accuracy help in two
ways. More accurate mass measurement eliminates a significant portion of the
peptides with the same nominal mass but different amino-acid compositions [22].
Eliminating possible candidate peptides to search increases the confidence in an
identification, while correspondingly decreasing the incidence of false positives [23].
Moreover, search speed is increased as fewer candidates must be considered by
identification software. In the extreme, accurate mass measurements can be used to
discriminate between nonpeptide and peptide signals in the mass spectrometer, since
the monoisotopic mass of all peptides must be in a predictable range of values. By only
submitting those species that can arise from peptides, one can exponentially increase
the confidence with which a protein is identified while gaining a linear increase in search
speed and specificity [24].
The benefits to data-dependent MS/MS experiments are also significant. The use of
a high mass resolving power, accurate mass precursor scan, followed by a lower
resolution data-dependent MS/MS, has been shown to provide more identifications at
higher confidence levels than traditional low-resolution experiments, with increased
characterization rates for post-translational modifications in pull-down
experiments [25,26]. This increase in data depth and quality can be attributed to a multitude
of factors: the exclusive selection of multiply charged peptides, improved identification
power when spectrum quality is poor, and the reduction in the number of peptides
considered by the search algorithm [25]. Moreover, mass accuracy can be used
independently to validate peptide spectral matches; true positives’ mass deviations tend to
cluster together whereas false positives’ mass deviations are evenly distributed across
mass space [25].
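One way to exploit this clustering is sketched below. It is not the procedure of reference [25]; it is an assumed, simplified filter that estimates the systematic mass error as the median ppm deviation of all matches and keeps only matches within a window around it. The function names, the 5-ppm window, and the example masses are hypothetical.

```python
from statistics import median


def ppm_error(observed, theoretical):
    """Signed relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def filter_by_mass_deviation(matches, window_ppm=5.0):
    """Keep (observed, theoretical) pairs whose ppm error lies near the median.

    True matches share the instrument's systematic error and cluster together;
    chance matches are spread across the whole search tolerance.
    """
    errors = [ppm_error(obs, theo) for obs, theo in matches]
    center = median(errors)
    return [pair for pair, err in zip(matches, errors)
            if abs(err - center) <= window_ppm]


# Hypothetical matches: three near +2 ppm and one outlier at +25 ppm.
matches = [(1000.002, 1000.0), (1500.003, 1500.0),
           (2000.004, 2000.0), (1200.030, 1200.0)]
print(filter_by_mass_deviation(matches))
```

The median is used here because it is not pulled toward the outliers that the filter is meant to remove; it works only when correct matches outnumber incorrect ones.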
Even with these increases in performance, targeted protein studies are also
benefiting from more focused MS techniques that attempt to minimize the stochastic
nature of data-dependent acquisition processes and direct the mass spectrometer to
“look” for the target proteins and their peptides. For data-dependent LC-MS/MS
processes this means the use of inclusion lists and some form of quantification (see
Chapter 4), where the mass spectrometer is programmatically set to fragment peptides
of a particular m/z (or mass in some acquisition software) eluting during a particular
time window. Here the precursor mass, fragment-ion masses, and elution time are
used to confirm the identity of the peptide. The combination of targeted analysis and
quantitative information produces a data set that monitors protein changes during the
experiment, which is often the goal of biological studies. In addition to inclusion
lists, selected reaction monitoring or multiple reaction monitoring on triple quads and
Q-traps are being developed as focused MS approaches. These techniques move mass
spectrometer-based identification into the realm of specific assays, where both the
detection and quantification of a target protein can be assessed in one experiment, and
offer a fast and sensitive alternative when the target protein is known and well
described and multiple samples need to be tested.
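The logic of an inclusion list can be sketched in a few lines. This is a conceptual illustration and does not correspond to any vendor's acquisition software: the function names, the dictionary layout, the 2-minute retention-time half-width, the 10-ppm tolerance, and the example target are all assumptions. The conversion from neutral mass to m/z uses the standard proton mass.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz_from_mass(neutral_mass, charge):
    """m/z of a positively charged ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def build_inclusion_list(targets, rt_halfwidth_min=2.0):
    """targets: iterable of (name, neutral_mass, charge, expected_rt_min)."""
    entries = []
    for name, neutral_mass, charge, expected_rt in targets:
        entries.append({
            "name": name,
            "mz": mz_from_mass(neutral_mass, charge),
            "rt_start": expected_rt - rt_halfwidth_min,
            "rt_end": expected_rt + rt_halfwidth_min,
        })
    return entries


def match_target(entries, observed_mz, rt_min, tol_ppm=10.0):
    """Return the target name if the precursor is on the list, else None."""
    for entry in entries:
        in_window = entry["rt_start"] <= rt_min <= entry["rt_end"]
        ppm = abs(observed_mz - entry["mz"]) / entry["mz"] * 1e6
        if in_window and ppm <= tol_ppm:
            return entry["name"]
    return None


# Hypothetical target: a 477.26995-Da peptide, 2+ ion, expected near 30 min.
entries = build_inclusion_list([("GVFR", 477.26995, 2, 30.0)])
print(match_target(entries, 239.6423, 31.0))  # GVFR
print(match_target(entries, 239.6423, 35.0))  # None
```

Requiring both the m/z and the elution window is what lets the instrument spend its MS/MS time on the targets rather than on whichever ions happen to be most intense.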
This evolution of mass spectrometry techniques means both bottom-up and top-down
approaches for target proteins are being continually optimized. In this chapter
we have provided an overview of the most established practices in the field: peptide
mass fingerprinting, GeLC-MS/MS, and shotgun digestion for bottom-up, and direct
infusion for top-down, while keeping an eye toward the future and the use of
increasingly accurate data and focused mass spectrometry techniques. Bottom-up,
top-down, mass-accurate, and focused mass spectrometry-based analyses will remain
valuable tools for identifying and characterizing target proteins.
REFERENCES
1. Fenn, J. B., et al. (1989). Electrospray ionization for mass spectrometry of large
biomolecules. Science 246, 64–71.
2. Karas, M., Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular
masses exceeding 10,000 Daltons. Anal Chem 60, 2299–2301.
3. Tanaka, K., et al. (1988). Protein and polymer analyses up to m/z 100 000 by laser
ionization time-of-flight mass spectrometry. Rapid Commun Mass Spectrom 2, 151–153.
4. Kelleher, N. L. (2004). Top down proteomics. Anal Chem 76, 197A–203A.
5. Kelleher, N. L., et al. (1999). Top down versus bottom up protein characterization by
tandem high-resolution mass spectrometry. J Am Chem Soc 121, 806–812.
6. McCormack, A. L., et al. (1997). Direct analysis and identification of proteins in mixtures
by LC/MS/MS and database searching at the low-femtomole level. Anal Chem 69, 767–776.
7. O’Farrell, P. H. (1975). High resolution two-dimensional electrophoresis of proteins.
J Biol Chem 250, 4007–4021.
8. Washburn, M. P., Wolters, D., Yates, J. R. (2001). Large-scale analysis of the yeast proteome
by multidimensional protein identification technology. Nat Biotechnol 19, 242–247.
9. Roth, M. J., et al. (2008). “Proteotyping”: Population proteomics of human leukocytes
using Top Down mass spectrometry. Anal Chem 80, 2857–2866.
10. Zubarev, R., Mann, M. (2007). On the proper use of mass accuracy in proteomics.
Mol Cell Proteomics 6, 377–381.
11. Pappin, D. J. C., Hojrup, P., Bleasby, A. J. (1993). Rapid identification of proteins by
peptide-mass fingerprinting. Curr Biol 3, 327–332.
12. Shevchenko, A., et al. (2007). In-gel digestion for mass spectrometric characterization of
proteins and proteomes. Nat Protoc 1, 2856–2860.
13. MacCoss, M. J. (2005). Computational analysis of shotgun proteomics data. Curr Opin
Chem Biol 9, 88–94.
14. Elias, J. E., et al. (2005). Comparative evaluation of mass spectrometry platforms used in
large-scale proteomics investigations. Nat Meth 2, 667–675.
15. Liu, H., Sadygov, R. G., Yates, J. R. (2004). A model for random sampling and estimation
of relative protein abundance in shotgun proteomics. Anal Chem 76, 4193–4201.
16. Liu, T., et al. (2007). Accurate mass measurements in proteomics. Chem Rev 107, 3621–3653.
17. Meng, F., et al. (2002). Processing complex mixtures of intact proteins for direct analysis
by mass spectrometry. Anal Chem 74, 2923–2929.
18. Pesavento, J. J., et al. (2004). Shotgun annotation of histone modifications: A new
approach for streamlined characterization of proteins by top down mass spectrometry.
J Am Chem Soc 126, 3386–3387.
19. Gorshkov, M. V., Zubarev, R. A. (2005). On the accuracy of polypeptide masses
measured in a linear ion trap. Rapid Commun Mass Spectrom 19, 3755–3758.
20. Frank, A. M., et al. (2007). De novo peptide sequencing and identification with
precision mass spectrometry. J Proteome Res 6, 114–123.
21. Perkins, D. N., et al. (1999). Probability-based protein identification by searching
sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567.
22. He, F., et al. (2004). Theoretical and experimental prospects for protein identification
based solely on accurate mass measurement. J Proteome Res 3, 61–67.
23. Dodds, E. D., et al. (2007). Systematic characterization of high mass accuracy influence
on false discovery and probability scoring in peptide mass fingerprinting. Anal Biochem
372, 156–166.
24. Dodds, E. D., et al. (2006). Enhanced peptide mass fingerprinting through high mass accuracy:
Exclusion of non-peptide signals based on residual mass. J Proteome Res 5, 1195–1203.
25. Bakalarski, C. E., et al. (2007). The effects of mass accuracy, data acquisition speed,
and search algorithm choice on peptide identification rates in phosphoproteomics. Anal
Bioanal Chem 389, 1409–1419.
26. Wu, S. L., et al. (2005). Extended range proteomic analysis (ERPA): A new and sensitive
LC-MS platform for high sequence coverage of complex proteins with extensive post-
translational modifications-comprehensive analysis of beta-casein and epidermal growth
factor receptor (EGFR). J Proteome Res 4, 1155–1170.
27. Roepstorff, P., Fohlman, J. (1984). Proposal for a common nomenclature for sequence
ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.