

RAPID COMMUNICATIONS IN MASS SPECTROMETRY

Rapid Commun. Mass Spectrom. 2008; 22: 3823–3834

Published online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/rcm.3781

ProteinQuant Suite: a bundle of automated software tools for label-free quantitative proteomics

Benjamin Mann1, Milan Madera1,2, Quanhu Sheng2,3, Haixu Tang2,3, Yehia Mechref1,2* and Milos V. Novotny1,2*

1Department of Chemistry, Indiana University, Bloomington, IN 47405, USA

2National Center for Glycomics and Glycoproteomics, Department of Chemistry, Indiana University, Bloomington, IN 47405, USA

3School of Informatics, Indiana University, Bloomington, IN 47405, USA

Received 24 April 2008; Revised 14 August 2008; Accepted 17 September 2008

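*Correspondence to: Y. Mechref or M. V. Novotny, Department of Chemistry, Indiana University, 800 E Kirkwood Ave, Bloomington, IN 47405, USA. E-mails: [email protected]; [email protected]

Contract/grant sponsor: National Institute of General Medical Sciences, US Department of Health and Human Services; contract/grant number: GM24349.

Contract/grant sponsor: NIH/NCRR – National Center for Glycomics and Glycoproteomics (NCGG); contract/grant number: RR018942.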

To simplify the evaluation and quantification of high-throughput label-free quantitative proteomic data, we introduce ProteinQuant Suite. It comprises three standalone, complementary computer utilities: ProtParser, ProteinQuant, and Turbo RAW2MGF. ProtParser is a filtering utility designed to evaluate database search results according to criteria defined by the end-user. ProteinQuant then uses this parsed list of peptides and proteins, in conjunction with mzXML or mzData files generated from the raw files, for quantification. This quantification is based on the automatic detection and integration of chromatographic peaks representing the liquid chromatography/mass spectrometry (LC/MS) elution profiles of identified peptides. Turbo RAW2MGF was developed to extend the applicability of ProteinQuant Suite to data collected on different types of mass spectrometers. It directly processes raw data files generated by Xcalibur, the ThermoElectron data acquisition software, and generates a MASCOT generic file (MGF). This file format is required because database search results obtained from MGF input retain the precursor information (retention times and intensities) needed for the precise identification and quantification of chromatographic peaks. The performance of ProteinQuant Suite was initially validated using LC/MS/MS data acquired for a mixture of standard proteins, as well as for standard proteins spiked into a complex biological matrix (blood serum). Automated quantification of the collected data yielded calibration curves with R² values higher than 0.95, linearity spanning more than 2 orders of magnitude, and peak quantification reproducibility better than 15% (RSD). ProteinQuant Suite was also applied to confirm the binding preference of standard glycoproteins to Con A lectin, using a sample consisting of both standard glycoproteins and non-glycosylated proteins.

Copyright © 2008 John Wiley & Sons, Ltd.

The continuous development of qualitative and quantitative proteomic approaches has been made possible by technological advances in separation science and mass spectrometry (MS). Such advances have stimulated life science research aimed at biomarker discovery as well as at a better understanding of biological systems. Quantitative proteomics can be used to characterize alterations in protein abundance resulting from a disease state or its treatment. This rests on the assumption that such differences represent differential protein expression originating from a perturbation of the biological system under these conditions. Different MS-based approaches for quantitative proteomics, each offering distinct advantages and disadvantages, have been developed.1–4

The available methods can be classified into those based on electrophoretic separation techniques, such as one- and two-dimensional polyacrylamide gel electrophoresis (1-DE and 2-DE, respectively), and those based on chromatographic separations.2 Because of the high complexity of most proteomes, 2-DE is very popular in comparative quantitative proteomics.5–7 It can resolve thousands of proteins, allowing changes between complex proteome samples to be visualized. In addition, only those spots that appear differentially abundant need to be analyzed by MS, which substantially reduces the overall workload. Nevertheless, 2-DE still suffers from limited sensitivity and dynamic range, as well as a long and tedious procedure.8,9

Quantitative proteomics has recently capitalized on major advances in chromatographic media, columns and instrumentation. Chromatographically based quantitative proteomic approaches largely depend on liquid chromatography/tandem mass spectrometry (LC/MS/MS) analyses of proteome samples that have been subjected to proteolytic digestion. Such analyses compare LC/MS/MS runs of the proteolytic digests of control and experimentally perturbed systems. Generally, quantitative proteomics involves the analysis of samples that are either subjected to stable-isotope labeling, such as isotope-coded affinity tag (ICAT), global internal standard strategy (GIST) and isobaric tag for relative and absolute quantification (iTRAQ),10–12 or analyzed without any labeling step, an approach commonly referred to as label-free quantitative proteomics.8,13,14

Regardless of the general approach used, quantitative proteomics often deals with the analysis of large sets of samples, generating a considerable number of data files. Therefore, the data evaluation required for either absolute or relative quantification of all or a limited number of components is extremely challenging and practically impossible to perform manually. Because mass spectrometry vendors do not currently supply comprehensive quantification packages as part of their data acquisition software, a variety of open source tools15 for both stable-isotope labeling16–19 and label-free19–24 experiments have been developed by different groups. These programs are frequently developed as cross-platform applications and are commonly distributed as source code, which must be compiled prior to use. Compatibility with multiple operating systems is undoubtedly useful, but the successful deployment of such utilities usually requires broad computer knowledge and extensive configuration. Although the majority of currently available software uses very sophisticated algorithms to resolve chromatographic peaks and can handle raw data in a universal format, some tools still have a limited capability to automatically process multiple data files,19 while others lack the association between the resolved and integrated peaks and the list of identified peptides and proteins.24 For example, mapQuant, a software package capable of large-scale protein quantification developed by Church and coworkers,23 resolves chromatographic peaks through a combination of 2D imaging, watershed segmentation and isotopic deconvolution, but it still lacks support for the unified mzXML or mzData file formats. Another program, msInspect, released by McIntosh and colleagues,19 uses similar 2D imaging of LC/MS/MS runs in mzXML format to determine eluting components, yet it does not officially support label-free quantification approaches and requires additional script writing to automate the processing of multiple data files. Notable features of mzMine, a tool facilitating label-free quantitative proteomics, include its ability to run in batch mode and to perform de-noising, background subtraction and isotopic deconvolution.24 However, mzMine does not currently associate the areas of evaluated peaks with identified peptides or proteins, which makes it more suitable for high-throughput quantitative profiling, where the identification of resolved features is not necessary.

Recently, the aforementioned limitations of quantitative proteomics appear to have been overcome by trans-proteomic pipelines (TPPs) with standardized inputs and outputs.25 These robust solutions, led by CPAS (Computational Proteomics Analysis System), are not, however, designed as standalone utilities for proteomic quantification; rather, they provide complete and unified data processing with the capability of sharing results among different institutions. TPPs are highly configurable and work with various quantification utilities, such as Xpress or ASAPRatio;18 however, these software plug-ins thus far support only approaches based on isotopic labeling. Because of their robustness and universal architecture, TPPs require dedicated servers or even computer clusters to handle systematic, centralized data processing; therefore, some laboratories may not find them convenient or necessary.

In response to some general needs of quantitative proteomics, and with an emphasis on providing a utility that is easy to use without extensive computer knowledge or configuration, we have developed the ProteinQuant Suite software package. It facilitates the evaluation of multiple data files generated from label-free proteomic experiments and offers a simple, user-friendly and standalone alternative to other currently available high-throughput quantification tools. The utility of ProteinQuant Suite is demonstrated for high-throughput comparative quantification of proteolytic digests of standard proteins, analyzed separately or spiked into depleted human blood serum as an example of a complex biological mixture. Different features associated with the use of ProteinQuant Suite in quantitative proteomics are addressed, including run-to-run reproducibility, the linearity of the dynamic range, and different normalization methods. The use of ProteinQuant Suite was also demonstrated in the study of glycoprotein binding by lectin affinity chromatography.

EXPERIMENTAL

Reagents and standards

Lysozyme, bovine serum albumin (BSA), cytochrome C, ovalbumin, alpha-lactalbumin, lactoglobulin B, histone, alpha-casein, myoglobin, lactoferrin, immunoglobulin G, glutathione S-transferase, hemoglobin, ribonuclease B, fetuin, thyroglobulin, carbonic anhydrase II, and trypsin (proteomics, sequencing grade) were purchased from Sigma-Aldrich (St. Louis, MO, USA). Dithiothreitol (DTT) and iodoacetamide (IAA) were acquired from Bio-Rad (Hercules, CA, USA). HPLC grade reagents were purchased from EMD (Darmstadt, Germany). All buffers were prepared in Millipore deionized water (Millipore, Billerica, MA, USA). Standard protein mixtures were prepared from stock solutions in 50 mM ammonium bicarbonate. Female human blood serum was acquired from Innovative Research, Inc. (Southfield, MI, USA). Serum was divided into 1-mL aliquots and frozen at −20 °C within 1 h of its receipt from the vendor, to avoid unnecessary freeze/thaw cycles.

Software development

All applications included in ProteinQuant Suite were written in the C# programming language and compiled in Microsoft Visual Studio 2005 Professional Edition. The software is fully compatible with Windows-based operating systems running the .NET Framework v2.0. It features an easy installation procedure and provides a graphical, user-friendly interface. This is true for all utilities except the Turbo RAW2MGF converter, which requires a complete installation of the Xcalibur mass spectrometer control software (ThermoElectron, San Jose, CA, USA), including the XDK development kit.

Depletion of human blood serum with the multiple affinity removal system (MARS)

Blood serum was depleted on an Akta purifier (Amersham Biosciences, NJ, USA) using the multiple affinity removal system (MARS), a 4.6 × 100 mm affinity column (Agilent Technologies, Santa Clara, CA, USA). Depletion was performed according to the manufacturer's LC protocol. A 30-mL aliquot from each depletion process was collected, pooled with the other depleted fractions, and reconcentrated to a total volume of 500 μL. Next, buffer exchange was performed, replacing the depletion buffers with 50 mM ammonium bicarbonate and concentrating the mixture to ca. 0.5 mg/mL. The total protein concentration of the final mixture was determined by Bradford protein assay (Bio-Rad, Hercules, CA, USA).

Trypsin digestion

Protein samples were subjected to tryptic digestion according to the following procedure. After thermal denaturation at 95 °C for 10 min, samples were reduced by adding DTT to a final concentration of 5 mM and incubating at 60 °C for 45 min. Alkylation was achieved by adding IAA to a final concentration of 20 mM and incubating at room temperature for 45 min in the dark. A second aliquot of DTT was then added, increasing the final DTT concentration to ca. 10 mM, and samples were incubated at room temperature for 30 min to quench the alkylation reaction. Next, trypsin was added (1:30 w/w) and the solutions were incubated at 37 °C for 18 h.26 The enzymatic digestions were then quenched by the addition of neat formic acid. The sample containing 17 standard proteins at equimolar concentrations was prepared prior to enzymatic digestion, as were the depleted human blood serum samples spiked with different ovalbumin concentrations.

Lectin affinity chromatography

A mixture of standard proteins (BSA, cytochrome C, myoglobin) and glycoproteins (fetuin, ovalbumin, ribonuclease B) was prepared in Con A binding buffer (10 mM Tris-HCl, pH 7.5, 500 mM NaCl, 1 mM MnCl2, 1 mM CaCl2, 0.08% NaN3) to a final concentration of 1 mg/mL of each protein. A 50-μL aliquot of Con A Sepharose was thoroughly washed with 1 mL of Con A binding buffer and mixed with 100 μL of the protein mixture, followed by the addition of 100 μL of lectin binding buffer. After overnight incubation at 4 °C, unbound proteins were washed from the media with 2 × 100 μL of binding buffer and combined in an Eppendorf tube. Bound glycoproteins were then displaced from the lectin through three sequential washes, each with a 150-μL aliquot of elution buffer (0.2 M α-D-methylmannoside, 0.2 M α-D-methylglucoside in the binding buffer), and combined in a separate vial. Both bound and unbound fractions were desalted using Microcon 10 kDa spin membrane filters and dried.

The dried proteins were then denatured with 20 μL of 6 M guanidine hydrochloride and incubated at room temperature for 30 min. After the addition of 180 μL of 50 mM ammonium bicarbonate buffer, the sample was reduced with 5 μL of 200 mM DTT for 30 min at 60 °C and alkylated with 20 μL of 200 mM IAA for 30 min at room temperature. Finally, the mixture was digested with trypsin (2% w/w) for 18 h at 37 °C, and a 1-μL aliquot of the resulting peptides was subjected to LC/MS/MS analysis.

Nano-LC/MS/MS

A nano-LC/MS/MS system consisting of a 1100 nano-LC system (Agilent Technologies, Santa Clara, CA, USA) interfaced to an XCT Ultra LC/MSD ion-trap mass spectrometer (Bruker Daltonics, Billerica, MA, USA) equipped with a nano-electrospray ionization (ESI) source was used. Samples were desalted by on-line trapping on a PepMap300 C18 cartridge (5 μm, 300 Å; Dionex, Sunnyvale, CA, USA) prior to separating the peptides on a Zorbax 300SB C18 nanocolumn (3.5 μm particles, 75 μm × 150 mm; Agilent Technologies, Santa Clara, CA, USA). The separation was performed at a flow rate of 250 nL/min using a linear gradient from 3% to 55% acetonitrile containing 0.1% formic acid over 45 min. The LC system was controlled by ChemStation (Agilent Technologies, Santa Clara, CA, USA), while MS data acquisition was performed using Esquire Control software (Bruker Daltonics, Billerica, MA, USA). The capillary voltage was kept at 1700 V, while the desolvation temperature was maintained at 300 °C. The ion charge control (ICC) value was set to 200 000 with a maximum accumulation time of 200 ms. MS/MS fragmentation of the five most intense precursor ions in each spectrum was performed automatically with an exclusion window of 0.5 min.

Data processing and quantification using ProteinQuant Suite

The data acquired by the XCT Ultra mass spectrometer were processed with Data Analysis software (Bruker Daltonics, Billerica, MA, USA), and the generated peak lists were saved as MASCOT generic files (MGF). The MGF files were then submitted to MASCOT database searching, and the results were parsed with ProtParser using parsing criteria defined by the end-user. In this study, only +2 and +3 charged peptides were subjected to MS/MS experiments, since every tryptic peptide should carry at least two charges, one at its N-terminus and the other at the lysine or arginine residue of its C-terminus.27 The minimum MOWSE ion score threshold was set to 30. The peptide mass threshold was set to 600 Da to exclude possible low-molecular-weight fragments or other non-peptide interferences. Additionally, tryptic peptides containing KK, RR, RK or KR motifs were not considered valid, as trypsin would most likely cleave at least one of these bonds.28 The use of a so-called ‘decoy database’ has recently been shown to be a valuable tool for evaluating the rate of false-positive peptide identifications in database searching.29 Using this approach in conjunction with the abovementioned filtering criteria, the false-positive identification rate for our different experiments was estimated to be 4.9%.
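The filtering criteria listed above can be expressed as a simple predicate. The sketch below is a minimal illustration assuming a hypothetical `PeptideHit` record (ProtParser's actual file layout is not described here). The false-positive estimate uses one common target-decoy convention, decoy hits divided by target hits passing the same filters, which may differ from the calculation behind the 4.9% figure.

```python
from dataclasses import dataclass

# Hypothetical record for one parsed MASCOT peptide hit; field names are
# illustrative and do not reflect ProtParser's actual output format.
@dataclass
class PeptideHit:
    sequence: str
    charge: int
    ion_score: float     # MOWSE ion score
    mass: float          # peptide mass in Da
    is_decoy: bool = False

# Dipeptide motifs rejected by the criteria described in the text.
MISSED_CLEAVAGE_MOTIFS = ("KK", "RR", "RK", "KR")

def passes_filters(hit, min_score=30.0, min_mass=600.0, charges=(2, 3)):
    """Return True if the hit meets the charge, score, mass and motif criteria."""
    if hit.charge not in charges:
        return False
    if hit.ion_score < min_score:
        return False
    if hit.mass < min_mass:
        return False
    return not any(motif in hit.sequence for motif in MISSED_CLEAVAGE_MOTIFS)

def decoy_false_positive_rate(hits):
    """Estimate the false-positive rate as decoy hits / target hits,
    counting only hits that pass the filters."""
    kept = [h for h in hits if passes_filters(h)]
    decoys = sum(1 for h in kept if h.is_decoy)
    targets = len(kept) - decoys
    return decoys / targets if targets else 0.0
```

Keeping the thresholds as keyword arguments mirrors the idea that the parsing criteria are chosen by the end-user rather than fixed.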

All parsed files were then combined into a master file containing the list of all proteins and peptides identified across all the processed LC/MS/MS analyses. The generated master files, together with the corresponding mzXML files created from the raw data files by Bruker's CompassXport conversion tool, were submitted to ProteinQuant, which quantifies the identified features through the steps described below.
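Merging the per-run ProtParser lists into a master file can be sketched as a keyed union. The tuple layout and the rule of keeping the highest-intensity occurrence of each peptide are assumptions made for illustration; the text does not state how duplicate identifications across runs are resolved.

```python
# Hypothetical parsed entry: (protein accession, peptide sequence, charge,
# precursor m/z, retention time in min, precursor intensity).
Entry = tuple[str, str, int, float, float, float]

def build_master_list(parsed_runs: list[list[Entry]]) -> list[Entry]:
    """Union the per-run peptide lists, keeping one entry per
    (protein, sequence, charge). When a peptide appears in several runs,
    the entry with the highest precursor intensity is retained (assumed rule)."""
    master: dict[tuple[str, str, int], Entry] = {}
    for run in parsed_runs:
        for entry in run:
            protein, sequence, charge, _mz, _rt, intensity = entry
            key = (protein, sequence, charge)
            if key not in master or intensity > master[key][5]:
                master[key] = entry
    # Sort by protein, then peptide, so the master file is stable across runs.
    return sorted(master.values(), key=lambda e: (e[0], e[1], e[2]))
```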

Construction of extracted ion chromatograms

First, ProteinQuant reads the m/z values of the identified peptides saved in the ProtParser text file and creates an extracted ion chromatogram for each m/z value from the mzXML or mzData MS data file. Because mass spectrometric data may be noisy, ProteinQuant applies a Savitzky-Golay30 smoothing algorithm to facilitate peak integration. ProteinQuant also has an adjustable m/z tolerance window for reconstructing the chromatograms. This feature allows the software to integrate data generated by mass spectrometers of both high and low mass resolution and accuracy.
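A minimal sketch of extracted-ion-chromatogram construction with an adjustable m/z tolerance, followed by Savitzky-Golay smoothing. It assumes centroided MS1 scans supplied as (retention time, peak list) pairs, an absolute tolerance in m/z units, summation of all peaks inside the window, and the textbook 5-point quadratic Savitzky-Golay kernel. ProteinQuant's actual window size, polynomial order and tolerance handling are not stated in the text.

```python
# Textbook 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3, 12, 17, 12, -3)
SG5_NORM = 35

def extract_ion_chromatogram(scans, target_mz, tol):
    """scans: iterable of (retention_time, [(mz, intensity), ...]).
    Returns a list of (rt, summed intensity of peaks within +/- tol of target_mz)."""
    xic = []
    for rt, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, signal))
    return xic

def savitzky_golay_5(values):
    """Smooth a sequence with the 5-point Savitzky-Golay filter.
    The two points at each end are left unsmoothed for simplicity."""
    out = list(values)
    for k in range(2, len(values) - 2):
        window = values[k - 2:k + 3]
        out[k] = sum(c * v for c, v in zip(SG5, window)) / SG5_NORM
    return out
```

A wider tolerance suits low-resolution ion-trap data, while a narrow one suits high-accuracy instruments; this is the trade-off the adjustable window addresses. A useful property of the quadratic kernel is that it leaves polynomials up to third degree unchanged, so smooth peak shapes are not flattened.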

Baseline calculation

ProteinQuant offers several options for baseline evaluation, which can be modified to accommodate the end-user's requirements. By default, the baseline is calculated as the average intensity of the data points in the extracted ion chromatogram during the first minute of data acquisition. Alternatively, the user can define this time segment or assign a fixed baseline value. These values are entered in the configuration page of ProteinQuant.
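The three baseline options can be sketched as follows. `estimate_baseline` is a hypothetical helper, and measuring the default one-minute window from the first scan of the chromatogram (rather than from time zero) is an assumption.

```python
def estimate_baseline(xic, start=None, end=None, fixed=None):
    """xic: list of (retention_time_min, intensity).
    Option 1: a fixed, user-assigned value.
    Option 2: the mean intensity over a user-defined window [start, end).
    Default: the mean intensity over the first minute of the chromatogram."""
    if fixed is not None:
        return float(fixed)
    if not xic:
        return 0.0
    t0 = xic[0][0] if start is None else start
    t1 = t0 + 1.0 if end is None else end
    window = [i for rt, i in xic if t0 <= rt < t1]
    return sum(window) / len(window) if window else 0.0
```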

Peak apex and edge assignment

Every peptide hit in the ProtParser files is associated with a retention time and an intensity value. In addition, these files contain only the highest-intensity query among multiple hits of matching peptides with the same sequence. ProteinQuant therefore initially uses the retention time of a peptide listed in the parsed file as the apex of the chromatographic peak for that peptide. However, this retention time reflects only the time of MS/MS acquisition of a particular precursor ion, which often does not correspond to the true apex of the chromatographic peak of the identified peptide. Depending on the mass spectrometer duty cycle and the MS method settings, the precursor ion may be selected for MS/MS before or after the maximum of its eluting peak. ProteinQuant therefore checks the intensity of the precursor ion within the interval given by the peak width entry (default 1 min) and assigns the apex of the chromatographic peak corresponding to the m/z value of the identified peptide within this interval. Next, ProteinQuant allows the end-user to define the method used to assign the edges of the chromatographic peak. Peak edges can be calculated using the full-width at half-maximum criterion,31 an arbitrary time window, an intensity threshold, or, by default, an intensity threshold constrained by a maximum time window. After the apex and both edges have been defined, the elution profiles of the identified peptides are integrated using a rectangular approximation. Peptide and protein quantification results, together with information on the identified peptides, are finally reported and saved in comma-delimited (CSV) file format.
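The apex refinement, default edge assignment and rectangular integration described above might look like the sketch below. The helper names are hypothetical. Several details are assumptions because the text does not specify them: the peak-width window is centred on the MS/MS retention time, the maximum time window is split evenly on both sides of the apex, and the rectangular approximation is implemented as a left Riemann sum over baseline-subtracted intensities.

```python
def find_apex(xic, ms2_rt, peak_width=1.0):
    """Index of the most intense point within a window of peak_width minutes
    centred on the MS/MS retention time (centring is an assumption)."""
    half = peak_width / 2.0
    candidates = [k for k, (rt, _) in enumerate(xic) if abs(rt - ms2_rt) <= half]
    if not candidates:
        raise ValueError("no data points inside the peak-width window")
    return max(candidates, key=lambda k: xic[k][1])

def find_edges(xic, apex, threshold, max_window=1.0):
    """Walk outward from the apex while intensity stays >= threshold and the
    point lies within max_window/2 minutes of the apex."""
    apex_rt = xic[apex][0]
    half = max_window / 2.0
    left = apex
    while (left > 0 and xic[left - 1][1] >= threshold
           and apex_rt - xic[left - 1][0] <= half):
        left -= 1
    right = apex
    while (right < len(xic) - 1 and xic[right + 1][1] >= threshold
           and xic[right + 1][0] - apex_rt <= half):
        right += 1
    return left, right

def rectangular_area(xic, left, right, baseline=0.0):
    """Left-rectangle integration of baseline-subtracted intensity
    between the two edge indices."""
    area = 0.0
    for k in range(left, right):
        dt = xic[k + 1][0] - xic[k][0]
        area += max(xic[k][1] - baseline, 0.0) * dt
    return area
```

For example, an MS/MS event recorded on the rising edge of a peak is moved to the true maximum by `find_apex`, and `find_edges` then bounds the integration so that a neighbouring co-eluting peak beyond the maximum window is not included.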

RESULTS AND DISCUSSION

ProteinQuant Suite figures of merit

Owing to the increasing interest in the quantitative aspects of proteomics studies, accompanied by a frequent need to analyze very complex protein or peptide mixtures, high-throughput quantification usually requires software-assisted data evaluation. We have therefore developed the ProteinQuant Suite software bundle, consisting of three standalone, complementary utilities: Turbo RAW2MGF, ProtParser and ProteinQuant. The workflow of ProteinQuant Suite is summarized in Fig. 1. The first step in the workflow is the processing of LC/MS/MS raw data to generate a peak list consisting of precursor ion m/z values and the MS/MS fragments with their intensities. This peak list is saved in MASCOT generic file (MGF) format, which is necessary for quantification with ProteinQuant, since MGF files contain the retention times and intensities of the precursor ions required for quantification. The MGF files are then submitted to the database search engine MASCOT, which outputs a list of identified peptides and their corresponding proteins. The results are saved as HTML files, which are easily processed and filtered using parsing software such as the ProtParser utility described here. ProteinQuant Suite is currently only capable of processing data searched with MASCOT; the potential of using ProteinQuant with other search engines is being investigated.
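Because ProteinQuant relies on the precursor retention times and intensities carried in MGF files, a reader for those fields is sketched below. It assumes standard MGF keys (`PEPMASS` with an optional intensity, `CHARGE`, `RTINSECONDS`) and positive-mode charges. Some exporters store the retention time elsewhere (for example inside `TITLE`), which this sketch does not handle.

```python
def read_mgf_precursors(lines):
    """Yield one dict per spectrum with precursor m/z, intensity, charge and
    retention time (minutes) parsed from MGF text lines."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"mz": None, "intensity": None, "charge": None, "rt_min": None}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None and "=" in line:
            key, value = line.split("=", 1)
            key = key.strip().upper()
            if key == "PEPMASS":
                # PEPMASS=<m/z> [<intensity>]
                parts = value.split()
                spectrum["mz"] = float(parts[0])
                if len(parts) > 1:
                    spectrum["intensity"] = float(parts[1])
            elif key == "CHARGE":
                # e.g. "2+" or "2+ and 3+"; keep the first listed charge
                first = value.replace(" and ", ",").split(",")[0].strip()
                spectrum["charge"] = int(first.rstrip("+-"))
            elif key == "RTINSECONDS":
                spectrum["rt_min"] = float(value) / 60.0
        # Fragment lines ("m/z intensity") contain no '=' and are skipped.
```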

In ProteinQuant Suite, the same MS raw data files used to generate the protein list must be converted to the mzXML32 or HUPO's mzData33 file format prior to quantification. Conversion of raw data files to these universal formats is usually performed separately using a variety of open source utilities that are either part of the data acquisition software or can be downloaded from the internet.34 We therefore opted not to include any third-party converters in the Suite.
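Quantification then reads the converted files. In mzXML, each scan's `<peaks>` element holds base64-encoded, interleaved m/z-intensity pairs, by default 32-bit floats in network (big-endian) byte order, optionally zlib-compressed. The decoder below is a minimal sketch of that step using only the Python standard library. Reading the `precision` and compression attributes from the XML itself, and handling mzData, are left out.

```python
import base64
import struct
import zlib

def decode_mzxml_peaks(encoded, precision=32, compressed=False):
    """Decode the base64 text of an mzXML <peaks> element into a list of
    (m/z, intensity) pairs. Values are big-endian ("network" byte order)
    floats stored as interleaved m/z-intensity pairs."""
    raw = base64.b64decode(encoded)
    if compressed:
        raw = zlib.decompress(raw)
    code = "f" if precision == 32 else "d"
    count = len(raw) // struct.calcsize(code)
    values = struct.unpack(f">{count}{code}", raw)
    return list(zip(values[0::2], values[1::2]))
```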

The peak-integration performance of ProteinQuant was evaluated by comparing its integrated values with those generated using the vendor's software, referred to here as manual integration. This comparison was performed using data acquired from LC/MS/MS analyses of the tryptic digests of both a 10-ng aliquot (150 fmol) of BSA and a 1-μg aliquot of depleted human serum. The areas of 13 reliably identified peptides, integrated manually and by ProteinQuant Suite, are listed in Table 1. The peak areas reported by ProteinQuant agree closely with those obtained through manual integration: the differences between the two approaches were less than 10% for all peptides, including those detected at low signal-to-noise (S/N) ratios, suggesting acceptable performance of ProteinQuant's peak-picking and peak-integration algorithms.


Table 1. Integrated areas of different peptides derived from BSA tryptic digest and depleted human blood serum digest, calculated through manual integration and ProteinQuant integration

Peptide | m/z | S/N(a) | Integrated area, manual | Integrated area, ProteinQuant | Difference [%]

BSA peptides
GLVLIAFSQYLQQCPFDEHVK | 831.66 | 232 | 3.47E+08 | 3.43E+08 | 1.3
TVMENFVAFVDK | 700.34 | 614 | 9.03E+07 | 8.60E+07 | 5.1
LGEYGFQNALIVR | 740.71 | 79 | 1.18E+08 | 1.13E+08 | 3.9
DAFLGSFLYEYSR | 784.34 | 71 | 3.20E+08 | 2.91E+08 | 9.7

Depleted human blood serum peptides
SPVGVQPILNEHTFCAGMSK | 724.8 | 79 | 1.10E+10 | 1.12E+10 | 2.14
ILLQGTPVAQMTEDAVDAERLK | 800.3 | 38 | 4.72E+09 | 4.81E+09 | 1.84
DYVSQFEGSALGK | 700.7 | 29 | 2.42E+09 | 2.50E+09 | 3.19
NFPSPVDAAFR | 610.7 | 29 | 3.40E+09 | 3.37E+09 | 1.06
IASFSQNCDIYPGKDFVQPPTK | 838.2 | 20 | 2.21E+09 | 2.19E+09 | 0.52
YFKPGMPFDLMVFVTNPDGSPAYR | 917.5 | 12 | 2.03E+09 | 1.97E+09 | 2.98
FICPLTGLWPINTLK | 887.4 | 10 | 1.86E+09 | 1.82E+09 | 2.24
GPSVFPLAPCSR | 644.2 | 6 | 2.05E+08 | 1.96E+08 | 4.32
AFQPFFVELTMPYSVIRGEAFTLK | 931.7 | 4 | 2.51E+08 | 2.46E+08 | 1.75

(a) S/N (as calculated by Data Analysis software, Bruker Daltonics, Billerica, MA, USA) equals the chromatographic peak height of the extracted ion chromatogram of interest divided by five times the standard deviation, σ, of the third derivative of the total ion chromatogram during the first 5 min of the LC/MS/MS experiment. σ is calculated using the equation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i'''\right)^2}{N}}$$

where y is the chromatographic height at each data point i and N is the total number of data points.
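The S/N definition in the Table 1 footnote can be sketched numerically. This sketch assumes a uniformly sampled total ion chromatogram, approximates the third derivative by forward third differences per data point (ignoring the time step), and takes N as the number of derivative values. Bruker's Data Analysis implementation is not documented here and may differ in these details.

```python
import math

def third_differences(y):
    """Forward third differences, a per-point approximation of y'''."""
    return [y[i + 3] - 3 * y[i + 2] + 3 * y[i + 1] - y[i] for i in range(len(y) - 3)]

def noise_sigma(tic_first_5_min):
    """sigma = sqrt(sum((y''')^2) / N), with N taken as the number of
    third-difference values (an assumption)."""
    d3 = third_differences(tic_first_5_min)
    return math.sqrt(sum(v * v for v in d3) / len(d3))

def signal_to_noise(peak_height, tic_first_5_min):
    """S/N = peak height / (5 * sigma), following the Table 1 footnote."""
    return peak_height / (5.0 * noise_sigma(tic_first_5_min))
```

Using the third derivative makes the noise estimate insensitive to slow baseline drift: any constant, linear or quadratic trend in the total ion chromatogram has a zero third difference.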

Figure 1. ProteinQuant Suite experimental workflow describing the automated quantification aspects.


ProteinQuant Suite: software bundle for proteomics 3827


3828 B. Mann et al.

In addition to showing good agreement between supervised and automated integration of the constructed elution profiles of identified peptides, it is also very important to demonstrate that chromatographic peak-picking and peak-integration are reproducible. The extracted ion chromatograms of the peptide LGEYGFQNALIVR constructed from five injections of BSA tryptic digest were evaluated both manually and with ProteinQuant (data not shown). The shapes and retention times of the chromatographic peaks are very consistent, as indicated by a relative standard deviation (RSD) of less than 5% for the automated integration. Because this RSD also includes injection-to-injection variation, the contribution of ProteinQuant's integration algorithm to the variation of the evaluated peaks appears negligible, particularly since the overall variation of an LC/MS/MS analysis can be as high as 15%.35
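The reproducibility figure above is a relative standard deviation of replicate peak areas. Below is a minimal sketch of that calculation. It assumes the sample standard deviation (n - 1 denominator), because the source does not say which estimator ProteinQuant reports. The five areas are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(areas):
    """Relative standard deviation of replicate peak areas, in percent.

    Uses the sample standard deviation (n - 1 denominator); the source does
    not state which estimator ProteinQuant reports.
    """
    if len(areas) < 2:
        raise ValueError("need at least two replicate areas")
    m = mean(areas)
    if m == 0:
        raise ValueError("mean area is zero")
    return stdev(areas) / m * 100.0


# Hypothetical areas for one peptide over five injections (not data from this study).
areas = [1.18e8, 1.13e8, 1.21e8, 1.16e8, 1.10e8]
print(f"RSD = {rsd_percent(areas):.1f}%")
```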

Quantitative proteomics employing ProteinQuant Suite

Label-free quantification of a mixture of standard proteins using normalization and a master peptide file

The addition of an internal standard in label-free quantitative proteomics has repeatedly been suggested as a control element to measure the consistency of instrument response. Several strategies involving the addition of an internal standard in quantitative proteomic experiments have been discussed previously.36,37 Alternatively, a sample-dependent, global normalization coefficient has also been discussed as an approach to the normalization of proteomic data.8 This coefficient does not require artificial spiking and uses all components assumed to be present in a mixture at constant concentration. ProteinQuant Suite allows proteomic data to be normalized using either of these approaches. An entry box in ProteinQuant allows the user to define the normalization approach to be employed, if any.
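The two normalization options can be sketched as dividing every peptide area in a sample by a sample-specific coefficient. This is a minimal illustration under assumed data structures (a peptide-to-area dictionary and a peptide-to-protein map), not ProteinQuant's code. The `internal_standard` and `exclude_proteins` parameters are hypothetical names. The exclusion option corresponds to normalizing to all peptides except those of a spiked protein, as in the serum experiment described later.

```python
from typing import Dict, Iterable, Optional


def normalize_areas(peptide_areas: Dict[str, float],
                    peptide_to_protein: Dict[str, str],
                    internal_standard: Optional[str] = None,
                    exclude_proteins: Iterable[str] = ()) -> Dict[str, float]:
    """Divide every peptide area by one normalization coefficient per sample.

    internal_standard: protein entry name (e.g. "LYSC_CHICK"); if given, the
        coefficient is the summed area of that protein's peptides.
    Otherwise (global normalization) the coefficient is the summed area of all
        peptides whose protein is not in exclude_proteins.
    """
    excluded = set(exclude_proteins)
    if internal_standard is not None:
        contributing = [a for p, a in peptide_areas.items()
                        if peptide_to_protein[p] == internal_standard]
    else:
        contributing = [a for p, a in peptide_areas.items()
                        if peptide_to_protein[p] not in excluded]
    coefficient = sum(contributing)
    if coefficient <= 0:
        raise ValueError("normalization coefficient must be positive")
    return {p: a / coefficient for p, a in peptide_areas.items()}
```

Global normalization relies on the assumption stated in the text, namely that most components are present at constant concentration. If a large fraction of the signal changes between samples, the coefficient itself shifts.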

We chose lysozyme as the model internal standard for spiking. In 2004, Riter et al. reported that lysozyme was a sound choice for the differential study of protein expression in rat serum for many reasons, including its size (14 kDa) and its multiple tryptic peptides of similar mass-to-charge ratios.38 Lysozyme was also attractive for reversed-phase separation because its peptides elute over a large range of the chosen gradient. This characteristic was considered highly desirable for complex mixtures. Where analyte components co-elute with some of the standard peptides, the broad elution range of the lysozyme peptides could limit the variation introduced into the standard response by competitive ionization.39 Normalization to the standard protein was performed simply by entering its Swiss-Prot entry name (LYSC_CHICK) in the normalization tab of the software configuration menu.

The global normalization strategy was also evaluated because it has its own merits: fewer preparation steps, no added sample complexity, higher efficiency than spiking with a standard, and normalization to a large set of signals, which reduces the influence of random variation. This method was also easy to evaluate because it is implemented in ProteinQuant. For the results discussed herein, lysozyme was spiked into the mixture of 17 standard proteins. The global strategy was employed both in the standard mixture study and in the study in which ovalbumin was spiked into the depleted human serum sample.

Current applications of high-throughput proteomics seek to reveal significant changes in the abundance of signal proteins for particular diseases or perturbed conditions.3,40 Confident analysis of these important components is often hampered because, owing to the duty cycle of the mass spectrometer, only a limited number of MS precursors can be selected for MS/MS in a given experiment. A peptide present at low concentration is more likely to be missed for MS/MS. Therefore, an approach is needed to quantify components known to exist in a sample even when they are not selected for MS/MS in a given run, so that potentially critical peptides are not excluded. This issue has been discussed previously. It is illustrated here by the example depicted in Fig. 2, in which the peptide LSFNPTQLEEQCHI from beta-lactoglobulin was not selected for MS/MS in four of 20 LC/MS/MS experiments. A chromatographic peak for the associated ion of m/z 858.49 was clearly present at the same retention time in the experiments in which it was not picked for MS/MS (Fig. 2(A)). Accordingly, it was vital to implement a strategy for including these significant components for quantitative purposes.

Previously, research teams have reported methods based on 'landmark' alignment of MS peaks associated with commonly identified peptides. In these methods, a 'universal' retention time is calculated for each component by comparison to a designated template chromatogram.41,42 Chromatograms are aligned to the template by extending or compressing the intervals between landmarks. Each peptide identified anywhere in the investigation is then cross-assigned to a coinciding MS peak, which allows quantitative analysis.20 Although this approach uses sophisticated alignment algorithms, it assumes that the elution order of components does not change from experiment to experiment, which may be difficult to verify in a mixture of several thousand peptides. Smith and coworkers have described another method in which chromatograms are normalized to one another to generate 'normalized elution times' (NET) for each peptide. These are then included in an 'accurate mass and time' (AMT) tag database, which is used to identify peptides from standard proteomics experiments.43 This approach, however, requires high-mass-accuracy mass spectrometers.
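The idea of stretching or compressing the intervals between landmarks can be sketched as a piecewise-linear mapping of retention times. This is a generic illustration, not the algorithm of the cited groups. The function name is hypothetical, and times outside the outermost landmarks are simply clamped by `np.interp`. The ordering check makes explicit the assumption criticized above, namely that landmarks elute in the same order in every run.

```python
import numpy as np


def align_to_template(rt, landmarks_run, landmarks_template):
    """Map retention times (min) from one run onto a template chromatogram.

    Each interval between consecutive landmark peptides is stretched or
    compressed linearly. Outside the first/last landmark np.interp clamps to
    the end values, so a practical tool would need an extrapolation rule.
    """
    x = np.asarray(landmarks_run, dtype=float)
    y = np.asarray(landmarks_template, dtype=float)
    # Piecewise-linear warping is only valid if elution order is preserved.
    if np.any(np.diff(x) <= 0) or np.any(np.diff(y) <= 0):
        raise ValueError("landmarks must elute in the same strictly increasing order")
    return np.interp(np.asarray(rt, dtype=float), x, y)


# Three landmark peptides seen at 10, 20, 30 min in this run and 12, 20, 33 min in the template.
print(align_to_template([15.0, 25.0], [10.0, 20.0, 30.0], [12.0, 20.0, 33.0]))
```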

Although peak alignment strategies are known to improve the performance of quantification algorithms, we note that these methods require computer clusters to process data in a timely manner.41,42,44 ProteinQuant has been designed as a portable tool for proteomics laboratories that may not have access to dedicated servers for data processing. That said, ProteinQuant includes a retention time window option as a means of accounting for chromatographic shifts. Our approach first uses ProtParser to compile a master file containing all peptides identified with MASCOT over all experiments. The MS precursor peak associated with each component listed in this file is then integrated in every individual LC/MS/MS analysis. Different parsed files are compiled into a master file using the 'combine' function of ProtParser. Slight variations in peptide retention times from experiment to experiment were accounted for with the ProteinQuant peak apex assignment options, in which an appropriate scan window was designated for the specific investigation. The master file was created by automatically scanning the individual database search results of each experiment. For each peptide, it recorded the first occurrence together with the associated retention time, that is, the time at which the MS precursor was subjected to MS/MS. Once a peptide had been added to the master file, subsequent occurrences in later database searches were not included by ProtParser. To reassign the retention time of each peptide to the apex of its MS peak, ProteinQuant performed automatic peak assignment. It generated a base peak chromatogram for the appropriate m/z value and then scanned in each direction from the retention time associated with the MS precursor for the maximum intensity value within the user-designated time window. Without a master file, the default window for peak assignment was ±1 min. However, the master file retained only the first occurrence of each peptide among the 20 database search files of the 20 LC/MS/MS analyses. The scan window was therefore increased to ensure proper assignment of the apex in all LC/MS experiments throughout the 20-injection investigation. Although the chromatographic retention time variation was limited to 1–2 min for any peptide, it was also necessary to account for the different points on its elution profile at which a peptide could be picked by the instrument for MS/MS. Therefore, the peak assignment window was increased to ±1.5 min for quantification with the master file.

As discussed previously by Higgs and coworkers,41 extending the integration window can mask the area of an individual peptide by including partial peaks from co-eluting peptides in the calculation. It is important to note that our approach expands only the peak assignment window, not the integration window. Compensation for chromatographic shift through the peak assignment window is necessary only when the user chooses to use the master file. In this case, the end-user would need to determine chromatographic shifts, and hence retention time variation, experimentally. However, if only those peptides identified in each run are quantified, it is not necessary to compensate for chromatographic shift, because each peak is associated with the retention time at which its MS/MS spectrum was acquired.

Figure 2. Extracted ion chromatograms (EIC) of LSFNPTQLEEQCHI. In 20 LC/MS/MS injections, this peptide was not subjected to MS/MS four times. (A) The top chromatogram shows an EIC from an experiment in which the MS precursor was picked for MS2. The four following chromatograms show cases in which the precursor was not picked despite the apparently significant chromatographic peak. (B) Zoomed spectrum for m/z 250–950 from a representative MS2 experiment that was used to identify the peptide by MASCOT; in all 16 cases where the peptide was picked for MS2, the ion score was >30.
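The two steps described above can be sketched as follows: build a master list that keeps only the first occurrence of each peptide across runs, then move each stored retention time to the apex of the peptide's chromatogram within a ± window. This is a minimal illustration under assumed inputs, not the ProtParser/ProteinQuant code. The identification tuple layout and both function names are hypothetical. The sketch also assumes that the intensity trace for the peptide's m/z has already been extracted from the mzXML/mzData file, a step that is omitted here.

```python
from typing import Dict, Iterable, List, Optional, Tuple

import numpy as np

# Hypothetical identification record: (peptide sequence, precursor m/z, MS/MS retention time in min).
Identification = Tuple[str, float, float]


def build_master_file(runs: Iterable[List[Identification]]) -> Dict[str, Tuple[float, float]]:
    """Keep only the first occurrence of each peptide, scanning runs in order."""
    master: Dict[str, Tuple[float, float]] = {}
    for run in runs:
        for seq, mz, rt in run:
            if seq not in master:          # later occurrences are ignored
                master[seq] = (mz, rt)
    return master


def assign_apex(times, intensities, rt_guess: float, window: float = 1.5) -> Optional[float]:
    """Return the time of maximum intensity within rt_guess ± window (min), or None."""
    times = np.asarray(times, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    in_window = np.abs(times - rt_guess) <= window
    if not in_window.any():
        return None
    idx = np.flatnonzero(in_window)[np.argmax(intensities[in_window])]
    return float(times[idx])
```

Widening `window` changes only where the apex is searched. The integration boundaries around the apex would be determined separately, which mirrors the distinction the text draws between the assignment and integration windows.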

The method described above was tested using a standard mixture of 17 proteins, all present at equimolar concentrations. The mixture was injected 20 times, and a master peptide file was generated. Quantification results based on individual experiments were compared with those obtained with the master file. In Fig. 3(A), components were not normalized, and in Fig. 3(B) components were globally normalized to all 225 identified peptides. Normalization to the lysozyme protein area in each sample was also performed (data not shown). It improved the results relative to the non-normalized data, but global normalization gave a considerably greater improvement. The global approach was also more attractive because it did not require artificial spiking of the sample, thus eliminating an additional source of uncertainty.

Because less intense MS precursors were more likely to be skipped for MS/MS, the total area calculated for each protein was expected to increase only slightly when a master file was used. Reproducibility improved significantly, with most coefficients of variation decreasing from 20–30% to 5–15%. This was also expected, since a master file ensured that all peaks identified for a protein throughout the investigation were integrated in each experiment. Based on these results, implementing a master file with ProteinQuant, coupled with normalized quantification, appears to be advantageous for the label-free analysis of proteins.

Figure 3. Bar graph comparing quantification results obtained with a master file of peptides to quantification using only the peptides identified through MASCOT in each individual experiment. A considerable increase in precision was observed using a master file for both (A) non-normalized and (B) normalized results; normalization also contributed additional improvement in reproducibility (>15%).

Figure 4. Calibration curve of ovalbumin spiked in 1 mg depleted human blood serum quantified with a master file: (A) not normalized, (B) normalized, and (C) ovalbumin signal increased by a factor of 100 for visual clarity and then log transformed to emphasize the continuation of the linear trend into the low end of the dynamic range.
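Why a master file lowers the coefficient of variation can be illustrated with a toy example. In the example, one peptide is not selected for MS/MS in one run, so it drops out of the protein total when only per-run identifications are summed. All numbers and names below are hypothetical and only demonstrate the mechanism. They are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List, Set


def protein_area(peptide_areas: Dict[str, float], peptides: Set[str]) -> float:
    """Sum the integrated areas of the listed peptides for one run."""
    return sum(a for p, a in peptide_areas.items() if p in peptides)


def cv_percent(values: List[float]) -> float:
    """Coefficient of variation (sample SD / mean) in percent."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical areas each peptide peak would give if integrated, for three runs.
integrated = [
    {"P1": 100.0, "P2": 50.0, "P3": 30.0},
    {"P1": 104.0, "P2": 48.0, "P3": 31.0},
    {"P1": 98.0, "P2": 52.0, "P3": 29.0},
]
# Peptides identified by MS/MS in each run (P3 was skipped in run 2).
identified = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P2", "P3"}]
master = set().union(*identified)

per_run = [protein_area(a, ids) for a, ids in zip(integrated, identified)]
with_master = [protein_area(a, master) for a in integrated]
print(f"per-run IDs: CV = {cv_percent(per_run):.1f}%")
print(f"master file: CV = {cv_percent(with_master):.1f}%")
```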

Label-free quantification of ovalbumin spiked in depleted human blood serum sample

To test the efficacy of the software for label-free quantification in a complex biological mixture, namely depleted human blood serum, known quantities of ovalbumin were spiked into 1-mg aliquots of depleted serum sample. Seven experiments were conducted, each consisting of triplicate injections of a sample containing between 250 and 10 000 fmol ovalbumin. The relative abundance of the standard protein was calculated with a master file as described above. We chose a modified global normalization method for two reasons: to use the large number of serum proteins assumed to be present at constant concentration, and to limit the number of sample preparation steps. Ovalbumin peptide areas were automatically normalized to the sum of all serum peptide areas listed in the master file by configuring ProteinQuant to normalize to all peptides except those from ovalbumin.

Although the calculated area showed a trend with increasing amounts of ovalbumin (Figs. 4(A) and 4(B)), the contribution of the lower calibration points to the overall linearity (R²) was further examined by performing linear regression on log-transformed data. In this representation, the contribution of the lower-abundance points to the overall linearity was augmented, clearly suggesting that the trend extended to the lower end of the dynamic range (Fig. 4(C)). To further illustrate this point, five ovalbumin peptides quantified in the mixture at 250 fmol were manually inspected to verify that the ProteinQuant algorithm functioned properly for peptides present near the limit of quantification (Table 2). These data also provided empirical evidence that the peak apex assignment method employed by ProteinQuant was effective in quantifying peptides originating from a protein present in a complex mixture. However, in the very rare situation where two peptides with the same m/z value co-elute within the user-designated retention time window, ProteinQuant cannot automatically distinguish between them. This situation is less pronounced on a high-mass-accuracy instrument. In that case, the m/z delta option in ProteinQuant allows accurate peak-picking, making such a situation even less likely.

Table 2. Integrated areas of five chicken ovalbumin peptides injected in triplicate at a concentration of 250 fmol in 1 mg depleted human blood serum digest. The data empirically demonstrate that ProteinQuant can accurately quantify peptides for a protein present near the limit of quantification, 187 fmol

Peptide | m/z | S/N^a | Manual integration | ProteinQuant integration | Difference [%]

Ovalbumin peptides (250 fmol), Injection 1
ISQAVHAAHAEINEAGR | 592.02 | 7 | 5.53E+07 | 5.55E+07 | 0.32
GGLEPINFQTAADQAR | 844.39 | 12 | 3.38E+07 | 3.91E+07 | 13.68
ELINSWVESQTNGIIR | 930.69 | 10 | 3.16E+07 | 3.83E+07 | 17.40
NVLQPSSVDSQTAMVLVNAIVFKGLWEK | 1025.51 | 10 | 4.40E+07 | 4.89E+07 | 9.99
LYAEERYPILPEYLQCVK | 762.11 | 11 | 7.61E+07 | 7.46E+07 | 2.06

Injection 2
ISQAVHAAHAEINEAGR | 592.02 | 10 | 3.65E+08 | 3.32E+08 | 9.83
GGLEPINFQTAADQAR | 844.39 | 12 | 5.67E+08 | 5.7E+08 | 0.51
ELINSWVESQTNGIIR | 930.69 | 7 | 2.74E+08 | 2.77E+08 | 1.06
NVLQPSSVDSQTAMVLVNAIVFKGLWEK | 1025.51 | 7 | 1.02E+08 | 1.24E+08 | 17.85
LYAEERYPILPEYLQCVK | 762.11 | 14 | 3.21E+08 | 3.48E+08 | 7.95

Injection 3
ISQAVHAAHAEINEAGR | 592.02 | 10 | 4.06E+08 | 4.09E+08 | 0.69
GGLEPINFQTAADQAR | 844.39 | 5 | 6.77E+07 | 8.22E+07 | 17.68
ELINSWVESQTNGIIR | 930.69 | 5 | 9.27E+07 | 9.63E+07 | 3.74
NVLQPSSVDSQTAMVLVNAIVFKGLWEK | 1025.51 | 4 | 5.55E+07 | 6.74E+07 | 17.73
LYAEERYPILPEYLQCVK | 762.11 | 9 | 2.16E+08 | 2.12E+08 | 2.02

^a Calculated as in Table 1.
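Regression on log-transformed calibration data, as in Fig. 4(C), can be sketched as below. The source does not specify whether both axes were log transformed, so this sketch assumes a log10–log10 fit. The optional `scale` factor stands in for the ×100 signal scaling mentioned in the figure caption. A constant factor only shifts the intercept by log10(scale) and leaves the slope and R² unchanged. The function name is hypothetical.

```python
import numpy as np


def loglog_calibration(amount_fmol, signal, scale=1.0):
    """Fit log10(scale * signal) = slope * log10(amount) + intercept.

    Returns (slope, intercept, r_squared). Assumption: both axes are log10
    transformed; the source only states that the data were log transformed.
    """
    x = np.log10(np.asarray(amount_fmol, dtype=float))
    y = np.log10(scale * np.asarray(signal, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)          # ordinary least squares, degree 1
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return float(slope), float(intercept), 1.0 - ss_res / ss_tot
```

On a linear scale the highest calibration levels dominate the sums of squares. In log space each level carries comparable leverage, which is why the low-abundance points contribute more to R² in this representation.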

The amounts at which standards were spiked in blood serum were comparable to those of middle- to high-abundance proteins endogenously present in human serum. The limit of quantification for this calibration curve was determined using confidence bands according to IUPAC recommendations.45 By this method, the lower limit of quantification was determined to be ca. 180 fmol. We observed in this study that, for globally normalized data, measurements below a normalized signal of 0.0005 become inadequate for quantification. However, this threshold was determined in this study and may not apply to other label-free studies of different complexity and constitution. Accordingly, ProteinQuant users are advised to determine their own limit of quantification.

The low relative deviation of the ovalbumin signals (<15%) suggested that this method could offer enough precision to observe up- or down-regulation of approximately 30% or greater, that is, changes of roughly twice the observed variation. This result suggested that ProteinQuant Suite is viable for label-free quantification experiments in a complex biological mixture. The primary concern for a method such as this would be its value for the analysis of low-abundance proteins in a biological fluid or tissue sample, where signal masking by highly abundant proteins remains a challenge. However, we believe that the same approach could be applied to these proteins. With advanced enrichment,46–48 depletion,49,50 and affinity purification techniques,51,52 it should be possible to observe their fluctuations as a result of biological perturbations.

Quantification of proteins and glycoproteins subjected to lectin (Con A) affinity chromatography

Comparative quantification of proteins and peptides has revealed that changes in the expression of certain features identified in biological samples reflect the progression of various diseases. Furthermore, many of these potential biomarkers carry characteristic post-translational modifications, such as glycosylation or phosphorylation. Glycoproteins in particular have been receiving continuous attention, as more than 50% of all proteins in mammalian systems are commonly believed to be glycosylated,53 and some glycoproteins have already been implicated in perturbations related to different types of cancer.54

We employed Con A affinity chromatography to enrich specific standard glycoproteins present in a mixture and thereby demonstrate the utility of ProteinQuant Suite for the relative quantification of these glycoproteins. The resulting lectin-bound and unbound fractions were adjusted to the same volumes, subjected to proteolytic digestion, analyzed by LC/MS/MS, and quantified with ProteinQuant Suite. The choice of concanavalin A lectin was arbitrary in this case. However, we took advantage of its relatively broad specificity and its ready availability in pure form. As described elsewhere,55 Con A exhibits strong affinity toward carbohydrates with a high content of mannose, glucose and galactose, and occasionally interacts with the chitobiose core of N-glycans. The standard protein mixture consisted of three nonglycosylated proteins (BSA, cytochrome C, and myoglobin) and three glycoproteins (ribonuclease B, ovalbumin, and fetuin).


Table 3. Comparative quantification of proteins and glycoproteins identified in unbound and bound fractions after Con A lectin affinity chromatography enrichment and label-free quantification using ProteinQuant Suite. Calculation of confidence intervals (α = 0.05) was based on three replicates

Protein | Protein area (×10^10), Con A unbound | Protein area (×10^10), Con A bound | Ratio bound/unbound
Cytochrome C | 8.38 ± 1.68 | 0.05 ± 0.02 | 0.01
BSA | 27.70 ± 5.04 | 0.22 ± 0.19 | 0.01
Ribonuclease B* | 1.08 ± 0.53 | 8.42 ± 1.31 | 7.80
Fetuin* | 7.24 ± 0.25 | 0.37 ± 0.05 | 0.05
Myoglobin | 18.50 ± 3.08 | 0.26 ± 0.26 | 0.01
Ovalbumin* | 12.75 ± 3.12 | 10.67 ± 5.55 | 0.84

* Glycosylated proteins.
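The Table 3 caption states 95% confidence intervals from three replicates, together with bound/unbound ratios of mean areas. A minimal sketch of both quantities follows. It assumes the ± values are Student-t confidence half-widths (t = 4.303 for 2 degrees of freedom), which is consistent with the caption but not spelled out in the source. The helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided Student t critical value for alpha = 0.05 and 2 degrees of freedom (n = 3).
T_CRIT_ALPHA05_DF2 = 4.303


def mean_ci_triplicate(values):
    """Return (mean, half-width) of the 95% confidence interval for three replicates."""
    if len(values) != 3:
        raise ValueError("this helper assumes exactly three replicates")
    return mean(values), T_CRIT_ALPHA05_DF2 * stdev(values) / sqrt(len(values))


def bound_to_unbound_ratio(bound_mean, unbound_mean):
    """Ratio of mean protein areas in the Con A bound vs unbound fraction."""
    return bound_mean / unbound_mean


# Mean areas (x10^10) for ribonuclease B taken from Table 3.
print(round(bound_to_unbound_ratio(8.42, 1.08), 2))  # prints 7.8
```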

Table 3 shows a markedly higher bound-to-unbound ratio for ribonuclease B (7.80) and ovalbumin (0.84) than for the nonglycosylated proteins (0.01 each). These observations were expected, since both glycoproteins contain high-mannose glycan structures. In contrast, fetuin is a well-studied glycoprotein with characterized N- and O-glycans, and its lower binding efficiency (ratio 0.05) is attributed mainly to the presence of terminal sialic acids, which interfere with Con A binding. It is also worth mentioning that a compiled master file was used here for the automated quantification of all proteins identified in the bound and unbound fractions. Hence, ProteinQuant calculated peak areas for the identified proteins and glycoproteins as if they had the same likelihood of being present in both fractions. This eliminated the bias imposed by proteins that commonly generate more peptides after enzymatic digestion. The reported area comparison therefore reflects the difference in abundance of the proteins and glycoproteins identified in both fractions.

Comparing the performance of ProteinQuant to two other software platforms

As mentioned above, several quantitative proteomic platforms have been developed previously. The performance of ProteinQuant was therefore compared with two other software tools, LabKey CPAS and mzMine, which are readily available and offer similar functionality. A quantitative evaluation was performed using data from the experiments in which depleted human blood serum was spiked with different concentrations of ovalbumin (5.0, 2.0, 1.5, and 1.0 pmol/mL). The output of each software platform was used to calculate three protein signal ratios from the mean protein area at each concentration. Assuming no experimental measurement error related to sample preparation or proteomic analyses, the theoretical ratios are 2.5, 1.33, and 1.5 for 5.0:2.0, 2.0:1.5, and 1.5:1.0, respectively. As shown in Table 4, the values calculated with the three platforms suggest that ProteinQuant offers relatively higher accuracy and reproducibility, for example, 2.52 ± 0.01 for the theoretical ratio of 2.50.

Table 4. Comparison of ProteinQuant performance to that of CPAS and mzMine. Data used in this comparison were based on proteomic experiments in which ovalbumin was spiked in depleted human blood serum. The table also summarizes features offered by the different platforms

Ovalbumin theoretical concentration ratio | ProteinQuant | LabKey CPAS w/XPRESS^a | mzMine^a
2.50 | 2.52 ± 0.01 | 1.71 ± 0.27 | 2.14 ± 0.07
1.33 | 1.48 ± 0.25 | 1.65 ± 0.25 | 1.48 ± 0.65
1.50 | 1.62 ± 0.26 | 1.67 ± 0.25 | 1.46 ± 1.16

Software features | ProteinQuant | LabKey CPAS w/XPRESS^a | mzMine^a
Normalization | total peptide signal or internal standard(s) | none | average peak height, average squared peak height, maximum peak height or total raw signal
De-noising | Savitzky-Golay | none | moving average or Savitzky-Golay
Batch mode | yes | yes | yes
Visualization | none | elution profile | 2D plot, 3D plot, spectrum plot
Database searching compatibility | Mascot | Mascot, X!Tandem, Sequest | none
Master file | based on database identifications | none | gap filler estimates area for missing peaks
Peak alignment | none^c | none | aligned to master template with m/z and ret. time thresholds
Statistical analysis | none^b | none^b | log ratio plot, PCA
Data file format | mzXML, mzData | mzXML | mzXML, mzData, netCDF, Thermo RAW^d
Computer requirements | 2 GHz CPU or better, 1 GB RAM, Microsoft .NET Framework 2.0 or higher | P4 or dual processor machine, >1 GB RAM, 1 GB hard drive space free, optional cluster config. | 2 GHz CPU or better, 2 GB RAM, JRE 5.0 or better, Java 3D

^a Label-free quantification with the XPRESS algorithm was performed according to the suggestions of LabKey CPAS support; area- or intensity-based label-free quantification is not officially supported by LabKey CPAS. Since mzMine does not support database connectivity, this step was performed manually.
^b Outputs can be processed with third-party statistical software tools.
^c Possible retention time shifts are compensated for through the peak apex assignment window.
^d With Thermo XCalibur software installed.
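The accuracy comparison in Table 4 can be reproduced as a signed relative error between each observed ratio and its theoretical value. The sketch below uses the mean ratios listed in Table 4. The error metric itself is an illustrative choice, not one defined in the source.

```python
# Ovalbumin levels (pmol/mL) compared in each ratio, as listed in the text.
PAIRS = [(5.0, 2.0), (2.0, 1.5), (1.5, 1.0)]

# Mean observed ratios from Table 4, in the same order as PAIRS.
OBSERVED = {
    "ProteinQuant": [2.52, 1.48, 1.62],
    "LabKey CPAS w/XPRESS": [1.71, 1.65, 1.67],
    "mzMine": [2.14, 1.48, 1.46],
}


def relative_error_percent(observed, theoretical):
    """Signed deviation of an observed ratio from its theoretical value, in percent."""
    return (observed - theoretical) / theoretical * 100.0


theoretical = [high / low for high, low in PAIRS]
for platform, ratios in OBSERVED.items():
    errors = [relative_error_percent(o, t) for o, t in zip(ratios, theoretical)]
    print(platform, ", ".join(f"{e:+.1f}%" for e in errors))
```

For ProteinQuant the deviations are roughly +0.8%, +11%, and +8%. The mzMine mean for the 1.5:1.0 ratio lies closer to the theoretical value, but with a much wider spread (±1.16), so accuracy and reproducibility should be read together.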

Some of the key features of each software package are also summarized in Table 4. mzMine and the other utilities mentioned above allow chromatographic alignment, which permits the quantification of all significant chromatographic features, including those not identified through database searching. CPAS offers a centralized data-processing workspace and a workflow that can be completely automated. This workflow runs from the input of an mzXML data file to the output of a list of identified peptides with their associated peak areas. However, CPAS does not officially support area- or intensity-based label-free quantification and implements only spectral counting. Further features compared in Table 4 include data normalization, de-noising, visualization, multiple-file (batch) processing, database search engine compatibility, use of a master file, peak alignment, statistical analysis, and computer requirements.

CONCLUSIONS

We have developed ProteinQuant Suite, a software bundle that permits fast, automated, high-throughput label-free quantitative evaluation of LC/MS/MS proteomics data. The suite is currently limited to the MASCOT search engine. Work to extend its compatibility to other engines and to the universal pepXML format is in progress, and this extension will help reduce platform dependence. ProteinQuant is a freeware application available for download.56

The performance of ProteinQuant for label-free quantification has been verified with the data reported here for three sample sets: a mixture of standard proteins, depleted human blood serum spiked with ovalbumin, and Con A lectin-enriched fractions. A label-free approach does not complicate sample preparation and does not increase the likelihood of sample loss prior to LC injection. Compiling a master list of peptides for quantification via MS peak integration increased reproducibility, as previously suggested by similar strategies,42 and circumvented limitations resulting from the duty cycle of the instrument. With this approach, slight chromatographic variations between experiments must be considered, but software tools can account for them and ensure that MS peaks are properly assigned. Application of this approach to the quantification of a lectin-enriched sample is encouraging for future studies in which significant enrichment steps could lead to quantitative knowledge of lower-abundance proteins.

Acknowledgements

This work was primarily supported by Grant No. GM24349 from the National Institute of General Medical Sciences, US Department of Health and Human Services. Further support was provided by NIH/NCRR – National Center for Glycomics and Glycoproteomics (NCGG), Grant No. RR018942.


REFERENCES

1. Lowe JB, Marth JD. Annu. Rev. Biochem. 2003; 72: 643.2. Righetti PG, Campostrini N, Pascali J, Hamdan M, Astner H.

Eur. J. Mass Spectrom. 2004; 10: 335.3. Veenstra TD. J. Chromatogr. B 2007; 847: 3.4. Hale JE, Gelfanova V, Ludwig JR, Knierman MD. Briefings

Funct. Genom. Proteomics 2003; 2: 185.5. Witzmann FA, Arnold RJ, Bai F, Hrncirova P, Kimpel MW,

Mechref YS, McBride WJ, Novotny MV, Pedrick NM, Ring-ham HN, Simon JR. Proteomics 2005; 5: 2177.

6. Baek W-O, Haupt K, Colin C, Vijayalakshmi MA. Electro-phoresis 1996; 17: 489.

7. Klouckova I, Hrncirova P, Mechref Y, Arnold RJ, Li T-K,McBride WJ, Novotny MV. Proteomics 2006; 6: 3060.

8. Wang G, Wu WW, Zeng W, Chou CL, Shen RF. J. ProteomeRes. 2006; 5: 1214.

9. Gygi SP, Corthals GL, Zhang Y, Rochon Y, Aebersold R. Proc.Natl. Acad. Sci. 2000; 97: 9390.

10. Parker KC, Patterson D, Williamson B, Marchese J, Graber A,He F, Jacobson A, Juhasz P, Martin S. Mol. Cell. Proteomics2004; 3: 625.

11. Julka S, Regnier F. J. Proteome Res. 2004; 3: 350.12. Qiu R, Regnier FE. Anal. Chem. 2005; 77: 7225.13. Wiener MC, Sachs JR, Deyanova EG, Yates NA. Anal. Chem.

2004; 76: 6085.14. Higgs RE, Knierman MD, Gelfanova V, Butler JP, Hale JE.

J. Proteome Res. 2005; 4: 1442.15. Matthiesen R. Proteomics 2007; 7: 2815.16. MacCoss MJ, Wu CC, Liu H, Sadygov R, Yates JR. Anal.

Chem. 2003; 75: 6912.17. Venable JD, Dong M-Q, Wohlschlegel J, Dillin A, Yates JR I.

Nat. Methods 2004; 1: 39.18. Li XJ, Zhang H, Ranish JA, Aebersold R. Anal. Chem. 2003; 75:

6648.19. Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T,

Wang P, May D, Eng J, Fang R, Lin C, Chen J, Goodlett D,Whiteaker J, Paulovich A, McIntosh M. Bioinformatics 2006;22: 1902.

20. Andreev VP, Li L, Rejtar T, Li Q, Ferry JG, Karger BL.J. Proteome Res. 2006; 5: 2039.

21. Andreev VP, Li L, Cao L, Gu Y, Rejtar T, Wu S-L, Karger BL.J. Proteome Res. 2007; 6: 2186.

22. Ono M, Shitashige M, Honda K, Isobe T, Kuwabara H,Matsuzuki H, Hirohashi S, Yamada T. Mol. Cell. Proteomics2006; 5: 1338.

23. Leptos KC, Sarracino DA, Jaffe JD, Krastins B, Church GM.Proteomics 2006; 6: 1770.

24. Katajamaa M, Miettinen J, Oresic M. Bioinformatics 2006; 22:634.

25. Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, HusseyP, Igra M, Maclean B, Lin CW, Detter A, Fang R, Faca V,Gafken P, Zhang H, Whitaker J, States D, Hanash S, Paulo-vich A, McIntosh MW. J. Proteome Res. 2006; 5: 112.

26. Bondarenko PV, Chelius D, Shaler TA. Anal. Chem. 2002; 74:4741.

27. Yocum AK, Yu K, Oe T, Blair IA. J. Proteome Res. 2005; 4: 1722.28. Resing KA, Meyer-Arendt K, Mendoza AM, Aveline-Wolf

LD, Jonscher KR, Pierce KG, Old WM, Cheung HT, Russell S,Wattawa JL, Goehle GR, Knight RD, Ahn NG. Anal. Chem.2004; 76: 3556.

29. Wang L-h, Li D-Q, Fu Y, Wang H-P, Zhang J-F, Yuan Z-F,Sun R-X, Zeng R, He S-M, Gao W. Rapid Commun. MassSpectrom. 2007; 21: 2985.

30. Savitzky A, Golay MJE. Anal. Chem. 1964; 36: 1627.

31. Wang GH, Wu WW, Pisitkun T, Hoffert JD, Knepper MA, Shen RF. Anal. Chem. 2006; 78: 5752.

32. Lin SM, Zhu L, Winter AQ, Sasinowski M, Kibbe WA. Exp. Rev. Proteomics 2005; 2: 839.

33. Sandra O, Chris T, Henning H, Weimin Z, Randall J, Rolf A. Exp. Rev. Proteomics 2004; 1: 79.

34. Available: http://sashimi.sourceforge.net/software_glossolalia.html

35. Massaroti P, Moraes LAB, Marchioretto MAM, Cassiano NM, Bernasconi G, Calafatti SA, Barros FAP, Meurer EC, Pedrazzoli J. Anal. Bioanal. Chem. 2005; 382: 1049.

36. Immler D, Greven S, Reinemer S. Proteomics 2006; 6: 2947.

37. Cutillas PR, Geering B, Waterfield MD, Vanhaesebroeck B. Mol. Cell. Proteomics 2005; 4: 1038.

Rapid Commun. Mass Spectrom. 2008; 22: 3823–3834

DOI: 10.1002/rcm

38. Riter LS, Hodge BD, Gooding KM, Julian RK Jr. J. Proteome Res. 2005; 4: 153.

39. Tang K, Page JS, Smith RD. J. Am. Soc. Mass Spectrom. 2004; 15: 1416.

40. Hale EJ, Gelfanova V, Ludwig RJ, Knierman MD. Briefings Funct. Genom. Proteomics 2003; 2: 185.

41. Higgs RE, Knierman MD, Gelfanova V, Butler JP, Hale JE. J. Proteome Res. 2005; 4: 1442.

42. Andreev VP, Li L, Cao L, Gu Y, Rejtar T, Wu SL, Karger BL. J. Proteome Res. 2007; 6: 2186.

43. Qian W-J, Monroe ME, Liu T, Jacobs JM, Anderson GA, Shen Y, Moore RJ, Anderson DJ, Zhang R, Calvano SE, Lowry SF, Xiao W, Moldawer LL, Davis RW, Tompkins RG, Camp DG II, Smith RD. Mol. Cell. Proteomics 2005; 4: 700.

44. Finney GL, Blackler AR, Hoopmann MR, Canterbury JD, Wu CC, MacCoss MJ. Anal. Chem. 2008; 80: 961.

45. Mocak J, Bond AM, Mitchell S, Scollary G. Pure Appl. Chem. 1997; 69: 297.

46. Madera M, Mechref Y, Novotny MV. Anal. Chem. 2005; 77: 4081.

47. Madera M, Mechref Y, Klouckova I, Novotny MV. J. Proteome Res. 2006; 5: 2348.

48. Yang Z, Harris LE, Palmer-Toy DE, Hancock WS. Clin. Chem. 2006; 52: 1897.

49. Zolotarjova N, Martosella J, Nicol G, Bailey J, Boyes BE, Barrett WC. Proteomics 2005; 5: 3304.

50. Schuchard MD, Mehigh RJ, Kappel WK, Scott GB. Sigma-Aldrich Application Note.

51. Cartellieri S, Hamer O, Helmholtz H, Niemeyer B. Biotechn. Appl. Biochem. 2002; 35: 83.

52. Lee W-C, Lee KH. Anal. Biochem. 2004; 324: 1.

53. Dwek MV, Ross HA, Leathem AJC. Proteomics 2001; 1: 756.

54. Dwek MV, Alaiya AA. Br. J. Cancer 2003; 89: 305.

55. Cummings RD. Methods Enzymol. 1994; 230: 66.

56. Available: www.ncgg.indiana.edu