
IPA: an Informed Proteomics Analysis Tool for Improved Peptide … · 2016-01-06 · IPA: an Informed Proteomics Analysis Tool for Improved Peptide Identifications Sangtae Kim, Gordon


IPA: an Informed Proteomics Analysis Tool for Improved Peptide Identifications

Sangtae Kim, Gordon W. Slysz, Kevin L. Crowell, Samuel H. Payne, Gordon A. Anderson, and Richard D. Smith

Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA

Introduction

Overview Methods Results

Acknowledgements

Portions of this research were supported by the NIH National Center for Research Resources (RR18522) and National Institute of General Medical Sciences (8 P41 GM103493-10), and by the U.S. Department of Energy Office of Biological and Environmental Research (DOE/BER) Genome Sciences Program. Samples were analyzed using capabilities developed under the support of NIH National Institute of General Medical Sciences (8 P41 GM103493-10) and DOE/BER.

Significant portions of the work were performed in the Environmental Molecular Science Laboratory, a DOE/BER national scientific user facility at Pacific Northwest National Laboratory in Richland, Washington.

References

1. Kessner D et al. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24: 2534-6 (2008).
2. Mayampurath AM et al. DeconMSn: a software tool for accurate parent ion monoisotopic mass determination for tandem mass spectra. Bioinformatics 24: 1021-3 (2008).
3. Kim S et al. MS-GF+: Universal database search tool for mass spectrometry. Submitted (2013).
4. Eng J et al. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5: 976-89 (1994).
5. Nesvizhskii AI et al. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75: 4646-58 (2003).
6. Cox J and Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26: 1367-72 (2008).

CONTACT: Sangtae Kim, Ph.D., Biological Sciences Division, Pacific Northwest National Laboratory

E-mail: [email protected]

Conclusions

• A new approach to using MS1 and MS/MS spectra for identifying peptides.
• IPA, an informed proteomics analysis tool, provides ~15% more peptide identifications.
• IPA better handles co-eluted peptides.
• IPA identified ~20% and ~10% more peptides than MSConvert and DeconMSn, respectively.
• IPA identified 20-40% more peptides than Sequest/PeptideProphet and MaxQuant.
• For the phosphorylation-enriched dataset, IPA identified a number of peptides comparable to MaxQuant and DeconMSn/MS-GF+.

Data sets

• Shewanella: 34,342 CID spectra (High-Low)
• Human-iTRAQ: 22,806 iTRAQ-labeled HCD spectra

# identified peptides (1% FDR)

Examples of peptides exclusively identified by IPA

IPA approach (workflow)

1) MS/MS isolation window: [760.34, 762.34]

2) Score all peptides whose ion m/z values are within the isolation window.

Protein Database: Charge 1: [759.33, 763.35], Charge 2: [1518.66, 1526.7], Charge 3: [2277.99, 2290.05], …

3) List the top k good scoring peptides using MS-GF+:

Peptide                         Charge  Score (SpecEValue)
GETASVADNTTENGR                 2       1.41288E-18
ASEWAAK                         1       1.99072E-07
EPLLYDFVVRDR                    2       2.85083E-07
NTIYAAGRVGTETLGVYRINL           3       3.15152E-07
DGRDGIVVDRKPEFKVGARVEVEAKFK     4       4.1451E-07
GKVWGRQM+15.995AKLVPPQENKAK     3       4.85142E-07
LNPVFPLPHEVAFWYSGQASSSYDFGQ     4       6.69304E-07
VM+15.995DGGAFVKPNTTQFPNDAQK    3       9.16367E-07
WADTAAK                         1       9.84701E-07
DETLTIDKELAARVVEGDHGDVLMDVAK    4       1.07169E-06

[Diagram labels: MS2; Precursor]

4) For each peptide, compare its theoretical isotopomer profile with MS1 features across multiple scans.

5) Filter peptides, keeping those with a good MS1 feature “fit”:

Peptide                         Charge  Score (SpecEValue)
GETASVADNTTENGR                 2       1.41288E-18
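The candidate-selection step of this workflow can be sketched as follows. This is a minimal illustration, not the IPA implementation: it assumes the per-charge ranges listed for the protein database are neutral monoisotopic mass ranges, it uses standard monoisotopic residue masses, and the names `peptide_mass` and `candidates` are hypothetical. Modified residues such as M+15.995 are not handled.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added for the peptide termini (Da)

# Per-charge ranges copied from the workflow for the isolation window
# [760.34, 762.34]. Treating them as neutral masses is an assumption.
MASS_RANGES = {1: (759.33, 763.35), 2: (1518.66, 1526.7), 3: (2277.99, 2290.05)}


def peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidates(peptides, mass_ranges=MASS_RANGES):
    """Return (sequence, charge, mass) for peptides falling in any range."""
    hits = []
    for seq in peptides:
        mass = peptide_mass(seq)
        for charge, (lo, hi) in sorted(mass_ranges.items()):
            if lo <= mass <= hi:
                hits.append((seq, charge, round(mass, 4)))
    return hits


if __name__ == "__main__":
    database = ["GETASVADNTTENGR", "ASEWAAK", "EPLLYDFVVRDR", "AADLGLETVIVER"]
    for hit in candidates(database):
        print(hit)
```

In this sketch every peptide that falls in a range would then be scored against the MS/MS spectrum; the scoring itself (SpecEValue from MS-GF+) is outside the scope of the example.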

Existing approach (workflow)

1) Determine the “accurate precursor”: Precursor m/z: 761.3433, Charge 2

2) Score peptides matching the accurate precursor.

Protein Database: Score peptides with masses [1520.6557, 1520.6861]

3) Report the best scoring peptide.

[Diagram label: Precursor MS]
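The mass interval searched in the existing approach follows from the precursor m/z and charge. The sketch below shows that arithmetic. The poster does not state the tolerance or the charge-carrier mass; the 10 ppm tolerance and the 1.007825 Da carrier mass used here are assumptions, chosen because they give an interval close to the one shown for m/z 761.3433 at charge 2. The function name `mass_window` is hypothetical.

```python
# Assumed charge-carrier mass (Da); not stated in the poster.
CARRIER_MASS = 1.007825


def mass_window(precursor_mz, charge, tol_ppm=10.0, carrier=CARRIER_MASS):
    """Neutral-mass search interval for a precursor m/z and charge.

    tol_ppm is an assumed tolerance in parts per million.
    """
    mass = (precursor_mz - carrier) * charge
    delta = mass * tol_ppm * 1e-6
    return mass - delta, mass + delta


if __name__ == "__main__":
    lo, hi = mass_window(761.3433, 2)
    print(f"[{lo:.4f}, {hi:.4f}]")
```

Compared with the per-charge ranges derived from the whole isolation window, this interval is a few hundredths of a dalton wide, which is why an incorrectly assigned precursor excludes the correct peptide from the search.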

Software Tools Used

• MSConvert (MC): determines precursors based on selected ions in the raw file
• DeconMSn (DM): determines precursors based on the Averagine model
• Sequest (SQ): database search engine [4]
• PeptideProphet (PP): re-scores Sequest identifications [5]
• MS-GF+ (MG): database search engine
• MaxQuant (MQ): database search with precursor refinement [6]
• Informed Proteomics Analysis (IPA): integrated search and precursor refinement

[Figure: number of identified peptides (1% FDR) for DM/SQ/PP, MQ, MC/MG, DM/MG, and IPA. Panels: Shewanella, Human-iTRAQ, Human-iTRAQ-Phos.]

• Determining the accurate monoisotopic precursor mass-to-charge ratio (m/z) and charge is important.
• A simple method that uses the ion selected by the instrument (e.g., msconvert in ProteoWizard [1]) is not effective.
• A more complex method using the Averagine model (e.g., DeconMSn [2]) works better. However, it does not work well if multiple co-eluted peptides are present in the isolation window.
• We present a new Informed Proteomics Analysis (IPA) approach that addresses this problem.
• Without pre-determining the accurate precursor, IPA scores all peptides whose ion m/z values are within the isolation window.
• Afterwards, IPA uses MS1 spectra to filter out peptides based on the fit between their isotopomer profiles and the corresponding MS1 peaks.
• IPA also assigns a score to each peptide using its MS/MS identification scores (SpecEValue) and the correlation of the extracted ion chromatograms (XICs) of the MS1 features associated with the peptide.
• With IPA, database search parameters are streamlined. For MS-GF+ [3], the required parameters are: 1) spectrum file(s), 2) database file, and 3) modification file.
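The MS1 filtering idea, comparing a peptide's isotopomer profile with observed MS1 peaks, can be sketched as below. The poster does not specify how the theoretical profile is computed or how the fit is measured, so everything here is an assumption for illustration: the profile is a crude carbon-only binomial estimate using the Averagine carbon count, the fit measure is cosine similarity, and the function names are hypothetical.

```python
import math

C13_SPACING = 1.0033548     # mass difference between 13C and 12C (Da)
C13_ABUNDANCE = 0.0107      # approximate natural abundance of 13C
AVERAGINE_MASS = 111.1254   # average mass of one Averagine unit (Da)
AVERAGINE_CARBONS = 4.9384  # carbon atoms per Averagine unit


def isotope_mzs(mono_mz, charge, n_peaks):
    """Expected m/z of the first n_peaks isotope peaks of an ion."""
    return [mono_mz + k * C13_SPACING / charge for k in range(n_peaks)]


def carbon_only_profile(mass, n_peaks):
    """Crude isotope profile: binomial over an Averagine-estimated carbon count."""
    n_c = round(mass / AVERAGINE_MASS * AVERAGINE_CARBONS)
    p = C13_ABUNDANCE
    return [math.comb(n_c, k) * p**k * (1 - p) ** (n_c - k) for k in range(n_peaks)]


def cosine_fit(theoretical, observed):
    """Cosine similarity between theoretical and observed peak intensities."""
    dot = sum(t * o for t, o in zip(theoretical, observed))
    norm = math.sqrt(sum(t * t for t in theoretical)) * math.sqrt(
        sum(o * o for o in observed)
    )
    return dot / norm if norm > 0 else 0.0


if __name__ == "__main__":
    profile = carbon_only_profile(1520.67, 4)
    mzs = isotope_mzs(761.3433, 2, 4)
    # Made-up observed intensities at the expected m/z positions.
    observed = [4.1e6, 3.0e6, 1.2e6, 0.3e6]
    print(mzs)
    print(cosine_fit(profile, observed))
```

A peptide whose expected peaks are missing from the MS1 scan, or whose intensities follow a different envelope, would receive a low fit under this measure and could be filtered out; any threshold would have to be chosen empirically.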

[Figure labels: Existing approach; IPA approach; Isolation window]

Running Time (Shewanella)

IPA peptide-centric scoring

For each peptide passing filters, assign a score based on:

1) MS/MS identification scores (SpecEValue), and
2) Correlation of XICs of MS1 features.

AADLGLETVIVER matched to 4 spectra:

Scan#   SpecEValue  Charge
15831   2.0E-14     2
15850   4.4E-14     1
16178   6.5E-12     2
16172   5.3E-12     2
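The second scoring ingredient, correlation of XICs, can be illustrated with a plain Pearson correlation between two intensity traces sampled at the same scans, for example the charge 1 and charge 2 features of one peptide. This is a sketch under assumptions: the poster does not say which correlation measure IPA uses or how it is combined with SpecEValue, and the intensities below are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traces must have the same length (at least 2 points)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat trace carries no elution-shape information
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Made-up XICs over the same five scans.
    xic_charge1 = [0.0, 2.0e6, 8.0e6, 3.0e6, 0.0]
    xic_charge2 = [0.0, 5.0e6, 2.0e7, 7.5e6, 0.0]  # same shape, 2.5x intensity
    xic_other = [6.0e6, 1.0e6, 0.0, 2.0e6, 9.0e6]  # different elution shape
    print(pearson(xic_charge1, xic_charge2))
    print(pearson(xic_charge1, xic_other))
```

Features that belong to the same peptide should rise and fall together, so a high correlation supports the identification, while a feature from a different co-eluting species would typically correlate poorly.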

Monoisotopic ion of KWEQITSGTAPFYIDPAR

Co-eluted peptide AADLGLETVIVER

[Figure: intensity (0.00E+00 to 4.00E+07) versus scan number (15700 to 16400), with traces for Charge 1 and Charge 2.]

(Note: Peptide-centric scoring is still under development.)

• Human-iTRAQ-Phos: 29,212 iTRAQ-labeled HCD spectra from a phosphorylation-enriched human sample
• All data sets were generated with a Thermo LTQ-Orbitrap Velos

Co-eluted peptides

m/z: 890.93 SpecEValue: 8.7E-15 Charge 3

m/z: 890.78 SpecEValue: 9.0E-15 Charge 2

“Busy” isolation window

Isolation window target and selected ion

Correct m/z

m/z: 1068.97 SpecEValue: 5.2E-21 Charge 2

[Figure: bar chart over DM/SQ/PP, MQ, MC/MG, DM/MG, and IPA (axis 0 to 5000), with legend entries IPA, MG, MC, MQ, PP, SQ, DM.]

(Note: DeconMSn does not support multi-threading.)