1
465.19 2+ 464.59 3+ iPRG2012: A Study on Detecting Modified Peptides in a Complex Mixture A B R F Proteome Informatics Research Group John Cottrell 1 , Karl R. Clauser 2 , Robert J Chalkley 3 , Ruixiang Sun 4 , Eugene Kapp 5 , Matt Chambers 6 , W. Hayes McDonald 6 , Henry H. Lam 7 , Nuno Bandeira 8 , Eric Deutsch 9 and Thomas Neubert 10 1 Matrix Science, London, UK; 2 The Broad Institute of MIT and Harvard, Cambridge, MA; 3 University of California, San Francisco, CA; 4 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 5 Ludwig Institute for Cancer Research, Parkville, Victoria, Australia; 6 Vanderbilt University, Nashville, TN; 7 Hong Kong University of Science and Technology, Hong Kong, China; 8 University of California, San Diego, CA; 9 Institute for Systems Biology, Seattle, WA; 10 New York University School of Medicine, New York, NY Study Goals 1. Evaluate ability of participants to identify modified peptides in a complex mixture 2. Find out why result sets might differ between participants 3. Produce a benchmark dataset, along with an analysis resource Study Materials 5600 TripleTOF dataset (AB-SCIEX) – WIFF, mzML, dta, MGF (de-isotoped);– conversions by MS Data Converter 1.1.0 – MGF (not de-isotoped – conversion by Mascot Distiller 2.4) 1 FASTA file (SwissProt S. cerevisiae, human, + 1 bovine protein + trypsin from Dec. 2011) 1 template (Excel) 1 on-line survey (Survey Monkey) Study Instructions 1. Analyze the dataset 2. Report the peptide spectrum matches in the provided template 3. Report measures of reliability for PTM site assignments (optional) 4. Complete an on-line survey 5. Attach a 1-2 page description of your methodology How Much Do the Identifications Overlap? Reasonable number of participants from around the globe, mainly experienced users but a few first-timers Large spread in number of spectra identified False negatives (NS) are generally much higher than false positives, so there is generally room for improvement Peak list was a significant factor on performance Varied performance in detecting PTMs Most participants struggled with sulfation Multiply phosphorylated harder to find than singly Most common errors in site assignment were: Reporting sulfo(Y) as phospho(ST) Mis-assignment of site/s in multiply phosphorylated peptides The iPRG2012 are in the process of preparing the data for publication. If you participated and would like to help out, contact the iPRG through [email protected] . For more information on the iPRG and for copies of this poster and the talk please visit: http://www.abrf.org/iPRG after the meeting Acknowledgements: The iPRG are grateful to all of the participants. We would also like to thank Jeremy Carver (UCSD) for serving as the “Anonymizer”. Preliminary Conclusions Study Design Use a common, rich dataset Use a common sequence database Allow participants to use the bioinformatic tools and methods of their choosing Use a common reporting template Report results at an estimated 1% FDR (at the spectrum level) Ignore protein inference A Proteome Informatics Challenge Nature uses a wide variety of protein post-translational modifications to regulate protein structure and activity and tandem mass spectrometry has emerged as the most powerful analytical approach to detect these moieties. However, modified peptides present special challenges for characterization. First, they are generally present at sub-stoichiometric levels, meaning that without enrichment strategies samples are dominated by unmodified peptides, so finding the modified peptides may be a challenge. Secondly, the modifications may have unique fragmentation behaviors in collision-induced dissociation (CID), which may need to be considered by database search engines. Finally, if there are multiple residues within a given peptide that could bear a particular modification type, then it is necessary to identify fragment ions that frame either side of the modification site in order to be able to localize the exact site of modification within the peptide. The Proteome Informatics Research Group (iPRG) created a collaborative data analysis study to enable proteomics laboratories to evaluate their ability to find a variety of post- translationally modified peptides within a complex peptide mixture background. The dataset consists of nearly twenty thousand high resolution and high mass accuracy tandem mass spectra. Within the sample there are peptides with a range of different natural and chemical modifications. This study enabled participants to evaluate their data analysis capabilities and approaches relative to others in analyzing a common data set, with a particular emphasis on their ability to detect and characterize peptides with modifications of potential biological significance. Total Spectra vs. Interesting Mods There is a very wide range in the total number of spectra with identified peptides. Once one focuses only on the spectra containing modifications for which the ability to localize the modification to a particular residue, the range is much narrower. The 5 rightmost participants went so far as to report only spectra of modified peptides. Room for Improvement in ID Certainty Thresholds Identifications reported as: Yes that matched the consensus ; No, but still matching the consensus ; Yes, but a different answer than the consensus ; Yes, < 3 consensus ; No, that disagreed with consensus Table Key: 71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 Acetyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Dimethyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Dimethyl (R) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Methyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Methyl (R) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Nitro (Y) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Phospho (STY) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Sulfo (Y) 1 1 1 1 1 1 1 1 Trimethyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 An = Andromeda/MaxQuant MG = MS-GFDB pF = pFind Sc = Scaffold AS = A-Score MM = MyriMatch Pk = PEAKS SM = Spectrum Mill By = Byonics MO = MODa PkDB = PEAKSDB Sq = Sequest IH = In-house software O = OMSSA Ppi =Protein Pilot ST = SpectraST IP = IDPicker Ot = Other PPr = Protein Prospector TPP = TransProteomic Pipeline M = Mascot P/PP = Pep/Prot Prophet PR = PhosphoRS XL = Excel Mde = Mascot Delta Score PD = ProteomeDiscoverer PW = ProteoWizard XT = X!Tandem Mdi = Mascot Distiller 71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 Peaklist Mgf mzML mzML mgf mgf mgf mzML mgf_nd mgf mgf_nd mgf mzML mzML mgf_nd mgf mgf mzML mgf mgf_nd mgf mgf mgf_nd mgf mgf_nd mzML mzML mgf_nd WIFF Spectral Pre-Processing Pk Pk PPi PPi MDi Ot SM pF Pk MDi PD PkDB PW PkDB Sq Peptide Identification By Pk Pk M PPi M O PPr pF M MG M P/PP SM pF M Pk M PPi M O M M PD PkDB PkDB O ST MM O TPP PkDB Sq ST XT IH XT XT XT Ot Discover Unexpected Mods By Pk Pk M PPi ST PPr pF IH MO SM pF M Pk PPi O M PkDB PkDB Site Localization Pk Pk MDe PPi M PPr pF M IH M SM pF M Pk M AS IH An MDe PD PkDB O MM O PkDB Sc Ot Sq ST IH Results Filtering By Pk Pk IH PPi P/PP P/PP PPr pF IP IH XL IH XL pF M Pk M XL Sc M PR PkDB XL Ot TPP TPP XL PkDB NTT 2 1 1 2 1 ? 1 1 1 2 1 ? 1 2 2 2 2 ? 1 2 2 2 ? 2 Experience 5-10 years 5-10 years 5-10 years >10 years < 1 year 3-4 years 5-10 years >10 years 5-10 years 5-10 years 5-10 years 3-4 years 5-10 years >10 years 1-2 years >10 years 1-2 years >10 years >10 years 1-2 years >10 years >10 years 1-2 years >10 years 1.48% 4.94% 6.1% 4.37% n/a 13.68% 8.56% 2% 1.85% 16.67% 2.52% n/a n/a n/a 7.14% 2.92% 8.11% 3.45% 7.95% 2.52% 8% 3.03% 5.72% n/a 0 50 100 150 200 250 300 350 400 450 500 550 71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 # Spiked Peptide PSMs (FLR%) # Spiked PSMs Mod Loc Certainty N # Spiked PSMs Mod Loc Certainty Y, Ignored # Spiked PSMs Wrong Mod Loc; Mod Loc Certainty Y # Spiked PSMs Correct Mod Loc; Mod Loc Certainty Y Synthetic Peptide ID by Participant Red corresponds to the presence of at least 1 PSM for a spiked synthetic peptide modified with the correct localization reported and the correct modification name. The localization certainty may have been reported as either Y or N. PSMs containing modification of residues other than s,t,y,k,r were excluded. 0 2000 4000 6000 8000 10000 12000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Cummulative # Spectra # Participants Agreeing Phospho vs Sulfo Observe modified fragment ions. Observe ‘unmodified’ fragment ions. Spectrum looks essentially identical to unmodified peptide spectrum DISLSDY(Phospho)K DISLSDY(Sulfo)K Peak Processing Two types of peak lists were supplied o Deisotoped (AB SCIEX MS Data Converter) Cannot infer fragment charge state Possibly lower chance of false fragment matches o Non-deisotoped (Mascot Distiller) Can infer fragment charge state Possibly higher chance of false fragment matches For 238 consensus spectra the peak lists had different specified charge state 193 results only possible with deisotoped peak list 45 results only possible with non-deisotoped peak list 0 1000 2000 3000 4000 5000 6000 7000 71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 # Spectra # No Mods # Common Mods (^q,^c,m,n,q) # Nterm Mods # AA Mutation Mods # Interesting Mods 0 100 200 300 400 500 600 700 # Spectra # Interesting Mod Loc Certainty N # Interesting Mod Loc Certainty Y 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500 # Spectra #ND No Id, Diff from Consensus #Y<3 P Id Yes #YD Yes Id, Diff from Consensus #NS No Id, Same as Consensus #YS Yes Id, Same as Consensus 24 submissions from 23 participants. 9 were iPRG members. Participation was international and covered a wide range of experience level. Who Participated: 16 8 ABRF Member Nonmember 15 4 2 2 1 Bioinformatician/Software Developer Mass Spectrometrist Lab Scientist Director/Manager 13 4 4 2 1 North America Europe Asia Australia/NZ Africa 17 2 1 1 3 Academic Manufacturer/Vendor Biotech/Pharma/Industry Government Other 5 6 11 2 Core Only Software development only (no research facility) Conduct both core functions and non-core lab research Non-core research lab Mixed Spectra Exposed by Peak Processing 464.59 3+ Non-deisotoped peaklist 465.19 2+ Deisotoped peaklist

iPRG2012: A Study on Detecting Modified Peptides in a ... · PDF fileDec. 2011) • 1 template (Excel) ... generally room for improvement ... • Most common errors in site assignment

Embed Size (px)

Citation preview

Page 1: iPRG2012: A Study on Detecting Modified Peptides in a ... · PDF fileDec. 2011) • 1 template (Excel) ... generally room for improvement ... • Most common errors in site assignment

465.19 2+

464.59 3+

iPRG2012: A Study on Detecting Modified Peptides in a Complex Mixture A B R F

Proteome Informatics Research Group

John Cottrell1 , Karl R. Clauser2, Robert J Chalkley3, Ruixiang Sun4, Eugene Kapp5, Matt Chambers6, W. Hayes McDonald6, Henry H. Lam7, Nuno Bandeira8, Eric Deutsch9 and Thomas Neubert10

1Matrix Science, London, UK; 2The Broad Institute of MIT and Harvard, Cambridge, MA; 3University of California, San Francisco, CA; 4Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 5Ludwig Institute for Cancer Research, Parkville, Victoria, Australia; 6Vanderbilt University, Nashville, TN; 7Hong Kong University of Science and Technology, Hong Kong, China; 8University of California, San Diego, CA; 9Institute for Systems Biology, Seattle, WA; 10New York University School of Medicine, New York, NY

Study Goals

1. Evaluate ability of participants to identify modified peptides in a complex mixture 2. Find out why result sets might differ between participants 3. Produce a benchmark dataset, along with an analysis resource

Study Materials

• 5600 TripleTOF dataset (AB-SCIEX) – WIFF, mzML, dta, MGF (de-isotoped);–

conversions by MS Data Converter 1.1.0 – MGF (not de-isotoped – conversion by

Mascot Distiller 2.4)

• 1 FASTA file (SwissProt S. cerevisiae, human, + 1 bovine protein + trypsin from Dec. 2011)

• 1 template (Excel) • 1 on-line survey (Survey Monkey)

Study Instructions

1. Analyze the dataset 2. Report the peptide spectrum matches in the

provided template 3. Report measures of reliability for PTM site

assignments (optional) 4. Complete an on-line survey 5. Attach a 1-2 page description of your

methodology

How Much Do the Identif ications Overlap?

• Reasonable number of participants from around the globe, mainly experienced users but a few first-timers

• Large spread in number of spectra identified • False negatives (NS) are generally much higher than false positives, so there is

generally room for improvement • Peak list was a significant factor on performance • Varied performance in detecting PTMs

• Most participants struggled with sulfation • Multiply phosphorylated harder to find than singly

• Most common errors in site assignment were: • Reporting sulfo(Y) as phospho(ST) • Mis-assignment of site/s in multiply phosphorylated peptides

• The iPRG2012 are in the process of preparing the data for publication. If you participated and would like to help out, contact the iPRG through [email protected].

For more information on the iPRG and for copies of this poster and the talk please visit: http://www.abrf.org/iPRG after the meeting

Acknowledgements: The iPRG are grateful to all of the participants. We would also like to thank Jeremy Carver (UCSD) for serving as the “Anonymizer”.

Prel iminary Conclusions

Study Design • Use a common, rich dataset • Use a common sequence database • Allow participants to use the bioinformatic tools and methods of their choosing • Use a common reporting template • Report results at an estimated 1% FDR (at the spectrum level) • Ignore protein inference

A Proteome Informatics Challenge

Nature uses a wide variety of protein post-translational modifications to regulate protein structure and activity and tandem mass spectrometry has emerged as the most powerful analytical approach to detect these moieties. However, modified peptides present special challenges for characterization. First, they are generally present at sub-stoichiometric levels, meaning that without enrichment strategies samples are dominated by unmodified peptides, so finding the modified peptides may be a challenge. Secondly, the modifications may have unique fragmentation behaviors in collision-induced dissociation (CID), which may need to be considered by database search engines. Finally, if there are multiple residues within a given peptide that could bear a particular modification type, then it is necessary to identify fragment ions that frame either side of the modification site in order to be able to localize the exact site of modification within the peptide. The Proteome Informatics Research Group (iPRG) created a collaborative data analysis study to enable proteomics laboratories to evaluate their ability to find a variety of post-translationally modified peptides within a complex peptide mixture background. The dataset consists of nearly twenty thousand high resolution and high mass accuracy tandem mass spectra. Within the sample there are peptides with a range of different natural and chemical modifications. This study enabled participants to evaluate their data analysis capabilities and approaches relative to others in analyzing a common data set, with a particular emphasis on their ability to detect and characterize peptides with modifications of potential biological significance.

Total Spectra vs. Interesting Mods

There is a very wide range in the total number of spectra with identified peptides. Once one focuses only on the spectra containing modifications for which the ability to localize the modification to a particular residue, the range is much narrower. The 5 rightmost participants went so far as to report only spectra of modified peptides.

Room for Improvement in ID Certa inty Thresholds

I d e n t i f i c a t i o n s r e p o r t e d a s : Ye s t h a t m a t c h e d t h e c o n s e n s u s ; N o , b u t s t i l l m a t c h i n g t h e c o n s e n s u s ; Ye s , b u t a d i f f e r e n t a n s w e r t h a n t h e c o n s e n s u s ; Ye s , < 3 c o n s e n s u s ; N o , t h a t d i s a g r e e d w i t h c o n s e n s u s

Table Key:

71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 Acetyl (K)

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Dimethyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Dimethyl (R) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Methyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Methyl (R) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Nitro (Y) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Phospho (STY)

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1

Sulfo (Y) 1

1 1

1 1 1 1 1

Trimethyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

An = Andromeda/MaxQuant MG = MS-GFDB pF = pFind Sc = Scaffold AS = A-Score MM = MyriMatch Pk = PEAKS SM = Spectrum Mill By = Byonics MO = MODa PkDB = PEAKSDB Sq = Sequest IH = In-house software O = OMSSA

Ppi =Protein Pilot ST = SpectraST IP = IDPicker Ot = Other PPr = Protein Prospector TPP = TransProteomic Pipeline M = Mascot P/PP = Pep/Prot Prophet PR = PhosphoRS XL = Excel Mde = Mascot Delta Score PD = ProteomeDiscoverer PW = ProteoWizard XT = X!Tandem Mdi = Mascot Distiller

71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821

Peaklist Mgf mzML mzML mgf mgf mgf mzML mgf_nd mgf mgf_nd mgf mzML mzML mgf_nd mgf mgf mzML mgf mgf_nd mgf mgf mgf_nd mgf mgf_nd

mzML mzML mgf_nd WIFF

Spectral Pre-Processing Pk Pk PPi PPi MDi Ot SM pF Pk MDi PD PkDB PW PkDB Sq

Peptide Identification

By Pk Pk M PPi M O PPr pF M MG M P/PP SM pF M Pk M PPi M O M M PD PkDB PkDB O ST MM O TPP PkDB Sq

ST XT IH XT XT XT Ot

Discover Unexpected Mods By Pk Pk M PPi ST PPr pF IH MO SM pF M Pk PPi O M PkDB PkDB

Site Localization Pk Pk MDe PPi M PPr pF M IH M SM pF M Pk M AS IH An MDe PD

PkDB O MM O PkDB Sc Ot Sq ST IH

Results Filtering By Pk Pk IH PPi P/PP P/PP PPr pF IP IH XL IH XL pF M Pk M XL Sc M PR PkDB XL Ot TPP TPP XL PkDB

NTT 2 1 1 2 1 ? 1 1 1 2 1 ? 1 2 2 2 2 ? 1 2 2 2 ? 2

Experience 5-10 years 5-10 years 5-10 years >10 years < 1 year 3-4 years 5-10 years >10 years 5-10 years 5-10 years 5-10 years 3-4 years 5-10 years >10 years 1-2 years >10 years 1-2 years >10 years >10 years 1-2 years >10 years >10 years 1-2 years >10 years

1.48%

4.94%

6.1%

4.37%

n/a 13.68%

8.56%

2%

1.85%

16.67%

2.52%

n/a n/a

n/a

7.14% 2.92%

8.11%

3.45%

7.95%

2.52%

8%

3.03%

5.72% n/a

0

50

100

150

200

250

300

350

400

450

500

550

71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821

# S

pike

d P

eptid

e PS

Ms

(FLR

%)

# Spiked PSMs Mod Loc Certainty N # Spiked PSMs Mod Loc Certainty Y, Ignored

# Spiked PSMs Wrong Mod Loc; Mod Loc Certainty Y # Spiked PSMs Correct Mod Loc; Mod Loc Certainty Y

Synthetic Peptide ID by Participant

Red corresponds to the presence of at least 1 PSM for a spiked synthetic peptide modified with the correct localization reported and the correct modification name. The localization certainty may have been reported as either Y or N. PSMs containing modification of residues other than s,t,y,k,r were excluded.

0

2000

4000

6000

8000

10000

12000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Cum

mul

ativ

e #

Spec

tra

# Participants Agreeing

Phospho vs Sulfo

Observe modified fragment ions.

Observe ‘unmodified’ fragment ions. Spectrum looks essentially identical to unmodified peptide spectrum

DISLSDY(Phospho)K

DISLSDY(Sulfo)K

Peak Processing • Two types of peak lists were supplied o Deisotoped (AB SCIEX MS Data Converter)

• Cannot infer fragment charge state • Possibly lower chance of false fragment matches

o Non-deisotoped (Mascot Distiller) • Can infer fragment charge state • Possibly higher chance of false fragment matches

• For 238 consensus spectra the peak lists had different specified charge state • 193 results only possible with deisotoped peak list • 45 results only possible with non-deisotoped peak list

0

1000

2000

3000

4000

5000

6000

7000

71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821

# S

pect

ra

# No Mods # Common Mods (^q,^c,m,n,q) # Nterm Mods # AA Mutation Mods # Interesting Mods

0

100

200

300

400

500

600

700

# S

pect

ra

# Interesting Mod Loc Certainty N # Interesting Mod Loc Certainty Y

0 500

1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500

# Sp

ectr

a

#ND No Id, Diff from Consensus #Y<3 P Id Yes #YD Yes Id, Diff from Consensus #NS No Id, Same as Consensus #YS Yes Id, Same as Consensus

24 submissions from 23 participants. 9 were iPRG members. Participation was international and covered a wide range of experience level. Who Participated:

16

8

ABRF Member Nonmember

15 4

2 2 1

Bioinformatician/Software Developer Mass Spectrometrist Lab Scientist Director/Manager

13 4

4 2 1

North America Europe Asia Australia/NZ Africa 17

2

1 1 3 Academic

Manufacturer/Vendor Biotech/Pharma/Industry Government Other

5

6 11

2

Core Only

Software development only (no research facility) Conduct both core functions and non-core lab research Non-core research lab

Mixed Spectra Exposed by Peak Processing

464.59 3+ Non-deisotoped peaklist

465.19 2+ Deisotoped peaklist