Upload
phungcong
View
214
Download
1
Embed Size (px)
Citation preview
465.19 2+
464.59 3+
iPRG2012: A Study on Detecting Modified Peptides in a Complex Mixture A B R F
Proteome Informatics Research Group
John Cottrell1 , Karl R. Clauser2, Robert J Chalkley3, Ruixiang Sun4, Eugene Kapp5, Matt Chambers6, W. Hayes McDonald6, Henry H. Lam7, Nuno Bandeira8, Eric Deutsch9 and Thomas Neubert10
1Matrix Science, London, UK; 2The Broad Institute of MIT and Harvard, Cambridge, MA; 3University of California, San Francisco, CA; 4Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 5Ludwig Institute for Cancer Research, Parkville, Victoria, Australia; 6Vanderbilt University, Nashville, TN; 7Hong Kong University of Science and Technology, Hong Kong, China; 8University of California, San Diego, CA; 9Institute for Systems Biology, Seattle, WA; 10New York University School of Medicine, New York, NY
Study Goals
1. Evaluate ability of participants to identify modified peptides in a complex mixture 2. Find out why result sets might differ between participants 3. Produce a benchmark dataset, along with an analysis resource
Study Materials
• 5600 TripleTOF dataset (AB-SCIEX) – WIFF, mzML, dta, MGF (de-isotoped);–
conversions by MS Data Converter 1.1.0 – MGF (not de-isotoped – conversion by
Mascot Distiller 2.4)
• 1 FASTA file (SwissProt S. cerevisiae, human, + 1 bovine protein + trypsin from Dec. 2011)
• 1 template (Excel) • 1 on-line survey (Survey Monkey)
Study Instructions
1. Analyze the dataset 2. Report the peptide spectrum matches in the
provided template 3. Report measures of reliability for PTM site
assignments (optional) 4. Complete an on-line survey 5. Attach a 1-2 page description of your
methodology
How Much Do the Identif ications Overlap?
• Reasonable number of participants from around the globe, mainly experienced users but a few first-timers
• Large spread in number of spectra identified • False negatives (NS) are generally much higher than false positives, so there is
generally room for improvement • Peak list was a significant factor on performance • Varied performance in detecting PTMs
• Most participants struggled with sulfation • Multiply phosphorylated harder to find than singly
• Most common errors in site assignment were: • Reporting sulfo(Y) as phospho(ST) • Mis-assignment of site/s in multiply phosphorylated peptides
• The iPRG2012 are in the process of preparing the data for publication. If you participated and would like to help out, contact the iPRG through [email protected].
For more information on the iPRG and for copies of this poster and the talk please visit: http://www.abrf.org/iPRG after the meeting
Acknowledgements: The iPRG are grateful to all of the participants. We would also like to thank Jeremy Carver (UCSD) for serving as the “Anonymizer”.
Prel iminary Conclusions
Study Design • Use a common, rich dataset • Use a common sequence database • Allow participants to use the bioinformatic tools and methods of their choosing • Use a common reporting template • Report results at an estimated 1% FDR (at the spectrum level) • Ignore protein inference
A Proteome Informatics Challenge
Nature uses a wide variety of protein post-translational modifications to regulate protein structure and activity and tandem mass spectrometry has emerged as the most powerful analytical approach to detect these moieties. However, modified peptides present special challenges for characterization. First, they are generally present at sub-stoichiometric levels, meaning that without enrichment strategies samples are dominated by unmodified peptides, so finding the modified peptides may be a challenge. Secondly, the modifications may have unique fragmentation behaviors in collision-induced dissociation (CID), which may need to be considered by database search engines. Finally, if there are multiple residues within a given peptide that could bear a particular modification type, then it is necessary to identify fragment ions that frame either side of the modification site in order to be able to localize the exact site of modification within the peptide. The Proteome Informatics Research Group (iPRG) created a collaborative data analysis study to enable proteomics laboratories to evaluate their ability to find a variety of post-translationally modified peptides within a complex peptide mixture background. The dataset consists of nearly twenty thousand high resolution and high mass accuracy tandem mass spectra. Within the sample there are peptides with a range of different natural and chemical modifications. This study enabled participants to evaluate their data analysis capabilities and approaches relative to others in analyzing a common data set, with a particular emphasis on their ability to detect and characterize peptides with modifications of potential biological significance.
Total Spectra vs. Interesting Mods
There is a very wide range in the total number of spectra with identified peptides. Once one focuses only on the spectra containing modifications for which the ability to localize the modification to a particular residue, the range is much narrower. The 5 rightmost participants went so far as to report only spectra of modified peptides.
Room for Improvement in ID Certa inty Thresholds
I d e n t i f i c a t i o n s r e p o r t e d a s : Ye s t h a t m a t c h e d t h e c o n s e n s u s ; N o , b u t s t i l l m a t c h i n g t h e c o n s e n s u s ; Ye s , b u t a d i f f e r e n t a n s w e r t h a n t h e c o n s e n s u s ; Ye s , < 3 c o n s e n s u s ; N o , t h a t d i s a g r e e d w i t h c o n s e n s u s
Table Key:
71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 Acetyl (K)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Dimethyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Dimethyl (R) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Methyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Methyl (R) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Nitro (Y) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Phospho (STY)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
Sulfo (Y) 1
1 1
1 1 1 1 1
Trimethyl (K) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
An = Andromeda/MaxQuant MG = MS-GFDB pF = pFind Sc = Scaffold AS = A-Score MM = MyriMatch Pk = PEAKS SM = Spectrum Mill By = Byonics MO = MODa PkDB = PEAKSDB Sq = Sequest IH = In-house software O = OMSSA
Ppi =Protein Pilot ST = SpectraST IP = IDPicker Ot = Other PPr = Protein Prospector TPP = TransProteomic Pipeline M = Mascot P/PP = Pep/Prot Prophet PR = PhosphoRS XL = Excel Mde = Mascot Delta Score PD = ProteomeDiscoverer PW = ProteoWizard XT = X!Tandem Mdi = Mascot Distiller
71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821
Peaklist Mgf mzML mzML mgf mgf mgf mzML mgf_nd mgf mgf_nd mgf mzML mzML mgf_nd mgf mgf mzML mgf mgf_nd mgf mgf mgf_nd mgf mgf_nd
mzML mzML mgf_nd WIFF
Spectral Pre-Processing Pk Pk PPi PPi MDi Ot SM pF Pk MDi PD PkDB PW PkDB Sq
Peptide Identification
By Pk Pk M PPi M O PPr pF M MG M P/PP SM pF M Pk M PPi M O M M PD PkDB PkDB O ST MM O TPP PkDB Sq
ST XT IH XT XT XT Ot
Discover Unexpected Mods By Pk Pk M PPi ST PPr pF IH MO SM pF M Pk PPi O M PkDB PkDB
Site Localization Pk Pk MDe PPi M PPr pF M IH M SM pF M Pk M AS IH An MDe PD
PkDB O MM O PkDB Sc Ot Sq ST IH
Results Filtering By Pk Pk IH PPi P/PP P/PP PPr pF IP IH XL IH XL pF M Pk M XL Sc M PR PkDB XL Ot TPP TPP XL PkDB
NTT 2 1 1 2 1 ? 1 1 1 2 1 ? 1 2 2 2 2 ? 1 2 2 2 ? 2
Experience 5-10 years 5-10 years 5-10 years >10 years < 1 year 3-4 years 5-10 years >10 years 5-10 years 5-10 years 5-10 years 3-4 years 5-10 years >10 years 1-2 years >10 years 1-2 years >10 years >10 years 1-2 years >10 years >10 years 1-2 years >10 years
1.48%
4.94%
6.1%
4.37%
n/a 13.68%
8.56%
2%
1.85%
16.67%
2.52%
n/a n/a
n/a
7.14% 2.92%
8.11%
3.45%
7.95%
2.52%
8%
3.03%
5.72% n/a
0
50
100
150
200
250
300
350
400
450
500
550
71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821
# S
pike
d P
eptid
e PS
Ms
(FLR
%)
# Spiked PSMs Mod Loc Certainty N # Spiked PSMs Mod Loc Certainty Y, Ignored
# Spiked PSMs Wrong Mod Loc; Mod Loc Certainty Y # Spiked PSMs Correct Mod Loc; Mod Loc Certainty Y
Synthetic Peptide ID by Participant
Red corresponds to the presence of at least 1 PSM for a spiked synthetic peptide modified with the correct localization reported and the correct modification name. The localization certainty may have been reported as either Y or N. PSMs containing modification of residues other than s,t,y,k,r were excluded.
0
2000
4000
6000
8000
10000
12000
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Cum
mul
ativ
e #
Spec
tra
# Participants Agreeing
Phospho vs Sulfo
Observe modified fragment ions.
Observe ‘unmodified’ fragment ions. Spectrum looks essentially identical to unmodified peptide spectrum
DISLSDY(Phospho)K
DISLSDY(Sulfo)K
Peak Processing • Two types of peak lists were supplied o Deisotoped (AB SCIEX MS Data Converter)
• Cannot infer fragment charge state • Possibly lower chance of false fragment matches
o Non-deisotoped (Mascot Distiller) • Can infer fragment charge state • Possibly higher chance of false fragment matches
• For 238 consensus spectra the peak lists had different specified charge state • 193 results only possible with deisotoped peak list • 45 results only possible with non-deisotoped peak list
0
1000
2000
3000
4000
5000
6000
7000
71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821
# S
pect
ra
# No Mods # Common Mods (^q,^c,m,n,q) # Nterm Mods # AA Mutation Mods # Interesting Mods
0
100
200
300
400
500
600
700
# S
pect
ra
# Interesting Mod Loc Certainty N # Interesting Mod Loc Certainty Y
0 500
1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500
# Sp
ectr
a
#ND No Id, Diff from Consensus #Y<3 P Id Yes #YD Yes Id, Diff from Consensus #NS No Id, Same as Consensus #YS Yes Id, Same as Consensus
24 submissions from 23 participants. 9 were iPRG members. Participation was international and covered a wide range of experience level. Who Participated:
16
8
ABRF Member Nonmember
15 4
2 2 1
Bioinformatician/Software Developer Mass Spectrometrist Lab Scientist Director/Manager
13 4
4 2 1
North America Europe Asia Australia/NZ Africa 17
2
1 1 3 Academic
Manufacturer/Vendor Biotech/Pharma/Industry Government Other
5
6 11
2
Core Only
Software development only (no research facility) Conduct both core functions and non-core lab research Non-core research lab
Mixed Spectra Exposed by Peak Processing
464.59 3+ Non-deisotoped peaklist
465.19 2+ Deisotoped peaklist