Eat Raw & Fresh: Introducing isotopic Mass-to-charge
Ratio and Envelope Fingerprinting (iMEF) and
ProteinGoggle for Protein Database Search
Zhixin(Michael) Tian
CNCP
11/15/2012
What is mass?
[Figure: isotopic distribution, relative abundance (%) vs. m/z (z = 1), m/z 511 to 517. L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.]
Monoisotopic mass (m/z, z = +1)
Missing monoisotopic mass in protein
Monoisotopic mass: most significant & accurate
Average mass: error of ±1 u at 16,000 u (variability of the 13C/12C ratio)
Mass of the most abundant isotope: error of ±1 Da or more (mis-assignment of the number of contributing heavy isotopes)
[Figure: relative abundance (%) vs. m/z (z = 1, 70 averagines), m/z 7770 to 7795.]
[Figure: relative abundance (%) vs. n (averagine), n = 0 to 120.]
Deisotoping (Deconvolution)
Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE
Steps:
1. Calculate background noise level
2. Determine charge state using FT/Patterson technique
3. Calculate theoretical profile
4. Fit with observed isotopic profile
5. Monoisotopic mass
Search Engines: ProSightPC, SEQUEST, Mascot, X!Tandem, InsPecT, OMSSA, Andromeda, pFind
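Steps 2 and 5 of the deisotoping procedure listed on this slide can be illustrated with a minimal sketch. It does not implement the FT/Patterson technique named in step 2; as a simpler stand-in, it estimates the charge from the spacing between adjacent isotopic peaks (roughly the 13C - 12C mass difference divided by z) and then converts the first peak's m/z to a neutral mass by removing z proton masses. The function names are hypothetical. The peak list is the start of the z = 10 ubiquitin envelope listed later in this deck, and the sketch assumes that its first peak is the monoisotopic one.

```python
PROTON_MASS = 1.007276      # u, mass of a proton
ISOTOPE_SPACING = 1.003355  # u, 13C - 12C mass difference

def estimate_charge(mz_peaks):
    """Estimate z from the mean spacing of adjacent isotopic peaks."""
    gaps = [b - a for a, b in zip(mz_peaks, mz_peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(ISOTOPE_SPACING / mean_gap)

def neutral_mass(mz, z):
    """Neutral mass of an [M + zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON_MASS)

# First six peaks of the z = 10 ubiquitin envelope from this deck
envelope = [856.9690, 857.0692, 857.1695, 857.2698, 857.3701, 857.4703]
z = estimate_charge(envelope)
mono = neutral_mass(envelope[0], z)
print(z, round(mono, 3))  # prints: 10 8559.617
```

The spacing estimate only works when at least two resolved isotopic peaks of the same envelope are available, which is one reason real deisotoping tools use more robust charge-state methods.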
Peptide Mass Fingerprinting (PMF)
[Workflow diagram; box labels: RAW File; MS Spectrum (iE); MS/MS Spectra (iE); Parent (Exp. mass); Fragments (Exp. mass); Protein Database; Parent (Theo. mass); Fragments (Theo. mass); Candidates; Initial IDs; Final IDs. Step labels: A1/P1, A1/P2, A2/P3, A2/P4.]
Ubiquitin - MS spectrum (profile)
[Figure: abundance (au) vs. m/z, m/z 500 to 2000, with charge states z = 12 to z = 6 labeled; inset: m/z 856.5 to 859.0, z = 10.]
Ubiquitin – MS/MS (ETD) Spectrum (Profile)
[Figure: intensity (au) vs. m/z, m/z 400 to 2000; inset: m/z 820 to 845.]
[Figure: intensity (a.u.) vs. monoisotopic mass (Da), 0 to 10000.]
Database search with PMF using ProSightPC
NMFs = 92; NUMFs = 219; P score = 4.86E-98
Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952
f - the total number of observed fragments (NMFs + NUMFs);
n - the number of matching fragments (NMFs);
x - the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein;
111.1 - the mass of the average amino acid, weighted for its occurrence in proteins;
2 - the number of fragment ions generated from each bond cleavage, which is assumed to be 2 (b- and y-type ions or c- and z•-type ions);
Ma - the mass accuracy (a Ma of ±1 Da translates to a 2 Da window).
Definition of P_Score
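The formula itself is not present in the slide text, only the definitions of its terms. As a rough illustration of how those terms can combine, the sketch below assumes that x is the mass window times 2, divided by 111.1, and that random matches follow a Poisson distribution with mean x·f. Both the Poisson form and the function names are assumptions here; the exact definition is in the cited Kelleher reference. The 2 Da window is simply the example given in the definition of Ma, and the slide does not state the value used for its search, so this sketch is not meant to reproduce the P score quoted above.

```python
import math

AVERAGE_RESIDUE_MASS_DA = 111.1  # from the slide
IONS_PER_CLEAVAGE = 2            # from the slide

def random_match_probability(mass_window_da):
    """x: chance that one observed fragment mass matches at random (assumed form)."""
    return IONS_PER_CLEAVAGE * mass_window_da / AVERAGE_RESIDUE_MASS_DA

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def chance_of_at_least(n, f, x):
    """Probability of n or more random matches among f observed fragments."""
    lam = x * f
    return sum(poisson_pmf(k, lam) for k in range(n, f + 1))

x = random_match_probability(2.0)          # ±1 Da, i.e. a 2 Da window
p = chance_of_at_least(n=92, f=311, x=x)   # NMFs = 92, NMFs + NUMFs = 311
print(f"x = {x:.4f}, chance probability = {p:.2e}")
```

A tighter mass window makes x smaller and therefore drives the chance probability down steeply for the same n and f.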
Is “MFs” really good?
[Figure: relative abundance (%) vs. m/z for fragment ion c2 (1+), m/z 277.0 to 279.5, and fragment ion c7 (1+), m/z 864 to 867, with pairs of per-peak numeric annotations.]
Is “NUMFs” really good?
PeakPicking: SNRThreshold = 3.0; BackgroundRatio = 5.0; FitType = Lorentzian
DeconvPep: MaxCharge = 25; ThScore = 0.0
AdvDeconv: MaxAbundancePeak = 3; ScanNoModifier = 0; MaxMissPeak = 3; MassErr = 1.0E-05; ThClustExt = 0.0; IntsRangeErr = 0.5
THRASH (92 + 219 = 311); RAPID (28 + 49 = 77)
Better “deisotoping”?
NO “deisotoping”?
What is a mass spectrum?
[Figure: abundance (au) vs. m/z, m/z 500 to 2000, with charge states z = 12 to z = 6 labeled; inset: m/z 856.5 to 859.0, z = 10.]
MS of Ubiquitin
x, y coordinates
[Figure: profile spectrum, abundance (au) vs. m/z (z = 10), m/z 856.5 to 858.5.]
Profile
Exp. m/z    Exp. Abundance
856.9821    6061
857.0825    21811
857.1826    52841
857.2809    82342
857.3782    93523
857.4746    96019
857.5714    75857
857.6682    60680
857.7663    42420
857.8669    27294
857.9680    14752
858.0681    5685
858.1685    1120
858.2717    919
858.3671    316
858.4594    147
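Gaussian:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
Lorentzian:
$$ f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} $$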
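The two peak shapes on this slide, a Gaussian and a Lorentzian (the latter is the FitType listed in the peak-picking settings earlier), can be evaluated numerically with a minimal sketch. The parameter names `mu`, `sigma`, `x0` and `gamma` are the conventional ones and are assumptions here, as are the illustrative centre and width values.

```python
import math

def gaussian(x, mu, sigma):
    """Normalised Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def lorentzian(x, x0, gamma):
    """Normalised Lorentzian centred at x0 with half width at half maximum gamma."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))

# Illustrative values only: a peak centred at m/z 857.4703, width 0.01
center, width = 857.4703, 0.01
for offset in (0.0, width, 3 * width):
    x = center + offset
    print(f"offset {offset:.2f}: gaussian {gaussian(x, center, width):8.3f}  "
          f"lorentzian {lorentzian(x, center, width):8.3f}")
```

At three widths from the centre the Gaussian has almost vanished while the Lorentzian still retains a tenth of its maximum, which is why the choice of fit type matters for closely spaced isotopic peaks.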
Centroid
[Figure: centroid spectrum, abundance (au) vs. m/z (z = 10), m/z 856.5 to 859.0.]
The nature of the iE of an ion
What is in a protein database?
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
C378H630N105O118S1
[Figure: relative abundance (%) vs. m/z (z = 10), m/z 856.5 to 859.0.]
Exp. m/z    Exp. Abundance
856.9690    3.95
857.0692    18.83
857.1695    45.88
857.2698    76.13
857.3701    96.65
857.4703    100.00
857.5706    87.76
857.6709    67.12
857.7711    45.63
857.8714    27.99
857.9716    15.67
858.0719    8.09
858.1721    3.87
858.2724    1.73
858.3726    0.73
858.4729    0.29
x, y coordinates
Centroid
iMEF (isotopic m/z & Envelope Fingerprinting)
[Workflow diagram; box labels: RAW File; MS Spectrum (iE); MS/MS Spectra (iE); Parent (Exp. mass); Fragments (Exp. mass); Protein Database; Parent (Theo. mass); Fragments (Theo. mass); Parent (Theo. iE); Fragments (Theo. iE); Candidates; Initial IDs; Final IDs. Step labels: A1/P1, A1/P2, A2/P3, A2/P4, A/P1, A/P2.]
[Flowchart; box labels: DB; 1st, 2nd and 3rd isotopic peak; Parent ion exp. iE; Parent ion theo. iE; Preliminary protein candidates; Protein candidates; Fragment ion exp. iEs; Fragment ion theo. iEs; Preliminary protein IDs; Initial protein ID; NMFs; PTM_Scores; Initial protein IDs; Combined initial protein IDs; Remove duplicates; Final IDs; Isotopic peak exclusion list; Norm. isotopic peaks removed. Step labels: A1/F1, A2/F2, A2/F3. Decision branches are marked Y/N.]
Pre-Step 1: Customized database
MS Precursor ions
MS/MS fragment ions
[Figure: relative abundance (%), 0 to 2.0, vs. % of changed points in a step size of 0.1, 0 to 50.]
Pre-Step 2: Noise level determination
[Figure: relative abundance (%), 0 to 10, vs. % of changed points in a step size of 0.1, 0 to 10.]
[Figure: relative abundance (%), 0 to 10, vs. m/z, 400 to 2000.]
[Figure: relative abundance (%), 0 to 2.0, vs. m/z, 500 to 2000.]
Ubiquitin - MS spectrum (profile)
[Figure: abundance (au) vs. m/z, m/z 500 to 2000, with charge states z = 12 to z = 6 labeled; inset: m/z 856.5 to 859.0, z = 10.]
Ubiquitin – MS/MS (HCD) spectrum (profile)
[Figure: intensity (au) vs. m/z, m/z 400 to 2000; inset: m/z 820 to 845.]
[Figure: relative intensity (au) vs. m/z, m/z 910.7 to 910.8.]
Step 1: Profile to centroid (MS & MS2)
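Step 1 is given only as a title. As a minimal sketch of what reducing a profile peak to a centroid can look like, the code below takes an intensity-weighted mean of m/z over the points that lie above a fraction of the peak maximum. This is a generic technique and not necessarily what ProteinGoggle does; the function name, the 25% threshold, and the synthetic Lorentzian test peak are all assumptions.

```python
def centroid(mz, intensity, fraction=0.25):
    """Intensity-weighted mean m/z and apex height for one profile peak."""
    apex = max(intensity)
    kept = [(m, i) for m, i in zip(mz, intensity) if i >= fraction * apex]
    total = sum(i for _, i in kept)
    return sum(m * i for m, i in kept) / total, apex

# Synthetic profile peak (not data from the talk): Lorentzian at m/z 910.75
x0, gamma, height = 910.75, 0.01, 3.0e4
mz = [910.70 + 0.002 * k for k in range(51)]
intensity = [height / (1.0 + ((m - x0) / gamma) ** 2) for m in mz]

center, apex = centroid(mz, intensity)
print(f"centroid m/z = {center:.4f}, apex = {apex:.0f}")
```

Restricting the average to points near the apex keeps the long Lorentzian tails and any neighbouring peaks from pulling the centroid off centre.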
Step 2: iMF of precursor ion candidates
[Figure: abundance (au) vs. m/z, m/z 854 to 860.]
isolation window (±3 m/z units)
857.47461 (4 ppm)
… … … … … …
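[Figure: experimental vs. theoretical isotopic envelope, abundance (%) vs. m/z, m/z 857 to 859, with the IPACO (5%) level marked. Per-peak deviations annotated on the figure, in order of increasing m/z:]
IPMD (ppm): 15, 15, 15, 13, 9, 5, 1, -3, -6, -5, -4, -4, -4, -1, -6, -16
IPAD (%): 60, 21, 20, 13, 1, 0, -10, -6, -3, 2, -2, -27, -70, -45, -55, -47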
Step 3: iEF of precursor ion candidates
IPACO = 5%; IPMD = 15 ppm; IPAD = 30%
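The per-peak deviations that Step 3 thresholds can be recomputed from the two z = 10 peak lists given earlier in this deck: the experimental profile list, and the list shown under the ubiquitin sequence, which is taken here to be the theoretical envelope. This is a minimal sketch under three assumptions: IPMD is the m/z deviation in ppm relative to the theoretical peak, IPAD is the deviation of base-peak-normalised abundance in percent relative to the theoretical peak, and IPACO is a cutoff on theoretical relative abundance. These readings, the way the thresholds are combined, and all function names are assumptions, not ProteinGoggle's code. Rounded to integers, the computed deviations agree to within one unit with the per-peak values annotated on the Step 3 figure.

```python
EXP = [  # (m/z, abundance), experimental z = 10 envelope
    (856.9821, 6061), (857.0825, 21811), (857.1826, 52841), (857.2809, 82342),
    (857.3782, 93523), (857.4746, 96019), (857.5714, 75857), (857.6682, 60680),
    (857.7663, 42420), (857.8669, 27294), (857.9680, 14752), (858.0681, 5685),
    (858.1685, 1120), (858.2717, 919), (858.3671, 316), (858.4594, 147),
]
THEO = [  # (m/z, relative abundance in %), assumed theoretical envelope
    (856.9690, 3.95), (857.0692, 18.83), (857.1695, 45.88), (857.2698, 76.13),
    (857.3701, 96.65), (857.4703, 100.00), (857.5706, 87.76), (857.6709, 67.12),
    (857.7711, 45.63), (857.8714, 27.99), (857.9716, 15.67), (858.0719, 8.09),
    (858.1721, 3.87), (858.2724, 1.73), (858.3726, 0.73), (858.4729, 0.29),
]

def deviations(exp, theo):
    """Return (IPMD in ppm, IPAD in %) for each paired isotopic peak."""
    base = max(a for _, a in exp)  # normalise to the most abundant peak
    out = []
    for (mz_e, ab_e), (mz_t, ab_t) in zip(exp, theo):
        ipmd = (mz_e - mz_t) / mz_t * 1e6
        ipad = (100.0 * ab_e / base - ab_t) / ab_t * 100.0
        out.append((ipmd, ipad))
    return out

def peaks_within_tolerance(exp, theo, ipaco=5.0, ipmd_max=15.0, ipad_max=30.0):
    """Return (peaks meeting both tolerances, peaks above the IPACO cutoff)."""
    devs = deviations(exp, theo)
    considered = [d for d, (_, ab_t) in zip(devs, theo) if ab_t >= ipaco]
    ok = [d for d in considered if abs(d[0]) <= ipmd_max and abs(d[1]) <= ipad_max]
    return len(ok), len(considered)

for (mz_t, _), (ipmd, ipad) in zip(THEO, deviations(EXP, THEO)):
    print(f"{mz_t:.4f}  IPMD {ipmd:+6.1f} ppm  IPAD {ipad:+6.1f} %")
```

The low-abundance peaks at both ends of the envelope show the largest abundance deviations, which is what a cutoff such as IPACO is there to exclude from the comparison.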
Step 4: iMF of fragment ion candidates
[Figure: abundance (au) vs. m/z, m/z 270 to 280.]
277.13278 (5 ppm)
C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952&C4;MAX_MZ=537.285366&C5;MAX_MZ=636.353779&C6;MAX_MZ=764.448743&C7;…
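Step 4 compares an experimental peak m/z with the database entries for theoretical fragment ions. The sketch below parses an entry in the `C1;MAX_MZ=…&C2;MAX_MZ=…` layout shown above and uses a bisection search over the sorted m/z values to find candidates within a ppm tolerance, in the spirit of the bisection indexing mentioned later in the deck. The parsing rules, the function names, and the reading of "5 ppm" as a tolerance are assumptions; only the six complete entries visible on the slide are used.

```python
from bisect import bisect_left, bisect_right

ENTRY = ("C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952"
         "&C4;MAX_MZ=537.285366&C5;MAX_MZ=636.353779&C6;MAX_MZ=764.448743")

def parse_entry(entry):
    """Return a list of (m/z, ion name) tuples sorted by m/z."""
    ions = []
    for field in entry.split("&"):
        name, value = field.split(";MAX_MZ=")
        ions.append((float(value), name))
    return sorted(ions)

def match(index, keys, exp_mz, tol_ppm):
    """Names of ions whose theoretical m/z lies within tol_ppm of exp_mz."""
    delta = exp_mz * tol_ppm * 1e-6
    lo = bisect_left(keys, exp_mz - delta)
    hi = bisect_right(keys, exp_mz + delta)
    return [name for _, name in index[lo:hi]]

index = parse_entry(ENTRY)
keys = [mz for mz, _ in index]  # built once, so each query is O(log n)

print(match(index, keys, 277.13278, tol_ppm=5))  # prints: ['C2']
```

Because the key list is sorted once up front, each experimental peak costs two binary searches instead of a scan over every theoretical ion.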
[Figure: relative abundance (%) vs. m/z for fragment ion c2 (1+), m/z 277.0 to 279.5, with pairs of per-peak numeric annotations.]
Step 5: iEF of fragment ion candidates
IPACO = 5%; IPMD = 10 ppm; IPAD = 50%
Human histone H4_S1acK16acK20me2
Exemplary PTM_Score assignment
ID of ubiquitin from ETD
NMFs = 91 IPACO=10, IPMD=15, IPAD=100
IPMDO=20, IPMDOM=30, IPADO=20, IPADOM=200
[Figures: NMFs vs. IPACO (%), 0 to 50; NMFs vs. IPMD (ppm), 0 to 20; NMFs vs. IPAD (%), 0 to 200.]
Pros and Cons
Pros:
As-strict-as-you-choose confidence
• Strict quality control (QC)
Fine discrimination of close iEs
In-situ unwrapping of overlapped iEs
Cons:
More complex and bigger database
More data points for fingerprinting
Comparison with ProSightPC
[Figure: relative abundance (%) vs. m/z for fragment ion c2 (1+), m/z 277.0 to 279.5, with pairs of per-peak numeric annotations.]
Pros: As-strict-as-you-choose confidence
Layman’s choice of parameters
Default values with statistical significance!
Pros: Fine discrimination of close iEs
Exp. m/z    b38-53 (3+)           b18-33 (3+) or b19-34 (3+)    (b6-22 - H2O) (3+)
            Theo. m/z   IPMD      Theo. m/z   IPMD              Theo. m/z   IPMD
599.6575    599.6478    16        599.6511    11                599.6595    -3
599.9919    599.9821    16        599.9855    11                599.9939    -3
600.3242    600.3164    13        600.3197    8                 600.3281    -6
600.6616    600.6506    18        600.6539    13                600.6623    -1
[Figure: relative abundance (%) vs. m/z, m/z 599.5 to 601.0; traces: Experimental; b18-33 (3+) / b19-34 (3+); (b6-22 - H2O) (3+); b38-53 (3+).]
Pros: In-situ unwrapping of overlapped iEs
Proportional partition
k: # of overlapped isotopic peaks
m: # of isotopic peaks in each iE
n: # of overlapped iEs
[Figure: abundance (au) vs. m/z, m/z 1036.5 to 1039.0, showing the overlapped isotopic envelopes of z74 (8+), c73 (8+) and c64 (7+). Separate panels show each envelope (c73 8+, m/z 1037.0 to 1038.5; c64 7+, m/z 1037.5 to 1039.0; z74 8+, m/z 1036.5 to 1038.0) with pairs of per-peak numeric annotations.]
Other improvements and utilities
Improvements:
Bi-section method for fast indexing of candidates
LASSO-like approach to untangle overlapped iEs
Additional utilities:
A comprehensive confidence score
False discovery rate (FDR)
Customized ion types to look for new dissociation channels
Customized MODs for the search of new modification or labeled proteins
MS/MS spectrum annotation with matching fragments
CONCLUSIONS
An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle
The principle of iMEF with ProteinGoggle is demonstrated with identification of ubiquitin from its tandem mass spectrum using ETD
iMEF as implemented in ProteinGoggle has been able to unwrap complex overlapping isotopic envelopes and confidently provide embedded fragment ions
iMEF could be adapted for peptide and glycan database search with customized databases
ACKNOWLEDGEMENTS
Funding: DICP “Research Start”; China “Youth 1000-talents Theme”
The KENES. Co. Ltd.: Miao Zhou, Shijin Liu, Bin Yang
DNL2003: Li Li, Bo Wang, Jing Li, Xu Zhao
THANK YOU VERY MUCH!