Proteomics and Glycoproteomics (Bio-)Informatics
of Protein Isoforms
Nathan Edwards
Department of Biochemistry and
Molecular & Cellular Biology
Georgetown University Medical Center
Outline
Tandem mass-spectrometry of peptides
Detection of alternative splicing protein isoforms
Phyloproteomics using top-down mass-spec.
Characterization of glycoprotein microheterogeneity by mass-spectrometry
Mass Spectrometer
Sample → Ionizer → Mass Analyzer → Detector
Ionizer:
• MALDI
• Electro-Spray Ionization (ESI)
Mass Analyzer:
• Time-Of-Flight (TOF)
• Quadrupole
• Ion-Trap
Detector:
• Electron Multiplier (EM)
Mass Spectrum
Mass is fundamental
Sample Preparation for MS/MS
Enzymatic Digest and Fractionation
Single Stage MS
MS
Tandem Mass Spectrometry (MS/MS)
Precursor selection
Tandem Mass Spectrometry (MS/MS)
Precursor selection + collision-induced dissociation (CID)
MS/MS
Why Tandem Mass Spectrometry?
MS/MS spectra provide evidence for the amino-acid sequence of functional proteins.
Key concepts:
• Spectrum acquisition is unbiased
• Direct observation of amino-acid sequence
• Sensitive to small sequence variations
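To make "direct observation of amino-acid sequence" concrete, here is a minimal sketch of how singly charged b- and y-ion m/z values follow from a peptide sequence. The function name `by_ions` is hypothetical and not from the talk; the residue masses are standard monoisotopic values. Modifications and higher charge states are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, Da
PROTON = 1.00728   # proton, Da


def by_ions(peptide):
    """Return (b, y): singly charged fragment m/z lists, b1..b(n-1) and y1..y(n-1).

    Hypothetical helper for illustration; no modifications are considered.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(masses)):
        b.append(sum(masses[:i]) + PROTON)           # N-terminal prefix of length i
        y.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal suffix of length i
    return b, y


if __name__ == "__main__":
    b, y = by_ions("NVVFVIDK")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:9.4f}   y{i} = {yi:9.4f}")
```

Because every b- and y-ion is a sum over a prefix or suffix, a single residue substitution shifts all fragments that contain that position, which is why MS/MS is sensitive to small sequence variations.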
Unannotated Splice Isoform
Human Jurkat leukemia cell line
• Lipid-raft extraction protocol, targeting T cells
• von Haller, et al. MCP 2003.
LIME1 gene:
• LCK interacting transmembrane adaptor 1
LCK gene:
• Leukocyte-specific protein tyrosine kinase
• Proto-oncogene
• Chromosomal aberration involving LCK in leukemias
Multiple significant peptide identifications
Translation start-site correction
Halobacterium sp. NRC-1
• Extreme halophilic Archaeon, insoluble membrane and soluble cytoplasmic proteins
• Goo, et al. MCP 2003.
GdhA1 gene:
• Glutamate dehydrogenase A1
Multiple significant peptide identifications
Observed start is consistent with Glimmer 3.0 prediction(s)
Halobacterium sp. NRC-1 ORF: GdhA1
• K-score E-value vs PepArML @ 10% FDR
• Many peptides inconsistent with annotated translation start site of NP_279651
[Figure: horizontal axis from 0 to 440 in steps of 40.]
What if there is no "smoking gun" peptide…
HER2/Neu Mouse Model of Breast Cancer
Paulovich, et al. JPR, 2007
• Study of normal and tumor mammary tissue by LC-MS/MS
• 1.4 million MS/MS spectra
Peptide-spectrum assignments
• Normal samples (Nn): 161,286 (49.7%)
• Tumor samples (Nt): 163,068 (50.3%)
4270 proteins identified in total
• 2-unique generalized protein parsimony
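The slides do not state which statistical test is applied to the spectral counts. As one plausible illustration only, the sketch below asks whether a peptide's spectra are split between normal and tumor samples in the proportion expected from the totals Nn and Nt, using an exact two-sided binomial test. The function name and the example counts are hypothetical; only the two totals come from the slide.

```python
from math import comb

# Totals taken from the slide above.
N_NORMAL = 161_286
N_TUMOR = 163_068
P_NORMAL = N_NORMAL / (N_NORMAL + N_TUMOR)  # about 0.497


def binomial_two_sided(k_normal, k_tumor, p=P_NORMAL):
    """Exact two-sided binomial p-value for one peptide's spectral counts.

    Null hypothesis (an assumption of this sketch): each spectrum of the
    peptide comes from a normal sample with probability p.
    Suitable for modest counts (up to a few hundred spectra).
    """
    n = k_normal + k_tumor
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[k_normal]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical counts for two peptides of the same protein.
    print(binomial_two_sided(12, 14))  # roughly balanced
    print(binomial_two_sided(0, 30))   # observed only in tumor samples
```

A peptide whose counts deviate strongly while other peptides of the same protein stay balanced is the kind of indirect isoform evidence that does not require a "smoking gun" peptide.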
Nascent polypeptide-associated complex subunit alpha
7.3 x 10^-8
Pyruvate kinase isozymes M1/M2
2.5 x 10^-5
Phyloproteomics
Fragment intact proteins (top-down MS)
Match the spectra to protein sequences
Place the organism phylogenetically
Works even for unknown microorganisms without any available sequences
[Figure: data file yr_inclusion (Y. rohdei), 3/11/2009. Top: total ion chromatograms, relative abundance vs. time (min), RT 19.04 to 25.39, for MS (NL 1.66E8) and for FTMS + p ESI d Full ms2 scans (NL 1.01E7). Bottom: averaged MS/MS spectrum, relative abundance vs. m/z (200 to 2000), scans #1937-2437, RT 19.45-24.36, AV 21, NL 4.80E4; annotated 756.70 +8 MW 6044.11.]
CID Protein Fragmentation Spectrum from Y. rohdei
Match to Y. pestis 50S Ribosomal Protein L32
Exact match sequence…
Phylogeny: Protein vs DNA
[Figure panels: Protein Sequence; 16S-rRNA Sequence]
What about mixtures?
Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Eight proteins identified with "large" |Δ|
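The slide reports a precursor m/z, a charge, and a mass difference Δ, but does not define Δ. The sketch below assumes Δ is the observed neutral mass minus the theoretical mass of the matched database sequence; that sign convention, and both function names, are assumptions for illustration.

```python
PROTON = 1.00728  # proton mass, Da


def neutral_mass(mz, z):
    """Neutral mass of a positive ion observed at m/z `mz` with charge `z`."""
    return (mz - PROTON) * z


def mass_delta(observed_mz, z, theoretical_mass):
    """Observed minus theoretical neutral mass (sign convention assumed)."""
    return neutral_mass(observed_mz, z) - theoretical_mass


if __name__ == "__main__":
    # m/z and charge from the slide; prints about 9512.14 Da.
    print(round(neutral_mass(732.71, 13), 2))
```

A large |Δ| with a strong E-value is what one would expect when the spectrum comes from a close homolog of the database protein rather than an identical sequence.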
Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
E. herbicola protein sequences
Phylogenetic placement of E. herbicola
[Figure panels: Phylogram; Cladogram]
phylogeny.fr – "One-Click"
Glycoprotein Microheterogeneity
Glycosylation is important, but our analytic tools are rather rudimentary
• Detach glycans (PNGase-F) and analyze glycans
• Detach glycans (PNGase-F) and analyze peptides
Get glycan structures, but no association with protein or protein site, or
Get glycosylation sites, but no association with glycan structures.
We analyze glycopeptides directly…
• Challenges all facets of glycoproteomics
Altered N-Glycosylation in Cancer
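[Figure: N-glycan attached at an N-X-S/T site on a polypeptide (NH3+ to COO-), with the following annotations.]
• Fut-VIII (α1-6 Fuc): Comunale, 2010
• GnT-V (β1-6 GlcNAc): Wang, 2007
• ST-VI Gal1 (α2-6 NeuAc): Hedlund, 2008
• Fut-VI (α1-3 Fuc): Higai, 2008
Glycosyltransferase Expression or Glycan Analyses
Legend: GalNAc, Sialic Acid, Gal, GlcNAc, Man
K. Chandler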
The informatics challenge
Identify glycopeptides in large-scale tandem mass-spectrometry datasets
• Many glycopeptide-enriched fractions
• Many tandem mass-spectra / fraction
Good, but not great, instrumentation
• QStar Elite – CID, good MS1/MS2 resolution
Strive for hypothesis-generating analysis
• Site-specific glycopeptide characterization
• Glycoform occupancy in differentiated samples
CID Glycopeptide Spectrum
Observations
Oxonium ions (204, 366) help distinguish glycopeptides from peptides…
• …but do little to identify the glycopeptide
Few peptide b/y-ions to identify peptides…
• …but intact peptide fragments are common
If the peptide can be guessed, then…
• …the glycan's mass can be determined
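The observations above can be sketched in a few lines: flag a spectrum as a likely glycopeptide when the oxonium ions at m/z 204 (HexNAc) and 366 (Hex-HexNAc) are present, then subtract a guessed peptide mass from the precursor mass to get the glycan mass. The function names, the tolerance, and the example values are hypothetical and are not the talk's actual software.

```python
PROTON = 1.00728  # proton mass, Da

# Singly charged oxonium ions named on the slide (monoisotopic m/z).
OXONIUM_IONS = {"HexNAc": 204.0867, "HexHexNAc": 366.1395}


def has_oxonium_ions(peaks, tol=0.05, required=2):
    """True if at least `required` oxonium ions appear among the peak m/z values."""
    found = [
        name
        for name, mz in OXONIUM_IONS.items()
        if any(abs(p - mz) <= tol for p in peaks)
    ]
    return len(found) >= required


def glycan_mass(precursor_mz, z, peptide_mass):
    """Glycan mass implied by a precursor and a guessed neutral peptide mass."""
    return (precursor_mz - PROTON) * z - peptide_mass


if __name__ == "__main__":
    peaks = [204.09, 366.14, 528.19, 1203.55]  # hypothetical peak list
    if has_oxonium_ions(peaks):
        # Hypothetical precursor and peptide mass.
        print(round(glycan_mass(1100.80, 3, 1675.84), 2))
```

The glycan mass obtained this way still has to be matched to a composition or structure; the oxonium ions only say that some glycan is present.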
Haptoglobin Standard
Haptoglobin (HPT_HUMAN)
NLFLNHSE*NATAK
MVSHHNLTTGATLINE
VVLHPNYSQVDIGLIK
• N-glycosylation motif (N-X-S/T)
* Site of GluC cleavage
Pompach et al. Journal of Proteome Research 11.3 (2012): 1728–1740.
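The N-glycosylation motif marked on the haptoglobin peptides can be located with a short regular expression. This sketch uses the usual sequon definition N-X-S/T with X not proline (the proline exclusion is standard but is not stated on the slide); the function name is hypothetical, and the GluC cleavage marker "*" is removed before searching.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_sites(peptide):
    """Return 0-based positions of the Asn in each N-X-S/T sequon (X != P)."""
    return [m.start() for m in SEQUON.finditer(peptide.replace("*", ""))]


if __name__ == "__main__":
    for pep in ("NLFLNHSE*NATAK", "MVSHHNLTTGATLINE", "VVLHPNYSQVDIGLIK"):
        print(pep, sequon_sites(pep))
```

The first peptide carries two sequons (NHS and NAT), which is why the GluC cleavage site between them matters for site-specific assignment.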
Tuning the filters…
We estimate the number of false positives…
…so that the user can tune the search parameters
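The slide does not say how the false-positive estimate is computed. Purely as an illustration, the sketch below uses a generic target-decoy count: matches to decoy candidates above a score threshold stand in for the number of false target matches. The function name, the score lists, and the decoy approach itself are assumptions and may differ from the method used in the talk.

```python
def estimate_false_positives(target_scores, decoy_scores, threshold):
    """Return (targets, decoys, fdr) for matches scoring at or above `threshold`.

    The decoy count is used as the estimate of false positives among targets.
    """
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    fdr = decoys / targets if targets else 0.0
    return targets, decoys, fdr


if __name__ == "__main__":
    # Hypothetical scores; sweep the threshold to see the trade-off.
    target = [9.1, 8.4, 7.7, 6.0, 5.2, 4.9, 3.3]
    decoy = [5.0, 3.9, 2.1]
    for t in (3.0, 5.0, 7.0):
        print(t, estimate_false_positives(target, decoy, t))
```

Sweeping the threshold shows how many identifications are gained at the price of how many estimated false positives.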
Application of Exoglycosidases to locate Fucose
At ITIH4 site N517
[Figure: four panels, each labeled LPTQNITFQTE]
K. Chandler
NVVFVIDK ITIH4 Glycopeptide
K. Chandler
Similar Glycopeptide Spectra (mass Δ ~ +162 Da)
MVSHHNLTTGATLINE
?
+162 Da
Fragmented Glycopeptides (mass Δ ~ +162 Da)
MVSHHNLTTGATLINE
?
+162 Da
MVSHHNLTTGATLINE
Propagating Annotations
• MVS+A1G1
• MVS+A1G1
• MVS+A2G2
• MVS+A2G2
• MVS+A2G2
• VVL+A1G1
• VVL+A2G2
G. Berry
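The slides do not spell out the propagation rule. One simple reading of the "+162 Da" examples is to link precursors whose neutral masses differ by a monosaccharide residue mass (162.05 Da is a hexose), so that an annotated glycopeptide can suggest an annotation for its unannotated neighbor. The function name, the tolerance, and the example masses below are hypothetical.

```python
# Monoisotopic monosaccharide residue masses (Da).
MONOSACCHARIDES = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def link_by_monosaccharide(masses, tol=0.02):
    """Return (i, j, name) where masses[j] - masses[i] matches one residue mass."""
    links = []
    for i, mi in enumerate(masses):
        for j, mj in enumerate(masses):
            for name, delta in MONOSACCHARIDES.items():
                if abs((mj - mi) - delta) <= tol:
                    links.append((i, j, name))
    return links


if __name__ == "__main__":
    # Hypothetical neutral precursor masses; index 0 is taken as already annotated.
    masses = [3000.0000, 3162.0528, 3500.0000]
    for i, j, name in link_by_monosaccharide(masses):
        print(f"spectrum {j} = spectrum {i} + {name}")
```

A mass link alone is only a suggestion; the shared peptide fragments in the two spectra are what would justify carrying the annotation across.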
Summary
Mass-spectrometry coupled with protein chemistry and good informatics can look beyond the obvious to the unexpected...
…and there is plenty to find!
Acknowledgements
Edwards lab
• Kevin Chandler
• Gwenn Berry
Fenselau lab (UMD)
• Colin Wynne
• Avantika Dhabaria
Goldman lab (GU)
• Kevin Chandler
• Petr Pompach
NSF Graduate Fellowship (Chandler)
Funding: NCI