Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center


Page 1:

Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms

Nathan Edwards

Department of Biochemistry and Molecular & Cellular Biology

Georgetown University Medical Center

Page 2:

Outline

Tandem mass-spectrometry of peptides

Detection of alternative splicing protein isoforms

Phyloproteomics using top-down mass-spec.

Characterization of glycoprotein microheterogeneity by mass-spectrometry

Page 3:

Mass Spectrometer

Sample → Ionizer → Mass Analyzer → Detector

• Ionizer: MALDI, Electro-Spray Ionization (ESI)
• Mass Analyzer: Time-Of-Flight (TOF), Quadrupole, Ion-Trap
• Detector: Electron Multiplier (EM)

Page 4:

Mass Spectrum

Page 5:

Mass is fundamental

Page 6:

Sample Preparation for MS/MS

Enzymatic Digest and Fractionation

Page 7:

Single Stage MS

MS

Page 8:

Tandem Mass Spectrometry (MS/MS)

Precursor selection

Page 9:

Tandem Mass Spectrometry (MS/MS)

Precursor selection + collision-induced dissociation (CID)

MS/MS

Page 10:

Why Tandem Mass Spectrometry?

MS/MS spectra provide evidence for the amino-acid sequence of functional proteins.

Key concepts:
• Spectrum acquisition is unbiased
• Direct observation of amino-acid sequence
• Sensitive to small sequence variations
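To make the link between an MS/MS spectrum and an amino-acid sequence concrete, here is a minimal sketch of how the expected singly charged b- and y-ion m/z values can be computed for a peptide. It uses standard monoisotopic residue masses; the function name `by_ions` and the choice of the peptide NVVFVIDK (which appears later in the deck) are illustrative assumptions, not the speaker's software.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to the C-terminal (y) fragments
PROTON = 1.00728   # charge carrier for singly charged ions


def by_ions(peptide):
    """Return (b, y): singly charged fragment m/z values for each cleavage site.

    b[k] is the N-terminal fragment of length k+1; y[k] is the complementary
    C-terminal fragment produced by the same backbone cleavage.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON)
        y.append(sum(masses[i:]) + WATER + PROTON)
    return b, y


if __name__ == "__main__":
    b, y = by_ions("NVVFVIDK")
    for k, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"cleavage {k}: b = {bi:.4f}, y = {yi:.4f}")
```

A single residue substitution shifts every fragment that contains it, which is why the spectrum is sensitive to small sequence variations. One known limit of this picture: I and L have identical residue masses, so this calculation cannot tell them apart.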

Page 11:

Unannotated Splice Isoform

• Human Jurkat leukemia cell-line
• Lipid-raft extraction protocol, targeting T cells
• von Haller, et al. MCP 2003.

LIME1 gene: LCK interacting transmembrane adaptor 1

LCK gene:
• Leukocyte-specific protein tyrosine kinase
• Proto-oncogene
• Chromosomal aberration involving LCK in leukemias.

Multiple significant peptide identifications

Page 12:

Unannotated Splice Isoform

Page 13:

Unannotated Splice Isoform

Page 14:

Translation start-site correction

• Halobacterium sp. NRC-1
• Extreme halophilic Archaeon, insoluble membrane and soluble cytoplasmic proteins
• Goo, et al. MCP 2003.

GdhA1 gene: Glutamate dehydrogenase A1

• Multiple significant peptide identifications
• Observed start is consistent with Glimmer 3.0 prediction(s)

Page 15:

Halobacterium sp. NRC-1 ORF: GdhA1

• K-score E-value vs PepArML @ 10% FDR
• Many peptides inconsistent with annotated translation start site of NP_279651

[Figure: peptide identifications plotted along the GdhA1 ORF; axis ticks from 0 to 440]

Page 16:

What if there is no "smoking gun" peptide…

Page 17:

What if there is no "smoking gun" peptide…

Page 18:

What if there is no "smoking gun" peptide…

Page 19:

HER2/Neu Mouse Model of Breast Cancer

• Paulovich, et al. JPR, 2007
• Study of normal and tumor mammary tissue by LC-MS/MS
• 1.4 million MS/MS spectra

Peptide-spectrum assignments:
• Normal samples (Nn): 161,286 (49.7%)
• Tumor samples (Nt): 163,068 (50.3%)

4270 proteins identified in total
• 2-unique generalized protein parsimony
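The slides do not say which statistical test is applied to these counts. As one plausible illustration only, the sketch below treats the spectra assigned to a given peptide or protein as draws that land in the tumor samples with probability Nt / (Nn + Nt), and computes an exact two-sided binomial p-value. The function name `binomial_two_sided` and the example counts in the usage lines are hypothetical; only Nn and Nt come from the slide.

```python
from math import comb

N_NORMAL = 161_286  # peptide-spectrum assignments, normal samples (from the slide)
N_TUMOR = 163_068   # peptide-spectrum assignments, tumor samples (from the slide)

# Null expectation: fraction of any peptide's spectra that falls in tumor samples.
P_TUMOR = N_TUMOR / (N_NORMAL + N_TUMOR)


def binomial_two_sided(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    observing k successes in n trials with success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


if __name__ == "__main__":
    # Hypothetical example: 40 spectra for one peptide, only 2 from tumor samples.
    print(f"null tumor fraction: {P_TUMOR:.3f}")
    print(f"p-value: {binomial_two_sided(2, 40, P_TUMOR):.2e}")
```

Because the two sample groups contribute almost equal numbers of assignments, a strongly lopsided count for one isoform-specific peptide is unlikely under the null. That is the sort of evidence that can stand in for a missing "smoking gun" peptide.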

Page 20:

Nascent polypeptide-associated complex subunit alpha

7.3 × 10⁻⁸

Page 21:

Pyruvate kinase isozymes M1/M2

2.5 × 10⁻⁵

Page 22:

Phyloproteomics

Fragment intact proteins (top-down MS)

Match the spectra to protein sequences

Place the organism phylogenetically

Works even for unknown microorganisms without any available sequences

Page 23:

[Figure: instrument output for file yr_inclusion (yrohdei, 3/11/2009). Upper panels: total ion chromatograms (TIC MS, NL 1.66E8; TIC FTMS + p ESI d Full ms2, NL 1.01E7), Relative Abundance vs. Time (min), RT 19.04 to 25.39. Lower panel: MS2 spectrum averaged over scans 1937-2437 (RT 19.45-24.36, AV 21, NL 4.80E4), Relative Abundance vs. m/z, 200 to 2000, with peaks labeled by m/z and charge state. Annotation: 756.70, +8, MW 6044.11.]

CID Protein Fragmentation Spectrum from Y. rohdei

Page 24:

[Figure: instrument output for file yr_inclusion (yrohdei, 3/11/2009). Upper panels: total ion chromatograms, Relative Abundance vs. Time (min), RT 19.04 to 25.39. Lower panel: MS2 spectrum averaged over scans 1937-2437, Relative Abundance vs. m/z, 200 to 2000, with peaks labeled by m/z and charge state. Annotation: 756.70, +8, MW 6044.11.]

CID Protein Fragmentation Spectrum from Y. rohdei

Match to Y. pestis 50S Ribosomal Protein L32

Page 25:

Exact match sequence…

Page 26:

Phylogeny: Protein vs DNA

[Figure panels: Protein Sequence | 16S-rRNA Sequence]

Page 27:

What about mixtures?

Page 28:

DNA-binding protein HU-alpha m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128

Eight proteins identified with "large" |Δ|

Identified E. herbicola proteins
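For readers unfamiliar with charge-state arithmetic, the following sketch converts the m/z and charge reported for HU-alpha into a neutral mass, assuming the charge is carried by z protons. The function name `neutral_mass` is illustrative. The slide does not define Δ; if it is the difference between this observed mass and the mass of the matched database sequence (an assumption), then a "large" |Δ| would indicate that the E. herbicola protein differs from its database homolog.

```python
PROTON = 1.00728  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z protons attached."""
    return z * (mz - PROTON)


if __name__ == "__main__":
    # m/z and charge taken from the HU-alpha identification on the slide.
    print(f"{neutral_mass(732.71, 13):.2f}")  # prints 9512.14
```

This ignores the isotope distribution of an intact protein, so the result depends on which isotopic peak the reported m/z refers to.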

Page 29:

DNA-binding protein HU-alpha m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128

Extract N- and C-terminus sequence supported by at least 3 b- or y-ions

Identified E. herbicola proteins

Page 30:

E. herbicola protein sequences

Page 31:

Phylogenetic placement of E. herbicola

Phylogram, Cladogram

phylogeny.fr – "One-Click"

Page 32:

Glycoprotein Microheterogeneity

Glycosylation is important, but our analytic tools are rather rudimentary:
• Detach glycans (PNGase-F) and analyze glycans
• Detach glycans (PNGase-F) and analyze peptides

Either:
• Get glycan structures, but no association with protein or protein site, or
• Get glycosylation sites, but no association with glycan structures.

We analyze glycopeptides directly…

Challenges all facets of glycoproteomics

Page 33:

Altered N-Glycosylation in Cancer

[Figure: N-glycan attached at an NX(S/T) site of a polypeptide chain (NH3+ to COO-), annotated with:]
• Fut-VIII (α1-6 Fuc): Comunale, 2010
• GnT-V (β1-6 GlcNAc): Wang, 2007
• ST-VI Gal1 (α2-6 NeuAc): Hedlund, 2008
• Fut-VI (α1-3 Fuc): Higai, 2008

Glycosyltransferase Expression or Glycan Analyses

Legend: GalNAc, Sialic Acid, Gal, GlcNAc, Man

K. Chandler

Page 34:

The informatics challenge

Identify glycopeptides in large-scale tandem mass-spectrometry datasets
• Many glycopeptide enriched fractions
• Many tandem mass-spectra / fraction

Good, but not great, instrumentation
• QStar Elite – CID, good MS1/MS2 resolution

Strive for hypothesis-generating analysis
• Site-specific glycopeptide characterization
• Glycoform occupancy in differentiated samples

Page 35:

CID Glycopeptide Spectrum

Page 36:

Observations

• Oxonium ions (204, 366) help distinguish glycopeptides from peptides…
  …but do little to identify the glycopeptide
• Few peptide b/y-ions to identify peptides…
  …but intact peptide fragments are common
• If the peptide can be guessed, then…
  …the glycan's mass can be determined
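The observations above suggest a simple two-step procedure, sketched here under stated assumptions: flag a spectrum as a likely glycopeptide if it contains an oxonium ion, then subtract a guessed peptide mass from the glycopeptide mass and enumerate monosaccharide compositions that fit the remainder. The function names, the tolerances, and the `MAX_COUNT` search limits are hypothetical and are not taken from the speaker's tools; the residue and oxonium masses are standard monoisotopic values.

```python
from itertools import product

# Singly charged oxonium ions: the "204" and "366" peaks mentioned on the slide.
OXONIUM_MZ = {"HexNAc": 204.0867, "Hex+HexNAc": 366.1395}

# Monoisotopic monosaccharide residue masses (Da).
RESIDUE = {"HexNAc": 203.0794, "Hex": 162.0528, "Fuc": 146.0579, "NeuAc": 291.0954}

# Hypothetical upper bounds on each residue count, to keep the search small.
MAX_COUNT = {"HexNAc": 6, "Hex": 9, "Fuc": 2, "NeuAc": 4}


def has_oxonium(peak_mzs, tol=0.02):
    """True if any observed peak lies within tol of a known oxonium ion."""
    return any(abs(p - mz) <= tol for p in peak_mzs for mz in OXONIUM_MZ.values())


def glycan_compositions(glycopeptide_mass, peptide_mass, tol=0.02):
    """List residue compositions whose total mass matches the glycan mass.

    The glycan mass is taken as glycopeptide neutral mass minus peptide
    neutral mass; all combinations within MAX_COUNT are tried exhaustively.
    """
    target = glycopeptide_mass - peptide_mass
    names = list(RESIDUE)
    hits = []
    for counts in product(*(range(MAX_COUNT[n] + 1) for n in names)):
        mass = sum(c * RESIDUE[n] for c, n in zip(counts, names))
        if abs(mass - target) <= tol:
            hits.append(dict(zip(names, counts)))
    return hits
```

A composition such as HexNAc4 Hex5 has a residue mass of about 1622.58 Da, so a glycopeptide that is heavier than its guessed peptide by that amount would be reported with that composition. Several compositions can share nearly the same mass, so the result is a candidate list and not an identification.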

Page 37:

Haptoglobin Standard

Haptoglobin (HPT_HUMAN) glycopeptides:
• NLFLNHSE*NATAK
• MVSHHNLTTGATLINE
• VVLHPNYSQVDIGLIK

Legend: N-glycosylation motif (NX/ST); * Site of GluC cleavage

Pompach et al. Journal of Proteome Research 11.3 (2012): 1728–1740.
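The N-glycosylation motif is easy to locate programmatically. The sketch below finds every N followed by any residue and then S or T, applying the standard refinement that the middle residue is not proline (that refinement is not stated on the slide). The function name `sequon_sites` is illustrative, and the `*` cleavage marker is stripped before matching, so positions are 1-based within the peptide.

```python
import re

# N, then any residue except P, then S or T. The lookahead keeps overlapping
# motifs (for example NNSS) from hiding one another.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_sites(peptide):
    """Return 1-based positions of N residues that start an N-X-S/T motif."""
    seq = peptide.replace("*", "")  # drop the GluC cleavage marker
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    for pep in ("NLFLNHSE*NATAK", "MVSHHNLTTGATLINE", "VVLHPNYSQVDIGLIK"):
        print(pep, sequon_sites(pep))
```

On the three haptoglobin peptides this reports two sites for the first peptide (NHS and NAT) and one site each for the other two (NLT and NYS).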

Page 38:

Tuning the filters…

We estimate the number of false-positives…

…so that the user can tune the search parameters

Page 39:

Application of Exoglycosidases to locate Fucose

At ITIH4 site N517

[Figure: four panels, each labeled with the peptide LPTQNITFQTE]

K. Chandler

Page 40:

NVVFVIDK ITIH4 Glycopeptide


K. Chandler

Page 41:

Similar Glycopeptide Spectra (mass Δ ~ +162 Da)

MVSHHNLTTGATLINE

?

+162 Da

Page 42:

Fragmented Glycopeptides (mass Δ ~ +162 Da)

MVSHHNLTTGATLINE

?

+162 Da

MVSHHNLTTGATLINE
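A mass difference of about +162 Da corresponds to one hexose residue (162.0528 Da monoisotopic). One way to exploit such differences, sketched here as an assumption about how spectra could be linked and not as the speaker's algorithm, is to scan a list of precursor neutral masses for pairs that differ by a single monosaccharide residue. The function name `linked_pairs` and the tolerance are hypothetical.

```python
# Monoisotopic monosaccharide residue masses (Da).
RESIDUE_DELTAS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def linked_pairs(masses, tol=0.05):
    """Find pairs of precursor masses that differ by one monosaccharide residue.

    Returns (i, j, residue) tuples meaning masses[j] is heavier than masses[i]
    by the mass of `residue`, within tol Da.
    """
    pairs = []
    for i, lighter in enumerate(masses):
        for j, heavier in enumerate(masses):
            for name, delta in RESIDUE_DELTAS.items():
                if abs((heavier - lighter) - delta) <= tol:
                    pairs.append((i, j, name))
    return pairs


if __name__ == "__main__":
    # Hypothetical precursor neutral masses for three glycopeptide spectra.
    print(linked_pairs([3000.00, 3162.05, 3365.13]))
```

If the lighter spectrum already carries a glycopeptide annotation, the heavier one becomes a candidate for the same peptide with one more residue of the matched type. This lets an annotation be carried over to a spectrum that was not identified on its own.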

Page 43:

Propagating Annotations

• MVS+A1G1
• MVS+A1G1
• MVS+A2G2
• MVS+A2G2
• MVS+A2G2
• VVL+A1G1
• VVL+A2G2

G. Berry

Page 44:

Summary

Mass-spectrometry coupled with protein chemistry and good informatics can look beyond the obvious to the unexpected...

…and there is plenty to find!

Page 45:

Acknowledgements

Edwards lab
• Kevin Chandler
• Gwenn Berry

Fenselau lab (UMD)
• Colin Wynne
• Avantika Dhabaria

Goldman lab (GU)
• Kevin Chandler
• Petr Pompach

NSF Graduate Fellowship (Chandler)

Funding: NCI