Principles of Shotgun Proteomics and Proteogenomics Boris Maček Proteome Center Tuebingen InnoMol Proteomics Workshop April 8, 2014

General MS-based proteomics workflow

Aebersold R and Mann M. 2003. Nature 422: 198-207

Principle of protein database search

[Figure: a measured fragment-ion spectrum (intensity vs. m/z) is searched against a database; the translated genomic sequence yields theoretical spectra for proteins]

Only theoretical spectra that fall into the defined (precursor) mass range are considered, and each of them is compared to the measured fragment-ion spectrum.
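The search principle above can be sketched in a few lines. This is a minimal illustration, not how MaxQuant or its Andromeda engine actually score spectra: it assumes a simplified trypsin rule without missed cleavages, only singly charged b- and y-ions, and a plain shared-peak count as the score. All function names and tolerances are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of a peptide
PROTON = 1.00728   # charge carrier of singly protonated fragment ions


def tryptic_peptides(protein):
    """Cleave C-terminal to K/R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Theoretical singly charged b- and y-ion m/z values."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE[aa] for aa in peptide[i:]) + WATER + PROTON
        ions += [b, y]
    return ions


def shared_peaks(observed, theoretical, tol=0.5):
    """Count theoretical ions matched by an observed peak within tol (m/z units)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search(precursor_mass, observed, proteins, precursor_tol=0.02):
    """Score every tryptic peptide whose mass falls inside the precursor window."""
    hits = []
    for name, sequence in proteins.items():
        for pep in tryptic_peptides(sequence):
            if abs(peptide_mass(pep) - precursor_mass) <= precursor_tol:
                hits.append((shared_peaks(observed, fragment_ions(pep)), name, pep))
    return sorted(hits, reverse=True)  # best-scoring candidate first


# Usage: simulate a spectrum from a known peptide and search a toy database
proteins = {"P1": "MKVLSAVERPGK", "P2": "GGAAKLLLR"}
query = "VLSAVERPGK"
hits = search(peptide_mass(query), fragment_ions(query), proteins)
print(hits[0])  # (18, 'P1', 'VLSAVERPGK')
```

The precursor-mass filter is what keeps the comparison tractable: only candidates inside the window are fragmented in silico and scored.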

>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASWRIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVFYYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGEEQNKEALQDVEDENQ
>sp|P62258-2|1433E_HUMAN Isoform SV of 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE
MVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASWRIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVFYYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGEEQNKEALQDVEDENQ
>sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH PE=1 SV=4
MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSWRVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESKVFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFSVFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDEEAGEGN
>tr|F2Z3E5|F2Z3E5_HUMAN Hydroxyacid-oxoacid transhydrogenase, mitochondrial OS=Homo sapiens GN=ADHFE1 PE=4 SV=1
MAAAARARVAYLLRQLQRAACQCPTHSHTYSQDGCFKY
>tr|Q5SS58|Q5SS58_HUMAN MHC class I polypeptide-related sequence A OS=Homo sapiens GN=MICA PE=4 SV=2
MGQRDQGLDRERKGPQDDPGSYQGPERRNFLKEDAMKTKTHYHAMHADCLQELRRYLESGVVLRRTVPPMVNVTRSEASEGNITVTCRASSFYPRNIILTWRQDGVSLSHDTQQWGDVLPDGNGTYQTWVATRICRGEEQRFTCYMEHSGNHSTHPVPSGKVLVLQSHWQTFHVSAVAAGCCYFCYYYFLCPLL
>tr|Q5T409|Q5T409_HUMAN Disrupted in schizophrenia 1 OS=Homo sapiens GN=DISC1 PE=2 SV=1
MPGGGPQGAPAAAGGGGVSHRAGSRDCLPPAACFRRRRLARRPGYMRSSTGPGIGFLSPAVGTLFRFPGGVSGEESHHSESRARQCGLDSRGLLVRSPVSKSAAAPTVTSVRGTSAHFGIQLRGGTRLPDRLSWPCGPGSAGWQQEFAAMDSSETLDASWEAACSDGARRVRAAGSLPSAELSSNSCSPGCGPEVPPTPPGSHSAFTSSFSFIRLSLGSAGERGEAEGCPPSREAESHCQSPQEMGAKAASLDGPHEDPRCLSRPFSLLATRVSADLAQAARNSSRPERDMHSLPDMDPGSSSSLDPSLAGCGGDGSSGSGDAHSWDTLLRKWEPVLRDCLLRNRRQMEVISLRLKLQKLQEDAVENDDYDKAETLQQRLEDLEQEKISLHFQLPSRQPALSSFLGHLAAQVQAALRRGATQQASGDDTHTPLRMEPRLLEPTAQDSLHVSITRRDWLLQEKQQLQKEIEALQARMFVLEAKDQQLRREIEEQEQQLQWQGCDLTPLVGQLSLGQLQEVSKALQDTLASAGQIPFHAEPPETIRSLQERIKSLNLSLKEITTKVCMSEKFCSTLRKKVNDIETQLPALLEAKMHAISGNHFWTAKDLTEEIRSLTSEREGLEGLLSKLLVLSSRNVKKLGSVKEDYNRLRREVEHQETAYETSVKENTMKYMETLKNKLCSCKCPLLGKVWEADLEACRLLIQSLQLQEARGSLSVEDERQMDDLEGAAPPIPPRLHSEDKRKTPLKESYILSAELGEKCEDIGKKLLYLEDQLHTAIHSHDEDLIHSLRRELQMVKETLQAMILQLQPAKEAGEREAAASCMTAGVHEAQA

Database: Homo sapiens reference proteome, 71,434 entries (20,246 reviewed proteins, 51,188 unreviewed)

Translated genomic sequence → theoretical spectra for proteins (search software: MaxQuant)
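In UniProt FASTA headers such as the ones above, the first field encodes review status: `sp` marks reviewed Swiss-Prot entries and `tr` unreviewed TrEMBL entries. The sketch below, using only the standard library, parses FASTA text and splits the entry count accordingly, which is how a reviewed/unreviewed breakdown like the one for the reference proteome can be checked. The function names are illustrative, and the first sequence in the usage example is truncated for brevity.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def review_status(header):
    """UniProt headers start with 'sp' (Swiss-Prot, reviewed) or 'tr' (TrEMBL, unreviewed)."""
    database = header.split("|", 1)[0]
    return {"sp": "reviewed", "tr": "unreviewed"}.get(database, "other")


def count_by_status(lines):
    """Count FASTA entries per review status."""
    return Counter(review_status(header) for header, _ in read_fasta(lines))


# Two entries from the database excerpt (first sequence truncated for brevity)
example = """\
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEER
>tr|F2Z3E5|F2Z3E5_HUMAN Hydroxyacid-oxoacid transhydrogenase, mitochondrial OS=Homo sapiens GN=ADHFE1 PE=4 SV=1
MAAAARARVAYLLRQLQRAACQCPTHSHTYSQDGCFKY
""".splitlines()
counts = count_by_status(example)
print(counts["reviewed"], counts["unreviewed"])  # 1 1
```

Because `read_fasta` accepts any iterable of lines, an open file handle can be passed directly in place of `example`.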

MS instrumentation in proteomics

Aebersold R and Mann M. 2003. Nature 422: 198-207

Coupling LC to MS for complex mixture analysis

Nanoflow LC/MS interface set-up:
• Proxeon Easy nLC nanoflow LC system coupled to an LTQ-Orbitrap
• Column (75 µm) with spray tip (8 µm), 12-15 cm, packed with 3 µm reverse-phase C18 beads
• No precolumn or split
• Sample loading: ~700 nl/min; gradient elution: ~200 nl/min
• Platinum wire, 2.0 kV

Example: BSA tryptic in-solution digest, 50 fmol on column

LTQ-Orbitrap (2005)

[Figure: instrument schematic with source, linear ion trap (LTQ), C-Trap, octopole collision cell and Orbitrap]

LTQ-FT MS/MS optimized scan cycle (time axis 0-1800 ms):
• Orbitrap-MS: full scan → peptide mass measurement
• LTQ-MS: MS2 scans, acquired in parallel → peptide sequencing
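The benefit of running the Orbitrap full scan and the ion-trap MS2 scans in parallel can be estimated with a back-of-the-envelope model. This sketch is deliberately simplified (it ignores overheads, precursor selection and transfer times), and the timings in the usage lines are hypothetical placeholders, not instrument specifications.

```python
def cycle_time_ms(full_scan_ms, ms2_ms, top_n, parallel=True):
    """Length of one cycle: one full scan plus top_n MS2 scans (simplified model).

    parallel=True: the MS2 scans in the linear ion trap overlap the Orbitrap
    full scan, so the cycle lasts as long as the slower of the two.
    parallel=False: the scans run one after another.
    """
    ms2_block = top_n * ms2_ms
    return max(full_scan_ms, ms2_block) if parallel else full_scan_ms + ms2_block


def ms2_per_second(full_scan_ms, ms2_ms, top_n, parallel=True):
    """MS2 acquisition rate implied by the cycle time."""
    return top_n / (cycle_time_ms(full_scan_ms, ms2_ms, top_n, parallel) / 1000)


# Hypothetical timings, chosen only to illustrate the effect of parallel acquisition
for parallel in (True, False):
    print(parallel,
          cycle_time_ms(900, 150, 5, parallel),
          round(ms2_per_second(900, 150, 5, parallel), 2))
```

Under these made-up numbers the parallel cycle is bounded by the full scan alone, which is the point of the scheme: sequencing time is hidden behind the mass measurement.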

Data processing workflow: MaxQuant

Acquisition speed

[Figure: CID MS/MS spectra identified vs. not identified on the LTQ Orbitrap XL and the LTQ Orbitrap Velos]

Acquisition speed

[Figure: number of MS/MS scans]

Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)

• Resting cells are grown with the "normal" amino acid (Lys-12C6), treated cells (drug, GF) with the "heavy" amino acid (Lys-13C6)
• Combine and lyse; protein purification or fractionation
• Proteolysis (trypsin, Lys-C, etc.)
• Quantitation and identification by MS (nanoscale LC-MS/MS)
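Because 13C6-lysine is 6 × (13.0033548 − 12) ≈ 6.0201 Da heavier than the normal amino acid, a tryptic peptide with one C-terminal lysine appears in the combined sample as a light/heavy pair separated by 6.0201/z on the m/z axis, and the heavy/light intensity ratio reports the change between treated and resting cells. A minimal sketch, assuming one labeled lysine per peptide and made-up intensities; taking the median of paired ratios is an illustrative choice, not necessarily what any particular software does.

```python
import math
import statistics

C13_MINUS_C12 = 13.0033548 - 12.0   # mass difference of one 13C atom (Da)
LYS_13C6_SHIFT = 6 * C13_MINUS_C12  # ~6.0201 Da per heavy lysine


def heavy_partner_mz(light_mz, charge, n_lys=1):
    """m/z at which the heavy (Lys-13C6) form of a light peptide ion is expected."""
    return light_mz + n_lys * LYS_13C6_SHIFT / charge


def log2_heavy_light_ratio(heavy, light):
    """log2 of the median heavy/light intensity ratio over paired measurements."""
    return math.log2(statistics.median(h / l for h, l in zip(heavy, light)))


# A doubly charged light ion at m/z 500.0 has its heavy partner ~3.01 m/z higher
print(round(heavy_partner_mz(500.0, 2), 4))
# Made-up intensities: treated (heavy) vs. resting (light), a 2-fold increase
print(log2_heavy_light_ratio([200, 400, 300], [100, 200, 150]))
```

Since both forms are mixed before lysis, they experience identical sample handling, so the ratio within a single MS spectrum is what carries the quantitative information.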

Current research at the PCT

• Proteogenomics
  • B. subtilis, E. coli (Krug et al, 2011, Mol BioSystems; 2013, MCP)
  • Pristionchus pacificus (Borchert et al, 2010, Genome Res)
  • cancer cell lines/tissues

• Proteomics for systems biology
  • In-depth sequencing and quantitation of model organisms (B. subtilis, E. coli, S. pombe, A. thaliana) (Soufi et al, 2010, J Prot Res; Schütz et al, 2011, Plant Cell; Soufi et al, 2012, Curr Opinion Microbiol; Soares et al, 2013, JPR)

• Phosphoproteomics
  • targets of Aurora kinase in S. pombe (Koch et al, 2011, Science Signaling)
  • targets of protein kinase D in human cells (Franz-Wachtel et al., 2012, MCP)
  • targets of S/T/Y kinases and phosphatases in B. subtilis and E. coli

• Protein modifications
  • ubiquitylation (Ikeda et al, 2011, Nature)
  • lysine acetylation (Carpy et al., in preparation)

• Clinical proteomics
  • genetic rescue of Fragile X phenotype in FMR1 KO mice

Super-SILAC in Bacteria

E. coli: Replicate 1 and 2

Parameter                        Number
Total MS/MS                      757,835
Total peptides identified        18,273
Total proteins identified        2,292
Single-peptide hits              6.5%
Total proteins quantified*       1,923

*in all phases of growth

Soufi et al. in preparation

Biological reproducibility

Soufi et al. in preparation

Proteome dynamics during growth

Soufi et al. in preparation

Dynamics of stress proteins during growth

Soufi et al. in preparation

Estimation of absolute copy numbers

[Figure: growth curve (OD vs. time, min) with sampling points T1-T7; UPS standard (iBAQ)]

Soufi et al. in preparation
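iBAQ (intensity-based absolute quantification) divides the summed peptide intensity of a protein by its number of theoretically observable peptides. Spiking in a standard of known amounts, such as the UPS standard, allows fitting a log-log calibration line that converts the iBAQ values of all other proteins into amounts, and from there into copies per cell. A minimal sketch with made-up numbers; the observable-peptide definition (fully tryptic, 6-30 residues), the cell count and all function names are assumptions for illustration.

```python
import math
import re

AVOGADRO = 6.02214076e23


def observable_peptides(sequence, min_len=6, max_len=30):
    """Number of fully tryptic peptides in the length window (simplified cleavage rule)."""
    peptides = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    return sum(min_len <= len(p) <= max_len for p in peptides)


def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the number of observable peptides."""
    n = observable_peptides(sequence)
    return sum(peptide_intensities) / n if n else float("nan")


def fit_log_calibration(ibaq_values, known_fmol):
    """Least-squares line log10(fmol) = slope * log10(iBAQ) + intercept on the spiked standard."""
    xs = [math.log10(v) for v in ibaq_values]
    ys = [math.log10(v) for v in known_fmol]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def copies_per_cell(ibaq_value, slope, intercept, n_cells):
    """Convert an iBAQ value to molecules per cell via the calibration line."""
    fmol = 10 ** (slope * math.log10(ibaq_value) + intercept)
    return fmol * 1e-15 * AVOGADRO / n_cells


# Made-up standard: three spiked proteins at 1, 10 and 100 fmol
slope, intercept = fit_log_calibration([1e6, 1e7, 1e8], [1, 10, 100])
# A sample protein with iBAQ 1e7 from a hypothetical 1e9 cells
print(round(copies_per_cell(1e7, slope, intercept, n_cells=1e9), 2))
```

Fitting in log space reflects the many orders of magnitude spanned by protein abundances; a linear fit on raw values would be dominated by the most abundant standards.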

Summary of absolutely quantified proteins

                              During growth    Membrane proteins
Identified                    2,292            684
Quantified (all phases)       1,923            588
Absolutely quantified         2,096            494

Soufi et al. in preparation

Most abundant proteins (ES)

Protein                                           Copies per cell (ES)
Elongation factor Tu 1;P-43                       341,047.56
Outer membrane protein A                          313,464.22
Braun lipoprotein                                 216,037.00
Cysteine synthase A;O                             187,791.26
Enolase                                           164,914.38
DNA-binding protein HU-alpha                      136,208.45
Scavengase P20;Thiol peroxidase                   131,599.61
Glyceraldehyde-3-phosphate dehydrogenase A        127,416.09
Malate dehydrogenase                              123,943.77
IDP;Isocitrate dehydrogenase [NADP]               117,787.02
High-affinity zinc uptake system protein znuA     111,748.80
Cadmium-induced protein yodA                      107,098.12
Outer membrane protein C                          106,108.02
50S ribosomal protein L6                          98,724.11
Universal stress protein A                        94,784.63

Soufi et al. in preparation

Dynamic range of protein abundance

Soufi et al. in preparation

[Figure: histogram of protein count vs. log2 protein copy number; blue: all proteins, red: membrane proteins]

Proteogenomics

• Application of tandem mass spectrometry to genome re-annotation
• Search MS/MS spectra against a database containing the complete genome translated in six reading frames
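A six-frame translation reads the genome in three frames on the forward strand and three on the reverse complement, then splits each translated frame at stop codons. The sketch below uses the standard genetic code and returns stop-to-stop segments; it ignores alternative start codons and uses a `min_len` filter as a stand-in for whatever length cut-off a real pipeline would apply. Function names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_segments(genome, min_len=1):
    """(frame, peptide) pairs: each of the six frames translated and split at stop codons."""
    segments = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            for piece in translate(seq[offset:]).split("*"):
                if len(piece) >= min_len:
                    segments.append((f"{strand}{offset + 1}", piece))
    return segments


for frame, peptide in six_frame_segments("ATGGCCTAAGGTTGA"):
    print(frame, peptide)
```

Even this 15-nt toy sequence yields segments in several frames, which illustrates why a whole-genome six-frame database is much larger than the set of predicted ORFs.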

Problem: database size and structure

• Incompatibility with some data processing programs

• Long search times

• Decreased sensitivity of database search

• Unequal target and decoy search spaces

• Most translated frames are in fact decoy sequences

• Overestimation of the FDR

[Figure: search database composition. "Usual" proteomics applications: predicted ORFs plus reversed decoys (REV_Predicted ORFs). Proteogenomics applications: predicted ORFs plus six-frame translations (Frame1-Frame6), each with a reversed decoy counterpart (REV_Predicted ORFs, REV_Frame1-REV_Frame6).]
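In the target-decoy approach, the database is searched together with reversed (REV_) copies, and the FDR at a score threshold is estimated as the number of decoy hits divided by the number of target hits at or above that threshold. This estimate presumes that false target matches occur as often as decoy matches, which is the assumption that unequal target and decoy search spaces undermine. A minimal sketch on made-up scores; the `(score, is_decoy)` input format and function names are illustrative.

```python
def estimated_fdr(threshold, psms):
    """psms: (score, is_decoy) pairs. Estimated FDR = decoy hits / target hits at score >= threshold."""
    targets = sum(1 for score, decoy in psms if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= threshold and decoy)
    return decoys / targets if targets else 0.0


def lowest_passing_threshold(psms, max_fdr=0.01):
    """Lowest observed score at which the estimated FDR is still <= max_fdr."""
    passing = [s for s in {score for score, _ in psms} if estimated_fdr(s, psms) <= max_fdr]
    return min(passing) if passing else None


# Made-up example: 100 target hits (scores 1-100) and two decoy hits
psms = [(float(s), False) for s in range(1, 101)] + [(55.5, True), (20.5, True)]
print(lowest_passing_threshold(psms))  # 56.0
print(estimated_fdr(1.0, psms))        # 0.02
```

If the target space contains many translated frames that behave like random sequence, the target and decoy spaces no longer match in size and character, so this simple ratio no longer tracks the true error rate.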

Proteogenomics of E. coli

• Model Gram-negative bacterium
• Small (4.6 Mb), well-characterized genome
• ~4,300 protein-coding genes (manually annotated and reviewed)
• Comprehensive high-accuracy MS dataset comprising >42,000 unique peptide sequences from >2,600 proteins

• Hypothesis: genome annotation approaches completeness
• Assessment of general properties of a simple proteogenomic experiment

Results I

MQ: MaxQuant; TPP: Trans-Proteomic Pipeline

                                  MQ           TPP
MS/MS spectra acquired            1,941,724    1,941,724
MS/MS spectra identified          370,231      162,028
MS/MS spectra identified (%)      19.1         8.3
Peptide sequences                 33,964       25,724
Novel peptides                    263          59
Decoy peptides                    336          0
Lab contaminant peptides          306          209
E. coli proteins                  2,653        2,524

1.9M peptide mass spectra

Results I

Proteogenomics of E. coli

[Figure, panels A-D: genome views (position in Mb) with tracks for annotated genes, detected peptides and six-frame ORFs at the fepA-fes-ybdZ and yhjA-treF-yhjB loci. Sequences shown: MFEVTFWWRDPQGSEEY...; VGSESWWQSK; TWGYGVTALKVGSESWWQSKHGPEWQRLNDEMFEVTFWWRDPQGSEEY...; MLNQKIQNPNPDELMIEVDLCYELDPYELKLDEMIEAEP...; KPPQIRISL; ...NAVFKPPQIRISL LATNFGGWILMLNQKIQNPNPDELMIEVDLCYELDPYELKLDEMIEAEP... Peptide scores: PEP = 0.027976 (PP = 0.9504); PEP = 4.02E-08 (PP = 0.9999).]

Proteogenomics of E. coli

Krug et al. Mol Cell Proteomics, 2013

Majority of Novel Peptides are False Positives

Results I (Krug et al. Mol Cell Proteomics, 2013)

Assessment of Processing Workflows

Results I (Krug et al. Mol Cell Proteomics, 2013)

Deep Proteome Coverage of Escherichia coli

20-fold base coverage of 27.5% of the genome sequence

[Figure: distribution of MS/MS scans; mean: 20 scans, median: 7 scans]

Results I (Krug et al. Mol Cell Proteomics, 2013)
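Coverage figures of this kind can be computed along the following lines, assuming each identified peptide has been mapped to genomic coordinates and contributes its number of MS/MS scans to every base it spans. The exact definition used in the study may differ, and the `(start, end, n_scans)` input format is a hypothetical choice for illustration.

```python
def genome_coverage(genome_length, mapped_peptides):
    """Per-base spectral coverage of a genome.

    mapped_peptides: (start, end, n_scans) tuples in 0-based, end-exclusive
    nucleotide coordinates; each base spanned by a peptide receives its scan count.
    Returns (fraction of bases covered, mean depth over the covered bases).
    """
    depth = [0] * genome_length
    for start, end, n_scans in mapped_peptides:
        for pos in range(start, end):
            depth[pos] += n_scans
    covered = [d for d in depth if d > 0]
    fraction = len(covered) / genome_length
    mean_depth = sum(covered) / len(covered) if covered else 0.0
    return fraction, mean_depth


# Toy 100-nt "genome" with two overlapping peptides
fraction, mean_depth = genome_coverage(100, [(0, 30, 10), (15, 45, 30)])
print(fraction, round(mean_depth, 1))  # 0.45 26.7
```

Reporting the covered fraction and the depth separately keeps the two quantities apart: a small region sequenced very often and a large region seen once would otherwise look alike.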

Conclusions

• proteomics has reached the analytical capacity to identify and quantify all gene products in microorganisms grown in culture

• several regulatory protein modifications (e.g. S/T/Y phosphorylation, lysine acetylation) can routinely be analyzed on a global scale

• many challenges ahead:
  • analysis of H/D phosphorylation
  • analysis of environmental samples
  • coverage of genome/protein sequence by detected peptides

• future developments:
  • faster MS/MS acquisition
  • smarter acquisition software
  • large-scale targeted proteomics
  • metaproteomics and individual proteomics

Acknowledgements

Proteome Center Tuebingen: Boumediene Soufi, Nelson C. Soares, Philipp Spät, Karsten Krug, Alejandro Carpy, Sasa Popic, Silke Wahl

Funding