Center for Mass Spectrometry and Proteomics | Phone | (612)625-2280 | (612)625-2279

© 2015 Regents of the University of Minnesota. All rights reserved.

Proteomics Data Analysis using Galaxy-P

Center for Mass Spectrometry and Proteomics

November 23rd 2015 Pratik Jagtap

http://www.cbs.umn.edu/msp

Proteomics Data Analysis using Galaxy-P
•  Proteomics Workflow
•  Search Databases
•  Galaxy Platform
•  Generating a Database within GalaxyP
•  Peaklist Conversion
•  Search algorithms
•  Using search algorithms within GalaxyP
•  Protein Inference
•  Using PeptideShaker within GalaxyP

Documentation: http://z.umn.edu/augworkshopgalaxyp

PROTEOMICS WORKFLOW

Eng et al. 2011. Mol Cell Proteomics 10(11): R111.009522.

SEARCH DATABASES

Mass spectrum + Reference Protein Database (from genomic annotation) → Peptide Spectral Match

PROTEOMIC DATABASES

Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality, annotated and non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions. http://en.wikipedia.org/wiki/Swiss-Prot

TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation. The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in TrEMBL. http://en.wikipedia.org/wiki/TrEMBL

CUSTOMIZED PROTEOMIC DATABASES

Sources and the processing applied to each:
•  Genomic DNA sequences: six-frame translation.
•  Expressed sequence tags / cDNA sequences: three-frame translation.
•  Metagenomic databases: translation.
•  RNASeq data: translation and database reduction workflows.
•  Proteomic databases.
•  Customized database repositories (CPTAC / UniMesh).

LOOKING BEYOND THE KNOWN PROTEOME

Mass spectrum matched against:
•  Reference Protein Database from genomic annotation
•  Cancer / disease-related databases such as COSMIC, IARC p53, OMIM…
•  Deep genome sequencing data from ICGC, TCGA and CPTAC
•  RNASeq data (customized or combined)
•  6-frame DNA sequences
•  3-frame cDNA sequences

Identification of peptides corresponding to novel proteoforms.
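
The 6-frame and 3-frame translations mentioned above can be illustrated with a short Python sketch. This is not the Galaxy-P translation tool. It assumes the standard genetic code, marks stop codons as `*` and any codon containing an ambiguous base as `X`, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# no organism-specific or mitochondrial code is needed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def three_frame(dna):
    """Translate the forward strand in reading frames 0, 1 and 2."""
    dna = dna.upper()
    return [translate(dna[offset:]) for offset in range(3)]


def six_frame(dna):
    """Translate three forward frames, then three reverse-complement frames."""
    dna = dna.upper()
    return three_frame(dna) + three_frame(reverse_complement(dna))
```

A cDNA sequence already has a known strand, which is why three frames suffice there, while genomic DNA needs all six.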

GALAXY PLATFORM

Benefits of Galaxy
•  A web-based bioinformatics data analysis platform.
•  Software accessibility and usability.
•  Share-ability of tools, workflows and histories.
•  Reproducibility and ability to test and compare results after using multiple parameters.
•  Software tools can be used in a sequential manner to generate analytical workflows that can be reused, shared and creatively modified for multiple studies.

Goecks J et al. Genome Biol. 2010;11(8):R86.

TOOLS & WORKFLOWS

•  Software tools can be used in a sequential manner to generate analytical workflows that can be reused, shared and creatively modified for multiple studies.

For example, Protein Database Downloader downloads UniProt protein FASTA databases of various organisms.

Galaxy-P: https://galaxyp.msi.umn.edu/

INPUTS: Mass spectral data and search database.

The dataset will be searched against a FASTA database containing human proteins, contaminant proteins, spiked-in proteins and a subset of the 3-frame translated cDNA database from Ensembl.

INPUTS:
a) MGF files from MGF Formatter (dataset collection).
b) ABRF-Spike4: FASTA sequences of the 4 spiked-in proteins.
c) FASTA file from Ensembl searches: subset of the 3-frame translated cDNA database from Ensembl (our template for identifying novel proteoforms).
d) Human UniProt FASTA file + contaminant proteins.

Sample and data processing:
HeLa cell lysate → 4 proteins spiked in (10 fmol each) → digested overnight with trypsin → liquid chromatography fractionation (10 fractions) → Thermo Finnigan Orbitrap Velos (Orbitrap MS, MS/MS HCD) → RAW files → msconvert → mzML files → MGF files
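
The search database described above combines several FASTA sources (human UniProt proteins, contaminants, spiked-in proteins and translated cDNA sequences). A minimal Python sketch of that merging step is shown below. It is not the tool used in the Galaxy-P workflow: the function names are hypothetical, and dropping records with an identical sequence is an assumption made here to keep the example small.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_fasta(*sources):
    """Concatenate FASTA sources, keeping the first record per distinct sequence."""
    seen, merged = set(), []
    for source in sources:
        for header, sequence in read_fasta(source):
            if sequence not in seen:
                seen.add(sequence)
                merged.append((header, sequence))
    return merged


def write_fasta(records, width=60):
    """Format (header, sequence) records as FASTA text with wrapped lines."""
    out = []
    for header, sequence in records:
        out.append(">" + header)
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(out) + "\n"
```

Each source can be any iterable of lines, for example an open file handle for each FASTA file in the history.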

GENERATING A DATABASE

•  Log in using your MSI login and password.
•  Click on http://z.umn.edu/history1, import the history and click on ‘start using this history’.
•  Click on http://z.umn.edu/workflow1 and choose import to copy the workflow into your user workflows. On the confirmation screen, select ‘start using this workflow’ to navigate to your user workflows.
•  In the workflows menu, select Run Workflow 1 from the drop-down menu.
•  Assign each input database from History 1 to the corresponding input of the workflow and ‘Run’ the workflow.

WORKFLOW 1

Tools used in the workflow

INPUT: Select History 1 → Import history → Start using this history

WORKFLOW: Select Workflow 1 → Import workflow → Start using this workflow → Run Workflow 1

http://z.umn.edu/history2

MASS SPECTRAL DATA

Eng et al. 2011. Mol Cell Proteomics 10(11): R111.009522.

RAW DATA CONVERSION TOOL

.RAW → msconvert (ProteoWizard) → mzML → MGF Formatter → MGF

msconvert: http://z.umn.edu/msconvert
MGF Formatter: http://z.umn.edu/mgfformatter
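
MGF (Mascot Generic Format) peak lists are plain text, which makes the output of the conversion step easy to inspect. The Python sketch below reads spectra delimited by `BEGIN IONS` / `END IONS`. It is a simplified reader written for illustration, not the MGF Formatter tool: it assumes `KEY=value` parameter lines and whitespace-separated `m/z intensity` peak lines, and ignores anything outside a spectrum block.

```python
def parse_mgf(lines):
    """Parse MGF lines into a list of {'params': {...}, 'peaks': [...]} dicts."""
    spectra, current = [], None
    for line in lines:
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


example = """BEGIN IONS
TITLE=scan_1
PEPMASS=445.12 1000.0
CHARGE=2+
100.1 50.0
200.2 75.5
END IONS
""".splitlines()

spectra = parse_mgf(example)
print(len(spectra), spectra[0]["params"]["CHARGE"], len(spectra[0]["peaks"]))
```

The values in `example` are made up for the illustration and do not come from the workshop dataset.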

Click on http://z.umn.edu/history2b, import the history and click on ‘start using this history’.

PROTEOMICS WORKFLOW

A face in the crowd: recognizing peptides through database search. Eng et al. 2011. Mol Cell Proteomics 10(11).

DATABASE SEARCH

Mass spectrum + Reference Protein Database (from genomic annotation) → Peptide Spectral Match

DATABASE SEARCH

Nesvizhskii et al. Nature Methods 4, 787-797 (2007)

SEARCHGUI

Vaudel M. et al. Proteomics (2011) 11(5)
https://code.google.com/p/searchgui/

MULTIPLE SEARCH ALGORITHMS

•  MyriMatch: Tabb et al., J. Proteome Res., 2007, 6(2)
•  Comet: Eng et al., Proteomics, 2013, 13(1)
•  MS-GF+: Kim and Pevzner, Nat Commun., 2014, 5(1)
•  OMSSA: Geer et al., J Proteome Res., 2004, 3(5)
•  X!Tandem: Craig and Beavis, Bioinformatics, 2004, 20(9)
•  MS Amanda: Dorfer et al., J Proteome Res., 2014, 13(8)

Click on http://z.umn.edu/history3b, import the history and click on ‘start using this history’.

SEARCHGUI PARAMETERS

Identification Algorithms: OMSSA, MS-GF+ and Comet

Database Search Parameters
1: Precursor Accuracy Unit: ppm
2: Precursor Ion m/z Tolerance: 10.0
3: Fragment Ion m/z Tolerance: 0.01
4: Enzyme: Trypsin
5: Number of Missed Cleavages: Not implemented
6: Database: input_database.fasta
7: Forward Ion: b
8: Rewind Ion: y
9: Fixed Modifications: mmts on c
10: Variable Modifications: oxidation of m
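
Parameters 1 and 2 above set a precursor tolerance of 10.0 ppm. Because ppm is a relative unit, the absolute m/z window grows with the precursor m/z. The Python sketch below shows the arithmetic only. The function names are hypothetical, and each search engine applies its own matching logic that this does not reproduce.

```python
def ppm_window(mz, tolerance_ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm):
    """True if the observed m/z lies within the ppm tolerance of the theoretical m/z."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


# At m/z 500 a 10 ppm tolerance is +/- 0.005; at m/z 1000 it is +/- 0.010.
low, high = ppm_window(500.0, 10.0)
print(round(low, 4), round(high, 4))
print(within_ppm(500.004, 500.0, 10.0))   # 8 ppm error
print(within_ppm(500.006, 500.0, 10.0))   # 12 ppm error
```

The slide gives no unit for the fragment ion tolerance of 0.01, so the sketch covers only the precursor setting.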

PEPTIDESHAKER

Vaudel et al Nature Biotechnology, 33, (2015)

http://galaxyproteomics.github.io/peptideshaker/

PEPTIDESHAKER: PROTEIN INFERENCE

Slide from Alexey Nesvizhskii's talk at http://www.scivee.tv/node/12671

Click on http://z.umn.edu/history4b, import the history and click on ‘start using this history’.

4.3 Peptide Shaker in GalaxyP

PEPTIDESHAKER : TARGET-DECOY SEARCH
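
In a target-decoy search, spectra are matched against real (target) and decoy sequences, and the number of decoy hits above a score cutoff is used to estimate how many target hits are false. The Python sketch below illustrates that idea with the simple estimate FDR ≈ decoys / targets. It is not PeptideShaker's implementation: the function name, the input layout and the assumption that a higher score means a better match are all choices made for this example.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Return the lowest score at which the estimated FDR is still <= max_fdr.

    psms is a list of (score, is_decoy) pairs, where a higher score is better.
    The FDR at each rank is estimated as decoys / targets among all matches
    scoring at least as well. Returns None if no cutoff satisfies max_fdr.
    Ties in score are not treated specially in this sketch.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Made-up scores for illustration: four target hits and one decoy hit.
psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(fdr_threshold(psms, max_fdr=0.2))
print(fdr_threshold(psms, max_fdr=0.25))
```

With a 20% limit only the two best matches pass, because admitting the decoy at score 8 raises the estimate above the limit until enough further targets accumulate.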

PEPTIDESHAKER: OUTPUTS

http://z.umn.edu/augworkshopgalaxyp

Complex Workflows

Galaxy-P provides an integrated platform for every step of proteogenomic analysis.
•  Build target database: download and translate EST databases or perform gene prediction with Augustus.
•  Numerous tools for identification and text manipulation.
•  Workflow utilizing BLAST to identify novel peptides.
•  Tool to assess peptide-spectrum matches and visualize spectra.
•  Visualize identified peptides on the genome.
•  140 steps: seamless, integrated proteogenomic workflow.

Flexible and accessible workflows for improved proteogenomic analysis using Galaxy framework. J. Proteome Res., DOI: 10.1021/pr500812t Link: z.umn.edu/pgfirstlook

Links to workflows, webcast, pages, documentation and publications.

Workflows
•  Proteogenomic studies: http://z.umn.edu/pg140
•  Metaproteomic studies: http://z.umn.edu/metaproteomics1

Webcast
•  Using ProteinPilot within Galaxy-P: z.umn.edu/ppingp

Pages
•  Proteogenomics page: z.umn.edu/proteinpilotpage
•  Metaproteomics page: z.umn.edu/metaproteomicspage

Workshop / Tutorial on proteogenomics
•  Mass Spectrometry-based Proteomics Data Analysis using Galaxy-P: z.umn.edu/gcc2015gp

Manuscripts

•  Metaproteomic analysis using the Galaxy framework. Proteomics. (2015) doi: 10.1002/pmic.201500074. PMID: 26058579.

•  Multi-omic data analysis using Galaxy. Nat Biotechnol. (2015) 33(2):137-9. doi: 10.1038/nbt.3134. PMID: 25658277

•  Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J Proteome Res. (2014)13(12):5898-908. doi: 10.1021/pr500812t. PMID:25301683

•  Proteomic profiles in acute respiratory distress syndrome differentiates survivors from non-survivors. PLoS One. (2014) 7;9(10):e109713. doi: 10.1371/journal.pone.0109713. PMID: 25290099

•  Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations. BMC Genomics. (2014) 15:703. doi: 10.1186/1471-2164-15-703. PMID: 25149441. Authors: Sheynkman GM, Johnson JE, Jagtap PD, Shortreed MR, Onsongo G, Frey BL, Griffin TJ, Smith LM.

QUESTIONS?

Follow us on twitter.com/usegalaxyp

Visit http://usegalaxyp.org

or http://galaxyp.msi.umn.edu
