The ProteomeXchange Consortium: 2016 update Dr. Juan Antonio Vizcaíno Proteomics Team Leader EMBL-European Bioinformatics Institute Hinxton, Cambridge, UK

ProteomeXchange update HUPO 2016


EMBL-EBI Now and in the Future


Juan A. [email protected] 2016 World Conference, Taipei, 20 September 2016


PSI Spring Meeting 2017

Beijing Proteome Research Center, China
April 24-26, 2017
April 23: 2nd PHOENIX Mini-Symposium on Frontiers of Proteomics
April 27: Hiking the Great Wall

Focus topics:
Quality control: qcML
Proteogenomics formats
proXI: proteomics eXpression Interface
Privacy and Proteomics Data


Overview

General introduction to ProteomeXchange

Overall submission statistics

Updated HPP guidelines

Specifics about MassIVE (Nuno)


ProteomeXchange: A Global, distributed proteomics database

PASSEL (SRM data)
PRIDE (MS/MS data)
MassIVE (MS/MS data)
jPOST (MS/MS data) [New in 2016]

Data types: Raw, ID/Q, Meta

Mandatory raw data deposition since July 2015

Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

http://www.proteomexchange.org

ProteomeXchange data workflow

[Figure: data workflow diagram. Recoverable labels:]
Researchers' results: raw data, metadata
Datasets: MS/MS data (as complete submissions); SRM data; any other workflow (mainly partial submissions); reprocessed results
Receiving repositories: PRIDE, PASSEL, MassIVE, jPOST
ProteomeCentral: metadata / manuscript, raw data, results
Users of the data: journals, PeptideAtlas, research groups (reanalysis of datasets)

ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
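As a minimal sketch of how a single dataset page on the portal might be addressed, the helper below builds a link from a PXD accession. The `ID` query parameter and the "PXD plus six digits" pattern are assumptions made for illustration (the slide only gives the base URL), and `dataset_url` is a hypothetical helper name, not part of any ProteomeXchange tooling.

```python
import re
from urllib.parse import urlencode

# Base URL as given on the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumption: ProteomeXchange accessions look like "PXD" followed by six digits,
# as in the identifiers quoted elsewhere in this talk (e.g. PXD000561).
ACCESSION_PATTERN = re.compile(r"^PXD\d{6}$")


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral link for one dataset (hypothetical helper).

    Assumption: the dataset is selected with an `ID` query parameter.
    """
    if not ACCESSION_PATTERN.match(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000561"))
```

Validating the accession before building the link keeps legacy PRIDE identifiers (the PRD prefix) from being sent to the portal by mistake.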


ProteomeXchange data workflow

[Figure: data workflow diagram including downstream resources. Recoverable labels:]
Researchers' results: raw data, metadata
Datasets: MS/MS data (as complete submissions); SRM data; any other workflow (mainly partial submissions); reprocessed results
Receiving repositories: PRIDE, PASSEL, MassIVE, jPOST
ProteomeCentral: metadata / manuscript, raw data, results
Users of the data: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, proteomicsDB, other DBs, research groups (reanalysis of datasets)
OmicsDI: integration with other omics datasets

OmicsDI: Portal for omics datasets
http://www.ebi.ac.uk/Tools/omicsdi/

Aims to integrate omics datasets (proteomics, transcriptomics, metabolomics and genomics at present).

Proteomics: PRIDE, MassIVE, jPOST, PASSEL, GPMDB
Transcriptomics: ArrayExpress, Expression Atlas
Metabolomics: MetaboLights, Metabolomics Workbench, GNPS
Genomics: EGA

Perez-Riverol et al., 2016, bioRxiv

PRIDE Proteomes provide an across-dataset and quality-filtered view on PRIDE Archive data. Good PSMs are assessed using the PRIDE Cluster approach, based on spectral clustering.


ProteomeXchange: 4,534 datasets up until 31st July, 2016

Type:
4067 PRIDE
339 MassIVE
115 PeptideAtlas/PASSEL
13 jPOST

Publicly accessible: 2597 datasets, 57% of all
2334 PRIDE
135 MassIVE
115 PASSEL
13 jPOST

Datasets/year:
2012: 102
2013: 527
2014: 963
2015: 1758
2016 (till end of July): 1184

Top species studied by at least 100 datasets:
2010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
936 reported taxa in total

Countries with at least 100 datasets:
1105 USA
546 Germany
411 United Kingdom
356 China
229 France
188 Netherlands
178 Canada
150 Switzerland
125 Australia
123 Spain
123 Denmark
117 Japan
101 Sweden
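The totals quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures given above; the variable and function names are illustrative and not part of any ProteomeXchange tooling.

```python
# Figures copied from the submission statistics above (up to 31 July 2016).
by_repository = {"PRIDE": 4067, "MassIVE": 339, "PeptideAtlas/PASSEL": 115, "jPOST": 13}
public_by_repository = {"PRIDE": 2334, "MassIVE": 135, "PASSEL": 115, "jPOST": 13}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1758, 2016: 1184}

total = sum(by_repository.values())
public_total = sum(public_by_repository.values())
public_share = public_total / total  # fraction of datasets that are public


def repository_shares(counts):
    """Return each repository's share of all datasets, as a percentage."""
    n = sum(counts.values())
    return {name: round(100 * count / n, 1) for name, count in counts.items()}


if __name__ == "__main__":
    print(f"total={total}, public={public_total} ({public_share:.0%})")
    print(f"sum over years={sum(by_year.values())}")
    print(repository_shares(by_repository))
```

The per-repository counts, the public counts and the per-year counts each add up to the totals stated on the slide (4,534 datasets, 2,597 of them public).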

Datasets are being reused more and more.

Data download volume for PRIDE in 2015: ~ 200 TB

Vaudel et al., Proteomics, 2016


HPP guidelines version 2.1

Complete vs Partial submissions: processed results

[Figure panels: Complete, Partial]

For complete submissions, it is possible to connect the spectra with the processed identification results, and they can be visualized.

Complete vs Partial submissions: experimental metadata

[Figure panels: Complete, Partial]

General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is less detailed.

An observer of the ProteomeXchange consortium: iProX

Proteome data sharing platform in China

Focusing on:
Collection and sharing of proteome experiment raw data
Standardized metadata of proteome experiments
Visualization of proteome datasets

Providing:
A user-friendly data submission pipeline
Structured management of datasets
An effective user authority system
Standardized metadata collection
Powerful computing, storage, and network resources to support the pipeline
Remote data backup and synchronous update

www.iprox.org


MassIVE update

Mingxun Wang (1,2,4), Jeremy Carver (1,4), Nuno Bandeira (1-4)

1 Center for Computational Mass Spectrometry
2 Computer Science and Engineering
3 Skaggs School of Pharmacy and Pharmaceutical Sciences
4 University of California, San Diego

http://massive.ucsd.edu
http://proteomics.ucsd.edu

MassIVE Interactivity

MassIVE = Mass spectrometry Interactive Virtual Environment

Massive reanalysis

Community knowledge requires reproducible, well-characterized results

MS-GF+ standard database search
Reanalyzed 15 TB of Human data with ~185M MS/MS spectra
79 million new FDR-controlled PSMs
3.6 million modified versions of 2.8 million unique peptide sequences

CPTAC colon cancer available with 5 different results sets
[Original] Imported CPTAC results: 6.9M PSMs
[Reanalysis] MS-GF+ database search: 8.9M PSMs, 70k mod variants (169k total)
[Reanalysis] Spectral library search (MSPLIT): 10M PSMs, including 387K mixture spectra
[Reanalysis] Proteogenomics searches of TCGA transcriptomics sequences (Enosi): 6.8M total PSMs, 19,728 proteogenomic events
[Reanalysis] Blind modification search (MODa): 7.8M PSMs, 2.8M PSMs for 221k mod variants (306k total), 203K new mod variants (unique modified peptides)

Massive: Do it yourself

MSGF+ - Database search engine
MSPLIT - Spectral Library Search Engine
ENOSI - ProteoGenomic Search Engine
MODa - Multi-blind modification database search engine
Spectral Networks - spectral alignment-based analysis and propagation of identifications
Multi-pass - MSPLIT, MSGFDB, MODa cascade Search Workflow
MSGFDB - Database search engine
MSPLIT-DIA - Spectral Library Search for SWATH
Upload your own! (mzIdentML, mzTab, TSV)

Massive Search

Check what others think the spectrum is

Find peptides, proteins, PTMs
Agreement in spectrum identification?
One-stop search across tens of millions of PSMs

[Figure panels: Original, Reanalysis]

What can you do?

How can the community work together to reveal the whole human proteome?

Mass spectrometrists share Data
At least: partial submissions with raw mass spectrometry data and enough metadata to allow for reanalysis
Especially useful: rare tissues/conditions or very deep acquisition

Biologists share Knowledge
At least: complete submissions with FDR-filtered results in open format (mzIdentML or mzTab)
Especially useful: human-curated knowledge of proteins, PTMs, endogenous peptides, etc.

Bioinformaticians share Reanalyses
At least: FDR-filtered results in open format (mzIdentML or mzTab)
Especially useful: algorithms that identify new types of PSMs (e.g., PTM-specific, mixtures)

Acknowledgements: People

Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak

Former team members, especially:
Rui Wang
Florian Reisinger
Noemi del Toro
Jose A. Dianes
Henning Hermjakob

Acknowledgements: The PRIDE Team and all PX partners
All data submitters!

Eric Deutsch
Zhi Sun
David Campbell
Nuno Bandeira
Mingxun Wang
Jeremy Carver
Yasushi Ishihama
Shujiro Okuda
Shin Kawano

Follow new datasets @proteomexchange

PXD identifier    Hits / No. files = dataset downloads    Dataset title

PXD000561         46578 / 2383 = 20                       A draft map of the human proteome
PXD001587         13435 / 140 = 96                        DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics
PRD000066         12748 / 4090 = 3                        Quantitative Proteomics Analysis of the Secretory Pathway
PXD000658         4004 / 460 = 9                          Global phosphoproteomic profiling reveals distinct signatures in B-cell non-Hodgkin
PXD000149         3781 / 598 = 6                          The potato tuber mitochondrial proteome
PXD000865         12535 / 1368 = 9                        Mass spectrometry based draft of the human proteome
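The "dataset downloads" column above appears to be the number of file hits divided by the number of files in the dataset, rounded to the nearest integer. The sketch below reproduces that arithmetic from the figures in the table; `estimated_downloads` is a hypothetical helper name, and treating the ratio as a download estimate is an assumption taken from the column heading.

```python
# (identifier, hits, number of files) copied from the table above.
rows = [
    ("PXD000561", 46578, 2383),
    ("PXD001587", 13435, 140),
    ("PRD000066", 12748, 4090),
    ("PXD000658", 4004, 460),
    ("PXD000149", 3781, 598),
    ("PXD000865", 12535, 1368),
]


def estimated_downloads(hits: int, n_files: int) -> int:
    """Approximate whole-dataset downloads as file hits per file, rounded."""
    return round(hits / n_files)


if __name__ == "__main__":
    for identifier, hits, n_files in rows:
        print(identifier, estimated_downloads(hits, n_files))
```

Normalising by file count keeps datasets with thousands of files from dominating the ranking purely because each download touches more files.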