The mzTab data standard format for reporting MS-based peptide, protein and small molecule identification and quantification results

This is the talk I gave in HUPO 2014 on behalf of Johannes Griss about the mzTab data standard format.

mzTab - Reporting MS-based Proteomics and Metabolomics Results

Dr. Juan A. Vizcaíno on behalf of

Dr. Johannes Griss

Proteomics Services Team

EMBL-EBI

Hinxton, Cambridge, UK

Division of Immunology, Allergy and Infectious Diseases

Department of Dermatology

Medical University of Vienna, Austria

Johannes [email protected]

HUPO 2014

Overview

• Need for mzTab

• Details about the data format (mzTab 1.0)

• Existing software implementations

• Extension of mzTab 1.0 for metabolomics

HUPO Proteomics Standards Initiative

www.psidev.info

• Develops data format standards for proteomics.

• Both data representation and annotation standards.

• Involves data producers, database providers, software producers, publishers, …

• Active Workgroups: MI, MS, PI, Mod, (Protein Separation).

• Inter-group activities: MIAPE and Controlled Vocabularies.

• Started in 2002, so some experience already…

PSI-MS/PI Standard File Formats before mzTab

• TraML: SRM

• mzQuantML: Quantitation

• mzIdentML: Identification

• mzML: MS data

Reasons for an additional file format (mzTab)

• mzIdentML and mzQuantML (necessarily) focus on complete representation of proteomics results

• Complex XML-based file formats

• Specialised software required for visualisation

• In-depth bioinformatics understanding required to create and use files

• No simple method to communicate final results to non-proteomics experts

• No simple method to utilise files through scripting languages and standard statistical software

mzTab – Aims

• Store final results of MS-based experiment in a single file

• Quantitation data

• Identification data

• Small Molecule data

• Reduce complexity to make data accessible to researchers who are not proteomics or bioinformatics experts

• Be easily accessible using “standard” software

mzTab – Aims

• What the format does NOT aim at:

• Replace mzIdentML or mzQuantML for proteomics approaches

• Contain the complete data of an MS-based experiment

• Provide fully detailed evidence for the data

• Allow a researcher to recreate the process which led to the results

Why a tab-delimited file?

• Using XML-based formats requires sophisticated bioinformatics expertise

• Many researchers still use MS Excel to “look” at or exchange their data.

• Standard tab-delimited file formats for transcriptomics (MAGE-TAB) and molecular interactions (MI-TAB) data were already successful

mzTab format

http://mztab.googlecode.com

mzTab - Sections

• Metadata: basic information about experiment and sample (key-value pairs)

• Protein: basic information about protein identifications (table-based)

• Peptide: information about quantified peptides (table-based)

• PSM: information about identified spectra (table-based)

• Small Molecule: basic information about identified small molecules (table-based)
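
The section layout above can be read with a few lines of a scripting language. The following is a minimal sketch, assuming the three-letter line prefixes of the mzTab 1.0 specification (MTD for metadata, PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and rows of each table, COM for comments). The function name `split_sections` and the example file content are hypothetical and heavily truncated; real files carry many more columns.

```python
from collections import defaultdict

# Line prefixes from the mzTab 1.0 specification: each table has a header
# prefix (e.g. PRH) and a row prefix (e.g. PRT).
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def split_sections(text):
    """Split mzTab text into a metadata dict and per-section row lists."""
    metadata = {}
    columns = {}                # row prefix -> column names from the header line
    tables = defaultdict(list)  # row prefix -> list of {column: value} dicts
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            columns[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in columns:
            tables[prefix].append(dict(zip(columns[prefix], fields[1:])))
        # COM (comment) lines, blank lines and anything else are skipped.
    return metadata, dict(tables)


# Hypothetical, truncated file content (illustrative only, not from the talk).
EXAMPLE = "\n".join([
    "COM\tIllustrative only, not taken from the talk",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tHypothetical protein",
])

metadata, tables = split_sections(EXAMPLE)
print(metadata["mzTab-mode"], len(tables["PRT"]))
```

Only plain string splitting is used, which reflects the aim of being usable with “standard” software; a spreadsheet or statistics package could read the same rows directly.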

Metadata section - Example

mzTab – Modes and Types

• Modes (depending on the level of detail):

• ‘Summary’: only the ‘final results’.

• ‘Complete’: detailed information for each individual assay or replicate is provided.

• Types:

• ‘Identification’: Only identification results.

• ‘Quantification’: Quantification results; these files can also contain identification results.

• Overall, 4 different file “flavors” are possible (2 modes × 2 types), so the design is very flexible.
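
The four flavors are simply the combinations of the two modes and the two types. A minimal sketch of that arithmetic follows, assuming the mode and type are stored under the metadata keys `mzTab-mode` and `mzTab-type` as in the mzTab 1.0 specification; the helper name `flavor` is hypothetical.

```python
from itertools import product

MODES = ("Summary", "Complete")
TYPES = ("Identification", "Quantification")

# All (mode, type) combinations: 2 modes x 2 types = 4 flavors.
FLAVORS = list(product(MODES, TYPES))


def flavor(metadata):
    """Return the (mode, type) pair of a file, or raise if it is not valid."""
    pair = (metadata.get("mzTab-mode"), metadata.get("mzTab-type"))
    if pair not in FLAVORS:
        raise ValueError(f"not a valid mode/type combination: {pair}")
    return pair


for mode, file_type in FLAVORS:
    print(mode, file_type)
```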

Protein Section (label-free)

Peptide Section (label-free)

• Only used in “Quantification” files.
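
This rule lends itself to a simple consistency check. The sketch below flags a file declared as ‘Identification’ that nevertheless carries peptide lines, assuming the PEH/PEP line prefixes and the `mzTab-type` metadata key of the mzTab 1.0 specification. The function name is hypothetical, and this is not a substitute for the mzTab Validator.

```python
def peptide_section_is_consistent(lines):
    """Return False if an 'Identification' file contains peptide (PEH/PEP) lines."""
    file_type = None
    has_peptides = False
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "MTD" and len(fields) >= 3 and fields[1] == "mzTab-type":
            file_type = fields[2]
        elif fields[0] in ("PEH", "PEP"):
            has_peptides = True
    return not (has_peptides and file_type == "Identification")


# Hypothetical, truncated input: an Identification file with a peptide row.
suspicious = [
    "MTD\tmzTab-type\tIdentification",
    "PEH\tsequence",
    "PEP\tPEPTIDEK",
]
print(peptide_section_is_consistent(suspicious))
```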

PSM section (identification data)

mzTab – Current implementations

• jmzTab (Java API): version 3.0 is now stable. Manuscript published in the journal Proteomics.

• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).

• mzIdentML and mzQuantML to mzTab converters (Andy Jones group).

• MaxQuant: a beta version of the exporter is available.

• OpenMS (version 1.10).

• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).

• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).

• MetaboLights (EBI).

mzTab – ongoing development

• More detailed modelling of MS metabolomics data

• Led by S. Neumann (COSMOS EU FP7 project).

• Extension of the small molecule part from one to three sections.

Example file exists at https://github.com/sneumann/mtbls2/faahKO.mzTab

http://www.cosmos-fp7.eu/

mzTab format related publications

http://code.google.com/p/mztab/

J. Griss et al., MCP, 2014

Q.W. Xu et al., Proteomics, 2014

Current PSI-MS/PI Standard File Formats

• mzTab: Final Results

• TraML: SRM

• mzQuantML: Quantitation

• mzIdentML: Identification

• mzML: MS data

Acknowledgements

Johannes Griss, Qing-Wei Xu, Henning Hermjakob

Timo Sachsenberg, Mathias Walzer, Oliver Kohlbacher

http://mztab.googlecode.com

Andy Jones

S. Neumann and other COSMOS partners

PSI editor and reviewers… and many others have also contributed

BBSRC PROCESS grant, BBSRC ProteoSuite grant