10. Standards in Proteomics

MS bioinformatics analysis for proteomics

Salvador Martínez de Bartolomé
smartinez@proteored.org
Bioinformatics support – ProteoRed
Proteomics Facility, National Center for Biotechnology, Madrid

Index

• Need of standards in Proteomics
• HUPO-PSI
  – Organization
  – Standard data formats
  – CVs
  – MIAPEs
• PEFF: A Common Sequence Database Format in Proteomics
• PRIDE
• Standard data format converters

Need of standards in Proteomics

Proteomics data is often made available only as arbitrarily formatted PDF tables, which carries important limitations:

• Source data (mass spectra) are not made available
• No peer review validation is possible
• Very little raw material is available for testing innovative in silico techniques
• Automated (re-)processing of the identifications is impossible (eliminating objective technique comparison)

Thoughts on Standards

• Bradshaw RA, Burlingame AL, Carr S, Aebersold R. Reporting protein identification data: the next generation of guidelines. Mol Cell Proteomics. 2006 May;5(5):787-8.
• Wilkins et al. Guidelines for the next 10 years of proteomics. Proteomics. 2006 Jan;6(1):4-8.
• Nature Biotechnology 2006, Nov:
  – Editorial: Standards Operating Procedures
  – Burgoon LD. The need for standards, not guidelines, in biological data reporting and sharing.
  – Ball C. Are we stuck in standards?
• Nature Biotechnology: Planned focus issue and Community Consultation on Standards: http://www.nature.com/nbt/consult/index.html

Need of standards in Proteomics

• Proteomics: no standardized reporting, no standard database submission

• Proteomics data is generated at a high rate, and lost at a high rate

• Experiments are repeated unnecessarily, so the field advances more slowly than necessary

Need of standards in Proteomics

• Standards are needed to:
  – Exchange data
  – Compare data
  – Review data
  – Reproduce results
  – Store data

HUPO PSI: Proteomics Standards Initiative

http://www.psidev.info

Meetings: http://www.psidev.info

HUPO PSI: Proteomics Standards Initiative (http://psidev.info)

• Open community initiative
• Develops data format standards
• Develops data representation and annotation standards
• Involves data producers, database providers, software producers and publishers

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification.

Orchard, S., Hermjakob, H. and Apweiler, R. The proteomics standards initiative. Proteomics 2003, 3 (7).

HUPO PSI structure

The main unit is the workgroup:
• Gel Electrophoresis
• Molecular Interactions
• Sample Processing
• Mass Spectrometry
• Proteomics Informatics (MS oriented)
• Protein Modifications

Transversal activities:
• One Steering Group
• Controlled vocabulary
• MIAPE guidelines

HUPO PSI structure

• Annual workshop, activity reports at the annual HUPO meeting, conference calls, dedicated workshops

• No permanent funding; active members work in their “spare time”

• Website (http://psidev.info) and mailing lists

• PSI document process: Vizcaino, J.A., Martens, L., Hermjakob, H., Julian, R.K. and Paton, N.W. (2007) The PSI formal document process and its implementation on the PSI website. Proteomics 7: 2355-2357.

HUPO PSI document process

[Flowchart; the stages recoverable from the slide, in order:]

1. Candidate Recommendation submitted to the PSI Editor
2. PSI Editor reviews the draft
   – Revise: PSI Editor returns the draft
   – Pass: PSI Editor submits the draft to the PSI-SG
3. 15-day PSI-SG comment; PSI Editor reviews the comments
   – Revise
   – Pass: PSI Editor posts and announces the PSI Working Draft Proposal (PWD-R.P)
4. 30-day public comment; PSI Editor reviews the comments
   – Revise: PSI Editor returns the draft and removes the PWD from the index
   – Pass: PSI Editor posts and announces the PSI Final Document Proposal (PFD-R.P)
5. PSI-WG submits the PFD-R.P with supporting documents (tutorials, etc.) to the PSI-SG, requesting PFD-R status; PSI-SG reviews the request
   – Revise: PSI-SG provides feedback to the WG chairs
   – Pass: PSI-SG and PSI Editor conduct a formal external review
6. 60-day formal review and public comment; PSI-SG examines the reviews
   – Revise
   – Pass: PSI Editor posts and announces the PSI Final Document (PFD-R)

Community consultation at: http://www.nature.com/nbt/consult/

HUPO-PSI

• Project status

HUPO-PSI: PSI deliverables

• Data formats
  – MIML
  – mzML
  – AnalysisXML
  – gelML
  – giML
  – spML
• MIAPE minimal reporting requirements
  – One parent document: The minimum information about a proteomics experiment (MIAPE), Nature Biotechnology 25, 887-893 (2007)
  – MIAPE MI, MS, MSI, GE, GI, CC, CE, SP

In summary:
• Formats (XML schema, instance docs, specification docs)
• Controlled Vocabularies
• MIAPE docs (representation and annotation standards)

Standard data formats for:

• Experimental data: spectra, acquisition parameters, acquisition equipment, ...
• Analyzed data: identifications, quantitations, data analysis software, ...

Standard data formats

Experimental data: spectra, acquisition parameters, acquisition equipment, ...

[Diagram: mzXML (versions 2.0, 3.0 and 4.0 shown; Seattle Proteome Center at the Institute for Systems Biology) and mzData 1.05 (HUPO-PSI) converge into mzML 1.0]

mzML: released on June 1st, 2008

• A data format capturing peak list information.
• Its aim is to unite the large number of current formats (pkl, dta, mgf, ...) into one.
• It is NOT a substitute for the raw file formats of the instrument vendors. Some vendors, if not all, will provide software to transform their raw files into this standard.
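To make "peak list information" concrete, the following is a minimal Python sketch that reads one of the legacy text formats mentioned on this slide (Mascot Generic Format, MGF) into a plain in-memory structure. The function name `parse_mgf`, the returned dictionary layout and the example values are illustrative assumptions, not part of any PSI specification; a real pipeline would use a dedicated converter to write mzML.

```python
from typing import Dict, List


def parse_mgf(text: str) -> List[Dict]:
    """Parse MGF text into a list of spectra (hypothetical helper)."""
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # KEY=value lines such as TITLE, PEPMASS, CHARGE
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # peak lines: "m/z intensity"
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra


# Made-up example spectrum, for illustration only.
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=498.27
CHARGE=2+
112.09 1520.0
175.12 830.5
END IONS
"""

spectra = parse_mgf(EXAMPLE_MGF)
print(spectra[0]["params"]["PEPMASS"], len(spectra[0]["peaks"]))
```

Each legacy format (pkl, dta, mgf) needs its own small reader like this one, which is the duplication a single common format is meant to remove.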

Sample instance document mzML 1.0
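As a rough illustration of what such an instance document contains: mzML stores peak data in `binaryDataArray` elements, as base64-encoded numbers annotated with controlled vocabulary terms (`cvParam`). The Python sketch below builds a heavily trimmed fragment and decodes it. The fragment is an assumption made for illustration: it is not schema-valid, omits the XML namespace that real mzML files declare, and the helper `decode_array` is hypothetical and handles uncompressed arrays only.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# Made-up m/z values, encoded the way mzML stores them:
# little-endian 64-bit floats, base64-encoded.
mz_values = [112.09, 175.12, 498.27]
encoded = base64.b64encode(struct.pack("<3d", *mz_values)).decode("ascii")

# Trimmed, namespace-free fragment (illustrative, not schema-valid).
fragment = f"""
<binaryDataArray>
  <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float"/>
  <cvParam cvRef="MS" accession="MS:1000576" name="no compression"/>
  <cvParam cvRef="MS" accession="MS:1000514" name="m/z array"/>
  <binary>{encoded}</binary>
</binaryDataArray>
""".strip()


def decode_array(element):
    """Decode one uncompressed binaryDataArray element (hypothetical helper)."""
    accessions = {p.get("accession") for p in element.findall("cvParam")}
    if "MS:1000576" not in accessions:
        raise ValueError("this sketch handles uncompressed arrays only")
    if "MS:1000523" in accessions:      # 64-bit float
        fmt_char, size = "d", 8
    elif "MS:1000521" in accessions:    # 32-bit float
        fmt_char, size = "f", 4
    else:
        raise ValueError("unknown numeric precision")
    raw = base64.b64decode(element.findtext("binary"))
    count = len(raw) // size
    return list(struct.unpack(f"<{count}{fmt_char}", raw))


decoded = decode_array(ET.fromstring(fragment))
print(decoded)
```

The precision and compression are read from the CV annotations rather than guessed, which is why the controlled vocabulary matters for the format.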

Standard data formats

Analyzed data: identifications, quantitations, data analysis software, ...

[Diagram: protXML and pepXML (Seattle Proteome Center at the Institute for Systems Biology) shown alongside AnalysisXML (HUPO-PSI)]

AnalysisXML: v1.0 candidate (Dec 08)

• Describes the results of identification and quantitation processes for proteins, peptides and protein modifications from mass spectrometry.

Sample instance document AnalysisXML (beta)

Standard data formats

Other data:

XML data format    MIAPE
GelML              MIAPE GE
GelInfoML          MIAPE GI
miXML              MIAPE MIMIX
spML               MIAPE SP

Standard data formats

[Diagram: mass spectrometers A and B (proprietary formats) → converter → mzML → search engines A and B → analysisXML → public repository]

Controlled Vocabularies

The Controlled Vocabularies (CVs) of the Proteomics Standards Initiative (PSI) provide a consensus annotation system to standardize the meaning, syntax and formalism of terms used across proteomics, as required by the PSI working groups.

Each PSI working group develops the CVs required by the technology or data type it aims to standardize, following common recommendations for development and maintenance.

At the PSI meeting in Washington (Sept 06), it was decided that all PSI working groups should adopt the same CVs for some overlapping concepts (units and resources).

Controlled Vocabularies

What is a CV?

Term: 100173
Synonyms: TOF, T.O.F., time of flight, time-of-flight
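The idea can be sketched in a few lines of Python: every spelling a user might type resolves to the same term identifier. The identifier `100173` and the four synonyms are copied from the slide; the function names and the normalization rule (lower-casing and dropping punctuation and spaces) are assumptions made for this illustration only.

```python
# One CV term as shown on the slide: an identifier plus its synonyms.
CV_TERMS = [
    {"id": "100173",
     "synonyms": ["TOF", "T.O.F.", "time of flight", "time-of-flight"]},
]


def normalize(label):
    """Lower-case a label and keep only letters and digits (assumed rule)."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def build_lookup(terms):
    """Map every normalized synonym to its term identifier."""
    lookup = {}
    for term in terms:
        for synonym in term["synonyms"]:
            lookup[normalize(synonym)] = term["id"]
    return lookup


LOOKUP = build_lookup(CV_TERMS)


def resolve(label):
    """Return the term identifier for a free-text label, or None."""
    return LOOKUP.get(normalize(label))


for text in ["TOF", "T.O.F.", "Time Of Flight", "quadrupole"]:
    print(text, "->", resolve(text))
```

Software that stores the identifier instead of the free text can then compare annotations from different laboratories without string matching.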

Controlled Vocabularies

• PSI CVs are composed of two documents:
  – a design principle description
  – the implementation of the CVs in OBO (Open Biomedical Ontologies) format
• Developing CVs is a process of collecting and, if necessary, defining terms.
• Every effort must be made to adopt and re-use existing ontologies or CVs where they exist, to avoid “re-inventing the wheel”.
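For readers who have not seen an OBO file, the sketch below parses a single `[Term]` stanza in Python. The stanza is a made-up example (the `EX:` identifier is hypothetical and does not belong to any PSI CV), and the parser `parse_obo_terms` is a simplified assumption that reads only the `id`, `name` and `synonym` tags; production code should use a full OBO parser.

```python
# Made-up stanza in OBO flat-file syntax; the identifier is hypothetical.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: time-of-flight
synonym: "TOF" EXACT []
synonym: "time of flight" EXACT []
"""


def parse_obo_terms(text):
    """Collect id, name and synonyms from [Term] stanzas (simplified)."""
    terms = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"synonyms": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "synonym":
                # The synonym text is the first double-quoted string.
                current["synonyms"].append(value.split('"')[1])
            elif tag in ("id", "name"):
                current[tag] = value
    return terms


terms = parse_obo_terms(OBO_TEXT)
print(terms[0]["id"], terms[0]["name"], terms[0]["synonyms"])
```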

Ontology Lookup Service (OLS)

http://www.ebi.ac.uk/ontology-lookup/

• The OLS provides a web service interface to query multiple ontologies from a single location with a unified output format.

MIAPE: Minimum Information About a Proteomics Experiment

Sufficiency and practicability:

• Unambiguous description of the experimental context
• Allow understanding of the results and their interpretation
• Sufficient to permit a critical evaluation
• In principle, allow recreation of the work

Taylor, C.F., Paton, N.W., Lilley, K.S., Binz, P.A., Julian, R.K., Jr., Jones, A.R., Zhu, W., Apweiler, R., Aebersold, R., Deutsch, E.W., Dunn, M.J., Heck, A.J., Leitner, A., Macht, M., Mann, M., Martens, L., Neubert, T.A., Patterson, S.D., Ping, P., Seymour, S.L., Souda, P., Tsugita, A., Vandekerckhove, J., Vondriska, T.M., Whitelegge, J.P., Wilkins, M.R., Xenarios, I., Yates, J.R., 3rd and Hermjakob, H. (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25: 887-893.

MIAPE guidelines

• What it is:
  – A description of the information and data to provide when an experiment is reported (it is a content descriptor)
    • Peptide sequence, scores, modifications, mass errors, etc.
  – An aid to assessing quality control
    • Number of replicates, expected error rate
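Because MIAPE is a content descriptor, compliance can be checked mechanically: a report either contains each required item or it does not. The Python sketch below shows the idea. The item names are hypothetical stand-ins based on the examples on this slide (peptide sequence, scores, modifications, mass errors), not the actual MIAPE checklist, and the report values are made up.

```python
# Hypothetical item names taken from the slide's examples,
# not from the real MIAPE module documents.
REQUIRED_ITEMS = ("peptide_sequence", "score", "modifications", "mass_error")


def missing_items(identification):
    """Return the required items that a reported identification lacks."""
    return [item for item in REQUIRED_ITEMS
            if identification.get(item) is None]


# Made-up report with two peptide identifications.
report = [
    {"peptide_sequence": "LVNELTEFAK", "score": 54.2,
     "modifications": [], "mass_error": 0.003},
    {"peptide_sequence": "YLYEIAR", "score": 41.0,
     "modifications": []},  # mass error was not reported
]

for index, identification in enumerate(report):
    print(index, missing_items(identification))
```

The check only asks whether an item is present. It passes no judgment on the values themselves, which matches the intent of a minimum-information guideline.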

MIAPE guidelines

• What it is not:
  – A description of how to run an experiment
    • It does not specify the use of any particular search engine
    • It does not force the use of one protocol
  – A description of the data representation, such as:
    • “Use Excel to create a table with the following five columns: …”
  – A quality judgment, such as:
    • “Need 30% sequence coverage to identify a protein”
    • “The absence of thorough validation of both analytical and biological results, including error analysis should result in rejection”
    • “Authors should justify the use of a very small database or database that excludes common contaminants, since this may generate misleading assignments”

MIAPE guidelines

• MIAPE Gel Electrophoresis (GE) v1.4
• MIAPE Gel Informatics (GI) v0.5
• MIAPE Mass Spectrometry (MS) v2.22
• MIAPE Mass Spectrometry Informatics (MSI) v0.8
• MIAPE Column Chromatography (CC) v1.0
• MIAPE Capillary Electrophoresis (CE) v0.7
• MIAPE Sample Preparation and handling (SP) v0.2
• MIAPE Molecular Interactions (MI) v1.1.2

Online tool to generate and store MIAPE documents: http://www.proteored.org

A MIAPE generator tool

Two ways to create a document, both stored on the ProteoRed server:

• Fill in all the minimal information by hand
• Fill in only the changes or new items by hand, and automatically add static information from previous MIAPE documents

HUPO-PSI: MIAPE Gel Electrophoresis v1.2

[Screenshot: document actions available in the tool: Generate XML, Generate report, Delete document, Edit document]

MIAPE Reports

Generate report

PEFF: PSI Extended FASTA Format

A Common Sequence Database Format in Proteomics

• P-A Binz, S Seymour, J Shofstahl, D Creasy, E Kapp
• Problem: search engines interpret the current FASTA format differently:
  – Protein identifiers
  – Description
  – Taxonomy
  – Other annotation (PTMs, sequence variants, etc.)
• Proposal: an alternative to the heterogeneous FASTA format, ideally generated by the database providers or, failing that, by an accepted converter, so that one single source sequence database can be submitted to various search engines
• SwissProt and EBI have already agreed on the principle
• A format proposal has been reached (not only for MS; flexible, extensible)
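The heterogeneity problem is easy to see in code: each database writes its FASTA header line differently, so every tool needs one parsing rule per source. The Python sketch below handles two common header styles and falls back to the first token otherwise. The regular expressions are simplified assumptions that cover only the example lines shown; the second header uses placeholder identifiers, and `parse_header` is a hypothetical helper, not part of any search engine.

```python
import re

# UniProt-style: >sp|ACCESSION|ENTRY_NAME Description OS=Organism GN=...
UNIPROT_STYLE = re.compile(
    r"^>(?:sp|tr)\|(?P<accession>[^|]+)\|\S+\s+(?P<description>.*?)"
    r"\s+OS=(?P<organism>.+?)(?:\s+[A-Z]{2}=|$)"
)
# NCBI-style (older releases): >gi|NUMBER|db|ACCESSION| Description [Organism]
NCBI_STYLE = re.compile(
    r"^>gi\|\d+\|[^|]+\|(?P<accession>[^|]+)\|\s*(?P<description>.*?)"
    r"\s*\[(?P<organism>[^\]]+)\]\s*$"
)


def parse_header(header):
    """Extract accession, description and organism from a FASTA header."""
    for pattern in (UNIPROT_STYLE, NCBI_STYLE):
        match = pattern.match(header)
        if match:
            return match.groupdict()
    # Unknown style: only the first token can be trusted as an identifier.
    tokens = header[1:].split(None, 1)
    return {"accession": tokens[0] if tokens else "",
            "description": None, "organism": None}


headers = [
    ">sp|P02768|ALBU_HUMAN Serum albumin OS=Homo sapiens GN=ALB PE=1 SV=2",
    ">gi|1234567|ref|XP_000000.1| example protein [Homo sapiens]",  # placeholder IDs
    ">IPI00000001.1 some other layout",  # placeholder ID
]
for header in headers:
    print(parse_header(header))
```

A common format with explicit fields for identifier, description and taxonomy would replace this pattern-per-database approach with a single parser.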

PEFF: PSI Extended FASTA Format

• A unified format for protein and nucleotide sequence databases, to be used by sequence search engines and other associated tools (spectral library search tools, sequence alignment software, data repositories, etc.).
• Enables consistent extraction, display and processing of information such as protein/nucleotide sequence database entry identifier, description, taxonomy, etc. across software platforms.
• Allows the representation of structural annotations such as post-translational modifications, mutations and other processing events.
• Flat file that includes a header of metadata describing relevant information about the database(s) from which the sequences have been obtained (e.g., name, version, etc.).
• Sequence database providers are encouraged to generate this format as part of their release policy, or to provide appropriate converters that can be incorporated into processing tools.

PRIDE – PRoteomics IDEntifications database

• Turns publicly available data into publicly accessible data

• Protein identifications

• Experimental detail

• Peak lists

• Linkout to raw data

• Fully open source

• Fully open data

• Implementation of PSI standards as they are released

PRIDE

[Diagram: mass spectrometers A and B (proprietary formats) → converter → mzML → search engines A and B → analysisXML → public repository (PRIDE)]

PRIDE – PRoteomics IDEntifications database

More tomorrow, with Alberto Medina

Standard data format converters

• msconvert (ProteoWizard):
  – From: mzML, mzXML, Thermo RAW, MGF
  – To: mzML, mzXML
  – Vendor format reading restrictions: Thermo RAW requires Windows with the XCalibur XDK installed
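A conversion is typically scripted rather than run by hand. The Python sketch below only builds and prints an msconvert command line; it does not execute anything. The helper name `build_msconvert_command` and the file names are assumptions for illustration. The `--mzML`, `--mzXML` and `-o` options are standard msconvert options, but check `msconvert --help` for the version you have installed.

```python
import shlex


def build_msconvert_command(input_file, output_dir, fmt="mzML"):
    """Build (but do not run) an msconvert command line (hypothetical helper)."""
    if fmt not in ("mzML", "mzXML"):
        raise ValueError("the slide lists only mzML and mzXML as output formats")
    return ["msconvert", input_file, f"--{fmt}", "-o", output_dir]


# Hypothetical file and directory names.
command = build_msconvert_command("sample.RAW", "converted", fmt="mzML")
print(shlex.join(command))
```

To actually run the conversion, the list can be passed to `subprocess.run(command, check=True)` on a machine where msconvert (and, for Thermo RAW input, the vendor libraries noted above) is installed.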

Standard data format converters

• ReAdW version 4.0.2:
  – From: Thermo RAW
  – Exports to:
    • mzXML
    • mzML (not yet updated to the final mzML 1.0 standard; try msconvert)
  – Requires a valid installation of the Thermo XCalibur software system, as it relies on the XCalibur libraries.

Standard data format converters

• CompassXport 1.3.6:
  – From:
    • analysis.baf (instrument families: APEX, micrOTOF, micrOTOF-Q, ...)
    • analysis.yep (esquire/HCT instrument family)
    • AutoXecute run for LCMaldi (instrument families: autoFlex, ultraFlex, ...)
    • fid files (flex instrument family)
  – Exports to:
    • mzXML version 2.1
    • mzData version 1.05
    • mzML (in progress)
  – Does not require Bruker proprietary software to be installed
  – Replaces mzBruker

Standard data format converters

• massWolf 4.0.2 (1 July 08):
  – From: MassLynx native acquisition files
  – Exports to: mzXML
  – Requires installation of MassLynx software on the same computer
  – You must select the appropriate massWolf download to match the version of your MassLynx software (4.0 or 4.1).

Standard data format converters

• mzWiff 4.0.2 (1 July 08):
  – From: Analyst native acquisition (.wiff) files
  – Exports to: mzXML
  – Requires installation of Analyst software

Standard data format converters

• T2DExtractor (Dec 07):
  – From: data from SCIEX/ABI 4000 series MALDI TOF/TOF instruments
  – Exports to: mzXML

Standard data format converters

• Trapper 4.1.0 (17 July 08):
  – From: Agilent MassHunter format (.d directories)
  – Exports to: mzXML
  – Requires Agilent's MHDAC software to be installed
  – This software will be included in the upcoming 4.1.0 TPP distribution
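The converter slides can be condensed into a small lookup table. The Python sketch below transcribes the input and output formats listed for each tool (as stated in this presentation, so they reflect the tools at that time) and answers the question "which converter takes me from format X to format Y?". The short format labels and the function `converters_for` are naming choices made for this illustration.

```python
# Transcribed from the converter slides; labels are shortened for lookup.
CONVERTERS = [
    {"tool": "msconvert",
     "inputs": ["mzML", "mzXML", "Thermo RAW", "MGF"],
     "outputs": ["mzML", "mzXML"]},
    {"tool": "ReAdW", "inputs": ["Thermo RAW"], "outputs": ["mzXML", "mzML"]},
    {"tool": "CompassXport", "inputs": ["Bruker"], "outputs": ["mzXML", "mzData"]},
    {"tool": "massWolf", "inputs": ["MassLynx"], "outputs": ["mzXML"]},
    {"tool": "mzWiff", "inputs": ["Analyst WIFF"], "outputs": ["mzXML"]},
    {"tool": "T2DExtractor", "inputs": ["ABI 4000 series MALDI"], "outputs": ["mzXML"]},
    {"tool": "Trapper", "inputs": ["Agilent MassHunter"], "outputs": ["mzXML"]},
]


def converters_for(source, target):
    """List the tools that read `source` and write `target`."""
    return [entry["tool"] for entry in CONVERTERS
            if source in entry["inputs"] and target in entry["outputs"]]


print(converters_for("Thermo RAW", "mzML"))
print(converters_for("Agilent MassHunter", "mzXML"))
print(converters_for("MassLynx", "mzML"))
```

An empty result, as in the last query, means a two-step route is needed: for example, vendor format to mzXML with the vendor-specific tool, then mzXML to mzML with msconvert.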