

Comparative and Functional Genomics
Comp Funct Genom 2003; 4: 16–19.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.232

Feature

Meeting Review: The HUPO Proteomics Standards Initiative meeting: towards common standards for exchanging proteomics data
Hinxton, Cambridge, UK, 19–20 October 2002

Sandra Orchard, Paul Kersey, Henning Hermjakob* and Rolf Apweiler
EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

*Correspondence to: Henning Hermjakob, EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. E-mail: [email protected]

Received: 14 November 2002
Accepted: 14 November 2002

Abstract

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Initially the fields of protein–protein interactions (PPI) and mass spectroscopy have been targeted and the inaugural meeting of the PSI addressed the questions of data storage and exchange in both of these areas. The PPI group rapidly reached consensus as to the minimum requirements for a data exchange model; an XML draft is now being produced. The mass spectroscopy group have achieved major advances in the definition of a required data model and working groups are currently taking these discussions further. A further meeting is planned in January 2003 to advance both these projects. Copyright 2003 John Wiley & Sons, Ltd.

Keywords: proteomics; spectroscopy; protein–protein interactions

Introduction

The Proteomics Standards Initiative was established following a meeting in April 2002, jointly organized by HUPO and NAS, at which the urgent need for standardization of proteomics data was recognized. Rolf Apweiler (Sequence Database Group, European Bioinformatics Institute) opened the proceedings by explaining that a decision had been made to address these issues initially in the fields of mass spectroscopy and protein–protein interactions (PPI). This inaugural meeting of the Proteomics Standards Initiative brought together representatives from the database producer, user and software producer communities, who were seen as essential in establishing and maintaining the required standards and who were jointly charged over the 2 days of the meeting with laying the groundwork that would enable these objectives to be met.

The delegates listened to a short presentation by Alvis Brazma (EBI), outlining the successful standardization of microarray data in the MGED process, before splitting into two working parties to address the issues facing their respective fields.

Protein–protein interactions (PPI) group

The session commenced with a brief introduction from each of the PPI databases represented at the meeting as to the ethos and coverage of their particular product. This included presentations by representatives from Hybrigenics SA, DIP, BIND, MINT, GIN-DB, PPID and IntAct, a public repository of PPI data that will be launched by the EBI early in 2003. The meeting was then thrown open to address a number of key issues.

Is there a requirement for a community standard?

Data exchange is essential for the purposes of data comparison, benchmarking and quality control, all of which are particularly important in a field like protein–protein interaction, where the standard high-throughput methods are known to yield high false-positive and false-negative rates. A community standard should allow simple access to core protein interaction data, while being extensible to exchange data with a high level of detail. Many users will require only simple indexing and interface systems; larger organizations will have requirements that are more complex but will have the infrastructure to develop much of this themselves. The confidentiality of data could be seen as an issue that might inhibit organizations from contributing; however, this question has already been addressed by the various sequence databases where entries can be flagged and retained by the parent database until permission is given for release. It was recognized early on in the discussions that a minimum standard for data exchange needed to be developed and a formal mechanism for monitoring and maintaining this standard put in place. Valuable lessons can be learned in this area from MGED’s experience of defining a minimal standard for the exchange of microarray data.

Definition of use cases

The potential use of the data has to be understood before the minimum common standard can be defined. Most of the groups represented at the meeting were interested in making graphical representations of PPIs and in making interspecies comparisons based on sequence or structural homology. To compare data from different systems, a correct description of the source systems is essential, including details of species, strain and, in some cases, tissue, cell type and disease state. Domain identification and the dynamic properties of PPIs were also common requirements, whilst the functional outcome of PPIs and the effects of sequence variations and posttranslational modifications were seen as desirables. Some users have a requirement for in-depth experimental detail; however, this was felt to be beyond the scope of a data exchange format and would have to be retrieved from the literature. Links to public databases were seen as essential when available but would not be made mandatory, since this would compromise the transfer of unpublished data between collaborating laboratories.

Outline data structure

The need for a multi-level approach was soon recognized, with Level 1 designed to fulfil basic requirements and be suitable for rapid implementation, whilst subsequent levels will contain more features, yet remain backward compatible. The interchange format will need to be able to represent both binary and n-ary (complex) interactions. The topology of the latter would then be described within each set.

Each Interchange Format Record will report one or more interactions supported by one or more experiments. Predicted interactions are allowed and will be clearly flagged. Wherever the sequence of the interactors is available in public databases, appropriate cross-references should be given. The sequence should be given in the interaction record when it is not available from public databases, and may always be given.
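The requirements listed so far can be illustrated with a small data model. The following Python sketch is not the PSI draft format; every class and field name in it (Interactor, Experiment, Interaction, InterchangeRecord, xrefs, predicted) is hypothetical, and two of its checks are assumptions rather than decisions reported from the meeting: that an interaction has at least two participants, and that an interaction not flagged as predicted must cite at least one experiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interactor:
    """One participant in an interaction (hypothetical structure)."""
    name: str
    xrefs: List[str] = field(default_factory=list)  # cross-references to public databases
    sequence: Optional[str] = None                  # may always be given

    def __post_init__(self) -> None:
        # The sequence is needed when no public cross-reference exists.
        if not self.xrefs and self.sequence is None:
            raise ValueError(f"{self.name}: no cross-reference, so a sequence is required")


@dataclass
class Experiment:
    method: str  # in practice this would be a controlled-vocabulary term


@dataclass
class Interaction:
    participants: List[Interactor]
    experiments: List[Experiment] = field(default_factory=list)
    predicted: bool = False  # predicted interactions are allowed but flagged

    def __post_init__(self) -> None:
        if len(self.participants) < 2:  # assumption of this sketch
            raise ValueError("an interaction needs at least two participants")
        if not self.predicted and not self.experiments:  # assumption of this sketch
            raise ValueError("an observed interaction needs a supporting experiment")

    @property
    def is_binary(self) -> bool:
        """True for binary interactions, False for n-ary (complex) ones."""
        return len(self.participants) == 2


@dataclass
class InterchangeRecord:
    interactions: List[Interaction]

    def __post_init__(self) -> None:
        if not self.interactions:
            raise ValueError("a record reports one or more interactions")


# Usage with placeholder identifiers (not real accession numbers).
a = Interactor("protein A", xrefs=["EXAMPLEDB:0001"])
b = Interactor("protein B", sequence="MKT")
record = InterchangeRecord([Interaction([a, b], [Experiment("hypothetical method")])])
print(record.interactions[0].is_binary)
```

Keeping the validation in the constructors means that a malformed record fails when it is built rather than when it is exchanged, which is one possible way to enforce a minimum standard.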

Each entry will need to contain the accession number of its parent database. Parent databases will be identified by a prefix. This will require a registry service, which will have to be recognized and maintained. It is proposed to use PSI/HUPO as the authority for this and a host site will have to be established, which can be accessed by databases wishing to submit data.
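A prefix registry of this kind could behave roughly as sketched below. This is an illustration under stated assumptions only: the class PrefixRegistry, the "PREFIX:identifier" accession syntax and the prefix "DBA" are all hypothetical, since the meeting did not fix a syntax.

```python
from typing import Dict, Tuple


class PrefixRegistry:
    """Maps database prefixes to parent databases (hypothetical sketch)."""

    def __init__(self) -> None:
        self._databases: Dict[str, str] = {}

    def register(self, prefix: str, database: str) -> None:
        key = prefix.upper()
        # A prefix must identify exactly one parent database.
        if self._databases.get(key, database) != database:
            raise ValueError(f"prefix {key} is already registered")
        self._databases[key] = database

    def resolve(self, accession: str) -> Tuple[str, str]:
        """Split an accession into (parent database, local identifier)."""
        prefix, separator, local_id = accession.partition(":")
        if not separator or not local_id:
            raise ValueError(f"malformed accession: {accession!r}")
        key = prefix.upper()
        if key not in self._databases:
            raise ValueError(f"unregistered prefix: {key}")
        return self._databases[key], local_id


registry = PrefixRegistry()
registry.register("DBA", "Database A")  # placeholder database
print(registry.resolve("DBA:42"))
```

A single authority maintaining the mapping is what keeps accession numbers unambiguous once records from several parent databases are merged.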

The standardization of experimental design provides a particularly complex set of issues for the field of PPI, in which researchers use a host of diverse techniques and practices. Level 1 of the standard will not attempt to provide a full description of the experimental design, but will provide the means to clearly classify the experiments through hierarchical controlled vocabularies.

A work group has been set up to develop common controlled vocabularies for experimental methods and other attributes of protein interaction data. These will be used by the interaction data standard and will be made available via the Global Open Biological Ontologies (GOBO) website.
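To show why a hierarchy helps classification, the sketch below models a controlled vocabulary as a tree in which each term has at most one parent. This is a simplification chosen for brevity; the class ControlledVocabulary and the three example terms are illustrative and are not taken from the vocabularies under development.

```python
from typing import Dict, List, Optional


class ControlledVocabulary:
    """A tree of terms, each with at most one parent (simplifying assumption)."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}

    def add_term(self, term: str, parent: Optional[str] = None) -> None:
        if term in self._parent:
            raise ValueError(f"duplicate term: {term}")
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent term: {parent}")
        self._parent[term] = parent

    def ancestors(self, term: str) -> List[str]:
        """Return the broader terms, from nearest parent up to the root."""
        if term not in self._parent:
            raise KeyError(f"unknown term: {term}")
        chain: List[str] = []
        current = self._parent[term]
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        return chain

    def is_a(self, term: str, broader: str) -> bool:
        """True if `term` equals `broader` or is one of its descendants."""
        return term == broader or broader in self.ancestors(term)


vocabulary = ControlledVocabulary()
vocabulary.add_term("experimental method")
vocabulary.add_term("two-hybrid", parent="experimental method")
vocabulary.add_term("yeast two-hybrid", parent="two-hybrid")
print(vocabulary.is_a("yeast two-hybrid", "two-hybrid"))
```

With such a structure, a query for a broad method class also finds experiments annotated with any narrower term, without the record having to describe the experimental design in full.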

To capture a larger part of the interaction data that is generated worldwide, the support of major biochemistry and proteomics journals in this process is seen as crucial. It is proposed that, once a PSI PPI Level 1 standard has been established, the major public database providers will collectively approach journals and funding agencies to request that deposition of published interaction data in public databases be strongly encouraged as part of the publication process. This would be similar to the deposition requirement for nucleotide sequence data, and the current encouragement to deposit DNA microarray data.

PPI molecular interaction interchange format record structure

The structure of an Interchange Format record defining both mandatory and optional fields was discussed in great detail and a draft document was produced. A small working party was formed to produce an XML draft of this consensus, which will then be further refined and finally presented to members of the PSI at a meeting in January. The PPI group aims to make a version of the Level 1 format publicly available by spring 2003.

Mass spectrometry

This session discussed two questions: the use of standards in the field of mass spectrometry and the potential use of a public data repository for mass spectrometry data.

Following presentations on various aspects of mass spectrometry by Alexey Nesvizhskii (ISB, Seattle, WA), Arkadiusz Nawrocki (CPA, Odense, Denmark) and Rulin Zhang (SynX Pharma, Toronto, Canada), the group received a demonstration of PEDRo, a tool developed at the University of Manchester to capture data and metadata from proteomics experiments that include mass spectrometry as one component. PEDRo has been designed according to the MGED guidelines and has a similar scope to the microarray data model, capturing the complete process of a scientific experiment from hypothesis formation through to peak identification. A consideration of PEDRo led to a discussion as to whether a single repository would encompass the diverse needs of mass spectrometry in the context of proteomics or whether separate standards for each type of experiment, with separate repositories for each type of data, would be required. As the issues became apparent, questions of feasibility were also raised. Examples were given of ambitious plans to design software that supported data from all types of proteomics experiments, which had eventually been replaced by projects aimed at capturing only one particular workflow.

Mass spectrometry data exists at many levels, from raw data, through peak lists and peptide identification, to protein identification; on top of this is the desire to mine data. Huge amounts of variation (and manual interpretation) exist in the processes that effect these transformations. The following specific points were discussed in more detail.

The purpose of new repositories

One projected use was to provide an audit trail for publications, so that the producers of bulk or complex data would be able to fully describe (and be held to account for) methodologies that could not appear in print medium; this would require the cooperation of journals. Another purpose could be to allow the user to explore/mine the data, preferably in a biological context. Important concepts here are ‘the minimal description of the experiment’ and ‘validation criteria’.

How many repositories?

A component-based approach, with different repositories for different types of proteomics experiment, was considered, but fears were expressed that this would disrupt the audit trail, or make biological interpretation of the data impossible. How to go about capturing the meaningful results of an experiment that resulted in the conclusion that two proteins interact, without a wasteful overlap with PPI databases, was discussed at intervals throughout the meeting.

Would the users enter all the data?

The hope was expressed that if a standard could be produced, laboratory information management systems (LIMS) might automatically produce compliant output. However, proteomics is often not fully automated and many data points might be missing.

Participation of equipment manufacturers and other parties

The view was expressed that the participation of equipment manufacturers was essential to the ultimate success of any new standard. In areas such as hypothesis description and preliminary sample preparation, substantial opportunities for overlap with other groups involved in standardization were perceived, and enthusiasm expressed for taking these forward.

Error rates

Potential users of the data have little awareness of problems such as the estimation of error rates and the statistical complexity involved in producing the final protein identifications. A need to raise community awareness of these issues was recognized.

Three work groups have now been established:

• Group 1 will work on the definition of mass spectrometry data, and the subsequent data analysis, as far as protein identification. A draft model has been produced, which includes the facility for recursive analysis and refinement of the peak list.

• Group 2 are modelling the process of sample preparation, considering the overall workflow of proteomics experiments in which ‘mass spectrometry’ is one component, up to the point where a sample is ready to be loaded into the spectrometer. Again, a recursive model has been used, whereby a sample could undergo many cycles of preparative steps.

• Group 3 are considering likely user demands of any implemented system. The interests of both expert mass spectrometrists and biological users are being considered. A system should support the ability to query with peak lists, and with known sample compositions, against the results of previous experiments; and should also allow users to query across experiments to observe the concomitant changes in identified species.
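The recursive models mentioned for Groups 1 and 2 can be pictured as a chain in which each item records the step that produced it and the item it was derived from. The Python sketch below illustrates this for sample preparation; it is not the draft model itself, and the class Sample, its fields and the step names are hypothetical. The same pattern would apply to successive refinements of a peak list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    """A sample plus the preparative step that produced it (hypothetical)."""
    label: str
    step: Optional[str] = None         # None for the starting material
    parent: Optional["Sample"] = None  # the sample this one was derived from

    def prepare(self, step: str, label: str) -> "Sample":
        """Apply one more preparative step, yielding a new derived sample."""
        return Sample(label=label, step=step, parent=self)

    def history(self) -> List[str]:
        """List the preparative steps in the order they were applied."""
        steps: List[str] = []
        node: Optional[Sample] = self
        while node is not None and node.parent is not None:
            steps.append(node.step or "")
            node = node.parent
        steps.reverse()
        return steps


starting_material = Sample("starting material")
ready = (
    starting_material
    .prepare("fractionation", "fraction 1")  # illustrative step names
    .prepare("digestion", "digest 1")
)
print(ready.history())
```

Because every derived sample keeps a reference to its parent, any number of preparative cycles can be represented and the full preparation history can be reconstructed from the final sample alone.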

The findings of these working groups will be presented during the HUPO conference in November 2002 and the way forward can then be discussed with input from the wider proteomics community, who will be attending that meeting.

Conclusions

There was a remarkable consensus among delegates attending the PSI meeting that valuable data would be lost without public repositories and common interchange formats making information accessible to the scientific community. Major progress was made in the field of protein–protein interactions, with a draft exchange format being produced and work on an XML version in progress. The mass spectroscopy group has to undertake more groundwork, to establish common needs and requirements, and to identify what data is appropriate for public access and how much supplementary information needs to be stored alongside it, but important advances have been made and it is hoped that this group will have preliminary results by early 2003.

All such efforts require support from the user community and from the scientific press and funding agencies. Members of the PSI will be actively canvassing such collaboration, but input is welcome from any quarter. Anyone wishing to become involved is invited to visit http://psidev.sf.net, to participate in the discussion groups listed, and to contribute to the further development of community standards for proteomics data. A further meeting of the PSI is planned for 22–24 January 2003 in Hinxton, Cambridge, UK. Details will be published via the website.

Related websites

BIND: http://bind.ca/
DIP: http://dip.doe-mbi.ucla.edu/
Hybrigenics: http://www.hybrigenics.fr
IntAct Project: http://www.ebi.ac.uk/intact/
MINT: http://cbm.bio.uniroma2.it/mint/
MGED: http://www.mged.org
PPID: http://www.anc.ed.ac.uk/mscs/PPID
PSI: http://psidev.sf.net/

The Meeting Reviews of Comparative and Functional Genomics aim to present a commentary on the topical issues in genomics studies presented at a conference. The Meeting Reviews are invited; they represent personal critical analyses of the current reports and aim at providing implications for future genomics studies.
