Standards to support science http://gensc.org/
The Genomic Standards Consortium
Minimum information checklists for standardised reporting of metagenome data
Ecogenomics: from Data to Knowledge
13-14 February 2014, CSIRO
Canberra, Australia
Introducing myself
University of Oxford e-Research Centre staff member since December 2011
Project: development of a metagenomics portal at the EBI (BBSRC grant)
Visitor at the European Bioinformatics Institute (EBI)
Secretary of the Genomic Standards Consortium (GSC)
Outline of this talk
The Genomic Standards Consortium
Reporting raw data and metadata (contextual data)
Introduction to the Genomic Standards Consortium (GSC)
The GSC was established in 2005. It is an open membership community working towards better descriptions of our collection of genomes, metagenomes and marker gene sets
The GSC Mission
the implementation of new (meta)genomic standards
methods of capturing and exchanging metadata
harmonization of metadata collection and analysis efforts across the wider genomics community
The GSC fulfils its mission by
• Organizing meetings
• Forming working groups
• Creating Consensus Products
2005: where is the contextual data?
“It is now clear that the full potential of sequence analysis can only be achieved if the geographic and environmental context of the sequence data is considered, herewith referred to as contextual data”
Community-driven solutions
Taking the ‘Common Path’ towards building consensus:
• Identify the problem
• Define a community to address it
• Define scope of the solution
• Implement solution
• Gain adoption of solution
GSC 11, Hinxton, 2010
GSC 12, Bremen, 2011
GSC 13, BGI, 2012
GSC 14, Oxford, 2012
What are standards?
A standard is a convention that gives uniformity to an area of research or innovation.
Standards unite groups and enable collective change.
Standards provide the language in which innovation is written.
Standards principles
Not everything should be ‘standardized’
Aggregation of data, information, and knowledge requires standard ways of doing things
Standards provide foundations; Standards should drive innovation (think of electrical plugs or the internet)
Pick the right concepts to standardize – at the right time, with the right people
Requires good ‘group think’ – or ‘systems thinking’
What, when, where, how?
Contextual data
Taxa
Habitat
Date and Time
Latitude/Longitude
Environmental measurements
DNA extraction method
Sequencing method
GSC Standards
GSC Minimum Information checklists
Minimum Information about any Sequence (MIxS)
• Minimum Information about a (Meta)Genome Sequence
MIGS/MIMS specifies a formal way to describe genomes/metagenomes in more detail than is currently captured in public repository documents.
• Minimum Information about a MARKer gene Sequence
The MIMARKS checklist: an 'electronic laboratory notebook' containing core contextual data items required for consistent reporting of marker gene investigations. MIMARKS uses the MIGS/MIMS checklists with respect to the nucleic acid sequence source and sequencing contextual data, but extends them with further experimental contextual data such as PCR primers and conditions, or target gene name.
Use of MIxS
Please provide this minimum information when you publish
• a genome
• a metagenome
• a marker gene study (e.g. ribosomal genes)
INSDC (DDBJ, ENA, GenBank) accept this information and encourage its submission to their public DNA databases
Core MIxS
Item                                              BA  EU  PL  VI  ORG  ME  SU  SP
submitted to insdc                                M   M   M   M   M    M   M   M
investigation type                                M   M   M   M   M    M   M   M
project name                                      M   M   M   M   M    M   M   M
geographic location (latitude and longitude)      M   M   M   M   M    M   M   M
geographic location (country and/or sea, region)  M   M   M   M   M    M   M   M
collection date                                   M   M   M   M   M    M   M   M
environment (biome)                               M   M   M   M   M    M   M   M
environment (feature)                             M   M   M   M   M    M   M   M
environment (material)                            M   M   M   M   M    M   M   M
environmental package                             M   M   M   M   M    M   M   M
sequencing method                                 M   M   M   M   M    M   M   M
“M”=mandatory “C”=conditional mandatory “X”=recommended “-”=not applicable
Truly “minimal” with 11 contextual data items
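As a rough illustration of how the core checklist could be applied in practice, the sketch below checks whether a metadata record supplies all 11 core items. The item names are taken from the table on this slide; the function name `missing_core_items`, the dictionary layout and the placeholder values are assumptions made for this example and are not part of the MIxS specification or of any INSDC submission tool.

```python
# The 11 core MIxS items, named as in the table on this slide.
# All of them are mandatory ("M") for every checklist type.
CORE_MIXS_ITEMS = [
    "submitted to insdc",
    "investigation type",
    "project name",
    "geographic location (latitude and longitude)",
    "geographic location (country and/or sea, region)",
    "collection date",
    "environment (biome)",
    "environment (feature)",
    "environment (material)",
    "environmental package",
    "sequencing method",
]


def missing_core_items(record):
    """Return the core items that are absent or empty in a metadata record.

    `record` is assumed to be a plain dict mapping item names to values
    (a hypothetical layout chosen for this sketch).
    """
    missing = []
    for item in CORE_MIXS_ITEMS:
        value = record.get(item)
        if value is None or str(value).strip() == "":
            missing.append(item)
    return missing


# Placeholder record for illustration only: every item is filled in,
# then one is blanked out to show what the check reports.
sample = {item: "example value" for item in CORE_MIXS_ITEMS}
sample["collection date"] = ""

print(missing_core_items(sample))  # ['collection date']
```

A check like this only confirms that each item is present; it does not validate the values themselves, for example against a controlled vocabulary.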
MIxS Standards
Yilmaz et al. Nature Biotech. 2011; 29:415-420
Example Checklist Construction
Controlled vocabularies and ontologies
Consistent reporting greatly enhances the usability of data
The MIxS standard provides a number of controlled vocabularies
The GSC encourages the use of ontologies, e.g. EnvO (environmental ontology)
Example: some metadata from a marine sample
Meta-Analysis
Genes/OTUs Environment (pH)
New sequencing technologies: rapid data increase
In recent years, new sequencing technologies have been developed.
• Sequencing cost per base has dropped rapidly
• Amount of sequence in INSDC database is currently doubling every 8 months
• Democratisation of sequencing: bench top sequencers make technology available to individual labs
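To put the doubling figure into perspective, here is a small worked calculation. It assumes steady exponential growth at the quoted rate of one doubling every 8 months, which is a simplification; the function name `growth_factor` is invented for this sketch.

```python
def growth_factor(months, doubling_time_months=8.0):
    """Factor by which the data volume grows over `months`,
    assuming it doubles every `doubling_time_months`."""
    return 2.0 ** (months / doubling_time_months)


for months in (8, 12, 24):
    print(months, "months ->", round(growth_factor(months), 2), "x")
# 8 months -> 2.0 x
# 12 months -> 2.83 x
# 24 months -> 8.0 x
```

Under that assumption the database would grow nearly threefold in a year and eightfold in two years.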
The Data Bonanza
To exploit fully the promise of scientific data we need both innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all.
Requires the evolution of our scientific, technological and sociological thinking...
The GSC is running a range of consensus-driven projects and is now making a call for community compliance/community involvement
More information: http://gensc.org/
The next GSC meeting will be held in Oxford, UK (30 March-2 April 2014)
Acknowledgements
The GSC efforts are contributed on a volunteer basis by a wide range of participants, including GSC authors, working group members, workshop participants and adopters.
Special Thanks to the GSC Board:
Linda Amaral-Zettler, MBL
Guy Cochrane, EMBL-EBI
Jim Cole, MSU
Neil Davies (Berkeley)
Peter Dawyndt, University of Ghent
Dawn Field, CEH (Chair of GSC)
George Garrity, MSU
Jack Gilbert, Argonne National Lab
Frank Oliver Glöckner, MPI-Bremen
Lynette Hirschman, MITRE
Hans-Peter Klenk, DSMZ
Renzo Kottmann, MPI-Bremen
Rob Knight (University of Colorado)
Nikos Kyrpides, DOE, JGI
Folker Meyer, Argonne National Lab
Norman Morrison (University of Manchester)
Inigo San Gil, LTER
Susanna Sansone, University of Oxford
Lynn Schriml, University of Maryland (Treasurer of GSC)
Peter Sterk, GSC (Secretary of GSC)
Dave Ussery, DTU
Owen White, University of Maryland
John Wooley, UCSD (PI of RCN4GSC)
Institutional Liaisons to the GSC Board
Ilene Mizrachi (NCBI/GenBank)
Tatiana Tatusova (NCBI/RefSeq)