
FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014

DESCRIPTION

Overview of FAIR data concept (http://www.dtls.nl/dtl/news/fairport-workshop.html) and my related activities + overview of NPG Scientific Data

Text of FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014

  • The rise of the data-centric research and publication enterprises. Susanna-Assunta Sansone, PhD (Data Consultant; Honorary Academic Editor; Associate Director; Principal Investigator). RIKEN Yokohama, 25 June, 2014. http://www.slideshare.net/SusannaSansone
  • Outline: About myself (activities and interests); FAIR data (concept; my related projects); Scientific Data (rationale; Data Descriptors; examples)
  • My areas of activity: data capture and curation; data (nano)publication; data provenance; open, community ontologies and standards; semantic web; software development; training. Communities I work with/for, as part of: UK, European and international consortia; pre-competitive informatics public-private partnerships; standardization initiatives; with e.g.:
  • Enabling reproducible research and open science, driving science and discoveries. Notes and narrative: notes in lab books (information for humans). Spreadsheets and tables: the compromise. Linked data and nanopublications: facts as RDF statements (information for machines). Increase the level of annotation at the source, tracking provenance and using community standards.
  • Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
  • A great start, but not enough. Image by Greg Emmerich.
  • Worldwide movement for FAIR data: researchers and bioinformaticians in both academic and commercial science, along with funding agencies and publishers, embrace the concept that both DATA (entities of interest, e.g., genes, metabolites, phenotypes) and METADATA (experimental steps, e.g., provenance of study materials, technology and measurement types) should be Findable, Accessible, Interoperable and Reusable.
  • The International Conference on Systems Biology (ICSB), 22-28 August, 2008. Susanna-Assunta Sansone, www.ebi.ac.uk/net-project. Sample characteristic(s), experimental design, experimental variable(s), technology(s), measurement(s), protocol(s), data file(s), ......
  • To make this dataset FAIR, one must have tools, standards and best practices to: report sufficient details; capture all salient features of the experimental workflow; make annotation explicit and discoverable; structure the descriptions for consistency; ensure/regulate access; deposit and publish; etc.
  • General-purpose, configurable format, designed to support: description of the experimental metadata, making the annotation explicit and discoverable; provenance tracking; use of community standards, such as minimal reporting guidelines and terminologies. Designed to be converted to a growing number of other metadata formats, e.g. those used by EBI repositories. (Diagram labels: analysis, method, script; data file or record in a database.)
  • ISA powers data collection, curation resources and repositories, e.g.:
  • Reporting standards and data interoperability. Community-developed standards are pivotal to structure, enrich the description and share datasets, facilitating understanding and reuse. They include: minimum information reporting requirements, or checklists, to report the same core, essential information; controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same thing; conceptual models and conceptual schemas, from which an exchange format is derived, to allow data to flow from one system to another.
  • Growing number of reporting standards: +130, +150, +303 (sources: BioSharing, BioPortal). Reporting guidelines, e.g.: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT. Formats, e.g.: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB. Terminologies, e.g.: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO. Databases, annotation and curation tools implementing standards.
  • Which standards and databases can we use/recommend? I work in the field of cell migration research; which ones are applicable to me? I use cell migration in translational research; are there specific clinical standards?
  • Registering and cataloging is just step one; the next ones are: develop assessment criteria for usability and popularity of standards; associate standards to data policies and databases; assemble journal and funder policies re data storage; make fully cross-searchable. Intended goal: help stakeholders make informed decisions.
  • Outline: About myself (activities and interests); FAIR data (concept; my related projects); Scientific Data (rationale; Data Descriptors; examples)
  • FAIR data - roles and responsibilities. Data has to become an integral part of scholarly communications. Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers. But publishers occupy a leverage point in this process.
  • Journal publishing - changing landscape. Human Genome, 2001: 62 pages, 150 authors, 49 figures, 27 tables. ENCODE Project, 2012: 30 papers, 3 journals.
  • Helping you publish, discover and reuse research data. Visit nature.com/scientificdata. Email [email protected]. Tweet @ScientificData. Honorary Academic Editor: Susanna-Assunta Sansone, PhD. Managing Editor: Andrew L Hufton, PhD. Editorial Curator: Victoria Newman. Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators. Supported by:
  • Launched on May 27th, 2014. A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these. Credit for sharing your data; focused on reuse and reproducibility; peer reviewed, curated; promoting community data repositories; open access.
  • Data Descriptor: narrative and structure. Article or narrative component (PDF and HTML). Experimental metadata or structured component (in-house curated, machine-readable formats).
  • Data Descriptor: narrative. Sections: Title; Abstract; Background & Summary; Methods; Technical Validation; Data Records; Usage Notes; Figures & Tables; References; Data Citations. Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses. In traditional publications this information is not provided in a sufficiently detailed manner; however, it is essential for understanding, reusing, and reproducing datasets.
  • Data Descriptor: narrative (continued). Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group, incl.: CODATA, Research Data Alliance (RDA), Force11.
  • Data Descriptor: structure (CC0). In-house curation team: assists users to submit the structured content via simple templates and an internal authoring tool; performs value-added semantic annotation of the experimental metadata. For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification. (Diagram labels: analysis, method, script; data file or record in a database.)
  • Making data discoverable. Export to various formats (ISA-Tab, RDF, etc). Linking between research papers, Data Descriptors, and data records.
  • Helping authors find the right place for the data. We currently recognize over 50 public data repositories. Omics is emphasized among basic life-sciences repositories. Repository categories charted: DNA and protein sequence; Functional genomics; Genetic association and genome variation; Metagenomics; Molecular interactions; Organism- or disease-specific; Proteomics; Taxonomy and species diversity; Traces and sequencing reads (counts as extracted from the chart: 2, 4, 3, 10, 4, 1, 4, 3, 4). We have integrated systems with both:
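
The experimental metadata elements named in the slides above (sample characteristics, experimental design, experimental variables, technology, measurement, protocols, data files) can be sketched as a structured, machine-readable record. The following is a minimal illustration only: the class name `StudyMetadata`, its field names, the two-column tab-delimited layout, and all example values are assumptions made for this sketch. It is not the ISA-Tab format and not any Scientific Data submission template.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class StudyMetadata:
    """Hypothetical record for the metadata elements listed in the slides."""
    sample_characteristics: dict
    experimental_design: str
    experimental_variables: list
    technology: str
    measurement: str
    protocols: list
    data_files: list


def missing_fields(record: StudyMetadata) -> list:
    """Return the names of fields left empty, i.e. details not yet reported."""
    return [name for name, value in vars(record).items() if not value]


def to_tab_delimited(record: StudyMetadata) -> str:
    """Serialize the record as simple two-column, tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["field", "value"])
    for key, value in record.sample_characteristics.items():
        writer.writerow([f"sample characteristic: {key}", value])
    writer.writerow(["experimental design", record.experimental_design])
    for variable in record.experimental_variables:
        writer.writerow(["experimental variable", variable])
    writer.writerow(["technology", record.technology])
    writer.writerow(["measurement", record.measurement])
    for protocol in record.protocols:
        writer.writerow(["protocol", protocol])
    for data_file in record.data_files:
        writer.writerow(["data file", data_file])
    return buffer.getvalue()


# Invented example values, for illustration only.
example = StudyMetadata(
    sample_characteristics={"organism": "Homo sapiens", "tissue": "liver"},
    experimental_design="dose response",
    experimental_variables=["dose", "time point"],
    technology="mass spectrometry",
    measurement="metabolite profiling",
    protocols=["sample collection", "extraction"],
    data_files=["run_01.raw"],
)

print(missing_fields(example))
print(to_tab_delimited(example))
```

Holding the description in explicit fields, rather than in free-text notes, is what lets a tool check for unreported details and convert the same record to other formats.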
