An Open Repository Model for Acquiring Knowledge about Scientific Experiments

EKAW 2016 – November 21, 2016, Bologna, Italy

Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen

Stanford University, Stanford, CA, USA

metadatacenter.org

Reproducibility Problem in Science

Metadata Key to Addressing Problem

•  Crucial for reproducibility in biomedicine
   –  Locate experimental datasets online
   –  Understand how the experiments were performed
   –  Reuse the data to perform new analyses

•  Journals and funding agencies increasingly require making experimental data and metadata available

Many Metadata Standards have been Developed

However: Metadata Submission is Hard

Metadata Submission is Hard - II

[Figure: Submission Interface with Metadata, Summary Data Matrix, and Raw Data]

Result: Poor Metadata

Variants of the ‘age’ metadata field in the Gene Expression Omnibus (GEO) repository:

age Age AGE `Age

age (after birth) age (in years)

age (y) age (year) age (years) Age (years) Age (Years)

age (yr) age (yr-old)

age (yrs) Age (yrs)

age [y] age [year] age [years] age in years

age of patient Age of patient age of subjects

age(years) Age(years) Age(yrs.) Age, year age, years

age, yrs age.year

age_years
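
To see why free-text field names resist cleanup after the fact, here is a hypothetical normalization heuristic (not part of CEDAR or GEO) applied to a subset of the variants above: lowercase the label and drop every character that is not a letter. Even this aggressive rule leaves several distinct keys, because the unit spellings themselves differ (years, yrs, year, in years) and the bare `age` variants carry no unit at all.

```python
import re

# A subset of the 'age' labels listed above, copied verbatim.
VARIANTS = [
    "age", "Age", "AGE", "`Age",
    "age (years)", "Age (Years)", "age (yrs)", "Age(yrs.)",
    "age [years]", "age in years", "age, years", "age_years", "age.year",
]


def naive_key(label: str) -> str:
    """Hypothetical heuristic: lowercase, then keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", label.lower())


keys = sorted({naive_key(v) for v in VARIANTS})
print(len(VARIANTS), "labels ->", len(keys), "keys:", keys)
# 13 labels -> 5 keys: ['age', 'ageinyears', 'ageyear', 'ageyears', 'ageyrs']
```

String cleanup only shrinks the list; it cannot decide that `age` and `ageyrs` mean the same thing, which is the gap a controlled, template-defined field is meant to close at entry time.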

Our Solution: CEDAR - A Metadata Ecosystem

•  Overcome the impediments to creating high-quality metadata

•  Facilitate
   –  Creation
   –  Acquisition
   –  Use
   –  Evaluation
   –  Refinement

•  Key goal: create a shareable metadata exchange format (a template model) for publishing, searching, and exchanging metadata

CEDAR Template Model Goals

•  Must describe composite structure of templates
•  Implemented using standard formats
•  Express semantics
•  Metadata instances:
   –  Linked to controlled terms
   –  Easily serializable
   –  Easily validated
   –  Easily indexed
   –  Interchange with RDF
   –  Highly readable
   –  Produced/consumed via REST APIs and usable in JavaScript front ends
   –  Meets FAIR goals

[Figure: a "Study" metadata template composed of fields (Title, Description) and nested template elements (Principal Investigator with Name; Institution with Name and ZIP)]
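
The composite structure described above (a template built from fields and nested template elements) can be sketched as a small tree that is compiled into JSON Schema. The class names `Field` and `Element`, the function `to_json_schema`, and the convention that every field holds a `{"@value": ...}` object are illustrative assumptions modeled on the instance examples in this talk; the real CEDAR model also carries JSON-LD `@type`/`@id` information and UI settings, which this sketch omits.

```python
import json
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Field:
    """A leaf of the template tree (hypothetical class)."""
    name: str


@dataclass
class Element:
    """A template element grouping fields and other elements (hypothetical class)."""
    name: str
    children: List[Union[Field, "Element"]] = field(default_factory=list)


def to_json_schema(node: Union[Field, Element]) -> dict:
    """Compile a template tree into a simplified JSON Schema object."""
    if isinstance(node, Field):
        # Assumption: each field value is wrapped as {"@value": "..."}.
        return {
            "type": "object",
            "properties": {"@value": {"type": ["string", "null"]}},
            "required": ["@value"],
            "additionalProperties": False,
        }
    props = {child.name: to_json_schema(child) for child in node.children}
    return {
        "type": "object",
        "title": node.name,
        "properties": props,
        "required": list(props),  # sketch: every child is mandatory
        "additionalProperties": False,
    }


study = Element("Study", [
    Field("title"),
    Field("description"),
    Element("principalInvestigator", [
        Field("name"),
        Element("institution", [Field("name"), Field("zip")]),
    ]),
])

schema = to_json_schema(study)
print(json.dumps(schema["required"]))  # ["title", "description", "principalInvestigator"]
```

Keeping the tree separate from its JSON Schema rendering means the same structure could also drive other outputs, such as a form layout, without re-parsing JSON.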

Using JSON Schema and JSON-LD for CEDAR Template Model

[Figure: JSON Schema + JSON-LD (templates); JSON-LD (metadata instances)]

What is JSON Schema?

•  Technology for describing and validating the structure of JSON documents

•  Provides a structural description of any JSON document

•  JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas

•  Analogous to XML Schema
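
To make structural validation concrete, the sketch below checks an instance against a tiny subset of JSON Schema: `type` (objects, arrays, strings, booleans, null), `properties`, `required`, and `additionalProperties: false`. The function name `validate` and its error-message format are our own; real code would use a complete draft-04 validator (for example, the third-party `jsonschema` package in Python) rather than this hand-rolled version.

```python
from typing import Any, List

# Only these JSON Schema type names are supported by this sketch.
_TYPES = {"object": dict, "array": list, "string": str,
          "boolean": bool, "null": type(None)}


def validate(instance: Any, schema: dict, path: str = "$") -> List[str]:
    """Return error messages for `instance`; an empty list means it conforms."""
    expected = schema.get("type")
    if expected is not None:
        allowed = expected if isinstance(expected, list) else [expected]
        if not any(isinstance(instance, _TYPES[t]) for t in allowed):
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]

    errors: List[str] = []
    if isinstance(instance, dict):
        properties = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, value in instance.items():
            if key in properties:
                errors.extend(validate(value, properties[key], f"{path}.{key}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}: unexpected property '{key}'")
    return errors


schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
    "additionalProperties": False,
}

print(validate({"title": "Study"}, schema))  # []
print(validate({"age_years": 42}, schema))   # reports missing 'title' and unexpected 'age_years'
```

`additionalProperties: false` is what turns an ad hoc key such as `age_years` into a validation error instead of silently accepted metadata.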

What is JSON-LD?

•  A lightweight syntax to serialize Linked Data in JSON

•  Allows existing JSON to be interpreted as Linked Data with minimal changes

•  JSON-LD is primarily intended to be a way to:
   –  use Linked Data in Web-based programming environments
   –  build interoperable Web services
   –  store Linked Data in JSON-based storage engines

•  Core contribution: add semantics to JSON documents

•  W3C Recommendation: https://www.w3.org/TR/json-ld/
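
The "minimal changes" point can be illustrated by adding a `@context` to ordinary JSON so that its keys resolve to IRIs. The helper `expand_terms` below is a hypothetical, heavily simplified sketch (top-level keys only, no compact IRIs or `@vocab`); the context IRIs are taken from the CEDAR instance example in this talk. A conforming JSON-LD processor does considerably more, for instance turning each value into an array of `@value` objects during expansion.

```python
def expand_terms(doc: dict) -> dict:
    """Replace top-level keys with the IRIs their @context assigns (simplified).

    Keys that are neither JSON-LD keywords nor mapped terms are dropped,
    mirroring how JSON-LD expansion drops properties that do not map to an IRI.
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key.startswith("@"):
            expanded[key] = value           # keywords such as @id, @type pass through
        elif key in context:
            expanded[context[key]] = value  # term -> IRI
    return expanded


# Ordinary JSON ...
plain = {"name": "Stanford", "zip": "94305"}

# ... becomes Linked Data once a @context maps its keys to IRIs.
doc = {"@context": {"name": "https://schema.org/name",
                    "zip": "https://schema.org/postalCode"}, **plain}

print(expand_terms(doc))
# {'https://schema.org/name': 'Stanford', 'https://schema.org/postalCode': '94305'}
```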

Using JSON Schema to Define Template Structure

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "@type": "https://repo.metadatacenter.org/core/Template",
  "@id": "https://repo.metadatacenter.org/templates/434334",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": { "title": {...}, "description": {...}, "principalInvestigator": {...} },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}

Using JSON-LD to add Semantics to Metadata Instances

{
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "@value": "Dr. P.I" },
    "institution": { "name": { "@value": "Stanford" }, "zip": { "@value": "94305" } }
  }
}
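
Because every value in such an instance sits in a `{"@value": ...}` wrapper, the instance is straightforward to flatten, which is one way to serve the "easily indexed" goal. The hypothetical `value_paths` function below walks the instance shown above and yields dotted paths that could be fed to a search index; it is an illustration, not CEDAR's indexing code.

```python
from typing import Any, Iterator, Tuple


def value_paths(node: dict, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted path, value) for every {"@value": ...} leaf (hypothetical helper)."""
    for key, child in node.items():
        if key.startswith("@") or not isinstance(child, dict):
            continue  # skip JSON-LD keywords (@context, @id, @type) and non-objects
        path = f"{prefix}.{key}" if prefix else key
        if "@value" in child:
            yield path, child["@value"]
        else:
            yield from value_paths(child, path)


instance = {
    "title": {"@value": "Immune biomarkers study"},
    "description": {"@value": "Immune biomarkers …"},
    "principalInvestigator": {
        "name": {"@value": "Dr. P.I"},
        "institution": {"name": {"@value": "Stanford"}, "zip": {"@value": "94305"}},
    },
}

index = dict(value_paths(instance))
print(index["principalInvestigator.institution.zip"])  # 94305
```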

Using JSON-LD to add Semantics to Metadata Instances - II

{
  "@type": "http://semantic-dicom.org/dcm#Study",
  "@id": "https://repo.metadatacenter.org/template_instances/55417",
  "@context": {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution"
  },
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "@type": "https://schema.org/Person",
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "@type": "https://schema.org/Organization",
      "@id": "https://repo.metadatacenter.org/template_elements/37",
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}

CEDAR Metadata Instances can be transformed to an RDF Graph

[Figure: RDF graph of the metadata instance, with these nodes and edges]
tinstances:55417 rdf:type dcm:Study ; schema:name "Immune biomarkers study" ; schema:description "Immune biomarkers …" ; myschema:hasPI telements:557 .
telements:557 rdf:type schema:Person ; schema:name "Dr. P.I." ; myschema:hasInstitution telements:37 .
telements:37 rdf:type schema:Organization ; schema:name "Stanford" ; schema:postalCode "94305" .
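
The graph above can be approximated without any RDF library: treat each object's `@id` as a subject, its `@type` as an `rdf:type` triple, and each `@context` term as a predicate. The `to_triples` function below is a hypothetical sketch that handles only the shapes used in this example (one shared top-level `@context`, `@value` leaves, nested elements with their own `@id`); a conforming JSON-LD processor should be used for real conversion. The context entries come from the instance example in this talk, with `principalInvestigator` (the key the instance uses) mapped to the hasPI property so that the PI link is produced.

```python
from typing import List, Tuple

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def to_triples(node: dict, context: dict) -> List[Tuple[str, str, str]]:
    """Convert one node of a simplified CEDAR-style instance into triples.

    IRIs are written as <...> and literals as "..." (no escaping), so each
    printed line approximates N-Triples for these simple values.
    """
    subject = f"<{node['@id']}>"
    triples = [(subject, RDF_TYPE, f"<{node['@type']}>")]
    for key, child in node.items():
        if key.startswith("@") or key not in context:
            continue  # JSON-LD keywords and unmapped terms yield no triple
        predicate = f"<{context[key]}>"
        if "@value" in child:
            triples.append((subject, predicate, f'"{child["@value"]}"'))
        else:
            # Nested template element: link to it, then describe it recursively.
            triples.append((subject, predicate, f"<{child['@id']}>"))
            triples.extend(to_triples(child, context))
    return triples


REPO = "https://repo.metadatacenter.org"
instance = {
    "@type": "http://semantic-dicom.org/dcm#Study",
    "@id": f"{REPO}/template_instances/55417",
    "@context": {
        "title": "https://schema.org/title",
        "name": "https://schema.org/name",
        "description": "https://schema.org/description",
        "zip": "https://schema.org/postalCode",
        "principalInvestigator": "https://myschema.org/property/hasPI",
        "institution": "https://myschema.org/property/hasInstitution",
    },
    "title": {"@value": "Immune biomarkers study"},
    "description": {"@value": "Immune biomarkers …"},
    "principalInvestigator": {
        "@type": "https://schema.org/Person",
        "@id": f"{REPO}/template_elements/557",
        "name": {"@value": "Dr. P.I"},
        "institution": {
            "@type": "https://schema.org/Organization",
            "@id": f"{REPO}/template_elements/37",
            "name": {"@value": "Stanford"},
            "zip": {"@value": "94305"},
        },
    },
}

triples = to_triples(instance, instance["@context"])
for s, p, o in triples:
    print(s, p, o, ".")
```

The sketch yields ten triples: three `rdf:type` edges, two links between nodes, and five literal-valued properties.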

Model drives CEDAR Workbench

[Figure: CEDAR Template Model and controlled terminologies]

Template Designer provides Template Creation

Metadata Editor automatically generates an Acquisition Interface
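
The talk does not describe how the Metadata Editor derives its interface, so the following is only a hypothetical sketch of the general idea: walk a template's JSON Schema and emit one input per leaf field, marking those listed in `required`. The template below, its `VALUE` convention for fields, and the `form_fields` function are illustrative assumptions, not CEDAR's code.

```python
from typing import List, Tuple

# Hypothetical, simplified template: each field is an object holding "@value".
VALUE = {"type": "object", "properties": {"@value": {"type": "string"}}}

template = {
    "type": "object",
    "properties": {
        "title": VALUE,
        "description": VALUE,
        "principalInvestigator": {
            "type": "object",
            "properties": {
                "name": VALUE,
                "institution": {
                    "type": "object",
                    "properties": {"name": VALUE, "zip": VALUE},
                    "required": ["name"],
                },
            },
            "required": ["name", "institution"],
        },
    },
    "required": ["title", "description", "principalInvestigator"],
}


def form_fields(schema: dict, prefix: str = "") -> List[Tuple[str, bool]]:
    """List (dotted path, required?) for each input an editor would render.

    A property whose schema contains "@value" is treated as a leaf field; any
    other object property is treated as a nested template element. "Required"
    reflects only the immediate parent's `required` list.
    """
    fields: List[Tuple[str, bool]] = []
    required = set(schema.get("required", []))
    for name, sub in schema.get("properties", {}).items():
        path = prefix + name
        if "@value" in sub.get("properties", {}):
            fields.append((path, name in required))
        else:
            fields.extend(form_fields(sub, path + "."))
    return fields


for path, is_required in form_fields(template):
    print(("* " if is_required else "  ") + path)
```

Driving the form from the template rather than hand-coding it means a change to the template changes the acquisition interface with no extra UI work.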

Metadata Editor Adds Semantics

Initial Results

•  Public alpha release in September 2016
•  Represented all public metadata in the ImmPort repository (146 studies)
•  Represented an array of public ISA-created biomedical studies (~300)
•  Represented 60k ISO 11179-based Common Data Elements from NCI
•  Currently working with the Stanford Digital Repository and several research groups

Summary

•  We have developed a standards-based template model for representing, publishing, and sharing templates and metadata

•  Provides strong interoperation with Linked Open Data

•  Metadata are easy to create and consume using off-the-shelf tools

•  Very easy to work with using CEDAR tools

CEDAR Resources

•  Web site: http://metadatacenter.org
•  Workbench: https://cedar.metadatacenter.net
•  GitHub: https://metadatacenter.github.io