An Open Repository Model for Acquiring Knowledge about Scientific Experiments
EKAW 2016 – November 21st, 2016, Bologna, Italy
Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen
Stanford University, Stanford, CA, USA
Stanford University metadatacenter.org
Reproducibility Problem in Science
Metadata Key to Addressing Problem
• Crucial for reproducibility in biomedicine
– Locate experimental datasets online
– Understand how the experiments were performed
– Reuse the data to perform new analyses
• Journals and funding agencies increasingly require making experimental data and metadata available
Many Metadata Standards have been Developed
However: Metadata Submission is Hard
Metadata
Summary Data Matrix
Raw Data
Submission Interface
Metadata Submission is Hard - II
age Age AGE `Age
age (after birth) age (in years)
age (y) age (year) age (years) Age (years) Age (Years)
age (yr) age (yr-old)
age (yrs) Age (yrs)
age [y] age [year] age [years] age in years
age of patient Age of patient age of subjects
age(years) Age(years) Age(yrs.) Age, year age, years
age, yrs age.year
age_years
Result: Poor Metadata
Variants of ‘age’ metadata field in Gene Expression Omnibus (GEO) repository
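To illustrate why free-text field names make datasets hard to find, the following is a minimal sketch that tries to recognize some of the 'age' variants listed above. The function names and the regular expression are assumptions made for this illustration; they are not part of GEO or CEDAR, and treating a bare "age" as "age in years" is itself a guess.

```python
import re

# Hypothetical pattern for this sketch: "age", optionally followed by "in"
# and a unit spelling, once punctuation, spaces, and case are removed.
AGE_PATTERN = re.compile(r"^age(?:(?:in)?(?:years|year|yrs|yrold|yr|y))?$")


def normalize_field_name(name: str) -> str:
    """Lowercase the name and drop everything that is not a letter."""
    return re.sub(r"[^a-z]", "", name.lower())


def is_age_in_years(name: str) -> bool:
    """Return True if the field name looks like an 'age in years' variant."""
    return bool(AGE_PATTERN.match(normalize_field_name(name)))


if __name__ == "__main__":
    for field in ["Age (yrs)", "age_years", "age [y]", "age of patient"]:
        print(field, "->", is_age_in_years(field))
```

Even this small pattern misses variants such as "age of patient" and "age (after birth)", which is the kind of after-the-fact cleanup that structured templates are meant to make unnecessary.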
Our Solution: CEDAR - A Metadata Ecosystem
• Overcome the impediments to creating high-quality metadata
• Facilitate
– Creation
– Acquisition
– Use
– Evaluation
– Refinement
• Key goal: create a sharable metadata exchange format (a template model) for publishing, searching, and exchanging metadata
CEDAR Template Model Goals
• Must describe composite structure of templates
• Implemented using standard formats
• Express semantics
• Metadata instances:
– Linked to controlled terms
– Easily serializable
– Easily validated
– Easily indexed
– Interchange with RDF
– Highly readable
– Produced/consumed via REST APIs and usable in JavaScript front ends
– Meets FAIR goals
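[Figure: A "Study" metadata template with the fields Title and Description and a Principal Investigator element (Name, Institution), where Institution is itself an element (Name, ZIP). Diagram labels: Metadata Template, Fields, Template Elements; JSON Schema + JSON-LD; JSON-LD.]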
Using JSON Schema and JSON-LD for CEDAR Template Model
What is JSON Schema?
• Technology for describing and validating the structure of JSON documents
• Provides a structural description of any JSON document
• JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas
• Analogous to XML Schema
What is JSON-LD?
• A lightweight syntax to serialize Linked Data in JSON
• Allows existing JSON to be interpreted as Linked Data with minimal changes
• JSON-LD is primarily intended to be a way to:
– use Linked Data in Web-based programming environments
– build interoperable Web services
– store Linked Data in JSON-based storage engines
• Core contribution: add semantics to JSON documents
• W3C Recommendation: https://www.w3.org/TR/json-ld/
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "@type": "https://repo.metadatacenter.org/core/Template",
  "@id": "https://repo.metadatacenter.org/templates/434334",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": {
    "title": {...},
    "description": {...},
    "principalInvestigator": {...}
  },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}
Using JSON Schema to Define Template Structure
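As a rough illustration of what structural validation against such a template means, here is a minimal sketch in plain Python. It checks only `type`, `properties`, `required`, and `additionalProperties: false`, so it is not a full JSON Schema validator. The `validate` function, the `VALUE_FIELD` sub-schema, and the loosely typed `principalInvestigator` entry are assumptions for this example, since the slide elides the property schemas as `{...}`.

```python
from typing import Any, List

# Hypothetical sub-schema for a simple field; the slide elides it as {...}.
VALUE_FIELD = {
    "type": "object",
    "properties": {"@value": {"type": "string"}},
    "required": ["@value"],
    "additionalProperties": False,
}

# Simplified stand-in for the Study template shown on the slide.
STUDY_TEMPLATE = {
    "type": "object",
    "properties": {
        "title": VALUE_FIELD,
        "description": VALUE_FIELD,
        # Nested element kept loose here; its schema is elided in the source.
        "principalInvestigator": {"type": "object"},
    },
    "required": ["title", "description", "principalInvestigator"],
    "additionalProperties": False,
}


def validate(instance: Any, schema: dict, path: str = "$") -> List[str]:
    """Return a list of structural errors; an empty list means valid."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(instance, dict):
            return [f"{path}: expected object"]
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        if schema.get("additionalProperties") is False:
            for name in instance:
                if name not in props:
                    errors.append(f"{path}: unexpected property '{name}'")
        # Recurse into the properties that are present and described.
        for name, sub_schema in props.items():
            if name in instance:
                errors.extend(validate(instance[name], sub_schema, f"{path}.{name}"))
    elif expected == "string" and not isinstance(instance, str):
        errors.append(f"{path}: expected string")
    return errors


good = {
    "title": {"@value": "Immune biomarkers study"},
    "description": {"@value": "Immune biomarkers …"},
    "principalInvestigator": {"name": {"@value": "Dr. P.I"}},
}
bad = {"title": {"@value": "Immune biomarkers study"}, "funder": {"@value": "NIH"}}

if __name__ == "__main__":
    print(validate(good, STUDY_TEMPLATE))
    for error in validate(bad, STUDY_TEMPLATE):
        print(error)
```

In practice an off-the-shelf JSON Schema validator would replace the hand-written `validate` function; the sketch only shows which kinds of mistakes (missing or unexpected properties, wrong value types) a template can catch.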
{
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
Using JSON-LD to add Semantics to Metadata Instances
{
  "@type": "http://semantic-dicom.org/dcm#Study",
  "@id": "https://repo.metadatacenter.org/template_instances/55417",
  "@context": {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution"
  },
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "@type": "https://schema.org/Person",
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "@type": "https://schema.org/Organization",
      "@id": "https://repo.metadatacenter.org/template_elements/37",
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
Using JSON-LD to add Semantics to Metadata Instances - II
CEDAR Metadata Instances can be transformed to an RDF Graph
[Figure: RDF graph of the metadata instance. Resource nodes: tinstances:55417, telements:557, telements:37. Types (rdf:type): dcm:Study, schema:Person, schema:Organization. Literals: "Immune biomarkers study", "Immune biomarkers …", "Dr. P.I.", "Stanford", "94305". Edge labels: schema:name, schema:description, schema:postalCode, myschema:hasPI, myschema:hasInstitution.]
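The transformation from a metadata instance to an RDF graph can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a JSON-LD processor: it handles only the shape of the instance shown earlier (nodes with `@id` and `@type`, literal values wrapped in `@value`, and a flat `@context`). The `to_triples` function is hypothetical, and the context key is written as `principalInvestigator` so that it matches the property name used in the instance.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

CONTEXT = {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution",
}

INSTANCE = {
    "@type": "http://semantic-dicom.org/dcm#Study",
    "@id": "https://repo.metadatacenter.org/template_instances/55417",
    "title": {"@value": "Immune biomarkers study"},
    "description": {"@value": "Immune biomarkers …"},
    "principalInvestigator": {
        "@type": "https://schema.org/Person",
        "@id": "https://repo.metadatacenter.org/template_elements/557",
        "name": {"@value": "Dr. P.I"},
        "institution": {
            "@type": "https://schema.org/Organization",
            "@id": "https://repo.metadatacenter.org/template_elements/37",
            "name": {"@value": "Stanford"},
            "zip": {"@value": "94305"},
        },
    },
}


def to_triples(node: dict, context: Dict[str, str]) -> List[Triple]:
    """Flatten a nested node into (subject, predicate, object) triples."""
    subject = node["@id"]
    triples: List[Triple] = []
    if "@type" in node:
        triples.append((subject, RDF_TYPE, node["@type"]))
    for key, value in node.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords are handled above or ignored.
        predicate = context.get(key)
        if predicate is None:
            continue  # Terms with no mapping in the context are skipped.
        if "@value" in value:
            # Literal value: quote it to distinguish it from an IRI.
            triples.append((subject, predicate, '"' + value["@value"] + '"'))
        else:
            # Nested node: link to it, then emit its own triples.
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value, context))
    return triples


if __name__ == "__main__":
    for s, p, o in to_triples(INSTANCE, CONTEXT):
        print(s, p, o)
```

A real application would use a JSON-LD library for this step; the sketch only shows how the `@context` mappings turn plain JSON property names into the predicates of the graph.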
CEDAR Template Model
Controlled terminologies
Model drives CEDAR Workbench
Template Designer provides Template Creation
Metadata Editor automatically generates an Acquisition Interface
Metadata Editor Adds Semantics
Initial Results
• Public alpha release in September 2016
• Represented all public metadata in ImmPort repository (146 studies)
• Represented an array of public ISA-created biomedical studies (~300)
• Represented 60k ISO 11179-based Common Data Elements from NCI
• Currently working with Stanford Digital Repository and several research groups
Summary
• We have developed a standards-based template model for representing, publishing, and sharing templates and metadata
• Provides strong interoperation with Linked Open Data
• Metadata easy to create/consume using off-the-shelf tools
• Very easy to work with using CEDAR tools
CEDAR Resources
• Web site: http://metadatacenter.org
• Workbench: https://cedar.metadatacenter.net
• GitHub: https://metadatacenter.github.io