@gray_alasdair www.macs.hw.ac.uk/~ajg33
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
Alasdair J G Gray, Heriot-Watt University, www.macs.hw.ac.uk/~ajg33, @gray_alasdair
Michel Dumontier, Stanford University
M. Scott Marshall, MAASTRO Clinic
30/11/2016
Open PHACTS Example
[Figure: Open PHACTS architecture. Core Platform components: Data Cache (Triple Store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Identifier Management Service. Example inputs: “Adenosine receptor 2a”, EC2.43.4, CS4532, P12374. Data sources: ChEMBL-RDF, ChEMBL, Chem2Bio2RDF, SD. Version labels: v13, v12, “v2 or v8”.]
Which ChEMBL version?
Open PHACTS Example
Open PHACTS Discovery Platform
Historic use case: ~January 2012
Open PHACTS v2.1: ChEMBL 20
http://tiny.cc/ops-datasets
Which ChEMBL version?
Challenges
• Datasets available
  – In many versions over time
  – In different formats
  – From many mirrors/registries
• Datasets build on each other
• Files do not carry metadata
• Registries
  – Can be out-of-date
  – Can contain conflicting information
Scientists require data provenance!
Dublin Core Metadata Initiative
Widely used
Broadly applicable
  – Documents
  – Datasets
✗ Generic terms
✗ Not comprehensive
✗ No required properties
“Date: A point or period of time associated with an event in the lifecycle of the resource.”
VoID: Vocabulary of Interlinked Datasets
Metadata carried with data
  – Directly embedded: void:inDataset
✗ No versioning
✗ No checklist of requisite fields
✗ Only for RDF data
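As a minimal sketch of what "directly embedded" can look like, the snippet below builds a single N-Triples statement that links an RDF document to the dataset it belongs to via void:inDataset. The function name and the example.org IRIs are hypothetical and only illustrate the idea; the void: namespace IRI is the one published for the vocabulary.

```python
# Namespace of the VoID vocabulary.
VOID = "http://rdfs.org/ns/void#"


def embed_dataset_link(document_iri, dataset_iri):
    """Return one N-Triples line stating that the document is part of the dataset.

    Hypothetical helper: the returned line is meant to be appended to the
    RDF file so that the file carries a pointer to its own dataset description.
    """
    return f"<{document_iri}> <{VOID}inDataset> <{dataset_iri}> ."


# Illustrative IRIs only (example.org), not real dataset locations.
line = embed_dataset_link(
    "http://example.org/data/file1.nt",
    "http://example.org/dataset",
)
print(line)
```

The link tells a consumer which dataset a file came from, but on its own it says nothing about which version of that dataset, which is the gap noted on the slide.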
DCAT: Data Catalog
Separates Dataset and Distribution
✗ No versioning
✗ No prescribed properties
W3C HCLS Group
HCLS Dataset Descriptions
61 Metadata properties from 18 vocabularies
5 Modules: Core, Identifiers, Provenance, Distributions, Stats
Prescribed Usage

Element            | Property        | Value                             | Summary Level | Version Level | Distribution Level
Core Metadata
Type declaration   | rdf:type        | dctypes:Dataset                   | MUST          | MUST          | SHOULD
Type declaration   | rdf:type        | void:Dataset or dcat:Distribution | MUST NOT      | MUST NOT      | MUST
Title              | dct:title       | rdf:langString                    | MUST          | MUST          | MUST
Alternative titles | dct:alternative | rdf:langString                    | MAY           | MAY           | MAY
Description        | dct:description | rdf:langString                    | MUST          | MUST          | MUST
…                  | …               | …                                 | …             | …             | …
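The prescribed-usage rows can be read as machine-checkable rules. The sketch below encodes only the five rows shown on this slide and checks a description, represented here as a plain dictionary of property to values, against one level. The `RULES` layout, the `check` function, and the dictionary representation are assumptions made for illustration; this is not the profile's own validator, and it does not check value datatypes such as rdf:langString.

```python
LEVELS = ("summary", "version", "distribution")

# Rows copied from the slide: (property, accepted values, requirement per level).
# accepted=None means any value counts as "present" (datatype is not checked).
RULES = [
    ("rdf:type", {"dctypes:Dataset"}, ("MUST", "MUST", "SHOULD")),
    ("rdf:type", {"void:Dataset", "dcat:Distribution"}, ("MUST NOT", "MUST NOT", "MUST")),
    ("dct:title", None, ("MUST", "MUST", "MUST")),
    ("dct:alternative", None, ("MAY", "MAY", "MAY")),
    ("dct:description", None, ("MUST", "MUST", "MUST")),
]


def check(description, level):
    """Return a list of problems found for the given level (empty if none)."""
    idx = LEVELS.index(level)
    problems = []
    for prop, accepted, requirements in RULES:
        values = set(description.get(prop, ()))
        present = bool(values) if accepted is None else bool(values & accepted)
        label = prop if accepted is None else f"{prop} {' or '.join(sorted(accepted))}"
        requirement = requirements[idx]
        if requirement == "MUST" and not present:
            problems.append(f"error: missing {label}")
        elif requirement == "MUST NOT" and present:
            problems.append(f"error: forbidden {label}")
        elif requirement == "SHOULD" and not present:
            problems.append(f"warning: missing {label}")
        # MAY rows never produce a problem.
    return problems


# Hypothetical summary-level description with two mistakes:
# it is typed as void:Dataset and has no dct:description.
summary = {
    "rdf:type": ["dctypes:Dataset", "void:Dataset"],
    "dct:title": ["Example dataset"],
}
for problem in check(summary, "summary"):
    print(problem)
```

Keeping the rules as data rather than code means further rows of the profile could be added without touching the checking logic.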
ChEMBL: Summary Level
Requires Tooling
• Creation
• Validation
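To give a feel for what a creation tool has to produce, here is a minimal sketch that emits a Turtle description containing only the core summary-level properties that appear in the deck (type declaration, title, description). The function names, the example.org IRI, and the sample text are hypothetical; a description that conforms to the full profile would need more properties than this.

```python
PREFIXES = (
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix dctypes: <http://purl.org/dc/dcmitype/> .\n"
)


def turtle_literal(text, lang):
    """Quote text as a language-tagged Turtle string literal (minimal escaping)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def summary_description(iri, title, description, lang="en"):
    """Build a Turtle document with the three core properties shown on the slides."""
    return (
        f"{PREFIXES}\n"
        f"<{iri}> a dctypes:Dataset ;\n"
        f"    dct:title {turtle_literal(title, lang)} ;\n"
        f"    dct:description {turtle_literal(description, lang)} .\n"
    )


# Illustrative values only; not taken from any real dataset description.
doc = summary_description(
    "http://example.org/dataset",
    "Example dataset",
    'A dataset used to illustrate the "summary level".',
)
print(doc)
```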
Implementations
RDF Platform
More coming…
HCLS Dataset Descriptions
https://www.w3.org/TR/hcls-dataset/
Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
@gray_alasdair