DESCRIPTION
This presentation gives an overview of the dataset description specification developed in the Open PHACTS project (http://www.openphacts.org/). The specification was driven by a real need within the project to track the datasets it uses. The presentation details the dataset metadata captured and the vocabularies used to model it, together with the tools developed to support the specification's uptake. Over the last 12 months, the W3C Health Care and Life Sciences Interest Group (HCLS IG) has been developing a community profile for dataset descriptions, drawing on the ideas developed in the Open PHACTS specification. A brief overview of the forthcoming community profile is also given. This presentation was given to the Network Data Exchange (NDEx) project (http://www.ndexbio.org/) on 2 April 2014.
Dataset Descriptions in Open PHACTS and W3C HCLS IG
Alasdair J G Gray, Heriot-Watt University
www.alasdairjggray.co.uk
NDEx Call, April 2014
[Figure: Open PHACTS platform architecture, shown in a full and a simplified version. Labels: Public Content, Commercial, Public Ontologies, User Annotations; data sources marked RDF, Nanopub and Db, each with a VoID label; Core Platform: Data Cache (Virtuoso Triple Store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Identifier Management Service, Chemistry Registration, Normalisation & Q/C, Indexing; example inputs P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”; Apps.]
[Figure: ChEMBL versions, January 2012. Labels: ChEMBL, ChEMBL-RDF, Chem2Bio2RDF, SD, Apps; versions v13, v12, v2 or v8.]
ChemSpider
• Data aggregator: over 400 sources
  – What data does it contain?
  – What version of ?? did they load?
  – When are new versions loaded?
• OPS data covers
  – ChEBI
  – ChEMBL
  – DrugBank
2 April 2014 OPS Dataset Descriptions – A. J. G. Gray 5
Metadata Challenges
• Datasets available
  – In many versions over time
  – In different formats
  – From many mirrors/registries
• Datasets build on each other
• Files do not carry metadata
• Registries
  – Can be out-of-date
  – Can contain conflicting information
Users require data provenance!
Description Model
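As a rough illustration of what a dataset description can look like, the sketch below emits a small Turtle document for one dataset release. It uses terms from the VoID, Dublin Core Terms and PAV vocabularies, but the choice of properties here is an assumption made for illustration, not the property list required by the Open PHACTS specification. The function name `describe_dataset`, the `example.org` URIs, and the version and date values are all hypothetical.

```python
from datetime import date

# Namespaces of the published vocabularies used in this sketch.
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_dataset(uri, title, version, issued, license_uri, dump_uri):
    """Return a Turtle string describing one dataset release.

    The selected properties are illustrative assumptions, not the
    official Open PHACTS or HCLS requirements.
    """
    # Escape backslashes and double quotes so the title is a valid literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}> a void:Dataset ;")
    lines.append(f'    dcterms:title "{escaped}"@en ;')
    lines.append(f'    pav:version "{version}" ;')
    lines.append(f'    dcterms:issued "{issued.isoformat()}"^^xsd:date ;')
    lines.append(f"    dcterms:license <{license_uri}> ;")
    lines.append(f"    void:dataDump <{dump_uri}> .")
    return "\n".join(lines)


# Hypothetical example values; none of these URIs or dates come from the slides.
example = describe_dataset(
    uri="http://example.org/dataset/chembl-rdf/13",
    title="ChEMBL-RDF",
    version="13",
    issued=date(2012, 1, 1),
    license_uri="http://example.org/licence",
    dump_uri="http://example.org/dumps/chembl-rdf-13.ttl",
)
print(example)
```

Recording the version and issue date alongside the data dump is what lets a consumer answer questions such as "which version did they load?" without relying on an external registry.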
Realisation of Dataset Descriptions
• Needs to be incorporated into data publishing pipeline
• Hard for publishers to provide conformant descriptions
  – Datasets are complex
  – Evolve over time
  – Seen as yet another burden
VoID Editor
Validator
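The idea of checking a description for conformance can be sketched as follows. This is not the Open PHACTS validator; it is a minimal stand-in that assumes a description has already been parsed into a dictionary mapping prefixed property names to lists of values. The `REQUIRED` list and the `validate` function are hypothetical; the actual specification defines its own required properties and cardinalities.

```python
# Hypothetical set of properties that must appear exactly once.
# The real specification defines its own list.
REQUIRED = ("dcterms:title", "dcterms:license", "pav:version", "dcterms:issued")


def validate(description):
    """Return a list of problems, in the order of REQUIRED.

    `description` maps prefixed property names to lists of values.
    An empty list means the description passed these checks.
    """
    problems = []
    for prop in REQUIRED:
        values = description.get(prop, [])
        if not values:
            problems.append(f"missing {prop}")
        elif len(values) > 1:
            problems.append(f"multiple values for {prop}")
    return problems


# Illustrative input: the licence and issue date are absent, and two
# conflicting versions are given.
incomplete = {
    "dcterms:title": ["ChEMBL"],
    "pav:version": ["12", "13"],
}
for problem in validate(incomplete):
    print(problem)
```

Reporting every problem at once, rather than stopping at the first, suits a publisher who wants to fix a description in a single pass.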
W3C HCLS Group
HCLS Community Profile Model
Future Vision
Metadata: Write once, use many times
• Provide rich and accurate provenance trail of data
  – Automatic pipeline from VoID file to registries
• Align Open PHACTS with W3C HCLS
  – Update tools for HCLS profile