SAIL: Documenting data content and quality, letting the computer take the strain
Caroline Brooks
Senior Research Analyst, College of Medicine, Swansea University
Ann Wrightson
Lead Technical Design Architect, NHS Wales Informatics Service
Hon. Research Associate, College of Medicine, Swansea University
Swansea Health Informatics Research & NWIS
Partners in establishing and sustaining SAIL
Wider collaboration in usability testing and innovation
Sharing skills & thinking around secondary uses of data
Ideas and facts
General approaches in data research:
People have ideas and test them using the available facts
Ideas come from the available facts
But – facts are not so easy to see in the data!
Researchers need help...
Which data resources contain the facts I need?
What do I need to know about this data to use it well?
What’s in this repository, anyway?
Dataset level – catalogue
What/from where/from whom/how collected/rights to use
Record level – dataset entry description
Data model (entity-relationship model)
Item level – field/attribute description
Data types/ranges/controlled terms
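The item-level description above (data types, ranges, controlled terms) can be captured in a machine-checkable form. The following is a minimal Python sketch under assumed conventions: the `FieldDescription` class, its attributes and the example fields `age_years` and `status` are hypothetical illustrations, not part of SAIL's own documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDescription:
    """Item-level metadata for one field/attribute (hypothetical structure)."""
    name: str
    data_type: type                                # expected Python type of values
    min_value: Optional[float] = None              # inclusive lower bound, if any
    max_value: Optional[float] = None              # inclusive upper bound, if any
    controlled_terms: Optional[frozenset] = None   # allowed codes, if any

    def problems(self, value) -> list:
        """Return a list of reasons why `value` does not fit this description."""
        if value is None:
            return []  # absence is reported separately, not as an invalid value
        if not isinstance(value, self.data_type):
            return [f"{self.name}: expected {self.data_type.__name__}, "
                    f"got {type(value).__name__}"]
        issues = []
        if self.min_value is not None and value < self.min_value:
            issues.append(f"{self.name}: {value!r} below minimum {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            issues.append(f"{self.name}: {value!r} above maximum {self.max_value}")
        if self.controlled_terms is not None and value not in self.controlled_terms:
            issues.append(f"{self.name}: {value!r} not a controlled term")
        return issues


# Hypothetical example fields, for illustration only.
age = FieldDescription("age_years", int, min_value=0, max_value=120)
status = FieldDescription("status", str, controlled_terms=frozenset({"A", "B", "C"}))

print(age.problems(130))
print(status.problems("Z"))
```

Keeping the description as data (rather than as prose in a PDF) means the same object can drive both human-readable documentation and automated checks.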
How good is this data? What can it do for me?
Item
Population of this field/attribute – Why present? Why absent?
Significance of this field/attribute – What does it mean for me?
Record
Evidential value of the presence &/or absence of a particular record
Dataset
What work has already been done with this data?
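One of the questions above, how well a field is populated, lends itself to automated measurement. A minimal sketch, assuming records are available as Python dictionaries with missing values stored as `None` or an empty string; the functions and the field names `year` and `smoking_status` are hypothetical. Breaking completeness down by a second field (here, year) is one way to start asking why a value is present or absent, though it does not answer that question by itself.

```python
from collections import defaultdict


def completeness(records, field):
    """Fraction of records in which `field` is present (not None and not empty)."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def completeness_by_group(records, field, group_field):
    """Completeness of `field` within each distinct value of `group_field`."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(group_field)].append(r)
    return {g: completeness(rs, field) for g, rs in groups.items()}


# Hypothetical toy records: the field is recorded more often in the later year.
records = [
    {"year": 2010, "smoking_status": None},
    {"year": 2010, "smoking_status": "X"},
    {"year": 2012, "smoking_status": "Y"},
    {"year": 2012, "smoking_status": "X"},
]
print(completeness(records, "smoking_status"))                    # 0.75
print(completeness_by_group(records, "smoking_status", "year"))   # {2010: 0.5, 2012: 1.0}
```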
Work already done – www.saildatabank.org
SAIL databank website includes a human-readable dataset catalogue
Description, source, related publications, data model
Data Quality report (developed by the SAIL team in 2013)
Standardized informative documentation for each dataset
Produced by automated analysis of data, published as PDF
Working with Canadian colleagues (MCHP and Pop Data BC)
Technology refresh of the SAIL platform (CIPHER project, 2013–14)
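The Data Quality report is described as standardized documentation produced by automated analysis of each dataset. The sketch below illustrates only the general idea: the same fixed-layout summary (missing values, distinct values, range) is generated for every field of any dataset. It is not the SAIL team's implementation; it produces plain text rather than PDF, and the function names and example fields are hypothetical.

```python
def profile_field(values):
    """Summary statistics for one field: counts, distinct values, min and max."""
    present = [v for v in values if v is not None]
    summary = {
        "n": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": None,
        "max": None,
    }
    if present:
        try:
            summary["min"] = min(present)
            summary["max"] = max(present)
        except TypeError:
            pass  # mixed, non-comparable types: leave min/max unset
    return summary


def quality_report(dataset_name, records):
    """Render the same fixed-layout text report for any list-of-dicts dataset."""
    fields = sorted({k for r in records for k in r})
    lines = [f"Data quality report: {dataset_name}", f"Records: {len(records)}"]
    for f in fields:
        s = profile_field([r.get(f) for r in records])
        lines.append(
            f"{f}: missing {s['missing']}/{s['n']}, distinct {s['distinct']}, "
            f"range {s['min']}..{s['max']}"
        )
    return "\n".join(lines)


# Hypothetical toy dataset.
records = [
    {"id": 1, "weight_kg": 70.5},
    {"id": 2, "weight_kg": None},
    {"id": 3, "weight_kg": 82.0},
]
print(quality_report("example_dataset", records))
```

Because every dataset passes through the same code, the reports are directly comparable across datasets, which is the main benefit of "letting the computer take the strain".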
Work in progress
Machine-readable format for catalogue and data quality information
Data Documentation Initiative (DDI) format
Initial target: publish as a download link in the website catalogue
Making outcomes of in-depth data quality work available for reuse
Algorithms that instantiate clinical & social research concepts
Evaluation of data coverage across populations of individuals
Knowledge sharing with the NWIS data warehouse team
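As a rough illustration of a machine-readable catalogue entry, this Python sketch uses the standard library's `xml.etree.ElementTree` to emit XML whose element names (`codeBook`, `stdyDscr`, `titl`, `dataDscr`, `var`, `labl`, `sumStat`) are modelled on the DDI Codebook format. It is a simplified sketch under stated assumptions: it omits the DDI namespace and many required elements, and it has not been validated against the DDI schema. The function name and the structure of `field_stats` are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_ddi_like_xml(dataset_name, field_stats):
    """Build a simplified, DDI Codebook-style XML document (not schema-validated)."""
    code_book = ET.Element("codeBook")

    # Study description: here only the dataset title.
    stdy = ET.SubElement(code_book, "stdyDscr")
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = dataset_name

    # Data description: one <var> per field, with a label and
    # counts of valid ("vald") and invalid/missing ("invd") values.
    data_dscr = ET.SubElement(code_book, "dataDscr")
    for name, stats in field_stats.items():
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = stats["label"]
        ET.SubElement(var, "sumStat", type="vald").text = str(stats["valid"])
        ET.SubElement(var, "sumStat", type="invd").text = str(stats["missing"])

    return ET.tostring(code_book, encoding="unicode")


# Hypothetical statistics, e.g. taken from an automated quality report.
field_stats = {"weight_kg": {"label": "Body weight (kg)", "valid": 2, "missing": 1}}
xml_text = to_ddi_like_xml("example_dataset", field_stats)
print(xml_text)
```

The resulting string could, for example, be saved as a file and offered as a download link alongside the catalogue entry; a production DDI export would start from the published schema rather than from a hand-picked subset of element names.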
Future directions
Further work on characterizing concepts in data – reproducible, reusable
How to make good use of SNOMED CT in source data
New knowledge & skills needed; also issues with old/new data
NWIS is also working on this: another good area for collaboration
More general use of knowledge models alongside data
Comprehensive & integrated metadata reference architecture
Data annotation, e.g. using biomedical science ontologies
Thank you for your attention