
SAIL: Documenting data content and quality, letting the computer take the strain Caroline Brooks Senior Research Analyst, College of Medicine, Swansea


SAIL: Documenting data content and quality, letting the computer take the strain

Caroline Brooks

Senior Research Analyst, College of Medicine, Swansea University

Ann Wrightson

Lead Technical Design Architect, NHS Wales Informatics Service

Hon. Research Associate, College of Medicine, Swansea University


Swansea Health Informatics Research & NWIS

Partners in establishing and sustaining SAIL

Wider collaboration in usability testing and innovation

Sharing skills & thinking around secondary uses of data


Ideas and facts

General approaches in data research:

People have ideas and test them using the available facts

Ideas come from the available facts

But – facts are not so easy to see in the data!

Researchers need help...

Which data resources contain the facts I need?

What do I need to know about this data to use it well?


What’s in this repository, anyway?

Dataset level – catalogue

What/from where/from whom/how collected/rights to use

Record level – dataset entry description

Data model (entity-relationship model)

Item level – field/attribute description

Data types/ranges/controlled terms
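
The three documentation levels above could be held in a simple nested structure. The following is a minimal sketch only; every class and attribute name is hypothetical and is not taken from the SAIL catalogue or from any metadata standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FieldDescription:
    """Item level: one field/attribute (hypothetical structure)."""
    name: str
    data_type: str
    allowed_range: Optional[Tuple[float, float]] = None
    controlled_terms: List[str] = field(default_factory=list)


@dataclass
class RecordDescription:
    """Record level: one entity in the dataset's data model."""
    entity: str
    fields: List[FieldDescription] = field(default_factory=list)
    related_entities: List[str] = field(default_factory=list)


@dataclass
class DatasetEntry:
    """Dataset level: the catalogue entry (what, from where, from whom, how, rights)."""
    title: str
    source: str
    provider: str
    collection_method: str
    usage_rights: str
    records: List[RecordDescription] = field(default_factory=list)

    def field_names(self) -> List[str]:
        """List every documented field as 'entity.field'."""
        return [f"{r.entity}.{f.name}" for r in self.records for f in r.fields]


# Illustrative values only; not a real dataset.
entry = DatasetEntry(
    title="Example admissions dataset",
    source="Example hospital system",
    provider="Example data provider",
    collection_method="Routine administrative collection",
    usage_rights="Approved research projects only",
    records=[
        RecordDescription(
            entity="admission",
            fields=[
                FieldDescription("admission_date", "date"),
                FieldDescription("sex", "code", controlled_terms=["M", "F", "U"]),
            ],
        )
    ],
)
print(entry.field_names())
```

Keeping the item-level descriptions nested inside the record and dataset levels means one object answers both "which dataset holds this fact?" and "what do I need to know about this field?".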


How good is this data? What can it do for me?

Item

Population of this field/attribute – Why present? Why absent?

Significance of this field/attribute – What does it mean for me?

Record

Evidential value of presence &/or absence of particular record

Dataset

What work has already been done with this data?
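
The item-level question of how well a field is populated lends itself to automation. The sketch below is an assumption about how such a check might look, not the SAIL team's method: the function name, the sample rows, and the rule that `None` and the empty string count as absent are all illustrative.

```python
from typing import Dict, Iterable, List, Mapping

# Assumption: these values mean "not populated". Real datasets often
# also use sentinel codes, which would need to be added here.
MISSING = (None, "")


def population_rates(records: Iterable[Mapping], fields: List[str]) -> Dict[str, float]:
    """Return the fraction of records in which each field is populated."""
    rows = list(records)
    total = len(rows)
    rates = {}
    for name in fields:
        populated = sum(1 for row in rows if row.get(name) not in MISSING)
        rates[name] = populated / total if total else 0.0
    return rates


# Made-up sample rows for illustration.
rows = [
    {"id": 1, "sex": "F", "smoking": None},
    {"id": 2, "sex": "M", "smoking": "never"},
    {"id": 3, "sex": "", "smoking": None},
    {"id": 4, "sex": "F"},
]
rates = population_rates(rows, ["sex", "smoking"])
print(rates)
```

A rate like this says how often a field is present; the "why present, why absent" part still needs human knowledge of how the data was collected.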


Work already done – www.saildatabank.org

SAIL Databank website includes human-readable dataset catalogue

Description, source, related publications, data model

Data Quality report (developed by SAIL team in 2013)

Standardized informative documentation for each dataset

Produced by automated analysis of data, published as PDF

Working with Canadian colleagues (MCHP and Pop Data BC)

Technology refresh of SAIL platform (CIPHER project – 2013-14)


Work in progress

Machine-readable format for catalogue and data quality information

Data Documentation Initiative (DDI) format

Initial target: publish on website as download link in catalogue

Making outcomes of in-depth data quality work available for reuse

Algorithms that instantiate clinical & social research concepts

Evaluation of data coverage across populations of individuals

Knowledge sharing with NWIS data warehouse team
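
One way to read "data coverage across populations of individuals" is: for each population group, what share of individuals has at least one record in a given dataset? The sketch below works under that assumption only. The function name, the identifiers, and the age-band grouping are hypothetical, and a real evaluation would use anonymised linkage keys and a defined population denominator.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def coverage_by_group(population: Mapping[str, str], record_ids: Iterable[str]) -> Dict[str, float]:
    """Fraction of individuals in each group with at least one record.

    population maps a person identifier to a group label;
    record_ids lists the person identifier on each dataset record.
    """
    seen = set(record_ids)  # repeated records for one person count once
    totals = defaultdict(int)
    covered = defaultdict(int)
    for person_id, group in population.items():
        totals[group] += 1
        if person_id in seen:
            covered[group] += 1
    return {group: covered[group] / totals[group] for group in totals}


# Made-up identifiers and age bands for illustration.
population = {"p1": "0-17", "p2": "0-17", "p3": "18-64", "p4": "18-64", "p5": "65+"}
record_ids = ["p1", "p3", "p4", "p3"]
coverage = coverage_by_group(population, record_ids)
print(coverage)
```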


Future directions

Further work on characterizing concepts in data – reproducible, reusable

How to make good use of SNOMED CT in source data

New knowledge & skills needed, also issues with old/new data

NWIS also working on this, another good area for collaboration

More general use of knowledge models alongside data

Comprehensive & integrated metadata reference architecture

Data annotation, e.g. using biomedical science ontologies


Thank you for your attention