DESCRIPTION
Presentation given at SPEC 2014, Kraków, Poland. 17-22 August 2014 [some slides do not display correctly, download the pdf for better quality]

In our day-to-day practice we collect data, convert this to information, hopefully extract knowledge, and then pass this on to our peers, thereby advancing the global understanding of our field. This is a very linear process. What if we were to share our data? Have others take our information and combine it with their own? Such a branched process would likely result in more rapid discoveries and, potentially, a greater understanding.

In order to facilitate data sharing we must define at least two interfaces with our peers:
1. A mechanism for them to understand the language of our data
2. A mechanism for passing on the context of our experiment

Of course, both of these must work in reverse; we must understand their data and also their experimental context. These are separate yet related ideas. Our data are meaningless without context, but because we are ‘close to the action’ we do not explicitly document that context.

Recording the nature of our experiments can have benefits closer to home. Too often we find ourselves searching for results that we know we recorded, but have difficulty locating. Then there is the issue of recalling the exact experimental procedure involved in the sample preparation or data reduction. Documentation of these will lead to better laboratory practice all round.

Earlier this year, a network of academic, clinical and industrial groups was constituted in the UK, with some international partners, to consider how best to push forward the use of infrared and Raman spectroscopies in the clinical arena: CLIRSPEC [1]. One of the work packages of the CLIRSPEC network is the development of standard protocols for data sharing. The work package falls, initially, into two parts:
1. How to easily and uniformly transfer our data between research teams and, by association, into an accessible archive.
2. How to record the provenance of our samples, the treatments they undergo, the experiments performed on them and the manner in which the resulting data were manipulated: the metadata.

In this presentation we will outline the current position of the CLIRSPEC work package, both in terms of the performance of various candidate data formats (JCAMP-DX, SPC, netCDF, …), and the options for the recording of the metadata associated with the experimental procedure (controlled vocabularies, XML, RDF, ISA-TAB, …). Included here is the concept of a minimum reporting requirement for IR and Raman, particularly in the clinical arena, that we can all try to meet.

None of this can happen without the buy-in of the community. We seek to engage everyone in a dialogue that will result in more consistent, and hopefully better, practice across all laboratories to further our understanding of clinical vibrational spectroscopy.

[1] http://clirspec.org
Citation preview
ALEX HENDERSON & PETER GARDNER MANCHESTER INSTITUTE OF BIOTECHNOLOGY
UNIVERSITY OF MANCHESTER, UK HTTP://GARDNER-LAB.COM & HTTP://CLIRSPEC.ORG
SPEC 2014 Shedding New Light on Disease
Kraków, Poland. 17-22 August 2014
WHAT’S MINE IS YOURS (AND VICE VERSA):
DATA SHARING IN VIBRATIONAL SPECTROSCOPY
Sharing…
Why share?
Technique validation
Round-robins
Standard spectra for unknown identification
Standard operating procedure validation
Test visualisation schemes
Remote location of special samples
Remote location of special equipment
What to share?
Raw data files
E.g. for testing data processing procedures
Metadata for sample preparation
Sample SOP
Metadata for experimental procedure/protocol
Acquisition SOP
Processed data to save doing it yourself
How to give?
Pen drive
CD
Dropbox
ftp server
Data repository
One-to-one
One-to-few
One-to-more
One-to-all (best solution)
How to receive?
Data in different file formats introduce a barrier for the end user
Disconnect between analysis software and file format
Incorrectly or poorly coded formats require additional information
(Hyper)spectral data disconnected from sample treatments or acquisition protocols
Third-party data analysis suites
Package       Author                     Platform
CytoSpec      Peter Lasch                MATLAB
hyperSpec     Claudia Beleites           R
ProSpect      Paul Bassan                MATLAB
SpecToolbox   Matt Baker (and friends)   MATLAB
…
Not an exhaustive list, email me your package info
Author must write import filter for each version of each vendor’s formats
Writing import filters
Slow
Laborious
Steep learning curve
Potential for error
Incomplete filter without sufficient test data
No access to file format specification/detail
IP issues with proprietary formats (NDA)
Some limited to (32-bit) Windows (e.g. DLL or DDE)
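The per-format burden described above can be sketched as a small dispatch table: one filter function per file format, selected by file extension. This is only an illustration, not the design of any package listed earlier. The names `read_csv_spectrum`, `FILTERS` and `import_spectrum` are hypothetical, and the only filter implemented is for a two-column text file.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Spectrum = Tuple[List[float], List[float]]  # (x values, y values)


def read_csv_spectrum(text: str) -> Spectrum:
    """Hypothetical filter: two numeric columns (x then y), no header row."""
    xs: List[float] = []
    ys: List[float] = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        xs.append(float(row[0]))
        ys.append(float(row[1]))
    return xs, ys


# One entry per supported format. Each vendor format (and each version of it)
# would need its own function here, which is where the effort goes.
FILTERS: Dict[str, Callable[[str], Spectrum]] = {
    ".csv": read_csv_spectrum,
}


def import_spectrum(filename: str, text: str) -> Spectrum:
    """Choose a filter from the file extension, or fail loudly."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    if ext not in FILTERS:
        raise ValueError(f"no import filter for {ext!r}")
    return FILTERS[ext](text)
```

With a single agreed transfer format, the table would need only one entry per analysis package rather than one per vendor format and version.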
Objectives 2014 – 2017
Developing
Understanding of interaction of light with clinical samples
Strategies for pre-processing and statistical analysis in clinical spectroscopy
Protocols
Preparation of cells, tissue and biofluids for clinical spectroscopy
Inter-group data sharing
Evidence
Power of spectroscopy for use in the clinical arena
Requirements of instrumentation suitable for use in the clinic
Clinical Infrared and Raman Spectroscopy for Medical Diagnosis
PARTNERS
ACADEMIC
Peter Gardner
Matthew J Baker
Nicholas Stone
Julian Moger
Josep Sulé-Suso
Francis Martin
Sergei G Kazarian
Hugh J Byrne
Roy Goodacre
John M Chalmers
Alex Henderson
Peter Lasch
Ganesh Sockalingum
Bayden Wood
Peter Weightman
Gianfelice Cinque
Peter Rich
CLINICAL
Noel Clarke
Jonathan Shanks
Timothy Dawson
Charles Davis
Pierre Martin-Hirsch
Hugh Barr
Neil Shepherd
John McGrath
Jim Brown
Sam Janes
INDUSTRIAL
Agilent
Bruker
Cobalt Light Systems
Coherent UK
Perkin Elmer
Renishaw
@clirspec http://clirspec.org/
CLIRSPEC Work Package 6
Assess current spectral and image data attributes from the range of currently employed network instrumentation
Develop a standard data transfer format to allow free and easy dissemination of data between network members enhancing collaboration and efficiency of research funding
Provide a single software target, easing the development of third party software and its uptake within the clinical arena
Investigate the utility of standard spectra for specific diseases
Investigate the technological, cultural, ethical and IP issues in order to enable data sharing and reuse
Data format requirements
Operating system neutral
Scalable to large file sizes (futureproof)
Random access (don’t unzip before reading)
File format description available (NDA open)
Other software available that can read it
Quick to write and, more importantly, quick to read
Able to hold (encrypted) instrumental parameters
Enables round-tripping, no information loss
…
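The round-tripping requirement can be checked mechanically: write a record out, read it back, and compare. The sketch below uses JSON from the Python standard library purely as a stand-in, since it is not one of the candidate formats discussed in this talk. The record layout, field names and values are hypothetical.

```python
import json


def save_record(record: dict) -> str:
    """Serialise a record to text (JSON is a stand-in for the real format)."""
    return json.dumps(record, sort_keys=True)


def load_record(text: str) -> dict:
    """Read a record back from text."""
    return json.loads(text)


def round_trips(record: dict) -> bool:
    """True if saving then loading returns an identical record."""
    return load_record(save_record(record)) == record


# Hypothetical record: a short spectrum plus some instrumental parameters.
record = {
    "wavenumber_cm-1": [1000.0, 1002.0, 1004.0],
    "absorbance": [0.12, 0.15, 0.11],
    "instrument": {"detector": "MCT", "resolution_cm-1": 4},
}

print(round_trips(record))               # lists, numbers and strings survive
print(round_trips({"shape": (64, 64)}))  # a tuple comes back as a list
```

The second call shows the kind of silent information loss the requirement is meant to rule out: the values survive but their type does not.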
Open data formats – Spectra
JCAMP-DX
Over 4 compression systems
Some code available
Grams SPC
Understands spectroscopy types and units
Some import filters available
CSV/text
Simple to read
Not scalable
Not suitable for images
Loss of metadata
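As an illustration of the metadata that plain CSV loses, here is a minimal sketch that collects the `##LABEL=value` header records from a JCAMP-DX text file. It is deliberately incomplete: it stops at the data table and makes no attempt to decode the data or any of its compressed forms. The function name `read_jcamp_header` and the sample file contents are hypothetical.

```python
def read_jcamp_header(text: str) -> dict:
    """Collect '##LABEL=value' records that appear before the data table."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # not a labelled record
        label, value = line[2:].split("=", 1)
        label = label.strip().upper()
        if label in ("XYDATA", "END"):
            break  # header is finished; the data table is not parsed here
        header[label] = value.strip()
    return header


# Hypothetical file contents, for illustration only.
sample = """##TITLE=demo spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=ABSORBANCE
##NPOINTS=3
##XYDATA=(X++(Y..Y))
1000 0.12 0.15 0.11
##END=
"""

header = read_jcamp_header(sample)
print(header["XUNITS"], header["YUNITS"])
```

The units and data type travel with the file, which is exactly the information a bare two-column text export drops.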
Hyperspectral images
Grams SPC
Pixel indexing issues, needs help
ENVI
Manual spectrum-centric or image-centric access
May require IDL library
NetCDF-4
Self-describing, accessed via libraries
Compression and streaming available
3D confocal and tomographic
NetCDF-4
Unlimited dimensionality
Optimised spectrum-centric or image-centric access through ‘chunking’
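The trade-off behind chunking can be illustrated without any file library by counting how many chunks a read has to touch. The sketch below is plain arithmetic and does not call the NetCDF API. The cube size, the axis order (y, x, wavenumber) and the three chunk layouts are assumptions chosen for illustration.

```python
from math import prod


def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a rectangular selection.

    selection: one (start, stop) pair per axis, stop exclusive.
    chunk_shape: chunk length along each axis.
    """
    counts = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # index of first chunk on this axis
        last = (stop - 1) // size    # index of last chunk on this axis
        counts.append(last - first + 1)
    return prod(counts)


# Hypothetical cube: 128 x 128 pixels, 1024 spectral points.
one_spectrum = [(10, 11), (20, 21), (0, 1024)]  # every point at one pixel
one_image = [(0, 128), (0, 128), (500, 501)]    # every pixel at one point

layouts = {
    "spectrum-centric": (1, 1, 1024),
    "image-centric": (128, 128, 1),
    "compromise": (16, 16, 64),
}

for name, chunk_shape in layouts.items():
    print(name,
          chunks_touched(one_spectrum, chunk_shape),
          chunks_touched(one_image, chunk_shape))
```

Under these assumptions a layout tuned for one access pattern reads a single chunk for that pattern and over a thousand for the other, while the compromise layout needs 16 chunks for a spectrum and 64 for an image.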
Supported
Community input required
Data types that need to be supported
Irregularly shaped images
Collections of spectra
Discrete wavelength data (multispectral not hyperspectral)
Time course (multiple dependent variables)
Software
Filters written, format testing etc.
THINKING and PLANNING!!
Registration at http://clirspec.org
Groups at http://clirspec.org
Updates at http://clirspec.org
Remember…