15th of June 2009 Grids & e-Science 2009 Santander 1
eScience activities in a brain imaging research network.
David Rodríguez González
SINAPSE collaboration
National e-Science Centre, School of Informatics & SFC Brain Imaging Research Centre, Division of Clinical Neuroscience
University of Edinburgh
On behalf of the SINAPSE Collaboration.
Contents
SINAPSE
eScience for SINAPSE
Data Protection
Data de-identification
Data Sharing
Portal
Status & Plans
Massive expansion in research imaging
All branches of medicine – particularly brain
Not just medicine – psychology, linguistics, engineering, parapsychology, etc.
In Scotland too: 8% of the UK population, 12.5% of all highest-rated departments
Highest concentration of biotech in Europe
Neuroscience – much larger than NIH
But in 2006 there were machines and pockets of excellence, but little cohesion
Slide by J. Wardlaw
The SINAPSE Project
Stands for Scottish Imaging Network: a Platform for Scientific Excellence.
Pooling initiative of six Scottish universities: Aberdeen, Dundee, Edinburgh, Glasgow, St. Andrews and Stirling.
Main objectives:
Develop imaging expertise
Support multi-centre clinical research in conjunction with the Clinical Research Networks
Improve the ability of neuroscientists to collaborate on clinical trials
Have a direct impact on patient health
SINAPSE – gluing it together
Networking
CRNs – large patient populations
CRFs – patient-focused research facilities
Poolings – buys more science
Individual projects – ageing cohorts, multicentre studies
Harmonise framework for image data management
Standardise imaging methods
Make available image processing methods
Ethics, good imaging research practice
Translation from bench to bedside
Slide by J. Wardlaw
SINAPSE priority projects
Stroke, the brain and the blood-brain interface
Ageing brain to dementia
Novel molecular imaging markers for major psychiatric disorders
Innovative radiotracers for CNS inflammation
e-Science for SINAPSE
Sharing of research data and applications between centres is an important part of the SINAPSE project’s objectives.
The increasing amount of data acquired in modern imaging facilities and the distributed nature of SINAPSE require a proper data management strategy.
The National e-Science Centre is actively involved in the SINAPSE collaboration, mainly through the IT & Image Analysis Committee.
eScience project activities
Information governance & data de-identification: networking, development of de-identification tool
Data sharing infrastructure: facilitating multi-centre studies
Portal for brain imaging: improving usability
Other: analysis methods
Data Protection Act
UK’s Data Protection Act (1998). Implements the European Community Data Protection Directive 1995.
Establishes individuals’ rights over data held about them, and obligations for organisations or people processing personal data.
Personal data must be processed in a fair and lawful manner.
8 DPA principles.
Other pieces of legislation also apply to medical data: the common-law duty of confidentiality and the Human Rights Act 1998 (Article 8).
DPA in research
The DPA does not define the term “research purposes” apart from clarifying that it includes statistical or historical purposes.
Data processing for research should be ‘compatible’ with the purpose for which the data were originally obtained.
The data subjects should be aware that their personal information will be used for research purposes.
Anonymous Data
Coded (pseudonymised or linked anonymised) data: the identifiable information has been substituted by alphanumerical sequences with no plain meaning.
The data is anonymous to the research team.
The key to reverse the transformation must be held securely by a third party, so that the coded data does not fall under the DPA.
(Fully) anonymised data: all personal identifiers or codes have been irreversibly removed.
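The difference between coded and fully anonymised data can be sketched in a few lines of Python. This is an illustration only, not the SINAPSE implementation: the function names, the record layout, the list of identifying fields and the use of random hexadecimal codes are all assumptions.

```python
import secrets

# Hypothetical set of identifying fields; a real policy would list many more.
IDENTIFIERS = ("patient_name", "patient_id")


def pseudonymise(record, link_table):
    """Return a coded copy of `record`.

    Identifiers are replaced by a random code with no plain meaning. The
    mapping code -> original identifiers goes into `link_table`, which stands
    in for the key held securely by a third party.
    """
    code = secrets.token_hex(8)
    link_table[code] = {k: record[k] for k in IDENTIFIERS if k in record}
    coded = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    coded["subject_code"] = code
    return coded


def anonymise(record):
    """Return a fully anonymised copy: identifiers removed, no code kept."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}


def reidentify(coded, link_table):
    """Reverse the coding; only possible for whoever holds the link table."""
    restored = {k: v for k, v in coded.items() if k != "subject_code"}
    restored.update(link_table[coded["subject_code"]])
    return restored


record = {"patient_name": "DOE^JANE", "patient_id": "12345", "modality": "MR"}
link_table = {}
coded = pseudonymise(record, link_table)
anonymous = anonymise(record)
```

The research team would only ever see `coded` or `anonymous`; in the coded case, re-identification needs the link table, which is why it has to be kept by someone else.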
MIDAS meeting (18th March 2009)
Medical Imaging Data Access and Sharing
Hosted in the e-Science Institute
Brought together representatives from NHS Scotland and the universities
Successful meeting with useful discussion
Came out with a roadmap for improving data sharing between both sides
Report produced, now being circulated among attendees
SINAPSE DICOM De-Identification Toolkit
Implemented in Java.
Configurable for each site. The idea is to deploy it as near as possible to the data acquisition.
Privacy Policy configurable using XML documents.
Different projects can apply different policies.
The policy specifies the classes that will execute the transformation of the data.
Graphical tool for editing the policies.
These classes will be distributed in signed jars, and their authenticity will be checked using their hash.
For data provenance checks and auditing purposes, the classes’ version will be tracked.
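The hash check on a distributed jar could look roughly like the following. The toolkit itself is written in Java; this Python sketch only illustrates the idea. The choice of SHA-256, the function names and the throwaway file standing in for a jar are assumptions, and jar signature verification is not covered.

```python
import hashlib
import hmac
import os
import tempfile


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large jars need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_authentic(path, expected_hex, algorithm="sha256"):
    """Compare the computed digest with the one recorded in the policy."""
    return hmac.compare_digest(file_digest(path, algorithm), expected_hex.lower())


# Demonstration with a throwaway file standing in for a transformer jar.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jar") as tmp:
    tmp.write(b"stand-in for jar contents")
    jar_path = tmp.name

recorded = hashlib.sha256(b"stand-in for jar contents").hexdigest()
ok = is_authentic(jar_path, recorded)        # digest matches the recorded one
tampered = is_authentic(jar_path, "0" * 64)  # digest does not match
os.remove(jar_path)
```

Recording the digest together with the class version in the policy would let an auditor establish later exactly which code transformed a given data set.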
Data De-Identification
[Diagram. Zones: NHS, Research Centre. Labels: National PACS, Local RIS, CHI Transformation Service, SINAPSE Anonymiser, Local Storage, Link Table, Anonymous research data.]
CHI Transformation Service
CHI (Community Health Index) is the national unique identifier for NHS (Scotland) patients
Used in any health-related communication
As it identifies the patient, it is sensitive information
It is composed of 10 digits that include: date of birth, gender, control digit
Possibilities:
Reversible / irreversible transformation
Unique for all SINAPSE / unique for each Data Controller
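One way to picture the irreversible option, and the choice between a SINAPSE-wide code and a per-Data-Controller code, is a keyed hash. The slides do not say which transformation the service uses; HMAC-SHA256, the truncation to 16 characters, the keys and the CHI value below are all assumptions made for illustration. A reversible transformation would instead need a link table or encryption.

```python
import hashlib
import hmac


def transform_chi(chi, key):
    """Irreversibly map a 10-digit CHI to a fixed-length code.

    HMAC-SHA256 is an assumed choice: the same CHI and key always give the
    same code, and the CHI cannot be read back from the code. Anyone holding
    the key could still test candidate CHIs, so the key must stay secret.
    """
    if len(chi) != 10 or not (chi.isascii() and chi.isdigit()):
        raise ValueError("a CHI is expected to be 10 digits")
    return hmac.new(key, chi.encode("ascii"), hashlib.sha256).hexdigest()[:16]


# Made-up keys and a made-up CHI, for illustration only.
SINAPSE_WIDE_KEY = b"one key shared by all sites"
CONTROLLER_KEYS = {"site_a": b"key held by site A", "site_b": b"key held by site B"}
chi = "0101501234"

# Option 1: unique for all SINAPSE. Every site derives the same code.
shared_code = transform_chi(chi, SINAPSE_WIDE_KEY)

# Option 2: unique for each Data Controller. Codes differ between sites, so
# records cannot be linked across controllers by the code alone.
code_a = transform_chi(chi, CONTROLLER_KEYS["site_a"])
code_b = transform_chi(chi, CONTROLLER_KEYS["site_b"])
```

The trade-off is visible in the two options: a shared key allows the same patient to be recognised across centres, while per-controller keys keep each centre's codes separate.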
Anonymiser workflow
[Diagram. Stages: Data input, De-identification, Metadata extraction, Data output. Other labels: File system, Receiver, File system, SFTP, Content, Provenance, Structure, Catalogue.]
SINAPSE Anonymiser components
[Diagram. Labels: DICOM standard, DICOM library, DICOM library adaptor, Anonymiser library, Application, Policy Builder, Privacy policy, FieldTransformer (shown four times).]
FieldTransformers
Classes implementing an interface that are used for atomic transformations of the contents of fields.
Specified at run time by the Privacy Policy in use.
Format independent: they only work with the content.
Examples:
DatesTransformer
StudyIDTransformer
InformationOverwriter
…
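The FieldTransformer idea can be sketched as an abstract interface with small, single-purpose implementations. The real classes are Java; the Python below borrows the class names from the slide, but the method name `transform`, the constructor parameters and the behaviour of each class (a fixed date shift, a constant overwrite) are assumptions.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta


class FieldTransformer(ABC):
    """One atomic transformation of a field's content (format independent)."""

    @abstractmethod
    def transform(self, value: str) -> str:
        ...


class DatesTransformer(FieldTransformer):
    """Assumed behaviour: shift a YYYYMMDD date by a fixed number of days."""

    def __init__(self, shift_days: int):
        self.shift_days = shift_days

    def transform(self, value: str) -> str:
        shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=self.shift_days)
        return shifted.strftime("%Y%m%d")


class InformationOverwriter(FieldTransformer):
    """Assumed behaviour: replace the content with a constant."""

    def __init__(self, replacement: str = "ANONYMOUS"):
        self.replacement = replacement

    def transform(self, value: str) -> str:
        return self.replacement


def apply_rules(fields, rules):
    """Apply each transformer to its target field; other fields pass through."""
    return {name: rules[name].transform(v) if name in rules else v
            for name, v in fields.items()}


rules = {"StudyDate": DatesTransformer(shift_days=-10),
         "PatientName": InformationOverwriter()}
result = apply_rules(
    {"StudyDate": "20090615", "PatientName": "DOE^JANE", "Modality": "MR"}, rules)
```

Because each transformer only sees a string and returns a string, the same class can be reused regardless of the file format the field came from.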
Privacy Policies
XML documents containing the rules for anonymising the data
Specify:
The target fields
The class used for the transformation, including: version, digest, location (jar file)
Parameters
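A policy document of this kind might be read as follows. The slides do not show the actual schema, so every element and attribute name in this example is invented, and the digest values are left as placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy document: element and attribute names are invented.
POLICY_XML = """
<privacyPolicy name="example-study">
  <rule field="PatientName">
    <transformer class="InformationOverwriter" version="1.0"
                 digest="(digest of the jar goes here)"
                 location="transformers.jar"/>
    <parameter name="replacement" value="ANONYMOUS"/>
  </rule>
  <rule field="StudyDate">
    <transformer class="DatesTransformer" version="1.2"
                 digest="(digest of the jar goes here)"
                 location="transformers.jar"/>
    <parameter name="shiftDays" value="-10"/>
  </rule>
</privacyPolicy>
"""


def load_policy(xml_text):
    """Return one dict per rule: target field, transformer details, parameters."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.findall("rule"):
        transformer = rule.find("transformer")
        rules.append({
            "field": rule.get("field"),
            "class": transformer.get("class"),
            "version": transformer.get("version"),
            "digest": transformer.get("digest"),
            "location": transformer.get("location"),
            "parameters": {p.get("name"): p.get("value")
                           for p in rule.findall("parameter")},
        })
    return rules


policy = load_policy(POLICY_XML)
```

Keeping the version, digest and jar location next to each rule means the policy alone is enough to locate the transformer, check it and record which version was applied.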
Policy Editor
A graphical tool to help build policy documents.
Includes a DICOM dictionary.
Searches for “FieldTransformer” classes in jar files.
Registry
A catalogue containing:
Privacy policies. The application can work without this, but it helps to set a coherent set of policies.
Transformer classes, and the corresponding jar files.
Data Sharing e-Infrastructure
For enabling multi-centre clinical research through data sharing
Some features of the proposed SINAPSE e-infrastructure project are:
De-identification: automatic compliance with data protection policies
Security: advanced authentication and authorisation within projects
Usability: providing a user-friendly environment to access data and applications
Modularity: conforming to relevant standards and use of existing components
Centralisation: leveraging existing compute clusters and storage
Benefits
Easier Data Protection compliance for users
Enables secure data sharing
Coherent view of available data (single point of access)
Roadmap for end-of-project data publication & data curation
Data Storage & Access
Centralised model adopted, although there are several grid projects that provide DICOM functionalities: it is cheaper and easier, and it reduces the IT burden on research staff.
The research data will be encrypted before being stored.
Data organised per project.
Access control using groups & roles.
Authentication using Shibboleth, due to usability concerns regarding X.509 certificates.
Centralised Architecture (pros & cons)
Simpler deployment
Easier middleware release control
Lesser impact on participant centres
Easier to manage and use
No default resilience
A second centre would be needed
But this is only necessary for critical services
With good support, a reasonable service can be provided using a single centre
Deployment Plan
ECDF (http://www.is.ed.ac.uk/ecdf/)
A facility unique in Scotland
Disk space and CPU time will be rented as needed
1456 CPU cores
275 TB of disk
Also a SINAPSE-owned server to be hosted by ECDF
ECDF will provide basic hardware + software support
SINAPSE services to be hosted on it: Portal, Data Catalogue, Research Data encryption service, OGSA-DAI, projects’ customised databases, RAPID…
[Architecture diagram. Layers: Resources, Data provider services, SINAPSE services, Applications. Zones: SINAPSE, external, local. Component labels: CPUs, Storage, Network, Local Auth, SINAPSE Anonymiser, CHI Transformation Service, VOMS, JSS, Metadata Catalogue, RD Key Storage, Portal, Basic WS, Shibboleth, RAPID, RD Encryption, OGSA-DAI. Application labels: Ageing, Psychiatry, Stroke, …]
Authentication
Shibboleth federated authentication
Single sign-on
Delegated to home universities
Users will continue using a method they are already familiar with
X.509 certificates are usual in Grids
But can be a handicap for some users
Authorisation
Dynamic Virtual Organisations
Members should be added/removed easily
Creation of new VOs for new projects/studies
VO role management
Role-based access
Allows different access levels to information for different users
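Role-based access within a Virtual Organisation can be illustrated with a small in-memory model. The slides list VOMS among the planned services; this sketch does not use it or imitate its API. The class, the role names and the permissions attached to each role are assumptions.

```python
class VirtualOrganisation:
    """A project-level group whose members each hold one role."""

    # Assumed role names and permissions, for illustration only.
    PERMISSIONS = {
        "reader": {"read"},
        "contributor": {"read", "upload"},
        "manager": {"read", "upload", "manage_members"},
    }

    def __init__(self, name):
        self.name = name
        self.members = {}  # user id -> role

    def add_member(self, user, role):
        if role not in self.PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.members[user] = role

    def remove_member(self, user):
        self.members.pop(user, None)

    def is_allowed(self, user, action):
        role = self.members.get(user)
        return role is not None and action in self.PERMISSIONS[role]


stroke = VirtualOrganisation("stroke-study")
stroke.add_member("alice", "manager")
stroke.add_member("bob", "reader")
```

A new study would get its own `VirtualOrganisation` instance, so membership and access levels can change per project without touching the others.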
Catalogues
Data Catalogue for keeping track of the files in the system
Metadata Catalogue storing key attributes extracted from the DICOM headers
It will also keep information on the de-identification process for data provenance
Clinical Information databases and customised metadata databases can be deployed by the different projects
OGSA-DAI will be used to provide access to these resources
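As a rough picture of what a metadata catalogue entry could hold, the sketch below stores a few header attributes together with de-identification provenance. It uses an in-memory SQLite database purely as a stand-in; it is not OGSA-DAI, and the table layout, column names and helper functions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata (
        file_id        TEXT PRIMARY KEY,
        project        TEXT NOT NULL,
        modality       TEXT,
        study_date     TEXT,
        policy_name    TEXT,   -- provenance: which privacy policy was applied
        policy_version TEXT
    )
""")


def register(file_id, project, header, policy_name, policy_version):
    """Store key header attributes plus de-identification provenance."""
    conn.execute(
        "INSERT INTO metadata VALUES (?, ?, ?, ?, ?, ?)",
        (file_id, project, header.get("Modality"), header.get("StudyDate"),
         policy_name, policy_version),
    )


def find(project, modality):
    """Return the ids of a project's files acquired with a given modality."""
    rows = conn.execute(
        "SELECT file_id FROM metadata WHERE project = ? AND modality = ? "
        "ORDER BY file_id",
        (project, modality),
    )
    return [row[0] for row in rows]


register("f001", "stroke", {"Modality": "MR", "StudyDate": "20090605"}, "stroke-policy", "1.0")
register("f002", "stroke", {"Modality": "CT", "StudyDate": "20090606"}, "stroke-policy", "1.0")
register("f003", "ageing", {"Modality": "MR", "StudyDate": "20090607"}, "ageing-policy", "2.1")
```

Queries such as `find("stroke", "MR")` only touch the catalogue, so users can locate data without opening the encrypted image files themselves.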
Portal
A GridSphere-based portal will give access to the resources.
Basic functionality to be provided by SINAPSE: data uploading, catalogue querying, …
Different subprojects will develop their own customised portlets to be integrated in the portal.
A portal for brain imaging (MSc project by Albert Heyrovský)
Motivations:
To facilitate the usage of complex software packages
To provide access to large computing resources like ECDF
First application: brain perfusion imaging analysis
Easily extensible to other brain imaging applications
Portlets generated using the Rapid system developed at NeSC
Status
The proposal was adopted by the SINAPSE IT & Image Analysis committee
Grant application to support pilot project (including hardware & storage resources) rejected
Considering resubmission
SINAPSE De-Identification Toolkit deployed at SBIRC (Edinburgh) and Aberdeen
Used for anonymising acute stroke study data
Plans
Development of new components started: Catalogues.
Portal for brain imaging to be kick-started with the project of an MSc in e-Science student
Collaboration with other centres: CRIC (Edinburgh), TMRC (Dundee)