ADMIRAL progress summary

Graham Klyne
Image Bioinformatics Research Group
Zoology Department, Oxford University

ADMIRAL Project Meeting, 20 May 2010


Institution and subject context

ADMIRAL targets small life science research groups (3-6 people) in a prestigious research department, each with world-class leadership

Research topics are very diverse:
- Silk: properties and genetic factors
- Animal behaviour: learning and decision making
- Evolutionary development: evolution of genes
- Elephant conservation in Africa

Seasonal field and laboratory data collection, interspersed with analysis and interpretation

Diverse data, including spreadsheets, images, videos and genetic sequences

Our approach to research users

Our researchers tend to be very busy, under pressure to publish high-impact papers e.g. in Nature

They are often away conducting field studies, using external facilities or at conferences

To gain their attention, we must offer something they can easily see directly supports their aims

We've tried to focus on “pain points”, providing solutions where they can already recognize problems or foresee needs

Sheer curation: “curation by addition” *

We'll take what they've got, then improve it incrementally through various tools and techniques

Start with raw data from a shared file system, with automatic backups

Add tools to support annotation, packaging and data-repository submission
- Where possible, new tools should add immediate value

* “curation by addition” due to Ben O'Steen: http://oxfordrepo.blogspot.com/2008/10/modelling-and-storing-phonetics.html
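The packaging and submission step can be illustrated with a minimal sketch: bundle a dataset directory into a zip file together with a SHA-256 checksum manifest, so that a receiving repository can verify that the files arrived intact. The manifest name `manifest-sha256.txt` (borrowed from the BagIt convention) and the helper `package_dataset` are assumptions for illustration, not ADMIRAL's actual submission format.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_dataset(dataset_dir: Path, zip_path: Path) -> dict[str, str]:
    """Zip every file under dataset_dir and add a checksum manifest.

    Hypothetical layout: each file is stored under its path relative to
    dataset_dir, plus 'manifest-sha256.txt' with lines '<digest>  <path>'.
    zip_path must lie outside dataset_dir, or the archive would include itself.
    Returns a mapping of relative path -> digest.
    """
    checksums: dict[str, str] = {}
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(p for p in dataset_dir.rglob("*") if p.is_file()):
            rel = path.relative_to(dataset_dir).as_posix()
            checksums[rel] = sha256_of(path)
            zf.write(path, arcname=rel)
        manifest = "".join(f"{digest}  {rel}\n" for rel, digest in checksums.items())
        zf.writestr("manifest-sha256.txt", manifest)
    return checksums
```

For example, `package_dataset(Path("silk-run-01"), Path("silk-run-01.zip"))` would produce an archive that could then be handed over by file transfer, in keeping with adding value on top of the raw data rather than changing how it is stored.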

Project structure recap

Data usage surveys to test requirements and assess improvements in data management

Phase 1: create a minimal front-to-back framework for dataset and metadata acquisition and repository submission
- Actually used by researchers
- Acquisition via file sharing system with parallel web access
- Annotation using Shuffl, creating RDF in the file system
- Repository submission by file transfer

Phase 2: selected incremental improvements, guided by feedback from researchers
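The Phase 1 idea of annotations stored as RDF alongside the data in the file system can be sketched with the Python standard library alone. This writes a Turtle description of a dataset directory using Dublin Core terms; the sidecar name `manifest.ttl`, the helper names and the chosen properties are hypothetical, and this is not the format Shuffl produces.

```python
from pathlib import Path
from urllib.parse import quote

DCTERMS = "http://purl.org/dc/terms/"
ANNOTATION_FILE = "manifest.ttl"  # hypothetical sidecar file name


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def dataset_annotation(dataset_dir: Path, title: str, creator: str,
                       description: str) -> str:
    """Build Turtle describing dataset_dir and listing its files as parts.

    The dataset is the relative IRI <.>, i.e. the directory holding the
    annotation file, so the text does not hard-code a host or mount point.
    """
    lines = [f"@prefix dcterms: <{DCTERMS}> .", "",
             f"<.> dcterms:title {turtle_literal(title)} ;",
             f"    dcterms:creator {turtle_literal(creator)} ;",
             f"    dcterms:description {turtle_literal(description)}"]
    sidecar = dataset_dir / ANNOTATION_FILE
    # Never list the annotation file itself as part of the dataset.
    parts = sorted(p.relative_to(dataset_dir).as_posix()
                   for p in dataset_dir.rglob("*")
                   if p.is_file() and p != sidecar)
    for rel in parts:
        lines[-1] += " ;"
        # Percent-encode so file names with spaces remain valid IRIs.
        lines.append(f"    dcterms:hasPart <{quote(rel)}>")
    lines[-1] += " ."
    return "\n".join(lines) + "\n"


def write_annotation(dataset_dir: Path, **fields: str) -> Path:
    """Write the annotation next to the data and return its path."""
    target = dataset_dir / ANNOTATION_FILE
    target.write_text(dataset_annotation(dataset_dir, **fields), encoding="utf-8")
    return target
```

Using the relative IRI `<.>` for the dataset means the same file can be read whether the directory is reached through the file share or over the web, which fits the parallel-access design.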

Progress to date

Surveys from 3 of 4 research groups, with initial analyses
- The Elephant conservation group's field station was lost in flash floods

Access-controlled shared file area with automatic backup, accessible locally and via the web, in use by the Silk Group
- Focusing on the most engaged group first, others to follow (soon!)
- Progress is 1-2 months slower than anticipated (more later)

Started adapting Shuffl to create annotations for repository submission

Discussing submission details with OULS

Critical path:
- Test repository environment
- Test repository submission mechanisms
- Elicit researcher feedback on repository submission process
- Demonstrate and gather feedback on repository access
- Elicit metadata requirements in “front-to-back” context
- Leading to deployment of live repository environment to complete project phase 1

Data use surveys: data management concerns

Data loss
- Automatic backups

Controlled data sharing
- Most want easy sharing within their group
- Researchers recognize the value of data re-use, but have many valid reasons for resisting openness

Accessing and interpreting historical data
- Capturing sufficient metadata to allow colleagues and collaborators to find and understand data sets
- Locating and retrieving data

Some interest in funder mandates, versioning, visualization, annotation, long-term preservation

Balancing user engagement with usable outputs

In the style of agile development, we are aiming to engage users through working software, rather than just surveys and recommendations

Start simple, and be led by researchers' needs
- There is a tension here between allocating effort to user engagement and to technical development (more later)

Survey effort has been led by David Shotton, himself a life-science researcher, who also serves on occasion as a proxy user

Technical approach: local files and web access

The foundation of our technical approach has been to create an access-controlled file system that is accessible both through common file-sharing mechanisms and through Web protocols
- Linux, Samba, Apache, HTTP/WebDAV, LDAP
- All off-the-shelf open source software

We have strived to make the access controls work uniformly for local and web access

Early work on integrating with the University single sign-on (SSO) service has been postponed

Further tooling builds on web access
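Because the same files are exposed over HTTP/WebDAV, any WebDAV client can list a directory with a `PROPFIND` request at `Depth: 1` and read the `DAV:` multistatus XML returned with status 207. The sketch below shows that exchange with the Python standard library. The host, path and credentials are placeholders, and HTTP Basic authentication over HTTPS is assumed here for illustration; the slides do not say which authentication scheme the web front end uses.

```python
import base64
import http.client
import xml.etree.ElementTree as ET
from urllib.parse import unquote

DAV_NS = {"d": "DAV:"}

# Ask only for the properties needed to tell files from folders.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<d:propfind xmlns:d="DAV:"><d:prop>'
    "<d:resourcetype/><d:getcontentlength/>"
    "</d:prop></d:propfind>"
)


def parse_multistatus(xml_text: bytes) -> list[tuple[str, bool]]:
    """Return (decoded href, is_collection) for each resource in a 207 response."""
    root = ET.fromstring(xml_text)
    entries = []
    for resp in root.findall("d:response", DAV_NS):
        href = unquote(resp.findtext("d:href", default="", namespaces=DAV_NS))
        is_dir = resp.find(".//d:resourcetype/d:collection", DAV_NS) is not None
        entries.append((href, is_dir))
    return entries


def list_directory(host: str, path: str, user: str,
                   password: str) -> list[tuple[str, bool]]:
    """PROPFIND path at Depth 1 over HTTPS (placeholder host and credentials)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("PROPFIND", path, body=PROPFIND_BODY.encode("utf-8"),
                     headers={"Depth": "1",
                              "Content-Type": "application/xml; charset=utf-8",
                              "Authorization": f"Basic {token}"})
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 207:  # 207 Multi-Status signals success for PROPFIND
            raise RuntimeError(f"PROPFIND failed: {resp.status} {resp.reason}")
        return parse_multistatus(body)
    finally:
        conn.close()
```

With `Depth: 1` the first entry returned is normally the requested directory itself, followed by its immediate children; the same listing is what a user would see in the shared folder locally.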

Technical approach: production-quality outputs

This is not a toy system: we are accepting custody of real users' valuable research data

Automated, repeatable system configuration, automated testing and live system monitoring are all part of the development effort

Expandable virtualization platform specified in consultation with departmental IT support

Automated daily backup to University-run hierarchical storage manager
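One simple form of the automated testing and live monitoring listed on this slide is a smoke test confirming that the web view of a protected area refuses anonymous requests. This is a minimal sketch assuming the protected area answers unauthenticated requests with HTTP 401; `requires_authentication` is a hypothetical helper name and any URL passed to it is a placeholder.

```python
import urllib.error
import urllib.request


def requires_authentication(url: str, timeout: float = 10.0) -> bool:
    """Return True if an anonymous GET on url is rejected with 401 Unauthorized.

    urllib's default opener has no credentials configured, so a 401 surfaces
    as HTTPError; a successful response means the area is NOT protected.
    Other HTTP errors (e.g. 404, 500) are re-raised for the caller to report.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return True
        raise
```

Run periodically against each group's protected web area, a `False` result would flag that access controls on the web side have drifted from those on the file-sharing side.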

Reflection: creating a user-led data management environment

Less is more (work!)
- Creating an initial system whose visible function is as close as possible to users' current practices is arguably harder than creating new functions

Uniform controlled access via file sharing and the web has been particularly challenging

Despite modest goals, the first project phase has presented awkward technical challenges
- Progress review at http://admiral-announce.blogspot.com/2010/04/sprint-6-and-progress-to-date-review.html

We have made a platform for web-based features to support evolving requirements

ADMIRAL Physical Architecture

ADMIRAL Logical Architecture

ADMIRAL Security Architecture

Documents
