Project update: A collaborative approach to “filling the digital preservation gap” for Research Data Management
Julie Allinson
Technology Development Manager
Library & Archives
University of York
6 November 2015
Filling the digital preservation gap: Project aim
“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”
This is a collaboration
University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist
University of York:
• Julie Allinson – Technology Development Manager
• Jen Mitcham – Digital Archivist
Artefactual Systems
Jisc
Project structure
• Phase 1 – explore: testing, research, thinking - produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proofs of concept at York and Hull (6 months)
Phase 1: Read all about it!
http://digital-archiving.blogspot.co.uk/
Why do we need digital preservation for research data?
• There is a digital preservation gap in current RDM infrastructures
• We can’t ignore digital preservation – moving targets for data retention mean we need to take this seriously
• Funder requirements around retention
University of York RDM questionnaire 2013
• Which data management issues have you come across in your research over the last five years?
  – “Inability to read files in old software formats on old media or because of expired software licences”
  – 24% of 181 researchers who answered this question admitted this had been a problem for them
Why Archivematica?
“The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools,
methodology and confidence to begin preserving digital information today.”
Why Archivematica?
• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems
From https://www.archivematica.org/en/
Archivematica for RDM?
• Flexible - can support different institutional needs and workflows
• Automates many digital preservation tasks
• Can be integrated with other systems
• Good for those with limited resources
• Enhancements driven by and for the digital preservation community
Archivematica for RDM?
It gives institutions greater confidence that they will be able to continue to provide access to usable copies of research data over time
Phase 2: Improving Archivematica
1. Deliverable 1: Automated DIP regeneration
2. Deliverable 2: METS parsing tools
3. Deliverable 3: Generic search REST API (proof-of-concept)
4. Deliverable 4: Support multiple checksum algorithms
5. Deliverable 5: Enhance PRONOM integration
6. Deliverable 6: Automation tools documentation
Deliverable One
✓ Research data needs to be kept, but we don’t know if anyone will ever want it and it might be *massive*
The Solution: enable the DIP to be generated ‘on request’ and not as part of the initial ingest
Deliverable Two
✓ We want to be able to grab the DIP, and metadata about it, for pulling into our repository
The Solution: a library to help with parsing and creating METS files
https://github.com/artefactual-labs/mets-reader-writer
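To give a feel for what pulling file information out of a METS file involves, here is a minimal sketch using only the Python standard library. It does not use the mets-reader-writer library, and the sample document is a hypothetical, heavily simplified METS fragment invented for illustration; real Archivematica METS files carry far more metadata. The function name `list_files` is also an assumption.

```python
import xml.etree.ElementTree as ET

# Namespaces used by METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, simplified METS document (illustration only).
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"
                     xlink:href="objects/data.csv"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="preservation">
      <mets:file ID="file-2">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"
                     xlink:href="objects/data-preserved.csv"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def list_files(mets_xml):
    """Return (group use, file ID, path) for every file in the fileSec."""
    root = ET.fromstring(mets_xml.strip())
    href_attr = "{%s}href" % NS["xlink"]
    files = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for file_el in group.findall("mets:file", NS):
            flocat = file_el.find("mets:FLocat", NS)
            path = flocat.get(href_attr) if flocat is not None else None
            files.append((group.get("USE"), file_el.get("ID"), path))
    return files
```

Calling `list_files(SAMPLE_METS)` returns one tuple per file, which is the sort of listing a repository could use to decide what to pull in.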
Deliverable Three
✓ We want to be able to report on what we have
The Solution: a search API to answer basic questions about the number of files in storage, their formats, date of ingest, etc.*
* We’re working with DMAOnline @lancaster
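The kind of basic reporting described here can be sketched as a small aggregation over file records. The record fields (`name`, `format`, `ingested`) and the `summarise` function are assumptions made for illustration; they are not the response format of the Archivematica search API.

```python
from collections import Counter
from datetime import date

# Hypothetical records, standing in for whatever a search API might return.
RECORDS = [
    {"name": "survey.csv", "format": "Comma Separated Values", "ingested": date(2015, 9, 1)},
    {"name": "notes.txt", "format": "Plain Text", "ingested": date(2015, 9, 14)},
    {"name": "results.csv", "format": "Comma Separated Values", "ingested": date(2015, 10, 2)},
]


def summarise(records):
    """Answer basic questions: how many files, which formats, ingested when."""
    ingest_dates = [record["ingested"] for record in records]
    return {
        "total_files": len(records),
        "files_by_format": dict(Counter(record["format"] for record in records)),
        "earliest_ingest": min(ingest_dates) if ingest_dates else None,
        "latest_ingest": max(ingest_dates) if ingest_dates else None,
    }
```

A dashboard could display the dictionary returned by `summarise(RECORDS)` directly, or serialise it for another service.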
Deliverable Four
✓ With large datasets, the current checksum mechanism in Archivematica could be a bottleneck
The Solution: support for multiple checksum algorithms
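As a rough illustration of what a selectable checksum algorithm looks like in practice, the sketch below uses Python's standard `hashlib` and reads the file in chunks so that large datasets are never loaded into memory at once. The function name `file_checksum` and its parameters are assumptions for this example, not Archivematica code.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a hex digest of a file with the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Passing a different algorithm name, for example `file_checksum(path, "sha1")`, changes the digest without touching the rest of the workflow, so an institution could trade digest strength against speed for very large files.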
Deliverable Five
✓ What about all those file formats that Archivematica can’t identify?
The Solution: a mechanism for running file identification with multiple tools and a report of unidentified formats, working with PRONOM to improve their coverage
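The idea of trying several identification tools in turn and reporting whatever none of them recognises can be sketched as follows. The two identifier functions are toy stand-ins invented for this example (one checks leading bytes, one checks the file extension); they are not the identification tools Archivematica runs, and the format names are plain labels rather than PRONOM identifiers.

```python
def identify_by_signature(name, data):
    """Toy identifier: recognise a format from its leading bytes."""
    if data.startswith(b"%PDF-"):
        return "Portable Document Format"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "Portable Network Graphics"
    return None


def identify_by_extension(name, data):
    """Toy identifier: fall back to the file extension."""
    extensions = {".csv": "Comma Separated Values", ".txt": "Plain Text"}
    for extension, format_name in extensions.items():
        if name.lower().endswith(extension):
            return format_name
    return None


def identify_all(files, tools):
    """Try each tool in order; collect files that no tool can identify."""
    identified = {}
    unidentified = []
    for name, data in files.items():
        for tool in tools:
            result = tool(name, data)
            if result is not None:
                identified[name] = result
                break
        else:
            # No tool returned a result for this file.
            unidentified.append(name)
    return identified, unidentified
```

The `unidentified` list is the useful by-product: it is the raw material for a report of formats that could be passed on for inclusion in a format registry.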
Deliverable Six
✓ We want to make it easier for institutions to adopt Archivematica
The Solution: documentation and screencasts for Archivematica automation tools, e.g.
https://wiki.archivematica.org/Getting_started#Installation
All of these new features will become part of the core Archivematica code in 2016
Phase 3
• The plan is to run a third phase of the project to:
  ✓ implement prototype RDM workflows with preservation using the new Archivematica features at York and Hull
  ✓ use the search API to populate DMAOnline with stats about datasets
  ✓ do more community outreach
• We will be pitching to Jisc in December for phase three #fingerscrossed
How do York plan to use Archivematica?
[Workflow diagram. Components shown: PURE, RDMonitor, Archivematica, AIP, AIP Store, PURE Web Services, Archivematica REST API, DIP, Repository, Data Catalogue. Key: human to human, machine to machine, human to machine.]
Where to find out more
http://www.york.ac.uk/borthwick/
The Bigger Picture
• Jisc are looking at building shared services for RDM
• Our project is inputting into the specification and discussion
• One area we’d be interested to find out more about is the appetite for ‘above campus’ options - discussion planned for later.
How could you use Archivematica?
• Host it in-house and link it to an existing repository/access system (for example DSpace, CONTENTdm, Fedora/Hydra ...or a CRIS)
• Host it in-house and use as a standalone system (you would need to have a storage system in place and establish a way of facilitating access to the data)
• Sign up for a hosted instance of Archivematica with archivesDIRECT (combines Archivematica with DuraCloud storage)
• Sign up for a hosted instance of Archivematica with Arkivum (combines Archivematica with Arkivum storage)
Thanks!
Useful links:
Borthwick website: http://www.york.ac.uk/borthwick/
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
Report: http://dx.doi.org/10.6084/m9.figshare.1481170