Preservation of Research Data: Dataverse / Archivematica Integration
Allan Bell | Associate University Librarian, The University of British Columbia
Leanne Trimble | Data & Geospatial Librarian, OCUL Scholars Portal
The UBC Context
University of British Columbia Digital Preservation Strategy
● Digital Preservation Program
  ○ cIRcle, DSpace-based repository
  ○ Digitized collections in CONTENTdm
  ○ New and legacy born-digital archival material
  ○ Websites (Archive-It)
  ○ Soon: Abacus Dataverse, research data
University of British Columbia Digital Preservation Strategy
● Use Archivematica as a tool to apply OAIS-compliant preservation processes
● Integrate Archivematica with existing systems used to manage digital objects
● Build internal technical and staff capacity
OAIS reference model
Archivematica
● “a free and open source digital preservation system that is designed to maintain standards-based, long term access to collections of digital objects” http://www.archivematica.org
● Micro-services provide an integrated suite of software tools that comply with the ISO OAIS model
Digital Preservation Program: cIRcle (DSpace)
• Archivematica receives submissions from DSpace
• An Archivematica-to-DSpace workflow is also in place
Digital Preservation Program: CONTENTdm
• Master files uploaded to Archivematica
• Archivematica produces access versions and pushes to CONTENTdm
Digital Preservation Program: RBSC/UA born-digital acquisition workflow
Digital Preservation Program: TRAC Self Audit
• Trustworthy Repositories Audit and Certification (evolved into ISO 16363)
• Widely accepted criteria for assessing trustworthiness of digital repositories
• TRAC checklist is an auditing tool to assess the reliability, commitment and readiness of institutions to assume long-term preservation responsibilities
What is TRAC?
• The TRAC metrics assess three areas:
  a. Organizational Infrastructure - the repository's administrative, staffing, financial, and legal functions
  b. Digital Object Management - the handling of digital objects from ingest to access
  c. Technology, Technical Infrastructure and Security - the technology used to handle ingested objects
• These criteria represent best practices and current thinking about the organizational and technological needs of trustworthy digital repositories.
TRAC Compliant Repositories
The Center for Research Libraries has audited and certified five repositories:
• Chronopolis (report)
• CLOCKSS
• HathiTrust (report)
• Portico (report)
• Scholars Portal
Digital Preservation Program
Conclusions
• Greater comfort with and understanding of the challenges around archiving digitized and born-digital material
• Establishing a comprehensive digital preservation program is complex!
• Having tools is important, but certification (if desired) also requires policies and procedures
Abacus Dataverse: Research Data Management
● UBC-hosted instance for four research universities in British Columbia since 2014
  ○ Abacus DSpace launched in 2009
● 1,700 studies (more than 28,000 files)
● Actively used by researchers
● Each school has full control and added discoverability for their data
  ○ Licensed data but also growing institutional research data collections
  ○ Each institution has its own subnet with
    ■ OAI export to Summon (common Library Discovery Layer)
    ■ Separate Dataverses for institutional research data
The Ontario Context
OCUL & Scholars Portal
Who?
• 21 university libraries in Ontario
What?
• Collective purchasing
• Shared digital infrastructure
• Collaborative planning and assessment
How?
Scholars Portal
• OCUL’s shared technology infrastructure, housing shared collections
More information: http://www.ocul.on.ca/
OCUL/SP & Research Data Management
Dataverse (OCUL hosted instance)
– Hosted for OCUL since 2011
– 330 studies (about 4,000 files)
– Actively used by researchers from 7-8 institutions
– Many in social science disciplines but some in sciences (agriculture, polar research, geophysics, nursing…)
OCUL/SP & Research Data Management
• Services are evolving at each institution
• Still trying to get a handle on:
  – RDM support services required by researchers
  – RDM infrastructure requirements
  – RDM costs
  – Role of regional consortia in RDM services
OCUL/SP & Digital Preservation
• Trustworthy Digital Repository (TDR) certified for electronic journal content (since 2013)
• Currently working on Ontario Library Research Cloud (OLRC) project (2015 completion)
• Data preservation: strong interest
National initiatives in Canada
‘Portage’
Canadian Association of Research Libraries led project aimed at building a library-based research data management network
Two aspects:
• Network of expertise for research data management
• A national preservation and discovery network for research data
National preservation network
Dataverse / Archivematica Integration
Dataverse
• Data
• Metadata (DDI & other)
Archivematica
• Accept data and metadata
• Perform preservation functions
• Create Archival Information Packages (AIPs)
Archival storage?
Local Data Repository (e.g. at SP or UBC)
Preservation Infrastructure (Portage)
Integration Middleware
• Harvest content via Dataverse API (Dataverse has no SWORD client capability at the moment)
• Package and submit to Archivematica using SWORD
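The harvest, package and submit flow of the integration middleware might be sketched as follows. This is only an illustration under stated assumptions, not the project's design: the `Study` shape, the zip layout, and the function names are hypothetical, and the `harvest` and `submit` callables stand in for the Dataverse API call and the SWORD deposit, neither of which is implemented here.

```python
import io
import zipfile
from typing import Callable, Dict, TypedDict


class Study(TypedDict):
    """Hypothetical shape of one harvested study (an assumption, not a Dataverse type)."""
    files: Dict[str, bytes]  # file name -> file content
    ddi_xml: str             # DDI metadata record as XML text


def package_study(study: Study) -> bytes:
    """Bundle a study's data files and its metadata into one in-memory zip package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in study["files"].items():
            archive.writestr(f"data/{name}", content)
        archive.writestr("metadata/ddi.xml", study["ddi_xml"])
    return buffer.getvalue()


def transfer_study(
    study_id: str,
    harvest: Callable[[str], Study],
    submit: Callable[[str, bytes], str],
) -> str:
    """Harvest one study, package it, and hand the package to the submit step.

    `harvest` and `submit` are injected so that the real Dataverse and
    SWORD clients can be plugged in later; both are placeholders here.
    """
    study = harvest(study_id)
    package = package_study(study)
    return submit(study_id, package)
```

Injecting the two callables keeps the packaging step testable without a running Dataverse or Archivematica instance.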
Project Participants
• Artefactual – Evelyn McLellan, Justin Simpson
• Dataverse – Phil Durbin, Eleni Castro (& others)
• Scholars Portal – Leanne Trimble, Alan Darnell
• UBC – Allan Bell, Eugene Barsky
• University of Alberta – Geoff Harder, Chuck Humphrey, Larry Laliberte, Peter Binkley
• Simon Fraser University – Alex Garnett
Functional Requirements
● Develop “middleware” which can transfer studies from Dataverse to Archivematica
  - Detect newly published studies & “major” new versions
  - Harvest released studies from Dataverse
  - Utilize SWORD protocol
  - Submit to Archivematica
  - One Dataverse study = 1 SIP = 1 AIP
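The first requirement, detecting newly published studies and “major” new versions, could be approached along these lines. The sketch assumes that released versions carry a "major.minor" label such as "2.0" and that the middleware keeps a record of the last major version it transferred for each study; both are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def major_of(version: str) -> int:
    """Return the major part of a 'major.minor' version label (assumed format)."""
    return int(version.split(".")[0])


def studies_to_transfer(
    released: Iterable[Tuple[str, str]],
    transferred: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Select (study_id, version) pairs that need a new transfer.

    A pair is selected when the study has never been transferred, or when
    its major version is higher than the last major version on record.
    Minor versions of an already transferred major version are skipped.
    """
    last_major = dict(transferred)  # copy so the caller's record is not mutated
    pending = []
    for study_id, version in released:
        major = major_of(version)
        previous = last_major.get(study_id)
        if previous is None or major > previous:
            pending.append((study_id, version))
            last_major[study_id] = major
    return pending
```

Each selected pair would then become one transfer, in line with the rule that one Dataverse study corresponds to one SIP and one AIP.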
Functional Requirements (2)
● Investigate Archivematica pipeline decisions for data formats coming from Dataverse
  - File format normalization?
  - Connecting versions of the same dataset to one another?
  - Handling DDI (and other) metadata records?
Possible features for future stages
• Dataverse as a SWORD client
• Mechanism within Dataverse for researchers to specify which datasets they want to target for preservation
• Returning information from Archivematica back to Dataverse (indication of preservation status within Dataverse)
Next Steps
• University of Toronto procurement process underway to contract the development work to Artefactual
• Develop the middleware (2015)
• Recruit researchers to contribute data to ingest (concurrent with development work)