Preservation of Research Data: Dataverse / Archivematica Integration

Allan Bell | Associate University Librarian, The University of British Columbia

Leanne Trimble | Data & Geospatial Librarian, OCUL Scholars Portal

The UBC Context

University of British Columbia Digital Preservation Strategy

● Digital Preservation Program
  ○ cIRcle, DSpace-based repository
  ○ Digitized collections in CONTENTdm
  ○ New and legacy born-digital archival material
  ○ Websites (Archive-It)
  ○ Soon, Abacus Dataverse, Research Data

University of British Columbia Digital Preservation Strategy

● Use Archivematica as a tool to apply OAIS-compliant preservation processes

● Integrate Archivematica with existing systems used to manage digital objects

● Build internal technical and staff capacity

OAIS reference model

Archivematica

● “a free and open source digital preservation system that is designed to maintain standards-based, long term access to collections of digital objects” http://www.archivematica.org

● Micro-services provide an integrated suite of software tools in compliance with the ISO OAIS model

Digital Preservation Program: cIRcle (DSpace)

• Archivematica receives submissions from DSpace

• Also have an Archivematica-to-DSpace workflow

Digital Preservation Program: CONTENTdm

• Master files uploaded to Archivematica

• Archivematica produces access versions and pushes to CONTENTdm

Digital Preservation Program: RBSC/UA born-digital acquisition workflow

Digital Preservation Program: TRAC Self Audit

• Trustworthy Repositories Audit and Certification (evolved into ISO 16363)

• Widely accepted criteria for assessing trustworthiness of digital repositories

• TRAC checklist is an auditing tool to assess the reliability, commitment and readiness of institutions to assume long-term preservation responsibilities

What is TRAC?

• The TRAC metrics assess three areas:
  a. Organizational Infrastructure - the repository's administrative, staffing, financial, and legal functions
  b. Digital Object Management - the handling of digital objects from ingest to access
  c. Technology, Technical Infrastructure and Security - the technology used to handle ingested objects
• These criteria represent best practices and current thinking about the organizational and technological needs of trustworthy digital repositories.

Digital Preservation Program

Conclusions

• Greater comfort with and understanding of the challenges around archiving digitized and born digital material

• Establishing a comprehensive digital preservation program is complex!

• Having tools is important, but policies and procedures are also needed for certification (if desired)

Abacus Dataverse: Research Data Management

● UBC hosted instance for four Research Universities in British Columbia since 2014
  ○ Abacus DSpace launched in 2009
● 1,700 studies (more than 28,000 files)
● Actively used by researchers
● Each school has full control and added discoverability for their data
  ○ Licensed data but also growing institutional research data collections
  ○ Each institution has its own subnet with
    ■ OAI export to Summon (common Library Discovery Layer)
    ■ Separate Dataverses for institutional research data

The Ontario Context

OCUL & Scholars Portal

Who?
• 21 university libraries in Ontario

What?
• Collective purchasing
• Shared digital infrastructure
• Collaborative planning and assessment

How?
*Scholars Portal*
• OCUL’s shared technology infrastructure, housing shared collections

More information: http://www.ocul.on.ca/

OCUL/SP & Research Data Management

Dataverse (OCUL hosted instance)
– Hosted for OCUL since 2011
– 330 studies (about 4,000 files)
– Actively used by researchers from 7-8 institutions
– Many in social science disciplines but some in sciences (agriculture, polar research, geophysics, nursing…)

OCUL/SP & Research Data Management

• Services are evolving at each institution
• Still trying to get a handle on:
  – RDM support services required by researchers
  – RDM infrastructure requirements
  – RDM costs
  – Role of regional consortia in RDM services

OCUL/SP & Digital Preservation

• Trustworthy Digital Repository (TDR) certified for electronic journal content (since 2013)

• Currently working on Ontario Library Research Cloud (OLRC) project (2015 completion)

• Data Preservation: strong interest

National initiatives in Canada

‘Portage’

A project led by the Canadian Association of Research Libraries, aimed at building a library-based research data management network

2 aspects:
• Network of expertise for research data management
• A national preservation and discovery network for research data

National preservation network

Dataverse / Archivematica Integration

Dataverse/Archivematica Integration

Dataverse
• Data
• Metadata (DDI & other)

Archivematica
• Accept data and metadata
• Perform preservation functions
• Create Archival Information Packages (AIPs)

Archival storage?

Local Data Repository (e.g. at SP or UBC)

Preservation Infrastructure (Portage)

Integration Middleware
• Harvest content via Dataverse API (no SWORD client capability ATM)
• Package and submit to Archivematica using SWORD
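
The middleware's role described above (harvest from Dataverse, then package and submit to Archivematica) can be pictured as a small pipeline. The following is only an illustrative sketch, not the project's design: the `Study` and `Package` shapes and the `harvest` and `submit` callables are hypothetical stand-ins for a Dataverse API client and a SWORD deposit, and no real service is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Study:
    """A published Dataverse study as the middleware sees it (hypothetical shape)."""
    study_id: str
    data_files: tuple       # names of the research data files
    metadata_files: tuple   # e.g. a DDI record exported alongside the data


@dataclass(frozen=True)
class Package:
    """One transfer package built from exactly one study."""
    study_id: str
    contents: tuple


def package_study(study: Study) -> Package:
    # Keep data and metadata together so Archivematica receives both.
    return Package(study.study_id, study.data_files + study.metadata_files)


def run_middleware(
    harvest: Callable[[], Iterable[Study]],
    submit: Callable[[Package], str],
) -> List[str]:
    """Harvest studies, package each one, submit it, and collect the receipts.

    `harvest` stands in for a Dataverse API client and `submit` for a SWORD
    deposit to Archivematica; both are injected placeholders (assumptions).
    """
    receipts = []
    for study in harvest():
        receipts.append(submit(package_study(study)))
    return receipts


# Stand-in implementations so the sketch runs without any network access.
def fake_harvest():
    return [Study("study-1", ("survey.csv",), ("ddi.xml",))]


def fake_submit(package):
    return f"deposited {package.study_id} ({len(package.contents)} files)"


print(run_middleware(fake_harvest, fake_submit))
# prints ['deposited study-1 (2 files)']
```

Passing the two endpoints in as callables keeps the packaging step independent of either system, which matters here because the slide notes that Dataverse cannot yet act as a SWORD client itself.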

Project Participants

• Artefactual – Evelyn McLellan, Justin Simpson
• Dataverse – Phil Durbin, Eleni Castro (& others)
• Scholars Portal – Leanne Trimble, Alan Darnell
• UBC – Allan Bell, Eugene Barsky
• University of Alberta – Geoff Harder, Chuck Humphrey, Larry Laliberte, Peter Binkley
• Simon Fraser University – Alex Garnett

Functional Requirements

● Develop “middleware” which can transfer studies from Dataverse to Archivematica
  - Detect newly published studies & “major” new versions
  - Harvest released studies from Dataverse
  - Utilize SWORD protocol
  - Submit to Archivematica
  - One Dataverse study = 1 SIP = 1 AIP
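
The detection requirement above (newly published studies and "major" new versions) could be approached along these lines. This is a minimal sketch under stated assumptions: versions are taken to be "major.minor" strings, and the `published` and `preserved` mappings are hypothetical bookkeeping kept by the middleware, not structures defined by Dataverse or Archivematica.

```python
from typing import Dict, List, Tuple


def major_version(version: str) -> int:
    """Return the major part of a 'major.minor' version string (assumed format)."""
    return int(version.split(".")[0])


def studies_to_harvest(
    published: Dict[str, str],
    preserved: Dict[str, str],
) -> List[Tuple[str, str]]:
    """List the (study_id, version) pairs that need a new SIP.

    `published` maps study ids to the latest released version in Dataverse;
    `preserved` maps study ids to the version last sent to Archivematica.
    Both mappings are hypothetical and exist only for this illustration.
    """
    pending = []
    for study_id, version in sorted(published.items()):
        last = preserved.get(study_id)
        if last is None:
            pending.append((study_id, version))   # newly published study
        elif major_version(version) > major_version(last):
            pending.append((study_id, version))   # "major" new version
        # Minor revisions are skipped in this sketch.
    return pending


published = {"study-a": "1.0", "study-b": "2.0", "study-c": "1.3"}
preserved = {"study-b": "1.2", "study-c": "1.1"}
print(studies_to_harvest(published, preserved))
# prints [('study-a', '1.0'), ('study-b', '2.0')]
```

Each pair returned would become one SIP and, after processing, one AIP, matching the one-study-to-one-package rule in the list above.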

Functional Requirements (2)

● Investigate Archivematica pipeline decisions for data formats coming from Dataverse
  - File format normalization?
  - Connecting versions of the same dataset to one another?
  - Handling DDI (and other) metadata records?

Possible features for future stages

• Dataverse as a SWORD client

• Mechanism within Dataverse for researchers to specify which datasets they want to target for preservation

• Returning information from Archivematica back to Dataverse (indication of preservation status within Dataverse)

Next Steps

• University of Toronto procurement process underway to contract the development work to Artefactual

• Develop the middleware (2015)

• Recruit researchers to contribute data to ingest (concurrent with development work)