© UKRI All rights reserved
Jaana Pinnick
British Geological Survey
Research Data and Digital Preservation Manager
29 October 2018
Building a Digital Preservation Programme
Enhancing the digital continuity of research data
What IS “digital preservation”?
Different interpretations of what digital preservation means, depending on role and experience
• Essential to develop and promote a common understanding of the digital preservation concept if the NGDC is to develop its digital preservation culture
• Essential to integrate scientists’ research activities more closely with corporate data management procedures if progress is to be made in the long-term availability and usability (that is, the digital continuity) of BGS’ data
Digital preservation includes…
• Persistent unique identifiers
• Significant properties
• Descriptive, discovery and T&Cs metadata
• Characterisation using technical metadata
• Preservation metadata
• Authenticity
• Data integrity – complete and unaltered data
• Fixity – unchanged digital files (using checksums)
• Maintaining access
• Renderability – continued ability to access a digital object
• Appraisal – what to preserve, what to dispose of
• Physical media obsolescence
• File format obsolescence
• Sustainability – maintenance and interoperability
PEOPLE!!!
But it is also about…
• Institutional policies and strategies
• Collaboration
• Advocacy
• Procurement and third party services
• Audit and certification
• Legal compliance
• Risk and change management
• Staff training and development
• Standards and best practice
http://handbook.dpconline.org/contents
TNA definitions
Digital preservation
The long-term archival management of digital information assets selected for their historical value, once they have passed out of business ownership
VS.
Digital continuity
The ability to use your information in the way that you need, for as long as you need. If you do not actively work to ensure digital continuity, your information can easily become unusable
Outline
• Initial review: Organisational background, what to preserve and for whom
• Defining the purpose: Objectives, benefits and challenges
• Taking the first steps: Assessing risk, using tools, and raising staff awareness
• Looking at the Big Picture: Writing a preservation policy and a business case
• Getting into it: Developing a preservation strategy and a digital asset register
British Geological Survey
Organisational background
• Approved Place of Deposit under the Public Records Act
• Making most of the data freely available under the Open Government Licence (OGL)
• Under legal obligation to manage some types of data
• UKRI best practice: data that by their nature cannot be re-measured or re-created […] may often warrant ‘indefinite storage and preservation’
National Geoscience Data Centre (NGDC)
• Robust and diverse data management skills
• Statutory, commercial and voluntary data donations
• Geoscience data from NERC grant-funded projects
• Heterogeneous data types and volumes
• The long validity of geoscience data means permanent retention is often required
One of the NERC Environmental Data Centres
What do we need to preserve?
Know your data!
Difficulty in appraising the value of geoscience research data
What is geoscience data?
• Borehole
• Bedrock
• Hydrogeology
• Geochemistry
• Seismic
• Marine geoscience
• Oil and gas
• Airborne geophysical
• Climate change
• Earth characteristics
• Rocks
• Sediments and soils
• Seismology
• Marine geology
• Land contamination
• Geological processes including erosion and volcanic activity
• Natural resources
• And many more
MSc stakeholder survey
Who uses our data? What for? How long?
• Heterogeneous stakeholder groups (academia, industry, services, government, general public…)
• Heterogeneous purposes of data use (business/personal decision making, consultancy work, public sector policy making, trading onwards, innovation, education, personal interest…)
• Length of data use (40% ten years or longer, 40% 3-9 years)
Defining the purpose
Objectives
• Maximise the long-term accessibility of digital data – by creating robust and fit-for-purpose contextual & preservation metadata
• Support innovation and economic growth using geoscience data – by working smarter, facilitating data reuse and increasing collaboration with scientific disciplines and partners
• Culture change – by raising awareness of and building up skills in digital preservation and research data management and by adopting and implementing best practice across the user community
Benefits
• Preservation planning increases financial and operational efficiencies
• Increases the value of unrepeatable and unique geoscience datasets and time-series data
• Enhances the potential for income generation and new service models
• Deduplication of data reduces storage costs and facilitates data retrieval
• Historical research data available for reuse and analysis when new tools and techniques become available
Data volumes vs. available resources
• Data deluge: sensor, real time, and monitoring data on the increase
• Funding: Building services using various sources
• Position: Amalgamating contradictory stakeholder requirements
• Staff: Securing a permanent digital skills base
A key challenge
Taking the first steps
NGDC online data deposit portal
Tools: Using DROID
Checksum values for fixity checks
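A fixity check of this kind can be sketched with the Python standard library. This is a minimal illustration rather than DROID's own implementation; the function names `file_checksum` and `verify_fixity` and the default choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Keep reading fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded at ingest."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

The checksum would typically be recorded once at deposit and `verify_fixity` re-run on a schedule; a `False` result signals that the file has changed since the checksum was taken.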
Tools: Using DROID
File format profiling
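The idea behind format profiling can be illustrated with a small Python sketch that recognises files by their leading bytes and counts formats in a directory tree. It is a simplified stand-in, not DROID itself: DROID works from the much larger PRONOM signature registry, while the short `SIGNATURES` list and the function names here are assumptions chosen for the example.

```python
from collections import Counter
from pathlib import Path

# A handful of well-known leading byte signatures ("magic numbers").
# Illustrative only; a real profiling tool uses a full signature registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(path):
    """Return a format label based on the first bytes of the file."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def profile_directory(root):
    """Count how many files of each identified format sit under root."""
    return Counter(
        identify_format(p) for p in Path(root).rglob("*") if p.is_file()
    )
```

A large "unknown" count in such a profile is a prompt for closer inspection, since unidentified formats are harder to plan preservation actions for.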
Raising awareness
DPC International Digital Preservation Day 30 Nov 2017
DPC World Digital Preservation Day 29 Nov 2018
• The Simple Property-Oriented Threat (SPOT) Model for Risk Assessment defines six essential properties of successful digital preservation: availability, identity, persistence, renderability, understandability, and authenticity
• For each of these properties, a set of threats is identified which would seriously diminish the ability of the repository to achieve the property in question
• The threats are described at a high-level, and focus on outcome
• An outcome-based typology of threats that individual custodial institutions can use in evaluating their own situational risk and risk mitigation strategies
SPOT Model: Risk Matrix
http://mirror.dlib.org/dlib/september12/vermaaten/09vermaaten.html
SPOT Model: Risk Matrix
Risk priority 1
• Risk description: Bit errors, bit rot, deterioration of digital objects
• Consequences: Access to data may be lost; data objects become unavailable for preservation activities
• Management or mitigation methods: Fixity information, checksums, multiple copies of data
• Tools or technologies available: DROID, Fixity, Autopsy
Risk priority 2
• Risk description: Links between objects and associated metadata not captured or maintained
• Consequences: Long-term usability of data affected
• Management or mitigation methods: Use unique identifiers for data objects and link these to descriptive and preservation metadata IDs
• Tools or technologies available: BagIt; use a relational database to maintain links
Risk priority 3
• Risk description: Changing technologies
• Consequences: Hardware obsolescence; media obsolescence; authenticity of data lost if unable to fully render the original content
• Management or mitigation methods: Creation of a technology watch; use of open data formats; migration
Risk priority 4
• Risk description: File format changes
• Consequences: Access to data may be lost; authenticity of data objects may suffer; format obsolescence
• Management or mitigation methods: Migration, emulation, technical metadata, use of open formats
• Tools or technologies available: DROID, JHOVE, Apache Tika, Python Magic Library, SIARD
Risk priority 5
• Risk description: Sufficient preservation metadata not captured or created
• Consequences: Provenance and authenticity of data unverifiable; unable to make appropriate preservation decisions in future
• Management or mitigation methods: File identification tools; data rescue and forensics (expensive); maintain a full audit trail
• Tools or technologies available: Apache Tika, DSpace; may need manual intervention
Looking at the Big Picture
Preservation policy development
How to preserve?
• Dissertation / stakeholder survey findings
• Review of publicly available digital preservation policies and strategies
• Digital Preservation Coalition Handbook
• TNA ‘Parsimonious Preservation’
• Includes: Scope, objectives, benefits
• Outlines preservation framework, requirements and drivers, and roles and resources
• Describes key concepts for functional preservation
Who pays for preservation?
Writing a business case
• Identified challenges and set objectives going forward
• Identified key benefits and opportunities to the organisation
• Outlined a modular preservation programme based on OAIS
• Estimated staff costs and proposed measures of success
• Future vision?
Repository certification
• Builds stakeholder confidence in the repository (funders, users, publishers etc.)
• Benchmarking of NGDC processes, procedures and services against recognized standards
• Recognition as a trusted repository for the designated community
• Differentiates NGDC from other repositories
Accreditation Process
• May – June 2016: Create Working Group; identify Project Leads; reviewed requirements
• Oct 2016 – May 2017: Regular monthly meetings to review progress; Project Leads gathering responses to requirements
• June 2017: Submit application
• Sept 2017: Feedback received
• Oct 2017: Edited application resubmitted
• Feb 2018: Seal granted
Let’s get going!
Preservation strategy
• An internal action plan for preservation activities
• Dynamic in nature, modified as more information becomes available
[Figure: “ACTION PLAN” word cloud: collaboration, objective, improvement, check, strategy, implementation, schedule, act]
Know your data: digital asset register
1. What data assets currently exist?
2. Where are these assets located?
3. How have the assets been managed to date?
4. Which of these assets need to be maintained in the long term?
5. Do current data management practices place these assets at risk?
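The five questions above map naturally onto the fields of a register entry. The Python sketch below is one possible shape for such a record, not the NGDC's actual register: the class name `AssetRecord`, its field names and the helper `preservation_priorities` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssetRecord:
    name: str               # 1. what the asset is
    location: str           # 2. where it is held
    management_notes: str   # 3. how it has been managed to date
    retain_long_term: bool  # 4. does it need long-term maintenance?
    at_risk: bool           # 5. do current practices place it at risk?


def preservation_priorities(register):
    """Return the assets that must be kept long term and are currently at risk."""
    return [a for a in register if a.retain_long_term and a.at_risk]


# Invented example entries, for illustration only.
register = [
    AssetRecord("Example borehole scans", "file share", "no checksums recorded", True, True),
    AssetRecord("Example project drafts", "staff laptop", "ad hoc copies", False, True),
]
urgent = preservation_priorities(register)  # only the first entry qualifies
```

Filtering on questions 4 and 5 together gives a first-pass list of where preservation effort is most urgently needed.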
Ensuring Data Integrity
PREMIS metadata
• Digital Object
• Rights
• Preservation Event
• Agent
Sample of a PREMIS scheme
OBJECT
• objectIdentifier
• preservationLevel
• significantProperties
• objectCharacteristics (fixity, size, format, creatingApplication)
• storage
EVENT
• eventIdentifier
• eventType
• eventDateTime
• eventDetailInformation
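To show how these semantic units might be held in practice, here is a simplified Python sketch using dataclasses. It is a flattened illustration, not a conformant PREMIS implementation: the real data dictionary nests these units and defines many more, and the `linkedObjectIdentifier` field is an assumption added here to tie an event to its object.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PremisObject:
    objectIdentifier: str
    preservationLevel: str
    significantProperties: dict
    # The next four units sit under objectCharacteristics in PREMIS;
    # they are flattened here to keep the sketch short.
    fixity: dict               # algorithm name and digest value
    size: int                  # size in bytes
    format: str
    creatingApplication: str
    storage: str


@dataclass
class PremisEvent:
    eventType: str
    eventDetailInformation: str
    linkedObjectIdentifier: str  # hypothetical link back to the object
    # Identifier and timestamp are generated when the event is created.
    eventIdentifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    eventDateTime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

Recording an event each time a fixity check, migration or similar action is carried out builds up the audit trail for the object.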
Lessons learned
• Consider your preservation strategies early on
• Create Readme notes with additional information to describe the data as early as possible
• Recommend a list of preferred/open file formats and comply with it
• Back up the data using the 3-2-1 method
• Run fixity checks at ingestion and repeat at regular intervals, replacing corrupt data from other copies
• Include metadata fields to capture preservation events and populate them from the outset
• Monitor your data regularly!
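The fixity-and-repair lesson can be sketched in Python as follows. The 3-2-1 method is commonly described as three copies, on two different types of media, with one copy off-site; the sketch assumes the copies are reachable as file paths and that a SHA-256 checksum was recorded at ingestion. The function names are hypothetical.

```python
import hashlib
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file (reads the whole file at once)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def check_and_repair(copies, recorded_checksum):
    """Verify every copy and overwrite corrupt ones from an intact copy.

    Returns the list of paths that had to be repaired.
    """
    good = [p for p in copies if sha256_of(p) == recorded_checksum]
    bad = [p for p in copies if p not in good]
    if not good:
        # Nothing trustworthy to restore from; this needs human attention.
        raise RuntimeError("no intact copy left; manual data rescue needed")
    for path in bad:
        shutil.copyfile(good[0], path)
    return bad
```

Each repair would also be worth logging as a preservation event, so that the history of the object stays complete.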
“Digital preservation is not a project to be completed over the next few years and then forgotten about. It is rather a new way of approaching the whole digital data life cycle and the new digital information world we live in.”