Digital Preservation

DESCRIPTION

Presentation slides for an introductory lecture given at the University of the West of England on 15 February 2011

Page 1: Digital Preservation

                                                             

A centre of expertise in digital information management

www.ukoln.ac.uk

UKOLN is supported by:

Digital Preservation

Michael Day
Research and Development Team Leader

UKOLN, University of Bath

Information Systems and Services, UWE, Bristol, 15 February 2011

Page 2: Digital Preservation

                                                             

Presentation outline

• Digital preservation overview
  – Some definitions
  – Technical challenges
  – Organisational challenges
• Approaches to solving the problem
  – Preservation strategies
  – Tools for:
    • Format characterisation (DROID)
    • Preservation planning (Plato)
  – The OAIS model:
    • Preservation metadata
    • Repository audit frameworks (TRAC, DRAMBORA)

Page 3: Digital Preservation

                                                             

Definitions

• Digital preservation:
  – Is mainly concerned with the sustainability of “content” for a given period of time (not forever)
  – Largely about ensuring “continued access” to content
  – “The series of managed activities necessary to ensure continued access to digital materials for as long as necessary” - Digital Preservation Coalition (DPC), Digital Preservation Definitions and Concepts list: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts?q=definitions
  – A combination of technical, organisational and legal challenges

Page 4: Digital Preservation

                                                             

Digital preservation basics

• An ongoing (lifecycle) approach to managing digital content based on:
  – The identification and adoption of appropriate preservation strategies for content
  – The collection and management of appropriate metadata (explicit and implicit knowledge, contexts)
  – The ongoing monitoring of technical contexts and the application of preservation planning techniques
  – Continual monitoring of the organisation (audit)

Page 5: Digital Preservation

                                                             

A multi-faceted set of challenges

• Technical
  – Strategies needed to deal with ongoing obsolescence and scale
• Organisational
  – Access and reuse
  – Authenticity and integrity
  – Sustainability (costs)
  – Legal (see Andrew Charlesworth’s lecture)

Page 6: Digital Preservation

                                                             

Technical challenges (1)

• Physical
  – Bits stored on a physical medium (or in the cloud?)
  – Focus 20 years ago was on new media types (e.g. optical storage technologies) as a panacea
  – Bit-level preservation is still important: it is the first layer in a viable preservation strategy
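Bit-level preservation is commonly supported by fixity checking: a checksum is recorded when an object is stored and recomputed later to detect silent corruption. The following is a minimal sketch of that idea, not taken from the lecture; the function names `sha256_of` and `fixity_ok` are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded when it was stored."""
    return sha256_of(path) == recorded_digest
```

A repository would typically store the recorded digest alongside the object and run the check on a schedule; a mismatch signals that the bits have changed, not why.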

Page 7: Digital Preservation

                                                             

Obsolete media

Image courtesy of Frank Carey

Exhibition at NASA White Sands Test Facility, 2009

Page 8: Digital Preservation

                                                             

Technical challenges (2)

• Hardware and software dependence
  – Most digital objects are dependent on particular configurations of hardware and software
  – Relatively short obsolescence cycles

Page 9: Digital Preservation

                                                             

Hardware and software dependence

Exhibition at NASA White Sands Test Facility, 2009

Image courtesy of Frank Carey

Page 10: Digital Preservation

                                                             

Conceptual challenges (1)

• What is a digital object?
  – Some are analogues of traditional objects, e.g. meeting minutes, research papers
  – Others are not, e.g. Web pages, blogs, GIS, 3D models of chemical structures, research data more generally
    • Complexity
    • Dynamic nature
    • Interactivity
  – Born digital vs. product of digitisation initiatives
  – Logical layer between physical storage of bits and the conceptual objects that need preservation (includes data types, formats, etc.)

Page 11: Digital Preservation

                                                             

Conceptual challenges (2)

• Need to identify and document the “significant properties” (or characteristics) of content:
  – Recognises that preservation is context dependent, even user specific (OAIS concept of 'designated community')
  – Helps with choosing an acceptable preservation strategy
    • Compare the ‘performance model’ developed by the National Archives of Australia (2002) - “The source of a record is a fixed message that interacts with technology. This message provides the record’s unique meaning, but by itself is meaningless to researchers since it needs to be combined with technology in order to be rendered as its creator intended. The process is the technology required to render meaning from the source”
  – Focus on re-use (data curation)

Page 12: Digital Preservation

                                                             

Organisational challenges (1)

• Sustainability:
  – Ultimately the sustainability of content depends upon the long-term sustainability of organisations
    • Focus on business models
  – Organisational commitment:
    • “An institutional repository needs to be a service with continuity behind it … Institutions need to recognise that they are making commitments for the long term” - Clifford Lynch
    • Need for policy development
  – Incentives for preservation:
    • Clarity on roles and responsibilities needed
    • Who benefits? Who pays? “Free riding?”

Page 13: Digital Preservation

                                                             

Organisational challenges (2)

• Economic perspectives:
  – Blue Ribbon Task Force on Sustainable Digital Preservation and Access: http://brtf.sdsc.edu/
    • Final report (Feb 2010): “Ensuring that valuable digital assets will be available for future use is not simply a matter of finding sufficient funds. It is about mobilizing resources - human, technical, and financial - across a spectrum of stakeholders diffuse over both space and time. But questions remain about what digital information we should preserve, who is responsible for preserving, and who will pay.”
  – JISC-funded LIFE (Life Cycle Information for E-Literature) has developed a predictive costing tool: http://www.life.ac.uk/

Page 14: Digital Preservation

                                                             

Organisational challenges (3)

• The challenge of scale:
  – The Web
  – Digitised content:
    • Google Books
  – The “data deluge” in e-Science:
    • New generations of instruments, computer simulations
    • Many terabytes generated per day, petabyte scale computing (and growing)
    • Cory Doctorow, “Welcome to the petacentre.” Nature, 455, pp 17-21, 4 Sep 2008

Page 15: Digital Preservation

                                                             

Organisational challenges (4)

• The need for collaboration:
  – Need for 'deep-infrastructure' for preservation recognised as far back as 1996 by the Task Force on Archiving of Digital Information
    • Digital preservation involves the "grander problem of organizing ourselves over time and as a society ... [to manoeuvre] effectively in a digital landscape" (p. 7)
  – Building on existing networks
  – Role for national-level co-ordination:
    • Digital Preservation Coalition (DPC), nestor (Germany), National Digital Information Infrastructure and Preservation Program (NDIIPP)

Page 16: Digital Preservation

                                                             

Organisational challenges (5)

• Learn the lessons from the past:
  – Things will go wrong
  – Do what you can to enable recovery from disaster
  – Digital technologies support replication (avoid a single point of failure)
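Replication only helps recovery if damaged copies can be detected and restored from an intact one. The sketch below illustrates that loop under stated assumptions: the helper names (`digest`, `replicate`, `repair`) are hypothetical, replicas are plain directories, and SHA-256 is an assumed integrity check. Real deployments would place replicas on independent storage in different locations.

```python
import hashlib
import shutil
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 hex digest of a (small) file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source: Path, replica_dirs: list[Path]) -> list[Path]:
    """Copy source into each replica directory and return the copies."""
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(source, directory / source.name)))
    return copies


def repair(copies: list[Path], expected: str) -> int:
    """Overwrite damaged or missing copies from an intact one; return the repair count."""
    good = next((c for c in copies if c.exists() and digest(c) == expected), None)
    if good is None:
        raise RuntimeError("no intact copy left to recover from")
    repaired = 0
    for copy in copies:
        if not copy.exists() or digest(copy) != expected:
            shutil.copy2(good, copy)
            repaired += 1
    return repaired
```

The `RuntimeError` branch is the case replication is meant to make unlikely: every copy has failed at once.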

Page 17: Digital Preservation

                                                             

Digital preservation strategies (1)

• Main approaches:
  – Technology preservation (e.g., computing museums)
  – Digital archaeology (a post hoc approach)
  – Emulation (focusing on the environment, often used where look-and-feel is important, e.g. computer games)
  – Migration (focusing on the content)
    • A mature approach: A set of organised tasks designed to achieve the periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996)

Page 18: Digital Preservation

                                                             

Digital preservation strategies (2)

• Preservation strategies are not in competition
  – Different strategies will work together; there may be value in diversification
  – Migration strategies mean difficult choices need to be made about target formats
• But the strategy chosen has implications for:
  – The technical infrastructure required (and metadata)
  – Collection management priorities
  – Rights management
    • Owning the rights to re-engineer software
  – Costs

Page 19: Digital Preservation

                                                             

Digital preservation strategies (3)

• Tools for format characterisation and validation
  – DROID - Digital Record Object Identification (based on the PRONOM registry)
    • Very important to know what types (formats) of content exist in a particular collection (e.g., institutional repository or Web archive)
    • Performs batch identification of file formats
    • http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
  – JHOVE - JSTOR/Harvard Object Validation Environment
    • Used for format validation
    • http://hul.harvard.edu/jhove/
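Format identification tools generally work by matching byte signatures in a file against a registry rather than trusting file extensions. The following sketch shows that principle only; it does not use DROID or the PRONOM signature file, and the tiny `SIGNATURES` table and the function names are illustrative assumptions.

```python
from collections import Counter
from pathlib import Path

# A tiny, hand-written signature table (assumption); a real registry such as
# PRONOM holds far more formats and more precise signatures.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(path: Path) -> str:
    """Identify a file by its leading bytes, ignoring its extension."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def profile(collection: Path) -> Counter:
    """Batch-identify every file under a directory and count the formats found."""
    return Counter(identify(p) for p in collection.rglob("*") if p.is_file())
```

Running `profile` over a repository's storage directory gives a first format profile of the collection, which is the kind of information preservation planning starts from.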

Page 20: Digital Preservation

                                                             

Digital preservation strategies (4)

• Plato preservation planning tool
  – Developed by EU Planets project
  – A decision support tool that helps users explore the evaluation of potential preservation solutions against specific requirements and for building a plan for preserving a given set of objects
  – Integrates file format identification (using DROID); some migration services; XML-based generic format characterisation using XCL (eXtensible Characterisation Languages)
  – More info: http://www.ifs.tuwien.ac.at/dp/plato/intro.html
  – Integration with repositories tested by JISC KeepIt project: http://preservation.eprints.org/keepit/
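Evaluating candidate solutions against requirements can be illustrated with a simple weighted-score comparison. This is a generic sketch and not Plato's own algorithm or data model; the criteria, weights, scores and the function name `rank_alternatives` are all made up for illustration.

```python
def rank_alternatives(weights, scores):
    """Rank alternatives by weighted average score (higher is better).

    weights: criterion -> relative importance
    scores:  alternative -> {criterion: score on a 0..5 scale}
    """
    total = sum(weights.values())
    ranked = {
        alternative: sum(weights[c] * per_criterion[c] for c in weights) / total
        for alternative, per_criterion in scores.items()
    }
    return sorted(ranked.items(), key=lambda item: item[1], reverse=True)


# Hypothetical requirements and evaluation results for a set of documents.
weights = {"preserves layout": 3, "open format": 2, "low tool cost": 1}
scores = {
    "migrate to PDF/A": {"preserves layout": 5, "open format": 4, "low tool cost": 3},
    "keep original and emulate": {"preserves layout": 4, "open format": 2, "low tool cost": 2},
}

for alternative, value in rank_alternatives(weights, scores):
    print(f"{alternative}: {value:.2f}")
```

Writing the weights down explicitly is the useful part: the ranking can then be revisited when requirements or the available tools change.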

Page 21: Digital Preservation

                                                             

The OAIS Reference Model

[Figure: OAIS Functional Entities (Figure 4-1). Functional entities shown: Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning. External roles: Producer, Consumer, Management. Labelled flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders.]

http://public.ccsds.org/publications/archive/650x0b1.PDF
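The flow in the figure (a Producer submits a SIP, the archive stores an AIP, a Consumer queries and orders a DIP) can be sketched as a small in-memory model. This is only an illustration of the package flow; the class and method names are assumptions, and the OAIS Reference Model itself prescribes no implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:  # Submission Information Package, from the Producer
    producer: str
    files: dict[str, bytes]
    descriptive_info: dict[str, str]


@dataclass
class AIP:  # Archival Information Package, held in Archival Storage
    identifier: str
    files: dict[str, bytes]
    descriptive_info: dict[str, str]
    fixity: dict[str, str]  # per-file SHA-256, an assumed form of fixity information


@dataclass
class DIP:  # Dissemination Information Package, delivered to the Consumer
    identifier: str
    files: dict[str, bytes]


@dataclass
class Archive:
    storage: dict[str, AIP] = field(default_factory=dict)  # Archival Storage
    catalogue: dict[str, dict[str, str]] = field(default_factory=dict)  # Data Management

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP and register its descriptive info."""
        identifier = f"aip-{len(self.storage) + 1}"
        fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
        self.storage[identifier] = AIP(identifier, dict(sip.files), dict(sip.descriptive_info), fixity)
        self.catalogue[identifier] = dict(sip.descriptive_info)
        return identifier

    def query(self, key: str, value: str) -> list[str]:
        """Access: return identifiers whose descriptive info matches (the result set)."""
        return [i for i, info in self.catalogue.items() if info.get(key) == value]

    def order(self, identifier: str) -> DIP:
        """Access: build a DIP from the stored AIP."""
        return DIP(identifier, dict(self.storage[identifier].files))
```

Administration and Preservation Planning are left out of the sketch because they are ongoing management functions, not steps in the package flow.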

Page 22: Digital Preservation

                                                             

Preservation metadata

• Metadata and documentation are vitally important
  – Relates to OAIS concepts like Representation Information and Preservation Description Information
  – Functions:
    • Enables resource discovery - supports the development of finding aids
    • Records meaning (structure and semantics)
    • Records context and provenance (authenticity)
  – Standards that support digital preservation activities are under development:
    • PREMIS Data Dictionary (for core metadata): http://www.loc.gov/standards/premis/
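Recording provenance usually means logging each action taken on an object, with who did it, when, and the outcome. The sketch below is loosely inspired by the idea of events linked to objects and agents; the field names are simplified assumptions and do not reproduce the PREMIS schema, and `record_event` is a hypothetical helper.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    event_type: str       # e.g. "fixity check", "migration"
    event_date_time: str  # ISO 8601 timestamp
    event_outcome: str    # e.g. "pass", "fail"
    linking_object: str   # identifier of the object acted on
    linking_agent: str    # person or software that performed the action


def record_event(log, event_type, outcome, obj, agent, when=None):
    """Append an event to the provenance log and return it."""
    when = when or datetime.now(timezone.utc)
    event = Event(event_type, when.isoformat(), outcome, obj, agent)
    log.append(event)
    return event


log = []
record_event(log, "fixity check", "pass", "object-0001", "nightly-checker")
print(json.dumps([asdict(e) for e in log], indent=2))
```

An append-only log of this kind is what later lets a repository argue for the authenticity of an object after several migrations.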

Page 23: Digital Preservation

                                                             

Repository audit frameworks (1)

• Repository audit frameworks first developed out of the OAIS Reference Model
  – OAIS Mandatory Responsibilities (only six of them):
    • The main focus was on technical and organisational aspects, e.g.:
      – That repositories ensure that preserved information (content) can be understood (independently understandable)
      – That documented policies and procedures are being followed
    • No clear concept of OAIS “compliance”

Page 24: Digital Preservation

                                                             

Repository audit frameworks (2)

• Trustworthy Repositories Audit and Certification (TRAC): Criteria and Checklist:
  – Source: http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying
  – RLG-NARA Digital Repository Certification Task Force checklist, revised by the Center for Research Libraries (CRL) and OCLC
  – Criteria cover three main things:
    • Organisational Infrastructure
      – Governance and viability, structure and staffing, financial sustainability, contracts, etc.
    • Digital Object Management
      – Ingest, preservation planning, archival storage, etc.
    • Technologies, Technical Infrastructure, & Security
      – Systems and infrastructure, etc.

Page 25: Digital Preservation

                                                             

Core repository principles (1)

• Ten Principles - agreed 2007 by CRL (US), Digital Curation Centre (UK), Nestor (Germany) and Digital Preservation Europe
  – The repository commits to continuing maintenance of digital objects for identified community/communities.
  – Demonstrates organizational fitness (including financial, staffing structure, and processes) to fulfill its commitment.
  – Acquires and maintains requisite contractual and legal rights and fulfills responsibilities.
  – Has an effective and efficient policy framework.
  – Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities.

Page 26: Digital Preservation

                                                             

Core repository principles (2)

• Ten principles (continued)
  – Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time.
  – Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation.
  – Fulfills requisite dissemination requirements.
  – Has a strategic program for preservation planning and action.
  – Has technical infrastructure adequate to continuing maintenance and security of its digital objects.
• Available: http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re

Page 27: Digital Preservation

                                                             

TRAC Checklist example page

Page 28: Digital Preservation

                                                             

Repository audit frameworks (3)

• DRAMBORA (Digital Repository Audit Method Based on Risk Assessment)
  – Digital Curation Centre / Digital Preservation Europe
  – “Presents a methodology for self-assessment, encouraging organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organisation”
  – Identifying risks and scoring each one on likelihood and impact
  – Covers: organisational context, policies, assets, risks, etc.
  – Online tool (http://www.repositoryaudit.eu/about/)
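Scoring each risk on likelihood and impact gives a simple way to prioritise them. The sketch below multiplies the two scores to rank a risk register; the 1 to 6 scales, the example risks and the names `Risk` and `prioritise` are assumptions for illustration and are not taken from the DRAMBORA toolkit.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (very unlikely) to 6 (very likely)
    impact: int      # assumed scale: 1 (negligible) to 6 (catastrophic)

    @property
    def score(self) -> int:
        """Severity as likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritise(risks):
    """Return risks ordered from most to least severe."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical risk register entries.
register = [
    Risk("Loss of key staff", likelihood=3, impact=4),
    Risk("Storage media failure", likelihood=4, impact=5),
    Risk("File format obsolescence", likelihood=2, impact=6),
]

for risk in prioritise(register):
    print(f"{risk.score:>2}  {risk.name}")
```

The numbers matter less than the ordering they produce and the discussion needed to agree on them.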

Page 29: Digital Preservation

                                                             

Repository audit frameworks (4)

• A means of "asking the right questions" about repositories and documenting appropriate procedures and risks
• Both TRAC and DRAMBORA are under consideration by ISO technical committees
  – External badge of quality (a "certified preservation repository")
  or
  – Management tool for self assessment

Page 30: Digital Preservation

                                                             

Digital preservation basics (reprise)

• An ongoing (lifecycle) approach to managing digital content based on:
  – The identification and adoption of appropriate preservation strategies for content
  – The collection and management of appropriate metadata (explicit and implicit knowledge, contexts)
  – The ongoing monitoring of technical contexts and the application of preservation planning techniques
  – Continual monitoring of the organisation (audit)

Page 31: Digital Preservation

                                                             

The Future ...

• “It is always a mistake for a historian to try and predict the future. Life, unlike science, is simply too full of surprises” - Richard J. Evans, In defence of history (1997, p. 62)

Page 32: Digital Preservation

                                                             

Web links:

• PRESERV project: http://preservation.eprints.org/

• KeepIt project: http://preservation.eprints.org/keepit/

• Plato Preservation Planning tool: http://www.ifs.tuwien.ac.at/dp/plato/intro.html

• DRAMBORA: http://www.repositoryaudit.eu/about/

• RSP briefing paper on preservation and storage formats: http://www.rsp.ac.uk/pubs/briefingpapers-docs/technical-preservformats.pdf

• WePreserve cartoons at: http://www.youtube.com/user/wepreserve

Page 33: Digital Preservation

                                                             

Available: http://www.youtube.com/watch?v=PGFOZLecjTc

Page 34: Digital Preservation

                                                             

Further reading

• Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Final Report (NSF, 2010) http://brtf.sdsc.edu/

• Digital Preservation Coalition, Digital preservation handbook: http://www.dpconline.org/advice/preservationhandbook/

• JISC infoNet, Digital repositories infoKit: http://www.jiscinfonet.ac.uk/infokits/repositories

• Paradigm Project, Workbook on Digital Private Papers: http://www.paradigm.ac.uk/workbook/index.html

• Marieke Guy, JISC Beginner’s Guide to Digital Preservation (UKOLN, 2010) http://blogs.ukoln.ac.uk/jisc-beg-dig-pres/

• Digital Preservation Coalition and Digital Curation Centre, What’s New (monthly current awareness bulletin): http://www.dpconline.org/newsroom/whats-new

Page 35: Digital Preservation

                                                             

Further reading (research data)

• National Science Board, Long-lived digital data collections: enabling research and education in the 21st century (NSF, 2005) http://www.nsf.gov/pubs/2005/nsb0540/

• Liz Lyon, Dealing with data; roles, rights, responsibilities and relationships (JISC, 2007) http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005/dealingwithdata.aspx

• Neil Beagrie, Julia Chruszcz, and Brian Lavoie, Keeping research data safe: a cost model and guidance for UK universities (JISC, 2008) http://www.beagrie.com/publications.php

• Neil Beagrie, Brian Lavoie and Matthew Woollard, Keeping research data safe 2 (JISC, 2010) http://www.beagrie.com/publications.php

Page 36: Digital Preservation

                                                             

Questions?

Page 37: Digital Preservation

                                                             

Acknowledgments

• UKOLN is funded by the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, the Museums, Libraries and Archives Council (MLA), as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based.

• More information: http://www.ukoln.ac.uk/

Page 38: Digital Preservation

                                                             

Thank you!