Digital Preservation Tools for Repository Managers
A practical course in five parts presented by the KeepIt project in association with
Module 4, Putting storage, format management and preservation planning in the repository
University of Southampton, 18-19 March 2010
Twitter hashtag #dprc (digital preservation repository course)

KeepIt Course 4: Putting storage, format management and preservation planning in the repository

DESCRIPTION

This is the opening presentation for module 4 of the 5-module course on digital preservation tools for repository managers, presented by the JISC KeepIt project. This module puts storage, format management and preservation planning in the repository, by making such functions available from within the familiar repository interface. This introduction briefly reviews the previous module, which acted as a primer on preservation workflow, formats and characterisation, as preparation for the preservation planning tools to be encountered in this module. For more on this and other presentations in this course, look for the tag ‘KeepIt course’ in the project blog http://blogs.ecs.soton.ac.uk/keepit/

Page 2: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Course structure
• Module 1. Organisational issues. Scoping, selection, assessment, institutional parameters (19 January)
• Module 2. Costs. Lifecycle costs for managing digital objects, based on the LIFE approach, and institutional costs (5 February)
• Module 3. Description. Describing content for preservation: provenance, significant properties and preservation metadata (2 March)
• Module 4. Preservation workflow. Tools available in EPrints for format management, risk assessment and storage, and linked to the Plato planning tool from Planets (TODAY)
• Module 5. Trust (by others) of the repository’s approach to preservation; trust (by the repository) of the tools and services it chooses (30 March)

Page 3: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Tools this module

• EPrints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton

• Plato, preservation planning tool from the Planets project, Andreas Rauber and Hannes Kulovits, TU Wien

Page 4: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Steve Jobs launches Apple iPad

Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/

“75 million people already own iPod Touches and iPhones. That's all people who already know how to use the iPad.”

Page 6: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3
• Preservation workflow

Page 7: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Preservation workflow

Check
• Virus check
• Bit checking and checksum calculation

Analyse
• Format identification, versioning
• File validation

Action
• Migration
• Emulation
• Storage selection

Tools, e.g. DROID, JHOVE, FITS

Preservation planning
• Characterisation: significant properties and technical characteristics, provenance, format, risk factors
• Risk analysis
• Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois), KB
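The bit checking and checksum calculation step of the workflow can be illustrated with a short Python sketch. This is only a minimal illustration using the standard library, not how EPrints or any of the named tools implement it; the function names `file_checksum` and `fixity_ok`, and the choice of SHA-256, are assumptions made for the example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks.

    The algorithm name is an assumption for this sketch; a repository
    may record a different one.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum, algorithm="sha256"):
    """Compare the file's current checksum with the one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

The checksum would typically be recorded when an object is deposited and recomputed later; a mismatch suggests the bits have changed and the object needs attention.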

Page 8: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Format risks

1000 Ubiquity: degree of adoption of the format
1001 Support: number of tools available which can access the format
1002 Disclosure: extent to which the format documentation is publicly disclosed
1003 Document Quality: completeness of the available documentation
1004 Stability: speed and backwards-compatibility of version change
1005 Ease of identification: ease with which the format can be identified
1006 Ease of validation: ease with which the format can be validated
1007 Lossiness: does the format use lossy compression
1008 Intellectual property rights: whether or not the format is encumbered by IPR
1009 Complexity: degree of content or behavioural complexity supported

From PRONOM documentation (The National Archives), July 2008

Page 9: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Format risks: Word vs PDF, TIFF vs JPEG, XML vs PDF

1000 Ubiquity 1 1 1
1001 Support 1 1
1002 Disclosure
1003 Document Quality
1004 Stability 1 1
1005 Ease of identification
1006 Ease of validation 1 1
1007 Lossiness 1 1
1008 Intellectual property rights 1
1009 Complexity 1 1 1

The WINNER is: PDF, TIFF, XML

Page 10: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

A group task on format risks
1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)
2. By working through the (surviving) list of format risks select a winner (or a draw) between your chosen formats for each risk category (1 point for win)
3. Total the scores to find an overall winning format
4. Suggest one reason why the winning format using this method may not be the one you would choose for your repository
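The scoring in steps 2 and 3 can be sketched in a few lines of Python. This is a hedged illustration of the tally only: the function name `overall_winner` and the example judgements are invented for the sketch and are not results from the course; the category names are taken from the PRONOM list of format risks.

```python
from collections import Counter

# Risk categories from the PRONOM list (1000-1009).
RISK_CATEGORIES = [
    "Ubiquity", "Support", "Disclosure", "Document Quality", "Stability",
    "Ease of identification", "Ease of validation", "Lossiness",
    "Intellectual property rights", "Complexity",
]


def overall_winner(judgements):
    """Total one point per category win and return (winner, scores).

    `judgements` maps a risk category to the winning format name,
    or to None for a draw. The winner is None if the totals are tied.
    """
    for category in judgements:
        if category not in RISK_CATEGORIES:
            raise ValueError(f"Unknown risk category: {category}")
    scores = Counter(w for w in judgements.values() if w is not None)
    ranked = scores.most_common()
    if not ranked:
        return None, scores
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, scores  # overall draw
    return ranked[0][0], scores


if __name__ == "__main__":
    # Illustrative judgements only, not the group's actual answers.
    example = {"Ubiquity": "PDF", "Support": "PDF",
               "Lossiness": None, "Complexity": "Word"}
    print(overall_winner(example))
```

Because every category carries equal weight here, the sketch also shows why step 4 matters: a repository may care far more about one risk than another, which a simple total cannot express.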

Page 11: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3
• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

Page 12: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3


• Significant properties

Page 13: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

InSPECT SP Assessment Framework
• Builds on Gero’s Function-Behaviour-Structure (FBS) framework
• FBS was developed to assist engineers/designers to create & redesign artefacts
Three categories:
• Function: The design intention or purpose that is performed.
• Behaviour: The epistemological outcome derived from the function & structure obtained by the stakeholder
• Structure: The structural elements of the Object that enable the stakeholder to perform the behaviour.
• Artefact construction is a product of designated function.
• Behaviour is the result of interaction between Function & Structure

Page 14: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Exercise overview
• Analyse the content of an email
  • Analyse structure of email message
  • Determine purpose that each technical property performs
• Consider how email will be used by stakeholders
  • Identify set of expected behaviours
  • Classify set of behaviours into functions for recording

Page 15: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Determine expected behaviours
• What activities would a user – any type of stakeholder – perform when using an email?
• Draw upon the list of property descriptions produced in the previous step, formal standards and specifications, or other information sources.

Task 2: Identify the type of actions that a user would be able to perform using the email (Groups, 15 mins).
• E.g. Establish name of person who sent email
• E.g. May want to confirm that email originated from stated source.

Select object type for analysis

Analyse structure

Identify purpose of technical properties

Determine expected behaviours

Classify behaviours into functions

Associate structure with each function

Review & finalise

Behaviour / Structure

Recipient local-part

Recipient domain-part

Trace-route

Recipient display-name

Sender local-part

Sender domain-part

Sender display-name

Message-id

references

In-reply-to

Body text colour

Body background

strikethrough

underline

Paragraph

Line break

Message text

subject
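Several of the structural elements listed above (sender and recipient display-name, local-part and domain-part, Message-id, In-reply-to, references, subject, trace-route) can be read from a message with Python's standard `email` package. The sketch below is only an illustration for the exercise, not part of the InSPECT framework: the function names, the dictionary keys and the sample message are hypothetical, and only the first address in the To header is examined.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample message, invented for this sketch.
SAMPLE_MESSAGE = (
    "From: Alice Example <alice@example.org>\n"
    "To: Bob Example <bob@example.com>\n"
    "Subject: Draft report\n"
    "Message-ID: <1234@example.org>\n"
    "In-Reply-To: <1200@example.com>\n"
    "\n"
    "Please see the attached draft.\n"
)


def address_parts(header_value):
    """Split an address header into display-name, local-part and domain-part."""
    display_name, address = parseaddr(header_value or "")
    if "@" in address:
        local_part, _, domain_part = address.rpartition("@")
    else:
        local_part, domain_part = address, ""
    return {
        "display-name": display_name,
        "local-part": local_part,
        "domain-part": domain_part,
    }


def structural_properties(raw_message):
    """Collect a few structural elements of an email for later analysis."""
    msg = message_from_string(raw_message)
    return {
        "sender": address_parts(msg["From"]),
        "recipient": address_parts(msg["To"]),  # first address only
        "message-id": msg["Message-ID"],
        "in-reply-to": msg["In-Reply-To"],
        "references": msg["References"],
        "subject": msg["Subject"],
        # Received headers record the route the message took.
        "trace-route": msg.get_all("Received", []),
    }


if __name__ == "__main__":
    print(structural_properties(SAMPLE_MESSAGE))
```

A behaviour such as "establish name of person who sent email" would then depend on the sender display-name, which is one way to associate structure with a function.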

Page 16: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

1.3 cont. Categories of properties
Five high-level categories
• Content e.g. character count
• Context e.g. date of creation
• Rendering e.g. bit depth
• Structure e.g. e-mail attachments
• Behaviour e.g. hyperlinks

Page 17: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Identify Stakeholders
• Creator – view, annotate
  • Researcher corresponds during research with colleagues, peers, administrators etc.
• Recipient – reuses content
  • Student wants to understand research lifecycles by studying real-world practice
• Custodian – evidential chain
  • Maintains permanent email record for externally-funded projects, alongside data and eprint outputs

Identify stakeholder

Select object type(s) for analysis

Determine actual behaviours

Classify behaviours into set of functions

Cross-match functions

Assign acceptable value boundaries

Review & finalise

Page 18: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3

• Significant properties
– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and classifying the functions of formatted emails
– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

Page 19: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3

• Documentation
– We looked at two means to document these characteristics, and the changes over time
1. Broad and established (PREMIS)
2. Focussed, and work-in-progress (Open Provenance Model)

Page 20: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3

• Provenance in action: transmission and recording

Page 21: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Provenance: a numbers game

• Transmission: recording vs word-of-mouth
• Identifying what is significant about the information to be transmitted
• Can be self-correcting!

Page 22: KeepIt Course 4: Putting storage, format management and preservation planning in the repository

Some revision from KeepIt Module 3

• Provenance in action: transmission and recording
– Through a simple game we learned that if we don’t recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as we started with