16
MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

Embed Size (px)

Citation preview

Page 1: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

MODELLING THE DIGITAL PRESERVATION COSTS

Paul Wheatley

Digital Preservation Manager

British Library

Page 2: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

22

Summary

Overview of the model: Aims Development process Model

Results

Evaluation

Conclusions

Page 3: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

33

Scope

Acquisition

Ingest

Metadata

Storage

Access

Preservation

Page 4: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

44

Background and aims

Previous work (see Final Report): National Archief, Digital Bewaring – full costing/audit approach Oltmans, Kol – lifecycle and strategies

Key aims: Make the first major step in defining and estimating the lifecycle cost

of digital preservation activities. Propose a model for comment by the wider preservation community Enable the LIFE Case Studies to be compared and contrasted by

providing some cost estimates for “P” in the Lifecycle Model. Attempt to identify the scale of preservation costs. Are they

dramatically high as suggested previously by many in the preservation community or are they more achievable as suggested recently (see Rusbridge, C, “Excuse Me... Some Digital Preservation Fallacies?”)?

Page 5: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

55

Development process

Key cost factors, experimentation, iterative development and refinement

Based on evidence or indications of trends where possible

Editable inputs where key estimation or assumptions made

Cost component review

Application of draft model, refinement of inputs

Team review, refinement of model weaknesses

Page 6: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

66

The Generic LIFE Preservation Model

Preservation = t * TEW + (t / ULE + PON) * (CRS + UME + PPA + QAA)

Expansion of calculated components:• ULE – Unaided Life Expectancy of a Format = BLE + 0.1*t• CRS – Cost of new rendering solution = (1 - PTA) * TDC * FCX + PTA * COA• PPA – Performing preservation action = PON * (SCM + n * HVM)• QAA – Quality Assurance = n * BCT * FCX• PTA – Proportion of Tool Availability = STA(1-t/20)+ETA(t/20)

Expansion of scaling components:• PON – Proportion of normalisation = 0.4• FCX - Format complexity (e.g. JPEG = 0.2, WMF = 0.4, PDF = 0.6, Word = 0.8)

Expansion of cost component inputs:• HVM – High volume migration cost per object = £0.05• BCT – Base cost of testing a preservation action per object = £0.17• UME – Update Metadata = 2 metadata officer weeks @ £30k annual salary = £1250• TDC – Tool development cost = 24 programmer months @ £30k annual salary - £60000• COA – Cost of available tool = £1500• TEW - Technology Watch = 1 metadata officer week @ £30k annual salary = £625• BLE - Base life expectancy = 8 (years)• STA – Starting tool availability = 0.5• ETA – Ending tool availability = 0.9• SCM – Setup cost of migration = £340

Page 7: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

77

The Generic LIFE Preservation Model :key elements explained

Preservation = t * TEW + (t / ULE + PON) * (CRS + UME + PPA + QAA)

Frequency of action

TechWatch

Preservation action

Preservation cost of n objects of a particular format for the period 0 to t.

Preservation = + *

Eg. 20000 objects of the GIF format for a period of 10 years.

Monitoring formats and software for obsolescence

Updating and managing metadata (Representation Information).

The number of preservation actions within the time period calculated

Q/AUpdate

metadata

Performpreservation

action

Cost ofPreservation

tool

Page 8: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

88

Series of small technology watch events and spikes of preservation activity at

increasing intervals

The occurrence of costs(1st detailed sample of the model)

Time (years)

Pre

serv

atio

n ac

tivity

0 t

Time (years)

Pre

serv

atio

n ac

tivity

0 t

Preservation actionPreservation = + *Tech

WatchFrequency of action

Example : FCLA Action Planshttp://www.fcla.edu/digitalArchive/

Base life expectancy = 8 yearsIncreases by a year every decade

Time (years)

Pre

serv

atio

n a

ctiv

ity

0 t

Preservation actions

Technology watch

Page 9: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

99

Q/AUpdate

metadata

Performpreservation

action

Cost ofPreservation

tool

Complexity of file formats(2nd detailed sample of the model)

• Size• Complexity• Proprietary• Open• Standardised

Frequency of action

TechWatch

Preservation actionPreservation = + *

=

Category Complexity Examples

Simple 0.1 ASCII, Unicode

Bitmap 0.2 JPEG, GIF

Mark-up 0.3 XML, HTML

Vector 0.4 EMF, Draw

Multimedia 0.6 MPEG3, WAV

Document 0.8 Word, PDF

Complex 1 Oracle database dump

FormatComplexity

Page 10: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1010

Preservation tool cost (3rd detailed sample of the model)

Cost ofPreservationTool (CRS)

Frequency of action

TechWatch

Preservation actionPreservation = + *

Q/AUpdate

metadata

Performpreservation

action

=

Proportionof tool

Availability(PTA)

=

Cost of developing a new tool

Cost of acquiring

an existingtool

+

PT

A(1- )P

TA

ToolDevelopmentCost (TDC)

=Estimated as 24 programmer months @ 30k annual salary

(£60000)Format

ComplexityCost of

Availabletool

= Estimated as £1500Time (years)

Pro

port

ion

of to

ol

avai

labi

lity

(PT

A)

0 20

0%

100% Tool availability

(1-t/20) + (t/20)

STA

ETA

ETA

STA

= 0.9

= 0.5

Average proportionacross the time period

Preservation = t * TEW + (t / ULE + PON) * (CRS + UME + PPA + QAA)

Page 11: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1111

Estimated costs using the model

File FormatFormat Complexity

Number of objects

Frequency of pres action

GIF 0.2 225079 1.51

Case study name Sub category Year1 Year 10Percentage of total lifecycle cost

VDEP e-monographs £0.89 £1.45 4%

VDEP e-serials £10 £27 2%

Web archiving £425 £8509 62%

File Format

Technology watch

Preservationtool cost

MetadataPreservation action

Quality assurance

Total cost (over 10 years)

GIF £6,250 £7,027 £1,889 £7,008 £11,564 £33,738

Estimated preservation costs for GIF files in the Web Archiving

Case Study

Comparison of average object

preservation costs across the Case

Studies

Page 12: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1212

Model outputs:WA Case Study, percentage breakdown

Quality assurance

Preservation action

Metadata

Tool cost

Technology watch

1 5 10 20

Time period (years)

Breakdown of complete preservation costs over time in the WA Case Study

Page 13: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1313

Self evaluation of the model

Evaluation against key aims: Make the first major step in defining and estimating the lifecycle cost

of digital preservation activities. Propose a model for comment by the wider preservation community Enable the LIFE Case Studies to be compared and contrasted by

providing some cost estimates for “P” in the Lifecycle Model. Attempt to identify the scale of preservation costs. Are they

dramatically high as suggested previously by many in the preservation community or are they more achievable as suggested recently (see Rusbridge, C, “Excuse Me... Some Digital Preservation Fallacies?”)?

Page 14: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1414

Further work and refinement

Refinement based on real cost data, removal of assumptions

Level of detail

Format complexity

Re-ingest

More detailed discussion in the Final Report…

Page 15: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1515

Summary and conclusions

Estimating the cost is not easy but appears to be possible!

Provides a useful perspective on performing preservation

Focuses on achieving cost effective preservation

Page 16: MODELLING THE DIGITAL PRESERVATION COSTS Paul Wheatley Digital Preservation Manager British Library

1616

Finally…

Two appeals to the audience:

Please cost, record and publish your preservation work

Provide comment on the preservation model:

Questions, comments, evaluation:[email protected]