Digital preservation


Digital preservation for ongoing access

Presentation for Council July 2008

David Pearson

Manager, Digital Preservation Section

Overview

1. We have lots of “digital stuff” in our collections and it is growing

2. We will lose access to it unless we take action

3. We need to manage the process of keeping it accessible and usable

4. Solutions have to be scalable, reliable and automated

1. “Digital stuff” - many collections

Oral History

Pictures

Historical Newspapers

Maps

Manuscripts

Books

Web sites

Ephemera

Sheet music

Serials

How does it grow?

1. We collect it

– Physical carriers

– Online

• PANDORA web archive

• Australian web domain harvests

2. We create it

– Oral history interviews

– Photographs

– Publications

3. We convert it

– Digitise our collections

Web Archives

• Web sites are collected selectively

– Individually for access via PANDORA, or

– On a large scale via annual domain snapshots

• No control over content creation

• Lots of

– File formats

– Individual files (Pandora ≈ 51 million, Domain harvest ≈ 1.3 billion files)

– Links

– Software (browser, plug-ins, readers)

• Internet content changes over time

Digitisation

• Around 135,000 items digitised

• Newspaper project = 4 million pages by 2010

• Internally created so we can control

– Standards

– File formats (e.g. TIFF, JPEG, PDF)

– Metadata

– Workflows

• Issues

– Growing volume

Physical carriers

• Approx. 12,000 items – grows by 1,000 a year

Issues

• No control over creation

• Time lag before acquisition

• Variety of carriers (fragile) and file formats

• Require various hardware, software, operating systems, drivers to access

• Labour intensive to process and transfer to safe storage (growing backlog)

Growth: digital collection storage

[Chart: storage size (terabytes), January 2003 to July 2008, on a scale of 0 to 350. Labelled series: Australian Web Harvests, Newspapers]

Type of Digital Collections 2008

[Pie chart: share of the digital collections by type]

• Pandora: 3%

• Maps: 2%

• Sheet Music: 4%

• Manuscripts: 2%

• Pictures: 7%

• Oral History: 18%

• Other: 3%

• Historical Newspapers: 21%

• Australian Web Harvest: 40%

Growth: compared to books

[Chart: comparison of books collection and digital collection "book equivalents", in millions, for years ending June 2005 to 2008, on a scale of 0 to 6. Series: Digital Collection (20 MB "book equivalents"), Books Collection]
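The chart legend defines a "book equivalent" as 20 MB of digital content. As a rough illustration of that conversion, the sketch below turns a storage figure in terabytes into book equivalents. The decimal unit convention (1 TB = 1,000,000 MB) and the 100 TB input are assumptions made for the example, not figures from the presentation.

```python
# Size of one "book equivalent", taken from the chart legend.
MB_PER_BOOK_EQUIVALENT = 20

# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def book_equivalents(terabytes: float) -> float:
    """Convert a collection size in terabytes to 20 MB "book equivalents"."""
    return terabytes * MB_PER_TB / MB_PER_BOOK_EQUIVALENT


# Hypothetical input: 100 TB is an illustrative value, not a reported one.
print(f"{book_equivalents(100):,.0f} book equivalents")  # 5,000,000 book equivalents
```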

2. Act or risk losing it

• “Digital stuff” is dependent on technology at all stages

– Creation/capture

– Storage

– Access

• Technology changes rapidly, so software, hardware, media, file formats and operating systems become obsolete

• Unless managed, deterioration can occur rapidly, e.g. data can be corrupted or lost in storage or during transfer

Computer Museum

3. Managing to keep it

• “Not managing it” is not an option

• We need to

– Understand our “digital stuff” & associated risks

– Provide safe storage & ensure integrity

– Ensure access over time as technology changes

– Develop & implement preservation workflows, skills, standards, & strategies for ongoing access

– Enable content to be shared and used in different ways in the future
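One widely used way to "ensure integrity" is a fixity check: record a checksum for every file when it enters storage, then recompute and compare on a schedule or after each transfer. The sketch below is a minimal illustration of that idea, not a description of the Library's actual system. The function names, the SHA-256 choice and the in-memory manifest are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare current storage against a manifest recorded earlier."""
    current = build_manifest(root)
    return {
        # Files recorded at ingest that are no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Files still present whose content no longer matches.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files present now that were never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In use, `build_manifest` would run once at ingest and its result would be stored separately from the content. `audit` would then run periodically, with any non-empty list in its report flagged for follow-up. A design like this runs without manual intervention, which matters at the scale of billions of files.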

4. Solutions and implications

• Large scale automated processes

• Original research & time to deliver the solutions

• Reasonably long lead times

• Audit processes and quality control monitoring are critical

• Significant resources are required

Conclusions

• We are responsible for a lot of “digital stuff”

• If we simply collect and store it, it will become unusable in a relatively short time as technologies change

• Maintaining the ability to access it requires a lot of good management, planning, & dedicated resources

• We have to find and use solutions that can be applied automatically and reliably to billions of digital files
