Centro Ricerche e Innovazione Tecnologica TAPE workshop on the curation and preservation of audiovisual collections University of Glasgow, Scotland, UK Monday 12th – Friday 16th May 2008 Giorgio Dimino RAI Research Centre [email protected] Storage and repositories


Centro Ricerche e Innovazione Tecnologica

TAPE workshop on the curation and preservation of audiovisual collections

University of Glasgow, Scotland, UK

Monday 12th – Friday 16th May 2008

Giorgio Dimino

RAI Research Centre

[email protected]

Storage and repositories


Reference Model for an Open Archival Information System (OAIS)

Consultative Committee for Space Data Systems (CCSDS)

This document is a technical Recommendation for use in developing a broader consensus on what is required for an archive to provide permanent, or indefinite long-term, preservation of digital information.

This Recommendation establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It allows existing and future archives to be more meaningfully compared and contrasted. It provides a basis for further standardization within an archival context and it should promote greater vendor awareness of, and support of, archival requirements.


OAIS environment model

[Diagram: the OAIS archive and the three roles in its environment]

• Producer: provides content to the archive
• Consumer: uses the archive content
• Management: decides the archive's strategic objectives


Data vs. Information (OAIS definition)

A data object, interpreted using its representation information, yields an information object.

• Data object (e.g. 10010111): what we store
• Information object: what we want
• Representation information: knowledge about data interpretation
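The relation between data and information can be illustrated with a short Python sketch. It assumes three made-up interpretation rules (`uint8`, `int8`, `flags`) standing in for representation information; none of them comes from the OAIS text. The same stored byte yields different information depending on the rule applied.

```python
def interpret(data: bytes, representation: str):
    """Turn a stored data object into information using a named rule.

    The rule names are hypothetical stand-ins for representation information.
    """
    if representation == "uint8":
        return int.from_bytes(data, "big", signed=False)
    if representation == "int8":
        return int.from_bytes(data, "big", signed=True)
    if representation == "flags":
        # Eight on/off switches, most significant bit first.
        return [bool((data[0] >> bit) & 1) for bit in range(7, -1, -1)]
    raise ValueError(f"unknown representation: {representation}")


stored = bytes([0b10010111])          # what we store
print(interpret(stored, "uint8"))     # 151
print(interpret(stored, "int8"))      # -105
print(interpret(stored, "flags"))     # [True, False, False, True, False, True, True, True]
```

Without the rule, the byte alone cannot tell us which of these readings was intended, which is why the representation information has to be preserved alongside the data.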


Video data formats

• Uncompressed raster formats, YUV and RGB
– Standard definition 4:2:2 video, 270 Mb/s, requires 120 GB per hour
• Lossless compression (e.g. JPEG2000)
– Variable efficiency, on average ½ of the uncompressed size
• Compressed formats (e.g. MPEG2, MPEG4, VC1, DV)
– Compression depends on the final quality expected; typical bit rates range from 3 Mb/s to 50 Mb/s, up to 100 times reduction
• The “Representation Information” needed to interpret compressed formats is generally extremely complex
– Rendering is done using specific software or hardware
– The written specification must be seen only as a last-resort disaster recovery option
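The storage figures for these formats follow from simple arithmetic on the bit rate. The sketch below assumes a constant bit rate and decimal units (1 GB = 10^9 bytes); the function name and the labels are illustrative only.

```python
def gigabytes_per_hour(bit_rate_mbps: float) -> float:
    """Storage needed for one hour of video at a constant bit rate (decimal GB)."""
    bits_per_hour = bit_rate_mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


for label, rate in [
    ("SD 4:2:2 uncompressed", 270),
    ("compressed, upper typical rate", 50),
    ("compressed, lower typical rate", 3),
]:
    print(f"{label}: {gigabytes_per_hour(rate):.2f} GB/hour")
```

At 270 Mb/s this gives 121.5 GB per hour, in line with the rounded 120 GB figure. Going from 270 Mb/s to 3 Mb/s is a factor of 90, the same order as the quoted reduction of up to 100 times.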


Video quality, some considerations

• Digital master
– Result of digitisation of analogue tapes. It becomes the new master to replace the corresponding analogue tape. It should be stored at maximum quality
• Publication master
– If keeping all the digital masters on line is too expensive, a surrogate master can in some cases be generated at lower quality, from which all the subsequent publication copies will be derived by transcoding
• Publication version
– The version that is delivered to the user of a particular service (an archive can offer several services based on the same content)
• Viewing version
– A version at reduced quality used for content selection


OAIS Information Package

• Content Information
– Data object
– Representation information
• Preservation Description Information
– Provenance
– Context
– Reference
– Fixity
• Packaging Information
• Descriptive Information
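As a rough illustration, the parts of an information package can be mirrored in a small Python data structure. This is only a sketch: the class and field names are chosen here for readability, `build_package` is a hypothetical helper, and using a SHA-256 digest as the fixity value is an assumption, not something the OAIS model prescribes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data_object: bytes
    representation_information: str  # e.g. a format name or pointer to a specification


@dataclass
class PreservationDescription:
    provenance: str
    context: str
    reference: str
    fixity: str  # assumption: SHA-256 hex digest of the data object


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescription
    packaging_information: str
    descriptive_information: dict = field(default_factory=dict)

    def fixity_ok(self) -> bool:
        """Check that the data object still matches the recorded fixity value."""
        actual = hashlib.sha256(self.content.data_object).hexdigest()
        return actual == self.pdi.fixity


def build_package(data: bytes, fmt: str, identifier: str) -> InformationPackage:
    """Hypothetical helper that fills in the fixity value at creation time."""
    pdi = PreservationDescription(
        provenance="digitised from analogue tape",
        context="example collection",
        reference=identifier,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return InformationPackage(ContentInformation(data, fmt), pdi, "single file")


package = build_package(b"essence bytes", "MXF", "item-001")
print(package.fixity_ok())
```

If the stored data object is later altered, `fixity_ok()` returns `False`, which is the role fixity information plays in the package.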


Video packaging (wrappers)

• SMPTE MXF
• MPEG2 TS
• Microsoft ASF
• AVI
• Apple QuickTime
• Adobe Flash FLV, SWF

For reference see http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml


OAIS collaboration diagram


OAIS functional entities


Storage technologies

• Data tapes
– IBM LTO Ultrium 4: 800 GB
– Quantum DLT-S4: 800 GB
– Sony SAIT: 800 GB
– StorageTek T10000: 500 GB
• Hard disk
– Up to 1 TB per 3.5” disk
– Several RAID configurations possible
• Solid State Disks
– Still expensive but becoming interesting
– Capacity still lower than hard disks: 128 GB (announced products), 2.5”
• Optical Disks
– DVD RW: 9 GB
– Blu-ray: 50 GB


Some remarks

• The choice of storage technologies depends on many factors, including:
– Total amount of data
– Expected increase rate
– Desired throughput
– Access performance
– Data security
• No storage media can last forever
• No technology can be considered 100% reliable
• Never keep single copies!
• Obsolescence occurs very rapidly
• Data migration must be considered part of the management process, not an emergency operation


Digital vs. Analogue Archive (bookshelf meters required for 1000 hours of audio data)

[Bar chart, vertical axis from 0 to 30 meters. Media compared: 1/4" short tape (news), 1/4" standard tape, 33" vinyl, CD, DAT, 9 GB hard disk, 20 GB DLT, 35 GB DLT. Annotations: "800 GB today", "1 TB today".]


Flat storage

[Diagram components: User, Front end, Data base, File server, NAS; flows labelled Selection and Content]


Storage hierarchy

[Diagram: storage tiers labelled On line and Near-Line; media shown: RAM, Solid State Disk, Fast Hard Disk/RAID, RAID, Tape (robot)]


Hierarchical Storage Management (HSM)

[Diagram components: User, Front end, Data base, File server, HD cache, Tape robotic storage; flows labelled Selection and Content]


Federated storage (GRID)

• Based on GRID concepts of distributed computing and file system over a WAN
• Multiple self-contained storage nodes interconnected
– Each storage node contains its own storage medium, microprocessor, indexing capability, and management layer, generally based on commodity PCs
• Advantages
– Fault tolerance
– Scalability
– Throughput
• Examples: Google File System, Apache Hadoop


Basic functionalities

• Virtualization
– The user sees a single file system
• Data replication
– The system automatically manages the desired redundancy
• Direct access to data
– Data move from storage node to client without intermediation
• Dynamic reconfiguration
– Nodes can be switched on and off while the system is in operation
• Automatic load balancing
– Exploiting data replication and direct node access


Data blocking and replication

• A data file is divided into fixed length blocks
• Each block is replicated n times on different nodes

[Diagram: a file split into Blocks 1 to 4, with copies of each block spread across Nodes 1 to 5]
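The blocking and replication scheme can be sketched in a few lines of Python. The round-robin placement rule used here is an assumption made for illustration; real systems such as GFS or Hadoop use their own placement policies, and the function names are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-length blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], n: int) -> dict[int, list[str]]:
    """Assign each block to n different nodes, round-robin (illustrative policy)."""
    if n > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    return {
        block: [nodes[(block + k) % len(nodes)] for k in range(n)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a working node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


data = bytes(range(10))
blocks = split_into_blocks(data, 4)
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_replicas(len(blocks), nodes, n=3)

for block, replicas in placement.items():
    print(f"block {block + 1}: {replicas}")
print(readable_after_failure(placement, {"node1", "node2"}))
```

With three replicas per block, any two nodes can fail and the whole file remains readable, which is the fault tolerance the scheme is meant to provide.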


Architecture

[Diagram labels: user, Filename, Nodes list, Data chunks, NameNode (two shown), DataNodes, Cluster 1 and Cluster 2, each cluster containing several Nodes]
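One plausible reading of this architecture, in the style of GFS and Hadoop, is that the user asks a NameNode for a filename, receives the list of nodes holding each chunk, and then fetches the chunks directly from the DataNodes. The Python sketch below models that reading with in-memory objects; the class and method names are hypothetical and do not correspond to any real Hadoop API.

```python
class DataNode:
    """Stores chunk data; knows nothing about whole files."""

    def __init__(self, name: str):
        self.name = name
        self.chunks = {}  # (filename, chunk index) -> bytes


class NameNode:
    """Holds only metadata: which nodes store each chunk of each file."""

    def __init__(self):
        self.index = {}  # filename -> list of replica lists, one per chunk

    def register(self, filename, chunk_locations):
        self.index[filename] = chunk_locations

    def lookup(self, filename):
        return self.index[filename]


def read_file(name_node, filename, is_up=lambda node: True):
    """Ask the NameNode where the chunks are, then read them from DataNodes."""
    parts = []
    for i, replicas in enumerate(name_node.lookup(filename)):
        node = next(n for n in replicas if is_up(n))  # first working replica
        parts.append(node.chunks[(filename, i)])
    return b"".join(parts)


nodes = [DataNode(f"node{i}") for i in range(1, 4)]
name_node = NameNode()
locations = []
for i, chunk in enumerate([b"AAAA", b"BBBB", b"CC"]):
    replicas = [nodes[i % 3], nodes[(i + 1) % 3]]  # two copies per chunk
    for node in replicas:
        node.chunks[("clip.mxf", i)] = chunk
    locations.append(replicas)
name_node.register("clip.mxf", locations)

print(read_file(name_node, "clip.mxf"))
print(read_file(name_node, "clip.mxf", is_up=lambda n: n.name != "node1"))
```

Because the NameNode hands out only locations, the chunk data never passes through it, and a failed DataNode is handled by reading another replica.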


Digital Asset Management (1)

• A software system that implements all the archive management policies
• Provides the archive administrator with the necessary tools to
– Monitor the preservation state of the media
– Restore backup copies when primary media is damaged
– Monitor the use of the storage
– Monitor software/hardware failures
– Define ingestion and access policies
• Should provide support for technology/system migration


Digital Asset Management (2)

• Provides the necessary functionalities to implement the ingestion workflow
– Receive the SIP (or a batch of SIPs)
– Analyse the SIP, verify that all the vital metadata are valid
– Assign UMIDs
– Transcode SIP into AIP
– Generate proxies (low resolution video, key frames)
– Provide content documentation
• Provides the functionalities to implement the access workflow
– Verify that the user has access rights
– Provide content selection functionalities (search, retrieval and browsing)
– Verify content associated rights
– Transcode AIP into DIP (it can depend on user request)
– Deliver the DIP
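The ingestion steps can be outlined as a small Python pipeline. Everything here is a simplified assumption: the list of vital metadata fields is invented, a random UUID stands in for a real UMID, and transcoding and proxy generation are stubbed out as plain functions.

```python
import uuid
from dataclasses import dataclass

REQUIRED_METADATA = ("title", "producer", "format")  # hypothetical vital fields


@dataclass
class SIP:
    essence: bytes
    metadata: dict


@dataclass
class AIP:
    identifier: str
    essence: bytes
    metadata: dict
    proxies: list


def validate(sip: SIP) -> None:
    """Reject a SIP whose vital metadata are missing or empty."""
    missing = [key for key in REQUIRED_METADATA if not sip.metadata.get(key)]
    if missing:
        raise ValueError(f"SIP rejected, missing metadata: {missing}")


def ingest(sip: SIP, transcode=lambda e: e, make_proxy=lambda e: e[:16]) -> AIP:
    """Receive, check, identify, transcode and document one SIP."""
    validate(sip)
    identifier = uuid.uuid4().hex          # stand-in for a real UMID
    archived = transcode(sip.essence)      # stub: SIP format to AIP format
    proxies = [make_proxy(sip.essence)]    # stub: low resolution copy
    return AIP(identifier, archived, dict(sip.metadata), proxies)


good = SIP(b"video essence bytes ...", {"title": "News", "producer": "RAI", "format": "MXF"})
aip = ingest(good)
print(aip.identifier, len(aip.proxies))
```

Rejecting a package at the validation step, before any identifier is assigned, keeps incomplete items out of the archive rather than repairing them later.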


OAIS Functions of Archival Storage


Business rights management

A BRM is a system that manages content associated usage rights

Without an automated BRM system the reuse of content can be slowed down by manual rights clearing operations

Depending on the type of archive it can be convenient to have BRM closely coupled with DAM


Digital archive design (1)

• Analyse and state clearly your business requirements
– What is your archive's primary goal
– Who are your users (Producers, Consumers)
– … and what are their needs
• Assess your content
– Amount of items
– Conservation status
– Increase rate
– Usage


Digital archive design (2)

• Select archive video formats and quality
– Target archived quality depends on foreseen usage and preservation issues
• Define the AIP (Archival Information Package)
– Video coding
– File formats
– Associated metadata
• Estimate storage requirements
– Amount of data
– Level of security of data
– Increase rate
– Input/output performance
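A first storage estimate can combine the amount of content, its growth rate, the archive bit rate and the number of copies kept. The sketch below is only a back-of-envelope model with made-up input figures; it assumes a constant bit rate, linear growth and decimal units.

```python
def projected_storage_tb(initial_hours, hours_added_per_year, bit_rate_mbps, copies, years):
    """Storage needed at the end of each year, in decimal terabytes."""
    gb_per_hour = bit_rate_mbps * 3600 / 8 / 1000
    return [
        (initial_hours + hours_added_per_year * year) * gb_per_hour * copies / 1000
        for year in range(years + 1)
    ]


# Hypothetical archive: 10,000 hours today, 1,000 new hours per year,
# 50 Mb/s archive format, two copies of everything, five-year horizon.
plan = projected_storage_tb(10_000, 1_000, 50, copies=2, years=5)
for year, terabytes in enumerate(plan):
    print(f"year {year}: {terabytes:.0f} TB")
```

For these figures the requirement grows from 450 TB to 675 TB over five years. The number of copies stands in for the level of data security, and doubling it doubles the total.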


Digital archive design (3)

• Define ingestion workflow and SIP
– Ingestion procedures are particularly critical if your content needs digitization and restoration
• Define access workflow and DIP
– Access is heavily dependent on proper documentation and retrieval tools
– Properly dimension throughput, which is affected by video bitrate and transcoding from AIP to DIP
• Define archive maintenance procedures
– Consistency check
– Media replacement
– Disaster recovery
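A consistency check is often implemented by comparing each stored file against a checksum recorded at ingestion. The Python sketch below assumes a manifest that maps relative paths to SHA-256 digests; the manifest layout and the function names are illustrative choices, not part of any standard.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large video files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def consistency_check(manifest: dict, root: Path) -> dict:
    """Return {relative path: problem} for every missing or altered file."""
    problems = {}
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            problems[relative] = "missing"
        elif sha256_of(target) != expected:
            problems[relative] = "checksum mismatch"
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.mxf").write_bytes(b"master copy")
    (root / "b.mxf").write_bytes(b"second item")
    manifest = {name: sha256_of(root / name) for name in ("a.mxf", "b.mxf")}
    (root / "a.mxf").write_bytes(b"master c0py")  # simulate silent corruption
    (root / "b.mxf").unlink()                     # simulate a lost file
    report = consistency_check(manifest, root)

print(report)  # {'a.mxf': 'checksum mismatch', 'b.mxf': 'missing'}
```

Each reported item is a candidate for restoration from a second copy, which is why the check belongs with media replacement and disaster recovery.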


Digital archive design (4)

Consider migration:

• Storage technology
– Media capacity follows Moore’s law…
– … but sometimes there is a technology leap (e.g. from tape library to hard disk arrays)
• Coding formats
– Compression schemes become more efficient, allowing greater bit saving at a given quality
– Older formats become obsolete
– Transcoding generally implies possible loss of quality
• Software/hardware
– Proprietary formats often pose upgrade constraints


Digital archive design (5)

• Consider the need to interface with other systems
– Federated libraries
– Account systems
– Production
– Digital rights management
• … and finally design or commission a system