The Library of Congress Audio-Visual Prototyping Project Carl Fleischhauer ([email protected]) Office of Strategic Initiatives, Library of Congress Sound Savings Conference University of Texas, Austin July 25, 2003 This slide show: lcweb.loc.gov/rr/mopic/avprot/SoundSavings03.ppt


The Library of Congress

Audio-Visual Prototyping Project

Carl Fleischhauer ([email protected])

Office of Strategic Initiatives, Library of Congress

Sound Savings Conference

University of Texas, Austin

July 25, 2003

This slide show: lcweb.loc.gov/rr/mopic/avprot/SoundSavings03.ppt

National Audio-Visual Conservation Center

• New Library of Congress facility for the Motion Picture, Broadcasting, and Recorded Sound Division (M/B/RS)

• Facility funded by the Packard Humanities Institute

• Will be in Culpeper, Virginia, 70 miles from Washington

• Planned to become operational in 2005

Audio-Visual Prototyping Project

• Collections from the M/B/RS division and the American Folklife Center at LC

• Emphasis: reformatting endangered materials, especially magnetic tapes and instantaneous discs

• Current work: audio

• Future activities: video, copyright MP3s, content from web sites

• Prototyping period: 1999-2004

Motive 1: Alternative Preservation Approach

• Shortcomings of conventional practice: reformatting onto analog magnetic tape

1. Short life expectancy

2. Generation loss with each copy

3. Cessation of manufacture of analog tape and tape recorders

Motive 1: Alternative Preservation Approach

• Desire to work in the digital realm

• Emerging issues

– Deterioration of tangible born-digital items, e.g., CD-Rs acquired by LC from music composer copyrights

– Preserving intangible born-digital content, e.g., MP3s from copyright and other acquisitions

Motive 2: Provide Access

• Limited access, since most items protected by copyright or require consideration of folk performer prerogatives

• LC researchers on Capitol Hill, collections in Culpeper

• Possible future authorized remote research sites

Illustration: sample preserved item

• We want to reproduce the artifact as a whole

• This example is a Marine Corps recording from the South Pacific in WW II

– Audio from a disc copy of an Amertape Recording Film original (film-with-grooves)

– Images depict the film container and the disc label

Initial display of navigation tree and thumbnails

Close-up display of image & file-level metadata

Preservation Concept

• Content takes the form of information packages, aka digital objects

• Information packages consist of data (e.g., audio and image files) and metadata

Preservation Concept

• Not a CD or DVD approach

• Packages managed in digital repository

• Repository is server and storage-system based

• Paradox:

– Content at any given moment depends upon systems and media

– Content must be system and media independent

Four issues

1. Selecting the target format for reformatting

2. Quality of the reformatted copy

3. Shaping the object/package and the importance of metadata

4. Longevity in “media-less” environment

Issue 1: Selecting the format

Selecting the format

• Disclosure

– are specifications and tools available?

• Adoption

– is the format already in wide use?

• Transparency

– is encoding open to analysis with basic tools?

Selecting the format

• Self-documentation

– does object include metadata that explains how to render or understand context?

• Fidelity

– support for high resolution audio

• Sound field

– support for stereo and/or surround sound

Audio formats

• Audio masters

– Bitstream: PCM sampling, uncompressed

– File format: WAVE (higher res)

– One-bit-deep formats (e.g., SONY DSD) of interest but “ahead of the game” for us

• Service files

– WAVE (lower res) and MP3

Image formats

• Image Masters

– Bitstream: Uncompressed bitmapped

– File format: TIFF

• Service copies

– JPEGs

Issue 2: Quality of the Reformatted Copy

Key Parameters

• Sampling frequency

– Render the waveform as “dots”

– More dots contribute to greater accuracy, capable of rendering high frequency sounds

– Expressed as kilocycles per second or kilohertz

– Compare to spatial resolution for images

– Higher “pixels or dots per inch” contribute to better clarity

Key Parameters

• Word length, bit depth

– Greater bit depth means greater precision in locating the sample in terms of amplitude

– Greater bit depth means greater capacity to represent dynamic range

– Expressed as bits per sample

– Compare to tonal resolution (color) for images

– Higher “bits per pixel” mean more accurate color
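The link between bit depth and dynamic range can be sketched numerically. The following is a minimal illustration, assuming ideal linear PCM, where each added bit doubles the number of amplitude steps and so adds about 6 dB; the function name is hypothetical and real converters deliver less than this theoretical figure.

```python
import math


def dynamic_range_db(bits_per_sample: int) -> float:
    """Approximate dynamic range of ideal linear PCM: 20 * log10(2 ** bits)."""
    if bits_per_sample < 1:
        raise ValueError("bits_per_sample must be at least 1")
    return 20.0 * math.log10(2 ** bits_per_sample)


# Compare the two word lengths discussed in this talk.
for bits in (16, 24):
    print(f"{bits}-bit PCM: about {dynamic_range_db(bits):.1f} dB")
```

Under this assumption, 16 bits gives roughly 96 dB and 24 bits roughly 144 dB, which is the extra headroom the staff discussion refers to.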

Staff discussion of parameters . . .

• Consensus on word length

– Everyone is sold that 24 bit is better than 16

– Based on listening, objective measurement possible

– “Extra data will protect you when the original has wide or varying dynamics, or if an operator makes a mistake.”

– Compare to imaging and a downstream benefit

• Master image at 12 or 16 bits per channel

• Manipulate for aesthetic effect, save at 8 bits

• No gaps in your histogram

Staff discussion of parameters . . .

• Less consensus on sampling frequency

– Some of us thought this was the relevant question: “What is the range of frequencies we might expect in this item?”

• 78 rpm disc from the acoustic era

– 8-10 kilocycles per second, or less

– Rule of thumb: digitally sample at 2x frequency

– Will 25 kilocycles per second suffice?

• Folk music collector with a Nagra in 1970s

– 14-18 kilocycles per second

– Will 44 or 48 kilocycles suffice?
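The 2x rule of thumb above can be worked through for the two examples. This is a minimal sketch, assuming the upper figures on the slide (10 kHz and 18 kHz) as the highest frequency present; the function name and the dictionary labels are illustrative only.

```python
def min_sampling_rate_khz(highest_freq_khz: float) -> float:
    """Rule of thumb from the slide: sample at twice the highest frequency."""
    if highest_freq_khz <= 0:
        raise ValueError("highest_freq_khz must be positive")
    return 2.0 * highest_freq_khz


# Upper ends of the frequency ranges quoted on the slide (assumed here).
examples_khz = {
    "78 rpm disc from the acoustic era": 10.0,
    "1970s Nagra field recording": 18.0,
}

for label, top in examples_khz.items():
    needed = min_sampling_rate_khz(top)
    print(f"{label}: content up to {top} kHz -> sample at {needed} kHz or more")
```

By this arithmetic alone, 25 kHz clears the 20 kHz minimum for the acoustic-era disc, and 44 or 48 kHz clears the 36 kHz minimum for the Nagra tape. The following slides explain why the engineers still argued for higher rates.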

Staff discussion of parameters . . .

• Engineers advocated sampling frequencies of 96 or even 192 kHz

• Discussion tended to look at practical production issues and possible downstream options

• Objective measurement is not relevant to some of these factors

• Very high resolution desired because:

– “There may be hard-to-hear harmonics that you won’t want to lose.”

– “Copies with less noise and less distortion can more successfully be restored in a post-process.”

– “In the future we’ll have better enhancement tools and post-processing, so save as much raw information as you can.”

– “What if you need extra data to support certain types of resource discovery?”

Staff discussion of parameters . . .

• Inherent fidelity of the original items not decisive.

• Informal A-B listening comparisons were helpful but not conclusive.

• Proposal to carry out empirical comparison of restoration actions applied to a high-res and a medium-res master.

Audio resolution for prototyping project

• Result of preceding discussion: the engineers work at the upper limit of the tools they have

• Reformatted content

– Audio masters

• 96 kHz/24 bit mono or stereo (some at 48/24)

– Service files

• 44.1 kHz/16 bit WAVE

• 256 kbps MP3 (if stereo)
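The storage consequences of these choices follow from simple arithmetic. The sketch below assumes uncompressed PCM with no file-header overhead, stereo throughout, and 1 kbps = 1000 bits per second for MP3; the function names are hypothetical.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM payload for one hour of audio, ignoring file headers."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


def mp3_bytes_per_hour(bitrate_kbps: int) -> int:
    """Constant-bitrate MP3 payload for one hour (1 kbps taken as 1000 bit/s)."""
    return bitrate_kbps * 1000 // 8 * 3600


sizes = {
    "Master, 96 kHz/24 bit stereo": pcm_bytes_per_hour(96_000, 24, 2),
    "Service WAVE, 44.1 kHz/16 bit stereo": pcm_bytes_per_hour(44_100, 16, 2),
    "Service MP3, 256 kbps": mp3_bytes_per_hour(256),
}

for label, size in sizes.items():
    print(f"{label}: {size / 1e9:.2f} GB per hour")
```

On these assumptions a stereo master needs about 2 GB per recorded hour, more than three times the 44.1 kHz/16 bit service file, which is one reason the approach depends on server-based storage rather than discs.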

Image resolution for prototyping project

• Reformatted content

– Borrow approach from other digitization projects

– Image Masters

• 300-400 lines/pixels per inch

• 24 bit color

– Service copies

• Same-size JPEGs

Two Sidebars

Sidebar on practices

• Professional equipment

– For example, professional analog-to-digital converters

• Some details

– Masters as flat transfers, avoid/minimize cleanup

– Copy mono discs with stereo cartridge, hope for future process to “find the best groove wall”

Sidebar on practices

• Professional workers

– Supervise and perform expert work

• Work requires knowledge and skills with antique formats and new digital technology

Sidebar on practices

• Some ideas for the future

– Include apprentice workers in work team

– Sort originals by “transfer efficiency” category

– Use expert systems to help monitor transfers, spot anomalies

– For some categories, copy two or three items at once

• Inspired by

– PRESTO project in Europe (http://presto.joanneum.ac.at/index.asp)

– Image-based recovery from discs (http://www-cdf.lbl.gov/~av/)

Sidebar on objective measurement

• Imaging: targets

• Audio: test tones

• Outputs from targets/tones measure the performance of equipment

• They do not measure actual “content” images or sounds directly.

Sidebar on objective measurement

• Tools and practices not mature, even for imaging

• Need performance measures for digital systems

– You can’t believe your scanner when it says 300 ppi

• Measure what actually comes through the system

– Imaging example: use modulation transfer function (MTF) as a yardstick for delivered spatial resolution

– Pass-fail point not yet established for image reformatting projects

Sidebar on objective measurement

• Tentative use of standard ITU test sequences known as CCITT O.33

– 28-second series of tones to test satellite broadcast transmissions, mono and stereo

– Recordings of the tones can be used to determine the frequency response, distortion, and signal-to-noise ratio produced in a given recording system

– Pass-fail point not yet established for sound reformatting projects

Issue 3: Shaping the information package and the importance of metadata

Information package

• Complex entity with multiple parts

• Data and Metadata

• Data in this context means the audio, video, or image bitstreams

• Metadata includes

– Descriptive

– Administrative

– Structural

Descriptive metadata in the AV project

• For object as a whole

– Often copy of descriptive data in LC central catalog

– MODS XML schema

• http://www.loc.gov/standards/mods/

• Optional additional descriptive metadata for individual parts of object

– Song titles, artists for disc sides or cuts

– Names of writers in manuscript file folder

– MODS “related items”

Administrative metadata in the AV project

• Persistent identifier, “ownership” info

• Documentation of reformatting today and digital migration tomorrow

• About the source and actions taken to prepare items for digitization, e.g., clean, bake

• About the digitizing process

• Rights data or at least categorization of objects for management of access

Structural metadata in the AV project

• Relationships between parts of objects

• Example: long-playing record album

– Box, front

– Three discs, two sides each (audio segments)

– Disc label (images)

– Booklet, cover and 28 pages (images)

Illustration: three-lp-disc boxed set with booklet

Encoding the metadata

• AV project is using the emerging Metadata Encoding and Transmission Standard (METS)

• http://www.loc.gov/standards/mets/
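To make the structural idea concrete, here is a minimal sketch of how the boxed-set example (box front, three discs with two sides each, disc labels, booklet with cover and 28 pages) might be expressed as a METS structural map. It uses the METS namespace and the standard structMap, div, and fptr elements, but the TYPE and LABEL values and all FILEID strings are invented for illustration, and the file section that a complete METS document would need is omitted. It is not the project's actual encoding.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_struct_map() -> ET.Element:
    """Build a nested div tree for a three-disc boxed set with booklet."""
    mets = ET.Element(q("mets"))
    struct_map = ET.SubElement(mets, q("structMap"), TYPE="physical")
    album = ET.SubElement(struct_map, q("div"), TYPE="album", LABEL="Boxed set")

    box = ET.SubElement(album, q("div"), TYPE="box", LABEL="Box, front")
    ET.SubElement(box, q("fptr"), FILEID="IMG_BOX_FRONT")  # hypothetical ID

    for disc in range(1, 4):
        disc_div = ET.SubElement(album, q("div"), TYPE="disc", LABEL=f"Disc {disc}")
        for side in ("A", "B"):
            side_div = ET.SubElement(
                disc_div, q("div"), TYPE="side", LABEL=f"Side {side}"
            )
            # One audio segment and one label image per side (hypothetical IDs).
            ET.SubElement(side_div, q("fptr"), FILEID=f"AUD_D{disc}{side}")
            ET.SubElement(side_div, q("fptr"), FILEID=f"IMG_D{disc}{side}_LABEL")

    booklet = ET.SubElement(album, q("div"), TYPE="booklet", LABEL="Booklet")
    cover = ET.SubElement(booklet, q("div"), TYPE="cover", LABEL="Cover")
    ET.SubElement(cover, q("fptr"), FILEID="IMG_BOOKLET_COVER")
    for page in range(1, 29):
        page_div = ET.SubElement(booklet, q("div"), TYPE="page", LABEL=f"Page {page}")
        ET.SubElement(page_div, q("fptr"), FILEID=f"IMG_BOOKLET_P{page:02d}")

    return mets


root = build_struct_map()
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text[:200], "...")
```

The nesting of div elements carries the relationships between parts; the fptr elements point from each part to the audio or image files that represent it.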

METS XML output (partial) displayed in Internet Explorer

Added metadata for long-term preservation

• To support long-term content management

• Examples:

– “Fixity” info, e.g., checksums to monitor file changes

– Pointers to documentation for file formats

– Pointers to documentation of the hardware/software environment required to render files

• No practice yet in AV prototyping project

• See RLG-OCLC preservation metadata report

– http://www.oclc.org/research/pmwg/
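The “fixity” idea can be illustrated briefly. The sketch below assumes SHA-256 as the checksum algorithm, which is a choice made for this example and not one stated by the project; the function names and the sample file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large audio masters need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum: str, algorithm: str = "sha256") -> bool:
    """Compare the file's current checksum with the value stored in metadata."""
    return file_checksum(path, algorithm) == recorded_checksum


def demo() -> None:
    """Record a checksum, then show that a one-byte change is detected."""
    with tempfile.TemporaryDirectory() as folder:
        master = Path(folder) / "example_master.wav"  # stand-in file, not real audio
        master.write_bytes(b"stand-in for audio data")
        recorded = file_checksum(master)
        print("unchanged after ingest:", is_unchanged(master, recorded))
        master.write_bytes(b"stand-in for audio datA")
        print("unchanged after alteration:", is_unchanged(master, recorded))


if __name__ == "__main__":
    demo()
```

The recorded value would be stored with the package's administrative metadata and rechecked on a schedule, so that silent changes to a file are noticed.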

Overall anxiety . . .

• Are we trying to capture too much metadata?

• Tools to automate the creation of metadata, especially administrative metadata, are critical

Issue 4: Longevity in a media-less environment

Future LC repository

• Intersection of the AV project and Culpeper center with LC-wide digital planning (NDIIPP)

• LC repository design will be in terms of the NASA Open Archival Information System (OAIS) reference model

[Diagram, repeated on four slides: Reference Model for an Open Archival Information System (OAIS). Labeled parts: Producers, Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access, Consumers.]

The slides label three package types:

• SIP: Submission information package

• AIP: Archival information package

• DIP: Dissemination information package

Current plan: The Culpeper facility will produce and submit packages to LC’s future digital repository.

While we wait for the OAIS-compliant repository . . .

• Continue to use UNIX-filesystem based storage

• Orderly file storage, masters segregated from service copies

• METS metadata stored for now as individual XML files

• Virtual information packages are “ready to submit”

• METS also supports end-user display

What about smaller archives and libraries?

• The digital approach to content preservation depends on significant computer infrastructure

• Will we have a few consortial repositories to serve many smaller archives?

• Who would make such arrangements, and how?

What about smaller archives and libraries?

• Holding action?

• For audio, make multiple CD-Rs or DVD-Rs? Write to data tape?

• LC is challenged to give good advice today

Web Sites

• LC audio-visual prototyping project

– http://lcweb.loc.gov/rr/mopic/avprot/

• LC enterprise-wide digital preservation planning

– http://www.digitalpreservation.gov/ndiipp/

• Metadata Encoding and Transmission Standard (METS)

– http://www.loc.gov/standards/mets/

Thank you . . .