The Library of Congress
Audio-Visual Prototyping Project
Carl Fleischhauer ([email protected])
Office of Strategic Initiatives, Library of Congress
Sound Savings Conference
University of Texas, Austin
July 25, 2003
This slide show: lcweb.loc.gov/rr/mopic/avprot/SoundSavings03.ppt
National Audio-Visual Conservation Center
• New Library of Congress facility for the Motion Picture, Broadcasting, and Recorded Sound Division (M/B/RS)
• Facility funded by the Packard Humanities Institute
• Will be in Culpeper, Virginia, 70 miles from Washington
• Planned to go operational 2005
Audio-Visual Prototyping Project
• Collections from the M/B/RS division and the American Folklife Center at LC
• Emphasis: reformatting endangered materials, especially magnetic tapes and instantaneous discs
• Current work: audio
• Future activities: video, copyright MP3s, content from web sites
• Prototyping period: 1999-2004
Motive 1: Alternative Preservation Approach
• Shortcomings of conventional practice: reformatting onto analog magnetic tape
1. Short life expectancy
2. Generation loss with each copy
3. Cessation of manufacture of analog tape and tape recorders
Motive 1: Alternative Preservation Approach
• Desire to work in the digital realm
• Emerging issues
– Deterioration of tangible born-digital, e.g., CD-Rs acquired by LC from music composer copyrights
– Emerging issue: preserving intangible born-digital content, e.g., MP3s from copyright and other acquisitions
Motive 2: Provide Access
• Limited access, since most items are protected by copyright or require consideration of folk performer prerogatives
• LC researchers on Capitol Hill, collections in Culpeper
• Possible future authorized remote research sites
Illustration: sample preserved item
• We want to reproduce the artifact as a whole
• This example is a Marine Corps recording from the South Pacific in WW II
– Audio from a disc copy of an Amertape Recording Film original (film-with-grooves)
– Images depict the film container and the disc label
Preservation Concept
• Content takes the form of information packages aka digital objects
• Information packages consist of data (e.g., audio and image files) and metadata
Preservation Concept
• Not a CD or DVD approach
• Packages managed in digital repository
• Repository is server and storage-system based
• Paradox:
– Content at any given moment depends upon systems and media
– Content must be system and media independent
Four issues
1. Selecting the target format for reformatting
2. Quality of the reformatted copy
3. Shaping the object/package and the importance of metadata
4. Longevity in “media-less” environment
Selecting the format
• Disclosure
– are specifications and tools available?
• Adoption
– is the format already in wide use?
• Transparency
– is encoding open to analysis with basic tools?
Selecting the format
• Self-documentation
– does object include metadata that explains how to render or understand context?
• Fidelity
– support for high resolution audio
• Sound field
– support for stereo and/or surround sound
Audio formats
• Audio masters
– Bitstream: PCM sampling, uncompressed
– File format: WAVE (higher res)
– One-bit-deep formats (e.g., SONY DSD) of interest but “ahead of the game” for us
• Service files
– WAVE (lower res) and MP3
Image formats
• Image Masters
– Bitstream: Uncompressed bitmapped
– File format: TIFF
• Service copies
– JPEGs
Key Parameters
• Sampling frequency
– Render the waveform as “dots”
– More dots contribute to greater accuracy, capable of rendering high frequency sounds
– Expressed as kilocycles per second or kilohertz
– Compare to spatial resolution for images
– Higher “pixels or dots per inch” contribute to better clarity
Key Parameters
• Word length, bit depth
– Greater bit depth means greater precision in locating the sample in terms of amplitude
– Greater bit depth means greater capacity to represent dynamic range
– Expressed as bits per sample
– Compare to tonal resolution (color) for images
– Higher “bits per pixel” mean more accurate color
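The link between bit depth and dynamic range can be put in rough numbers. The following is a minimal sketch, assuming the textbook approximation for ideal linear PCM (about 6.02 dB per bit, plus 1.76 dB for a full-scale sine wave); the function name is illustrative and real converters fall short of these figures.

```python
import math

def theoretical_dynamic_range_db(bits_per_sample: int) -> float:
    """Approximate dynamic range of ideal linear PCM for a full-scale sine wave."""
    # 20*log10(2) is about 6.02 dB per bit; 1.76 dB is the sine-wave constant term.
    return 20 * math.log10(2) * bits_per_sample + 1.76

for bits in (16, 24):
    print(f"{bits} bit: about {theoretical_dynamic_range_db(bits):.0f} dB")
```

Under this approximation, each extra bit adds about 6 dB of headroom, which is one way to read the "greater capacity to represent dynamic range" point.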
Staff discussion of parameters . . .
• Consensus on word length
– Everyone is sold that 24 bit is better than 16
– Based on listening, objective measurement possible
– “Extra data will protect you when the original has wide or varying dynamics, or if an operator makes a mistake.”
– Compare to imaging and a downstream benefit
• Master image at 12 or 16 bits per channel
• Manipulate for aesthetic effect, save at 8 bits
• No gaps in your histogram
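The "no gaps in your histogram" benefit can be illustrated with a toy example. This is a sketch under stated assumptions: the "aesthetic" manipulation is a simple contrast stretch of the middle half of the tonal range, and the function names are hypothetical. It counts how many of the 256 output levels can be reached when the same stretch starts from an 8-bit master versus a 16-bit master.

```python
def stretch_and_save_8bit(value: int, source_bits: int) -> int:
    """Stretch the middle half of the tonal range to full scale, then save at 8 bits."""
    x = value / (2**source_bits - 1)            # normalize to 0.0-1.0
    y = min(max((x - 0.25) / 0.5, 0.0), 1.0)    # contrast stretch with clipping
    return round(y * 255)

def output_levels_used(source_bits: int) -> int:
    """Count how many of the 256 output levels a full input ramp can reach."""
    ramp = range(2**source_bits)
    return len({stretch_and_save_8bit(v, source_bits) for v in ramp})

print("8-bit master: ", output_levels_used(8), "of 256 levels used")
print("16-bit master:", output_levels_used(16), "of 256 levels used")
```

The 8-bit master leaves roughly every other output level empty after the stretch, while the 16-bit master fills all of them. The same reasoning is the downstream argument for 24-bit audio masters.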
Staff discussion of parameters . . .
• Less consensus on sampling frequency
– Some of us thought this was the relevant question: “What is the range of frequencies we might expect in this item?”
• 78 rpm disc from the acoustic era
– 8-10 kilocycles per second, or less
– Rule of thumb: digitally sample at 2x frequency
– Will 25 kilocycles per second suffice?
• Folk music collector with a Nagra in 1970s
– 14-18 kilocycles per second
– Will 44 or 48 kilocycles suffice?
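The 2x rule of thumb above can be written out as a small calculation. This is a minimal sketch: the list of candidate rates is an assumption chosen for illustration, not a list from the project, and the function names are hypothetical.

```python
# Candidate sampling rates in Hz (an illustrative assumption).
CANDIDATE_RATES_HZ = (22_050, 32_000, 44_100, 48_000, 96_000, 192_000)

def minimum_sampling_rate_hz(highest_frequency_hz: float) -> float:
    """Rule of thumb: sample at twice the highest frequency expected in the item."""
    return 2 * highest_frequency_hz

def smallest_sufficient_rate_hz(highest_frequency_hz: float) -> int:
    """Pick the lowest candidate rate that exceeds the 2x threshold."""
    needed = minimum_sampling_rate_hz(highest_frequency_hz)
    for rate in CANDIDATE_RATES_HZ:
        if rate > needed:
            return rate
    raise ValueError("no candidate rate is high enough")

# Acoustic-era 78 rpm disc (up to about 10 kHz) and 1970s Nagra tape (up to about 18 kHz).
for label, top_hz in (("78 rpm acoustic disc", 10_000), ("1970s Nagra tape", 18_000)):
    print(label, "->", smallest_sufficient_rate_hz(top_hz), "Hz")
```

By this reasoning alone, modest rates would suffice for both examples. The later slides explain why the engineers nonetheless argued for 96 kHz or higher.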
Staff discussion of parameters . . .
• Engineers advocated sampling frequencies of 96 or even 192 kHz
• Discussion tended to look at practical production issues and possible downstream options
• Objective measurement is not relevant to some of these factors
• Very high resolution desired because:
– “There may be hard-to-hear harmonics that you won’t want to lose.”
– “Copies with less noise and less distortion can more successfully be restored in a post-process.”
– “In the future we’ll have better enhancement tools and post-processing, so save as much raw information as you can.”
– “What if you need extra data to support certain types of resource discovery?”
Staff discussion of parameters . . .
• Inherent fidelity of the original items not decisive.
• Informal A-B listening comparisons were helpful but not conclusive.
• Proposal to carry out empirical comparison of restoration actions applied to a high-res and a medium-res master.
Audio resolution for prototyping project
• Result of preceding discussion: the engineers work at the upper limit of the tools they have
• Reformatted content
– Audio masters
• 96 kHz/24 bit mono or stereo (some at 48/24)
– Service files
• 44.1 kHz/16 bit WAVE
• 256 kbps MP3 (if stereo)
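The storage implications of these choices follow from simple arithmetic. A minimal sketch, assuming uncompressed PCM and ignoring WAVE header overhead; the function names are illustrative.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Size of one hour of uncompressed PCM audio, excluding file headers."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600

def compressed_bytes_per_hour(bitrate_kbps: int) -> int:
    """Size of one hour of constant-bitrate audio such as MP3."""
    return bitrate_kbps * 1000 // 8 * 3600

master = pcm_bytes_per_hour(96_000, 24, 2)    # 96 kHz/24 bit stereo master
service = pcm_bytes_per_hour(44_100, 16, 2)   # 44.1 kHz/16 bit stereo service WAVE
mp3 = compressed_bytes_per_hour(256)          # 256 kbps MP3

for name, size in (("master", master), ("service WAVE", service), ("MP3", mp3)):
    print(f"{name}: {size / 1e9:.2f} GB per hour")
```

A stereo master at 96 kHz/24 bit comes to roughly 2 GB per hour, more than three times the size of the 44.1 kHz/16 bit service file.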
Image resolution for prototyping project
• Reformatted content
– Borrow approach from other digitization projects
– Image Masters
• 300-400 lines/pixels per inch
• 24 bit color
– Service copies
• Same-size JPEGs
Sidebar on practices
• Professional equipment
– For example, professional analog-to-digital converters
• Some details
– Masters as flat transfers, avoid/minimize cleanup
– Copy mono discs with stereo cartridge, hope for future process to “find the best groove wall”
Sidebar on practices
• Professional workers
– Supervise and perform expert work
• Work requires knowledge and skills with antique formats and new digital technology
Sidebar on practices
• Some ideas for the future
– Include apprentice workers in work team
– Sort originals by “transfer efficiency” category
– Use expert systems to help monitor transfers, spot anomalies
– For some categories, copy two or three items at once
• Inspired by
– PRESTO project in Europe (http://presto.joanneum.ac.at/index.asp)
– Image-based recovery from discs (http://www-cdf.lbl.gov/~av/)
Sidebar on objective measurement
• Imaging: targets
• Audio: test tones
• Outputs from targets/tones measure the performance of equipment
• They do not measure actual “content” images or sounds directly.
Sidebar on objective measurement
• Tools and practices not mature, even for imaging
• Need performance measures for digital systems
– You can’t believe your scanner when it says 300 ppi
• Measure what actually comes through the system
– Imaging example: use modulation transfer function (MTF) as a yardstick for delivered spatial resolution
– Pass-fail point not yet established for image reformatting projects
Sidebar on objective measurement
• Tentative use of standard ITU test sequences known as CCITT O.33
– 28-second series of tones to test satellite broadcast transmissions, mono and stereo
– Recordings of the tones can be used to determine the frequency response, distortion, and signal-to-noise ratio produced in a given recording system
– Pass-fail point not yet established for sound reformatting projects
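To make the idea of measuring a system with a recorded tone concrete, here is a minimal sketch of one such measurement, signal-to-noise ratio. It does not implement the ITU test sequence itself: it assumes a single 1 kHz tone, simulates the "recording" by adding random noise, and fits a sine and cosine at the known frequency to separate tone from noise. All names and values are illustrative.

```python
import math
import random

def snr_db(samples, tone_hz, sample_rate_hz):
    """Estimate SNR of a recorded test tone (assumes a whole number of tone cycles)."""
    n = len(samples)
    w = 2 * math.pi * tone_hz / sample_rate_hz
    # Project the recording onto sine and cosine at the known tone frequency.
    a = 2 / n * sum(s * math.sin(w * i) for i, s in enumerate(samples))
    b = 2 / n * sum(s * math.cos(w * i) for i, s in enumerate(samples))
    # Whatever the fitted tone does not explain is treated as noise.
    residual = [s - a * math.sin(w * i) - b * math.cos(w * i)
                for i, s in enumerate(samples)]
    signal_power = (a * a + b * b) / 2
    noise_power = sum(r * r for r in residual) / n
    return 10 * math.log10(signal_power / noise_power)

# Simulated one-second recording: 1 kHz tone at amplitude 0.5 plus Gaussian noise.
rng = random.Random(0)
rate = 48_000
recording = [0.5 * math.sin(2 * math.pi * 1000 * i / rate) + rng.gauss(0, 0.005)
             for i in range(rate)]
measured = snr_db(recording, 1000, rate)
print(f"Estimated SNR: {measured:.1f} dB")
```

As the slides note, a figure like this describes the performance of the equipment, not the quality of any particular item of content, and no pass-fail threshold is implied.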
Information package
• Complex entity with multiple parts
• Data and Metadata
• Data in this context means the audio, video, or image bitstreams
• Metadata includes
– Descriptive
– Administrative
– Structural
Descriptive metadata in the AV project
• For object as a whole
– Often copy of descriptive data in LC central catalog
– MODS XML schema
• http://www.loc.gov/standards/mods/
• Optional additional descriptive metadata for individual parts of object
– Song titles, artists for disc sides or cuts
– Names of writers in manuscript file folder
– MODS “related items”
Administrative metadata in the AV project
• Persistent identifier, “ownership” info
• Documentation of reformatting today and digital migration tomorrow
• About the source and actions taken to prepare items for digitization, e.g., clean, bake
• About the digitizing process
• Rights data or at least categorization of objects for management of access
Structural metadata in the AV project
• Relationships between parts of objects
• Example: long-playing record album
– Box, front
– Three discs, two sides each (audio segments)
– Disc label (images)
– Booklet, cover and 28 pages (images)
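The album example can be modeled as a simple tree of parts. This is an illustrative sketch only: the `Part` class and its field names are hypothetical, and it assumes one label image per disc side, which the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Part:
    label: str
    kind: str = "group"            # "group", "image", or "audio"
    children: List["Part"] = field(default_factory=list)

def build_album() -> Part:
    album = Part("Long-playing record album")
    album.children.append(Part("Box, front", "image"))
    for disc_no in (1, 2, 3):
        disc = Part(f"Disc {disc_no}")
        for side in ("A", "B"):
            disc.children.append(Part(f"Disc {disc_no}, side {side}, audio", "audio"))
            # Assumption: one label image per side.
            disc.children.append(Part(f"Disc {disc_no}, side {side}, label", "image"))
        album.children.append(disc)
    booklet = Part("Booklet")
    booklet.children.append(Part("Cover", "image"))
    booklet.children.extend(Part(f"Page {n}", "image") for n in range(1, 29))
    album.children.append(booklet)
    return album

def count_kind(part: Part, kind: str) -> int:
    """Count parts of a given kind anywhere in the tree."""
    own = 1 if part.kind == kind else 0
    return own + sum(count_kind(child, kind) for child in part.children)

album = build_album()
print(count_kind(album, "audio"), "audio segments,", count_kind(album, "image"), "images")
```

The point of structural metadata is that this tree, not the flat list of files, is what lets a system present the object as a whole.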
Encoding the metadata
• AV project is using the emerging Metadata Encoding and Transmission Standard (METS)
• http://www.loc.gov/standards/mets/
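A rough idea of what a METS document looks like can be given with a skeleton built in code. This is a sketch, not the project's actual profile: it covers only the file section and structural map, omits the descriptive and administrative sections, has not been validated against the METS schema, and uses hypothetical identifiers and file paths.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def build_mets(object_id, files):
    """files: list of (use, label, href) tuples, e.g. ("master", "Side A", "path")."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q("fileSec"))
    struct_map = ET.SubElement(root, q("structMap"))
    top_div = ET.SubElement(struct_map, q("div"), {"TYPE": "object"})
    groups = {}
    for number, (use, label, href) in enumerate(files, start=1):
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, q("fileGrp"), {"USE": use})
        file_id = f"FILE{number:04d}"
        file_el = ET.SubElement(groups[use], q("file"), {"ID": file_id})
        ET.SubElement(file_el, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # Each structural division points back to its file by ID.
        div = ET.SubElement(top_div, q("div"), {"LABEL": label})
        ET.SubElement(div, q("fptr"), {"FILEID": file_id})
    return root

# Hypothetical object with one master, one service file, and one label image.
root = build_mets("example-object-0001", [
    ("master", "Side A audio", "masters/side_a.wav"),
    ("service", "Side A audio (service)", "service/side_a.mp3"),
    ("master", "Side A label", "masters/side_a_label.tif"),
])
print(ET.tostring(root, encoding="unicode"))
```

The file section lists the bitstreams, grouped by use, and the structural map expresses how they relate, which is the division of labor the preceding slides describe.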
Added metadata for long-term preservation
• To support long term content management
• Examples:
– “Fixity” info, e.g., checksums to monitor file changes
– Pointers to documentation for file formats
– Pointers to documentation of the hardware/software environment required to render files
• No practice yet in AV prototyping project
• See RLG-OCLC preservation metadata report
– http://www.oclc.org/research/pmwg/
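Fixity checking is straightforward to sketch. The example below is illustrative only and does not describe project practice (the slides state there is none yet): the choice of SHA-256 is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute a checksum by reading the file in chunks, so large masters fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_ok(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file still matches the checksum recorded in its metadata."""
    return file_checksum(path) == recorded_checksum
```

In use, the checksum would be recorded when a file enters storage and recomputed on a schedule; a mismatch signals that the file has changed and a good copy should be restored.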
Overall anxiety . . .
• Are we trying to capture too much metadata?
• Tools to automate the creation of metadata, especially administrative metadata, are critical
Future LC repository
• Intersection of the AV project and Culpeper center with LC-wide digital planning (NDIIPP)
• LC repository design will be in terms of the NASA Open Archival Information System (OAIS) reference model
Reference Model for an Open Archival Information System (OAIS)
[Diagram: OAIS functional entities: Producers, Ingest, Data Management, Archival Storage, Access, Consumers, Administration, Preservation Planning]
SIP: Submission information package
AIP: Archival information package
DIP: Dissemination information package
Current plan: The Culpeper facility will produce and submit packages to LC’s future digital repository.
While we wait for the OAIS-compliant repository . . .
• Continue to use UNIX-filesystem based storage
• Orderly file storage, masters segregated from service copies
• METS metadata stored for now as individual XML files
• Virtual information packages are “ready to submit”
• METS also supports end-user display
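One way to keep masters segregated from service copies on a plain filesystem is to derive every path from the object identifier and the file's role. The layout below is a hypothetical illustration, not the project's actual directory scheme; the directory names and function are assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical top-level directories, one per role.
ROLE_DIRECTORIES = {"master": "masters", "service": "service", "metadata": "mets"}

def storage_path(root: str, role: str, object_id: str, filename: str) -> PurePosixPath:
    """Build the storage location for one file of one object."""
    if role not in ROLE_DIRECTORIES:
        raise ValueError(f"unknown role: {role}")
    return PurePosixPath(root) / ROLE_DIRECTORIES[role] / object_id / filename

print(storage_path("/avproto", "master", "obj0001", "side_a.wav"))
print(storage_path("/avproto", "service", "obj0001", "side_a.mp3"))
print(storage_path("/avproto", "metadata", "obj0001", "obj0001.xml"))
```

Because every path is predictable from the identifier, the METS file can reference its parts consistently, which is what keeps a virtual package "ready to submit".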
What about smaller archives and libraries?
• The digital approach to content preservation depends on significant computer infrastructure
• Will we have a few consortial repositories to serve many smaller archives?
• Who would make such arrangements, and how?
What about smaller archives and libraries?
• Holding action?
• For audio, make multiple CD-Rs or DVD-Rs? Write to data tape?
• LC is challenged to give good advice today
Web Sites
• LC audio-visual prototyping project
– http://lcweb.loc.gov/rr/mopic/avprot/
• LC enterprise-wide digital preservation planning
– http://www.digitalpreservation.gov/ndiipp/
• Metadata Encoding and Transmission Standard (METS)
– http://www.loc.gov/standards/mets/