Preserving Software at Scale: The Stephen Cabrinety Collection
Michael Olson, Stanford University Libraries
Douglas White, National Institute of Standards and Technology

Disclaimer

Trade names and company products are mentioned in the text or identified. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products are necessarily the best available for the purpose.

The Collection and NIST Grant

Collection consists of ~15,000 software titles from 1975 – 1995
Grant (Sept. 2013 – Aug. 2014) funded by National Institute of Standards and Technology
Contains all media types from this period
Disk images to be added to National Software Reference Library (NSRL) Reference Data Set
Disk images and photographs will be ingested into the Stanford Digital Repository
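
The NSRL Reference Data Set publishes file signatures (SHA-1, MD5, CRC32) rather than file contents. As a rough sketch of what adding a file to such a data set involves, the following computes those three values for one file. The function name `file_signatures` and the chunk size are illustrative assumptions; this is not the NSRL's actual tooling.

```python
import hashlib
import zlib


def file_signatures(path, chunk_size=1 << 20):
    """Return SHA-1, MD5, CRC32 (upper-case hex) and byte size for one file."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC across chunks
            size += len(chunk)
    return {
        "sha1": sha1.hexdigest().upper(),
        "md5": md5.hexdigest().upper(),
        "crc32": f"{crc & 0xFFFFFFFF:08X}",
        "size": size,
    }
```

Reading in chunks keeps memory use flat even for large disk images.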

Initial Stanford Tasks

Page software to campus
Register software titles in Digital Object Registry (DRUID, Title, Source ID)
Enter descriptive metadata in NSRL database
Print tracking sheet
Ship to NIST
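
The registration step records three fields per title: DRUID, Title, and Source ID. A minimal sketch of such a record follows. It assumes the DRUID shape of two letters, three digits, two letters, four digits (for example `bb123cd4567`); the class name, the tab-separated tracking line, and the sample source ID are hypothetical and not taken from Stanford's systems.

```python
import re
from dataclasses import dataclass

# Assumed DRUID shape: 2 letters, 3 digits, 2 letters, 4 digits.
DRUID_PATTERN = re.compile(r"^[a-z]{2}[0-9]{3}[a-z]{2}[0-9]{4}$")


@dataclass(frozen=True)
class RegistryRecord:
    druid: str
    title: str
    source_id: str

    def __post_init__(self):
        if not DRUID_PATTERN.match(self.druid):
            raise ValueError(f"not a well-formed DRUID: {self.druid!r}")
        if not self.title.strip():
            raise ValueError("title must not be empty")


def tracking_sheet_line(record):
    """Format one record as a tab-separated line for a printed tracking sheet."""
    return f"{record.druid}\t{record.source_id}\t{record.title}"


# Hypothetical example values, for illustration only.
example = RegistryRecord("bb123cd4567", "Example Game", "example:0001")
print(tracking_sheet_line(example))
```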

NIST NSRL Collection

Contains 14,500 pieces of computer software.
Focuses on Windows, Mac, Linux operating systems and popular applications.
Modern formats: DVD & CD-ROMs, 5¼ in. & 3½ in. disks.
Efforts 2005 to date:
– 19,500 media images
– 395 media errors (2%)
– 3,500 photograph sets
– 25,200 photos

SUL Cabrinety Collection

Focuses on games for Atari, Commodore, Amiga, Sega, Nintendo, and Apple systems.
27 different operating systems represented.
Several formats: 8 in., 5¼ in., and 3½ in. computer disks, cassettes, cartridges, CD-ROMs.
NIST Efforts to date:
– 900 media images
– 158 media errors (17%)
– 1,100 photograph sets
– 61,100 photos
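
The error percentages on the two collection slides follow directly from the counts given. A short check of that arithmetic, using only the slide figures (the helper name `error_rate_percent` is illustrative):

```python
def error_rate_percent(errors, images):
    """Media errors as a percentage of media images."""
    if images <= 0:
        raise ValueError("images must be positive")
    return 100.0 * errors / images


nsrl_rate = error_rate_percent(395, 19_500)    # NSRL collection, 2005 to date
cabrinety_rate = error_rate_percent(158, 900)  # Cabrinety collection so far

print(f"NSRL: {nsrl_rate:.1f}%")            # NSRL: 2.0%
print(f"Cabrinety: {cabrinety_rate:.1f}%")  # Cabrinety: 17.6%
```

The Cabrinety figure works out to about 17.6%, which the slide gives as 17%; either way the older media fail roughly eight to nine times as often as the modern formats.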

NSRL Workstation

Workstation Equipment

Apple Mini, running Ubuntu 12.04 LTS
5000K lighting station
Canon T3i, tethered
Golden Thread Object Level Target
USB 3.5-inch floppy drive
Device Side Data FC5025 USB 5.25-inch floppy controller
ATA 5.25-inch floppy drive
USB barcode scanner
Firefox browser
Java photo organizer (custom, wraps gphoto2 etc.)
Perl media imager (custom, wraps dcfldd etc.)
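
The custom tools are described only as wrappers around existing command-line programs. As a hedged illustration of what such a wrapper might assemble, the sketch below builds (but does not run) a dcfldd command for a drive that appears as a block device and a gphoto2 command for tethered capture. It is written in Python rather than the Perl and Java the workstation uses; the function names, device path, and file names are hypothetical, and the option choices are assumptions rather than the NSRL configuration.

```python
import shlex


def dcfldd_command(device, image_path, hash_log, block_size=512):
    """Build a dcfldd invocation that images a block device and logs a SHA-1."""
    return [
        "dcfldd",
        f"if={device}",
        f"of={image_path}",
        f"bs={block_size}",
        "conv=noerror,sync",  # keep going on read errors, pad unreadable blocks
        "hash=sha1",
        f"hashlog={hash_log}",
    ]


def gphoto2_command(filename_pattern):
    """Build a gphoto2 invocation that captures one frame and downloads it."""
    return [
        "gphoto2",
        "--capture-image-and-download",
        "--filename",
        filename_pattern,
    ]


# Hypothetical paths; printing the commands avoids touching real hardware.
print(shlex.join(dcfldd_command("/dev/sdb", "disk001.img", "disk001.sha1")))
print(shlex.join(gphoto2_command("photo-%n.jpg")))
```

Building the argument list separately from executing it makes the wrapper easy to log and test; a real imager would pass the list to `subprocess.run` and inspect the exit status.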

Cartridge Media

Using Retrode adapter for SEGA Genesis and Super Nintendo (SNES) games, plus plug-ins for Game Boy, Atari, Nintendo 64.

Could not generate a complete, consistent media image.

Every cartridge has metadata in a ROM “header” area; many include a checksum, for anti-piracy use.

NSRL can calculate the SNES and SEGA Genesis checksums.

Game Boy and Nintendo are works in progress.

Detailed blog article recently published on Stanford website.
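
A header checksum gives a way to tell whether a cartridge dump is complete. The sketch below illustrates the idea for SEGA Genesis ROMs only, following the commonly documented convention: a 16-bit big-endian checksum stored at offset 0x18E, equal to the sum of all 16-bit big-endian words from offset 0x200 to the end of the ROM, truncated to 16 bits. These offsets and the zero-padding of an odd trailing byte are assumptions of this sketch, not a description of the NSRL code.

```python
import struct

HEADER_CHECKSUM_OFFSET = 0x18E  # assumed location of the stored checksum
DATA_START = 0x200              # assumed first byte covered by the checksum


def stored_checksum(rom):
    """Read the 16-bit big-endian checksum recorded in the ROM header."""
    return struct.unpack_from(">H", rom, HEADER_CHECKSUM_OFFSET)[0]


def computed_checksum(rom):
    """Sum 16-bit big-endian words from DATA_START onward, modulo 0x10000."""
    data = bytes(rom[DATA_START:])
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd trailing byte with zero
    total = 0
    for (word,) in struct.iter_unpack(">H", data):
        total = (total + word) & 0xFFFF
    return total


def checksum_matches(rom):
    """True when the dump's computed checksum equals the header value."""
    if len(rom) < DATA_START:
        raise ValueError("ROM is too short to contain a header")
    return stored_checksum(rom) == computed_checksum(rom)
```

A mismatch does not by itself prove a bad read, since some published cartridges shipped with incorrect header checksums, so a failed check is best treated as a flag for review.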

Results to date

Just received first batch of data from NIST: 360 GB = 870 software titles, 116,000 unique files

Capture success rate:
– 83% with no modification or intervention
– Can increase by 5% with human intervention during imaging
– Can increase by 4% with intervention during image mount
– 8% of media have many (> 10%) sector read errors
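
The capture figures can be read as a cumulative rate, and the "many errors" category as a simple threshold test. A small sketch under those assumptions follows; the names and the sample sector counts are illustrative, and treating the percentages as additive is this sketch's reading of the slide.

```python
MANY_ERRORS_PERCENT = 10  # threshold from the slide: more than 10% of sectors


def has_many_read_errors(bad_sectors, total_sectors):
    """True when strictly more than 10% of sectors failed to read."""
    if total_sectors <= 0:
        raise ValueError("total_sectors must be positive")
    # Integer comparison avoids floating-point edge cases at exactly 10%.
    return bad_sectors * 100 > MANY_ERRORS_PERCENT * total_sectors


unattended = 83       # percent captured with no intervention
imaging_gain = 5      # added by intervention during imaging
mount_gain = 4        # added by intervention during image mount

best_case = unattended + imaging_gain + mount_gain
remainder = 100 - best_case

print(best_case)   # 92
print(remainder)   # 8
```

The arithmetic leaves 8%, the same figure the slide gives for media with many sector read errors; whether those are the same media is not stated.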

Lessons and Improvements

Automation; less human interaction

Photography; use RAW and convert

Hardware for legacy media:
– Apple physical formats
– Large format floppy disks (8”)
– Cassettes

Cartridge batteries

Lessons and Improvements

Data modeling beginning this month for repository
Copyright letter created to send to rights holders
Create persistent URL citation page (PURL) for software
Integration into the Stanford catalog, SearchWorks, when rights allow

Copyright permissions letter created

Questions?

Michael Olson, email: [email protected]

Douglas White, email: [email protected]