Digital Preservation Policy Development at the Library of Congress

PLAN

1. Prioritize and select digital collection.

2. Determine user group.

3. Verify content, type, and metadata.

4. Determine requirements for access, storage space, and server space.

5. Determine resource needs.

GET

1. Assign or confirm unique identifiers.

2. Pre-process items as appropriate.

3. Transfer, package, and inventory items.

DESCRIBE

1. Describe or catalog digital material at collection, object, or file level according to LOC best practices.

2. Provide metadata to identify, characterize, or place the digital object in context.

SUSTAIN

1. Educate.

2. Survey.

3. Determine needs.

4. Develop preservation action plan.

5. Use existing LOC processes, tools, and practices.

6. Review preservation actions, plans, and policies.

MAKE AVAILABLE

1. Determine and manage access.

2. Monitor preservation status.

3. State missing or corrupted file.

4. Ensure authenticity.

5. Enable search across data sets.

6. Enable search for derivative content.

SERIAL & GOVERNMENT PUBLICATIONS DIVISION

1. Historic newspapers in the public domain (high demand and value but low risk) on high-quality microfilm.

2. Publicly served immediately on the Web through Chronicling America site and API.

3. METS/ALTO objects and associated images.

4. Storage and access requirements:

• 4 preservation bags (2 copies on each of 2 preservation servers) + access bag

• online and offline storage

• 54 Mb/page at approx. 10,000 pages per partner per month for 20 years → tens of millions of pages

5. Responsible for ingesting and sustaining digital materials rather than digitizing and describing.
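The storage figures in item 4 can be checked with a little arithmetic. The sketch below is only illustrative: it assumes "Mb" means megabytes in decimal units and that each partner delivers at the stated rate for the full 20 years. The number of partners is not given on the poster, so it is a hypothetical parameter.

```python
# Assumptions (not from the poster): "Mb" is read as megabytes, units are
# decimal, and every partner delivers at the stated rate for all 20 years.
MB_PER_PAGE = 54
PAGES_PER_PARTNER_PER_MONTH = 10_000
YEARS = 20


def pages_per_partner(years: int = YEARS) -> int:
    """Pages one partner delivers over the whole period."""
    return PAGES_PER_PARTNER_PER_MONTH * 12 * years


def terabytes_per_copy(partners: int, years: int = YEARS) -> float:
    """Size of one full copy of the page data, in decimal terabytes."""
    total_mb = pages_per_partner(years) * partners * MB_PER_PAGE
    return total_mb / 1_000_000  # 1 TB = 1,000,000 MB


if __name__ == "__main__":
    print(pages_per_partner())       # 2400000 pages per partner
    print(pages_per_partner() * 10)  # 24000000 pages for a hypothetical 10 partners
    print(terabytes_per_copy(1))     # 129.6 TB per partner, per copy
```

Under these assumptions a single partner contributes 2.4 million pages, so around ten partners would already reach the "tens of millions of pages" the poster mentions.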

1. Partners assign files unique identifiers according to NDNP file naming conventions, in addition to generating digital signatures and fixity values using the NDNP Digital Viewer & Validation Toolkit.

2. Batches of approx. 10,000 pages/month from partner institutions arrive at Serials aggregated on external hard drives.

3. NDNP CTS workflow: Register batch delivery → Verify batch → QR → Mount drive to Bronze → Inventory → Malware scan → Copy to RDC staging → Bag in place → Ingest into ChronAm staging → Accept batch and copy to tape Sun 29, tape Frontier, and access Sun 11 → Ingest into ChronAm production → Return drive
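The fixity and verification steps above rest on one idea: record a checksum for every file when a batch is created, then recompute and compare after each transfer. The sketch below illustrates that idea with the Python standard library only. It is not the NDNP Digital Viewer & Validation Toolkit or the Library's transfer tooling. The function names are hypothetical, and SHA-256 is an assumption, since the poster does not name an algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(batch_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the batch) to its checksum."""
    return {
        p.relative_to(batch_dir).as_posix(): sha256_of(p)
        for p in sorted(batch_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(batch_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = batch_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

An empty list from `verify_manifest` means every file recorded at creation time is still present and unchanged. A BagIt bag stores the same kind of path-to-checksum mapping in its manifest file.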

1. METS objects are produced at the level of title, reel, and issue (including pages).

2. Metadata relates each page-level digital object to the issue, reel, and title. Selected standards include METS, MODS, PREMIS, MIX, MARC XML, and Z39.87. Once objects are ingested, ITS periodically supplies technical, administrative, and preservation metadata to Serials.

GEOGRAPHY & MAP DIVISION

1. Data is acquired via Federal deposit, donation, or Acquisitions; CCP project data is downloaded from the Internet or acquired directly from client.

2. Most content is publicly accessible, with some private data used for specific Congressional research projects.

3. Accept any formats. Some content delivered on CD may have little existing metadata. Data requested by clients is verified for accuracy and completeness, as are data sources.

4. Server space is requested based on projected need; datasets are kept in archival storage.

5. Resource needs relate to the variety of formats taken in – management and access can be difficult, as most are proprietary and complex.

1. LC control numbers used as identifiers. Project data are filed by year, organization, client, and topic.

2. If purchased, data is checked for completeness against purchase order. Byte count is checked for files delivered by FTP. Data is sorted by geographic location to prepare for cataloging. Original copies of data are transferred to storage separately from copies used in projects.

3. NGA CDs and DVDs (excluding series CB01) are logged in an inventory. Files are checked against an inventory if one was received from the creator; otherwise BagIt serves as the inventory. Some files zipped for storage. MS Access database is being developed to inventory CCP project files.
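The byte-count and inventory checks described above can be sketched as a comparison between what the creator said was sent and what actually arrived. This is a minimal illustration, not the division's procedure. The function name and the shape of the inventory (a mapping of relative path to expected byte count) are assumptions.

```python
from pathlib import Path


def check_delivery(delivery_dir: Path, expected_sizes: dict[str, int]) -> dict[str, list[str]]:
    """Compare delivered files against an inventory of expected byte counts.

    expected_sizes is a hypothetical inventory: relative path -> size in bytes.
    """
    actual = {
        p.relative_to(delivery_dir).as_posix(): p.stat().st_size
        for p in delivery_dir.rglob("*")
        if p.is_file()
    }
    return {
        # Listed in the inventory but not found on disk.
        "missing": sorted(set(expected_sizes) - set(actual)),
        # Found on disk but not listed in the inventory.
        "unexpected": sorted(set(actual) - set(expected_sizes)),
        # Present in both, but the byte counts differ (e.g. a truncated transfer).
        "size_mismatch": sorted(
            name
            for name, size in expected_sizes.items()
            if name in actual and actual[name] != size
        ),
    }
```

A matching byte count only shows that a transfer was not truncated. It does not detect altered content, which is what checksum-based fixity is for.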

1. Bibliographic catalog record is created if necessary (National Geospatial-Intelligence Agency CDs and DVDs have an existing record).

2. Metadata is extracted from the data if possible.

MANUSCRIPT DIVISION

1. Unpublished born-digital records received as part of manuscript collections of personal papers and organizational records.

2. Discoverable through finding aids served publicly on-site at the Manuscript Reading Room. Access restrictions may require more granular access controls or enforceable triggers that make the collection available at a later date.

3. Sufficient metadata to record context of digital materials within archival collection. Bagged media + reports + collection metadata = SIP.

4. Access restrictions will determine reading room or Web access by collection. Storage requirements are currently small but expected to grow.

5. Resource needs still being determined.

1. Unique identifiers are assigned for tangible media in archival collection.

2. Accessioning: Discovery of media → Remove from original location and note context → Physical custody transfer to shelf → Collection catalog record → Registration and bagging prep
   Bagging: Receive and log media → Virus scan → Disk image → Bag media separately using BagIt → Create preliminary reports including directory structure, file identification, & validation → QR of bags and media logs → Return media to registrar

3. Registrar appraises flagged media & deletes as appropriate → Create SIP bag → Create tangible media backup stored in stacks → Update tracking database with # bytes ingested, # bytes disposed → Transfer to long-term storage (TBD)
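The tracking-database step above records how many bytes were ingested and how many were disposed of after appraisal. The sketch below shows one way such a tally could be kept, using Python's built-in `sqlite3` module. The table name, column names, and media identifiers are hypothetical, and the poster does not say which database the division uses.

```python
import sqlite3


def open_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tracking database with one row per piece of media."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS media_log (
               media_id       TEXT PRIMARY KEY,
               bytes_ingested INTEGER NOT NULL DEFAULT 0,
               bytes_disposed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_appraisal(conn: sqlite3.Connection, media_id: str,
                     bytes_ingested: int, bytes_disposed: int) -> None:
    """Store the appraisal outcome for one piece of media."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO media_log VALUES (?, ?, ?)",
            (media_id, bytes_ingested, bytes_disposed),
        )


def totals(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (total bytes ingested, total bytes disposed) across all media."""
    row = conn.execute(
        "SELECT COALESCE(SUM(bytes_ingested), 0), COALESCE(SUM(bytes_disposed), 0) "
        "FROM media_log"
    ).fetchone()
    return row[0], row[1]
```

Keeping both counts per item makes it possible to account for everything that arrived on a piece of media, including what was deliberately not retained.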

1. During processing, original media are noted within collection-level finding aids encoded in EAD.

2. Before ingest, sufficient metadata is created to record context of digital materials within archival collection.

The Library’s Strategic Plan sets an outcome for 2014 that “The Library has identified and proposed criteria for the preservation of the Library’s digital materials.” We’re working to effect that outcome by gathering information about existing digital preservation practices and comparing those practices to the life cycle framework developed by the Library’s Preservation Working Group. Ultimately, our comparative analysis may contribute to revision of the life cycle framework.

Emily Reynolds & Chelcie Rowell, 2012 OSI Junior Fellows