May 12, 2006 Spring 2006 Common Solutions Group
Archival, Digital Preservation, and Records Management
David Millman, Columbia University
Ron Thielen, University of Chicago
Agenda
Difference between an Archive, Repository, and Records Management
The Three Reasons to Archive
The State of the Industry, Government, Higher Ed, …
Standards
Policies and Processes
Steps Toward Archival
Some Key Issues
Differences between an Archive, Repository, and Records Management
Institutional Repository – A system for collecting, preserving, and disseminating scholarly content.
Archive – A collection of data that is maintained as a long-term record of a business, application, or information state. Archives are typically kept for auditing, regulatory, analysis or reference purposes rather than for application or data recovery. - SNIA
Records Management – The systematic control of records throughout their life cycle. – ARMA
Reasons to Archive
Legal and Regulatory Compliance
As an Aid to Corporate Memory in Order to Improve Operational Effectiveness
To Preserve Material of Potentially Historic and Enduring Value
Legal and Regulatory Issues
Some financial records need to be retained for statutory periods varying up to 10 years
Medical research needs to be retained beyond the life of the subject
Lack of process for retaining records may be at best lack of due diligence and at worst obstruction
It is increasingly common that courts are unwilling to accept the argument that discovery would be too difficult or expensive
In some cases they are fining companies that are too slow to comply with court orders
Improve Operational Effectiveness
Act as an Aid to Institutional Memory
Assist Institutional Governance by Capturing the Rationale for Decisions
“Operational” in our Context Extends to Scholarly Effectiveness
Historic and Enduring Value
Not always possible to know a priori what will have enduring value
Will a researcher in the next century be more interested in the content of a particular web site, or in how the content was presented and how we interacted with it through our browser interface? Both.
State of the IT Industry
Used to be all about compliance
Increasing awareness that there are other reasons for archival
Scan of IT Industry Organizations
Scan of IT Vendors
Scan of Government Initiatives
Scan of Higher Education Initiatives
IT Industry Organizations
SNIA (Storage Networking Industry Association) Data Management Forum (DMF)
  LTACSI (Long Term Archive and Compliance Storage Initiative)
  100 Year Archive Task Force
  SDDF (Self Describing Data Format) Task Force
ARMA – Association of Records Managers and Administrators (aka RIM Professionals) – Working with the SNIA
AIIM – Association for Information and Image Management – Believes that ISO adoption of PDF/A is the way to address preservation
Scan of IT Vendors
Niche (generally seem to get it)
  Archivas, Permabit, Yosemite
800 lb Gorillas (some get it, some don’t)
  HP, IBM, EMC, Sun (aka StorageTek)
“Archival” Vendors (generally don’t seem to get it)
  Commvault, Zantaz, ZipLip, iLumin, …
Survey of Government Authorities and Initiatives
LOC “Library of Congress”
NARA “National Archives and Records Administration”
NDIIPP “National Digital Information Infrastructure and Preservation Program”
Survey of Higher Education and Library Initiatives
DSpace (an institutional repository, not an archive)
FEDORA (ditto)
Stanford LOCKSS (Lots of Copies Keep Stuff Safe)
DAITSS (Dark Archive in the Sunshine State)
NEDLIB (Networked European Deposit Library)
JORUM (repository service, U.K.)
Columbia (DSpace pilots; FEDORA in Socioeconomic Data Center Long-Term Archive)
CDAD (Chicago Digital Archive Depository)
RLG Digital Repository Certification
UCSD / SRB (Storage Resource Broker)
JHOVE (Harvard, object validation service)
Standards (formal, ad-hoc, and otherwise)
OAIS “Open Archival Information System”
PREMIS “Preservation Metadata: Implementation Strategies”
METS “Metadata Encoding and Transmission Standard”
EAD “Encoded Archival Description”
MADS “Metadata Authority Description Schema”
MODS “Metadata Object Description Schema”
DOD 5015.2 “Design Criteria Standard for Electronic Records Management Software Applications”
ISO 15489 (Records Management)
and on … and on … and …
Standards for Access and Interoperation
Institutional Repository service vs Archive
Scholarly/Instructional Access issues
  Discovery
  Interoperation/reuse
  Citation stability
Digital Library issues
  Content structure
  Format migration
Policy/process
Strategies
  email: nightly incrementals (a backup strategy)
  digital library: quarterly curator sign-off (an archival strategy)
Faculty buy-in
  minimum metadata?
  education
Education experiment: Spectrum of Stability
[Figure: a spectrum running from scholarly research activity to library curation]
Stages: Active collaboration, Versioning, Citable working-paper, Publication
Systems: Multiple users w/ “collab space” functions; File system metaphor w/ some metadata; Institutional repository / metadata; Preserved / archived / cataloged
Five Steps to Archival
Backup - a backup is not an archive, but backup processes, support personnel, and infrastructure may (or may not) support parts of the archival infrastructure
Simple Bitstream Preservation - keep from losing the information; adds fixity checking, digital media asset management to backup
Records Management - adds policy based classification and information life-cycle management
Intellectual Content Preservation - keep the format current; migrate (or emulate) formats & structures
Archival - adds bibliographic and administrative metadata
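The fixity checking mentioned under Simple Bitstream Preservation can be sketched as follows. This is a minimal illustration, not a description of any system named in these slides: the choice of SHA-256, the in-memory manifest layout, and the function names `sha256_of`, `build_manifest`, and `verify_manifest` are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every regular file under root.

    The manifest layout (relative POSIX path -> digest) is a hypothetical
    choice for this sketch.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

A manifest would typically be built at ingest and stored apart from the content, then re-verified on a schedule; any path returned by `verify_manifest` is a candidate for repair from another copy.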
Sampling of Issues
Not Enough Cooperation to Build Standards Based Archival Systems
It’s not just about the data
  Metadata is key – Where does it come from (harvest, contributor, cataloger?)
  Context is often necessary (e.g. roles, organizational structures both formal and informal, provenance)
A Backup is not an Archive
IP & DRM
Whose Archive Is It?
Digital Media Asset Management (tape is dead, long live tape)
Balancing Collection of Everything vs. Determining Suitability of Material for Archival (Selection Criteria)
Data Classification (Metadata Driven, Policy Based Selection Processes?)
Requirements for Research Preservation and Dissemination
Fixity Checking and Repair
Disaster Recovery ?
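One way to picture the metadata-driven, policy-based selection raised above is a retention schedule keyed on a record's class. The sketch below is purely illustrative: the `Record` type, the `RETENTION_YEARS` table and its values, and the `disposition` function are hypothetical, and real schedules come from counsel and records managers, not from code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    record_class: str  # classification metadata, e.g. "financial"
    created: date


# Hypothetical retention schedule in years; None means keep indefinitely.
RETENTION_YEARS: dict[str, Optional[int]] = {
    "financial": 10,
    "email": 1,
    "scholarly": None,
}


def disposition(record: Record, today: date) -> str:
    """Return 'retain', 'dispose', or 'review' for a record.

    Unclassified records are routed to human review rather than guessed at.
    """
    if record.record_class not in RETENTION_YEARS:
        return "review"
    years = RETENTION_YEARS[record.record_class]
    if years is None:
        return "retain"
    try:
        expiry = record.created.replace(year=record.created.year + years)
    except ValueError:
        # Created on Feb 29 and the expiry year is not a leap year.
        expiry = record.created.replace(year=record.created.year + years, day=28)
    return "dispose" if today >= expiry else "retain"
```

Sending unknown classes to review reflects the point that metadata has to come from somewhere: a policy engine can only act on records that someone, or some harvesting process, has already classified.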