Where are we with Digital Preservation? Andrew Waugh Public Record Office Victoria


Slide 1

Where are we with Digital Preservation?
Andrew Waugh, Public Record Office Victoria

Slide 2: Where are we?

  • "It is not the end. It may not even be the beginning of the end. But it is undoubtedly the end of the beginning." (Winston Churchill)
  • This talk will cover
      • Consensus views on digital preservation
      • Open questions and future challenges

Slide 3: What this presentation will cover

  • Understanding (building systems)
  • Storage (preserving the bit strings)
  • Access (preserving the meaning)
  • Metadata (preserving the context & authenticity)
  • Transfer (overcoming system senescence)

Slide 4: Understanding

  • Communication requires shared terminology and concepts
  • Open Archival Information System (OAIS) reference model (ISO 14721:2003)
      • http://public.ccsds.org/publications/archive/650x0b1.pdf
      • High-level terminology is very widely used, but few use the detail in the model
      • Does not cover preservation
      • Pre-web, and the detail does not reflect actual implementations
      • Currently under review

Slide 5: Trusted digital repositories

  • How can you be sure that an organisation (& its system) is up to holding your digital objects?
  • Trustworthy Repositories Audit and Certification, CRL/NARA (2007)
      • http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91
      • Administrative focus rather than technical; high level (cannot be tested)
      • Based on OAIS; basis for audit checklists

Slide 6: Audit checklists

  • Provide tests to see if a repository can be trusted
  • DRAMBORA: DCC/DPE (2007)
      • Risk based, self certification
      • http://www.repositoryaudit.eu/

Slide 7: Public domain digital repositories

  • Public domain digital repository code
      • DSpace (http://www.dspace.org/)
      • Fedora (http://www.fedora-commons.org/)
  • Both came out of the academic community and primarily support institutional repositories

Slide 8: Storage (preserving the bit string)

  • The fundamental task of digital preservation is ensuring that the bits that make up the digital objects are preserved
  • Solved problem: large-scale data repositories have existed for decades and there is a lot of operational experience
  • Archival twist: actively monitor the health of stored objects using hashes

Slide 9: Storage - future challenges

  • Reducing storage cost (and chance for error)
      • Swedish National Archives estimated in 2005 between 4 and 8 Euro per digitised page, mostly in system and support costs
      • http://www.tape-online.net/docs/Palm_Black_Hole.pdf
  • Reducing risks
      • Administrator risk vs packaged risk
  • Ideal storage system
      • Packaged (i.e. built-in administration, such as the Centera)
      • Open, so that you can trust it and replace components
  • CLOCKSS
      • Uses redundant copies at participating institutions to ensure preservation (LOCKSS)
      • http://www.clockss.org/clockss/Home

Slide 10: Access (preserving the meaning)

  • What do you do when you no longer have an application to open the data files?
  • Current approach is either
      • Do nothing now, with eventual migration
      • Normalisation upon accession
  • Future approach might be emulation

Slide 11: Migration

  • Save what you capture now and convert to new formats as required
  • Web harvesting (studies show web sites are mostly safe formats: HTML, XML, JPEG, GIF, etc.)
  • Formats (and software) are proving surprisingly resilient

Slide 12: Normalisation

  • Convert upon accession to a small number of long-term preservation formats (LTPF)
      • E.g. PDF/A (PROV), ODF (NAA)
  • Immediate cost upon accession, but expected lower long-term management cost
  • Criteria for a good LTPF (Library of Congress)
      • http://www.digitalpreservation.gov/formats/intro/intro.shtml

Slide 13: Challenges

  • What is it? Tools to determine file formats
      • PRONOM repository of format descriptions and DROID (format classifier)
          • http://www.nationalarchives.gov.uk/pronom/
      • JHOVE (Harvard): classifier and simple validation
          • http://hul.harvard.edu/jhove/
  • How accurate is the conversion?
  • Is it a valid file according to the standard?

Slide 14: Metadata is better data

  • Metadata is information about the bit string
      • What it is (semantic)
      • What it is (technical)
      • How it relates to other digital objects
      • What is its history?
      • How is it to be managed?
  • Unfortunately, there are lots and lots of large metadata standards

Slide 15: Metadata standards

  • For an excellent summary of metadata standards, see the Metadata chapter in the DCC Digital Curation Manual
      • http://www.dcc.ac.uk/resource/curation-manual/chapters/metadata/metadata.pdf

Slide 16: Digital preservation metadata

  • Data Dictionary for Preservation Metadata (PREMIS): little descriptive information and nothing format specific
      • http://www.loc.gov/standards/premis/
  • ISO 23081 (Metadata for records)
  • National Archives of Australia Recordkeeping Metadata Standard
      • http://www.naa.gov.au/Images/rkms_pt1_2_tcm2-1036.pdf

Slide 17: Future challenges

  • Too many competing standards
      • Which do I implement?
  • Too many elements
      • Increases cost of standard development and software implementation
      • Few elements ever used
  • Too expensive and too hard to capture metadata

Slide 18: Transfer (overcoming system senescence)

  • Digital objects have a much longer life than the systems that hold them
      • Move objects to digital repositories where they can be properly managed
      • Move them from one digital repository to its replacement
  • Storage is so cheap that holders may be tempted to keep digital objects (until it is too late)

Slide 19: Future challenges

  • Current systems are not designed around the assumption that digital objects must be relocated
      • AIHT: Conceptual Issues from Practical Tests, Clay Shirky, D-Lib Magazine, Vol 11 No 12, December 2005, http://www.dlib.org/dlib/december05/shirky/12shirky.html
  • ADRI-UN/CEFACT work on a standard to transfer custody of digital records

Slide 20: More information

  • If I have whetted your appetite...
      • PADI: annotated bibliography of digital preservation (http://www.nla.gov.au/padi/)
      • D-Lib Magazine (http://www.dlib.org/)

Slide 21: Final thoughts

  • We know about compasses, and we have some charts, but there are a lot of rocks out there
  • We are a long way from satellite navigation
  • What about small/medium archives? Personal archives?
  • Are photographs better digital or as negatives?
      • http://www.wilhelm-research.com/
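
Illustrative note on the "archival twist" from the Storage slide (actively monitoring the health of stored objects using hashes): the following is a minimal sketch of what such a fixity check might look like, not a description of any system named in the talk. The choice of SHA-256, the manifest layout (relative path mapped to hex digest), and the function names `file_digest`, `build_manifest`, and `audit` are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a POSIX-style relative path) to its digest.

    Hypothetical manifest format chosen for this sketch; taken once at accession.
    """
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Recorded at accession but no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Present now but never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
        # Present in both, but the bits have changed.
        "corrupted": sorted(
            name for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A repository would typically store the manifest separately from the objects and run `audit` on a schedule; any entry under "missing" or "corrupted" signals that an object should be restored from a redundant copy.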