Your Digital Preservation Cookbook. Sara Allain, Dan Gillean, and Sarah Romkey, Artefactual Systems. Archives Association of B.C. Annual Conference, April 15, 2016.


Page 1: Your Digital Preservation Cookbook

Your Digital Preservation Cookbook

Sara Allain, Dan Gillean, and Sarah Romkey, Artefactual Systems

Archives Association of B.C. Annual Conference, April 15, 2016

https://www.pinterest.com/pin/455145106065308238/


Page 2: Your Digital Preservation Cookbook

Today’s offerings:

1. Ingredient preparation (digital preservation actions)

2. Cooking (preservation storage)

3. Serving (providing access to digital content)

4. Kitchen management (policies and procedures)

http://www.sampletemplates.com/menu-templates/blank-menu-template.html

Page 3: Your Digital Preservation Cookbook

Ingredient Preparation

Digital preservation actions


https://www.atlaswearables.com/blog/2015/05/we-love-vegetables/

Page 4: Your Digital Preservation Cookbook

Preparation: Digital preservation actions

Taking on digital preservation prep means your files will be better understood in the future.

Like properly prepped ingredients, prepared digital content is better cooked (preserved).

Unlike what happens to the ingredients in your favourite recipe, prep activities actually strengthen your files’ authenticity rather than transforming them into something new.

http://www.blogher.com/women-and-food-will-win-war-wwi

Page 5: Your Digital Preservation Cookbook

Preparation: Fixity

Are those ingredients what they say they are on the box?

Fixity information, usually a checksum, is a short value computed from the exact sequence of bits in a file, so the file can be re-checked in the future.

Capturing fixity as early as possible in the accessioning process makes sense - don’t move the files several times before creating a checksum.

Checksums pair nicely with other functions, e.g. packaging (BagIt).
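As a rough illustration of capturing fixity at accession time, the sketch below walks a folder and records a SHA-256 checksum for every file. It uses only Python's standard library; the function names and the "checksum, two spaces, relative path" line layout are assumptions made for this example, not something prescribed by the slides or by any particular tool. Keep the manifest file outside the folder being described so it is not checksummed itself.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def write_manifest(root: Path, manifest_path: Path) -> None:
    """Write one 'checksum  relative/path' line per file (hypothetical layout)."""
    lines = [f"{checksum}  {name}" for name, checksum in build_manifest(root).items()]
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Running `write_manifest(Path("accession-folder"), Path("manifest-sha256.txt"))` (both paths are placeholders) before the files are moved anywhere gives you a baseline to compare against later.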

http://www.buzzfeed.com/leonoraepstein/16-fascinating-facts-about-jell-o#.uqrYYQqw7

Page 6: Your Digital Preservation Cookbook

Preparation: Virus scan

Keep pests out of the kitchen!

Scan for viruses so you don’t ingest them into your preservation environment!

Quarantine functionality in a preservation system holds incoming files for a period before scanning, which gives virus definitions time to update.

City of Vancouver Archives, Deer in Malahat Lookout Kitchen

William Bros. Photographers Collection AM1545-S3-: CVA 586-497

Page 7: Your Digital Preservation Cookbook

Preparation: File identification

Know your ingredients!

Know what you’re cooking with: identify file formats, ideally using format signatures (characteristic byte patterns inside the file) rather than file extensions alone, for increased precision.

Identification should capture not just the format, but also the version.

Identifying the file formats accurately increases likelihood of getting more/better technical metadata.
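To make signature-based identification concrete, here is a toy sketch that looks at the first bytes of a file and reports a format and, where the header carries one, a version. The four-entry signature table and the function names are assumptions for illustration only; real identification tools draw on far larger signature registries and more careful matching rules.

```python
import re
from typing import Optional, Tuple

# Toy signature table: (format name, bytes expected at the start of the file).
SIGNATURES = [
    ("PDF", b"%PDF-"),
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("GIF", b"GIF8"),
    ("JPEG", b"\xff\xd8\xff"),
]


def _version(name: str, data: bytes) -> Optional[str]:
    """Pull a version out of the header for formats that state one there."""
    if name == "PDF":
        match = re.match(rb"%PDF-(\d\.\d)", data)
        return match.group(1).decode("ascii") if match else None
    if name == "GIF":
        # GIF headers read "GIF87a" or "GIF89a".
        tail = data[3:6]
        return tail.decode("ascii") if tail in (b"87a", b"89a") else None
    return None


def identify(data: bytes) -> Tuple[str, Optional[str]]:
    """Return (format, version) guessed from the leading bytes of a file."""
    for name, magic in SIGNATURES:
        if data.startswith(magic):
            return name, _version(name, data)
    return "unknown", None
```

In practice you would pass in the first few kilobytes of each file, for example `identify(open(path, "rb").read(4096))`, and treat "unknown" as a prompt for a closer look rather than a final answer.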

City of Vancouver Archives, [Woman mixing ingredients at] Dale's [Roast Chicken] kitchen on Granville Street

William Bros. Photographers Collection AM1545-S3-: CVA 586-4012

Page 8: Your Digital Preservation Cookbook

Preparation: Validation, characterization, metadata extraction

Are those noodles real?

Validation: is it a well-formed example of that particular file format?

Characterization: what are the particulars of this specific file? (e.g. size, codec, bitrate)

Extracting this technical metadata from the files and storing it in a standardized way helps ensure their longevity.
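A small sketch of validation and characterization for one format, WAV audio, using Python's standard `wave` module. It is only a stand-in for dedicated tools: "well-formed" here means nothing more than "the `wave` module can parse the header", and the dictionary keys are names invented for this example.

```python
import wave
from pathlib import Path


def is_well_formed_wav(path: Path) -> bool:
    """Rough validation: can the standard library parse this as a WAV file?"""
    try:
        with wave.open(str(path), "rb"):
            return True
    except (wave.Error, EOFError):
        return False


def characterize_wav(path: Path) -> dict:
    """Extract basic technical metadata from a WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "size_bytes": path.stat().st_size,
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate,
        }
```

The resulting dictionary could then be mapped into whichever technical metadata standard your repository uses.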

http://travelwireasia.com/2013/08/fake-food-japanese-style-that-looks-good-enough-to-eat/

Page 9: Your Digital Preservation Cookbook

Preparation: PII and sensitive information

As in the analogue world, you may have a requirement to flag files that contain personally identifying information and restrict access to the originals.

Unlike the analogue world, there are tools available that can help you scan automatically for this information!

This task can be performed during processing, or after access is requested.
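The idea of scanning automatically can be sketched with a few regular expressions. The three patterns below (email address, North American style phone number, and a nine-digit number grouped in threes) are rough heuristics chosen for this example; they will miss things and raise false alarms, so treat hits as flags for human review rather than findings.

```python
import re
from typing import Dict, List

# Heuristic patterns only; the labels and expressions are assumptions.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "nine_digit_id": re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{3}\b"),
}


def flag_pii(text: str) -> Dict[str, List[str]]:
    """Return every pattern label that matched, with the matching strings."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

A file whose extracted text returns a non-empty result could be marked for restricted access until someone has reviewed it.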

http://www.amazon.com/White-Horse-Whisky-Blindfolded-Taste/dp/B0159EOIXQ

Page 10: Your Digital Preservation Cookbook

Preparation: Normalization, migration, emulation

Strategies for dealing with software obsolescence:

Normalization converts files into a more preservation-friendly format while retaining the originals.

Migration converts files to newer formats over time, as new file formats emerge.

Emulation preserves the files together with their software/operating system environment.

UBC Archives, Two Students in Cooking Class in Home Economics, School of Family and Nutritional Science fonds, UBC 101.1/15

Page 11: Your Digital Preservation Cookbook

Preparation: Putting it all together

If that all sounded like more kitchen prep than Thanksgiving dinner, luckily there’s an easier way!

Digital preservation systems can tie much of the functionality together into one workflow.

Some of these functions are also taken care of in repository systems (coming up next).

http://freshome.com/2013/03/22/what-you-can-learn-from-the-jetsons-about-home-automation/

Page 12: Your Digital Preservation Cookbook

Cooking

Preservation storage

http://www.colonelsretreat.com/home/cooking.php

Page 13: Your Digital Preservation Cookbook

Cooking: Preservation storage

Prep is critical, but it’s only the first step!

Like cooking a meal, preserving your content for the long term requires specific tools and methods.

As with food, the best way to preserve your digital content is to use an appropriate storage container to ensure that your content will be safe and usable for the long term.

https://www.flickr.com/photos/29069717@N02/10111289655/

Page 14: Your Digital Preservation Cookbook

Cooking: Preservation storage

Preservatives and an airtight seal

Your storage container for digital content is a repository. Repositories come in many flavours:

• Can have a public interface or be closed off (“dark archive”)

• Can be a simple data store or something really complex

• May come with built-in tools to help you ensure that your data is valid for the long-term

https://www.flickr.com/photos/29069717@N02/10111289655/

Page 15: Your Digital Preservation Cookbook

Cooking: Fixity checking

Simpler - faster - better - surer!

Fixity checking ensures that your content is still viable.

By looking at the fixity record you created during preparation and then re-running the tool you used to create that fixity record in the first place, you can tell if your content is still viable - all the bits are still present and accounted for.

Your repository system should enable you to do this automatically - no human intervention needed, unless the fixity checks don’t match!
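A minimal sketch of what such an automated check does: recompute each file's checksum with the same algorithm used at preparation time and compare it to the recorded value. The function names and the "missing"/"changed" labels are assumptions for this example; a repository system would normally schedule this and log the outcome for you.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root: Path, recorded: Dict[str, str]) -> Dict[str, str]:
    """Compare recorded checksums with the files now on disk.

    `recorded` maps relative paths to the checksums captured earlier.
    Returns only the problems found; an empty dict means every file matched.
    """
    problems = {}
    for name, expected in recorded.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "changed"
    return problems
```

Human attention is only needed when the returned dictionary is not empty.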

http://s.ecrater.com/stores/108769/55f584a6249cf_108769b.jpg

Page 16: Your Digital Preservation Cookbook

Cooking: Redundancy

Make sure there’s enough for seconds. And thirds!

Making many copies of your digital content is critical to ensuring that you have back-up if something goes wrong. Two common kinds of redundancy are:

• Back-up copies of your database preserved on different servers

• Geo-redundancy, usually provided by a server hosting provider

https://c1.staticflickr.com/3/2096/5794109510_a4f966a812.jpg

Page 17: Your Digital Preservation Cookbook

Cooking: Technical metadata

The recipe for your digipres casserole

Technical metadata tells you what comprises the digital content as well as how it’s put together.

There are different standards depending on the type of technical metadata that you’re recording. PREMIS is widely used to capture metadata specifically relating to preservation; there are many others as well.

Following a standard means that your metadata will be consistent both within your repository and over time.

http://www.midcenturymenu.com/2010/06/the-mid-century-menu-ham-banana-casserole/

Page 18: Your Digital Preservation Cookbook

Cooking: Audit and control

Don’t let strangers mess around in your kitchen!

Performing regular, holistic audits to check on the integrity of your files is the best way to ensure that they’re not degenerating over time.

Only authorized users should have access to your repository. Controlling who can edit your digital content - including metadata - is a crucial component to ensure that it’s stored safely and securely.

http://land.allears.net/blogs/jackspence/21%20Yak%20%26%20Yeti%2001.jpg

Page 19: Your Digital Preservation Cookbook

Cooking: Future proofing

If you start with the basics, you’ll be able to cook anything

Choosing the best repository system isn’t just about your present needs - it’s also about the future.

Ensuring that your repository is open and built around standards and best practices means that, if you need to, you can migrate to a new system.

Adhering to standards and best practices is like learning to chop an onion - it’s the foundation on which your collections rely.

http://ecx.images-amazon.com/images/I/81DGvz%2BcNZL.jpg

Page 20: Your Digital Preservation Cookbook

Serving

Managing Access

https://www.sclv.com/Dining/Buffets.aspx

Page 21: Your Digital Preservation Cookbook

Serving: know your designated community!

Who’s coming to dinner?

The OAIS reference model defines a designated community as:

“An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities. A Designated Community is defined by the Archive and this definition may change over time.”

This means understanding that your end users might have different needs than the institutional actors responsible for ongoing preservation.

http://hahasforhoohas.com/stories/ten-things-you-never-want-say-dinner-guests-arrive

Page 22: Your Digital Preservation Cookbook

Serving: Applying access restrictions

Knowing what not to serve is just as important as knowing what to serve!

You will need to make sure that you are applying appropriate access restrictions. These might be based on copyright, local statutes, donor restrictions, licenses, etc. You’ll need clear policies on who can access what when.
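One way to picture a "who can access what when" policy is as a small rule attached to each object. The sketch below is purely hypothetical: the field names, the role labels, and the rule that certain staff roles bypass an embargo are all assumptions for illustration, not part of PREMIS or any system named in these slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Restriction:
    basis: str                                 # e.g. "copyright", "donor agreement"
    allowed_roles: FrozenSet[str]              # roles that may view once any embargo ends
    embargo_until: Optional[date] = None       # first day of open access, if embargoed
    embargo_exempt_roles: FrozenSet[str] = frozenset()  # roles that always have access


def may_access(restriction: Restriction, role: str, today: date) -> bool:
    """Decide whether a user with this role may view the object today."""
    if role in restriction.embargo_exempt_roles:
        return True
    if role not in restriction.allowed_roles:
        return False
    if restriction.embargo_until is not None and today < restriction.embargo_until:
        return False
    return True
```

Writing the rule down in this form, even on paper, forces the policy questions into the open: which roles exist, what the basis for each restriction is, and when it expires.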

PREMIS Rights: http://www.loc.gov/standards/premis/

Coyle, Karen. “Rights in the PREMIS Data Model.” A report for the Library of Congress, December 2006. http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf

https://makeameme.org/meme/no-dinner-for-pczmhb

Page 23: Your Digital Preservation Cookbook

Serving: Creating access derivatives (DIPs)

Or, don’t serve a whole chicken on wing night!

Preservation masters ≠ access copies!

For access, you want:

• Smaller file sizes

• Common formats

• Support in many web browsers and OSes

Examples: TIFF → JPG, WAV → MP3

http://vancouverfoodster.com/2012/11/27/tasting-plates-chinatown-strathcona/

Dissemination Information Package (DIP): An Information Package, derived from one or more AIPs, and sent by Archives to the Consumer in response to a request to the OAIS.

Page 24: Your Digital Preservation Cookbook

Serving: adding descriptive metadata

Let your dinner guests know what’s on the menu

Use existing content standards: Dublin Core, ISAD(G), RAD (Canada), MODS, etc.

This can be done in a database or content management system (e.g. AtoM, ArchivesSpace, CollectiveAccess, or a custom database), or in locally created finding aids.

However you choose to do it, you will also need to think about how users are eventually going to access this information...
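As a small illustration of description following a content standard, the sketch below serializes a few Dublin Core elements as XML using Python's standard library. The wrapping `record` element and the function name are assumptions for this example (Dublin Core itself does not prescribe a container); the element names and namespace are those of the Dublin Core element set.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields: Dict[str, List[str]]) -> str:
    """Serialize Dublin Core elements (name -> list of values) as XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Example values taken from an image credit earlier in this deck.
example = dublin_core_record({
    "title": ["Two Students in Cooking Class in Home Economics"],
    "identifier": ["UBC 101.1/15"],
})
```

Because the output uses a shared standard, the same record can be loaded into a description system, exposed to harvesters, or kept alongside the digital object.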

http://www.flavourbistro.co.nz/bistro-menu-g-173.html

Page 25: Your Digital Preservation Cookbook

Serving: indexing your content and making it discoverable

Send out the dinner invitations!

Your end users (or consumers) will need a way to explore and understand the content you are making available.

Some facility for searching and browsing will greatly ease this.

If your resources are web-accessible, they can be indexed by search engines and become more broadly discoverable.

Indexing also includes adding access points - give your users a way into the content!
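To show what a search index does at its simplest, here is a toy inverted index: each word points to the set of records whose description contains it, and a query returns the records that contain every query word. This is a sketch under simplifying assumptions (lowercased ASCII words, no stemming or ranking); dedicated search engines do far more.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def _terms(text: str):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the identifiers of the records that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in _terms(text):
            index[term].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return identifiers of records containing every word in the query."""
    terms = _terms(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Access points work the same way: adding a controlled subject term to a record's description is simply another word (or phrase) that leads users to it.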

Access Software: A type of software that presents part of or all of the information content of an Information Object in forms understandable to humans or systems.

http://www.sandyloujohnson.com/974-2/

Page 26: Your Digital Preservation Cookbook

Serving: Maintaining a relationship with the master

You need to know where your hors d'oeuvres came from if you want to be able to serve them again in the future

Additional descriptive metadata created outside of the preservation workflow should remain linked to the AIP / digital object master.

Links to your rights statements are crucial for monitoring compliance!

Mutts comic strip, by Patrick McDonnell.

http://farmtotablela.com/farm-table-humor/

Provenance: maintaining the digital chain of custody

If you need to generate updated DIPs in the future, you will want to be able to retrace that chain.

Page 27: Your Digital Preservation Cookbook

Serving: Evaluating Access Systems for DigiPres

Channeling your inner food critic

If you are looking to implement an existing access system as part of your digital preservation environment, here’s a summary of some of the factors to consider:

• Search and retrieval

• Digital object display

• Hierarchies and context

• Access restrictions / rights management

• Standards adherence

• Data exchange and interoperability

• Digital provenance (relationship to preservation masters)

https://www.pinterest.com/pin/73253931414036246/

Page 28: Your Digital Preservation Cookbook

Kitchen Management

Policies and Procedures

http://www.kitchenkitties.com/service-archive/kitchen-boot-camp/

Page 29: Your Digital Preservation Cookbook

Kitchen Management: The importance of policy

Digital preservation is not all about tools and technology:

In standards like ISO 16363 (2012), policies and organizational infrastructure account for between ⅓ and ½ of the entire standard!

You need to ensure that your organization has the will, the capacity, and the vision to undertake digital preservation over the long-term.

http://recruitloop.com/blog/who-really-needs-to-get-involved-in-the-recruitment-process/

Page 30: Your Digital Preservation Cookbook

Kitchen Management: The importance of policy

Example factors to consider:

• Does your organization’s mission statement explicitly cover a commitment to digital preservation?

• Do you have succession, contingency, and/or escrow plans in place?

• Do you have training policies around digital preservation?

• Are the duties of each staff member responsible for each link in the chain documented?

• Do you have an internal auditing mechanism?

• Do you have a long-term financial plan for your preservation?

http://liaisoncollegeoakville.com/chef-diploma-programs/specialist-chef/

Page 31: Your Digital Preservation Cookbook

Kitchen Management: The value of collaboration

This ain’t Iron Chef!!!

• Digital preservation is hard - and ongoing

• Archives are underfunded - especially in Canada

• There’s a lot to learn…

But we can learn together, and share resources.

To be successful, we’ll need to collaborate, not compete - like a REAL professional kitchen!

http://www.popsugar.com/food/Interview-Next-Iron-Chef-Geoffrey-Zakarian-20967020

Page 32: Your Digital Preservation Cookbook

Shopping List

Tools and resources

http://www.middlevillemarketplace.com/shopping-list.php


Page 33: Your Digital Preservation Cookbook

Fixity

Tools to create checksums:

md5deep: http://md5deep.sourceforge.net/

md5summer: http://www.md5summer.org/

Built into various preservation systems/tools: Archivematica, Preservica, Bagger, DuraCloud, etc.

Tools to verify checksums:

Fixity: https://github.com/avpreserve/fixity

Built into various tools/systems as above

Tools to scan for viruses:

ClamAV: http://www.clamav.net/

Page 34: Your Digital Preservation Cookbook

Format identification

PRONOM database: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

Tools:

Format Identifier for Digital Objects (FIDO): https://github.com/openplanets/fido

Siegfried: https://github.com/richardlehane/siegfried

File Information Tool Set (FITS): http://projects.iq.harvard.edu/fit

DROID: https://github.com/digital-preservation/droid

Page 35: Your Digital Preservation Cookbook

Characterization, validation, and metadata extraction

File Information Tool Set (FITS): http://projects.iq.harvard.edu/fits

Metadata extraction tool: http://meta-extractor.sourceforge.net

ffprobe: https://ffmpeg.org/ffprobe.html

Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/

MediaInfo: https://mediaarea.net/en/MediaInfo

JHOVE: https://github.com/openpreserve/jhove

veraPDF: http://verapdf.org/

Page 36: Your Digital Preservation Cookbook

Normalization, migration, and emulation

Imagemagick: http://www.imagemagick.org/script/index.php

Inkscape: http://www.inkscape.org/

FFMPEG: http://ffmpeg.org/ffmpeg.html

Ghostscript: http://www.ghostscript.com/

KEEP solutions Emulation Framework: http://emuframework.sourceforge.net/

bwFLA Emulation as a Service: http://bw-fla.uni-freiburg.de/

Page 37: Your Digital Preservation Cookbook

Digital preservation systems

Archivematica: http://www.archivematica.org

Preservica: www.preservica.com

Rosetta: http://www.exlibrisgroup.com/category/RosettaOverview

Page 38: Your Digital Preservation Cookbook

Technical metadata standards

PREMIS: http://www.loc.gov/standards/premis/

Coyle, Karen. “Rights in the PREMIS Data Model.” A report for the Library of Congress, December 2006. http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf

METS: http://www.loc.gov/standards/mets/

PBCore: http://pbcore.org/schema/

NISO Metadata for Images in XML: http://www.loc.gov/standards/mix/

And many more, depending on the filetypes you’re working with!

Page 39: Your Digital Preservation Cookbook

Descriptive metadata standards

Dublin Core: http://dublincore.org/documents/dcmi-terms/

Rules for Archival Description (Canada): http://www.cdncouncilarchives.ca/archdesrules.html

General International Standard for Archival Description - ISAD(G): http://ica.org/en/isadg-general-international-standard-archival-description-second-edition

MODS: http://www.loc.gov/standards/mods/

Page 40: Your Digital Preservation Cookbook

Derivatives and indexing 1/2

For derivatives: see the resources above for normalization. The same tools that are used for preservation normalization can also be used for creating access derivatives!

For search indexing: This will depend on how you are making your resources available. A search index generally needs to be one component of an application stack. Here are a few resources to look into:

Elasticsearch: https://www.elastic.co/products/elasticsearch

Solr: https://lucene.apache.org/solr/

Blacklight: http://projectblacklight.org/

Page 41: Your Digital Preservation Cookbook

Derivatives and indexing 2/2

For adding indexing terms for discovery: Use existing controlled vocabularies whenever possible!

• Library of Congress vocabularies: http://loc.gov/library/libarch-thesauri.html

• Getty Vocabularies: http://www.getty.edu/research/tools/vocabularies/index.html

• Library and Archives Canada controlled vocabularies: http://www.bac-lac.gc.ca/eng/services/government-information-resources/controlled-vocabularies/Pages/controlled-vocabularies.aspx

• UNESCO thesaurus: http://databases.unesco.org/thesaurus/

• JISC Directory of Metadata Vocabularies: http://www.jiscdigitalmedia.ac.uk/guide/controlling-your-language-links-to-metadata-vocabularies/

• RBMS Controlled Vocabularies for Use in Rare Book and Special Collections Cataloging: http://rbms.info/vocabularies/

Page 42: Your Digital Preservation Cookbook

Description, repository, and access systems

• Access to Memory: https://www.accesstomemory.org

• ArchivesSpace: http://www.archivesspace.org/

• CollectiveAccess: http://collectiveaccess.org/

• Omeka: http://omeka.org/

• Islandora: http://islandora.ca/

• Hydra: https://projecthydra.org/

• Avalon: http://www.avalonmediasystem.org/

• ResCarta Toolkit: http://www.rescarta.org/

Note that many of these systems include the tools and standards described in the previous slides.

Page 43: Your Digital Preservation Cookbook

General & policy resources

• CCSDS - Reference Model for an Open Archival Information System (OAIS): http://public.ccsds.org/publications/archive/650x0m2.pdf

• CCSDS - Audit and Certification of Trustworthy Digital Repositories: http://public.ccsds.org/publications/archive/652x0m1.pdf

• TRAC review tool, developed by MIT in a project led by Nancy McGovern, Head of Curation and Preservation Services at MIT Libraries: https://wiki.archivematica.org/Internal_audit_tool

• COPTR - Community Owned digital Preservation Tool Registry: http://coptr.digipres.org/Main_Page

• Open Preservation Foundation: http://openpreservation.org/

Page 44: Your Digital Preservation Cookbook

• POWRR Project - Preserving (Digital) Objects With Restricted Resources: http://digitalpowrr.niu.edu/

• DigiPres Commons: http://www.digipres.org/

• Digital Preservation Q & A: http://qanda.digipres.org/

• National Digital Stewardship Alliance - Levels of Preservation: http://ndsa.diglib.org/activities/levels-of-digital-preservation/

• NDSA Digital Preservation in a Box: http://dpoutreach.net/

• AVPreserve’s open source tools: https://www.avpreserve.com/avpsresources/tools/

• AVPreserve’s papers and presentations: https://www.avpreserve.com/avpsresources/papers-and-presentations/
