
Text of Post-Digitization, Digital Preservation portion of AMIA 2015 presentation: Examining AV Enterprise at a Regional Academic Archives



35

Before I get into my piece of the EXTRA grant, I have a few slides that discuss the history of our Library’s digital preservation program as well as our current digital preservation software system, and then I’ll go into how the EXTRA grant highlighted some new challenges that we’ll be facing as we begin working more with audio-visual materials.

36

I was hired in 2008 as the library’s first digital preservation archivist. As Molly mentioned, I came from the AV Archive in Special Collections with no experience in digital preservation. My first couple years were spent, among other things, researching the field and then developing a high-level digital preservation policy for our library. The scope of our policy includes Special Collections materials such as those in the AV Archive as well as Manuscripts, Photos, Rare Books, and other cultural heritage materials. We’re also tasked with preserving scholarly output from faculty and other unique collections throughout the library.

Once I completed our policy, I began looking at the files we had saved on our server already. I had to go through all the files to determine if they fit our policy and if not, de-accession them. This took a long time, but was a necessary step to take to make sure our actions were in keeping with the new policy and to get a handle on all our legacy material.


37

The next couple years were spent working on a few different specific digital preservation projects. One of those was web archiving, which was spurred by the fact that our university class catalog was no longer being printed. Special Collections staff often receive patron requests for printouts of specific class descriptions from decades before. However, with the catalog no longer being printed and the website changing each semester to keep up with current offerings, we had a problem. So, we began using the Internet Archive’s Archive-It service to crawl websites, and we began archiving the related WARC files ourselves as well.

The other thing going on was our search for a digital preservation management software system. We evaluated 4 different systems that were available at the time, both open source and vendor-based. We finally settled on Ex Libris’ Rosetta as being the best system for our needs.

38

After purchasing Rosetta and testing it thoroughly, we went live with it in January of 2014. So far, we’ve ingested over 70 terabytes, mostly TIFF files. Of those 70 terabytes, only 8 are Special Collections materials, and of those 8, so far only 1 terabyte is AV. Most of our AV files are waiting to be ingested into Rosetta along with a huge backlog of other collection types, representing over 120 terabytes of archival files. Of that 120, nearly 60 is AV material.


39

This is the Rosetta homepage. You can see the number of ingests (called SIPs, or Submission Information Packages) that are currently in process in the middle aisle. On the right, there’s a running total of all preserved files and the total size of the archive.

40

Now, finally, to EXTRA! As Molly mentioned, the grant created over 11 terabytes of video. Those files are currently stored on a 4-node QC208 Qumulo cluster NAS. Our System Administrators create quarterly tape backups that are stored in Perpetual Storage, a granite storage vault in the side of a mountain near Salt Lake City. We’re hoping to add a third storage instance in the future so that we can increase redundancy and meet the minimum requirements for true digital preservation according to the Open Archival Information System (or OAIS) model.

As I mentioned before, the EXTRA tapes have not yet been ingested into our Digital Archive because there are some collection organization decisions we still need to make with Molly and Jessica before ingestion.


41

So, I think the biggest thing that the EXTRA grant opened our eyes to is the question of sustainability. Up until then, we’d been working mainly with image and text documents and size hadn’t become too big an issue. We’ve been storing AV materials for a few years, but they were mainly kept under the radar, meaning, I knew Molly had an increasing need for space on our archival server, but no one had really come up with a document that spelled out just how large the need for AV preservation was, especially keeping in mind that the magnetic media in the collection was rapidly deteriorating.

So, while the data grew, we weren’t paying much attention to the numbers and rate of growth and what that would mean in the long term. Once we realized how much space AV Archives really needed in order to preserve even just the highest risk items in their collection, we realized that our program was really not very sustainable. Right now we don’t even have a dedicated budget line for the Digital Archive. We just scramble every few years for more server space and funds to hire staff whenever we can. In fact, our digital preservation program has consisted of just one person, me, until we recently hired someone to work directly with Rosetta. Now we have a robust staff of TWO!

So, as we look at ways to become more sustainable, we need to figure out how much material the AV Archive needs to digitize for preservation, as well as estimate their rate of growth and what that means for the Digital Archive.
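To make the growth question concrete, here is a minimal back-of-the-envelope sketch in Python. The 60 TB starting point echoes the AV backlog figure mentioned earlier; the 10 TB of new AV per year, the simple linear growth model, the two-copy default, and the function name `project_storage_tb` are all assumptions for illustration, not figures or tools from the Library's program.

```python
def project_storage_tb(current_tb, annual_growth_tb, years, copies=2):
    """Return (year, total TB) pairs, counting every stored copy.

    Assumes linear growth: the same amount of new material each year.
    """
    projection = []
    for year in range(years + 1):
        unique_tb = current_tb + annual_growth_tb * year
        projection.append((year, unique_tb * copies))
    return projection


# Hypothetical inputs: 60 TB of AV today, 10 TB of new AV per year,
# two copies of everything (for example, disk plus tape).
for year, total_tb in project_storage_tb(60, 10, 5):
    print(f"year {year}: {total_tb} TB across all copies")
```

Raising `copies` to 3 shows roughly what a third storage instance would add to the total, which is the kind of number a dedicated budget line would need to cover.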

When it came to EXTRA, the grant paid for a year’s worth of preservation and that’s it. So we are currently looking into ways to pay for these activities, including offering digital preservation services to smaller institutions in the area for a fee.


42

So, after discovering the many challenges that came from EXTRA, we’ve focused our efforts on a few projects that we hope will help us create a better digital preservation program.

43

The first project to come from the EXTRA experience was a joint IT and AV Archive pilot project to assess AV collections for digital preservation. In the pilot, using my background as a film archivist, I spent a couple months evaluating a high-risk, high-value collection for preservation. Acknowledging that we will never have enough money or staff to digitize and preserve everything we have, we need a good system for evaluating which items in a collection should be digitized, and what the priority should be. One of the rules of the pilot was that I couldn’t view items unless they were completely unlabeled. This was due to the sheer number of items that needed to be evaluated. So, I relied on titles and some other specific collection information to create a spreadsheet detailing the collection.

From there, the idea was to prioritize items for digitization based on format and subject as well as to make sure we had marked which titles appeared in other collections. Diane Orr, the filmmaker whose collection I was evaluating, also worked for EXTRA during part of her career, so some of the items in her collection would have already been digitized as part of the EXTRA grant and we wanted to make sure we didn’t digitize titles twice.
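The prioritization idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the format and subject weights, the field names, and the `prioritize` function are hypothetical and are not the criteria actually used in the pilot. The sketch drops titles already digitized elsewhere and then ranks what remains by a combined format and subject score.

```python
# Hypothetical weights: higher means more urgent to digitize.
FORMAT_RISK = {"U-matic": 3, "VHS": 2, "16mm film": 1}
SUBJECT_VALUE = {"local news": 2, "interview": 1}


def score(item):
    """Combine format risk and subject value into one priority score."""
    return FORMAT_RISK.get(item["format"], 0) + SUBJECT_VALUE.get(item["subject"], 0)


def prioritize(items, already_digitized):
    """Drop titles digitized elsewhere, then sort by score, highest first."""
    seen = {title.strip().lower() for title in already_digitized}
    candidates = [item for item in items if item["title"].strip().lower() not in seen]
    # Ties are broken alphabetically by title so the order is stable.
    return sorted(candidates, key=lambda item: (-score(item), item["title"]))


# Made-up spreadsheet rows for illustration.
rows = [
    {"title": "Reel A", "format": "VHS", "subject": "interview"},
    {"title": "Reel B", "format": "U-matic", "subject": "local news"},
    {"title": "Reel C", "format": "16mm film", "subject": "local news"},
    {"title": "Reel D", "format": "U-matic", "subject": "interview"},
]
for row in prioritize(rows, already_digitized=["Reel D"]):
    print(row["title"], score(row))
```

Matching on normalized titles is a deliberate simplification; real collections would likely need a more careful match, since the same program can be labeled differently on different tapes.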

We’re currently looking at continuing the pilot in a phase two that would apply the evaluation criteria at the collection level rather than the item level.


44

Another project spurred by EXTRA was the lossless compression working group. This group is currently active and we’re in the phase, with the help of programmers in IT, of testing some batch processing scripts to convert our uncompressed video to FFV1. We’d love to hear from folks who are using FFV1 or other codecs in their archives as we’re really new to this.
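The batch scripts themselves are not shown in the talk. As a rough sketch of what such a script might look like, the Python below builds one FFmpeg command per source file. It assumes FFmpeg is installed, that the uncompressed sources are `.mov` files, and that Matroska (`.mkv`) is the target container; the function names are hypothetical. The encoder settings (FFV1 version 3, every frame a keyframe, per-slice CRCs) are commonly used archival options, not necessarily the ones the working group chose.

```python
from pathlib import Path


def ffv1_command(source, dest_dir):
    """Build an ffmpeg argument list that transcodes one file to FFV1 in Matroska."""
    source = Path(source)
    target = Path(dest_dir) / (source.stem + ".mkv")
    return [
        "ffmpeg",
        "-n",               # never overwrite an existing output file
        "-i", str(source),
        "-c:v", "ffv1",     # lossless FFV1 video codec
        "-level", "3",      # FFV1 version 3
        "-g", "1",          # intra-only: every frame is a keyframe
        "-slices", "16",    # split each frame into slices
        "-slicecrc", "1",   # store a CRC per slice for error detection
        "-c:a", "copy",     # leave the audio stream untouched
        str(target),
    ]


def batch_commands(source_dir, dest_dir, pattern="*.mov"):
    """Return one command per matching file, in sorted order."""
    return [ffv1_command(path, dest_dir) for path in sorted(Path(source_dir).glob(pattern))]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them makes it easy to review a batch before committing hours of transcoding time.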

45

As we look into ways to make our program more useful and relevant, we’re beginning to realize that not everything needs the same level of preservation. For instance, Molly and Jessica recently got a new 16mm print of Utah independent filmmaker Trent Harris’ film “Plan 10 from Outer Space”. Along with the new print, which is housed in our cold storage vault, they received a digital copy of the film. The 16mm print is the archival master, so do we need to fully preserve a digital master of something like that as well?

We’ve decided, mainly due to funding issues, that some items may not need the highest level of digital preservation. For those items, we will commit to bit-level preservation. We haven’t worked out all the policy level specifics yet, but we think bit-level preservation will include maintaining onsite and offsite backup copies, virus checking, fixity-checking, and periodic refreshment by copying files to new storage media. In other words, maintaining the integrity of the original file for later dissemination.

Bit-level preservation would also be appropriate for files given to us in a highly compressed format as well as items already held in other digital archives.
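Of the bit-level activities listed above, fixity checking is the easiest to illustrate. The following is a minimal sketch using only the Python standard library; it is not how Rosetta or the Library's servers do it, and the choice of SHA-256 and the function names are assumptions. The idea is to record a checksum for every file once, then recompute the checksums later and report any file that is missing or has changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return the names of files that are missing or whose checksum changed."""
    root = Path(root)
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

The same manifest can be checked against the onsite copy, the offsite copy, and any newly refreshed storage media, so a single record of checksums covers all three activities.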


46

Finally, as I mentioned before, the biggest obstacle to our program is sustainability. We need to come up with ways to offer our services to campus and community. We’re already working closely with the State of Utah’s Department of Heritage and Arts as we are ingesting their collections into Rosetta. We do this for a yearly fee and those funds are funneled back into the program, helping with hardware costs. We’re also looking into the possibility of offering Rosetta services to other libraries in exchange for a fee to help sustain our yearly software fees. Really, though, true sustainability of our program will have to come at the University level. This will require work not only from my department, but also the Library Dean.

And that’s it for me. Molly’s gonna give a brief conclusion before we open for questions!