Getting started with Digital Preservation in Your Library
Are we drowning, treading water or swimming across the river?
La Crosse Public Library Archives is housed in a medium-sized Wisconsin public library. We have a staff of two professional archivists, one full-time associate librarian and two part-time associate librarians. While the associate level does not require education beyond a college degree, it happens that everyone at this level in the Archives holds a master's degree in Library Studies.
We have a small reading room that is supervised during open hours (53 hrs/wk).
In 2016, we answered over 13,000 questions.
We have 940 processed archival collections that include our local publications.
Like many public libraries, we focus the Archives' collections on local history and genealogy. We also restrict our collecting geographically, but the geographic scope differs from collection to collection within the Archives. For example, our book collection is the most geographically diverse (local history plus some areas of western Wisconsin and southeastern Minnesota), while our photographic and archival collections are the most geographically confined.
Collections include books, manuscripts, public records, local publications, maps, photographic images and ephemera. These collections exist in a variety of formats, including VHS, 16mm and 8mm film, slides... you get the picture.
Our collections that include electronic formats are primarily photographic images, manuscripts, public records, local publications and maps; however, the majority are still physical objects that require housing in a controlled environment.
We have both digitized (analog-to-digital) content and born-digital content, so for some items we may have the original physical item and a digital representation, while other born-digital material exists only in electronic format.
We also have materials that are lent to us for scanning and then returned to the donor; in these cases we have a digital copy of analog material without owning the original object.
Appraisal is the term archivists use for determining whether an item or group of items is worth adding to the repository's holdings. I will not be talking about appraisal here, so assume that the appraisal decisions have already been made.
We look at collecting holistically; in our minds we don't separate out different strategies for different formats. Meeting minutes are meeting minutes regardless of whether they are in electronic format. How we handle them and provide access is a procedural difference, not an intellectual one.
We created an email address for our electronically collected material called earchives@, so when we subscribe to newsletters or join organizations, that address is added to the distribution lists. This way, when someone leaves our department, we don't have to resubscribe to things.
We set up a structure on our storage server with access limited to Archives staff. It mirrors the structure of the collections we label as archival (i.e., not books, maps and other published items), so you can think of these directory folders as recreating our separate collection types.
Photographs are in a different place on the server.
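As a rough illustration of that mirrored layout, the following Python sketch creates one folder per archival collection type and one subfolder per collection identifier. The folder names (local_publications, manuscripts, public_records) and the identifier mss123 are hypothetical; the talk does not give the actual names on our server, and photographs are left out because they live elsewhere.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical folder names, one per archival collection type
# (the actual folder names on the server are not given in the talk).
COLLECTION_TYPES = ("local_publications", "manuscripts", "public_records")


def build_archive_tree(root: Path) -> List[Path]:
    """Create one top-level folder per collection type under root."""
    created = []
    for ctype in COLLECTION_TYPES:
        folder = root / ctype
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created


def collection_folder(root: Path, collection_type: str, identifier: str) -> Path:
    """Return (creating it if needed) root/<collection_type>/<identifier>."""
    if collection_type not in COLLECTION_TYPES:
        raise ValueError(f"unknown collection type: {collection_type}")
    folder = root / collection_type / identifier
    folder.mkdir(parents=True, exist_ok=True)
    return folder


if __name__ == "__main__":
    # Demo in a throwaway directory; "mss123" is a made-up identifier.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_archive_tree(root)
        print(collection_folder(root, "manuscripts", "mss123").relative_to(root))
```

Rejecting unknown collection types keeps staff from inventing ad hoc top-level folders, so the server tree stays aligned with the collection types.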
La Crosse Series
We collect born-digital local publications in one of two ways: either through email or by downloading them from websites.
If the materials are embedded or formatted inline in the email (i.e., not attached as a separate document), then we save them as an HTML webpage, but we save only the first page. Why? Because we need to balance relevance against storage costs: eventually the "read more" links will go dead on the host server, but we will have saved the first page. We made this appraisal decision based on collection samples.
Most of the emails contain an attachment or direct us to the sender's webpage. We save these as PDF documents to preserve formatting and to have fewer version migration issues later. We can also more readily assure customers that this is an authentic capture. At this point we also rename the file to conform to our convention, which begins with year_month_date.
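For messages exported as raw .eml files rather than saved from an email client, a script could pull out the inline HTML body. This is a sketch using Python's standard email package, assuming raw message bytes as input; the function names are hypothetical. It keeps only the HTML that arrived in the message and does not follow any "read more" links, which corresponds to saving just the first page.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path
from typing import Optional


def extract_inline_html(raw: bytes) -> Optional[str]:
    """Return the text/html body of a raw email, or None if it has no HTML part."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("html",))
    return body.get_content() if body is not None else None


def save_inline_html(raw: bytes, target: Path) -> bool:
    """Write the inline HTML body to target; return False if there was none."""
    html = extract_inline_html(raw)
    if html is None:
        return False  # e.g. plain-text only; handle another way
    target.write_text(html, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Build a made-up newsletter message with plain-text and HTML versions.
    demo = EmailMessage()
    demo["Subject"] = "Newsletter"
    demo.set_content("Plain-text fallback")
    demo.add_alternative("<h1>Spring issue</h1><p>Read more...</p>", subtype="html")
    print(extract_inline_html(demo.as_bytes()))
```

Messages whose newsletter arrives as an attachment instead would return the plain-text or attachment-only case and go through the PDF route.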
Sample of the harvest workflow worksheet
Now that we have identified and isolated the electronic data we want to save, we use a file naming convention to help us access specific issues. The first column shows part of the list of local publication files; the second column shows part of the contents of one of those titles.
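A minimal sketch of a date-first file name, assuming the year_month_date prefix is zero-padded as YYYY_MM_DD and optionally followed by a title slug; both the padding and the slug are assumptions, and the title Riverside Newsletter is made up. Putting the date first means a plain alphabetical sort of a folder also lists the issues in chronological order.

```python
import re
from datetime import date


def dated_filename(issue_date: date, title: str = "", ext: str = "pdf") -> str:
    """Return a file name that starts with the issue date as YYYY_MM_DD.

    The zero-padding and the optional title slug after the date are
    assumptions; the talk only says names begin with year_month_date.
    """
    stem = f"{issue_date:%Y_%m_%d}"
    # Lower-case the title and collapse anything that is not a letter or digit into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    if slug:
        stem = f"{stem}_{slug}"
    return f"{stem}.{ext}"


if __name__ == "__main__":
    issues = [date(2017, 11, 2), date(2016, 4, 30), date(2017, 3, 15)]
    names = [dated_filename(d, "Riverside Newsletter") for d in issues]
    # Date-first, zero-padded names sort chronologically as plain text.
    for name in sorted(names):
        print(name)
```

Run as a script, it prints the 2016 issue first and the November 2017 issue last, even though the issues were listed out of order. Without zero-padding, a name for November would sort before one for March.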
This is very similar to how we handle local publications. Again, it can include analog material we have digitized (so we likely have both the physical and digital representation) as well as content that comes to us electronically.
The file structure is again similar, as it is based on the collection type and identifier, but in this case the files are meeting minutes instead of serial publications.
We use Archivists' Toolkit (AT) as our collection management software. We maintain lists of processed materials such as local publications, manuscripts and public records. There is also an accessioning module, as well as indexes to maintain name authority and subject authority lists.
The following is an example of the finding aid in AT for the LLFA collection whose file structure we just looked at. AT is meant as a staff-only tool; the public never has access to it. We push the Encoded Archival Description (EAD) out to our webpage for public access. The full finding aid, for example, can be found here:
What you are seeing here is the container list for the collection. However, there are many notes (abstract, scope and contents, etc.) in the collection-level description, including a general note that says meeting minutes 2012-2017 are available only in
Here is the nitty-gritty staff view of where the electronic content lives.
Here is an example of a mix of types of electronic materials within the same manuscript collection; this helps orient staff members to find what the customer needs. Ideally, these folders would match the series names and structure of the finding aid.
We have physical and digital photographs in a variety of places. We maintain a picture file that contains a wide variety of photographers, scenes, people, time periods and the like, largely unrelated to any other type of collection, although the picture file subject heading list might include a "see" reference to particular manuscripts or public record series.
The file naming convention we use for these starts with a picture file collection identifier, drawn from our subject heading list, that groups related photographic images together; then come the box number, folder number and item number. For example, pc012-02-19-002.tif tells staff exactly which collection, box, folder and item to look for to find the original, if we have it in physical form.
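Because each name encodes a physical location, it can be parsed back into collection, box, folder and item. This Python sketch infers the pattern from the single example pc012-02-19-002.tif; the assumed field widths (two-digit box and folder, three-digit item) and the names PhotoLocation, parse_photo_name and build_photo_name are hypothetical.

```python
import re
from typing import NamedTuple


class PhotoLocation(NamedTuple):
    collection: str  # picture file collection identifier, e.g. "pc012"
    box: int
    folder: int
    item: int


# Inferred from the single example "pc012-02-19-002.tif"; the field widths
# (2-digit box and folder, 3-digit item) are assumptions.
PHOTO_NAME = re.compile(
    r"^(?P<collection>[a-z]+\d+)-(?P<box>\d+)-(?P<folder>\d+)-(?P<item>\d+)\.(?P<ext>[a-z0-9]+)$",
    re.IGNORECASE,
)


def parse_photo_name(filename: str) -> PhotoLocation:
    """Split a scanned-photo file name into collection, box, folder and item."""
    m = PHOTO_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a picture file name: {filename!r}")
    return PhotoLocation(m["collection"].lower(), int(m["box"]), int(m["folder"]), int(m["item"]))


def build_photo_name(loc: PhotoLocation, ext: str = "tif") -> str:
    """Rebuild the zero-padded file name for a physical location."""
    return f"{loc.collection}-{loc.box:02d}-{loc.folder:02d}-{loc.item:03d}.{ext}"


if __name__ == "__main__":
    loc = parse_photo_name("pc012-02-19-002.tif")
    print(f"Collection {loc.collection}, box {loc.box}, folder {loc.folder}, item {loc.item}")
    print(build_photo_name(loc))
```

Since born-digital photographs generally keep their original names, a ValueError from parse_photo_name is one simple way for a script to tell them apart from scans.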
For photographic collections from a single source, or with specific restrictions or crediting demands, we tend to create a manuscript collection instead. This gives us more control, and it makes sense to group these like materials together intellectually rather than leave them loose among many other photographic images.
If we receive born-digital photographs, we generally don't change the file names. If we are scanning, we use the file naming convention mentioned previously. You may have noticed it when you saw the file structure of the local publications and LLFA minutes.
For photographs, we describe items individually rather than as a collection or group of items, as we normally do in archival practice. The reason is that we need that granularity in the online searching environment. We also need individual control over each photo because we do not hold the rights to every photograph.
We devised a metadata spreadsheet that gives us intellectual control as we work through scanning the physical collection. This same spreadsheet can be used for digitally mastered images as well.
The highlighted file is an electronic version, while the others listed in this example are physical holdings.
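One way to keep such a metadata spreadsheet machine-readable is to store it as CSV. The sketch below assumes a hypothetical set of columns (file_name, title, date, photographer, rights) and a hypothetical rights value of "cleared"; it writes the rows and then lists which photographs could go online, reflecting the per-photo rights control described earlier. The sample rows are made up.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical columns; the actual spreadsheet's fields are not listed in the talk.
FIELDS = ["file_name", "title", "date", "photographer", "rights"]


def write_metadata(rows: Iterable[Dict[str, str]], stream: TextIO) -> None:
    """Write a header row, then one row per photograph."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def cleared_for_web(stream: TextIO) -> List[str]:
    """Return file names whose (hypothetical) rights value is "cleared"."""
    return [
        row["file_name"]
        for row in csv.DictReader(stream)
        if row["rights"].strip().lower() == "cleared"
    ]


if __name__ == "__main__":
    buf = io.StringIO()
    write_metadata(
        [
            {"file_name": "sample-001.tif", "title": "Sample image A",
             "date": "1955", "photographer": "unknown", "rights": "cleared"},
            {"file_name": "sample-002.tif", "title": "Sample image B",
             "date": "1962", "photographer": "unknown", "rights": "restricted"},
        ],
        buf,
    )
    buf.seek(0)  # rewind before reading the rows back
    print(cleared_for_web(buf))
```

Keeping one row per photograph mirrors the item-level description used for photographs, so a rights decision can be recorded and checked for each image.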
About every two years, we face major migrations and software upgrades, which can make reading older data created in that software more problematic. I have often run into formatting and other problems with older files. You need to plan not only for keeping backups of files but also for keeping the software needed to manage and access those backups and file versions. This is not an easy thing to impress upon your IT guru. If you are ingesting digital content that was created outside your control, this could be a major stumbling block.
Our collection management software, Archivists' Toolkit, is an open-source tool that is no longer supported. We are stuck using an older version of Java to run it, and the IT person wants us all on new Windows 10 PCs by the end of the year. So we are investigating a move to ArchivesSpace, but it's no longer free, and there will be learning curves and likely migration issues.
Where do the collections land on the NSDP chart?
We have dipped our toes and even begun to wade out into the deeper water of digital preservation, but we know that we cannot yet go and swim with sharks.