Emily Pfotenhauer June 24, 2014 BEST PRACTICES FOR MANAGING BORN DIGITAL CONTENT

Best Practices for Managing Born Digital Content

DESCRIPTION

Webinar presented for WiLS by Emily Pfotenhauer, Recollection Wisconsin Program Manager, June 24, 2014. Based on information from the Demystifying Born Digital reports from OCLC Research and the Digital Preservation Outreach and Education (DPOE) curriculum developed by the Library of Congress.

Page 1: Best Practices for Managing Born Digital Content

Emily Pfotenhauer, June 24, 2014

BEST PRACTICES FOR MANAGING BORN DIGITAL CONTENT

Page 2: Best Practices for Managing Born Digital Content

Emily Pfotenhauer, Recollection Wisconsin Program Manager, [email protected]

Slides and links: http://recollectionwisconsin.org/borndigital

BEST PRACTICES FOR MANAGING BORN DIGITAL CONTENT

Page 3: Best Practices for Managing Born Digital Content

http://oclc.org/research/activities/borndigital.html

Page 4: Best Practices for Managing Born Digital Content

The mission of the DPOE program of the Library of Congress is to encourage individuals and organizations to actively preserve their digital content, building on a collaborative network of instructors, contributors, and institutional partners.

http://www.digitalpreservation.gov/education/

DIGITAL PRESERVATION OUTREACH AND EDUCATION

Page 5: Best Practices for Managing Born Digital Content

identify

select

store

protect

manage

provide

DPOE Modules for Managing Digital Content Over Time

Page 6: Best Practices for Managing Born Digital Content

WHAT IS DIGITAL CONTENT?

Digital content is any material that is published or distributed in a digital form, including text, data, sound recordings, photographs and images, motion pictures, and software.

Digital materials created from analog sources

Born-digital materials

Digital materials you currently have or create – or expect to have – that you want to preserve.

Page 7: Best Practices for Managing Born Digital Content

Born-digital resources are items created and managed in digital form.

Digital photographs

Digital documents

Digital manuscripts

Harvested web content

Electronic records

Data sets

Digital art

Digital media publications

Defining “Born Digital,” Ricky Erway, OCLC Research http://oclc.org/content/dam/research/activities/hiddencollections/borndigital.pdf

DEFINING “BORN DIGITAL”

Page 8: Best Practices for Managing Born Digital Content

Everyone is

creating digital content

distributing digital content

using digital content

And we are responsible for managing digital content

DIGITAL REALITY IN 2014

http://digitalbevaring.dk

Page 9: Best Practices for Managing Born Digital Content

WHAT’S THE PROBLEM?

Increasing amounts of digital assets are arriving on our doorstep

The digital assets arrive in all file formats and on all kinds of media

Time sensitive: the longer we wait, or the longer our donors wait, the greater the chance that something will be unreadable

Page 10: Best Practices for Managing Born Digital Content

Who takes the lead? What can I do? Where do I start?

Too technical (I don’t understand...)

Too daunting (I don’t have time...)

WHAT ARE THE CHALLENGES?

http://digitalbevaring.dk

Page 11: Best Practices for Managing Born Digital Content

Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.

Working Group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007

http://www.ala.org/alcts/resources/preserv/defdigpres0408

DIGITAL PRESERVATION

Page 12: Best Practices for Managing Born Digital Content

Digital materials on physical media (CDs, flash drives, floppy disks, etc.) have been stored along with other collection materials without having been copied, preserved, or made accessible.

A TYPICAL SCENARIO

Page 13: Best Practices for Managing Born Digital Content

WHAT COULD POSSIBLY GO WRONG?

Page 14: Best Practices for Managing Born Digital Content

Do no harm

Don’t do anything that prevents future action and use

Take action

Document what you do

FIRST STEPS: FOUR ESSENTIAL PRINCIPLES

Page 15: Best Practices for Managing Born Digital Content

Identifying content is a first step to planning for current and future preservation needs

Ask: what content do I have, will I have, might I have, must I have?

An inventory is the best way to identify what content you have now and to raise awareness in your institution.

DPOE MODULE 1: IDENTIFY

http://digitalbevaring.dk

Page 16: Best Practices for Managing Born Digital Content

Good preservation decisions are based on an understanding of the possible content to be preserved

Not all digital content can or should be preserved

Preservation requires an explicit commitment of resources

WHY DO WE IDENTIFY CONTENT?

Page 17: Best Practices for Managing Born Digital Content

1. Identify and locate existing holdings.

2. Count and describe digital media within each collection.

3. Remove media from collection (retain order with photographs or separator sheets).

4. Assign inventory number to each physical piece.

5. Record anything that is known about the hardware, operating systems, and software used to create the files.

6. Calculate total amount of data (estimate).

7. Re-house physical media in suitable storage.

FIRST STEPS: CREATE AN INVENTORY

Page 18: Best Practices for Managing Born Digital Content

Medium (6 CDs, 1 hard drive)

Format (pdfs, docs)

File Size (be consistent - MB, GB or TB)

Identifying information found on labels such as creator, title, description of contents and dates

Expected future growth, if any

COUNT AND DESCRIBE
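
The inventory fields listed on this slide could be captured in a simple spreadsheet-style file. The following is a minimal Python sketch, not part of the original webinar; the column names, the sample records, and the choice to record every size in MB are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class MediaItem:
    """One physical piece of digital media found in a collection (hypothetical schema)."""
    inventory_number: str   # number assigned to each physical piece
    collection: str
    medium: str             # e.g. "CD", "hard drive"
    formats: str            # e.g. "pdf; doc"
    size_mb: float          # estimate, always in MB so the units stay consistent
    label_info: str         # creator, title, description, dates found on the label
    expected_growth: str    # "none" if no further content is expected


def total_size_mb(items):
    """Estimate the total amount of data across all inventoried media."""
    return sum(item.size_mb for item in items)


def write_inventory(items, out):
    """Write the inventory as CSV to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(MediaItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))


# Made-up sample records, for illustration only.
items = [
    MediaItem("2014-001-01", "Example Family Papers", "CD", "pdf; doc",
              650.0, "Letters, 1998-2003", "none"),
    MediaItem("2014-001-02", "Example Family Papers", "hard drive", "jpg",
              120000.0, "Photographs", "none"),
]
buffer = io.StringIO()
write_inventory(items, buffer)
print(buffer.getvalue())
print("Estimated total (MB):", total_size_mb(items))
```

A plain CSV file is used here because it opens in any spreadsheet program and can be printed and kept with the physical collection.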

Page 19: Best Practices for Managing Born Digital Content

Prioritize for further processing based on:

Significance and use of overall collection

Danger of loss of content (degradation) due to age or type of media

Uniqueness – not replicated elsewhere

Quantity of digital content

DPOE MODULE 2: SELECT

Page 20: Best Practices for Managing Born Digital Content

Cost: storage may be cheap, management is not…especially over time

Not all digital content may be appropriate for your organization to preserve. Matching mission to content

Keeping delivery and access manageable and sustainable

WHY SELECT CONTENT TO PRESERVE?

Log jam on the St. Croix River, 1886. Wisconsin Historical Society, WHi-2364

Page 21: Best Practices for Managing Born Digital Content

Ask yourself which digital content is

most significant to your organization?

most extensive?

most requested/used?

easiest?

oldest?

newest?

mandated?

at risk?

SETTING PRIORITIES

Postal workers sorting mail, 1955. Wisconsin Historical Society, WHi-36392

Page 22: Best Practices for Managing Born Digital Content

Communication is key, particularly when content comes from external creators

Keep content creators in the conversation

Arrange a convenient time for them to talk about your preservation plans

Identify list of materials to review with them

Document the results and send them a copy

Sample policy: Minds@UW http://uwdcc.library.wisc.edu/minds/faq.shtml

INCLUDE CONTENT CREATORS

Page 23: Best Practices for Managing Born Digital Content

THEN WHAT?

Steps for transferring born-digital content from media you can read in-house:

1. Use a “clean” computer.

2. Use a write blocker.

3. Insert source media.

4. Create a disk directory.

5. Copy files from media to the directory.

6. Generate a copy of the directory.

7. Generate and record a checksum.

8. Create a readme file.

9. Copy the directory to trustworthy archival storage.

10. Return the original physical media to storage.

11. Create or update any associated descriptive tool(s).

Page 24: Best Practices for Managing Born Digital Content

Dedicated computer

Regularly scanned with up-to-date antivirus software

Non-networked

STEP 1: CLEAN WORKSTATION

UW-Madison Archives

Page 25: Best Practices for Managing Born Digital Content

Prevents the computer from altering file content and metadata (e.g., date, creator)

Do not open files until after transfer

STEP 2: WRITE BLOCKER

https://www.flickr.com/photos/joncrel/6285946610/

Page 26: Best Practices for Managing Born Digital Content

Do not attempt to open any files.

Examine media for cracks, breaks, etc.

Remove any sticky notes or anything else that could become loose.

STEP 3: INSERT SOURCE MEDIA

bitcurator.net

Page 27: Best Practices for Managing Born Digital Content

Create a directory on the clean machine for the current project.

Within the directory, create sub-directories:

Master Folder (to hold the master copy of the file)

Working Folder (to hold working copies of the master copy)

Documentation Folder (to hold metadata and other information associated with the project)

STEP 4: CREATE A DISK DIRECTORY
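
The three sub-directories could be created by hand or with a short script. The sketch below is one possible approach and was not shown in the webinar; the folder names `master`, `working`, and `documentation`, the project name, and the use of a temporary base folder are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names matching the three sub-directories on the slide.
SUBDIRS = ("master", "working", "documentation")


def create_project_dirs(base, project_name):
    """Create a project directory with master, working, and documentation folders."""
    project = Path(base) / project_name
    for name in SUBDIRS:
        # exist_ok=True makes the function safe to run twice on the same project.
        (project / name).mkdir(parents=True, exist_ok=True)
    return project


# Demonstration in a throwaway temporary folder; on a real workstation the
# base would be a chosen location on the clean machine.
demo_base = tempfile.mkdtemp()
project = create_project_dirs(demo_base, "2014-001_example_papers")
print(sorted(p.name for p in project.iterdir()))
```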

Page 28: Best Practices for Managing Born Digital Content
Page 29: Best Practices for Managing Born Digital Content

Copy files from the source media to the master folder

Copy files individually or in groups

-OR-

Create a disk image

Disk image = single file containing an authentic copy of a disk’s contents, retaining original metadata and file system structure

After transfer from source media, make a second working copy – OK to open these files

STEP 5: COPY FILES
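
For the file-copy option (not the disk-image option), the transfer could be scripted roughly as follows. This is a hedged sketch rather than the presenter's procedure: the function names and sample files are hypothetical, and it assumes the source media is already mounted behind a write blocker. `shutil.copy2` attempts to preserve timestamps, but unlike a disk image it does not retain the file system structure or all original metadata.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_master(source_root, master_dir):
    """Copy every file under source_root into master_dir, keeping the folder layout."""
    source_root = Path(source_root)
    master_dir = Path(master_dir)
    copied = []
    for src in sorted(source_root.rglob("*")):
        if src.is_file():
            dest = master_dir / src.relative_to(source_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also tries to keep modification times
            copied.append(dest)
    return copied


def make_working_copy(master_dir, working_dir):
    """Duplicate the master folder so that only the working copies are ever opened."""
    shutil.copytree(master_dir, working_dir, dirs_exist_ok=True)


# Demonstration with made-up files standing in for the source media.
source = Path(tempfile.mkdtemp())
(source / "letters").mkdir()
(source / "letters" / "a.txt").write_text("hello")

project = Path(tempfile.mkdtemp())
copied = copy_to_master(source, project / "master")
make_working_copy(project / "master", project / "working")
print([p.name for p in copied])
```

The `dirs_exist_ok` argument requires Python 3.8 or later.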

Page 30: Best Practices for Managing Born Digital Content

Generate a copy of the disk directory information

File names

File sizes

File extensions

Dates

Store a digital copy in the project documentation folder

Print a copy to keep with the physical collection

STEP 6: COPY THE DISK DIRECTORY INFO
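
A directory listing with these four pieces of information could be generated with a script along these lines. This is an illustrative sketch only: the column names, the sample file, and the choice of last-modified time in UTC as the recorded date are assumptions, not something specified in the webinar.

```python
import csv
import io
import tempfile
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["name", "size_bytes", "extension", "modified_utc"]


def directory_listing(root):
    """Return one row per file: relative name, size, extension, and modified date."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "name": path.relative_to(root).as_posix(),
                "size_bytes": info.st_size,
                "extension": path.suffix.lower(),
                "modified_utc": modified.isoformat(timespec="seconds"),
            })
    return rows


def write_listing(rows, out):
    """Write the listing as CSV so it can be stored digitally and printed."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Demonstration on a throwaway folder containing one made-up file.
master = Path(tempfile.mkdtemp())
(master / "report.PDF").write_bytes(b"12345")
rows = directory_listing(master)
listing = io.StringIO()
write_listing(rows, listing)
print(listing.getvalue())
```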

Page 31: Best Practices for Managing Born Digital Content

Checksums (aka “hash sums”) are created by programs running an algorithm against the contents of a file. (There are many free utilities that will perform this function for you.)

The resulting checksum is a short sequence of letters and/or numbers that uniquely identifies that file (think “electronic fingerprint”).

STEP 7: RUN CHECKSUMS

Unix cksum utility
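
As one illustration of what such a utility does, the sketch below computes a checksum with Python's standard `hashlib` module. Note the substitution: the slide shows the Unix `cksum` utility, while this example uses the SHA-256 algorithm instead. The function name and sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration: two files with identical contents and one that differs.
folder = Path(tempfile.mkdtemp())
(folder / "original.txt").write_bytes(b"meeting minutes")
(folder / "same_contents_other_name.txt").write_bytes(b"meeting minutes")
(folder / "altered.txt").write_bytes(b"meeting minutez")

original = sha256_of(folder / "original.txt")
duplicate = sha256_of(folder / "same_contents_other_name.txt")
altered = sha256_of(folder / "altered.txt")
print(original == duplicate)  # same contents, different file name
print(original == altered)    # a single changed character
```

The checksum depends only on the bytes inside the file, which is why the file name plays no part in the result.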

Page 32: Best Practices for Managing Born Digital Content

Checksums help maintain the INTEGRITY of your collections because they will tell you if things change over time.

If two files are exactly the same, the checksums of those files will also be exactly the same (generally speaking).

If a file becomes corrupted, degraded or is changed in some way, the next time you run the utility on it, the checksum will change.

WHY IS THIS A GOOD THING?

Page 33: Best Practices for Managing Born Digital Content

Things that will NOT affect checksums

Moving items from one place to another

Changing the file name

Run on the master files when a collection is completed

Set up a schedule to run “verify checks” periodically

CHECKSUMS: THINGS TO REMEMBER
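
A periodic “verify check” could compare current checksums against a manifest recorded when the collection was completed. The sketch below shows one way to do that and is an assumption-laden illustration, not the webinar's tooling: the manifest layout (relative path mapped to SHA-256 checksum) and the function names are hypothetical, and each file is read fully into memory for brevity.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's relative path to the SHA-256 checksum of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root, manifest):
    """Compare the files under root with a previously recorded manifest."""
    current = build_manifest(root)
    changed = sorted(p for p in manifest if p in current and current[p] != manifest[p])
    missing = sorted(p for p in manifest if p not in current)
    return {"changed": changed, "missing": missing}


# Demonstration: record a manifest, then simulate corruption of one file.
master = Path(tempfile.mkdtemp())
(master / "a.txt").write_bytes(b"alpha")
(master / "b.txt").write_bytes(b"bravo")
manifest = build_manifest(master)

clean_report = verify(master, manifest)
(master / "b.txt").write_bytes(b"brav0")   # stands in for silent corruption
damaged_report = verify(master, manifest)
print(clean_report)
print(damaged_report)
```

Because this manifest is keyed by path, a renamed or moved file would be reported as missing even though its checksum is unchanged; matching on the checksum value instead would locate it.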

Page 34: Best Practices for Managing Born Digital Content

Leave yourself (and others) some breadcrumbs

Brief description of contents, any retention schedule, naming conventions, steps taken in transfer

Store the file in the project documentation folder and store a printout of the readme file with the physical collection materials

STEP 8: CREATE A README FILE
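
A readme can be typed by hand; for repeated transfers, a small script can keep the layout consistent. The sketch below is only an illustration: the file name `readme.txt`, the section labels, and the sample values are assumptions, not a format prescribed in the webinar.

```python
import tempfile
from datetime import date
from pathlib import Path


def write_readme(doc_dir, description, retention, naming, steps, when=None):
    """Write a plain-text readme into the project documentation folder."""
    when = when or date.today()
    lines = [
        f"README written {when.isoformat()}",
        "",
        f"Contents: {description}",
        f"Retention schedule: {retention}",
        f"Naming conventions: {naming}",
        "",
        "Steps taken in transfer:",
    ]
    lines += [f"  {number}. {step}" for number, step in enumerate(steps, start=1)]
    path = Path(doc_dir) / "readme.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Demonstration with made-up values.
documentation = Path(tempfile.mkdtemp())
readme = write_readme(
    documentation,
    description="Correspondence copied from 6 CDs",
    retention="Permanent",
    naming="Inventory number, underscore, original file name",
    steps=["Copied files to master folder", "Generated checksums"],
    when=date(2014, 6, 24),
)
print(readme.read_text(encoding="utf-8"))
```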

Page 35: Best Practices for Managing Born Digital Content

Copy the directories containing the master files and project documentation to trustworthy archival storage

Store a second copy of the files in a different physical location

May delete working files at this time

STEP 9 : TRANSFER TO SECURE LOCATION

Page 36: Best Practices for Managing Born Digital Content

STEP 10: RETURN ORIGINAL TO STORAGE

Return original source media to appropriate storage

- OR –

Destroy the originals using a secure method

Page 37: Best Practices for Managing Born Digital Content

Inventory as well as any finding aid, collection-level record and/or accession record

Include steps taken during transfer and the current location(s) of the files

STEP 11: CREATE OR UPDATE ANY ASSOCIATED DESCRIPTIVE TOOL(S)

http://digitalbevaring.dk

Page 38: Best Practices for Managing Born Digital Content

Do no harm

Don’t do anything that prevents future action and use

Take action

Document what you do

REVIEW: FOUR ESSENTIAL PRINCIPLES

Page 39: Best Practices for Managing Born Digital Content
Page 40: Best Practices for Managing Born Digital Content

The Signal: Library of Congress digital preservation blog http://blogs.loc.gov/digitalpreservation/

Minnesota State Archives – Electronic Records Management Resources http://www.mnhs.org/preserve/records/electronicrecords.php

Practical E-Records blog http://e-records.chrisprom.com

Digital Curation Exchange http://digitalcurationexchange.org

Digital Curation Bibliography http://digital-scholarship.org/dcbw/dcb.htm

FURTHER RESOURCES

Page 41: Best Practices for Managing Born Digital Content

Emily Pfotenhauer, Recollection Wisconsin Program Manager, [email protected]

Slides and links: http://recollectionwisconsin.org/borndigital

THANK YOU!