Distributed Digital Preservation ETD Workshop, 15th International Symposium on ETDs, Lima, Peru, Tuesday, September 11, 2012, Dr. Martin Halbert and Dr. Guillermo Oyarce


Distributed Digital Preservation ETD Workshop

15th International Symposium on ETDs, Lima, Peru

Tuesday, September 11, 2012

Dr. Martin Halbert

Dr. Guillermo Oyarce


DDP Workshop for ETDs

Workshop Overview

The move to digital theses and dissertations greatly enhances the accessibility and sharing of graduate student research, but it also raises grave concerns about the potential ephemerality of these digital resources.

How can institutions best ensure that the electronic theses and dissertations that they acquire from students today will be available to both current and future researchers?

This workshop will provide attendees with a foundation in ETD lifecycle management, focusing on both workflow design and technology approaches.

It will also discuss the growing role of international collaboration in the preservation of these digital assets, with a special focus on the approach taken by members of the international MetaArchive Cooperative, which has been in operation for nine years.


Presenters

Martin Halbert

Dean of Libraries, University of North Texas

President, MetaArchive Cooperative

Guillermo Oyarce

Professor, College of Information, University of North Texas


Attendees

We would like to have a shared understanding of who is attending the workshop. Very briefly (1 min or less), would each person say:

1. Please state your name and institution.

2. Does your university currently accept ETDs, or are you considering an ETD program?

3. Are there any specific topics you want to hear about?


Agenda

9:00 – 9:15 Welcome, introductions, overview of workshop

9:15 – 10:00 ETD lifecycle management and the need for distributed digital preservation

10:00 – 10:45 Collaboration between NDLTD and MetaArchive for preservation of ETDs

10:45 – 11:00 Break

11:00 – 12:00 ETD lifecycle management best practices project

12:00 – 12:30 Discussion, questions, and answers


ETD Lifecycle Management and the Need for Digital Preservation Programs


Estimating the Growth of ETD Institutional Programs Worldwide


The Networked Digital Library of Theses and Dissertations (NDLTD) announced in 2010 that more than one million ETDs are now available worldwide

The NDLTD further notes that institutions worldwide are rapidly adopting ETD programs for many reasons, but especially to improve graduate education and access to research findings

Source: 2010 NDLTD Announcement: http://www.ndltd.org/find/ndltd-union-catalog-surpasses-one-million-electronic-theses-and-dissertations/


CNI 2010 North American Survey


North American Survey (chart):

Have an ETD Program: 73%

Planning an ETD Program: 6%

No ETD Program: 22%

Source: 2010 Report of the Coalition for Networked Information, http://www.arl.org/bm~doc/rli-270-etds.pdf

The Coalition for Networked Information (CNI) conducted a study of North American institutions in 2010

Findings showed that most institutions already have an ETD program, and that ETD programs are still being adopted

The survey indicated that institutions are now considering updating their institutional ETD policies, especially regarding embargoes


Growing ETD Associations


101 international members

U.S. Associations


Growth of ETDs in South America

Many new ETD repositories throughout South America

New projects to index ETDs at multiple institutions

Growing adoption at universities


However, Digital Information can be Ephemeral

Web URLs have a lifecycle and are ephemeral for many reasons; studies estimate that the average lifespan of a web page is only 44-75 days

Digital information can be replicated indefinitely, but only as long as systematic procedures and policies are in place for the entire lifecycle

Unlike print theses and dissertations, ETDs therefore require the implementation of new digital preservation procedures to survive in the long term


Key Challenges

Can we ensure that ETDs acquired from students today will be available to future researchers? In 10 years? In a century?

How will institutions address the entire life cycle of ETDs?

How will libraries identify and institutionalize the best long-term curatorial practices for this important genre of digital content?


What is Digital Preservation? Definitions:

The series of managed activities necessary to ensure continued access to digital materials for as long as necessary. – UK Digital Preservation Coalition

Digital preservation combines policies, strategies and actions that ensure access to digital content over time. – American Library Association


What is “Life-Cycle Management”?

A concept closely related to Digital Preservation:

“…the progressive technology and workflow requirements needed to ensure long-term sustainability of and accessibility to digital objects and/or metadata.”

- U.S. Library of Congress definition of “Life-Cycle Management”

- http://www.loc.gov/preservation/about/prd/presdig/preslifecycle.html


Digital Preservation Coalition (DPC) Definition of “Life-Cycle Management”

“…the need actively to manage the resource at each stage of its life-cycle and to recognise the inter-dependencies between each stage and commence preservation activities as early as practicable.”

http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts


Backups versus Digital Preservation

What is the difference between a schedule for tape backups and a digital preservation program?

Backups are tactical operations. Backups are typically stored in a single location (often nearby or collocated with the servers backed up) and are performed only periodically. Backups are designed to address short-term data loss via minimal investment of money and staff time resources. Backups are better than nothing, but not a comprehensive solution to the problem of preserving information over time.

Digital preservation is a strategic program. Preserving digital information over long periods requires systematic attention and planning rather than benign neglect or simple reactive daily operations.


Two Historical Strategies for Long-Term Survival of Information


Single Central Vault

Coordinated Distribution of Multiple Secure Copies


Distributed Digital Preservation (DDP) Programs

Engage in three main activities that distinguish them from other preservation approaches:

1. Replication of content,

2. Distribution of these replicated copies to distinct geographical locations, and

3. Network organization to connect these replicated copies through routine operations, including checksum comparisons and repair activities
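The checksum comparison and repair operations mentioned in item 3 can be illustrated with a minimal Python sketch. This is not how LOCKSS or any particular DDP network implements auditing; the cache names, the in-memory byte strings, and the simple majority-vote repair rule are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(caches: dict[str, bytes]) -> list[str]:
    """Compare checksums across replicated copies and repair outliers.

    `caches` maps a hypothetical cache name to its copy of one object.
    A copy whose checksum disagrees with the majority is overwritten
    with a majority copy. Returns the names of the repaired caches.
    """
    checksums = {name: sha256_hex(data) for name, data in caches.items()}
    majority_sum, votes = Counter(checksums.values()).most_common(1)[0]
    if votes <= len(caches) / 2:
        raise ValueError("no majority agreement; manual review needed")
    good_copy = next(d for n, d in caches.items() if checksums[n] == majority_sum)
    repaired = [n for n, s in checksums.items() if s != majority_sum]
    for name in repaired:
        caches[name] = good_copy
    return repaired


# Three replicated copies, one of which has been silently corrupted.
caches = {
    "cache_a": b"thesis bytes",
    "cache_b": b"thesis bytes",
    "cache_c": b"thesis bytEs",  # simulated corruption
}
repaired = audit_and_repair(caches)
```

In this sketch, replication alone is not enough: the copies only protect one another because they are routinely compared, which is the role of the network organization in item 3.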


Secure and Distributed Caches of Replicated Information

Why are the characteristics of geographic distribution and security so important? Because this strategy maximizes survivability of content in both individual and collective terms:

Security reduces the likelihood that any single cache will be compromised.

Distribution reduces the likelihood that the loss of any single cache will lead to a loss of the preserved content.

By creating collaborative networks of secure and distributed preservation caches, cooperative groups can also work together on more complex issues such as format migration


Motivations for DDP


Cooperative digital preservation efforts are one answer to a study jointly prepared by the US National Science Foundation (NSF) and the UK Joint Information Systems Committee (JISC) that found:

…new collaborative relationships that cross institutional and sector boundaries could provide important and promising ways to deal with the data preservation challenge. These collaborations could potentially help spread the burden of preservation, create economies of scale needed to support it, and mitigate the risks of data loss.

- Berman, F. and B. Schottlaender, “The Need for Formalized Trust in Digital Repository Collaborative Infrastructure,” NSF/JISC Repositories Workshop (April 16, 2007)

http://www.sis.pitt.edu/~repwkshop/papers/berman_schottlaender.html


Examples of Major DDP Collaborative Alliances


Individual Institutional Repositories versus DDP Alliances

What is the difference between an IR and a distributed digital preservation program?

The IR is a local means of managing information. The IR is an institutional approach aimed at operationally managing information flow within the institution and providing access. It typically does not attempt to securely cache prioritized content at multiple geographically dispersed sites.

DDP Alliances mobilize efforts of multiple institutions. A digital preservation program entails a geographically dispersed set of secure caches of critical information. A true digital preservation program will require multi-institutional collaboration and at least some ongoing investment to realistically address the issues involved in preserving information over time.


Collaboration Between NDLTD and MetaArchive for Preservation of ETDs


NDLTD Digital Preservation Initiative

In 2007 the NDLTD board of directors became concerned about the need for digital preservation of ETDs

Surveys were conducted to assess the status of ETD programs and preservation

Informed by these surveys, the NDLTD leadership approached the MetaArchive Cooperative for digital preservation services


2008 ETD Preservation Survey

Survey by Virginia Tech was designed to assess whether or not there was a need for an ETD preservation alliance

Received 95 responses from research universities, primarily in North America

Survey indicated great variation in repository infrastructures and formats accepted

Most surprising finding: 72% of responding institutions reported that they had no preservation plan for the ETDs they were collecting

Survey responses indicated widespread interest in participating in a collaborative digital preservation initiative

Survey: http://scholar.lib.vt.edu/staff/gailmac/ETDs2008PreservPaper.pdf


NDLTD/MetaArchive Alliance

Based on 2008 survey responses, NDLTD leadership negotiated a strategic alliance with the MetaArchive Cooperative for digital preservation purposes

A follow-up survey jointly conducted in 2009 by MetaArchive and NDLTD identified more specific needs for collaborative distributed digital preservation (DDP) efforts

Based on the findings of these two surveys, a DDP network for ETDs from NDLTD members was created by MetaArchive starting in 2008-2009


What is MetaArchive and what led to it?

The MetaArchive Cooperative arose from planning meetings by librarians and archivists in 2002-2003 discussing concerns about preserving digital archives

There was a feeling then that we needed to do something practical to help each other preserve our digital collections

Funding was sought from the U.S. Library of Congress for a national-scale digital preservation cooperative


MetaArchive Cooperative

An international distributed digital preservation cooperative for digital archives

A functioning DDP network for libraries and other cultural memory organizations

Established under the auspices of and with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP) of the U.S. Library of Congress

Provides training and models for other groups to establish similar distributed digital preservation networks

Fosters broader awareness of digital preservation issues

Sustained by cooperative fee memberships, LC contracts, and other sponsored funding


MetaArchive Home Page


MetaArchive Membership (and examples of members)

50+ institutions on 3 continents

United States (US Library of Congress, Indiana State University, University of North Texas, Folger Shakespeare Library, etc.)

United Kingdom (University of Hull)

Spain (Consorci de Biblioteques Universitaries de Catalunya)

Brazil (Pontifícia Universidade Católica do Rio de Janeiro)


MetaArchive Costs

Three membership levels:

Preservation members: USD $3,000/year

Sustaining members: USD $5,500/year

Collaborative members: USD $2,500/year plus $100/year per participating institution

Server cost: USD $4,600 / 3 years

Storage cost: USD $1/GB/year ($1k/TB/year)
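As a rough worked example of how these figures combine, the following Python sketch estimates a yearly total. It assumes that a member pays the membership fee, the server cost, and the storage cost together, and that the server cost is spread evenly over its three years; the function name is hypothetical, and the per-institution surcharge is left out for simplicity.

```python
# Figures taken from the slide above (USD).
MEMBERSHIP_USD_PER_YEAR = {
    "preservation": 3000,
    "sustaining": 5500,
    "collaborative": 2500,
}
SERVER_USD_PER_3_YEARS = 4600
STORAGE_USD_PER_GB_YEAR = 1


def estimated_annual_cost(level: str, storage_gb: float) -> float:
    """Estimate one year's cost for a membership level and storage size.

    Assumption: the server cost is amortized evenly over three years.
    """
    membership = MEMBERSHIP_USD_PER_YEAR[level]
    server = SERVER_USD_PER_3_YEARS / 3
    storage = STORAGE_USD_PER_GB_YEAR * storage_gb
    return membership + server + storage


# A preservation member storing 500 GB of ETDs.
cost = estimated_annual_cost("preservation", 500)
```

Under these assumptions, a preservation member with 500 GB of content would pay roughly USD $5,033 per year.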


MetaArchive DDP Network


A distributed digital preservation network for digital archives, based on LOCKSS software

286 TB network with 24 secure caches

950+ Archival Units preserved

Preserving collections of 21 members and more than 50 institutions in 4 countries

Provides preservation consulting and training


LOCKSS E-Journal Preservation Network Software

Developed at Stanford University by Vicky Reich and David Rosenthal

Enables libraries to preserve subscribed electronic journal content

Used by hundreds of libraries worldwide

MetaArchive adapted this software to use for preserving digital archives, including ETDs


Collection Variety

Format agnostic

Collections include:

ETDs

Newspapers

Images

Multimedia files

Datasets

etc.


NDLTD/MetaArchive ETD Preservation Network

NDLTD members can preserve ETDs in the MetaArchive DDP network through this strategic alliance

MetaArchive provides a coordination mechanism for ETDs to be safely stored in six locations on different continents

Low-cost solution for ETD digital preservation


Examples of NDLTD/MetaArchive Work on ETD Preservation

Studied lifecycle management problems that arise in ETD programs

Analyzed a range of ETD repository structures and developed network exchange mechanisms (IR examples: DSpace, CONTENTdm, ETD-db)

Provided simple addition mechanisms so that as new and embargoed ETDs are added, members are able to easily add them to the archive

Developed mechanisms to version content, so that if ETDs are changed or replaced, the changes are reflected in preservation copies

Determined the need for documented best practices for ETD preservation readiness, leading to a new project


Considerations for Prospective ETD Preservation Sites

A working partnership between college and libraries must be established with clear roles and responsibilities.

Quality metadata must be created. A problem is that many ETD programs have untrained students assign metadata using ad hoc metadata formats.

Folder and file structure in which ETD collections are stored is important, especially since preservation will be ongoing. There will be a need to ingest ETD files each new graduation cycle. This is easiest if the storage structure is built around the graduation cycle. Grouping by year may be a simple first step.
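The "grouping by year" first step can be sketched in a few lines of Python. This is only an illustration: the file names, the `etds` root folder, and the `plan_layout` helper are hypothetical, and the sketch only plans a layout without touching any files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_layout(etds: list[tuple[str, int]], root: str = "etds") -> dict[str, list[str]]:
    """Map each graduation-year folder to the ETD files it would hold.

    `etds` is a list of (filename, graduation_year) pairs.
    """
    layout: dict[str, list[str]] = defaultdict(list)
    for filename, year in etds:
        folder = str(PurePosixPath(root) / str(year))
        layout[folder].append(filename)
    return dict(layout)


layout = plan_layout([
    ("smith_thesis.pdf", 2011),
    ("garcia_dissertation.pdf", 2012),
    ("lee_thesis.pdf", 2012),
])
```

With a layout like this, each new graduation cycle adds a new folder, so a preservation ingest only needs to pick up the newest folder rather than rescanning the whole collection.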


Considerations for Prospective ETD Preservation Sites (cont.)

Issues of rights management and embargo must be managed well. Many institutions require the possibility of selective embargoed access, because they cannot immediately provide open access to all ETDs.

Choice of institutional repository software solution is important. This determines much functionality. Outsourcing to commercial vendors is sometimes tempting, but limits your control.

Collaboration with other institutional partners can be helpful, as the work of digital preservation can be divided among multiple partners. Taking an active collaborative role (like members in MetaArchive do) helps to ensure you are driving the solution of your problems.


Technology Development: Building on Top of LOCKSS as a Solution for Lifecycle Management

Conspectus Database (Original)

• Curators enter collection-level entries for collections

• Meant to be used for cooperative prioritization in DDP selection and decision-making activities

Second Generation Conspectus Database

• Integrates operation of all network functions

• Designed in concert with guidance from other private LOCKSS networks (PLNs) in ways that enable re-use

SAFE Audit Toolkit

• Designed by Data-PASS with feedback from MetaArchive

• Audit/monitoring framework

• Designed to work with LOCKSS and other systems (iRODS, etc.)


Examples of Archives in Subject and Genre Domains

Electronic Theses and Dissertations (inter-consortia strategic alliance with NDLTD)

Newspapers (digitized and born-digital)

Early Modern Literature (broad area, with Folger Shakespeare Library as cornerstone)

Transatlantic Slave Trade Historical Data

Additional archives regularly added


Active Collaborations with Other Efforts

NDLTD (IMLS-funded ETD project)

U.S. Library of Congress (national assessment of U.S. digital preservation projects)

Chronicles in Preservation (digital newspaper preservation project)

Data-PASS Alliance (developing in-common standard and tools for Private LOCKSS Network interoperation)

San Diego Supercomputer Center Chronopolis (PLN/SRB interoperation testing and bridges)

UNT CODA Microservices (UNT interoperation testing and bridges)


Different Kinds of MetaArchive Members

Preservation Members are the basic members, organizations responsible for the ongoing activity of preserving digital content. Preservation sites collectively comprise a preservation network.

Sustaining Members are preservation sites that wish to participate as leaders of the cooperative and serve on the Steering Committee.

Collaborative Members are large consortia that preserve digital archives for groups of smaller institutions (example: Consorci de Biblioteques Universitaries de Catalunya).


Other Catalytic Efforts by MetaArchive

Workshops (examples)

Policy and Planning workshop at iPres 2012

US Library of Congress Digital Preservation Outreach & Education (DPOE) workshops

Assisted in creation of other DDP networks

Alabama, Arizona, UK LOCKSS, etc.

Hosting the international DDP Frameworks working group


ETD Lifecycle Management Best Practices Project


IMLS ETD Lifecycle Management Project

There is a need to better understand, document, and address the lifecycle management challenges presented by ETDs to ensure that colleges and universities have the requisite knowledge to properly curate these new collections permanently.

A project to identify and document these best practices has been funded by the U.S. Institute of Museum and Library Services.

The project is now underway, and will be complete in October 2013. Early project findings will be shared in this section of the workshop. We would like your advice on how to best share this information with the international community.


IMLS ETD Lifecycle Management Project Partners

1. University of North Texas

2. Networked Digital Library of Theses and Dissertations (NDLTD)

3. Educopia Institute/MetaArchive Cooperative

4. Virginia Tech

5. Rice University

6. Boston College

7. Indiana State University

8. Pennsylvania State University

9. University of Arizona


IMLS ETD Lifecycle Management Project Goals

A. Dissemination of Guidance Documents for Lifecycle Management of ETDs

B. Production of ETD Lifecycle Management Tools

C. Creation of Educational Materials and Associated Workshop


Guidance Documents for Lifecycle Management of ETDs

1. Briefing on Access Levels and Embargoes of ETDs

2. Briefing on ETD Copyright Issues and Fair Use

3. Guidelines for Implementing ETD Programs - Roles & Responsibilities

4. Guidelines for Collecting Usage Metrics & Demonstrations of Value for ETD Programs

5. Overview of Formats, Complex Content Objects, and Format Migration Scenarios for ETDs

6. Overview of ETD Metadata & Lifecycle Event Record-Keeping for ETDs

7. Guide to ETD Program Cost Estimation and Planning

8. Guide to Options for ETD Programs


Software Micro-Services

The project will develop and disseminate a set of software tools to address specific needs in managing ETDs throughout their lifecycle.

ETD format recognition

PREMIS metadata event record-keeping

Virus checking

Digital drop box with metadata submission functionality
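To give a feel for what event record-keeping might involve, here is a minimal Python sketch that records a checksum calculation as an event. It is not one of the project's tools and does not produce a schema-valid PREMIS record; the dictionary keys are only loosely modeled on PREMIS event semantic units, and the function name and the `etd-0001` identifier are hypothetical.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def record_digest_event(object_id: str, content: bytes) -> dict:
    """Build a simplified, PREMIS-inspired event record for a checksum run.

    Assumption: a plain dict stands in for a real PREMIS XML record.
    """
    digest = hashlib.sha256(content).hexdigest()
    return {
        "eventIdentifier": str(uuid.uuid4()),
        "eventType": "message digest calculation",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": "success",
        "eventOutcomeDetail": f"sha256:{digest}",
        "linkingObjectIdentifier": object_id,
    }


event = record_digest_event("etd-0001", b"%PDF-1.4 example content")
```

Keeping one such record per action (ingest, virus check, format identification, migration) gives a repository an auditable history of what was done to each ETD and when.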


Educational Materials and Associated Workshop

Educational Materials

Workshop Syllabi

Training Handouts and Exercises

PowerPoint presentations

Full Workshop on ETD Lifecycle Best Practices

Will use these educational materials

Will be held in 2013, perhaps in conjunction with ETD 2013


Implementing ETD Programs - Roles & Responsibilities

Like any other project, implementing an ETD program requires that various stakeholders be identified, and the role and responsibility of each player be specified throughout the entire course of ETD management.

Effectively engaging stakeholders in project management, and successfully coordinating participants’ roles and responsibilities, are the keys that enable an ETD program to thrive over time. Without these crucial components, an ETD program could fail at the initial planning stage, or lack continued support for further development.

Different types of stakeholders have different interests and concerns:

Graduate schools

Academic libraries

University IT Office

Students

Carefully consider how to engage all stakeholders from the beginning.


Access Levels and Embargoes

One of the most contested topics in ETD program planning is the question of ETD embargoes and levels of access restriction (as evidenced by both the NDLTD/MetaArchive surveys and the 2010 CNI survey previously cited).

An “embargo” of an ETD means delaying public access to the ETD, either temporarily or permanently.

Different stakeholders are particularly concerned about embargoes. There is concern by some academic fields (notably the humanities) that depositing a thesis or dissertation in a public repository somehow constitutes publication and prevents students from subsequently developing their work into a book. (A recent NDLTD survey of publishers indicates that publishers do not consider this to be the case.)

Embargo options range from one extreme, embargoing no ETDs, to the other, embargoing all ETDs stored in the repository. This is one of the most important policy decisions for ETD program developers.
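The embargo definition above (no embargo, a temporary delay, or a permanent restriction) can be modeled in a few lines. This is only a minimal sketch: the `EmbargoPolicy` class, its field names, and the `is_publicly_accessible` function are hypothetical illustrations, not part of any repository tool discussed in this workshop.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EmbargoPolicy:
    """Hypothetical embargo settings for a single ETD."""
    release_date: Optional[date] = None  # a temporary embargo ends on this date
    permanent: bool = False              # a permanent embargo never ends


def is_publicly_accessible(policy: EmbargoPolicy, today: date) -> bool:
    """Return True if the ETD may be shown to the public on the given day."""
    if policy.permanent:
        return False
    if policy.release_date is None:
        # No embargo was requested, so access is immediate.
        return True
    return today >= policy.release_date
```

A repository could run a check like this whenever an ETD is requested, so that temporary embargoes lapse automatically without staff intervention.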

Slide 53

Copyright Issues and Fair Use

What understanding or agreement is in place at your institution? What rights does the university/college exert on student work? When does research belong to the university, and when does it not?

Will ETDs, as a type of student work, incorporate sponsored research? Is there an obligation to sign a university intellectual property (IP) agreement, or a need for an embargo?

Issues of plagiarism and intellectual property rights of others

Who on the campus can provide guidance about ETD copyright and fair use?

An ETD program may introduce students to the notion of themselves as authors and to their rights as authors, and give them experience with licenses, fair use, commercial publishers, etc.

An ETD program does a disservice to both students and the institution if it does not provide the information needed to make informed decisions on copyright.

Slide 54

Collecting Usage Metrics & Demonstrations of Value for ETD Programs

Libraries have a long history of evaluating and studying use of library resources and collections, and ETDs should be no exception.

Quantitative approaches are the most frequently used evaluation methods, and most often focus on download statistics. Numerical evidence of ETD usage is a very compelling indication of the utility of an ETD program.

Qualitative evaluation of ETD usage is less commonly performed, but can provide more nuanced information. These techniques involve collecting and studying a variety of empirical information, such as case studies and interviews, along with interactional and visual observations.

Usage reports of all kinds should be prominently featured on the ETD program website, and easily reviewed by all users of the service.

Usage data can make a strong case for ETD program support to university administrations.
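As a rough illustration of the quantitative approach described above, download events can be tallied per ETD. This is a sketch under stated assumptions: the `(etd_id, year_month)` event format and the `downloads_per_etd` function are hypothetical, and real repository logs will look different.

```python
from collections import Counter


def downloads_per_etd(events):
    """Count downloads per ETD.

    `events` is an iterable of (etd_id, year_month) tuples; this shape is
    an assumption made for the example, not a real log format.
    """
    return Counter(etd_id for etd_id, _ in events)
```

Calling `most_common()` on the returned counter gives a ranked list that could feed the usage reports published on the ETD program website.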

Slide 55

Formats, Complex Content Objects, and Format Migration Scenarios for ETDs

Many institutions may wish to mandate which formats are to be used for ETD deposits. For example, many ETD programs mandate that the primary item deposited be some form of PDF, sometimes with format checking of the specific characteristics of the PDF.

For any files that are included as supplementary files to the ETD itself, while some flexibility is necessary, the institution should consider providing guidelines to the students.

Both primary and supplementary files should be checked for format validity and viruses upon deposit. Ideally, a fixity check should be performed at some point in the deposit process (MetaArchive does this).

Format migrations are anticipated by many ETD repositories. Format migrations may be manually batched or automated, depending upon how the institution wishes to structure trigger events.
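The deposit-time checks mentioned above can be sketched as follows. This is a minimal illustration, not a description of how MetaArchive or any particular repository implements them: the function names are hypothetical, SHA-256 is an assumed choice of checksum, and the PDF test only inspects the file header rather than validating the whole file.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """Return True if the file still matches the digest recorded at deposit."""
    return sha256_of_file(path) == recorded_digest


def looks_like_pdf(path):
    """Cheap format check: PDF files begin with the bytes '%PDF-'."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"
```

Recording the digest at deposit and recomputing it later is what makes silent corruption detectable; a full format validation would need a dedicated tool in addition to the header check shown here.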

Slide 56

ETD Metadata & Lifecycle Event Record-Keeping for ETDs

Assigning metadata to ETDs is a critically important responsibility, and it structures much of the ongoing management of ETDs.

Metadata may be assigned by librarians, student authors, student workers, or a mix of all of these. The recommendation is for librarians to provide quality control.

The process and nature of ETD metadata creation may be heavily influenced by the repository tool. Metadata should always accompany an item if it is replicated (MetaArchive best practice).

An ETD-specific metadata scheme has been developed by NDLTD and is being updated on an ongoing basis for use by ETD repositories. Metadata for deposited ETDs should be as thorough as possible.

PREMIS is a metadata standard for tracking transitions in the lifecycle of digital objects. It can be used to update ETD records (the project is experimenting with this).
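Lifecycle event record-keeping of the kind described above can be sketched with a simple event log attached to each ETD record. This is a simplified illustration only: the `LifecycleEvent` and `EtdRecord` classes and their field names are hypothetical, loosely inspired by the idea of PREMIS events, and do not implement the PREMIS schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class LifecycleEvent:
    """One recorded transition in an ETD's lifecycle (hypothetical structure)."""
    object_id: str
    event_type: str       # e.g. "ingestion", "fixity check", "migration"
    outcome: str          # e.g. "success" or "failure"
    event_datetime: str   # ISO 8601 timestamp


@dataclass
class EtdRecord:
    """An ETD together with its accumulated event history."""
    object_id: str
    title: str
    events: List[LifecycleEvent] = field(default_factory=list)

    def log_event(self, event_type, outcome, when=None):
        """Append an event to the record and return it."""
        when = when or datetime.now(timezone.utc)
        event = LifecycleEvent(self.object_id, event_type, outcome, when.isoformat())
        self.events.append(event)
        return event
```

Keeping the event history with the record means it travels with the item when the item is replicated, in line with the practice of never separating metadata from the object.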

Slide 57

ETD Program Options and Cost Planning

There are a large number of options to consider in planning an ETD program, and costs of the program will depend on these many decisions.

Some institutions have completely outsourced the ETD program to external vendors. This may or may not be an attractive option: while it reduces the work required locally, it has significant costs and requires relinquishing some control to the vendor.

Many institutions manage ETDs as part of their larger institutional repositories, thereby combining infrastructures. Digital preservation programs for ETDs can be accomplished by partnering with other institutions, either international alliances (like MetaArchive) or consortia.

In every case, a careful plan should be prepared in advance, with staffing, system, and other costs identified. There will undoubtedly be unexpected changes, but the plan gives you a place to start.

Slide 58

International Information Exchange

The NDLTD is interested in fostering international exchange of information on ETD best practices through events like this workshop.

We are interested in hearing from you about what kinds of information would be most useful, whether the items discussed today or other topics.

If you would be interested in contributing to this international information exchange in some way (information about programs in your country, translation, etc.), please let us know and we will convey this to the NDLTD board.

Slide 59

Aligning National Approaches to Digital Preservation (ANADP) – Edited Volume

On May 23-25, 2011, more than 125 delegates from more than 20 countries gathered in Tallinn, Estonia, for the “Aligning National Approaches to Digital Preservation” conference.

This event explored how to create and sustain international collaborations to support the preservation of our collective digital cultural memory.

Organized and hosted by the US Library of Congress and others, this gathering established a strong foundation for future collaborative efforts in digital preservation.

This publication contains a collection of peer-reviewed essays that were developed by conference panels and attendees in the months following ANADP.

Above all, it highlights the need for strategic international collaborations to support the preservation of our collective cultural memory.

URL: http://educopia.org/publications/ANADP

Slide 60

Workshop Summary

ETD programs are being implemented by most universities

Effective ETD programs require attention to the entire lifecycle of the ETDs, from creation to long-term preservation.

Digital preservation programs are strategic commitments to the long-term survival of ETDs; collaboration with peer institutions is often helpful in this activity.

As part of such collaborations, many institutions have found it useful to share information on the most successful practices noted in establishing ETD programs.

This workshop is one such effort to share best practices; the NDLTD and MetaArchive would like to assist in other such endeavors.

Slide 61

Discussion, Questions, and Answers


Contact Information

Martin Halbert ([email protected])

Guillermo Oyarce ([email protected])

Katherine Skinner, Educopia/MetaArchive Director ([email protected])

Slide 63