Can a Consortium Build a Viable Preservation Repository?
Presentation at CNI
March 31, 2014
Bradley Daigle (APTrust – University of Virginia)
Stephen Davis (Columbia University)
Linda Newman (University of Cincinnati)
Suzanne Thorin (APTrust – University of Virginia)
Scott Turnbull (APTrust – University of Virginia)
www.aptrust.org
Academic Preservation Trust
Academic Preservation Trust, a consortium of 17 institutions, is taking a community approach in building and managing a repository infrastructure that will provide long-term preservation of the scholarly record. APTrust will also be a DPN first node.
APTrust Institutions
Columbia University
Johns Hopkins University
Indiana University
North Carolina State University
Penn State University
Stanford University
Syracuse University
University of Chicago
University of Cincinnati
University of Connecticut
University of Maryland
University of Miami
University of Michigan
University of North Carolina
University of Notre Dame
University of Virginia
Virginia Tech
APTrust is hosted by the University of Virginia, which fully supports 5 ½ staff, including space and equipment.
Program Director
Lead Engineer
Junior Engineer
Systems Engineer
Content Lead (1/2 time)
Membership Dues
Member dues: $20,000 annually
Supports partner meetings, conference travel, contract and cloud services, marketing, and the web site
What is the problem we are trying to solve?
Columbia University
University of Cincinnati
University of Virginia
Columbia University – Use Case 1
Columbia University Libraries / Information Services has made commitments …
to granting agencies to provide long-term digital archiving for digital content created with grant funds
to third-party content creators to provide permanent access to born-digital content acquired from them
to continuing to collect and preserve archival collections, now partly or wholly born-digital content
to permanently preserve University-generated archival and research content
Columbia University – Use Case 2
We must preserve the content of …
Local Digitization Projects
Preservation-Related Digitization
Institutional Repository / Data Sets
Born Digital Archival Content
Archived Web Sites
Super Dark Archives – highly secure
Columbia University – Questions
Why create our own single-institution long-term preservation repository?
Why divert scarce existing CUL/IS internal equipment funds to storage on a permanent basis?
Why divert scarce existing CUL/IS staff time to creation, enhancement and maintenance of our own local preservation repository, permanently?
Why undergo the costs and staff investment in obtaining local TRAC certification?
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: We have digital collections where the original source material has deteriorated or is about to be intentionally destroyed (magnetic tapes, nitrate negatives considered flammable). The digital object is THE ONLY object.
Magnetic tape image by Daniel P. B. Smith. Released under the GNU Free Documentation License. http://en.wikipedia.org/wiki/File:Magtape1.jpg
Nitrate negative from Cincinnati Subway and Street Improvements (digital collection) http://drc.libraries.uc.edu/handle/2374.UC/702759
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: We just moved a repository system from Columbus, Ohio to our Cincinnati campus.
10 TB of data, in 16 different VMDKs (virtual machine disk images), was transferred over the internet.
Checksums were created for each VMDK and verified upon receipt, some taking 24 hours to calculate.
Checksums were also created for more than one million files, compared with the information in the repository database, and re-compared after the storage format was changed (from VMDK to NFS).
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: (continued)
We decided to test a full backup and restore. This took over a week, and we discovered that 16 of our digital assets were corrupt. We diagnosed the cause, adjusted, and repeated without error – but if we had not been comparing before and after checksums of all files, we would not have known about the corruption. This process took 1.5 months and offered a striking example of the care that must be taken to avoid losing data when moving large amounts of it.
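The before-and-after checksum comparison described above can be sketched in a few lines. This is a minimal illustration only, not the tooling the University of Cincinnati actually used: the function names, the choice of SHA-256, and the report layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Stream a file through a hash so large disk images never load fully into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_checksum(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(before, after):
    """Report files that went missing, appeared, or changed between two manifests."""
    shared = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "unexpected": sorted(set(after) - set(before)),
        "corrupt": sorted(name for name in shared if before[name] != after[name]),
    }
```

A manifest built before a move or restore and another built afterwards can be passed to `compare_manifests`; any entry under `corrupt` is a file whose bits changed in transit, which is the kind of silent damage the restore test above uncovered.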
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: Our credibility is at stake. We want to be believed.
Photograph; President Nixon with Elvis Presley; 20 Dec 1970; Richard Nixon Presidential Library and Museum, Yorba Linda, California. http://www.nixonlibrary.gov/forresearchers/find/av/photo/images/12_20_70_3.gif
University of Cincinnati – Use Case
Question: Why is digital preservation important to us?
Answer: (continued)
We are promoting a new digital repository to our faculty. Its raison d'être – why researchers should deposit their digital assets in this repository rather than, or in addition to, several short-term delivery systems on our campus – is long-term persistence.
We have promised that their assets will also be preserved in a dark archive such as the Academic Preservation Trust. We have stated that preservation means bit-level integrity and format migration.
We have asserted that the Libraries’ traditional mission of preservation of the cultural record now applies to the digital scholarly record.
University of Virginia Use Case
Integral part of our preservation and curatorial landscape
Soup to nuts process for analogue materials
◦ Selection
◦ Digitization
◦ Management
◦ Stewardship
UVa - continued
Born Digital
◦ It is all about transfer
◦ Disk images awaiting arrangement
◦ Need an I/O space
◦ Digital Scholarship
Wish we had this years ago
UVa Landscape
Local disk (please only temporary) / scratch disk
Spinning disk – still only backup
Local HSM – local tape backup
APTrust – more robust preservation actions
DPN – dark archive
Basic Technology Goals
Simple submission packaging – BagIt
Strong Chain of Custody – Logging
Format agnostic basic preservation – Fixity
Strong auditing and reporting – PREMIS
Easily reference items between systems – Identifiers
Simple distribution package for restoration – BagIt
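To make the BagIt and fixity goals concrete, here is a minimal sketch of writing and validating a bag using only the Python standard library. It is an illustration under stated assumptions, not APTrust's packaging code: `make_bag` and `validate_bag` are hypothetical helpers, the payload is passed in as an in-memory dict for brevity, SHA-256 is an arbitrary choice of manifest algorithm, and the version declaration assumes BagIt 0.97.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, payload, algorithm="sha256"):
    """Write payload files under data/ plus the bag declaration and a payload manifest.

    payload maps relative file names to bytes (a simplified, hypothetical input).
    """
    bag_dir = Path(bag_dir)
    (bag_dir / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = bag_dir / "data" / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        checksum = hashlib.new(algorithm, content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{name}")
    # Bag declaration; the version number here is an assumption for the example.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / f"manifest-{algorithm}.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def validate_bag(bag_dir, algorithm="sha256"):
    """Return payload paths that are missing or whose checksum disagrees with the manifest."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / f"manifest-{algorithm}.txt").read_text(encoding="utf-8")
    failures = []
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
        elif hashlib.new(algorithm, target.read_bytes()).hexdigest() != expected:
            failures.append(rel_path)
    return failures
```

Because the manifest travels with the payload, the same check works at submission and at restoration, which is why one package format can serve both ends of the workflow.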
Flow of Content in APTrust
[Diagram: A Submission Bag holds metadata (tag files) and preservation files (data/File1, data/File2, data/File3). Ingest: the bag is broken apart and managed as separate, related Fedora objects (one Intellectual Object with Generic File1, Generic File2, and Generic File3). Restore: the content is repackaged to the same bag format. DPN Bags: content is bagged separately in DPN to support versioning.]
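The ingest and restore flow in the diagram can be modelled as a round trip. The sketch below is an assumption-laden illustration: the class names echo the diagram labels, but the fields, the identifier scheme, and the in-memory dict standing in for a bag are all hypothetical, and nothing here uses the Fedora API.

```python
from dataclasses import dataclass, field


@dataclass
class GenericFile:
    identifier: str  # hypothetical scheme: "<object identifier>/<path inside the bag>"
    path: str
    content: bytes


@dataclass
class IntellectualObject:
    identifier: str
    metadata: dict
    files: list = field(default_factory=list)


def ingest(bag_name, bag):
    """Break a submission bag apart into one object plus separately managed files.

    bag is a plain dict mapping bag-relative paths to bytes. Paths under data/
    are payload; everything else is treated as a tag (metadata) file.
    """
    obj = IntellectualObject(identifier=bag_name, metadata={})
    for path, content in sorted(bag.items()):
        if path.startswith("data/"):
            obj.files.append(GenericFile(f"{bag_name}/{path}", path, content))
        else:
            obj.metadata[path] = content
    return obj


def restore(obj):
    """Repackage an object and its files into the same bag layout it arrived in."""
    bag = dict(obj.metadata)
    for generic_file in obj.files:
        bag[generic_file.path] = generic_file.content
    return bag
```

Keeping each file as its own record, with an identifier derived from the object, is what lets files be audited or versioned individually while still allowing the original bag to be reassembled on request.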
Challenges
Abstracting away from specific repository software
Identifying content across distributed systems
Scaling solutions are still a mixed bag
Managing dependencies in a consortium
Deleting content requires some more work
Sustainability of Service
Common development frameworks – Hydra
Use available cloud services – AWS
Align with evolving preservation ecosystem – OAIS & DDP
◦ Fedora 4
◦ Standards like OAIS and DDP
APTrust and TRAC Certification
APTrust is committed to working toward TRAC certification.
APTrust is the first ever repository to be built from the ground up taking TRAC into account.
A Certification Working Group has been established and will be advising and consulting with the APTrust staff and partners on TRAC objectives.
Initial development work is proceeding at the level of Digital Object Management and Infrastructure.
Examples of TRAC Requirements
“The repository shall have an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.”
“The repository shall have short- and long-term business planning processes in place to sustain the repository over time.”
“The repository shall have contracts or deposit agreements which specify and transfer all necessary preservation rights, and those rights transferred shall be documented.”
“The repository shall have the appropriate number of staff to support all functions and services.”
“The repository shall have and use a convention that generates persistent, unique identifiers.”
Academic Preservation Trust – part of the evolving national digital preservation infrastructure
“The Task Force envisions the development of a national system of digital archives, which it defines as repositories of digital information that are collectively responsible for the long-term accessibility of the nation’s social, economic, cultural and intellectual heritage instantiated in digital form.”
Preserving Digital Information. Report of the Task Force on Archiving of Digital Information, commissioned by The Commission on Preservation and Access and the Research Libraries Group. May 1, 1996. Executive Summary, iii.