A Digital Preservation Repository for Duke University Libraries




Jim Coble
Digital Repository Developer
jim.coble@duke.edu
Open Repositories 2013

Duke University
  • Research university in Durham, NC, USA
  • 14,500 students, graduate and undergraduate

Duke University Libraries
  • Centrally administered library system
  • 240 staff
  • 6 million+ volumes
  • Professional school libraries serving schools of Business, Law, Divinity, and Medicine

Initial Goal: Preservation Repository
  • Focus: Preservation Infrastructure
      • Improve our processes around preservation of digital assets
      • Reduce initial complexity by ignoring discovery and access issues
  • First Use Case: Digital Collections Program
      • Familiar with this content
      • Descriptive and technical metadata already exists
      • Separate discovery and access interface already exists

Digital Collections Program
  • Digitized content, in-house and out-sourced
  • 380,000 archival master files (~ 20 TB)
  • Primarily still images, with some audio and video
  • Locally developed public access interface
      • http://library.duke.edu/digitalcollections/

Current Scenario (Typical)
  • Archival master files
      • Produced by library's Digital Production Center (DPC)
      • Stored on filesystem
      • ACE-AM for periodic checksum validation
  • Descriptive metadata
      • Produced by Cataloging and Metadata Services department
      • Maintained in CONTENTdm (or elsewhere)
  • Technical metadata
      • Generated and maintained by DPC
  • Nothing ties these elements together except local knowledge and a DPC identifier

Initial Project Goal
  [Diagram labels: Descriptive Metadata; DPC Technical Metadata; Archival Master Files; Preservation Repository]

Technology
  • Fedora Commons Repository
  • Hydra Project Framework
      • Fedora (repository)
      • Solr (index)
      • Blacklight (discovery and access)
      • Hydra-Head (object creation / management)

Resources
  • Experience on prior project (abandoned before production)
      • Fedora
      • Modeling digital collections content
  • Two developers
      • Part-time, though proportion of time increased throughout this project
      • Web application development experience (Django/Python, Java servlets)
      • No prior Ruby or Rails experience

Timeline
  • Spring 2012: Prototype using Fedora command line utilities and Django using found time
  • June 2012: Project formally launched
  • July 2012: OR 2012; growing interest in Hydra Project
  • October 2012: HydraCamp at Penn State; Hydra-based development begins in earnest
  • February 2013: Initial pilot completed
  • April 2013: Duke becomes Hydra Partner
  • June 2013: Production preservation repository launched with two collections ingested

Content Models
  • Collection
      • Collection-level descriptive metadata
      • Aggregated metadata about items / components in some cases
  • Item
      • Item-level descriptive metadata
  • Component
      • Digital content file (e.g., TIFF image file)
      • Technical metadata
  • Target
      • External digitization target image
      • Digital content file for target image

Additional Models
  • AdminPolicy
      • Used in Hydra Framework to specify access rights
      • Individual objects are governed by a particular AdminPolicy
  • PreservationEvent
      • Records PREMIS Event data for
          • Ingest
          • Ingest validation
          • Periodic fixity checks
      • Associated with object to which it applies
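The content models above can be pictured as a small object hierarchy. The following is only an in-memory sketch in Python: the repository itself stores these as Fedora objects managed through Hydra, and every class attribute and identifier below (`pid`, `content_path`, `admin_policy_pid`, and so on) is a hypothetical name chosen for illustration, not taken from the project.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    """A digitization target image and its content file."""
    pid: str
    content_path: str


@dataclass
class Component:
    """A digital content file (e.g., a TIFF) plus its technical metadata."""
    pid: str
    content_path: str
    technical_metadata: dict = field(default_factory=dict)
    target_pid: Optional[str] = None  # hypothetical link to a Target object


@dataclass
class Item:
    """An item carrying item-level descriptive metadata."""
    pid: str
    descriptive_metadata: dict = field(default_factory=dict)
    components: List[Component] = field(default_factory=list)


@dataclass
class Collection:
    """A collection carrying collection-level descriptive metadata."""
    pid: str
    descriptive_metadata: dict = field(default_factory=dict)
    items: List[Item] = field(default_factory=list)
    admin_policy_pid: Optional[str] = None  # the governing AdminPolicy, if any

    def component_count(self) -> int:
        """Count content-bearing components across all items."""
        return sum(len(item.components) for item in self.items)
```

Holding children in lists on the parent is a convenience of this sketch; it says nothing about how the relationships are actually recorded in the repository.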

Metadata Practices
  • Collect metadata available at time of ingest
      • CONTENTdm
      • MarcXML from library catalog
      • Digitization Guide from DPC
      • etc.
  • Store collected metadata in its native formats in object datastreams
  • Normalize one set of descriptive metadata into Qualified Dublin Core for indexing and display

Batch Ingest
  • Problem to solve
      • 380,000 archival master files (~ 20 TB) spanning 8 years of digitization work
      • Some areas of relative consistency across the collections but also some divergences
      • Needed flexible batch ingest mechanism
  • Solution: Ingest Manifest
      • Enumerates the objects to be ingested in any given batch
      • Provides information about nature and location of content files, metadata, and related objects
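The normalization step listed under Metadata Practices can be sketched as a field crosswalk. This is a minimal illustration, assuming the source record has already been parsed into a dictionary; the source field names and the `CROSSWALK` table are hypothetical, and the project's real mapping is not shown in the slides.

```python
from typing import Dict, List, Union

# Hypothetical crosswalk: source field name -> Dublin Core term.
CROSSWALK: Dict[str, str] = {
    "title": "dcterms:title",
    "creator": "dcterms:creator",
    "date": "dcterms:date",
    "subject": "dcterms:subject",
}


def normalize_to_dc(
    source_record: Dict[str, Union[str, List[str]]],
    crosswalk: Dict[str, str] = CROSSWALK,
) -> Dict[str, List[str]]:
    """Map a parsed source record onto Dublin Core terms.

    Unmapped fields are dropped here; in the approach described above they
    would still survive in the native-format datastream.
    """
    dc: Dict[str, List[str]] = {}
    for source_field, value in source_record.items():
        term = crosswalk.get(source_field)
        if term is None:
            continue
        values = value if isinstance(value, list) else [value]
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            dc.setdefault(term, []).extend(cleaned)
    return dc
```

Keeping every value as a list makes repeatable fields such as subject behave the same way as single-valued ones when the result is indexed.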

Ingest Manifest
  • YAML File: [manifest example shown on slide; image not captured]
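Since the manifest example itself is not reproduced in this text, the sketch below only illustrates the general idea of a manifest with batch-level defaults and per-object entries. It is written as the Python structure a YAML parser (for instance PyYAML's `yaml.safe_load`) would return; every key (`basepath`, `model`, `checksum_type`, `objects`, `identifier`, `content`) is an assumption made for illustration and not the project's schema.

```python
import posixpath
from typing import Any, Dict, List

# Hypothetical manifest. All keys and values are illustrative assumptions.
MANIFEST: Dict[str, Any] = {
    "basepath": "/mnt/dpc/example_collection",
    "model": "Component",
    "checksum_type": "SHA-256",
    "objects": [
        {"identifier": "ex0001", "content": "ex0001.tif"},
        {"identifier": "ex0002", "content": "ex0002.tif", "model": "Target"},
    ],
}


def expand_manifest(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Apply batch-level defaults to each object entry and resolve file paths."""
    defaults = {
        key: value
        for key, value in manifest.items()
        if key not in ("objects", "basepath")
    }
    basepath = manifest.get("basepath", "")
    expanded = []
    for entry in manifest["objects"]:
        if "identifier" not in entry:
            raise ValueError(f"manifest entry is missing an identifier: {entry!r}")
        obj = {**defaults, **entry}  # entry-level values override batch defaults
        if "content" in obj:
            obj["content"] = posixpath.join(basepath, obj["content"])
        expanded.append(obj)
    return expanded
```

Letting an entry override a batch default is one way to handle collections that are mostly consistent but diverge in places, which is the flexibility the Batch Ingest slide asks for.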


Ingest Processor v1.0
  • Reads manifest file
  • Performs any needed pre-ingest steps
  • Creates a repository object for each object in turn
      • Adds appropriate datastreams and relationships
      • Creates thumbnail image from uploaded digital content
      • Creates Ingestion PreservationEvent
  • Validates each ingested object in turn
      • Compares repository object with manifest
      • Validates content file against external checksum if available
      • Creates Validation PreservationEvent and first Fixity Check PreservationEvent
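The checksum part of the validation step above might look roughly like the following. This is a sketch using only the Python standard library, not the project's Ruby implementation; the function names and the shape of the returned event dictionary are assumptions.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large master files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_content(path: str, expected: Optional[str] = None, algorithm: str = "sha256") -> dict:
    """Compare a file against an external checksum, if one was supplied.

    Returns a dictionary describing a validation event (hypothetical shape).
    """
    actual = file_checksum(path, algorithm)
    if expected is None:
        matched = None
        detail = "no external checksum supplied; comparison skipped"
    else:
        matched = actual == expected.strip().lower()
        detail = "external checksum matches" if matched else "external checksum mismatch"
    return {
        "event_type": "validation",
        "event_date_time": datetime.now(timezone.utc).isoformat(),
        "outcome": "failure" if matched is False else "success",
        "detail": detail,
        "checksum": f"{algorithm}:{actual}",
    }
```

Treating a missing external checksum as a skipped comparison rather than a failure mirrors the "if available" wording on the slide; the computed checksum is still recorded so that later fixity checks have a baseline.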

Validation PreservationEvent
  • In PreservationEvent eventMetadata datastream [example shown on slide; image not captured]
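The datastream content shown on the slide is not available in this text. As a rough illustration of what a PREMIS event record for a validation could contain, the sketch below builds one with Python's `xml.etree.ElementTree`, using element names from the PREMIS 2.x event entity. The identifier types (`UUID`, `PID`) and all values are assumptions, and the output is not checked against the PREMIS schema.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def _child(parent: ET.Element, tag: str, text: str = None) -> ET.Element:
    """Append a namespaced child element, optionally with text."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        element.text = text
    return element


def build_event_xml(event_id: str, event_type: str, date_time: str,
                    outcome: str, detail: str, object_pid: str) -> str:
    """Serialize one event as PREMIS-style XML (illustrative, unvalidated)."""
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    identifier = _child(event, "eventIdentifier")
    _child(identifier, "eventIdentifierType", "UUID")  # assumed identifier type
    _child(identifier, "eventIdentifierValue", event_id)
    _child(event, "eventType", event_type)
    _child(event, "eventDateTime", date_time)
    _child(event, "eventDetail", detail)
    outcome_info = _child(event, "eventOutcomeInformation")
    _child(outcome_info, "eventOutcome", outcome)
    linking = _child(event, "linkingObjectIdentifier")
    _child(linking, "linkingObjectIdentifierType", "PID")  # assumed identifier type
    _child(linking, "linkingObjectIdentifierValue", object_pid)
    return ET.tostring(event, encoding="unicode")
```

The `linkingObjectIdentifier` element is what ties the event back to the object it applies to, which corresponds to the association described under Additional Models.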


Export Sets
  • Example service built on top of repository infrastructure
  • Delivering archival master files to authorized patrons upon request
  • Current process is manual
      • DPC staff locate master file(s) on filesystem
      • Possibly create a zip file
      • Place file(s) in pick-up location or copy onto CD, DVD, etc., for delivery
  • Pre-Hydra prototype implementation was Django web app using Fedora REST API

Export Sets
  • Built on bookmark functionality
      • Staff member searches for content-bearing objects of interest and bookmarks them
      • Export set can be created from bookmark list
  • Content files are retrieved from the repository and bundled into a zip file
      • Staff member can download and deliver to patron
      • Zip file includes a README manifest listing the content files with basic metadata
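The bundling step described above can be sketched with Python's standard `zipfile` module. This is an illustration only: the function name, the README layout, and the `title` metadata key are assumptions, and the content is passed in as bytes rather than being retrieved from a repository.

```python
import io
import zipfile
from typing import Dict, Iterable, Tuple


def build_export_zip(files: Iterable[Tuple[str, bytes, Dict[str, str]]]) -> bytes:
    """Bundle content files into a zip archive with a README manifest.

    Each entry is (archive_name, content_bytes, metadata).
    """
    buffer = io.BytesIO()
    readme_lines = ["Export set contents", ""]
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content, metadata in files:
            archive.writestr(name, content)
            title = metadata.get("title", "(untitled)")
            readme_lines.append(f"{name}\t{title}\t{len(content)} bytes")
        # The README lists every content file with its basic metadata.
        archive.writestr("README.txt", "\n".join(readme_lines) + "\n")
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch short. With multi-gigabyte master files a real service would more plausibly stream content to a file on disk.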

Export Sets
  • Export sets can be named and stored for re-use
  • By default, zip file is also stored
      • Staff member can delete the zip file (to save space) and re-generate it as needed from the export set record
  • When no longer needed, export set record can be deleted

Screenshot Walk-Through
  • Repository Home Page
  • Collection Index
  • Collection Content: Items
  • Item Contents: Components
  • Item Metadata
  • Collection FCRepo View
  • Creating Export Set
  • Creating Export Set
  • Export Set Created
  • Export Set Zip File

Future Plans
  • Version 1.1
      • By September 2013
      • Interface improvements
      • Refactored batch ingest
  • Future enhancements
      • Ingest (batch and individual) performed by library staff
      • Editing capability
  • Future Use Cases
      • Faculty scholarship, electronic theses and dissertations
      • Electronic records and other born-digital content
      • Datasets
      • Image library for teaching / learning

Questions?
Jim Coble
jim.coble@duke.edu
Digital Repository Developer
Duke University Libraries

