16
CHRONOPOLIS TM and the DIGITAL PRESERVATION IMPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December 2009

C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

Embed Size (px)

Citation preview

Page 1: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

CHRONOPOLISTM and the DIGITAL PRESERVATION

IMPERATIVEBrian E. C. Schottlaender

The Audrey Geisel University Librarian

ECAR Symposium, 4 December 2009

Page 2: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

THE BRANSCOMB* PYRAMID FOR COMPUTING

Campus,Research Lab

High-end

Small-Scale, Home

FACILITIES APPLICATIONS

“Leadership-class” facilities. Maintained by

national labs and centers. Substantive

professional workforce

Community codes and professional software.

Maintained by large groups of professionals (NASTRAN,

Powerpoint, WRF, Everquest)

Mid-range university and research lab facilities.

Maintained by professionals and non-

professionals.

Community software and highly-used project codes. Developed and maintained by some professionals and

academics (CHARMM, GAMESS, etc.)

Private, home, and personal facilities. Supported by users

or their proxies.

Research and individual codes.

Supported by developers or their proxies.

*Chairman, NSF Blue-Ribbon Panel on High-Performance Computing (1993)

BECS|||ES09

Page 3: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

THE BERMAN* PYRAMID FOR DATA

Campus, Library,

Data Center

Small-Scale, Home

FACILITIES COLLECTIONS

National-scale data repositories, archives, and libraries. High capacity,

high reliability environment maintained by professional

workforce.

Reference, important, and irreplaceable data

collections (PDB, PSID, Shoah, HathiTrust, etc. )

Local libraries and data centers. Commercial

data storage. Medium capacity, medium-high

reliability. Maintained by professionals.

Research data collections. Developed

and maintained by some professionals and

academics

Private repository. Supported by users

or their proxies. Low-medium

reliability, low capacity.

Personal data collections.

Supported by developers or their proxies.

*VC/R, RPI

High-End

BECS|||ES09

Page 4: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

TICK … TICK … TICK … TICK … TICK

• There is a pressing need to preserve digital assets that represent the intellectual capital of scientific disciplines, educational communities, and government and cultural agencies.

• Many of these assets are increasingly at risk, whether as a consequence of:– lack of financial support; – evolution of storage and delivery

systems, access mechanisms, or encoding formats;

– calamity; or,– neglect. BECS|||ES09

Page 5: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

ISSUE: FRAGILITY

• Dynamic: – May be revised or updated

instances, versions, editions – May change cumulatively or

interactively e.g., contributions to a listserv

– May be available in various “views”

• More easily altered [without recognition]

• More easily corrupted• Storage media have shorter life

spansBECS|||ES09

Page 6: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

ISSUE: COMPLEXITY

• Linkages between and amongst them may change

• Increasingly, data and associated metadata cannot—or should not—be separated

• Some resources, like multimedia, are so closely linked to the software and hardware technologies that render them that they cannot be used outside those proprietary environments

• Need to be renderable on a variety of delivery devices

• Require access technologies that are changing at an ever-increasing pace

BECS|||ES09

Page 7: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

ISSUE: SELECTION • Intellectual question What is “worth” archiving?

– Business content (e.g., patient records)– Scientific content (e.g., PDB)– Scholarly content (e.g., Electronic Cultural Atlas)– Cultural content (e.g., Shoah)– “Official” content (e.g., Govt. docs.)

• Physical question What is the ‘archival unit?’ – What is its extent? – What are its boundaries?

• Links?• Content of links?

• Intellectual and physical selection dimensions are not separate, but interrelated. E.g., determination of extent of digital object is necessary before harvest-based selection can take place.  

• Selection criteria cannot be generalized because they are dependent on the goals and policies of the particular stakeholder.

BECS|||ES09

Page 8: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

QUESTIONS, QUESTIONS, QUESTIONS

• Who gets to decide what’s worth preserving?

• Who’s responsible for preserving it?– Where?– How?– For how long?

• Who gets access?– Why?– When?

• Who pays?– Content creators?– Content users?– The government?

BECS|||ES09

Page 9: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

IT TAKES A VILLAGE …

“The successful preservation of valuable

digital assets will require the expertise and

collaboration of many institutions, both public

and commercial, to help craft the reliable,

economically sustainable, and trusted

environments necessary for housing,

managing, and ensuring our global knowledge

over time.”— Fran Berman et al. “The Need to Formalize Trust Relationships in Digital Repositories.” EDUCAUSE Review, Vol. 43, No. 3 (May/June 2008)

BECS|||ES09

Page 10: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

WHO IS CHRONOPOLIS?• Chronopolis is being developed by

a national consortium, with initial funding from the Library of Congress/NDIIPP

• Chronopolis partners are :– San Diego Supercomputer Center

(SDSC) and the UC San Diego Libraries (UCSDL)

– University of Maryland Institute for Advanced Computer Studies (UMIACS)

– National Center for Atmospheric Research (NCAR) in Boulder, Colorado

BECS|||ES09

Page 11: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

WHAT IS CHRONOPOLIS?

• Digital preservation environment using a data grid framework

• Designed to leverage capabilities at SDSC/UCSDL, UMIACS, and NCAR

• Emphasizes heterogeneous and redundant data storage systems

• Has a current storage capacity of 50 TB x 3 nodes

• Has three geographically distributed copies of data

• Includes detailed auditing and monitoring

BECS|||ES09

Page 12: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

PARTNER ROLES

• SDSC, UMIACS, NCAR– Storage and network support– Complete copy of all data– Network testing– SRB support– Advanced data services

• UCSD Libraries– Metadata services, including analysis and specification

BECS|||ES09

Page 13: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

DATA PROVIDERS• California Digital Library

– 12 TB of data– Crawls of political and government Web sites– ARC files, uniform size– BagIt protocol for data transfer

• Inter-University Consortium for Political and Social Research (ICPSR)– 10 TB of data– 40+ years of social science research– Millions of files– Already using SRB

• North Carolina State University Libraries– 6 TB of data– State and local geospatial data– BagIt protocol for data transfer

• Scripps Institution of Oceanography– 1 TB of data– 50 years of data from SIO research cruises– Already using SRB

BECS|||ES09

Page 14: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

CORE CHRONOPOLIS TOOLS

• Storage Resource Broker (SRB)Data Grid Management System that

provides hierarchical logical namespaces to manage the organization of data

• Auditing Control Environment (ACE)Software to protect the integrity of digital

assets in the long term• SRB Replication Monitor

Webapp that watches registered directories and ensures that copies exist at designated mirrors

• BagItHierarchical file packaging format for the

exchange of generalized digital content• Web Portal

Supplies data providers with transparent, in-depth look at their holdings in all locations

BECS|||ES09

Page 15: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

COMING UP

• Updated auditing procedures• Updated Web portal• Automation of collection ingest• New collections and storage

nodes• TRAC certification• Fully-fledged business model

BECS|||ES09

Page 16: C HRONOPOLIS TM and the D IGITAL P RESERVATION I MPERATIVE Brian E. C. Schottlaender The Audrey Geisel University Librarian ECAR Symposium, 4 December

http://chronopolis.sdsc.edu

[email protected]

BECS|||ES09