SfS-Getty, 4/25/03
Digital Longevity
Howard Besser
http://www.gseis.ucla.edu/~howard
• Content Mgmt Systems
• Content Format Standards (Image Identification, Standards & other Metadata, Best Practices)
• Longevity & Preservation Repositories
• Digital Preservation activities
• Other types of metadata standards
• Special problems with Cultural Heritage Material
Content Management Systems
• Used to…
– Create and edit digital objects
– Import & export digital objects
– Manage objects (acquire, inventory, validate)
• Content Management Systems will vary depending on the materials they support
– Metadata schemes will vary
• Descriptive Metadata
– MARC/MODS/Dublin Core for books
– Code books for numeric datasets
• Administrative Metadata
– Images, audio, text, etc.
Content Format Standards (Images)
Images
• Content Format & Best Practices
• Identification/Provenance
• Technical Imaging metadata
• Special discovery & descriptive metadata
Best practices
• Use/Users/Collection: Benchmarking
• Masters vs. Derivatives
• Scanning
• Administrative Metadata
• Structural Metadata
Scanning Best Practices
• Think about users (and potential users), uses, and type of material/collection
• Scan at the highest quality that does not exceed the likely potential users/uses/material
• Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
• Many documents which appear to be bitonal actually are better represented with greyscale scans
• Include color bar and ruler in the scan
• Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
• Don’t use lossy compression
• Store in a common (standardized) file format
• Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
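Several of the scanning practices above can be checked mechanically when a digital master is accepted. The following is a minimal sketch only: the `ScanRecord` fields and the two format sets are hypothetical names chosen for illustration, not part of any standard, and a real project would define its own lists.

```python
from dataclasses import dataclass, field

# Hypothetical sets for illustration; each project would choose its own.
LOSSY_COMPRESSION = {"jpeg", "jpeg2000-lossy"}
STANDARD_FORMATS = {"tiff", "png"}


@dataclass
class ScanRecord:
    file_format: str
    compression: str               # e.g. "none", "lzw", "jpeg"
    has_color_bar: bool
    has_ruler: bool
    # Metadata about the scanning process itself (scanner, settings, ...).
    scan_metadata: dict = field(default_factory=dict)


def check_master(scan: ScanRecord) -> list[str]:
    """Return the best-practice items this master scan does not meet."""
    problems = []
    if scan.compression.lower() in LOSSY_COMPRESSION:
        problems.append("lossy compression used")
    if scan.file_format.lower() not in STANDARD_FORMATS:
        problems.append("non-standard file format")
    if not (scan.has_color_bar and scan.has_ruler):
        problems.append("color bar or ruler missing")
    if not scan.scan_metadata:
        problems.append("no metadata about the scanning process")
    return problems
```

An empty list means the record passes these particular checks; it says nothing about image quality, which still needs objective measurement.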
Why Scale is important
Identification/Provenance (Images)
• The number of variant forms of a work can be enormous
• Image Families
• A digital image frequently has many layers of parentage
• Information about the parentage that can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation")
• How to deal with different versions derived from the same scan or different encoding schemes
• Vocabulary Standards to express this
The number of variant forms of a work can be enormous
• different views of the same object
• different scans of the same photo
• different resolutions
• different compression schemes
• different compression ratios
• different file storage formats
• different details of the same image
• ...
Image Families
Identification/Provenance
how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
Vocabulary Standards to express this
– VRA Surrogate Categories
– CIMI's "Image Elements"
Incorporate parts of Functional Requirements for Bibliographic Records (FRBR)
• work
• expression
• manifestation
• item
• (and push into “change history” section of Technical Image Metadata)
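One way to picture an image family with layers of parentage is a chain of records, each pointing at the thing it was derived from. This is a rough sketch under stated assumptions: `ImageNode` and `parentage` are hypothetical names, and assigning a particular scan or derivative to a particular FRBR level is an illustrative choice, not something FRBR prescribes for images.

```python
from dataclasses import dataclass
from typing import Optional

FRBR_LEVELS = ("work", "expression", "manifestation", "item")


@dataclass
class ImageNode:
    identifier: str
    level: str                              # one of FRBR_LEVELS
    description: str
    # Immediate parent, in the spirit of Dublin Core "Source".
    source: Optional["ImageNode"] = None

    def __post_init__(self):
        if self.level not in FRBR_LEVELS:
            raise ValueError(f"unknown FRBR level: {self.level}")


def parentage(node: ImageNode) -> list[str]:
    """Walk from an image back toward the original, nearest parent first."""
    chain = []
    current = node.source
    while current is not None:
        chain.append(current.identifier)
        current = current.source
    return chain
```

A browse image derived from a master scan of a photograph of a painting would report three ancestors, which is the kind of information a user needs to judge the quality and veracity of what they are looking at.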
NISO/DLF Technical Image Metadata Workshop--4/99
(Z39.87-2002 draft)
• create metadata needed to manage images in digital repositories over long periods of time (full life-cycle mgmt)
• document image provenance & history
• ensure that the images will be rendered accurately on any output device
Technical Image Metadata
Focus on Metadata that may prove helpful for
• management
• use
• preservation
• ...
Technical Image Metadata
In Scope
• still, bit-mapped pictorial images
• scanned/reformatted images (+ born digital)
Technical Image Metadata
Out of Scope
• vector images
• moving images
• images of OCR-able text
• structural and hierarchical relationships between images
• rights management, terms of use
• (authenticity/security)
Technical Image Metadata
Technical Image Metadata-Z39.87
• Image parameters (MIME type, compression, colorspace & profile, …)
• Image Creation (source, capture info, etc.)
• Image performance assessment (sampling, colormap, whitepoint, target data, etc.)
• Change history (source, processing, etc.)
additional XML implementation schema (MIX)
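As a rough illustration of how the four groups of technical image metadata listed for Z39.87 might be kept together for one image, the sketch below uses a plain dictionary. The key names are hypothetical labels taken from the slide wording; they are not the element names defined by Z39.87 or by the MIX schema.

```python
# The four groups named on the slide; keys are illustrative, not MIX elements.
SECTIONS = (
    "image_parameters",
    "image_creation",
    "performance_assessment",
    "change_history",
)


def missing_sections(record: dict) -> list[str]:
    """List the metadata groups that are absent or empty in a record."""
    return [name for name in SECTIONS if not record.get(name)]


record = {
    "image_parameters": {
        "mime_type": "image/tiff",
        "compression": "none",
        "colorspace": "RGB",
    },
    "image_creation": {"source": "35mm slide", "capture_info": "flatbed scanner"},
    "performance_assessment": {},   # nothing recorded yet
    "change_history": [{"source": "master.tif", "processing": "cropped"}],
}
```

Here `missing_sections(record)` reports the performance assessment group as the gap still to be filled.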
Other Metadata
• Description of depiction/surrogate (What VRA calls its "Surrogate Categories")
• Description of original object
• Rights and Reproduction Information
• Location Information
• VRA Core, LCSH, TGM, AAT, ULAN, TGN, DOI, <indecs>, ...
Longevity & Preservation Repositories
Digital Preservation
• The Problem
• Preservation Repositories
• Preservation Metadata
• Other Digital Preservation Activities
• Special concerns of Cult Heritage community
Serious Longevity Problems
What we know from prior widespread digital file formats
Previous formats required little ongoing intervention (remote storage facilities, Iron Mtn); digital formats require intense ongoing management
The Short Life of Digital Info
The Short Life of Digital Info: Digital Longevity Problems
• Disappearing Information
• The Viewing Problem
• The Scrambling Problem
• The Inter-relation Problem
• The Custodial Problem
• The Translation Problem
The Viewing Problem
Digital Info requires a whole infrastructure to view it
Each piece of that infrastructure is changing at an incredibly rapid rate
How can we ever hope to deal with all the permutations and combinations?
The Scrambling Problem
Dangers from:
• Compression to ease storage & delivery
• Container Architecture to enhance digital commerce
The Inter-relation Problem
• Info is increasingly inter-related to other info
• How do we make our own Info persist when it points to and integrates with Info owned by others?
• What is the boundary of a set of information (or even of a digital object)?
The Custodial Problem
In the past, much of survival was due to redundancy
How do we decide what to save?
Who should save it?
– Mellon-funded E-Journal Archives
How should they save it?
The Custodial Problem: How to save information?
Methods for later access
• Refreshing
• Migration
• Emulation
Issues of authenticity and evidence
The Translation Problem
• Content translated into new delivery devices changes meaning
– A photo vs. a painting
– If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
– Behaviors
Older Longevity Projects
http://sunsite.berkeley.edu/Longevity/
• CPA Task Force
• Getty “Time & Bits” Conference & Follow-ups
• Preservation experiments in US and Europe
– NEDLIB, CURL, Michigan
• Internet Archive
• Long Now
Preservation Repositories: Projects based on OAIS Model
• CEDARS
• NEDLIB
• Pandora
• CDL
• OCLC/RLG Working Group on Preservation Metadata, Attributes of a Trusted Digital Repository, August 2001
Preservation Metadata
OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 31 2001
OCLC/RLG Working Group on Preservation Metadata, A Recommendation for Content Information, October 2001
Preservation Repositories: Open Archival Info System Model
Producer
Management
Consumer
Preservation Repositories: Open Archival Info System Model
• High-level reference model describing submission, organization and management, and continuing access
• Conceptual framework for different organizations to share discussions with a common language
• Producers, consumers, management, actual repository
• SIP, DIP, AIP
• AIP consists of data objects plus representation info (Content, Preservation Description, Packaging, Descriptive)
• Originally developed for Space Science community
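The flow between the three package types can be sketched as two small functions: a producer's SIP is ingested into an AIP, and an AIP is turned into a DIP for a consumer. This is only an illustration of the idea; OAIS is a conceptual model and defines no code-level API, so every class, field, and function name below is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class SIP:                     # Submission Information Package
    producer: str
    content: bytes


@dataclass
class AIP:                     # Archival Information Package
    content_info: bytes
    preservation_description: dict
    packaging_info: dict
    descriptive_info: dict


@dataclass
class DIP:                     # Dissemination Information Package
    consumer: str
    content: bytes
    descriptive_info: dict


def ingest(sip: SIP, title: str) -> AIP:
    """Wrap submitted content with the four kinds of AIP information."""
    return AIP(
        content_info=sip.content,
        preservation_description={"provenance": f"submitted by {sip.producer}"},
        packaging_info={"size_bytes": len(sip.content)},
        descriptive_info={"title": title},
    )


def disseminate(aip: AIP, consumer: str) -> DIP:
    """Build the package handed to a consumer; the AIP itself stays archived."""
    return DIP(consumer, aip.content_info, dict(aip.descriptive_info))
```

Keeping the AIP separate from what is submitted and what is delivered is the point of the model: the repository can change its internal packaging without changing what producers send or consumers receive.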
Preservation Repositories -- AIP Metadata
• Preservation Description Info
– reference info
– context info
– provenance info
– fixity info
• Packaging Info
• Descriptive Info
• Content Info
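Of the four kinds of Preservation Description Info, fixity is the easiest to show in code: a digest is recorded at ingest and recomputed later to detect change. The sketch below assumes SHA-256 from Python's standard `hashlib`; the choice of algorithm and the dictionary layout are illustrative assumptions, not requirements of OAIS.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Compute a SHA-256 digest of the content bytes."""
    return hashlib.sha256(data).hexdigest()


def make_pdi(data: bytes, identifier: str, context: str, provenance: str) -> dict:
    """Assemble a simple Preservation Description Info record."""
    return {
        "reference": identifier,
        "context": context,
        "provenance": provenance,
        "fixity": {"algorithm": "sha256", "digest": fixity(data)},
    }


def verify(data: bytes, pdi: dict) -> bool:
    """Return True if the content still matches the recorded digest."""
    return fixity(data) == pdi["fixity"]["digest"]
```

A failed `verify` only says that the bits changed; the provenance and context entries are what help decide whether that change was an authorized migration or damage.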
OCLC/RLG Digital Repository Attributes
• Administrative responsibility
• Organizational viability
• Financial sustainability
• Technological suitability
• System security
• Procedural accountability
OCLC/RLG Selected Recommendations
• Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments
• Stakeholders meet to decide how to describe what is in a dig repository
• Examine special properties of particular classes of digital objects
• Technical standards for exchange and interoperability btwn repositories
• Develop projects and case studies
• Copyright issues
Other Digital Preservation Activities
• LC Natl Dig Info Infrastructure & Preservation
• InterPARES
• Emulation Projects
• E-Journal Archiving
• ERPANET
• Persistent Naming
LC’s National Digital Information Infrastructure and Preservation Program
• Authorized Dec 2000
• LC, Dept of Commerce, NARA, White House Office of Sci & Tech Policy
• with help from CLIR, NLM, NAL, OCLC, RLG
• Ongoing collab process
• Commissioned papers on preserving: the Web, periodicals, digital sound, E-Books, Digital TV, Digital Video
InterPARES: International Research on Permanent Authentic Records in Electronic Systems
• Ongoing international archival world project examining how to make electronically-generated records last over time
• Developing the theoretical and methodological knowledge needed, then will formulate model policies, strategies, and standards
• In 2003 was extended to include images and rich media
Emulation Projects
• CAMiLEON (Michigan/Leeds)
• NEDLIB
E-Journal Archiving
• Issues
– License, don’t own; may not even be able to obtain right to make archival copy
– Increasingly no paper back-up at all
– Usually we don’t have the important redundancy factor
• Mellon funded projects (2001)
– Yale, Harvard, Penn working w/individual publishers
– Cornell, NYPL--specific disciplines
– MIT exploring characteristics that change (dynamic)
– Stanford--archiving software tools
Electronic Resource Preservation and Access NETwork (ERPANET)
• Best practices and skills development for digital preservation of cultural heritage and scientific objects
• 3 year project launched Nov 2001; 1.2 million Euros
Persistent Naming
• URNs
• Handles
• PURLs
• Re-directs
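What these naming approaches share is a level of indirection: citations carry a stable name, and a resolver maps that name to wherever the object currently lives. The toy resolver below shows only that idea; the `Resolver` class and the example identifier and URLs are hypothetical, and real URN, Handle, and PURL services each have their own syntax and infrastructure.

```python
class Resolver:
    """Toy in-memory table mapping persistent names to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, persistent_id: str, url: str) -> None:
        self._locations[persistent_id] = url

    def move(self, persistent_id: str, new_url: str) -> None:
        # The object moved; only the table changes, not the published name.
        if persistent_id not in self._locations:
            raise KeyError(persistent_id)
        self._locations[persistent_id] = new_url

    def resolve(self, persistent_id: str) -> str:
        # A web-facing resolver would answer with a redirect to this URL.
        return self._locations[persistent_id]


resolver = Resolver()
resolver.register("example:image/1", "http://old.example.org/img1.tif")
resolver.move("example:image/1", "http://new.example.org/masters/img1.tif")
```

After the move, anything that cited `example:image/1` still resolves, which is exactly what a bare URL cannot promise. The scheme works only as long as some organization keeps the table maintained.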
Other Elements
• Actors Metadata
• Other Metadata
• Preserving Electronic Art
Reference Models for Digital Libraries: Actors and Roles
DELOS/NSF Working Group
http://www.delos-nsf.actorswg.cdlib.org/
NSF/DELOS Actors/Roles Project
• Classes of Actors, including
– Persons
– Organizations
– Automata
• Roles & implications
– Production
– Dissemination
– Management
– Use
Multimedia & Collaborative Authorship imply
• Not only:
– Authors
– Editors
– Publishers
• But also creators of
– Text
– Illustrations
– Composers
– Musicians...
And goes beyond conventional authors
• Others that are part of digital library process
– Users
– Catalogers
– Reference librarians
• Even other groups/entities
– Software agents
– Mediators
– Special rights holders...
Borbinha’s “naive tentative sketch” of the problem...
[Diagram labels: User, Registered, Anonymous, Librarian, Agent, Creator, Editor, Distributor, Preservation, Publication, Licensing, Acquisition, Registration, Dissemination, Search, Digital Library, Access]
Benefits for
• Linking metadata to authority records
• Rights management
• Privacy protection
Deliverables
• Workshop proceedings: proceedings with invited contributions and papers selected from a call, intended to be a reference source for the current state of the art.
• White paper:
– Definition and introduction to the problem.
– Description and analysis of the requirements.
– A proposal to the community for a reference model, focusing on definitions of key concepts, terminology, classes of agents, services, relationships, etc.
– Proposals for an international agenda for further technical and collaborative developments.
Core group
DELOS (Europe)
• José Borbinha, National Library of Portugal (DELOS coordinator)
• Michel Mabe, Elsevier Science, UK (Publishing industry)
• Peter Mutschke, Social Science Information Centre, Germany (Software agents, Information Retrieval)
• Hans-Jörg Lieder, Berlin State Library, Germany (LEAF project)
• Gunnar Karlsen, University of Bergen, Norway (Archives)
WIPO – World Intellectual Property Organisation
• Glenn Macstravic
NSF (USA)
• John Kunze, University of California, USA (NSF coordinator)
• Barbara Tillett, Library of Congress, USA (Libraries)
• Becky Dean, OCLC, USA (Libraries services)
• Angela Spinazze, CIMI/RLG, USA (Museums)
• Howard Besser, University of California, USA (Multimedia and digital art production)
DCMI - Dublin Core Metadata Initiative
• Warwick Cathro, National Library of Australia
Work plan
Phase 1: Starting (March - April 2002)
• Tuning objectives, scope, and action plan
• Identification of reference sources
• Call for contributions to the workshop
Phase 2: Internal Discussion (May - June 2002)
• Analysis of the problem
• Draft paper
Phase 3: Public Discussion (July - October 2002)
• Expose the draft paper. Promote open public discussion
• Workshop in Portugal (July 3-5). Workshop report
• Draft paper (second version)
Phase 4: Conclusions (November - December 2002)
• Review of the work done...
• Final report
... Actors and Roles ???
Data Structures: The VRA Core
28 elements specifically for visual resource collections
• Work Description Categories
• Visual Document Description Categories
• http://www.oberlin.edu/~art/vra/dsc.html
VRA Core: Work Description Categories
• Work type
• Title
• Measurements
• Material
• Technique
• Creator
• Role
• Date
• Repository name
• Repository place
• Repository number
• Current site
• Original site
• Style/period/group/movement
• Nationality/culture
• Subject
• Related work
• Relationship type
• Notes
VRA Core: Visual Document Description Categories
• Visual document type
• Visual document format
• Visual document measurements
• Visual document date
• Visual document owner
• Visual document owner number
• Visual document view description
• Visual document subject
• Visual document source
Data Value Metadata (vocabularies)
• LCSH
• TGM
• AAT
• ULAN
• TGN
• VRA Core
LCSH
very general
Thesaurus for Graphic Materials
• designed for subject indexing of pictorial materials, particularly large general collections of historical images
• for cataloging and retrieval
• good for general audiences and broad approaches to the material
• TGM-I: Subject Terms & TGM-II: Genre and Physical Characteristic Terms
• http://lcweb.loc.gov/rr/print/tgm/toc.html
AAT
• 120,000 terms
• for describing objects, textual materials, images, architecture, and material culture from antiquity to present
• large and complex
• http://www.getty.edu/gri/vocabularies/
ULAN
• name authority
• http://www.getty.edu/gri/vocabularies/
Thesaurus of Geographic Names
• over 1 million records
• hierarchical and global
• throughout history
• most records include coordinates and descriptive notes
Metadata for Digital Commerce
• DOI
• <indecs>
<Indecs>
• formal structure for describing and uniquely identifying intellectual property itself, the people and businesses involved in its trading, and the agreements which they make about it (primarily for publishing, music, and visual arts)
• will develop high-level specifications for the services that will be required to implement a global IP trading system based on this <indecs> generic data model
• focus is on encoding rights at a high level, not on resource discovery
• likely to involve metadata schema registration and directory to allow interoperation of personal identifiers for rightsholders and users
• supported by EEC DG-13
• First meeting July 1999
• http://www.indecs.org/
What’s special about Cult Heritage Materials?
• Images & rich media
• Inter-relationships btwn parts
• For Contemporary Art: What is the Work?
LeWitt: Wall Drawing 340
Installing LeWitt
LeWitt Install Directions
Complexity of Rich Media
• Works often have artistic nature (including video games)
• Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
• Too complex to save every one of these aspects for every type of material
• Importance of saving documentation
What can we do specific to Electronic Art?
• Works themselves may no longer even exist; in many cases, what we can save amounts to forensic evidence
• Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
• Too complex to save every one of these aspects for every type of material
• Importance of saving pieces, representations, and documentation
• Involve the artists to capture their intentions
• Importance of Standards
• Familiarize ourselves with recent conservation developments (Who Knows?, TechArcheology, Tate, IMAP)
Standards for encoding artists’ intentions
(group efforts w/i Cult Heritage community)
• Artists Interviews Project, Netherlands Institute for Cultural Heritage 1998-1999, Modern Art: Who Cares (http://www.icn.nl/english/6.4.2.html)
• TechArcheology: A Symposium on Installation Preservation (SFMOMA)
• More recent SFMOMA/Tate collaborations
• IMAP
• Guggenheim’s Variable Media
Structural Metadata Standards for Encoding Multimedia
(no time for details)
• SMIL
• MPEG 4
A few questions our community should address
• Special issues raised by non-library institutions
• Special issues raised by images and rich media
• What is the work (or salient points we need to preserve)?
• Bring the arts communities (artist intent, BAVC) together with the preservation repository communities and the preservation metadata communities
• Specifically get Cult Heritage communities involved with the selected OCLC/RLG recommendations
• Get cult heritage groups started on working to make sure that structure standards incorporate our works
• What organizations will take responsibility to save today’s digital “ephemeral” materials (online ‘zines, arts discussion groups, etc.)?
Digital Repository Traditions & Services require
• Sustainability
• Interoperability
• Access
And all of these require Standards and Metadata
Digital Longevity
Howard Besser, NYU Moving Image Archiving & Preservation Program
• http://www.firstmonday.dk/issues/issue7_6/besser/
• Baca, Murtha (ed). Introduction to Metadata, Los Angeles: Getty Information Institute, 1998
• http://www.getty.edu/gri/standard/intrometadata/
• http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/
• http://sunsite.berkeley.edu/Metadata/sp2000.html
• http://sunsite.berkeley.edu/Longevity/
• http://www.oclc.org/digitalpreservation/presmeta_wp.pdf
• http://is.gseis.ucla.edu/us-interpares/
• http://www.niso.org/commitau.html
• http://www.ifla.org/II/metadata.htm
• METS official site: http://www.loc.gov/standards/mets
• UC Libraries Systemwide Operations and Planning Advisory Group (SOPAG) Site http://www.slp.ucop.edu/sopag/ for the UC Digital Preservation & Archiving Committee Final Report