Metadata: Principles, Practices, Challenges Sandra Payette Digital Library Research Group Cornell University [email protected]


Metadata

Metadata is structured data about data that facilitates discovery, use, and management of the data to which it refers.

[Figure: three examples of metadata. A descriptive record (CREATOR: Plato; TITLE: The Republic); a storage table (Image 1: cdrom 1; Image 2: cdrom 1; Image 3: cdrom 2); an Access Control List]

Metadata enables …

• Resource Discovery
• Resource Presentation and Navigation
• Rights Management
• Preservation

We must support all these functions, but also recognize that these are artificial categories for metadata.

General Principles: Metadata

• Designing one kind (e.g., descriptive) without consideration of others (e.g., usage, rights, preservation) can compromise utility and interoperability over time
• The metadata problem should be approached analytically and methodically, not ad hoc
• Metadata is expensive; its cost must be weighed against its benefit
• Common metadata sets often represent a flattened or simplified view of reality

Challenge of Interoperability

• Semantic
• Structural
• Syntactic

Semantic example: the same element name carries two different meanings.

Media: CD-ROM (refers to physical storage medium for digital image)
Media: 35mm film (refers to original source)

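Challenge of Interoperability

• Semantic
• Structural
• Syntactic

Structural examples: the same element is filled in using different value formats.

Date: 10-6-99 vs. Date: 6-10-99
Type: image/tiff vs. Type: TIFF 4.0
Author: Sam Brown vs. Author: Brown, S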
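The date pair on this slide can be made concrete with a short sketch. It assumes the two collections use month-first and day-first layouts respectively (the slide does not say which is which) and uses only Python's standard `datetime` module; the function name `to_iso` is illustrative.

```python
from datetime import datetime

# Two plausible readings of the same field value. Which layout a given
# collection intends is an assumption that has to be documented somewhere.
MONTH_FIRST = "%m-%d-%y"
DAY_FIRST = "%d-%m-%y"

def to_iso(value: str, layout: str) -> str:
    """Parse a date string under an explicit layout and return ISO 8601."""
    return datetime.strptime(value, layout).date().isoformat()

# The same string yields two different days depending on the layout.
as_month_first = to_iso("10-6-99", MONTH_FIRST)
as_day_first = to_iso("10-6-99", DAY_FIRST)
print(as_month_first, as_day_first)
```

Recording the layout alongside the value, or exchanging dates in an unambiguous form such as ISO 8601, is one way to remove this kind of structural mismatch.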

Challenge of Interoperability

• Semantic
• Structural
• Syntactic

Syntactic example: the same two statements expressed in two different syntaxes.

<META name="DC.creator" content="Junger, S">
<META name="DC.title" content="The Perfect Storm">

Creator: Junger, S
Title: The Perfect Storm
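As a rough illustration of bridging the two syntaxes, the sketch below reads the META tags and prints the plain "Name: value" form. It uses Python's standard `html.parser`; the class name `DublinCoreMetaParser` and the surrounding HTML page are assumptions made for the example.

```python
from html.parser import HTMLParser

class DublinCoreMetaParser(HTMLParser):
    """Collect <META name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            # "DC.creator" -> "creator"
            self.fields[name[3:].lower()] = attributes.get("content") or ""

# Hypothetical page wrapping the two tags shown on the slide.
page = (
    "<html><head>"
    '<META name="DC.creator" content="Junger, S">'
    '<META name="DC.title" content="The Perfect Storm">'
    "</head><body></body></html>"
)

parser = DublinCoreMetaParser()
parser.feed(page)
fields = parser.fields
for key, value in fields.items():
    print(f"{key.capitalize()}: {value}")
```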

Syntaxes for Expressing Metadata

• HTML META tags
  • Embed metadata in HTML documents
  • Search engines can extract it
• XML and SGML
  • Communities can define own vocabularies (DTDs/Schema)
  • Separation of structural description from rendering info
  • Increasing support for XML in browsers and other software
• Resource Description Framework (RDF) using XML
  • Express complex relationships between resources
• Proprietary
  • TIFF headers
  • Vendor-specific data structures

Functional Views of Metadata

• Resource Discovery
• Resource Presentation and Navigation
• Rights Management
• Preservation

Resource Discovery on Web: Challenges

• Scale: much content is not visible or not found, so it’s not indexed
• Format: much content non-textual (e.g., images!)
• Context: lacking! (causing precision error in search)
• Rights: valuable content hidden behind firewalls

[Figure: text-based search engines (google, lycos, excite) pointing at the Collection “A” Web Server and its Image DB]

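Resource Discovery on Web

[Figure: text-based search engines (google, lycos, excite) alongside the Collection “A” Web Server, which has its own “A” search engine, Metadata Store, and Image DB. A Web Browser shows the page “Welcome to Collection ‘A’” with a Search: box]

Context established: customized metadata created at source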

Resource Discovery: Dublin Core

• 15 descriptive elements
• Facilitates simple resource discovery on Web
• Cross-disciplinary, international, genre-independent
• Very active and accepted “standard”
  • 100+ major projects
  • 20+ countries

http://purl.oclc.org/dc/
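For reference, a minimal sketch of a flat record restricted to the 15 elements. The element names and the `http://purl.org/dc/elements/1.1/` namespace come from the Dublin Core Metadata Element Set, version 1.1; the `record` wrapper element and the `build_record` helper are hypothetical and not part of any standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def build_record(values: dict) -> ET.Element:
    """Build a flat record; reject names outside the 15-element set."""
    record = ET.Element("record")  # hypothetical wrapper, not a DC element
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record

ET.register_namespace("dc", DC_NS)
record = build_record({"creator": "Junger, S", "title": "The Perfect Storm"})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Every element is optional and repeatable in unqualified Dublin Core, which is why the sketch validates names only and imposes no required fields.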

Resource Discovery: Dublin Core Caveats

• Designed for simple discovery; don’t force it to do more than it can (rights, preservation)
• Qualification can compromise meaning and interoperability

Example record for Hamlet:

dc:creator.playwright = “Shakespeare”
dc:creator.birthplace = “Stratford”

Roll-up to root element and … “dc:creator = Stratford” ????
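The anomaly can be reproduced in a few lines. The sketch below assumes a naive roll-up rule that strips everything after the first dot in the element name; the `dumb_down` function is illustrative and is not taken from any Dublin Core tool.

```python
def dumb_down(qualified: dict) -> dict:
    """Roll each qualified element up to its root element, e.g.
    'dc:creator.playwright' -> 'dc:creator'. Values are kept unchanged."""
    rolled = {}
    for name, value in qualified.items():
        root = name.split(".", 1)[0]
        rolled.setdefault(root, []).append(value)
    return rolled

hamlet = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

simple = dumb_down(hamlet)
# "Stratford" now appears as a creator, which is the anomaly on the slide.
print(simple["dc:creator"])
```

The qualifier `playwright` refines the kind of creator, so rolling it up is harmless. The qualifier `birthplace` describes a different entity (the person, not the work), so rolling it up changes the meaning.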

Open Archives Initiative (OAi)

Specification of simple metadata harvesting protocol to facilitate interoperability

Adoption of unqualified Dublin Core Element Set as required metadata

Common XML container format for metadata packaging

Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation)

http://www.openarchives.org/
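A harvesting exchange might look roughly like the sketch below. It follows the request parameters and XML namespaces of OAI-PMH version 2.0, which may differ from the protocol version current when these slides were written. The repository base URL and the abbreviated sample response are invented for the example, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request asking for unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def harvest(xml_text: str) -> list:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = next((el.text for el in record.iter(f"{{{DC_NS}}}title")), None)
        pairs.append((identifier, title))
    return pairs

# Abbreviated, hand-written response (assumption: real responses carry more fields).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Perfect Storm</dc:title>
          <dc:creator>Junger, S</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai")  # hypothetical URL
records = harvest(SAMPLE)
print(url)
print(records)
```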

Exposing and Exchanging Metadata using OAi

[Figure: a harvester collects metadata through OAi from Image Collections, Electronic journals, OPAC, and E-texts]

OAi Registered Repositories

• arXiv
• OCLC Thesis and Dissertations
• Perseus Digital Library
• PhysNet
• Oxford Text Archive
• Library of Congress -- American Memory
• CogPrints
• Humboldt University
• MIT Thesis
• Linguistic Data Consortium
• Resource Discovery Network
• … and more

Resource Discovery: Principles

Decide what you want to be visible to which search services (Site home page? Specific items?)

Adopt standard metadata (e.g., DC) for cross-domain visibility of resources

Develop context-specific metadata to meet collection requirements

Design/adopt a metadata model that allows for graceful co-existence of multiple metadata sets

Express or expose metadata in syntax that promotes interoperability (e.g., XML, RDF)

Functional Views of Metadata

• Resource Discovery
• Resource Presentation and Navigation
• Rights Management
• Preservation

Structural Metadata

• Facilitates
  • Direct access to key points in objects
  • Browsing objects
  • Navigation (e.g., turning pages)
  • Identification of relationships (e.g., parent/child)
  • Access to different formats (e.g., TIFF, GIF, PDF)
• Where is it?
  • ASCII text files in directories
  • Relational databases
  • Embedded in documents or surrogates (e.g., XML, SGML)

Structural metadata can be a byproduct of data management

[Figure: the same four-level hierarchy held in a Relational Database and in a File System]

Level 1 (journals): atlantic, harpurs
Level 2 (volumes): v0001, v0002, v0003
Level 3 (issues): i0001, i0002
Level 4 (articles): 0001.tif, 0002.tif, 0003.tif

File System:

atlantic (dir)
  v0002 (dir)
    i0001 (dir)
      0001.tif
      0002.tif
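To illustrate the "byproduct" idea, the sketch below recovers the hierarchy from nothing but a file path. The level names follow the figure, except that the last level is called `file` here because the figure's level 4 entries are image files; the fixed four-level layout and the helper name `structure_from_path` are assumptions.

```python
from pathlib import PurePosixPath

# One name per directory level, in order from the collection root.
LEVELS = ("journal", "volume", "issue", "file")

def structure_from_path(path: str) -> dict:
    """Derive structural metadata from a path such as
    'atlantic/v0002/i0001/0001.tif' (layout is an assumption)."""
    parts = PurePosixPath(path).parts
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} path levels, got {len(parts)}")
    info = dict(zip(LEVELS, parts))
    # The numeric file name doubles as a sequence number within the issue.
    info["sequence"] = int(PurePosixPath(path).stem)
    return info

info = structure_from_path("atlantic/v0002/i0001/0001.tif")
print(info)
```

The convenience has a cost: the metadata exists only as long as the directory naming convention is preserved, so a migration that renames directories silently destroys it.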

Structure via Document Encoding

Current trend is to use mark-up languages to encode the structure of document objects

• SGML
  • Text Encoding Initiative (TEI/TEI-Lite DTDs)
  • Memory of World DTD for rare library materials
  • Encoded Archival Description (EAD)
• XML
  • Goettingen Digitization Center (XML and RDF)
  • Making of America II (archival object DTD, plus EAD)
  • METS (under development, supported by DLF)

MOA2 DTD: Structural “Binding” of Images

<StructMap>
  <div N='1' TYPE='Book' LABEL='Diary of Patrick Breen one of the Donner Party 1846-57…'>
    <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
    <fptr FILEID='LRJ1' MIMETYPE='image/jpeg' />
    <fptr FILEID='LRG1' MIMETYPE='image/gif' />
    <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='titlepage' />
    <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
      <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRJ2' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRG2' MIMETYPE='image/gif' />
      <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1'/>
    </div>
    <div N='2' TYPE='Entry' LABEL='Entry sat. 21st [Page 2]'>
      <fptr FILEID='HRJ3' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRJ3' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRG3' MIMETYPE='image/gif' />
      <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry2'/>
    </div>

Source: sunsite.berkeley.edu/moa2
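One way a viewer could use such a structure map is sketched below: for each division, list the files of a requested format. The sample is a shortened, well-formed fragment modeled on the slide (the book label is abbreviated and the closing tags are added for the example), and `files_by_div` is an illustrative helper rather than part of any MOA2 tooling.

```python
import xml.etree.ElementTree as ET

# Shortened fragment modeled on the slide; closing tags added so it parses.
SAMPLE = """<StructMap>
  <div N='1' TYPE='Book' LABEL='Diary'>
    <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
    <fptr FILEID='LRG1' MIMETYPE='image/gif' />
    <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
      <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRG2' MIMETYPE='image/gif' />
      <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1' />
    </div>
  </div>
</StructMap>"""

def files_by_div(xml_text: str, mime_type: str) -> dict:
    """Map each division label to the file ids it binds for one format."""
    root = ET.fromstring(xml_text)
    result = {}
    for div in root.iter("div"):
        # findall("fptr") matches direct children only, so nested
        # divisions keep their own file pointers.
        ids = [f.get("FILEID") for f in div.findall("fptr")
               if f.get("MIMETYPE") == mime_type]
        if ids:
            result[div.get("LABEL")] = ids
    return result

jpegs = files_by_div(SAMPLE, "image/jpeg")
print(jpegs)
```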

Enabling Fine-grained Access to Images: Step 1

Image file (TIFF) -> transcribe -> Text file (ASCII)

….

Monday 30th Snowing fast wind W about 4 or 5 feet deep, no drifts looks as likely to continue as when it commenced no liveing thing without wings can get about

December 1st Tuesday Still snowing wind W snow about 5 1/2 feet or 6 deep difficult to get wood no going from the house completely housed up looks as likely for snow as when it commenced, our cattle all killed but three or four [of] them, the horses & Stantons mules gone & cattle suppose lost in the Snow no hopes of finding them alive

wedns. 2nd. Continues to snow wind W sun shineing hazily thro the clouds dont snow quite as fast as it has done snow must be over six feet deep bad file this morning

….

Enabling Fine-grained Access to Images: Step 2

Text file -> mark-up -> Encoded text file

<div id='entry11'> Monday 30th Snowing fast wind W about 4 or 5 feet deep, no drifts looks as likely to continue as when it commenced no liveing thing without wings can get about </div>

<div id='entry12'> December 1st Tuesday Still snowing wind W snow about 5 1/2 feet or 6 deep difficult to get wood no going from the house completely housed up looks as likely for snow as when it commenced, our cattle all killed but three or four [of] them, the horses & Stantons mules gone & cattle suppose lost in the Snow no hopes of finding them alive</div>

<div id='entry13'> wedns. 2nd. Continues to snow wind W sun shineing hazily thro the clouds dont snow quite as fast as it has done snow must be over six feet deep bad file this morning</div>

Enabling Fine-grained Access to Images: Step 3

Encoded text file -> parse and render -> File viewed in browser

Browser view: Title Page, Entry 1, Entry 2, Entry 3, Entry 4, Entry 5, … …
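Steps 2 and 3 can be sketched together: wrap each transcribed entry in a division with a stable id, then generate the navigation list from those ids. The anchor-link rendering and both function names are assumptions; the slides do not specify how the browser view was produced.

```python
from html import escape

def mark_up(entries, first_number=1):
    """Step 2: wrap each transcribed entry in a <div> with a stable id."""
    return [
        f"<div id='entry{n}'>{escape(text)}</div>"
        for n, text in enumerate(entries, start=first_number)
    ]

def table_of_contents(entries, first_number=1):
    """Step 3: one navigation link per entry, pointing at its div id."""
    return [
        f"<a href='#entry{n}'>Entry {n}</a>"
        for n, _ in enumerate(entries, start=first_number)
    ]

# Shortened excerpts of the transcribed text, used only as sample input.
entries = [
    "Monday 30th Snowing fast wind W about 4 or 5 feet deep",
    "the horses & Stantons mules gone",
]

encoded = mark_up(entries, first_number=11)
toc = table_of_contents(entries, first_number=11)
print("\n".join(toc + encoded))
```

Because each id is stable, a structure map can reference the same id to tie a text division to its page images, which is what makes the access fine-grained.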

EAD: Encoded Archival Description

• DTD for SGML mark-up of descriptive finding aids (e.g., inventories, registers, indexes, and guides)
• Provides more detail about a collection than a typical catalog record
• Facilitates access: “drill down” into collection
• Potential international standard
• Maintained jointly by Library of Congress and Society of American Archivists (SAA)

EAD Example

http://sunsite.Berkeley.EDU/CalHeritage/

Presentation and Navigation: Principles

Decide how fine-grained you want the access experience to be

Determine the cost-benefit of creating this amount of structural metadata

Design/adopt a model (esp. DTD/Schema) that can be shared

Be prepared to express in XML, since it is poised to become standard on Web

Functional Views of Metadata

• Resource Discovery
• Resource Presentation and Navigation
• Rights Management
• Preservation

Rights and Security Metadata

Facilitates

• Access control
• Protection of intellectual property rights
• Transactions (e-commerce)
• Security (protect materials from attack)
• Monitoring

Digital Library Federation (DLF) Requirements

• Must account for perspectives of publishers, intermediaries, users
• Must not compromise privacy of users
• Must accommodate ambiguity, as found in copyright (e.g., fair use)
• Metadata relationships
  • Descriptive (about objects)
  • User profiles
  • Rights declarations (Policies)

Expressing Policies for Automated Enforcement

• Rights metadata efforts (XML/RDF oriented)
  • <indecs>
  • Digital Object Identifier (DOI)
• Policy language initiatives
  • Extensible Rights Markup Language (XrML) (www.xrml.org)
  • KeyNote (www.crypto.com/trustmgt/kn.html)
  • Cornell’s PSLang (language-based security)

Modeling the “rights” problem: <indecs>

• Supported by copyright societies, publishers, recording industry
• Fundamental entities are modeled
  • Creation
  • Person
  • Agreement
• Inter-relationship of descriptive and rights metadata
• Event-oriented (time and transactions)
• Model will be expressed in RDF Schema

<indecs>: Sample Encoding for Rights Metadata

[EventIdentifier=License No 12345]
[EventType=Agreement]
[Person=John Smith] [Role=GranterOfRight]
[Person=Bill Brown] [Role=Grantee]
[Event=Event No 11111] [Role=Permitted Act]
…

[EventIdentifier=11111]
[EventType = Usage Event]
[Person=Bill Brown] [Role=Downloader]
[Manifestation=TextFile1 “Make Money…”…

Source: http://www.indecs.org/pdf/model3.pdf
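The event-oriented idea can be sketched as a small data model in which an agreement event names the usage event it permits. This is only an illustration of the pattern in the sample encoding: the `Event` class, the `is_permitted` rule, and the use of the bare string "11111" as the usage event's identifier are assumptions, not the <indecs> schema.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    identifier: str
    event_type: str
    # Maps a role name to the person or event filling that role.
    roles: dict = field(default_factory=dict)

def is_permitted(usage: Event, agreements: list) -> bool:
    """A usage event is permitted if some agreement names it as its
    permitted act and the agreement's grantee takes part in the usage."""
    for agreement in agreements:
        if (agreement.event_type == "Agreement"
                and agreement.roles.get("Permitted Act") == usage.identifier
                and agreement.roles.get("Grantee") in usage.roles.values()):
            return True
    return False

license_12345 = Event("License No 12345", "Agreement", {
    "GranterOfRight": "John Smith",
    "Grantee": "Bill Brown",
    "Permitted Act": "11111",
})
download = Event("11111", "Usage Event", {"Downloader": "Bill Brown"})
other_download = Event("22222", "Usage Event", {"Downloader": "Bill Brown"})

allowed = is_permitted(download, [license_12345])
denied = is_permitted(other_download, [license_12345])
print(allowed, denied)
```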

Functional Views of Metadata

• Resource Discovery
• Resource Presentation and Navigation
• Rights Management
• Preservation

Preservation Metadata

Image File Attributes:
• formats
• versions
• compression

Image Attributes:
• resolution
• bit depth
• orientation

Process Data:
• creation date/time
• equipment used

Rights Data:
• expiration dates
• copyright info
• source statements

Descriptive Data:
• author
• title
• publish date

Structure Data:
• pagination
• sub-groups

Electronic Records Community Perspective

• Metadata requirements for preserving evidence
• Six-layer metadata model
  • Unique identifier
  • Resource discovery metadata
  • Data structure
  • Terms and conditions
  • Provenance information

Source: Pittsburgh Project, www.lis.pitt.edu/~nhprc

Unique Identifiers

• Globally unique names (e.g., URN specification)
• Name is permanent, location changes
• Resolution services to locate the object
• Implementations: PURL, Handles, DOI
• Can create your own local resolution system

Unique Identifier: cnri.dlib/april97-payette
  Naming Authority: cnri.dlib
  Item Name: april97-payette

URL: http://www.somewebserver.org/somedirectory/somefile
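A local resolution system can be as small as a lookup table from names to current locations. The sketch below assumes identifiers of the form naming-authority/item-name, as in the example above; the `LocalResolver` class and the second URL are hypothetical.

```python
def split_identifier(identifier: str):
    """Split 'cnri.dlib/april97-payette' into (naming authority, item name)."""
    naming_authority, separator, item_name = identifier.partition("/")
    if not separator or not naming_authority or not item_name:
        raise ValueError(f"expected 'authority/item', got {identifier!r}")
    return naming_authority, item_name

class LocalResolver:
    """Maps permanent names to their current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        split_identifier(identifier)  # reject malformed names early
        self._locations[identifier] = url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]

resolver = LocalResolver()
name = "cnri.dlib/april97-payette"
resolver.register(name, "http://www.somewebserver.org/somedirectory/somefile")
first_location = resolver.resolve(name)

# The object moves; only the table changes, the name stays the same.
resolver.register(name, "http://archive.example.org/moved/somefile")  # hypothetical
second_location = resolver.resolve(name)
print(split_identifier(name), first_location, second_location)
```

Citations and links store the name only, so a move requires one update in the resolver rather than a fix in every referring document.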

Conceptual Model

A model for preservation should accommodate different metadata views

[Figure: Descriptive View, Structural View, Technical View, Rights View, Preservation Metadata View]

Model Projects for Preservation Metadata

• Cedars (UK)
  • Developed extensive preservation metadata set
  • Evaluated all major initiatives, and influenced by RLG and Pittsburgh work
  • Using OAIS model for distributed archives
  • http://www.leeds.ac.uk/cedars/metadata.html
• National Library of Australia
  • Metadata to manage collections, objects, files
  • Desired output of a metadata system
  • http://www.nla.gov.au/preserve/pmeta.html
  • http://www.nla.gov.au/padi

Wrap Up: Questions for setting metadata requirements

How will users locate digital image objects?

How will users interact with digital image objects or collections?

What policies are necessary to protect rights and provide access controls?

How will the program assure permanence of digital materials?

Wrap Up: Best Practices for Metadata

• Well-conceived data models
  • Understand functional requirements metadata will support
  • Modularity in design (provides flexibility and extensibility)
  • Prevent data anomalies (remember DC example?)
• Well-structured metadata (machine-interpretable)
  • Express or expose metadata using standard syntax (e.g., XML)
  • Define “standard” community semantics and rules (e.g., XML Schema, DTDs)
• Anticipate need to interoperate
  • Exposure of metadata in standard syntax and semantics