SCALABLE DURABLE METADATA: A Tutorial
Grace Agnew, 1/15/2000
Georgia Institute of Technology
BUILDING A METADATA REPOSITORY
[Figure: building a metadata repository. Labels: Model; Record Structure; Repository Design; Data Element Registration; Database Population; Dissemination to Users; Data interchange (other repositories)]
MODEL: Functional Component Model for an OAIS
[Figure: OAIS functional components. Labels: Producer; SIP; Ingest; Archival Storage; Data Management; AIP; DI; Access and Dissemination; Requests; Other info; DIP; Consumer]
CCSDS 650.0-R-1: Reference Model for an Open Archival Information System (OAIS). Red Book. Issue 1. May 1999. PDF. Available at: http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
Repository Design:
• Scalable: Flexible, scalable metadata object and repository structure can serve an expanding domain.
• Standardized: Metadata object design and repository structure are shareable by other repositories within the domain and, optimally, by other domains.
• Unambiguous: Repository structure and metadata object design can be consistently interpreted and utilized by human and machine users.
• Effective: Data is well managed for persistence over space and time and is readily accessible to users at point of need.
• Integrated: Data repository integrates well with other data sources in the user information environment.
Metadata Repository: Structural Elements
Database Management System:
• File-based (hierarchies, drag and drop “Microsoft model”)
• Relational (Oracle, MySQL, MS Access)
• Object-oriented (CORBA)
Data Management:
• Import and Export (direct input; file transfer; batch import and export)
• Data validation, deletion, modification, migration mechanisms
• Scalable, accessible data storage
• Resource: Moore, Reagan, et al. “Configuring and Tuning Archival Storage Systems.” http://www.sdsc.edu/NARA/Publications/OTHER/HPSS-tuning/HPSS-tun.v3.html
Security:
•Access Control: Levels of authorization for management, input, search & retrieval, display and download
• Data Integrity
Search and Retrieval:
• XQL (“XML Query Language”)
“XML Query Language (XQL) is a notation for addressing and filtering the elements and text of XML documents. XQL is a natural extension to the XSL pattern syntax.”
From: Robie, Jonathan, Joe Lapp and David Schach. “XML Query Language: XQL.” http://www.cuesoft.com/xqlspec.htm
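To make "addressing and filtering" concrete, here is a minimal sketch. It does not use XQL itself: it substitutes the limited XPath subset in Python's standard `xml.etree.ElementTree` module as a stand-in for the same idea. The sample records and both function names are hypothetical, invented only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records, invented for illustration only.
RECORDS = """
<records>
  <record format="video"><title>Campus Tour</title><creator>Smith</creator></record>
  <record format="image"><title>Library Facade</title><creator>Jones</creator></record>
  <record format="video"><title>Lecture One</title><creator>Jones</creator></record>
</records>
"""

def titles_by_format(xml_text, fmt):
    """Address the <record> elements, filter on the format attribute,
    and return the text of each matching record's <title> child."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(f"./record[@format='{fmt}']/title")]

def titles_by_creator(xml_text, creator):
    """Filter records on the text of a child element instead of an attribute."""
    root = ET.fromstring(xml_text)
    matches = root.findall(f"./record[creator='{creator}']")
    return [record.findtext("title") for record in matches]

print(titles_by_format(RECORDS, "video"))
print(titles_by_creator(RECORDS, "Jones"))
```

The path expression addresses elements by position in the tree, and the bracketed predicate filters them, which is the same division of labour the XQL quotation describes.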
Display and Download:
Options: XML(XSL Stylesheets) documents; HTML documents; fielded flat files; ASCII files, Relational database, SAS file format, etc. Closely associated with access control
Tool: Zope: Open Source Web Application server (integration with MySQL; object-oriented databases). http://www.digicool.com
Data Exchange between Repositories:
Z39.50: ANSI/NISO Z39.50-1995 (ISO 23950):
Client/Server computer-to-computer communications protocol that specifies query and retrieval of information: bibliographic data, full-text documents; images, and multimedia in a distributed network environment, across disparate computer systems, databases and search engines.
Current version: 3
http://lcweb.loc.gov/z3950/agency/document.html
Profiles:
Profile for Access to Online Thesauri: http://lcweb.loc.gov/z3950/agency/profiles/zthes-03.html
Profile for Access to Digital Library Collections: http://lcweb.loc.gov/z3950/agency/profiles/collections.html
CIMI Profile for Museum Collections: http://lcweb.loc.gov/z3950/agency/profiles/cimi2.html
The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/BathProfileRevisedPublicDraft10Jan2000.htm
“Conformance to this profile's specifications will improve international or extranational search and retrieval among library catalogues, union catalogues, and other electronic resource discovery services worldwide.”
Data Exchange between Repositories:
Z39.50 Variations:
Z-SQL (SQL query language and generic record export in Z39.50) http://www.dstc.edu.au/Research/Projects/Z+SQL/
ZORBA (CORBA object retrieval in Z39.50)
Ward, Nigel. Michael Lawley & Sonya Finnegan. “ZORBA: Information Retrieval Using Distributed Object Technologies”
http://www.dstc.edu.au/Research/Resource_Discovery/publications/zorba_eogeo98/
Tools:
LeVan, Ralph. “Building a Z39.50 Client” OCLC Online Computer Library Center. (pdf file)
Kunze, John A. “Basic Z39.50 Server Concepts and Creation.” University of California at Berkeley. (pdf file)
List of commercial and shareware systems: http://www.cni.org/pub/NISO/docs/Z39.50-brochure/50.brochure.part09.html
Data Exchange between Repositories:
XMI: Open information interchange model for exchange of models and data over the Internet in a standardized manner.
Tool: XMI Toolkit. Available from IBM Alpha Works. 90-day cost-free testing period. http://www.alphaworks.ibm.com
Common Warehouse Metadata Interchange:
Request for Proposal issued. Submissions due 9-17-1999
OMG Document ad/98-09-02. Available from http://www.omg.org
Objectives:
Establish an industry standard specification for common warehouse metadata interchange
Provide a generic mechanism that can be used to transfer warehouse metadata
Leverage existing vendor-neutral interchange mechanisms
DATA EXCHANGE BETWEEN REPOSITORIES
BXXP Protocol: Interesting New Development!
Multiplexes several generic application channels carrying XML (or other mime-type data) on a single socket connection.
Provides for segmented data, windowed flow control, user authentication, profile negotiation and secure transport
Block Architectural Precepts - Marshall Rose and Carl Malamud
http://www.ietf.org/internet-drafts/draft-mrose-blocks-architecture-00.txt
Blocks Simple Exchange Profile (M. Rose)
http://www.ietf.org/internet-drafts/draft-mrose-blocks-exchange-00.txt
Blocks eXtensible eXchange Protocol (M. Rose)
http://www.ietf.org/internet-drafts/draft-mrose-blocks-protocol-00.txt
Models:
Standard Syntax: UML - Unified Modeling Language
Tool: Rational Rose
http://www.rational.com/products/rose/index.jtmpl
Metamodel: Provides the Conceptual Schema for a Repository
Uses standard object modeling concepts
• <meta>Classes/Entities
• <meta>Relationships/Associations
• <meta>Attributes
Conceptual Data Model: How data is structured in the real world
Logical Data Model: How data is structured and processed by a computer system
Modeling Tool: http://www.isis.vanderbilt.edu/projects/gme/meta/default.html
Users
• Primary Constraints:
• Domain (e.g. Astrophysicists)
• Organization (e.g. at the University of X)
• Application needs: (e.g. research, teaching)
• Relationships:
• Related domains
• Other information sources within the information universe
• Example: AGRIS and CARIS metadata records (FAO) are in MARC format
• Other user groups
Critical to developing a metadata system is understanding the domain, the users, and how users interact with the domain. This conceptual framework is then mapped into a model for the repository.
MODEL: X3.285 - METAMODEL FOR THE MANAGEMENT OF SHAREABLE DATA
http://pueblo.lbl.gov/~olken/X3L8/drafts/Metamodel/MetaModel_ToC.html
Data Registry: “A place to keep characteristics of data that are necessary to clearly describe, inventory, analyze and classify data. A data registry supports data sharing with cross-system and cross-organization descriptions of common units of data. A data registry allows users of shared data to have a common understanding of a unit of data’s meaning, representation and identification.”
X3.285 specifies the schema of a registry where descriptions of shareable data are stored. It defines relationships and constraints between components of the model:
• Data element (“indivisible” atomic unit of data)
• Data composite (collection of data elements treated as a unit)
• Property (distinguishes one object from another)
• Object class (set of concepts, abstractions or things in the universe that are bound or classed together)
• Representation (expression of the data element through permissible values, datatype and, as applicable, a unit of quantity)
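The five components listed above can be sketched as plain data structures. This is an illustrative reading of the slide, not the X3.285 schema itself: every class, field, and method name is an assumption, as is the rule that a data element's name is its object class followed by its property.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class ObjectClass:
    """A set of things in the universe that are classed together."""
    name: str

@dataclass(frozen=True)
class Property:
    """A characteristic that distinguishes one object from another."""
    name: str

@dataclass(frozen=True)
class Representation:
    """How a data element is expressed: datatype, permissible values, unit."""
    datatype: str
    permissible_values: Tuple[str, ...] = ()
    unit: Optional[str] = None

@dataclass(frozen=True)
class DataElement:
    """An atomic unit of data: one property of one object class, represented one way."""
    object_class: ObjectClass
    prop: Property
    representation: Representation

    @property
    def name(self) -> str:
        # Assumed naming rule: object class followed by property.
        return f"{self.object_class.name} {self.prop.name}"

    def accepts(self, value: str) -> bool:
        # An empty tuple is treated here as an unenumerated value domain.
        allowed = self.representation.permissible_values
        return not allowed or value in allowed

@dataclass
class DataComposite:
    """A collection of data elements treated as a unit."""
    name: str
    elements: List[DataElement] = field(default_factory=list)

country_code = DataElement(
    ObjectClass("Country"),
    Property("Code"),
    Representation("string", ("AFG", "BEL", "CHN")),
)
address = DataComposite("Postal Address", [country_code])
print(country_code.name, country_code.accepts("BEL"))
```

Keeping the representation separate from the object class and property is what lets the same concept be registered once and expressed in more than one way.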
Domain Model: National Health Information Knowledgebase (Australia)
From: Australian Institute of Health and Welfare (AIHW):
http://www.aihw.gov.au/services/health/nhik.html
Information Model Subtypes
Person characteristic:
• Accommodation characteristic
• Demographic characteristic
• Education characteristic
• Insurance / benefit characteristic
• Labour characteristic
• Legal characteristic
• Lifestyle characteristic
• Parenting characteristic
• Social characteristic
• Cultural characteristic
• Other person characteristic
• Physical characteristic
Accommodation characteristic: The living arrangements of a PERSON. For example, the type of dwelling, age of dwelling, number of bedrooms, modification of dwelling to account for restricted movement etc. In the National Health Information Model, ACCOMMODATION / HOUSING CHARACTERISTIC may relate to where a PERSON usually resides or it may be of interest at an instance in time - for example while a PERSON is in receipt of care.
Metadata: Definition and Rationale
“Data or information which help us perform one or more of the following functions with respect to data and information resources:
Finding
Interpreting/evaluating
Accessing
Analyzing
Managing
Preserving”
Boyko, Ernie. “Statistical Metadata: A User Perspective.” Open Forum on Metadata Registries. January 20, 2000.
Two Metadata Development Trends
Development of a metadata schema to precisely and unambiguously describe an information object, generally in a one-to-one metadata record to information object relationship.
Intrinsic: Incorporated within information object
Extrinsic: Located in a separate metadatabase with fielded link to information object
Objectives for Standardized Metadata Schema:
Provide description and management data at the information resource level.
Provide standardized metadata record formats to facilitate data storage, indexing, querying, retrieval, exchange and display.
Major domain-neutral Standards:
MARC (MAchine-Readable Cataloging), primarily for library-based information resources
http://lcweb.loc.gov/marc
Dublin Core: 15 standard elements with approved qualifiers, primarily for web-based resources
http://purl.org/dc/
Content        Intellectual Property    Instantiation
Title          Creator                  Date
Subject        Publisher                Type
Description    Contributor              Format
Source         Rights                   Identifier
Language
Relation
Coverage
Drenth, B.D., et al. 1999. Guide to Best Practice: Dublin Core. Consortium for the Computer Interchange of Museum Information http://www.cimi.org/documents/meta_bestprac_final_ann.html
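The three-way grouping of the 15 Dublin Core elements can be sketched as a lookup table that checks a record's field names and sorts them into groups. The helper names and the sample record are hypothetical, and a plain dictionary stands in for whatever record format a repository actually uses.

```python
# The 15 elements, grouped as on the slide.
DC_GROUPS = {
    "Content": ["Title", "Subject", "Description", "Source",
                "Language", "Relation", "Coverage"],
    "Intellectual Property": ["Creator", "Publisher", "Contributor", "Rights"],
    "Instantiation": ["Date", "Type", "Format", "Identifier"],
}
DC_ELEMENTS = {name for names in DC_GROUPS.values() for name in names}

def unknown_fields(record):
    """Return the field names in a record that are not Dublin Core elements."""
    return sorted(set(record) - DC_ELEMENTS)

def group_record(record):
    """Arrange a record's Dublin Core fields under the three groups."""
    grouped = {}
    for group, names in DC_GROUPS.items():
        present = {name: record[name] for name in names if name in record}
        if present:
            grouped[group] = present
    return grouped

# Hypothetical record; "Colour" is deliberately not a Dublin Core element.
sample = {"Title": "Metadata Overview", "Creator": "Grace Agnew", "Colour": "red"}
print(unknown_fields(sample))
print(group_record(sample))
```

A check like `unknown_fields` is one simple way to find locally added fields that still need to be documented and registered.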
RDF - Resource Description Framework. Framing system, providing transparent transport for the metadata schemas defined and utilized within its wrapper. XML schema that is both human and machine interpretable. http://www.w3.org/TR/PR-rdf-syntax/
Key Concepts:
Resource: Any object uniquely identifiable by a URI (uniform resource identifier)
Property-type: Property associated with a resource.
Value: Associated with a property type. May be atomic (a string) or another resource, creating a new hierarchy.
RDF
Property types express the relationships of values associated with resources.
“Famous Example”: The Author of “Metadata Overview” is Grace Agnew
Resource: Metadata Overview (http://www…….edu/meta)
Property Type: Author
Value: “Grace Agnew”
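The resource / property type / value pattern can be sketched without any RDF library as a list of three-part statements. The slide elides the real address of the resource, so the URI below is a placeholder under example.org, and the type and function names are likewise assumptions.

```python
from typing import List, NamedTuple

class Statement(NamedTuple):
    resource: str        # URI identifying the thing being described
    property_type: str   # the named relationship
    value: str           # a literal string, or the URI of another resource

# Placeholder URI: the real address is not given in full on the slide.
DOC = "http://example.org/meta"

STATEMENTS = [
    Statement(DOC, "Title", "Metadata Overview"),
    Statement(DOC, "Author", "Grace Agnew"),
]

def values_of(statements: List[Statement], resource: str, property_type: str) -> List[str]:
    """Return every value attached to a resource through one property type."""
    return [
        s.value
        for s in statements
        if s.resource == resource and s.property_type == property_type
    ]

print(values_of(STATEMENTS, DOC, "Author"))
```

Because a value may itself be a URI, the same list can describe that second resource with further statements, which is how the hierarchy mentioned under Key Concepts arises.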
Tools:
CORC: OCLC Cooperative Online Resource Catalog Project. http://purl.oclc.org/corc Information entered in a template is cross-cataloged in MARC, Dublin Core and RDF/Dublin Core. Membership to libraries of any description at no charge through July 1, 2000. Currently available for search and display to non-members. Use for MARC, Dublin Core and DC/RDF examples
DC.dot Generates records in Dublin Core and RDF/Dublin Core:
http://www.ukoln.ac.uk/metadata/dcdot/
Tools (cont’d):
Reggie http://metadata.net
Generates records in Dublin Core and RDF/Dublin Core. Provides a template for establishing a metadata registry.
MetaWeb: Provides software for establishing a gateway to search distributed Dublin Core. http://www.dstc.edu.au/RDU/MetaWeb/broker/search.html
Crosswalks between Formats:
http://www.ukoln.ac.uk/metadata/interoperability/
Data Element Registration
ISO/IEC 11179 - Specification and Standardization of Data Elements
Establishes concise, unambiguous definitions and context for atomic data elements, as well as the structure and format for the values that represent the data element, for sharing data, primarily in large datasets or technical reports.
ISO 11179
Six Parts:
11179-1 Framework for the Specification and Standardization of Data Elements
11179-2 Classification for Data Elements
11179-3 Basic Attributes of Data Elements
11179-4 Rules and Guidelines for the Formulation of Data Definitions
11179-5 Naming and Identification Principles for Data Elements
11179-6 Registration of Data Elements
Metadata Registry - ISO/IEC 11179
Data elements within the described dataset are registered in an ISO 11179-compliant registry to:
• standardize representation of the data element to enable shareability and durability (reuse) of data
• establish context and meaning for intelligent retrieval and interpretation of data
A data element is the equivalent of an attribute in a data or object model: the representation of a single property of a class of objects in the natural world.
Draft Standards: http://pueblo.lbl.gov/~olken/X3L8/drafts/draft.docs.html
Class: Employee
Class attributes: Name; Identification Number; Address
Data elements: Employee Name; Employee ID No.; Employee Address
From Framework for the Specification and Standardization of Data Elements (draft) p. B-3
Formal Definition: “A unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes.”
Principles in the Application of 11179
• Each Data Element receives a unique, unintelligent number to create a reusable, international data element
• Data elements are derived from understanding the domain data content and breaking it into meaningful atomic elements.
•Metadata registries consist of the data element and its attributes, which provide definition, meaning and precision of application
•Metadata registries are populated in two ways:
• Bottom up: Begin with the data element and its attributes
• Top down: Develop a classification hierarchy and populate with data elements
Major Data Element Attributes - My Effort at Interpretation of the Standard
Name: SMPTE_Time_and_Control_Code
Definition: Time and control code for tracking playback of film, audio, and video, established by the Society of Motion Picture and Television Engineers
Permissible value (example or formulation principle): hh:mm:ss;s
Value Domain (set of permissible values, enumerated or unenumerated): SMPTE 12M-1995
Type name: Determinant
Data Type: Numeric
Format: hh:mm:ss;s where hh = hour (00-24), mm = minute (00-59), ss = second (00-59) and s = scene (0-N)
Maximum character: unknown
Minimum character: unknown
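A registered format only pays off if values are checked against it. The sketch below validates a string against the format line exactly as written on this slide (hh:mm:ss;s with the stated ranges). It is not an implementation of SMPTE 12M itself, and the function name is hypothetical.

```python
import re

# hh:mm:ss;s as described on the slide: three two-digit fields, then a
# semicolon and a non-negative integer of any length (the "0-N" part).
_TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2});(\d+)")

def parse_timecode(text):
    """Return (hh, mm, ss, s) as integers, or raise ValueError if the
    text does not follow the registered format."""
    match = _TIMECODE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in hh:mm:ss;s form: {text!r}")
    hh, mm, ss, s = (int(group) for group in match.groups())
    # Ranges taken from the Format attribute: hh 00-24, mm 00-59, ss 00-59.
    if hh > 24 or mm > 59 or ss > 59:
        raise ValueError(f"field out of range: {text!r}")
    return hh, mm, ss, s

print(parse_timecode("19:31:57;1"))
```

Writing the check also exposes what the registry entry leaves open, such as the unknown minimum and maximum character counts.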
Data Element ID: Unintelligent number identifying a reusable data element. May include version number. Will be combined with the registration authority number (and version, if not already included) for a composite ID number
54367
Version: Used to identify modification to the data element
1
Context: Designation or description of the application environment in which the name is applied or from which it originates
General: Time and control code used for tracking and editing audio, video and film media
Registry: Dublin Core Coverage metadata elements for audio, video and film DTDs: DC:Coverage.t.min DC:Coverage.t.max
Example:
<DC:Coverage.t.min DC.Scheme="SMPTE">19:31:57;1</DC:Coverage.t.min>
<DC:Coverage.t.max DC.Scheme="SMPTE">19:32:07;7</DC:Coverage.t.max>
Data Element Concept: A concept represented by a data element, independent of any particular representation. Shared perception between two or more parties
Audiovisual media time and control code
Conceptual Domain: A set of possible valid values of a data element concept expressed without representation
time and control code for film, audio and video expressed in hours, minutes, seconds and subseconds.
Classification Level: Taxonomic location within the context of the registry
Structural Metadata. Audio and Video File Component Identification
Level of Ambiguity: Precision of Data Element Attribution
Generalization
Registration Status: incomplete, recorded, certified and registered.
Recommendation: work through the registry in many iterations. Do not move to “certified” or “registered” until taxonomy of registry is largely populated and data element has proven its durability and functionality through comment, review and use.
Incomplete, because some elements are missing and because I don’t know the max and min numbers of characters for representation!
Administrative Status: Designation of position of the data element in the registration process.
Awaiting Information
Caveat: The above example is intended to illustrate the decision-making process for a first iteration of a data element and not to serve as a model.
Principles in the Application of 11179
Rigorous Registration Process encourages multiple iterations of data elements.
Data element statuses: incomplete, recorded, certified and registered
BENEFITS OF DATA ELEMENT APPROACH:
• unambiguous, shareable data that can be evaluated, analyzed and utilized on its intrinsic merits. Any kind of data can be described, including time series data measurements
•Precise value attributes result in population of data sets with authoritative, highly usable data.
• Versioning allows data analysts to track changes in naming, definition, etc. for accurate time series analyses
DRAWBACKS:
Context and description at the data set or information object level is lacking. Searching at the data element level does not provide sufficient description and meaning for document retrieval. Ex: data element “species” as used in “Registration of Endangered Species” or “Catalog of Species of Northern Michigan”
ISO 11179 Implementation - Environmental Data Registry
Registry Name: Biological Class Name
Definition: The systematic name that represents the biological Class.
Example: Mammalia
Identifier: 20733
Version: 1
Administrative Status: Interim
Registration Status: Standard
Representation Class: Name
Unit of Measure:
Precision:
Submitting Organization: OIRM
Origin Description: Summary Report of Data Standards for Biological Taxonomy (Document)
Note Description: A Class is a major subdivision of a phylum or division, usually consisting of several orders.
Unresolved Issues:
DISA:
Create Date: 11/05/98
Change Date: 05/26/99
Value Domain Information
Definition: All names that represent the portion of a systematic name that is the biological Class.
Type Name: Determinant
Datatype: Alphanumeric
Format: A(50)
Determinant Type:
Minimum Character: 5
Maximum Character: 50
From: United States Environmental Protection Agency (EPA), Environmental Data Registry
http://www.epa.gov/edr
ISO 11179 Metadata Registry: Implementations
[Figure: ISO 11179 registry model for country codes. Relationship labels: represented_by (1..1); specifies (1..1); represents; represented_with (1..1)]
Data Element Concept: Country (atomic object)
Conceptual Domain: Country Code Domain. Identifier: Afghanistan, Belgium, China, ...
Data Element: Country represented with ISO 3166
Value Domains:
• ISO 3166. Format: Alpha-3. Item: AFG, BEL, CHN, ...
• ISO 3166. Format: Number. Item: 004, 056, 156, ...
From: CBOP Consortium, Hajime Horiuchi
http://www.cbop.gr.jp
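The point of separating a conceptual domain from its value domains is that one concept can carry several interchangeable representations. A minimal sketch using only the three countries shown in the figure follows; the dictionary layout and the `convert` helper are assumptions made for illustration.

```python
# Two value domains for the same conceptual domain (countries), limited to
# the three entries shown in the figure.
VALUE_DOMAINS = {
    "alpha-3": {"Afghanistan": "AFG", "Belgium": "BEL", "China": "CHN"},
    "number": {"Afghanistan": "004", "Belgium": "056", "China": "156"},
}

def convert(code, source, target):
    """Translate a code between value domains by going through the shared
    concept (the country) rather than mapping code to code directly."""
    concept_for_code = {c: country for country, c in VALUE_DOMAINS[source].items()}
    country = concept_for_code[code]        # raises KeyError for unknown codes
    return VALUE_DOMAINS[target][country]

print(convert("BEL", "alpha-3", "number"))
```

Routing every conversion through the concept means a new value domain needs only one new table, not a mapping to each existing domain.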
ISO 11179 Metadata Registry: Implementations
Traffic Management Data Dictionary
Section 3 Data Elements
Version: 1.4 February 5, 1999
Annex 3 - Traffic Modeling
Descriptive Name: PREDICTED_HovLaneVehicleCount_quantity
Descriptive Name Context: Manage Traffic
Definition: Predicted number of vehicles within a user-specified time period that legitimately are using High Occupancy Vehicle (HOV) lanes in the road and highway network.
Class Name: Traffic Modeling
Classification Scheme Name: IEEE P1489, Annex B
Classification Scheme Version: 19980706, V0.1.0
Keywords: HOV Lane Vehicle Count
Related Data Concept:
Relationship Type:
ASN1 Name: Predicted-HOV-lane-vehicle-count
ASN1 Data Type: Integer
Representation Class Term: Quantity
Value Domain: SI 10-1997; vehicles
Valid Value Range:
Valid Value List:
Valid Value Rule:
Valid Value Range: VALUE (0 to 100000)
Internal Representation Layout: 9999999999
Internal Layout Maximum Size:
Internal Layout Minimum Size:
Remarks: V1.1 - New data element.
Data Concept Identifier: 3550
Data Concept Version: V1.5
Submitter Organization Name: TMDD
Last Change Date: 19990205
Joint Effort:
Institute of Transportation Engineers (ITE): Federal Highway Administration (FHWA) and the American Association of State Highway and Transportation Officials (AASHTO)
http://www.ite.org/tmdd/
ISO 11179 Variations: NASA Entity Dictionary Specification Language (DEDSL)
Source: Lou Reich. NASA/CCSDS
BEGIN_GROUP = MODULE_IDENTIFICATION ;
  DEDSL_VERSION = 0.1 ;
  MODULE_TITLE = "Global Change Master Directory dictionary" ;
  MODULE_ADID = Not yet registered ;
END_GROUP = MODULE_IDENTIFICATION ;
BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Entry_ID ;
  MEANING = Unique identifier of the DIF ;
  SHORT_MEANING = Directory Entry Identifier ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;
BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Entry_Title ;
  MEANING = Title of the DIF ;
  SHORT_MEANING = Directory Entry Title ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;
BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Data_Set_Citation.Publication_Place ;
  MEANING = "The name of the city (and state or province and country if needed) where the data set was published or released." ;
  SHORT_MEANING = "Place where the data set was published or released." ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;
BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Data_Set_Citation.URL ;
  MEANING = "The Internet Uniform Resource Locator(s) (URL) of the data set." ;
  SHORT_MEANING = "URL of the data set." ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;
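A listing in this keyword = value ; style is simple enough to read mechanically. The sketch below handles only what is visible in the excerpt (flat groups, values that are either bare or double-quoted, no semicolons inside values). It is not a full parser for the specification language, and the function name and the `_group` key are assumptions.

```python
import re

# One statement: KEY = value ; where value is a quoted string or bare text.
_STATEMENT = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^;"]*?)\s*;')

def parse_groups(text):
    """Return one dict per BEGIN_GROUP ... END_GROUP block, in order.
    The group's type is stored under the made-up key '_group'."""
    groups = []
    current = None
    for key, raw in _STATEMENT.findall(text):
        value = raw.strip().strip('"')
        if key == "BEGIN_GROUP":
            current = {"_group": value}
        elif key == "END_GROUP":
            if current is not None:
                groups.append(current)
            current = None
        elif current is not None:
            current[key] = value
    return groups

SAMPLE = '''
BEGIN_GROUP = MODULE_IDENTIFICATION ;
  DEDSL_VERSION = 0.1 ;
  MODULE_TITLE = "Global Change Master Directory dictionary" ;
END_GROUP = MODULE_IDENTIFICATION ;
BEGIN_GROUP = ENTITY_DEFINITION ; NAME = Entry_ID ; MEANING = Unique identifier of the DIF ; VALUE_SYNTAX = STRING;END_GROUP = ENTITY_DEFINITION ;
'''

for group in parse_groups(SAMPLE):
    print(group)
```

The second group in the sample is deliberately left on one line, as in the slide text, to show that the statement pattern does not depend on line breaks.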
Elaboration in XML
<data_element>
<name>Film_Video_Audio_Time_and_Control_Code</name>
<definition>Time and control code for tracking playback of film, audio, and video</definition>
<status>
<registration>Incomplete</registration>
<administrative>Awaiting review</administrative>
</status>
…
</data_element>
Based on: J. McCarthy, et al. “Using XML for Environmental Data Sharing.” Open Forum on Metadata Registries. 1/20/2000
Metadata Schema Approach
Three Types of Metadata (Digital Library Federation Architecture Committee)
Descriptive: Discovery and Identification of an Object (Dublin Core, MARC, EAD, etc.)
Structural: Used to Display and Navigate an Object. Provide information on internal organization of an object
Administrative: Management information. Date created, modified, etc. Content file format (e.g. JPEG); rights information, etc.
OAIS Reference Model:
Content Information: The data object and its representation that makes it understandable to the user ( DLF: Structural)
Preservation Description: Provenance, Context, Reference and Fixity (DLF: Structural and Administrative)
Descriptive Information: (DLF: Descriptive Information)
Descriptive:
Recommendations: In most cases, use standards-based Dublin Core as the base record - for interoperability.
Add fields to serve your domain user group as needed. Document and register any added fields. Create an XML DTD
Distribute metadata creation responsibilities:
Administrative and Structural : Largely provided by content digitizers
Descriptive: Largely provided by domain specialists
Recommendation: Use thesauri for controlled subject terminology.
Tool: Koch, Traugott, comp. Controlled vocabularies, thesauri and classification systems available in the WWW. DC Subject. http://www.lub.lu.se/metadata/subject-help.html
Structural:
Identification.
URN (Uniform Resource Names) - intended to persist
IETF RFC 1737
URL: “de facto” web naming and addressing standard
PURL: Persistent URL. Involves intermediate resolution by a third party.
Handles: Developed by CNRI. URN proposal emphasizes persistent names. Names maintained by object publisher or author. The handle server reconciles permanent name with address changes. http://www.handle.net/
See also: Library of Congress. National Digital Library Program. “The Relationship between URNs, Handles and PURLs.” http://lcweb2.loc.gov/ammem/award/docs/PURL-handle.html
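What PURLs and handles share is a level of indirection: the metadata record stores a permanent name, and a resolver maps that name to wherever the object currently lives. The toy in-memory resolver below illustrates only that idea. It is not the Handle System or PURL API, and the class, method names, and sample name and addresses are all hypothetical.

```python
class NameResolver:
    """Toy resolver: permanent names map to current addresses."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Assign a permanent name once; reusing a name is an error."""
        if name in self._locations:
            raise ValueError(f"name already registered: {name}")
        self._locations[name] = url

    def relocate(self, name, new_url):
        """Record that the object moved; the name itself never changes."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current address for a permanent name."""
        return self._locations[name]

resolver = NameResolver()
resolver.register("example/video-001", "http://old-server.example.org/v/1")
resolver.relocate("example/video-001", "http://new-server.example.org/media/1")
print(resolver.resolve("example/video-001"))
```

Metadata records that cite the name rather than the address stay valid after the move, because only the resolver's table has to be updated.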
Administrative Metadata
Issues:
Digital Persistence:
• technology emulation (“recreate the technology needed to open and display”)
• migration path/backward compatibility (“standards backward compatible 1 or more versions to allow migration of data”)
• interpolation (“technology interpolates to retrieve or enhance obsolete data”)
Managing Data for Digital Persistence:
Maintain information needed to create, retrieve and display each digital object.
include platform, processor; version info for software and OS
digital creation hardware and software
digital editing hardware and software
digital viewing hardware and software
calibration hardware and software
Visual Media:
Color: color space (RGB, CMYK); color look up table; color profile for digital camera or scanner; color chart used for calibration.
Compression:
Images: pixel array (ex: 2,000 x 3,000 pixels)
bit depth (8-bit, 16-bit, 24-bit, etc.)
Video and Audio:
If at all possible, save master file in uncompressed format:
e.g. IEEE (Institute of Electrical & Electronics Engineers)
CCIR 601 (Broadcast Digital Video)
NTSC -- 720 x 480
PAL -- 720 x 576
10-bit or 8-bit
For MPEG1, 2 and 4 include level of service; frame rate (fps); frame size; bit depth. Consider IPB ratio.
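For still images, the two recorded parameters (pixel array and bit depth) are enough to work out the uncompressed size of a master file, which is useful when planning storage. A small worked example follows; the function name is hypothetical and the calculation ignores file headers and any embedded metadata.

```python
def uncompressed_bytes(width_px, height_px, bit_depth):
    """Size of the raw pixel data: width x height x bits per pixel,
    rounded up to whole bytes."""
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8

# The 2,000 x 3,000 pixel array from the slide at 24 bits per pixel.
size = uncompressed_bytes(2000, 3000, 24)
print(size, "bytes")
```

At 24 bits per pixel the 2,000 x 3,000 example comes to 18,000,000 bytes, three times the 6,000,000 bytes needed at 8 bits, so bit depth matters as much as pixel dimensions when estimating storage for uncompressed masters.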
Rights Management
Management Restrictions
Management Conditions
Access Restrictions
Access Conditions
Use Restrictions
Use Conditions
Define rights management in a format that maps to future use of a resolver (e.g. resolve to an address with copyright and use information as opposed to embedded use and access text)
DOI - Digital Object Identifier. Development of the commercial publishing domain. Uses handles technology to resolve access and use (and to support ecommerce applications). http://www.doi.org
Recommendation: Re-Use metadata elements developed by respected Early Adopters:
Making of America II White Paper: http://sunsite.berkeley.edu/moa2/wp-v2.html
MOAII Document Type Definition: http://sunsite.berkeley.edu/MOA2/papers/DTD.html
National Library of Australia: PANDORA (Preserving and Accessing Networked Documentary Resources of Australia) http://www.nla.gov.au/pandora
Library of Congress - Structural Metadata Dictionary for LC Digital Objects http://lcweb.loc.gov:8081/ndlint/repository/attdefs.html
UKOLN (UK Office for Library and Information Networking): http://www.ukoln.ac.uk/metadata/cld/
Rights Metadata:
University of Pittsburgh. School of Information Sciences. “Functional Requirements for Evidence in Recordkeeping” http://www.lis.pitt.edu/~nhprc
Video Metadata:
Hunter, Jane and Liz Armstrong. A Comparison of Schemas for Video Metadata Representation
http://www8.org/w8-papers/3c-hypermedia-video/comparison/comparison.html
Hunter, Jane and Jan Newmarch. “An Indexing, Browsing, Search and Retrieval System for Audiovisual Libraries.” http://link.springer.de/link/service/series/0558/bibs/1696/16960076.htm
Administrative Metadata:
A-Core: IETF Draft Standard for metadata about descriptive metadata (documenting provenance, etc.). Iannella & Campbell. http://metadata.net/admin/draft-iannella-admin-01.txt
Recommendations:
* Create XML DTD for metadata records by format
* Use Dublin Core with approved qualifiers as the base record
* Document metadata elements in a metadata registry
* Use RDF as the export wrapper (“report format for a relational database”)
AlphaWorks Tools can assist:
DDbE: accepts well-formed XML documents and constructs a DTD
XMI Toolkit: generate DTDs and share Java objects
XML Parser for Java: validating parser
XML Generator: generates instances of valid XML from a DTD
http://www.alphaworks.ibm.com
General References
Metadata Resources. http://dewey.yonsei.ac.kr/metadata/links.htm
IFLANET. Digital Libraries: Metadata Resources. http://www.ifla.org/II/metadata.htm
UK Office of Library Networking. Metadata for Preservation: CEDARS Project Document AIW01. http://www.ukoln.ac.uk/metadata/cedars/AIW01.html
National Library of Australia. PADI: Preserving Access to Digital Information. http://www.nla.gov.au/padi/
National Archives of Australia. Designing and Implementing Recordkeeping Systems. http://www.naa.gov.au/Govserv/techpub/DIRKSman/dirks.html