Grace Agnew 1/15/2000 Georgia Institute of Technology SCALABLE DURABLE METADATA: **A Tutorial**



BUILDING A METADATA REPOSITORY

[Diagram labels: Model; Record Structure; Repository Design; Data Element Registration; Database Population; Dissemination to Users; Data Interchange (other repositories)]


MODEL: Functional Component Model for an OAIS

[Diagram labels: Producer; SIP; Ingest; AIP; DI; Archival Storage; Data Management; Access and Dissemination; Requests; Other info; DIP; Consumer]

CCSDS 650.0-R-1: Reference Model for an Open Archival Information System (OAIS). Red Book. Issue 1. May 1999. PDF. Available at: http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html


Repository Design:

• Scalable: Flexible, scalable metadata object and repository structure can serve an expanding domain.

• Standardized: Metadata object design and repository structure are shareable by other repositories within the domain and, optimally, by other domains.

• Unambiguous: Repository structure and metadata object design can be consistently interpreted and utilized by human and machine users.

• Effective: Data is well-managed for persistence over space and time and is readily accessible to users at point of need.

• Integrated: Data repository integrates well with other data sources in the user information environment.


Metadata Repository: Structural Elements

Database Management System:

• File-based (hierarchies, drag and drop “Microsoft model”)

• Relational (Oracle, MySQL, MS Access)

• Object-oriented (CORBA)

Data Management:

• Import and Export (Direct input; file transfer; batch import and export)

• Data validation, deletion, modification, migration mechanisms

• Scalable, accessible data storage

• Resource: Moore, Reagan, et al. “Configuring and Tuning Archival Storage Systems” http://www.sdsc.edu/NARA/Publications/OTHER/HPSS-tuning/HPSS-tun.v3.html


Security:

• Access Control: Levels of authorization for management, input, search & retrieval, display and download

• Data Integrity

Search and Retrieval:

• XQL (“XML Query Language”)

“XML Query Language (XQL) is a notation for addressing and filtering the elements and text of XML documents. XQL is a natural extension to the XSL pattern syntax.” From: Robie, Jonathan, Joe Lapp and David Schach. “XML Query Language: XQL.” http://www.cuesoft.com/xqlspec.htm
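To make "addressing and filtering the elements and text of XML documents" concrete, here is a minimal sketch in Python. It does not use XQL itself: it assumes the limited XPath subset of the standard library's `xml.etree.ElementTree` as a stand-in for the same path-and-filter idea. The catalog structure, the element names and the second record are hypothetical sample data, not taken from any real repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample catalog; element names are assumptions for illustration.
CATALOG = """
<catalog>
  <record>
    <title>Metadata Overview</title>
    <creator>Grace Agnew</creator>
  </record>
  <record>
    <title>Sample Image Collection</title>
    <creator>A. N. Other</creator>
  </record>
</catalog>
"""

def titles_by_creator(root, creator):
    """Address <title> elements, filtered by the text of a sibling <creator>."""
    # "record[creator='...']" keeps only records whose <creator> child has that text.
    path = f"./record[creator='{creator}']/title"
    return [title.text for title in root.findall(path)]

root = ET.fromstring(CATALOG)
matching_titles = titles_by_creator(root, "Grace Agnew")
all_titles = [title.text for title in root.findall("./record/title")]
```

The path addresses elements by position in the hierarchy, and the bracketed predicate filters them by content. A creator name containing a quote character would need escaping before being placed in the path.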

Display and Download:

Options: XML documents (with XSL stylesheets); HTML documents; fielded flat files; ASCII files; relational database; SAS file format; etc. Closely associated with access control.

Tool: Zope: Open Source Web Application server (integration with MySQL; object-oriented databases). http://www.digicool.com


Data Exchange between Repositories:

Z39.50: ANSI/NISO Z39.50-1995 (ISO 23950):

Client/Server computer-to-computer communications protocol that specifies query and retrieval of information: bibliographic data, full-text documents; images, and multimedia in a distributed network environment, across disparate computer systems, databases and search engines.

Current version: 3

http://lcweb.loc.gov/z3950/agency/document.html

Profiles:

Profile for Access to Online Thesauri: http://lcweb.loc.gov/z3950/agency/profiles/zthes-03.html

Profile for Access to Digital Library Collections: http://lcweb.loc.gov/z3950/agency/profiles/collections.html

CIMI Profile for Museum Collections: http://lcweb.loc.gov/z3950/agency/profiles/cimi2.html

The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/BathProfileRevisedPublicDraft10Jan2000.htm

“Conformance to this profile's specifications will improve international or extranational search and retrieval among library catalogues, union catalogues, and other electronic resource discovery services worldwide.”


Data Exchange between Repositories:

Z39.50 Variations:

Z-SQL (SQL query language and generic record export in Z39.50) http://www.dstc.edu.au/Research/Projects/Z+SQL/

ZORBA (CORBA object retrieval in Z39.50)

Ward, Nigel, Michael Lawley & Sonya Finnegan. “ZORBA: Information Retrieval Using Distributed Object Technologies” http://www.dstc.edu.au/Research/Resource_Discovery/publications/zorba_eogeo98/

Tools:

LeVan, Ralph. “Building a Z39.50 Client” OCLC Online Computer Library Center. (pdf file)

Kunze, John A. “Basic Z39.50 Server Concepts and Creation.” University of California at Berkeley. (pdf file)

List of commercial and shareware systems: http://www.cni.org/pub/NISO/docs/Z39.50-brochure/50.brochure.part09.html


Data Exchange between Repositories:

XMI (XML Metadata Interchange): Open information interchange model for exchange of models and data over the Internet in a standardized manner.

Tool: XMI Toolkit. Available from IBM Alpha Works. 90-day cost-free testing period. http://www.alphaworks.ibm.com

Common Warehouse Metadata Interchange:

Request for Proposal issued. Submissions due 9-17-1999

OMG Document ad/98-09-02. Available from http://www.omg.org

Objectives:

Establish an industry standard specification for common warehouse metadata interchange

Provide a generic mechanism that can be used to transfer warehouse metadata

Leverage existing vendor-neutral interchange mechanisms


DATA EXCHANGE BETWEEN REPOSITORIES

BXXP Protocol: Interesting New Development!

Multiplexes several generic application channels carrying XML (or other MIME-type data) on a single socket connection.

Provides for segmented data, windowed flow control, user authentication, profile negotiation and secure transport
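The idea of multiplexing several channels with segmented data over one connection can be sketched as follows. This is only an illustration of the concept: the frame layout used here (a text header with channel number, sequence number and payload size) is an assumption made for the sketch and is not the BXXP wire format. Flow control, authentication and profile negotiation are left out.

```python
from collections import defaultdict

def frame(channel, seq, payload):
    """Wrap one data segment in a header: 'channel seq size' plus newline (assumed layout)."""
    header = f"{channel} {seq} {len(payload)}\n".encode("ascii")
    return header + payload

def demultiplex(stream):
    """Split a byte stream of frames back into one reassembled message per channel."""
    segments = defaultdict(list)
    pos = 0
    while pos < len(stream):
        end = stream.index(b"\n", pos)
        channel, seq, size = (int(part) for part in stream[pos:end].split())
        start = end + 1
        segments[channel].append((seq, stream[start:start + size]))
        pos = start + size
    # Reassemble each channel's segments in sequence order.
    return {
        channel: b"".join(data for _, data in sorted(parts, key=lambda p: p[0]))
        for channel, parts in segments.items()
    }

# Two channels interleaved on what would be a single socket connection.
stream = (
    frame(1, 0, b"<record>")
    + frame(2, 0, b"<query")
    + frame(1, 1, b"</record>")
    + frame(2, 1, b"/>")
)
messages = demultiplex(stream)
```

Each channel's data arrives in segments interleaved with the other channel's, and the receiver uses the channel and sequence numbers to put each message back together.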

Block Architectural Precepts - Marshall Rose and Carl Malamud

http://www.ietf.org/internet-drafts/draft-mrose-blocks-architecture-00.txt

Blocks Simple Exchange Profile (M. Rose)

http://www.ietf.org/internet-drafts/draft-mrose-blocks-exchange-00.txt

Blocks eXtensible eXchange Protocol (M. Rose)

http://www.ietf.org/internet-drafts/draft-mrose-blocks-protocol-00.txt


Models:

Standard Syntax: UML - Unified Modeling Language

Tool: Rational Rose

http://www.rational.com/products/rose/index.jtmpl

Metamodel: Provides the Conceptual Schema for a Repository

Uses standard object modeling concepts

• <meta>Classes/Entities

• <meta>Relationships/Associations

• <meta>Attributes

Conceptual Data Model: How data is structured in the real world

Logical Data Model: How data is structured and processed by a computer system

Modeling Tool: http://www.isis.vanderbilt.edu/projects/gme/meta/default.html


Users

• Primary Constraints:
  • Domain (e.g. astrophysicists)
  • Organization (e.g. at the University of X)
  • Application needs (e.g. research, teaching)

• Relationships:
  • Related domains
  • Other information sources within the information universe
    • Example: AGRIS and CARIS metadata records (FAO) are in MARC format
  • Other user groups

Critical to developing a metadata system is understanding the domain, the users, and how users interact with the domain. This conceptual framework is then mapped into a model for the repository.


MODEL: X3.285 - METAMODEL FOR THE MANAGEMENT OF SHAREABLE DATA

http://pueblo.lbl.gov/~olken/X3L8/drafts/Metamodel/MetaModel_ToC.html

Data Registry: “A place to keep characteristics of data that are necessary to clearly describe, inventory, analyze and classify data. A data registry supports data sharing with cross-system and cross-organization descriptions of common units of data. A data registry allows users of shared data to have a common understanding of a unit of data’s meaning, representation and identification.”

X3.285 specifies the schema of a registry where descriptions of shareable data are stored. Defines relationships and constraints between components of the Model.

• Data element (“indivisible” atomic unit of data)

• Data composite (collection of data elements treated as a unit)

• Property (distinguishes one object from another)

• Object class (set of concepts, abstractions or things in the universe that are bound or classed together)

• Representation (expression of the data element through permissible values, datatype and, as applicable, a unit of quantity)


Domain Model: National Health Information Knowledgebase (Australia)

From: Australian Institute of Health and Welfare (AIHW):

http://www.aihw.gov.au/services/health/nhik.html


Information Model Subtypes

Person characteristic:

• Accommodation characteristic
• Demographic characteristic
• Education characteristic
• Insurance / benefit characteristic
• Labour characteristic
• Legal characteristic
• Lifestyle characteristic
• Parenting characteristic
• Social characteristic
• Cultural characteristic
• Other person characteristic
• Physical characteristic

Accommodation characteristic: The living arrangements of a PERSON. For example, the type of dwelling, age of dwelling, number of bedrooms, modification of dwelling to account for restricted movement etc. In the National Health Information Model, ACCOMMODATION / HOUSING CHARACTERISTIC may relate to where a PERSON usually resides or it may be of interest at an instance in time - for example while a PERSON is in receipt of care.


Metadata: Definition and Rationale

“Data or information which help us perform one or more of the following functions with respect to data and information resources:

Finding

Interpreting/evaluating

Accessing

Analyzing

Managing

Preserving”

Boyko, Ernie. “Statistical Metadata: A User Perspective.” Open Forum on Metadata Registries. January 20, 2000.


Two Metadata Development Trends

Development of a metadata schema to precisely and unambiguously describe an information object, generally in a one-to-one metadata record to information object relationship.

Intrinsic: Incorporated within information object

Extrinsic: Located in a separate metadatabase with fielded link to information object

Objectives for Standardized Metadata Schema:

Provide description and management data at the information resource level.

Provide standardized metadata record formats to facilitate data storage, indexing, querying, retrieval, exchange and display.


Major domain-neutral Standards:

MARC (Machine-Readable Cataloging): primarily for library-based information resources

http://lcweb.loc.gov/marc

Dublin Core: 15 standard elements with approved qualifiers. Primarily for web-based resources

http://purl.org/dc/

Content        Intellectual Property    Instantiation
Title          Creator                  Date
Subject        Publisher                Type
Description    Contributor              Format
Source         Rights                   Identifier
Language
Relation
Coverage

Drenth, B.D., et al. 1999. Guide to Best Practice: Dublin Core. Consortium for the Computer Interchange of Museum Information http://www.cimi.org/documents/meta_bestprac_final_ann.html


RDF - Resource Description Framework. Framing system, providing transparent transport for the metadata schemas defined and utilized within its wrapper. XML schema that is both human and machine interpretable. http://www.w3.org/TR/PR-rdf-syntax/

Key Concepts:

Resource: Any object uniquely identifiable by a URI (uniform resource identifier)

Property-type: Property associated with a resource.

Value: Associated with a property type--may be atomic (a string) or another resource (creating a new hierarchy)


RDF: Property types express the relationships of values associated with resources.

“Famous Example”: The Author of “Metadata Overview” is Grace Agnew

Resource: Metadata Overview (http://www…….edu/meta)

Property Type: Author

Value: “Grace Agnew”
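The resource / property type / value statement can be sketched as a simple three-part record. This is a minimal illustration in plain Python rather than an RDF library. The slide elides the real address of the resource, so `RESOURCE_URI` below is a hypothetical placeholder, and the second statement (Title) is added only to show a second property type on the same resource.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One RDF-style statement: a resource, a property type, and a value."""
    resource: str
    property_type: str
    value: str

# Hypothetical placeholder URI; the slide does not give the full address.
RESOURCE_URI = "http://example.edu/meta"

statements = [
    Statement(RESOURCE_URI, "Author", "Grace Agnew"),
    Statement(RESOURCE_URI, "Title", "Metadata Overview"),
]

def values_of(statements, resource, property_type):
    """Return every value attached to a resource through the given property type."""
    return [
        s.value
        for s in statements
        if s.resource == resource and s.property_type == property_type
    ]

authors = values_of(statements, RESOURCE_URI, "Author")
```

Because a value may itself be another resource, the `value` field could hold a URI that appears as the `resource` of further statements, which is how the hierarchy mentioned above arises.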


Tools:

CORC: OCLC Cooperative Online Resource Catalog Project. http://purl.oclc.org/corc Information entered in a template is cross-cataloged in MARC, Dublin Core and RDF/Dublin Core. Membership to libraries of any description at no charge through July 1, 2000. Currently available for search and display to non-members. Use for MARC, Dublin Core and DC/RDF examples

DC.dot Generates records in Dublin Core and RDF/Dublin Core:

http://www.ukoln.ac.uk/metadata/dcdot/


Tools (cont’d):

Reggie http://metadata.net

Generates records in Dublin Core and RDF/Dublin Core. Provides a template for establishing a metadata registry.

MetaWeb: Provides software for establishing a gateway to search distributed Dublin Core. http://www.dstc.edu.au/RDU/MetaWeb/broker/search.html

Crosswalks between Formats:

http://www.ukoln.ac.uk/metadata/interoperability/


Data Element Registration

ISO/IEC 11179 - Specification and Standardization of Data Elements

Establishes concise, unambiguous definitions and context for atomic data elements, as well as the structure and format for the values that represent the data element, for sharing data, primarily in large datasets or technical reports.


ISO 11179

Six Parts:

11179-1 Framework for the Specification and Standardization of Data Elements

11179-2 Classification for Data Elements

11179-3 Basic Attributes of Data Elements

11179-4 Rules and Guidelines for the Formulation of Data Definitions

11179-5 Naming and Identification Principles for Data Elements

11179-6 Registration of Data Elements


Metadata Registry - ISO/IEC 11179

Data elements within the described dataset are registered in an ISO 11179-compliant registry to:

• standardize representation of the data element to enable shareability and durability (reuse) of data

• establish context and meaning for intelligent retrieval and interpretation of data

A data element is the equivalent of an attribute in a data or object model: the representation of a single property of a class of objects in the natural world.

Draft Standards: http://pueblo.lbl.gov/~olken/X3L8/drafts/draft.docs.html


Class        Class attributes         Data elements
Employee     Name                     Employee Name
             Identification Number    Employee ID No.
             Address                  Employee Address

From Framework for the Specification and Standardization of Data Elements (draft) p. B-3

Formal Definition: “A unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes.”
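The Employee example pairs a class with each of its attributes to name a data element. A minimal sketch of that derivation follows. The `DataElement` type and the `derive_data_elements` helper are hypothetical names introduced for illustration, and the naming rule (class name followed by attribute name) is a simplifying assumption, so the slide's abbreviated "Employee ID No." appears here in full.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    """A single attribute of a class, treated as a nameable unit of data."""
    object_class: str
    attribute: str

    @property
    def name(self):
        # Assumed naming rule: class name followed by attribute name.
        return f"{self.object_class} {self.attribute}"

def derive_data_elements(object_class, attributes):
    """Produce one data element per class attribute."""
    return [DataElement(object_class, attribute) for attribute in attributes]

elements = derive_data_elements(
    "Employee", ["Name", "Identification Number", "Address"]
)
names = [element.name for element in elements]
```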


Principles in the Application of 11179

• Each Data Element receives a unique, unintelligent number to create a reusable, international data element

• Data elements are derived from understanding the domain data content and breaking it into meaningful atomic elements.

• Metadata registries consist of the data element and its attributes, which provide definition, meaning and precision of application

• Metadata registries are populated in two ways:
  • Bottom up: Begin with the data element and its attributes
  • Top down: Develop a classification hierarchy and populate with data elements


Major Data Element Attributes - My Effort at Interpretation of the Standard

Name: SMPTE_Time_and_Control_Code

Definition: Time and control code for tracking playback of film, audio, and video established by the Society of Motion Picture and Television Engineers

Permissible value: example or formulation principle

hh:mm:ss;s

Value Domain: Set of permissible values (enumerated or unenumerated)

SMPTE 12M-1995

Type name: Determinant

Data Type: Numeric

Format: hh:mm:ss;s where hh = hour (00-24), mm = minute (00-59), ss = second (00-59) and s = scene (0-N)

Maximum character: unknown

Minimum character: unknown
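A registered format such as hh:mm:ss;s lends itself to machine validation. The sketch below checks a value against the format and ranges exactly as they are stated on this slide (hour 00-24, minute 00-59, second 00-59, scene 0-N). It is an illustration of validating against a registered value domain, not an implementation of the SMPTE standard itself, and the function name is hypothetical.

```python
import re

# Two digits each for hour, minute and second, then ';' and a scene number.
TIME_CODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2});(\d+)$")

def is_valid_time_code(value):
    """Check a value against the hh:mm:ss;s format and ranges given on the slide."""
    match = TIME_CODE.match(value)
    if match is None:
        return False
    hours, minutes, seconds, _scene = (int(group) for group in match.groups())
    # Upper bounds follow the slide's stated ranges; the scene number is unbounded.
    return hours <= 24 and minutes <= 59 and seconds <= 59

checks = {
    value: is_valid_time_code(value)
    for value in ["19:31:57;1", "19:32:07;7", "25:00:00;0", "19:60:00;1", "19:31:57"]
}
```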


Data Element ID: Unintelligent number identifying reusable data element. May include version number. Will be combined with the registration authority number (and version, if not already included) for a composite ID number

54367

Version: Used to identify modification to the data element

1

Context: Designation or description of the application environment in which the name is applied or from which it originates

General: Time and control code used for tracking and editing audio, video and film media

Registry: Dublin Core Coverage metadata elements for audio, video and film DTDs: DC:Coverage.t.min DC:Coverage.t.max


Example:

<DC:Coverage.t.min DC.Scheme="SMPTE">19:31:57;1</DC:Coverage.t.min>

<DC:Coverage.t.max DC.Scheme="SMPTE">19:32:07;7</DC:Coverage.t.max>

Data Element Concept: A concept represented by a data element, independent of any particular representation. Shared perception between two or more parties

Audiovisual media time and control code

Conceptual Domain: A set of possible valid values of a data element concept expressed without representation

time and control code for film, audio and video expressed in hours, minutes, seconds and subseconds.

Classification Level: Taxonomic location within the context of the registry

Structural Metadata. Audio and Video File Component Identification


Level of Ambiguity: Precision of Data Element Attribution

Generalization

Registration Status: incomplete, recorded, certified and registered.

Incomplete, because some elements are missing and because I don’t know the maximum and minimum numbers of characters for representation!

Recommendation: Work through the registry in many iterations. Do not move to “certified” or “registered” until the taxonomy of the registry is largely populated and the data element has proven its durability and functionality through comment, review and use.

Administrative Status: Designation of position of the data element in the registration process.

Awaiting Information

Caveat: The above example is intended to illustrate the decision-making process for a first iteration of a data element and not to serve as a model.


Principles in the Application of 11179

Rigorous Registration Process encourages multiple iterations of data elements.

Data element statuses: incomplete, recorded, certified and registered
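The four statuses can be treated as ordered stages that a data element moves through across iterations. The sketch below assumes a strictly linear progression, one step at a time, which is a simplification made for illustration; the standard's actual registration rules are richer than this. The `promote` helper is a hypothetical name.

```python
from enum import IntEnum

class RegistrationStatus(IntEnum):
    """The four statuses named above, in assumed order of progression."""
    INCOMPLETE = 1
    RECORDED = 2
    CERTIFIED = 3
    REGISTERED = 4

def promote(status):
    """Move a data element to the next status; assumes one step per review iteration."""
    if status is RegistrationStatus.REGISTERED:
        raise ValueError("data element is already registered")
    return RegistrationStatus(status + 1)

# A data element passing through three review iterations.
history = [RegistrationStatus.INCOMPLETE]
while history[-1] is not RegistrationStatus.REGISTERED:
    history.append(promote(history[-1]))
```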

BENEFITS OF DATA ELEMENT APPROACH:

• Unambiguous, shareable data that can be evaluated, analyzed and utilized on its intrinsic merits. Any kind of data can be described, including time series data measurements.

• Precise value attributes result in population of data sets with authoritative, highly usable data.

• Versioning allows data analysts to track changes in naming, definition, etc. for accurate time series analyses.

DRAWBACKS:

Context and description at the data set or information object level is lacking. Searching at the data element level does not provide sufficient description and meaning for document retrieval. Ex: data element “species” as used in “Registration of Endangered Species” or “Catalog of Species of Northern Michigan”


ISO 11179 Implementation - Environmental Data Registry

Registry Name: Biological Class Name
Definition: The systematic name that represents the biological Class.
Example: Mammalia
Identifier: 20733
Version: 1
Administrative Status: Interim
Registration Status: Standard
Representation Class: Name
Unit of Measure:
Precision:
Submitting Organization: OIRM
Origin Description: Summary Report of Data Standards for Biological Taxonomy (Document)
Note Description: A Class is a major subdivision of a phylum or division, usually consisting of several orders.
Unresolved Issues:
DISA:
Create Date: 11/05/98
Change Date: 05/26/99

Value Domain Information

Definition: All names that represent the portion of a systematic name that is the biological Class.
Type Name: Determinant
Datatype: Alphanumeric
Format: A(50)
Determinant Type:
Minimum Character: 5
Maximum Character: 50

From: United States Environmental Protection Agency (EPA) Environmental Data Registry

http://www.epa.gov/edr


ISO 11179 Metadata Registry: Implementations

[Diagram: Atomic Object: Country. Data Element Concept: Country. Conceptual Domain: Country Code Domain (identifiers: Afghanistan, Belgium, China, ...). Value Domains: ISO 3166, Format: Alpha-3 (items: AFG, BEL, CHN, ...) and ISO 3166, Format: Number (items: 004, 056, 156, ...). Data Element: Country Represented with ISO 3166. Relationship labels: represented_by (1..1), specifies (1..1), represents, represented_with (1..1).]

From: CBOP Consortium, Hajime Horiuchi

http://www.cbop.gr.jp
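The relationship in this diagram, one conceptual domain expressed through more than one value domain, can be sketched with a pair of lookup tables. The three countries and their codes are those shown in the diagram. The dictionary layout and the function names are assumptions made for illustration.

```python
# One conceptual domain (countries) expressed through two value domains.
VALUE_DOMAINS = {
    "alpha-3": {"Afghanistan": "AFG", "Belgium": "BEL", "China": "CHN"},
    "number": {"Afghanistan": "004", "Belgium": "056", "China": "156"},
}

def represent(country, value_domain):
    """Express a member of the conceptual domain in the chosen value domain."""
    return VALUE_DOMAINS[value_domain][country]

def convert(code, source, target):
    """Translate a code between value domains by way of the shared concept."""
    concept_by_code = {c: country for country, c in VALUE_DOMAINS[source].items()}
    return represent(concept_by_code[code], target)

belgium_alpha = represent("Belgium", "alpha-3")
china_number = convert("CHN", "alpha-3", "number")
```

Conversion works because both value domains are tied to the same conceptual domain, so a code in one representation can always be traced back to the concept and re-expressed in the other.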


ISO 11179 Metadata Registry: Implementations

Traffic Management Data Dictionary
Section 3 Data Elements
Version: 1.4 February 5, 1999
Annex 3 - Traffic Modeling

Descriptive Name: PREDICTED_HovLaneVehicleCount_quantity
Descriptive Name Context: Manage Traffic
Definition: Predicted number of vehicles within a user-specified time period that legitimately are using High Occupancy Vehicle (HOV) lanes in the road and highway network.
Class Name: Traffic Modeling
Classification Scheme Name: IEEE P1489, Annex B
Classification Scheme Version: 19980706, V0.1.0
Keywords: HOV Lane Vehicle Count
Related Data Concept:
Relationship Type:
ASN1 Name: Predicted-HOV-lane-vehicle-count
ASN1 Data Type: Integer
Representation Class Term: Quantity
Value Domain: SI 10-1997; vehicles
Valid Value Range:
Valid Value List:
Valid Value Rule:
Valid Value Range: VALUE (0 to 100000)
Internal Representation Layout: 9999999999
Internal Layout Maximum Size:
Internal Layout Minimum Size:
Remarks: V1.1 - New data element.
Data Concept Identifier: 3550
Data Concept Version: V1.5
Submitter Organization Name: TMDD
Last Change Date: 19990205

Joint Effort:

Institute of Transportation Engineers (ITE), Federal Highway Administration (FHWA) and the American Association of State Highway and Transportation Officials (AASHTO)

http://www.ite.org/tmdd/


ISO 11179 Variations: NASA Entity Dictionary Specification Language (DEDSL)

BEGIN_GROUP = MODULE_IDENTIFICATION ;
  DEDSL_VERSION = 0.1 ;
  MODULE_TITLE = "Global Change Master Directory dictionary" ;
  MODULE_ADID = Not yet registered ;
END_GROUP = MODULE_IDENTIFICATION ;

BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Entry_ID ;
  MEANING = Unique identifier of the DIF ;
  SHORT_MEANING = Directory Entry Identifier ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;

BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Entry_Title ;
  MEANING = Title of the DIF ;
  SHORT_MEANING = Directory Entry Title ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;

BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Data_Set_Citation.Publication_Place ;
  MEANING = "The name of the city (and state or province and country if needed) where the data set was published or released." ;
  SHORT_MEANING = "Place where the data set was published or released." ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;

BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Data_Set_Citation.URL ;
  MEANING = "The Internet Uniform Resource Locator(s) (URL) of the data set." ;
  SHORT_MEANING = "URL of the data set." ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;

Source: Lou Reich. NASA/CCSDS
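Because each entity definition is a regular group of `KEY = VALUE ;` statements, a dictionary like this one is straightforward to read by machine. The sketch below is a minimal reader written only against the statements visible on this slide. It assumes values contain no semicolons and groups are not nested, and it is not a full parser for the language. The function name and the `_group` key are hypothetical.

```python
import re

SAMPLE = '''
BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Entry_ID ;
  MEANING = Unique identifier of the DIF ;
  SHORT_MEANING = Directory Entry Identifier ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;
BEGIN_GROUP = ENTITY_DEFINITION ;
  NAME = Data_Set_Citation.URL ;
  MEANING = "The Internet Uniform Resource Locator(s) (URL) of the data set." ;
  SHORT_MEANING = "URL of the data set." ;
  VALUE_SYNTAX = STRING ;
END_GROUP = ENTITY_DEFINITION ;
'''

# One statement: KEY = value ; where the value is quoted or runs to the semicolon.
STATEMENT = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^;]*?)\s*;')

def parse_groups(text):
    """Collect the statements of each BEGIN_GROUP ... END_GROUP block into a dict."""
    groups = []
    current = None
    for key, value in STATEMENT.findall(text):
        value = value.strip().strip('"')
        if key == "BEGIN_GROUP":
            current = {"_group": value}
        elif key == "END_GROUP":
            if current is not None:
                groups.append(current)
            current = None
        elif current is not None:
            current[key] = value
    return groups

entities = parse_groups(SAMPLE)
entity_names = [entity["NAME"] for entity in entities]
```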


Elaboration in XML

<data_element>
  <name>Film_Video_Audio_Time_and_Control_Code</name>
  <definition>Time and control code for tracking playback of film, audio, and video</definition>
  <status>
    <registration>Incomplete</registration>
    <administrative>Awaiting review</administrative>
  </status>
</data_element>

Based on: J. McCarthy, et al. “Using XML for Environmental Data Sharing.” Open Forum on Metadata Registries. 1/20/2000
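A registry record of this shape can be generated programmatically. The following sketch builds the same record with Python's standard `xml.etree.ElementTree`. It assumes the root element is written `data_element`, with an underscore, because XML element names cannot contain spaces. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_data_element(name, definition, registration, administrative):
    """Build a data element record with a nested status block."""
    root = ET.Element("data_element")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "definition").text = definition
    status = ET.SubElement(root, "status")
    ET.SubElement(status, "registration").text = registration
    ET.SubElement(status, "administrative").text = administrative
    return root

record = build_data_element(
    name="Film_Video_Audio_Time_and_Control_Code",
    definition="Time and control code for tracking playback of film, audio, and video",
    registration="Incomplete",
    administrative="Awaiting review",
)
xml_text = ET.tostring(record, encoding="unicode")

# Reading the record back confirms that it is well-formed.
parsed = ET.fromstring(xml_text)
```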


Metadata Schema Approach

Three Types of Metadata (Digital Library Federation Architecture Committee)

Descriptive: Discovery and Identification of an Object (Dublin Core, MARC, EAD, etc.)

Structural: Used to Display and Navigate an Object. Provide information on internal organization of an object

Administrative: Management information. Date created, modified, etc. Content file format (e.g. JPEG); rights information, etc.


OAIS Reference Model:

Content Information: The data object and its representation that makes it understandable to the user (DLF: Structural)

Preservation Description: Provenance, Context, Reference and Fixity (DLF: Structural and Administrative)

Descriptive Information: (DLF: Descriptive Information)


Descriptive:

Recommendations: In most cases, use standards-based Dublin Core as the base record - for interoperability.

Add fields to serve your domain user group as needed. Document and register any added fields. Create an XML DTD

Distribute metadata creation responsibilities:

Administrative and Structural : Largely provided by content digitizers

Descriptive: Largely provided by domain specialists

Recommendation: Use thesauri for controlled subject terminology.

Tool: Koch, Traugott, comp. Controlled vocabularies, thesauri and classification systems available in the WWW. DC Subject. http://www.lub.lu.se/metadata/subject-help.html


Structural:

Identification.

URN (Uniform Resource Names) - intended to persist

IETF RFC 1737

URL: “de facto” web naming and addressing standard

PURL: Permanent URL involves intermediate resolution by a third party.

Handles: Developed by CNRI. URN proposal that emphasizes persistent names. Names are maintained by the object publisher or author. The handle server reconciles the permanent name with address changes. http://www.handle.net/

See also: Library of Congress. National Digital Library Program. “The Relationship between URNs, Handles and PURLs.” http://lcweb2.loc.gov/ammem/award/docs/PURL-handle.html
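What PURLs and handles share is a level of indirection: the citable name stays fixed while a resolver tracks the current address. A minimal sketch of that indirection follows. The `Resolver` class, the name and the URLs are all hypothetical, and this does not reflect the actual Handle System or PURL server interfaces.

```python
class Resolver:
    """Map persistent names to current addresses (hypothetical illustration)."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Assign an address to a new persistent name."""
        if name in self._locations:
            raise ValueError(f"name already registered: {name}")
        self._locations[name] = url

    def move(self, name, new_url):
        """Record an address change; the persistent name itself never changes."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current address for a persistent name."""
        return self._locations[name]

resolver = Resolver()
# Hypothetical name and addresses, for illustration only.
resolver.register("example-authority/object-1", "http://old.example.edu/object1.html")
before = resolver.resolve("example-authority/object-1")
resolver.move("example-authority/object-1", "http://new.example.edu/items/1")
after = resolver.resolve("example-authority/object-1")
```

A metadata record that stores only the persistent name keeps working after the move, since the address change is recorded in one place, the resolver.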


Administrative Metadata

Issues:

Digital Persistence:

• technology emulation (“recreate the technology needed to open and display”)

• migration path/backward compatibility (“standards backward compatible 1 or more version to allow migration of data”)

• interpolation (“technology interpolates to retrieve or enhance obsolete data”)


Managing Data for Digital Persistence:

Maintain information needed to create, retrieve and display each digital object.

include platform, processor; version info for software and OS

digital creation hardware and software

digital editing hardware and software

digital viewing hardware and software

calibration hardware and software

Visual Media:

Color: color space (RGB, CMYK); color look up table; color profile for digital camera or scanner; color chart used for calibration.
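
One way to hold the information listed above is a small structured record per digital object. The sketch below is an assumption about how such a record might look, not a published schema: the class and field names are invented for illustration, and every product name and version is a placeholder.

```python
from dataclasses import dataclass, field, asdict

# The four hardware/software roles named on the slide.
REQUIRED_ROLES = ("creation", "editing", "viewing", "calibration")


@dataclass
class Tool:
    role: str              # one of REQUIRED_ROLES
    hardware: str
    software: str
    software_version: str


@dataclass
class TechnicalEnvironment:
    platform: str
    processor: str
    operating_system: str
    os_version: str
    tools: list = field(default_factory=list)
    color_space: str = ""        # e.g. "RGB" or "CMYK"
    color_profile: str = ""      # profile of the digital camera or scanner
    calibration_chart: str = ""  # color chart used for calibration


def missing_roles(env):
    """List the roles for which no hardware/software has been recorded."""
    present = {tool.role for tool in env.tools}
    return [role for role in REQUIRED_ROLES if role not in present]


# Placeholder values only; none of these names refers to a real product.
env = TechnicalEnvironment(
    platform="ExamplePlatform",
    processor="ExampleCPU",
    operating_system="ExampleOS",
    os_version="1.0",
    tools=[
        Tool("creation", "ExampleScanner", "ExampleScan", "2.1"),
        Tool("viewing", "ExampleDisplay", "ExampleViewer", "4.0"),
    ],
    color_space="RGB",
)

print(missing_roles(env))
stored = asdict(env)  # plain dictionaries and lists, ready for a database
```

A completeness check like `missing_roles` makes gaps visible at ingest time, when the digitizer can still supply the missing details.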


Compression:

Images: pixels: pixel array (e.g., 2,000 x 3,000 pixels)

bit depth (8-bit, 16-bit, 24-bit, etc.)

Video and Audio:

If at all possible, save master file in uncompressed format:

e.g. IEEE (Institute of Electrical & Electronics Engineers)

CCIR 601 (Broadcast Digital Video)

NTSC -- 720 x 480

PAL -- 720 x 576

10-bit or 8-bit

For MPEG-1, MPEG-2 and MPEG-4, include level of service; frame rate (fps); frame size; bit depth. Consider IPB ratio.
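
Recording pixel array and bit depth makes it possible to work out what an uncompressed master costs in storage. The arithmetic below is a sketch: the image figures come from the slide, while the video figures assume 8-bit 4:2:2 sampling (an average of two samples, or 16 bits, per pixel), which is an assumption not stated in the original.

```python
def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Bytes needed for one uncompressed frame (no file header or padding)."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Still image from the slide: 2,000 x 3,000 pixels at 24-bit depth.
print(uncompressed_bytes(2000, 3000, 24))  # 18000000

# One video frame, ASSUMING 8-bit 4:2:2 sampling (16 bits per pixel on average).
print(uncompressed_bytes(720, 480, 16))    # 691200 (NTSC frame size)
print(uncompressed_bytes(720, 576, 16))    # 829440 (PAL frame size)
```

Multiplying the per-frame figure by the frame rate and duration gives a first estimate of archival storage needs for a video master.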


Rights Management

Management Restrictions

Management Conditions

Access Restrictions

Access Conditions

Use Restrictions

Use Conditions

Define rights management in a format that maps to future use of a resolver (e.g., resolve to an address holding copyright and use information, as opposed to embedding use and access text in the record).

DOI - Digital Object Identifier. Developed by the commercial publishing domain. Uses Handle technology to resolve access and use (and to support ecommerce applications). http://www.doi.org
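
The difference between embedded rights text and a resolvable rights reference can be illustrated with a short sketch. It is not the DOI or Handle API: the registry, the identifier `rights/0001`, and the sample statements are hypothetical, and only the six restriction and condition categories come from the slide.

```python
from dataclasses import dataclass


@dataclass
class RightsStatement:
    management_restrictions: str = ""
    management_conditions: str = ""
    access_restrictions: str = ""
    access_conditions: str = ""
    use_restrictions: str = ""
    use_conditions: str = ""


# Hypothetical registry: one rights statement shared by many records.
RIGHTS_REGISTRY = {
    "rights/0001": RightsStatement(
        access_restrictions="Campus network only",
        use_conditions="Educational use only",
    ),
}


def resolve_rights(record):
    """Look up the rights statement a record points to."""
    return RIGHTS_REGISTRY[record["rights"]]


# Records store only the identifier, never the rights text itself.
records = [
    {"title": "Object A", "rights": "rights/0001"},
    {"title": "Object B", "rights": "rights/0001"},
]

# A change of terms is made once, in the registry.
RIGHTS_REGISTRY["rights/0001"].use_conditions = "Educational and research use"

print([resolve_rights(r).use_conditions for r in records])
```

Because the records hold a reference rather than text, a change in terms does not require editing every affected metadata record.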


Recommendation: Reuse metadata elements developed by respected early adopters:

Making of America II White Paper: http://sunsite.berkeley.edu/moa2/wp-v2.html

MOAII Document Type Definition: http://sunsite.berkeley.edu/MOA2/papers/DTD.html

National Library of Australia: PANDORA (Preserving and Accessing Networked Documentary Resources of Australia) http://www.nla.gov.au/pandora

Library of Congress - Structural Metadata Dictionary for LC Digital Objects http://lcweb.loc.gov:8081/ndlint/repository/attdefs.html

UKOLN (UK Office for Library and Information Networking): http://www.ukoln.ac.uk/metadata/cld/


Rights Metadata:

University of Pittsburgh. School of Information Sciences. “Functional Requirements for Evidence in Recordkeeping” http://www.lis.pitt.edu/~nhprc

Video Metadata:

Hunter, Jane and Liz Armstrong. “A Comparison of Schemas for Video Metadata Representation.” http://www8.org/w8-papers/3c-hypermedia-video/comparison/comparison.html

Hunter, Jane and Jan Newmarch. “An Indexing, Browsing, Search and Retrieval System for Audiovisual Libraries.” http://link.springer.de/link/service/series/0558/bibs/1696/16960076.htm

Administrative Metadata:

A-Core: IETF Draft Standard for metadata about descriptive metadata (documenting provenance, etc.). Iannella & Campbell. http://metadata.net/admin/draft-iannella-admin-01.txt


Recommendations:

* Create XML DTD for metadata records by format

* Use Dublin Core with approved qualifiers as the base record

* Document metadata elements in a metadata registry

* Use RDF as the export wrapper (“report format for a relational database”)

AlphaWorks Tools can assist:

DDbE: accepts well-formed XML documents and constructs a DTD

XMI Toolkit: generates DTDs and shares Java objects

XML Parser for Java: validating parser

XML Generator: generates instances of valid XML from a DTD

http://www.alphaworks.ibm.com
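
The recommendation to use Dublin Core as the base record and RDF as the export wrapper can be sketched with Python's standard library. This is one possible serialization under stated assumptions, not the author's export routine and not a product of the AlphaWorks tools: the function name `to_rdf_xml`, the resource address, and the sample record are hypothetical, while the two namespace URIs are the standard RDF and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(resource_uri, record):
    """Wrap a Dublin Core record (element name -> list of values) in RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_uri},
    )
    for element, values in record.items():
        for value in values:
            child = ET.SubElement(description, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical resource address and record, for illustration only.
record = {
    "title": ["Scalable Durable Metadata: A Tutorial"],
    "creator": ["Agnew, Grace"],
    "subject": ["Metadata", "Digital preservation"],
}
print(to_rdf_xml("http://example.org/objects/1", record))
```

Keeping the export step separate from storage means the repository can hold records in a relational database and still exchange them with other repositories in a shared wrapper.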


Metadata Resources. http://dewey.yonsei.ac.kr/metadata/links.htm

IFLANET. Digital Libraries: Metadata Resources http://www.ifla.org/II/metadata.htm

UK Office for Library and Information Networking. Metadata for Preservation: CEDARS Project Document AIW01. http://www.ukoln.ac.uk/metadata/cedars/AIW01.html

National Library of Australia. PADI: Preserving Access to Digital Information. http://www.nla.gov.au/padi/

National Archives of Australia. Designing and Implementing Recordkeeping Systems. http://www.naa.gov.au/Govserv/techpub/DIRKSman/dirks.html

General References