Using Dublin Core in Museums
Introduction to the CIMI Guide to Best Practice:
Dublin Core
Dr. Paul Miller, UK Office for Library & Information Networking
p.miller@ukoln.ac.uk
Thomas Hofmann, Australian Museums On-Line
thomash@amol.org.au
Overview
What is metadata?
Introducing the Dublin Core
Introducing the CIMI testbed project
CIMI DC Guidelines - Guide to Best Practice: Dublin Core
Break
Practical session
Implementation
Discussion.
What is Metadata?
Meaningless jargon
or
a fashionable term for what we’ve always done
or
“a means of turning data into information”
and
“data about data”
and
the name of a film director (‘Luc Besson’)
and
the title of a book (‘The Lord of the Flies’).
What is Metadata?
Metadata exists for almost anything:
people
places
periods
objects
concepts
The trick lies in making descriptions suitably generic to be meaningful to the majority, whilst suitably controlled to aid location.
What is Metadata?
Metadata fulfils three main functions:
description of resource content
“What is it?”
description of resource form
“How is it constructed?”
description of issues behind resource use
“Can I afford it?”.
What is Metadata?
Libraries
MARC AACR2
A resource description community is characterised by agreed semantic, structural and syntactic conventions for exchange of descriptive information
Based on a slide by Stu Weibel
What is Metadata?
Scientific Databases
Museums
Geo Libraries
‘Internet Commons’
Home Pages
Commerce
Whatever...
Based on a slide by Stu Weibel
What is Metadata?
Many structures have evolved at different levels, and to meet different requirements...
What is Metadata?
Semantic Interoperability: “Let’s talk English” (standardisation of content), e.g. “cat sat on mat drank milk”
Structural Interoperability: “Here’s how to make a sentence” (standardisation of form), e.g. “Cat sat on mat. Drank milk.”
Syntactic Interoperability: “These are the rules of grammar” (standardisation of expression), e.g. “The cat sat on the mat. It drank some milk.”
Approaches to Metadata (I)
Search Engines
Easy to build
Cheap
Cover large areas of the Internet.
Pretty stupid, really
Minimal contextualisation of data: is ‘Miller’ the person who made this, the person whom it is about, the profession of a differently named individual, or something else entirely?
E.g. Alta Vista, Lycos, MetaCrawler, HotBot, Excite, InfoSeek, LookSmart, UK Max...
Approaches to Metadata (II)
Specialist Resource Description
Extremely detailed
Accurate finding aids
Comprehensive.
Expensive
Domain specific
Only likely to be worthwhile for ‘valuable’ resources.
E.g. MARC, FGDC CSDGM, EAD, SPECTRUM...
Approaches to Metadata (III)
Resource Discovery
Relatively easy to build
Relatively cheap?
Contextualises information
Enables semantic mapping across community boundaries.
Insufficient to meet specialist requirements within a community?
E.g. Dublin Core...
Challenges
Many flavours of metadata: which one do I use?
Managing change: new varieties, and evolution of existing forms
Tension between functionality and simplicity, extensibility and interoperability
Functions, features, and cool stuff versus simplicity and interoperability
Opportunities
Introducing the Dublin Core
An attempt to improve resource discovery on the Web, now adopted more broadly
Building an interdisciplinary consensus about a core element set for resource discovery:
simple and intuitive
cross–disciplinary
international
flexible.
Introducing the Dublin Core
15 elements of descriptive metadata
All elements optional
All elements repeatable
The whole is extensible, offering a starting point for semantically richer descriptions
Interdisciplinary: libraries, museums, government, education...
International: available in 20 languages, with more on the way.
Introducing the Dublin Core
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type,
Format, Identifier, Source, Language, Relation, Coverage, Rights
http://purl.org/dc/
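As a minimal sketch (not part of the original slides), a Dublin Core Simple record can be modelled as a mapping from element name to a list of values, which captures the rules that every element is optional and repeatable. The helper name `make_record` and the sample values are assumptions made for illustration.

```python
# The 15 Dublin Core elements named on the slide.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def make_record(**values):
    """Build a record as {element: [values]} (hypothetical helper).

    Every element is optional (it may simply be absent) and
    repeatable (it always holds a list of values).
    """
    record = {}
    for element, value in values.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        # Accept either a single string or a list of strings.
        record[element] = [value] if isinstance(value, str) else list(value)
    return record


# Illustrative values only.
record = make_record(
    Title="The Lord of the Flies",
    Subject=["fiction", "islands"],
)
```

Elements that are not supplied are left out of the mapping rather than stored as empty values.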
Extending DC (semantic refinement)
Creator
First Name
Surname Contact Info
Affiliation
Based on a slide by Stu Weibel
Improve descriptive precision by adding sub–structure (subelements and schemes)
Greater precision = lesser interoperability
Should ‘dumb down’ gracefully
Element qualifier Value qualifier
Extending DC (a modular approach)
Modular extensibility... additional elements to support local needs
complementary packages of metadata
…but only if we get the building blocks right
Description Archival Management
Terms & Conditions
Based on a slide by Stu Weibel
Extending DC?
DC offers a semantic framework
through use of further substructure,
meaning can often be clarified
<Creator> “Paul”: Paul Inc.? Paul xyz? xyz Paul?
<Creator> <fore name> “Paul”: Paul Inc., Paul xyz, xyz Paul.
Extending DC?
DC offers a semantic framework
Use of domain–specific schemes greatly
increases precision
<Coverage> “Washington”: Washington State? Washington DC? Washington monument?
<Coverage> <TGN> “Washington”: Washington State, Washington DC, Washington monument
“North and Central America, United States, Washington”
http://gii.getty.edu/tgn_browser/
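The idea that qualified metadata should 'dumb down' gracefully can be sketched as follows. This is an illustrative model only: the class `Statement`, its field names and the function `dumb_down` are assumptions, not part of any Dublin Core specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    """One element value with optional refinements (hypothetical model)."""
    element: str                     # e.g. "Coverage"
    value: str                       # e.g. "Washington"
    qualifier: Optional[str] = None  # element qualifier, e.g. "fore name"
    scheme: Optional[str] = None     # value qualifier, e.g. "TGN"


def dumb_down(statement):
    """Discard the qualifiers and keep the plain element/value pair."""
    return (statement.element, statement.value)


precise = Statement(
    "Coverage",
    "North and Central America, United States, Washington",
    scheme="TGN",
)
simple = dumb_down(precise)
```

A consumer that does not understand the scheme still receives a usable Coverage value. It only loses the assurance that the string came from a controlled vocabulary.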
Dublin Core in 1999
Formalise the process: TAC, PAC, Directorate and Working Groups
Refinement of definitions: DC 1.1
Qualification semantics: consensus and streamlining
RDF
Common understandings: INDECS/DOI
IMS ?
Formalisation
Dublin Core Web Site: purl.org/dc/
Dublin Core Directorate
DC Policy Advisory Committee
DC Technical Advisory Committee
Working Groups
Stakeholder Communities
DC-General Dublin Core Mail Server
www.mailbase.ac.uk/lists/dc–general/
Based on a slide by Stu Weibel
Introducing the CIMI testbed project (I)
Testbed Phase I
Goals:
- Evaluate feasibility of DC for museum community
- Identify and resolve operational, technical and intellectual issues
- Promote international consensus on DC practices in museum community
Milestones:
- Involvement of over 18 participants (software vendors, museums, consultants, cultural heritage gateways)
- Repository of over 300,000 records (museums, collections, artefacts) using DC Simple, both created from scratch and exported from legacy systems
- Guide to Best Practice: Dublin Core
Outcomes:
- DC is easy to use
- DC Simple is a machete, not a scalpel
- All elements depend on Resource Type
- DC can be applied to both physical and electronic resources
- Further user evaluation necessary
Introducing the CIMI testbed project (II)
Testbed Phase II
Goals:
- Finalisation and publication of “Guide to Best Practice: Dublin Core”
- Identification of proposed qualified elements (sub–structure)
- Examination of RDF
- Initial effort in mapping DC elements to CIMI Access Points
- User evaluation
Milestones:
- There are four meetings scheduled for 1999. Please see http://www.cimi.org/ for updates on testbed phase II
Outcomes:
The schedule for 1999 sees the following deadlines:
- Guide publication (April)
- DC recommendation (December)
- DC to CIMI Access Points mapping (November)
- RDF examination (July)
- Choreographed demonstration / user evaluation (October)
- Final report and recommendations (December)
For updates please see http://www.cimi.org/
Dublin Core and the museum community
Challenges for museums:
- Emphasis on attributes of the physical object (artefact)
- Need to associate the physical object with persons, places and events
- Need to account for collections
- Need to account for surrogates such as photographs
- Historical lack of content standards
Assumptions regarding DC:
- DC is useful to describe artefacts and associated information resources in the museum community
- DC is simple to use and learn
- Adequate technical infrastructure exists to support use of resource discovery
The ‘1:1 Principle’ - What does it mean?
Definition: Only one object, resource or instantiation described with one single record
Conclusion: Makes describing original and surrogates easier
Interpretation: [Diagram: an object and its surrogate are each described by their own DC record.]
DC record relationships
[Diagram: a collection, a story, an artefact, a slide, an institution and a further artefact, each with its own DC record.]
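The 1:1 principle can be illustrated with a small sketch in which an original artefact and its photographic surrogate get separate records, with the surrogate pointing back to the original. The identifiers, titles and the helper `describes_one_resource` are made up for illustration, and the check is only a rough heuristic.

```python
# Two separate records: one for the original artefact, one for its
# photographic surrogate. All identifiers and titles are made up.
artefact = {
    "Type": ["physical object", "original", "item", "cultural"],
    "Title": ["Example vase"],
    "Identifier": ["ACC-0001"],
}
slide = {
    "Type": ["image", "surrogate", "item", "cultural"],
    "Title": ["Slide of example vase"],
    "Identifier": ["SLIDE-0001"],
    # The surrogate points back to the original it depicts.
    "Relation": list(artefact["Identifier"]),
}


def describes_one_resource(record):
    """Rough 1:1 check: a record should not claim to be both an
    original and a surrogate at the same time."""
    types = set(record.get("Type", []))
    return not {"original", "surrogate"} <= types
```

Keeping the two descriptions apart means that values such as Format or Rights can differ between the object and its image without contradiction.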
Reality Check: Criteria for DC creation
Ask yourself:
Is the record itself (and each element within that record) useful for resource discovery?
Is the value of the element known with certainty? Is it readily available from existing databases or information sources?
If not, leave it out
If not, interoperability degraded and records harder to maintain
Have you selected values from enumerated lists recommended to assist in cross domain searching?
About the Guide to Best Practice: Dublin Core
Basis for the Guide:
- Based on Dublin Core 1.0 (RFC 2413)
- Recommendations based on testbed experience, not large-scale production efforts
- Syntax used in examples and testbed based on XML
Document structure:
- 15 DC simple elements starting with TYPE to assist in following the 1:1 rule (original vs. surrogate)
- Each element:
  - Introduced with standard DC Definition (RFC 2413)
  - Explained with CIMI Interpretation
  - Manifested with CIMI Guideline
  - Illustrated through Examples
- Appendices contain sample records for different types of museum describing a variety of resource types
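Since the Guide's examples use an XML syntax, the following sketch shows one way a simple record could be serialised with Python's standard library. The tag names (`record` and lower-cased element names) are assumptions for illustration and are not the testbed's actual DTD.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Serialise {element: [values]} as an XML string.

    The tag names used here are illustrative assumptions, not the
    syntax defined by the CIMI testbed.
    """
    root = ET.Element("record")
    for element, values in record.items():
        for value in values:  # repeated elements become repeated tags
            child = ET.SubElement(root, element.lower())
            child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml({"Type": ["image", "surrogate"], "Title": ["Example slide"]})
```

TYPE is listed first here, mirroring the Guide's choice to start with that element.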
Element: Type
Interpretation: The nature of the resource, including such aspects as originality, aggregation and manifestation.
Guideline: Helps to decide the values of other elements
To aid in searching across collections and across different disciplines among museums, specify TYPE from:
1. List of controlled values maintained by the DC community: text, image, sound, dataset, software, interactive, physical object, event
and the following list of museum-related values:
2. original or surrogate
3. item or collection
4. natural or cultural
List elements in the order given above for consistency (Note: element order is irrelevant in Dublin Core)
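A short sketch of how the four-part TYPE convention might be checked. It assumes that all four values are supplied, which the guideline does not strictly require, and the function name `check_type` is hypothetical.

```python
# Controlled values copied from the guideline.
DC_TYPES = {"text", "image", "sound", "dataset", "software",
            "interactive", "physical object", "event"}
MUSEUM_CHOICES = [
    {"original", "surrogate"},
    {"item", "collection"},
    {"natural", "cultural"},
]


def check_type(values):
    """Return True if values follow the recommended order:
    DC type, original/surrogate, item/collection, natural/cultural.

    Assumes all four parts are present (a simplification).
    """
    if len(values) != 4:
        return False
    allowed = [DC_TYPES] + MUSEUM_CHOICES
    return all(v in choices for v, choices in zip(values, allowed))
```

For instance, `check_type(["image", "surrogate", "item", "cultural"])` is accepted, while the same values in a different order are rejected.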
Element: Format
Interpretation: The properties of the resource that impose the use of tools for access, display, or operation; not the tools themselves. Do not use FORMAT if no tools are required.
Guideline: Use to populate element
- with MIME type information for digital resources
- with details of technique, material and media for analogue resources
Don’t use to describe:
- limitations to access or restrictions against usage (use RIGHTS)
- dimensions (use DESCRIPTION)
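For digital resources, a MIME type can often be guessed from the file name. The sketch below uses Python's standard `mimetypes` module. The helper name `format_for_file` and the sample values are assumptions for illustration.

```python
import mimetypes


def format_for_file(filename):
    """Guess a MIME type for a digital resource from its file name.

    Returns None when no type can be guessed, in which case FORMAT
    would have to be supplied by hand.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


digital = format_for_file("vase.jpg")
# Analogue resources: technique, material and media, entered as text
# (the value below is only an example).
analogue = ["oil on canvas"]
```

Guessing from the extension is only a convenience. It says nothing about whether the file's content really matches that type.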
Element: Title
Interpretation: Name(s) given to the resource, regardless of whose they are — so long as they are useful for resource discovery.
Guideline: Repeat TITLE element as required
Untitled works of fine art:
use whatever value you would use on the wall label copy, exhibition catalog, or other promotional material—i.e., if the work is known as “Untitled,” specify this in TITLE
Cultural items and collections:
with no known title or name, use a term or phrase that is sufficiently descriptive to permit a user to judge relevancy. If your existing database does not contain title information, concatenate other descriptive field values as appropriate to “name” the resource
Natural specimens:
use the Latin binomial name of the animal, plant, or mineral. TITLE should contain the name that is given to the object in hand by the person identified in CREATOR. These two elements plus the DC.DATE thus give a full citation for the specimen, and allow for the possibility that the same specimen can have different names allocated by the same, or different, taxonomists at different times.
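The advice to build a title from other descriptive fields when none exists can be sketched as below. The field names (`title`, `object_name`, `material`, `place`) are hypothetical and would need to be replaced with those of your own collections database.

```python
def title_for(row, fallback_fields=("object_name", "material", "place")):
    """Return the stored title, or build one from other fields.

    Field names are hypothetical placeholders for whatever the
    existing database actually provides.
    """
    title = (row.get("title") or "").strip()
    if title:
        return title
    # Keep only the fallback fields that hold some text.
    candidates = ((row.get(f) or "").strip() for f in fallback_fields)
    return ", ".join(part for part in candidates if part)


named = title_for({"title": "Untitled"})
unnamed = title_for({"object_name": "Bowl", "material": "earthenware", "place": ""})
```

A work that is known as “Untitled” keeps that value, because the stored title always takes precedence over the concatenated fallback.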
Element: Description
Interpretation: A textual, narrative description of the resource, including abstracts for documents or content characterizations in the case of visual resources
Guideline:
- Use this element whenever possible, as it is a rich source of indexable vocabulary. Emphasize the contextual information and popular associations (people, places, and events) of the resource.
- If a single “description” field does not exist in your current database, values from other fields or wall label copy, exhibition catalogs, didactic copy, etc. may be concatenated to populate DESCRIPTION.
- DESCRIPTION is likely to be displayed with the resource in the search result set, so we recommend brevity, but not at the expense of richness.
Element: Subject
Interpretation: Keywords about the theme and/or concept of the resource, as well as terms signifying significant associations of people, places, and events or other contextual information.
Guideline: Do not strictly interpret the element name “Subject,” which tends to lock our thinking into formal “subject terms” such as those used in bibliographic metadata. “Keywords” is a more appropriate interpretation of the kind of values that are useful for this element: index terms, or descriptors, rather than specific-to-broad categorizations of intellectual content.
Element: Creator
Interpretation: The person(s) or organization that conceived or initiated the resource. For example, author of written document; artist, photographer, or illustrator of visual resource; or founder of an institution. For natural specimens, CREATOR specifies the determiner; the person who created the name that is present in the TITLE element.
Element: Contributor
Interpretation: A person or organization not specified in a Creator element because their contributions to the resource are less direct or conceptual (for example, editor or translator). Also used for patrons, benefactors, and sponsors. For natural specimens, the collector and preparator are example values.
Element: Publisher
Interpretation: The person(s) or organizations responsible for making the resource available or for presenting it, such as a repository, an archive, or a museum. Also includes major financial supporters and legislative entities without whose support the resource would not be continuously available, such as a municipal historical council or a board of trustees. (Note: benefactors of the actual resources are listed under CONTRIBUTOR.) In addition, list distributors and other important agents of delivery in PUBLISHER.
Element: Date
Interpretation: The date associated with the creation or availability of the resource. This is not necessarily the same as the date in the Coverage element, which refers to the date or period of the resource’s intellectual content. For natural specimens, the value should be the date that the name in TITLE was given by CREATOR.
Guideline:
- Repeat DATE to express both the circa value and the range it represents, according to your organization’s policy.
- Repeat DATE to express both the time period during which the resource was brought into being and the specific date when it was [thought to be] first cataloged or collected.
Element: Identifier
Interpretation: A text and/or number string used to effectively identify the resource.
Guideline: Use URLs, URNs, or DOIs (when implemented) for internet resources. For realia, use widely recognized means of identifying items and collections such as accession numbers, International Standard Book Numbers (ISBN), catalogue raisonné numbers, and Köchel numbers.
Element: Source
Interpretation: Information about a resource from which the present resource is directly derived.
Guideline: SOURCE is distinguished from a RELATION value of IsBasedOn by degree or strength of the connection. The CIMI testbed group used SOURCE as a “kludge” element pending clarification of the “IsBasedOn” definition by the DC Directorate.
Element: Relation
Interpretation: Used to describe significant points in the hierarchy of surrogacy, including the immediate parent and the original item. Recommended values are CREATOR, TITLE, IDENTIFIER and any and all progenitors/children including (repeating) SOURCE value(s).
Element: Language
Interpretation: The language of the intellectual content of the resource, not the language of the DC record nor necessarily the language of the TITLE value. “Intellectual content” may be represented as text or as vocal sound. CIMI’s interpretation of this element reflects a potential application of “scheme” in DC Qualified.
Guideline:
- Re-use terms from list of language abbreviations defined in RFC 1766 at ftp://ds.internic.net/rfc/rfc1766.txt
- If the language is not included in that reference, spell it out completely
- Use repeated elements to express multiple values
- LANGUAGE is not applicable to natural objects or those lacking words
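A minimal sketch of the LANGUAGE guideline: known languages are abbreviated, unknown ones are spelled out, and each language becomes its own repeated value. The lookup table is deliberately tiny and illustrative, and the function name `language_values` is an assumption.

```python
# A few RFC 1766 style tags. The table is illustrative only and is
# not a complete registry.
KNOWN_TAGS = {"english": "en", "french": "fr", "german": "de"}


def language_values(languages):
    """Return one LANGUAGE value per language of the content.

    Known languages become their abbreviation; anything else is
    spelled out in full, as the guideline asks.
    """
    return [KNOWN_TAGS.get(name.lower(), name) for name in languages]


values = language_values(["English", "French", "Warlpiri"])
```

For natural objects or resources lacking words, the input list is simply empty and no LANGUAGE element is produced.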
Element: Coverage
Interpretation: Requires no interpretation.
Guideline: Repeat DC.COVERAGE values as appropriate in DC.SUBJECT (e.g., “colonial America” or “ ‘Baroque’ dance”) as an intellectual access point or keyword.
Temporal characteristics:
Recommended best practice for dates is defined in a profile of ISO 8601 [Date and Time Formats (based on ISO 8601), W3C Technical Note, http://www.w3.org/TR/NOTE-datetime], which specifies the format YYYY-MM-DD. If the full date is unknown, month and year (YYYY-MM) or just year (YYYY) may be used.
Repeat DC.COVERAGE to express both the time period during which the resource was brought into being and the specific date when it was [thought to be] first cataloged or collected.
Spatial characteristics:
Where possible, use Getty’s Thesaurus of Geographic Names at http://www.gii.getty.edu/vocabulary/tgn.html, specifying at a sufficient granularity to unambiguously identify the location. Concatenate place names as one string of values separated by semicolons. Start with broadest term and work down to narrowest.
Do not use latitude and longitude unless your audience is accustomed to associating resources to places in this manner (e.g., maritime items or events).
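The two COVERAGE conventions can be sketched together: checking that a date has one of the three recommended shapes, and joining place names from broadest to narrowest with semicolons. The function names are hypothetical, and the date check looks only at the shape of the string, not at calendar validity.

```python
import re

# YYYY, YYYY-MM or YYYY-MM-DD: the three date forms mentioned in
# the guideline.
DATE_PATTERN = re.compile(
    r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$"
)


def is_w3c_date(text):
    """Check the shape of a date string (not calendar validity)."""
    return DATE_PATTERN.match(text) is not None


def place_string(broadest_to_narrowest):
    """Join place names with semicolons, broadest term first."""
    return "; ".join(broadest_to_narrowest)


place = place_string(
    ["North and Central America", "United States", "Washington"]
)
```

The caller is responsible for supplying the place names in the right order, for example as taken from a thesaurus hierarchy.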
Element: Rights
Interpretation: A rights management or a usage statement, an identifier that links to a rights management or usage statement, or an identifier that links to a service providing information about rights management for and/or usage of the resource. A statement concerning accessibility, reproduction constraints, copyright holder, and/or inclusion of credit lines. Absence of RIGHTS in a record does not imply that the resource is not protected.
Guideline:
- Use a pointer to Terms and Conditions or copyright statements for Internet resources.
- Ensure proper agreement between the RIGHTS value and the resource in hand: do not, for example, link reproduction notices for digital surrogates to analog objects.
Exercises
And now…
…over to you...
http://www.cimi.org/
© CIMI 1999