Introduction to Meta Data Mapping or Crosswalk

Metadata mapping


A crosswalk shows where to put the data from one scheme into a different scheme. Crosswalks are often used by libraries, archives, museums, and other cultural institutions to translate data to or from MARC, Dublin Core, TEI, and other metadata schemes.


Crosswalks can apply to content standards, vocabularies, or both. An automated crosswalk process may take an instance of a metadata description that is presented in a particular format and change the format and element names and the values within those elements (i.e., the vocabulary) to meet the requirements of the second standard.


Crosswalking is generally done when datasets using different metadata standards or vocabularies need to be integrated. For example, consider a website providing a searchable metadata directory. If the different datasets composing the directory were described using different standards and vocabularies, it would be difficult for a user to search across them effectively.


If someone were interested in wave height data, she might need to know to search for “wave ht (m)” in one dataset and “wave amplitude” in another. A crosswalk that defined these two elements as synonymous would let the website accept either term and retrieve applicable results from both datasets.
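As a minimal sketch of that idea, the snippet below treats the crosswalk as a lookup from each dataset's local element name to a shared concept. The concept label `wave_height`, the dataset names, and the `search` helper are hypothetical and only serve the illustration.

```python
# Hypothetical crosswalk: local element name -> shared concept label.
SYNONYMS = {
    "wave ht (m)": "wave_height",
    "wave amplitude": "wave_height",
}

# Hypothetical directory entries: (dataset name, element name used there).
DIRECTORY = [
    ("dataset_a", "wave ht (m)"),
    ("dataset_b", "wave amplitude"),
    ("dataset_c", "water temperature"),
]


def search(term):
    """Return the datasets whose element maps to the same concept as `term`."""
    concept = SYNONYMS.get(term, term)  # unmapped terms stand for themselves
    return [
        name
        for name, element in DIRECTORY
        if SYNONYMS.get(element, element) == concept
    ]


print(search("wave ht (m)"))
print(search("wave amplitude"))
```

Either search term reaches both wave datasets because the comparison happens on the shared concept rather than on the local label.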


Due to the complexity of metadata content standards, there are few automated processes to crosswalk between content standards. Even in those cases where automated crosswalks exist, inevitably some information is lost when crosswalks are made. This is due to the complexity of the standards and potentially non-overlapping subject areas. When there are subject areas that do not overlap, even manual translation between standards does not result in complete information transfer.


For example, say an archive has a MARC record in its catalog describing a manuscript. If the archive makes a digital copy of that manuscript and wants to display it on the web along with the information from the catalog, it will have to translate the data from the MARC catalog record into a different format, such as MODS, that is viewable in a webpage.


Because MARC has different fields than MODS, decisions must be made about where to put the data in MODS. This type of "translating" from one format to another is often called "field mapping," and is related to "data mapping" and "semantic mapping."


Crosswalks also have several technical capabilities. They help databases using different metadata schemes to share information. They help metadata harvesters create union catalogs. They enable search engines to search multiple databases simultaneously with a single query.


Crosswalk tables are often employed within or in parallel to enterprise systems, especially when multiple systems are interfaced or when the system includes legacy system data. In the context of interfaces, they function as a sort of internal ETL mechanism.


For example, this is a metadata crosswalk from MARC to Dublin Core:

MARC field → Dublin Core element

260$c (Date of publication, distribution, etc.) → Date.Created

522 (Geographic Coverage Note) → Coverage.Spatial

300$a (Physical Description) → Format.Extent
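The MARC to Dublin Core crosswalk above can be sketched as a plain lookup table applied to a record. The sample record values, the `999` field, and the `crosswalk` helper are hypothetical; only the three field pairs come from the table.

```python
# The three field pairs from the MARC -> Dublin Core table.
MARC_TO_DC = {
    "260$c": "Date.Created",
    "522": "Coverage.Spatial",
    "300$a": "Format.Extent",
}


def crosswalk(record, table):
    """Rename fields according to `table`; collect fields with no mapping."""
    target, unmapped = {}, {}
    for field, value in record.items():
        if field in table:
            target[table[field]] = value
        else:
            unmapped[field] = value  # no equivalent in the target scheme
    return target, unmapped


# Hypothetical MARC-like record, flattened to field -> value for illustration.
marc_record = {"260$c": "1887", "300$a": "24 leaves", "999": "local note"}

dc_record, dropped = crosswalk(marc_record, MARC_TO_DC)
print(dc_record)
print(dropped)
```

Returning the unmapped fields separately makes it visible which information the crosswalk could not carry over.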


One of the biggest challenges for crosswalks is that no two metadata schemes are 100% equivalent. One scheme may have a field that doesn't exist in another scheme, or it may have a field that is split into two different fields in another scheme; this is why you often lose data when mapping from a complex scheme to a simpler one. For example, when mapping from MARC to Simple Dublin Core, you lose the distinction between types of titles:

MARC field → Dublin Core element

210 Abbreviated Title → Title

222 Key Title → Title

240 Uniform Title → Title

242 Translated Title → Title

245 Title Statement → Title

246 Variant Title → Title


Simple Dublin Core has only a single "Title" element, so all of the different types of MARC titles get lumped together without any further distinctions. This is called "many-to-one" mapping. This is also why, once you've translated these titles into Simple Dublin Core, you can't translate them back into MARC. Once they're in Simple Dublin Core, you've lost the MARC information about what types of titles they are, so when you map from Simple Dublin Core back to MARC, all the data in the "Title" element maps to the basic MARC 245 Title Statement field.

Dublin Core element → MARC field

Title → 245 Title Statement

Title → 245 Title Statement

Title → 245 Title Statement

Title → 245 Title Statement

Title → 245 Title Statement

Title → 245 Title Statement
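The loss of information in this round trip can be illustrated with a short sketch. The two lookup tables follow the tables above; the sample title strings and the `map_fields` helper are hypothetical.

```python
# Many-to-one: every MARC title field collapses to the single "Title" element.
MARC_TO_SIMPLE_DC = {
    "210": "Title",
    "222": "Title",
    "240": "Title",
    "242": "Title",
    "245": "Title",
    "246": "Title",
}

# The reverse crosswalk can only pick one MARC field for "Title".
SIMPLE_DC_TO_MARC = {"Title": "245"}


def map_fields(pairs, table):
    """Rename the field of each (field, value) pair using `table`."""
    return [(table[field], value) for field, value in pairs]


# Hypothetical title values, for illustration only.
marc_titles = [
    ("245", "An Example Title"),
    ("246", "Example Variant"),
    ("242", "Example Translation"),
]

dc_titles = map_fields(marc_titles, MARC_TO_SIMPLE_DC)
round_trip = map_fields(dc_titles, SIMPLE_DC_TO_MARC)

print(dc_titles)
print(round_trip)
```

After the round trip every value sits in field 245, so the original distinction between title statement, variant title, and translated title cannot be recovered.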


This is why crosswalks are said to be "lateral" (one-way) mappings from one scheme to another. Separate crosswalks would be required to map from scheme A to scheme B and from scheme B to scheme A.


The Crosswalk Process

The process of mapping between content standards or vocabularies is usually divided into the following steps:


1. Harmonization of Metadata Standards

Metadata standards are often described in terms of element names and definitions. A standard defines the rules for how the metadata are structured and also the appropriate content for the various elements.

However, different standards can be stated in different ways. In other words, a particular standard (the source standard) doesn’t have to use the same element labels (names) for similar content as another standard (the target standard), or allow the same terms to be filled in to each element.


In the harmonization process, the source and target metadata standards are expressed in the same syntax or model. In the simplest case, this is done by creating a table of fields from each standard in a common application (e.g., a spreadsheet). The table rows would likely contain elements from the source standard that are in some way related to elements of the target standard. Where the standards align well, there would be one-to-one relationships between source elements and target elements.

In more complex harmonization cases, there are one-to-many or many-to-one relationships. Also, intra-relationships between the elements within a single standard must be thoroughly described as part of the harmonization process. Of course, this implies the elements must be thoroughly described in the source and target standard.
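A harmonization table of this kind can be sketched as a list of (source element, target element) rows, with each row labelled by the relationship type it belongs to. The element names and the `classify` helper below are hypothetical and stand in for whatever the two standards actually define.

```python
from collections import Counter

# Hypothetical harmonization table: one row per related element pair.
ROWS = [
    ("date", "Date"),
    ("title_main", "Title"),
    ("title_alt", "Title"),
    ("address", "street"),
    ("address", "city"),
]


def classify(rows):
    """Label each (source, target) row with its relationship type."""
    source_counts = Counter(source for source, _ in rows)
    target_counts = Counter(target for _, target in rows)
    labelled = []
    for source, target in rows:
        many_sources = target_counts[target] > 1  # several sources share this target
        many_targets = source_counts[source] > 1  # this source feeds several targets
        if many_sources and many_targets:
            kind = "many-to-many"
        elif many_sources:
            kind = "many-to-one"
        elif many_targets:
            kind = "one-to-many"
        else:
            kind = "one-to-one"
        labelled.append((source, target, kind))
    return labelled


for row in classify(ROWS):
    print(row)
```

Rows that are not one-to-one are the ones that will later need explicit rules.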


2. Semantic Mappings

Semantic mapping, as applied to metadata, is a visual or tabular strategy for establishing the relationships of vocabulary terms between data sets.

Basic Relationships

When creating mappings among vocabulary terms, the mapping organization requires a good set of basic relationships. The most common relationship, “is the same as,” is usually too narrow to adequately map all terms.
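One possible shape for such a relationship set is sketched below. Only “is the same as” comes from the text above; the broader, narrower, and related relationships are assumptions, loosely modelled on the kind of mapping relations found in vocabulary standards such as SKOS. The term "sea state" and the `invert` helper are hypothetical as well.

```python
from enum import Enum


class Relation(Enum):
    SAME_AS = "is the same as"
    BROADER_THAN = "is broader than"    # assumed relationship
    NARROWER_THAN = "is narrower than"  # assumed relationship
    RELATED_TO = "is related to"        # assumed relationship


# Reading a mapping in the opposite direction swaps broader and narrower.
INVERSE = {
    Relation.SAME_AS: Relation.SAME_AS,
    Relation.BROADER_THAN: Relation.NARROWER_THAN,
    Relation.NARROWER_THAN: Relation.BROADER_THAN,
    Relation.RELATED_TO: Relation.RELATED_TO,
}

# Hypothetical term mappings as (source term, relation, target term).
MAPPINGS = [
    ("wave ht (m)", Relation.SAME_AS, "wave amplitude"),
    ("wave ht (m)", Relation.NARROWER_THAN, "sea state"),
]


def invert(mapping):
    """Return the same mapping read from the target term's side."""
    source, relation, target = mapping
    return (target, INVERSE[relation], source)


for mapping in MAPPINGS:
    print(mapping, "->", invert(mapping))
```

Recording the relationship type alongside each pair keeps a loose match from being mistaken for an exact one.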


3. Rules for Complex Metadata Mappings

The introduction and definition of rules is an essential step for most cases of creating semantic mappings between standards because of the complex relationships that often exist.

Rules are required whenever the mapping from source element to target element is more complex than one-to-one.


As an example, consider the case of a source standard having a single element for the address. The target standard may represent the address using multiple elements, such as street address, city, state, zip code, and country. An automated rule could be established to identify certain province or state names, essentially parsing the single element address into its components. Alternatively, a manual rule may also be created, one that specifies that manual intervention is the only method to properly separate the address components.
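The address example can be sketched as an automated rule with a manual fallback. This is an illustration under strong assumptions: the address layout expected by the pattern, the short list of state codes, the target element names, and the `needs_manual_review` marker are all hypothetical.

```python
import re

# Hypothetical list of state codes the automated rule recognizes.
KNOWN_STATES = {"CA", "NY", "TX"}

# Assumed layout: "street, city, ST 12345, country".
ADDRESS_PATTERN = re.compile(
    r"^(?P<street_address>[^,]+),\s*"
    r"(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip_code>\d{5}),\s*"
    r"(?P<country>[^,]+)$"
)


def split_address(address):
    """Automated rule: split one address element into five target elements.

    Falls back to a manual-review marker when the automated rule cannot
    confidently parse the value.
    """
    match = ADDRESS_PATTERN.match(address.strip())
    if match and match.group("state") in KNOWN_STATES:
        return {name: value.strip() for name, value in match.groupdict().items()}
    return {"needs_manual_review": address}


print(split_address("12 Main St, Springfield, NY 12345, USA"))
print(split_address("Somewhere near the harbour"))
```

Real addresses vary far more than this pattern allows, which is why the manual rule remains a legitimate outcome rather than an error case.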


4. Transformation of Metadata Descriptions

Transformation is the process of creating a target instance of the metadata description from the source instance. The transformation uses semantic mapping and rules to create the target instance.

It is important to note that the result of the transformation is a metadata description. The created description is sometimes referred to as a crosswalk, but this is an inappropriate usage of the word. See the Crosswalk guide for more information about the distinction.
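A transformation step can be sketched as a function that takes a source instance plus the mapping and rules, and returns a target instance. All element names, the `split_name` rule, and the `transform` helper are hypothetical. Note that the mapping and rules together play the role of the crosswalk, while the returned dictionary is a metadata description.

```python
def split_name(value):
    """Hypothetical rule: split "Family, Given" into two target elements."""
    family, _, given = value.partition(", ")
    return {"family_name": family, "given_name": given}


# Hypothetical one-to-one mapping: source element -> target element.
MAPPING = {"title": "Title", "created": "Date.Created"}

# Hypothetical rules for elements that are not one-to-one.
RULES = {"name": split_name}


def transform(source, mapping, rules):
    """Create a target instance from a source instance."""
    target = {}
    for element, value in source.items():
        if element in rules:
            target.update(rules[element](value))  # complex mapping
        elif element in mapping:
            target[mapping[element]] = value      # simple rename
        # elements with neither a rule nor a mapping are not carried over
    return target


source_instance = {"title": "Example", "name": "Doe, Jane", "internal_id": "7"}
target_instance = transform(source_instance, MAPPING, RULES)
print(target_instance)
```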
