How Portable Are the Metadata Standards for Scientific Data?

Jian Qin, Kai Li
School of Information Studies, Syracuse University, Syracuse, NY, USA



DESCRIPTION

The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with the ever-growing volume of data. This paper reports the findings from a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4,400+ unique elements from 16 standards and categorized these elements into 9 categories. The findings showed that the highest element counts occurred in the descriptive category and that many of these elements overlapped with Dublin Core (DC) elements. This pattern also recurred among the elements that co-occurred in different standards. A small number of semantically general elements appeared in the largest numbers of standards, while the rest of the co-occurring elements formed a long tail with a wide range of specific semantics. The paper discusses the implications of these findings for metadata portability and infrastructure and points out that large, complex standards and widely varied naming practices are the major hurdles to building a metadata infrastructure.




Why study metadata portability?

Complex, very large metadata standards:

•  are “…unwieldy to apply…”

•  are “difficult to understand and enact in its entirety…”

•  require customization to fit specific needs

•  are costly in time and personnel


Each standard has its own schema and tools…

•  EPA Metadata Editor

•  Metavist2

•  Morpho

•  Protocol Registration System

•  AVM Tagging Tool

…that lead to duplicated efforts and interoperability problems


Metadata for scientific data is at the juncture of data-driven science, technical standards, and infrastructure, including XML, RDF, RDFa, OWL, DOI, and ORCID.


A few big questions

•  What action should and can we take at this juncture as a metadata community of practice?

•  How much do we know about metadata standards for scientific data?

•  How can we transform the current metadata standards into an infrastructure-driven service?


An infrastructure perspective for metadata

•  Portable

•  Customizable

•  Extendable

•  Reusable

•  Easy to use


An attempt to define “metadata infrastructure”

Semantically:

• metadata elements, vocabularies, entities, and other metadata artifacts as the underlying foundation for building tools, software, and applications

Technically:

• “a data model for describing the resources, aspects of metadata encoding and storage formats, metadata for web services, metadata tools, usage, modification, transformation, interoperability, and metadata crosswalk” (CLARIN)
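The “metadata crosswalk” named in the CLARIN definition can be illustrated with a minimal Python sketch. It assumes a hypothetical crosswalk table from a few Dublin Core element names to element names in an imaginary target standard ("Standard B"); the target names and the function `apply_crosswalk` are illustrative only and are not taken from any real standard.

```python
# Hypothetical crosswalk: Dublin Core element -> element in an imaginary "Standard B".
# The target names are invented for illustration only.
DC_TO_STANDARD_B: dict[str, str] = {
    "title": "datasetTitle",
    "creator": "principalInvestigator",
    "date": "publicationDate",
    "identifier": "datasetId",
}


def apply_crosswalk(
    record: dict[str, str], crosswalk: dict[str, str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Map a source record through a crosswalk.

    Returns (mapped, unmapped): values re-keyed to their target elements, and
    elements with no counterpart that would need manual handling.
    """
    mapped: dict[str, str] = {}
    unmapped: dict[str, str] = {}
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped[element] = value
        else:
            mapped[target] = value
    return mapped, unmapped


if __name__ == "__main__":
    dc_record = {"title": "Stream temperature 2010-2012", "creator": "Doe, J.", "rights": "CC BY 4.0"}
    mapped, unmapped = apply_crosswalk(dc_record, DC_TO_STANDARD_B)
    print(mapped)    # {'datasetTitle': 'Stream temperature 2010-2012', 'principalInvestigator': 'Doe, J.'}
    print(unmapped)  # {'rights': 'CC BY 4.0'}
```

Elements without a counterpart are returned separately rather than silently dropped, since crosswalks between standards are rarely one-to-one.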


Portability is the key

Building blocks of metadata:

•  Metadata for data citation

•  Metadata for data discovery

•  Metadata for data archiving

•  Metadata for data quality

•  Metadata for data provenance

•  Metadata for data management

Metadata generation output
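One way to read the building-block idea is as composition: each block is a reusable module of elements, and an application's metadata template is assembled from only the blocks it needs. The Python sketch below assumes hypothetical module contents and a hypothetical `compose_profile` function; it is an illustration of the idea, not a design from the study.

```python
# Hypothetical reusable metadata modules; element names are illustrative only.
MODULES: dict[str, list[str]] = {
    "citation": ["title", "creator", "publisher", "publicationYear", "identifier"],
    "discovery": ["title", "subject", "description", "spatialCoverage", "temporalCoverage"],
    "provenance": ["creator", "instrument", "processingStep"],
}


def compose_profile(module_names: list[str]) -> list[str]:
    """Assemble an element list from the chosen modules.

    Keeps first-seen order and includes an element shared by several
    modules (e.g. 'title') only once.
    """
    elements: list[str] = []
    seen: set[str] = set()
    for name in module_names:
        for element in MODULES[name]:
            if element not in seen:
                seen.add(element)
                elements.append(element)
    return elements


if __name__ == "__main__":
    profile = compose_profile(["citation", "discovery"])
    print(len(profile), profile)  # 9 elements; 'title' appears once
```

Because modules overlap (here on `title` and `creator`), the composition step has to reconcile shared elements, which is where agreed-upon, semantically identical elements across standards pay off.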


How portable are metadata standards for scientific data?

Two measures of metadata portability:

• Co-occurrence of semantic elements: the number of standards in which a semantically identical element is used

• Degree of modularity: the degree to which the sub-structure for a concept or entity in a metadata standard is independent and self-descriptive
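The co-occurrence measure can be sketched in Python: given, for each standard, the set of its elements already mapped to shared semantic labels, count how many standards use each label, then tally how many elements occur in exactly k standards (the kind of distribution shown under “Frequency of co-occurrences”). The standard names and element labels below are toy data, not the study's.

```python
from collections import Counter

# Toy input: each standard mapped to its set of semantically de-duplicated element labels.
# Standard names and labels are hypothetical.
standards: dict[str, set[str]] = {
    "StandardA": {"title", "creator", "keyword", "latitude"},
    "StandardB": {"title", "creator", "instrument"},
    "StandardC": {"title", "keyword", "taxon"},
}


def co_occurrence(standards: dict[str, set[str]]) -> Counter[str]:
    """Number of standards in which each element label appears."""
    counts: Counter[str] = Counter()
    for elements in standards.values():
        counts.update(elements)  # a set contributes at most 1 per label
    return counts


def co_occurrence_distribution(counts: Counter[str]) -> dict[int, int]:
    """For each k, how many elements occur in exactly k standards."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    counts = co_occurrence(standards)
    shared = sorted(e for e, k in counts.items() if k > 1)
    print(shared)                              # ['creator', 'keyword', 'title']
    print(co_occurrence_distribution(counts))  # {1: 3, 2: 2, 3: 1}
```

In this toy data 3 of 6 labels are shared; a share such as the study's 12.16% is the same ratio, shared elements over all unique elements.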


Data

•  Element collection: 5,800 elements from 16 scientific metadata standards

•  Element de-duplication: 4,434 semantically unique elements

•  Categorization: 9 categories based on the functions of the elements
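The study de-duplicated elements by semantics, which a script cannot fully automate. A purely lexical first pass can still merge naming variants of the same element (camelCase, snake_case, hyphens, letter case) before manual semantic review. The Python sketch below assumes this normalization rule; it is not the procedure used in the study, and it deliberately does not merge true synonyms such as "creator" and "author".

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lexical key for an element name: split camelCase, unify separators, lowercase."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)  # "dataSetTitle" -> "data Set Title"
    name = re.sub(r"[_\-.\s]+", " ", name)               # "_", "-", "." and whitespace runs -> one space
    return name.strip().lower()


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Group raw element names that share the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for n in names:
        groups[normalize(n)].append(n)
    return dict(groups)


if __name__ == "__main__":
    raw = ["dataSetTitle", "data_set_title", "Data-Set-Title", "creator", "author"]
    for key, variants in group_variants(raw).items():
        print(key, variants)
```

Grouping only on naming variants keeps the automated step conservative; deciding that differently named elements mean the same thing remains a human judgment.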


Element distribution by standard (chart labels: NetCDF Climate and Forecast Metadata Convention (CF), 2,427 elements; Ecological Metadata Language)


Element distribution by standard and category

Page 13: How Portable Are the Metadata Standards for Scientific Data?

Frequency of occurrences by category


Top occurring elements



Element co-occurrences across metadata standards

Of the 4,434 unique elements, 539 (12.16%) occurred in more than one standard.
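The percentage follows directly from the two counts, and its complement gives the share of elements found in only one standard:

```latex
\frac{539}{4434} \approx 0.1216 = 12.16\%,
\qquad
\frac{4434 - 539}{4434} = \frac{3895}{4434} \approx 0.8784 = 87.84\%
```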

Frequency of co-occurrences (chart; axis label: number of elements)


Elements that most frequently co-occurred


Modularity

Two levels of modularity:

• Level 1: having multiple XML schema files for the whole standard;

• Level 2: having separate schemas for entities such as person/organization, dataset, study, instrument, and subject

All 6 of the standards that provide schema files are at Level 1 modularity.
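The two modularity levels can be written as a small classification rule. The Python sketch below assumes a hypothetical summary of a standard's schema: a mapping from each schema file name to the core entities it defines. It labels a single-file schema as level 0 (a label introduced here only for completeness) and requires, for Level 2, that every file defining core entities defines exactly one; both choices are assumptions, not the study's operational criteria.

```python
# Core entities named on the slide; used as the Level 2 criterion.
CORE_ENTITIES = {"person/organization", "dataset", "study", "instrument", "subject"}


def modularity_level(schema_files: dict[str, set[str]]) -> int:
    """Classify a standard's schema modularity (hypothetical rule).

    schema_files maps each schema file name to the core entities it defines.
    0 = a single schema file (assumed label)
    1 = multiple schema files for the whole standard
    2 = multiple files, and each file that defines core entities defines only one
    """
    if len(schema_files) <= 1:
        return 0
    per_file = [entities & CORE_ENTITIES for entities in schema_files.values()]
    defining = [e for e in per_file if e]  # files that define at least one core entity
    if defining and all(len(e) == 1 for e in defining):
        return 2
    return 1


if __name__ == "__main__":
    split_by_file = {"main.xsd": {"dataset", "person/organization"}, "types.xsd": set()}
    split_by_entity = {"dataset.xsd": {"dataset"}, "party.xsd": {"person/organization"}}
    print(modularity_level(split_by_file))    # 1
    print(modularity_level(split_by_entity))  # 2
```

Under this rule, a standard split into several files but still defining its entities together lands at Level 1, which matches how the six surveyed standards with schema files are described.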


Discussion

Portable metadata standards

• Possible?

• Feasible?

• Advantages over the one-covers-all approach?

A metadata infrastructure for scientific data

• Bridge the gap between existing semantic and entity resources and metadata generation

• Much to be researched…


Further research

More questions than answers from this study:

• What should a metadata infrastructure consist of?

• How can the gaps between infrastructure resources and metadata applications be filled or narrowed?

• Is it possible, or is there a need, to streamline metadata scheme design practice toward a metadata infrastructure?

• …and the list can go on


Questions and comments?