Functional and Architectural Requirements for Metadata: Supporting Discovery and Management of Scientific Data
Jian Qin (School of Information Studies, Syracuse University, USA), Alex Ball (Digital Curation Centre, UKOLN, University of Bath, UK), Jane Greenberg (School of Information and Library Science, University of North Carolina at Chapel Hill, USA)
International Conference of Dublin Core Metadata, Kuching, Malaysia, September 5, 2012



DESCRIPTION

The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as is evident in Ball’s survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need and explores system requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper’s goals are stated, followed by a presentation of the system requirements. The results include a base model with three chief principles: least effort, infrastructure service, and portability. The principles are intended to support “data user” tasks. Results also include a set of defined user tasks and functions, and application scenarios.

Citation preview

  • 1. Functional and Architectural Requirements for Metadata: Supporting Discovery and Management of Scientific Data. Jian Qin (School of Information Studies, Syracuse University, USA); Alex Ball (Digital Curation Centre, UKOLN, University of Bath, UK); Jane Greenberg (School of Information and Library Science, University of North Carolina at Chapel Hill, USA). International Conference of Dublin Core Metadata, Kuching, Malaysia, September 5, 2012

  • 2. Metadata standards for scientific data: Ecological Metadata Language (EML); CSDGM; Access to Biological Collections Data (ABCD); Darwin Core

  • 3-5. Many tools have been developed

  • 6. Motivation for adoption: Standardize the format and terminology of metadata. Enable fast and effective discovery of datasets across different data repositories. Enable data sharing, reuse, and preservation. Provide information for obtaining datasets from the data owners.

  • 7. Hindering factors: Large numbers of elements. Many layers in structure. Unnecessary duplicate data entry. Steep learning curve. Difficult to automate metadata generation. High costs in time, resource, and personnel expertise. Unwieldy to apply. Created for manual data entry rather than for automatic generation.

  • 8. Same entity data repeated in the same record. Example: Seamless Daily Precipitation for the Conterminous United States. Metadata: Identification_Information, Data_Quality_Information, Spatial_Data_Organization_Information, Spatial_Reference_Information, Entity_and_Attribute_Information, Distribution_Information, Metadata_Reference_Information

  • 9. and they are already in

  • 10. Research questions: What functions do metadata standards for scientific data serve? How should metadata standards for scientific data be modeled to support these functions by meeting the associated requirements?

  • 11. Functions expected: Resource discovery and use. Data interoperability. Automatic and semi-automatic metadata generation. Linking of publications and underlying datasets. Data/metadata quality control. Data security.

  • 12. Metadata requirements for scientific data

  • 13. Architectural view

  • 14. Identity metadata: Person: researcherID, URI. Institution: ORCID, URI. Data object: DOI, Handle, URI. Associated publication: DOI. Supported by: name repositories; linked data architecture; customizable research group/community member name lists.

  • 15. Semantic metadata: Large semantic resources are available in linked data format, but they are usually not suitable for representing scientific data because they are designed for publications, especially books and journals (containers). The format is contemporary but the content is far from it. Smaller, specialized semantic resources are necessary for automatic semantic metadata generation.

  • 16. Contextual metadata: Provenance. Provenance data model (W3C, http://www.w3.org/TR/prov-primer/). Provenance data represent the origins of digital objects and describe the entities and activities involved in producing and delivering or otherwise influencing a given object.

  • 17. Geospatial metadata (diagram): Biological sciences: Darwin Core (DwC) georeferencing elements; Ecological Metadata Language (EML) georeferencing elements. Climate: NetCDF Climate and Forecast (CF) Metadata Conventions. FGDC CSDGM profiles: Biological Data Profile; Shoreline Metadata Profile. Astronomy: Astronomy Visualization Metadata Standard. ISO 19115:2003 Geographic information - Metadata.

  • 18. Temporal metadata: Mean solar time; civil time; GPS time; terrestrial time; atomic time; geologic time. Different measurement systems result in different units and formats. Conversion between systems.

  • 19. The least effort principle. The infrastructure service principle. The portable principle.

  • 20. TABLE 1. Mapping data user tasks with metadata functions and architectural building blocks

      Data user task | Metadata function | Architectural building block
      Discover | Descriptive metadata | Identity and semantic metadata
      Identify | Descriptive metadata | Identity metadata
      Select | Descriptive, technical metadata | Identity, semantic, scientific context, geospatial, temporal, miscellany metadata
      Obtain | Descriptive metadata | Identity metadata
      Verify | Descriptive metadata | Scientific context metadata
      Analyze | | Scientific context, geospatial, and temporal metadata
      Manage | Descriptive, administrative, structural, and technical metadata | Identity, semantic, scientific context, geospatial, temporal, miscellany metadata
      Archive | Descriptive, administrative, structural, and technical metadata | Identity, semantic, scientific context, geospatial, temporal, miscellany metadata
      Publish | Descriptive metadata | Identity, semantic, scientific context, geospatial, and temporal metadata
      Cite | Descriptive metadata | Identity metadata

  • 21. Development potentials: An infrastructure of metadata services. Entities as linked data. Tools for slicing members by research group, community, or institution to customize the entity set. Tools for grabbing entity data from existing resources through interoperability protocols.

  • 22. Application scenarios: Cross-domain discovery and verification. Automatically populating entity information from customized slices of entities into metadata records. And more.

  • 23. Conclusion: Scientific data are inherently complex and diverse. Functional metadata requirements should be translated into an effective and efficient architecture. Three principles for modeling metadata for scientific data. Metadata for scientific data (or other domains at large) should adopt an infrastructure service approach. Much to be explored, experimented, and evaluated.

  • 24. Thank you! Questions?
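
The mapping in Table 1 can be read as a small lookup structure: given a data user task, which metadata functions and architectural building blocks does it draw on? The Python sketch below is one possible encoding, not the authors' implementation. The names `TASK_MAP`, `ALL_BLOCKS`, `blocks_for`, and `tasks_using` are hypothetical. It assumes that "Identify metadata" in the table means "Identity metadata", and that the single entry given for the Analyze task belongs in the building-block column.

```python
# Sketch of Table 1 as a lookup structure (illustrative only; names are hypothetical).
# Each task maps to a pair: (metadata functions, architectural building blocks).

ALL_BLOCKS = (
    "identity", "semantic", "scientific context",
    "geospatial", "temporal", "miscellany",
)

FULL_FUNCTIONS = ("descriptive", "administrative", "structural", "technical")

TASK_MAP = {
    "discover": (("descriptive",), ("identity", "semantic")),
    "identify": (("descriptive",), ("identity",)),
    "select": (("descriptive", "technical"), ALL_BLOCKS),
    # Assumption: "Identify metadata" in the table is read as "identity".
    "obtain": (("descriptive",), ("identity",)),
    "verify": (("descriptive",), ("scientific context",)),
    # Assumption: the table gives one entry for Analyze; it is treated as
    # the building-block cell and the function cell is left empty.
    "analyze": ((), ("scientific context", "geospatial", "temporal")),
    "manage": (FULL_FUNCTIONS, ALL_BLOCKS),
    "archive": (FULL_FUNCTIONS, ALL_BLOCKS),
    "publish": (
        ("descriptive",),
        ("identity", "semantic", "scientific context", "geospatial", "temporal"),
    ),
    "cite": (("descriptive",), ("identity",)),
}


def blocks_for(task):
    """Return the architectural building blocks a user task relies on."""
    return TASK_MAP[task.lower()][1]


def tasks_using(block):
    """Return the user tasks (in table order) that rely on a building block."""
    return [task for task, (_, blocks) in TASK_MAP.items() if block in blocks]


if __name__ == "__main__":
    print(blocks_for("Discover"))
    print(tasks_using("geospatial"))
```

Reversing the lookup with `tasks_using` shows which user tasks would be affected if a given building block, such as geospatial metadata, were missing from a record.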