Metadata Management and Tools
August 1, 2013
Data Curation Course
Outline
• General information about metadata
• Metadata and the data life cycle
• DDI – a specification for documenting social, behavioral and economic data
• Exercises
Defining Metadata
• Metadata are commonly described as “data about data”
• Metadata serve as a “bridge” between the data producer and the data user
• Metadata bring data to life, helping users interpret and understand the data
Simple Example
Bad → Better → Best (Rich, Structured)
Importance of Metadata
• John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the issues in using secondary data.*
• http://www.youtube.com/watch?v=xlQMVV7VJtA
* Video courtesy of MANTRA Research Data Management Training -- http://datalib.edina.ac.uk/mantra/
Concerns About Creating Metadata
• Concern: workload required to capture accurate, robust metadata
– Solution: incorporate metadata creation into the data development process; distribute the effort
• Concern: time and resources to create, manage, and maintain metadata
– Solution: include in grant budget and schedule
• Concern: readability / usability of metadata
– Solution: use a standardized metadata format
• Concern: discipline-specific information and ontologies
– Solution: ‘profile’ the standard to require specific information and use specific values
DataONE Education Module: Metadata. DataONE. Retrieved July 19, 2013
Metadata Types
• Types of metadata, by content: *
– Descriptive: Intellectual content and contextual information relevant to understanding and interpreting data
– Technical: Physical and digital features of a data resource
– Structural: Configuration of a resource, connections and relationships among parts, or among related resources
*Adapted from Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe
Metadata and the Data Life Cycle
• Metadata-driven life cycle: Metadata are created, but also used and reused at every stage of the data life cycle
• Ideally, metadata continue to accumulate to provide a complete record of the evolution of a dataset
Metadata and the Data Life Cycle
Rich metadata = smooth life cycle, high quality data
Structured Metadata
• Enhances the value and usability of metadata
• A consistent, predictable metadata structure enables
– More effective searches
– Automated management and processing
– Resource sharing
– Interoperability
• Standardization leads to greater efficiency
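As a rough illustration of why a consistent structure enables searching and automated processing, the sketch below filters a handful of study records by field. The records and field names (`title`, `country`, `year`) are invented for this example and do not come from any particular standard.

```python
# Hypothetical study records; field names are illustrative only.
records = [
    {"title": "Household Survey A", "country": "Kenya", "year": 2010},
    {"title": "Labour Force Study B", "country": "Peru", "year": 2012},
    {"title": "Household Survey C", "country": "Kenya", "year": 2012},
]


def search(records, **criteria):
    """Return the records whose fields match every given criterion exactly."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


matches = search(records, country="Kenya", year=2012)
print([record["title"] for record in matches])
```

Because every record uses the same field names, the same function works on any of them; a free-text description would need to be read and interpreted case by case.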
Metadata Standards Examples
• Dublin Core
• Data Documentation Initiative (DDI)
• Ecological Metadata Language (EML)
• Astronomy Visualization Metadata (AVM)
• Darwin Core
• FGDC Content Standard for Digital Geospatial Metadata (CSDGM)
• ISO 19115/19139 Geographic information
Standards
Cartoon courtesy of XKCD.com
What is DDI?
• A metadata standard of and for the community
• Two major development lines
– DDI Codebook
– DDI Lifecycle
• Metadata for both human and machine consumption
• Additional specifications:
– Controlled vocabularies
– RDF vocabularies for use with Linked Data
DDI Background and History
• Its development started in the mid-1990s, as a grant-funded effort initiated and organized by ICPSR, with international participation
• First version published in February 2000
Background and History Continued
• The DDI Alliance was formed in 2003 to support and develop the DDI standard: http://www.ddialliance.org/
• Ever-growing number of DDI users; large multinational projects
– CESSDA data portal (20 European data archives)
– International Household Survey Network (IHSN) (developing countries from Africa, Asia, the former Soviet Union, and, more recently, Latin America)
DDI Members and Projects Worldwide
DDI Specification
• The first versions of DDI (1.0 through 2.1) were document- and codebook-centric
• Version 3.0 was published in April 2008 to document the data life cycle
RDF Vocabularies for Semantic Web
• DDI-RDF Discovery Vocabulary
o For publishing metadata about datasets into the Web of Linked Data
o Based on DDI Codebook and DDI Lifecycle
• XKOS
o RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary
Publication expected in second half of 2013
DDI of the Future
• Robust and persistent data model (for the metadata), with extension possibilities and a variety of technical expressions
• Complete data life cycle coverage
• Broadened focus for new research domains
• Simpler specification that is easier to understand and use, including better documentation
Benefits of DDI Approach
• Rich content (currently over 800 items)
• Metadata reuse across the life cycle
• Machine-actionability
• Data management and curation
• Support for longitudinal data and comparison
Metadata Reuse
DDI Alignment with Other Metadata Standards
• MARC: DDI-C, DDI-L
• Dublin Core: DDI-C, DDI-L
• SDMX (Statistical Data and Metadata Exchange): DDI-L
• ISO 11179 (Metadata Registries): DDI-L
• FGDC (Digital Geospatial Metadata): DDI-L
• ISO 19115 (Geographic Information Metadata): DDI-L
• PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission Standard): under consideration
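Alignment between standards is often put into practice as a crosswalk, a table that maps elements of one standard onto another. The sketch below maps a few DDI-C study-level element names onto Dublin Core terms. The mapping is an illustrative assumption for this example, not an official crosswalk, and the flat dictionary input is a simplification of a real DDI instance.

```python
# Illustrative (not official) crosswalk from DDI-C element names
# to Dublin Core terms.
DDI_TO_DC = {
    "titl": "title",
    "AuthEnty": "creator",
    "abstract": "description",
    "keyword": "subject",
    "geogCover": "coverage",
}


def to_dublin_core(ddi_record):
    """Map a flat dict of DDI-C fields to Dublin Core, skipping unmapped fields."""
    dc_record = {}
    for field, value in ddi_record.items():
        term = DDI_TO_DC.get(field)
        if term is not None:
            # Dublin Core terms are repeatable, so collect values in lists.
            dc_record.setdefault(term, []).append(value)
    return dc_record


example = {"titl": "Example Study", "AuthEnty": "Example, A.", "respRate": "80%"}
print(to_dublin_core(example))
```

Fields with no counterpart (here `respRate`) are dropped, which shows why a crosswalk from a rich standard to a simpler one usually loses detail.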
DDI-L or DDI-C?
• DDI-L
– Complex data (hierarchical, longitudinal, comparative)
– Metadata-driven survey design (building questionnaires)
– Multiple languages
– Detailed geographic information
– Metadata reuse across the data life cycle
– Reusable resources: question/concept/variable banks, registries of organizations and individuals, etc.
DDI-L or DDI-C?
• DDI-C
– Documentation of simple, survey-type data
– Catalog records, involving mainly study-level descriptions (most new features in DDI-L relate to documenting data at item/variable level)
• Both DDI-C and DDI-L may be used within the same organization
• ICPSR uses DDI-C but has translation to DDI-L for study-level records
DDI-C Structure and Contents
DDI-C main sections:
1. Document Description
Self-referencing information about the DDI instance at hand. Usually for internal use, not publicly displayed
2. Study Description
General information about the study. Input is usually the introductory part of a codebook, describing the study scope, methodology, topical/temporal coverage, etc. In DDI-C this section also includes data access and availability information
3. File Description
Describes physical characteristics of data file(s): name, format, structure, dimensions
4. Data Description
Detailed description of each variable, including variable groups if applicable. Special subsection for documenting census-type aggregate data
5. Other (Study Related) Materials
References or contains materials used in the production of the study or useful in the analysis of the data
For complete content and Tag Library see http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
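The five sections correspond to the top-level elements of a DDI-C instance (`docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat` under a `codeBook` root). The sketch below builds a bare skeleton with Python's standard library. It is a simplified illustration: the helper `build_codebook` and the sample values are invented here, and the output omits the namespace and the many elements a schema-valid instance would need.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, file_name, variables):
    """Build a bare DDI-C style skeleton (not schema-valid; for illustration)."""
    code_book = ET.Element("codeBook")

    # 1. Document Description: left empty in this sketch.
    ET.SubElement(code_book, "docDscr")

    # 2. Study Description: only the study title is recorded here.
    study = ET.SubElement(code_book, "stdyDscr")
    title_stmt = ET.SubElement(ET.SubElement(study, "citation"), "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # 3. File Description: only the data file name is recorded here.
    file_txt = ET.SubElement(ET.SubElement(code_book, "fileDscr"), "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    # 4. Data Description: one <var> per variable, each with a label.
    data = ET.SubElement(code_book, "dataDscr")
    for name, label in variables:
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label

    # 5. Other (Study Related) Materials: left empty in this sketch.
    ET.SubElement(code_book, "otherMat")
    return code_book


root = build_codebook("Example Study", "example.dat", [("Q25", "Example label")])
print(ET.tostring(root, encoding="unicode"))
```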
Study-level DDI Elements at ICPSR
• Study ID (Number, DOI)
• Title, Alternate Title
• Author/Primary Investigator
• Bibliographic Citation
• Funding Information
• Abstract
• Keywords/Topic Classification
• Series Information
• Geographic Coverage
• Time Period Covered
• Time Method
• Date(s) of Collection
• Mode of Collection
• Universe
• Sampling
• Unit of Analysis
• Response Rates
• Weighting Information
• Data Type
• Extent of Processing
• Access Conditions/Restrictions
• Version History
Study-level DDI at ICPSR
• Leveraged in several ways
o Data discovery -- Forms basis of Solr/Lucene faceted search
o Repurposing -- Record is reused across ICPSR’s topical archive sites
o Interoperating -- Records shared with Data-PASS, ODESI, and CESSDA archives
o Study Overview -- Becomes PDF overview bundled with each download
Example: www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30103
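Faceted search depends on counting how many records carry each value of a structured field. The sketch below shows that counting step in plain Python. It does not use Solr or Lucene, and the study records and field names are made up for the example.

```python
from collections import Counter

# Hypothetical study-level records; ids, keywords, and geography are invented.
studies = [
    {"id": "s1", "keywords": ["health", "aging"], "geography": "United States"},
    {"id": "s2", "keywords": ["health"], "geography": "Canada"},
    {"id": "s3", "keywords": ["education"], "geography": "United States"},
]


def facet_counts(studies, field):
    """Count how many studies carry each value of the given field."""
    counts = Counter()
    for study in studies:
        value = study.get(field, [])
        # Treat single-valued fields the same way as multi-valued ones.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


print(facet_counts(studies, "keywords"))
print(facet_counts(studies, "geography"))
```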
DDI at ICPSR: Study-level Metadata Editor
Variable-level DDI elements at ICPSR
• Variable name and ID
• Variable label
• Question text
• Descriptive variable text
• Category labels and values (responses)
• Category statistics (frequencies)
• Summary statistics
• Variable format
• Notes
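Several of these elements can be seen together in a single DDI-C `var` element. The sketch below parses a small hand-written snippet and pulls out the name, label, question text, and category frequencies. The snippet content is invented, and the function `summarize_variable` is a hypothetical helper, not part of any DDI tool.

```python
import xml.etree.ElementTree as ET

# Hand-written DDI-C style fragment; values are invented for illustration.
SNIPPET = """
<var ID="V1" name="Q25">
  <labl>Example variable label</labl>
  <qstn><qstnLit>Example question text?</qstnLit></qstn>
  <catgry>
    <catValu>1</catValu><labl>Yes</labl><catStat type="freq">60</catStat>
  </catgry>
  <catgry>
    <catValu>2</catValu><labl>No</labl><catStat type="freq">40</catStat>
  </catgry>
</var>
"""


def summarize_variable(var):
    """Collect name, label, question text, and category frequencies."""
    categories = []
    for category in var.findall("catgry"):
        categories.append({
            "value": category.findtext("catValu"),
            "label": category.findtext("labl"),
            "freq": int(category.findtext("catStat[@type='freq']")),
        })
    return {
        "name": var.get("name"),
        "label": var.findtext("labl"),
        "question": var.findtext("qstn/qstnLit"),
        "categories": categories,
    }


summary = summarize_variable(ET.fromstring(SNIPPET.strip()))
print(summary["name"], summary["label"])
for category in summary["categories"]:
    print(category["value"], category["label"], category["freq"])
```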
Variable-level DDI at ICPSR
• Variable-level DDI leveraged in several ways
o Search -- Permits search of variables within a dataset/series
o Search across ICPSR -- Serves as foundation for Social Science Variables Database
o Integration with online analysis
o Codebook with frequencies -- Enables generation of PDF documentation
• Example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/30103/datasets/1/variables/Q25
Tools for generating DDI metadata
• Nesstar Publisher
– DDI-C, study, file, and variable level
• Colectica
– DDI-L configuration, study and variable level
– Both DDI-C and DDI-L compatible (import and export)
– Exports DDI and PDF, HTML, RTF documentation (no need to re-convert to presentation formats)
• Colectica for Excel
Tools continued
• XCONVERT (SDA Berkeley)
– DDI-C, variable level: converts SAS, SPSS, or Stata syntax into DDI-XML, without frequencies
• Stat/Transfer (v. 11)
– DDI-L, variable level: no frequencies
• MQDS tool
– Exports Blaise to DDI-L to create study documentation
Tools continued
• More DDI tools can be found here: http://www.ddialliance.org/resources/tools
Questions?