Developing a Metadata Template for CDC The Efforts of the Metadata and Data Quality Subgroup to Capture Data about Data

Page 1

Developing a Metadata Template for CDC

The Efforts of the Metadata and Data Quality Subgroup to Capture Data about Data

Presentation Notes
The following presentation outlines the efforts of the Metadata and Data Quality Subgroup of the Environmental Public Health Tracking Standards and Network Development Workgroup to develop a template for the capture of metadata for datasets utilized on the Environmental Public Health Tracking Network.
Page 2

Presented by the Metadata and Data Quality Subgroup of the Environmental Public Health Tracking Standards and Network Development Workgroup

Page 3


Presentation Notes
Three objectives of this presentation:
1. Provide a basic understanding of metadata and why the members of the Metadata and Data Quality Subgroup believe its creation and maintenance are important to the success of the proposed Tracking Network.
2. Outline the steps that Metadata and Data Quality Subgroup members took to select a metadata standard and the steps they continue to take to make the creation of metadata an easy process.
3. Outline the next steps for the group.
Page 4

Metadata are “Data About Data”. They help a person to locate and understand data by describing the content, quality, condition, and other characteristics of the data.

What is Metadata?

Presentation Notes
What is Metadata? Metadata are ‘Data About Data’. They help a person to locate and understand data by describing the content, quality, condition, and other characteristics of the data.
Page 5

• What method(s) were used to collect data?

• When were data collected?

• How were data processed?

• When were data last updated?

• Are there any data gaps?

Metadata Reveals…

Presentation Notes
From metadata someone can learn:
• What method(s) were used to collect data?
• When were data collected?
• How were data processed?
• When were data last updated?
• Are there any data gaps?
Page 6

• Protects investment in data

• Helps users to understand data

• Allows for users to discover the existence of data

• Limits liability

• Can reduce staff workload (once created)

Why is Metadata Important?

Presentation Notes
Why is metadata important?
Protects investment in data:
1. Reduces the loss of institutional memory caused by staff turnover
2. Sets the stage for data re-use and update
3. Provides documentation of data sources and quality
Helps users to understand data:
1. Provides consistency in terminology
2. Helps the user determine data usefulness
3. Improves data transfer
Allows for users to discover the existence of data:
1. Information can be provided to data catalogs and clearinghouses
2. Metadata can be searched by search engines
Limits liability: Can prevent data from being used inappropriately, or provide protection if data were used inappropriately.
Can reduce staff workload: Once created, metadata can reduce the staff workload associated with answering repeated questions about data.
Page 7

• Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata

• Dublin Core

• ISO 11179

• ISO 19115

• ISO 19139 (currently undergoing review)

Commonly Used Metadata Standards

Presentation Notes
What are metadata standards? Metadata standards are a common set of terms and definitions that describe data. They are usually created by consortia of interested parties that represent a common industry and/or discipline. Several different metadata standards are currently in use by health and environmental organizations. Those most commonly used by CDC and EPA are the Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata, Dublin Core, ISO 11179, and ISO 19115. Another standard, ISO 19139, is currently undergoing review for future adoption. The following is a brief explanation of each.
Page 8

• Standard created for documenting geospatial datasets.

• Presidential Executive Order 12906 establishes that geospatial datasets created by Federal agencies must include FGDC-compliant metadata.

• Basic elements include:

Dataset Title    Purpose           Access Constraints
Contact Info     Citation          Time Period of Dataset
Status           Spatial Domain    Keywords
Attributes       Distribution      Metadata Reference

Federal Geographic Data Committee (FGDC)

Presentation Notes
The Federal Geographic Data Committee (FGDC) was established on October 19, 1990 to involve Federal, State, and local governments, Tribes, and the private sector in developing criteria and standards that would enable sharing and efficient transfer of spatial data between producers and users. Chaired by the Secretary of the Interior, the FGDC is a 19-member interagency committee with representatives from the Office of the President and from Cabinet-level and independent agencies. It also works closely with 32 state Geographic Information Councils and 9 non-Federal organizations. Through its work, standards have been developed for establishing clearinghouses and partnerships and for collecting and disseminating geospatial data. The committee has also adopted a formal metadata standard. In 1994, Presidential Executive Order 12906 dictated that all federal agencies would utilize the FGDC Content Standard for the creation of geospatial metadata.
Page 9

Dublin Core is a higher-level metadata standard. It consists of 16 elements and several element refiners. The elements are:

Coverage       Title          Date
Description    Audience       Format
Type           Contributor    Identifier
Relation       Creator        Language
Source         Publisher
Subject        Rights

Dublin Core

Presentation Notes
Dublin Core is a higher-level metadata standard. It consists of 16 elements and several refiners that are used to further define each element. The 16 elements are: Coverage, Description, Type, Relation, Source, Subject, Title, Audience, Contributor, Creator, Publisher, Rights, Date, Format, Identifier, and Language. It was developed for, and is used primarily by, librarians.
Page 10

• Specifies a basic set of data element characteristics necessary to share data.

• Metadata about data elements is stored in a data element registry.

• Basic attributes of data elements include:

Name          Classification Scheme     Data Type
Identifier    Keywords                  Maximum Size
Version       Related Data Reference    Minimum Size
Context       Type of Relationship      Permissible Values

International Organization for Standardization ISO 11179

Presentation Notes
The International Organization for Standardization (ISO) is a worldwide body that works in conjunction with the International Electrotechnical Commission (IEC) to create international standards through technical committees. Other international organizations, governmental and non-governmental, serve as liaisons on the ISO and IEC committees and assist in the creation of these standards. One of the standards created by the ISO is ISO 11179. ISO 11179 specifies a basic set of data element characteristics necessary to share data. It places special emphasis on important data element characteristics such as identifiers, definitions, and classification categories. It also establishes guidelines for the creation and maintenance of a data element registry. Metadata about data elements are stored in this registry. Basic attributes of data elements include: Name, Identifier, Version, Context, Classification Scheme, Keywords, Related Data Reference, Type of Relationship, Data Type, Maximum Size, Minimum Size, and Permissible Values.
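As a rough illustration of what a data element registry might hold, the sketch below models one entry with the basic attributes listed above and checks a value against its size limits and permissible values. The class names, the identifier "EX-001", and the sample values are all assumptions made for this example; none of them come from ISO 11179 or from the subgroup's work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DataElement:
    """One registry entry; fields follow the basic attributes listed in the notes."""
    name: str
    identifier: str
    version: str
    context: str
    classification_scheme: str
    data_type: str
    keywords: List[str] = field(default_factory=list)
    related_data_reference: Optional[str] = None
    type_of_relationship: Optional[str] = None
    minimum_size: Optional[int] = None
    maximum_size: Optional[int] = None
    permissible_values: List[str] = field(default_factory=list)


class DataElementRegistry:
    """Hypothetical in-memory registry keyed by (identifier, version)."""

    def __init__(self) -> None:
        self._elements: Dict[Tuple[str, str], DataElement] = {}

    def register(self, element: DataElement) -> None:
        key = (element.identifier, element.version)
        if key in self._elements:
            raise ValueError(f"already registered: {key}")
        self._elements[key] = element

    def lookup(self, identifier: str, version: str) -> DataElement:
        return self._elements[(identifier, version)]

    def accepts(self, identifier: str, version: str, value: str) -> bool:
        """Check a value against the element's size limits and permissible values."""
        element = self.lookup(identifier, version)
        if element.minimum_size is not None and len(value) < element.minimum_size:
            return False
        if element.maximum_size is not None and len(value) > element.maximum_size:
            return False
        if element.permissible_values and value not in element.permissible_values:
            return False
        return True


registry = DataElementRegistry()
registry.register(DataElement(
    name="State Abbreviation",
    identifier="EX-001",  # made-up identifier for illustration
    version="1.0",
    context="Illustrative example only",
    classification_scheme="Geography",
    data_type="string",
    keywords=["state", "location"],
    minimum_size=2,
    maximum_size=2,
    permissible_values=["IL", "MD", "MO", "NM", "OK", "OR"],
))
```

With this entry registered, `registry.accepts("EX-001", "1.0", "MO")` passes, while a spelled-out name such as "Missouri" fails the size check.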
Page 11

• ISO 19115 incorporates the FGDC standard.

• Allows for the documenting of both geographic and non-geographic data.

• Will be superseded in the United States by ISO 19139

International Organization for Standardization ISO 19115

Presentation Notes
ISO 19115 is an international metadata standard created primarily for geographic datasets. However, it can also document non-geographic data. ISO 19115 incorporates several metadata standards, including FGDC. It contains approximately 50 fields for documenting “who”, “what”, “when”, “where”, “why”, and “how”. While the FGDC mandatory fields and ISO Core metadata are similar, ISO 19115 encourages the recording of more detail. This standard is designated to be superseded in the United States by ISO 19139 once that specification is fully adopted.
Page 12

• Based on ISO 19115

• Extensible Markup Language (XML) model

• Currently undergoing review / revision

• Technical specification designation Winter 2004-2005

International Organization for Standardization ISO 19139

Presentation Notes
ISO 19139 is the Technical Specification that defines the implementation of ISO 19115 metadata elements in Extensible Markup Language (XML). It includes both a Unified Modeling Language (UML) Schema and an XML Schema that will allow metadata to be validated in XML format. Once adopted, ISO 19139 is proposed to supersede the FGDC standard in the United States.
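To give a feel for what carrying metadata in XML involves, the following minimal sketch writes a flat record to XML and reads it back using Python's standard library. The tag names (`metadataRecord`, `title`, and so on) are invented for illustration and are not taken from the ISO 19139 schema; the sketch checks only that the XML is well-formed and does not perform schema validation.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def record_to_xml(record: Dict[str, str]) -> str:
    """Serialize a flat metadata record to XML using made-up tag names."""
    root = ET.Element("metadataRecord")  # hypothetical root, not from ISO 19139
    for tag, value in record.items():
        child = ET.SubElement(root, tag)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> Dict[str, str]:
    """Parse the XML back; ET.fromstring raises ParseError if it is not well-formed."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


record = {
    "title": "Example dataset",  # placeholder values
    "purpose": "Illustration only",
    "timePeriod": "2003",
}
xml_text = record_to_xml(record)
```

Validating against a published XML Schema would need an additional library or tool, which this sketch leaves out.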
Page 13

Metadata and Data Quality Subgroup formed as part of Standards and Network Development Workgroup to:

• Develop a metadata template using a controlled vocabulary of EPHT Network datasets that will identify a core set of information that is needed to adequately document a dataset and its limitations for potential users.

• Develop a means to describe data using common ways to document datasets to facilitate data searches.

Importance of Metadata to the EPHT Network

Presentation Notes
From the beginning, metadata has been recognized as an important component of the Environmental Public Health Tracking Network. In late 2002, the Standards and Network Development Workgroup was formed. One of the initial subgroups created by Workgroup members was the Metadata and Data Quality Subgroup. It was tasked with: Developing a metadata template using a controlled vocabulary of EPHT Network datasets (hazards, exposures, health, geospatial) that will identify a core set of information that is needed to adequately document a dataset and its limitations for potential users. Developing a means to describe data using common ways to document datasets to facilitate data searches.
Page 14

Actions Taken by Subgroup

• Talked with individuals involved with Public Health Information Network (PHIN) and the EPA Exchange Network

Presentation Notes
The Metadata and Data Quality Subgroup undertook three important actions to begin meeting its tasks. The first was to talk with individuals involved with PHIN and the EPA Exchange Network. Subgroup members listened to presentations by Mamie Bell of the CDC’s Information Resource Management Office and by Michael Pendleton and Doug Mann of the EPA System of Registries. From these presentations the group learned that PHIN had not focused on metadata to document electronic datasets. However, CDC had adopted Dublin Core in its Web redesign activities. Meanwhile, the EPA was using Dublin Core, ISO 11179, and FGDC standards with its Exchange Network.
Page 15

Actions Taken by Subgroup

• Talked with individuals involved with Public Health Information Network (PHIN) and the EPA Exchange Network

• Reviewed data inventories of the grantees to determine common elements

Presentation Notes
A second action taken by the subgroup was to create a compiled list of elements from the grantee environmental health dataset inventory tools. As part of the grant process, each grantee was required to create an inventory of all health and environmental related datasets in their states or cities. The compiled list of elements was created from these inventories. The subgroup members then reviewed the compiled list to create a minimum set of information for adequate documentation of a dataset for data users.
Page 16

• Talked with individuals involved with Public Health Information Network (PHIN) and the EPA Exchange Network

• Reviewed data inventories of the grantees to determine common elements

• Evaluated common elements against the currently accepted metadata standards

Actions Taken by Subgroup

Presentation Notes
After a minimum list of elements was agreed upon, the subgroup mapped the elements to the corresponding fields in the existing metadata standards: Dublin Core, FGDC, and ISO 19115.
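A mapping exercise of this kind can be pictured as a crosswalk table, with one row per required element and one column per standard. The sketch below is only an illustration of how coverage could be tallied from such a table. The element names, standard names, and mapping entries are abstract placeholders and do not represent the subgroup's actual crosswalk or its findings.

```python
from typing import Dict, List, Optional

# Placeholder crosswalk: each row maps one required element to its nearest
# field in each standard, or None where no field fits. Not the subgroup's data.
CROSSWALK: Dict[str, Dict[str, Optional[str]]] = {
    "element_a": {"standard_x": "field_1", "standard_y": "field_7"},
    "element_b": {"standard_x": None, "standard_y": "field_8"},
    "element_c": {"standard_x": None, "standard_y": "field_9"},
    "element_d": {"standard_x": "field_2", "standard_y": None},
}


def coverage(crosswalk: Dict[str, Dict[str, Optional[str]]], standard: str) -> float:
    """Fraction of required elements that have a matching field in the standard."""
    mapped = sum(1 for row in crosswalk.values() if row.get(standard) is not None)
    return mapped / len(crosswalk)


def unmapped(crosswalk: Dict[str, Dict[str, Optional[str]]], standard: str) -> List[str]:
    """Required elements with no matching field in the standard."""
    return [name for name, row in crosswalk.items() if row.get(standard) is None]
```

Listing the unmapped elements alongside the coverage fraction shows which gaps a standard leaves, not just how many.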
Page 17

Results of Element Mapping

• Dublin Core too general to meet the requirements for describing an electronic dataset

Presentation Notes
The results of the mapping exercise showed that Dublin Core was too general and would not meet the requirements for describing an electronic dataset.
Page 18

Results of Element Mapping

• Dublin Core too general to meet the requirements for describing an electronic dataset

• FGDC and ISO 19115 addressed most of the identified elements

Presentation Notes
They also showed that FGDC and ISO 19115 addressed most of the identified elements.
Page 19

• Dublin Core too general to meet the requirements for describing an electronic dataset

• FGDC and ISO 19115 addressed most of the identified elements

• FGDC recommended as the standard for the Network, until superseded by ISO standard.

Results of Element Mapping

Presentation Notes
Since federal agencies were mandated to use FGDC standards for describing geographic datasets, the subgroup recommended that the FGDC metadata standard be used until federal agencies are required to adopt the new ISO standard.
Page 20


Presentation Notes
To test the FGDC Content Standard for documenting Public Health datasets, the Metadata and Data Quality Subgroup formed a five-person “Swat Team” to oversee the testing and solicited test participants from among the grantees.
Page 21

Swat Team Mission

• Develop a test for the FGDC Content Standard

• Test the usefulness of the freeware tool Tkme

• Gather feedback from test participants

• Make recommendations to Metadata Subgroup members.

Presentation Notes
The team mission was to:
• Develop a test for the FGDC Content Standard against Public Health datasets.
• Test the usefulness of the freeware tool Tkme.
• Gather feedback from test participants.
• Make recommendations to Metadata Subgroup members.
Page 22

Test Participants

• Illinois
• Maryland
• Missouri
• New Mexico
• New York City
• Oklahoma
• Oregon

Presentation Notes
Those who participated in the test included Illinois, Maryland, Missouri, New Mexico, New York City, Oklahoma, and Oregon. Each tester was asked to return a copy of the completed feedback form and an XML copy of the completed template.
Page 23


• Determine a core set of FGDC elements that will constitute the bare minimum data elements that will be required to put data on the EPHTN. Additional elements could be added if applicable.

• Develop a tool that allows the easy creation of metadata. The tool should be user-friendly and should have built in constraints.

Presentation Notes
The testers agreed upon the following recommendations:
1. Determine a core set of FGDC elements that will constitute the bare minimum data elements required for placing data on the Network. Additional elements could be added if applicable. The testers were concerned that the time necessary to complete the documentation requirements of the full FGDC template would inhibit the creation of metadata.
2. Develop a tool that allows for easy creation of metadata. The tool should be user-friendly and should have constraints built into it (e.g., dates, state abbreviations). The Tkme tool is adequate for testing purposes but is difficult to use for a large number of datasets or by individuals who are not intimately familiar with the datasets.
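The built-in constraints the testers asked for could look something like the sketch below, which flags a badly formed date or an unknown state abbreviation at entry time. The field names, the eight-digit YYYYMMDD date format, and the short list of states are assumptions chosen for the example; they are not requirements stated by the testers.

```python
import re
from datetime import datetime
from typing import Callable, Dict, List

# Illustrative subset only; a real tool would carry the full list.
KNOWN_STATES = {"IL", "MD", "MO", "NM", "NY", "OK", "OR"}


def check_date(value: str) -> bool:
    """Accept only eight digits that form a real calendar date (YYYYMMDD assumed)."""
    if not re.fullmatch(r"\d{8}", value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def check_state(value: str) -> bool:
    """Accept only abbreviations in the known list."""
    return value in KNOWN_STATES


# Hypothetical field names mapped to the constraint each must satisfy.
CONSTRAINTS: Dict[str, Callable[[str], bool]] = {
    "last_updated": check_date,
    "state": check_state,
}


def validate(entry: Dict[str, str]) -> List[str]:
    """Return the names of fields that are present but improperly completed."""
    return [name for name, check in CONSTRAINTS.items()
            if name in entry and not check(entry[name])]
```

A form built on checks like these can alert the metadata creator as soon as a field is filled in incorrectly, rather than after the whole record is submitted.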
Page 24

Core Metadata Template

Presentation Notes
Based on these recommendations, Metadata and Data Quality Subgroup members reviewed the FGDC Standard and developed the following template. This template contains the minimum elements as defined by FGDC and several additional optional elements that the group believed essential for documenting Public Health datasets. This template constitutes the bare minimum that would be required for placing data on the Network. The full FGDC template is still available for those who have geospatial data or want to undertake more in-depth documentation.
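One way to picture a "bare minimum" requirement is a completeness check that runs before a dataset is placed on the Network. The template itself is not reproduced in this text, so the sketch below borrows the basic FGDC elements named earlier in the presentation as a stand-in for the required core. That substitution is an assumption, as are the function names.

```python
from typing import Dict, List

# Assumption: the "basic elements" named on the FGDC slide stand in for the
# required core; the actual core template is not reproduced in the text.
REQUIRED = [
    "Dataset Title", "Purpose", "Access Constraints", "Contact Info",
    "Citation", "Time Period of Dataset", "Status", "Spatial Domain",
    "Keywords", "Attributes", "Distribution", "Metadata Reference",
]


def missing_required(record: Dict[str, str]) -> List[str]:
    """List required elements that are absent or blank in the record."""
    return [name for name in REQUIRED if not record.get(name, "").strip()]


def ready_for_network(record: Dict[str, str]) -> bool:
    """A record qualifies only when every required element is filled in."""
    return not missing_required(record)
```

Optional elements can sit in the same record without affecting the check, which mirrors the idea of a small required core plus additional elements where applicable.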
Page 25
Presentation Notes
This is the core template. The areas in gray are descriptive headings. The data entry elements are in white.
Page 26
Page 27
Page 28

Metadata Tool Requirements Gathering

Presentation Notes
A second recommendation was the development of a tool for the easy creation of metadata. The testers found the freeware Tkme less than adequate. The Tkme tool had initially been chosen after extensive research by MDQ members showed that very few tools were available for the creation of metadata. Testers voiced a need for a tool that contained built-in constraints for dates, abbreviations, etc., and that would alert metadata creators when they had improperly completed elements.
Page 29

Team Mission

• Create a prototype tool to spark discussion
• Gather tool requirements
• Package those requirements for use by a developer


Presentation Notes
To meet this recommendation, the Swat Team was reactivated. The team had three defined objectives: create a prototype tool to spark further discussion, gather tool requirements, and package those requirements for use by a developer. The team has completed a first round of requirements gathering among the testers and Metadata and Data Quality Subgroup members. These have been compiled according to Rational Unified Process (RUP) standards. Another session is planned for the end of May to solicit input from a wider audience. Final requirements will be presented to CDC for use in creating or adapting a tool for the Network.
Page 30

• Complete compiling the metadata creation tool requirements

• Begin compiling requirements for the creation of a metadata registry

• Continue to promote the creation and use of metadata

Next Steps…

Presentation Notes
What are the next steps for the Metadata and Data Quality Subgroup?
• Complete compiling the metadata creation tool requirements.
• Begin compiling requirements for the creation of a metadata registry. This effort will require working closely with the Network Architecture Subgroup and with partners such as EPA. Registries have been created previously, so there is no reason to reinvent the wheel in creating one for the Environmental Public Health Tracking Network.
• Continue to promote the creation and use of metadata.
Page 31

For further information on anything seen in this presentation please contact:

Jeff Patridge, GIS Analyst

Missouri Department of Health and Senior Services

Office of Surveillance

930 Wildwood Drive

Jefferson City, MO 65109-0570

Email: [email protected]

Thanks for Listening

Presentation Notes
Thanks for listening.