Procedures to Develop and Register Data
Elements in Support of Data Standardization
September 2000
Based on:
ISO/IEC Draft Technical Report 20943,
Information Technology – Procedures for Achieving Metadata
Registry (MDR) Content Consistency – Data Elements
Metadata Registry
EPA’s metadata registry is the Environmental Data Registry (EDR): www.epa.gov/edr
The EDR is based on an international standard for metadata registries.
International Standard for Metadata Registries
ISO/IEC 11179:
Information Technology -
Data Management and Interchange -
Metadata Registries (MDR)
Parts of the Standard
Part 1: Framework for the Specification and Standardization of Data Elements
Part 2: Classification for Data Elements
Part 3: Registry Metamodel (MDR3)
Part 4: Rules and Guidelines for the Formulation of Data Definitions
Part 5: Naming and Identification Principles for Data Elements
Part 6: Registration of Data Elements
Data Element Registration
Characteristics of the data element are recorded as metadata attributes
Registration depends on the amount and quality of information available
Data elements might range from:
  Standard data elements–complete, with good quality
  Application data elements–incomplete, with questionable quality
Steps to Follow When Registering a Data Element
1. Understanding the data element
2. Content research
3. Definition and permissible values
4. Names and identifiers
5. Administrative and miscellaneous attributes
6. Data element concepts
7. Classification schemes
8. Quality control
Example of Registration
Registration of a data element for the code used by the
United States Postal Service (USPS) to represent a state or
state equivalent.
Understanding the Data Element
What kind of data will be stored in this data element?
Is there a definition or description of data values?
What will the data values look like–names, descriptions, numerals to be calculated, character strings, or identifiers?
Are the data values determined by an arithmetic or statistical procedure?
Step 1
Understanding the Data Element - Example
The USPS standard format for preparing a domestic mail piece requires that the last line of the address contain city name, state code, and ZIP code.
The data element to be registered must represent the list of data values for state code that are acceptable to the USPS for mail delivery.
Content Research
Is this data element described in an existing standard?
Does a data element that could be reused already exist in this registry or in a federation of registries?
Step 2
Content Research - Example
National Standards:
FIPS PUB 5-2, 6-4, 55-3
  Contain 2-letter state codes
  Include a code for U.S. Minor Outlying Islands–not recognized by the USPS
  U.S. does not intend to continue maintaining FIPS codes
National Supercomputer Centers Usage Database
  Contains only 4 of the 8 outlying territories
  Omits all 4 freely associated states
Content Research - Example (continued)
National Standards:
U.S. Postal Service standards
  Include codes for all states, outlying territories, and freely associated states of the United States
  Do not recognize a code for U.S. Minor Outlying Islands, which must be identified on mail pieces by name
  Include codes for military “States”
International Standards:
ISO 3166-Part 2, Country subdivision code
  Identifies U.S. outlying territories and freely associated states as Countries in Part 1
Content Research - Example (continued)
Existing Data Elements in the EDR:
  State USPS Code
  Mailing Address State Code
  Geographic Address State Code
All of the Above Include:
  The code for U.S. Minor Outlying Islands–not acceptable for mail delivery
  Codes for the 12 Canadian provinces
Decision - Preferred Data Source
For a standard data element representing the state code used for mail delivery to U.S. states and state equivalents, the preferred data source is the USPS standard, available at:
www.USPS.gov/ncsc/lookups/usps_abbreviations.htm
Definition and Permissible Values
A definition must capture the essential semantic content of a data element.
Definitions are recorded in context (where did the definition originate or how is it applied?).
Permissible values are the domains of acceptable values for the data element:
  Enumerated by a specific list of values?
  Defined by a description, procedure, or range?
Step 3
Permissible Values – Value Domain
How are values represented (e.g., name, code, text, date)?
When did each value become valid/invalid?
What are the name and definition/description of the value domain?
How many characters are required in the database to store the value?
Is the data value recorded as a character string, numerals, integer, or other?
Are the data values formatted?
Step 3
Definition - Example
The code that represents a United States state or state equivalent in a mailing address.
Context: USPS Standard
Permissible Values - Example
Representation: Code
Value Domain Name: The state codes for states and state equivalents of the United States
Definition: All codes recognized by the U.S. Postal Service on a mail piece for identification of a state or state equivalent of the United States
Field length: 2
Datatype: alphabetic
Format: character string
List of values: 62 values representing the 50 states, the District of Columbia, the 8 outlying territories and freely associated states, and the 3 codes for military states
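The value domain attributes above can be sketched in code. This is a minimal illustration, not part of the EDR: the class name EnumeratedValueDomain and its fields are hypothetical, and the enumerated codes are filled in from the published USPS abbreviations rather than from this presentation, which states only the counts (50 + 1 + 8 + 3 = 62).

```python
import re
from dataclasses import dataclass

# Assumption: these codes come from the published USPS abbreviation list,
# not from this presentation, which gives only the counts per group.
STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()
DISTRICT = ["DC"]
TERRITORIES_AND_ASSOCIATED_STATES = "AS FM GU MH MP PW PR VI".split()
MILITARY = "AA AE AP".split()


@dataclass(frozen=True)
class EnumeratedValueDomain:
    """Hypothetical container for the value domain attributes in the example."""
    name: str
    field_length: int
    permissible_values: frozenset

    def is_permissible(self, value: str) -> bool:
        # Datatype "alphabetic" with a fixed field length, plus list membership.
        pattern = rf"[A-Z]{{{self.field_length}}}"
        return bool(re.fullmatch(pattern, value)) and value in self.permissible_values


STATE_CODE_DOMAIN = EnumeratedValueDomain(
    name="The state codes for states and state equivalents of the United States",
    field_length=2,
    permissible_values=frozenset(
        STATES + DISTRICT + TERRITORIES_AND_ASSOCIATED_STATES + MILITARY
    ),
)
```

With this sketch, STATE_CODE_DOMAIN.is_permissible("NJ") accepts a listed code, while "UM" (U.S. Minor Outlying Islands) is rejected because it is not in the enumerated list.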
Names and Identifiers
A name is a term or phrase that describes the data element–something to call it.
Names are recorded in context (where did the name originate or how is it applied?).
Identifiers are unique. They identify the Registration Authority, the organization, the data element, and the version of the data element. The version distinguishes successive states of the data element when information about it changes.
Step 4
Names and Identifiers - Example
Name: State or State Equivalent Code
Context: USPS Standard
Identifier:
  Registration Authority: EPA
  Organization: OEI
  Sub-organization: OIC
  Data Element ID: 29324
  Version: 1
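One way to picture the identifier is as an immutable record whose parts together make it unique. The sketch below is an assumption for illustration only: the class DataElementIdentifier, its field names, and the next_version helper are hypothetical and are not taken from the EDR or from ISO/IEC 11179.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataElementIdentifier:
    """Hypothetical identifier built from the parts shown in the example."""
    registration_authority: str
    organization: str
    sub_organization: str
    data_element_id: int
    version: int = 1

    def next_version(self) -> "DataElementIdentifier":
        # Assumption: a change to the data element's information yields a new
        # identifier that differs only in its version number.
        return replace(self, version=self.version + 1)


state_code_id = DataElementIdentifier("EPA", "OEI", "OIC", 29324)
revised_id = state_code_id.next_version()
```

Because the record is frozen, each version is a distinct, hashable value, so both state_code_id and revised_id could serve as separate keys in a registry lookup.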
Administrative and Miscellaneous Attributes
Submitting organization–the organization that has submitted the data element for registration
Stewardship contact–the organization delegated the responsibility for maintaining the data element
Data element comment–provides remarks about usage, procedure, and other explanatory information that is not appropriate to include in the definition
Step 5
Administrative and Miscellaneous Attributes (continued)
Data element example–an example of a value that is permissible for the data element
Data element origin–source of information about the data element, including document, standard, system, group, form, or message set
Creation/last change date–the system date when a data element was created or updated in the registry
Step 5
Administrative & Miscellaneous Attributes - Example
Submitting organization–Office of Environmental Information
Stewardship contact–Data Standards Branch
Data element comment–this data element is used to identify states and state equivalents for all United States mailing addresses, including military addresses
Data element example–NJ (New Jersey)
Data element origin–EPA data standard workgroup
Creation/last change date–system date
Data Element Concept
Provides conceptual information
May relate data elements that convey the same concept with different representations
Singular–refers to only one concept
Must have a name and definition, recorded in context
Specified through a conceptual domain, i.e., the set of possible valid values for a data element concept, expressed without representation
Step 6
Data Element Concept - Example
Name: U.S. State or State Equivalent
Definition: An identifier for a primary political subdivision of the United States, including an outlying territory or an associated state
Data elements that might share this data concept include:
  United States State Name–New Jersey
  State Common Name–Garden State
  Facility Location State Abbreviation–NJ
This data element concept uses a subset of the values in the conceptual domain: Primary Geopolitical Subdivisions of Countries
Conceptual Domain - Example
Name: Primary Geopolitical Subdivisions of Countries
Definition: Identifiers for the primary geopolitical subdivisions of the countries of the world
Value meanings might include:
  The U.S. state of Alabama
  The Canadian province of Alberta
  The Malaysian state of Sabah
  The U.S. state equivalent of District of Columbia
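The relationship between a value meaning and its representations can be sketched as a lookup. This is an illustrative assumption, not a registry structure: the REPRESENTATIONS mapping and the translate function are hypothetical, and the value meaning "The U.S. state of New Jersey" is worded by analogy with the value meanings listed above.

```python
# A value meaning from the conceptual domain, expressed without representation.
NEW_JERSEY = "The U.S. state of New Jersey"

# Hypothetical mapping: data element name -> {value meaning: representation}.
# All three data elements share the concept "U.S. State or State Equivalent".
REPRESENTATIONS = {
    "United States State Name": {NEW_JERSEY: "New Jersey"},
    "State Common Name": {NEW_JERSEY: "Garden State"},
    "Facility Location State Abbreviation": {NEW_JERSEY: "NJ"},
}


def translate(value: str, source_element: str, target_element: str) -> str:
    """Convert a value between two data elements that share one concept."""
    # Find the value meaning behind the source representation...
    for meaning, representation in REPRESENTATIONS[source_element].items():
        if representation == value:
            # ...then look up how the target data element represents it.
            return REPRESENTATIONS[target_element][meaning]
    raise KeyError(f"{value!r} is not a value of {source_element!r}")


full_name = translate(
    "NJ", "Facility Location State Abbreviation", "United States State Name"
)
```

The translation works only because both data elements point at the same value meaning, which is the practical benefit of recording the data element concept.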
Classification Schemes
Data elements might be classified according to any of the following types of groups where the data element might be listed:
  Usage
  Data standard
  Application system
  Data collection form
  Keywords
  Object class
Step 7
Classification Schemes - Example
Mailing address group
U.S. Postal Service Address Standard
Form R for Toxic Release Inventory
Keywords: State, Geopolitical
Quality Control
Registration status–records the position of the data element in the registration life cycle, which indicates the stage of quality review:
  Incomplete–not all metadata are entered
  Recorded–all metadata are entered
  Certified–metadata are valid
  Standard–the preferred data element for Agency use
Step 8
Quality Control - Example
Quality Assurance: Registration Status
All data have been entered: Recorded
Data are certified to be accurate: Certified
After becoming Agency standard: Standard
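The registration life cycle can be sketched as an ordered enumeration. This is a minimal illustration under stated assumptions: the names RegistrationStatus and status_for are hypothetical, and the rule that each stage requires the one before it is inferred from the order of the statuses, not stated in the source.

```python
from enum import IntEnum


class RegistrationStatus(IntEnum):
    """The four registration statuses, ordered by stage of quality review."""
    INCOMPLETE = 1  # not all metadata are entered
    RECORDED = 2    # all metadata are entered
    CERTIFIED = 3   # metadata are valid
    STANDARD = 4    # the preferred data element for Agency use


def status_for(all_metadata_entered: bool,
               certified_accurate: bool,
               agency_standard: bool) -> RegistrationStatus:
    # Assumption: a later stage can be reached only after the earlier ones.
    if not all_metadata_entered:
        return RegistrationStatus.INCOMPLETE
    if not certified_accurate:
        return RegistrationStatus.RECORDED
    if not agency_standard:
        return RegistrationStatus.CERTIFIED
    return RegistrationStatus.STANDARD


current = status_for(all_metadata_entered=True,
                     certified_accurate=True,
                     agency_standard=False)
```

Using an IntEnum keeps the stages comparable, so a check such as current >= RegistrationStatus.RECORDED reads directly as "at least Recorded".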