Procedures to Develop and Register Data Elements in Support of Data Standardization September 2000


Based on:

ISO/IEC Draft Technical Report 20943, Information Technology - Procedures for Achieving Metadata Registry (MDR) Content Consistency - Data Elements

Metadata Registry

EPA’s metadata registry is the Environmental Data Registry (EDR): www.epa.gov/edr

The EDR is based on an international standard for metadata registries.

International Standard for Metadata Registries

ISO/IEC 11179: Information Technology - Data Management and Interchange - Metadata Registries (MDR)


Parts of the Standard

Part 1: Framework for the Specification and Standardization of Data Elements

Part 2: Classification for Data Elements

Part 3: Registry Metamodel (MDR3)

Part 4: Rules and Guidelines for the Formulation of Data Definitions

Part 5: Naming and Identification Principles for Data Elements

Part 6: Registration of Data Elements

Data Element Registration

Characteristics of the data element are recorded as metadata attributes

Registration depends on the amount and quality of information available

Data elements might range from:
  Standard data elements: complete, with good quality
to:
  Application data elements: incomplete, with questionable quality

Steps to Follow When Registering a Data Element

1. Understanding the data element
2. Content research
3. Definition and permissible values
4. Names and identifiers
5. Administrative and miscellaneous attributes
6. Data element concepts
7. Classification schemes
8. Quality control

Example of Registration

Registration of a data element for the code used by the United States Postal Service (USPS) to represent a state or state equivalent.

Step 1: Understanding the Data Element

What kind of data will be stored in this data element?

Is there a definition or description of data values?

What will the data values look like: names, descriptions, numerals to be calculated, character strings, or identifiers?

Are the data values determined by an arithmetic or statistical procedure?

Understanding the Data Element - Example

The USPS standard format for preparing a domestic mail piece requires that the last line of the address contain city name, state code, and ZIP code.

The data element to be registered must represent the list of data values for state code that are acceptable to the USPS for mail delivery.

Step 2: Content Research

Is this data element described in an existing standard?

Does a data element that could potentially be used already exist in this registry or in a federation of registries?

Content Research - Example

National Standards:

FIPS PUB 5-2, 6-4, 55-3
  Contain 2-letter state codes
  Include a code for U.S. Minor Outlying Islands, which is not recognized by the USPS
  U.S. does not intend to continue maintaining FIPS codes

National Supercomputer Centers Usage Database
  Contains only 4 of the 8 outlying territories
  Omits all 4 freely associated states

Content Research - Example (continued)

National Standards:

U.S. Postal Service standards
  Include codes for all states, outlying territories, and freely associated states of the United States
  Do not recognize a code for U.S. Minor Outlying Islands, which must be identified on mail pieces by name
  Include codes for military “States”

International Standards:

ISO 3166-Part 2, Country subdivision code
  Identifies U.S. outlying territories and freely associated states as Countries in Part 1

Content Research - Example (continued)

Existing Data Elements in the EDR:
  State USPS Code
  Mailing Address State Code
  Geographic Address State Code

All of the Above Include:
  The code for U.S. Minor Outlying Islands, which is not acceptable for mail delivery
  Codes for the 12 Canadian provinces

Decision - Preferred Data Source

The preferred data source for a standard data element for state code for mail delivery within the U.S. for states and state equivalents is the USPS standard, available at:

www.USPS.gov/ncsc/lookups/usps_abbreviations.htm
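
The comparison behind a decision like this one can be sketched as a set difference between the codes in a candidate source and the codes the USPS accepts. This is an illustration only: the function name unacceptable_codes and the two small sample sets are assumptions made for the example, not complete code lists. UM is the FIPS code for U.S. Minor Outlying Islands.

```python
def unacceptable_codes(candidate, accepted):
    """Return the codes in a candidate source that the accepted set lacks."""
    return sorted(set(candidate) - set(accepted))


# Small illustrative samples, not the complete lists.
usps_sample = {"NJ", "DC", "PR", "AE"}
fips_sample = {"NJ", "DC", "PR", "UM"}  # UM: U.S. Minor Outlying Islands

print(unacceptable_codes(fips_sample, usps_sample))  # ['UM']
```

Any code reported by such a check marks a candidate source that would need filtering before it could serve as the value list for mail delivery.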

Step 3: Definition and Permissible Values

A definition must capture the essential semantic content of a data element.

Definitions are recorded in context (where did the definition originate or how is it applied?).

Permissible values are the domains of acceptable values for the data element:
  Enumerated by a specific list of values?
  Defined by a description, procedure, or range?

Step 3: Permissible Values - Value Domain

How are values represented (e.g., name, code, text, date)?

When did each value become valid/invalid?

What are the name and definition/description of the value domain?

How many characters are required in the database to store the value?

Is the data value recorded as a character string, numerals, integer, or other?

Are the data values formatted?

Definition - Example

The code that represents a United States state or state equivalent in a mailing address.

Context: USPS Standard

Permissible Values - Example

Representation: Code
Value Domain Name: The state codes for states and state equivalents of the United States
Definition: All codes recognized by the U.S. Postal Service on a mail piece for identification of a state or state equivalent of the United States
Field length: 2
Datatype: alphabetic
Format: character string
List of values: 62 values representing the 50 states, the District of Columbia, the 8 outlying territories and freely associated states, and the 3 codes for military states
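
An enumerated value domain like this one can be sketched as a fixed set plus a check of the field length and datatype. The slide gives only the counts; the individual codes below are filled in from the commonly published USPS abbreviations and should be verified against the USPS list before use. The names PERMISSIBLE_VALUES and is_permissible are hypothetical.

```python
STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()
DISTRICT = ["DC"]
# Outlying territories and freely associated states (8 codes).
TERRITORIES_AND_ASSOCIATED = "AS GU MP PR VI FM MH PW".split()
# Military "states" (3 codes).
MILITARY = "AA AE AP".split()

PERMISSIBLE_VALUES = frozenset(
    STATES + DISTRICT + TERRITORIES_AND_ASSOCIATED + MILITARY
)


def is_permissible(value):
    """Check field length (2), datatype (alphabetic), and membership."""
    return (
        isinstance(value, str)
        and len(value) == 2
        and value.isalpha()
        and value in PERMISSIBLE_VALUES
    )


print(len(PERMISSIBLE_VALUES))  # 62
print(is_permissible("NJ"), is_permissible("UM"))  # True False
```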

Step 4: Names and Identifiers

A name is a term or phrase that describes the data element: something to call it.

Names are recorded in context (where did the name originate or how is it applied?).

Identifiers are unique. They identify the Registration Authority, the organization, the data element, and the version of the data element if information about the data element changes.

Names and Identifiers - Example

Name: State or State Equivalent Code
Context: USPS Standard
Identifier:
  Registration Authority: EPA
  Organization: OEI
  Sub-organization: OIC
  Data Element ID: 29324
  Version: 1
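
One way to picture the identifier is as an immutable record whose version changes when information about the data element changes. The sketch below is not the EDR's actual implementation: the class name DataElementIdentifier, the method next_version, and the rule that a revision increments the version by one are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataElementIdentifier:
    registration_authority: str
    organization: str
    sub_organization: str
    data_element_id: int
    version: int = 1

    def next_version(self):
        """Return the identifier of a revised data element.

        Assumption: the ID stays the same and the version goes up by one.
        """
        return replace(self, version=self.version + 1)


ident = DataElementIdentifier("EPA", "OEI", "OIC", 29324)
revised = ident.next_version()
print(ident.version, revised.version)  # 1 2
```

Because the record is frozen, the original identifier stays intact and the revision is a distinct value, which keeps each version uniquely identifiable.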

Step 5: Administrative and Miscellaneous Attributes

Submitting organization: the organization that has submitted the data element for registration

Stewardship contact: the organization delegated the responsibility for maintaining the data element

Data element comment: provides remarks about usage, procedure, and other explanatory information that is not appropriate to include in the definition

Step 5: Administrative and Miscellaneous Attributes (continued)

Data element example: an example of a value that is permissible for the data element

Data element origin: source of information about the data element, including document, standard, system, group, form, or message set

Creation/last change date: the system date when a data element was created or updated in the registry

Administrative and Miscellaneous Attributes - Example

Submitting organization: Office of Environmental Information
Stewardship contact: Data Standards Branch
Data element comment: this data element is used to identify states and state equivalents for all United States mailing addresses, including military addresses
Data element example: NJ (New Jersey)
Data element origin: EPA data standard workgroup
Creation/last change date: system date
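
These attributes could be held in a simple record in which the creation/last change date defaults to the system date. The class and field names below are hypothetical, and the default-date behavior is an assumption about how a registry might fill that attribute in.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AdministrativeAttributes:
    submitting_organization: str
    stewardship_contact: str
    comment: str
    example: str
    origin: str
    # Assumption: the registry supplies this from the system date.
    last_change_date: date = field(default_factory=date.today)


attrs = AdministrativeAttributes(
    submitting_organization="Office of Environmental Information",
    stewardship_contact="Data Standards Branch",
    comment=(
        "Identifies states and state equivalents for all United States "
        "mailing addresses, including military addresses"
    ),
    example="NJ (New Jersey)",
    origin="EPA data standard workgroup",
)
print(attrs.example)  # NJ (New Jersey)
```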

Step 6: Data Element Concept

Provides conceptual information

May relate data elements that convey the same concept with different representations

Singular: refers to only one concept

Must have a name and definition, recorded in context

Specified through a conceptual domain, i.e., the set of possible valid values for a data element concept, expressed without representation

Data Element Concept - Example

Name: U.S. State or State Equivalent
Definition: An identifier for a primary political subdivision of the United States, including an outlying territory or an associated state

Data elements that might share this data element concept include:
  United States State Name: New Jersey
  State Common Name: Garden State
  Facility Location State Abbreviation: NJ

This data element concept uses a subset of the values in the conceptual domain: Primary Geopolitical Subdivisions of Countries

Conceptual Domain - Example

Name: Primary Geopolitical Subdivisions of Countries

Definition: Identifiers for the primary geopolitical subdivisions of the countries of the world

Value meanings might include:
  The U.S. state of Alabama
  The Canadian province of Alberta
  The Malaysian state of Sabah
  The U.S. state equivalent of District of Columbia
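
The relationship between a value meaning and its representations can be sketched as a mapping from one value meaning to the value each data element would record for it. The three data element names and values come from the data element concept example. The phrasing of the New Jersey value meaning follows the pattern of the conceptual domain example and is an assumption, as are the names REPRESENTATIONS and represent.

```python
# One value meaning, expressed without representation, mapped to the
# representation each data element uses for it.
REPRESENTATIONS = {
    "The U.S. state of New Jersey": {
        "United States State Name": "New Jersey",
        "State Common Name": "Garden State",
        "Facility Location State Abbreviation": "NJ",
    },
}


def represent(value_meaning, data_element):
    """Look up how a data element represents a given value meaning."""
    return REPRESENTATIONS[value_meaning][data_element]


print(represent("The U.S. state of New Jersey", "State Common Name"))
# Garden State
```

All three data elements point at the same concept; only the representation differs.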

Step 7: Classification Schemes

Data elements might be classified according to any of the following types of groups where the data element might be listed:
  Usage
  Data standard
  Application system
  Data collection form
  Keywords
  Object class

Classification Schemes - Example

Mailing address group

U.S. Postal Service Address Standard

Form R for Toxic Release Inventory

Keywords: State, Geopolitical

Step 8: Quality Control

Registration status records the position of the data element in the registration life cycle and indicates the stage of quality review for the data element:
  Incomplete: not all metadata are entered
  Recorded: all metadata are entered
  Certified: metadata are valid
  Standard: the preferred data element for Agency use

Quality Control - Example

Quality Assurance                     Registration Status
All data have been entered            Recorded
Data are certified to be accurate     Certified
After becoming Agency standard        Standard
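
The registration life cycle can be sketched as an ordered enumeration with a function that derives the status from the quality checks passed so far. The names RegistrationStatus and registration_status are hypothetical. The sketch also assumes the four statuses form a strict sequence in which each stage requires the one before it, which the slides suggest but do not state outright.

```python
from enum import IntEnum


class RegistrationStatus(IntEnum):
    INCOMPLETE = 1  # not all metadata are entered
    RECORDED = 2    # all metadata are entered
    CERTIFIED = 3   # metadata are valid
    STANDARD = 4    # preferred data element for Agency use


def registration_status(all_entered, certified=False, agency_standard=False):
    """Return the highest status reached, checking the stages in order."""
    if not all_entered:
        return RegistrationStatus.INCOMPLETE
    if not certified:
        return RegistrationStatus.RECORDED
    if not agency_standard:
        return RegistrationStatus.CERTIFIED
    return RegistrationStatus.STANDARD


print(registration_status(True, certified=True).name)  # CERTIFIED
```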