NPOESS Enhanced Description Tool - “ned”



Richard E. Ullman • NASA/GSFC/NPP • NOAA/NESDIS/IPO • Data / Information Architecture • Algorithm / System Engineering • richard.e.ullman@nasa.gov

SDR  Sensor Data Record
TDR  Temperature Data Record
EDR  Environmental Data Record
IP   Intermediate Product
ARP  Application Related Product
GEO  Geolocation

NPOESS  National Polar-orbiting Operational Environmental Satellite System
NPP     NPOESS Preparatory Project

NASA • NPP SCIENCE DATA SEGMENT • IPO • ALGORITHM DIVISION

Attributes extracted from XML Profile, inserted into HDF as attributes of the dataset field

Attribute Name (Type): Comments

Add_Offset (Number): Data type is the type of the resulting un-scaled datum. To un-scale the elements, first multiply the scaled element by the Scale_Factor and then add the Add_Offset. If the dataset is not scaled, the Add_Offset will exist and have a value of zero.

DataType (String): String format is “%d-bit %s”, where %d is the number of bits and %s is one of: signed integer, unsigned integer, floating point, or <blank> (number of significant bits in a bitfield).

Description (String): A descriptive text.

MeasurementUnits (String): Consistent with SI naming and Unidata’s “udunits” package.

NumberOfFillValues (Integer): If zero, then no FillValue_Name and FillValue_Value attributes are attached.

NumberOfLegendEntries (Integer): If zero, then no LegendEntry_Name and LegendEntry_Value attributes are attached. Legend entries are used for quality fields only.

RangeMax (Number): Maximum expected value of field elements in the product, not just this dataset instance. Data type matches the type of the dataset.

RangeMin (Number): Minimum expected value of field elements in the product, not just this instance. Data type matches the type of the dataset.

Scale_Factor (Number): Data type is the type of the resulting un-scaled datum. To un-scale the elements, first multiply the scaled element by the Scale_Factor and then add the Add_Offset. If the dataset is not scaled, Scale_Factor will exist and have a value of one.

Scaled (Boolean): True indicates that Scale_Factor and Add_Offset should be applied to the data to recover the un-scaled element values. Note that fill values are in the dataset type and so must be tested before un-scaling.

FillValue_Name (Set of string)

FillValue_Value (Set of number): Data type matches the type of the dataset.

Dimension_GranuleBoundary (Set of Boolean): True (1) indicates that this dimension extends when granules are appended.

Dimension_Name (Set of string): Name match indicates that this dimension is congruent with dimensions of the same name in other datasets in this product group.

Field_Name (String): Potentially a place for the CF standard_name.

LegendEntry_Name (Set of string)

LegendEntry_Value (Set of number): Data type matches the type of the dataset.

NumberOfDimensions (Integer): Integer greater than zero. This duplicates the HDF attribute.

What is “ned”?

• A “C” language demonstration exercise.
• An exploration of using the HDF5 API and other tools to navigate the NPOESS product information model.
  – How hard is it to deal with the format challenges? The challenges marked (see left) are addressed by “ned”.
  – Are NPOESS products consistent enough, one to another, that a general-purpose reader is possible?
  – Can a computer program make detailed associations between the HDF5 file and the XML product profile?
  – Can the product profile be used to drive automated parsing of quality flags?
• To demonstrate the feasibility of implementing the suggested enhancements in the self-description of NPOESS products.
  – “ned” makes enhancements to the NPOESS format for the challenge items marked (see left).

ned version 0.7

• ANSI C code
  – HDF hdf5-1.6.5
  – libxml 2.6.16
  – udunits 1.4

• Development
  – Apple Xcode
  – GNU C 4.0

• SLOCCount:
  – ansic=3707
  – XML=1500

• Submitted for NASA technology transfer open source release.

• Written against NPOESS IDPS build 1.4 sample data and profiles.

• Numerous discrepancies between profiles and sample data resolved by editing XML profiles (not affecting the build 1.4 profile DTD).

• After editing profiles, ned successfully parses and acts upon 50 of 53 non-RDR sample files.

• Not complete!

Format Strengths

• Straight HDF5. No need for additional libraries.

• Consistent HDF5 group structure
  – Organization for each product is the same as for all others.
  – Data “payload” is always in a product group within the All_Data group.

• Allows for flexible temporal aggregation
  – Granules are appended by extending a dataset dimension.

Format Challenges

• Geolocation appears in a separate product group and may be in a separate HDF5 file.

• Field metadata, used to interpret the data (similar to netCDF CF), is in a separate product profile file.

• Quality flags must be parsed before they can be interpreted.

• Information needed for un-scaling scaled integers is not obvious.

• The HDF5 indirect-reference link API, as used by NPOESS to link metadata to the data, is complex and not supported by all COTS analysis implementations.

Information Model UML Diagram

HDF5 for NPOESS

• Hierarchical Data Format 5 (HDF5) is the format for delivery of processed products from the National Polar-orbiting Operational Environmental Satellite System (NPOESS) and for the NPOESS Preparatory Project (NPP).

• HDF5 is a general-purpose library and file format for storing scientific data. Two primary objects:
  – Dataset, a multidimensional array of data elements
  – Group, a structure for organizing objects

• Efficient storage and I/O, including parallel I/O.

• Free, open source software, multiple platforms.

• Data stored in HDF5 is used in many fields, from computational fluid dynamics to film making.

• Data can be stored in HDF5 in an endless variety of ways, so it is important to standardize how NPOESS product data is organized in HDF5.

“Enhanced” UML Diagram

[Figure: two UML diagrams of the NPOESS HDF5 file layout.

Current information model as delivered by NPOESS IDPS: an HDF5 file and an associated XML Product Profile. In the HDF5 file, the root group (with attributes) contains a Data_Products group and an All_Data group. Data_Products contains one <collection_shortname> group (with attributes) per product, holding a <collection_shortname>_Aggr dataset of Object References (one per field) and <collection_shortname>_Gran_n datasets of Region References (hyperslabs, one per field, for each of n_g granules). All_Data contains a <collection_shortname>_All group holding one <field> dataset of data per field (n_f fields); the field attributes live only in the XML Product Profile.

HDF5 enhanced by “ned”: ned extracts each <field>'s attributes from the XML Profile source and attaches them to the corresponding <field> dataset; granule attributes are aggregated onto the <collection_shortname>_All group; bit-wise flag fields are unpacked, split into separate <QA field> datasets, then compressed.]

NED:

    open attribute definition configuration file
    open the input HDF5 file
    create an output HDF5 file (make the "All_Data" and "Data_Products" paths)
    for each product in the "Data_Products" path of the input HDF5 file (H5Giterate) {
        find and open the associated XML Product Profile
        create a product group in the "Data_Products" path of the output HDF5 file
        for the aggregate reference in the "Data_Products" path of the input HDF5 file {
            create an "aggregate reference stub" in the output HDF5 file
            for each attribute attached to the aggregate (H5Aiterate) {
                validate type and range
                attach attribute to the aggregate reference stub of the output HDF5 file
                hold aggregate metadata for later
            }
        }
        for each granule reference in the "Data_Products" path of the input HDF5 file {
            create a "granule reference stub" in the output HDF5 file
            for each attribute attached to the granule (H5Aiterate) {
                validate type and range
                attach attribute to the granule reference stub of the output HDF5 file
                "aggregate" the metadata according to the attribute definition file (hold for later)
            }
        }
        create a product group in the "All_Data" path of the output HDF5 file
        attach the aggregate metadata as HDF5 attributes of the product group in the "All_Data" path of the output file
        attach the aggregated granule metadata as HDF5 attributes of the product group in the "All_Data" path of the output file
        for each field in the associated product group in the "All_Data" path of the input HDF5 file {
            find and open the associated "sub-tree" of the XML profile
            read metadata from the XML profile and validate type and range
            "aggregate" the field metadata to associate with the data aggregation
            for each NPOESS "datum" in the field {
                create a data aggregation field in the output HDF5 file
                copy the data from the input HDF5 file to the output HDF5 file
                attach the field metadata to the data aggregation field of the output HDF5 file
            }
        }
        for each field in the "All_Data" path of the output HDF5 file {
            create the object reference to the field
            populate the "aggregate reference stub" in the "Data_Products" path of the output HDF5 file
        }
        for each granule {
            for each field in the "All_Data" path of the output HDF5 file {
                create the region reference to the portion of the field appropriate to this granule
                populate the "granule reference stub" in the "Data_Products" path of the output HDF5 file
            }
        }
    }

What does ned (0.7) not do?

• Copy some of the attributes.
• Aggregate g-Rings.
• Aggregate percent attributes.
• Compute reference regions.
• Produce a user block.
• Discover shared dimensions and create appropriate aggregate attributes.

Additional thoughts

• Modifications will be necessary to accommodate build 1.5 product profiles.
• Should the field attribute names follow the CF conventions?

What does ned do?

• Check the profile against the HDF5 product.
• Validate attributes in the HDF5 product (type and value).
• For each field, read 16 attributes from the product profile and attach them as HDF attributes of the field dataset.
• Attach scale and offset values as “Scale_Factor” and “Add_Offset” attributes of the field dataset.
• Parse bit-wise fields and create separate compressed datasets for each quality flag.
• Aggregate granule attributes and attach them as attributes of the All_Data product group.