Improving long-term preservation of EOS data by independently mapping HDF4 data objects

The HDF Group


April 6 2011 Annual HDF Briefing to NASA

Mapping project team members

The HDF Group
• Ruth Aydt
• Mike Folk
• Joe Lee
• Elena Pourmal
• Binh-Minh Ribler
• Muqun {Kent} Yang

NASA
• Ruth Duerr & Luis Lopez (NSIDC)
• Chris Lynnes (GES DISC)

Raytheon
• Evelyn Nakamura
• many others


Recap

• Problem
  • The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.

• Solution
  • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.
  • Implement tools to create layout maps for EOS data products.
  • Deploy tools at select EOS data centers.
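The idea behind the solution can be illustrated with a minimal Python sketch: once a map records where an object's bytes live, a reader needs nothing beyond ordinary file I/O. The function names, the example offset and length, and the big-endian 32-bit integer layout are assumptions made for illustration; none of them come from the HDF4 map format itself.

```python
import struct


def read_object_bytes(path, offset, length):
    """Return `length` raw bytes starting at byte `offset` of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise ValueError("byte range extends past the end of the file")
    return data


def decode_int32_be(raw):
    """Decode raw bytes as big-endian 32-bit signed integers (assumed layout)."""
    count = len(raw) // 4
    return list(struct.unpack(">%di" % count, raw[: count * 4]))
```

A call such as `decode_int32_be(read_object_bytes("example.hdf", 6, 12))` would decode three integers stored at byte 6, with no HDF software involved; the file name and numbers here are placeholders.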


HDF4 mapping workflow

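[Diagram: h4mapwriter, linked with the HDF4 library, reads an HDF4 file and produces an HDF4 map file (an XML document). The map file holds groups, data objects, structural and application metadata, and the locations of object data. A reader program uses the map file to retrieve object data from the HDF4 file.]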
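The reader side of this workflow can be sketched in Python using only the standard library. The XML layout below (`Map`, `Object`, and `Bytes` elements with `offset` and `length` attributes) is a hypothetical stand-in, not the project's actual schema; the function names are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical map document; the real schema's element names may differ.
EXAMPLE_MAP = """\
<Map file="example.hdf">
  <Object name="temperature" kind="array">
    <Bytes offset="294" length="16"/>
    <Bytes offset="1024" length="8"/>
  </Object>
  <Object name="flags" kind="table">
    <Bytes offset="2048" length="4"/>
  </Object>
</Map>
"""


def object_locations(map_xml):
    """Return {object name: [(offset, length), ...]} from a map document."""
    root = ET.fromstring(map_xml)
    locations = {}
    for obj in root.iter("Object"):
        ranges = [(int(b.get("offset")), int(b.get("length")))
                  for b in obj.findall("Bytes")]
        locations[obj.get("name")] = ranges
    return locations


def read_object(hdf_path, ranges):
    """Concatenate the byte ranges of one object from the binary file."""
    chunks = []
    with open(hdf_path, "rb") as f:
        for offset, length in ranges:
            f.seek(offset)
            chunks.append(f.read(length))
    return b"".join(chunks)
```

The sketch keeps the two inputs separate on purpose: the map supplies structure and locations, and the binary file supplies only bytes, which is the division of labor the workflow describes.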


PHASE 1: BUILD A PROTOTYPE (COMPLETED IN 2009)


PHASE 2: PRODUCTIZE HDF4 MAPPING SCHEMA AND TOOLS FOR DEPLOYMENT

Phase 2 tasks


A. Investigate integration of mapping schema with existing standards

B. Determine HDF-EOS 2 requirements

C. Redesign and expand the XML schema

D. Implement production quality map writer

E. Develop demo map reader

F. Deploy tools at select NASA data centers


TASK A: INVESTIGATE INTEGRATION OF MAPPING SCHEMA WITH EXISTING STANDARDS


Investigate existing standards

• Investigated:
  • METS, PREMIS, ESML, NcML, and CSML

• Concluded:
  • Existing standards have different purposes than mapping schema
  • None meet all needs of mapping project

• Develop new schema tailored to project goals
  • Harmonize with PREMIS
  • Leverage terminology and approaches from all

• Status:
  • Need to write report
  • Need to include some PREMIS-like data such as HDF4 file size and possibly MD checksum
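As a rough illustration of the PREMIS-like file description data mentioned in the status, the following Python sketch computes a file's size and a checksum. It assumes that "MD checksum" refers to an MD5 digest, which the slides do not state outright; the function name `file_fixity` is hypothetical.

```python
import hashlib
import os


def file_fixity(path, chunk_size=1 << 20):
    """Return (size in bytes, MD5 hex digest) for the file at `path`."""
    digest = hashlib.md5()  # assumption: "MD checksum" means MD5
    with open(path, "rb") as f:
        # Read in chunks so large data files are never held in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Recording both values alongside a map would let a future reader confirm that the binary file in hand is the one the map was generated from.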


TASK B: DETERMINE HDF-EOS2 REQUIREMENTS

Background

• An HDF-EOS2 file is an HDF4 file, so one can create an HDF4 mapping file to archive the HDF-EOS2 file.

• However, for some HDF-EOS2 files, it may be extremely difficult to retrieve correct geo-location information from the mapping files.

• For those files, special HDF-EOS2 mapping files may be needed.


Categorize HDF-EOS2 data products

• Created a data pool from NASA data centers
  • GES DISC, NSIDC, LAADS, LP DAAC
  • LaRC, PO.DAAC, GHRC, OBPG

• Analyzed data and reported options for adding HDF-EOS2 contents to the mapping file

• Conclusion: No special mapping for HDF-EOS2 needs to be done

• However, the study uncovered some important shortcomings in certain HDF-EOS products


Status and Plans

• Status: Complete
  • Detailed descriptions of sample data:
    • http://hdfeos.org/zoo/Data_Collection/index.php
  • Documents and reports at wiki:
    • http://wiki.hdfgroup.org/MappingPhase2_TaskB

• Plans
  • We plan to recommend a future task in which these issues are made known to the project


TASK C: REDESIGN SCHEMA


Design priorities and assumptions

• Mapping files
  • Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
  • Have enough information to stand on their own
  • Be as simple as possible

• Mapping schema
  • Describe the Mapping files
  • Used for validation and documentation
  • May not be available to target user


Status and Plans

• Status
  • All HDF4 objects found in EOS products are now handled by the Mapping schema.

• Plans
  • Complete schema elements for HDF4 file description information
    • File size, MD checksum (?), HDF4 library version stamp (?)
  • Finalize schema documentation
  • Address any additional HDF4 objects found during remainder of project, either by updating schema and map writer, or with follow-on proposal if substantial amount of effort required.


TASK D: IMPLEMENT MAP WRITER


Map Writer Requirements

• Retrieve information needed from HDF4 file
• Write out corresponding XML file

• Quality requirements
  • Completeness
    • Don’t miss any objects in file
    • Report on objects or features not handled by the writer
  • Accuracy – don’t give wrong information
  • Readability – provide adequate instructions in the file
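The completeness requirement can be sketched in Python as a writer that never skips an object silently. This is not how h4mapwriter is implemented (it is linked with the HDF4 library); the input dictionaries, the `Map`, `Object`, `Bytes`, and `Unhandled` element names, and the set of handled kinds are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Object kinds this sketch knows how to describe (illustrative subset).
HANDLED_KINDS = {"SDS", "Vdata", "Group"}


def write_map(objects):
    """Build a map document from `objects` and list what was not handled.

    `objects` is an iterable of dicts with "name", "kind", "offset", "length".
    Returns (xml_text, unhandled_names).
    """
    root = ET.Element("Map")
    unhandled = []
    for obj in objects:
        if obj["kind"] not in HANDLED_KINDS:
            # Completeness: never skip silently; record the gap in the map.
            unhandled.append(obj["name"])
            ET.SubElement(root, "Unhandled", name=obj["name"], kind=obj["kind"])
            continue
        node = ET.SubElement(root, "Object", name=obj["name"], kind=obj["kind"])
        ET.SubElement(node, "Bytes", offset=str(obj["offset"]),
                      length=str(obj["length"]))
    return ET.tostring(root, encoding="unicode"), unhandled
```

Recording unhandled objects inside the map itself, rather than only in a log, means a later reader of the map can tell that the file holds content the map does not describe.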


Activities

1. Implement functions to facilitate map creation
  • Develop writer requirements based on new XML schema and additional deployment needs
  • Implement new functions as needed
  • Include functions in library as appropriate

2. Implement writer: h4mapwriter
  • Interpret map requirements according to schema
  • Implement writer
  • Package for deployment
  • Support deployment


Status and Plans

• Status
  1. Implement functions to facilitate map creation
    • All functions implemented
  2. Implement writer
    • Handles all objects
    • Available as alpha-2 release
    • Being tested by GES DISC, NSIDC, Raytheon

• Plans
  1. Functions to facilitate map creation
    • Include in future HDF4 releases
  2. Writer
    • Finish HDF4 file description elements
    • Complete testing and documentation
    • Support deployment, fix bugs and add features as needed


TASK E: IMPLEMENT DEMO READER

Demo Reader Requirements

• Multiplatform command line tool
• Easy to use: clear arguments and output
• Must validate that objects in the mapping file are actually in the HDF4 file
• Developed in a well-supported high-level language (Python)
• Well documented
• Available as open source
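One small part of the validation requirement can be sketched as a command line check in Python. The sketch only confirms that every byte range named in a map fits inside the binary file, which is a weaker test than verifying the objects themselves; the XML layout (`Object` and `Bytes` elements with `offset` and `length` attributes) and all function names are hypothetical and do not reflect the actual demo reader.

```python
import argparse
import os
import xml.etree.ElementTree as ET


def out_of_bounds(map_path, hdf_path):
    """Return (object name, offset, length) for ranges that overrun the file."""
    size = os.path.getsize(hdf_path)
    problems = []
    for obj in ET.parse(map_path).getroot().iter("Object"):
        for b in obj.findall("Bytes"):
            offset, length = int(b.get("offset")), int(b.get("length"))
            if offset < 0 or length < 0 or offset + length > size:
                problems.append((obj.get("name"), offset, length))
    return problems


def main(argv=None):
    """Parse arguments, print each problem, and return a shell exit status."""
    parser = argparse.ArgumentParser(
        description="Check that mapped byte ranges fit inside the binary file.")
    parser.add_argument("map_file", help="XML map file (hypothetical layout)")
    parser.add_argument("hdf_file", help="binary file the map describes")
    args = parser.parse_args(argv)
    problems = out_of_bounds(args.map_file, args.hdf_file)
    for name, offset, length in problems:
        print("%s: bytes %d..%d lie outside the file"
              % (name, offset, offset + length))
    return 1 if problems else 0
```

Calling `main(["granule.xml", "granule.hdf"])` (placeholder file names) returns 0 when every range fits and 1 otherwise, so the result can serve directly as a process exit status.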


Demo reader activities

1. Develop requirements, based on new schema and identification of additional deployment needs.

2. Design reader, based on requirements, and from a review of the prototype design.

3. Implement and document reader.

4. Test reader on EOS file “zoo”

5. Deposit reader, documentation, and tests in open source repository, probably SourceForge.


Demo Reader Status

• Status
  • Support provided so far for Vdata, SDS, Group, and Attribute
  • Current source code available at http://sourceforge.net/projects/hdf4mapreader/
  • Documentation at http://hdf4mapreader.sourceforge.net/

• Plans
  • Add raster image (RIS) and palette support


TASK G: DEPLOY


Task G: Deploy

• Begin in April 2011, complete in June

• The HDF Group
  • Provide h4mapwriter map generation tool
  • Maintain tool during deployment and validation
  • Assist GES DISC, NSIDC, and Raytheon with deployment and validation

• Raytheon
  • Validate HDF4 map software in anticipation of future deployment

• GES DISC and NSIDC: see next slide


What about GES DISC and NSIDC?

• Activities (formerly):
  • GES DISC
    • Incorporate into the existing archive ingest system
    • Manage the retrofit into existing metadata files
  • NSIDC
    • Support implementation in NSIDC’s ECS system
  • Other ESDCs
    • Encouraged to join in
    • But deployment to other centers expected subsequent to the project.

• Ruth Duerr’s observation:
  • The task for NSIDC is to assist in the ECS implementation at NSIDC, which won't take place until 2012
  • Task G only includes the work up to the handoff to ECS
  • Thus, NSIDC's work needs to extend beyond the period of performance of this award
  • How do we resolve that issue?


BEYOND JULY 15


Future work

• NSIDC
  • Assist in the ECS deployment at NSIDC

• GES DISC:
  • ?

• The HDF Group
  • Monitor deployment activities by Raytheon and others to identify
    • Unsupported objects and tags occurring in products
    • Software defects
    • Feature requests
  • As needed, fix defects, add features, and add support for new objects and tags
  • Address performance issues
  • Add h4mapwriter tool and supporting API to regular HDF4 testing regime
  • Perform other services in support of the software as needed

• All
  • Perform post mortem and identify lessons learned
  • Write paper summarizing the project
  • Investigate HDF5 mapping


The End

Acknowledgements

This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA).

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
