Improving long-term preservation of EOS data by independently mapping
HDF4 data objects
The HDF Group
Mapping project team members
The HDF Group
• Ruth Aydt
• Mike Folk
• Joe Lee
• Elena Pourmal
• Binh-Minh Ribler
• Muqun (Kent) Yang
NASA
• Ruth Duerr & Luis Lopez (NSIDC)
• Chris Lynnes (GES DISC)
Raytheon
• Evelyn Nakamura
• many others
April 6 2011 Annual HDF Briefing to NASA
Recap
• Problem
  • The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.
• Solution
  • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.
  • Implement tools to create layout maps for EOS data products.
  • Deploy tools at select EOS data centers.
HDF4 mapping workflow
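[Diagram: h4mapwriter, linked with the HDF4 library, reads the HDF4 file and writes an HDF4 map file (XML document). The map file records groups, data objects, structural and application metadata, and the locations of object data. A reader program uses the map file to retrieve object data from the HDF4 file.]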
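To make the workflow concrete, here is a minimal sketch of the kind of "simple reader" the map is meant to enable: it pulls an object's raw bytes out of the binary file using only offsets and lengths recorded in an XML map, with no HDF4 library involved. The element and attribute names (`object`, `byteStream`, `offset`, `nBytes`) and the function name are assumptions made for illustration; they are not taken from the project's schema.

```python
import xml.etree.ElementTree as ET


def read_object_bytes(map_xml, hdf_file, object_name):
    """Return the raw bytes of one data object, using only the map.

    map_xml     -- text of the XML map (element names below are hypothetical)
    hdf_file    -- binary file object opened on the HDF4 file
    object_name -- value of the (hypothetical) "name" attribute to look up
    """
    root = ET.fromstring(map_xml)
    for obj in root.iter("object"):
        if obj.get("name") != object_name:
            continue
        pieces = []
        # An object's data may be stored in several non-contiguous blocks,
        # so collect every recorded byte range in document order.
        for block in obj.iter("byteStream"):
            hdf_file.seek(int(block.get("offset")))
            pieces.append(hdf_file.read(int(block.get("nBytes"))))
        return b"".join(pieces)
    raise KeyError(object_name)
```

The function only returns bytes; turning them into numbers would additionally need the data type, byte order, and dimensions, which the map would also have to carry.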
PHASE 2
PRODUCTIZE HDF4 MAPPING SCHEMA AND
TOOLS FOR DEPLOYMENT
Phase 2 tasks
A. Investigate integration of mapping schema with existing standards
B. Determine HDF-EOS 2 requirements
C. Redesign and expand the XML schema
D. Implement production quality map writer
E. Develop demo map reader
F. Deploy tools at select NASA data centers
TASK A
INVESTIGATE INTEGRATION OF MAPPING SCHEMA WITH EXISTING STANDARDS
Investigate existing standards
• Investigated:
  • METS, PREMIS, ESML, NcML, and CSML
• Concluded:
  • Existing standards have different purposes than mapping schema
  • None meet all needs of mapping project
• Develop new schema tailored to project goals
  • Harmonize with PREMIS
  • Leverage terminology and approaches from all
• Status:
  • Need to write report
  • Need to include some PREMIS-like data such as HDF4 file size and possibly MD checksum
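As an illustration of the PREMIS-like file description data mentioned in the status, the sketch below computes a file's size and a checksum. It assumes that "MD checksum" means MD5, which the slides do not state, and the function name and returned keys are hypothetical.

```python
import hashlib
import os


def describe_file(path, chunk_size=1 << 20):
    """Return the size in bytes and the MD5 hex digest of a file.

    MD5 is an assumption here; the slides only say "MD checksum".
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "md5": digest.hexdigest()}
```

Recording both values in the map would let a later reader confirm that the map still describes the same binary file before trusting any stored offsets.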
Background
• An HDF-EOS2 file is an HDF4 file, so one can create an HDF4 mapping file to archive the HDF-EOS2 file.
• However, for some HDF-EOS2 files, it may be extremely difficult to retrieve correct geo-location information from the mapping files.
• For those files, special HDF-EOS2 mapping files may be needed.
Categorize HDF-EOS2 data products
• Created a data pool from NASA data centers
  • GES DISC, NSIDC, LAADS, LP DAAC
  • LaRC, PO.DAAC, GHRC, OBPG
• Analyzed data and reported options for adding HDF-EOS2 contents to the mapping file
• Conclusion: No special mapping for HDF-EOS2 needs to be done
• However, the study uncovered some important shortcomings in certain HDF-EOS products
Status and Plans
• Status: Complete
  • Detailed descriptions of sample data:
    • http://hdfeos.org/zoo/Data_Collection/index.php
  • Documents and reports at wiki:
    • http://wiki.hdfgroup.org/MappingPhase2_TaskB
• Plans
  • We plan to recommend a future task in which these issues are made known to the project
Design priorities and assumptions
• Mapping files
  • Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
  • Have enough information to stand on their own
  • Be as simple as possible
• Mapping schema
  • Describe the Mapping files
  • Used for validation and documentation
  • May not be available to target user
Status and Plans
• Status
  • All HDF4 objects found in EOS products are now handled by the Mapping schema.
• Plans
  • Complete schema elements for HDF4 file description information
    • File size, MD checksum (?), HDF4 library version stamp (?)
  • Finalize schema documentation
  • Address any additional HDF4 objects found during remainder of project, either by updating schema and map writer, or with follow-on proposal if substantial amount of effort required.
Map Writer Requirements
• Retrieve information needed from HDF4 file
• Write out corresponding XML file
• Quality requirements
  • Completeness
    • Don’t miss any objects in file
    • Report on objects or features not handled by the writer
  • Accuracy – don’t give wrong information
  • Readability – provide adequate instructions in the file
Activities
1. Implement functions to facilitate map creation
  • Develop writer requirements based on new XML schema and additional deployment needs
  • Implement new functions as needed
  • Include functions in library as appropriate
2. Implement writer: h4mapwriter
  • Interpret map requirements according to schema
  • Implement writer
  • Package for deployment
  • Support deployment
Status and Plans
• Status
  1. Implement functions to facilitate map creation
    • All functions implemented
  2. Implement writer
    • Handles all objects
    • Available as alpha-2 release
    • Being tested by GES DISC, NSIDC, Raytheon
• Plans
  1. Functions to facilitate map creation
    • Include in future HDF4 releases
  2. Writer
    • Finish HDF4 file description elements
    • Complete testing and documentation
    • Support deployment, fix bugs and add features as needed
Demo Reader Requirements
• Multiplatform command line tool
• Easy to use, with clear arguments and output
• Must validate that objects in the mapping file are actually in the HDF4 file
• Developed in a well-supported high level language (Python)
• Well documented
• Available as open source
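One simple form the validation requirement could take is a bounds check: every byte range named in the map must fall inside the binary file. The sketch below shows that check behind a small command line interface. It is not the project's demo reader; the `byteStream`, `offset`, and `nBytes` names and both function names are assumptions for illustration.

```python
import argparse
import os
import xml.etree.ElementTree as ET


def out_of_range_blocks(map_root, file_size):
    """List (offset, nBytes) pairs in the map that do not fit in the file."""
    bad = []
    for block in map_root.iter("byteStream"):  # hypothetical element name
        offset = int(block.get("offset"))
        n_bytes = int(block.get("nBytes"))
        if offset < 0 or n_bytes < 0 or offset + n_bytes > file_size:
            bad.append((offset, n_bytes))
    return bad


def main(argv=None):
    """Command line entry point; returns 0 if every byte range fits."""
    parser = argparse.ArgumentParser(
        description="Check that byte ranges in a map fit inside the HDF4 file.")
    parser.add_argument("map_file", help="XML map file")
    parser.add_argument("hdf4_file", help="binary HDF4 file the map describes")
    args = parser.parse_args(argv)

    root = ET.parse(args.map_file).getroot()
    bad = out_of_range_blocks(root, os.path.getsize(args.hdf4_file))
    for offset, n_bytes in bad:
        print(f"byte range outside file: offset={offset} nBytes={n_bytes}")
    return 1 if bad else 0
```

A script would call `main()` and pass its return value to `sys.exit`. A bounds check only shows that the ranges are plausible; confirming that the bytes really belong to the named object would need a stronger test, such as comparing against values read through the HDF4 library.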
Demo reader activities
1. Develop requirements, based on new schema and identification of additional deployment needs.
2. Design reader, based on requirements and a review of the prototype design.
3. Implement and document reader.
4. Test reader on EOS file “zoo”.
5. Deposit reader, documentation, and tests in open source repository, probably SourceForge.
Demo Reader Status
• Status
  • Support provided so far for Vdata, SDS, Group, and Attribute
  • Current source code available at http://sourceforge.net/projects/hdf4mapreader/
  • Documentation at http://hdf4mapreader.sourceforge.net/
• Plans
  • Add raster image (RIS) and palette support
Task G: Deploy
• Begin in April 2011, complete in June
• The HDF Group
  • Provide h4mapwriter map generation tool
  • Maintain tool during deployment and validation
  • Assist GES DISC, NSIDC, and Raytheon with deployment and validation
• Raytheon
  • Validate HDF4 map software in anticipation of future deployment
• GES DISC and NSIDC: see next slide
What about GES DISC and NSIDC?
• Activities (formerly):
  • GES DISC
    • Incorporate into the existing archive ingest system
    • Manage the retrofit into existing metadata files
  • NSIDC
    • Support implementation in NSIDC’s ECS system
  • Other ESDCs
    • Encouraged to join in
    • But deployment to other centers is expected subsequent to the project.
• Ruth Duerr’s observation:
  • The task for NSIDC is to assist in the ECS implementation at NSIDC, which won't take place until 2012
  • Task G only includes the work up to the handoff to ECS
  • Thus, NSIDC's work needs to extend beyond the period of performance of this award
  • How do we resolve that issue?
Future work
• NSIDC
  • Assist in the ECS deployment at NSIDC
• GES DISC:
  • ?
• The HDF Group
  • Monitor deployment activities by Raytheon and others to identify
    • Unsupported objects and tags occurring in products
    • Software defects
    • Feature requests
  • As needed, fix defects, add features, and add support for new objects and tags
  • Address performance issues
  • Add h4mapwriter tool and supporting API to regular HDF4 testing regime
  • Perform other services in support of the software as needed
• All
  • Perform post mortem and identify lessons learned
  • Write paper summarizing the project
  • Investigate HDF5 mapping
Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or recommendations expressed in this material are
those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space
Administration.