Developing a NetCDF-4 Interface to HDF5 Data

Russ Rew (PI), UCAR Unidata; Mike Folk (Co-PI), NCSA/UIUC; Ed Hartnett, UCAR Unidata; Quincey Koziol, NCSA/UIUC; John Caron, UCAR Unidata; Robert E. McGrath, NCSA/UIUC. NASA award AIST-02-0071.

2

Unidata: A Community Endeavor

• Community of educators and researchers at 120 universities, 30 other institutions, international in scope

• Managed by the University Corporation for Atmospheric Research

• Mission: providing data, tools, support, and community leadership for enhanced earth-system education and research

• Atmospheric science community, expanding to oceanography, hydrology, other geosciences

• Unidata Program Center: 25 staff, 15 developers

Figure: data sources feeding a network of LDM nodes connected over the Internet.

Figure: a client application calls the NetCDF 4 library API, which reaches data through the OpenDAP 4.0 protocol, a local file, or the HTTP protocol. Data sources shown: OpenDAP dataset, HDF5 file, NetCDF V.1 and 2 file, NcML dataset (XML), virtual dataset.

3

Overview

• What is netCDF? What is HDF5?

• Why develop a netCDF interface to HDF5?

• What is the current project status?

• What still needs to be done?

• Do we have the necessary resources?

• What are the prospects for success?

4

NetCDF-3 and HDF5

• Standard Data Models for scientific data and data abstractions

• Standard Interfaces between data providers and data users

• Standard Libraries for data access from various languages

• Standard Formats for portable binary data

• Users need not know about the format

Ad hoc standards are useful standards

5

Data Models

netCDF-3         HDF5
Variables        Datasets
Dimensions       Dataspaces
Attributes       Attributes
Coordinates
Element types    Datatypes
                 Groups
                 Links
                 References
                 Property Lists

6

Libraries

netCDF-3               HDF5
one interface level    high- and low-level interfaces
serial I/O             serial, parallel (MPI) I/O
C, C++                 C, C++
Fortran-77, -90        Fortran-90
Java (pure)            Java (native)
Perl
Python                 Python
Ruby
IDL                    IDL
Matlab                 Matlab
...

7

Formats

netCDF-3                  HDF5
XDR                       XDR and native
direct access             direct access
efficiently extendible    efficiently extendible
32-bit file offsets       64-bit file offsets
                          chunked access
                          compound structures
                          nested structures
                          compression
                          efficient schema changes
                          virtual file I/O layer
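
Chunked access and compression appear together in the HDF5 column, and the pairing can be sketched in a few lines. The following is a minimal Python illustration, not HDF5 code: a 2-D array is split into fixed-size chunks, each chunk is compressed on its own with zlib, and reading one element decompresses only the chunk that holds it. The function names, the chunk shape, and the in-memory dictionary standing in for a file are all assumptions made for this example.

```python
import struct
import zlib


def write_chunked(rows, chunk_shape):
    """Split a 2-D list of floats into chunks and compress each chunk separately."""
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    chunks = {}
    for r0 in range(0, n_rows, c_rows):
        for c0 in range(0, n_cols, c_cols):
            # Edge chunks may be smaller than chunk_shape.
            block = [rows[r][c0:c0 + c_cols]
                     for r in range(r0, min(r0 + c_rows, n_rows))]
            flat = [v for row in block for v in row]
            raw = struct.pack(f"<{len(flat)}d", *flat)
            key = (r0 // c_rows, c0 // c_cols)
            chunks[key] = (len(block), len(block[0]), zlib.compress(raw))
    return chunks


def read_value(chunks, chunk_shape, r, c):
    """Read one element by decompressing only the chunk that contains it."""
    c_rows, c_cols = chunk_shape
    height, width, packed = chunks[(r // c_rows, c // c_cols)]
    flat = struct.unpack(f"<{height * width}d", zlib.decompress(packed))
    return flat[(r % c_rows) * width + (c % c_cols)]


# Hypothetical 6 x 10 array stored in 4 x 4 chunks.
data = [[float(r * 10 + c) for c in range(10)] for r in range(6)]
chunks = write_chunked(data, (4, 4))
value = read_value(chunks, (4, 4), 5, 9)
```

Because each chunk is compressed independently, a reader that needs a small region pays only for the chunks that region touches, which is the reason compressed storage is normally tied to chunked layout.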

8

Other Characteristics

                             NetCDF-3                          HDF5
Availability                 free                              free
Development and maintenance  UCAR Unidata                      NCSA HDF Group
Primary funding              NSF                               NASA, DOE ASCI
Advantages                   popular, simple, lots of tools,   powerful, high-performance,
                             multiple implementations          storage efficiency, extensibility
Primary uses                 climate, forecast, ocean models,  satellite data, computational fluid
                             data archives                     dynamics, parallel computing

9

Goals of NetCDF/HDF Combination

• Create netCDF-4, combining desirable characteristics of netCDF-3 and HDF5, while taking advantage of their separate strengths

  • Widespread use and simplicity of netCDF-3

  • Generality and performance of HDF5

• Make netCDF more suitable for high-performance computing

• Provide simple high-level interface for HDF5

• Demonstrate benefits of combination in advanced Earth science modeling efforts

10

NetCDF-4 Features Enabled by HDF5

• Large file support

• Parallel I/O

• Multiple dynamic dimensions

• Packed data, compression

• New data types

• Dynamic schema modifications

• Other possibilities: groups, user-defined types, better coordinate support, …
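
"Packed data" in the list above usually means storing floating-point values as small integers together with a scale and an offset. The sketch below shows the arithmetic only, using the `scale_factor` and `add_offset` attribute names from the netCDF attribute conventions. The function names and the sample temperatures are hypothetical, and nothing here is the netCDF-4 interface itself.

```python
def pack_params(vmin, vmax, bits=16):
    """Choose scale and offset so that [vmin, vmax] spans the signed integer range."""
    n_steps = 2 ** bits - 2  # leaves one integer code unused, e.g. for a fill value
    scale_factor = (vmax - vmin) / n_steps
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Convert floats to integers: packed = round((value - add_offset) / scale_factor)."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Hypothetical temperatures in kelvin, packed into 16-bit integers.
temps = [250.0, 273.15, 300.0, 320.0]
scale_factor, add_offset = pack_params(250.0, 320.0)
packed = pack(temps, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)
```

Packing is lossy: each restored value can differ from the original by up to half of `scale_factor`, so the choice of range and bit width sets the precision.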

11

Approach

• Implement netCDF-3 over HDF5, to demonstrate backward compatibility with

  • Programming interface

  • Format

• Design netCDF-4 interface

• Implement netCDF-4 over HDF5 to add enhancements made possible with HDF5

• Foster continued collaboration between Unidata and NCSA in design, development, testing, and support

12

NetCDF-4 Architecture

• Access to netCDF-3, netCDF-4, and HDF5 data created through netCDF-4 interface

Figure: architecture layers labeled netCDF-3 Interface, netCDF-4 Library, and HDF5 Library.

13

User View of NetCDF-4

• NetCDF-4 library accesses either the netCDF-3 or HDF5 library to read or write data

Figure: the netCDF-4 prototype calls the HDF5 library and the netCDF-3 library, which produce an HDF5 file and a netCDF file (sample contents "HDF5 is fun!" and "netCDF is fun!"). Caption: a NetCDF-4 user can write netCDF or HDF5 files.

14

Current Technical Status

• Implement netCDF-3 over HDF5, to demonstrate backward compatibility with API and format: done

• Determine needed HDF5 enhancements: done

• Prepare netCDF-3 for incorporation with netCDF-4: nearly done

• Design netCDF-4 interface to add enhancements made possible with HDF5: in progress

• Implement needed HDF5 enhancements: in progress

• Implement netCDF-4 over enhanced HDF5: not started yet

15

NetCDF-3 Interface Using HDF5

• 13,000 lines of C code

• Passes all netCDF-3 tests

• Demonstrates HDF5 practical for netCDF-4

• Identifies HDF5 enhancements needed

• Shows read/write times and file sizes satisfactory

• Validates approach to backward compatibility

  • API compatibility: only recompilation and relinking needed for existing netCDF-3 programs

  • Format compatibility: accesses all current netCDF files as well as new HDF5 files transparently
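
Transparent access to both formats implies telling them apart when a file is opened. The sketch below does this from the published file signatures: a classic netCDF file begins with the bytes `CDF` followed by a version byte, and an HDF5 file begins with an 8-byte signature. The function name `detect_format` and the returned labels are hypothetical, and this is not the prototype's actual code. It also checks offset 0 only, whereas HDF5 allows the signature at later offsets when a user block is present.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_format(header: bytes) -> str:
    """Classify a file from its first bytes (hypothetical helper)."""
    if header[:8] == HDF5_SIGNATURE:
        return "hdf5"
    if len(header) >= 4 and header[:3] == b"CDF":
        version = header[3]
        if version == 1:
            return "netcdf-classic"
        if version == 2:
            return "netcdf-64bit-offset"
    return "unknown"


def detect_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as f:
        return detect_format(f.read(8))
```

A library built this way can pick the netCDF-3 or the HDF5 code path from the result, so existing programs do not need to say which kind of file they are opening.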

16

NetCDF-3 Enhancements for NetCDF-4

• To provide

  • stable foundation for incorporating netCDF-4

  • smooth transition for current users

• Automated multi-platform testing

• Documentation converted to maintainable form, new language-independent Users Guide

• Added large file support with backward compatibility

• Added default format interfaces

• Better Windows and .Net support
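
The large file support listed above comes down to the width of the file offsets. As a rough sketch of the arithmetic, assuming offsets are stored as signed integers and ignoring header size and the format's exact layout rules, the following checks whether an array's data is addressable with 32-bit or 64-bit offsets. The function names and the example grid are hypothetical.

```python
from math import prod

MAX_OFFSET_32 = 2 ** 31 - 1  # largest signed 32-bit offset, just under 2 GiB
MAX_OFFSET_64 = 2 ** 63 - 1  # largest signed 64-bit offset


def data_bytes(shape, element_size):
    """Total bytes needed for an array with the given shape and element size."""
    return prod(shape) * element_size


def fits(shape, element_size, max_offset, start=0):
    """True if the last byte of the array is addressable with offsets up to max_offset."""
    return start + data_bytes(shape, element_size) - 1 <= max_offset


# Hypothetical example: one year of daily fields on a 720 x 1440 grid.
shape = (365, 720, 1440)
float_ok_32 = fits(shape, 4, MAX_OFFSET_32)   # 4-byte floats
double_ok_32 = fits(shape, 8, MAX_OFFSET_32)  # 8-byte doubles
double_ok_64 = fits(shape, 8, MAX_OFFSET_64)
```

In this example the 4-byte version stays under the 32-bit limit, while the 8-byte version exceeds it and needs 64-bit offsets.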

17

HDF5 Additions for Supporting NetCDF-4

• HDF5 enhancements

  • numeric type conversions

  • zero-dimensional datasets

  • overflow handling improvements

  • flexible parallel I/O

• HDF5 design specifications

  • dimension scales for coordinate systems

  • shared object proposal
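
Numeric type conversion and overflow handling matter because netCDF lets a program request data in a type other than the stored one, and a value may not fit the narrower type. The netCDF-3 C interface reports such cases with the `NC_ERANGE` status. The sketch below shows only the idea of a range-checked conversion to 16-bit integers. The function name and the use of `None` as a placeholder are assumptions, and it does not reproduce what either library stores for out-of-range values.

```python
INT16_MIN, INT16_MAX = -2 ** 15, 2 ** 15 - 1


def to_int16(values):
    """Convert floats to 16-bit integer values, recording which ones overflow."""
    converted, out_of_range = [], []
    for i, v in enumerate(values):
        n = int(v)  # truncates toward zero, like a C cast
        if INT16_MIN <= n <= INT16_MAX:
            converted.append(n)
        else:
            converted.append(None)  # placeholder for a value that does not fit
            out_of_range.append(i)
    return converted, out_of_range


converted, out_of_range = to_int16([1.9, -2.7, 40000.0, -32768.0])
```

Returning the indices of the failures, instead of stopping at the first one, lets the caller convert everything that fits and still signal a range error afterwards.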

18

Project Schedule

• July 2004: version 3.6.0 - revised documentation, 64-bit file offsets, default format functions

• October 2004: version 3.7.0 - use of autotools

• January 2005: version 3.7.1 - netCDF-4 prototype included, support for multiple unlimited dimensions

• March 2005: version 4.0.0_beta - test release

• July 2005: version 4.0.0 - first netCDF-4 production release

Currently on schedule for a July 2005 release

19

NetCDF-4 Design Issues

• Issue: support for coordinate systems in netCDF and HDF5 data models? Under consideration.

• Issue: addition of HDF5 Groups abstraction to netCDF data model? Yes, tentatively:

  • subset of HDF5 Group features

  • constrained by backward compatibility with netCDF-3

  • no Group aliases, but try to support Variable aliases and Dimension scoping?

• Issue: can we just adopt Northwestern/Argonne pnetCDF interface for adding parallel I/O?

20

What remains to be done?

• Next for netCDF-4: interface additions for multiple unlimited dimensions, group interfaces, dynamic schema modification, new data types, packed data, parallel I/O, compression

• HDF5 enhancements

  • zero-length attributes

  • shared dimensions

  • creation order access for objects

• Testing in models (CCSM, WRF, ESMF, ...)

21

Papers, Posters, Presentations

1. R. Rew, M. Folk, E. Hartnett, and R. McGrath: Plans for an Enhanced NetCDF-4 Interface to HDF5 Data. HDF/HDF-EOS Workshop VII, Silver Spring, September 2003. Poster and presentation.

2. M. Folk, R. Rew, K. Yang, R. McGrath: NetCDF-4: Combining netCDF and HDF5 Data. AGU Fall Meeting, San Francisco, December 2003. Poster.

3. R. Rew and E. Hartnett: Merging NetCDF and HDF5. 20th International Conference on Interactive Information Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology, Seattle, January 2004. Paper and poster.

4. E. Hartnett: Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability. 2004 Earth Science Technology Conference, Palo Alto, June 2004. Paper and presentation.

22

Excellent Prospects for Success

• More software engineering than research

• NetCDF-4 web site just announced:

  • www.unidata.ucar.edu/packages/netcdf/netcdf-4/

• Unidata and NCSA developers collaborating via email, teleconferences

• On schedule for July 2005 release:

  • www.unidata.ucar.edu/packages/netcdf/release_schedule.html

• Great interest in status of project! Ultimate goal to make earth science researchers more productive ...

23

Questions?
