DESCRIPTION
This presentation targets HDF5 application developers and anyone interested in the new HDF5 library features. The following new features available in 1.8.0 will be discussed:
HDF5 metadata cache: The metadata working set size varies widely with file structure and access pattern. If the cache is too small, performance deteriorates. Release 1.8 introduces code that configures the metadata cache size automatically, plus API calls for manual configuration of the metadata cache.
Text to datatype conversion: The new high-level API function H5LTtext_to_dtype creates a datatype from a text description of that datatype. The function H5LTdtype_to_text facilitates debugging by printing the text description of a datatype. The currently supported text description format is DDL.
External links: This feature allows links in a group to refer to objects in another file, and lets the library access those objects as if they were in the current file. We will present the API functions and how external links are supported.
Group revisions: We will introduce new features of the HDF5 group object, including compact group storage, new large group storage, intermediate group creation, and Unicode support for HDF5 object names and datatypes. We will also cover new APIs for copying HDF5 objects between HDF5 files.
• Compact Groups – This feature allows groups containing only a few links to take up much less space in the file.
• New Large Group Storage – The method of storing groups with many links has been updated to be faster and more scalable.
• Intermediate Group Creation – This feature allows intermediate groups that don't exist yet to be created when creating an object in a file.
• Support for Unicode Character Set – The UTF-8 Unicode encoding is now supported for strings in datasets, the names of links and the names of attributes.
Update on HDF5 1.8
The HDF Group
HDF and HDF-EOS Workshop X
November 28, 2006
Why HDF5 1.8?
Nov. 28, 2006
HDF and HDF-EOS Workshop X, Landover MD
… as we know, there are known knowns; there are things we know we know.
We also know there are known unknowns; that is to say we know there are some
things we do not know.
But there are also unknown unknowns -- the ones we don't know we don't know.
Donald Rumsfeld
Some things we knew we knew
• Need high level APIs – image, etc.
• Need more datatypes – packed n-bit, etc.
• Need external and other links
• Tools needed – h5pack, etc.
• Caching embellishments
• Eventually, multithreading
Things we knew we did not know
• New requirements from EOS and ASCI
• New applications that would use HDF5
• How HDF5 would really perform in parallel
• What new tools, features and options needed
• New APIs, API features
Things we didn’t know we didn’t know
• Completely unanticipated applications
• New data types and structures
  • E.g. DNA sequences
• New operations
  • E.g. write many real-time streams simultaneously
HDF5 1.8 topics
• Dataset and datatype improvements
• Group improvements
• Link revisions
• Shared object header messages
• Metadata cache improvements
• Other improvements
• Platform-specific changes
• High level APIs
• Parallel HDF5
• Tool improvements
Dataset and Datatype Improvements
Text-based data type descriptions
• Why:
  • Simplify datatype creation
  • Make datatype creation code more readable
  • Facilitate debugging by printing the text description of a data type
• What:
  • New routine to create a data type through the text description of the data type: H5LTtext_to_dtype
  • Companion routine to print the text description of a data type: H5LTdtype_to_text
Text data type description – Example
• Create a compound datatype from its text description.
/* Create the data type from a DDL text description */
dtype = H5LTtext_to_dtype(
    "H5T_COMPOUND { H5T_STD_I32LE \"a\"; H5T_IEEE_F32LE \"b\"; }", H5LT_DDL);
/* Convert the data type back to text; with a NULL buffer, only the required size is returned in tsize */
H5LTdtype_to_text(dtype, NULL, H5LT_DDL, &tsize);
Serialized datatypes and dataspaces
• Why:
  • Allow datatype and dataspace info to be transmitted between processes
  • Allow datatype/dataspace to be stored in non-HDF5 files
• What:
  • A new set of routines to serialize/deserialize HDF5 datatypes and dataspaces.
Int to float conversion during I/O
• Why: Convert ints to floats during I/O
• What: Int to float conversion supported during I/O
Revised conversion exception handling
• Why: Give apps greater control over exceptions (range errors, etc.) during datatype conversion.
• What: Revised conversion exception handling
Revised conversion exception handling
• To handle exceptions during conversions, register a handler function through H5Pset_type_conv_cb().
• Cases of exception:
  • H5T_CONV_EXCEPT_RANGE_HI
  • H5T_CONV_EXCEPT_RANGE_LOW
  • H5T_CONV_EXCEPT_TRUNCATE
  • H5T_CONV_EXCEPT_PRECISION
  • H5T_CONV_EXCEPT_PINF
  • H5T_CONV_EXCEPT_NINF
  • H5T_CONV_EXCEPT_NAN
• Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED
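The callback pattern on this slide can be sketched without the library. The block below is a self-contained model, not HDF5 code: the enum names, the callback signature, and the functions convert_int_to_u8 and mark_overflow are hypothetical stand-ins for the H5T_CONV_* constants and for a handler registered through H5Pset_type_conv_cb(). It assumes that "handled" means the callback wrote the destination value itself, "unhandled" means the converter applies its default (here, clamping), and "abort" stops the conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two of the exception cases on the slide. */
typedef enum { EXCEPT_RANGE_HI, EXCEPT_RANGE_LOW } except_t;

/* Hypothetical stand-ins for the three return values on the slide. */
typedef enum { CONV_ABORT = -1, CONV_UNHANDLED = 0, CONV_HANDLED = 1 } conv_ret_t;

/* Hypothetical callback type: receives the exception kind, the source
 * value, and a pointer to the destination element. */
typedef conv_ret_t (*except_cb)(except_t kind, int src, unsigned char *dst);

/* Convert ints to unsigned 8-bit values, consulting the callback whenever a
 * value does not fit. Returns 0 on success, -1 if the callback aborted. */
static int convert_int_to_u8(const int *src, unsigned char *dst, size_t n,
                             except_cb cb)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] >= 0 && src[i] <= 255) {
            dst[i] = (unsigned char)src[i];
            continue;
        }
        except_t kind = (src[i] > 255) ? EXCEPT_RANGE_HI : EXCEPT_RANGE_LOW;
        conv_ret_t r = cb ? cb(kind, src[i], &dst[i]) : CONV_UNHANDLED;
        if (r == CONV_ABORT)
            return -1;
        if (r == CONV_UNHANDLED) /* default behaviour in this model: clamp */
            dst[i] = (kind == EXCEPT_RANGE_HI) ? 255 : 0;
        /* CONV_HANDLED: the callback already wrote dst[i]. */
    }
    return 0;
}

/* Example handler: replace overflowing values with a sentinel and leave
 * underflow to the converter's default. */
static conv_ret_t mark_overflow(except_t kind, int src, unsigned char *dst)
{
    (void)src;
    if (kind == EXCEPT_RANGE_HI) {
        *dst = 254;
        return CONV_HANDLED;
    }
    return CONV_UNHANDLED;
}
```

Calling convert_int_to_u8 on {7, 300, -4} with mark_overflow leaves 7 untouched, writes the sentinel 254 for the overflow, and clamps the underflow to 0.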
Compression filter for n-bit data
• Why: Compact storage for user-defined datatypes
• What:
  • When data stored on disk, padding bits chopped off and only significant bits stored
  • Supports most datatypes
  • Works with compound datatypes
N-bit compression example
• In memory, one value of N-Bit datatype is stored like this:
| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|
S-sign bit P-significant bit ?-padding bit
• After passing through the N-Bit filter, all padding bits are chopped off, and the bits are stored on disk like this:
|    1st value    |    2nd value    |
|SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...
• Opposite (decompress) when going from disk to memory
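The packing step in this diagram can be sketched in a few lines of C. This is an illustrative model, not the library's filter code: nbit_extract and nbit_pack are hypothetical helper names, and the most-significant-bit-first output order is an assumption chosen to match the picture rather than a statement about the HDF5 on-disk format. For the layout shown, the significant field is 16 bits wide and starts 4 bits above the least significant bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the `precision` significant bits that start `offset` bits above
 * the least significant bit of a 32-bit word; padding bits are dropped. */
static uint32_t nbit_extract(uint32_t word, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision <= 32 && offset + precision <= 32);
    uint32_t mask = (precision == 32) ? 0xFFFFFFFFu : ((1u << precision) - 1u);
    return (word >> offset) & mask;
}

/* Pack n values back to back, most significant bit first (an assumed order).
 * `out` must hold at least (n * precision + 7) / 8 bytes.
 * Returns the number of bytes written. */
static size_t nbit_pack(const uint32_t *in, size_t n, unsigned offset,
                        unsigned precision, uint8_t *out)
{
    size_t nbytes = (n * precision + 7) / 8;
    size_t bitpos = 0;
    memset(out, 0, nbytes);
    for (size_t i = 0; i < n; i++) {
        uint32_t field = nbit_extract(in[i], offset, precision);
        /* Emit the field one bit at a time, highest bit first. */
        for (unsigned b = precision; b-- > 0; bitpos++) {
            if ((field >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(0x80u >> (bitpos % 8));
        }
    }
    return nbytes;
}
```

With offset 4 and precision 16, two 32-bit words shrink from 8 bytes in memory to 4 bytes packed, whatever their padding bits contain. Decompression would reverse the loop and reinsert the field at the same offset.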
Offset+size storage filter
• Why: Use less storage when less precision needed
• What:
  • Performs scale/offset operation on each value
  • Truncates result to fewer bits before storing
  • Currently supports integers and floats
• Example
H5Pset_scaleoffset(dcr, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
H5Dcreate(……, dcr);
H5Dwrite(…);
Example with floating-point type
• Data: {104.561, 99.459, 100.545, 105.644}
• Choose scaling factor: decimal precision to keep
  E.g. scale factor D = 2
1. Find minimum value (offset): 99.459
2. Subtract minimum value from each element
   Result: {5.102, 0, 1.086, 6.185}
3. Scale data by multiplying by 10^D = 100
   Result: {510.2, 0, 108.6, 618.5}
4. Round the data to integer
   Result: {510, 0, 109, 619}
5. Pack and store using min number of bits
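The five steps above can be traced in a short, self-contained C sketch. It models only the arithmetic, not the library's filter; the function name scale_offset_encode and its signature are hypothetical, and rounding by adding 0.5 and truncating is one assumed way to implement step 4.

```c
#include <assert.h>
#include <stddef.h>

/* Steps 1-4 of the slide: find the minimum, subtract it, scale by 10^d, and
 * round to the nearest integer. Writes the integers to `packed`, stores the
 * minimum in `*offset`, and returns the bit width needed for step 5. */
static unsigned scale_offset_encode(const double *data, size_t n, unsigned d,
                                    double *offset, unsigned long *packed)
{
    assert(n > 0);
    double min = data[0];
    for (size_t i = 1; i < n; i++)
        if (data[i] < min)
            min = data[i];

    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;

    unsigned long max = 0;
    for (size_t i = 0; i < n; i++) {
        /* The difference is never negative, so adding 0.5 and truncating
         * rounds to the nearest integer. */
        packed[i] = (unsigned long)((data[i] - min) * scale + 0.5);
        if (packed[i] > max)
            max = packed[i];
    }

    /* Count the bits needed to represent the largest scaled value. */
    unsigned bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    *offset = min;
    return bits;
}
```

For the slide's data with D = 2, the offset is 99.459, the first three integers are 510, 0 and 109, and each value fits in 10 bits instead of a 32-bit or 64-bit float. The last value is 618.5 before rounding, which sits on a rounding tie, so binary floating point may produce 618 rather than the slide's 619.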
“NULL” Dataspace
• Why:
  • Allow datasets with no elements to be described
  • NetCDF 4 needed a “place holder” for attributes
• What:
  • A dataset with no dimensions, no data
Group improvements
Access links by creation-time order
• Why:
  • Allow iteration & lookup of group’s links (children) by creation order as well as by name order
  • Support netCDF access model for netCDF 4
• What: Option to access objects in group according to relative creation time
“Compact groups”
• Why:
  • Save space and access time for small groups
  • If groups small, don’t need B-tree overhead
• What:
  • Alternate storage for groups with few links
• Example
  • File with 11,600 groups
  • With original group structure, file size ~ 20 MB
  • With compact groups, file size ~ 12 MB
  • Total savings: 8 MB (40%)
  • Average savings/group: ~700 bytes
Better large group storage
• Why: Faster, more scalable storage and access for large groups
• What: New format and method for storing groups with many links
Intermediate group creation
• Why:
  • Simplify creation of a series of connected groups
  • Avoid having to create each intermediate group separately, one by one
• What:
  • Intermediate groups can be created when creating an object in a file, with one function call
Example: add intermediate groups
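• Want to create “/A/B/C/dset1”
• “A” exists, but “B/C/dset1” does not
[Diagram: before the call the file contains only /A; after the call /A contains B, B contains C, and C contains dset1]
H5Dcreate(file_id, “/A/B/C/dset1”,..)
One call creates groups “B” & “C”, then creates “dset1”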
Link Revisions
What are links?
Links connect groups to their members
“Hard” links point to a target by address
“Soft” links store the path to a target
[Diagram: a root group with a hard link that holds the <address> of a dataset, and a soft link that holds the path “/target dataset”]
New: external Links
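• Why: Access objects by file & path within file
• What:
  • Store location of file and path within that file
  • Can link across files
[Diagram: in file1.h5, the root group holds an external link “dataset EL” that stores the file name “file2.h5” and the path “target dataset”; in file2.h5, the root group holds the link “target dataset”, which stores the <address> of the dataset]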
New: User-defined Links
• Why:
  • Allow applications to create their own kinds of links and link operations, such as
    • Create “hard” external link that finds an object by address
    • Create link that accesses a URL
    • Keep track of how often a link accessed, or other behavior
• What:
  • App can create new kinds of links by supplying custom callback functions
  • Can do anything HDF5 hard, soft, or external links do
Shared Object Header Messages
Shared object header messages
• Why: metadata duplicated many times, wasting space
• Example:
  • You create a file with 10,000 datasets
  • All use the same datatype and dataspace
  • HDF5 needs to write this information 10,000 times!
[Diagram: Dataset 1, Dataset 2 and Dataset 3, each with its own data and its own copy of the datatype and dataspace]
Shared object header messages
• What:
  • Enable messages to be shared automatically
  • HDF5 shares duplicated messages on its own!
[Diagram: Dataset 1 (data 1) and Dataset 2 (data 2) sharing one datatype and one dataspace]
Shared Messages
• Happens automatically
• Works with datatypes, dataspaces, attributes, fill values, and filter pipelines
• Saves space if these objects are relatively large
• May be faster if HDF5 can cache shared messages
• Drawbacks
  • Usually slower than non-shared messages
  • Adds overhead to the file
    • Index for storing shared datatypes
    • 25 bytes per instance
  • Older library versions can’t read files with shared messages
Two informal tests
• File with 24 datasets, all with same big datatype
  • 26,000 bytes normally
  • 17,000 bytes with shared messages enabled
  • Saves 375 bytes per dataset
• But, make a bad decision: invoke shared messages but only create one dataset…
  • 9,000 bytes normally
  • 12,000 bytes with shared messages enabled
  • Probably slower when reading and writing, too.
• Moral: shared messages can be a big help, but only in the right situation!
Metadata cache improvements
Metadata cache improvements
• Why:
  • Improve I/O performance and memory usage when accessing many objects
• What:
  • New metadata cache APIs
    • control cache size
    • monitor actual cache size and current hit rate
  • Under the hood: adaptive cache resizing
    • Automatically detects the current working set size
    • Sets max cache size to the working set size
Metadata cache improvements
• Note: most applications do not need to worry about the cache
• See “Advanced topics” for details
• And if you do see unusual memory growth or poor performance, please contact us. We want to help you.
Other improvements
New extendible error-handling API
• Why: Enable app to integrate error reporting with HDF5 library error stack
• What: New error handling API
  • H5Epush – push major and minor error ID on specified error stack
  • H5Eprint – print specified stack
  • H5Ewalk – walk through specified stack
  • H5Eclear – clear specified stack
  • H5Eset_auto – turn error printing on/off for specified stack
  • H5Eget_auto – return settings for specified stack traversal
Attribute improvements
• Why:
  • Use less storage when large numbers of attributes attached to a single object
  • Iterate over or look up attributes by creation order
• What:
  • Property to create index on the order in which the attributes are created
  • Improved attribute storage
Support for Unicode Character Set
• Why:
  • So apps can create names using Unicode
  • netCDF 4 needed this
• What:
  • UTF-8 Unicode encoding now supported
  • For string datatypes, names of links and attributes
• Example:
H5Pset_char_encoding(lcpl_id, H5T_CSET_UTF8);
H5Llink(file_id, "UTF-8 name", …, lcpl_id, …);
Efficient copying of HDF5 objects
• Why:
  • Enable apps to copy objects efficiently
• What:
  • New routines to copy an object in an HDF5 file within the current file or to another file
  • Done at a low level in the HDF5 file, allowing
    • Entire group hierarchies to be copied quickly
    • Compressed datasets to be copied without going through a decompression/compression cycle
Performance of object copy routines
[Bar chart: relative time for new h5repack using object copy routines vs. old h5repack. Test cases on the x-axis: 80M array, compound datatype; 16K x 16K int array; 10,000 groups; 16K x 16K float array, chunked; 10,000 attributes; 16K x 16K float array, chunked, compressed. Bar labels: 88.1%, 58.7%, 35.8%, 20.0%, 0.3%, 0.1%. Y-axis from 0% to 90%.]
Data transformation filter
• Why:
  • Apply arithmetic operations to data during I/O
• What:
  • Data transformation filter
  • Transform expressed by algebraic formula
  • Only +, -, *, and / supported
• Example:
  • Expression parameter set, such as x*(x-5)
  • When dataset read/written, x*(x-5) applied per element
  • When reading, values in file are unchanged
  • When writing, transformed data written to file
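The read and write behaviour described here can be modelled in plain C. This is a sketch of the semantics only, with the slide's sample expression hard-coded. The names transform, read_transformed and write_transformed are hypothetical, and the arrays stand in for the file and memory buffers; the library itself takes the expression as a parameter rather than as compiled code.

```c
#include <assert.h>
#include <stddef.h>

/* The slide's sample expression, x*(x-5), applied to one element. */
static double transform(double x)
{
    return x * (x - 5.0);
}

/* Read path: the caller receives transformed values; the "file" buffer is
 * const, so the stored values stay unchanged. */
static void read_transformed(const double *file, double *mem, size_t n)
{
    assert(file != NULL && mem != NULL);
    for (size_t i = 0; i < n; i++)
        mem[i] = transform(file[i]);
}

/* Write path: the transformed values are what end up in the "file" buffer;
 * the application's memory buffer is left as it was. */
static void write_transformed(const double *mem, double *file, size_t n)
{
    assert(file != NULL && mem != NULL);
    for (size_t i = 0; i < n; i++)
        file[i] = transform(mem[i]);
}
```

Reading stored values {0, 5, 6} through this model yields {0, 0, 6} in memory while the stored values remain {0, 5, 6}; writing {0, 5, 6} stores {0, 0, 6}.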
Stackable Virtual File Drivers
• What is Virtual File Driver (VFD)?
Structure of HDF5 Library
Object API (C, Fortran 90, Java, C++)
• Specify objects and transformation properties
• Invoke data movement operations and data transformations
Library internals
• Performs data transformations and other prep for I/O
• Configurable transformations (compression, etc.)
Virtual file I/O (C only)
• Perform byte-stream I/O operations (open/close, read/write, seek)
• User-implementable I/O (stdio, network, memory, etc.)
Stackable VFD
• HDF5 VFD allows
  • Storing data using different physical file layout. E.g., Family VFD (writes file as “family of files”)
  • Doing different types of I/O. E.g., stdio (standard I/O); MPI-I/O (for parallel I/O)
Stackable VFD
• Why “stackable”:
  • Before now, only one VFD could be used at a time
  • VFDs could not interoperate
• What is “stackable”:
  • A non-terminal VFD may stack on top of compatible non-terminal VFDs and, eventually, a terminal VFD
• Two kinds of VFD
  • Non-terminal (e.g. Family)
  • Terminal (e.g. stdio; MPI-I/O)
Stackable VFD
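[Diagram: the application calls the HDF5 API; non-terminal VFDs (Family, Filesplit, with separate metadata and raw data paths) stack on terminal VFDs (sec2, stdio, mpiio), which access the HDF5 files; the default I/O path is marked]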
Platform-specific changes
Platform-specific changes
• Why: Better UNIX/Linux portability
• What:
  • 1.8 uses latest GNU “auto” tools (autoconf, automake, libtool)
    • improves portability between many machine and OS configurations
  • Build can now be done in parallel
    • with gmake “-j” flag
    • speeds up build, test and install processes
  • Build infrastructure includes many other improvements as well
Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • MAC OS 10.3
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.8 and 2.9
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
  • MPICH 1.2.5
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
Platforms to be added
• Systems
  • Alpha Open VMS
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel (?)
  • Cray XT3
  • Windows 64-bit (32-bit binaries)
  • Linux 2.6
  • BG/L
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
  • MPICH 1.2.7
  • MPICH2
High level APIs
High-Level Fortran APIs
• Fortran APIs have been added for H5Lite, H5Image and H5Table.
Dimension scales
• Similar to
  • Dimension scales in HDF4
  • Coordinate variables in netCDF
• What is a dimension scale?
  • An HDF5 dataset with additional metadata that identifies the dataset as a “Dimension Scale”
  • Associated with dimensions of HDF5 datasets
  • Meaning of the association is left to applications
  • A dimension scale can be shared by two or more dataset dimensions
Dimension scales example
HDF Explorer image
Dimension scales example
HDF Explorer image
Sample dimension scale functions
• H5DSset_scale: convert dataset to a dimension scale
• H5DSattach_scale: attach scale to a dimension
• H5DSdetach_scale: detach scale from a dimension
• H5DSis_attached: verify if scale attached to dataset
• H5DSget_scale_name: read name of scale
HDF5Packet
• Why:
  • High performance table writing
  • For data acquisition, when there are many sources of data
  • E.g. flight test
• What:
  • Each row is a “packet”: a collection of fields, fixed or variable length
  • Append only
  • Indexed retrieval
Packets in HDF5
[Diagram: packets of data laid out along a time axis, one panel for fixed-length data records and one for variable-length records]
Parallel HDF5
Collective I/O improvements
• Why
  • Collective I/O not available for chunked data
  • Collective I/O not available for complex selections
  • Collective I/O is key to improving performance for parallel HDF5
• What
  • Collective I/O works for chunked storage
  • Works for irregular selections for both chunked and contiguous storage
Parallel h5diff (ph5diff)
• Compares two files in an MPI parallel environment.
• Compares multiple datasets simultaneously
Windows MPICH support
• Windows MPICH support: prototype
Tool improvements
New features for old tools
• h5dump
  • Dump data in binary format
  • Faster for files with large numbers of objects
• h5diff
  • Can now compare dataset regions
  • Parallel ph5diff now available
• h5repack
  • Efficient data copy using H5Gcopy()
  • Able to handle big datasets
New HDF5 Tools
• h5copy
  • Copies a group, dataset or named datatype from one location to another
  • Copies within a file or across files
• h5repart
  • Partition file into a family of files
• h5import
  • Import binary/ascii data into an HDF5 file
• h5check
  • Verifies an HDF5 file against the defined HDF5 File Format Specification
• h5stat
  • Reports statistics about a file and objects in a file
Thank You
Questions/comments?
For more information
• Go to http://www.hdfgroup.org/HDF5/
• Click on “Obtain HDF5 1.8.0 Alpha”
• Look at table “Information”
Acknowledgement
This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.