www.hdfgroup.org
The HDF Group
HDF5 Datasets and I/O
Dataset storage and its effect on performance
May 30-31, 2012 HDF5 Workshop at PSI 1
Outline
• Dataset metadata and array data storage layouts
• Types of dataset storage layouts
• Factors affecting I/O performance
• I/O with compact datasets
• I/O with contiguous datasets
• I/O with chunked datasets
• Variable length data and I/O
HDF5 Layers
[Figure: an H5Dwrite call moves the application buffer from the HDF5 application down to the HDF5 file. In the HDF5 Object Layer (API), H5Dwrite is called; in the HDF5 internals, data is prepared for I/O; in the VFD layer, the SEC2 driver performs I/O.]
Goal of this talk
• Present what is happening to data inside the HDF5 library
• Show how an application can control the HDF5 library behavior
• Specifically:
  - Describe some basic operations and data structures and explain how they affect performance and storage sizes
  - Give some “recipes” for how to improve performance
HDF5 DATASET METADATA
HDF5 Dataset
• Data array
  • Also called raw data
• Metadata
  - Dataspace
    - Rank, dimensions of dataset array
  - Datatype
    - Information on how to interpret data
  - Storage Properties
    - How array is organized on disk
  - Attributes
    - User-defined metadata (optional)
HDF5 dataset components
[Figure: a dataset consists of a dataset header (metadata) and a dataset data array (raw data). In the example, the header holds the dataspace (rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7), the datatype (IEEE 32-bit float), the storage info (chunked, compressed), and the attributes (Time = 32.4, Pressure = 987, Temp = 56).]
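As a worked example of how the dataspace and datatype together determine the amount of raw data, the sketch below multiplies the dimensions by the element size. The function name `raw_data_bytes` is hypothetical and not part of the HDF5 API; the result is the uncompressed array size only and ignores the header, chunk padding, and filters.

```c
#include <assert.h>
#include <stddef.h>

/* Uncompressed size in bytes of a dataset's raw data array:
 * the product of all dimensions times the size of one element.
 * Hypothetical helper for illustration; not an HDF5 call. */
size_t raw_data_bytes(int rank, const size_t dims[], size_t elem_size)
{
    assert(rank > 0 && dims != NULL && elem_size > 0);
    size_t n = elem_size;
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}
```

For the 4 x 5 x 7 array of 32-bit floats in the figure, `raw_data_bytes(3, dims, 4)` gives 560 bytes, which is small next to typical raw data sizes.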
HDF5 metadata
• HDF5 metadata
  • Information about HDF5 objects used by the HDF5 library
  • Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, superblock, etc.
  • Usually small compared to raw data sizes (KB vs. MB-GB)
HDF5 metadata cache
[Figure: application memory holds the dataset array data and the metadata cache (MDC), which contains the dataset header. The HDF5 file holds HDF5 metadata, including the dataset header, and dataset array data.]
Dataset header resides in MDC. MDC is handled by the HDF5 library.
Metadata is mixed with raw data in the HDF5 file.
HDF5 metadata cache
• Metadata cache
  • Space allocated to handle pieces of the HDF5 metadata
  • Allocated by the HDF5 library in application’s memory space
  • Allocated per file; released when file is closed
  • Metadata cache behavior affects overall performance
  • Metadata cache implementation prior to HDF5 1.6.5 could cause performance degradation for some applications
HDF5 DATASET STORAGE LAYOUTS
HDF5 dataset storage layouts
• Contiguous
• External
• Chunked
• Compact
Contiguous storage layout
• Contiguous storage layout is the default storage layout for an HDF5 dataset
• Dataset raw data is stored in one contiguous block in the HDF5 file
Contiguous storage layout
[Figure: application memory holds the dataset array data and, in the metadata cache (MDC), the dataset header. The HDF5 file holds the dataset header and the dataset array data.]
Raw data is stored in one contiguous block in the HDF5 file.
External storage layout
• Dataset raw data is stored in an external file(s) that should be kept together with the HDF5 file
• Layout in the external file is specified by an application
• An easy way to make legacy data available to the HDF5 library
External storage layout
[Figure: application memory holds the dataset array data and, in the metadata cache (MDC), the dataset header. The HDF5 file holds the dataset header; a separate Unix/Windows file holds the raw data.]
Metadata is stored in the HDF5 file. Raw data is stored in a separate file as specified by the application.
Chunked storage layout
• Chunking – storage layout where a dataset is partitioned in fixed-size multi-dimensional tiles or chunks
• Each chunk is stored as a contiguous block
• HDF5 library treats each chunk as an atomic object for I/O
• Greatly affects performance and file sizes
• Use for extendible datasets and datasets with filters applied (checksum, compression)
• Use for sub-setting of big datasets
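To make the partitioning concrete, the sketch below counts how many fixed-size chunks cover a dataset and how many bytes those chunks occupy before any filter is applied. It assumes that partial chunks at the dataset edges are allocated at full chunk size, and it leaves out the chunk index and per-chunk bookkeeping. The names `chunk_count` and `chunked_alloc_bytes` are hypothetical helpers, not HDF5 calls.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to cover the dataset. Edge chunks that are
 * only partly inside the dataset still count as whole chunks. */
size_t chunk_count(int rank, const size_t dims[], const size_t chunk_dims[])
{
    assert(rank > 0 && dims != NULL && chunk_dims != NULL);
    size_t n = 1;
    for (int i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0);
        n *= (dims[i] + chunk_dims[i] - 1) / chunk_dims[i]; /* ceiling division */
    }
    return n;
}

/* Bytes occupied by all chunks when no filter is applied, assuming
 * every chunk (including edge chunks) is stored at full size. */
size_t chunked_alloc_bytes(int rank, const size_t dims[],
                           const size_t chunk_dims[], size_t elem_size)
{
    assert(elem_size > 0);
    size_t one_chunk = elem_size;
    for (int i = 0; i < rank; i++)
        one_chunk *= chunk_dims[i];
    return chunk_count(rank, dims, chunk_dims) * one_chunk;
}
```

A 10 x 10 dataset of 4-byte elements with 4 x 4 chunks needs 9 chunks and 576 bytes under this model, against 400 bytes of actual data, which is one way chunk shape affects file size.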
Chunked storage layout
[Figure: in application memory, the dataset array data consists of chunks A, B, C, and D, and the metadata cache (MDC) holds the dataset header and the chunk index. The HDF5 file holds the dataset header, the chunk index, and the chunks, which are not stored in logical order (shown as C, A, B, D).]
Raw data is stored in separate chunks in the HDF5 file.
Compact storage layout
• Raw data is stored in a dataset object header
• Raw data read/written with the header
• Use for small (a few KB) datasets to minimize small I/O operations
Compact storage layout
[Figure: application memory holds the dataset array data and, in the metadata cache (MDC), the dataset header. In the HDF5 file, the dataset array data sits inside the dataset header.]
Raw data is stored in a dataset object header.
FACTORS AFFECTING I/O PERFORMANCE
HDF5 data structures
• Data structures used by HDF5 library
  • B-trees (groups, dataset chunks)
  • Hash tables
  • Local and global heaps (variable length data: link names, strings, etc.)
• Other concepts
  • HDF5 metadata cache
  • HDF5 chunk cache
  • Free space management data structure
  • Etc.
Operations on data inside HDF5 library
• Copying to/from internal buffers
• Datatype conversion, e.g.,
  • Float to integer
  • Little-endian to big-endian
  • 64-bit integer to 16-bit integer
  • Variable-length data conversion from memory to file
• Scattering - gathering
  • Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O
Operations on data inside HDF5 library
• Data transformation (filters, compression)
  - Checksum on raw data and metadata
  - Algebraic transform
  - GZIP and SZIP compressions
  - HDF5 and user-defined data transformations
I/O performance
• I/O performance depends on many factors
  • Storage layouts
  • Dataset storage properties
  • Chunking strategy
  • Metadata cache performance
  • Datatype conversion performance
  • Other filters, such as compression
  • Access patterns
I/O WITH DIFFERENT STORAGE LAYOUTS
WRITING COMPACT DATASET
Writing compact dataset
[Figure: application memory holds the dataset array data and, in the metadata cache (MDC), the dataset header; the HDF5 file holds the dataset header.]
Raw data is written when the object header is written.
WRITING CONTIGUOUS DATASET
Writing contiguous dataset
[Figure: application memory holds the dataset array data and, in the metadata cache (MDC), the dataset header. The HDF5 file holds the dataset array data and the dataset header.]
Raw data is written first. The header is written when flushed to file (H5Dclose, H5Fflush, or MDC flush done by the HDF5 library).
Writing contiguous dataset with conversion
[Figure: dataset array data passes from application memory through a 1MB conversion buffer on its way to the HDF5 file; the dataset header is held in the metadata cache (MDC) and written to the HDF5 file.]
Raw data goes through the conversion buffer. The header is written when flushed to file (H5Dclose, H5Fflush, or MDC flush done by the HDF5 library).
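A rough way to see the cost of the conversion buffer is to count how many times it must be filled for one write. The sketch below assumes each element needs a buffer slot as large as the larger of its memory and file sizes, so that conversion can happen in place; this is a simplified model, and `conversion_passes` is a hypothetical helper rather than an HDF5 function.

```c
#include <assert.h>
#include <stddef.h>

/* Number of times the conversion buffer is filled to convert n_elems
 * elements, assuming each element occupies max(src_size, dst_size)
 * bytes in the buffer. Simplified model for illustration only. */
size_t conversion_passes(size_t n_elems, size_t src_size,
                         size_t dst_size, size_t buf_bytes)
{
    assert(src_size > 0 && dst_size > 0);
    size_t slot = src_size > dst_size ? src_size : dst_size;
    size_t per_pass = buf_bytes / slot;   /* elements handled per fill */
    assert(per_pass > 0);                 /* buffer must hold one element */
    return (n_elems + per_pass - 1) / per_pass;
}
```

Under these assumptions, writing one million 8-byte doubles as 4-byte floats through a 1MB buffer takes 8 passes, while a buffer large enough for the whole selection takes one.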
PARTIAL I/O FOR CONTIGUOUS DATASET
Sub-setting of contiguous dataset: series of adjacent rows
[Figure: a subset of M complete rows of N elements each, shown in application memory and in the HDF5 file.]
Subset is contiguous in file.
One I/O operation.
Sub-setting of contiguous dataset: adjacent, partial rows
[Figure: a subset of M partial rows of N elements each, shown in application memory and in the HDF5 file.]
Subset is in M contiguous blocks in file.
Several I/O operations.
Sub-setting of contiguous dataset: extreme case, writing a column
[Figure: a subset of M rows with 1 element each, shown in application memory and in the HDF5 file.]
Subset data is scattered in the file in M different locations.
Several small I/O operations.
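The three sub-setting cases (full rows, partial rows, a single column) can be summarized by counting the contiguous file blocks that a rectangular selection touches. The sketch below assumes a 2-D dataset stored in row-major order with no sieve buffer in play; `contiguous_blocks` is a hypothetical helper, not part of the HDF5 API.

```c
#include <assert.h>
#include <stddef.h>

/* Number of contiguous file blocks touched by a selection of
 * sel_rows x sel_cols elements in a row-major 2-D dataset with
 * n_cols columns. Without a sieve buffer, each block needs at
 * least one I/O operation. */
size_t contiguous_blocks(size_t n_cols, size_t sel_rows, size_t sel_cols)
{
    assert(n_cols > 0 && sel_cols <= n_cols);
    if (sel_rows == 0 || sel_cols == 0)
        return 0;
    /* Complete rows are adjacent in the file and merge into one block. */
    if (sel_cols == n_cols)
        return 1;
    /* Partial rows (including a single column) give one block per row. */
    return sel_rows;
}
```

With 100 columns, selecting 10 complete rows touches 1 block, while 10 partial rows or a 10-element column each touch 10 blocks; in the column case every block is a single element.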
Sub-setting of contiguous dataset: data sieve buffer
[Figure: M single elements in application memory are copied (memcopy) into a sieve buffer, which is then written to the HDF5 file.]
Data is copied to a sieve buffer in memory (64K).
One write operation.
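Whether a column write collapses into a single operation depends on how the byte range it spans compares with the sieve buffer size. The sketch below is a deliberately simplified model: it assumes one write per buffer-sized window of the span and ignores any reads the library performs to fill the buffer. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes from the first to the last element of a column of m_rows
 * elements in a row-major dataset with n_cols columns. */
size_t column_span_bytes(size_t m_rows, size_t n_cols, size_t elem_size)
{
    assert(m_rows > 0 && n_cols > 0 && elem_size > 0);
    return ((m_rows - 1) * n_cols + 1) * elem_size;
}

/* Simplified model: each sieve-buffer-sized window of the span costs
 * one write operation. */
size_t sieved_writes(size_t span_bytes, size_t sieve_bytes)
{
    assert(sieve_bytes > 0);
    return (span_bytes + sieve_bytes - 1) / sieve_bytes; /* ceiling division */
}
```

For 4-byte elements and 100 columns, a 100-row column spans 39,604 bytes and fits in a 64K buffer (one write in this model); a 1,000-row column spans 399,604 bytes and needs 7 windows, which is the kind of case where a larger sieve buffer may help.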
Performance tuning for contiguous dataset
• Datatype conversion
  • Avoid for better performance
  • Use H5Pset_buffer function to customize conversion buffer size
• Partial I/O
  • Write/read in big contiguous blocks
  • Use H5Pset_sieve_buf_size to improve performance for complex sub-setting
• Caution:
  • Sieve buffer is allocated when the first write occurs and is released when the dataset is closed.
  • Memory will grow if there are a lot of open datasets.
I/O FOR CHUNKED DATASET
Recall: Chunked storage layout
[Figure: in application memory, the dataset array data consists of chunks A, B, C, and D, and the metadata cache (MDC) holds the dataset header and the chunk index. The HDF5 file holds the dataset header, the chunk index, and the chunks, which are not stored in logical order (shown as C, A, B, D).]
Raw data is stored in separate chunks in the HDF5 file.
HDF5 chunking
• HDF5 library treats each chunk as atomic object
  • Compression is applied to each chunk
  • Datatype conversion, other filters applied per chunk
• Chunk size greatly affects performance
  • Chunk overhead adds to file size
  • Chunk processing involves many steps
HDF5 chunk cache
• Chunk cache (general points, details later)
  • Caches chunks for better performance; remains allocated across multiple calls
  • Created for each chunked dataset
  • Size of chunk cache is set for file (default size 1MB)
  • Each chunked dataset has its own chunk cache
  • Chunk may be too big to fit into cache
  • Memory may grow if application keeps opening datasets
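Two of the points above lend themselves to a quick check before choosing a chunk shape: whether one chunk fits in the cache, and how much memory the caches could take if many chunked datasets stay open. The sketch below uses the 1MB default quoted on the slide; the helper names and the constant name are hypothetical, and the memory figure is an upper bound that assumes every cache fills completely.

```c
#include <assert.h>
#include <stddef.h>

/* Default chunk cache size quoted on the slide (1MB). */
#define DEFAULT_CHUNK_CACHE_BYTES ((size_t)1024 * 1024)

/* Size in bytes of one uncompressed chunk. */
size_t chunk_bytes(int rank, const size_t chunk_dims[], size_t elem_size)
{
    assert(rank > 0 && chunk_dims != NULL && elem_size > 0);
    size_t n = elem_size;
    for (int i = 0; i < rank; i++)
        n *= chunk_dims[i];
    return n;
}

/* Nonzero if a chunk of the given size can be held in the cache. */
int chunk_fits_in_cache(size_t one_chunk_bytes, size_t cache_bytes)
{
    return one_chunk_bytes <= cache_bytes;
}

/* Upper bound on chunk cache memory when each open chunked dataset
 * has its own cache and every cache is full. */
size_t worst_case_cache_bytes(size_t open_chunked_datasets, size_t cache_bytes)
{
    return open_chunked_datasets * cache_bytes;
}
```

A 1000 x 1000 chunk of 4-byte elements is about 4MB and does not fit the default cache, whereas a 256 x 256 chunk (256KB) does; 100 open chunked datasets could hold up to 100MB of cache under this bound.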
HDF5 chunk cache
[Figure: application memory holds the metadata cache (MDC), containing the dataset header and chunking B-tree nodes, and the chunk caches, one per dataset. A label notes a default size of 1MB.]
Writing chunked dataset
[Figure: chunks A, B, and C of a chunked dataset move from application memory space through the conversion buffer into the chunk cache, and then through the filter pipeline into the HDF5 file.]
Datatype conversion is performed before a chunk is placed in the cache.
A chunk is written when it is evicted from the cache.
Compression and other filters are applied on eviction.
PARTIAL I/O FOR CHUNKED DATASET
Partial I/O for chunked dataset
• Example: write the green subset from the dataset, converting the data
• Dataset is stored as six chunks in the file.
• The subset spans four chunks, numbered 1-4 in the figure.
• Hence four chunks must be written to the file.
• But first, the four chunks must be read from the file, to preserve those parts of each chunk that are not to be overwritten.
[Figure: a dataset of six chunks; the green subset overlaps the four chunks numbered 1-4.]
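The number of chunks a selection spans, and therefore how many chunks must be read and rewritten, can be computed from the selection's start and count in each dimension. The sketch below handles a simple rectangular selection with no stride; `chunks_touched` and the example sizes are hypothetical and chosen only to reproduce the four-of-six situation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks intersected by a rectangular selection that begins
 * at start[] and covers count[] elements in each dimension. */
size_t chunks_touched(int rank, const size_t start[], const size_t count[],
                      const size_t chunk_dims[])
{
    assert(rank > 0 && start != NULL && count != NULL && chunk_dims != NULL);
    size_t n = 1;
    for (int i = 0; i < rank; i++) {
        assert(count[i] > 0 && chunk_dims[i] > 0);
        size_t first = start[i] / chunk_dims[i];                  /* first chunk index */
        size_t last  = (start[i] + count[i] - 1) / chunk_dims[i]; /* last chunk index */
        n *= last - first + 1;
    }
    return n;
}
```

With 4 x 4 chunks, a 4 x 4 selection starting at element (2, 2) touches 4 chunks, while the same selection aligned at (0, 0) touches only 1. Aligning selections to chunk boundaries is one way to avoid the read-before-write step for partially covered chunks.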
Partial I/O for chunked dataset
• For each of the four chunks:
  • Read chunk from file into chunk cache, unless it’s already there.
  • Determine which part of the chunk will be replaced by the selection.
  • Move those elements to conversion buffer and perform conversion
  • Move data elements to write from application buffer to conversion buffer
  • Move those elements back from conversion buffer to chunk cache.
  • Apply filters (compression) when chunk is flushed from chunk cache
• For each element, 3 memcopy operations are performed
[Figure: two copies of the chunk layout, each with chunks numbered 1-4.]
Partial I/O for chunked dataset
[Figure: for chunk 3, data is copied (memcopy) between application memory, the conversion buffer, and the chunk cache; the chunk is then compressed and written to the HDF5 file.]
I/O FOR VARIABLE-LENGTH DATASET
Examples of variable length data
• String
  A[0] “the first string we want to write”
  …………………………………
  A[N-1] “the N-th string we want to write”
• Each element is a variable-length record
  A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
  A[1] (0,0,110,2005) [length = 4]
  ………………………..
  A[N-1] (1,2,3,4,5,6,7,8,9,10,11,12,….,M) [length = M]
Variable length data in HDF5
• Variable length description in HDF5 application
  typedef struct {
      size_t len;
      void *p;
  } hvl_t;
• Base type can be any HDF5 type
  H5Tvlen_create(base_type)
• ~ 20 bytes overhead for each element
• Data cannot be compressed
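To show how an application might prepare variable-length records in memory, the sketch below fills an array of length/pointer pairs. So that it compiles without `hdf5.h`, it declares a local stand-in type, `vl_record_t`, with the same two fields as `hvl_t`; that type and the three helper functions are hypothetical, and a real application would use `hvl_t` and pass the array to the library with a datatype built by `H5Tvlen_create`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in with the same two fields as HDF5's hvl_t, declared
 * here only so the sketch compiles without hdf5.h. */
typedef struct {
    size_t len;  /* number of base-type elements in this record */
    void  *p;    /* pointer to the record's data */
} vl_record_t;

/* Allocate a record of `len` ints holding 0, 1, ..., len-1.
 * Returns 0 on success, -1 if allocation fails. */
int vl_record_init(vl_record_t *rec, size_t len)
{
    assert(rec != NULL);
    int *data = NULL;
    if (len > 0) {
        data = malloc(len * sizeof *data);
        if (data == NULL)
            return -1;
        for (size_t i = 0; i < len; i++)
            data[i] = (int)i;
    }
    rec->len = len;
    rec->p = data;
    return 0;
}

/* Release the memory owned by a record. */
void vl_record_free(vl_record_t *rec)
{
    assert(rec != NULL);
    free(rec->p);
    rec->p = NULL;
    rec->len = 0;
}

/* Total payload bytes across n records of elem_size-byte elements. */
size_t vl_payload_bytes(const vl_record_t recs[], size_t n, size_t elem_size)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += recs[i].len * elem_size;
    return total;
}
```

Each record owns its own buffer, so records of length 10 and 4 carry 14 elements of payload in total, and every buffer has to be freed individually once the data is no longer needed.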
How variable length data is stored in HDF5
[Figure: in the HDF5 file, the dataset header describes a dataset with variable length elements; each element is a pointer into the global heap, which holds the actual variable length data.]
Variable length datasets and I/O
• Elements from application buffer “transferred” to/from heaps in the metadata cache during I/O
[Figure: raw data in the application buffer, and pointers into a global heap held in the metadata cache.]
There may be more than one global heap
[Figure: raw data in the application buffer, with pointers into two global heaps.]
VL dataset and I/O
[Figure: the application buffer, conversion buffers, and a global heap in memory, and a global heap in the HDF5 file.]
Hints for variable length data I/O
• Avoid closing/opening a file while writing VL datasets
  • Global heap information is lost
  • Global heaps may have unused space
• Avoid alternately writing different VL datasets
  • Data from different datasets will go into the same heap
• If maximum length of the record is known, consider using fixed-length records and compression
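The last hint can be weighed with a back-of-the-envelope comparison of the two storage options before compression. The sketch below takes the roughly 20 bytes of per-element overhead quoted earlier in these slides as a constant; the constant name and both functions are hypothetical, and the estimate ignores heap fragmentation and any gain from compressing the fixed-length version.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate per-element overhead for variable-length data, as quoted
 * in the slides. */
#define VL_OVERHEAD_BYTES ((size_t)20)

/* Estimated bytes to store n variable-length records whose lengths
 * (in elements) are given in lens[]. */
size_t vl_storage_bytes(const size_t lens[], size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += lens[i] * elem_size + VL_OVERHEAD_BYTES;
    return total;
}

/* Bytes to store n records padded to max_len elements each, before
 * any compression. */
size_t fixed_storage_bytes(size_t n, size_t max_len, size_t elem_size)
{
    assert(elem_size > 0);
    return n * max_len * elem_size;
}
```

For two records of 10 and 4 four-byte elements, the variable-length estimate is 96 bytes against 80 bytes for fixed-length records padded to 10 elements, and the padded version can additionally be compressed.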
Thank You!
Questions?