Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression

CCGrid 2014


Tekin Bicer, Jian Yin and Gagan Agrawal

Ohio State University, Pacific Northwest National Laboratory

Introduction

• Increasing parallelism in HPC systems
  – Large-scale scientific simulations and instruments
  – Scalable computational throughput
  – Limited I/O performance
• Example:
  – PRACE-UPSCALE:
    • 2 TB per day; expectation: 10-100 PB per day
    • Higher precision, i.e., more computation and data
• “Big Compute” Opportunities → “Big Data” Problems
  – Large volume of output data
  – Data read and analysis
  – Storage, management, and transfer of data
• Compression


Introduction (cont.)

• Community focus
  – Storing, managing and moving scientific datasets
• Compression can further help
  – Decreased amount of data
    • Increased I/O throughput
    • Better data transfer
  – Increased simulation and data analysis performance
• But…
  – Can it really benefit the application execution?
    • Tradeoff between CPU utilization and I/O idle time
  – What about integration with scientific applications?
    • Effort required by scientists to adapt their applications
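The CPU versus I/O tradeoff above can be made concrete with a back-of-the-envelope model. The sketch below assumes compression and writing do not overlap and that bandwidth and compression speed are constant; all numbers and function names are hypothetical and are not measurements from the talk.

```python
def raw_write_time(size_mb, bandwidth_mb_s):
    """Seconds needed to write the uncompressed data."""
    return size_mb / bandwidth_mb_s


def compressed_write_time(size_mb, comp_ratio, comp_speed_mb_s, bandwidth_mb_s):
    """Seconds needed to compress the data and then write the smaller result."""
    compress_time = size_mb / comp_speed_mb_s
    write_time = (size_mb / comp_ratio) / bandwidth_mb_s
    return compress_time + write_time


def break_even_speed(comp_ratio, bandwidth_mb_s):
    """Compression speed above which compress-then-write beats a raw write."""
    return bandwidth_mb_s / (1.0 - 1.0 / comp_ratio)


# Hypothetical setting: 1 GB of output, 200 MB/s file system, 2x compression.
size_mb, bandwidth, ratio = 1024.0, 200.0, 2.0
raw = raw_write_time(size_mb, bandwidth)
fast = compressed_write_time(size_mb, ratio, 800.0, bandwidth)  # fast compressor
slow = compressed_write_time(size_mb, ratio, 300.0, bandwidth)  # slow compressor
print(raw, fast, slow, break_even_speed(ratio, bandwidth))
```

Under these assumptions the fast compressor shortens the write phase while the slow one lengthens it, which is why the choice of algorithm matters as much as the compression ratio.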


Scientific Data Management Libs.

• Widely used by the community
  – PnetCDF (NetCDF), HDF5…
• NetCDF Format
  – Portable, self-describing, space-efficient
• High Performance Parallel I/O
  – MPI-IO
    • Optimizations: Collective and Independent calls
    • Hints about file system
• No Support for Compression


Parallel and Transparent Compression for PnetCDF

• Parallel write operations
  – Size of data types and variables
  – Data item locations
• Parallel write operations with compression
  – Variable-size chunks
  – No a priori knowledge about the locations
  – Many processes write at once


Parallel and Transparent Compression for PnetCDF

Desired features while enabling compression:
• Parallel Compression and Write
  – Sparse and Dense Storage
• Transparency
  – Minimum effort from application developer
  – Integration with PnetCDF
• Performance
  – Different variables may require different compression
  – Domain specific compression algorithm


Outline

• Introduction
• Scientific Data Management Libraries
• PnetCDF
• Compression Approaches
• A Compression Methodology
• System Design
• Experimental Results
• Conclusion

Compression: Sparse Storage

• Chunks/splits are created
• Compression layer applies user-provided algorithms
• Compressed splits are written at their original offset addresses
• Can still benefit I/O
  – Only compressed data is written
• No benefit for storage
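A minimal sketch of the sparse layout, assuming `zlib` as a stand-in for the user-provided algorithm and a `bytearray` as a stand-in for the file. The chunk size and function names are illustrative only and do not come from PnetCDF.

```python
import zlib

CHUNK = 1024  # toy split size in bytes


def write_sparse(data, chunk=CHUNK):
    """Compress each split and place it at the split's ORIGINAL offset."""
    image = bytearray(len(data))  # stands in for the file; gaps stay zero-filled
    written = 0
    for off in range(0, len(data), chunk):
        piece = data[off:off + chunk]
        comp = zlib.compress(piece)
        if len(comp) > len(piece):
            raise ValueError("split did not shrink; it would overflow its slot")
        image[off:off + len(comp)] = comp
        written += len(comp)
    return image, written


def read_sparse(image, chunk=CHUNK):
    """Decompress the stream found at the start of every slot."""
    out = bytearray()
    for off in range(0, len(image), chunk):
        decomp = zlib.decompressobj()  # stops at end of stream, ignores padding
        out += decomp.decompress(bytes(image[off:off + chunk]))
    return bytes(out)


data = bytes(range(16)) * 256  # 4096 highly repetitive bytes
image, written = write_sparse(data)
print(len(image), written)
```

The logical file keeps its original length, so storage is not reduced, but far fewer bytes are actually transferred, which is where the I/O benefit comes from.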


Compression: Dense Storage

• Generated compressed splits are appended locally
• Net offset addresses are calculated
  – Requires metadata exchange
• All compressed data blocks are written using a collective call
• Generated file is smaller
  – Advantages: I/O + storage space
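The offset calculation for dense storage amounts to an exclusive prefix sum over the compressed sizes. The sketch below assumes the sizes have already been exchanged between processes (in an MPI program this could be done with a gather or scan operation); here a plain list stands in for that exchange, and the sizes are made up.

```python
def dense_offsets(compressed_sizes, base=0):
    """Exclusive prefix sum: split i starts where splits 0..i-1 end."""
    offsets = []
    cursor = base
    for size in compressed_sizes:
        offsets.append(cursor)
        cursor += size
    return offsets, cursor  # cursor is the end of the densely packed region


# Hypothetical compressed sizes (bytes) reported by four processes.
sizes = [120, 80, 200, 50]
offsets, end = dense_offsets(sizes)
print(offsets, end)
```

Because every split begins exactly where the previous one ends, the file contains no gaps, which is what yields the storage saving in addition to the I/O saving.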


Compression: Hybrid Method

• Developer provides:
  – Compression ratio
  – Error ratio
• Does not require metadata exchange
• Error padding can be used for overflowed data
• Generated file is smaller
• Relies on user inputs

Off’ = Off × (1 / (comp_ratio − err_ratio))
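A small worked instance of the offset formula, assuming `comp_ratio` means original size divided by compressed size and `err_ratio` is a safety margin subtracted from it. That reading, the rounding choice, and the numbers are assumptions made for illustration.

```python
import math


def hybrid_offset(orig_offset, comp_ratio, err_ratio):
    """Predicted offset Off' = Off / (comp_ratio - err_ratio), rounded up."""
    effective = comp_ratio - err_ratio
    if effective <= 0:
        raise ValueError("err_ratio must be smaller than comp_ratio")
    return math.ceil(orig_offset / effective)


MIB = 1 << 20
chunk = 32 * MIB                   # uncompressed split size
comp_ratio, err_ratio = 2.5, 0.5   # hypothetical user inputs

# Each process can compute its own slot without talking to the others.
slot_starts = [hybrid_offset(i * chunk, comp_ratio, err_ratio) for i in range(3)]
slot_size = hybrid_offset(chunk, comp_ratio, err_ratio)
expected_compressed = chunk / comp_ratio
print(slot_starts, slot_size, expected_compressed)
```

With these inputs each slot is 16 MiB while the expected compressed split is 12.8 MiB, so the difference acts as the error padding that absorbs splits compressing worse than predicted.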


System API

• Complexity of scientific data management libs.
• Trivial changes in scientific applications
• Requirements of a system API:
  – Defining compression function
    • comp_f (input, in_size, output, out_size, args)
  – Defining decompression function
    • decomp_f (input, in_size, output, out_size, args)
  – Registering user defined functions
    • ncmpi_comp_reg (*comp_f, *decomp_f, args, …)
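The registration pattern can be illustrated with a Python analogue. This is not the PnetCDF C API: `comp_reg`, `put_var`, `get_var`, and the per-variable registry are hypothetical names chosen to mirror the slide, and `zlib` stands in for a user-defined algorithm.

```python
import zlib

_codecs = {}  # variable name -> (comp_f, decomp_f, args); hypothetical registry


def comp_reg(var_name, comp_f, decomp_f, args=None):
    """Register user-defined compression functions for one variable."""
    _codecs[var_name] = (comp_f, decomp_f, args or {})


def zlib_comp(data, args):
    """Example user-defined compression function."""
    return zlib.compress(data, args.get("level", 6))


def zlib_decomp(data, args):
    """Example user-defined decompression function."""
    return zlib.decompress(data)


def put_var(var_name, data):
    """Write path: compress only if a codec was registered for the variable."""
    codec = _codecs.get(var_name)
    if codec is None:
        return data
    comp_f, _, args = codec
    return comp_f(data, args)


def get_var(var_name, stored):
    """Read path: the inverse of put_var."""
    codec = _codecs.get(var_name)
    if codec is None:
        return stored
    _, decomp_f, args = codec
    return decomp_f(stored, args)


comp_reg("temperature", zlib_comp, zlib_decomp, {"level": 9})
raw = b"\x00\x01" * 4096
stored = put_var("temperature", raw)
print(len(raw), len(stored))
```

The application only adds the single registration call; the write and read paths stay unchanged, which is the kind of transparency the slide asks for.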


Compression Methodology

• Common properties of scientific datasets
  – Consist of floating point numbers
  – Relationship between neighboring values
• Generic compression cannot perform well
• Domain specific solutions can help
• Approach:
  – Differential compression
    • Predict the values of neighboring cells
    • Store the difference
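One way to realize "predict, then store the difference" for floating point data is sketched below. It assumes the simplest predictor (the previous value) and takes the difference as an XOR of the 32-bit patterns; both choices are assumptions made for illustration, not necessarily the authors' scheme. The sample values are the temperature values from the deck's example header.

```python
import struct


def float_bits(x):
    """Bit pattern of x stored as an IEEE 754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def xor_residuals(values):
    """Predict each value as its predecessor and keep the XOR difference."""
    prev = 0
    residuals = []
    for v in values:
        bits = float_bits(v)
        residuals.append(bits ^ prev)
        prev = bits
    return residuals


def restore_bits(residuals):
    """Invert xor_residuals, returning the original 32-bit patterns."""
    prev = 0
    patterns = []
    for r in residuals:
        prev ^= r
        patterns.append(prev)
    return patterns


def leading_zeros(residual):
    """Number of leading zero bits in a 32-bit residual."""
    return 32 - residual.bit_length()


temps = [201.2936, 217.4867, 223.3362]
residuals = xor_residuals(temps)
print([leading_zeros(r) for r in residuals])
```

Neighboring values share their sign, exponent, and often the top mantissa bits, so the residuals start with a run of zeros that does not need to be stored.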


Example: GCRM Temperature Variable Compression

• E.g.: Temperature record
• The values of neighboring cells are highly related
• X’ table (after prediction):
• X’’ compressed values
  – 5 bits for prediction + difference
• Lossless and lossy comp.
• Fast and good compression ratios
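One plausible reading of the 5-bit field is a per-value header that records how many leading zero bits were dropped from a 32-bit difference. The sketch below encodes under that assumption; the bit-string representation and the sample residuals are illustrative only.

```python
def encode(residuals):
    """Emit a 5-bit leading-zero count followed by the remaining bits."""
    parts = []
    for r in residuals:
        lz = min(32 - r.bit_length(), 31)  # cap so the count fits in 5 bits
        nbits = 32 - lz
        parts.append(format(lz, "05b"))
        parts.append(format(r, "0{}b".format(nbits)))
    return "".join(parts)


def decode(bitstring, count):
    """Read back `count` residuals written by encode."""
    pos = 0
    residuals = []
    for _ in range(count):
        lz = int(bitstring[pos:pos + 5], 2)
        pos += 5
        nbits = 32 - lz
        residuals.append(int(bitstring[pos:pos + nbits], 2))
        pos += nbits
    return residuals


# Hypothetical 32-bit differences: one large, one small, one zero, one tiny.
sample = [0x43494B27, 0x001029A0, 0x0, 0x1]
packed = encode(sample)
print(len(packed), "bits instead of", 32 * len(sample))
```

This variant is lossless. A lossy variant could additionally drop low-order mantissa bits before encoding, at the cost of precision.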


PnetCDF Data Flow

1. Generated data is passed to PnetCDF lib.
2. Variable info. gathered from NetCDF header
3. Splits are compressed
   1. User defined comp. alg.
4. Metadata info. exchanged
5. Parallel write ops.
6. Synch. and global view
   1. Update NetCDF header
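Steps 3 to 6 can be simulated in a single process, as sketched below. The list of splits stands in for the data held by separate processes, a `bytes` object stands in for the file, and the returned index stands in for the header update. All names are hypothetical, and no MPI or PnetCDF calls are involved.

```python
import zlib


def parallel_write(splits, comp_f):
    """Simulate compress, metadata exchange, write, and header update."""
    # Step 3: every "process" compresses its own split.
    compressed = [comp_f(s) for s in splits]
    # Step 4: metadata exchange; each process learns all compressed sizes.
    sizes = [len(c) for c in compressed]
    offsets = [sum(sizes[:i]) for i in range(len(sizes))]
    # Step 5: every process writes its block at its own offset.
    image = bytearray(sum(sizes))
    for off, block in zip(offsets, compressed):
        image[off:off + len(block)] = block
    # Step 6: global view; the (offset, size) index would go into the header.
    index = list(zip(offsets, sizes))
    return bytes(image), index


def parallel_read(image, index, decomp_f):
    """Use the index to locate and decompress every split."""
    return [decomp_f(image[off:off + size]) for off, size in index]


splits = [bytes([i]) * 2048 for i in range(4)]  # four toy splits
image, index = parallel_write(splits, zlib.compress)
print(len(image), index)
```

The index is what makes later reads possible: without it, a reader could not tell where each variable-size compressed split begins.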


Experimental Setup

• Local cluster:
  – Each node has 8 cores (Intel Xeon E5630, 2.53 GHz)
  – Memory: 12 GB
• Infiniband network
  – Lustre file system: 8 OSTs, 4 storage nodes
  – 1 Metadata Server
• Microbenchmarks: 34 GB
• Two data analysis applications: 136 GB dataset
  – AT, MATT
• Scientific simulation application: 49 GB dataset
  – Mantevo Project: MiniMD


Exp: (Write) Microbenchmarks


Exp: (Read) Microbenchmarks


Exp: Simulation (MiniMD)

Figure panels: Application Execution Times; Application Write Times

Exp: Scientific Analysis (AT)


Conclusion

• Scientific data analysis and simulation applications
  – Deal with massive amounts of data
• Management of “Big Data”
  – I/O throughput affects performance
  – Need for transparent compression
  – Minimum effort during integration
• Proposed two compression methods
• Implemented a compression layer in PnetCDF
  – Ported our proposed methods
  – Scientific data compression alg.
• Evaluated our system
  – MiniMD: 22% better performance, 25.5% less storage space
  – AT, MATT: 45.3% better performance, 47.8% less storage space


Thanks


PnetCDF: Example Header

netcdf temperature_v4 {
dimensions: // names and lengths
    time = UNLIMITED ;
    interfaces = 27 ;
    cells = 2621442 ;
    corners = 5242880 ;
    ...
variables: // type, name and attributes
    double time(time) ;
        time:long_name = "Time" ;
        time:units = "days since 1901-01-01" ;
    ...
    float temperature(time, cells, interfaces) ;
        // variable attributes
        temperature:long_name = "Potential temperature" ;
        temperature:units = "K" ;
    ...
data: // beginning of data
    time = 777600, 788400, 799200, 810000, … ;
    temperature = 201.2936, 217.4867, 223.3362, … ;
}

Exp: Microbenchmarks

• Dataset size: 34 GB
  – Timestep: 270 MB
• Comp.: 17.7 GB
  – Timestep: 142 MB
• Chunk size: 32 MB
• # Processes: 64
• Stripe count: 8

Comparing Write Times with Varying Stripe Sizes
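The compression ratio and space saving implied by the sizes listed on this slide can be checked with a few lines of arithmetic. The helper names are illustrative; the input numbers are the ones from the slide.

```python
def compression_ratio(original, compressed):
    """Original size divided by compressed size."""
    return original / compressed


def space_saved_pct(original, compressed):
    """Percentage of the original size that compression removes."""
    return 100.0 * (1.0 - compressed / original)


dataset_gb = (34.0, 17.7)      # full dataset: original, compressed
timestep_mb = (270.0, 142.0)   # one timestep: original, compressed

for label, (orig, comp) in (("dataset", dataset_gb), ("timestep", timestep_mb)):
    print(label,
          round(compression_ratio(orig, comp), 2),
          round(space_saved_pct(orig, comp), 1))
```

Both granularities give a ratio a little under 2, so roughly half of the bytes no longer need to be written.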
