NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Scaling Up Parallel I/O on the SP
David Skinner, NERSC Division, Berkeley Lab
Motivation
• NERSC uses GPFS for $HOME and $SCRATCH
• Local disk filesystems on seaborg (/tmp) are tiny
• Growing data sizes and concurrencies often outpace I/O methodologies
Seaborg.nersc.gov
Case Study: Data Intensive Computing at NERSC
• Binary black hole collisions
• Finite differencing on a 1024x768x768x200 grid
• Run on 64 NH2 nodes with 32 GB RAM each (2 TB total)
• Need to save regular snapshots of the full grid
The first full 3D calculation of inward-spiraling black holes, done at NERSC by Ed Seidel, Gabrielle Allen, Denis Pollney, and Peter Diener (Scientific American, April 2002).
Problems
• The binary black hole collision uses a modified version of the Cactus code to solve Einstein’s equations. Its choices for I/O are serial and MPI-I/O.
• CPU utilization suffers as time is lost to I/O.
• Variation in write times can be severe.
[Chart: time (s) to write 100 GB over iterations 1–4; y-axis 0–500 s]
Finding solutions
• The data access pattern is a common one
• Survey I/O strategies to determine the write rate and its variation
Parallel I/O Strategies
Multiple File I/O
if (private_dir) rank_dir(1, rank);  /* rank_dir(): deck's helper to enter (1) / leave (0) a per-task directory */
fp = fopen(fname_r, "w");            /* fname_r: per-rank file name */
fwrite(data, nbyte, 1, fp);
fclose(fp);
if (private_dir) rank_dir(0, rank);
MPI_Barrier(MPI_COMM_WORLD);
Single File I/O
fd = open(fname, O_CREAT | O_RDWR, S_IRUSR);
lseek(fd, (off_t)rank * (off_t)nbyte, SEEK_SET);  /* each task writes its own region of the shared file */
write(fd, data, nbyte);
close(fd);
MPI-I/O
MPI_Info_create(&mpiio_file_hints);
MPI_Info_set(mpiio_file_hints, "IBM_largeblock_io", "true");  /* hint used throughout; see the Large block I/O slide */
MPI_File_open(MPI_COMM_WORLD, fname, MPI_MODE_CREATE | MPI_MODE_RDWR,
              mpiio_file_hints, &fh);
MPI_File_set_view(fh, (MPI_Offset)rank * nbyte, MPI_DOUBLE, MPI_DOUBLE,
                  "native", mpiio_file_hints);
MPI_File_write_all(fh, data, ndata, MPI_DOUBLE, &status);
MPI_File_close(&fh);
Results
Scaling of single file I/O
Scaling of multiple file and MPI I/O
Large block I/O
• MPI-I/O on the SP includes the file hint IBM_largeblock_io
• IBM_largeblock_io=true was used throughout; the default setting shows large variation
• IBM_largeblock_io=true also turns off data shipping
Large block I/O = false
• MPI-I/O on the SP includes the file hint IBM_largeblock_io
• Except in this plot, IBM_largeblock_io=true was used throughout
• IBM_largeblock_io=true also turns off data shipping
Bottlenecks to scaling
• Single file I/O has a tendency to serialize
• Scaling up with multiple files creates filesystem problems
• Akin to data shipping, consider the intermediate case
Parallel IO with SMP aggregation (32 tasks)
Parallel IO with SMP aggregation (512 tasks)
Summary
[Chart: recommended strategy by concurrency (16–2048 tasks) vs. data size (1 MB–100 GB); strategies: Serial, Multiple File, Multiple File mod n, MPI IO, MPI IO collective]
Future Work
• Testing the NERSC port of NetCDF to MPI-I/O
• Comparison with GPFS on Linux/Intel: the NERSC/LBL Alvarez cluster (84 two-way SMP Pentium nodes, Myrinet 2000 fiber optic interconnect)
• Testing GUPFS technologies as they become available