12/20/2010 1
REMEM: REmote MEMory as Checkpointing Storage
Hui Jin, Illinois Institute of Technology
Xian-He Sun, Illinois Institute of Technology
Yong Chen, Oak Ridge National Laboratory
Tao Ke, Illinois Institute of Technology
CloudCom 2010
Outline
Background & Motivation
REMEM Design
Implementation of REMEM on Open MPI
Adaptive Checkpointing Storage Selection
Experimental Results
Conclusions & Future Work
Motivation
Checkpointing is a widely used mechanism for supporting fault tolerance in High-Performance Computing (HPC) environments. However, it introduces considerable overhead because of the high cost of I/O access.
For a 1-petaFLOPS system, checkpointing can potentially degrade system performance by 50% [R. Oldfield et al., 2007].
The upcoming exascale computing environment poses even greater challenges:
10^18 FLOPS of computing power.
Millions of computing components.
Checkpointing to a centralized parallel file system is not scalable.
What if the MTBF is shorter than the checkpointing cost?
A Detailed Look at Checkpointing Cost
J. Hursey et al., "Interconnect Agnostic Checkpoint/Restart in Open MPI", HPDC 2009
Motivation
Memory-based checkpointing is a promising way to break through the stable-storage bottleneck. But it:
Is rarely supported by current mainstream checkpointing systems.
Adds complexity.
Raises reliability concerns.
Consumes excess memory.
REMEM
REmote MEMory as Checkpointing Storage.
Seamless integration with existing checkpointing systems.
Flexible switching between disk and remote memory as checkpointing storage.
Consideration of reliability and space efficiency.
REMEM – Design Goals
Reliability: memory is volatile.
Scalability: large-scale environments.
Space efficiency: memory is precious.
Transparency: augment existing systems.
Flexibility: switch between disk and memory.
REMEM Design
REMEM – Node Matching
Reliability, for k simultaneous failures among n nodes:
$$\frac{C_{n/2}^{k}\,2^{k}}{C_{n}^{k}} \qquad\qquad \frac{C_{n-k}^{k} + C_{n-k-1}^{k-1}}{C_{n}^{k}}$$
Z. Chen et al., "Fault Tolerant High Performance Computing by a Coding Approach", PPoPP 2005
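With pairwise node matching, n nodes form n/2 disjoint source/backup pairs. The job survives k simultaneous failures only if no pair loses both members. That happens in C(n/2, k)·2^k of the C(n, k) equally likely failure sets: choose k distinct pairs, then one member of each. The sketch below evaluates this ratio and checks it by brute-force enumeration. It assumes that failures are uniformly random. The names `pair_reliability` and `pair_reliability_bruteforce`, and the 2i/2i+1 pairing, are hypothetical illustrations, not part of REMEM.

```python
from itertools import combinations
from math import comb


def pair_reliability(n: int, k: int) -> float:
    """Probability that k uniformly random node failures never hit both
    members of a pair, when n nodes are split into n/2 disjoint pairs."""
    if n % 2 != 0 or not 0 <= k <= n:
        raise ValueError("need an even n and 0 <= k <= n")
    # Choose k distinct pairs, then one of the 2 members in each pair.
    # math.comb returns 0 when k > n/2, so the result correctly drops to 0.
    return comb(n // 2, k) * 2**k / comb(n, k)


def pair_reliability_bruteforce(n: int, k: int) -> float:
    """Cross-check by enumerating every k-subset of failed nodes."""
    # Hypothetical matching for illustration: node 2i backs up to node 2i+1.
    pairs = [(2 * i, 2 * i + 1) for i in range(n // 2)]
    survived = total = 0
    for failed in combinations(range(n), k):
        down = set(failed)
        total += 1
        if not any(a in down and b in down for a, b in pairs):
            survived += 1
    return survived / total


if __name__ == "__main__":
    for n, k in [(8, 2), (8, 3), (64, 2)]:
        print(n, k, pair_reliability(n, k))
```

For 64 nodes in 32 pairs and 2 simultaneous failures, the ratio is 62/63 (about 0.984). Only the 32 failure sets that take out a whole pair are fatal.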
REMEM – System Configuration
REMEM: Failure Handling
If a failure occurs on the source node:
If the backup node is healthy, simply recover from remote memory.
If the backup node has also failed, load the image from the last disk-based checkpoint.
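The failure-handling rule can be sketched as a job-level decision. This sketch assumes that the coordinated checkpoints must be restored as one consistent set. Under that assumption, losing any single in-memory image forces the whole job back to the last disk-based checkpoint. `RestartSource`, `choose_restart_source`, and the `backup_of` map are hypothetical names used only for illustration.

```python
from enum import Enum


class RestartSource(Enum):
    REMOTE_MEMORY = "latest images held in backup nodes' memory"
    DISK = "last disk-based checkpoint"


def choose_restart_source(failed: set[str], backup_of: dict[str, str]) -> RestartSource:
    """Decide where the whole job restarts from after `failed` nodes go down.

    `backup_of` maps each source node to the node whose memory holds its
    checkpoint image (hypothetical structure for illustration).
    """
    for node in failed:
        if backup_of[node] in failed:
            # Source and backup both lost: this node's in-memory image is gone,
            # so the consistent global checkpoint must come from disk.
            return RestartSource.DISK
    # Every failed node's backup is healthy: restart from remote memory.
    return RestartSource.REMOTE_MEMORY


if __name__ == "__main__":
    backup_of = {"a": "b", "b": "a", "c": "d", "d": "c"}
    print(choose_restart_source({"a", "c"}, backup_of).value)
    print(choose_restart_source({"c", "d"}, backup_of).value)
```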
REMEM: Implementation on Open MPI
Open MPI is an open-source MPI-2 implementation that provides a high-performance, robust parallel execution environment for a wide variety of computing environments.
It supports transparent, coordinated checkpoint/restart, implemented primarily with the BLCR library.
REMEM: Implementation on Open MPI
Adaptive Checkpointing Storage Selection
Disk:
Memory:
Experimental Setup
Hardware: a 65-node SunFire cluster.
Compute nodes: dual 2.3 GHz quad-core Opteron processors, 8 GB memory, and a 250 GB 7,200-RPM SATA hard drive.
OS: Ubuntu enterprise server with Linux kernel 2.6.10.
Software: Open MPI v1.3.3 and GCC 4.3.3. REMEM was implemented on Open MPI with the support of tmpfs and NFS 3.0.
Experimental Setup
The 64 compute nodes are naturally organized into two groups by rack ID. Nodes in the two groups are mutually mapped for REMEM.
Four dedicated X2200 nodes are configured as PVFS2 servers.
Results were obtained with the NAS Parallel Benchmarks (NPB) version 3.3.
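A cross-rack mutual mapping like the one used in the experiments can be built by pairing the i-th node of one rack group with the i-th node of the other, in both directions. This is a minimal sketch. It assumes equally sized groups (32 nodes per rack for 64 compute nodes). The host names and the `mutual_mapping` helper are hypothetical.

```python
def mutual_mapping(group_a: list[str], group_b: list[str]) -> dict[str, str]:
    """Pair node i of one rack group with node i of the other, in both directions."""
    if len(group_a) != len(group_b):
        raise ValueError("rack groups must be the same size")
    backup_of = {}
    for a, b in zip(group_a, group_b):
        backup_of[a] = b  # a's checkpoint image goes to b's memory
        backup_of[b] = a  # and b's checkpoint image goes to a's memory
    return backup_of


if __name__ == "__main__":
    # Hypothetical host names: 32 nodes per rack, 64 compute nodes total.
    rack1 = [f"rack1-node{i:02d}" for i in range(32)]
    rack2 = [f"rack2-node{i:02d}" for i in range(32)]
    mapping = mutual_mapping(rack1, rack2)
    print(len(mapping), mapping["rack1-node00"])
```

Because every node's backup lives in the other rack, a failure confined to one rack never removes both a node and its backup. That is one plausible reason for grouping by rack ID.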
REMEM Performance
Problem Size Scaling Performance
Task Scaling Performance
Adaptive Checkpointing Storage Selection
We simulate a cluster of 2,048 nodes. For each node, we generate a series of failure arrivals from a Weibull distribution with MTBF = 7,668 hours and shape parameter 0.7.
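Per-node Weibull failure arrivals can be generated with Python's standard `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape. A Weibull variable has mean scale·Γ(1 + 1/shape), so the scale is chosen to make that mean equal the stated MTBF. The sketch treats the gaps between failures as independent Weibull draws, which is an assumption (a renewal process). The one-year horizon and the helper names are hypothetical.

```python
import math
import random

MTBF_HOURS = 7668.0  # per-node mean time between failures (from the slide)
SHAPE = 0.7          # Weibull shape parameter (from the slide)


def weibull_scale(mtbf: float, shape: float) -> float:
    """Scale parameter whose Weibull mean equals the given MTBF."""
    return mtbf / math.gamma(1.0 + 1.0 / shape)


def failure_times(horizon_hours: float, rng: random.Random,
                  mtbf: float = MTBF_HOURS, shape: float = SHAPE) -> list[float]:
    """Failure arrival times on one node, assuming i.i.d. Weibull gaps."""
    scale = weibull_scale(mtbf, shape)
    t, times = 0.0, []
    while True:
        t += rng.weibullvariate(scale, shape)
        if t > horizon_hours:
            return times
        times.append(t)


def simulate_cluster(num_nodes: int = 2048, horizon_hours: float = 24 * 365,
                     seed: int = 0) -> dict[int, list[float]]:
    """Failure arrival times for every node over the simulated horizon."""
    rng = random.Random(seed)
    return {node: failure_times(horizon_hours, rng) for node in range(num_nodes)}


if __name__ == "__main__":
    cluster = simulate_cluster()
    print("total failures in one year:", sum(len(v) for v in cluster.values()))
```

A shape parameter below 1 gives a decreasing hazard rate. Failures therefore cluster more than they would under an exponential model with the same MTBF.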
Adaptive Checkpointing Storage Selection – Metrics
[Figure: breakdown of execution time into useful work, checkpoint cost, restart cost, and rework cost]
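The metrics split execution time into useful work, checkpoint cost, restart cost, and rework cost. The sketch below tallies these four components for one job under a deliberately simple, hypothetical model, not the simulator used in the evaluation. The model assumes a fixed checkpoint interval, fixed checkpoint and restart costs, and failure times given in advance. A failure discards all work since the last completed checkpoint, and failures during a restart are ignored. `Breakdown` and `replay` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    useful: float = 0.0      # computation that is kept
    checkpoint: float = 0.0  # time spent writing checkpoints
    restart: float = 0.0     # time spent restarting after failures
    rework: float = 0.0      # computation (or partial checkpoint) lost to failures

    def total(self) -> float:
        return self.useful + self.checkpoint + self.restart + self.rework


def replay(work: float, interval: float, ckpt_cost: float,
           restart_cost: float, failures: list[float]) -> Breakdown:
    """Replay one job of `work` hours, checkpointing after every `interval`
    hours of computation (no checkpoint after the final segment)."""
    segments = [interval] * int(work // interval)
    if work % interval > 0:
        segments.append(work % interval)
    failures = sorted(failures)
    b = Breakdown()
    clock, s, i = 0.0, 0, 0
    while s < len(segments):
        # Drop failures that fall before now (e.g. during a restart).
        while i < len(failures) and failures[i] < clock:
            i += 1
        seg = segments[s]
        cost = ckpt_cost if s < len(segments) - 1 else 0.0
        seg_end = clock + seg + cost
        if i < len(failures) and failures[i] < seg_end:
            # Failure before this segment's checkpoint completes: redo it.
            b.rework += failures[i] - clock
            b.restart += restart_cost
            clock = failures[i] + restart_cost
            i += 1
        else:
            b.useful += seg
            b.checkpoint += cost
            clock = seg_end
            s += 1
    return b


if __name__ == "__main__":
    print(replay(work=10.0, interval=4.0, ckpt_cost=1.0,
                 restart_cost=2.0, failures=[6.0]))
```

Under this model, comparing disk and remote memory as checkpointing storage comes down to plugging different `ckpt_cost` and `restart_cost` values into the same replay.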
Adaptive Checkpointing Storage Selection
Performance with Different Numbers of Processes
Adaptive Checkpointing Storage Selection
Performance with Different Numbers of I/O Nodes
Adaptive Checkpointing Storage Selection
Performance with Different Checkpointing Intervals
Future Work
Release the software.
More flexible node matching.
What does HPC checkpointing look like in the cloud?
Adopt MapReduce as checkpointing storage?
Conclusions
It is feasible to implement memory-based checkpointing seamlessly.
Remote memory is a promising alternative to disk as checkpointing storage.
Memory should be used in combination with disk to guarantee reliability while achieving efficiency.