Upload
roland-thompson
View
220
Download
0
Tags:
Embed Size (px)
Citation preview
Improving the Scalability of Comparative Debugging with MRNet
MeSsAGE Lab (Monash Uni.)
• David Abramson
• Minh Ngoc Dinh
• Jin Chao
Cray Inc.
• Luiz DeRose
• Robert Moench
• Andrew Gontarek
Jin Chao
Outline
Assertion-based comparative debugging
The architecture of Guard
Improving the scalability of Guard with MRNet
Performance evaluation
Conclusion and future work
Challenge faced by parallel debugging (1)
Cognitive challenge Large dataset on popular supercomputer:
Cray Blue Water:•> 1.5PB aggregated memory•> 380,000 CPU cores
A large set of scientific data that is non-readable to human :•Multi-dimensional•Floating-point
What is the correct state?
Challenge faced by parallel debugging (2)
Control-flow based parallel debuggers
Limitations of visualization tools Errors in visualized presentations are hard to detect
Comparative debugging
Data-centric debugging: focusing the data set in parallel programs
Comparing state between programs Porting codes across different platforms Re-writing codes with different languages Software evolution: Modifying/improving existing codes
Comparing state with users’ expectations Invariants based on the properties of scientific modelling
or mathematical theories “Verifying” the correctness of computing status
Assertion for Comparative debugging
“An assertion is a statement about an intended state of a system’s component.”
Assertion in Guard: An ad hoc debug-time assertion Simple assertion:
assert $a::p@“source.c”:33 = 5000
31: for(j = 0; j < n; j++) {32: for (i = 0; i < m; i++) {33: p[j][j] = 5000; }}
31: for(j = 0; j < n; j++) {32: for (i = 0; i < m; i++) {33: p[j][i] = 5000; }}
Incorrect code Correct code
Assertion for Comparative debugging
“An assertion is a statement about an intended state of a system’s component.”
Assertion in Guard: An ad hoc debug-time assertion Simple assertion: assert $a::var@“source.c”:34 = 1024
Comparative assertions Observing the divergence in the key data structures as the
programs execute
assert $a::p_array@“prog1.c”:34 = $b::p_array@“prog2.c”:37
Statistical assertions Verifying the statistical properties of scientific modelling or
mathematical theories:• standard deviation• histogram
Example of statistical assertion: histogram
Histogram: user-defined abstract data model Creating the model with the two phase operation
• create $model=randset (Gaussian, 100000, 0.05)• set reduce histogram(1000, 0.0, 1.0)• assert $a::[email protected]:10~$model < 0.02
estimate operator: ~• 2 goodness of fit test
Implementation of assertionsAn assertion is compiled into a dataflow graph
Dataflow machine
set reduce “sum”assert $a::p_array@“prog.c”:34 > 1,000,000
PROCSET($a)
Go
Set BKPT
BKPT Hit
Get VAR
Compare
1,000,000
Exit
Reduce Sum(p_array)
Features of Guard (CCDB)
A general parallel debugger Supporting C, FORTRAN and MPI
A relative debugger Comparative assertions and statistical assertions
Client/Server structure: Machine independent command line client Visual Studio Client for Windows SUN One Studio and IBM Eclipse
Supporting servers from different architecture: Unix: SUN (Solaris), x86 (Linux), IBM RS6000 (AIX) Windows: Visual Studio .NET
Architecture Independent Format (AIF)
The architecture of Guard
p0 p1 p2 pn
s0 s1 s2 sn
Back-end:
Front-end:
Debug Client
GDB GDBGDB GDB
SocketNetwork:
Relative debugger
CLI
Features of MRNet
Tree-Based Overlay Networks (TBONs)
Scalable broadcast and gather
Custom data aggregation
General purpose API of MRNet
User-defined tree topology Topology file: k-ary, k-nomial and tailored layout
Communicator A set of back-ends
Stream A logical data channel over a communicator Multicast, gather and custom reduction
Packet Collection of data
Filter Modify data transferred across it WaitForAll, TimeOut, NoWait
Startup: launching back-end processes
The architecture of Guard with MRNet
p0 p1 p2 pn
Back-end:
Front-end:
MRNet Front-end
Debug Client
Command Line Interface
Comparative debugger
C Wrapper
Guard filter
sns2s1s0 MRNet BE
CP
CP
CP
GDB GDBGDB GDB
Communication with MRNetCommunication patterns in Guard
General parallel debugger
Topology Balanced, K-nomial Placement of MRNet internal nodes
• Requiring no additional resources
procset: Communicator
Channels for commands, events and I/O Command stream: synchronous channel Event and I/O Stream: asynchronous channel
Aggregating redundant messages: WaitForAll filter: synchronous channel TimerOut filter: asynchronous channel
Creating a communication tree of MRNet
Back-end:Front-end:
Guard Client
aprun apinit
p0
pn
launch helper agent
s0
sn
MRNet FE
Invoking a debug session on Cray:
GDB
GDB
Comparative assertion with MRNet (1)
assert $a::p_array@“prog1.c”:34 = $b::p_array@“prog2.c”:37
Back-end:
Front-end:
Guard client
CP
CP CP
BE sns1s0 BE sm
s1s0
CP
CP
CP
FE0 FE1
Comparative assertion with MRNet (2)Centralized comparison:
p0 p1p0 p1 p2
p3Back-end:
Front-end:
s0 s1 s2 s3 s0 s1 s2 s3d0 d1 d2 d3
s0 s1 s2 s3
d0 d1 d2 d3
s2 s3
Reconstruct and compare
MRNet MRNet
Data Reconstruction
Currently support regular decompositionsBlockmap:
Assertions require the debugger to understand data decomposition
P1
P2
P3
P4
p_array
s_array
blockmap test(P::V)define distribute(block, *)define data(1024, 1024)
end
Point-to-point comparison:
Comparative assertion with MRNet (3)
p0 p1p0 p1 p2
p3
Back-end:
Front-end:
d0 d1 d2 d3dd d1 d2 d3
r0 r1
Comparison results
d0 d1 dd d3
r2 r3
MRNet MRNet
s0 s1 sn
Statistical assertion with MRNet
Standard deviation: two phase operation Parallel: calculate a set of primary statistics Aggregation: form a full statistical model
Back-end:
Front-end: Guard client
CP
CP
CP
ps0 ps1 psn
Aggregation
Performance Evaluation
The configuration of MRNet (3.1.0) : A balanced topology Fan-out: 64
Test bed:‘Hera’ Cray XE6 Gemini 1.2 system 752 computing nodes, each with 32GB of memory. 21,760 CPU cores totally
General Parallel Debugger SP (Scalar Pentagidgonal) of NAS Parallel Benchmarks
(NPB) 3.3.1Relative Debugger
Comparative assertion: data examples from WRF Statistical assertion: molecular dynamics simulation
Scalability of Invoke Command
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
x 104
0
2
4
6
8
10
12
14
16
18
The Number of Parallel Processes
Tim
e(se
cond
)
Total-MRNetMRNet InstantiationMRNet AttachmentMRNet Overall
Latency of Debugging Commands
0 2000 4000 6000 8000 10000 12000 14000 1600010
-3
10-2
10-1
100
The Number of Parallel Processes
Tim
e (s
econ
d)
All bkpt (MRNet)All step (MRNet)
Message Aggregation
A memory buffer of 256 bytes was added into the SP program. Its value was inspected under different degrees of aggregation (DoA).
0 2000 4000 6000 8000 10000 12000 14000 1600010
-3
10-2
10-1
100
The Number of Parallel Processes
Tim
e (s
econ
d)
DoA = 1(MRNet)DoA = 0.1(MRNet)DoA = 0.01(MRNet)
Comparative assertion
320 KBytes*10,000=3GB. The size of data is from WRF, a production climate model.
Statistical assertion Molecular dynamics simulation: 209*106 atoms
The atomistic mechanism of fracture accompanying structural phase transformation in AIN ceramic under hypervelocity impact.
0 2000 4000 6000 8000 10000 1200010
-1
100
101
102
The Number of Parallel Processes
Tim
e (s
econ
d)
Overall-HistogramReduction-HistogramOverall-StdevReduction-Stdev
Conclusion and Future Work
MRNet improves the scalability of Guard >20,000 debug servers
The overhead of creating an MRNet tree Attachment: Instantiation: one BE process per core
How to take advantage of computing capability of MRNet Comparison across multiple trees?
• Building a tree across multiple aprun? Programming filter for handling aggregations of
statistical assertions The best way of maintaining C wrapper for C++
interface ?
Related publications• Dinh M.N., Abramson D., Jin C., Gontarek A., Moench R., and DeRose L.,
Debugging Scientific Applications With Statistical Assertion, in International Conference on Computational Science (ICCS), Omaha, USA, 2012.
• Dinh M.N., Abramson D., Jin C., Gontarek A., Moench R., and DeRose L., Scalable parallel debugging with statistical assertions. ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP) 2012, New Orleans, USA. (Poster session)
• Jin C., Abramson D., Dinh M.N., Gontarek A., Moench R., and DeRose L., A Scalable Parallel Debugging Library with Pluggable Communication Protocols, the 12th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2012),Ottwa, Canada, May 2012.
• Dinh M.N., Abramson D., M.N., Kurniawan, Jin C., Moench R., and DeRose L., Assertion based parallel debugging, the 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011), Newport Beach,USA, May 2011.
• Abramson, D., Dinh, M.N., Kurniawan, D., Moench, B. and DeRose, L. Data Centric Highly Parallel Debugging, 2010 International Symposium on High Performance Distributed Computing (HPDC 2010), Chicago, USA, June 2010.
• Abramson D., Foster, I., Michalakes, J. and Sosic R., Relative Debugging and its Application to the Development of Large Numerical Models, Proceedings of IEEE Supercomputing 1995, San Diego, December 95. (Best paper award)