Overview of the New Blue Gene/L Computer
Dr. Richard D. Loft
Deputy Director of R&D
Scientific Computing Division
National Center for Atmospheric Research
Outline
• What is Blue Gene/L and why is it interesting?
• How did one end up at NCAR?
• What is the objective of the NCAR Blue Gene/L project?
• What is its current status?
• How do I get an account on Blue Gene/L?
Why Blue Gene/L is Interesting
• Features
  – Massive parallelism: fastest in the world (137 Tflops)
  – High packaging density (2048 PEs/rack)
  – Low power per processor (25 kW/rack)
  – Dedicated reduction network (solver scalability)
  – Network interfaces on the chip (embedded technology)
  – Conventional programming model:
    • xlf90 and xlc compilers
    • MPI
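To show just how conventional the model is, here is a minimal MPI sketch in C. It assumes only a standard MPI toolchain; the exact compiler wrapper used on BG/L is site-specific and not stated on the slide.

    /* Minimal sketch of the conventional BG/L programming model:
       plain C plus MPI, no machine-specific API. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's ID   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total task count */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

The same source runs unchanged on a workstation or on thousands of BG/L nodes; only the partition size differs.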
Fuel Efficiency: Gflops/Watt
[Chart: Gflops/Watt (axis 0 to 0.9) for the Top 20 systems, based on processor power rating only. The Blue Gene/L systems (DD1 prototype, DD2 prototype, DD2 beta) lead the field; also shown are the SGI Altix, Earth Simulator, ASCI Q, ASCI White, Cray X1, eServer pSeries 655/690, and assorted Xeon, Opteron, and Itanium2 clusters.]
BG/L Questions/Limitations
• Questions
  – High reliability? (1/N effect)
  – Applications for 100K processors? (Amdahl's Law)
  – System robustness: I/O, scheduling flexibility
• Limitations
  – Node memory limitation (512 MB/node)
  – Partitioning is quantized (powers of two)
  – Simple node kernel (no forks -> threads -> OpenMP)
  – No support for multiple executables
BlueGene/L ASIC
[Block diagram: the BlueGene/L compute ASIC. Two PowerPC 440 cores (one acting as I/O processor), each with 32k/32k L1 caches and a "Double FPU", connect through L2 caches and a multiported shared SRAM buffer (PLB 4:1) to a shared L3 directory (with ECC) for 4 MB of embedded DRAM, usable as L3 cache or memory. A DDR controller with ECC drives 256 MB of 144-bit-wide external DDR. On-chip network interfaces: Torus (6 out and 6 in, each link at 1.4 Gbit/s), Tree (3 out and 3 in, each link at 2.8 Gbit/s), Global Interrupt (4 global barriers or interrupts), Gbit Ethernet, and JTAG access. Internal buses range from 2.7 GB/s to 22 GB/s.]
The Blue Gene/L Architecture
BlueGene/L Has Five Networks
• 3-Dimensional Torus
  – interconnects all compute nodes
  – 175 MB/sec/link bidirectional
• Global Tree
  – point-to-point, one-to-all broadcast, and reduction functionality
  – 1.5 microsecond latency (at 64K nodes)
• Global Interrupts
  – AND/OR operations for global barriers
  – 1.5 microsecond latency (64K system)
• Ethernet
  – incorporated into every node ASIC
  – active in the I/O nodes (1:64 in the LLNL configuration)
  – 1K 1-Gbit links carry all external communication (file I/O, control, user interaction, etc.)
• JTAG (Control)
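The tree network is what gives global reductions their scalability. In application code this is just an ordinary MPI collective, as in the hedged sketch below; this is generic MPI, not a BG/L-specific API, and whether a given call maps onto the tree hardware is up to the MPI library.

    /* Sketch: a global sum of the kind the BG/L MPI library can route
       over the tree (collective) network instead of the torus. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        double local = 1.0, global = 0.0;
        MPI_Init(&argc, &argv);
        /* Every rank contributes its local value; the reduction
           combines them tree-fashion across all nodes. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        printf("global sum = %f\n", global);  /* equals the rank count */
        MPI_Finalize();
        return 0;
    }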
BlueGene/L System Software Architecture
• User applications execute exclusively on the compute nodes
  – avoids asynchronous events (e.g., daemons, interrupts)
• The outside world interacts only with the I/O nodes, which act as an offload engine
  – standard solution: Linux
• Machine monitoring and control are likewise offloaded to service nodes: a large SP system or Linux cluster
Blue Gene/L system overview
Blue Gene/L @ NCAR
How did one get to NCAR?
• MRI proposal in partnership with the CU campuses
• Elements of the MRI proposal to NSF: proving out an experimental architecture
  – Application porting and scalability
  – System software testing
    • Parallel file systems (Lustre, GPFS)
    • Schedulers (LSF, SLURM, COBALT)
  – Education
BlueGene/L Collaboration
[Diagram: the collaboration, linking NCAR, CU Boulder, and CU Denver around the Blue Gene/L system]
BlueGene/L Collaborators
• NCAR
  – Richard Loft
  – Janice Coen
  – Stephen Thomas
  – Wojciech Grabowski
• CU Boulder
  – Henry Tufo
  – Xiao-Chuan Cai
  – Charbel Farhat
  – Thomas Manteuffel
  – Stephen McCormick
• CU Denver
  – Jan Mandel
  – Andrew Knyazev
Details of NCAR/CU Blue Gene/L
• 2048 processors, 5.73 Tflops peak
• 4.61 Tflops on the Linpack benchmark
• Unofficially, the 33rd fastest system in the world (in one rack!)
• 6 TB of high-performance disk
• Delivered to the Mesa Lab: March 15th
• Acceptance tests
  – began March 23rd
  – completed March 28th
  – first PI meeting March 30th
BG/L Front-End Architecture
Bring-up of Frost BG/L System
• Criteria for readiness
  – Scheduler
  – Fine-grain partitions
  – I/O subsystem ready
  – MSS connection
Current “Frost” BG/L Status
• MSS connections in place
• I/O system issues appear to be behind us
• Partition definitions (512, 256, 128, 4x32) in place
• Codes ported: POP, WRF, HOMME, BOB, BGC5 (pointwise)
• Biggest applications issue: memory footprint
• Establishing relationships with other centers
  – BG/L Consortium membership
  – Other BG/L sites: SDSC, Argonne, LLNL, Edinburgh
“Frost” BG/L I/O performance
[Chart: mean aggregate I/O rates on compute nodes. Throughput (MB/sec, axis 0 to 800) versus number of concurrent processes (0 to 1200), with separate curves for write rate and read rate.]
– Each process wrote or read 1 GB of data
– The I/O request size was 1 MB
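The slide does not name the benchmark used, but a per-process test with these parameters reduces to a simple streaming loop. The sketch below is an illustrative assumption, not the actual benchmark code; the file name and buffered-I/O choice are hypothetical.

    /* Sketch of one process's share of the I/O test above:
       1 GB written as 1024 requests of 1 MB each.
       File name and use of stdio are illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define REQ_SIZE (1UL << 20)   /* 1 MB per request        */
    #define N_REQS   1024          /* 1024 x 1 MB = 1 GB total */

    int main(void)
    {
        char *buf = malloc(REQ_SIZE);
        FILE *fp = fopen("frost_io_test.dat", "wb");
        if (!buf || !fp) return 1;
        memset(buf, 0xAB, REQ_SIZE);
        for (unsigned long i = 0; i < N_REQS; i++)
            fwrite(buf, 1, REQ_SIZE, fp);  /* timing brackets this loop */
        fclose(fp);
        free(buf);
        return 0;
    }

Aggregate throughput is then the total bytes moved by all concurrent processes divided by the wall-clock time of the slowest one.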
Blue Gene/L “Frost” scheduler status
• IRC chat-room scheduler ("hey, get off!") … done
• LLNL SLURM scheduler: testing
  – Installed, tested, and available for 512-node "midplane" partitions only
  – The LLNL testbed system will be used to port SLURM to smaller partitions
• Argonne Cobalt scheduler: being installed
  – DB2 client on the FEN
  – Python
  – ElementTree (XML processing library for Python)
  – Xerces (XML parser)
  – Supporting libraries (OpenSSL)
• Platform LSF: development account provided
MRI Investigator Phase
• MRI investigator access only
  – Users related to the MRI proposal
  – Porting/testing evaluation
• Applications
  – HOMME atmospheric GCM dycore (Thomas)
  – Wildfire modeling (Coen)
  – Scalable solvers: algebraic multigrid (Manteuffel, McCormick)
  – Numerical flight test simulation (Farhat)
  – WRF: high resolution (Hacker)
User Access to Frost
• Cycles split
  – 50% UCAR
  – 40% CU Boulder
  – 10% CU Denver
• Interested users (access policy TBD)
  – UCAR: contact [email protected]
  – CU: contact [email protected]
Questions?