Jacquard: Architecture and Application
Performance Overview
NERSC Users’ Group, October 2005
Outline
An engineering level overview of the HW and SW that make up jacquard.
1) CPUs
2) Memory
3) OS
4) Interconnect
Will use seaborg as a point of reference.
seaborg.nersc.gov (review?)
Resource        Speed     Bytes
Registers       3 ns      256 B
L1 Cache        5 ns      32 KB
L2 Cache        45 ns     8 MB
Main Memory     300 ns    16 GB
Remote Memory   19 us     7 TB
GPFS            10 ms     50 TB
HPSS            5 s       9 PB
• 6080 dedicated CPUs, 96 shared login CPUs
• Hierarchy of caching, speeds
• Bottleneck determined by first depleted resource
[Figure: Seaborg diagram. Labels: 16-way SMP NHII node, crossbar, main memory, GPFS, MPI, Colony Switch (CSS0, CSS1), HPSS, "380 x"]
jacquard.nersc.gov basics
Resource        Speed       Bytes
Registers       0.5 ns      2 KB
L1 Cache        1.5 ns      64 KB
L2 Cache        45 ns       1 MB
Main Memory     70-117 ns   6 GB
Remote Memory   5 us        2 TB
GPFS            10 ms       15 TB
HPSS            5 s         9 PB
• 640 dedicated CPUs, 8 shared login CPUs
• Smaller caches, HT, Really Fast
• SMP? NUMA? SUMO.
[Figure: Jacquard diagram. Labels: 2-way Opteron node, HT, main memory, GPFS, MPI, InfiniBand Switch (IB), HPSS, "320 x"]
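Since seaborg is the point of reference, the two resource tables can be compared level by level. The following is a minimal sketch that only divides the latencies quoted in the tables above; the dictionary and function names are illustrative, and jacquard's main memory is kept as the quoted 70-117 ns range rather than collapsed to one number.

```python
# Latencies in nanoseconds, copied from the seaborg and jacquard tables.
SEABORG_NS = {
    "Registers": 3.0,
    "L1 Cache": 5.0,
    "L2 Cache": 45.0,
    "Main Memory": 300.0,
    "Remote Memory": 19_000.0,        # 19 us
    "GPFS": 10_000_000.0,             # 10 ms
    "HPSS": 5_000_000_000.0,          # 5 s
}

# Each jacquard entry is a (low, high) pair so the 70-117 ns range fits.
JACQUARD_NS = {
    "Registers": (0.5, 0.5),
    "L1 Cache": (1.5, 1.5),
    "L2 Cache": (45.0, 45.0),
    "Main Memory": (70.0, 117.0),
    "Remote Memory": (5_000.0, 5_000.0),              # 5 us
    "GPFS": (10_000_000.0, 10_000_000.0),             # 10 ms
    "HPSS": (5_000_000_000.0, 5_000_000_000.0),       # 5 s
}


def speedup(resource):
    """Return (min, max) ratio of seaborg latency to jacquard latency."""
    low, high = JACQUARD_NS[resource]
    reference = SEABORG_NS[resource]
    return reference / high, reference / low


if __name__ == "__main__":
    for name in SEABORG_NS:
        worst, best = speedup(name)
        if worst == best:
            print(f"{name:14s} {best:6.2f}x")
        else:
            print(f"{name:14s} {worst:6.2f}x to {best:.2f}x")
```

A ratio above 1 means jacquard reaches that level faster than seaborg; by the tables' own numbers, the L2, GPFS, and HPSS levels come out even.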
Opteron Block Diagram: Not strictly SMP
1 TLB per CPU: 1K entries, 4K pages, 4 MB coverage
[Figure: Opteron block diagram. Labels: SDRAM, SDRAM, Switch, I/O]
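The 4 MB coverage figure follows directly from the TLB size and the page size. A minimal sketch of that arithmetic, assuming "1K" and "4K" mean 1024 entries and 4096-byte pages; the function names are illustrative.

```python
TLB_ENTRIES = 1024      # "1K entries" (assumed to mean 1024)
PAGE_BYTES = 4096       # "4K pages" (assumed to mean 4096 bytes)


def tlb_coverage_bytes(entries=TLB_ENTRIES, page_bytes=PAGE_BYTES):
    """Memory that can be mapped at once without a TLB miss."""
    return entries * page_bytes


def fits_in_tlb(working_set_bytes):
    """True when a contiguous, page-aligned working set is fully covered."""
    return working_set_bytes <= tlb_coverage_bytes()


if __name__ == "__main__":
    print(tlb_coverage_bytes() // (1024 * 1024), "MB covered")
    print("8 MB working set fits:", fits_in_tlb(8 * 1024 * 1024))
```

A working set larger than the coverage is not necessarily slow, but it can no longer be touched end to end without TLB misses.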
HyperTransport: Good Stuff
Little conflict between data movement and computation
SMP size and memory contention
Why is Jacquard 2-way SMP?
Jacquard’s numbers:
1 task: 100%
2 tasks: 98%
Flops @ 2.2 GHz
• Peak Theoretical Flops
– Double (64 bit) floats: 1 add + 1 mult = 2.2 GFlop/s
– Single (32 bit) floats: 2 add + 2 mult = 4.4 GFlop/s
• Peak Realized Flops
– Double (64 bit) floats: 1.9 GFlop/s
– Single (32 bit) floats: 3.4 GFlop/s
• Your Flops?
– Walltime is more important than flops
– For a known algorithm flops are a sanity check
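One way to use flops as a sanity check is to divide a known operation count by the measured walltime and compare the result with the peak numbers quoted on this slide. The sketch below assumes a dense n x n matrix multiply, whose standard operation count is 2n^3; the 1.25 s walltime is a made-up example value, not a measurement from jacquard.

```python
# GFlop/s figures quoted on the slide.
PEAK_GFLOPS = {"double": 2.2, "single": 4.4}
REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}


def efficiency(precision):
    """Fraction of the quoted theoretical peak that was realized."""
    return REALIZED_GFLOPS[precision] / PEAK_GFLOPS[precision]


def achieved_gflops(flop_count, walltime_seconds):
    """Flop rate for a run with a known operation count."""
    return flop_count / walltime_seconds / 1e9


if __name__ == "__main__":
    for precision in PEAK_GFLOPS:
        print(f"{precision}: {efficiency(precision):.0%} of quoted peak")

    n = 1000
    flops = 2 * n**3            # dense matrix multiply operation count
    walltime = 1.25             # hypothetical measured seconds
    rate = achieved_gflops(flops, walltime)
    print(f"{rate:.2f} GFlop/s, {rate / PEAK_GFLOPS['double']:.0%} of quoted peak")
```

If the computed rate exceeds the quoted peak, the operation count or the timing is wrong; if it is far below, the code is likely limited by something other than arithmetic.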
Memory BW
4 GB/sec per CPU
MPI Bandwidth: seaborg
MPI Bandwidth: Jacquard
Linux for AIX Users
Linux and AIX are more similar than different
• Linux is not as good as AIX at keeping processes scheduled on the same CPU (processor affinity).
• Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
• AIX MPI is in /usr/{bin,lib}, Linux MPI is in modules
• Linux doesn’t need -bmaxdata!
• Little vs. Big Endian
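Several of the points above can be checked from a short script on a Linux node. This is a minimal sketch, not a NERSC-provided tool: the helper names are illustrative, the affinity calls are the Linux-only `os.sched_getaffinity` / `os.sched_setaffinity` from the Python standard library, and the byte-order check only shows how the same integer is laid out under each convention.

```python
import os
import sys


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read and parse the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return parse_cpuinfo(handle.read())


def allowed_cpus():
    """CPUs this process may run on, or None where the call is unavailable."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


def int32_bytes(value, byteorder):
    """Byte layout of a 32-bit unsigned integer in the given byte order."""
    return value.to_bytes(4, byteorder)


if __name__ == "__main__":
    print("host byte order:", sys.byteorder)
    print("1 as little endian:", int32_bytes(1, "little"))
    print("1 as big endian:   ", int32_bytes(1, "big"))
    print("allowed CPUs:", allowed_cpus())
    if os.path.exists("/proc/cpuinfo"):
        for cpu in read_cpuinfo():
            print(cpu.get("processor"), cpu.get("model name"))
```

Unformatted binary files written on a big-endian machine need byte swapping before they can be read on a little-endian one, which is the practical consequence of the endianness bullet.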
Conclusions
• The underlying HW technologies (HT, IB, etc.) are quite promising. Opteron systems are delivering great price/performance.
• Still working on some SDRAM, OS, and SW issues.
• What’s useful to you? Let us know.