Jacquard: Architecture and Application Performance Overview NERSC Users’ Group October 2005

Page 1

Jacquard: Architecture and Application Performance Overview

NERSC Users’ Group, October 2005

Page 2

Outline

An engineering-level overview of the HW and SW that make up jacquard.

1) CPUs

2) Memory

3) OS

4) Interconnect

Will use seaborg as a point of reference.

Page 3

seaborg.nersc.gov (review?)

Resource        Speed    Bytes
Registers       3 ns     256 B
L1 Cache        5 ns     32 KB
L2 Cache        45 ns    8 MB
Main Memory     300 ns   16 GB
Remote Memory   19 us    7 TB
GPFS            10 ms    50 TB
HPSS            5 s      9 PB

• 6080 dedicated CPUs, 96 shared login CPUs
• Hierarchy of caching, speeds
• Bottleneck determined by first depleted resource

Diagram labels: 380 x 16 way SMP NHII node; crossbar; main memory; GPFS; MPI; Colony Switch (CSS0, CSS1); HPSS

Page 4

jacquard.nersc.gov basics

Resource        Speed       Bytes
Registers       0.5 ns      2 KB
L1 Cache        1.5 ns      64 KB
L2 Cache        45 ns       1 MB
Main Memory     70-117 ns   6 GB
Remote Memory   5 us        2 TB
GPFS            10 ms       15 TB
HPSS            5 s         9 PB

• 640 dedicated CPUs, 8 shared login CPUs
• Smaller caches, HT, Really Fast
• SMP? NUMA? SUMO.

Diagram labels: 320 x 2 way Opteron node; HT; Main Memory; GPFS; MPI; Infiniband Switch (IB); HPSS
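Since seaborg serves as the point of reference, the two Resource/Speed listings can be compared level by level. The sketch below is only an illustration: the latencies are copied from the seaborg and jacquard slides, the midpoint of the 70-117 ns range is an assumption made to get a single number for jacquard's main memory, and all names are hypothetical.

```python
# Latencies in seconds, copied from the seaborg and jacquard slides.
NS, US, MS = 1e-9, 1e-6, 1e-3

SEABORG = {
    "Registers": 3 * NS,
    "L1 Cache": 5 * NS,
    "L2 Cache": 45 * NS,
    "Main Memory": 300 * NS,
    "Remote Memory": 19 * US,
    "GPFS": 10 * MS,
    "HPSS": 5.0,
}

JACQUARD = {
    "Registers": 0.5 * NS,
    "L1 Cache": 1.5 * NS,
    "L2 Cache": 45 * NS,
    # Assumption: midpoint of the 70-117 ns range quoted on the slide.
    "Main Memory": (70 + 117) / 2 * NS,
    "Remote Memory": 5 * US,
    "GPFS": 10 * MS,
    "HPSS": 5.0,
}


def speedup(reference, other):
    """Return reference latency divided by other latency for each shared level."""
    return {
        level: reference[level] / other[level]
        for level in reference
        if level in other
    }


if __name__ == "__main__":
    for level, ratio in speedup(SEABORG, JACQUARD).items():
        print(f"{level:14s} {ratio:5.1f}x")
```

A ratio above 1 means the level is faster on jacquard; a ratio of 1 means the quoted figure is the same on both systems.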

Page 5

Opteron Block Diagram: Not strictly SMP

• 1 TLB per CPU: 1K entries, 4K pages, 4 MB coverage

Diagram labels: SDRAM; SDRAM; Switch, I/O
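The 4 MB coverage figure is simply the number of TLB entries times the page size. A minimal sketch of that arithmetic, with hypothetical function names, taking "1K entries" to mean 1024 entries and "4K pages" to mean 4096-byte pages:

```python
def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory that can be mapped at once: one page per TLB entry."""
    return entries * page_bytes


def fits_in_tlb(working_set_bytes: int, entries: int = 1024,
                page_bytes: int = 4096) -> bool:
    """True if a contiguous working set is no larger than the TLB reach."""
    return working_set_bytes <= tlb_reach_bytes(entries, page_bytes)


if __name__ == "__main__":
    reach = tlb_reach_bytes(1024, 4096)
    print(f"TLB reach: {reach} bytes = {reach // (1024 * 1024)} MB")
    print("8 MB array covered by TLB:", fits_in_tlb(8 * 1024 * 1024))
```

A working set larger than this reach can start to incur TLB misses even though it still fits comfortably in main memory.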

Page 6

HyperTransport: Good Stuff

Little conflict between data movement and computation

Page 7

SMP size and memory contention

Jacquard’s numbers:
• 1 task: 100%
• 2 tasks: 98%

Why is Jacquard 2 way SMP?

Page 8

Flops @ 2.2 GHz

• Peak Theoretical Flops
  – Double (64 bit) floats: 1 add + 1 mult = 2.2 GFlop/s
  – Single (32 bit) floats: 2 add + 2 mult = 4.4 GFlop/s

• Peak Realized Flops
  – Double (64 bit) floats: 1.9 GFlop/s
  – Single (32 bit) floats: 3.4 GFlop/s

• Your Flops?
  – Walltime is more important than flops
  – For a known algorithm, flops are a sanity check

Memory BW: 4 GB/sec per CPU
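The "sanity check" bullet can be made concrete: when the operation count of an algorithm is known, dividing it by the measured walltime gives an achieved rate to compare against the peak figures on this slide. The sketch below is an illustration under assumptions, not a benchmark. The function names are hypothetical, the 2.2 GFlop/s value is the double precision peak as quoted on the slide, and the naive matrix multiply (2n^3 operations) only stands in for "a known algorithm". Pure Python will land far below peak; the point is the bookkeeping.

```python
import time

PEAK_DOUBLE_GFLOPS = 2.2  # double precision peak as quoted on the slide


def achieved_gflops(flop_count: float, seconds: float) -> float:
    """Convert a known operation count and a walltime into GFlop/s."""
    return flop_count / seconds / 1e9


def fraction_of_peak(gflops: float,
                     peak_gflops: float = PEAK_DOUBLE_GFLOPS) -> float:
    """Achieved rate as a fraction of the quoted peak."""
    return gflops / peak_gflops


def matmul(a, b):
    """Naive n x n matrix multiply: n^3 multiplies and n^3 adds."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


if __name__ == "__main__":
    n = 100
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    rate = achieved_gflops(2 * n**3, elapsed)
    print(f"walltime {elapsed:.3f} s, {rate:.4f} GFlop/s, "
          f"{100 * fraction_of_peak(rate):.2f}% of quoted peak")
```

Applied to the slide's own figures, the realized rates correspond to about 86% of the quoted double precision peak (1.9 / 2.2) and about 77% of the quoted single precision peak (3.4 / 4.4).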

Page 9

MPI Bandwidth: seaborg

Page 10

MPI Bandwidth: Jacquard

Page 11

Linux for AIX Users

Linux and AIX are more similar than different

• Linux is not as good as AIX at keeping processes scheduled on the same CPU (processor affinity work).

• Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.

• AIX MPI is in /usr/{bin,lib}, Linux MPI is in modules

• Linux doesn’t need -bmaxdata!

• Little vs. Big Endian
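Two of the bullets above lend themselves to a short illustration. The sketch below, with hypothetical helper names, parses /proc/cpuinfo style text into one dictionary per processor, and shows why byte order matters when unformatted binary data written on a big-endian system (AIX on POWER) is read on a little-endian one (Opteron). It assumes the usual `key : value` layout with a blank line between processors; the sample text is made up for illustration and was not captured from jacquard.

```python
import struct
import sys


def parse_cpuinfo(text):
    """Split /proc/cpuinfo style text into one dict per processor entry."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a processor entry
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def read_int32(raw, byteorder):
    """Decode 4 bytes as a signed 32 bit integer.

    byteorder is '>' for big-endian or '<' for little-endian.
    """
    return struct.unpack(byteorder + "i", raw)[0]


# Made-up sample in the /proc/cpuinfo layout (illustrative values only).
SAMPLE = """processor : 0
model name : Example CPU
cpu MHz : 2200.000

processor : 1
model name : Example CPU
cpu MHz : 2200.000
"""

if __name__ == "__main__":
    print(len(parse_cpuinfo(SAMPLE)), "processors in sample")
    print("this machine is", sys.byteorder, "endian")
    raw = struct.pack(">i", 1)   # the integer 1 as a big-endian writer stores it
    print(read_int32(raw, ">"))  # decoded with the writer's byte order: 1
    print(read_int32(raw, "<"))  # decoded with the wrong byte order: 16777216
```

On a real Linux node the same parser could be fed the contents of /proc/cpuinfo instead of the sample string.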

Page 12

Conclusions

• The underlying HW technologies (HT, IB, etc.) are quite promising. Opteron systems are delivering great price/performance.

• Still working on some SDRAM, OS, and SW issues.

• What’s useful to you? Let us know.