
Page 1

HPC Hardware Overview

Luke Wilson September 13, 2011

Texas Advanced Computing Center

The University of Texas at Austin

Page 2

Outline

• Some general comments

• Lonestar System – Dell blade-based system – InfiniBand (QDR) – Intel Processors

• Ranger System – Sun blade-based system – InfiniBand (SDR) – AMD Processors

• Longhorn System – Dell blade-based system – InfiniBand (QDR) – Intel Processors

Page 3

About this Talk

• We will focus on TACC systems, but much of the information applies to HPC systems in general

• As an applications programmer you may not care about hardware details, but…

– We need to think about it because of performance issues

– I will try to give you pointers to the most relevant architecture characteristics

• Do not hesitate to ask questions as we go

Page 4

High Performance Computing

• In our context, it refers to hardware and software tools dedicated to computationally intensive tasks

• Distinction between HPC center (throughput focused) and Data center (data focused) is becoming fuzzy

• High bandwidth, low latency – Memory

– Network
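
A minimal sketch of the point above, in Python: total transfer time is roughly latency plus size divided by bandwidth, so small transfers are latency-bound and large ones bandwidth-bound. The latency and bandwidth values below are placeholders in the right ballpark, not measurements of any TACC system.

```python
# Simple transfer-time model (illustrative): time = latency + size / bandwidth.
def transfer_time_s(nbytes, latency_s=2e-6, bandwidth_bytes_per_s=1e9):
    return latency_s + nbytes / bandwidth_bytes_per_s

for nbytes in (8, 64 * 1024, 8 * 1024 * 1024):
    # 8 B message: almost pure latency; 8 MB message: almost pure bandwidth.
    print(f"{nbytes:>9} bytes -> {transfer_time_s(nbytes) * 1e6:9.1f} us")
```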

Page 5

Lonestar: Intel hexa-core system

Page 6

Lonestar: Introduction

• Lonestar Cluster
– Configuration & Diagram
– Server Blades

• Dell PowerEdge M610 Blade (Intel Hexa-Core) Server Nodes

• Microprocessor Architecture Features – Instruction Pipeline – Speeds and Feeds – Block Diagram

• Node Interconnect – Hierarchy – InfiniBand Switch and Adapters – Performance

Page 7

Lonestar Cluster Overview

lonestar.tacc.utexas.edu

Page 8

Lonestar Cluster Overview

Peak Performance: 302 TFLOPS
Nodes: 2 hexa-core Xeon 5680 per node; 1,888 nodes / 22,656 cores
Memory: 1333 MHz DDR3 DIMMs; 24 GB/node, 45 TB total
Shared Disk: Lustre parallel file system; 1 PB
Local Disk: SATA; 146 GB/node, 276 TB total
Interconnect: InfiniBand (QDR); 4 GB/sec point-to-point
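
As a quick sanity check of the table (illustrative only), multiplying nodes by cores per node by the per-core peak quoted on a later slide reproduces the quoted core count and peak performance:

```python
# Sanity check of the Lonestar figures above.
nodes = 1888
cores_per_node = 12          # 2 sockets x 6 cores
gflops_per_core = 13.3       # 3.33 GHz x 4 flops/cycle, quoted on a later slide
print(nodes * cores_per_node, "cores")                                    # 22656
print(round(nodes * cores_per_node * gflops_per_core / 1000), "TFLOPS")   # ~301, i.e. the quoted 302
```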

Page 9

Blade : Rack : System

• 1 node: 2 × 6 cores = 12 cores
• 1 chassis: 16 blades (nodes) = 192 cores
• 1 rack: 3 chassis × 16 nodes = 576 cores
• 39⅓ racks: 22,656 cores

Page 10

Lonestar login nodes

• Dell PowerEdge M610

– Dual-socket Intel hexa-core Xeon, 3.33 GHz

– 24 GB DDR3 1333 MHz DIMMs

– Intel 5520 chipset (QPI)

• Dell PowerVault 1200

– 15 TB HOME disk

– 1 GB user quota (5x)

Page 11

Lonestar compute nodes

• 16 Blades / 10U chassis

• Dell PowerEdge M610 – Dual-socket Intel hexa-core Xeon

• 3.33 GHz (1.25x)
• 13.3 GFLOPS/core (1.25x)
• 64 KB L1 cache per core (independent)
• 12 MB L3 cache (shared)

– 24 GB DDR3 1333 MHz DIMMs
– Intel 5520 chipset, 2x QPI at 6.4 GT/s
– 146 GB 10k RPM SAS-SATA local disk (/tmp)

Page 12

Motherboard

[Block diagram: two hexa-core Xeon sockets (12 CPU cores total), each with its own DDR3-1333 memory, linked by QPI to each other and to the I/O hub (IOH, PCIe), which connects to the I/O controller hub (ICH) over DMI]

Page 13

Intel Xeon 5600 (Westmere)

• 32 KB L1 cache/core
• 256 KB L2 cache/core
• Shared 12 MB L3 cache

[Die photo showing the cores and the shared L3 cache]

Page 14

Cluster Interconnect

16 independent 4 GB/s connections/chassis

Page 15

Lonestar Parallel File Systems: Lustre & NFS

• $HOME: 15 TB, 1 GB/user quota
• $WORK: ? TB
• $SCRATCH: 841 TB
• File-system traffic uses IP over IB on the InfiniBand fabric (Ethernet is also shown in the diagram)

Page 16

Ranger: AMD Quad-core system

Page 17

Ranger: Introduction

• Unique instrument for computational scientific research

• Housed in TACC’s new machine room

• Over 2 ½ years of initial planning and deployment efforts

• Funded by the National Science Foundation as part of a unique program to reinvigorate High Performance Computing in the United States (Office of Cyberinfrastructure)

ranger.tacc.utexas.edu

Page 18

How Much Did it Cost and Who’s Involved?

• TACC selected for very first NSF ‘Track2’ HPC system

– $30M system acquisition

– Sun Microsystems is the vendor

– Very Large InfiniBand Installation • ~4100 endpoint hosts

• >1350 MT47396 switches

• TACC, ICES, Cornell Theory Center, Arizona State HPCI are teamed to operate/support the system for four years ($29M)

Page 19

Ranger: Performance

• Ranger debuted at #4 on the Top 500 list (ranked #11 as of June 2010)

Page 20

Ranger Cluster Overview

Peak Performance: 579 TFLOPS
Nodes: 4 quad-core Opterons per node; 3,936 nodes / 62,976 cores
Memory: 667 MHz DDR2 DIMMs, 1 GHz HyperTransport; 32 GB/node, 123 TB total
Shared Disk: Lustre parallel file system; 1.7 PB
Local Disk: none (flash-based, 8 GB)
Interconnect: InfiniBand (generation 2); 1 GB/sec point-to-point, 2.3 μs latency

Page 21

Ranger Hardware Summary

• Compute power - 579 Teraflops
– 3,936 Sun four-socket blades
– 15,744 AMD “Barcelona” processors (quad-core, four flops/cycle, dual pipelines)

• Memory - 123 Terabytes
– 2 GB/core, 32 GB/node
– ~20 GB/sec memory B/W per node (667 MHz DDR2)

• Disk subsystem - 1.7 Petabytes
– 72 Sun x4500 “Thumper” I/O servers, 24 TB each
– 40 GB/sec total aggregate I/O bandwidth
– 1 PB raw capacity in largest filesystem

• Interconnect - 10 Gbps, 1.6–2.9 µs latency
– Sun InfiniBand-based switches (2), up to 3,456 4x ports each
– Full non-blocking 7-stage Clos fabric
– Mellanox ConnectX InfiniBand (second generation)

Page 22

Ranger Hardware Summary (cont.)

• 25 Management servers - Sun 4-socket x4600s – 4 Login servers, quad-core processors

– 1 Rocks master, contains software stack for nodes

– 2 SGE servers, primary batch server and backup

– 2 Sun Connection Management servers, monitors hardware

– 2 InfiniBand Subnet Managers, primary and backup

– 6 Lustre Meta-Data Servers, enabled with failover

– 4 Archive data-movers, move data to tape library

– 4 GridFTP servers, external multi-stream transfer

• Ethernet Networking - 10Gbps Connectivity – Two external 10GigE networks: TeraGrid, NLR

– 10GigE fabric for login, data-mover and GridFTP nodes, integrated into existing TACC network infrastructure

– Force10 S2410P and E1200 switches

Page 23

Infiniband Cabling in Ranger

• Sun switch design with reduced cable count – manageable, but still a challenge to cable
– 1,312 InfiniBand 12x-to-12x cables
– 78 InfiniBand 12x-to-three-4x splitter cables
– Cable lengths range from 7–16 m, average 11 m

• 9.3 miles of InfiniBand cable total (15.4 km)

Page 24

Space, Power and Cooling

• System Power: 3.0 MW total

• System: 2.4 MW
– ~90 racks in a six-row arrangement
– ~100 in-row cooling units
– ~4,000 ft² total footprint

• Cooling: ~0.6 MW
– In-row units fed by three 400-ton chillers
– Enclosed hot aisles
– Supplemental 280 tons of cooling from CRAC units

• Observations:
– Space is less of an issue than power
– Cooling more than 25 kW per rack is difficult
– Power distribution is a challenge: more than 1,200 circuits

Page 25

External Power and Cooling Infrastructure

Page 26

Switches in Place

Page 27

InfiniBand Cabling in Progress

Page 28

Ranger Features

• AMD Processors:
– HPC features: 4 FLOPS per clock period (CP)
– 4 sockets on a board, 4 cores per socket
– HyperTransport (direct connect between sockets)
– 2.3 GHz cores
– Any idea what the peak floating-point performance of a node is?

• 2.3 GHz × 4 FLOPS/CP × 16 cores = 147.2 GFLOPS peak performance (see the sketch after this list)

– Any idea how much an application can sustain? • Can sustain over 80% of peak with DGEMM (matrix-matrix multiply)

• NUMA Node Architecture (16 cores per node, think hybrid)

• 2-tier InfiniBand (NEM – “Magnum”) Switch System

• Multiple Lustre (Parallel) File Systems
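
A minimal sketch of the peak-performance arithmetic referenced in the list above (illustrative; the clock rate, flops/cycle, and core counts are the ones quoted on this slide):

```python
# Ranger node and system peak, spelled out.
ghz = 2.3
flops_per_cycle = 4          # dual SSE pipelines
cores_per_node = 16          # 4 sockets x 4 cores
node_gflops = ghz * flops_per_cycle * cores_per_node
print(node_gflops, "GFLOPS per node")                                  # 147.2
print(round(node_gflops * 3936 / 1000), "TFLOPS across 3,936 nodes")   # ~579
```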

Page 29

Ranger Architecture

[System diagram: login nodes (X4600) connect to the internet over GigE; 82 racks of “C48” compute blades (4 sockets × 4 cores per node) and 72 X4500 “Thumper” I/O nodes (24 TB each, serving the WORK file system) hang off the Magnum InfiniBand switches – 3,456 IB ports, each 12x line splits into three 4x lines, bisection BW = 110 Tbps; one X4600 metadata server per file system]

Page 30

Ranger InfiniBand Topology

[Topology diagram: NEM chassis-level switches (…78…) uplink to the “Magnum” switch over 12x InfiniBand links, each formed from 3 cables combined]

Page 31

MPI Tests: P2P Bandwidth

[Plot: point-to-point bandwidth (MB/sec, 0–1200) vs. message size (1 B to 100 MB) for Ranger (OFED 1.2, MVAPICH 0.9.9) and Lonestar (OFED 1.1, MVAPICH 0.9.8); effective bandwidth is improved at smaller message sizes]

Point-to-Point MPI Measured Performance

• Shelf latencies: ~1.6 µs

• Rack latencies: ~2.0 µs

• Peak BW: ~965 MB/s
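
A minimal ping-pong sketch in the spirit of these measurements, assuming mpi4py and an MPI launcher are available; it is illustrative, not the benchmark behind the plot. Run with something like: mpirun -np 2 python pingpong.py

```python
# Minimal MPI ping-pong: half the round-trip time gives one-way latency,
# and message size divided by that time gives effective bandwidth.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def one_way_time(nbytes, reps=100):
    buf = bytearray(nbytes)
    comm.Barrier()
    t0 = time.perf_counter()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    return (time.perf_counter() - t0) / reps / 2   # half the round trip

for size in (8, 1024, 1 << 20):
    t = one_way_time(size)
    if rank == 0:
        print(f"{size:>8} B   one-way {t*1e6:8.2f} us   bandwidth {size/t/1e6:8.1f} MB/s")
```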

Page 32

Ranger: Bisection BW Across 2 Magnums

• Able to sustain ~73% bisection bandwidth efficiency with all 3,936 nodes communicating simultaneously (82 racks)

[Plots: measured vs. ideal bisection bandwidth (GB/sec, 0–2000) and full bisection BW efficiency (0–120%) as a function of the number of Ranger compute racks (1–82)]

Page 33

Sun Motherboard for AMD Barcelona Chips (Compute Blade)

• 4 sockets × 4 cores
• 8 memory slots per socket
• 1 GHz HyperTransport

Page 34

Sun Motherboard for AMD Barcelona Chips

[Blade diagram: each socket has 8.3 GB/s of memory bandwidth from dual-channel, 533 MHz registered ECC memory; two PCIe x8 links (32 Gbps) and one PCIe x4 link (16 Gbps) lead through the passive midplane to the NEM switch; the sockets form a maximum-neighbor NUMA configuration for 3-port HyperTransport; HyperTransport is 6.4 GB/s bidirectional, 3.2 GB/s unidirectional]

Page 35

Intel/AMD Dual- to Quad-core Evolution

[Diagrams: AMD dual-core Opteron to quad-core Barcelona/Phenom; Intel Woodcrest to Clovertown, with the Intel cores reaching memory through an MCH]

Page 36

Caches in Quad Core CPUs

[Diagrams of the two cache hierarchies]

• Intel quad-core: pairs of cores share an L2 cache in front of the memory controller – the L2 caches are not independent
• AMD quad-core: each core has a private L1 and L2 cache, with a shared L3 in front of the memory controller – the L2 caches are independent

Page 37

Cache sizes in AMD Barcelona

• 64 KB L1 cache per core
• 512 KB L2 cache per core
• 2 MB shared L3 cache
• Cores reach the memory controller and three HyperTransport links (HT 0–2) through the System Request Interface and crossbar switch
• HT link bandwidth is 24 GB/s (8 GB/s per link)
• Memory controller bandwidth up to 10.7 GB/s (667 MHz)
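
A rough, illustrative check of which cache level a square tile of doubles fits in, using the Barcelona sizes above (64 KB L1, 512 KB L2, 2 MB shared L3); the tile sizes are arbitrary examples:

```python
# Which cache level holds an n x n tile of 8-byte doubles?
def highest_fit(n, word_bytes=8):
    footprint = n * n * word_bytes
    if footprint <= 64 * 1024:
        return "L1"
    if footprint <= 512 * 1024:
        return "L2"
    if footprint <= 2 * 1024 * 1024:
        return "L3"
    return "main memory"

for n in (64, 128, 256, 512, 1024):
    print(f"{n} x {n} doubles -> {highest_fit(n)}")
```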

Page 38

Other Important Features

• AMD Quad-core (K10, code name Barcelona)

• Instruction fetch bandwidth now 32 bytes/cycle

• 2MB L3 cache on-die; 4 x 512KB L2 caches; 64KB L1 Instruction & Data caches.

• SSE units are now 128 bits wide, giving single-cycle throughput; improved ALU and FPU throughput

• Larger branch prediction tables, higher accuracies

• Dedicated stack engine to pull stack-related ESP updates out of the instruction stream

Page 39

AMD 10h Processor

Page 40

Speeds and Feeds

• On die: registers, 64 KB L1 data cache, 512 KB L2, 2 MB L3; external memory is dual-channel 533 MHz DDR2 DIMMs
• Latency: L1 3 CP, L2 ~15 CP, L3 ~25 CP, external memory ~300 CP
• Load speed: 4 W/CP (L1), 2 W/CP (L2), 0.5 W/CP (memory)
• Store speed: 2 W/CP (L1), 1 W/CP (L2), 0.5 W/CP (memory)
• Cache line size (L1/L2) is 8 W; peak is 4 FLOPS/CP
• W = FP word (64 bit); CP = clock period
• Cache states: MOESI (Modified, Owner, Exclusive, Shared, Invalid); MOESI is beneficial when latency/bandwidth between CPUs is significantly better than to main memory
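
The machine balance implied by these numbers, spelled out (illustrative): with a peak of 4 FLOPS per clock period but only ~0.5 words/CP streaming from main memory, a kernel needs roughly 8 flops per 64-bit word fetched from memory to stay compute-bound rather than memory-bound.

```python
# Machine balance from the slide's own numbers.
peak_flops_per_cp = 4.0
memory_words_per_cp = 0.5
print(peak_flops_per_cp / memory_words_per_cp,
      "flops needed per 64-bit word from memory")   # 8.0
```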

Page 41

Ranger Disk Subsystem - Lustre

• Disk servers (OSS) are based on the Sun x4500 “Thumper”
– Each server has 48 SATA II 500 GB drives (24 TB total), running internal software RAID
– Dual-socket, dual-core Opterons @ 2.6 GHz
– 72 servers total: 1.7 PB raw storage (that’s 288 cores just to drive the file systems)

• Metadata Servers (MDS) based on SunFire x4600s

• MDS is Fibre Channel-connected to 9 TB FlexLine storage

• Target performance
– Aggregate bandwidth: 40 GB/sec
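
A quick arithmetic check of the OSS numbers above (illustrative only):

```python
# Raw capacity and per-server bandwidth needed to hit the aggregate target.
servers = 72
tb_per_server = 24
target_aggregate_gb_per_s = 40
print(servers * tb_per_server / 1000, "PB raw")                       # ~1.7 PB
print(round(target_aggregate_gb_per_s / servers, 2), "GB/s per OSS")  # ~0.56 GB/s
```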

Page 42

Ranger Parallel File Systems: Lustre

• $HOME: 96 TB, 6 GB/user quota, 36 OSTs, 6 Thumpers
• $WORK: 193 TB, 72 OSTs, 12 Thumpers
• $SCRATCH: 773 TB, 300 OSTs, 50 Thumpers
• Compute chassis (12 blades each) reach all three file systems through the InfiniBand switch using IP over IB

Page 43

I/O with Lustre over Native InfiniBand

• Max. total aggregate performance of 38 or 46 GB/sec depending on stripecount (Design Target = 32GB/sec)

• External users have reported performance of ~35 GB/sec with a 4K application run

[Plots: $SCRATCH file system throughput and $SCRATCH single-application performance – write speed (GB/sec, 0–60) vs. number of writing clients (1–10,000), for stripecount = 1 and stripecount = 4]
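
A toy model of the aggregate-bandwidth behaviour in these plots (illustrative; the per-client and per-OST rates below are assumptions, not measured values): aggregate write speed grows with the number of writing clients until the OST pool saturates.

```python
# Aggregate write speed is limited either by the clients or by the OST pool.
def aggregate_write_gb_per_s(clients, per_client=0.5, total_osts=300, per_ost=0.15):
    return min(clients * per_client, total_osts * per_ost)

for clients in (1, 10, 100, 1000, 4000):
    # Saturates near 45 GB/s with these assumed rates, in the ballpark of the
    # 38-46 GB/s quoted above.
    print(f"{clients:>5} clients -> ~{aggregate_write_gb_per_s(clients):.1f} GB/s")
```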

Page 44

Longhorn: Intel Quad-core system

Page 45

Longhorn: Introduction

• First NSF eXtreme Digital Visualization grant (XD Vis)

• Designed for scientific visualization and data analysis

– Very large memory per computational core

– Two NVIDIA Graphics cards per node

– Rendering performance 154 billion triangles/second

Page 46

Longhorn Cluster Overview

Peak Performance (CPUs): 512 Intel Xeon E5400 processors; 20.7 TFLOPS
Peak Performance (GPUs, single precision): 128 NVIDIA Quadroplex 2200 S4; 500 TFLOPS
System Memory: DDR3 DIMMs; 48 GB/node (240 nodes), 144 GB/node (16 nodes), 13.5 TB total
Graphics Memory: 512 FX 5800 × 4 GB; 2 TB
Disk: Lustre parallel file system; 210 TB
Interconnect: QDR InfiniBand; 4 GB/sec point-to-point
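
As a quick check of the CPU-side peak in the table (illustrative), 512 quad-core sockets at the 2.53 GHz quoted on the node slides, with 4 flops/cycle, reproduce the quoted ~20.7 TFLOPS:

```python
# Longhorn CPU-side peak.
sockets = 512
cores_per_socket = 4
ghz = 2.53
flops_per_cycle = 4
print(round(sockets * cores_per_socket * ghz * flops_per_cycle / 1000, 1), "TFLOPS")  # ~20.7
```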

Page 47

Longhorn “fat” nodes

• Dell R710

– Dual-socket Intel quad-core Xeon E5400 @ 2.53 GHz

– 144 GB DDR3 (18 GB/core)

– Intel 5520 chipset

• NVIDIA Quadroplex 2200 S4

– 4 NVIDIA Quadro FX5800

• 240 CUDA cores

• 4 GB Memory

• 102 GB/s Memory bandwidth

Page 48

Longhorn standard nodes

• Dell R610

– Dual socket Intel quad-core Xeon E5400 @ 2.53 GHz

– 48 GB DDR3 (6 GB/core)

– Intel 5520 chipset

• NVIDIA Quadroplex 2200 S4

– 4 NVIDIA Quadro FX5800

• 240 CUDA cores

• 4 GB Memory

• 102 GB/s Memory bandwidth

Page 49

Motherboard (R610/R710)

[Block diagram: two quad-core Intel Xeon 5500 sockets, each with its own DDR3 memory channels, connected to the Intel 5520 chipset; 42 lanes of PCI Express or 36 lanes of PCI Express 2.0; 12 DIMM slots (R610), 18 DIMM slots (R710)]

Page 50

Cache Sizes in Intel Nehalem

• 32 KB L1 cache per core
• 256 KB L2 cache per core
• 8 MB shared L3 cache
• On-die memory controller and two QPI links (QPI 0, QPI 1)
• Total QPI bandwidth up to 25.6 GB/s (@ 3.2 GHz)
• 2 × 20-lane QPI links
• 4 quadrants (10 lanes each)
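
The QPI bandwidth arithmetic behind the 25.6 GB/s figure (illustrative; assumes the usual QPI layout of 16 data lanes, i.e. 2 bytes, per direction on a double-pumped 3.2 GHz clock):

```python
# QPI link bandwidth at 3.2 GHz.
clock_ghz = 3.2
transfers_per_s = clock_ghz * 2            # double-pumped -> 6.4 GT/s
bytes_per_transfer = 2                     # 16 data lanes per direction
per_direction = transfers_per_s * bytes_per_transfer   # 12.8 GB/s
print(per_direction, "GB/s per direction,", 2 * per_direction, "GB/s total")  # 25.6 total
```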

Page 51

Nehalem μArchitecture

Page 52

Storage

• Ranch – Long term tape storage

• Corral – 1 PB of spinning disk

Page 53

Ranch Archival System

• Sun StorageTek silo
• 10,000 tapes
• 10 PB capacity
• Used for long-term storage

Page 54

Corral

• 1.2 PB DataDirect Networks online disk storage

• 8 Dell 1950 and 8 Dell 2950 servers

• High-performance parallel file system
– Multiple databases
– iRODS data management
– Replication to tape archive

• Multiple levels of access control

• Web and other data access available globally

Page 55

References

Page 56

Lonestar Related References

User Guide services.tacc.utexas.edu/index.php/lonestar-user-guide

Developers

www.tomshardware.com

www.intel.com/p/en_US/products/server/processor/xeon5000/technical-documents

Page 57

Ranger Related References

User Guide

services.tacc.utexas.edu/index.php/ranger-user-guide

Forums

forums.amd.com/devforum

www.pgroup.com/userforum/index.php

Developers

developer.amd.com/home.jsp

developer.amd.com/rec_reading.jsp

Page 58

Longhorn Related References

User Guide

services.tacc.utexas.edu/index.php/longhorn-user-guide

General Information

www.intel.com/technology/architecture-silicon/next-gen/

www.nvidia.com/object/cuda_what_is.html

Developers

developer.intel.com/design/pentium4/manuals/index2.htm

developer.nvidia.com/forums/index.php