The Red Storm High Performance Computer
March 19, 2008
Sue Kelly, Sandia National Laboratories
http://www.sandia.gov/~smkelly
Abstract: Sandia National Laboratories has a long history of successfully applying massively parallel processing (MPP) technology to solve problems in the national interest for the US Department of Energy. We drew upon our experiences with numerous architectural and design features when planning the Red Storm computer system. This talk will present the key issues that were considered. Important principles are performance balance between the hardware components and scalability of the system software.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is High Performance Computing?
• (n.) A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors. (http://www.webopedia.com/TERM/H/High_Performance_Computing.html) A minimal code sketch of this idea follows below.
• The idea/premise of parallel processing is not new (http://www.sandia.gov/ASC/news/stories.html#nineteen-twenty-two)
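To make the "little pieces" idea concrete, here is a minimal sketch in C using MPI, the message-passing interface common on MPP systems. The array size, the even partitioning, and the stand-in workload are illustrative choices, not anything prescribed by the talk.

/* Minimal illustration of parallel processing with MPI: each rank
 * sums its own slice of the data, then the partial sums are
 * combined with a collective reduction.
 * Compile: mpicc -o psum psum.c    Run: mpirun -np 4 ./psum */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* total number of elements to sum */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each processor works on its own contiguous chunk;
     * the last rank absorbs any remainder. */
    long chunk = N / size;
    long start = rank * chunk;
    long end   = (rank == size - 1) ? N : start + chunk;

    double local = 0.0;
    for (long i = start; i < end; i++)
        local += (double)i;          /* stand-in for real work */

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", total);
    MPI_Finalize();
    return 0;
}

Each rank executes the same program on a different slice of the data simultaneously, which is exactly the division into "little pieces" the definition describes.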
Red Storm – a First Look
• Sandia/Cray Inc. partnership:
  – Sandia architecture
  – Sandia & Cray system software
  – Cray engineering and manufacturing
  – Sandia systems HW/SW expertise
Red Storm is a Massively Parallel Processor Supercomputer
[Diagram: functional partitions of the system: Service, Compute Partition, Parallel I/O, Net I/O, and Users/home.]
• 12,960 2.4 GHz Dual Core Opterons for computation (called nodes)
• 2 GB Memory per core (in progress)
Key Performance Characteristics that Lead to a Balanced System
• 124.42 TeraFLOPS (trillion floating point operations per second)
• Aggregate system memory bandwidth of 83 TB/s
• Sustained aggregate interconnect bandwidth of 120 TB/s
• High-performance I/O subsystem (minimum sustained file system bandwidth of 100 GB/s to 340 TB of parallel disk storage and sustained external network bandwidth of 50 GB/s)
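One common way to read these numbers is as bytes moved per floating-point operation. The bytes-per-FLOP framing is ours, not the slide's; the following back-of-the-envelope C sketch uses only the figures quoted above.

/* Back-of-the-envelope balance ratios for the figures quoted above.
 * All inputs come from the slide; units are bytes/s and FLOPS. */
#include <stdio.h>

int main(void)
{
    double flops  = 124.42e12; /* 124.42 TeraFLOPS            */
    double mem_bw = 83e12;     /* 83 TB/s memory bandwidth    */
    double net_bw = 120e12;    /* 120 TB/s interconnect       */
    double fs_bw  = 100e9;     /* 100 GB/s file system        */

    printf("memory bytes per FLOP:       %.2f\n", mem_bw / flops);
    printf("interconnect bytes per FLOP: %.2f\n", net_bw / flops);
    printf("file system bytes per FLOP:  %.4f\n", fs_bw / flops);
    return 0;
}

This prints roughly 0.67 memory bytes and 0.96 interconnect bytes per FLOP, the kind of ratio that the talk's notion of "balance" between hardware components refers to.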
Additional Architectural Features
• Scalability: Red Storm’s hardware and system software scale from a single cabinet system to a 32,000 node system.
• Functional Partitioning: Nodes are dedicated to specific functions (service, compute, I/O), and the hardware and system software of each partition are engineered for that role, optimizing the scalability and performance of the system.
• Reliability: Full-system Reliability, Availability, and Serviceability (RAS) is designed into the architecture.
• Upgrade-ability: There is a designed-in path for system upgrades.
• Custom Packaging: Red Storm is designed to be a high density, relatively low power system.
• Price/Performance: It has excellent performance per dollar through the use of high volume commodity parts where feasible.
Job Launch is Hierarchical
[Diagram: hierarchical job launch. A Red Storm user logs in to a login node (running Linux) and starts the application. The PBS server and scheduler (PBS mom, on a PBS node) manage the job queues; the job launcher (yod) requests nodes from the compute node allocator, which consults the CPU inventory database on a database node, and the user application is then fanned out to the compute nodes. A sketch of the fan-out arithmetic follows.]
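The "fan out" step in the diagram is what makes launch hierarchical: each node forwards the application to a bounded number of children rather than one daemon contacting every node, so launch depth grows with the logarithm of the node count. A minimal C sketch of that tree arithmetic follows; the fan-out of 4 and the 64-node count are arbitrary illustrative values, and this is not Red Storm's actual launcher code.

/* Sketch of the tree arithmetic behind a hierarchical (fan-out)
 * launch: node 0 starts the launch and every node forwards it to
 * at most FANOUT children, giving O(log N) launch depth. */
#include <stdio.h>

#define FANOUT 4   /* illustrative branching factor */

int main(void)
{
    int nnodes = 64;  /* illustrative node count */
    for (int id = 0; id < nnodes; id++) {
        int parent = (id == 0) ? -1 : (id - 1) / FANOUT;
        int first_child = id * FANOUT + 1;
        printf("node %2d: parent %2d, children", id, parent);
        for (int c = first_child;
             c < first_child + FANOUT && c < nnodes; c++)
            printf(" %d", c);
        printf("\n");
    }
    return 0;
}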
RAS monitoring is hierarchical
[Diagram: RAS monitoring Ethernet tree. The System Management Workstation (SMW), running RSMS, communicates over Ethernet with L1 controllers at the cabinet level, which in turn manage L0 controllers at the board level; the boards connect to the high-speed network (HSN) via HyperTransport (HT).]
Calculation with Breaks
[Diagram: two timelines, "Calculation with Breaks" and "Calculation with Asynchronous Breaks", showing compute phases Calc 1 through Calc 4 interleaved with waits on a 0 to 6 minute scale.]
Operating System Interruptions Impede Progress of the Application
[Chart: Interruptions of User Applications. Interruptions in ns (y-axis, 0 to 350,000) versus wall time in seconds (x-axis, 0 to 6), comparing Linux and Catamount.]
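Charts like this are produced by sampling how often the operating system steals cycles from a user process. Here is a minimal, hypothetical C sketch of one such measurement: read the clock in a tight loop and count any gap between successive reads that is too large to be the loop itself. The 2 µs threshold and 5 second duration are arbitrary choices, and this is not the benchmark used to produce the chart above.

/* Minimal OS-noise probe: successive clock reads should be nearly
 * back-to-back; a large gap means the OS interrupted the process. */
#include <stdio.h>
#include <time.h>

static long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000L + ts.tv_nsec;
}

int main(void)
{
    const long threshold_ns = 2000;        /* gaps above this count  */
    const long run_ns = 5L * 1000000000L;  /* sample for 5 seconds   */
    long start = now_ns(), prev = start;
    long interruptions = 0, worst = 0;

    while (prev - start < run_ns) {
        long t = now_ns(), gap = t - prev;
        if (gap > threshold_ns) {
            interruptions++;
            if (gap > worst) worst = gap;
        }
        prev = t;
    }
    printf("%ld gaps over %ld ns; worst gap %ld ns\n",
           interruptions, threshold_ns, worst);
    return 0;
}

A lightweight kernel such as Catamount exists precisely to keep this kind of interference off the compute nodes.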
Connection-oriented protocols have to reserve buffers for the worst case
• If each node reserves a 100 KB buffer for each of its peers, that is 1 GB of memory per node for 10,000 processors (the arithmetic is sketched below).
• Need to communicate using collective algorithms
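The first bullet's arithmetic generalizes: per-peer buffer memory grows linearly with machine size, which is why connection-oriented protocols break down at scale. A quick C sketch, using the 100 KB figure from the bullet and node counts up to the 32,000 mentioned in the scalability slide:

/* Worst-case buffer arithmetic: per-peer receive buffers make
 * connection state grow linearly with the number of peers. */
#include <stdio.h>

int main(void)
{
    double buf_per_peer = 100e3;   /* 100 KB reserved per peer */
    int peers[] = { 1000, 10000, 32000 };

    for (int i = 0; i < 3; i++) {
        double bytes = buf_per_peer * peers[i];
        printf("%5d peers -> %4.1f GB of buffers per node\n",
               peers[i], bytes / 1e9);
    }
    return 0;
}

At 10,000 peers the reservation is already 1 GB per node, half the 2 GB per core quoted earlier, which is why the system relies on collective algorithms instead.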
Use parallel techniques for I/O
[Diagram: compute nodes connect through the high-speed network to I/O nodes, which serve the parallel file system servers (190 + MDS) and their RAIDs over Fibre Channel, 10.0 GigE servers (50) over 10 Gbit Ethernet, and login servers (10) over 1 Gbit Ethernet.]
• 140 MB/s per Fibre Channel link × 2 × 190 file system servers = 53 GB/s
• 500 MB/s × 50 GigE servers = 25 GB/s
• 1.0 GigE × 10 login servers
[Diagram: node layout showing compute (C), I/O (I), network (N), and login (L) nodes. A minimal parallel I/O sketch follows.]
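As an application-side illustration of "parallel techniques for I/O", here is a minimal MPI-IO sketch in C in which every rank writes a disjoint block of one shared file, so the writes can proceed concurrently through the I/O nodes. The file name and block size are arbitrary, and this is not code from Red Storm.

/* Minimal MPI-IO sketch: each rank writes its own block of a
 * shared file at a rank-dependent offset, so the writes do not
 * serialize behind a single writer. */
#include <mpi.h>
#include <string.h>

#define BLOCK 1024   /* illustrative per-rank block size */

int main(int argc, char **argv)
{
    int rank;
    char buf[BLOCK];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'A' + (rank % 26), BLOCK);  /* rank-tagged payload */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    /* Disjoint offsets let all ranks write concurrently. */
    MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                      MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}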
Conclusion
• Hardware, system software, and application software are all important participants in achieving a high performing system.
• Although Red Storm was originally designed to address the needs of a specific project, its architecture has become a very popular commercial product, the Cray XT3/XT4, deployed around the world.
Number of XT3/XT4 Sites Worldwide
[Chart: cumulative count of sites (0 to 25) by half years since 2005 (1 to 6).]