The Red Storm High Performance Computer
March 19, 2008
Sue Kelly, Sandia National Laboratories
http://www.sandia.gov/~smkelly
Abstract: Sandia National Laboratories has a long history of successfully applying massively parallel processing (MPP) technology to solve problems in the national interest for the US Department of Energy. We drew upon our experiences with numerous architectural and design features when planning the Red Storm computer system. This talk will present the key issues that were considered. Important principles are performance balance between the hardware components and scalability of the system software.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is High Performance Computing?
• (n.) A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors. (http://www.webopedia.com/TERM/H/High_Performance_Computing.html) A minimal code sketch of this idea follows below.
• The idea/premise of parallel processing is not new (http://www.sandia.gov/ASC/news/stories.html#nineteen-twenty-two)
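To make the "little pieces" idea concrete, here is a minimal sketch in C using MPI, the message-passing interface common on MPP systems. The array size, the even partitioning, and the stand-in workload are illustrative choices, not anything prescribed by the talk.

/* Minimal illustration of parallel processing with MPI: each rank
 * sums its own slice of the data, then the partial sums are
 * combined with a collective reduction.
 * Compile: mpicc -o psum psum.c    Run: mpirun -np 4 ./psum */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* total number of elements to sum */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each processor works on its own contiguous chunk;
     * the last rank absorbs any remainder. */
    long chunk = N / size;
    long start = rank * chunk;
    long end   = (rank == size - 1) ? N : start + chunk;

    double local = 0.0;
    for (long i = start; i < end; i++)
        local += (double)i;          /* stand-in for real work */

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", total);
    MPI_Finalize();
    return 0;
}

Each rank executes the same program on a different slice of the data simultaneously, which is exactly the division into "little pieces" the definition describes.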
Red Storm – a First Look
• Sandia/Cray Inc. partnership:
  – Sandia architecture
  – Sandia & Cray system software
  – Cray engineering and manufacturing
  – Sandia systems HW/SW expertise
Red Storm is a Massively Parallel Processor Supercomputer
[Diagram: functional partitions of the system: Service, Compute Partition, Parallel I/O, Net I/O, and Users/home.]
• 12,960 2.4 GHz Dual Core Opterons for computation (called nodes)
• 2 GB Memory per core (in progress)
Key Performance Characteristics that Lead to a Balanced System
• 124.42 TeraFLOPS (trillion floating point operations per second)
• Aggregate system memory bandwidth of 83 TB/s
• Sustained aggregate interconnect bandwidth of 120 TB/s
• High-performance I/O subsystem (minimum sustained file system bandwidth of 100 GB/s to 340 TB of parallel disk storage and sustained external network bandwidth of 50 GB/s)
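One common way to read these numbers is as bytes moved per floating-point operation. The bytes-per-FLOP framing is ours, not the slide's; the following back-of-the-envelope C sketch uses only the figures quoted above.

/* Back-of-the-envelope balance ratios for the figures quoted above.
 * All inputs come from the slide; units are bytes/s and FLOPS. */
#include <stdio.h>

int main(void)
{
    double flops  = 124.42e12; /* 124.42 TeraFLOPS            */
    double mem_bw = 83e12;     /* 83 TB/s memory bandwidth    */
    double net_bw = 120e12;    /* 120 TB/s interconnect       */
    double fs_bw  = 100e9;     /* 100 GB/s file system        */

    printf("memory bytes per FLOP:       %.2f\n", mem_bw / flops);
    printf("interconnect bytes per FLOP: %.2f\n", net_bw / flops);
    printf("file system bytes per FLOP:  %.4f\n", fs_bw / flops);
    return 0;
}

This prints roughly 0.67 memory bytes and 0.96 interconnect bytes per FLOP, the kind of ratio that the talk's notion of "balance" between hardware components refers to.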
Additional Architectural Features
• Scalability: Red Storm’s hardware and system software scale from a single cabinet system to a 32,000 node system.
• Functional Partitioning: Nodes are dedicated to specific functions (service, compute, I/O), and the hardware and system software of each partition are engineered for that role, optimizing the scalability and performance of the system.
• Reliability: Full-system Reliability, Availability, and Serviceability (RAS) is designed into the architecture.
• Upgrade-ability: There is a designed-in path for system upgrades.
• Custom Packaging: Red Storm is designed to be a high density, relatively low power system.
• Price/Performance: It has excellent performance per dollar through the use of high volume commodity parts where feasible.
Job Launch is Hierarchical
[Diagram: hierarchical job launch. A Red Storm user logs in to a login node (running Linux) and starts the application. The PBS server and scheduler (PBS mom, on a PBS node) manage the job queues; the job launcher (yod) requests nodes from the compute node allocator, which consults the CPU inventory database on a database node, and the user application is then fanned out to the compute nodes. A sketch of the fan-out arithmetic follows.]
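The "fan out" step in the diagram is what makes launch hierarchical: each node forwards the application to a bounded number of children rather than one daemon contacting every node, so launch depth grows with the logarithm of the node count. A minimal C sketch of that tree arithmetic follows; the fan-out of 4 and the 64-node count are arbitrary illustrative values, and this is not Red Storm's actual launcher code.

/* Sketch of the tree arithmetic behind a hierarchical (fan-out)
 * launch: node 0 starts the launch and every node forwards it to
 * at most FANOUT children, giving O(log N) launch depth. */
#include <stdio.h>

#define FANOUT 4   /* illustrative branching factor */

int main(void)
{
    int nnodes = 64;  /* illustrative node count */
    for (int id = 0; id < nnodes; id++) {
        int parent = (id == 0) ? -1 : (id - 1) / FANOUT;
        int first_child = id * FANOUT + 1;
        printf("node %2d: parent %2d, children", id, parent);
        for (int c = first_child;
             c < first_child + FANOUT && c < nnodes; c++)
            printf(" %d", c);
        printf("\n");
    }
    return 0;
}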
RAS monitoring is hierarchical
[Diagram: RAS monitoring Ethernet tree. The System Management Workstation (SMW), running RSMS, communicates over Ethernet with L1 controllers at the cabinet level, which in turn manage L0 controllers at the board level; the boards connect to the high-speed network (HSN) via HyperTransport (HT).]
Calculation with Breaks
[Diagram: two timelines, "Calculation with Breaks" and "Calculation with Asynchronous Breaks", showing compute phases Calc 1 through Calc 4 interleaved with waits on a 0 to 6 minute scale.]
Operating System Interruptions Impede Progress of the Application
[Chart: Interruptions of User Applications. Interruptions in ns (y-axis, 0 to 350,000) versus wall time in seconds (x-axis, 0 to 6), comparing Linux and Catamount.]
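Charts like this are produced by sampling how often the operating system steals cycles from a user process. Here is a minimal, hypothetical C sketch of one such measurement: read the clock in a tight loop and count any gap between successive reads that is too large to be the loop itself. The 2 µs threshold and 5 second duration are arbitrary choices, and this is not the benchmark used to produce the chart above.

/* Minimal OS-noise probe: successive clock reads should be nearly
 * back-to-back; a large gap means the OS interrupted the process. */
#include <stdio.h>
#include <time.h>

static long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000L + ts.tv_nsec;
}

int main(void)
{
    const long threshold_ns = 2000;        /* gaps above this count  */
    const long run_ns = 5L * 1000000000L;  /* sample for 5 seconds   */
    long start = now_ns(), prev = start;
    long interruptions = 0, worst = 0;

    while (prev - start < run_ns) {
        long t = now_ns(), gap = t - prev;
        if (gap > threshold_ns) {
            interruptions++;
            if (gap > worst) worst = gap;
        }
        prev = t;
    }
    printf("%ld gaps over %ld ns; worst gap %ld ns\n",
           interruptions, threshold_ns, worst);
    return 0;
}

A lightweight kernel such as Catamount exists precisely to keep this kind of interference off the compute nodes.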
Connection-oriented protocols have to reserve buffers for the worst case
• If each node reserves a 100 KB buffer for each of its peers, that is 1 GB of memory per node for 10,000 processors (the arithmetic is sketched below).
• Need to communicate using collective algorithms
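The first bullet's arithmetic generalizes: per-peer buffer memory grows linearly with machine size, which is why connection-oriented protocols break down at scale. A quick C sketch, using the 100 KB figure from the bullet and node counts up to the 32,000 mentioned in the scalability slide:

/* Worst-case buffer arithmetic: per-peer receive buffers make
 * connection state grow linearly with the number of peers. */
#include <stdio.h>

int main(void)
{
    double buf_per_peer = 100e3;   /* 100 KB reserved per peer */
    int peers[] = { 1000, 10000, 32000 };

    for (int i = 0; i < 3; i++) {
        double bytes = buf_per_peer * peers[i];
        printf("%5d peers -> %4.1f GB of buffers per node\n",
               peers[i], bytes / 1e9);
    }
    return 0;
}

At 10,000 peers the reservation is already 1 GB per node, half the 2 GB per core quoted earlier, which is why the system relies on collective algorithms instead.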
Use parallel techniques for I/O
[Diagram: compute nodes connect through the high-speed network to I/O nodes, which serve the parallel file system servers (190 + MDS) and their RAIDs over Fibre Channel, 10.0 GigE servers (50) over 10 Gbit Ethernet, and login servers (10) over 1 Gbit Ethernet.]
• 140 MB/s per Fibre Channel link × 2 × 190 file system servers = 53 GB/s
• 500 MB/s × 50 GigE servers = 25 GB/s
• 1.0 GigE × 10 login servers
[Diagram: node layout showing compute (C), I/O (I), network (N), and login (L) nodes. A minimal parallel I/O sketch follows.]
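As an application-side illustration of "parallel techniques for I/O", here is a minimal MPI-IO sketch in C in which every rank writes a disjoint block of one shared file, so the writes can proceed concurrently through the I/O nodes. The file name and block size are arbitrary, and this is not code from Red Storm.

/* Minimal MPI-IO sketch: each rank writes its own block of a
 * shared file at a rank-dependent offset, so the writes do not
 * serialize behind a single writer. */
#include <mpi.h>
#include <string.h>

#define BLOCK 1024   /* illustrative per-rank block size */

int main(int argc, char **argv)
{
    int rank;
    char buf[BLOCK];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'A' + (rank % 26), BLOCK);  /* rank-tagged payload */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    /* Disjoint offsets let all ranks write concurrently. */
    MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                      MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}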
Conclusion
• Hardware, system software, and application software are all important participants in achieving a high performing system.
• Although Red Storm was originally designed to address the needs of a specific project, its architecture has become a very popular commercial product, the Cray XT3/XT4, deployed around the world.
Number of XT3/XT4 Sites Worldwide
[Chart: cumulative count of sites (0 to 25) by half years since 2005 (1 to 6).]