
ASCI Terascale Simulation Requirements and Deployments


Page 1: ASCI Terascale Simulation Requirements and Deployments

Oak Ridge Interconnects Workshop - November 1999

Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551

ASCI-00-003.1

ASCI Terascale Simulation Requirements and Deployments

David A. Nowak, ASCI Program Leader

Mark Seager, ASCI Terascale Systems Principal Investigator

Lawrence Livermore National Laboratory, University of California

*Work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.

Page 2: ASCI Terascale Simulation Requirements and Deployments


Overview

• ASCI program background
• Applications requirements
• Balanced terascale computing environment
• Red partnership and CPLANT
• Blue-Mountain partnership
• Blue-Pacific partnership: Sustained Stewardship TeraFLOP/s (SST)
• White partnership
• Interconnect issues for future machines

Page 3: ASCI Terascale Simulation Requirements and Deployments


A successful Stockpile Stewardship Program requires a successful ASCI

[Diagram: the stockpile (B61, W62, W76, W78, W80, B83, W84, W87, W88) supported by Directed Stockpile Work and by the simulation elements Primary Certification, Secondary Certification, Materials Dynamics, Advanced Radiography, Enhanced Surety, Enhanced Surveillance, ICF, Hostile Environments, and Weapons System Engineering.]

Page 4: ASCI Terascale Simulation Requirements and Deployments


ASCI’s major technical elements meet Stockpile Stewardship requirements

[Diagram of ASCI's major technical elements: Defense Applications & Modeling; University Partnerships; Integration; Simulation & Computer Science; Integrated Computing Systems; Applications Development; Physical & Materials Modeling; V&V; DisCom2; PSE; Physical Infrastructure & Platforms; Operation of Facilities; Alliances; Institutes; VIEWS; PathForward.]

Page 5: ASCI Terascale Simulation Requirements and Deployments


Example terascale computing environment in CY00 with ASCI White at LLNL

[Chart: growth of the balanced computing environment from 1996 through 2004, driven by program and application-performance requirements: computing speed (TFLOPS), memory (TeraBytes), memory bandwidth (TeraBytes/s), cache bandwidth (TeraBytes/s), interconnect bandwidth (TeraBytes/s), parallel I/O (GigaBytes/s), archive bandwidth (GigaBytes/s), disk (TeraBytes), and archival storage (PetaBytes).]

ASCI is achieving programmatic objectives, but the computing environment will not be in balance at LLNL for the ASCI White platform.

Computational Resource Scaling for ASCI Physics Applications

• Peak compute: 1 FLOP/s
• Cache bandwidth: 16 Bytes/s per FLOP/s
• Memory: 1 Byte per FLOP/s
• Memory bandwidth: 3 Bytes/s per FLOP/s
• Interconnect bandwidth: 0.5 Bytes/s per FLOP/s
• Local disk: 50 Bytes per FLOP/s
• Parallel I/O bandwidth: 0.002 Bytes/s per FLOP/s
• Archive bandwidth: 0.0001 Bytes/s per FLOP/s
• Archive: 1,000 Bytes per FLOP/s
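For illustration, a minimal Python sketch applying these scaling ratios to a hypothetical 10 TFLOP/s peak machine (the peak value is an assumed example, roughly the scale of ASCI White, not a figure from the slide):

```python
# Sketch: apply the resource-scaling ratios above to a hypothetical peak
# speed. The 10 TFLOP/s value is an assumed example, not taken from the slide.

PEAK_FLOPS = 10e12  # hypothetical peak: 10 TFLOP/s

# Ratios from the slide, per FLOP/s of peak compute.
RATIOS = {
    "cache bandwidth (Bytes/s)":        16,
    "memory (Bytes)":                   1,
    "memory bandwidth (Bytes/s)":       3,
    "interconnect bandwidth (Bytes/s)": 0.5,
    "local disk (Bytes)":               50,
    "parallel I/O bandwidth (Bytes/s)": 0.002,
    "archive bandwidth (Bytes/s)":      0.0001,
    "archive (Bytes)":                  1000,
}

for resource, ratio in RATIOS.items():
    # e.g. memory: 1 Byte per FLOP/s x 10e12 FLOP/s = 10 TB
    print(f"{resource:34s} {ratio * PEAK_FLOPS:.3e}")
```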

Page 6: ASCI Terascale Simulation Requirements and Deployments


SNL/Intel ASCI Red

[System diagram of ASCI Red.]

• Good interconnect bandwidth, but poor memory/NIC bandwidth.
• Kestrel compute board (~2,332 in system): two nodes per board, each with two CPUs and a NIC sharing 256 MB of memory; compute nodes run the Cougar operating system.
• Eagle I/O board (~100 in system): nodes with a CPU plus NIC and a second CPU, 128 MB each; these run the Terascale Operating System (TOS).
• Nodes are cache coherent on-node, but not across a board.
• Interconnect: 38 x 32 switch mesh, 800 MB/s bi-directional per link; shortest hop ~13 µs, longest hop ~16 µs.
• Storage: 6.25 TB classified and 6.25 TB unclassified, 2 GB/s sustained each (total system).
• Aggregate link bandwidth = 1.865 TB/s.
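One plausible reading of the aggregate figure, sketched below, is one 800 MB/s bi-directional link per Kestrel compute board; the per-board accounting is an assumption for illustration, not stated on the slide.

```python
# Sketch: one plausible reading of the 1.865 TB/s aggregate figure, assuming
# one 800 MB/s bi-directional link per Kestrel compute board (an assumption,
# not stated on the slide).

KESTREL_BOARDS = 2332      # ~2,332 Kestrel boards in the system
LINK_BW_MB_S = 800         # 800 MB/s bi-directional per link

aggregate_tb_s = KESTREL_BOARDS * LINK_BW_MB_S / 1e6
print(f"aggregate link bandwidth ~ {aggregate_tb_s:.3f} TB/s")  # ~1.866 TB/s
```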

Page 7: ASCI Terascale Simulation Requirements and Deployments


SNL/Compaq ASCI C-Plant

[Diagram: C-Plant scalable units connected through Myricom switches in a hypercube topology.]

C-Plant is located at Sandia National Laboratories. Currently a hypercube, C-Plant will be reconfigured as a mesh in 2000. C-Plant has 50 scalable units, runs the Linux operating system, and is a "Beowulf" configuration.

Scalable unit:
• 16 "boxes"
• 2 x 16-port Myricom switches
• 160 MB/s each direction, 320 MB/s total

"Box":
• 500 MHz Compaq EV56 processor
• 192 MB SDRAM
• 55 MB NIC
• Serial port
• Ethernet port
• ______ disk space

This is a research project — a long way from being a production system.

Page 8: ASCI Terascale Simulation Requirements and Deployments


LANL/SGI/Cray ASCI Blue Mountain 3.072 TeraOPS Peak

[System diagram of ASCI Blue Mountain.]

• 48 x 128-CPU SMPs = 3 TF; 48 x 32 GB/SMP = 1.5 TB memory; 48 x 1.44 TB/SMP = 70 TB disk
• Inter-SMP compute fabric bandwidth = 48 x 12 HIPPI/SMP = 115,200 MB/s bi-directional
• LAN bandwidth = 48 x 1 HIPPI/SMP = 9,600 MB/s bi-directional, with the HPSS metadata server on the LAN
• Link to LAN and to legacy MXs = 8 HIPPI = 1,600 MB/s bi-directional
• HPSS bandwidth = 4 HIPPI = 800 MB/s bi-directional; 20 (14) HPSS movers = 40 (27) HPSS tape drives of 10-20 MB/s each = 400-800 MB/s bandwidth
• RAID bandwidth = 48 x 10 FC/SMP = 96,000 MB/s bi-directional
• Aggregate link bandwidth = 0.115 TB/s
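A short sketch reproducing the fabric totals above, assuming each HIPPI channel carries 100 MB/s per direction (200 MB/s bi-directional) and each Fibre Channel connection 200 MB/s bi-directional; the per-channel rates are assumptions consistent with the slide's totals, not stated on it.

```python
# Sketch: reproduce the Blue Mountain fabric totals, under the assumed
# per-channel rates noted above.

SMPS = 48
HIPPI_BIDIR_MB_S = 200   # assumed: 100 MB/s each direction
FC_BIDIR_MB_S = 200      # assumed

print(SMPS * 12 * HIPPI_BIDIR_MB_S)  # compute fabric: 115200 MB/s
print(SMPS * 1 * HIPPI_BIDIR_MB_S)   # LAN fabric:       9600 MB/s
print(SMPS * 10 * FC_BIDIR_MB_S)     # RAID fabric:     96000 MB/s
```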

Page 9: ASCI Terascale Simulation Requirements and Deployments


Blue Mountain Planned GSN Compute Fabric

[Diagram: 9 separate 32x32 crossbar switch networks connecting 3 groups of 16 computers each.]

Expected improvements:
• Throughput: 115,200 MB/s => 460,800 MB/s (4x)
• Link bandwidth: 200 MB/s => 1,600 MB/s (8x)
• Round-trip latency: 110 µs => ~10 µs (11x)

Aggregate link bandwidth = 0.461 TB/s
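One plausible accounting for the planned throughput, sketched below, counts all crossbar ports at the quoted GSN link rate; the per-port accounting is an assumption, not spelled out on the slide.

```python
# Sketch: plausible accounting for the planned GSN fabric throughput,
# assuming all 9 x 32 crossbar ports run at the quoted 1,600 MB/s link rate.

NETWORKS = 9
PORTS_PER_NETWORK = 32
GSN_LINK_MB_S = 1600

throughput = NETWORKS * PORTS_PER_NETWORK * GSN_LINK_MB_S
print(throughput)           # 460800 MB/s
print(throughput / 115200)  # 4.0x the HIPPI compute fabric
```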

Page 10: ASCI Terascale Simulation Requirements and Deployments


LLNL/IBM Blue-Pacific: 3.889 TeraOP/s Peak

[Diagram: three SP sectors (S, Y, K), each connected to the HPGN by 24 links, with HiPPI and FDDI connections to external networks.]

Each SP sector is comprised of:
• 488 Silver nodes
• 24 HPGN links

System parameters:
• 3.89 TFLOP/s peak
• 2.6 TB memory
• 62.5 TB global disk

One sector has 2.5 GB/node memory, 24.5 TB global disk, and 8.3 TB local disk; the other two sectors have 1.5 GB/node memory, 20.5 TB global disk, and 4.4 TB local disk each.

SST achieved >1.2 TFLOP/s on sPPM, on a problem >70x larger than ever solved before!

Aggregate link bandwidth = 0.439 TB/s
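A quick consistency check on the aggregate figure, sketched below, assumes one 300 MB/s bi-directional SP link per Silver node (the 300 MB/s link rate appears on the I/O architecture slide that follows); the per-node accounting is an assumption for illustration.

```python
# Sketch: consistency check on the 0.439 TB/s aggregate link bandwidth,
# assuming one 300 MB/s bi-directional SP link per Silver node.

NODES_PER_SECTOR = 488
SECTORS = 3
LINK_BW_MB_S = 300                        # per-node SP link, bi-directional

total_nodes = NODES_PER_SECTOR * SECTORS  # 1,464 Silver nodes
aggregate_tb_s = total_nodes * LINK_BW_MB_S / 1e6
print(f"{total_nodes} nodes -> {aggregate_tb_s:.3f} TB/s")  # ~0.439 TB/s
```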

Page 11: ASCI Terascale Simulation Requirements and Deployments


I/O Hardware Architecture of SST

[Diagram: a 488-node IBM SP sector with 432 Silver compute nodes, 56 GPFS servers, system data and control networks, and 24 SP links to the second-level switch.]

Each SST sector:
• Has local and global I/O file systems
• 2.2 GB/s delivered global I/O performance
• 3.66 GB/s delivered local I/O performance
• Separate SP first-level switches
• Independent command and control
• Link bandwidth = 300 MB/s bi-directional

Full system mode:
• Application launch over the full 1,464 Silver nodes
• 1,048 MPI/US tasks, 2,048 MPI/IP tasks
• High-speed, low-latency communication between all nodes
• Single STDIO interface

Page 12: ASCI Terascale Simulation Requirements and Deployments


Partisn (SN-Method) Scaling

[Two plots: Partisn (SN-method) scaling with a constant number of cells per processor, from 1 to 10,000 processors, comparing RED, Blue Pac (1p/n), Blue Pac (4p/n), Blue Mtn (UPS), Blue Mtn (MPI), Blue Mtn (MPI.2s2r), and Blue Mtn (UPS.offbox).]
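"Constant number of cells per processor" is a weak-scaling study; a minimal sketch of the efficiency metric behind such plots is shown below, with hypothetical timings that are not read from the figures.

```python
# Sketch: weak-scaling efficiency. Ideal behavior keeps runtime flat as
# processors are added. Timings below are hypothetical placeholders only.

timings = {1: 10.0, 64: 11.5, 512: 13.0, 4096: 16.0}  # seconds per step (hypothetical)

t1 = timings[1]
for procs, t in timings.items():
    efficiency = t1 / t   # weak-scaling parallel efficiency
    print(f"{procs:5d} processors: efficiency = {efficiency:.2f}")
```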

Page 13: ASCI Terascale Simulation Requirements and Deployments


The JEEP calculation adds to our understanding of the performance of insensitive high explosives

• This calculation involved 600 atoms (largest number ever at such a high resolution) with 1,920 electrons, using about 3,840 processors

• This simulation provides crucial insight into the detonation properties of IHE at high pressures and temperatures.

• Relevant experimental data (e.g., shock wave data) on hydrogen fluoride (HF) are almost nonexistent because of its corrosive nature.

• Quantum-level simulations, like this one, of HF-H2O mixtures can substitute for such experiments.

Page 14: ASCI Terascale Simulation Requirements and Deployments


Silver Node delivered memory bandwidth is around 150-200 MB/s/process

[Bar chart: delivered memory bandwidth per process (MB/s, 0 to 350) for 1, 2, 4, and 8 processes per node on E4000, E6000, Silver, AS4100, AS8400, Octane, and Onyx2 systems.]

Silver peak Bytes:FLOP/s ratio is 1.3/2.565 = 0.51; Silver delivered B:F ratio is 200/114 = 1.75.
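A rough, STREAM-triad-style sketch of what "delivered MB/s per process" measures is shown below; it is not the benchmark behind the chart above, and the array size is an illustrative choice.

```python
# Sketch: a rough estimate of delivered per-process memory bandwidth using a
# STREAM-triad-style kernel (illustrative only).
import time
import numpy as np

N = 20_000_000                 # ~160 MB per array, well beyond cache
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)

start = time.perf_counter()
a[:] = b + 2.5 * c             # triad: two reads and one write per element
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8        # three arrays of 8-byte doubles
print(f"{bytes_moved / elapsed / 1e6:.0f} MB/s delivered")
```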

Page 15: ASCI Terascale Simulation Requirements and Deployments


MPI_SEND/US delivers low latency and high aggregate bandwidth, but counter-intuitive behavior per MPI task

[Plot: MPI_SEND/US performance (MB/s, 0 to 120) versus message size (0 to 3.5 MB) for One Pair, Two Pair, Four Pair, Two Agg, and Four Agg configurations.]
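A minimal mpi4py ping-pong sketch of the kind of measurement behind this plot is shown below; it is an illustrative stand-in, not the benchmark actually used, and the script name is hypothetical.

```python
# Sketch: a minimal mpi4py ping-pong measuring point-to-point bandwidth
# versus message size for one pair of tasks (ranks 0 and 1).
# Run with, e.g.: mpirun -n 2 python pingpong.py   (script name is hypothetical)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
REPS = 20

for size in (1 << 10, 1 << 15, 1 << 20, 1 << 21):   # 1 KB .. 2 MB
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    start = MPI.Wtime()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    elapsed = MPI.Wtime() - start
    if rank == 0:
        mb_per_s = 2 * REPS * size / elapsed / 1e6   # bytes moved both ways
        print(f"{size:8d} B : {mb_per_s:7.1f} MB/s")
```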

Page 16: ASCI Terascale Simulation Requirements and Deployments


LLNL/IBM White: 10.2 TeraOPS Peak

Aggregate link bandwidth = 2.048 TB/s, five times better than the SST; peak is three times better.

Ratio of Bytes:FLOPS is improving

MuSST (PERF) system:
• 8 PDEBUG nodes with 16 GB SDRAM
• ~484 PBATCH nodes with 8 GB SDRAM
• 12.8 GB/s delivered global I/O performance
• 5.12 GB/s delivered local I/O performance
• 16 Gigabit Ethernet external network
• Up to 8 HIPPI-800

Programming/usage model:
• Application launch over ~492 NH-2 nodes
• 16-way MuSPPA, shared memory, 32-bit MPI
• 4,096 MPI/US tasks
• Likely usage is 4 MPI tasks/node with 4 threads per MPI task (see the sketch below)
• Single STDIO interface
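The sketch below shows how the assumed 4-tasks-by-4-threads usage model fills the 16-way nodes; the resulting totals are simple arithmetic for illustration.

```python
# Sketch: how "4 MPI tasks/node x 4 threads/task" fills 16-way White nodes.
# The ~492-node count is from the slide; the totals are illustrative arithmetic.

NODES = 492
CPUS_PER_NODE = 16
MPI_TASKS_PER_NODE = 4
THREADS_PER_TASK = 4

assert MPI_TASKS_PER_NODE * THREADS_PER_TASK == CPUS_PER_NODE

print(NODES * MPI_TASKS_PER_NODE)                     # 1968 MPI tasks
print(NODES * MPI_TASKS_PER_NODE * THREADS_PER_TASK)  # 7872 CPUs in use
```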

Page 17: ASCI Terascale Simulation Requirements and Deployments


Interconnect issues for future machines: Why Optical?

• Need to increase the Bytes:FLOPS ratio. Memory bandwidth (cache line) utilization will be dramatically lower for codes that use arbitrarily connected meshes and adaptive refinement (indirect addressing); see the sketch after this list.
• Interconnect bandwidth must be increased and latency reduced to allow a broader range of applications and packages to scale well.
• To reach very large configurations (30, 70, 100 TeraOPS), larger SMPs will be deployed. For a fixed B:F interconnect ratio, this means more bandwidth coming out of each SMP.
• Multiple pipes/planes will be used; optical reduces cable count.
• Machine footprint is growing; 24,000 square feet may require optical.
• Network interface paradigm: virtual memory direct memory access, low-latency remote get/put.
• Reliability, Availability, and Serviceability (RAS).
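To illustrate the cache-line utilization point in the first bullet, a minimal NumPy sketch comparing contiguous access with indirect (gathered) access; the array size and random permutation are illustrative choices, not taken from the slide.

```python
# Sketch: contiguous versus indirect (gathered) access over the same data,
# illustrating why cache-line utilization drops for arbitrarily connected
# meshes with indirect addressing.
import time
import numpy as np

N = 10_000_000
a = np.random.rand(N)
index = np.random.permutation(N)   # stand-in for unstructured-mesh connectivity

start = time.perf_counter()
s1 = a.sum()                       # streams whole cache lines
t_contig = time.perf_counter() - start

start = time.perf_counter()
s2 = a[index].sum()                # touches cache lines in effectively random order
t_gather = time.perf_counter() - start

print(f"contiguous: {t_contig:.3f} s, gather: {t_gather:.3f} s "
      f"(~{t_gather / t_contig:.1f}x slower)")
```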

Page 18: ASCI Terascale Simulation Requirements and Deployments


Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551

ASCI-00-003.1

ASCI Terascale Simulation Requirements and Deployments

David A. Nowak, ASCI Program Leader

Mark Seager, ASCI Terascale Systems Principal Investigator

Lawrence Livermore National Laboratory, University of California

*Work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.