SOS7: “Machines Already Operational”
NSF’s Terascale Computing System

SOS-7, March 4-6, 2003
Mike Levine, Pittsburgh Supercomputing Center (PSC)


Page 1: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Pittsburgh Supercomputing Center

SOS7: “Machines Already Operational”
NSF’s Terascale Computing System

SOS-7, March 4-6, 2003
Mike Levine, PSC

Page 2: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Outline

• Overview of TCS, the US-NSF’s Terascale Computing System.
• Answering 3 questions:
  – Is your machine living up to performance expectations? …
  – What is the MTBI? …
  – What is the primary complaint, if any, from users?
• [See also PSC web pages & Rolf’s info.]

Page 3: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Q1: Performance

• Computational and communications performance is very good!
  – Alpha processors & ES45 servers: very good.
  – Quadrics bw & latency: very good.
  – ~74% of peak on Linpack; >76% on LSMS.
• More work needed on disk IO.
• This has been a very easy “port” for most users.
  – Easier than some Cray-to-Cray upgrades.
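To put the efficiency figures in absolute terms, the following is a minimal sketch that multiplies them by the "[6 Tf]" peak quoted on the Compute Nodes slide. It assumes the Linpack and LSMS runs used the full machine, which the slide does not state; the function and variable names are illustrative only.

```python
# Peak taken from the deck's "[6 Tf]" annotation; treat it as approximate.
PEAK_TFLOPS = 6.0
LINPACK_FRACTION = 0.74  # "~74% of peak on Linpack"
LSMS_FRACTION = 0.76     # ">76% on LSMS", so this is a lower bound


def sustained_tflops(peak_tflops: float, fraction: float) -> float:
    """Sustained rate implied by a fraction-of-peak figure."""
    return peak_tflops * fraction


linpack_tflops = sustained_tflops(PEAK_TFLOPS, LINPACK_FRACTION)
lsms_tflops_min = sustained_tflops(PEAK_TFLOPS, LSMS_FRACTION)

print(f"Linpack: about {linpack_tflops:.2f} Tf sustained")
print(f"LSMS: more than {lsms_tflops_min:.2f} Tf sustained")
```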

Page 4: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Q2: MTBI (Monthly Average)

[Figure: bar chart of monthly average MTBI in hours; vertical axis 0.0 to 14.0. Bar values as extracted: 6.3, 8.7, 8.0, 11.3, 10.3, 11.3, 10.1, 9.4, 10.3, 9.9, 11.1. Month labels are not recoverable.]

• Compare with theoretical prediction of 12 hrs.
• Expect further improvement (fixing systematic problems).
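A rough way to compare the chart with the 12-hour prediction is to average the bar values. The sketch below assumes the eleven numbers that survive in the extracted text are the monthly bars; their order and month labels are not recoverable, so only order-independent statistics are computed.

```python
from statistics import mean

# Bar values as they appear in the extracted slide text (hours).
# Assumption: each number is one monthly average; ordering is unknown.
MONTHLY_MTBI_HOURS = [6.3, 8.7, 8.0, 11.3, 10.3, 11.3, 10.1, 9.4, 10.3, 9.9, 11.1]
PREDICTED_MTBI_HOURS = 12.0  # "theoretical prediction of 12 hrs"

average_mtbi = mean(MONTHLY_MTBI_HOURS)
best_month = max(MONTHLY_MTBI_HOURS)
worst_month = min(MONTHLY_MTBI_HOURS)
fraction_of_prediction = average_mtbi / PREDICTED_MTBI_HOURS

print(f"average {average_mtbi:.1f} h, range {worst_month} to {best_month} h")
print(f"average is {fraction_of_prediction:.0%} of the predicted value")
```

On these numbers the best months come close to the prediction while the average stays below it, which fits the slide's expectation of further improvement.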

Page 5: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Time Lost to Unscheduled Events

[Figure: chart of node hours per week lost to unscheduled events (total = 126,000 node hours per week); vertical axis 0 to 4500.]

• Purple: nodes requiring cleanup.
• Worst case is ~3%.
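The 126,000 total on the chart axis is consistent with 750 compute nodes running for a full week. The sketch below reproduces that figure and the node hours implied by the ~3% worst case; taking 750 as the node count is an assumption drawn from the Compute Nodes slide, not from this chart.

```python
NODES = 750                 # assumption: compute-node count from a later slide
HOURS_PER_WEEK = 7 * 24     # 168
WORST_CASE_FRACTION = 0.03  # "Worst case is ~3%"

total_node_hours = NODES * HOURS_PER_WEEK
worst_case_node_hours = WORST_CASE_FRACTION * total_node_hours

print(f"total: {total_node_hours} node hours per week")
print(f"worst week: about {worst_case_node_hours:.0f} node hours lost")
```

The worst-case value falls inside the chart's 0 to 4500 axis range.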

Page 6: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Q3: Complaints

• #1: “I need more time” (not a complaint about performance)
  – Actual usage >80% of wall clock.
  – Some structural improvements still in progress.
  – Not a whole lot more is possible!
• Work needed on
  – Rogue OS activity. [recall Prof. Kale’s comment]
  – MPI & global reduction libraries. [ditto]
  – System debugging and fragility.
  – IO performance. We have delayed full disk deployment to avoid data corruption & instabilities.
  – Node cleanup. We detect & hold out problem nodes until staff clean.
• All in all, the users have been VERY pleased. [ditto]

Page 7: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Full Machine Job

• This system is capable of doing big science.

Page 8: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

TCS (Terascale Computing System) & ETF

• Sponsored by the U.S. National Science Foundation
• Serving the “very high end” for US academic computational science and engineering
  – Designed to be used, as a whole, on single problems. (recall full machine job)
  – Full range of scientific and engineering applications.
• Compaq AlphaServer SC hardware and software technology
• In general production since April, 2002
• #6 in Top 500 (largest open facility in the world: Nov 2001)
• TCS-1: in general production since April, 2002
• Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure)
• DTF project to build and integrate multiple systems
  – NCSA, SDSC, Caltech, Argonne. Multi-lambda, transcontinental interconnect
• ETF aka TeraGrid (Extensible Terascale Facility) integrating TCS with DTF, forming
  – A heterogeneous, extensible scientific/engineering cyberinfrastructure Grid

Page 9: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Infrastructure: PSC - TCS machine room (@ Westinghouse)
(Did not require a new building; just a pipe & wire upgrade; not maxed out)

• ~8k ft²
• Use ~2.5k
• Existing room. (16 yrs old.)

Page 10: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Floor Layout

• Geometrical constraints invariant twixt US & Japan

[Figure: floor layout diagram with areas labeled SWITCH, COMPUTE NODES, SERVERS, DISKS, CONTROL.]

Full System: Physical Structure

Page 11: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Compute Nodes

[Diagram: Terascale Computing System, block labeled Compute Nodes.]

• 750 ES45 4-CPU servers
  – +13 inline spares
  – (+2 login nodes)
• 4 EV68’s /node
  – 1 GHz = 2 Gf [6 Tf]
  – 4 GB memory [3.0 TB]
  – 3*18.2 GB disk [41 TB]
    – System
    – User temporary
    – Fast snapshots
  – [~90 GB/s]
• Tru64 Unix
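The bracketed totals on this slide follow from the per-node figures. The sketch below recomputes them, assuming the totals count only the 750 compute nodes (not the 13 spares or 2 login nodes) and that decimal units are used (1 TB = 1000 GB); both are assumptions, and the names are illustrative.

```python
# Per-node figures as read from the slide.
NODES = 750
CPUS_PER_NODE = 4
GFLOPS_PER_CPU = 2.0        # "1 GHz = 2 Gf"
MEMORY_GB_PER_NODE = 4
DISKS_PER_NODE = 3
DISK_GB = 18.2
AGGREGATE_DISK_BW_GB_S = 90.0  # "[~90 GB/s]"

GB_PER_TB = 1000  # assumption: decimal units

total_cpus = NODES * CPUS_PER_NODE
peak_tflops = total_cpus * GFLOPS_PER_CPU / 1000
memory_tb = NODES * MEMORY_GB_PER_NODE / GB_PER_TB
node_disk_tb = NODES * DISKS_PER_NODE * DISK_GB / GB_PER_TB
disk_bw_mb_s_per_node = AGGREGATE_DISK_BW_GB_S * 1000 / NODES

print(f"{total_cpus} CPUs, {peak_tflops:.1f} Tf peak")
print(f"{memory_tb:.1f} TB memory, {node_disk_tb:.2f} TB node disk")
print(f"about {disk_bw_mb_s_per_node:.0f} MB/s of local disk bandwidth per node")
```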

Page 12: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

ES45 nodes

• 5 nodes per cabinet
• 3 local disks /node

Page 13: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Quadrics Network

[Diagram: Terascale Computing System, blocks labeled Quadrics and Compute Nodes.]

• 2 “rails”
  – Higher bandwidth (~250 MB/s/rail)
  – Lower latency: 2.5 µs put latency
• 1 NIC/node/rail
• Federated switch (/rail)
• “Fat-tree” (bbw ~0.2 TB/s)
• User virtual memory mapped
• Hardware retry
• Heterogeneous
  – (Alpha Tru64 & Linux, Intel Linux)

Page 14: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Central Switch Assembly

• 20 cabinets in center
• Minimize max internode distance
• 3 out of 4 rows shown
• 21st LL switch, outside (not shown)

Page 15: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Quadrics wiring overhead (view towards ceiling)

Page 16: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Management & Control

[Diagram: Terascale Computing System, blocks labeled Quadrics, Control, LAN, Compute Nodes.]

• Quadrics switch control:
  – Internal SBC & Ethernet
  – “Insight Manager” on PC’s
• Dedicated systems
• Cluster/node monitoring & control
  – RMS database
  – Ethernet &
  – Serial Link

Page 17: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Interactive Nodes

[Diagram: Terascale Computing System, blocks labeled Quadrics, Control, LAN, Compute Nodes, WAN/LAN, Interactive /usr.]

• Dedicated: 2*ES45
• +8 on compute nodes
• Shared function nodes
• User access
• Gigabit Ethernet to WAN
• Quadrics connected
• /usr & indexed store (ISMS)

Page 18: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

File Servers

[Diagram: Terascale Computing System, blocks labeled Quadrics, Control, LAN, Compute Nodes, File Servers /tmp, WAN/LAN, Interactive /usr.]

• 64, on compute nodes
  – 0.47 TB/server [30 TB]
  – ~500 MB/s [~32 GB/s]
• Temporary user storage
• Direct IO
• /tmp
• [Each server has
  – 24 disks on
  – 8 SCSI chains on
  – 4 controllers
  – sustain full drive bw.]
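The bracketed file-server totals can be checked the same way. This is a minimal sketch assuming decimal units and an even spread of capacity and bandwidth across each server's 24 disks; the per-disk numbers are derived here for illustration and do not appear on the slide.

```python
SERVERS = 64
TB_PER_SERVER = 0.47
MB_S_PER_SERVER = 500.0   # "~500 MB/s"
DISKS_PER_SERVER = 24
SCSI_CHAINS_PER_SERVER = 8
CONTROLLERS_PER_SERVER = 4

total_tb = SERVERS * TB_PER_SERVER
total_gb_s = SERVERS * MB_S_PER_SERVER / 1000

# Derived values, assuming an even split across disks (not stated on the slide).
disks_per_chain = DISKS_PER_SERVER // SCSI_CHAINS_PER_SERVER
chains_per_controller = SCSI_CHAINS_PER_SERVER // CONTROLLERS_PER_SERVER
mb_s_per_disk = MB_S_PER_SERVER / DISKS_PER_SERVER

print(f"{total_tb:.1f} TB total, {total_gb_s:.0f} GB/s aggregate")
print(f"{disks_per_chain} disks per chain, {chains_per_controller} chains per controller")
print(f"about {mb_s_per_disk:.1f} MB/s needed from each disk")
```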

Page 19: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Summary

• 750+ ES45 Compute Nodes
• 3000 EV68 CPU’s @ 1 GHz
• 6 Tf
• 3 TB memory
• 41 TB node disk, ~90 GB/s
• Multi-rail fat-tree network
• Redundant monitor/ctrl
• WAN/LAN accessible
• File servers: 30 TB, ~32 GB/s
• Buffer disk store, ~150 TB
• Parallel visualization
• Mass store, ~1 TB/hr, > 1 PB
• ETF coupled (hetero)

[Diagram: Terascale Computing System, blocks labeled Quadrics, Control, LAN, Compute Nodes, File Servers /tmp, WAN/LAN, Interactive /usr.]

Page 20: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Visualization

[Diagram: Terascale Computing System. Blocks: Quadrics, TCS, Application Gateways, Viz, Buffer Disk. Link labels: 340 GB/s (1520Q); 4.5 GB/s (20Q); 3.6 GB/s (16Q); 3.6 GB/s (16Q).]

• Intel/Linux
• Newest software
• ~16 nodes
• Parallel rendering
• HW/SW compositing
• Quadrics connected
• Image output
• Web pages +
• WAN coupled

Page 21: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Buffer Disk & HSM

• Quadrics coupled (~225 MB/s/link)
• Intermediate between TCS & HSM
• Independently managed.
• Private transport from TCS.

[Diagram: Terascale Computing System. Blocks: Quadrics, TCS, Application Gateways, Viz, Buffer Disk, HSM - LSCi, Archive disk, WAN/LAN & SDSC. Link labels: 340 GB/s (1520Q); 4.5 GB/s (20Q); 3.6 GB/s (16Q); 3.6 GB/s (16Q); >360 MB/s to tape.]
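The bandwidth labels on the diagram appear to be link counts multiplied by the ~225 MB/s per-link rate. The sketch below checks that reading; interpreting "1520Q", "20Q" and "16Q" as numbers of Quadrics links is an assumption, as is the use of decimal units. It also converts the tape rate to TB per hour for comparison with the "~1 TB/hr" mass-store figure on the Summary slide.

```python
MB_S_PER_LINK = 225.0  # "~225 MB/s/link"


def aggregate_gb_s(links: int, mb_s_per_link: float = MB_S_PER_LINK) -> float:
    """Aggregate bandwidth in GB/s for a given number of links (decimal units)."""
    return links * mb_s_per_link / 1000


# Assumption: the number before "Q" is a count of Quadrics links.
tcs_gb_s = aggregate_gb_s(1520)     # diagram label: 340 GB/s
twenty_q_gb_s = aggregate_gb_s(20)  # diagram label: 4.5 GB/s
sixteen_q_gb_s = aggregate_gb_s(16) # diagram label: 3.6 GB/s

# Tape rate: ">360 MB/s to tape" expressed per hour.
TAPE_MB_S = 360.0
tape_tb_per_hour = TAPE_MB_S * 3600 / 1_000_000

print(f"1520 links: {tcs_gb_s:.0f} GB/s")
print(f"20 links: {twenty_q_gb_s:.1f} GB/s, 16 links: {sixteen_q_gb_s:.1f} GB/s")
print(f"tape: more than {tape_tb_per_hour:.2f} TB/hr")
```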

Page 22: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

Application Gateways

• Quadrics coupled (~225 MB/s/link)
• Coupled to ETF backbone by GigE
  – 30 Gb/s

[Diagram: Terascale Computing System. Blocks: Quadrics, TCS, Application Gateways, Viz, Buffer Disk. Link labels: 340 GB/s (1520Q); 4.5 GB/s (20Q); 3.6 GB/s (16Q); 3.6 GB/s (16Q); Multi GigE to ETF Backbone @ 30 Gb/s.]
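To compare the two sides of the gateways, the backbone rate has to be converted from bits to bytes. The sketch below does that and sets it against a 16-link Quadrics connection at ~225 MB/s per link; which diagram label belongs to the gateways is not recoverable from the extracted text, so the 16-link figure is used only as an illustrative assumption.

```python
BACKBONE_GBIT_S = 30.0   # "Multi GigE to ETF Backbone @ 30 Gb/s"
GIGE_GBIT_S = 1.0        # nominal rate of one Gigabit Ethernet link
MB_S_PER_QUADRICS_LINK = 225.0
ASSUMED_QUADRICS_LINKS = 16  # assumption: one of the "(16Q)" labels

backbone_gb_s = BACKBONE_GBIT_S / 8                      # bits to bytes
gige_links_needed = int(BACKBONE_GBIT_S / GIGE_GBIT_S)   # at nominal line rate
quadrics_gb_s = ASSUMED_QUADRICS_LINKS * MB_S_PER_QUADRICS_LINK / 1000

print(f"backbone: {backbone_gb_s:.2f} GB/s over {gige_links_needed} GigE links")
print(f"Quadrics side (assumed 16 links): {quadrics_gb_s:.1f} GB/s")
```

Under that assumption the two sides are roughly matched, so neither network would be an obvious bottleneck for the other.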

Page 23: SOS7: “Machines Already Operational” NSF’s Terascale Computing System

The Front Row

Yes, those are Pittsburgh sports’ colors.