HPCx Annual Seminar 2006
Cray XT3 for Science
David Tanqueray, Cray UK Limited
dt@cray.com
Page 2
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 3
Supercomputing is all we do
Sustained Gigaflop
• Achieved in 1988 on a Cray Y-MP
• Static finite element analysis
• 1 Gigaflop/sec on a Cray Y-MP with 8 processors
• 1988 Gordon Bell prize winner
Phong Vu, Cray Research; Horst Simon, NASA Ames; Cleve Ashcraft, Yale University; Roger Grimes, Boeing Computer Services; John Lewis, Boeing Computer Services; Barry Peyton, Oak Ridge National Laboratory

Sustained Teraflop
• Achieved in 1998 on a Cray T3E
• LSMS: locally self-consistent multiple scattering method
• Metallic magnetism simulation for understanding thin-film disc drive read heads, and magnets used in motors and power generation
• 1.02 Teraflops on a Cray T3E-1200E with 1480 processors
• 1998 Gordon Bell prize winner
B. Ujfalussy, Xindong Wang, Xiaoguang Zhang, D. M. C. Nicholson, W. A. Shelton, and G. M. Stocks, Oak Ridge National Laboratory; A. Canning, NERSC, Lawrence Berkeley National Laboratory; Yang Wang, Pittsburgh Supercomputing Center; B. L. Gyorffy, H. H. Wills Physics Laboratory, University of Bristol

Sustained Petaflop
• Goal by 2010
• Cascade programme
• 1st Petaflop order for Cray from ORNL
Page 4
Cray Scientific Customers

Sandia National Laboratories: Red Storm System
• 41.5 TFLOP peak performance
• 140 cabinets
• 11,648 AMD Opteron™ processors
• 10 TB DDR memory
• 240 TB of disk storage
• Approximately 3,000 ft²

Oak Ridge National Laboratory: National Leadership Computing Facility
• Cray-ORNL selected by DOE for the National Leadership Computing Facility (NLCF)
• Goal: build the most powerful supercomputer in the world
• 250-teraflop capability by 2007
• 50-100 TFLOP sustained performance on challenging scientific applications
• Cray X1/X1E and Cray XT3

Pittsburgh Supercomputing Center: Cray XT3 named "Big Ben"
• Peak performance of 10 TeraFlops
• 2,000 AMD Opteron processors
• Applications: protein simulations, storm forecasting, global climate modeling, earthquake simulations
Page 5
Cray Scientific Customers

Swiss National Supercomputing Centre (CSCS)
• First Cray XT3 in Europe
• "Horizon" is a joint initiative with the Paul Scherrer Institut (PSI)
• Expanded to 18 cabinets, 8.6 TF, in August 2006
• Highly used: 90+% node utilization, 3x oversubscription, typical jobs using 64-256 CPUs
• Applications: material science, environmental science, life sciences, astronomy, chemistry

AWE
• Cray XT3 system, dual-core, 40 TeraFlops
• Applications: weapons physics, material science, engineering
Page 6
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 7
The Cray XT3 Supercomputer

MPP Architecture
• Scales from 256 to 60,000 processing cores

Purpose-built Interconnect
• 6.4 GB/s per processor socket delivers scalable performance

Scalable Application Performance
• Lightweight kernel enables scalability and fine-grain synchronization
• MPP job management and scheduling

Scalable I/O
• Up to 100 GB/s
• Private and shared global parallel file systems

Reliability at Scale
• 400 hours MTBF at 1,000 processors; one moving part, redundancy model (a rough per-processor reading of this figure is sketched after this slide)
• Built-in RAS for system management

Scalable by Design: every aspect of the system supports applications that use hundreds or thousands of processors simultaneously on the same problem.
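Reading the reliability figure above as a rough per-processor number (this assumes independent, exponentially distributed processor failures, a modelling assumption of mine rather than anything stated on the slide), a 400-hour system MTBF across 1,000 processors implies:

$\mathrm{MTBF}_{\text{processor}} \approx N \times \mathrm{MTBF}_{\text{system}} = 1000 \times 400\ \text{h} = 4 \times 10^{5}\ \text{h} \approx 45\ \text{years per processor}$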
Page 8
Cray XT3 Processing Element: Measured Performance
Figure: measured bandwidths and latency for an XT3 processing element. Six network links, each >3 GB/s per direction (7.6 GB/s peak per link); sustained figures of 6.5 GB/s, 5.7 GB/s (1.1 bytes/flop) and 2.17 GB/s (0.42 bytes/flop); 51.6 ns measured latency (the bytes/flop figures are unpacked below).
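The bytes-per-flop ratios in the figure are consistent with dividing each sustained bandwidth by a peak of about 5.2 GFLOP/s per socket (a 2.6 GHz Opteron retiring two floating-point results per clock); that peak figure is my assumption for the arithmetic, not a number from the slide:

$5.7\ \text{GB/s} \div 5.2\ \text{GFLOP/s} \approx 1.1\ \text{bytes/flop}, \qquad 2.17\ \text{GB/s} \div 5.2\ \text{GFLOP/s} \approx 0.42\ \text{bytes/flop}$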
Page 9
Scalable Software Architecture: UNICOS/lc
• Microkernel on compute PEs, full-featured Linux on service PEs
• Contiguous memory layout used on compute processors to streamline communications
• Service PEs specialize by function
• Software architecture eliminates OS "jitter" and enables reproducible run times
• Large machines boot in under 30 minutes, including the filesystem
• Job launch time is a couple of seconds on 1000s of PEs

Figure: compute PEs form the compute partition; login, network, system and I/O PEs are specialized Linux nodes forming the service partition.
Page 10
Programming Environment
• The Portland Group (PGI) compilers (unmodified from the Linux version)
• High-performance MPI library (a minimal usage sketch follows below)
• SHMEM library
• AMD math libraries (ACML 2.6)
• CrayPat & Apprentice2 performance tools
• Etnus TotalView debugger available
• Static binaries only
• UPC support coming
• Support for the GNU compilers as well; PathScale support is being added
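As a concrete illustration of the MPI bullet above, here is a minimal C program of the kind one would compile with the XT3 programming environment and run across compute PEs; it is a generic sketch written for this summary, not code from the presentation.

/* Minimal MPI sketch: each rank passes its rank number to the next rank
 * in a ring and reports what it received. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, recv = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dest = (rank + 1) % size;          /* neighbour to send to */
    int src  = (rank - 1 + size) % size;   /* neighbour to receive from */

    /* Combined send/receive avoids deadlock in the ring exchange. */
    MPI_Sendrecv(&rank, 1, MPI_INT, dest, 0,
                 &recv, 1, MPI_INT, src, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d of %d received %d\n", rank, size, recv);

    MPI_Finalize();
    return 0;
}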
Page 11
Cray Apprentice2 views:
• Call Graph Profile
• Time Line View
• Communication Activity View
• Communication Overview
• Pair-wise Communication View
Page 12
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 14
“Hood” Program
Next-Generation MPP Compute Blade
• Product evolution for both Cray XT3 and Cray XD1 customers
• Processor: next-generation AMD processor socket; dual-core with a multi-core upgrade later
• Memory: DDR2-667
• Interface: HyperTransport 1.0; Rainier infrastructure
• Interconnect: next-generation SeaStar ASIC
• Packaging/Cooling: air cooled; XT3 (96 sockets per cabinet)
Page 15
Rainier Infrastructure
Cray's next-generation products will rely on a common "Rainier" infrastructure:
• Opteron-based service & I/O (SIO) blades
• SeaStar network
• Single local file system
• Single point of login
• Single point of administration

Delivered with one or more types of compute resources:
• Hood compute blades (scalar)
• BlackWidow compute cabinets (vector)
• Eldorado compute blades (multi-threading)

Provides multiple architectures to users in a single system:
• Single administrative and user environment
• Common infrastructure means more budget can go towards compute resources
• A major step on the road to adaptive supercomputing

The Rainier infrastructure will allow customers to "mix-and-match" compute resources.
Page 18
DARPA Cascade Project: Advanced Research Program

Goal of a "trans-petaflops system": robust, easier to program, more broadly applicable.

Phase I
• Started in June 2002, for one year
• Five vendors in total
• University partners

Phase II
• Awarded in June 2003
• $49.9M to Cray
• Two other vendors
• Three-year contracts

Phase III
• Proposal submissions in May 2006
• Award to one or two vendors

"The cycle time from when engineers have an idea to when they have a program ready to run is one of the bottlenecks in high-end computers, and it will only become worse as we develop bigger and bigger machines."
- Robert Graybill, HPCS Program
Page 19
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 20
Cray XT3 for Science
Material Science
• Atomic modelling: LSMS at Pittsburgh Supercomputing Center
• Nanoparticle research: behaviour of FePt nanoparticles at ORNL

Fusion
• Plasma simulation: behaviour of plasma in a tokamak

Earth Science
• Understanding earthquakes: earthquake simulation at PSC
• Global climate modelling: ECHAM at Max Planck
Page 21
Atomic Modeling Code Performance on Cray XT3
Figure: LSMS performance on the Cray XT3 ("Big Ben") at PSC. Total performance (GFLOPS, 0-10,000) versus number of atoms (nodes, 0-2000), comparing GFLOPS for the run against perfect scaling and estimated performance. The run achieved 8.03 TFLOPS on 2048 nodes and 8.09 TFLOPS on 2068 nodes.

As processors are added to the problem, efficiency stays high (a quick per-node check follows below).
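As a check on the flat-efficiency claim (my arithmetic from the two quoted data points, not from the slide), the per-node rate is essentially unchanged:

$8030\ \text{GFLOPS} / 2048\ \text{nodes} \approx 3.92\ \text{GFLOPS per node}, \qquad 8090\ \text{GFLOPS} / 2068\ \text{nodes} \approx 3.91\ \text{GFLOPS per node}$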
Page 22
Nanoparticle Research for Next-Generation Storage using a Cray XT3 at Oak Ridge National Laboratory

The Science Challenge
• Potential to develop revolutionary new magnetic storage media
• Need a realistic thermodynamic description of the magnetic behavior of FePt nanoparticles

The Computational Challenge
• The ability to rapidly perform the magnetic energy evaluations for nanoparticles containing thousands of atoms
• Requires linearly scaling codes, efficient numerical methods, and fast communication between processors

HPC Solution
• Cray XT3, 25 TFlops
• Nanoparticles contain thousands of atoms

Sets a world record: the first time an FePt particle with 2662 atoms has been simulated.
Page 23
Largest-Ever AORSA Simulation: 3072 processors of the NCCS Cray XT3

The Science Challenge
• Simulate the behavior of plasma in a tokamak, the core of the multinational fusion reactor ITER

The Computational Challenge
• The code, AORSA, solves Maxwell's equations, describing the behavior of electric and magnetic fields and their interaction with matter, for hot plasma in tokamak geometry (the standard time-harmonic form is recalled below)

HPC Solution
• In August 2005, Oak Ridge National Laboratory researcher Fred Jaeger performed the largest AORSA run to date
• Utilized 3072 processors, roughly 60% of the entire Cray XT3

Figure: AORSA on the Cray XT3 "Jaguar" system compared with Seaborg, an IBM Power3 system. The columns represent execution phases of the code; Aggregate is the total wall time, with Jaguar showing more than a factor of 3 improvement over Seaborg.
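For readers wanting the equation behind the bullet above: wave solvers of this kind work with the time-harmonic form of Maxwell's equations, a wave equation for the electric field driven by the antenna and plasma currents. This is standard background rather than a formula taken from the slide, and the plasma current term depends (nonlocally, for a hot plasma) on the field itself:

$-\nabla \times \nabla \times \mathbf{E} + \dfrac{\omega^{2}}{c^{2}}\,\mathbf{E} = -\,i\,\omega\mu_{0}\left(\mathbf{J}_{\text{plasma}} + \mathbf{J}_{\text{antenna}}\right)$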
Page 26
Understanding Earthquakes using "Big Ben", the Cray XT3 at Pittsburgh Supercomputing Center

Figures: significant earthquakes; modeling of wave propagation.

The Science Challenge
• Simulate a magnitude 7.7 southern California earthquake on a fault centered over a 230 km portion of the San Andreas
• Predict the worst-hit locations and inform changes to building codes
• Take into account multiple different soil layers and complex ground motion

The Computational Challenge: a new simulation
• Models the impact on a larger area
• Higher vibration frequencies, where most damage is expected

HPC Solution
• "Big Ben", a Cray XT3 with 2090 AMD Opteron processors

Accurately forecast ground motion, ultimately saving lives.
Page 27
WRF: Hurricane Katrina
• 48-hour WRF forecast run of Hurricane Katrina
• Domain size: 480 x 400 x 31
• 4 km grid resolution
• Integration timestep: 20 seconds
• Output: once an hour
• Runtime: 49 minutes on 240 Cray XT3 processors (see the quick arithmetic below)
• Image shows water vapor content 2 km above the ground
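A quick consistency check on these figures (my arithmetic, not from the slide): a 48-hour forecast at a 20-second timestep is 8,640 integration steps, so a 49-minute runtime on 240 processors is roughly a third of a second of wall time per step, i.e. the forecast ran about 59 times faster than real time.

$\dfrac{48 \times 3600\ \text{s}}{20\ \text{s/step}} = 8640\ \text{steps}, \qquad \dfrac{49 \times 60\ \text{s}}{8640\ \text{steps}} \approx 0.34\ \text{s/step}$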
Page 29
Climate Modeling using the Cray XT3: Performance Evaluation at the Max Planck Institute for Meteorology

The Science Challenge
• Increase our ability to understand, detect and eventually predict the human influence on climate
• Most scientists agree that global warming is influenced by human activity

The Computational Challenge
• Advance the global climate modeling code to 50 km grid spacing and 60 vertical levels

HPC Solution
• A Cray XT3 with thousands of processors ran ECHAM5 at a record speed of 1.4 Tflops

The Cray XT3 enabled the fastest-ever ECHAM5 performance.

"MPI-M estimates that a Cray XT3 would make it possible to complete their next-generation IPCC assessment runs in about the same real time as today, despite requiring 120 times more computation. This advance promises to significantly improve the scale and scope of the analysis researchers will be able to submit for the next assessment report of the IPCC."
- Dr. Annette Kirk, Max Planck Institute for Meteorology
- http://idw-online.de/pages/de/news136278
Page 30
ECHAM5 T255L60 Performance on XT3
Figure: time per timestep (seconds, 0.0-0.7) and sustained Gflops (0-1400) versus CPU count (384, 768, 1536, 2048 and 3072 CPUs).