HPCx Annual Seminar 2006
Cray XT3 for Science
David Tanqueray, Cray UK Limited
dt@cray.com
Page 2
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 3
Supercomputing is all we do
Sustained Gigaflop
• Achieved in 1988 on a Cray Y-MP
• Static finite element analysis
• 1 Gigaflop/sec on a Cray Y-MP with 8 processors
• 1988 Gordon Bell prize winner
Phong Vu, Cray Research; Horst Simon, NASA Ames; Cleve Ashcraft, Yale University; Roger Grimes, Boeing Computer Services; John Lewis, Boeing Computer Services; Barry Peyton, Oak Ridge National Laboratory

Sustained Teraflop
• Achieved in 1998 on a Cray T3E
• LSMS: locally self-consistent multiple scattering method
• Metallic magnetism simulation for understanding thin-film disc drive read heads, and magnets used in motors and power generation
• 1.02 Teraflops on a Cray T3E-1200E with 1480 processors
• 1998 Gordon Bell prize winner
B. Ujfalussy, Xindong Wang, Xiaoguang Zhang, D. M. C. Nicholson, W. A. Shelton, and G. M. Stocks, Oak Ridge National Laboratory; A. Canning, NERSC, Lawrence Berkeley National Laboratory; Yang Wang, Pittsburgh Supercomputing Center; B. L. Gyorffy, H. H. Wills Physics Laboratory, University of Bristol

Sustained Petaflop
• Goal by 2010
• Cascade programme
• 1st Petaflop order for Cray from ORNL
Page 4
Cray Scientific Customers

Sandia National Laboratories: Red Storm System
• 41.5 TFLOP peak performance
• 140 cabinets
• 11,648 AMD Opteron™ processors
• 10 TB DDR memory
• 240 TB of disk storage
• Approximately 3,000 ft²

Oak Ridge National Laboratory: National Leadership Computing Facility
• Cray-ORNL selected by DOE for the National Leadership Computing Facility (NLCF)
• Goal: build the most powerful supercomputer in the world
• 250-teraflop capability by 2007
• 50-100 TFLOP sustained performance on challenging scientific applications
• Cray X1/X1E and Cray XT3

Pittsburgh Supercomputing Center: Cray XT3 named "Big Ben"
• Peak performance of 10 TeraFlops
• 2,000 AMD Opteron processors
• Applications: protein simulations, storm forecasting, global climate modeling, earthquake simulations
Page 5
Cray Scientific Customers

Swiss National Supercomputing Centre (CSCS)
• First Cray XT3 in Europe
• "Horizon" is a joint initiative with the Paul Scherrer Institut (PSI)
• Expanded to 18 cabinets, 8.6 TF, in August 2006
• Highly used: 90+% node utilization, 3x oversubscription, typical jobs using 64-256 CPUs
• Applications: material science, environmental science, life sciences, astronomy, chemistry

AWE
• Cray XT3 system, dual-core, 40 TeraFlops
• Applications: weapons physics, material science, engineering
Page 6
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 7
The Cray XT3 Supercomputer

MPP Architecture
• Scales from 256 to 60,000 processing cores

Purpose-built Interconnect
• 6.4 GB/s per processor socket delivers scalable performance

Scalable Application Performance
• Lightweight kernel enables scalability and fine-grain synchronization
• MPP job management and scheduling

Scalable I/O
• Up to 100 GB/s
• Private and shared global parallel file systems

Reliability at Scale
• 400 hours MTBF at 1,000 processors; one moving part, redundancy model (a rough per-processor reading of this figure is sketched after this slide)
• Built-in RAS for system management

Scalable by Design: every aspect of the system supports applications that use hundreds or thousands of processors simultaneously on the same problem.
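Reading the reliability figure above as a rough per-processor number (this assumes independent, exponentially distributed processor failures, a modelling assumption of mine rather than anything stated on the slide), a 400-hour system MTBF across 1,000 processors implies:

$\mathrm{MTBF}_{\text{processor}} \approx N \times \mathrm{MTBF}_{\text{system}} = 1000 \times 400\ \text{h} = 4 \times 10^{5}\ \text{h} \approx 45\ \text{years per processor}$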
Page 8
Cray XT3 Processing Element: Measured Performance
Figure: measured bandwidths and latency for an XT3 processing element. Six network links, each >3 GB/s per direction (7.6 GB/s peak per link); sustained figures of 6.5 GB/s, 5.7 GB/s (1.1 bytes/flop) and 2.17 GB/s (0.42 bytes/flop); 51.6 ns measured latency (the bytes/flop figures are unpacked below).
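The bytes-per-flop ratios in the figure are consistent with dividing each sustained bandwidth by a peak of about 5.2 GFLOP/s per socket (a 2.6 GHz Opteron retiring two floating-point results per clock); that peak figure is my assumption for the arithmetic, not a number from the slide:

$5.7\ \text{GB/s} \div 5.2\ \text{GFLOP/s} \approx 1.1\ \text{bytes/flop}, \qquad 2.17\ \text{GB/s} \div 5.2\ \text{GFLOP/s} \approx 0.42\ \text{bytes/flop}$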
Page 9
Scalable Software Architecture: UNICOS/lc
• Microkernel on compute PEs, full-featured Linux on service PEs
• Contiguous memory layout used on compute processors to streamline communications
• Service PEs specialize by function
• Software architecture eliminates OS "jitter" and enables reproducible run times
• Large machines boot in under 30 minutes, including the filesystem
• Job launch time is a couple of seconds on 1000s of PEs

Figure: compute PEs form the compute partition; login, network, system and I/O PEs are specialized Linux nodes forming the service partition.
Page 10
Programming Environment
• The Portland Group (PGI) compilers (unmodified from the Linux version)
• High-performance MPI library (a minimal usage sketch follows below)
• SHMEM library
• AMD math libraries (ACML 2.6)
• CrayPat & Apprentice2 performance tools
• Etnus TotalView debugger available
• Static binaries only
• UPC support coming
• Support for the GNU compilers as well; PathScale support is being added
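As a concrete illustration of the MPI bullet above, here is a minimal C program of the kind one would compile with the XT3 programming environment and run across compute PEs; it is a generic sketch written for this summary, not code from the presentation.

/* Minimal MPI sketch: each rank passes its rank number to the next rank
 * in a ring and reports what it received. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, recv = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dest = (rank + 1) % size;          /* neighbour to send to */
    int src  = (rank - 1 + size) % size;   /* neighbour to receive from */

    /* Combined send/receive avoids deadlock in the ring exchange. */
    MPI_Sendrecv(&rank, 1, MPI_INT, dest, 0,
                 &recv, 1, MPI_INT, src, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d of %d received %d\n", rank, size, recv);

    MPI_Finalize();
    return 0;
}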
Page 11
Cray Apprentice2 views:
• Call Graph Profile
• Time Line View
• Communication Activity View
• Communication Overview
• Pair-wise Communication View
Page 12
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 14
“Hood” Program
Next-Generation MPP Compute Blade
• Product evolution for both Cray XT3 and Cray XD1 customers
• Processor: next-generation AMD processor socket; dual-core with a multi-core upgrade later
• Memory: DDR2-667
• Interface: HyperTransport 1.0; Rainier infrastructure
• Interconnect: next-generation SeaStar ASIC
• Packaging/Cooling: air cooled; XT3 (96 sockets per cabinet)
Page 15
Rainier Infrastructure
Cray's next-generation products will rely on a common "Rainier" infrastructure:
• Opteron-based service & I/O (SIO) blades
• SeaStar network
• Single local file system
• Single point of login
• Single point of administration

Delivered with one or more types of compute resources:
• Hood compute blades (scalar)
• BlackWidow compute cabinets (vector)
• Eldorado compute blades (multi-threading)

Provides multiple architectures to users in a single system:
• Single administrative and user environment
• Common infrastructure means more budget can go towards compute resources
• A major step on the road to adaptive supercomputing

The Rainier infrastructure will allow customers to "mix-and-match" compute resources.
Page 18
DARPA Cascade Project: Advanced Research Program

Goal of a "trans-petaflops system": robust, easier to program, more broadly applicable.

Phase I
• Started in June 2002, for one year
• Five vendors in total
• University partners

Phase II
• Awarded in June 2003
• $49.9M to Cray
• Two other vendors
• Three-year contracts

Phase III
• Proposal submissions in May 2006
• Award to one or two vendors

"The cycle time from when engineers have an idea to when they have a program ready to run is one of the bottlenecks in high-end computers, and it will only become worse as we develop bigger and bigger machines."
- Robert Graybill, HPCS Program
Page 19
Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications
Page 20
Cray XT3 for Science
Material Science
• Atomic modelling: LSMS at Pittsburgh Supercomputing Center
• Nanoparticle research: behaviour of FePt nanoparticles at ORNL

Fusion
• Plasma simulation: behaviour of plasma in a tokamak

Earth Science
• Understanding earthquakes: earthquake simulation at PSC
• Global climate modelling: ECHAM at Max Planck
Page 21
Atomic Modeling Code Performance on Cray XT3
Figure: LSMS performance on the Cray XT3 ("Big Ben") at PSC. Total performance (GFLOPS, 0-10,000) versus number of atoms (nodes, 0-2000), comparing GFLOPS for the run against perfect scaling and estimated performance. The run achieved 8.03 TFLOPS on 2048 nodes and 8.09 TFLOPS on 2068 nodes.

As processors are added to the problem, efficiency stays high (a quick per-node check follows below).
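As a check on the flat-efficiency claim (my arithmetic from the two quoted data points, not from the slide), the per-node rate is essentially unchanged:

$8030\ \text{GFLOPS} / 2048\ \text{nodes} \approx 3.92\ \text{GFLOPS per node}, \qquad 8090\ \text{GFLOPS} / 2068\ \text{nodes} \approx 3.91\ \text{GFLOPS per node}$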
Page 22
Nanoparticle Research for Next-Generation Storage using a Cray XT3 at Oak Ridge National Laboratory

The Science Challenge
• Potential to develop revolutionary new magnetic storage media
• Need a realistic thermodynamic description of the magnetic behavior of FePt nanoparticles

The Computational Challenge
• The ability to rapidly perform the magnetic energy evaluations for nanoparticles containing thousands of atoms
• Requires linearly scaling codes, efficient numerical methods, and fast communication between processors

HPC Solution
• Cray XT3, 25 TFlops
• Nanoparticles contain thousands of atoms

Sets a world record: the first time an FePt particle with 2662 atoms has been simulated.
Page 23
Largest-Ever AORSA Simulation: 3072 processors of the NCCS Cray XT3

The Science Challenge
• Simulate the behavior of plasma in a tokamak, the core of the multinational fusion reactor ITER

The Computational Challenge
• The code, AORSA, solves Maxwell's equations, describing the behavior of electric and magnetic fields and their interaction with matter, for hot plasma in tokamak geometry (the standard time-harmonic form is recalled below)

HPC Solution
• In August 2005, Oak Ridge National Laboratory researcher Fred Jaeger performed the largest AORSA run to date
• Utilized 3072 processors, roughly 60% of the entire Cray XT3

Figure: AORSA on the Cray XT3 "Jaguar" system compared with Seaborg, an IBM Power3 system. The columns represent execution phases of the code; Aggregate is the total wall time, with Jaguar showing more than a factor of 3 improvement over Seaborg.
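For readers wanting the equation behind the bullet above: wave solvers of this kind work with the time-harmonic form of Maxwell's equations, a wave equation for the electric field driven by the antenna and plasma currents. This is standard background rather than a formula taken from the slide, and the plasma current term depends (nonlocally, for a hot plasma) on the field itself:

$-\nabla \times \nabla \times \mathbf{E} + \dfrac{\omega^{2}}{c^{2}}\,\mathbf{E} = -\,i\,\omega\mu_{0}\left(\mathbf{J}_{\text{plasma}} + \mathbf{J}_{\text{antenna}}\right)$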
Page 26
Understanding Earthquakes using "Big Ben", the Cray XT3 at Pittsburgh Supercomputing Center

Figures: significant earthquakes; modeling of wave propagation.

The Science Challenge
• Simulate a magnitude 7.7 southern California earthquake on a fault centered over a 230 km portion of the San Andreas
• Predict the worst-hit locations and inform changes to building codes
• Take into account multiple different soil layers and complex ground motion

The Computational Challenge: a new simulation
• Models the impact on a larger area
• Higher vibration frequencies, where most damage is expected

HPC Solution
• "Big Ben", a Cray XT3 with 2090 AMD Opteron processors

Accurately forecast ground motion, ultimately saving lives.
Page 27
WRF: Hurricane Katrina
• 48-hour WRF forecast run of Hurricane Katrina
• Domain size: 480 x 400 x 31
• 4 km grid resolution
• Integration timestep: 20 seconds
• Output: once an hour
• Runtime: 49 minutes on 240 Cray XT3 processors (see the quick arithmetic below)
• Image shows water vapor content 2 km above the ground
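A quick consistency check on these figures (my arithmetic, not from the slide): a 48-hour forecast at a 20-second timestep is 8,640 integration steps, so a 49-minute runtime on 240 processors is roughly a third of a second of wall time per step, i.e. the forecast ran about 59 times faster than real time.

$\dfrac{48 \times 3600\ \text{s}}{20\ \text{s/step}} = 8640\ \text{steps}, \qquad \dfrac{49 \times 60\ \text{s}}{8640\ \text{steps}} \approx 0.34\ \text{s/step}$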
Page 29
Climate Modeling using the Cray XT3: Performance Evaluation at the Max Planck Institute for Meteorology

The Science Challenge
• Increase our ability to understand, detect and eventually predict the human influence on climate
• Most scientists agree that global warming is influenced by human activity

The Computational Challenge
• Advance the global climate modeling code to 50 km grid spacing and 60 vertical levels

HPC Solution
• A Cray XT3 with thousands of processors ran ECHAM5 at a record speed of 1.4 Tflops

The Cray XT3 enabled the fastest-ever ECHAM5 performance.

"MPI-M estimates that a Cray XT3 would make it possible to complete their next-generation IPCC assessment runs in about the same real time as today, despite requiring 120 times more computation. This advance promises to significantly improve the scale and scope of the analysis researchers will be able to submit for the next assessment report of the IPCC."
- Dr. Annette Kirk, Max Planck Institute for Meteorology
- http://idw-online.de/pages/de/news136278
Page 30
ECHAM5 T255L60 Performance on XT3
Figure: time per timestep (seconds, 0.0-0.7) and sustained Gflops (0-1400) versus CPU count (384, 768, 1536, 2048 and 3072 CPUs).