Buddy Bland: Titan SC12


  • Slide 1/12

    Office of Science

    Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

    Oak Ridge Leadership Computing Facility

    November 2012

  • Slide 2/12

    ORNL's Titan Hybrid System: Cray XK7 with AMD Opteron and NVIDIA Tesla processors

    SYSTEM SPECIFICATIONS:

    • Peak performance of 27.1 PF (24.5 PF GPU + 2.6 PF CPU)
    • 18,688 Compute Nodes, each with a 16-core AMD Opteron CPU and an NVIDIA Tesla K20X GPU
    • 512 Service and I/O nodes
    • 200 Cabinets
    • 710 TB total system memory
    • Cray Gemini 3D Torus Interconnect
    • 8.9 MW peak power, 8.3 MW avg.
    • 4,352 ft² (404 m²)

  • Slide 3/12

    The x86 processor provides fast, single-thread performance for control & communications.

    AMD Opteron 6274:

    • 16 cores
    • 141 GFLOPs peak

  • Slide 4/12

    GPUs are designed for extreme parallelism, performance & power efficiency.

    NVIDIA Tesla K20X:

    • 14 Streaming Multiprocessors (SMX)
    • 2,688 CUDA cores
    • 1.31 TFLOPs peak (double precision)
    • 6 GB GDDR5 memory
    • HPL: >2.0 GFLOPS/W (Titan full-system measurement)

  • Slide 5/12

  • Slide 6/12

    Titan: Cray XK7 System

    • Compute Node: 1.45 TF, 38 GB
    • Board: 4 Compute Nodes, 5.8 TF, 152 GB
    • Cabinet: 24 Boards (96 Nodes), 139 TF, 3.6 TB

    (These levels multiply out consistently; see the check below.)
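    As a quick consistency check (a worked calculation added here, using the per-node CPU/GPU peaks from the adjacent slides and Titan's 32 GB DDR3 + 6 GB GDDR5 node memory split):

    ```latex
    % Per-node peak and memory: one Opteron (141 GF, 32 GB) + one K20X (1.31 TF, 6 GB)
    0.141 + 1.31 \approx 1.45\,\text{TF}, \qquad 32 + 6 = 38\,\text{GB}
    % Board (4 nodes) and cabinet (24 boards = 96 nodes)
    4 \times 1.45 = 5.8\,\text{TF}, \qquad 96 \times 1.45 \approx 139\,\text{TF}, \qquad 96 \times 38\,\text{GB} \approx 3.6\,\text{TB}
    % Full system (18,688 compute nodes)
    18688 \times 1.45\,\text{TF} \approx 27.1\,\text{PF}, \qquad 18688 \times 38\,\text{GB} \approx 710\,\text{TB}
    ```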

  • Slide 7/12

    Why GPUs? High performance and power efficiency on a path to exascale.

    • Hierarchical parallelism improves scalability of applications: expose more parallelism through code refactoring and source code directives.
    • Heterogeneous multi-core processor architecture: use the right processor for each task.
    • Data locality: keep the data near the processing. The GPU has high bandwidth to local memory for rapid access, and a large internal cache.
    • Explicit data management: explicitly manage data movement between CPU and GPU memories (a short sketch follows this list).
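    To make the last two bullets concrete, here is a minimal OpenACC sketch in C (illustrative, not from the talk): the `acc data` region keeps the arrays resident in GPU memory across two kernels, so the programmer decides exactly when host-device transfers happen. The function, array names, and size `N` are invented for the example.

    ```c
    #define N 1000000   /* illustrative problem size */

    /* Explicit data management: a and b cross the CPU-GPU link once each,
     * not once per kernel, because the data region keeps them on the GPU. */
    void scale_then_add(const float *restrict a, float *restrict b, float s)
    {
        #pragma acc data copyin(a[0:N]) copyout(b[0:N])
        {
            #pragma acc parallel loop
            for (int i = 0; i < N; i++)
                b[i] = s * a[i];      /* kernel 1: results stay in GPU memory */

            #pragma acc parallel loop
            for (int i = 0; i < N; i++)
                b[i] += a[i];         /* kernel 2: no intervening transfer */
        }
    }
    ```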

  • Slide 8/12

    Hybrid Programming Model

    • On Jaguar, with 299,008 cores, we were seeing the limits of a single level of MPI scaling for most applications.
    • To take advantage of the vastly larger parallelism in Titan, users need to use hierarchical parallelism in their codes (a combined skeleton follows this list):
      - Distributed memory: MPI, SHMEM, PGAS
      - Node local: OpenMP, Pthreads, local MPI communicators
      - Within threads: vector constructs on the GPU, libraries, OpenACC
    • These are the same types of constructs needed on multi-PFLOPS computers to scale to the full size of the systems!
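    A skeleton of what the three levels look like together in C (an illustrative sketch, not code from the talk; the chunk sizes, array names, and kernel body are invented): MPI for distributed memory, OpenMP across the cores of a node, and an OpenACC vector loop within each thread.

    ```c
    #include <stdio.h>
    #include <mpi.h>

    #define NCHUNK 4        /* illustrative: chunks of work owned by each rank */
    #define CHUNK  100000   /* illustrative: elements per chunk */

    static float u[NCHUNK][CHUNK], unew[NCHUNK][CHUNK];

    /* Level 3 (within threads): a vector loop the compiler offloads to the GPU. */
    static void step(const float *restrict in, float *restrict out)
    {
        #pragma acc parallel loop copyin(in[0:CHUNK]) copyout(out[0:CHUNK])
        for (int i = 0; i < CHUNK; i++)
            out[i] = 0.5f * in[i];   /* stand-in for the real stencil/kernel */
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);            /* Level 1: distributed memory */
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel for           /* Level 2: node-local threads */
        for (int c = 0; c < NCHUNK; c++)
            step(u[c], unew[c]);

        printf("rank %d finished %d chunks\n", rank, NCHUNK);
        MPI_Finalize();
        return 0;
    }
    ```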

  • Slide 9/12

    How do you program these nodes?

    Compilers:

    • OpenACC is a set of compiler directives that allows the user to express hierarchical parallelism in the source code, so that the compiler can generate parallel code for the target: GPU, MIC, or vector SIMD on a CPU. (A minimal example follows this slide.)
    • The Cray compiler supports XK7 nodes and is OpenACC compatible.
    • The CAPS HMPP compiler supports C, C++, and Fortran compilation for heterogeneous nodes, with OpenACC support.
    • The PGI compiler supports OpenACC and CUDA Fortran.

    Tools:

    • The Allinea DDT debugger scales to full system size and, with ORNL support, will debug heterogeneous (x86/GPU) apps.
    • ORNL has worked with the Vampir team at TU Dresden to add support for profiling heterogeneous nodes.
    • CrayPAT and Cray Apprentice support XK6 programming.
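    For a feel of the directive style these compilers accept, a minimal standard OpenACC example (a sketch, not from the talk): a single pragma on a SAXPY loop is enough for the compiler to generate the GPU kernel, the data transfers, and the launch. Built without an accelerator flag, the pragma is ignored and the same loop runs on the CPU.

    ```c
    /* saxpy.c -- build with an OpenACC compiler, e.g. "pgcc -acc saxpy.c"
     * (flags vary by vendor); without the flag, the pragma is ignored. */
    void saxpy(int n, float a, const float *restrict x, float *restrict y)
    {
        /* The directive expresses the parallelism; the compiler produces
         * the kernel, the host-device copies, and the kernel launch. */
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
    ```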

  • Slide 10/12

    Early Science Applications on Titan

    • Material Science (WL-LSMS): role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
    • Combustion (S3D): combustion simulations to enable the next generation of diesel/bio-fuels to burn more efficiently.
    • Climate Change (CAM-SE): answer questions about specific climate change adaptation and mitigation scenarios; realistically represent features like precipitation patterns/statistics and tropical storms.
    • Nuclear Energy (Denovo): unprecedented high-fidelity radiation transport calculations that can be used in nuclear energy applications.
    • Biofuels (LAMMPS): a multiple-capability molecular dynamics code.
    • Astrophysics (NRDF): radiation transport, critical to astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging.

  • Slide 11/12

    How Effective are GPUs on Scalable Applications?

    OLCF-3 Early Science codes; very early performance measurements on Titan, XK7 (w/ K20X) vs. XE6:

    • Cray XK7: K20X GPU plus AMD Opteron 6274
    • Cray XE6: dual AMD Opteron 6274
    • Cray XK6 w/o GPU: single AMD Opteron 6274

    Application  | Performance Ratio      | Comments
    S3D          | 1.8                    | Turbulent combustion; 6% of Jaguar workload
    Denovo sweep | 3.8                    | Sweep kernel of 3D neutron transport; 2% of Jaguar workload
    LAMMPS       | 7.4* (mixed precision) | High-performance molecular dynamics; 1% of Jaguar workload
    WL-LSMS      | 3.8                    | Statistical mechanics of magnetic materials; 2% of Jaguar workload; 2009 Gordon Bell winner
    CAM-SE       | 1.8* (estimate)        | Community atmosphere model; 1% of Jaguar workload

  • Slide 12/12

    Questions?

    [email protected]

    The research and activities described in this presentation were performed using the resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

    Want to join our team? ORNL is hiring. See http://jobs.ornl.gov