CCS machine development plan for post-petascale computing and the Japanese Next-Generation Supercomputer project

Mitsuhisa Sato
CCS, University of Tsukuba

Page 2: Computing resources in CCS

2010.2.22

PACS-CS (2006 ~)
• 2560 nodes (Intel Xeon 2.8GHz, single core/node)
• Peak performance: 14.34 TF
• Memory: 5 TB
• Network: 250MB/s/link x 3 (3D-HXB by GbE)

FIRST (2007 ~)
• A special-purpose system for astrophysics simulation by hybrid computation of radiation and N-body calculations.
• Each node is equipped with GRAPE-6, an accelerator specialized for N-body gravity calculation.
• 256 nodes; performance: cluster 3.5 TFLOPS + GRAPE-6 35 TFLOPS

T2K-Tsukuba (2008 ~)
• Designed by T2K Open Supercomputer Alliance (U. Tokyo and Kyoto U)
• Spec: 648 nodes (quad-core Opteron, 4 sockets/node), 10000 cores, peak performance 95.4TF, total memory 20TB, total disk capacity 800TB (20th in the TOP500, June 2008)

[Figure: node block diagram showing cores, memory, IO interfaces, and the network (DDR Infiniband x 4).]

[Figure: full bi-sectional FAT-tree network. Detail view for one network unit (x 20 network units): nodes with 4 links are connected through L1, L2, and L3 switches, each a 24-port IB switch.]

Item              #
Level 1 switch    232
Level 2 switch    240
Level 3 switch    144
Total switch      616
Node              696

Note: the total of 696 nodes includes 4 online spare nodes.
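As a quick sanity check, the quoted PACS-CS peak can be reproduced from the node count and clock rate, and the fat-tree switch counts can be summed. This is only a minimal sketch: the value of 2 double-precision flops per cycle per core is an assumption (it is not stated on the slide), and the function name `peak_tflops` is hypothetical.

```python
def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS: nodes x cores x GHz x flops/cycle / 1000."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0

# PACS-CS: 2560 single-core nodes at 2.8 GHz (from the slide).
# ASSUMPTION: 2 double-precision flops per cycle per core.
pacs_cs = peak_tflops(nodes=2560, cores_per_node=1, clock_ghz=2.8, flops_per_cycle=2)
print(f"PACS-CS peak: {pacs_cs:.2f} TF")

# Fat-tree switch counts from the slide: the three levels should sum to the total.
switches = {"Level 1": 232, "Level 2": 240, "Level 3": 144}
total_switches = sum(switches.values())
print("Total switches:", total_switches)
```

Under that assumption the result rounds to the 14.34 TF on the slide, and the three switch levels add up to the listed total of 616.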

Page 3: System installation and future plans

[Figure: installation timeline, 2004 to 2012 (Japanese fiscal years H16 to H24). Systems shown: CP-PACS, PACS-CS, FIRST, T2K, FCS-IV, FCS-V, HA-PACS (planned), the next system to T2K, and NGS (10PF). Annotations: "VPP suspended", "(planned) 2011-2013", "2013".]

FCS: Front-end system

Page 4: Issues for post-petascale systems (not exa?)

• A system that enables strong scaling
  • The current petascale systems are enabled by weak scaling.
  • We need more powerful nodes and networks.
  • GPGPU is one solution.
• More specialized architecture
  • We need a sharp science target.
  • Not all applications can use it.
• More difficult to program
  • Needs support from the CS side.
  • Collaboration between computer science and computational science.
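The difference between strong and weak scaling can be illustrated with the two standard textbook speedup models: Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the node count). This is a minimal sketch, not something taken from the talk; the serial fraction of 0.001 is a hypothetical value chosen only for illustration.

```python
def amdahl_speedup(serial_fraction, n):
    """Strong scaling: fixed total problem size on n nodes (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

def gustafson_speedup(serial_fraction, n):
    """Weak scaling: problem size grows with n (Gustafson's law)."""
    return serial_fraction + (1.0 - serial_fraction) * n

s = 0.001  # hypothetical serial fraction, for illustration only
for n in (10, 1_000, 100_000):
    strong = amdahl_speedup(s, n)
    weak = gustafson_speedup(s, n)
    print(f"n={n:>7}: strong {strong:9.1f}x, weak {weak:9.1f}x")
```

In this model the strong-scaling speedup can never exceed 1/s however many nodes are added, which is one way to read the call for more powerful nodes and networks rather than simply more nodes.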

[Figure: peak flops (1 GFlops = 10^9 up to 1 EFlops = 10^18) plotted against #node (1 to 10^6). Labels in the plot: PACS-CS (14TF), NGS (> 10PF), target of HA-PACS, Exaflops system, limitation of #node, CCS's mission.]

Page 5: HA-PACS: Highly Accelerated Parallel Advanced system for Computational Sciences (planned)

Objective: to investigate acceleration technologies for post-petascale computing, together with their software, algorithms, and computational science applications, and to demonstrate them by building a prototype system.

• Design and deploy a GPGPU-based cluster system.
• Research on programming models, languages, and environments for parallel systems with accelerators.
• Design of algorithms and applications for parallel systems with accelerators.
• Research on architectures for parallel systems with accelerators.

[Figure: example configurations. Groups of nodes (labeled "18 node" and "12 node" in the drawings) attach to pairs of IB switches; 18 groups are joined by a 2-stage Fat-Tree (Infiniband QDR). Each node has 8-core CPUs, GPGPUs, and Infiniband QDR x 2 ports.]

• Node configuration: 8-core CPU x 2 + GPU x 2
• Network configuration: Infiniband QDR x 2 / node, full-bisection bandwidth Fat-Tree
• Peak performance: 2 TFLOPS/node x 324 = 648 TFLOPS

Total #node = 18 x 18 = 324
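The totals quoted for this example configuration follow directly from the per-node figures. The sketch below redoes that arithmetic and adds a rough bandwidth estimate. The per-port data rate of 4 GB/s is an assumption (a 4x QDR link, per direction) that is not stated on the slide, and all variable names are hypothetical.

```python
groups = 18             # switch groups in the 2-stage fat-tree (from the slide)
nodes_per_group = 18    # nodes per group (from the slide)
tflops_per_node = 2.0   # 8-core CPU x 2 + GPU x 2, as quoted on the slide
ports_per_node = 2      # Infiniband QDR x 2 per node (from the slide)

# ASSUMPTION: each port is a 4x QDR link carrying about 4 GB/s per direction.
qdr_gbytes_per_s = 4.0

total_nodes = groups * nodes_per_group
peak_tflops = total_nodes * tflops_per_node
injection_gbs = ports_per_node * qdr_gbytes_per_s

# With full bisection bandwidth, half of the nodes can send to the other
# half at their full injection rate.
bisection_gbs = (total_nodes // 2) * injection_gbs

print(f"nodes: {total_nodes}, peak: {peak_tflops:.0f} TFLOPS")
print(f"injection: {injection_gbs:.0f} GB/s per node, bisection: {bisection_gbs:.0f} GB/s")
```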

Page 6: HA-PACS/NG powered by PEARL Link

PEARL: PCI-Express Adaptive and Reliable Link
• Use PCI-Express as a high-speed link.
• Connect the CPU and devices, including GPGPUs, through a router chip, PEACH (PCI-Express Adaptive Communication Hub).
• Direct connection between GPUs.

[Figure: each node contains a CPU, a GPU, and a PEACH chip. PEACH has links to the neighbor nodes on either side and to an external PCI-e switch. Groups of 12 nodes attach to IB switches over Infiniband QDR, while the GPGPUs of different nodes are connected by PEARL Link over PCIe.]

Page 7: Strategic target computational sciences of HA-PACS

① Biophysics: high-performance QM/MM hybrid simulation of the mechanisms of high-efficiency enzymatic reactions, and of the electronic and 3D structures of biomacromolecules

Speeding up the QM part is the key to these simulations.

② Astrophysics: full hydrodynamics and radiative-transfer simulation of the Universe and of the formation of astronomical objects

A full 6-dimensional simulation is required.

③ Particle physics: full-lattice QCD simulation
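To give a feel for why a full 6-dimensional simulation is demanding, the sketch below compares the memory needed to store one double-precision value per grid point on a 3-D grid and on a 6-D grid. The grid sizes are hypothetical and are not taken from the talk; the function name `grid_bytes` is also only illustrative.

```python
def grid_bytes(points_per_dim, dims, bytes_per_value=8):
    """Memory for one value per point on a regular grid with `dims` dimensions."""
    return points_per_dim ** dims * bytes_per_value

# Hypothetical resolutions, chosen only to show the growth rate.
for n in (32, 64, 128):
    gb_3d = grid_bytes(n, 3) / 1e9
    tb_6d = grid_bytes(n, 6) / 1e12
    print(f"n={n:>3}: 3-D grid {gb_3d:8.4f} GB, 6-D grid {tb_6d:8.4f} TB")
```

Doubling the resolution multiplies the 6-D footprint by 64 rather than 8, so even modest resolutions reach tens of terabytes.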

Page 8: The Japanese Next-Generation Supercomputer project

Page 9: Background: Japanese government plan

• The 3rd Science and Technology Basic Plan (FY2006-FY2010)
  • "Next-generation supercomputing technology" was selected as one of the key technologies of national importance.
  • Development and installation of the advanced high-performance supercomputer system (10 petaflops) → the Next-Generation Supercomputer
  • Development of application software
  • Establishment of the "Advanced Computational Science and Technology Center" (tentative name)
• The 4th Science and Technology Basic Plan (FY2011-FY2015) (now under discussion)
  • Exaflops-class HPC technology
  • New chip devices, software, hardware…

After the election of the House of Representatives last summer,….

In November of last year, the new governing party decided to freeze the development plan at the screening of government projects!

In January of this year, the cabinet decided to resume the supercomputer project.

Page 10: The System Overview of NGS

【 Massively Parallel/Distributed Memory Supercomputer 】

• Ultra high-speed, highly reliable CPU
  • Advanced 45nm process technology
  • 8 cores/CPU, 128GFLOPS
  • Error recovery (ECC, instruction retry, etc.)
• High-performance, highly reliable network
  • Direct interconnection network by multi-dimensional mesh/torus network
  • Expandability and reliability
• System software
  • Linux OS
  • Fortran, C, and MPI libraries
  • Distributed parallel file system

[Figure: logical 3-dimensional torus network. Courtesy of FUJITSU]
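From the programmer's side, a logical 3-dimensional torus means every node has six nearest neighbors, with wraparound at the edges. The following is a minimal sketch of that addressing rule, not Fujitsu's implementation; the function name and the 4 x 4 x 4 size are hypothetical.

```python
def torus_neighbors(coord, shape):
    """Return the six nearest neighbors of `coord` on a 3-D torus of size `shape`."""
    x, y, z = coord
    nx, ny, nz = shape
    return [
        ((x - 1) % nx, y, z), ((x + 1) % nx, y, z),
        (x, (y - 1) % ny, z), (x, (y + 1) % ny, z),
        (x, y, (z - 1) % nz), (x, y, (z + 1) % nz),
    ]

# Hypothetical 4 x 4 x 4 torus. Node (0, 0, 3) lies on the boundary of the box,
# so three of its neighbors are reached through wraparound links.
neighbors = torus_neighbors((0, 0, 3), (4, 4, 4))
print(neighbors)
```

The modulo operation is what distinguishes a torus from a mesh: on a mesh the out-of-range neighbors would simply be missing.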

Page 11: Configuration of Compute Nodes

• Number of nodes > 80k
• Number of CPUs > 80k
• Number of cores > 640k
• Peak performance > 10PFLOPS
• Total memory capacity > 1PB (16GB/node)
• Multi-dimensional mesh/torus network
  • Peak bandwidth: 5GB/s x 2 for each direction of the logical 3-dimensional torus network
  • Peak bi-sectional bandwidth: > 30TB/s

[Figure: node. CPU: 128GFLOPS (8 cores); each core: SIMD (4 FMA), 16GFLOPS; L2$: 5MB; 64GB/s; MEM: 16GB. The node sits in a logical 3-dimensional torus network for programming with axes x, y, z; each link is labeled 5GB/s x 2.]
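The per-core, per-CPU, and system-level figures are mutually consistent, which a few lines of arithmetic can confirm. This is a sketch under stated assumptions: the 2 GHz clock is not given on the slide, "> 80k" is taken as a lower bound of 80,000 nodes, and the 64GB/s label is assumed to be the memory bandwidth of one node.

```python
# Figures quoted on the slide
fma_per_core = 4        # SIMD: 4 FMA operations per cycle per core
cores_per_cpu = 8
nodes = 80_000          # slide says "> 80k"; used here as a lower bound
mem_per_node_gb = 16

# ASSUMPTIONS: 2 GHz clock (not on the slide); one FMA counts as 2 flops;
# the 64GB/s label is the memory bandwidth per node.
clock_ghz = 2.0
mem_bw_gbs = 64.0

core_gflops = fma_per_core * 2 * clock_ghz      # per-core peak
cpu_gflops = core_gflops * cores_per_cpu        # per-CPU (per-node) peak
peak_pflops = nodes * cpu_gflops / 1e6          # system peak
total_cores = nodes * cores_per_cpu
total_mem_pb = nodes * mem_per_node_gb / 1e6    # decimal units
bytes_per_flop = mem_bw_gbs / cpu_gflops

print(core_gflops, cpu_gflops, peak_pflops, total_cores, total_mem_pb, bytes_per_flop)
```

With these inputs the sketch gives 16 GFLOPS per core, 128 GFLOPS per CPU, 10.24 PFLOPS, 640,000 cores, and 1.28 PB, in line with the "> 10PFLOPS", "> 640k", and "> 1PB" entries.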

Page 12: The Next-Generation Supercomputer Project

○ Schedule

[Figure: Gantt chart covering FY2006 to FY2012.
System: conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning and improvement.
Buildings (computer building, research building): design; construction.
Applications (Next-Generation Integrated Nanoscience Simulation, Next-Generation Integrated Life Simulation): development, production, and evaluation; verification.
Milestone: open to users.]

Page 13: The categories of users of NGS

1. Strategic Use: MEXT selected 5 strategic fields from a national viewpoint.
   Field 1: Life science/Drug manufacture
   Field 2: New material/energy creation
   Field 3: Global change prediction for disaster prevention/mitigation
   Field 4: Mono-zukuri (Manufacturing technology)
   Field 5: The origin of matter and the universe

2. General Use: use that serves the needs of researchers in many science and technology fields, including industrial use and educational use.

Page 14: Organization for NGS

The "Advanced Computational Science and Technology Center" (ACSTC) (tentative name) will be organized at NGS.

MEXT selects 5 core organizations that lead research activities in the 5 strategic fields.

ACSTC → Core research center
• Conducts advanced and basic R&D in computational science
• Leads cooperation among strategic fields
• Provides key knowledge to the 5 organizations in the strategic fields and to other research organizations

5 core organizations → Research center in each field
• Conducts advanced R&D in each field

• CCS was selected as a core organization for "Field 5: The origin of matter and the universe"
  • Particle physics, astrophysics, nuclear physics
  • Collaboration with KEK and National Observatory