
-1-

Development of Next-Generation Massively Parallel Computers

for Continuous Physical Systems

Akira Ukawa

Center for Computational Physics, University of Tsukuba

5 March 2002 ISCSE’02


-2-

Computational Needs of Physical Sciences

Need more computing power to analyze:
larger system sizes / more degrees of freedom
longer time intervals / smaller time steps
more complex systems

Need improved quality to tackle:
complex phenomena with multiple types of interactions
multiple scales

1 TFLOPS (1996) → 10 TFLOPS (2000) → 100 TFLOPS (2004?)


-3-

Difficulties with conventional supercomputer development

[Figure: peak performance (GFLOPS/TFLOPS, log scale) versus year, 1975-2010, for CRAY-1, QCDPAX/480, NWT/160, CP-PACS/2048, ASCI Red, ASCI Blue Mountain, ASCI Blue Pacific, ASCI White, ASCI Q, ASCI final, Earth Simulator, GRAPE-1, and GRAPE-6.]

Vector-parallel machines (VPP, SR, SX, ES): high efficiency, but large development and running cost.
Scalar-parallel machines (ASCI Red, Blue, White, Q): less expensive, but lower efficiency and large system size due to lower packaging density.

Further enhancement along these architectural lines faces serious difficulties.


-4-

Our Strategy: Classification and Synthesis

Two broad categories of physical systems:

continuous systems: fluids, wave functions, … local but complex force laws; only O(N) computations, but a general-purpose CPU is needed

particle-based systems: DNA/proteins, galaxies, … long-ranged but universal and simple force laws; O(N²) computations, but a special-purpose CPU is effective

Pursue a hybrid approach:
general/special-purpose processors for continuous/particle systems

hybridization of the two systems for efficient processing of complex physical systems
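To make the O(N) vs. O(N²) contrast above concrete, here is a minimal Fortran sketch (illustrative only; the routine names and the simple diffusion/gravity kernels are placeholders, not code from the project): a local stencil update for a continuous field versus a direct pairwise force sum for particles.

! O(N): local (stencil) update -- each of the N grid points uses only a
! few neighbours, so the cost grows linearly with N.
subroutine diffuse_1d(u, unew, n, c)
  integer n, i
  double precision u(n), unew(n), c
  do i = 2, n - 1
     unew(i) = u(i) + c * (u(i+1) - 2.0d0*u(i) + u(i-1))
  enddo
end subroutine

! O(N**2): direct-summation gravity -- every particle interacts with
! every other particle, so the pair loop grows quadratically with N.
subroutine gravity_direct(m, x, a, n)
  integer n, i, j
  double precision m(n), x(3, n), a(3, n), d(3), r2, f
  do i = 1, n
     a(:, i) = 0.0d0
     do j = 1, n
        if (j /= i) then
           d = x(:, j) - x(:, i)
           r2 = d(1)**2 + d(2)**2 + d(3)**2
           f = m(j) / (r2 * sqrt(r2))     ! G = 1 units
           a(:, i) = a(:, i) + f * d
        endif
     enddo
  enddo
end subroutine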


-5-

RFTF Project

Development of Next-Generation

Massively Parallel Computers

Development for Continuous Physical Systems (core members: Akira Ukawa, Taisuke Boku)

Center for Computational Physics, University of Tsukuba

Development for Particle-Based Systems (core member: Junichiro Makino)

Graduate School of Science, University of Tokyo

Leader: Yoichi Iwasaki, University of Tsukuba

Development of GRAPE-6


-6-

Target of our Project

R&D of key technologies for next-generation MPP for continuous physical systems:

new processor architecture SCIMA
interconnect and system design
parallel I/O and visualization environment PAVEMENT

development of hybrid multi-computer system (HMCS) for coupled continuous/particle-based simulations

in collaboration with the Makino subproject
prototype system development:

CP-PACS (continuous) and GRAPE-6 (particles)
novel astrophysics simulation with the prototype:

galaxy formation under UV radiation


-7-

Project Organization

Computer Science

Core member : Taisuke Boku (Univ. of Tsukuba)

Shigeru Chiba (Tokyo Inst. of Technology)

Tsutomu Hoshino (Univ. of Tsukuba)

Hiroshi Nakamura (Univ. of Tokyo)

Ikuo Nakata (Hosei Univ.)

Kisaburo Nakazawa (Meisei Univ.)

Shuichi Sakai (Univ. of Tokyo)

Mitsuhisa Sato (Univ. of Tsukuba)

Tomonori Shirakawa (Univ. of Tsukuba)

Daisuke Takahashi (Univ. of Tsukuba)

Moritoshi Yasunaga (Univ. of Tsukuba)

Yoshiyuki Yamashita (Saga Univ.)

Koichi Wada (Univ. of Tsukuba)

Yoshiyuki Watase (KEK)

Computational Physics

Core member: Akira Ukawa (Univ. of Tsukuba)

Sinya Aoki (Univ. of Tsukuba)

Kazuyuki Kanaya (Univ. of Tsukuba)

Taishi Nakamoto (Univ of Tsukuba)

Masanori Okawa (KEK)

Hajime Susa (Univ. of Tsukuba)

Masayuki Umemura (Univ. of Tsukuba)

Tomoteru Yoshie (Univ. of Tsukuba)

Leader : Yoichi Iwasaki (Univ. of Tsukuba)


-8-

Project Organization II

Processor architecture SCIMA:

H. Nakamura, S. Chiba, M. Sato, D. Takahashi, S. Sakai, M. Kondo, M. Fujita, T. Ohneda, C. Takahashi, M. Nakamura

Interconnect and system design:
T. Boku, E. Oiwa

Parallel I/O and visualization environment PAVEMENT:
T. Boku, K. Itakura, M. Umemura, T. Nakamoto, M. Matsubara, H. Numa, S. Miyagi, Kubota Graphics Technologies

HMCS and astrophysics simulation:
T. Boku, M. Umemura, H. Susa, M. Matsubara, J. Makino, T. Fukushige


-9-

Novel Processor Architecture SCIMA for HPC applications

Memory-CPU gap
Design concept
Performance evaluation
RTL design
Compiler


-10-

Growing gap of CPU and DRAM performance

[Figure: processor vs. DRAM performance, 1980-2000, log scale. Microprocessor performance improves about 60%/year ("Moore's Law"), DRAM about 7%/year, so the processor-memory performance gap grows about 50% per year.]

cited from: Fig. 5.1, "Computer Architecture: A Quantitative Approach (2nd Edition)" by J. Hennessy and D. Patterson, Morgan Kaufmann (ISBN 1-55860-329-8)

Effective performance of CPU limited by slow memory access


-11-

SCIMA: Software Controlled Integrated Memory Architecture

Concept of SCIMA

[Block diagram: the SCIMA chip contains registers, an L1 cache, an addressable On-Chip Memory (SRAM), ALU, FPU, MMU, and NIA connected to the network; page-load/page-store transfers move data between the on-chip memory and the off-chip memory (DRAM).]

addressable On-Chip Memory, in addition to the ordinary cache, for controlled allocation of frequently used data

page-load/page-store instructions between on-chip memory and off-chip memory for large-granularity data transfer


-12-

Benefit of On-Chip Memory

allocation/replacement is under software control:

frequently used data is guaranteed to reside on chip
interference between/within arrays is avoided

data transfer from/to off-chip memory is under software control

large granularity transfer & block stride transfer

only regularly accessed data can enjoy this benefit; the data cache is still provided for irregular data

can fully exploit data reusability

effective use of off-chip memory bandwidth
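The programming model can be pictured with the conceptual Fortran sketch below (not project code): a small local buffer stands in for the on-chip memory, and plain block copies stand in for page-load/page-store; the actual mechanism uses the SCIMA directives shown on the compiler slide.

subroutine blocked_scale(a, n, nb, niter, s)
  ! Conceptual only: buf plays the role of the software-controlled
  ! on-chip memory; the block copies play the role of page-load/page-store.
  integer n, nb, niter, ib, i, it, nblk
  double precision a(n), buf(nb), s
  do ib = 1, n, nb
     nblk = min(nb, n - ib + 1)
     buf(1:nblk) = a(ib:ib+nblk-1)       ! "page-load": large-granularity transfer
     do it = 1, niter                    ! reuse the resident block many times
        do i = 1, nblk
           buf(i) = s * buf(i)
        enddo
     enddo
     a(ib:ib+nblk-1) = buf(1:nblk)       ! "page-store"
  enddo
end subroutine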


-13-

Performance Evaluation

Benchmarks:
NAS Parallel Benchmarks: CG, FT
QCD (quantum chromodynamics)

real application: quantum mechanical system

Optimization:

by hand following the optimization strategy

Data set:
NAS PB: class W (to save simulation time)
QCD: practical data size

6 x 6 x 12 x 12 (4-dim. space-time) [48 x 48 x 48 x 96 on the 2048-PU MPP]


-14-

Results of QCD

[Bar chart: execution cycles for the cache-only configuration vs. SCIMA, with 64 K and 512 K on-chip capacity and 32 B/128 B line sizes; each bar is broken down into CPU busy time, latency stall, and throughput stall.]

• SCIMA: 1.5-3.0 times faster

• Cache: a larger line size reduces latency stalls but increases throughput stalls (conflicts on copied data, unnecessary data transfer)

• With longer latency and narrower throughput (latency = 160 cycles, throughput = 1 B/cycle), SCIMA is 1.5 and 1.7 times faster


-15-

Results of FT

[Bar chart: execution cycles for the cache-only configuration vs. SCIMA, with 64 K and 512 K on-chip capacity and 32 B/128 B line sizes; each bar is broken down into CPU busy time, latency stall, and throughput stall.]

• SCIMA: 2.0-4.2 times faster

• Cache: a larger line size reduces latency stalls but increases throughput stalls (conflicts on copied data, unnecessary data transfer)

• With longer latency and narrower throughput (latency = 160 cycles, throughput = 1 B/cycle), SCIMA is 2.3 and 2.0 times faster


-16-

RTL design to check surface area and delay

Compare the MIPS R10000 processor (cache model) and a SCIMA processor built on the R10000 architecture
RTL-level design in Verilog-HDL
Synthesis to gate level using:
VDEC 0.35 μm process library
Synopsys Design Compiler
Evaluate chip surface area and delay

[Block diagram of the R10000-based design: instruction cache, data cache / on-chip memory, system interface, TLB, address calculation, ALU 1, ALU 2, FP adder, FP multiplier, integer queue, floating-point queue, address queue, and the added PLS queue and on-chip-memory control; the newly designed parts are shown in red.]


-17-

Result of evaluation

Delay:
the longest delay in the cache model comes from the instruction issue mechanism (a possible critical path)
4.9% increase for the SCIMA processor
the SCIMA model is still much faster overall than the cache model

Delay in instruction issue [ns]: cache model 6.11, SCIMA model 6.48

Surface area:
the address queue (load/store issuing mechanism) occupies 3% of the R10000 chip
this area expands by a factor of 1.7 for the SCIMA chip;

negligible effect on chip surface area


-18-

Compiler for SCIMA processor

SCIMA directives:

on-chip memory mapping
control of transfer between on-chip and off-chip memory

Back-end: re-targetable code generator for various architectural parameters:

number of registers
instructions

[Compilation flow: Fortran source + SCIMA directives → Front-end (translates SCIMA directives) → C code + SCIMA runtime library → Back-end (translates to SCIMA instructions; re-targetable for the architecture) → Object code for the SCIMA processor]

Sample program: it sums the diagonal of the lower-right N x N block of a; the !$scm directives map a region of a onto the on-chip memory and load the data used in the loop from off-chip memory.

      double precision sum
      double precision a(N*2, N*2)

!$scm begin (a, N, N, 0, 0)
!$scm load (a, N + 1, N + 1, N, N)
      sum = 0.0
      do i = N + 1, N * 2
         sum = sum + a(i, i)
      enddo
!$scm end (a)


-19-

Interconnect and system design

SMP configuration of SCIMA
system image


-20-

SMP configuration of SCIMA

Address space is partially mapped onto the On-Chip Memory (exclusive among processors)
The memory access management system is slightly modified for SMP

[Diagram: N SCIMA processors, each with registers, cache, and On-Chip Memory, share the off-chip (main) memory over a bus.]

-21-

Scalability with the number of processors on SMP (matrix-matrix multiplication benchmark)

[Graphs: speed-up ratio and bus traffic per cycle vs. number of processors (1-8) for SCIMA and CACHE, with the ideal speed-up shown for reference. N = 300, bus bandwidth 4 bytes/cycle, off-chip memory access latency 40 cycles.]

Small overhead: SCIMA obtains the best granularity for data movement between on-chip and off-chip memories


-22-

Network and Total System Image

[System diagram: each node is a 4-way SCIMA SMP with multi-bank shared memory and an exchanger (NIA) connected to X- and Y-crossbars; nodes within a cluster are linked by a low-latency electrical 2-D hyper crossbar (16 x 16) with cluster routers, and 16 clusters are linked by a high-bandwidth optical full crossbar (cluster link & switch).]

2 GHz SCIMA (4 operations/clock) × 4 = 32 GFLOPS/node

32 GFLOPS/node × (16 × 16) × 16 nodes = 131 TFLOPS
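Unpacking these peak-performance figures (reading the 2-D hyper crossbar as 16 × 16 nodes per cluster and the full crossbar as 16 clusters):

\[
2\ \mathrm{GHz} \times 4\ \mathrm{ops/clock} \times 4\ \mathrm{CPUs} = 32\ \mathrm{GFLOPS\ per\ node}
\]
\[
32\ \mathrm{GFLOPS} \times (16 \times 16)\ \mathrm{nodes} \times 16\ \mathrm{clusters} = 131\,072\ \mathrm{GFLOPS} \approx 131\ \mathrm{TFLOPS}
\]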


-23-

Parallel I/O environment PAVEMENT

Data I/O and visualization issues
PIO (parallel I/O system)
PFS (parallel file system)
VIZ (parallel visualizer)


-24-

Problem of conventional I/O and visualization

[Diagram: the MPP (parallel processing) feeds a front-end computer (sequential processing) over a single connection, causing data serialization, lack of throughput, and lack of computation power.]

single connection and/or sequential processing limits the overall performance


-25-

Goal of parallel I/O and visualization

[Diagram: the MPP (parallel processing) is connected to a multiprocessor front-end system (parallel processing) through parallel connections.]

NO serialization during the whole computation


-26-

Design Strategy of Parallel I/O System

Parallel Processing both in Network and Front-end System

making full use of the parallel I/O processors in the MPP
free from the bottleneck caused by serialization

Commodity-based Network Interface
100Base-TX, Gigabit Ethernet, etc.
widely used protocol (TCP/IP) for plenty of portability
high-performance and inexpensive I/O system

dynamic load balancing


-27-

Experimental Environment

Massively Parallel Processor:

CP-PACS (2048 PUs, 128 IOUs)

Parallel Visualization Server: SGI Onyx2 (4 processors)

Parallel File Server: SGI Origin-2000 (8 processors)

[Diagram: these systems and an Alpha cluster (16 nodes) are connected through a switching hub over 100Base-TX Ethernet with 8 links.]


-28-

System Image

[System image: applications A, B, and C each access the PIO-API; PIO servers on Machine-A and Machine-B exchange their data over a parallel network.]


-29-

PAVEMENT/VIZ

Parallelized 3-D volume rendering module
Extension modules for AVS/Express (the de facto standard visualization environment); user interface compatible with the original module
Uses PAVEMENT/PIO for high-throughput parallel data streaming

Developed partly with Kubota Graphics Technologies
To be included in their free software package


-30-

Example of PAVEMENT/PIO and VIZ at work

CP-PACS 2048 PU, 128 IOU (614 GFLOPS, for calculation)
+ 16 channels (for I/O)
+ Origin2000 8 CPU (for volume rendering)

Real-time visualization of the reionization of the Universe


-31-

Heterogeneous Multi-Computer System (HMCS)

Motivations
Concept
Prototype with CP-PACS/GRAPE-6
Application to galaxy formation

with J. Makino and T. Fukushige


-32-

Multi-Scale Physics Simulation

Simulation of complex phenomena involving:

Multiple types of interactions

Classical (gravity, electromagnetism, …), quantum dynamics, …

Short- and long-ranged interactions

Differences in computation order:

O(N²): e.g., gravitational force
O(N log N): e.g., FFT
O(N): e.g., straight CFD


-33-

Concept of HMCS (Heterogeneous Multi-Computer System)

Combining particle simulation (e.g., gravitational interaction) and continuum simulation (e.g., SPH) in one platform
Combining general-purpose processors (flexibility) and special-purpose processors (high speed)
Connecting a general-purpose MPP and a special-purpose MPP through a high-throughput parallel network

-34-

Prototype HMCS SYSTEM

[System diagram: CP-PACS (2048 PU, 614 GFLOPS) is linked through parallel I/O (RDMA at 300 MB/s, 100Base-TX, a switching hub) to an Alpha cluster (16 x Alpha 21264, PCI x 8) hosting GRAPE-6 (8 boards, 8 TFLOPS), and to the Origin-2000 and Onyx2 servers (4 IOU / 8 IOU, with KGT).]


-35-

g6cpplib

g6cpp_start(myid, nio, mode, error)

g6cpp_unit(n, t_unit, x_unit, eps2, error)

g6cpp_calc(mass, r, f_old, phi_old, error)

g6cpp_wait(acc, pot, error)

g6cpp_end(error)

API to access GRAPE-6 from CP-PACS
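A call sequence on the CP-PACS side might look like the Fortran sketch below. This is illustrative only: the argument meanings (particle count, unit scales, softening eps2, masses and positions in, accelerations and potentials out) are inferred from the names above, and the array layout is an assumption.

  ! Illustrative only: argument semantics and array layout are assumed.
  integer, parameter :: nmax = 131072
  integer myid, nio, mode, n, error
  double precision t_unit, x_unit, eps2
  double precision mass(nmax), r(3, nmax), f_old(3, nmax), phi_old(nmax)
  double precision acc(3, nmax), pot(nmax)

  call g6cpp_start(myid, nio, mode, error)         ! open the connection to GRAPE-6
  call g6cpp_unit(n, t_unit, x_unit, eps2, error)  ! set particle count, units, softening
  call g6cpp_calc(mass, r, f_old, phi_old, error)  ! send particles, start the force summation
  ! (other CP-PACS work could in principle be done between calc and wait)
  call g6cpp_wait(acc, pot, error)                 ! collect accelerations and potentials
  call g6cpp_end(error)                            ! release GRAPE-6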


-36-

Breakdown of the time spent after data is sent to the GRAPE-6 cluster

# of particles          8K      16K     32K     64K     128K
communication (sec)     0.695   1.647   2.139   2.478   4.952
calculation (sec)       0.012   0.029   0.063   0.158   0.440
set j-particle (sec)    0.139   0.165   0.229   0.357   0.610
all-to-all (sec)        1.012   1.436   1.786   2.249   2.357

GRAPE-6 takes just 1 sec for 128K particles


-37-

Galaxy formation on HMCS

[Figure: cosmic timeline from the Big Bang (10^-44 sec) through the Dark Age to the present (15 Gyr), with marks at 0.1 Myr, 0.5 Gyr, and 1 Gyr and redshifts z = 10, 5, 3, 0; matter (baryons, dark matter) goes from neutral to ionized under QSO/UV radiation, leading to galaxy formation (dE, dI, E, S0, SB, S).]


-38-

Hydrodynamic motion of matter + Gravity acting on matter

+ Radiative transfer

1. Hydrodynamics:

Smoothed Particle Hydrodynamics (SPH) method is employed.

2. Self-gravity: the Barnes-Hut tree is effective for serial calculation but difficult to parallelize → direct summation is performed by GRAPE-6.

3. Chemical reaction & radiative cooling are included.

4. Radiative transfer (RT): RT is solved with the method of Kessel-Deynet & Burkert.

Galaxy formation under UV radiation


-39-

SPH (Smoothed Particle Hydrodynamics)

Matter density is represented as a collection of particles, smoothed with a kernel function W:

\[
\rho(\mathbf{r}_i) = \sum_j m_j\, W(|\mathbf{r}_i - \mathbf{r}_j|)
\]
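As an illustration of this estimate (not code from the project), a direct-summation Fortran sketch using a 3-D Gaussian kernel of smoothing length h might read:

subroutine sph_density(n, m, x, h, rho)
  ! Minimal SPH density estimate by direct summation (O(N**2)); a real
  ! code would use neighbour lists and the project's own kernel choice.
  integer n, i, j
  double precision m(n), x(3, n), h, rho(n)
  double precision pi, wnorm, r2, w
  pi = 4.0d0 * atan(1.0d0)
  wnorm = 1.0d0 / (pi**1.5d0 * h**3)               ! Gaussian kernel normalisation in 3-D
  do i = 1, n
     rho(i) = 0.0d0
     do j = 1, n
        r2 = (x(1,i)-x(1,j))**2 + (x(2,i)-x(2,j))**2 + (x(3,i)-x(3,j))**2
        w  = wnorm * exp(-r2 / h**2)               ! W(|r_i - r_j|)
        rho(i) = rho(i) + m(j) * w
     enddo
  enddo
end subroutine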


-40-

Radiative Transfer Equation

\[
\frac{1}{c}\frac{\partial I_\nu}{\partial t} + \mathbf{n}\cdot\nabla I_\nu = \chi_\nu\,(S_\nu - I_\nu)
\]

Boltzmann equation for the photon distribution function
6-D calculation, hence computationally heavy

(3D for space + 2D for ray direction + 1D for frequency)


-41-

Radiative Transfer for SPH

[Diagram: a ray from the source S to the target T passes near SPH particles P1-P5; the optical depth is evaluated at points E1-E5 along the ray, with opening angle θ.]

Accurate calculation of the optical depth along light paths is required. The method of Kessel-Deynet & Burkert (2000) is used.

\[
\tau_{TS} = \frac{\sigma}{2}\sum_i \left(n_{E_i} + n_{E_{i+1}}\right)\left(s_{E_{i+1}} - s_{E_i}\right)
\]
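In code, this trapezoidal accumulation along one ray is a short loop; the Fortran sketch below is illustrative (the array names and the constant cross section sigma are assumptions):

double precision function optical_depth(npts, nh, s, sigma)
  ! tau_TS = (sigma/2) * sum_i (n(E_i) + n(E_{i+1})) * (s(E_{i+1}) - s(E_i))
  ! nh(i): neutral density at evaluation point E_i; s(i): path length to E_i.
  integer npts, i
  double precision nh(npts), s(npts), sigma, tau
  tau = 0.0d0
  do i = 1, npts - 1
     tau = tau + (nh(i) + nh(i+1)) * (s(i+1) - s(i))
  enddo
  optical_depth = 0.5d0 * sigma * tau
end function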


-42-

Algorithm

[Flow chart, per time step: density (SPH) → radiative transfer → chemical reaction → temperature (iterated) → pressure gradient; together with self-gravity, these feed the integration of the equation of motion. Self-gravity is computed on GRAPE-6; the other steps run on CP-PACS.]
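Schematically, one HMCS time step then has the structure below (a sketch only: the physics kernels are indicated as comments, and only the g6cpplib calls to GRAPE-6 are spelled out):

do istep = 1, nsteps
   ! CP-PACS: SPH density estimate
   ! CP-PACS: iterate { radiative transfer -> chemical reactions -> temperature }
   ! CP-PACS: pressure-gradient force
   call g6cpp_calc(mass, r, f_old, phi_old, error)  ! GRAPE-6: direct-summation self-gravity
   call g6cpp_wait(acc, pot, error)                 ! accelerations and potentials returned
   ! CP-PACS: integrate the equation of motion (hydrodynamic + gravitational forces)
enddo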


-43-

Calculation parameters

1024 PU of CP-PACS: 307 GFLOPS
4 boards of GRAPE-6: 4 TFLOPS

64K baryonic matter particles + 64K dark matter particles

4 sec for calculation on CP-PACS
3 sec for communication to/from GRAPE-6
0.1 sec for calculation on GRAPE-6

total 7.1 sec/time step


-44-

QSO


-45-

[Snapshots at 0.3 Gyr, 0.5 Gyr, 0.7 Gyr, and 1 Gyr]

At the initial stage, a density fluctuation is generated to match a cold dark matter spectrum. This fluctuation expands with the cosmic expansion, and simultaneously smaller-scale density fluctuations develop inside to form filamentary structures. Tiny filaments evaporate due to heating by the background UV radiation, whereas larger filaments shrink and coalesce into a condensed rotating cloud. This rotating cloud would evolve into a galaxy. The simulation has revealed that the background UV radiation plays an important role in determining the final structure of the galaxy.


-46-

Summary and Prospect

Carried out R&D of key element technologies for HPC of continuum physical systems:

processor architecture SCIMA
interconnect and total system design
parallel I/O and visualization environment

Floating point power alone will be insufficient to process complex multi-scale simulations:

both continuum and particle degrees of freedom
both short- and long-ranged interactions
multiple scales


-47-

Summary and Prospect II

Proposed HMCS as a paradigm to treat such systems :

combines the high flexibility of general-purpose systems with the high performance of special-purpose systems, distributing the computational load to each sub-system in the best way possible

built a prototype HMCS and demonstrated the effectiveness of the concept with a novel galaxy-formation simulation in astrophysics


-48-

Summary and Prospect III

further development of the HMCS concept:
HMCS-R: remote general/special-purpose systems connected through a high-speed network

(e.g., SuperSINET)
HMCS-E: embed special-purpose processors in the nodes of general-purpose systems

Will provide an ideal platform for the next generation of large-scale scientific simulations of complex phenomena


-49-

HMCS-E (Embedded)

[Diagram: each node unifies a general-purpose CPU and a special-purpose processor with shared memory and a NIC, connected through a high-speed network switch.]

general- and special-purpose processors unified in each node

Ideal combination of flexibility and high performance