K-computer and Supercomputing Projects in Japan
Makoto Taiji
Computational Biology Research Core
RIKEN Planning Office for the Center for Computational and Quantitative Life Science
& Processor Research Team
RIKEN Advanced Institute for Computational Science
[email protected]
Slide 2
Agenda
  K-computer
  Advanced Institute for Computational Science
  High Performance Computing Infrastructure
  My own perspective on future HPC, and MDGRAPE-4 (in short)
Slide 3
My Background
  Physics
  Special-purpose computers for scientific simulations (1986~)
    Monte Carlo simulations of spin systems (1986, m-TIS I)
    FPGA-based reconfigurable machine (1990, m-TIS II)
    Gravitational N-body problems (1992~96, GRAPE-4,5)
    Molecular Dynamics simulations (1994~, MD-GRAPE, MDM, MDGRAPE-3,4)
    Dense Matrix Calculation, quasi-general-purpose machine (MACE, 2000)
  Ultrafast laser spectroscopy (1987~92)
    Conjugated Polymers
    Rhodopsin and Bacteriorhodopsin
  Learning process as dynamical systems, multi-agent dynamics (1996~2002)
  Physical Random Number Generator (1997~2004)
Slide 4
World situation of HPC (Top 500)
  Country share of Japan: down to 6th position
Slide 5
Next-Generation Supercomputer Project
  National project to develop a leading general-purpose supercomputer in Japan
    Not for a single purpose (cf. Earth Simulator)
  Location: Kobe Port Island
  Developer: Fujitsu
  Linpack 10 PetaFLOPS
  Partial operation: Spring 2011
  Full service: Autumn 2012
  [Image: K computer system (CG)]
Slide 6
Location of K computer
  About 5 km from Sannomiya, 12 min. by Portliner
  [Aerial photo, June 2006. Labels: Mt. Rokko, Shin-Kobe Station, Shinkansen line, Sannomiya, Ashiya, Port Island, Kobe Sky Bridge, Portliner, Kobe Airport, Kobe Medical Industry Development Project Core Facilities, K-computer & Advanced Institute for Computational Sciences, To Osaka, To Akashi / Awaji Island]
Slide 7
RIKEN Advanced Institute for Computational Science
  National center to cover wide fields of computational science and engineering
Slide 8
Formation of Central Hub in Kobe
  Advanced Institute for Computational Science (Director: Dr. Kimihiko Hirao)
    Operation and sophistication of the supercomputer
    Interdisciplinary research: computational sciences, computer science
  Registered Organization
    Selection of applications, user support
  [Diagram labels: Strategic Region, Strategic Use, Academia, Industry, Public Use, Operation Organization]
Slide 9
RIKEN Advanced Institute for Computational Science
  Director
  Deputy Director
  Operation Technology Division
  Research Promotion Division
  Research Division
    Computational Science Research
      Field Theory Research Team (TL: Yoshinobu Kuramashi)
      Computational Biophysics Research Team (TL: Yuji Sugita)
      Computational Materials Science Research Team (TL: Seiji Yunoki)
      Computational Molecular Science Research Team (TL: Takahito Nakajima)
    Computer Science Research
      System Software Research Team (TL: Yutaka Ishikawa)
      Processor Research Team (TL: Makoto Taiji)
Slide 10
Grand Challenge Applications
  Next-Generation Integrated Nano-Science Simulation Software (2006-2011)
    To create next-generation nano-materials (new semiconductor materials, etc.) by integrating theories (such as quantum chemistry, statistical dynamics and solid electron theory) and simulation techniques in the fields of new-generation information functions/materials, nano-biomaterials, and energy
    Base site: Institute for Molecular Science
    [Figure: three target areas. Next-Generation Energy (solar energy fixation, fuel alcohol, fuel cells, electric energy storage); Next-Generation Nano Biomolecules; Next-Generation Information Function Materials. Methods: electron theory of solids, quantum chemistry, molecular dynamics. Other labels: doping of fullerene and carbon nanotubes, self-organized magnetic nanodots, one-dimensional crystal of silicon, orbiton (orbital waves), ferromagnetic half-metals, optical switch, mesoscale structure of Nafion membrane, liposome, polio virus, self-assembly, capsulation, water molecules inside lysozyme cavity, protein folding, medicines, new drugs, and DDS, nonlinear optical devices, nano quantum devices, spin electronics, ultra high-density storage devices, integrated electronic devices, electronic conduction in integrated systems]
  Next-Generation Integrated Life-Science Simulation Software (2006-2012)
    To provide new tools for breakthroughs against various problems in life science by means of petaflops-class simulation technology, leading to comprehensive understanding of biological phenomena and the development of new drugs/medical devices and diagnostic/therapeutic methods
    Base site: RIKEN Wako Institute
    [Figure: multiscale approach from micro through meso to macro: proteins/DNA, cells, tissues, organs, whole body, along a size axis with ticks 10^0, 10^-1, 10^-3~-2, 10^-5~-4, 10^-8~-6. Microscopic approach: MD / first principle / quantum chemistry simulations (molecular scale, cellular scale). Macroscopic approach: continuous entity simulations (organ and body scale; fluids, heat, structures). Other labels: protein structural analysis, molecular network analysis, drug response analysis, achievement of chemical reactions, vascular system modeling, cardiovascular system, skeleton model, brain function, viruses, anticancer drugs, protein control, nano processes for DDC. Toward therapeutic technology: drug development, tailor-made medicine, Drug Delivery System, regenerative medicine, surgical procedures, catheters, micromachines, hyperthermia, High Intensity Focused Ultrasound]
Slide 11
Appointment of Strategic Regions
  Computational resources and budget will be allocated for the following regions
  Strategic organization will organize the research
    Region 1. Foundations for predictive life sciences, medical care, and drug design
    Region 2. Innovation of new materials and new energies
    Region 3. Prediction of global change for disaster prevention and reduction
    Region 4. Next-generation manufacturing
    Region 5. Origin and structure of matter and the universe
  2009-2010: Feasibility Studies
  2011-2015: Strategic Researches
Slide 12
Schedule of Project
  Partial operation within FY2010, Full operation starts from FY2012
  [Gantt chart, FY2006 to FY2012.
   Rows: System (processing unit; front-end unit (total system software); shared file system); Buildings (computer building; research building); Applications (Next-Generation Integrated Nanoscience Simulation; Next-Generation Integrated Life Simulation); Strategic Researches.
   Phases shown: conceptual design; basic design; detailed design; prototype and evaluation; production and evaluation; production, installation, and adjustment; tuning and improvement; design; construction; development, production, and evaluation; verification; preparatory researches; feasibility studies; research promotion]
Slide 13
Features of K computer
  K means 10^16
  High Performance: Linpack 10 PFLOPS
  Massive Parallelization: > 80,000 Processors, > 640,000 Cores
  SPARC64 VIIIfx: Processor designed for HPC
    VISIMPACT / HPC-ACE extensions
  16 GB / node, 2 GB / core
  ~20 MW
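The headline figures on this slide can be cross-checked against each other with a little arithmetic. The sketch below is only an illustration: it takes the lower bound of 80,000 processors, the 8 cores and 128 GFLOPS per processor quoted on the system slide, and decimal units (1 PB = 10^6 GB); the function name `system_totals` is made up for this example.

```python
# Cross-check of the K computer headline figures (illustrative sketch).
# All inputs are lower bounds or nominal values quoted on the slides.
PROCESSORS = 80_000           # "> 80,000 Processors"
CORES_PER_PROCESSOR = 8       # "CPU: 128GFLOPS (8 Core)"
GFLOPS_PER_PROCESSOR = 128.0  # nominal peak per CPU
GB_PER_NODE = 16.0            # "16GB / node", one processor per node


def system_totals(processors=PROCESSORS):
    """Return derived system-wide totals (hypothetical helper)."""
    return {
        "cores": processors * CORES_PER_PROCESSOR,
        "peak_pflops": processors * GFLOPS_PER_PROCESSOR / 1e6,  # GFLOPS -> PFLOPS
        "memory_pb": processors * GB_PER_NODE / 1e6,             # GB -> PB (decimal)
        "gb_per_core": GB_PER_NODE / CORES_PER_PROCESSOR,
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value}")
```

With these inputs the totals come out at 640,000 cores, 10.24 PFLOPS peak, 1.28 PB of memory, and 2 GB per core, which matches the "> 640,000 cores", "> 10 PFLOPS", "> 1 PB", and "2 GB / core" figures on the slides.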
Slide 14
K-Computer System
  Number of nodes: > 80,000
  Number of processors: > 80,000
  Number of cores: > 640,000
  Peak performance: > 10 PFLOPS
  Memory capacity: > 1 PB (16 GB/node)
  Network: Tofu interconnect (6-dim. torus)
    User view: 3D torus
    Bandwidth: 5 GB/s bidirectional for each of six directions
    4 simultaneous communications
    Bisection bandwidth: > 30 TB/s (bidirectional, nominal peak)
  [Node diagram: CPU 128 GFLOPS (8 cores, each SIMD (4 FMA), 16 GFLOPS); L2$: 5 MB; 64 GB/s; MEM: 16 GB; 3D-torus network with x, y, z links, each 5 GB/s x bidirectional]
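In the 3D-torus user view, each node has one neighbor in the plus and minus direction of each axis, six in total, with wraparound at the edges. A minimal sketch of that addressing follows; the function name and the 4 x 4 x 4 shape are hypothetical, and this is not the actual Tofu routing, which the slide describes as a 6-dimensional torus underneath.

```python
# Nearest neighbors in a 3D torus (illustrative sketch of the "user view").
# The torus shape used in the example is hypothetical, not the real K computer layout.

def torus_neighbors(coord, shape):
    """Return the six nearest neighbors of `coord` in a 3D torus of size `shape`.

    Order: -x, +x, -y, +y, -z, +z. Coordinates wrap around at the edges.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % shape[axis]  # modulo gives the wraparound
            neighbors.append(tuple(n))
    return neighbors


if __name__ == "__main__":
    # A corner node of a 4 x 4 x 4 torus still has six neighbors.
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The wraparound is the point of a torus: a corner node has the same number of links as an interior node.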
Slide 15
Cabinet of K computer
  24 boards/cabinet
  192 CPUs
  24 TFLOPS
Slide 16
What is special about the K computer?
  Network
    High bandwidth, low latency
  Processor for HPC
    VISIMPACT
      Shared cache & hardware barrier
      Multi-core parallelization of inner loop
    HPC-ACE
      Register extension
      SIMD 2FMA, 2 issue/cycle (4FMA/core)
      Instructions for special functions (trigonometric, inverse, square-root, inverse square-root, etc.)
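The "SIMD 2FMA, 2 issue/cycle (4FMA/core)" figure translates into floating-point operations per cycle as follows. This is a sketch under one assumption not stated on the slide: each fused multiply-add counts as 2 floating-point operations. The 16 GFLOPS per core is taken from the system slide, and the clock frequency is derived from it rather than quoted.

```python
# Per-core throughput implied by the HPC-ACE SIMD figures (illustrative sketch).
SIMD_FMA_PER_INSTRUCTION = 2   # "SIMD 2FMA"
ISSUES_PER_CYCLE = 2           # "2 issue/cycle"
FLOP_PER_FMA = 2               # assumption: one multiply plus one add
CORE_GFLOPS = 16.0             # per-core peak from the system slide


def flop_per_cycle():
    """Floating-point operations per cycle per core."""
    return SIMD_FMA_PER_INSTRUCTION * ISSUES_PER_CYCLE * FLOP_PER_FMA


def implied_clock_ghz():
    """Clock frequency that makes the per-core peak consistent."""
    return CORE_GFLOPS / flop_per_cycle()


if __name__ == "__main__":
    print(flop_per_cycle(), "FLOP/cycle,", implied_clock_ghz(), "GHz implied")
```

Under that assumption a core performs 8 floating-point operations per cycle, so the 16 GFLOPS per-core peak corresponds to a 2 GHz clock.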
Slide 17
T. Maruyama, Proc. Hot Chips 2009.
Slide 18
Software
  OS: Linux
  Compiler
    Fujitsu compiler will support Fortran (2003), C (1999), C++ (2003)
    GNU C/C++ extensions
    Automatic vectorization for SPARC64 VIIIfx
    OpenMP 3.0
  MPI-2.1
  gcc may also be available. However, it cannot generate CPU-specific instructions (e.g., SIMD), so poor performance is expected.
Slide 19
How to use it?
  Five Strategic Regions have been selected. For these fields, MEXT will fund some research budget, and machine time will be provided.
  General Use
    For general use, a registered organization will control the distribution of machine time.
  Commercial Use
  RIKEN is basically not responsible for the usage of the machine.
Slide 20
HPCI: High Performance Computing Infrastructure
  System to utilize academic supercomputers in Japan
  2012~
  User Communities
    5 strategic regions, Industrial Consortiums, National Universities and Institutes
  Computing Resource Provider
    RIKEN AICS, University Centers, National Institutes
Slide 21
Basic Idea of HPCI
  [Diagram: Logical Structure and Physical Structure. Labels: 25 organizations, 13 organizations]
Slide 22
Problem in Future of HPC Hardware
  If the problem can be parallelized, computing performance is cheap.
  However, in every aspect, data movement dominates the cost.
  [Diagram labels: Core, Cache, Cache, Main Memory, Node, Node, Node, Disk, System, System/Apparatus/Internet]
Slide 23
Future Processors for HPC
  Gap between top-end HPC processors and commodity processors will increase
  What is needed for HPC
    Many-core processors, accelerators for dense problems
    Chip stacking for bandwidth
    Network integration
  Network will be the most important factor in HPC
Slide 24
Future Directions (1)
  Network integration is essential both for general-purpose machines and special-purpose ones
  Platform for Accelerators
    General-purpose processor cores
    Cache or local memory
    Fast, low-latency on-chip and off-chip networks
  [Block diagram: PU, Accelerator, and Memory on an on-chip network (>100 GB/s/router); Memory 100 GB/s; Network >30 GB/s]
Slide 25
Future Directions (2)
  High Memory Bandwidth System
    Single-chip BlueGene/L by System-on-Chip or chip stacking by TSV
    B/F 1; B/F 0.1 for remote node
  [Block diagram: PU >500 GFLOPS; Memory >500 GB/s; Network >50 GB/s]
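The B/F (bytes per FLOP) targets follow directly from the bandwidth and compute figures on this slide. The sketch below computes them and, as an addition not on the slide, applies the standard roofline bound to show why the ratio matters. The kernel intensities in the example are hypothetical, and the slide's ">" lower bounds are treated as nominal values.

```python
# Bytes-per-FLOP ratios and a simple roofline bound (illustrative sketch).
PEAK_GFLOPS = 500.0      # "PU >500GFLOPS", treated as nominal
MEMORY_GB_S = 500.0      # "Memory >500GB/s", treated as nominal
NETWORK_GB_S = 50.0      # "Network >50GB/s", treated as nominal


def bytes_per_flop(bandwidth_gb_s, peak_gflops):
    """B/F ratio: bytes that can be moved per floating-point operation."""
    return bandwidth_gb_s / peak_gflops


def attainable_gflops(peak_gflops, bandwidth_gb_s, flop_per_byte):
    """Roofline bound: limited by either compute peak or data movement."""
    return min(peak_gflops, bandwidth_gb_s * flop_per_byte)


if __name__ == "__main__":
    print("local B/F :", bytes_per_flop(MEMORY_GB_S, PEAK_GFLOPS))
    print("remote B/F:", bytes_per_flop(NETWORK_GB_S, PEAK_GFLOPS))
    # Hypothetical kernel doing 0.25 FLOP per byte fetched from local memory.
    print("bandwidth-bound kernel:", attainable_gflops(PEAK_GFLOPS, MEMORY_GB_S, 0.25))
```

With the nominal figures, local memory gives B/F = 1 and the network gives B/F = 0.1 for remote nodes, as on the slide. In the roofline picture, a kernel that performs fewer floating-point operations per byte than 1/(B/F) is limited by data movement rather than by arithmetic.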
Slide 26
Problem in Network
  Molecular Dynamics: strong scaling is important
  50,000 FLOP/particle/step
  N = 10^5: 5 GFLOP/step
  5 TFLOPS effective performance: 1 msec/step = 170 nsec/day. Rather easy
  5 PFLOPS effective performance: 1 μsec/step = 200 μsec/day??? Difficult, but important
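The strong-scaling numbers on this slide can be reproduced with a short calculation. The sketch assumes a 2 fs integration timestep, which the slide does not state but which is consistent with the roughly 170 nsec/day it quotes for 1 msec/step; function names are illustrative.

```python
# Molecular dynamics strong-scaling arithmetic (illustrative sketch).
FLOP_PER_PARTICLE_STEP = 50_000   # from the slide
N_PARTICLES = 10**5               # from the slide
TIMESTEP_FS = 2.0                 # assumption: 2 fs per MD step (not on the slide)
SECONDS_PER_DAY = 86_400


def seconds_per_step(effective_flops):
    """Wall-clock time for one MD step at a given effective performance."""
    return FLOP_PER_PARTICLE_STEP * N_PARTICLES / effective_flops


def simulated_ns_per_day(effective_flops, timestep_fs=TIMESTEP_FS):
    """Simulated time (ns) covered in one wall-clock day."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step(effective_flops)
    return steps_per_day * timestep_fs * 1e-6  # fs -> ns


if __name__ == "__main__":
    for flops in (5e12, 5e15):
        print(f"{flops:.0e} FLOPS: {seconds_per_step(flops):.1e} s/step, "
              f"{simulated_ns_per_day(flops):.1f} ns/day")
```

At 5 TFLOPS effective this gives 1 millisecond per step and about 173 ns of simulated time per day. At 5 PFLOPS effective the same 5 GFLOP step must finish in 1 microsecond, which is why network latency rather than arithmetic becomes the limiting factor.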
Slide 27
Anton
  D. E. Shaw Research
  Special-purpose pipeline + General-purpose core + Dedicated Network
  By decreasing communication latency, it can achieve high sustained performance even for small systems
  R. O. Dror et al., Proc. Supercomputing 2009, in USB memory.
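Why latency matters for small systems can be seen in a toy model where each step costs its compute time plus a fixed communication latency. This is a deliberately simplified sketch, not a model of Anton itself: the latency values are hypothetical, and the 5 GFLOP/step workload and 5 PFLOPS aggregate performance are borrowed from the preceding slide.

```python
# Toy latency model for strong scaling (illustrative sketch, not Anton's design).
WORK_FLOP_PER_STEP = 5e9    # 5 GFLOP/step, from the previous slide
AGGREGATE_FLOPS = 5e15      # 5 PFLOPS effective, from the previous slide


def step_time(work_flop, aggregate_flops, latency_s):
    """Time per step: perfectly parallel compute plus a fixed latency term."""
    return work_flop / aggregate_flops + latency_s


def efficiency(work_flop, aggregate_flops, latency_s):
    """Fraction of each step spent on useful computation."""
    compute_s = work_flop / aggregate_flops
    return compute_s / step_time(work_flop, aggregate_flops, latency_s)


if __name__ == "__main__":
    # Hypothetical per-step latencies: 10 microseconds versus 0.1 microseconds.
    for latency in (10e-6, 0.1e-6):
        eff = efficiency(WORK_FLOP_PER_STEP, AGGREGATE_FLOPS, latency)
        print(f"latency {latency:.1e} s -> efficiency {eff:.3f}")
```

In this model the compute part of a step takes 1 microsecond, so a 10 microsecond latency leaves the machine idle most of the time, while a 0.1 microsecond latency keeps it mostly busy. The smaller the system, the less work there is per step to hide the latency behind.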
Slide 28
MDGRAPE-4
  Special-purpose computer for molecular dynamics simulations
  Test bed for future HPC hardware
  FY2010-FY2012
  System-on-Chip: Accelerator, Memory, General-purpose processor, Network
  ~4 TFLOPS / chip