22
Copyright 2013 FUJITSU LIMITED Key Software Technologies or Future High Performance Computing

Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Key Software Technologies for Future High Performance Computing

Page 2: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Key software issues for future HPC

Scalability

Manageability

Power Efficiency

Productivity

1

Page 3: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Key software issues for future HPC

Scalability

Manageability

Power Efficiency

Productivity

2

Page 4: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Technologies for Scalability

Improve application parallelization

Reduce OS Jitter affection

Keep I/O performance scalability

Issues in this area:

Background: The performance are always top-priority on HPC application. Scalability is the most important for performance on massively parallel computing.

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

3

Page 5: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Scalability

Improve application parallelization

Technology: VISIMPACT + Tofu Optimized MPI Easy and efficient inter-core Parallelization

automated by VISIMPACT

Threads-Processes hybrid parallelization with Tofu optimized MPI library.

Memory

CPU

Core

L2$

Process

Core

L2$

Memory

CPU

Core

Core

L2$

Process Inter-core

thread parallel

processing

w/o VISIMPACT w/ VISIMPACT

Process

PRIMEHPC OS

4

Page 6: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Scalability

Reduce OS Jitter affection

Technology: Tuned Linux OS for PRIMEHPC minimized negative effect of OS jitter ultimately by: Core-binding technology Deliberately selected and tuned system service

x86 cluster w/o TCS

x86 cluster w/ TCS

FX10 OS w/ TCS

TCS: Technical Computing Suite (Fujitsu’s System Software Product)

PRIMEHPC OS

5

Page 7: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Scalability

Keep I/O performance scalability

Technology: FEFS(Fujitsu Exabyte File System) Lustre based scalable file system Supports up to 8 Exa bytes capacity Achieves superb performance on K computer

Write:965 GB/s Read:1,486 GB/s

0

200

400

600

800

1000

1200

1400

1600

0 500 1000 1500 2000 2500 3000

Thro

ughp

ut [

GB

/s]

Number of OSS

IOR Read (direct, file per proc)

0

200

400

600

800

1000

1200

0 500 1000 1500 2000 2500 3000

Thro

ughp

ut [

GB

/s]

Number of OSS

IOR Write (direct, file per proc)

Collaborative work with RIKEN 6

Page 8: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Key software issues for future HPC

Scalability

Manageability

Power Efficiency

Productivity

7

Page 9: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Manageability

Availability

Operability

Background: For system administrator, managing system is a key issue. We have already achieved nearly 100,000 nodes system on K computer.

Issues in this area:

8

Page 10: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Manageability

Availability

Technology: Automatic Management Node failover Immediate failover by Hot Stand-by

Technology: Automatic MDS/OSS failover Monitored by TCS system manager

RAID1+0 RAID

FC multipath

Node failover

IB multirail

Active

Active

RAID6 RAID6

Active

Standby

IB SW

MDS: Active/Standby OSS Active/Active

IB SW

9

Page 11: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Manageability

Operability

Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware / Software) Power Control / Monitoring Support PRIMEHPC / x86 Hybrid Cluster system Technology: Flexible cluster management provides various physical/logical partitioning.

Technology: QoS/Directory Quota on FEFS facilitates sharing global storage across multi cluster system.

10

Page 12: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Key software issues for future HPC

Scalability

Manageability

Power Efficiency

Productivity

11

Page 13: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Power Efficiency

System-wide Power Management

Maximize Resource utilization

Maximize application efficiency

Background: Customer Requirement: Increasing “actual” system throughput Keeping total power at target value

Issues in this area:

“Actual” Throughput

Total system power Power Efficiency =

12

Page 14: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Power Efficiency

Maximize application efficiency

Technology: Optimized OS and languages achieves good efficiency for many applications.

Table is provided by Dr. Minami of RIKEN

PRIMEHPC OS

13

Page 15: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Power Efficiency

Maximize Resource Utilization

Technology: Various job allocation method to increase node/core utilization even on Torus system Torus mode/Mesh mode allocation Node simplex/share allocation Heterogeneous hybrid parallel job allocation

Technology: Centric Power Control helps to integrate center-wide power capping with: Interface to control power of nodes or storages Power consumption monitoring Control power saving mode w/ job manager

System-wide Power Management

14

Page 16: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Key software issues for future HPC

Scalability

Manageability

Power Efficiency

Productivity

15

Page 17: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Productivity

Portability

Tuning and Debugging

Background: Because of increasing complexity of node and network architecture, developing applications become more difficult.

Issues in this area:

16

Page 18: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

MPI Tools Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Productivity

Tuning and Debugging

Technology: Supports world’s standard debugger (DDT) Profiler - GUI Detailed PA information - Optimize communication on Tofu Rank Mapping Optimization (RMATT)

17

Page 19: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

OSS

PRIMEHPC Series

PRIMEHPC OS

FEFS

System Manager

Job Manager

Main Storage Mgmt. Nodes

Tools MPI Compiler

Copyright 2013 FUJITSU LIMITED

Technologies for Productivity

Portability

Technology: Continues to supports: The latest international standards

Fortran 2008, C 11, C++ 11 De facto standards

GNU C/C++ extensions, OpenMP 4.0, MPI 3.0

Technology: Supports Generic Linux OS POSIX compliant Linux and generic libraries

Technology: Compatibility with K computer Binary compatibility Same Architecture applied

1cpu/node, VISIMPACT, Tofu interconnect

18

Page 20: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Now, we are ready for 100 PFlops!

What’s next?

19

Page 21: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2013 FUJITSU LIMITED

Activities for Exascale Computing

App. review

FS projects

HPCI strategic applications program

Development National projects

Exa-system development project

2011 2012 2013 2014 2015 2016 2017 2018 2019

Roadmap of Japan’s National Project

Participate co-design for Exascale System software

Light-weight Micro kernel next to Linux

File I/O performance improvement

Operation of K computer Development

20

Page 22: Key Software Technologies for Future High Performance ... · Technology: Centric Management provides single system image for: System Installation / Update Node status Monitoring (Hardware

Copyright 2012 FUJITSU LIMITED 21