State of ARM-based HPC LTD20-106 24 March 2020





Welcome!

1. This is not our first rodeo…
   a. Mont Blanc - https://www.montblanc-project.eu/wp-content/uploads/2017/12/UCHPC_Presentation_PDF_lw.pdf
   b. Linaro Connect - http://connect.linaro.org.s3.amazonaws.com/sfo17/Presentations/SFO17-200K1.pdf
   c. Linaro Connect - https://connect.linaro.org/resources/san19/san19-400k1/
   d. Arm - https://developer.arm.com/solutions/hpc
2. The answer to the question of whether AArch64/Arm64 can do HPC is a resounding Yes!


Typical components of an HPC system

1. Common components.
   a. As near identical a configuration per node as possible.
   b. A method of interconnecting nodes.
2. A job scheduler.
   a. Slurm workload manager
   b. Univa Grid Engine
   c. ...and others, or other ways to parallelise across nodes.
3. CPU / RAM / Interconnect / Storage

Is that enough?
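To make the job-scheduler item concrete, here is a minimal sketch that builds a Slurm batch script as text. The `#SBATCH` options `--job-name`, `--nodes`, `--ntasks-per-node` and `--time`, and the `srun` launcher, are standard Slurm. The helper name `make_sbatch_script`, the job name and the `./my_mpi_app` command are hypothetical placeholders and not taken from the talk.

```python
def make_sbatch_script(job_name, nodes, tasks_per_node, command,
                       time_limit="00:10:00"):
    """Return the text of a Slurm batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",                      # identical nodes requested
        f"#SBATCH --ntasks-per-node={tasks_per_node}",   # processes per node
        f"#SBATCH --time={time_limit}",                  # wall-clock limit
        "",
        f"srun {command}",                               # launch across the allocation
    ]
    return "\n".join(lines) + "\n"


# Example: 4 nodes with 64 tasks each (illustrative numbers only).
script = make_sbatch_script("demo", nodes=4, tasks_per_node=64,
                            command="./my_mpi_app")
print(script)
```

The generated text would typically be saved to a file and submitted with `sbatch`.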


Components

1. Core volume/density.
   a. We used to count the number of simultaneous processes by the number of physical CPUs.
      i. In each node we look at the number of CPUs
      ii. The number of cores
      iii. The number of threads
         1. Is threading intentionally disabled?
      iv. Is NUMA supported?
      v. Whether those CPUs are cache-coherent.
2. Levels of cache
   L0 - Macro-op cache
   L1 - for each core
   L2 - for each cluster of cores
   L3 - for each cluster of CPUs

The L1 cache is typically split into separate instruction and data caches; L2 and L3 are usually unified.
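The counting described above (CPUs, then cores, then threads) can be sketched as simple arithmetic. The `NodeTopology` class and the example figures below are assumptions for illustration, not measurements of any particular system; `os.cpu_count()` is the standard-library call that reports the logical CPUs the operating system exposes.

```python
import os
from dataclasses import dataclass


@dataclass
class NodeTopology:
    sockets: int            # physical CPU packages in the node
    cores_per_socket: int   # cores in each package
    threads_per_core: int   # 1 when threading (SMT) is disabled

    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    def hardware_threads(self) -> int:
        return self.physical_cores() * self.threads_per_core


# Hypothetical dual-socket node: 2 x 32 cores with 4 threads per core.
node = NodeTopology(sockets=2, cores_per_socket=32, threads_per_core=4)
print("physical cores:  ", node.physical_cores())
print("hardware threads:", node.hardware_threads())

# Logical CPUs visible to the OS on the machine running this sketch.
print("os.cpu_count():  ", os.cpu_count())
```

Setting `threads_per_core=1` models a node where threading has been intentionally disabled, so the scheduler sees one slot per physical core.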


Chips

● Arm v8.0-A (Advanced SIMD/NEON, 32 x 128-bit registers)
  ○ Ampere eMAG 8180
  ○ Cavium ThunderX
  ○ Qualcomm Kryo
● Arm v8.1-A
  ○ Marvell ThunderX2 (28-core variant) - Astra Supercomputer (dual-socket)
  ○ Marvell ThunderX2 (32-core variant) - Isambard Supercomputer (dual-socket)
● Arm v8.2-A
  ○ Arm Neoverse N1
  ○ Fujitsu A64FX (+SVE) - Fugaku Supercomputer (single-socket)
  ○ Huawei Kunpeng 920
  ○ NVIDIA Carmel
  ○ Ampere Altra (v8.2+)
● Arm v8.3-A (SIMD complex number rotation support and nested virtualisation support)
  ○ Marvell ThunderX3 (v8.3+) 2020
  ○ Huawei Kunpeng 930 (almost v8.4 + SVE) 2021

https://en.wikipedia.org/wiki/ARM_architecture
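To put the SIMD figures above in perspective, a short sketch of how many elements fit in one vector register. NEON registers are 128 bits wide; SVE lets the implementation choose a vector length (a multiple of 128 bits, up to 2048), and the A64FX uses 512 bits. The function name `lanes` is just an illustrative helper.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of the element width")
    return vector_bits // element_bits


for name, width in [("NEON (128-bit)", 128), ("SVE on A64FX (512-bit)", 512)]:
    print(f"{name}: {lanes(width, 64)} x float64, {lanes(width, 32)} x float32")
```

The lane count is an upper bound on the per-instruction speed-up from vectorisation; real gains depend on memory bandwidth and how well the loop vectorises.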


Chips

● Arm v8.6-A (Neoverse N2 ‘Zeus’ to be used in the European Processor Initiative)
  ○ General Matrix Multiply (GEMM)
  ○ Bfloat16 format support
  ○ SIMD matrix manipulation instructions: BFDOT, BFMMLA, BFMLAL and BFCVT
  ○ Enhancements for virtualization, system management and security
● Arm SVE2
  ○ Fine-grained data-level parallelism

Support for v8.6-A and SVE2 to be in GCC 10 and LLVM Clang 9
Announced April 2019

https://en.wikipedia.org/wiki/ARM_architecture
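As a rough illustration of the Bfloat16 format mentioned above: bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an IEEE float32. The sketch below converts through the float32 bit pattern with round-to-nearest-even and then forms a dot product of bfloat16-rounded inputs. It shows the number format only; it is not a model of the exact BFCVT or BFDOT instruction semantics, NaN and overflow handling are omitted, and the function names are made up for this example.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 (nearest-even) and return the 16-bit pattern.

    Assumption: x is finite and fits in float32; NaN/Inf are not handled.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def bf16_dot(a, b) -> float:
    """Dot product with inputs rounded to bfloat16 and a wider accumulator.

    The accumulator here is a Python float (float64), whereas the hardware
    instructions accumulate in float32.
    """
    total = 0.0
    for x, y in zip(a, b):
        xb = bf16_bits_to_float(float_to_bf16_bits(x))
        yb = bf16_bits_to_float(float_to_bf16_bits(y))
        total += xb * yb
    return total


print(hex(float_to_bf16_bits(3.14159)))                      # bit pattern
print(bf16_bits_to_float(float_to_bf16_bits(3.14159)))       # value after rounding
print(bf16_dot([1.0, 2.0], [3.0, 4.0]))
```

The reduced mantissa is why bfloat16 suits machine-learning workloads, which tolerate low precision, better than general numerical HPC codes.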


RISC, CISC, ACCELERATOR

● The ARM ISA is a RISC implementation
  ○ Do simple operations highly efficiently.
  ○ Most operations are designed to complete in one clock cycle, which makes pipelining straightforward.
● A CISC implementation
  ○ Do simple instructions like RISC, but have additional complex instructions that take more than one clock cycle. Pipelining is more cumbersome.
● Accelerators
  ○ Do bespoke actions as quickly as possible, even asynchronously.
● The challenge:
  ○ Can an ARM ISA extended with accelerator-style operations be as effective as a CISC + plug-in accelerator?
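The link between single-cycle operations and pipelining can be shown with the textbook idealised model: with k pipeline stages and n instructions, and no stalls or hazards, a pipelined core needs k + (n - 1) cycles while an unpipelined one needs n * k. The stage count and instruction count below are hypothetical, and real cores deviate from this model.

```python
def unpipelined_cycles(stages: int, instructions: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int) -> int:
    """Ideal pipeline: fill once, then retire one instruction per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


stages, n = 5, 100   # hypothetical 5-stage pipeline, 100 instructions
slow = unpipelined_cycles(stages, n)
fast = pipelined_cycles(stages, n)
print(f"unpipelined: {slow} cycles, pipelined: {fast} cycles, "
      f"speed-up: {slow / fast:.2f}x")
```

Multi-cycle instructions break the "one per cycle" assumption, which is the sense in which pipelining becomes more cumbersome for complex instructions.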


Interconnects

● Between up to 128 cores there is the Arm CMN-600 Coherent Mesh Network, for a single chassis
● Between chassis there are:
  ○ PCIe
  ○ CCIX
  ○ CXL?
  ○ Ares
  ○ Tofu
● Network options
  ○ InfiniBand - low latency
  ○ Ethernet
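Why latency matters for the network options can be sketched with the common latency-plus-bandwidth cost model, in which the time to send a message is roughly latency + size / bandwidth. The two links below use made-up numbers chosen only to show the effect; they are not measurements of InfiniBand, Ethernet or any product.

```python
def transfer_time(size_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Estimated time to move one message: fixed latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical links with equal bandwidth but different latency.
low_latency = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 12.5e9}
high_latency = {"latency_s": 20e-6, "bandwidth_bytes_per_s": 12.5e9}

for size in (8, 1_000_000_000):   # a tiny message and a 1 GB transfer
    a = transfer_time(size, **low_latency)
    b = transfer_time(size, **high_latency)
    print(f"{size:>13} bytes: low-latency {a:.6e} s, high-latency {b:.6e} s")
```

In this model small messages, typical of tightly coupled parallel codes, are dominated by latency, while large transfers are dominated by bandwidth.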


Adaptive Compute Acceleration

https://www.xilinx.com/products/silicon-devices/acap/versal-premium.html


Resilience

● ECC memory
● Dual power supplies
● Core fault sensing
● ...Containers?


Blending Containers

● Containers are packaged environments that make applications easy to run by supplying their dependencies within.
● Multiple containers can work together as building blocks of a larger solution.
● Subject to operational requirements, containers can be built to run on a variety of platforms.
  ○ From SBC to HPC!
● With the right sort of scheduler system and orchestration tool, jobs become:
  ○ Auto-built/tested
  ○ Parallelised
  ○ Flexible
  ○ Scalable
  ○ On-demand
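Running the same container on different platforms usually comes down to selecting an image built for the host architecture. The sketch below maps the strings returned by Python's `platform.machine()` to the architecture names used by container images (`arm64`, `amd64`). The `image_ref` helper and its "tag-arch" naming scheme are hypothetical conventions for illustration, not a registry standard.

```python
import platform

# Common platform.machine() values mapped to container architecture names.
_ARCH_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}


def oci_arch(machine=None) -> str:
    """Return the container architecture name for a machine string."""
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_ALIASES[machine]
    except KeyError:
        raise ValueError(f"unrecognised machine type: {machine!r}")


def image_ref(repo: str, tag: str, machine=None) -> str:
    """Build an architecture-specific image reference (hypothetical scheme)."""
    return f"{repo}:{tag}-{oci_arch(machine)}"


print(image_ref("example/app", "1.0", "aarch64"))
print(image_ref("example/app", "1.0", "x86_64"))
```

In practice multi-architecture image manifests let the container runtime make this choice automatically, so the job definition stays the same from SBC to HPC.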


Storage is still required...

● DRAM is volatile
● Virtual disks are ephemeral
● Diskless nodes
● Persistent storage is still needed:
  ○ File systems
    ■ ext4, LVM, XFS, ZFS
  ○ Parallel file systems
    ■ Lustre
  ○ Distributed storage
    ■ Ceph
  ○ Media
    ■ Conventional disks
    ■ SSD, NVMe


Applications

What does HPC enable...

● 292 libraries/applications tested for AArch64 - https://gitlab.com/arm-hpc/packages/-/wikis/home
● Weather prediction
  ○ Although Scalable Probabilistic Approximation might be more efficient… https://advances.sciencemag.org/content/6/5/eaaw0961
● Molecular Dynamics
  ○ GROMACS supports SIMD NEON operations
  ○ https://redmine.gromacs.org/issues/2806 SIMD algorithms for ARM SVE scheduled for 2021.
● AI


All things Cloud...

● IDC - Worldwide Server Market Revenue Declined 11.6% Year Over Year in the Second Quarter of 2019 https://www.idc.com/getdoc.jsp?containerId=prUS45482519
● COVID-19 pandemic causes stock market falls of 20% (Mar. 2020). https://www.wired.com/story/covid-19-spreads-listen-stock-market/
● Working remotely is now the norm.
● Scalable on-demand services bring Serverless Computing.


The Linaro Datacenter & Cloud Group (LDCG)

● Common development center for the Arm Server & Infrastructure ecosystem
● Eliminates fragmentation, reduces cost and accelerates time to market
● Members can focus on innovation and differentiated value-add
● Working on core open-source software for ARM servers
  ○ Server architecture - UEFI/ACPI/ServerReady
  ○ ARMv8 enablement & optimization
  ○ Big Data, BigTop, Hadoop and Spark
  ○ Cloud Infrastructure such as Kubernetes, OpenStack and Ceph

Linaro Developer Cloud

Enterprise-class Arm Powered servers hosted in the UK are available for development, test, CI and cloud deployments for VMs and containers.

www.linaro.cloud


LDCG High Performance Computing (HPC) SIG

Collaborative project building on the work of the Linaro Datacenter & Cloud Group

Lower deployment & management barriers
Leverage the Linaro Developer Cloud and other services to develop cost-effective Cloud-integrated HPC development frameworks and generate reference implementations to accelerate

Member-driven with Advisory Board
Members determine work completed by engineering resources while advisory board provides subject matter expertise on HPC requirements and guidance and feedback on ongoing HPC SIG strategic direction and roadmap

Driving datacenter-class, open-source HPC development on Arm
Identify and adopt standards to make HPC deployment on Arm a commercial imperative. Develop real-world use cases that reap the benefits of Arm while ensuring interoperability, modularization, orchestration



Functions-as-a-Service

● Linaro HPC hardware is being reconfigured towards a scalable environment.
  ○ A combination of OpenStack, K8s and OpenHPC.
  ○ A testbed to verify combinations of heterogeneous ingredients for the optimal recipes.
● Service consumers
  ○ Send the service request and receive the service answer.
  ○ The service consumer will be CPU, GPU, ISA and accelerator agnostic!

If the equipment is billed as pay-per-use then it’s our challenge to ensure that AArch64 solutions match a significant number of requests.
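The "agnostic consumer" idea can be sketched as a dispatcher: the consumer names a function and supplies a payload, and the service picks any backend able to run it. Everything below (the `Backend` class, `dispatch`, the backend names and the registered functions) is a hypothetical illustration, not the design of the Linaro testbed.

```python
class Backend:
    """A pool of hardware that offers a set of named functions."""

    def __init__(self, name, arch, functions):
        self.name = name            # e.g. a node or cluster label
        self.arch = arch            # hidden from the service consumer
        self.functions = functions  # mapping: function name -> callable

    def can_run(self, fn_name):
        return fn_name in self.functions

    def run(self, fn_name, payload):
        return self.functions[fn_name](payload)


def dispatch(backends, fn_name, payload):
    """Send a request to the first backend that supports the function."""
    for backend in backends:
        if backend.can_run(fn_name):
            return backend.name, backend.run(fn_name, payload)
    raise LookupError(f"no backend offers function {fn_name!r}")


backends = [
    Backend("aarch64-pool", "aarch64", {"sum": sum}),
    Backend("x86-pool", "x86_64", {"sum": sum, "max": max}),
]

# The consumer never mentions an architecture, only the function and data.
print(dispatch(backends, "sum", [1, 2, 3]))
print(dispatch(backends, "max", [1, 2, 3]))
```

Under pay-per-use billing, the more functions an AArch64 pool can serve, the more requests a dispatcher like this can route to it.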


Thank you

Continuing to accelerate deployment of your Arm-based solutions through collaboration

[email protected]