State of ARM-based HPC
LTD20-106
24 March 2020
Welcome!
1. This is not our first rodeo…
  a. Mont Blanc - https://www.montblanc-project.eu/wp-content/uploads/2017/12/UCHPC_Presentation_PDF_lw.pdf
  b. Linaro Connect - http://connect.linaro.org.s3.amazonaws.com/sfo17/Presentations/SFO17-200K1.pdf
  c. Linaro Connect - https://connect.linaro.org/resources/san19/san19-400k1/
  d. Arm - https://developer.arm.com/solutions/hpc
2. The answer to whether Aarch64/Arm64 can do HPC is a resounding Yes!
Typical components of an HPC system
1. Common components.
  a. As near identical configuration per node as possible.
  b. A method of interconnecting nodes.
2. A job scheduler.
  a. Slurm workload manager
  b. Univa Grid Engine
  c. ...and others, or other ways to parallelise across nodes.
3. CPU / RAM / Interconnect / Storage
Is that enough?
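To make "parallelise across nodes" concrete, here is a minimal sketch of a block decomposition: splitting a set of independent tasks into near-equal contiguous chunks, one per node. It is an illustration only and not how Slurm or Univa Grid Engine work internally; the function name `block_partition` is hypothetical.

```python
def block_partition(n_tasks: int, n_nodes: int) -> list[range]:
    """Split task indices 0..n_tasks-1 into contiguous, near-equal blocks.

    Hypothetical helper for illustration; a real scheduler also considers
    queues, priorities and resource limits.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(n_tasks, n_nodes)
    blocks = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional task each.
        size = base + (1 if node < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for node, block in enumerate(block_partition(10, 4)):
        print(f"node {node}: tasks {list(block)}")
```

Near-identical node configurations matter here: equal-sized blocks only finish at about the same time if every node runs them at about the same speed.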
Components
1. Core volume/density.
  a. We used to count the number of simultaneous processes by the number of physical CPUs.
    i. In each node we look at the number of CPUs
    ii. The number of cores
    iii. The number of threads
      1. Is threading intentionally disabled?
    iv. Is NUMA supported?
    v. Whether those CPUs are cache-coherent.
2. Levels of cache
  L0 - Macro-op cache
  L1 - for each core
  L2 - for each cluster of cores
  L3 - for each cluster of CPUs
L1, L2 and L3 caches have separate instruction and data elements.
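The counting described above (CPUs per node, cores per CPU, threads per core, and whether threading is disabled) can be sketched as a small calculation. The class `NodeTopology` and its field names are hypothetical, and the 4 threads per core in the usage example is an assumed illustrative value rather than a figure from these slides.

```python
from dataclasses import dataclass


@dataclass
class NodeTopology:
    """Hypothetical description of one node, for illustration only."""
    sockets: int            # number of physical CPUs in the node
    cores_per_socket: int   # cores on each CPU
    threads_per_core: int   # hardware threads each core can run
    smt_enabled: bool = True  # False if threading is intentionally disabled

    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    def hardware_threads(self) -> int:
        # With threading disabled, each core runs exactly one thread.
        per_core = self.threads_per_core if self.smt_enabled else 1
        return self.physical_cores() * per_core


if __name__ == "__main__":
    node = NodeTopology(sockets=2, cores_per_socket=32, threads_per_core=4)
    print(node.physical_cores(), node.hardware_threads())
```

On a running Linux system the same figures are usually read from the operating system rather than computed by hand; this sketch only shows how the numbers combine.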
Chips
● Arm v8.0-A (Advanced Neon, SIMD 32 x 128bit)
○ Ampere eMag 8180
○ Cavium ThunderX
○ Qualcomm Kryo
● Arm v8.1-A
○ Marvell ThunderX2 (28-core variant) - Astra Supercomputer (dual-socket)
○ Marvell ThunderX2 (32-core variant) - Isambard Supercomputer (dual-socket)
● Arm v8.2-A
○ Arm Neoverse N1
○ Fujitsu A64FX (+SVE) - Fugaku Supercomputer (single-socket)
○ Huawei Kunpeng 920
○ NVIDIA Carmel
○ Ampere Altra (v8.2+)
● Arm v8.3-A (SIMD Complex Number rotation support and Nested Virtualisation support)
○ Marvell ThunderX3 (v8.3+) 2020
○ Huawei Kunpeng 930 (almost v8.4 + SVE) 2021
https://en.wikipedia.org/wiki/ARM_architecture
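As a rough guide to what "32 x 128bit" means in practice, the sketch below computes how many elements of a given width fit in one SIMD register. The function name `simd_lanes` is hypothetical, and the 512-bit width in the second call is used only as an example of a wider SVE-style vector.

```python
def simd_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements processed per SIMD register (hypothetical helper)."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("vector width must be a positive multiple of element width")
    return vector_bits // element_bits


if __name__ == "__main__":
    # 128-bit Neon register holding 64-bit doubles
    print(simd_lanes(128, 64))
    # Example wider vector holding 64-bit doubles
    print(simd_lanes(512, 64))
    # Total Neon register file: 32 registers of 128 bits, in bytes
    print(32 * 128 // 8)
```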
Chips
● Arm v8.6-A (Neoverse N2 ‘Zeus’ to be used in the European Processor Initiative)
○ General Matrix Multiply (GEMM)
○ Bfloat16 format support
○ SIMD matrix manipulation instructions, BFDOT, BFMMLA, BFMLAL and BFCVT
○ Enhancements for virtualization, system management and security
● Arm SVE2
○ Fine-grained data-level parallelism
Support for v8.6-A and SVE2 to be in GCC 10 and LLVM Clang 9
Announced April 2019
https://en.wikipedia.org/wiki/ARM_architecture
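For readers unfamiliar with the Bfloat16 format mentioned above: it keeps the sign bit and 8 exponent bits of an IEEE 754 single-precision float and shortens the mantissa to 7 bits, so it is the top 16 bits of a float32. The sketch below converts in software using round-to-nearest-even, which is an assumed rounding choice for illustration; the slides do not describe how the hardware instructions round. The function names are hypothetical.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep it a NaN by forcing a mantissa bit after truncation.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Add just under half a unit of the kept precision, plus one if the
    # kept value is odd, so exact ties round to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159265, -2.5):
        b = float_to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bfloat16_bits_to_float(b)!r}")
```

The round trip shows the precision trade-off: the range of float32 is preserved, but only about two to three significant decimal digits survive.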
RISC, CISC, ACCELERATOR
● The ARM ISA is a RISC implementation
○ Do simple operations highly efficiently.
○ Each operation takes one clock cycle, which enables pipelining.
● A CISC implementation
○ Do simple instructions like RISC but have additional complex instructions that take more than one clock cycle. Pipelining is more cumbersome.
● Accelerators
○ Do bespoke actions as quickly as possible, even asynchronously.
● The Challenge
○ Can an ARM ISA extended with accelerator-style operations be as effective as a CISC + plug-in Accelerator?
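The benefit of pipelining uniform, single-cycle operations can be illustrated with the idealised textbook model below. It assumes no stalls, hazards or cache misses, and the 5-stage depth in the usage example is a hypothetical value, not a property of any chip named in these slides.

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical instruction count and pipeline depth
    print(unpipelined_cycles(n, k), pipelined_cycles(n, k))
```

Instructions that take a variable number of cycles break the "one per cycle" assumption, which is one way to read the slide's remark that pipelining is more cumbersome for complex instructions.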
Interconnects
● Within a single chassis, ARM CMN600 (Coherent Mesh Network) connects up to 128 cores
● Between chassis there are:
○ PCIe
○ CCIX
○ CXL?
○ Ares
○ Tofu
● Network options
○ InfiniBand - Low latency
○ Ethernet
Adaptive Compute Acceleration
https://www.xilinx.com/products/silicon-devices/acap/versal-premium.html
Resilience
● ECC Memory
● Dual power-supplies
● Core fault sensing
● ...Containers?
Blending Containers
● Containers are packaged environments that make applications easy to run by supplying their dependencies within.
● Multiple containers can work together as building blocks of a larger solution.
● Subject to operational requirements, containers can be built to run on a variety of platforms.
○ From SBC to HPC!
● With the right sort of scheduler system and orchestration tool, jobs become:
○ Auto-built/tested
○ Parallelised
○ Flexible
○ Scalable
○ On-demand
Storage is still required...
● DRAM is volatile
● Virtual disks are ephemeral
● Diskless nodes
● Persistent storage is still needed:
○ File systems
■ ext4, LVM, XFS, ZFS
○ Parallel file systems
■ Lustre
○ Distributed storage
■ Ceph
○ Media
■ Conventional disks
■ SSD, NVMe
Applications
What does HPC enable...
● 292 Libraries/Applications tested for Aarch64 - https://gitlab.com/arm-hpc/packages/-/wikis/home
● Weather prediction
○ Although Scalable Probabilistic Approximation might be more efficient… https://advances.sciencemag.org/content/6/5/eaaw0961
● Molecular Dynamics
○ GROMACS supports SIMD NEON operations
○ https://redmine.gromacs.org/issues/2806 SIMD algorithms for ARM SVE scheduled for 2021.
● AI
All things Cloud...
● IDC - Worldwide Server Market Revenue Declined 11.6% Year Over Year in the Second Quarter of 2019 https://www.idc.com/getdoc.jsp?containerId=prUS45482519
● The COVID-19 pandemic causes stock market falls of 20% (Mar. 2020). https://www.wired.com/story/covid-19-spreads-listen-stock-market/
● Working remotely is now the norm.
● Scalable on-demand services bring Serverless Computing.
The Linaro Datacenter & Cloud Group (LDCG)
● Common development center for the Arm Server & Infrastructure ecosystem
● Eliminates fragmentation, reduces cost and accelerates time to market
● Members can focus on innovation and differentiated value-add
● Working on core open-source software for ARM servers
○ Server architecture – UEFI/ACPI/ServerReady
○ ARMv8 enablement & optimization
○ Big Data, BigTop, Hadoop and Spark
○ Cloud Infrastructure such as Kubernetes, OpenStack and Ceph
Linaro Developer Cloud
Enterprise-class Arm Powered servers hosted in the UK are available for development, test, CI and cloud deployments for VMs and containers.
www.linaro.cloud
LDCG High Performance Computing (HPC) SIG
Collaborative project building on the work of the Linaro Datacenter & Cloud Group
● Lower deployment & management barriers
Leverage the Linaro Developer Cloud and other services to develop cost-effective Cloud-integrated HPC development frameworks and generate reference implementations to accelerate
● Member-driven with Advisory Board
Members determine work completed by engineering resources while the advisory board provides subject matter expertise on HPC requirements, and guidance and feedback on the ongoing HPC SIG strategic direction and roadmap
● Driving datacenter-class, open-source HPC development on Arm
Identify and adopt standards to make HPC deployment on Arm a commercial imperative. Develop real-world use cases that reap the benefits of Arm while ensuring interoperability, modularization, orchestration
HPC
Functions-as-a-Service
● Linaro HPC hardware is being reconfigured towards a scalable environment.
○ A combination of OpenStack, K8S and OpenHPC.
○ A testbed to verify combinations of heterogeneous ingredients for the optimal recipes.
● Service Consumers
○ Send the service request and receive the service answer.
○ The service consumer will be CPU, GPU, ISA and Accelerator agnostic!
If the equipment is billed as pay-per-use then it’s our challenge to ensure that Aarch64 solutions match a significant number of requests.
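The idea of an architecture-agnostic consumer on pay-per-use equipment can be sketched as follows: the consumer names only the function it wants, and a router picks the cheapest backend that offers it. Everything here is hypothetical and for illustration only, including the `Backend` class, the backend labels, the prices and the selection rule; none of it describes the Linaro testbed's actual design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Backend:
    """Hypothetical execution target; name and price are made up."""
    name: str
    cost_per_call: float
    functions: dict[str, Callable[[list[float]], float]]


def serve(backends: list[Backend], function_name: str,
          payload: list[float]) -> tuple[float, str]:
    """Run the request on the cheapest backend that offers the function.

    The caller never states a CPU, GPU, ISA or accelerator.
    """
    candidates = [b for b in backends if function_name in b.functions]
    if not candidates:
        raise LookupError(f"no backend offers {function_name!r}")
    chosen = min(candidates, key=lambda b: b.cost_per_call)
    return chosen.functions[function_name](payload), chosen.name


if __name__ == "__main__":
    backends = [
        Backend("backend-a", 0.02, {"sum": sum}),
        Backend("aarch64-cpu", 0.01, {"sum": sum, "max": max}),
    ]
    print(serve(backends, "sum", [1.0, 2.0]))
```

Under a rule like this, an Aarch64 backend only "matches" a request when it both supports the function and wins on cost, which is the challenge the slide describes.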
Thank you
Continuing to accelerate deployment of your Arm-based solutions through collaboration