
Page 1

Scalability Performance Analysis and Tuning on OpenFOAM

Pak Lui – HPC Advisory Council

4th OpenFOAM User Conference 2016 – October 11-12 - Cologne, Germany

Page 2

Agenda

• Introduction to HPC Advisory Council

• Benchmark Configurations

• Performance Benchmark Testing

• Results and MPI Profiles

• Summary

• Q&A / For More Information

Page 3

Mission Statement

The HPC Advisory Council

• World-wide HPC non-profit organization (420+ members)

• Bridges the gap between HPC usage and its potential

• Provides best practices and a support/development center

• Explores future technologies and future developments

• Leading edge solutions and technology demonstrations

Page 4

HPC Advisory Council Members

Page 5

HPCAC Centers

[World map: HPC Advisory Council centers – HPCAC HQ (Austin), Switzerland (CSCS), China]

Page 6

HPC Advisory Council HPC Center

• Dell™ PowerEdge™ R730 32-node cluster
• HP Cluster Platform 3000SL 16-node cluster
• HP ProLiant XL230a Gen9 10-node cluster
• Dell™ PowerEdge™ R815 11-node cluster
• Dell™ PowerEdge™ C6145 6-node cluster
• Dell™ PowerEdge™ M610 38-node cluster
• Dell™ PowerEdge™ C6100 4-node cluster
• Dell™ PowerEdge™ R720xd/R720 32-node cluster
• White-box InfiniBand-based storage (Lustre)
• Dell PowerVault MD3420 / MD3460 InfiniBand storage (Lustre)

Page 7

158 Applications Best Practices Published

• Lattice QCD

• LAMMPS

• LS-DYNA

• miniFE

• MILC

• MSC Nastran

• MR Bayes

• MM5

• MPQC

• NAMD

• Nekbone

• NEMO

• NWChem

• Octopus

• OpenAtom

• OpenFOAM

• OpenMX

• OptiStruct

• PARATEC

• PFA

• PFLOTRAN

• Quantum ESPRESSO

• RADIOSS

• SNAP

• SPECFEM3D

• STAR-CCM+

• STAR-CD

• VASP

• VSP

• WRF

• Abaqus

• ABySS

• AcuSolve

• Amber

• AMG

• AMR

• ANSYS CFX

• ANSYS FLUENT

• ANSYS Mechanical

• BQCD

• CAM-SE

• CCSM

• CESM

• COSMO

• CP2K

• CPMD

• Dacapo

• Desmond

• DL-POLY

• Eclipse

• FLOW-3D

• GADGET-2

• Graph500

• GROMACS

• Himeno

• HIT3D

• HOOMD-blue

• HPCC

• HPCG

• HYCOM

• ICON


For more information, visit: http://www.hpcadvisorycouncil.com/best_practices.php

Page 8

University Award Program

• University award program
‣ Universities and individuals are encouraged to submit proposals for advanced research

• Selected proposals will be provided with:
‣ Exclusive computation time on the HPC Advisory Council's Compute Center
‣ An invitation to present at one of the HPC Advisory Council's worldwide workshops
‣ Publication of the research results on the HPC Advisory Council website

• The 2010 award winner is Dr. Xiangqian Hu, Duke University
‣ Topic: "Massively Parallel Quantum Mechanical Simulations for Liquid Water"

• The 2011 award winner is Dr. Marco Aldinucci, University of Torino
‣ Topic: "Effective Streaming on Multi-core by Way of the FastFlow Framework"

• The 2012 award winner is Jacob Nelson, University of Washington
‣ Topic: "Runtime Support for Sparse Graph Applications"

• The 2013 award winner is Antonis Karalis
‣ Topic: "Music Production using HPC"

• The 2014 award winner is Antonis Karalis
‣ Topic: "Music Production using HPC"

• The 2015 award winner is Christian Kniep
‣ Topic: Docker

• To submit a proposal, please check the HPC Advisory Council web site

Page 9

Student Cluster Competition

HPCAC - ISC’16

• University-based teams compete and demonstrate the incredible capabilities of state-of-the-art HPC systems and applications on the ISC High Performance (ISC HPC) show floor

• The Student Cluster Challenge is designed to introduce the next generation of students to the high-performance computing world and community

• Submissions for participation in the ISC'17 SCC are now open

Page 10

ISC'16 – Student Cluster Competition Teams

Page 11

Getting Ready for the 2017 Student Cluster Competition

Page 12

Introduction

2016 HPC Advisory Council Conferences

• HPC Advisory Council (HPCAC)

‣ 429+ members, http://www.hpcadvisorycouncil.com/

‣ Application best practices, case studies

‣ Benchmarking center with remote access for users

‣ World-wide workshops

‣ Value-add for your customers to stay up to date and in tune with the HPC market

• 2016 Conferences

‣ USA (Stanford University) – February 24-25

‣ Switzerland (CSCS) – March 21-23

‣ Spain (BSC) – September 21

‣ China (HPC China) – October 26

• For more information

‣ www.hpcadvisorycouncil.com

[email protected]

Page 13

Objectives

• The following was done to provide best practices

‣ OpenFOAM performance benchmarking

‣ Interconnect performance comparisons

‣ MPI performance comparison

‣ Understanding OpenFOAM communication patterns

• The presented results will demonstrate

‣ The scalability of the compute environment to provide nearly linear application scalability

‣ The capability of OpenFOAM to achieve scalable productivity

Page 14

Test Cluster Configuration 1

• Dell PowerEdge R730 32-node (896-core) "Thor" cluster
‣ Dual-socket 14-core Intel E5-2697 v3 @ 2.60 GHz CPUs
‣ BIOS: Maximum Performance, Home Snoop
‣ Memory: 64GB DDR4 2133 MHz
‣ OS: RHEL 6.5, MLNX_OFED_LINUX-3.2 InfiniBand SW stack
‣ Hard drives: 2x 1TB 7.2K RPM SATA 2.5"

• Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
• Mellanox Switch-IB SB7700 36-port EDR 100Gb/s InfiniBand switch
• Dell InfiniBand-based Lustre storage, based on Dell PowerVault MD3460 and MD3420
‣ Based on Intel Enterprise Edition for Lustre (IEEL) software version 2.1

• NVIDIA Tesla K80 GPUs
• MPI: Mellanox HPC-X v1.6.392 (based on Open MPI 1.10)
• Application: OpenFOAM 2.2.2
• Benchmark:
‣ Automotive benchmark – 670 million elements, DP, pimpleFoam solver (a run sketch follows below)
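For context, a minimal sketch of how a decomposed OpenFOAM case of this kind is typically prepared and launched under HPC-X; the case directory name, hostfile, and rank count are placeholders, not taken from the slides.

  # Decompose the mesh into one subdomain per MPI rank (per system/decomposeParDict)
  cd automotive_case          # hypothetical case directory
  decomposePar

  # Run the transient incompressible solver in parallel: 32 nodes x 28 cores = 896 ranks
  mpirun -np 896 -hostfile hosts.txt pimpleFoam -parallel

  # Optionally reassemble the distributed results into a single domain
  reconstructPar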

Page 15

pimpleFoam Solver – Interconnect Comparisons

OpenFOAM Performance

• EDR InfiniBand delivers better application performance
‣ EDR IB provides 13% higher performance than FDR IB
‣ The performance benefit is due to the concentration of large-message transfers
‣ EDR IB uses a new driver for enhanced scalability compared to FDR IB

[Chart: normalized performance vs. number of nodes (higher is better); EDR IB is 5%–13% ahead of FDR IB]

Page 16

pimpleFoam Solver - Number of MPI Calls

OpenFOAM Profiling

• The pimpleFoam solver is the transient solver for incompressible flow

• Time is spent mostly in non-blocking and collective communication, plus other MPI operations

• MPI_Waitall and MPI_Allreduce are used the most

• MPI_Waitall (33%) and MPI_Allreduce (37%) at 32 nodes / 1K cores

Page 17

pimpleFoam Solver - % MPI Time

OpenFOAM Profiling

• MPI profiling reports that a large share of communication time is spent on large messages
‣ MPI_Probe and MPI_Isend traffic occurred mostly in large messages

• An interconnect with higher large-message bandwidth is advantageous for pimpleFoam performance

Page 18

pimpleFoam Solver - MPI Tuning

OpenFOAM Performance

• The transport method provides a significant advantage at larger core counts

• Mellanox HPC-X shows 10% higher performance than Intel MPI at 32 nodes
‣ HPC-X can use MXM UD or RC for transport and hcoll for collective offload
‣ Intel MPI can use OFA or DAPL as the transport (a transport-selection sketch follows below)

[Chart: performance on EDR InfiniBand (higher is better); HPC-X ~10% ahead of Intel MPI at 32 nodes]
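As an illustration of the transport choices above, launch lines along these lines could be used; the rank count and solver invocation are placeholders, and the Intel MPI fabric settings are an assumption based on typical Intel MPI usage of that generation rather than the exact options used in these tests.

  # HPC-X / Open MPI: MXM UD transport plus hcoll collective offload
  mpirun -np 896 -mca coll_hcoll_enable 1 -x MXM_TLS=ud,shm,self pimpleFoam -parallel

  # Intel MPI: select the transport explicitly (OFA verbs or DAPL) -- assumed flags
  mpirun -np 896 -genv I_MPI_FABRICS shm:ofa  pimpleFoam -parallel
  mpirun -np 896 -genv I_MPI_FABRICS shm:dapl pimpleFoam -parallel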

Page 19

Test Cluster Configuration 2

• Dell PowerEdge R730 32-node (1024-core) "Thor" cluster
‣ Dual-socket 16-core Intel E5-2697A v4 @ 2.60 GHz CPUs
‣ BIOS: Maximum Performance, Home Snoop
‣ Memory: 256GB DDR4 2400 MHz
‣ OS: RHEL 7.2, MLNX_OFED_LINUX-3.3-1.0.4.0 InfiniBand SW stack
‣ Hard drives: 2x 1TB 7.2K RPM SATA 2.5"

• Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
• Mellanox Switch-IB SB7700 36-port EDR 100Gb/s InfiniBand switch
• Intel Omni-Path 100Gb/s host fabric adapters and switch
• Dell InfiniBand-based Lustre storage, based on Dell PowerVault MD3460 and MD3420
‣ Based on Intel Enterprise Edition for Lustre (IEEL) software version 3.0

• NVIDIA Tesla K80 GPUs
• MPI: Mellanox HPC-X v1.6.392 (based on Open MPI 1.10)
• Application: OpenFOAM 2.2.2
• Benchmarks:
‣ Automotive benchmark – 670 million elements, DP, pimpleFoam solver
‣ Automotive benchmark 2 – 85 million elements, DP, simpleFoam solver

Page 20

simpleFoam Solver - MPI Tuning

OpenFOAM Performance

• simpleFoam uses the SIMPLE (Semi-Implicit Method for Pressure-Linked Equations) algorithm
‣ The algorithm couples the Navier-Stokes equations

• Mellanox HPC-X shows 13% higher performance than Open MPI at 32 nodes
‣ HPC-X is based on Open MPI, with additional libraries that support offloads
‣ HPC-X uses the MXM UD transport for messaging and hcoll for collective offload
‣ The communication share of the overall runtime drops from 34% (Open MPI) to 24% (HPC-X)

[Chart: performance vs. number of nodes on EDR InfiniBand (higher is better); HPC-X 13% ahead of Open MPI at 32 nodes]

Page 21

simpleFoam Solver – Time spent in MPI Calls

OpenFOAM Profiling

• Mostly in non-blocking point-to-point communication and collective ops

‣ MPI_Waitall (33.8% MPI, 8% wall)

‣ MPI_Probe (25% MPI, 6% wall)

‣ MPI_Allreduce (19% MPI, 5% wall)

Page 22

pimpleFoam Solver – Interconnect Comparisons

OpenFOAM Performance

• EDR IB provides an advantage at scale compared to OPA

• EDR IB outperforms Omni-Path by nearly 10% on the pimpleFoam simulation

• MPI runtime options used (complete launch lines are sketched below):
‣ EDR IB: -bind-to core -mca coll_hcoll_enable 1 -x MXM_TLS=ud,shm,self -x HCOLL_ENABLE_MCAST_ALL=1 -x HCOLL_CONTEXT_CACHE_ENABLE=1 -x MALLOC_MMAP_MAX_=0 -x MALLOC_TRIM_THRESHOLD_=-1
‣ OPA: -genv I_MPI_PIN on -PSM2

[Chart: pimpleFoam performance (higher is better); EDR IB ~10% ahead of Omni-Path at 32 nodes]
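Assembled into complete launch lines, the two runs might look as follows; the rank counts, hostfile, and solver invocation are placeholders, and the OPA line simply reproduces the options quoted on the slide.

  # EDR InfiniBand run with HPC-X (options as listed above)
  mpirun -np 1024 -hostfile hosts.txt -bind-to core \
      -mca coll_hcoll_enable 1 -x MXM_TLS=ud,shm,self \
      -x HCOLL_ENABLE_MCAST_ALL=1 -x HCOLL_CONTEXT_CACHE_ENABLE=1 \
      -x MALLOC_MMAP_MAX_=0 -x MALLOC_TRIM_THRESHOLD_=-1 \
      pimpleFoam -parallel

  # Omni-Path run with Intel MPI (options as listed above)
  mpirun -np 1024 -hostfile hosts.txt -genv I_MPI_PIN on -PSM2 pimpleFoam -parallel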

Page 23

pimpleFoam Solver – Interconnect Comparisons

OpenFOAM Profiling

• Time is spent mostly in non-blocking point-to-point communication, plus other MPI operations

• EDR IB shows lower MPI communication time
‣ EDR IB: 12.58% of wall time in MPI_Allreduce at 1K cores
‣ OPA: 14.49% of wall time in MPI_Allreduce at 1K cores

[MPI time profiles, side by side: Mellanox EDR InfiniBand vs. Intel Omni-Path]

Page 24

Test Cluster Configuration 3

• HP ProLiant DL360 Gen9 128-node (3584-core) "Orion" cluster
‣ Dual-socket 14-core Intel E5-2697 v3 @ 2.60 GHz CPUs

‣ Memory: 256GB memory, DDR3 2133 MHz

‣ OS: RHEL 7.2, MLNX_OFED_LINUX-3.2-1.0.1.1 InfiniBand SW stack

• Mellanox ConnectX-4 EDR InfiniBand adapters

• Mellanox Switch-IB 2 EDR InfiniBand switches

• MPI: HPC-X v1.6.392 (based on Open MPI 1.10)

• Compilers: Intel Parallel Studio XE 2015

• Application: OpenFOAM 3.1

• Benchmark dataset:
‣ Lid-driven cavity flow – 1 million elements, DP, 2D icoFoam solver for laminar, isothermal, incompressible flow (a run sketch follows below)
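A minimal sketch of how the lid-driven cavity benchmark is typically prepared and run in parallel; the case directory, hostfile, and rank count (128 nodes x 28 cores = 3584 ranks) are placeholders, not taken from the slides.

  # Build the 2D cavity mesh and decompose it across the MPI ranks
  cd cavity                   # hypothetical 1M-element lid-driven cavity case
  blockMesh
  decomposePar

  # Run the laminar, isothermal, incompressible PISO solver in parallel
  mpirun -np 3584 -hostfile hosts.txt icoFoam -parallel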

Page 25

icoFoam Solver – Open MPI Tuning

OpenFOAM Performance

• The HPC-X MPI library provides switch-based collective offload in the network

• MPI collective operations (MPI_Allreduce & MPI_Barrier) are executed in the network switches
‣ Up to 75% performance benefit seen on a 64-node simulation (an enable/disable sketch follows below)

• The benefit of FCA depends on how heavily the solver uses MPI collective operations

[Chart: icoFoam performance (higher is better); up to 75% gain with the collective offload at 64 nodes]
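To isolate this benefit, the hcoll/FCA offload can be toggled at launch time, e.g. along these lines; the coll_hcoll_enable and HCOLL_ENABLE_MCAST_ALL settings appear elsewhere in this deck, while the rank count (64 nodes x 28 cores) and hostfile are placeholders.

  # Baseline: hcoll collective offload disabled
  mpirun -np 1792 -hostfile hosts.txt -mca coll_hcoll_enable 0 icoFoam -parallel

  # Offloaded: hcoll enabled, hardware multicast allowed for MPI_Allreduce / MPI_Barrier
  mpirun -np 1792 -hostfile hosts.txt -mca coll_hcoll_enable 1 \
      -x HCOLL_ENABLE_MCAST_ALL=1 icoFoam -parallel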

Page 26

icoFoam Solver – Time spent in MPI Calls

OpenFOAM Profiling

• icoFoam solves the incompressible laminar Navier-Stokes equations using the PISO algorithm

• Communication is almost entirely collective

• MPI_Allreduce at 16 bytes is used almost exclusively

• MPI_Allreduce (85%), MPI_Waitall (5%)

Page 27

icoFoam Solver - MPI Comparisons

OpenFOAM Performance

• The tuned HPC-X MPI library has a significant performance advantage at scale

• HPC-X (SHArP) delivers 120% greater scalability than Intel MPI at 64 nodes

• SHArP – Scalable Hierarchical Aggregation Protocol (see the enabling sketch below)

• The benefit is mostly due to the MPI collective offloads in HPC-X
‣ Offloading MPI_Allreduce to the networking hardware

[Chart: icoFoam scalability (higher is better); HPC-X with SHArP 120% ahead of Intel MPI at 64 nodes]
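The SHArP offload is switched on through hcoll environment variables; the variable names used below (HCOLL_ENABLE_SHARP, HCOLL_SHARP_NP) are an assumption for HPC-X of this vintage and are not quoted from the slide.

  # Enable switch-based SHArP aggregation for small MPI_Allreduce operations (assumed variables)
  mpirun -np 1792 -hostfile hosts.txt -mca coll_hcoll_enable 1 \
      -x HCOLL_ENABLE_SHARP=1 -x HCOLL_SHARP_NP=2 \
      icoFoam -parallel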

Page 28

OpenFOAM Summary

• OpenFOAM scales well to thousands of cores with a suitable setup

• The OpenFOAM solvers scale well for the solvers/cases being tested

• Summary:
‣ EDR IB vs FDR IB:
• EDR IB provides 13% higher performance than FDR IB on 32 nodes
‣ EDR IB vs OPA:
• EDR IB provides nearly 10% higher performance on the pimpleFoam simulation on 32 nodes
‣ Network offload:
• 75% performance gain by using the offload in HPC-X on 64 nodes
‣ MPI:
• HPC-X shows 13% higher performance than Intel MPI on the simpleFoam solver
• HPC-X provides 120% greater scalability than Intel MPI on the icoFoam solver

Page 29

Contact Us

• Web: www.hpcadvisorycouncil.com

• Email: [email protected]

• Facebook: http://www.facebook.com/HPCAdvisoryCouncil

• Twitter: www.twitter.com/hpccouncil

• YouTube: www.youtube.com/user/hpcadvisorycouncil

All trademarks are property of their respective owners. All information is provided "As-Is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy or completeness of the information contained herein. The HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.