Scalability Performance Analysis and Tuning on OpenFOAM
Pak Lui – HPC Advisory Council
4th OpenFOAM User Conference 2016 – October 11-12 - Cologne, Germany
Agenda
• Introduction to HPC Advisory Council
• Benchmark Configurations
• Performance Benchmark Testing
• Results and MPI Profiles
• Summary
• Q&A / For More Information
Mission Statement
The HPC Advisory Council
• World-wide HPC non-profit organization (420+ members)
• Bridges the gap between HPC usage and its potential
• Provides best practices and a support/development center
• Explores future technologies and developments
• Provides leading-edge solutions and technology demonstrations
HPC Advisory Council Members
HPCAC Centers
• HPCAC HQ – Austin
• Switzerland (CSCS)
• China
HPC Advisory Council HPC Center
• Dell™ PowerEdge™ R730 32-node cluster
• HP Cluster Platform 3000SL 16-node cluster
• HP ProLiant XL230a Gen9 10-node cluster
• Dell™ PowerEdge™ R815 11-node cluster
• Dell™ PowerEdge™ C6145 6-node cluster
• Dell™ PowerEdge™ M610 38-node cluster
• Dell™ PowerEdge™ C6100 4-node cluster
• Dell™ PowerEdge™ R720xd/R720 32-node cluster
• White-box InfiniBand-based storage (Lustre)
• Dell PowerVault MD3420 and MD3460 InfiniBand storage (Lustre)
158 Applications Best Practices Published
• Abaqus
• ABySS
• AcuSolve
• Amber
• AMG
• AMR
• ANSYS CFX
• ANSYS FLUENT
• ANSYS Mechanical
• BQCD
• CAM-SE
• CCSM
• CESM
• COSMO
• CP2K
• CPMD
• Dacapo
• Desmond
• DL-POLY
• Eclipse
• FLOW-3D
• GADGET-2
• Graph500
• GROMACS
• Himeno
• HIT3D
• HOOMD-blue
• HPCC
• HPCG
• HYCOM
• ICON
• Lattice QCD
• LAMMPS
• LS-DYNA
• miniFE
• MILC
• MSC Nastran
• MR Bayes
• MM5
• MPQC
• NAMD
• Nekbone
• NEMO
• NWChem
• Octopus
• OpenAtom
• OpenFOAM
• OpenMX
• OptiStruct
• PARATEC
• PFA
• PFLOTRAN
• Quantum ESPRESSO
• RADIOSS
• SNAP
• SPECFEM3D
• STAR-CCM+
• STAR-CD
• VASP
• VSP
• WRF
For more information, visit: http://www.hpcadvisorycouncil.com/best_practices.php
University Award Program
• University award program
‣ Universities and individuals are encouraged to submit proposals for advanced research
• Selected proposals will be provided with:
‣ Exclusive computation time on the HPC Advisory Council's Compute Center
‣ An invitation to present at one of the HPC Advisory Council's worldwide workshops
‣ Publication of the research results on the HPC Advisory Council website
• 2010 award winner: Dr. Xiangqian Hu, Duke University
‣ Topic: "Massively Parallel Quantum Mechanical Simulations for Liquid Water"
• 2011 award winner: Dr. Marco Aldinucci, University of Torino
‣ Topic: "Effective Streaming on Multi-core by Way of the FastFlow Framework"
• 2012 award winner: Jacob Nelson, University of Washington
‣ Topic: "Runtime Support for Sparse Graph Applications"
• 2013 award winner: Antonis Karalis
‣ Topic: "Music Production using HPC"
• 2014 award winner: Antonis Karalis
‣ Topic: "Music Production using HPC"
• 2015 award winner: Christian Kniep
‣ Topic: Docker
• To submit a proposal, please check the HPC Advisory Council web site
Student Cluster Competition
HPCAC - ISC’16
• University-based teams compete and demonstrate the incredible capabilities of state-of-the-art HPC systems and applications on the ISC High Performance (ISC HPC) show floor
• The Student Cluster Challenge is designed to introduce the next generation of students to the high performance computing world and community
• Submissions for participation in the ISC'17 SCC are now open
ISC'16 – Student Cluster Competition Teams
Getting Ready for the 2017 Student Cluster Competition
Introduction
2016 HPC Advisory Council Conferences
• HPC Advisory Council (HPCAC)
‣ 429+ members, http://www.hpcadvisorycouncil.com/
‣ Application best practices, case studies
‣ Benchmarking center with remote access for users
‣ World-wide workshops
‣ Value add for your customers: stay up to date and in tune with the HPC market
• 2016 Conferences
‣ USA (Stanford University) – February 24-25
‣ Switzerland (CSCS) – March 21-23
‣ Spain (BSC) – September 21
‣ China (HPC China) – October 26
• For more information
‣ www.hpcadvisorycouncil.com
Objectives
• The following was done to provide best practices
‣ OpenFOAM performance benchmarking
‣ Interconnect performance comparisons
‣ MPI performance comparisons
‣ Understanding OpenFOAM communication patterns
• The presented results demonstrate
‣ The ability of the compute environment to deliver nearly linear application scalability
‣ The capability of OpenFOAM to achieve scalable productivity
Test Cluster Configuration 1
• Dell PowerEdge R730 32-node (896-core) "Thor" cluster
‣ Dual-socket 14-core Intel Xeon E5-2697 v3 @ 2.60 GHz CPUs
‣ BIOS: Maximum Performance, Home Snoop
‣ Memory: 64 GB DDR4 2133 MHz
‣ OS: RHEL 6.5, MLNX_OFED_LINUX-3.2 InfiniBand SW stack
‣ Hard drives: 2x 1 TB 7.2K RPM SATA 2.5"
• Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
• Mellanox Switch-IB SB7700 36-port EDR 100Gb/s InfiniBand switch
• Dell InfiniBand-based Lustre storage built on Dell PowerVault MD3460 and MD3420
‣ Based on Intel Enterprise Edition for Lustre (IEEL) software version 2.1
• NVIDIA Tesla K80 GPUs
• MPI: Mellanox HPC-X v1.6.392 (based on Open MPI 1.10)
• Application: OpenFOAM 2.2.2
• Benchmark:
‣ Automotive benchmark – 670 million elements, DP, pimpleFoam solver (a typical run workflow is sketched below)
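For context, a case like this is run with OpenFOAM's standard domain-decomposition workflow: split the mesh into one subdomain per MPI rank, then launch the solver under MPI. A minimal sketch follows; the subdomain count and decomposition method shown here are illustrative assumptions, not the exact settings used in these tests:

```
// system/decomposeParDict (sketch) -- one subdomain per MPI rank
numberOfSubdomains  896;     // assumption: 32 nodes x 28 cores per node
method              scotch;  // graph partitioner; needs no manual coefficients

// Then, from the case directory:
//   decomposePar                           (decompose the mesh and fields)
//   mpirun -np 896 pimpleFoam -parallel    (run the solver across all ranks)
```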
pimpleFoam Solver – Interconnect Comparisons
OpenFOAM Performance
• EDR InfiniBand delivers better application performance
‣ EDR IB provides 13% higher performance than FDR IB
‣ The performance benefit is due to the high concentration of large-message transfers
‣ EDR IB uses a new driver with enhanced scalability compared to FDR IB
[Figure: pimpleFoam performance vs. number of nodes (higher is better); EDR IB gains of 5% and 13% over FDR IB]
pimpleFoam Solver - Number of MPI Calls
OpenFOAM Profiling
• The pimpleFoam solver is the transient solver for incompressible flow (merged PISO-SIMPLE algorithm)
• Time is spent mostly in non-blocking and collective communication, plus other MPI operations
• MPI_Waitall and MPI_Allreduce are used the most
• MPI_Waitall (33%) and MPI_Allreduce (37%) of MPI calls at 32 nodes / 1K cores
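Call counts like these come from an MPI profiler; the same data can be collected with a small interposer on the standard PMPI profiling interface, with no changes to OpenFOAM. A minimal sketch that counts MPI_Allreduce calls per rank (file name and build line are illustrative; the same pattern extends to MPI_Waitall and the rest):

```cpp
// allreduce_count.cpp -- count MPI_Allreduce calls per rank via the PMPI interface.
// Build: mpicxx -shared -fPIC allreduce_count.cpp -o liballreduce_count.so
// Run:   mpirun -x LD_PRELOAD=./liballreduce_count.so -np <N> pimpleFoam -parallel
#include <mpi.h>
#include <cstdio>

static long long allreduceCalls = 0;   // per-rank call counter

extern "C" int MPI_Allreduce(const void* sendbuf, void* recvbuf, int count,
                             MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    ++allreduceCalls;                  // record the call, then forward to the real MPI
    return PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
}

extern "C" int MPI_Finalize(void)
{
    int rank = 0;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::printf("rank %d: MPI_Allreduce calls = %lld\n", rank, allreduceCalls);
    return PMPI_Finalize();
}
```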
pimpleFoam Solver - % MPI Time
OpenFOAM Profiling
• MPI profiling reports that a large share of communication time is spent on large messages
‣ MPI_Probe and MPI_Isend are dominated by large messages
• An interconnect with higher large-message bandwidth is therefore advantageous for pimpleFoam performance
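The large-message sensitivity can be sanity-checked on a given fabric with a two-rank echo test, in the spirit of the OSU bandwidth benchmarks. A rough sketch (message size and iteration count are arbitrary choices):

```cpp
// bw_echo.cpp -- crude two-rank echo test for large-message bandwidth.
// Build: mpicxx -O2 bw_echo.cpp -o bw_echo
// Run:   mpirun -np 2 -map-by node ./bw_echo   (one rank per node to cross the fabric)
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 100;
    const int bytes = 4 << 20;              // 4 MiB: firmly in the "large message" regime
    std::vector<char> buf(bytes, 0);

    MPI_Barrier(MPI_COMM_WORLD);
    const double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {                    // rank 0 sends, then waits for the echo
            MPI_Send(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {             // rank 1 echoes everything back
            MPI_Recv(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    const double dt = MPI_Wtime() - t0;

    if (rank == 0) {                        // two transfers per iteration (round trip)
        const double gbytes = 2.0 * iters * double(bytes) / 1e9;
        std::printf("effective bandwidth: %.2f GB/s\n", gbytes / dt);
    }
    MPI_Finalize();
    return 0;
}
```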
pimpleFoam Solver - MPI Tuning
OpenFOAM Performance
• The transport method provides a significant advantage at larger core counts
• Mellanox HPC-X shows 10% higher performance than Intel MPI at 32 nodes
‣ HPC-X can use MXM UD or RC as the transport and hcoll for collective offload
‣ Intel MPI can use OFA or DAPL as the transport
[Figure: pimpleFoam performance vs. number of nodes over EDR InfiniBand (higher is better); HPC-X leads Intel MPI by 10% at 32 nodes]
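For reference, the transport is a runtime selection in both stacks. A hedged sketch of the relevant knobs (the HPC-X flags are taken from the options listed later in this deck; the Intel MPI fabric value shown is one common choice, not necessarily the one used in these tests):

```
# HPC-X / Open MPI: select the MXM UD transport and enable hcoll collective offload
mpirun -np 896 -mca coll_hcoll_enable 1 -x MXM_TLS=ud,shm,self pimpleFoam -parallel

# Intel MPI: select the fabric (OFA verbs or DAPL) explicitly
mpirun -np 896 -genv I_MPI_FABRICS shm:ofa pimpleFoam -parallel
```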
Test Cluster Configuration 2
• Dell PowerEdge R730 32-node (1024-core) "Thor" cluster
‣ Dual-socket 16-core Intel Xeon E5-2697A v4 @ 2.60 GHz CPUs
‣ BIOS: Maximum Performance, Home Snoop
‣ Memory: 256 GB DDR4 2400 MHz
‣ OS: RHEL 7.2, MLNX_OFED_LINUX-3.3-1.0.4.0 InfiniBand SW stack
‣ Hard drives: 2x 1 TB 7.2K RPM SATA 2.5"
• Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
• Mellanox Switch-IB SB7700 36-port EDR 100Gb/s InfiniBand switch
• Intel Omni-Path 100Gb/s host fabric adapter and switch
• Dell InfiniBand-based Lustre storage built on Dell PowerVault MD3460 and MD3420
‣ Based on Intel Enterprise Edition for Lustre (IEEL) software version 3.0
• NVIDIA Tesla K80 GPUs
• MPI: Mellanox HPC-X v1.6.392 (based on Open MPI 1.10)
• Application: OpenFOAM 2.2.2
• Benchmark:
‣ Automotive benchmark – 670 million elements, DP, pimpleFoam solver
‣ Automotive benchmark 2 – 85 million elements, DP, simpleFoam solver
simpleFoam Solver - MPI Tuning
OpenFOAM Performance
• simpleFoam uses SIMPLE (Semi-Implicit Method for Pressure-Linked Equations)
‣ The algorithm couples the pressure and momentum equations of the Navier-Stokes system
• Mellanox HPC-X shows 13% higher performance than Open MPI at 32 nodes
‣ HPC-X is based on Open MPI, with additional libraries that support offloads
‣ HPC-X uses the MXM UD transport for messaging and hcoll for collective offload
‣ Communication as a share of overall runtime drops from 34% (Open MPI) to 24% (HPC-X)
[Figure: simpleFoam performance vs. number of nodes over EDR InfiniBand (higher is better); HPC-X leads Open MPI by 13% at 32 nodes]
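For orientation, the top level of the solver is a compact loop; the sketch below is condensed from the simpleFoam source shipped with OpenFOAM and is not a complete, self-contained listing (the included headers assemble the momentum predictor and pressure corrector):

```cpp
// Condensed from OpenFOAM's simpleFoam.C: each pass of the loop
// performs one SIMPLE pressure-velocity corrector step.
while (simple.loop())
{
    Info<< "Time = " << runTime.timeName() << nl << endl;

    // --- Pressure-velocity SIMPLE corrector
    {
        #include "UEqn.H"   // assemble and solve the momentum predictor
        #include "pEqn.H"   // solve pressure, correct flux and velocity
    }

    turbulence->correct();  // update the turbulence fields

    runTime.write();        // write results at write-control times
}
```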
simpleFoam Solver – Time spent in MPI Calls
OpenFOAM Profiling
• Time is spent mostly in non-blocking point-to-point communication and collective operations
‣ MPI_Waitall (33.8% of MPI time, 8% of wall time)
‣ MPI_Probe (25% of MPI time, 6% of wall time)
‣ MPI_Allreduce (19% of MPI time, 5% of wall time)
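The prominence of MPI_Probe alongside MPI_Waitall is characteristic of exchanges where the receiver does not know the incoming message size up front: it probes the envelope, sizes the buffer, then receives. A schematic sketch of that pattern (a hypothetical helper, not OpenFOAM's actual Pstream code):

```cpp
// Schematic receive path for a variable-sized message:
// probe for the envelope, size the buffer from the status, then receive.
#include <mpi.h>
#include <vector>

std::vector<char> recvFromNeighbour(int source, int tag, MPI_Comm comm)
{
    MPI_Status status;
    MPI_Probe(source, tag, comm, &status);     // block until a matching message arrives

    int bytes = 0;
    MPI_Get_count(&status, MPI_CHAR, &bytes);  // ask how large the payload is

    std::vector<char> buf(bytes);
    MPI_Recv(buf.data(), bytes, MPI_CHAR, source, tag, comm, MPI_STATUS_IGNORE);
    return buf;
}
```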
pimpleFoam Solver – Interconnect Comparisons
OpenFOAM Performance
• EDR IB provides an advantage at scale compared to OPA
• EDR IB outperforms Omni-Path by nearly 10% on the pimpleFoam simulation
• MPI runtime options used:
‣ EDR IB: -bind-to core -mca coll_hcoll_enable 1 -x MXM_TLS=ud,shm,self -x HCOLL_ENABLE_MCAST_ALL=1 -x HCOLL_CONTEXT_CACHE_ENABLE=1 -x MALLOC_MMAP_MAX_=0 -x MALLOC_TRIM_THRESHOLD_=-1
‣ OPA: -genv I_MPI_PIN on -PSM2
[Figure: pimpleFoam performance vs. number of nodes (higher is better); EDR IB leads Omni-Path by ~10%]
pimpleFoam Solver – Interconnect Comparisons
OpenFOAM Profiling
• Time is spent mostly in non-blocking point-to-point communication, plus other MPI operations
• EDR IB shows lower MPI communication time than OPA
‣ EDR IB: 12.58% of wall time in MPI_Allreduce at 1K cores
‣ OPA: 14.49% of wall time in MPI_Allreduce at 1K cores
[Figure: MPI time breakdown for Mellanox EDR InfiniBand vs. Intel Omni-Path]
Test Cluster Configuration 3
• HP ProLiant DL360 Gen9 128-node (3584-core) "Orion" cluster
‣ Dual-socket 14-core Intel Xeon E5-2697 v3 @ 2.60 GHz CPUs
‣ Memory: 256 GB DDR4 2133 MHz
‣ OS: RHEL 7.2, MLNX_OFED_LINUX-3.2-1.0.1.1 InfiniBand SW stack
• Mellanox ConnectX-4 EDR InfiniBand adapters
• Mellanox Switch-IB 2 EDR InfiniBand switches
• MPI: HPC-X v1.6.392 (based on Open MPI 1.10)
• Compilers: Intel Parallel Studio XE 2015
• Application: OpenFOAM 3.1
• Benchmark dataset:
‣ Lid-driven cavity flow – 1 million elements, DP, 2D, icoFoam solver for laminar, isothermal, incompressible flow
icoFoam Solver – Open MPI Tuning
OpenFOAM Performance
• The HPC-X MPI library provides switch-based collective offload in the network
• MPI collective operations (MPI_Allreduce and MPI_Barrier) are executed in the network switches
‣ Up to 75% performance benefit seen on a 64-node simulation
• The benefit of FCA depends on how heavily a solver uses MPI collective operations
[Figure: icoFoam performance vs. number of nodes (higher is better); up to 75% gain from collective offload]
icoFoam Solver – Time spent in MPI Calls
OpenFOAM Profiling
• icoFoam
‣ Solves the incompressible laminar Navier-Stokes equations using the PISO algorithm
• Time is spent mostly in collective communication
• MPI_Allreduce at 16 bytes is used almost exclusively
• MPI_Allreduce (85%), MPI_Waitall (5%)
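The 16-byte reductions are consistent with the linear solvers reducing a pair of doubles (for example, a dot product and a residual norm) across all ranks on every iteration; with only 1 million cells spread over thousands of cores, these latency-bound reductions dominate, which is why icoFoam benefits so much from collective offload. A schematic sketch of the pattern (a hypothetical helper, not OpenFOAM's actual solver code):

```cpp
// Schematic of the 16-byte MPI_Allreduce pattern typical of Krylov-type linear
// solvers: every iteration reduces two doubles across all ranks, so latency,
// not bandwidth, dominates.
#include <mpi.h>
#include <vector>
#include <cstddef>

// Hypothetical helper: fuse two global sums into one reduction.
void globalSums(const std::vector<double>& r, const std::vector<double>& z,
                double& dot, double& norm2)
{
    double local[2] = {0.0, 0.0};
    for (std::size_t i = 0; i < r.size(); ++i) {
        local[0] += r[i] * z[i];   // local part of the dot product
        local[1] += r[i] * r[i];   // local part of the squared residual norm
    }
    double global[2] = {0.0, 0.0};
    // 2 doubles = 16 bytes on the wire: exactly the message size seen in the profile
    MPI_Allreduce(local, global, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    dot   = global[0];
    norm2 = global[1];
}
```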
icoFoam Solver - MPI Comparisons
OpenFOAM Performance
• The tuned HPC-X MPI library has a significant performance advantage at scale
• HPC-X (with SHArP) delivers 120% greater scalability than Intel MPI at 64 nodes
• SHArP: Scalable Hierarchical Aggregation Protocol
• The benefit comes mostly from the MPI collective offloads in HPC-X
‣ MPI_Allreduce is offloaded to the networking hardware
[Figure: icoFoam performance vs. number of nodes (higher is better); HPC-X with SHArP shows 120% greater scalability than Intel MPI at 64 nodes]
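As a hedged sketch, hcoll-based offload is enabled through runtime options of this general shape; the SHArP-related variable name and value below are assumptions that vary across HPC-X releases, so consult the release notes of the installed version before relying on them:

```
# Hedged sketch: enabling hcoll collectives and SHArP offload under HPC-X.
# HCOLL_ENABLE_SHARP and its accepted values differ between HPC-X releases.
mpirun -np 1792 -mca coll_hcoll_enable 1 -x HCOLL_ENABLE_SHARP=2 icoFoam -parallel
```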
OpenFOAM Summary
• OpenFOAM scales well to thousands of cores with a suitable setup
• The OpenFOAM solvers scale well for the solvers/cases tested
• Summary:
‣ EDR IB vs FDR IB:
• EDR IB provides 13% higher performance than FDR IB on 32 nodes
‣ EDR IB vs OPA:
• EDR IB provides nearly 10% higher performance on the pimpleFoam simulation on 32 nodes
‣ Network offload:
• 75% performance gain from using offload in HPC-X on 64 nodes
‣ MPI:
• HPC-X shows 13% higher performance than Open MPI on the simpleFoam solver
• HPC-X provides 120% greater scalability than Intel MPI on the icoFoam solver
Contact Us
• Web: www.hpcadvisorycouncil.com
• Email: [email protected]
• Facebook: http://www.facebook.com/HPCAdvisoryCouncil
• Twitter: www.twitter.com/hpccouncil
• YouTube: www.youtube.com/user/hpcadvisorycouncil

All trademarks are property of their respective owners. All information is provided "As-Is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. The HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.