EU H2020 Centre of Excellence (CoE), 1 December 2018 – 30 November 2021
Grant Agreement No 824080

Case Study: 3x Speed Improvement for Zenotech's zCFD Solver
Nick Dingle, Numerical Algorithms Group Ltd.


The POP service

• Performance Optimisation and Productivity
• A Centre of Excellence
  • Collaborative European project funded by the Horizon 2020 programme
  • Runs 1 December 2018 – 30 November 2021
• Providing free services within Europe
  • Precise understanding of parallel application and system behaviour
  • Across application areas, platforms and scales
  • Suggestions/support on how to rewrite code in the most productive way
  • For academic and industrial codes and users

12 March 2019

The POP team

• Participating institutions:
  • Barcelona Supercomputing Center, Spain (coordinator)
  • HLRS, Germany
  • IT4Innovations, Czech Republic
  • Jülich Supercomputing Center, Germany
  • NAG, UK
  • RWTH Aachen, IT Center, Germany
  • TERATEC, France
  • Université de Versailles Saint-Quentin-en-Yvelines, France
• A team with:
  • Expertise in performance analysis and optimisation
  • Expertise in parallel programming models and practices
  • A research and development background and a proven commitment to real academic and industrial use cases

zCFD by Zenotech

• A density-based finite volume and Discontinuous Galerkin (DG) computational fluid dynamics (CFD) solver for steady-state or time-dependent flow simulation
• Decomposes domains using unstructured meshes
• Written in Python and C++ and parallelised with OpenMP and MPI

POP methodology

• Provides a quantitative measurement of the relative impact of the different factors inherent in parallelisation
• Uses a hierarchy of metrics, with each metric measuring a common cause of inefficiency in parallel programs
• Metrics are efficiencies between 0 and 1; higher numbers are better
• Efficiencies less than 0.8 are candidates for further investigation

The metrics

• The headline figure is Global Efficiency, which is the product of the Parallel and Computational Efficiencies
• Parallel Efficiency measures the effect that parallelising the code has on the runtime
  • E.g. how balanced the computational work is between threads, how much time is lost to OpenMP overheads, etc.
  • Calculated as the ratio between the average amount of time that threads spend in useful computation (i.e. not in the OpenMP library or I/O) and the total runtime
• Computational Efficiency describes how well the computational load of the application scales with the number of threads
  • Calculated as the ratio of the time that the serial code spends in useful computation to the total time across all threads that the code spends in useful computation
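The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the POP tooling: the function names and all timings are hypothetical, and the Computational Efficiency is taken as serial useful time divided by total useful time so that it stays between 0 and 1.

```python
from statistics import mean


def parallel_efficiency(useful_times, runtime):
    """Average per-thread useful computation time divided by total runtime."""
    return mean(useful_times) / runtime


def computational_efficiency(serial_useful_time, useful_times):
    """Serial useful time divided by useful time summed over all threads."""
    return serial_useful_time / sum(useful_times)


def global_efficiency(serial_useful_time, useful_times, runtime):
    """Product of the parallel and computational efficiencies."""
    return (parallel_efficiency(useful_times, runtime)
            * computational_efficiency(serial_useful_time, useful_times))


# Hypothetical timings in seconds, invented purely for illustration.
serial_useful = 100.0               # useful time of the serial run
useful = [30.0, 28.0, 26.0, 36.0]   # useful time of each of 4 threads
runtime = 40.0                      # wall-clock time of the 4-thread run

pe = parallel_efficiency(useful, runtime)              # 30 / 40 = 0.75
ce = computational_efficiency(serial_useful, useful)   # 100 / 120, about 0.83
ge = global_efficiency(serial_useful, useful, runtime)
print(f"parallel={pe:.2f} computational={ce:.2f} global={ge:.3f}")
```

With these definitions the product simplifies to serial useful time divided by (number of threads × runtime), i.e. speedup over the serial useful time divided by the thread count.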

Efficiencies

# Threads                     1     2     6     12
Global Efficiency             0.97  0.71  0.52  0.33
→ Parallel Efficiency         0.97  0.80  0.64  0.50
→ Computational Efficiency    1.00  0.89  0.82  0.66

• Investigate more deeply:
  • Time spent in parallel code vs. serial code
  • Load balance of OpenMP loops
  • OpenMP overhead
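As a quick consistency check, the short Python sketch below re-enters the table values, confirms that Global Efficiency matches Parallel × Computational to within the table's two-decimal rounding, and lists every entry below the 0.8 threshold from the methodology. The variable names are illustrative only.

```python
THRESHOLD = 0.8  # efficiencies below this are candidates for investigation

threads = [1, 2, 6, 12]
efficiencies = {
    "Global Efficiency":        [0.97, 0.71, 0.52, 0.33],
    "Parallel Efficiency":      [0.97, 0.80, 0.64, 0.50],
    "Computational Efficiency": [1.00, 0.89, 0.82, 0.66],
}

# Global Efficiency should equal Parallel x Computational, up to the
# two-decimal rounding used in the table.
for i, n in enumerate(threads):
    product = (efficiencies["Parallel Efficiency"][i]
               * efficiencies["Computational Efficiency"][i])
    assert abs(product - efficiencies["Global Efficiency"][i]) < 0.01

# Collect (metric, thread count, value) for every entry below the threshold.
flagged = [(name, n, value)
           for name, values in efficiencies.items()
           for n, value in zip(threads, values)
           if value < THRESHOLD]

for name, n, value in flagged:
    print(f"{name} at {n} threads: {value:.2f}")
```

Parallel Efficiency drops below the threshold from 6 threads onwards, while Computational Efficiency only does so at 12 threads.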

Parallel efficiency

# Threads                     1     2      6      12
Parallel Efficiency           0.97  0.80   0.64   0.50
% Parallel Code               -     88.6%  75.0%  66.6%
Load Balance Efficiency       1.00  0.88   0.85   0.85
OpenMP Overhead Efficiency    1.00  1.00   0.99   0.98
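One way to read the falling "% Parallel Code" row is through Amdahl's law: if a fixed portion of the work stays serial, it takes up a growing share of the runtime as threads are added. The sketch below is a textbook illustration only, not the POP metric; the serial fraction of 0.05 is a hypothetical value and is not intended to reproduce the measured figures.

```python
def amdahl_speedup(serial_fraction, n_threads):
    """Ideal speedup when only the non-serial part scales with threads."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)


def parallel_time_share(serial_fraction, n_threads):
    """Share of the N-thread runtime that is spent inside parallel regions."""
    parallel_time = (1.0 - serial_fraction) / n_threads
    return parallel_time / (serial_fraction + parallel_time)


SERIAL_FRACTION = 0.05  # hypothetical value, not measured on zCFD

for n in (1, 2, 6, 12):
    speedup = amdahl_speedup(SERIAL_FRACTION, n)
    share = parallel_time_share(SERIAL_FRACTION, n)
    print(f"{n:2d} threads: speedup {speedup:5.2f}, "
          f"efficiency {speedup / n:.2f}, parallel share {share:.1%}")
```

Even a small serial fraction leaves a large part of the 12-thread runtime outside parallel regions, which is why the time spent in serial code is worth investigating first.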

VTune

[Slides 10 to 18: VTune screenshots]

Computational efficiency

# Threads                     1     2     6     12
Computational Efficiency      1.00  0.89  0.82  0.66
IPC Efficiency                1.00  0.94  0.92  0.91
Instructions Efficiency       1.00  1.00  1.00  1.00
CPU Frequency Efficiency      1.00  0.94  0.89  0.72
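In the POP metric hierarchy the Computational Efficiency is the product of its three sub-metrics. The sketch below, with illustrative variable names, multiplies the table's IPC, Instructions and CPU Frequency rows, compares the result with the reported Computational Efficiency, and names the lowest sub-metric at each thread count.

```python
threads = [1, 2, 6, 12]
computational = [1.00, 0.89, 0.82, 0.66]
ipc = [1.00, 0.94, 0.92, 0.91]
instructions = [1.00, 1.00, 1.00, 1.00]
frequency = [1.00, 0.94, 0.89, 0.72]

# Product of the three sub-metrics for each thread count.
products = [i * n * f for i, n, f in zip(ipc, instructions, frequency)]

for t, reported, product in zip(threads, computational, products):
    print(f"{t:2d} threads: reported {reported:.2f}, product {product:.4f}")

# For each thread count, name the sub-metric with the lowest value
# (ties resolve to the first name in the dictionary).
factors = {"IPC": ipc, "Instructions": instructions, "CPU Frequency": frequency}
weakest = [min(factors, key=lambda name: factors[name][k])
           for k in range(len(threads))]
print(dict(zip(threads, weakest)))
```

The products agree with the reported row to within rounding, and at 6 and 12 threads the CPU Frequency Efficiency is the lowest of the three.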

Computational performance

[Slides 20 to 24: figures]

CPU frequency

[Slides 25 to 28: figures]

Performance Audit findings

• A surprisingly large amount of time spent executing in serial
• One key OpenMP loop suffered from load imbalance
• CPU frequency was lower when the code was run on the maximum number of threads
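To see how a handful of expensive iterations can unbalance a loop, the following Python simulation compares a contiguous block partition with handing out iterations one at a time to whichever thread is free. It is a simplified model, not OpenMP itself: the iteration costs are hypothetical, and load balance is taken as average thread time divided by maximum thread time.

```python
def load_balance_efficiency(thread_times):
    """Average per-thread work time divided by the maximum."""
    return sum(thread_times) / len(thread_times) / max(thread_times)


def static_schedule(costs, n_threads):
    """Give each thread one contiguous block of iterations."""
    block = -(-len(costs) // n_threads)  # ceiling division
    return [sum(costs[t * block:(t + 1) * block]) for t in range(n_threads)]


def dynamic_schedule(costs, n_threads, chunk=1):
    """Hand the next chunk of iterations to whichever thread is free first."""
    busy_until = [0.0] * n_threads
    for start in range(0, len(costs), chunk):
        t = busy_until.index(min(busy_until))
        busy_until[t] += sum(costs[start:start + chunk])
    return busy_until


# Hypothetical costs: 120 iterations, the first 10 are ten times slower.
costs = [10.0] * 10 + [1.0] * 110
static = load_balance_efficiency(static_schedule(costs, 12))
dynamic = load_balance_efficiency(dynamic_schedule(costs, 12))
print(f"static: {static:.3f}, dynamic: {dynamic:.3f}")
```

In this model the block partition leaves one thread with all the slow iterations, while the one-at-a-time hand-out spreads them across threads.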

Parallelising serial portions of code

• Although the code contained the correct OpenMP pragmas, the compiler found a particular region too complex to analyse and did not apply optimisations or OpenMP pragmas
• Fixed by removing an inline keyword

Improving load balance

• The main load imbalance was due to a call to pow() hitting a slow code-path when both base and exponent were close to 1
• This was resolved by scaling the base, raising it to the power, and then undoing the scaling
• Switching OpenMP loop scheduling to dynamic also helped
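The scaling workaround rests on the identity x^y = (s·x)^y / s^y. The Python sketch below shows only that algebra; the scale factor of 2.0 and the function name are assumptions, since the slides do not say which scaling was used in zCFD, and Python is not expected to reproduce the performance effect seen in the C++ code.

```python
import math


def scaled_pow(base, exponent, scale=2.0):
    """Compute base**exponent as (scale*base)**exponent / scale**exponent."""
    return math.pow(scale * base, exponent) / math.pow(scale, exponent)


# Inputs close to 1, the regime the audit identified as slow for pow().
base, exponent = 1.0000001, 0.9999999
direct = math.pow(base, exponent)
rescaled = scaled_pow(base, exponent)
print(direct, rescaled)
```

Scaling moves the base away from 1 before the call to pow(), at the cost of a second pow() call and a division, so the result can differ from the direct call in the last few bits.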

Changing execution environment

• The CPU frequency governor was set to ondemand by default, which meant that the frequency was reduced when all 12 threads were active
• Fixed by adding --cpu-freq=performance to the Slurm job submission command
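On Linux the active governor can usually be inspected through the cpufreq sysfs files. The sketch below reads them from Python; it assumes the standard /sys/devices/system/cpu layout, which is not available on every system (for example inside some virtual machines or containers), and the function name is illustrative.

```python
from pathlib import Path

GOVERNOR_FILE = "cpufreq/scaling_governor"


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu name: governor} for every CPU that exposes a governor."""
    governors = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return governors
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        governor_path = cpu_dir / GOVERNOR_FILE
        try:
            governors[cpu_dir.name] = governor_path.read_text().strip()
        except OSError:
            # cpufreq is not exposed for this CPU; skip it.
            continue
    return governors


governors = read_governors()
not_performance = {c: g for c, g in governors.items() if g != "performance"}
print(f"{len(governors)} CPUs report a governor; "
      f"{len(not_performance)} are not set to 'performance'")
```

Running this inside a job, with and without the --cpu-freq option, is one way to confirm that the requested governor is actually in effect on the compute node.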

New efficiencies

Before:

# Threads                     1     2     6     12
Global Efficiency             0.97  0.71  0.52  0.33
→ Parallel Efficiency         0.97  0.80  0.64  0.50
→ Computational Efficiency    1.00  0.89  0.82  0.66

After:

# Threads                     1     2     6     12
Global Efficiency             1.00  0.89  0.73  0.56
→ Parallel Efficiency         1.00  0.98  0.89  0.76
→ Computational Efficiency    1.00  0.91  0.82  0.74

Results

• For the test case used in the Audit, these improvements meant the code ran 1.65x faster on 12 threads
• On a test case that was 100x larger they gave a 3x performance improvement on 12 threads
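The reported speed-up can be set against the efficiency tables from before and after the changes. The sketch below re-enters both tables, checks that the optimised one also satisfies Global = Parallel × Computational within rounding, and prints the ratio of Global Efficiencies per thread count. The ratio is only a rough cross-check: the values are rounded and the two tables need not share the same reference run.

```python
threads = [1, 2, 6, 12]
before = {"parallel": [0.97, 0.80, 0.64, 0.50],
          "computational": [1.00, 0.89, 0.82, 0.66],
          "global": [0.97, 0.71, 0.52, 0.33]}
after = {"parallel": [1.00, 0.98, 0.89, 0.76],
         "computational": [1.00, 0.91, 0.82, 0.74],
         "global": [1.00, 0.89, 0.73, 0.56]}

# The optimised table should also satisfy Global = Parallel x Computational.
for i in range(len(threads)):
    assert abs(after["parallel"][i] * after["computational"][i]
               - after["global"][i]) < 0.01

# Ratio of Global Efficiency after vs. before, per thread count.
gain = {n: a / b for n, a, b in zip(threads, after["global"], before["global"])}
for n, ratio in gain.items():
    print(f"{n:2d} threads: global efficiency improved {ratio:.2f}x")
```

At 12 threads the ratio comes out at roughly 1.7, in the same range as the 1.65x measured on the Audit test case.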

Summary

• Using metrics helps to break down aspects of performance and identify underlying opportunities for improvement
• Performance analysis tools provide insights into application behaviour
• Not always the case that you need to re-engineer significant portions of your code to achieve meaningful performance increases!

Contact:
https://www.pop-coe.eu
[email protected]
@POP_HPC

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 676553 and 824080.

Performance Optimisation and Productivity
A Centre of Excellence in HPC