ARM and Mellanox Hackathon - GRChombo
Kacper Kornet
September 25, 2019
DAMTP, University of Cambridge
GRChombo
GRChombo is an adaptive mesh refinement (AMR) general relativity (GR)
code developed by a team of researchers from:
• Department of Applied Mathematics and Theoretical Physics
(DAMTP), University of Cambridge
• Argonne Leadership Computing Facility, Argonne National
Laboratory
• Department of Physics, King’s College London
• School of Mathematical Sciences, Queen Mary University of London
• Department of Physics, University of Oxford
• Institute of Mathematics and Physics, University of Louvain
Core developers: Josu C. Aurrekoetxea (KCL), Katy Clough (Oxford),
Amelia Drew (Cambridge), Pau Figueras (QMUL), Hal Finkel (ANL),
Tiago França (QMUL), Chenxia Gu (QMUL), Thomas Helfer (KCL),
Cristian Joana (UCLouvain), Kacper Kornet (Cambridge), Markus
Kunesch (Cambridge), Eugene Lim (KCL), Miren Radia (Cambridge),
James Widdicombe (KCL)
GRChombo
GRChombo: Parallelization levels
• Set of boxes distributed among processes with MPI
• Inside boxes outer loops parallelized with OpenMP
• Innermost loops vectorized with intrinsics
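The three levels above can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from GRChombo: the `Box` type, the round-robin ownership rule standing in for the MPI distribution, and the use of `#pragma omp simd` in place of hand-written intrinsics. Without OpenMP enabled the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box: a small 2D patch of cells stored row by row.
struct Box
{
    int nx, ny;
    std::vector<double> data;
    Box(int a_nx, int a_ny)
        : nx(a_nx), ny(a_ny),
          data(static_cast<std::size_t>(a_nx) * a_ny, 1.0)
    {
    }
};

// Level 1 (MPI in a real code): decide which boxes a given rank owns.
// Round-robin assignment is an assumption made purely for illustration.
std::vector<int> boxes_for_rank(int num_boxes, int rank, int num_ranks)
{
    assert(num_ranks > 0 && rank >= 0 && rank < num_ranks);
    std::vector<int> owned;
    for (int b = rank; b < num_boxes; b += num_ranks)
        owned.push_back(b);
    return owned;
}

// Level 2: the outer loop over rows is shared among OpenMP threads.
// Level 3: the innermost loop over contiguous cells is vectorized
// (here by a compiler hint rather than by intrinsics).
void scale_box(Box &box, double factor)
{
#pragma omp parallel for
    for (int j = 0; j < box.ny; ++j)
    {
        double *row = box.data.data() + static_cast<std::size_t>(j) * box.nx;
#pragma omp simd
        for (int i = 0; i < box.nx; ++i)
            row[i] *= factor;
    }
}
```

Each rank would call `boxes_for_rank` once and then apply `scale_box` only to the boxes it owns, so the three levels nest without interfering with each other.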
GRChombo: C++ template library
void BinaryBHLevel::specificEvalRHS(GRLevelData &a_soln, GRLevelData &a_rhs,
                                    const double a_time)
{
    // Enforce positive chi and alpha and trace free A
    BoxLoops::loop(make_compute_pack(TraceARemoval(),
                                     PositiveChiAndAlpha()),
                   a_soln, a_soln, INCLUDE_GHOST_CELLS);
    // Calculate CCZ4 right hand side and set constraints
    // to zero to avoid undefined values
    BoxLoops::loop(
        make_compute_pack(CCZ4(m_p.ccz4_params, m_dx, m_p.sigma),
                          SetValue(0, Interval(c_Ham, NUM_VARS - 1))),
        a_soln, a_rhs, EXCLUDE_GHOST_CELLS);
}
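The idea of bundling several per-cell operations into one pack and applying them in a single sweep can be sketched as follows. This is an assumption-laden illustration, not GRChombo's implementation: `ComputePack`, `make_pack`, `loop_over_cells`, `ClampPositive` and `Scale` are hypothetical names, and a flat `std::vector<double>` stands in for the level data.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-cell compute classes, each exposing compute(cell).
struct ClampPositive
{
    double floor_value;
    void compute(double &cell) const
    {
        if (cell < floor_value)
            cell = floor_value;
    }
};

struct Scale
{
    double factor;
    void compute(double &cell) const { cell *= factor; }
};

// A pack stores its compute objects and applies them in order.
template <class... compute_ts> struct ComputePack;

// Empty pack: nothing left to apply.
template <> struct ComputePack<>
{
    void compute(double &) const {}
};

template <class head_t, class... tail_ts>
struct ComputePack<head_t, tail_ts...>
{
    head_t m_head;
    ComputePack<tail_ts...> m_tail;

    ComputePack(head_t head, tail_ts... tail) : m_head(head), m_tail(tail...)
    {
    }

    void compute(double &cell) const
    {
        m_head.compute(cell); // first operation in the pack
        m_tail.compute(cell); // then the remaining ones
    }
};

template <class... compute_ts>
ComputePack<compute_ts...> make_pack(compute_ts... computes)
{
    return ComputePack<compute_ts...>(computes...);
}

// One sweep over the cells applies every operation in the pack.
template <class pack_t>
void loop_over_cells(const pack_t &pack, std::vector<double> &cells)
{
    for (double &cell : cells)
        pack.compute(cell);
}
```

Because the pack is resolved at compile time, the operations can be inlined into a single loop body, so each cell is visited once rather than once per operation.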
GRChombo: C++ template library
// Compute the value of phi at the current point
template <class data_t>
data_t ScalarBubble::compute_phi(Coordinates<data_t> coords) const
{
    data_t rr = coords.get_radius();
    data_t rr2 = rr * rr;
    data_t out_phi = m_params.amplitudeSF * rr2 *
                     exp(-sqr(rr - m_params.r_zero
                              / m_params.widthSF));
    return out_phi;
}
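Templating on `data_t` lets the same formula be written once and evaluated either on a single `double` or on several values at a time. The sketch below assumes a hypothetical `Lanes<N>` type (a plain array of doubles with overloaded operators) as a stand-in for a real SIMD wrapper, and `gaussian_shell` is an illustrative function, not one from GRChombo.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a SIMD wrapper: N doubles handled together.
template <std::size_t N> struct Lanes
{
    double v[N];
};

template <std::size_t N>
Lanes<N> operator*(const Lanes<N> &a, const Lanes<N> &b)
{
    Lanes<N> out;
    for (std::size_t i = 0; i < N; ++i)
        out.v[i] = a.v[i] * b.v[i];
    return out;
}

template <std::size_t N> Lanes<N> operator*(double s, const Lanes<N> &a)
{
    Lanes<N> out;
    for (std::size_t i = 0; i < N; ++i)
        out.v[i] = s * a.v[i];
    return out;
}

template <std::size_t N> Lanes<N> operator-(const Lanes<N> &a)
{
    Lanes<N> out;
    for (std::size_t i = 0; i < N; ++i)
        out.v[i] = -a.v[i];
    return out;
}

template <std::size_t N> Lanes<N> exp(const Lanes<N> &a)
{
    Lanes<N> out;
    for (std::size_t i = 0; i < N; ++i)
        out.v[i] = std::exp(a.v[i]);
    return out;
}

// Written once; works for double and for Lanes<N>.
template <class data_t> data_t gaussian_shell(data_t r, double amplitude)
{
    using std::exp; // scalar case; the Lanes<N> overload is found by ADL
    data_t r2 = r * r;
    return amplitude * r2 * exp(-r2);
}
```

Calling `gaussian_shell(2.0, 3.0)` uses the built-in operators, while calling it with a `Lanes<4>` argument evaluates four radii in one call through the overloads.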
GRChombo: intrinsics classes
template <> struct simd_traits<float>
{
    typedef __m512 data_t;
    typedef __mmask16 mask_t;
    static const int simd_len = 16;
};
template <> struct simd<double> : public simd_base<double>
{
    typedef typename simd_traits<double>::data_t data_t;
    typedef typename simd_traits<double>::mask_t mask_t;
    ALWAYS_INLINE
    simd() : simd_base<double>(_mm512_setzero_pd()) {}
    ALWAYS_INLINE
    simd(double x) : simd_base<double>(_mm512_set1_pd(x)) {}
Porting GRChombo to ARM
• Finding the best compiler options (e.g. -fno-math-errno)
• Replacing x86-specific bits with generic ones
• NEON port
• Rudimentary SVE port (not yet vector-length agnostic)
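One way to replace x86-specific assumptions with generic ones is to select the vector width from the compiler's predefined macros and keep a scalar fallback. The sketch below is an assumption about how such a selection could look, not the GRChombo port itself: `simd_len_double`, `simd_backend` and `axpy` are hypothetical names. The SVE branch relies on a fixed vector length (`__ARM_FEATURE_SVE_BITS`, set when the compiler is given a fixed SVE width), which matches a port that is not vector-length agnostic.

```cpp
#include <cassert>
#include <cstddef>

// Pick the number of doubles per vector from predefined compiler macros.
#if defined(__AVX512F__)
constexpr std::size_t simd_len_double = 8; // 512-bit registers
constexpr const char *simd_backend = "AVX-512";
#elif defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SVE_BITS) &&        \
    (__ARM_FEATURE_SVE_BITS > 0)
constexpr std::size_t simd_len_double = __ARM_FEATURE_SVE_BITS / 64;
constexpr const char *simd_backend = "SVE (fixed length)";
#elif defined(__ARM_NEON) && defined(__aarch64__)
constexpr std::size_t simd_len_double = 2; // 128-bit registers
constexpr const char *simd_backend = "NEON";
#else
constexpr std::size_t simd_len_double = 1; // scalar fallback
constexpr const char *simd_backend = "scalar";
#endif

// y += a * x, processed in chunks of simd_len_double with a scalar tail.
// The chunk body is plain C++ here; an intrinsics version would replace
// the inner lane loop with one vector instruction per chunk.
void axpy(double a, const double *x, double *y, std::size_t n)
{
    std::size_t i = 0;
    for (; i + simd_len_double <= n; i += simd_len_double)
        for (std::size_t lane = 0; lane < simd_len_double; ++lane)
            y[i + lane] += a * x[i + lane];
    for (; i < n; ++i) // remainder that does not fill a whole vector
        y[i] += a * x[i];
}
```

Keeping the width in one constant means the loop structure stays the same on x86 and ARM; only the macro branch and the chunk body change per architecture.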
GRChombo benchmarks on ARM cluster
GRChombo on BlueField
• Runs without source modifications (although one needs to be careful
about architecture options)
• Using the same number of cores, ∼3× slower than ThunderX2