Research Article

Scalability of OpenFOAM Density-Based Solver with Runge–Kutta Temporal Discretization Scheme

Scientific Programming, Volume 2020, Article ID 9083620, 11 pages, https://doi.org/10.1155/2020/9083620

Sibo Li,1 Roberto Paoli,1,2 and Michael D'Mello3

1 Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, Chicago, IL, USA
2 Computational Science Division and Leadership Computing Facility, Argonne National Laboratory, Lemont, IL, USA
3 Intel Corporation, Schaumburg, IL, USA

Correspondence should be addressed to Roberto Paoli; robpaoli@uic.edu

Received 5 December 2019; Revised 2 February 2020; Accepted 11 February 2020; Published 11 March 2020

Academic Editor: Manuel E. Acacio Sanchez

Copyright © 2020 Sibo Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Compressible density-based solvers are widely used in OpenFOAM, and the parallel scalability of these solvers is crucial for large-scale simulations. In this paper, we report our experiences with the scalability of OpenFOAM's native rhoCentralFoam solver and, by making a small number of modifications to it, we show the degree to which the scalability of the solver can be improved. The main modification made is to replace the first-order accurate Euler scheme in rhoCentralFoam with a third-order accurate, four-stage Runge–Kutta or RK4 scheme for the time integration. The scaling test we used is the transonic flow over the ONERA M6 wing. This is a common validation test for compressible flow solvers in aerospace and other engineering applications. Numerical experiments show that our modified solver, referred to as rhoCentralRK4Foam, for the same spatial discretization, achieves as much as a 123.2% improvement in scalability over the rhoCentralFoam solver. As expected, the better time resolution of the Runge–Kutta scheme makes it more suitable for unsteady problems such as the Taylor–Green vortex decay, where the new solver showed a 50% decrease in the overall time-to-solution compared to rhoCentralFoam to get to the final solution with the same numerical accuracy. Finally, the improved scalability can be traced to the improvement of the computation-to-communication ratio obtained by substituting the RK4 scheme in place of the Euler scheme. All numerical tests were conducted on a Cray XC40 parallel system, Theta, at Argonne National Laboratory.

1. Introduction

With the continuous development of hardware and software supporting high-performance computing, eventually leading to exascale computing capabilities in the near future, high-fidelity simulations of complex flows that were out of reach until a decade ago are now becoming feasible on supercomputer infrastructures. Direct numerical simulations (DNS) and large-eddy simulations (LES) are natural candidates for high-fidelity simulations, as they can capture all relevant spatial and temporal characteristic scales of the flow. Indeed, there is consensus in the computational fluid dynamics (CFD) community that DNS/LES codes incorporating high-order time integration and spatial discretization methods are preferable for ensuring minimal influence of numerical diffusion and dispersion on the flow physics. While these numerical constraints have traditionally been integrated in the simulation of academic flows on simple geometries, they are also being considered for industrial and more complex applications where accurate prediction of local or instantaneous flow properties is required (e.g., in combustion, multiphase, and reacting flows).

In this context, the OpenFOAM package [1] represents a popular open-source software originally designed for use in CFD. Its operation is composed of solvers, dictionaries, and domain manipulation tools. It provides several different kinds of solvers for different partial differential equations (PDEs) and a framework to implement third-party solvers for customized PDEs. The solvers integrated in the standard distributions of OpenFOAM are robust, but they generally lack precision, with second-order accuracy at most in both space and time. Therefore, there is a growing interest in the CFD community to develop and implement higher-order methods in OpenFOAM for transient flow calculations. These include, for example, numerical algorithms for DNS/LES of incompressible flows [2], compressible flows [3-5], and reacting flows [6].

In addition to high-order numerical schemes, parallel efficiency and scalability are of course crucial for high-performance computing of complex flows. Parallelism in OpenFOAM is implemented by means of MPI (Message Passing Interface), although previous scalability analyses of incompressible OpenFOAM solvers showed limited speedup [7, 8]. The improvement of scaling performance in OpenFOAM has been a concern in many recent studies. In order to optimize the performance, it is crucial to first conduct a performance analysis to find the bottleneck of the simulation process. Culpo [7] found that communication is the scalability bottleneck of OpenFOAM solvers in large-scale simulations. Duran et al. [9] studied the speedup of the icoFoam solver for different problem sizes and showed that there is large room for scalability improvement. Lin et al. [10] proposed a communication optimization method for a multiphase flow solver in OpenFOAM. Ojha et al. [11] applied optimizations to the geometric algebraic multigrid linear solver and showed improved scaling performance.

In this work, we are primarily interested in compressible flow solvers for aeronautical applications. In the standard distribution of OpenFOAM, rhoCentralFoam is the only density-based solver for transient compressible flows [12]. It is built upon a second-order, nonstaggered central scheme based on the KT (Kurganov and Tadmor [13]) and KNP (Kurganov, Noelle, and Petrova [14]) methods. A few studies have tried to improve the numerical algorithm of density-based solvers. Heyns et al. [15] extended the rhoCentralFoam solver through the implementation of alternative discretization schemes to improve its stability and efficiency. More recently, Modesti and Pirozzoli [4] developed a solver named rhoEnergyFoam relying on the AUSM scheme by Liou and Steffen Jr. [16]. By using a low-storage, third-order, four-stage Runge–Kutta algorithm for time advancement, their solver showed reduced numerical diffusion and better conservation properties compared to rhoCentralFoam in both steady and unsteady turbulent flows.

In this work, we present a new OpenFOAM solver named rhoCentralRK4Foam, which is derived from rhoCentralFoam by replacing its first-order time advancement scheme with a third-order, four-stage Runge–Kutta scheme. The aim of developing this solver is twofold: (i) to improve the scaling performance of rhoCentralFoam, especially in large-scale simulations; (ii) to improve the time accuracy and the overall time-to-solution using a high-order Runge–Kutta scheme [17]. Instead of trying to optimize the parallelism embedded in OpenFOAM, which has been shown to be very hard to achieve (see, e.g., Culpo [7]), our proposed approach is to select a different numerical integration scheme that shows improved CPU and scalability performance with minimal modification of the standard distribution. The cases are configured to solve the PDEs through the rhoCentralFoam and rhoCentralRK4Foam solvers. We investigate the parallel performance of rhoCentralFoam and rhoCentralRK4Foam on the Cray XC40 system Theta [18]. These two solvers are benchmarked on two cases, an inviscid transonic flow over the ONERA M6 wing and a supersonic flow over a forward-facing step, to validate the new solver's shock-capturing capability. The TAU (Tuning and Analysis Utilities) Performance System analyzer [19] is used to collect the hotspot profiles of the two solvers. The strong and weak scaling tests of the benchmark problems are conducted on Theta up to 4096 cores. For the time-to-solution analysis, we considered the Taylor–Green vortex problem and determined the time-to-solution needed by the two solvers to get a solution with the same accuracy (same numerical error) with respect to the analytical solution. The present study is also aimed at providing users a handle on parameter choices for performance and scalability.

The rest of the paper is organized as follows: in Section 2, a description of the numerical methods for the two solvers as well as the hardware and profiling tools is presented. The benchmark test cases and the results for the scalability analysis are presented in Section 3. In Section 4, we present the results for the time-to-solution analysis. Section 5 presents the conclusions.

2. Methods

2.1. Governing Equations and Spatial Integration. The solver rhoCentralFoam is the most widely used compressible flow solver in the baseline distribution of OpenFOAM. It relies on the full discretization of the convective fluxes through the central TVD (total variation diminishing) schemes of the KT and KNP methods. rhoCentralFoam solves the governing fluid equations in an Eulerian frame of reference for three conservative variables, specifically density (ρ), momentum density (ρu), and total energy density (ρE):

\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0, \qquad (1)

\frac{\partial (\rho \mathbf{u})}{\partial t} + \nabla \cdot (\rho \mathbf{u}\mathbf{u}) = -\nabla p + \nabla \cdot \boldsymbol{\tau}, \qquad (2)

\frac{\partial (\rho E)}{\partial t} + \nabla \cdot (\rho \mathbf{u} E) + \nabla \cdot (p\mathbf{u}) = \nabla \cdot (\boldsymbol{\tau} \cdot \mathbf{u}) + \nabla \cdot \mathbf{J}. \qquad (3)

In the above equations, E ≡ e(T) + (1/2) u·u, where e(T) is the internal energy and T is the temperature; p is the thermodynamic pressure, related to the temperature and density through the equation of state for ideal gases, p = ρRT, where R is the gas constant; and τ = 2μ[(1/2)(∇u + ∇ᵀu) − (1/3)(∇·u)I] is the viscous stress tensor under the assumption of a Newtonian fluid with dynamic viscosity μ, while I is the unit tensor. Finally, J = −λ∇T is the heat flux vector, where λ is the thermal conductivity.

The governing equations are discretized using a finite volume method, where equations (1)-(3) are integrated over a control volume represented by a grid cell of volume V. Using the Gauss theorem, all fluxes can be transformed into surface integrals across the cell boundaries, which are estimated by summing up the flux contributions from all faces f of the cell. The volume average of the state vector of conserved variables Ψ ≡ [ρ, ρu, ρE] is given by

\overline{\Psi} = \frac{1}{V} \int_{V} \Psi \, dV. \qquad (4)

So the integral form of equations (1)-(3) can be expressed as

\frac{d\overline{\Psi}}{dt} = L(\overline{\Psi}), \qquad (5)

where L(Ψ̄) denotes the operator returning the right-hand side of equations (1)-(3), containing all inviscid and viscous fluxes. These fluxes have to be estimated numerically using the volume-averaged state variables Ψ̄ of adjacent cells. In particular, the convective fluxes are obtained as

\int_{V} \nabla \cdot (\mathbf{u}\Psi)\, dV = \int_{S} \mathbf{u}\Psi \cdot d\mathbf{S} \approx \sum_{f \in S} \left(\mathbf{u}_f \cdot \mathbf{S}_f\right) \Psi_f = \sum_{f \in S} \phi_f \Psi_f, \qquad (6)

where the subscript f identifies the faces of the cell volume, whereas the terms Ψ_f, u_f, S_f, and φ_f ≡ u_f · S_f represent the state vector, velocity, surface area, and volumetric flux at the interface between two adjacent cells, respectively. The product φ_f Ψ_f is obtained by applying the KT scheme

\phi_f \Psi_f = \frac{1}{2}\left(\phi_f^{+}\Psi_f^{+} + \phi_f^{-}\Psi_f^{-}\right) - \frac{1}{2}\alpha_{\max}\left(\Psi_f^{-} - \Psi_f^{+}\right), \qquad (7)

where the + and - superscripts denote surface values interpolated from the owner cell and neighbor cell, respectively, while α_max is an artificial diffusive factor (see [12] for details). In the implementation of rhoCentralRK4Foam, the Eulerian fluxes are first calculated explicitly, as shown in Algorithm 1, where "phi" represents φ_f ρ_f, "phiUp" represents φ_f ρ_f U_f + S_f p_f with p_f = (1/2)(p_f⁺ + p_f⁻), and "phiEp" represents φ_f ρ_f E_f + φ_f p_f.
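For illustration, equation (7) written out for a single face reduces to the short plain-C++ sketch below. The standalone form and the name ktFaceFlux are assumptions made only for this example; the solver itself assembles these fluxes with OpenFOAM field algebra, as shown in Algorithm 1.

#include <cstdio>

// Scalar form of the KT face flux in equation (7): a blend of owner (+) and
// neighbor (-) contributions plus an artificial diffusion term.
// phiPlus/phiMinus are the volumetric fluxes, psiPlus/psiMinus the reconstructed
// state values, and alphaMax the diffusive factor of [12].
double ktFaceFlux(double phiPlus, double psiPlus,
                  double phiMinus, double psiMinus, double alphaMax)
{
    return 0.5*(phiPlus*psiPlus + phiMinus*psiMinus)
         - 0.5*alphaMax*(psiMinus - psiPlus);
}

int main()
{
    // Example values: opposite-signed face fluxes and a small jump in psi
    std::printf("%.3f\n", ktFaceFlux(1.0, 2.0, -1.0, 2.2, 0.5));  // 0.5*(2.0 - 2.2) - 0.25*0.2 = -0.15
    return 0;
}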

2.2. Time Integration. After calculating fluxes and gradients according to equations (6) and (7), equation (5) can be numerically integrated between time t_n and t_{n+1} = t_n + Δt using a multistage Runge–Kutta scheme. Denoting by N_s the number of stages, this yields

\overline{\Psi}_n^{\,l} = \overline{\Psi}_n^{\,0} + \alpha_l \,\Delta t\, L\!\left(\overline{\Psi}_n^{\,l-1}\right), \qquad l = 1, \ldots, N_s, \qquad (8)

where Ψ̄_n⁰ ≡ Ψ̄(t_n) and Ψ̄_n^{N_s} ≡ Ψ̄(t_{n+1}). In the proposed solver rhoCentralRK4Foam, we use a four-stage Runge–Kutta scheme, which is obtained by setting N_s = 4 and α₁ = 1/4, α₂ = 1/3, α₃ = 1/2, and α₄ = 1. The implementation in OpenFOAM is reported in Algorithm 2, and the C++ code is publicly available at https://github.com/SiboLi666/rhoCentralRK4Foam. Note that the original Euler scheme implemented in rhoCentralFoam can simply be recovered from equation (8) by setting N_s = 1 and α₁ = 1.
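As a minimal standalone illustration of equation (8) (not the OpenFOAM implementation, which is reported in Algorithm 2), the plain-C++ sketch below applies the multistage update with the coefficients above to a generic right-hand-side operator; the names rungeKuttaStep and Field are assumptions made for this sketch, and passing the single coefficient {1.0} recovers the Euler scheme.

#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>

using Field = std::vector<double>;
using RhsOperator = std::function<Field(const Field&)>;

// One time step of the multistage scheme of equation (8).
Field rungeKuttaStep(const Field& psi0, double dt, const RhsOperator& L,
                     const std::vector<double>& alpha = {0.25, 1.0/3.0, 0.5, 1.0})
{
    Field psi = psi0;                        // Psi^{l-1}, starting from Psi_n^0
    for (double a : alpha)                   // stages l = 1, ..., Ns
    {
        const Field rhs = L(psi);            // L(Psi^{l-1})
        for (std::size_t i = 0; i < psi.size(); ++i)
        {
            psi[i] = psi0[i] + a*dt*rhs[i];  // Psi^l = Psi_n^0 + alpha_l*dt*L(Psi^{l-1})
        }
    }
    return psi;                              // Psi^{Ns} = Psi(t_{n+1})
}

int main()
{
    // Example: dPsi/dt = -Psi componentwise, one step of size dt = 0.1
    const RhsOperator L = [](const Field& psi) {
        Field r(psi.size());
        for (std::size_t i = 0; i < psi.size(); ++i) r[i] = -psi[i];
        return r;
    };
    const Field next = rungeKuttaStep({1.0, 2.0}, 0.1, L);
    std::printf("%.6f %.6f\n", next[0], next[1]);  // ~exp(-0.1) and ~2*exp(-0.1)
    return 0;
}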

2.3. Hardware and Profiling Tools. The simulations were run on the supercomputer platform "Theta" at Argonne National Laboratory. Theta is a Cray XC40 system with second-generation Intel Xeon Phi (Knights Landing) processors and the Cray Aries proprietary interconnect. The system comprises 4392 compute nodes. Each compute node is a single Xeon Phi chip with 64 cores, 16 GB Multi-Channel DRAM (MCDRAM), and 192 GB DDR4 memory. To avoid any possible memory issues with OpenFOAM, the simulations were run using 32 cores of each compute node (half of its maximum capacity).

We assess the scalability and parallel efficiency of the two solvers, rhoCentralFoam and rhoCentralRK4Foam, by analyzing their speedup on the same grid and by monitoring the portions of CPU time spent in communication and in computation, respectively. These measurements are performed with performance tools adapted to the machine. In the strong scaling tests, the tools are set to start counting after the initial setup stage is completed, in our case after 20 time steps, and last for an extra 75 time steps. The strong scaling tests are performed on Theta. In order to measure the time spent in communication, we rely on the TAU Performance System [19]. The TAU performance tool suite is an open-source software developed at the University of Oregon and offers a diverse range of performance analysis capabilities. On Theta, the OpenFOAM makefile is modified to use the TAU wrapper functions for compilation. TAU can automatically parse the source code and insert instrumentation calls to collect profile and/or trace data, which allows us to measure the total time spent in communication during the targeted 75 time steps and to identify the hotspots for the two solvers.

3. Results for the Scalability Analysis

3.1. Test Cases Description. For the scalability analysis, we tested the new solver rhoCentralRK4Foam on two different benchmark cases: (i) the three-dimensional ONERA M6 transonic wing; (ii) the two-dimensional supersonic forward-facing step. The two cases are steady flows; they are first used for validating the new solver's shock-capturing capability and then used for a detailed parallel performance analysis of the solver in the next section. For the two cases, we used the decomposePar tool in OpenFOAM to decompose the generated mesh. The scotch decomposition method [20] is used to partition the mesh into subdomains. A brief description of the two cases is presented next.

3.1.1. Transonic M6 Wing. In the ONERA M6 wing case, the mean aerodynamic chord is c = 0.53 m, the semispan is b = 1 m, and the computational domain extends to 20c. The angle of attack is 3.06° and the free-stream Mach number is Ma = 0.8395. The Reynolds number in the original experiment was Re = 11.72 × 10^6, and so the flow is certainly turbulent; however, in order to capture the pressure distribution and the shock location along the wing, inviscid calculations can be safely employed and are indeed customary (see, for example, Modesti and Pirozzoli [4]). The geometry is meshed using hexahedral elements, and three grids with different sizes are generated: Grid1 has 1 million cells (shown in Figure 1(a)), Grid2 has 5 million cells, and Grid3 has 43 million cells. The flow-field analysis was conducted on Grid1 (which is sufficient for grid convergence), while Grid2 and Grid3 were used for scaling analysis. Figure 1(b) shows the pressure distribution on the wing surface computed by using rhoCentralFoam and rhoCentralRK4Foam. In Figure 1(c), the pressure coefficient at the inner wing section (20% span) computed by the two solvers is compared with the experimental data (blue circle symbols, [21]). We can observe that the newly developed solver captures the main flow features: it generates a pressure distribution on the wing surface similar to that obtained with rhoCentralFoam and captures the shock location precisely.

3.1.2. Supersonic Forward-Facing Step. Computations of inviscid flow over a forward-facing step are used to further verify the shock-capturing capability of the rhoCentralRK4Foam solver. The flow configuration used by Woodward and Colella [22] is considered in this work. The supersonic flow Mach number is Ma_∞ = 3. The grid has N_x × N_y = 240 × 80 cells in the coordinate directions. The shock pattern, represented by the density distribution shown in Figure 2, confirms that rhoCentralRK4Foam is capable of capturing strong shocks in supersonic flows.

3.2. Strong Scaling Tests. In the strong scaling tests, we tested the rhoCentralFoam and rhoCentralRK4Foam solvers on three different mesh sizes of the ONERA M6 wing. Here we present the scalability results by showing the speedup as well as the speedup increment percentage of rhoCentralRK4Foam, for Grid1 up to 1024 ranks in Table 1, Grid2 up to 2048 ranks in Table 2, and Grid3 up to 4096 ranks in Table 3. Note that the scaling is presented as CPU time normalized by the CPU time obtained with 128 ranks, which

Result: construct Eulerian fluxes explicitly
while runTime.loop() do
  Saving quantities at previous time steps;
  if rk4 then
    /* The following first constructs the Eulerian fluxes by applying the Kurganov and Tadmor (KT) scheme */
    surfaceScalarField phi("phi", aphiv_pos*rho_pos + aphiv_neg*rho_neg);
    surfaceVectorField phiUp("phiUp", (aphiv_pos*rhoU_pos + aphiv_neg*rhoU_neg) + (a_pos*p_pos + a_neg*p_neg)*mesh.Sf());
    surfaceScalarField phiEp("phiEp", aphiv_pos*(rho_pos*(e_pos + 0.5*magSqr(U_pos)) + p_pos) + aphiv_neg*(rho_neg*(e_neg + 0.5*magSqr(U_neg)) + p_neg) + aSf*p_pos - aSf*p_neg);
    /* Then the divergence theorem (Gauss's law) is applied to relate the spatial integrals to surface integrals */
    volScalarField phiSum("phiSum", fvc::div(phi));
    volVectorField phiUpSum2("phiUpSum2", fvc::div(tauMC));
    volVectorField phiUpSum3("phiUpSum3", fvc::laplacian(muEff, U));
  else
  end
end

ALGORITHM 1: The Eulerian fluxes are calculated explicitly so that the ordinary differential equation system can be constructed and further integrated in time by using an explicit Runge–Kutta algorithm.

Result: calculate the three conservative variables
for (int cycle = 0; cycle < rkloopsize; cycle++)
  /* The following calculates the three conservative variables, specifically density (ρ), momentum density (ρU), and total energy density (ρE) */
  rho  = rhoOld  + rk[cycle]*runTime.deltaT()*(-phiSum);
  rhoU = rhoUOld + rk[cycle]*runTime.deltaT()*(-phiUpSum + phiUpSum2 + phiUpSum3);
  rhoE = rhoEOld + rk[cycle]*runTime.deltaT()*(-enFl + enFl2);

ALGORITHM 2: The time integration of the resulting ordinary differential equation system is carried out by a third-order, four-stage Runge–Kutta algorithm.


is the minimum number of ranks that fits the memory requirements of the largest grid. For example, for 4096 ranks the ideal speedup would then be 4096/128 = 32. It can be observed that rhoCentralRK4Foam outperforms rhoCentralFoam in speedup in each case. For the same grid size, the speedup increment percentage increases as the number of ranks increases. To better illustrate and analyze the scaling performance improvement, the results reported in the three tables are summarized in Figure 3. First, we can observe that for Grid3 there is a dramatic increase in speedup when the number of ranks rises from 1024 to 2048 and from 2048 to 4096. The speedup of rhoCentralRK4Foam

Figure 1: ONERA M6 transonic wing results: (a) computational mesh (Grid1); (b) pressure distribution computed with the proposed solver rhoCentralRK4Foam; (c) comparison of the pressure coefficient at the section located at 20% span computed with rhoCentralRK4Foam and the original rhoCentralFoam, together with the experimental data.

Figure 2: Density distribution in the supersonic forward-facing step simulation obtained using rhoCentralRK4Foam.


has a 1.6 times increase from 1024 ranks to 2048 ranks and a 1.7 times increase from 2048 ranks to 4096 ranks for Grid3. The reason is that rhoCentralFoam scales very slowly after 1024 ranks, eventually reaching a plateau, which is an indication that the solver attained its maximum theoretical speedup. It is indeed instructive to estimate the parallelizable fraction f of the CPU time from the speedup of the two solvers using Amdahl's law [23]:

S(N_r) = \frac{N_r}{1 + (N_r - 1)(1 - f)}, \qquad \text{with } f = \frac{t_p}{t_p + t_s}, \qquad (9)

where t_p and t_s are the CPU times spent in the parallel and serial (i.e., not parallelizable) sections of the solver, respectively. From the previous equations we have

f = \frac{1 - 1/S(N_r)}{1 - 1/N_r} \;\longrightarrow\; f = 1 - \frac{1}{S_\infty}, \qquad (10)

where S_∞ ≡ S(N_r → ∞) is the maximum theoretical speedup obtained for N_r → ∞. By measuring S_∞, we can directly evaluate f from equation (10) for the cases featuring an asymptotic speedup: for rhoCentralFoam (Grid3), this yields f ≃ 99.88%. We can also estimate f by best fitting S(N_r) to Amdahl's formula in equation (9): for rhoCentralRK4Foam, this yields f ≃ 99.98%. Of course, these results can be affected by other important factors, such as the latency and bandwidth of the machine, that are not taken into account in Amdahl's model. We also observe that the scalability becomes better as the problem size increases, as the workload of

communication over computation decreases. For example, with 2048 ranks, the speedup for rhoCentralRK4Foam is 6.005 for Grid2 (5 million cells), while it becomes 9.115 for Grid3 (43 million cells). Moreover, we can also see that, as the grid size increases, the maximum speedup increment percentage also increases, corresponding to 16%, 32%, and 123% for Grid1, Grid2, and Grid3, respectively. As a final exercise, the TAU performance tool was applied to profile the code for Grid3 with 2048 ranks. Figure 4 also shows the communication and computation time percentages for rhoCentralFoam and rhoCentralRK4Foam on Grid3 from 128 to 4096 ranks. We can observe that the rhoCentralRK4Foam solver spends less time communicating, which leads to the improved parallel performance found in the previous tests. In addition, among the MPI subroutines that are relevant for communication (i.e., called every time step in the simulation), MPI_Waitall() was the one that took most of the time (see Figure 5), which is in line with previous profiling (see, for example, Axtmann and Rist [24]).
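The parallel fraction quoted above can be reproduced directly from equations (9) and (10). The plain-C++ sketch below is only an illustration (the function names are not part of the solver), and it assumes that the 128-rank baseline itself scales ideally so that the normalized plateau of Table 3 can be converted to an absolute speedup.

#include <cstdio>

// Sketch of the Amdahl's-law estimates in equations (9) and (10), using the Grid3
// plateau of Table 3 (rhoCentralFoam).
double amdahlSpeedup(double Nr, double f)              // equation (9)
{
    return Nr/(1.0 + (Nr - 1.0)*(1.0 - f));
}

double parallelFraction(double Nr, double S)           // equation (10), finite Nr
{
    return (1.0 - 1.0/S)/(1.0 - 1.0/Nr);
}

int main()
{
    const double Sinf = 6.895*128.0;                   // ~883: normalized plateau times baseline ranks
    const double f    = 1.0 - 1.0/Sinf;                // equation (10) in the limit Nr -> infinity
    std::printf("f (asymptotic)      = %.4f\n", f);    // ~0.9989, i.e. the ~99.88% quoted in the text
    std::printf("f from S(4096)~883  = %.4f\n", parallelFraction(4096.0, 883.0));
    std::printf("plateau via eq. (9) = %.0f\n", amdahlSpeedup(1.0e9, f));  // recovers ~883
    return 0;
}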

3.3. Weak Scaling Tests. In order to further analyze the parallel scalability of the rhoCentralRK4Foam solver, we conducted a weak scaling analysis based on the ONERA M6 wing case and the forward-facing step case. The grid size ranges from 700,000 to 44,800,000 cells in the ONERA M6 wing case and from 1,032,000 to 66,060,000 cells in the forward-facing step case. The configurations of the weak scaling test cases are the same as in the previous test cases. The number of ranks ranges from 16 to 1024 and from 64 to 4096, respectively. The number of grid points per rank stays constant at around 43 × 10^3 and 16 × 10^3 for the two cases, which ensures that the computation workload is not impacted much by communication. Denoting by τ the CPU time for a given run, the relative CPU time is obtained by

Table 1: Normalized speedup for Grid1 (ONERA M6 wing) up to 1024 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    % increment
128      1                 1                    0
256      1.341             1.367                1.939
512      1.594             1.814                13.802
1024     2.364             2.747                16.201

Table 2: Normalized speedup for Grid2 (ONERA M6 wing) up to 2048 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    % increment
128      1                 1                    0
256      1.574             1.601                1.715
512      2.308             2.423                4.983
1024     2.916             3.575                22.599
2048     4.541             6.005                32.240

Table 3: Normalized speedup for Grid3 (ONERA M6 wing) up to 4096 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    % increment
128      1                 1                    0
256      1.871             1.896                1.336
512      3.275             3.527                7.695
1024     5.098             5.809                13.947
2048     6.341             9.115                43.747
4096     6.895             15.39                123.205

Figure 3: Strong scaling normalized speedups of rhoCentralFoam and rhoCentralRK4Foam for the three grids of the M6 wing test case.


normalizing with respect to the values obtained for 16 and 64 ranks: τ/τ_{Ranks=16} for 16 ranks and τ/τ_{Ranks=64} for 64 ranks. The relative CPU times are recorded in Tables 4 and 5. The comparison of the two solvers is also plotted in Figures 6 and 7. The weak scaling tests give an indication of how well the two solvers scale when the problem size is increased proportionally to the process count. From Tables 4 and 5 and Figures 6 and 7, it can be observed that for lower MPI task counts (16 to 64) both solvers scale reasonably well. However, for higher MPI task counts (128 to 1024), the rhoCentralRK4Foam solver scales better. Remarkably, we can observe a distinguishable relative time difference between the two solvers as the number of ranks increases from 512 to 1024. The profiling results from TAU show that in the test case with 1024 ranks, rhoCentralRK4Foam spends around 40% of the time on computation, while rhoCentralFoam spends only around 20% of the time on computation. As for the forward-facing step case, we were able to conduct the tests at larger core counts (up to 4096 cores), and it can be confirmed that the rhoCentralRK4Foam solver has better scaling performance than the rhoCentralFoam solver. Indeed, it scales better with larger grid size, and the relative time of the rhoCentralRK4Foam solver stays at 1.063 with 4096 cores (the relative time is 1.148 with 1024 cores in the ONERA M6 wing case). Generally, the rhoCentralRK4Foam solver outperforms the rhoCentralFoam solver at large rank counts thanks to its higher computation-to-communication ratio.
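Since the relative time is the ratio τ/τ_base at constant work per rank, its reciprocal can be read as a weak-scaling efficiency. A minimal plain-C++ sketch (the function name is an assumption made for this example) using two Table 5 entries:

#include <cstdio>

// Weak-scaling efficiency from the relative CPU time tau/tau_base of Tables 4 and 5:
// efficiency = tau_base/tau = 1/(relative time).
double weakScalingEfficiency(double relativeTime)
{
    return 1.0/relativeTime;
}

int main()
{
    // Forward-facing step at 4096 ranks (Table 5): relative times 1.203 and 1.063
    std::printf("rhoCentralFoam    : %.1f%%\n", 100.0*weakScalingEfficiency(1.203));  // ~83%
    std::printf("rhoCentralRK4Foam : %.1f%%\n", 100.0*weakScalingEfficiency(1.063));  // ~94%
    return 0;
}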

Figure 4: Communication (a) and computation (b) time percentage for the strong scaling analysis (M6 test case, Grid3).

Figure 5: MPI_Waitall() time percentage for rhoCentralFoam and rhoCentralRK4Foam for the M6 test case (Grid3) up to 2048 ranks.

Table 4: Relative time results τ/τ_{Ranks=16} of the weak scaling test (ONERA M6 wing).

Ranks    rhoCentralFoam    rhoCentralRK4Foam
16       1                 1
32       1.085             1.062
64       1.106             1.105
128      1.148             1.057
256      1.191             1.100
512      1.213             1.158
1024     1.255             1.148

Table 5: Relative time results τ/τ_{Ranks=64} of the weak scaling test (forward-facing step).

Ranks    rhoCentralFoam    rhoCentralRK4Foam
64       1                 1
128      1.034             1.002
256      1.085             1.018
512      1.153             1.031
1024     1.186             1.041
2048     1.212             1.067
4096     1.203             1.063


4. Results for the Time-to-Solution Analysis

The scaling analysis is an important exercise to evaluate the parallel performance of the code on multiple cores; however, it does not provide insights into the time accuracy of the numerical algorithm, nor into the time-to-solution needed to evolve the discretized system of equations (equations (1)-(3)) up to a final state within an acceptable numerical error. For example, in the scaling study conducted on the M6 transonic wing, the number of iterations was fixed for both solvers, rhoCentralFoam and rhoCentralRK4Foam, and the calculations were performed in the inviscid limit, so that the implicit solver used in rhoCentralFoam to integrate the viscous fluxes was not active. In such circumstances, comparing the two solvers essentially amounts to comparing the first-order (Euler) and the four-stage, forward-in-time Runge–Kutta algorithms. Hence, it is not surprising that the CPU time for rhoCentralRK4Foam is about four times larger than that of rhoCentralFoam (for the same number of iterations), since it requires four evaluations of the right-hand side of equation (5) per time step instead of one. For a sound evaluation of the time-to-solution, however, we need to compare the two solvers over the same physical time at which the errors with respect to an exact solution are comparable. Because the time accuracy of the solvers is different, the time step is also different, and so is the number of iterations required to reach the final state.

4.1. Test Case Description. For the time-to-solution analysis, we consider the numerical simulation of the Taylor–Green (TG) vortex, a benchmark problem in CFD [25] for validating unsteady flow solvers [26, 27] that require a time-accurate integration scheme such as the Runge–Kutta scheme implemented in rhoCentralRK4Foam. No turbulence model is applied in the current simulation. The TG vortex admits analytical time-dependent solutions, which allows for a precise definition of the numerical error of the algorithm with respect to a given quantity of interest. The flow is initialized in a square domain 0 ≤ x, y ≤ L as follows:

\rho(x, y, t) = \rho_0, \qquad (11)

u(x, y, 0) = -U_0 \cos(kx)\sin(ky), \qquad (12)

v(x, y, 0) = U_0 \sin(kx)\cos(ky), \qquad (13)

p(x, y, 0) = p_0 - \frac{\rho_0 U_0^2}{4}\left(\cos 2kx + \cos 2ky\right), \qquad (14)

where u and v are the velocity components along the x and y directions, k ≡ 2π/L is the wave number, and U_0, ρ_0, and p_0 are the arbitrary constant velocity, density, and pressure, respectively. The analytical solution for the initial-value problem defined by equations (11)-(14) is

\rho(x, y, t) = \rho_0, \qquad (15)

u(x, y, t) = -U_0 \cos(kx)\sin(ky)\, e^{-2\nu k^2 t}, \qquad (16)

v(x, y, t) = U_0 \sin(kx)\cos(ky)\, e^{-2\nu k^2 t}, \qquad (17)

p(x, y, t) = p_0 - \frac{\rho_0 U_0^2}{4}\left(\cos 2kx + \cos 2ky\right) e^{-4\nu k^2 t}, \qquad (18)

which shows the velocity decay due to viscous dissipation. As quantities of interest, we chose to monitor both the velocity profile u at a selected location and the overall kinetic energy in the computational box, which can be readily obtained from equations (16)-(18) as

\varepsilon = \frac{1}{L^2}\int_0^L \int_0^L \frac{u^2 + v^2}{2}\, dx\, dy = \frac{U_0^2}{4}\, e^{-4\nu k^2 t}. \qquad (19)
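Equation (19) also fixes, independently of the viscosity, the nondimensional time at which the kinetic energy has decayed to 10% of its initial value, t/t_ref = ln(10)/(16π²) ≈ 0.0147, which is consistent with the value quoted in Section 4.2. A minimal plain-C++ check (the variable layout and the example viscosity are assumptions made for this sketch):

#include <cmath>
#include <cstdio>

// Check of the analytical Taylor-Green relations (16)-(19) used as the error
// reference: the time at which epsilon drops to 10% of its initial value is
// nu-independent when expressed in units of t_ref = L^2/nu.
int main()
{
    const double L  = 2.0*M_PI;                      // box size (m), as in Section 4.2
    const double U0 = 1.0;                           // velocity scale (m/s), as in Section 4.2
    const double k  = 2.0*M_PI/L;                    // wave number, equation (12)
    const double nu = 0.01;                          // example viscosity (illustrative)

    const double tref = L*L/nu;                      // viscous time scale
    const double t10  = std::log(10.0)/(4.0*nu*k*k); // epsilon(t10) = 0.1*epsilon(0)
    const double eps0 = 0.25*U0*U0;                  // equation (19) at t = 0
    const double eps  = eps0*std::exp(-4.0*nu*k*k*t10);

    std::printf("t10/tref        = %.4f\n", t10/tref);  // ln(10)/(16 pi^2) ~ 0.0147
    std::printf("eps(t10)/eps(0) = %.3f\n", eps/eps0);  // 0.100
    return 0;
}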

Figure 6: Relative time τ/τ_{Ranks=16} of the weak scaling tests for the ONERA M6 wing case.

Figure 7: Relative time τ/τ_{Ranks=64} of the weak scaling tests for the forward-facing step.


4.2. Comparison of Time-to-Solution. For the analysis of the rhoCentralFoam and rhoCentralRK4Foam solvers, we consider a fixed box size L = 2π m and U0 = 1 m/s, and two values of the Reynolds number Re = LU0/ν to test the effect of viscosity. The computation is done on a 160 × 160 point uniform grid. The simulations run up to t = 0.58 s for Re = 628 and t = 1.16 s for Re = 1256, which corresponds to the same nondimensional time t/t_ref = 0.0147 with t_ref ≡ L²/ν. At this time, the kinetic energy ε has decreased to 10% of its initial value. As an illustrative example, Figure 8 shows the velocity magnitude contour for the Re = 628 case. Figure 9 presents the computed velocity profile u(y) in the middle of the box, x = L/2; rhoC stands for the rhoCentralFoam solver and rhoCRK4 stands for the rhoCentralRK4Foam solver. It can be observed that rhoCentralRK4Foam shows excellent agreement with the analytical profile for Δt = 10^-4 s (reducing the time step further does not provide any further convergence for the grid used here). In order to achieve the same level of accuracy with rhoCentralFoam, the time step has to be reduced by a factor of 5 compared to rhoCentralRK4Foam, to Δt = 2 × 10^-5 s. The same behavior is observed for the evolution of ε, as shown in Figure 10.

In order to calculate the time-to-solution τ, we first evaluated the CPU time τ0 per time step per grid point per

Figure 8: Contour of the velocity magnitude U ≡ (u² + v²)^{1/2} for the Taylor–Green vortex at t/t_ref = 0.0147 for the case Re = 628.

Figure 9: Velocity distribution in the Taylor–Green vortex problem at t/t_ref = 0.0147: (a) Re = 628; (b) Re = 1256.


core for both solvers. We used an Intel Xeon E5-2695 v4 processor installed on the Theta and Bebop platforms at Argonne National Laboratory and reported the results of the tests in Table 6. Even though τ0 depends on the specific processor used, the ratio τ0,rhoCentralRK4/τ0,rhoCentral does not, and so it is a good indication of the relative performance of the two solvers. In the present case, this ratio is consistently found to be about 3 for all Reynolds numbers; we made additional tests with much higher Reynolds numbers, up to Re ∼ 10^5, to confirm this (see Table 6). Denoting by t_f the physical final time of the simulation, by N_iter the number of iterations, and by N_points the number of grid points, the time-to-solution for the two solvers can be obtained as

\tau = \tau_0 \times N_{\mathrm{iter}} \times N_{\mathrm{points}}, \qquad \text{with } N_{\mathrm{iter}} = \frac{t_f}{\Delta t}, \qquad (20)

and is reported in Table 7. It is concluded that, to achieve the same level of accuracy, τ_rhoCentralRK4 is about 1.5 to 1.6 times smaller than τ_rhoCentral. Additionally, as discussed in the scaling analysis, rhoCentralRK4Foam can achieve up to a 123% improvement in scalability over rhoCentralFoam when using 4096 ranks. Thus, the reduction in time-to-solution for rhoCentralRK4Foam is expected to be even larger for large-scale time-accurate simulations utilizing thousands of parallel cores.
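Equation (20) can be evaluated directly; the short plain-C++ sketch below is only an illustration (the helper name and the sample inputs are illustrative, of the same order as Section 4.2, and are not values asserted by the paper), showing how the larger admissible time step offsets the higher per-step cost of the Runge–Kutta solver.

#include <cstdio>

// Direct transcription of equation (20): time-to-solution from the per-step,
// per-point, per-core cost tau0, the iteration count N_iter = tf/dt, and the
// number of grid points.
double timeToSolution(double tau0, double tFinal, double dt, double nPoints)
{
    const double nIter = tFinal/dt;       // N_iter = t_f / dt
    return tau0*nIter*nPoints;            // tau = tau0 * N_iter * N_points
}

int main()
{
    const double nPoints = 160.0*160.0;   // 160 x 160 grid of Section 4.2
    const double tFinal  = 0.58;          // s, illustrative final time
    const double tauRK4   = timeToSolution(3.6e-6, tFinal, 1.0e-4, nPoints);
    const double tauEuler = timeToSolution(1.2e-6, tFinal, 2.0e-5, nPoints);
    std::printf("tau (RK4, dt=1e-4 s)   ~ %.0f s\n", tauRK4);    // ~535 s
    std::printf("tau (Euler, dt=2e-5 s) ~ %.0f s\n", tauEuler);  // ~891 s
    std::printf("ratio                  ~ %.2f\n", tauEuler/tauRK4);
    return 0;
}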

5. Conclusion

In this study, we presented a new solver, called rhoCentralRK4Foam, for the integration of the Navier–Stokes equations in the CFD package OpenFOAM. The novelty consists in the replacement of the first-order time integration scheme of the native solver rhoCentralFoam with a third-order Runge–Kutta scheme, which is more suitable for the simulation of time-dependent flows. We first analyzed the scalability of the two solvers for a benchmark case featuring transonic flow over a wing on a structured mesh and showed that rhoCentralRK4Foam can achieve a substantial improvement (up to 120%) in strong scaling compared to rhoCentralFoam. We also observed that the scalability becomes better as the problem size grows. Hence, even though OpenFOAM scalability is generally poor or decent at best, the new solver can at least alleviate this deficiency. We then analyzed the performance of the two solvers in the case of the Taylor–Green vortex decay and compared the time-to-solution, which gives an indication of the workload needed to achieve the same level of accuracy of the solution (i.e., the same numerical error). For the problem considered here, we obtained a reduction in time-to-solution by a factor of about 1.5 when using rhoCentralRK4Foam compared to rhoCentralFoam, due to the use of larger time steps to get to the final solution with the same numerical accuracy. This factor will eventually be even larger for larger-scale simulations employing thousands or more cores, thanks to the better speedup of rhoCentralRK4Foam. The proposed solver is potentially a useful alternative for the simulation of compressible flows requiring time-accurate integration, as in direct or large-eddy simulations. Furthermore, the implementation in OpenFOAM can easily be generalized to different schemes of the Runge–Kutta family with minimal code modification.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by Argonne National Laboratory through grant ANL 4J-30361-0030A, titled "Multiscale

Figure 10: Evolution of total kinetic energy.

Table 6: CPU time per time step per grid point per core, τ0.

              τ0 (rhoCentral)   τ0 (rhoCentralRK4)   τ0,rhoCentralRK4/τ0,rhoCentral
Re = 628      1.230e-06 s       3.590e-06 s          2.919
Re = 1256     1.120e-06 s       3.630e-06 s          3.241
Re = 1260     1.189e-06 s       3.626e-06 s          3.046
Re = 12600    1.187e-06 s       3.603e-06 s          3.035
Re = 126000   1.191e-06 s       3.622e-06 s          3.044

Table 7: Time-to-solution τ for rhoCentralFoam and rhoCentralRK4Foam.

              τ_rhoCentral (Δt = 2 × 10^-5 s)   τ_rhoCentralRK4 (Δt = 10^-4 s)   τ_rhoCentral/τ_rhoCentralRK4
Re = 628      848 s                             532 s                            1.59
Re = 1256     826 s                             539 s                            1.53


modeling of complex flows." Numerical simulations were performed using the Director's Discretionary allocation "OF_SCALING" in the Leadership Computing Facility at Argonne National Laboratory.

References

[1] H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, "A tensorial approach to computational continuum mechanics using object-oriented techniques," Computers in Physics, vol. 12, no. 6, pp. 620-631, 1998.

[2] V. Vuorinen, J.-P. Keskinen, C. Duwig, and B. J. Boersma, "On the implementation of low-dissipative Runge-Kutta projection methods for time dependent flows using OpenFOAM," Computers & Fluids, vol. 93, pp. 153-163, 2014.

[3] C. Shen, F. Sun, and X. Xia, "Implementation of density-based solver for all speeds in the framework of OpenFOAM," Computer Physics Communications, vol. 185, no. 10, pp. 2730-2741, 2014.

[4] D. Modesti and S. Pirozzoli, "A low-dissipative solver for turbulent compressible flows on unstructured meshes, with OpenFOAM implementation," Computers & Fluids, vol. 152, pp. 14-23, 2017.

[5] S. Li and R. Paoli, "Modeling of ice accretion over aircraft wings using a compressible OpenFOAM solver," International Journal of Aerospace Engineering, vol. 2019, Article ID 4864927, 11 pages, 2019.

[6] Q. Yang, P. Zhao, and H. Ge, "ReactingFoam-SCI: an open source CFD platform for reacting flow simulation," Computers & Fluids, vol. 190, pp. 114-127, 2019.

[7] M. Culpo, Current Bottlenecks in the Scalability of OpenFOAM on Massively Parallel Clusters, Partnership for Advanced Computing in Europe, Brussels, Belgium, 2012.

[8] O. Rivera and K. Furlinger, "Parallel aspects of OpenFOAM with large eddy simulations," in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 389-396, Banff, Canada, September 2011.

[9] A. Duran, M. S. Celebi, S. Piskin, and M. Tuncel, "Scalability of OpenFOAM for bio-medical flow simulations," The Journal of Supercomputing, vol. 71, no. 3, pp. 938-951, 2015.

[10] Z. Lin, W. Yang, H. Zhou, et al., "Communication optimization for multiphase flow solver in the library of OpenFOAM," Water, vol. 10, no. 10, pp. 1461-1529, 2018.

[11] R. Ojha, P. Pawar, S. Gupta, M. Klemm, and M. Nambiar, "Performance optimization of OpenFOAM on clusters of Intel Xeon Phi processors," in Proceedings of the 2017 IEEE 24th International Conference on High Performance Computing Workshops (HiPCW), Jaipur, India, December 2017.

[12] C. J. Greenshields, H. G. Weller, L. Gasparini, and J. M. Reese, "Implementation of semi-discrete, non-staggered central schemes in a colocated polyhedral finite volume framework for high-speed viscous flows," International Journal for Numerical Methods in Fluids, vol. 63, pp. 1-21, 2010.

[13] A. Kurganov and E. Tadmor, "New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations," Journal of Computational Physics, vol. 160, no. 1, pp. 241-282, 2000.

[14] A. Kurganov, S. Noelle, and G. Petrova, "Semidiscrete central-upwind schemes for hyperbolic conservation laws and Hamilton-Jacobi equations," SIAM Journal on Scientific Computing, vol. 23, no. 3, pp. 707-740, 2001.

[15] J. A. Heyns, O. F. Oxtoby, and A. Steenkamp, "Modelling high-speed viscous flow in OpenFOAM," in Proceedings of the 9th OpenFOAM Workshop, Zagreb, Croatia, June 2014.

[16] M. S. Liou and C. J. Steffen Jr., "A new flux splitting scheme," Journal of Computational Physics, vol. 107, no. 1, pp. 23-39, 1993.

[17] D. Drikakis, M. Hahn, A. Mosedale, and B. Thornber, "Large eddy simulation using high-resolution and high-order methods," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1899, pp. 2985-2997, 2009.

[18] K. Harms, T. Leggett, B. Allen, et al., "Theta: rapid installation and acceptance of an XC40 KNL system," Concurrency and Computation: Practice and Experience, vol. 30, no. 1, p. 11, 2019.

[19] S. S. Shende and A. D. Malony, "The TAU parallel performance system," The International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287-311, 2006.

[20] C. Chevalier and F. Pellegrini, "PT-Scotch: a tool for efficient parallel graph ordering," Parallel Computing, vol. 34, no. 6-8, pp. 318-331, 2008.

[21] V. Schmitt and F. Charpin, "Pressure distributions on the ONERA-M6-wing at transonic Mach numbers," in Experimental Data Base for Computer Program Assessment, Report of the Fluid Dynamics Panel Working Group 04, AGARD, Neuilly-sur-Seine, France, 1979.

[22] P. Woodward and P. Colella, "The numerical simulation of two-dimensional fluid flow with strong shocks," Journal of Computational Physics, vol. 54, no. 1, pp. 115-173, 1984.

[23] G. M. Amdahl, "Validity of the single processor approach to achieving large-scale computing capabilities," in Proceedings of the 1967 Spring Joint Computer Conference (AFIPS '67 Spring), vol. 30, pp. 483-485, Atlantic City, NJ, USA, April 1967.

[24] G. Axtmann and U. Rist, "Scalability of OpenFOAM with large-eddy simulations and DNS on high-performance systems," in High-Performance Computing in Science and Engineering, W. E. Nagel, Ed., pp. 413-425, Springer International Publishing, Berlin, Germany, 2016.

[25] A. Shah, L. Yuan, and S. Islam, "Numerical solution of unsteady Navier-Stokes equations on curvilinear meshes," Computers & Mathematics with Applications, vol. 63, no. 11, pp. 1548-1556, 2012.

[26] J. Kim and P. Moin, "Application of a fractional-step method to incompressible Navier-Stokes equations," Journal of Computational Physics, vol. 59, no. 2, pp. 308-323, 1985.

[27] A. Quarteroni, F. Saleri, and A. Veneziani, "Factorization methods for the numerical approximation of Navier-Stokes equations," Computer Methods in Applied Mechanics and Engineering, vol. 188, no. 1-3, pp. 505-526, 2000.

Scientific Programming 11

Page 2: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

and time )erefore there is a growing interest in the CFDcommunity to develop and implement higher-ordermethods in OpenFOAM for transient flow calculations)ese include for example numerical algorithms for DNSLES of incompressible flows [2] compressible flows [3ndash5]and reacting flows [6]

In addition to high-order numerical schemes parallelefficiency and scalability are of course crucial for high-performance computing of complex flows Parallelism inOpenFOAM is implemented by means of MPI (MessagePassing Interface) although previous scalability analysis ofincompressible OpenFOAM solvers showed limited speedup[7 8] )e improvement of scaling performance in Open-FOAM has been a concern in many recent studies In orderto optimize the performance it is crucial to first conductperformance analysis to find the bottleneck of the simulationprocess Culpo [7] found that the communication is thebottleneck in scalability of solvers in OpenFOAM for large-scale simulations Duran et al [9] studied the speedup oficoFoam solver for different problem sizes and showed thatthere is a large room for scalability improvement Lin et al[10] proposed a communication optimization method formultiphase flow solver in OpenFOAM Ojha et al [11]applied optimizations to the geometric algebraic multigridlinear solver and showed an improved scaling performance

In this work we are primarily interested in compressibleflow solvers for aeronautical applications In the standarddistribution of OpenFOAM rhoCentralFoam is the onlydensity-based solver for transient compressible flows [12] Itis built upon a second-order nonstaggered central schemebased on KT (Kurganov and Tadmor [13]) and KNP(Kurganov Noelle and Petrova [14]) methods )ere are afew studies that tried to improve the numerical algorithm ofdensity-based solver Heyns et al [15] extended the rho-CentralFoam solver through the implementation of alter-native discretization schemes to improve its stability andefficiency More recently Modesti and Pirozzoli [4] devel-oped a solver named rhoEnergyFoam relying on the AUSMscheme by Liou and Steffen Jr [16] By advancing using alow-storage third-order four-stage RungendashKutta algorithmfor time advancement their solver showed reduced nu-merical diffusion and better conservation properties com-pared to rhoCentralFoam in both steady and unsteadyturbulent flows

In this work we present a new OpenFOAM solvernamed rhoCentralRK4Foam which is derived from rho-CentralFoam by replacing its first-order time advancementscheme with a third-order four-stage RungendashKutta scheme)e aim of developing this solver is twofold (i) to improvethe scaling performance of rhoCentralFoam especially inlarge-scale simulations (ii) to improve time accuracy andoverall time-to-solution using high-order RungendashKuttascheme [17] Instead of trying to optimize the parallelismembedded in OpenFoam which has been shown to be veryhard to achieve (see eg Culpo [7]) our proposed approachis to select a different numerical integration scheme that showsimproved CPU and scalability performances with minimalmodification of the standard distribution )e cases areconfigured to solve the PDEs through the rhoCentralFoam

and rhoCentralRK4Foam solvers We investigate the parallelperformance of rhoCentralFoam and rhoCentralRK4Foam onCray XC40 system )eta [18] )ese two solvers are bench-marked on two cases an inviscid transonic flow over theONERA M6 wing and a supersonic flow over the Forward-facing step to validate the new solverrsquos shock capturing ca-pability )e TAU (Tuning and Analysis Utilities) Perfor-mance System analyzer [19] is used to collect the hotspotprofiles of the two solvers)e strong and weak scaling tests ofthe benchmark problems are conducted on )eta up to 4096cores For the time-to-solution analysis we considered theTaylorndashGreen vortex problem and determined the time-to-solution needed by the two solvers to get a solution with thesame accuracy (same numerical error) with respect to theanalytical solution )e present study is also aimed at pro-viding users a handle on parameter choices for performanceand scalability

)e rest of the paper is organized as follows in Section 2a description of numerical methods for the two solvers aswell as the hardware and profiling tools are presented )ebenchmark test cases and the results for the scalabilityanalysis are presented in Section 3 In Section 4 we presentthe results for the time-to-solution analysis Section 5 showsthe conclusion

2 Methods

21 Governing Equations and Spatial Integration )e solverrhoCentralFoam is the most widely used compressible flowsolver in the baseline distribution of OpenFOAM It relies onthe full discretization of the convective fluxes through thecentral TVD scheme (total variation diminishing) of KTandKNP schemes rhoCentralFoam solves the governing fluidequations in an Eulerian frame of reference for three con-servative variables specifically density (ρ) momentumdensity (ρu) and total energy density (ρE)

zρzt

+ nabla middot (ρu) 0 (1)

zρuzt

+ nabla middot (ρuu) minusnablap + nabla middot τ (2)

zρE

zt+ nabla middot (ρuE) + nabla middot (pu) nabla middot (T middot u) + nabla middot J (3)

In the above equations E equiv e(T) + 12u middot u where e(T)

is the internal energy and T is the temperature p is thethermodynamic pressure related to the temperature anddensity through the equation of state for ideal gasesp ρRT where R is the gas constant and τ 2μ[(12)(nablau +

nablaTu) minus 13(nabla middot u)I] is the viscous stress tensor under theassumption of Newtonian fluid with dynamic viscosity μwhile I is the unit tensor Finally J minusλnablaT is the heat fluxvector where λ is the thermal conductivity

)e governing equations are discretized using a finitevolume method where equations (1)ndash(3) are integrated overa control volume represented by a grid cell of volume VUsing Gauss theorem all fluxes can be transformed intosurface integrals across the cell boundaries which are

2 Scientific Programming

estimated by summing up the flux contributions from allfaces f of the cell )e volume average of the state vector ofconserved variables Ψ equiv [ρ ρu ρE] is given by

Ψ 1V

1113946VΨdV (4)

So the integral form of equations (1)ndash(3) can beexpressed as

dΨdt

L(Ψ) (5)

where L(Ψ) denotes the operator returning the right-handside of equations (1)ndash(3) containing all inviscid and viscousfluxes )ese fluxes have to be estimated numerically usingthe volume-averaged state variables Ψ of adjacent cells Inparticular the convective fluxes are obtained as

1113946Vnabla middot (uΨ)dV 1113946

SuΨ middot dS asymp 1113944

fisinSuf middot SfΨf1113872 1113873 1113944

fisinSϕfΨf

(6)

where subscript f identifies the faces of the cell volumewhereas the termsΨf uf Sf and ϕf equiv uf middot Sf represent thestate vector velocity surface area and volumetric flux at theinterface between two adjacent cells respectively )eproduct ϕfρf is obtained by applying the KT scheme

ϕfΨf 12

ϕ+fΨ

++ ϕminus

fΨminusf1113872 1113873 minus

12αmax Ψminus

f minus Ψ+f1113872 1113873 (7)

where + andminus superscripts denote surface values interpo-lated from the owner cell and neighbor cell respectivelywhile αmax is an artificial diffusive factor (see [12] for details)In the implementation of rhoCentralRK4Foam the Eu-lerian fluxes are firstly calculated explicitly as shown inAlgorithm 1 where ldquophirdquo represents ϕfρf ldquophiUprdquo repre-sents ϕfρfUf + Sfpf with pf 12(p+

f + pminusf) and ldquophiEprdquo

represents ϕfρfEf + ϕfpf

22 Time Integration After calculating fluxes and gradientsaccording to equations (6) and (7) equation (5) can benumerically integrated between time tn and tn+1 tn + Δtusing multistage RungendashKutta scheme Denoting by Ns thenumber of stages this yields

Ψl

n Ψ0n + αlΔtL Ψlminus1n1113874 1113875 l 1 Ns (8)

where Ψ0n equiv Ψ(tn) and ΨNs

n equiv Ψ(tn+1) In the proposedsolver rhoCentralRK4Foam we use a four-stage Run-gendashKutta scheme which is obtained by setting Ns 4 andα1 14 α2 13 α3 12 and α4 1 )e implementa-tion in OpenFOAM is reported in Algorithm 2 and the C++code is publicly available at httpsgithubcomSiboLi666rhoCentralRK4Foam Note that the original Euler schemeimplemented in rhoCentralFoam can be simply obtainedfrom equation (8) by setting Ns 1 and α1 1

23 Hardware and Profiling Tools )e simulations were runon supercomputer platform ldquo)etardquo at Argonne National

Laboratory )eta is a Cray XC40 system with a second-generation Intel Xeon Phi (Knights Landing) processor andthe Cray Aries proprietary interconnect )e system com-prises 4392 compute nodes Each compute node is a singleXeon Phi chip with 64 cores 16GB Multi-Channel DRAM(MCDRAM) and 192GB DDR4 memory To avoid anypossible memory issues with OpenFoam the simulationswere run using 32 cores of each computing node (half of itsmaximum capacity)

We assess the scalability and parallel efficiency of the twosolvers rhoCentralFoam and rhoCentralRK4Foam by an-alyzing their speedup on the same grid and by monitoringthe portions of CPU time spent in communication and incomputation respectively )ese measurements are per-formed with performance tools adapted to the machine Inthe strong scaling tests the tools are set to start countingafter the initial setup stage is completed in our case after 20time steps lasting for an extra 75 time steps )e strongscaling tests are performed on)eta In order to measure thetime spent in communication we rely on the TAU Per-formance system [19])e TAU performance tool suite is anopen-source software developed at the University of Oregonand offers a diverse range of performance On )eta theOpenFOAM makefile is modified to utilize the TAUwrapper functions for compilation TAU can automaticallyparse the source code and insert instrumentation calls tocollect profile andor trace data which allow us to measurethe total time spent in communication during the targeted75 time steps and identify the hotspots for the two solvers

3. Results for the Scalability Analysis

3.1. Test Cases Description. For the scalability analysis, we tested the new solver rhoCentralRK4Foam in two different benchmark cases: (i) the three-dimensional ONERA M6 transonic wing and (ii) the two-dimensional supersonic forward-facing step. The two cases are steady flows; they are first used for validating the new solver's shock-capturing capability and then used for a detailed parallel performance analysis of the solver in the next section. For the two cases, we used the decomposePar tool in OpenFOAM to decompose the generated mesh; the Scotch decomposition method [20] is used to partition the mesh into subdomains. A brief description of the two cases is presented next.

3.1.1. Transonic M6 Wing. In the ONERA M6 wing case, the mean aerodynamic chord is c = 0.53 m, the semispan is b = 1 m, and the computational domain extends to 20c. The angle of attack is 3.06° and the free-stream Mach number is Ma = 0.8395. The Reynolds number in the original experiment was Re = 11.72 × 10^6, and so the flow is certainly turbulent; however, in order to capture the pressure distribution and the shock location along the wing, inviscid calculations can be safely employed and are indeed customary (see, for example, Modesti and Pirozzoli [4]). The geometry is meshed using hexahedral elements, and three grids with different sizes are generated: Grid1 has 1 million cells (shown in Figure 1(a)), Grid2 has 5 million cells, and Grid3 has 43 million cells. The flow-field analysis was conducted on Grid1 (which is sufficient for grid convergence), while Grid2 and Grid3 were used for the scaling analysis. Figure 1(b) shows the pressure distribution on the wing surface computed using rhoCentralFoam and rhoCentralRK4Foam. In Figure 1(c), the pressure coefficient at the inner wing section (20% span) computed by the two solvers is compared with the experimental data (blue circle symbols, [21]). We can observe that the newly developed solver captures the main flow features: it generates a pressure distribution on the wing surface similar to that obtained with rhoCentralFoam and captures the shock location precisely.

3.1.2. Supersonic Forward-Facing Step. Computations of inviscid flow over a forward-facing step are used to further verify the shock-capturing capability of the rhoCentralRK4Foam solver. The flow configuration used by Woodward and Colella [22] is considered in this work. The supersonic inflow Mach number is Ma_∞ = 3. The grid has N_x × N_y = 240 × 80 cells in the coordinate directions. The shock pattern, represented by the density distribution shown in Figure 2, confirms that rhoCentralRK4Foam is capable of capturing strong shocks in supersonic flows.

3.2. Strong Scaling Tests. In the strong scaling tests, we tested the rhoCentralFoam and rhoCentralRK4Foam solvers on three different mesh sizes of the ONERA M6 wing. Here we present the scalability results by showing the speedup, as well as the speedup increment percentage for rhoCentralRK4Foam, for Grid1 up to 1024 ranks in Table 1, Grid2 up to 2048 ranks in Table 2, and Grid3 up to 4096 ranks in Table 3. Note that the scaling is presented as CPU time normalized by the CPU time obtained with 128 ranks, which is the minimum number of ranks that fits the memory requirements of the largest grid.

Result: construct the Eulerian fluxes explicitly
while runTime.loop() do
    // Save quantities at previous time steps
    if rk4 then
        /* The following first constructs the Eulerian fluxes by applying the Kurganov and Tadmor (KT) scheme */
        surfaceScalarField phi ("phi", aphiv_pos*rho_pos + aphiv_neg*rho_neg);
        surfaceVectorField phiUp ("phiUp",
            (aphiv_pos*rhoU_pos + aphiv_neg*rhoU_neg)
          + (a_pos*p_pos + a_neg*p_neg)*mesh.Sf());
        surfaceScalarField phiEp ("phiEp",
            aphiv_pos*(rho_pos*(e_pos + 0.5*magSqr(U_pos)) + p_pos)
          + aphiv_neg*(rho_neg*(e_neg + 0.5*magSqr(U_neg)) + p_neg)
          + aSf*p_pos - aSf*p_neg);
        /* Then the divergence theorem (Gauss's law) is applied to relate the volume integrals to surface integrals */
        volScalarField phiSum ("phiSum", fvc::div(phi));
        volVectorField phiUpSum2 ("phiUpSum2", fvc::div(tauMC));
        volVectorField phiUpSum3 ("phiUpSum3", fvc::laplacian(muEff, U));
    else
        ...
    end
end

ALGORITHM 1: The Eulerian fluxes are calculated explicitly so that the system of ordinary differential equations can be constructed and further integrated in time using the explicit Runge–Kutta algorithm.

Result: calculate the three conservative variables
for (int cycle = 0; cycle < rkloopsize; cycle++)
{
    /* The following calculates the three conservative variables, namely density (rho),
       momentum density (rhoU), and total energy density (rhoE) */
    rho  = rhoOld  + rk[cycle]*runTime.deltaT()*(-phiSum);
    rhoU = rhoUOld + rk[cycle]*runTime.deltaT()*(-phiUpSum + phiUpSum2 + phiUpSum3);
    rhoE = rhoEOld + rk[cycle]*runTime.deltaT()*(-enFl + enFl2);
}

ALGORITHM 2: The time integration of the resulting system of ordinary differential equations is carried out by a third-order, four-stage Runge–Kutta algorithm.


For example, for 4096 ranks the ideal speedup would then be 4096/128 = 32. It can be observed that rhoCentralRK4Foam outperforms rhoCentralFoam in speedup in each case. For the same grid size, the speedup increment percentage increases as the number of ranks increases. To better illustrate and analyze the scaling performance improvement, the results reported in the three tables are summarized in Figure 3. First, we can observe that for Grid3 there is a dramatic increase in speedup when the number of ranks rises from 1024 to 2048 and from 2048 to 4096.

Figure 1: ONERA M6 transonic wing results: (a) computational mesh (Grid1); (b) pressure distribution computed with the proposed solver rhoCentralRK4Foam; (c) comparison of the pressure coefficient at the section located at 20% span computed with rhoCentralRK4Foam and the original rhoCentralFoam, together with the experimental data.

Figure 2: Density distribution in the supersonic forward-facing step simulation obtained using rhoCentralRK4Foam.


The speedup of rhoCentralRK4Foam increases by a factor of 1.6 from 1024 to 2048 ranks and by a factor of 1.7 from 2048 to 4096 ranks for Grid3. The reason is that rhoCentralFoam scales very slowly after 1024 ranks, eventually reaching a plateau, which is an indication that the solver attained its maximum theoretical speedup. It is indeed instructive to estimate the parallel fraction f of the CPU time from the speedup of the two solvers using Amdahl's law [23]:

S(N_r) = \frac{N_r}{1 + (N_r - 1)(1 - f)}, \qquad \text{with } f = \frac{t_p}{t_p + t_s}, \qquad (9)

where t_p and t_s are the CPU times spent in the parallel and serial (i.e., not parallelizable) sections of the solver, respectively. From the previous equation, we have

f = \frac{1 - 1/S(N_r)}{1 - 1/N_r} \;\longrightarrow\; f = 1 - \frac{1}{S_\infty}, \qquad (10)

where S_∞ ≡ S(N_r → ∞) is the maximum theoretical speedup obtained for N_r → ∞. By measuring S_∞, we can directly evaluate f from equation (10) for the cases featuring an asymptotic speedup; for rhoCentralFoam (Grid3) this yields f ≃ 99.88%. We can also estimate f by best-fitting S(N_r) to Amdahl's formula in equation (9) for rhoCentralRK4Foam, which yields f ≃ 99.98%. Of course, these results can be affected by other important factors, such as the latency and bandwidth of the machine, that are not taken into account in Amdahl's model. We also observe that the scalability becomes better as the problem size increases, as the workload of communication relative to computation decreases. For example, with 2048 ranks the speedup for rhoCentralRK4Foam is 6.005 for Grid2 (5 million cells), while it becomes 9.115 for Grid3 (43 million cells). Moreover, we can also see that, as the grid size increases, the maximum speedup increment percentage also increases, corresponding to 16%, 32%, and 123% for Grid1, Grid2, and Grid3, respectively. As a final exercise, the TAU performance tool was applied to profile the code for Grid3 with 2048 ranks. Figure 4 also shows the communication and computation time percentages for rhoCentralFoam and rhoCentralRK4Foam on Grid3 from 128 to 4096 ranks. We can observe that the rhoCentralRK4Foam solver takes less time to communicate, which leads to the improved parallel performance found in the previous tests. In addition, among the MPI subroutines that are relevant for communication (i.e., called every time step in the simulation), MPI_Waitall() was the one that took most of the time (see Figure 5), which is in line with previous profiling (see, for example, Axtmann and Rist [24]).
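As a small illustration of equation (10) (our own sketch; the function name and the sample speedup value are hypothetical and not taken from the measurements above), the parallel fraction can be estimated from a measured speedup as follows:

    #include <cstdio>

    // Parallel fraction f from Amdahl's law, equation (10), given a speedup S
    // measured at Nr ranks (speedup taken relative to a single rank).
    double parallelFraction(double S, double Nr)
    {
        return (1.0 - 1.0/S)/(1.0 - 1.0/Nr);
    }

    int main()
    {
        // In the asymptotic limit Nr -> infinity this reduces to f = 1 - 1/S_inf;
        // for example, a hypothetical plateau at S_inf = 800 would give f = 0.99875.
        std::printf("f = %.5f\n", 1.0 - 1.0/800.0);
        return 0;
    }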

3.3. Weak Scaling Tests. In order to further analyze the parallel scalability of the rhoCentralRK4Foam solver, we conducted a weak scaling analysis based on the ONERA M6 wing case and the forward-facing step case. The grid size ranges from 700,000 to 44,800,000 cells in the ONERA M6 wing case and from 1,032,000 to 66,060,000 cells in the forward-facing step case. The configurations of the weak scaling test cases are the same as those of the previous test cases. The number of ranks ranges from 16 to 1024 and from 64 to 4096, respectively. The number of grid points per rank stays constant at around 43 × 10^3 and 16 × 10^3 for the two cases, which ensures that the computation workload is not impacted much by communication. Denoting by τ the CPU time for a given run, the relative CPU time is obtained by normalizing with respect to the values obtained with 16 and 64 ranks, i.e., τ/τ_{Ranks=16} and τ/τ_{Ranks=64}, respectively.

Table 1: Normalized speedup for Grid1 (ONERA M6 wing) up to 1024 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    % increment
128      1                 1                    0
256      1.341             1.367                1.939
512      1.594             1.814                13.802
1024     2.364             2.747                16.201

Table 2: Normalized speedup for Grid2 (ONERA M6 wing) up to 2048 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    % increment
128      1                 1                    0
256      1.574             1.601                1.715
512      2.308             2.423                4.983
1024     2.916             3.575                22.599
2048     4.541             6.005                32.240

Table 3: Normalized speedup for Grid3 (ONERA M6 wing) up to 4096 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    % increment
128      1                 1                    0
256      1.871             1.896                1.336
512      3.275             3.527                7.695
1024     5.098             5.809                13.947
2048     6.341             9.115                43.747
4096     6.895             15.390               123.205

Figure 3: Strong scaling normalized speedups of rhoCentralFoam and rhoCentralRK4Foam for the three grids of the M6 wing test case.


The relative CPU times are recorded in Tables 4 and 5, and the comparison of the two solvers is also plotted in Figures 6 and 7. The weak scaling tests give an indication of how well the two solvers scale when the problem size is increased in proportion to the process count. From Tables 4 and 5 and Figures 6 and 7, it can be observed that at lower MPI task counts (16 to 64) both solvers scale reasonably well. However, at higher MPI task counts (128 to 1024), the rhoCentralRK4Foam solver scales better. Remarkably, we can observe a distinguishable relative time difference between the two solvers as the number of ranks increases from 512 to 1024. The profiling results from TAU show that, in the test case with 1024 ranks, rhoCentralRK4Foam spends around 40% of the time on computation, while rhoCentralFoam spends only around 20%. As for the forward-facing step case, we were able to conduct the tests at larger core counts (up to 4096 cores), and it can be confirmed that the rhoCentralRK4Foam solver has better scaling performance than the rhoCentralFoam solver. Indeed, it scales better with larger grid size, and the relative time of the rhoCentralRK4Foam solver stays at 1.063 with 4096 cores (the relative time is 1.148 with 1024 cores in the ONERA M6 wing case). Generally, the rhoCentralRK4Foam solver outperforms the rhoCentralFoam solver at large rank counts due to communication hiding.

Figure 4: Communication (a) and computation (b) time percentages for the strong scaling analysis (M6 test case, Grid3).

Figure 5: MPI_Waitall() time percentage for rhoCentralFoam and rhoCentralRK4Foam for the M6 test case (Grid3) up to 2048 ranks.

Table 4: Relative time τ/τ_{Ranks=16} of the weak scaling test (ONERA M6 wing).

Ranks    rhoCentralFoam    rhoCentralRK4Foam
16       1                 1
32       1.085             1.062
64       1.106             1.105
128      1.148             1.057
256      1.191             1.100
512      1.213             1.158
1024     1.255             1.148

Table 5: Relative time τ/τ_{Ranks=64} of the weak scaling test (forward-facing step).

Ranks    rhoCentralFoam    rhoCentralRK4Foam
64       1                 1
128      1.034             1.002
256      1.085             1.018
512      1.153             1.031
1024     1.186             1.041
2048     1.212             1.067
4096     1.203             1.063


4. Results for the Time-to-Solution Analysis

The scaling analysis is an important exercise to evaluate the parallel performance of the code on multiple cores; however, it does not provide insights into the time accuracy of the numerical algorithm, nor into the time-to-solution needed to evolve the discretized system of equations (equations (1)–(3)) up to a final state within an acceptable numerical error. For example, in the scaling study conducted on the M6 transonic wing, the number of iterations was fixed for both solvers, rhoCentralFoam and rhoCentralRK4Foam, and the calculations were performed in the inviscid limit, so that the implicit solver used in rhoCentralFoam to integrate the viscous fluxes was not active. In such circumstances, comparing the two solvers essentially amounts to comparing the first-order (Euler) and the four-stage Runge–Kutta forward-in-time algorithms. Hence, it is not surprising that the CPU time for rhoCentralRK4Foam is about four times larger than that of rhoCentralFoam (for the same number of iterations), since it requires four evaluations of the right-hand side of equation (5) per time step instead of one. For a sound evaluation of the time-to-solution, however, we need to compare the two solvers over the same physical time, at which the errors with respect to an exact solution are comparable. Because the time accuracy of the solvers is different, the time step is also different, and so is the number of iterations required to reach the final state.

4.1. Test Case Description. For the time-to-solution analysis, we consider the numerical simulation of the Taylor–Green (TG) vortex, a benchmark problem in CFD [25] for validating unsteady flow solvers [26, 27] that requires a time-accurate integration scheme such as the Runge–Kutta scheme implemented in rhoCentralRK4Foam. No turbulence model is applied in the current simulations. The TG vortex admits analytical time-dependent solutions, which allows for a precise definition of the numerical error of the algorithm with respect to a given quantity of interest. The flow is initialized in a square domain 0 ≤ x, y ≤ L as follows:

\rho(x, y, t) = \rho_0, \qquad (11)
u(x, y, 0) = -U_0 \cos(kx)\, \sin(ky), \qquad (12)
v(x, y, 0) = U_0 \sin(kx)\, \cos(ky), \qquad (13)
p(x, y, 0) = p_0 - \frac{\rho_0 U_0^2}{4} \left[ \cos(2kx) + \cos(2ky) \right], \qquad (14)

where u and v are the velocity components along the x and y directions, k ≡ 2π/L is the wave number, and U_0, ρ_0, and p_0 are arbitrary constant velocity, density, and pressure values, respectively. The analytical solution of the initial-value problem defined by equations (11)–(14) is

\rho(x, y, t) = \rho_0, \qquad (15)
u(x, y, t) = -U_0 \cos(kx)\, \sin(ky)\, e^{-2\nu t k^2}, \qquad (16)
v(x, y, t) = U_0 \sin(kx)\, \cos(ky)\, e^{-2\nu t k^2}, \qquad (17)
p(x, y, t) = p_0 - \frac{\rho_0 U_0^2}{4} \left[ \cos(2kx) + \cos(2ky) \right] e^{-4\nu t k^2}, \qquad (18)

which shows the velocity decay due to viscous dissipation. As quantities of interest, we chose to monitor both the velocity profile u at a selected location and the overall kinetic energy in the computational box, which can be readily obtained from equations (16)–(18) as

\varepsilon = \frac{1}{L^2} \int_0^L \!\! \int_0^L \frac{u^2 + v^2}{2}\, dx\, dy = \frac{U_0^2}{4}\, e^{-4\nu t k^2}. \qquad (19)
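For reference, the analytical decay in equations (16)–(19) can be evaluated directly; the short sketch below (our own illustration, not part of the solver; the function names are ours) returns the velocity decay factor and the mean kinetic energy for given ν, L, U0, and t:

    #include <cmath>

    // Velocity decay factor of equations (16) and (17): exp(-2*nu*t*k^2), with k = 2*pi/L.
    double velocityDecay(double nu, double L, double t)
    {
        const double k = 2.0*std::acos(-1.0)/L;
        return std::exp(-2.0*nu*t*k*k);
    }

    // Mean kinetic energy of equation (19): eps(t) = U0^2/4 * exp(-4*nu*t*k^2).
    double kineticEnergy(double nu, double L, double U0, double t)
    {
        const double k = 2.0*std::acos(-1.0)/L;
        return 0.25*U0*U0*std::exp(-4.0*nu*t*k*k);
    }

In particular, ε(0) = U_0²/4, which for U_0 = 1 m/s is consistent with the initial value of about 0.25 seen in Figure 10.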

Figure 6: Relative time τ/τ_{Ranks=16} of the weak scaling tests for the ONERA M6 wing case.

Figure 7: Relative time τ/τ_{Ranks=64} of the weak scaling tests for the forward-facing step case.


4.2. Comparison of Time-to-Solution. For the analysis of the rhoCentralFoam and rhoCentralRK4Foam solvers, we consider a fixed box size L = 2π m, U_0 = 1 m/s, and two values of the Reynolds number Re = LU_0/ν to test the effect of viscosity. The computation is done on a 160 × 160 point uniform grid. The simulations run up to t = 0.58 s for Re = 628 and t = 1.16 s for Re = 1256, which corresponds to the same nondimensional time t/t_ref = 0.0147 with t_ref ≡ L²/ν. At this time, the kinetic energy ε has decreased to 10% of its initial value. As an illustrative example, Figure 8 shows the velocity magnitude contour for the case Re = 628. Figure 9 presents the computed velocity profile u(y) in the middle of the box, x = L/2; rhoC stands for the rhoCentralFoam solver and rhoCRK4 for the rhoCentralRK4Foam solver. It can be observed that rhoCentralRK4Foam shows excellent agreement with the analytical profile for Δt = 10^{-4} s (reducing the time step further does not provide any further convergence for the grid used here). In order to achieve the same level of accuracy with rhoCentralFoam, the time step has to be reduced by a factor of 5 compared to rhoCentralRK4Foam, to Δt = 2 × 10^{-5} s. The same behavior is observed for the evolution of ε, as shown in Figure 10.
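As a quick consistency check (this arithmetic is our own and is not part of the original analysis), the quoted 10% residual kinetic energy follows directly from equation (19) and depends only on the nondimensional time, since \nu t k^2 = (t/t_{\mathrm{ref}})(kL)^2 with kL = 2\pi:

\frac{\varepsilon(t)}{\varepsilon(0)} = e^{-4\nu t k^{2}} = e^{-4\,(t/t_{\mathrm{ref}})\,(2\pi)^{2}} = e^{-4 \times 0.0147 \times 39.48} \approx 0.098,

that is, roughly 10% of the initial kinetic energy for both Reynolds numbers.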

In order to calculate the time-to-solution τ, we first evaluated the CPU time τ_0 per time step, per grid point, and per core for both solvers.

Figure 8: Contour of velocity magnitude U ≡ (u² + v²)^{1/2} for the Taylor–Green vortex at t/t_ref = 0.0147 for the case Re = 628.

Figure 9: Velocity distribution in the Taylor–Green vortex problem at t/t_ref = 0.0147: (a) Re = 628; (b) Re = 1256 (curves: rhoC with Δt = 1 × 10^-4, 5 × 10^-5, 2.5 × 10^-5, and 2 × 10^-5 s; rhoCRK4 with Δt = 1 × 10^-4 s; analytical solution).


We used an Intel Xeon E5-2695 v4 processor installed on the Theta and Bebop platforms at Argonne National Laboratory and reported the results of the tests in Table 6. Even though τ_0 depends on the specific processor used, the ratio τ_{0,rhoCentralRK4}/τ_{0,rhoCentral} does not, and so it is a good indication of the relative performance of the two solvers. In the present case, this ratio is consistently found to be about 3 for all Reynolds numbers; we made additional tests with much higher Reynolds numbers, up to Re ∼ 10^5, to confirm this (see Table 6). Denoting by t_f the physical final time of the simulation, by N_iter the number of iterations, and by N_points the number of grid points, the time-to-solution for the two solvers can be obtained as

\tau = \tau_0 \times N_{\mathrm{iter}} \times N_{\mathrm{points}}, \qquad \text{with } N_{\mathrm{iter}} = \frac{t_f}{\Delta t}, \qquad (20)

and is reported in Table 7. It is concluded that, to achieve the same level of accuracy, τ_rhoCentralRK4 is about 1.5 to 1.6 times smaller than τ_rhoCentral. Additionally, as discussed in the scaling analysis, rhoCentralRK4Foam can achieve up to a 123% improvement in scalability over rhoCentralFoam when using 4096 ranks. Thus, the reduction in time-to-solution for rhoCentralRK4Foam is expected to be even larger for large-scale, time-accurate simulations utilizing thousands of parallel cores.
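As an illustration of equation (20) (our own arithmetic; the helper function is hypothetical), the tabulated time-to-solution of rhoCentralRK4Foam at Re = 628 can be roughly recovered from the measured τ_0 in Table 6:

    #include <cstdio>

    // Time-to-solution of equation (20): tau = tau0 * Niter * Npoints, with Niter = tf/dt.
    double timeToSolution(double tau0, double tf, double dt, double nPoints)
    {
        return tau0*(tf/dt)*nPoints;
    }

    int main()
    {
        // rhoCentralRK4Foam at Re = 628: tau0 = 3.59e-6 s, tf = 0.58 s,
        // dt = 1e-4 s, and a 160 x 160 grid (values from Section 4.2 and Table 6).
        double tau = timeToSolution(3.59e-6, 0.58, 1.0e-4, 160.0*160.0);
        std::printf("tau = %.0f s\n", tau);  // ~533 s, close to the 532 s reported in Table 7
        return 0;
    }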

5. Conclusion

In this study, we presented a new solver, called rhoCentralRK4Foam, for the integration of the Navier–Stokes equations in the CFD package OpenFOAM. The novelty consists in the replacement of the first-order time-integration scheme of the native solver rhoCentralFoam with a third-order Runge–Kutta scheme, which is more suitable for the simulation of time-dependent flows. We first analyzed the scalability of the two solvers for a benchmark case featuring transonic flow over a wing on a structured mesh and showed that rhoCentralRK4Foam can achieve a substantial improvement (up to about 120%) in strong scaling compared to rhoCentralFoam. We also observed that the scalability becomes better as the problem size grows. Hence, even though OpenFOAM scalability is generally poor or decent at best, the new solver can at least alleviate this deficiency. We then analyzed the performance of the two solvers in the case of the Taylor–Green vortex decay and compared the time-to-solution, which gives an indication of the workload needed to achieve the same level of accuracy of the solution (i.e., the same numerical error). For the problem considered here, we obtained a reduction of the time-to-solution by a factor of about 1.5 when using rhoCentralRK4Foam compared to rhoCentralFoam, due to the use of larger time steps to reach the final solution with the same numerical accuracy. This factor will eventually be even larger for larger-scale simulations employing thousands or more cores, thanks to the better speedup of rhoCentralRK4Foam. The proposed solver is potentially a useful alternative for the simulation of compressible flows requiring time-accurate integration, as in direct or large-eddy simulations. Furthermore, the implementation in OpenFOAM can easily be generalized to different schemes of the Runge–Kutta family with minimal code modification.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by Argonne National Laboratory through grant ANL 4J-30361-0030A titled "Multiscale modeling of complex flows."

Figure 10: Evolution of the total kinetic energy ε (curves: rhoC with Δt = 1 × 10^-4, 5 × 10^-5, 2.5 × 10^-5, and 2 × 10^-5 s; rhoCRK4 with Δt = 1 × 10^-4 s; analytical solution).

Table 6: CPU time per time step, per grid point, and per core, τ0.

              τ0,rhoCentral    τ0,rhoCentralRK4    τ0,rhoCentralRK4/τ0,rhoCentral
Re = 628      1.230e-06 s      3.590e-06 s         2.919
Re = 1256     1.120e-06 s      3.630e-06 s         3.241
Re = 1260     1.189e-06 s      3.626e-06 s         3.046
Re = 12600    1.187e-06 s      3.603e-06 s         3.035
Re = 126000   1.191e-06 s      3.622e-06 s         3.044

Table 7: Time-to-solution τ for rhoCentralFoam and rhoCentralRK4Foam.

              τrhoCentral (Δt = 2 × 10^-5 s)    τrhoCentralRK4 (Δt = 10^-4 s)    τrhoCentral/τrhoCentralRK4
Re = 628      848 s                             532 s                            1.59
Re = 1256     826 s                             539 s                            1.53


Numerical simulations were performed using the Director's Discretionary allocation "OF_SCALING" in the Leadership Computing Facility at Argonne National Laboratory.

References

[1] H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, "A tensorial approach to computational continuum mechanics using object-oriented techniques," Computers in Physics, vol. 12, no. 6, pp. 620–631, 1998.
[2] V. Vuorinen, J.-P. Keskinen, C. Duwig, and B. J. Boersma, "On the implementation of low-dissipative Runge-Kutta projection methods for time dependent flows using OpenFOAM," Computers & Fluids, vol. 93, pp. 153–163, 2014.
[3] C. Shen, F. Sun, and X. Xia, "Implementation of density-based solver for all speeds in the framework of OpenFOAM," Computer Physics Communications, vol. 185, no. 10, pp. 2730–2741, 2014.
[4] D. Modesti and S. Pirozzoli, "A low-dissipative solver for turbulent compressible flows on unstructured meshes, with OpenFOAM implementation," Computers & Fluids, vol. 152, pp. 14–23, 2017.
[5] S. Li and R. Paoli, "Modeling of ice accretion over aircraft wings using a compressible OpenFOAM solver," International Journal of Aerospace Engineering, vol. 2019, Article ID 4864927, 11 pages, 2019.
[6] Q. Yang, P. Zhao, and H. Ge, "ReactingFoam-SCI: an open source CFD platform for reacting flow simulation," Computers & Fluids, vol. 190, pp. 114–127, 2019.
[7] M. Culpo, Current Bottlenecks in the Scalability of OpenFOAM on Massively Parallel Clusters, Partnership for Advanced Computing in Europe, Brussels, Belgium, 2012.
[8] O. Rivera and K. Furlinger, "Parallel aspects of OpenFOAM with large eddy simulations," in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 389–396, Banff, Canada, September 2011.
[9] A. Duran, M. S. Celebi, S. Piskin, and M. Tuncel, "Scalability of OpenFOAM for bio-medical flow simulations," The Journal of Supercomputing, vol. 71, no. 3, pp. 938–951, 2015.
[10] Z. Lin, W. Yang, H. Zhou et al., "Communication optimization for multiphase flow solver in the library of OpenFOAM," Water, vol. 10, no. 10, pp. 1461–1529, 2018.
[11] R. Ojha, P. Pawar, S. Gupta, M. Klemm, and M. Nambiar, "Performance optimization of OpenFOAM on clusters of Intel Xeon Phi processors," in Proceedings of the 2017 IEEE 24th International Conference on High Performance Computing Workshops (HiPCW), Jaipur, India, December 2017.
[12] C. J. Greenshields, H. G. Weller, L. Gasparini, and J. M. Reese, "Implementation of semi-discrete, non-staggered central schemes in a colocated polyhedral finite volume framework for high-speed viscous flows," International Journal for Numerical Methods in Fluids, vol. 63, pp. 1–21, 2010.
[13] A. Kurganov and E. Tadmor, "New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations," Journal of Computational Physics, vol. 160, no. 1, pp. 241–282, 2000.
[14] A. Kurganov, S. Noelle, and G. Petrova, "Semidiscrete central-upwind schemes for hyperbolic conservation laws and Hamilton–Jacobi equations," SIAM Journal on Scientific Computing, vol. 23, no. 3, pp. 707–740, 2006.
[15] J. A. Heyns, O. F. Oxtoby, and A. Steenkamp, "Modelling high-speed viscous flow in OpenFOAM," in Proceedings of the 9th OpenFOAM Workshop, Zagreb, Croatia, June 2014.
[16] M. S. Liou and C. J. Steffen Jr., "A new flux splitting scheme," Journal of Computational Physics, vol. 107, pp. 23–39, 1993.
[17] D. Drikakis, M. Hahn, A. Mosedale, and B. Thornber, "Large eddy simulation using high-resolution and high-order methods," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1899, pp. 2985–2997, 2019.
[18] K. Harms, T. Leggett, B. Allen et al., "Theta: rapid installation and acceptance of an XC40 KNL system," Concurrency and Computation: Practice and Experience, vol. 30, no. 1, p. 11, 2019.
[19] S. S. Shende and A. D. Malony, "The TAU parallel performance system," The International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[20] C. Chevalier and F. Pellegrini, "PT-Scotch: a tool for efficient parallel graph ordering," Parallel Computing, vol. 34, no. 6–8, pp. 318–331, 2008.
[21] V. Schmitt and F. Charpin, "Pressure distributions on the ONERA-M6-wing at transonic Mach numbers," in Experimental Data Base for Computer Program Assessment, Report of the Fluid Dynamics Panel Working Group 04, AGARD, Neuilly-sur-Seine, France, 1979.
[22] P. Woodward and P. Colella, "The numerical simulation of two-dimensional fluid flow with strong shocks," Journal of Computational Physics, vol. 54, no. 1, pp. 115–173, 1984.
[23] G. M. Amdahl, "Validity of the single processor approach to achieving large-scale computing capabilities," in Proceedings of the 1967 Spring Joint Computer Conference (AFIPS '67 Spring), vol. 30, pp. 483–485, Atlantic City, NJ, USA, April 1967.
[24] G. Axtmann and U. Rist, "Scalability of OpenFOAM with large eddy simulations and DNS on high-performance systems," in High-Performance Computing in Science and Engineering, W. E. Nagel, Ed., pp. 413–425, Springer International Publishing, Berlin, Germany, 2016.
[25] A. Shah, L. Yuan, and S. Islam, "Numerical solution of unsteady Navier-Stokes equations on curvilinear meshes," Computers & Mathematics with Applications, vol. 63, no. 11, pp. 1548–1556, 2012.
[26] J. Kim and P. Moin, "Application of a fractional-step method to incompressible Navier-Stokes equations," Journal of Computational Physics, vol. 59, no. 2, pp. 308–323, 1985.
[27] A. Quarteroni, F. Saleri, and A. Veneziani, "Factorization methods for the numerical approximation of Navier-Stokes equations," Computer Methods in Applied Mechanics and Engineering, vol. 188, no. 1–3, pp. 505–526, 2000.

Scientific Programming 11

Page 3: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

estimated by summing up the flux contributions from allfaces f of the cell )e volume average of the state vector ofconserved variables Ψ equiv [ρ ρu ρE] is given by

Ψ 1V

1113946VΨdV (4)

So the integral form of equations (1)ndash(3) can beexpressed as

dΨdt

L(Ψ) (5)

where L(Ψ) denotes the operator returning the right-handside of equations (1)ndash(3) containing all inviscid and viscousfluxes )ese fluxes have to be estimated numerically usingthe volume-averaged state variables Ψ of adjacent cells Inparticular the convective fluxes are obtained as

1113946Vnabla middot (uΨ)dV 1113946

SuΨ middot dS asymp 1113944

fisinSuf middot SfΨf1113872 1113873 1113944

fisinSϕfΨf

(6)

where subscript f identifies the faces of the cell volumewhereas the termsΨf uf Sf and ϕf equiv uf middot Sf represent thestate vector velocity surface area and volumetric flux at theinterface between two adjacent cells respectively )eproduct ϕfρf is obtained by applying the KT scheme

ϕfΨf 12

ϕ+fΨ

++ ϕminus

fΨminusf1113872 1113873 minus

12αmax Ψminus

f minus Ψ+f1113872 1113873 (7)

where + andminus superscripts denote surface values interpo-lated from the owner cell and neighbor cell respectivelywhile αmax is an artificial diffusive factor (see [12] for details)In the implementation of rhoCentralRK4Foam the Eu-lerian fluxes are firstly calculated explicitly as shown inAlgorithm 1 where ldquophirdquo represents ϕfρf ldquophiUprdquo repre-sents ϕfρfUf + Sfpf with pf 12(p+

f + pminusf) and ldquophiEprdquo

represents ϕfρfEf + ϕfpf

22 Time Integration After calculating fluxes and gradientsaccording to equations (6) and (7) equation (5) can benumerically integrated between time tn and tn+1 tn + Δtusing multistage RungendashKutta scheme Denoting by Ns thenumber of stages this yields

Ψl

n Ψ0n + αlΔtL Ψlminus1n1113874 1113875 l 1 Ns (8)

where Ψ0n equiv Ψ(tn) and ΨNs

n equiv Ψ(tn+1) In the proposedsolver rhoCentralRK4Foam we use a four-stage Run-gendashKutta scheme which is obtained by setting Ns 4 andα1 14 α2 13 α3 12 and α4 1 )e implementa-tion in OpenFOAM is reported in Algorithm 2 and the C++code is publicly available at httpsgithubcomSiboLi666rhoCentralRK4Foam Note that the original Euler schemeimplemented in rhoCentralFoam can be simply obtainedfrom equation (8) by setting Ns 1 and α1 1

23 Hardware and Profiling Tools )e simulations were runon supercomputer platform ldquo)etardquo at Argonne National

Laboratory )eta is a Cray XC40 system with a second-generation Intel Xeon Phi (Knights Landing) processor andthe Cray Aries proprietary interconnect )e system com-prises 4392 compute nodes Each compute node is a singleXeon Phi chip with 64 cores 16GB Multi-Channel DRAM(MCDRAM) and 192GB DDR4 memory To avoid anypossible memory issues with OpenFoam the simulationswere run using 32 cores of each computing node (half of itsmaximum capacity)

We assess the scalability and parallel efficiency of the twosolvers rhoCentralFoam and rhoCentralRK4Foam by an-alyzing their speedup on the same grid and by monitoringthe portions of CPU time spent in communication and incomputation respectively )ese measurements are per-formed with performance tools adapted to the machine Inthe strong scaling tests the tools are set to start countingafter the initial setup stage is completed in our case after 20time steps lasting for an extra 75 time steps )e strongscaling tests are performed on)eta In order to measure thetime spent in communication we rely on the TAU Per-formance system [19])e TAU performance tool suite is anopen-source software developed at the University of Oregonand offers a diverse range of performance On )eta theOpenFOAM makefile is modified to utilize the TAUwrapper functions for compilation TAU can automaticallyparse the source code and insert instrumentation calls tocollect profile andor trace data which allow us to measurethe total time spent in communication during the targeted75 time steps and identify the hotspots for the two solvers

3 Results for the Scalability Analysis

31 Test Cases Description For the scalability analysis wetested the new solver rhoCentralRK4Foam in two differentbenchmark cases (i) the three-dimensional ONERA M6transonic wing (ii) the two-dimensional supersonic forwardfacing step)e two cases are steady flows they are first usedfor validating the new solverrsquos shock-capturing capabilityand then used for detailed parallel performance analysis ofthe solver in the next section For the two cases we used thedecomposePar tool in OpenFOAM to decompose thegenerated mesh )e scotch decomposition method [20] isused to partition the mesh into subdomains A brief de-scription of the two cases is presented next

311 Transonic M6Wing In the ONERAM6 wing case themean aerodynamic chord is c 053m the semispan isb 1m and the computational domain extends to 20c )eangle of attack is 306deg and the free stream mach number isMa 08395 )e Reynolds number in the original exper-iment was Re 1172 times 106 and so the flow is certainlyturbulent however in order to capture the pressure dis-tribution and the shock location along the wing inviscidcalculations can be safely employed and are indeed cus-tomary (see for example Modesti and Pirozzoli [4]) )egeometry is meshed using hexahedral elements and threegrids with different sizes are generated Grid1 has 1 millioncells (shown in Figure 1(a)) Grid2 has 5 million cells and

Scientific Programming 3

Grid3 has 43 million cells )e flow-filed analysis wasconducted on Grid1 (which is sufficient for grid conver-gence) while Grid2 and Grid 3 were used for scalinganalysis Figure 1(b) shows the pressure distribution on thewing surface computed by using rhoCentralFoam andrhoCentralRK4Foam In Figure 1(c) the pressure coefficientat the inner wing section (20 span) computed by the twosolvers is compared with the experimental data (blue circlesymbols [21]) We can observe that the newly developedsolver captures the main flow features It generates apressure distribution on the wing surface similar to thatobtained with rhoCentralFoam and captures the shock lo-cation precisely

312 Supersonic Forward-Facing Step Computations ofinviscid flow over a forward-facing step are used tofurther verify the shock-capturing capability of

rhoCentralRK4Foam solver )e flow configuration used byWoodward and Colella [22] is considered in this work )esupersonic flow mach number is Mainfin 3 )e grid hasNx times Ny 240 times 80 cells in the coordinate directions )eshock pattern represented by density distribution shown inFigure 2 confirms that the rhoCentralRK4Foam is capable ofcapturing strong shocks for supersonic flows

32 Strong ScalingTests In the strong scaling tests we testedthe rhoCentralFoam and rhoCentralRK4Foam solvers onthree different mesh sizes of ONERA M6 wing Here wepresent the scalability results by showing the speedup as wellas the speedup increment percentage for rhoCen-tralRK4Foam for Grid1 up to 1024 ranks in Table 1 Grid2 upto 2048 ranks in Table 2 and Grid3 up to 4096 ranks inTable 3 Note that the scaling is presented as CPU timenormalized by the CPU time obtained with 128 ranks which

Result construct Eulerian fluxes explicitlywhile runTimeloop() doSaving quantities at preavious time steps if rk4 thenlowast )e following firstly constructs the Eulerian fluxes by applying the Kurganov and Tadmor (KT) scheme lowastsurfaceScalarField phi(ldquophirdquo(aphiv poslowast rho pos + aphiv neglowast rho neg))surfaceVectorField phiUp(ldquophiUprdquo(aphiv poslowast rhoU pos + xaphiv neglowast rhoU neg)

+ (a poslowastp pos + a neglowastp neglowastmeshSf))surfaceScalarField phiEp(ldquophiEprdquo(aphiv poslowast (rho poslowast (e pos + 05lowastmagSqr(U pos)) + p pos)

+ aphiv neglowast (rho neglowast (e neg + 05lowastmagSqr(U neg)) + p neg)

+ aSflowastp pos minus aSflowastp neg))lowast )en the divergence theorem (Gaussrsquos law) is applied to relate the spatial integrals to surface integralslowastvolScalarField phiSum (ldquophiSumrdquo fvc di v(phi))volVectorField phiUpSum2 (ldquophiUpSum2rdquo fvc di v(tauMC))volVectorField phiUpSum3 (ldquophiUpSum3rdquo fvc laplacian(muEff U))

elseend

end

ALGORITHM 1)e Eulerian fluxes are calculated explicitly so that the ordinary differential equations system can be constructed and furtherintegrated in time by using explicit RungendashKutta algorithm

Result Calculate the three conservative variablesfor (int cycle 0 cyclelt rkloopsize cycle + +)lowast )e following calculates the three conservative variables specifically density (ρ) momentum density (ρU) and total energy

density (ρE) lowastrho rhoOl d + rk[cycle]lowast runTimede ltaTlowast(minusphiSum)rhoU rhoUOl d + rk[cycle]lowast runTimede ltaTlowast(minusphiUpSum + phiUpSum2 + phiUpSum3)rhoE rhoEOl d + rk[cycle]lowast runTimede ltaTlowast(minusenFl + enFl2)

ALGORITHM 2 )e time integration of the resulting ordinary differential equations system is carried out by a third-order four-stageRungendashKutta algorithm

4 Scientific Programming

is the minimum number of ranks that fit the memory re-quirements for the largest grid For example for 4096 ranksthe ideal speedup would then be 4096128 32 Itcan be observed that rhoCentralRK4Foam outperformsrhoCentralFoam in speedup in each case For the same gridsize the speedup increment percentage increases as the

number of ranks increases To better illustrate and analyzethe scaling performance improvement the results reportedin the three tables are summarized in Figure 3 First we canobserve that for the Grid3 there is a dramatic increase inspeedup when the number of ranks rises from 1024 to 2048and from 2048 to 4096)e speedup of rhoCentralRK4Foam

(a)

075 090 10 12060p

(b)

rhoCentralFoamrhoCentralRK4FoamExperiment

ndash10

ndash05

00

05

10

15

ndashCp

02 04 06 08 1000(x ndash x1)(xt ndash x1)

(c)

Figure 1 ONERAM6 transonic wing results (a) computational mesh (Grid 1) (b) pressure distribution computed with the proposed solverrhoCentralRK4Foam (c) comparison of pressure coefficient at section located (20 span) computed with rhoCentralRK4Foam and theoriginal rhoCentralFoam

190

340

490

0400

640rho

Figure 2 Density distribution in supersonic forward facing step simulations obtained using rhoCentralRK4Foam

Scientific Programming 5

has a 16 times increase from 1024 ranks to 2048 ranks and a17 times increase from 2048 ranks to 4096 ranks for Grid3)e reason is that rhoCentralFoam scales very slowly after1024 ranks eventually reaching a plateau which is an in-dication the solver attained its maximum theoreticalspeedup It is indeed instructive to estimate the serial portionf of CPU time from the speedup of the two solvers usingAmdahlrsquos law [23]

S Nr( 1113857 Nr

1 + Nr minus 1( 1113857(1 minus f) withf

tp

tp + ts

(9)

where tp and ts are the CPU time spent in parallel and serial(ie not parallelizable) sections of the solver respectivelyFrom the previous equations we have

f 1 minus 1S Nr( 1113857( 1113857

1 minus 1Nr( 1113857⟶ f 1 minus

1Sinfin

(10)

where Sinfin equiv S(Nr⟶infin) is the maximum theoreticalspeedup obtained for Nr⟶infin By measuring Sinfin we candirectly evaluate f from equation (10) for the cases featuringasymptotic speedup for rhoCentralFoam (Grid3) this yieldsf≃9988 We can also estimate f but best fitting S(Nc) toAmdahlrsquos formula in equation (9) for rhoCentralRK4Foamwhich yields f≃9998 Of course these results can beaffected by other important factors such as latency andbandwidth of the machine that are not taken into account inAmdahlrsquos model We also observe that the scalability be-comes better as the problem size increases as the workload of

communication over computation decreases For examplewith 2048 ranks the speedup for rhoCentralRK4Foam is6005 for Grid2 (5 million cells) while it becomes 9115 forGrid3 (43million cells) Moreover we can also see that as thegrid size increases the maximum speedup increment per-centage also increases which corresponding to 16 32and 123 for Grid1 Grid2 and Grid3 respectively As afinal exercise the TAU performance tool was applied toprofile the code for Grid3 with 2048 ranks Figure 4 alsoshows the communication and computation time percent-age for rhoCentralFoam and rhoCentralRK4Foam on Grid3from 128 to 4096 ranks We can observe that the rho-CentralRK4Foam solver takes less time to communicatewhich lead to the improved parallel performance found inprevious tests In addition among the MPI subroutines thatare relevant for communication (ie called every time stepin the simulation) MPI_waitall() was the one that employedmost of the time (see Figure 5) which is in line with previousprofiling (see for example Axtmann and Rist [24])

33 Weak Scaling Tests In order to further analyze theparallel scalability of rhoCentralRK4Foam solver we con-ducted a weak scaling analysis based on the ONERA M6wing case and the forward-facing step case )e grid sizeranges from 700 000 to 44 800 000 cells in ONERA M6wing case and from 1 032 000 to 66 060 000 cells in theforward-facing step case )e configurations of the weakscaling test cases are the same as the one in the previous testcases )e number of ranks ranges from 16 to 1024 and 64 to4096 respectively )e number of grid points per rank staysconstant at around 43 times 103 and 16 times 103 for two caseswhich ensures that the computation workload is not im-pacted much by communication Denoting by τ the CPUtime for a given run the relative CPU time is obtained by

Table 1 Normalized speedup for Grid1 (ONERA M6 wing) up to1024 ranks

Ranks rhoCentralFoam rhoCentralRK4Foam increment128 1 1 0256 1341 1367 1939512 1594 1814 138021024 2364 2747 16201

Table 2 Normalized speedup for Grid2 (ONERA M6 wing) up to2048 ranks

Ranks rhoCentralFoam rhoCentralRK4Foam increment128 1 1 0256 1574 1601 1715512 2308 2423 49831024 2916 3575 225992048 4541 6005 32240

Table 3 Normalized speedup for Grid3 (ONERA M6 wing) up to4096 ranks

Ranks rhoCentralFoam rhoCentralRK4Foam increment128 1 1 0256 1871 1896 1336512 3275 3527 76951024 5098 5809 139472048 6341 9115 437474096 6895 1539 123205

rhoCentralFoam-Grid3 rhoCentralRk4Foam-Grid3rhoCentralRk4Foam-Grid2

rhoCentralFoam-Grid1 rhoCentralRk4Foam-Grid1Linear

0

6

12

18

24

30

36

Spee

dup

1024 2048 3072 40960ranks

rhoCentralFoam-Grid2

Figure 3 Strong scaling normalized speedups of rhoCentralFoamand rhoCentralRK4Foam for three grids of the M6 wing test case

6 Scientific Programming

normalizing with respect to the values obtained for 16 and 64ranks ττRanks 16 for 16 ranks and ττRanks 64 for 64ranks )e relative CPU time are recorded in Tables 4 and 5)e comparison of the two solvers is also plotted in Figures 6and 7 )e weak scaling tests give us indication about howwell does the two solvers scale when the problem size isincreased proportional to process count From Tables 4 and5 and Figures 6 and 7 it can be observed that for lower MPItasks (16 to 64) both of the two solvers scale reasonably wellHowever for higher MPI tasks (128 to 1024) the rho-CentralRK4Foam solver scales better Remarkably we canobserve a distinguishable relative time difference betweenthe two solvers as the number of ranks increases from 512 to1024 )e profiling results from TAU show that in the testcase with 1024 ranks rhoCentralRK4Foam spends around

40 time on computation while rhoCentralFoam only hasaround 20 time spent on computation As for the forward-facing step case we were able to conduct the tests at largercores count (up to 4096 cores) and it can be confirmed thatrhoCentralRK4Foam solver has better scaling performancethan rhoCentralFoam solver Indeed it scales better withlarger grid size and the relative time of rhoCentralRK4Foamsolver maintains at 1063 with 4096 cores (the relative time is1148 with 1024 cores in ONERA M6 wing case) Generallythe rhoCentralRK4Foam solver outperforms the rhoCen-tralFoam solver at large-scale ranks due to communicationhiding

rhoCentralFoam communicationrhoCentralRk4Foam communication

0

20

40

60

80

100

Tim

e per

cent

age (

)

600 1200 1800 2400 3000 3600 42000ranks

(a)

rhoCentralFoam communicationrhoCentralRk4Foam communication

0

20

40

60

80

100

Tim

e per

cent

age (

)

600 1200 1800 2400 3000 3600 42000ranks

(b)

Figure 4 Communication (a) and computation (b) time percentage for the strong scaling analysis (M6 test case Grid 3)

128 256 512 1024 20480

2

4

6

8

10

12

14

489

468

356

519

416

114

2

101

6

883 9

52

766

MPI

wai

tall

()

rhoCentralFoamrhoCentralRK4Foam

Figure 5 MPI waitall() time percentage for rhoCentralFoam andrhoCentralRK4Foam for M6 test case (Grid3) up to 2048 ranks

Table 4 Relative time results ττRanks 16 of weak scaling test(ONERA M6 wing)

Ranks rhoCentralFoam rhoCentralRK4Foam16 1 132 1085 106264 1106 1105128 1148 1057256 1191 1100512 1213 11581024 1255 1148

Table 5 Relative time results ττRanks 64 of the weak scaling test(forward-facing step)

Ranks rhoCentralFoam rhoCentralRK4Foam64 1 1128 1034 1002256 1085 1018512 1153 10311024 1186 10412048 1212 10674096 1203 1063

Scientific Programming 7

4 Results for the Time-to-Solution Analysis

)e scaling analysis is an important exercise to evaluate theparallel performance of the code on multiple cores howeverit does not provide insights into the time accuracy of thenumerical algorithm nor into the time-to-solution needed toevolve the discretized system of equations (equations(1)ndash(3)) up to a final state within an acceptable numericalerror For example in the scaling study conducted on theM6transonic wing the number of iterations was fixed for bothsolvers rhoCentralFoam and rhoCentralRK4Foam andcalculations were performed in the inviscid limit so that theimplicit solver used in rhoCentralFoam to integrate the

viscous fluxes was not active In such circumstancescomparing the two solvers essentially amounts at comparingfirst-order (Euler) and fourth-order forward in time Run-gendashKutta algorithms Hence it is not surprising that theCPU time for rhoCentralRK4Foam is about four times largerthan rhoCentralFoam (for the same number of iterations)since it requires four more evaluations of the right-hand-sideof equations (5) per time step For a sound evaluation of thetime-to-solution however we need to compare the twosolvers over the same physical time at which the errors withrespect to an exact solution are comparable Because the timeaccuracy of the solvers is different time step is also differentand so are the number of iterations required to achieve thefinal state

41 Test Case Description For the time-to-solution analysiswe consider the numerical simulation of TaylorndashGreen (TG)vortex a benchmark problem in CFD [25] for validatingunsteady flow solvers [26 27] requiring time accurate in-tegration scheme as the RungendashKutta scheme implementedin rhoCentralRK4Foam )ere is no turbulence model ap-plied in the current simulation )e TG vortex admits an-alytical time-dependent solutions which allows for precisedefinition of the numerical error of the algorithm withrespect to a given quantity of interest )e flow is initializedin a square domain 0lex yleL as follows

ρ(x y t) ρ0 (11)

u(x y 0) minusU0 cos kx sin ky (12)

v(x y 0) U0 sin kx cos ky (13)

p(x y 0) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky) (14)

where u and v are the velocity components along x and ydirections k equiv 2πL is the wave number and U0 ρ0 and p0are the arbitrary constant velocity density and pressurerespectively )e analytical solution for the initial-valueproblem defined by equations (11)ndash(14) is

ρ(x y t) ρ0 (15)

u(x y t) minusU0 cos kx sin kyeminus 2]tk2

(16)

v(x y t) U0 sin kx cos kyeminus 2]tk2

(17)

p(x y t) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky)eminus 4]tk2

(18)

which shows the velocity decay due to viscous dissipationAs a quantity of interest we chose to monitor both thevelocity profiles u at selected location and the overall kineticenergy in the computational box which can be readilyobtained from equations (16)ndash(18) as

ε 1L2 1113946

L

01113946

L

0

u2 + v2

2dxdy

U204

eminus 4]tk2

(19)

13

12

11

10

0 400ranks

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

800 1200

Figure 6 Relative time ττRanks 16 of weak scaling tests forONERA M6 wing case

0 1000 2000ranks

3000 4000 5000

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

12

11

10

Figure 7 Relative time ττRanks 64 of weak scaling tests forforward-facing step

8 Scientific Programming

42 Comparison of Time-to-Solution For the analysis ofrhoCentralFoam and rhoCentralRK4Foam solvers we con-sider fixed box size L 2πm and U0 1ms and two valuesof Reynolds number Re LU0v to test the effect of vis-cosity )e computation is done on a 160 times 160 pointsuniform grid )e simulations run up to t 058 s for Re

628 and t 116 s for Re 1256 which corresponds to thesame nondimensional time ttref 00147 with tref equiv L2]At this time the kinetic energy ε decreased to 10 from theinitial value as an illustrative example Figure 8 shows thevelocity magnitude contour for the Re 628 Figure 9presents the computed velocity profile u(y) in the middle

of the box x L2 rhoC stands for rhoCentralFoam solverand rhoCRK4 stands for rhoCentralRK4Foam solver It canbe observed that rhoCentralRK4Foam shows an excellentagreement with the analytical profile for Δt 10minus 4 s (re-ducing the time step further does provide any furtherconvergence for the grid used here) In order to achieve thesame level of accuracy with rhoCentralFoam time step hasto be reduced by a factor of 5 to Δt 2 times 10minus 5 s compared torhoCentralRK4Foam )e same behavior is observed for theevolution of ε as shown in Figure 10

In order to calculate the time-to-solution τ we firstevaluated the CPU time τ0 per time step per grid point per

0081275

016255

024383

4939e ndash 10

3251e ndash 01U magnitude

Y

ZX

Figure 8 Contour of velocity magnitude U equiv (u2 + v2)12 for the TaylorndashGreen vortex at ttref 00147 for the case Re 628

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

ndash030

ndash015

000

015

030

u

157 314 471 628000y

(a)

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

157 314 471 628000y

ndash030

ndash015

000

015

030

u

(b)

Figure 9 Velocity distribution in the TaylorndashGreen vortex problem at ttref 00147 (a)Re 628 (b)Re 1256

Scientific Programming 9

core for both solvers We used an Intel Xeon E5-2695v4processor installed on )eta and Bebop platforms atArgonne National Laboratory and reported the results of thetests in Table 6 Even though τ0 depends on the specificprocessor used but the ratio τ0rhoCentralRK4τ

0rhoCentral does not

and so is a good indication of the relative performance of thetwo solvers In the present case this ratio is consistentlyfound to be 3 for all Reynolds number we made additionaltests with much higher Reynolds up to Re sim 105 to confirmthis (see Table 6) Denoting by tf the physical final time ofthe simulation by Niter the number of iterations and byNpoints the number of grid points the time-to-solution forthe two solvers can be obtained as

τ τ0 times Niter times Npoints withNiter tf

Δt (20)

and is reported in Table 7 It is concluded that to achieve thesame level of accuracy τrhoCentralRK4 is about 15 to 16 timessmaller than τrhoCentral Additionally as we discussed in thescaling analysis rhoCentralRK4Foam can achieve up to123 improvement in scalability over rhoCentralFoamwhen using 4096 ranks )us the reduction in time-to-solution for rhoCentralRK4Foam is expected to be evenlarger for large-scale time-accurate simulations utilizingthousands of parallel cores

5 Conclusion

In this study we presented a new solver called rhoCen-tralRK4Foam for the integration of NavierndashStokes equationsin the CFD package OpenFOAM)e novelty consists in thereplacement of first-order time integration scheme of thenative solver rhoCentralFoam with a third-order Run-gendashKutta scheme which is more suitable for the simulationof time-dependent flows We first analyzed the scalability ofthe two solvers for a benchmark case featuring transonicflow over a wing on a structured mesh and showed thatrhoCentralRK4Foam can achieve a substantial improvement(up to 120) in strong scaling compared to rhoCen-tralFoam We also observed that the scalability becomesbetter as the problem size grows Hence even thoughOpenFOAM scalability is generally poor or decent at bestthe new solver can at least alleviate this deficiency We thenanalyzed the performance of the two solvers in the case ofTaylorndashGreen vortex decay and compared the time-to-so-lution which gives an indication of the workload needed toachieve the same level of accuracy of the solution (ie thesame numerical error) For the problem considered here weobtained a reduction of time-to-solution by a factor of about15 when using rhoCentralRK4Foam compared to rho-CentralFoam due to the use of larger time steps to get to thefinal solution with the same numerical accuracy )is factorwill be eventually even larger for larger-scale simulationsemploying thousands or more cores thanks to the betterspeedup of rhoCentralRK4Foam )e proposed solver ispotentially a useful alternative for the simulation of com-pressible flows requiring time-accurate integration like indirect or large-eddy simulations Furthermore the imple-mentation in OpenFOAM can be easily generalized todifferent schemes of RungendashKutta family with minimal codemodification

Data Availability

)e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

)e authors declare that they have no conflicts of interest

Acknowledgments

)is work was supported by Argonne National Laboratorythrough grant ANL 4J-30361-0030A titled ldquoMultiscale

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

000

005

010

015

020

025

ε

01 02 03 04 05 0600Time (s)

Figure 10 Evolution of total kinetic energy

Table 6 CPU time per time step per grid point per core τ0

τ0rhoCentral τ0rhoCentralRK4 τ0rhoCentralRK4τ0rhoCentral

Re 628 1230eminus 06 s 3590eminus 06 s 2919Re 1256 1120eminus 06 s 3630eminus 06 s 3241Re 1260 1189eminus 06 s 3626eminus 06 s 3046Re 12600 1187eminus 06 s 3603eminus 06 s 3035Re 126000 1191eminus 06 s 3622eminus 06 s 3044

Table 7 Time-to-solution τ for rhoCentralFoam and rhoCentral-RK4Foam

τrhoCentral τrhoCentralRK4 τrhoCentralτrhoCentralRK4Δt 2 times 10minus 5 s Δt 10minus 4 s

Re 628 848 s 532 s 159Re 1256 826 s 539 s 153

10 Scientific Programming

modeling of complex flowsrdquo Numerical simulations wereperformed using the Director Discretionary allocationldquoOF_SCALINGrdquo in the Leadership Computing Facility atArgonne National Laboratory

References

[1] H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, "A tensorial approach to computational continuum mechanics using object-oriented techniques," Computers in Physics, vol. 12, no. 6, pp. 620–631, 1998.

[2] V. Vuorinen, J.-P. Keskinen, C. Duwig, and B. J. Boersma, "On the implementation of low-dissipative Runge-Kutta projection methods for time dependent flows using OpenFOAM," Computers & Fluids, vol. 93, pp. 153–163, 2014.

[3] C. Shen, F. Sun, and X. Xia, "Implementation of density-based solver for all speeds in the framework of OpenFOAM," Computer Physics Communications, vol. 185, no. 10, pp. 2730–2741, 2014.

[4] D. Modesti and S. Pirozzoli, "A low-dissipative solver for turbulent compressible flows on unstructured meshes, with OpenFOAM implementation," Computers & Fluids, vol. 152, pp. 14–23, 2017.

[5] S. Li and R. Paoli, "Modeling of ice accretion over aircraft wings using a compressible OpenFOAM solver," International Journal of Aerospace Engineering, vol. 2019, Article ID 4864927, 11 pages, 2019.

[6] Q. Yang, P. Zhao, and H. Ge, "ReactingFoam-SCI: an open source CFD platform for reacting flow simulation," Computers & Fluids, vol. 190, pp. 114–127, 2019.

[7] M. Culpo, Current Bottlenecks in the Scalability of OpenFOAM on Massively Parallel Clusters, Partnership for Advanced Computing in Europe, Brussels, Belgium, 2012.

[8] O. Rivera and K. Furlinger, "Parallel aspects of OpenFOAM with large eddy simulations," in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 389–396, Banff, Canada, September 2011.

[9] A. Duran, M. S. Celebi, S. Piskin, and M. Tuncel, "Scalability of OpenFOAM for bio-medical flow simulations," The Journal of Supercomputing, vol. 71, no. 3, pp. 938–951, 2015.

[10] Z. Lin, W. Yang, H. Zhou et al., "Communication optimization for multiphase flow solver in the library of OpenFOAM," Water, vol. 10, no. 10, pp. 1461–1529, 2018.

[11] R. Ojha, P. Pawar, S. Gupta, M. Klemm, and M. Nambiar, "Performance optimization of OpenFOAM on clusters of Intel Xeon Phi processors," in Proceedings of the 2017 IEEE 24th International Conference on High Performance Computing Workshops (HiPCW), Jaipur, India, December 2017.

[12] C. J. Greenshields, H. G. Weller, L. Gasparini, and J. M. Reese, "Implementation of semi-discrete, non-staggered central schemes in a colocated polyhedral finite volume framework for high-speed viscous flows," International Journal for Numerical Methods in Fluids, vol. 63, pp. 1–21, 2010.

[13] A. Kurganov and E. Tadmor, "New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations," Journal of Computational Physics, vol. 160, no. 1, pp. 241–282, 2000.

[14] A. Kurganov, S. Noelle, and G. Petrova, "Semidiscrete central-upwind schemes for hyperbolic conservation laws and Hamilton–Jacobi equations," SIAM Journal on Scientific Computing, vol. 23, no. 3, pp. 707–740, 2006.

[15] J. A. Heyns, O. F. Oxtoby, and A. Steenkamp, "Modelling high-speed viscous flow in OpenFOAM," in Proceedings of the 9th OpenFOAM Workshop, Zagreb, Croatia, June 2014.

[16] M. S. Liou and C. J. Steffen Jr., "A new flux splitting scheme," Journal of Computational Physics, vol. 107, pp. 23–39, 1993.

[17] D. Drikakis, M. Hahn, A. Mosedale, and B. Thornber, "Large eddy simulation using high-resolution and high-order methods," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1899, pp. 2985–2997, 2009.

[18] K. Harms, T. Leggett, B. Allen et al., "Theta: rapid installation and acceptance of an XC40 KNL system," Concurrency and Computation: Practice and Experience, vol. 30, no. 1, p. 11, 2019.

[19] S. S. Shende and A. D. Malony, "The TAU parallel performance system," The International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.

[20] C. Chevalier and F. Pellegrini, "PT-Scotch: a tool for efficient parallel graph ordering," Parallel Computing, vol. 34, no. 6–8, pp. 318–331, 2008.

[21] V. Schmitt and F. Charpin, "Pressure distributions on the ONERA-M6-wing at transonic Mach numbers," in Experimental Data Base for Computer Program Assessment, Report of the Fluid Dynamics Panel Working Group 04, AGARD, Neuilly-sur-Seine, France, 1979.

[22] P. Woodward and P. Colella, "The numerical simulation of two-dimensional fluid flow with strong shocks," Journal of Computational Physics, vol. 54, no. 1, pp. 115–173, 1984.

[23] G. M. Amdahl, "Validity of the single processor approach to achieving large-scale computing capabilities," in Proceedings of the 1967 Spring Joint Computer Conference (AFIPS '67 Spring), vol. 30, pp. 483–485, Atlantic City, NJ, USA, April 1967.

[24] G. Axtmann and U. Rist, "Scalability of OpenFOAM with large-eddy simulations and DNS on high-performance systems," in High-Performance Computing in Science and Engineering, W. E. Nagel, Ed., pp. 413–425, Springer International Publishing, Berlin, Germany, 2016.

[25] A. Shah, L. Yuan, and S. Islam, "Numerical solution of unsteady Navier-Stokes equations on curvilinear meshes," Computers & Mathematics with Applications, vol. 63, no. 11, pp. 1548–1556, 2012.

[26] J. Kim and P. Moin, "Application of a fractional-step method to incompressible Navier-Stokes equations," Journal of Computational Physics, vol. 59, no. 2, pp. 308–323, 1985.

[27] A. Quarteroni, F. Saleri, and A. Veneziani, "Factorization methods for the numerical approximation of Navier-Stokes equations," Computer Methods in Applied Mechanics and Engineering, vol. 188, no. 1–3, pp. 505–526, 2000.



Page 5: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

is the minimum number of ranks that fit the memory re-quirements for the largest grid For example for 4096 ranksthe ideal speedup would then be 4096128 32 Itcan be observed that rhoCentralRK4Foam outperformsrhoCentralFoam in speedup in each case For the same gridsize the speedup increment percentage increases as the

number of ranks increases To better illustrate and analyzethe scaling performance improvement the results reportedin the three tables are summarized in Figure 3 First we canobserve that for the Grid3 there is a dramatic increase inspeedup when the number of ranks rises from 1024 to 2048and from 2048 to 4096)e speedup of rhoCentralRK4Foam

(a)

075 090 10 12060p

(b)

rhoCentralFoamrhoCentralRK4FoamExperiment

ndash10

ndash05

00

05

10

15

ndashCp

02 04 06 08 1000(x ndash x1)(xt ndash x1)

(c)

Figure 1 ONERAM6 transonic wing results (a) computational mesh (Grid 1) (b) pressure distribution computed with the proposed solverrhoCentralRK4Foam (c) comparison of pressure coefficient at section located (20 span) computed with rhoCentralRK4Foam and theoriginal rhoCentralFoam

190

340

490

0400

640rho

Figure 2 Density distribution in supersonic forward facing step simulations obtained using rhoCentralRK4Foam

Scientific Programming 5

has a 16 times increase from 1024 ranks to 2048 ranks and a17 times increase from 2048 ranks to 4096 ranks for Grid3)e reason is that rhoCentralFoam scales very slowly after1024 ranks eventually reaching a plateau which is an in-dication the solver attained its maximum theoreticalspeedup It is indeed instructive to estimate the serial portionf of CPU time from the speedup of the two solvers usingAmdahlrsquos law [23]

S Nr( 1113857 Nr

1 + Nr minus 1( 1113857(1 minus f) withf

tp

tp + ts

(9)

where tp and ts are the CPU time spent in parallel and serial(ie not parallelizable) sections of the solver respectivelyFrom the previous equations we have

f 1 minus 1S Nr( 1113857( 1113857

1 minus 1Nr( 1113857⟶ f 1 minus

1Sinfin

(10)

where Sinfin equiv S(Nr⟶infin) is the maximum theoreticalspeedup obtained for Nr⟶infin By measuring Sinfin we candirectly evaluate f from equation (10) for the cases featuringasymptotic speedup for rhoCentralFoam (Grid3) this yieldsf≃9988 We can also estimate f but best fitting S(Nc) toAmdahlrsquos formula in equation (9) for rhoCentralRK4Foamwhich yields f≃9998 Of course these results can beaffected by other important factors such as latency andbandwidth of the machine that are not taken into account inAmdahlrsquos model We also observe that the scalability be-comes better as the problem size increases as the workload of

communication over computation decreases For examplewith 2048 ranks the speedup for rhoCentralRK4Foam is6005 for Grid2 (5 million cells) while it becomes 9115 forGrid3 (43million cells) Moreover we can also see that as thegrid size increases the maximum speedup increment per-centage also increases which corresponding to 16 32and 123 for Grid1 Grid2 and Grid3 respectively As afinal exercise the TAU performance tool was applied toprofile the code for Grid3 with 2048 ranks Figure 4 alsoshows the communication and computation time percent-age for rhoCentralFoam and rhoCentralRK4Foam on Grid3from 128 to 4096 ranks We can observe that the rho-CentralRK4Foam solver takes less time to communicatewhich lead to the improved parallel performance found inprevious tests In addition among the MPI subroutines thatare relevant for communication (ie called every time stepin the simulation) MPI_waitall() was the one that employedmost of the time (see Figure 5) which is in line with previousprofiling (see for example Axtmann and Rist [24])

33 Weak Scaling Tests In order to further analyze theparallel scalability of rhoCentralRK4Foam solver we con-ducted a weak scaling analysis based on the ONERA M6wing case and the forward-facing step case )e grid sizeranges from 700 000 to 44 800 000 cells in ONERA M6wing case and from 1 032 000 to 66 060 000 cells in theforward-facing step case )e configurations of the weakscaling test cases are the same as the one in the previous testcases )e number of ranks ranges from 16 to 1024 and 64 to4096 respectively )e number of grid points per rank staysconstant at around 43 times 103 and 16 times 103 for two caseswhich ensures that the computation workload is not im-pacted much by communication Denoting by τ the CPUtime for a given run the relative CPU time is obtained by

Table 1 Normalized speedup for Grid1 (ONERA M6 wing) up to1024 ranks

Ranks rhoCentralFoam rhoCentralRK4Foam increment128 1 1 0256 1341 1367 1939512 1594 1814 138021024 2364 2747 16201

Table 2 Normalized speedup for Grid2 (ONERA M6 wing) up to2048 ranks

Ranks rhoCentralFoam rhoCentralRK4Foam increment128 1 1 0256 1574 1601 1715512 2308 2423 49831024 2916 3575 225992048 4541 6005 32240

Table 3 Normalized speedup for Grid3 (ONERA M6 wing) up to4096 ranks

Ranks rhoCentralFoam rhoCentralRK4Foam increment128 1 1 0256 1871 1896 1336512 3275 3527 76951024 5098 5809 139472048 6341 9115 437474096 6895 1539 123205

rhoCentralFoam-Grid3 rhoCentralRk4Foam-Grid3rhoCentralRk4Foam-Grid2

rhoCentralFoam-Grid1 rhoCentralRk4Foam-Grid1Linear

0

6

12

18

24

30

36

Spee

dup

1024 2048 3072 40960ranks

rhoCentralFoam-Grid2

Figure 3 Strong scaling normalized speedups of rhoCentralFoamand rhoCentralRK4Foam for three grids of the M6 wing test case

6 Scientific Programming

normalizing with respect to the values obtained for 16 and 64ranks ττRanks 16 for 16 ranks and ττRanks 64 for 64ranks )e relative CPU time are recorded in Tables 4 and 5)e comparison of the two solvers is also plotted in Figures 6and 7 )e weak scaling tests give us indication about howwell does the two solvers scale when the problem size isincreased proportional to process count From Tables 4 and5 and Figures 6 and 7 it can be observed that for lower MPItasks (16 to 64) both of the two solvers scale reasonably wellHowever for higher MPI tasks (128 to 1024) the rho-CentralRK4Foam solver scales better Remarkably we canobserve a distinguishable relative time difference betweenthe two solvers as the number of ranks increases from 512 to1024 )e profiling results from TAU show that in the testcase with 1024 ranks rhoCentralRK4Foam spends around

40 time on computation while rhoCentralFoam only hasaround 20 time spent on computation As for the forward-facing step case we were able to conduct the tests at largercores count (up to 4096 cores) and it can be confirmed thatrhoCentralRK4Foam solver has better scaling performancethan rhoCentralFoam solver Indeed it scales better withlarger grid size and the relative time of rhoCentralRK4Foamsolver maintains at 1063 with 4096 cores (the relative time is1148 with 1024 cores in ONERA M6 wing case) Generallythe rhoCentralRK4Foam solver outperforms the rhoCen-tralFoam solver at large-scale ranks due to communicationhiding

rhoCentralFoam communicationrhoCentralRk4Foam communication

0

20

40

60

80

100

Tim

e per

cent

age (

)

600 1200 1800 2400 3000 3600 42000ranks

(a)

rhoCentralFoam communicationrhoCentralRk4Foam communication

0

20

40

60

80

100

Tim

e per

cent

age (

)

600 1200 1800 2400 3000 3600 42000ranks

(b)

Figure 4 Communication (a) and computation (b) time percentage for the strong scaling analysis (M6 test case Grid 3)

128 256 512 1024 20480

2

4

6

8

10

12

14

489

468

356

519

416

114

2

101

6

883 9

52

766

MPI

wai

tall

()

rhoCentralFoamrhoCentralRK4Foam

Figure 5 MPI waitall() time percentage for rhoCentralFoam andrhoCentralRK4Foam for M6 test case (Grid3) up to 2048 ranks

Table 4 Relative time results ττRanks 16 of weak scaling test(ONERA M6 wing)

Ranks rhoCentralFoam rhoCentralRK4Foam16 1 132 1085 106264 1106 1105128 1148 1057256 1191 1100512 1213 11581024 1255 1148

Table 5 Relative time results ττRanks 64 of the weak scaling test(forward-facing step)

Ranks rhoCentralFoam rhoCentralRK4Foam64 1 1128 1034 1002256 1085 1018512 1153 10311024 1186 10412048 1212 10674096 1203 1063

Scientific Programming 7

4 Results for the Time-to-Solution Analysis

)e scaling analysis is an important exercise to evaluate theparallel performance of the code on multiple cores howeverit does not provide insights into the time accuracy of thenumerical algorithm nor into the time-to-solution needed toevolve the discretized system of equations (equations(1)ndash(3)) up to a final state within an acceptable numericalerror For example in the scaling study conducted on theM6transonic wing the number of iterations was fixed for bothsolvers rhoCentralFoam and rhoCentralRK4Foam andcalculations were performed in the inviscid limit so that theimplicit solver used in rhoCentralFoam to integrate the

viscous fluxes was not active In such circumstancescomparing the two solvers essentially amounts at comparingfirst-order (Euler) and fourth-order forward in time Run-gendashKutta algorithms Hence it is not surprising that theCPU time for rhoCentralRK4Foam is about four times largerthan rhoCentralFoam (for the same number of iterations)since it requires four more evaluations of the right-hand-sideof equations (5) per time step For a sound evaluation of thetime-to-solution however we need to compare the twosolvers over the same physical time at which the errors withrespect to an exact solution are comparable Because the timeaccuracy of the solvers is different time step is also differentand so are the number of iterations required to achieve thefinal state

41 Test Case Description For the time-to-solution analysiswe consider the numerical simulation of TaylorndashGreen (TG)vortex a benchmark problem in CFD [25] for validatingunsteady flow solvers [26 27] requiring time accurate in-tegration scheme as the RungendashKutta scheme implementedin rhoCentralRK4Foam )ere is no turbulence model ap-plied in the current simulation )e TG vortex admits an-alytical time-dependent solutions which allows for precisedefinition of the numerical error of the algorithm withrespect to a given quantity of interest )e flow is initializedin a square domain 0lex yleL as follows

ρ(x y t) ρ0 (11)

u(x y 0) minusU0 cos kx sin ky (12)

v(x y 0) U0 sin kx cos ky (13)

p(x y 0) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky) (14)

where u and v are the velocity components along x and ydirections k equiv 2πL is the wave number and U0 ρ0 and p0are the arbitrary constant velocity density and pressurerespectively )e analytical solution for the initial-valueproblem defined by equations (11)ndash(14) is

ρ(x y t) ρ0 (15)

u(x y t) minusU0 cos kx sin kyeminus 2]tk2

(16)

v(x y t) U0 sin kx cos kyeminus 2]tk2

(17)

p(x y t) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky)eminus 4]tk2

(18)

which shows the velocity decay due to viscous dissipationAs a quantity of interest we chose to monitor both thevelocity profiles u at selected location and the overall kineticenergy in the computational box which can be readilyobtained from equations (16)ndash(18) as

ε 1L2 1113946

L

01113946

L

0

u2 + v2

2dxdy

U204

eminus 4]tk2

(19)

13

12

11

10

0 400ranks

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

800 1200

Figure 6 Relative time ττRanks 16 of weak scaling tests forONERA M6 wing case

0 1000 2000ranks

3000 4000 5000

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

12

11

10

Figure 7 Relative time ττRanks 64 of weak scaling tests forforward-facing step

8 Scientific Programming

42 Comparison of Time-to-Solution For the analysis ofrhoCentralFoam and rhoCentralRK4Foam solvers we con-sider fixed box size L 2πm and U0 1ms and two valuesof Reynolds number Re LU0v to test the effect of vis-cosity )e computation is done on a 160 times 160 pointsuniform grid )e simulations run up to t 058 s for Re

628 and t 116 s for Re 1256 which corresponds to thesame nondimensional time ttref 00147 with tref equiv L2]At this time the kinetic energy ε decreased to 10 from theinitial value as an illustrative example Figure 8 shows thevelocity magnitude contour for the Re 628 Figure 9presents the computed velocity profile u(y) in the middle

of the box x L2 rhoC stands for rhoCentralFoam solverand rhoCRK4 stands for rhoCentralRK4Foam solver It canbe observed that rhoCentralRK4Foam shows an excellentagreement with the analytical profile for Δt 10minus 4 s (re-ducing the time step further does provide any furtherconvergence for the grid used here) In order to achieve thesame level of accuracy with rhoCentralFoam time step hasto be reduced by a factor of 5 to Δt 2 times 10minus 5 s compared torhoCentralRK4Foam )e same behavior is observed for theevolution of ε as shown in Figure 10

In order to calculate the time-to-solution τ we firstevaluated the CPU time τ0 per time step per grid point per

0081275

016255

024383

4939e ndash 10

3251e ndash 01U magnitude

Y

ZX

Figure 8 Contour of velocity magnitude U equiv (u2 + v2)12 for the TaylorndashGreen vortex at ttref 00147 for the case Re 628

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

ndash030

ndash015

000

015

030

u

157 314 471 628000y

(a)

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

157 314 471 628000y

ndash030

ndash015

000

015

030

u

(b)

Figure 9 Velocity distribution in the TaylorndashGreen vortex problem at ttref 00147 (a)Re 628 (b)Re 1256

Scientific Programming 9

core for both solvers We used an Intel Xeon E5-2695v4processor installed on )eta and Bebop platforms atArgonne National Laboratory and reported the results of thetests in Table 6 Even though τ0 depends on the specificprocessor used but the ratio τ0rhoCentralRK4τ

0rhoCentral does not

and so is a good indication of the relative performance of thetwo solvers In the present case this ratio is consistentlyfound to be 3 for all Reynolds number we made additionaltests with much higher Reynolds up to Re sim 105 to confirmthis (see Table 6) Denoting by tf the physical final time ofthe simulation by Niter the number of iterations and byNpoints the number of grid points the time-to-solution forthe two solvers can be obtained as

τ τ0 times Niter times Npoints withNiter tf

Δt (20)

and is reported in Table 7 It is concluded that to achieve thesame level of accuracy τrhoCentralRK4 is about 15 to 16 timessmaller than τrhoCentral Additionally as we discussed in thescaling analysis rhoCentralRK4Foam can achieve up to123 improvement in scalability over rhoCentralFoamwhen using 4096 ranks )us the reduction in time-to-solution for rhoCentralRK4Foam is expected to be evenlarger for large-scale time-accurate simulations utilizingthousands of parallel cores

5 Conclusion

In this study we presented a new solver called rhoCen-tralRK4Foam for the integration of NavierndashStokes equationsin the CFD package OpenFOAM)e novelty consists in thereplacement of first-order time integration scheme of thenative solver rhoCentralFoam with a third-order Run-gendashKutta scheme which is more suitable for the simulationof time-dependent flows We first analyzed the scalability ofthe two solvers for a benchmark case featuring transonicflow over a wing on a structured mesh and showed thatrhoCentralRK4Foam can achieve a substantial improvement(up to 120) in strong scaling compared to rhoCen-tralFoam We also observed that the scalability becomesbetter as the problem size grows Hence even thoughOpenFOAM scalability is generally poor or decent at bestthe new solver can at least alleviate this deficiency We thenanalyzed the performance of the two solvers in the case ofTaylorndashGreen vortex decay and compared the time-to-so-lution which gives an indication of the workload needed toachieve the same level of accuracy of the solution (ie thesame numerical error) For the problem considered here weobtained a reduction of time-to-solution by a factor of about15 when using rhoCentralRK4Foam compared to rho-CentralFoam due to the use of larger time steps to get to thefinal solution with the same numerical accuracy )is factorwill be eventually even larger for larger-scale simulationsemploying thousands or more cores thanks to the betterspeedup of rhoCentralRK4Foam )e proposed solver ispotentially a useful alternative for the simulation of com-pressible flows requiring time-accurate integration like indirect or large-eddy simulations Furthermore the imple-mentation in OpenFOAM can be easily generalized todifferent schemes of RungendashKutta family with minimal codemodification

Data Availability

)e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

)e authors declare that they have no conflicts of interest

Acknowledgments

)is work was supported by Argonne National Laboratorythrough grant ANL 4J-30361-0030A titled ldquoMultiscale

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

000

005

010

015

020

025

ε

01 02 03 04 05 0600Time (s)

Figure 10 Evolution of total kinetic energy

Table 6 CPU time per time step per grid point per core τ0

τ0rhoCentral τ0rhoCentralRK4 τ0rhoCentralRK4τ0rhoCentral

Re 628 1230eminus 06 s 3590eminus 06 s 2919Re 1256 1120eminus 06 s 3630eminus 06 s 3241Re 1260 1189eminus 06 s 3626eminus 06 s 3046Re 12600 1187eminus 06 s 3603eminus 06 s 3035Re 126000 1191eminus 06 s 3622eminus 06 s 3044

Table 7 Time-to-solution τ for rhoCentralFoam and rhoCentral-RK4Foam

τrhoCentral τrhoCentralRK4 τrhoCentralτrhoCentralRK4Δt 2 times 10minus 5 s Δt 10minus 4 s

Re 628 848 s 532 s 159Re 1256 826 s 539 s 153

10 Scientific Programming

modeling of complex flowsrdquo Numerical simulations wereperformed using the Director Discretionary allocationldquoOF_SCALINGrdquo in the Leadership Computing Facility atArgonne National Laboratory

References

[1] H G Weller G Tabor H Jasak and C Fureby ldquoA tensorialapproach to computational continuum mechanics usingobject-oriented techniquesrdquo Computers in Physics vol 12no 6 pp 620ndash631 1998

[2] V Vuorinen J-P Keskinen C Duwig and B J BoersmaldquoOn the implementation of low-dissipative Runge-Kuttaprojection methods for time dependent flows using Open-FOAMrdquo Computers amp Fluids vol 93 pp 153ndash163 2014

[3] C Shen F Sun and X Xia ldquoImplementation of density-basedsolver for all speeds in the framework of openFOAMrdquoComputer Physics Communications vol 185 no 10pp 2730ndash2741 2014

[4] D Modesti and S Pirozzoli ldquoA low-dissipative solver forturbulent compressible flows on unstructured meshes withOpenFOAM implementationrdquo Computers amp Fluids vol 152pp 14ndash23 2017

[5] S Li and R Paoli ldquoModeling of ice accretion over aircraftwings using a compressible OpenFOAM solverrdquo Interna-tional Journal of Aerospace Engineering vol 2019 Article ID4864927 11 pages 2019

[6] Q Yang P Zhao and H Ge ldquoReactingFoam-SCI an opensource CFD platform for reacting flow simulationrdquo Com-puters amp Fluids vol 190 pp 114ndash127 2019

[7] M Culpo Current Bottlenecks in the Scalability of OpenFOAMon Massively Parallel Clusters Partnership for AdvancedComputing in Europe Brussels Belgium 2012

[8] O Rivera and K Furlinger ldquoParallel aspects of OpenFOAMwith large eddy simulationsrdquo in Proceedings of the 2011 IEEEInternational Conference on High Performance Computingand Communications pp 389ndash396 Banff Canada September2011

[9] A Duran M S Celebi S Piskin and M Tuncel ldquoScalabilityof OpenFOAM for bio-medical flow simulationsrdquoCe Journalof Supercomputing vol 71 no 3 pp 938ndash951 2015

[10] Z Lin W Yang H Zhou et al ldquoCommunication optimi-zation for multiphase flow solver in the library of Open-FOAMrdquo Water vol 10 no 10 pp 1461ndash1529 2018

[11] R Ojha P Pawar S Gupta M Klemm and M NambiarldquoPerformance optimization of OpenFOAM on clusters ofIntel Xeon Phitrade processorsrdquo in Proceedings of the 2017 IEEE24th International Conference on High Performance Com-puting Workshops (HiPCW) Jaipur India December 2017

[12] C J Greenshields H G Weller L Gasparini and J M ReeseldquoImplementation of semi-discrete non-staggered centralschemes in a colocated polyhedral finite volume frameworkfor high-speed viscous flowsrdquo International Journal for Nu-merical Methods in Fluids vol 63 pp 1ndash21 2010

[13] A Kurganov and E Tadmor ldquoNew high-resolution centralschemes for nonlinear conservation laws and convection-diffusion equationsrdquo Journal of Computational Physicsvol 160 no 1 pp 241ndash282 2000

[14] A Kurganov S Noelle and G Petrova ldquoSemidiscrete central-upwind schemes for hyperbolic conservation laws andHamiltonndashJacobi equationsrdquo SIAM Journal on ScientificComputing vol 23 no 3 pp 707ndash740 2006

[15] J A Heyns O F Oxtoby and A Steenkamp ldquoModellinghigh-speed viscous flow in OpenFOAMrdquo in Proceedings of the9th OpenFOAM Workshop Zagreb Croatia June 2014

[16] M S Liou and C J Steffen Jr ldquoA new flux splitting schemerdquoJournal of Computational Physics vol 107 no 23 p 39 1993

[17] D Drikakis M Hahn A Mosedale and B )ornber ldquoLargeeddy simulation using high-resolution and high-ordermethodsrdquo Philosophical Transactions of the Royal Society AMathematical Physical and Engineering Sciences vol 367no 1899 pp 2985ndash2997 2019

[18] K Harms T Leggett B Allen et al ldquo)eta rapid installationand acceptance of an XC40 KNL systemrdquo Concurrency andComputation Practice and Experience vol 30 no 1 p 112019

[19] S S Shende and A D Malony ldquo)e TAU parallel perfor-mance systemrdquo Ce International Journal of High Perfor-mance Computing Applications vol 20 no 2 pp 287ndash3112006

[20] C Chevalier and F Pellegrini ldquoPT-Scotch a tool for efficientparallel graph orderingrdquo Parallel Computing vol 34 no 6ndash8pp 318ndash331 2008

[21] V Schmitt and F Charpin ldquoPressure distributions on theONERA-M6-wing at transonic mach numbersrdquo in Experi-mental Data Base for Computer Program Assessment andReport of the Fluid Dynamics Panel Working Group 04AGARD Neuilly sur Seine France 1979

[22] P Woodward and P Colella ldquo)e numerical simulation oftwo-dimensional fluid flow with strong shocksrdquo Journal ofComputational Physics vol 54 no 1 pp 115ndash173 1984

[23] G M Amdahl ldquoValidity of the single processor approach toachieving large-scale computing capabilitiesrdquo in Proceedingsof the 1967 spring joint computer conference onmdashAFIPSrsquo67(Spring) vol 30 pp 483ndash485 Atlantic NJ USA April 1967

[24] G Axtmann and U Rist ldquoScalability of OpenFOAM withlarge-leddy simulations and DNS on high-performancesytemsrdquo in High-Performance Computing in Science andEngineering W E Nagel Ed Springer International Pub-lishing Berlin Germany pp 413ndash425 2016

[25] A Shah L Yuan and S Islam ldquoNumerical solution of un-steady Navier-Stokes equations on curvilinear meshesrdquoComputers amp Mathematics with Applications vol 63 no 11pp 1548ndash1556 2012

[26] J Kim and P Moin ldquoApplication of a fractional-step methodto incompressible Navier-Stokes equationsrdquo Journal ofComputational Physics vol 59 no 2 pp 308ndash323 1985

[27] A Quarteroni F Saleri and A Veneziani ldquoFactorizationmethods for the numerical approximation of Navier-Stokesequationsrdquo Computer Methods in Applied Mechanics andEngineering vol 188 no 1ndash3 pp 505ndash526 2000

Scientific Programming 11

Page 6: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

has a 16 times increase from 1024 ranks to 2048 ranks and a17 times increase from 2048 ranks to 4096 ranks for Grid3)e reason is that rhoCentralFoam scales very slowly after1024 ranks eventually reaching a plateau which is an in-dication the solver attained its maximum theoreticalspeedup It is indeed instructive to estimate the serial portionf of CPU time from the speedup of the two solvers usingAmdahlrsquos law [23]

S Nr( 1113857 Nr

1 + Nr minus 1( 1113857(1 minus f) withf

tp

tp + ts

(9)

where tp and ts are the CPU time spent in parallel and serial(ie not parallelizable) sections of the solver respectivelyFrom the previous equations we have

f 1 minus 1S Nr( 1113857( 1113857

1 minus 1Nr( 1113857⟶ f 1 minus

1Sinfin

(10)

where Sinfin equiv S(Nr⟶infin) is the maximum theoreticalspeedup obtained for Nr⟶infin By measuring Sinfin we candirectly evaluate f from equation (10) for the cases featuringasymptotic speedup for rhoCentralFoam (Grid3) this yieldsf≃9988 We can also estimate f but best fitting S(Nc) toAmdahlrsquos formula in equation (9) for rhoCentralRK4Foamwhich yields f≃9998 Of course these results can beaffected by other important factors such as latency andbandwidth of the machine that are not taken into account inAmdahlrsquos model We also observe that the scalability be-comes better as the problem size increases as the workload of

communication over computation decreases For examplewith 2048 ranks the speedup for rhoCentralRK4Foam is6005 for Grid2 (5 million cells) while it becomes 9115 forGrid3 (43million cells) Moreover we can also see that as thegrid size increases the maximum speedup increment per-centage also increases which corresponding to 16 32and 123 for Grid1 Grid2 and Grid3 respectively As afinal exercise the TAU performance tool was applied toprofile the code for Grid3 with 2048 ranks Figure 4 alsoshows the communication and computation time percent-age for rhoCentralFoam and rhoCentralRK4Foam on Grid3from 128 to 4096 ranks We can observe that the rho-CentralRK4Foam solver takes less time to communicatewhich lead to the improved parallel performance found inprevious tests In addition among the MPI subroutines thatare relevant for communication (ie called every time stepin the simulation) MPI_waitall() was the one that employedmost of the time (see Figure 5) which is in line with previousprofiling (see for example Axtmann and Rist [24])

3.3. Weak Scaling Tests. In order to further analyze the parallel scalability of the rhoCentralRK4Foam solver, we conducted a weak scaling analysis based on the ONERA M6 wing case and the forward-facing step case. The grid size ranges from 700,000 to 44,800,000 cells in the ONERA M6 wing case and from 1,032,000 to 66,060,000 cells in the forward-facing step case. The configurations of the weak scaling test cases are the same as in the previous test cases. The number of ranks ranges from 16 to 1024 and from 64 to 4096, respectively. The number of grid cells per rank stays roughly constant at around 43 × 10³ and 16 × 10³ for the two cases, so that the per-rank computational workload is essentially the same across runs (a quick check is sketched below).
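The per-rank load can be verified from the numbers quoted above; only the smallest and largest configurations are listed in the text, so the sketch checks just those two end points for each case.

```python
# Check that the weak-scaling setups keep the cells-per-rank load roughly constant.
# Only the end-point grid sizes and rank counts are quoted in the text, so only the
# extremes of each case are verified here.
weak_scaling_cases = {
    "ONERA M6 wing":       [(700_000, 16), (44_800_000, 1024)],
    "forward-facing step": [(1_032_000, 64), (66_060_000, 4096)],
}

for case, configs in weak_scaling_cases.items():
    loads = [cells / ranks for cells, ranks in configs]
    print(case, [f"{load:,.0f} cells/rank" for load in loads])
# ONERA M6 wing: ~43,750 cells/rank at both ends; forward-facing step: ~16,100 cells/rank.
```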

Table 1: Normalized speedup for Grid1 (ONERA M6 wing) up to 1024 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    Increment (%)
128      1.000             1.000                0
256      1.341             1.367                1.939
512      1.594             1.814                13.802
1024     2.364             2.747                16.201

Table 2: Normalized speedup for Grid2 (ONERA M6 wing) up to 2048 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    Increment (%)
128      1.000             1.000                0
256      1.574             1.601                1.715
512      2.308             2.423                4.983
1024     2.916             3.575                22.599
2048     4.541             6.005                32.240

Table 3: Normalized speedup for Grid3 (ONERA M6 wing) up to 4096 ranks.

Ranks    rhoCentralFoam    rhoCentralRK4Foam    Increment (%)
128      1.000             1.000                0
256      1.871             1.896                1.336
512      3.275             3.527                7.695
1024     5.098             5.809                13.947
2048     6.341             9.115                43.747
4096     6.895             15.390               123.205
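The increment column in Tables 1–3 is simply the relative gain of rhoCentralRK4Foam over rhoCentralFoam at equal rank count; a minimal sketch recomputing the Grid3 column from the tabulated speedups:

```python
# Recompute the "increment" column of Table 3 (Grid3) from the normalized speedups:
# increment(%) = 100 * (S_RK4 - S_rhoCentralFoam) / S_rhoCentralFoam at equal rank count.
ranks = [128, 256, 512, 1024, 2048, 4096]
s_rcf = [1.000, 1.871, 3.275, 5.098, 6.341, 6.895]    # rhoCentralFoam
s_rk4 = [1.000, 1.896, 3.527, 5.809, 9.115, 15.390]   # rhoCentralRK4Foam

for n, a, b in zip(ranks, s_rcf, s_rk4):
    print(f"{n:5d} ranks: increment = {100.0 * (b - a) / a:8.3f} %")
# The 4096-rank entry reproduces the ~123.2% maximum improvement quoted in the text.
```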

Figure 3: Strong scaling normalized speedups of rhoCentralFoam and rhoCentralRK4Foam for the three grids of the M6 wing test case (speedup versus number of ranks, with the ideal linear scaling shown for reference).


Denoting by τ the CPU time for a given run, the relative CPU time is obtained by normalizing with respect to the value obtained for 16 ranks (τ/τRanks=16) in the ONERA M6 wing case and for 64 ranks (τ/τRanks=64) in the forward-facing step case. The relative CPU times are recorded in Tables 4 and 5, and the comparison of the two solvers is also plotted in Figures 6 and 7. The weak scaling tests give an indication of how well the two solvers scale when the problem size is increased proportionally to the process count. From Tables 4 and 5 and Figures 6 and 7, it can be observed that for lower MPI task counts (16 to 64) both solvers scale reasonably well. However, for higher MPI task counts (128 to 1024), the rhoCentralRK4Foam solver scales better. Remarkably, we can observe a distinguishable relative time difference between the two solvers as the number of ranks increases from 512 to 1024. The profiling results from TAU show that, in the test case with 1024 ranks, rhoCentralRK4Foam spends around 40% of its time on computation, while rhoCentralFoam spends only around 20%. As for the forward-facing step case, we were able to conduct the tests at larger core counts (up to 4096 cores), and they confirm that the rhoCentralRK4Foam solver has better scaling performance than the rhoCentralFoam solver. Indeed, it scales better with larger grid size, and the relative time of the rhoCentralRK4Foam solver stays at 1.063 with 4096 cores (versus a relative time of 1.148 with 1024 cores in the ONERA M6 wing case). Generally, the rhoCentralRK4Foam solver outperforms the rhoCentralFoam solver at large rank counts due to communication hiding.

Figure 4: Communication (a) and computation (b) time percentages for the strong scaling analysis (M6 test case, Grid3), for rhoCentralFoam and rhoCentralRK4Foam from 128 to 4096 ranks.

Figure 5: MPI_Waitall() time percentage for rhoCentralFoam and rhoCentralRK4Foam for the M6 test case (Grid3) up to 2048 ranks.

Table 4: Relative time τ/τRanks=16 of the weak scaling test (ONERA M6 wing).

Ranks    rhoCentralFoam    rhoCentralRK4Foam
16       1.000             1.000
32       1.085             1.062
64       1.106             1.105
128      1.148             1.057
256      1.191             1.100
512      1.213             1.158
1024     1.255             1.148

Table 5: Relative time τ/τRanks=64 of the weak scaling test (forward-facing step).

Ranks    rhoCentralFoam    rhoCentralRK4Foam
64       1.000             1.000
128      1.034             1.002
256      1.085             1.018
512      1.153             1.031
1024     1.186             1.041
2048     1.212             1.067
4096     1.203             1.063
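One convenient way to read Tables 4 and 5 is as a weak-scaling efficiency, defined here (this framing is ours, not a quantity tabulated in the paper) as the inverse of the relative time, so that 1.0 corresponds to perfect weak scaling:

```python
# Weak-scaling efficiency read off Table 4 (ONERA M6 wing), defined simply as the
# inverse of the relative CPU time (1.0 = perfect weak scaling). This definition is
# our own framing; the tables report the relative times directly.
m6_relative = {           # Table 4, tau / tau_{Ranks=16}
    "rhoCentralFoam":    {16: 1.000, 32: 1.085, 64: 1.106, 128: 1.148, 256: 1.191, 512: 1.213, 1024: 1.255},
    "rhoCentralRK4Foam": {16: 1.000, 32: 1.062, 64: 1.105, 128: 1.057, 256: 1.100, 512: 1.158, 1024: 1.148},
}

for solver, rel in m6_relative.items():
    efficiency = {ranks: 1.0 / t for ranks, t in rel.items()}
    print(f"{solver}: efficiency at 1024 ranks = {efficiency[1024]:.2f}")
# About 0.80 for rhoCentralFoam versus about 0.87 for rhoCentralRK4Foam at 1024 ranks.
```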


4. Results for the Time-to-Solution Analysis

The scaling analysis is an important exercise to evaluate the parallel performance of the code on multiple cores; however, it does not provide insights into the time accuracy of the numerical algorithm, nor into the time-to-solution needed to evolve the discretized system of equations (equations (1)–(3)) up to a final state within an acceptable numerical error. For example, in the scaling study conducted on the M6 transonic wing, the number of iterations was fixed for both solvers, rhoCentralFoam and rhoCentralRK4Foam, and the calculations were performed in the inviscid limit, so that the implicit solver used in rhoCentralFoam to integrate the viscous fluxes was not active. In such circumstances, comparing the two solvers essentially amounts to comparing the first-order (Euler) forward-in-time algorithm with the four-stage Runge–Kutta algorithm. Hence, it is not surprising that the CPU time for rhoCentralRK4Foam is about four times larger than for rhoCentralFoam (for the same number of iterations), since it requires four evaluations of the right-hand side of equation (5) per time step instead of one. For a sound evaluation of the time-to-solution, however, we need to compare the two solvers over the same physical time, at which the errors with respect to an exact solution are comparable. Because the time accuracy of the solvers is different, the time step is also different, and so is the number of iterations required to reach the final state.

4.1. Test Case Description. For the time-to-solution analysis, we consider the numerical simulation of the Taylor–Green (TG) vortex, a benchmark problem in CFD [25] for validating unsteady flow solvers [26, 27] that requires a time-accurate integration scheme such as the Runge–Kutta scheme implemented in rhoCentralRK4Foam. No turbulence model is applied in the current simulation. The TG vortex admits an analytical time-dependent solution, which allows for a precise definition of the numerical error of the algorithm with respect to a given quantity of interest. The flow is initialized in a square domain 0 ≤ x, y ≤ L as follows:

$$ \rho(x, y, t) = \rho_0, \tag{11} $$
$$ u(x, y, 0) = -U_0 \cos(kx)\sin(ky), \tag{12} $$
$$ v(x, y, 0) = U_0 \sin(kx)\cos(ky), \tag{13} $$
$$ p(x, y, 0) = p_0 - \frac{\rho_0 U_0^2}{4}\left[\cos(2kx) + \cos(2ky)\right], \tag{14} $$

where u and v are the velocity components along the x and y directions, k ≡ 2π/L is the wave number, and U0, ρ0, and p0 are the arbitrary constant velocity, density, and pressure, respectively. The analytical solution of the initial-value problem defined by equations (11)–(14) is

$$ \rho(x, y, t) = \rho_0, \tag{15} $$
$$ u(x, y, t) = -U_0 \cos(kx)\sin(ky)\, e^{-2\nu k^2 t}, \tag{16} $$
$$ v(x, y, t) = U_0 \sin(kx)\cos(ky)\, e^{-2\nu k^2 t}, \tag{17} $$
$$ p(x, y, t) = p_0 - \frac{\rho_0 U_0^2}{4}\left[\cos(2kx) + \cos(2ky)\right] e^{-4\nu k^2 t}, \tag{18} $$

which shows the velocity decay due to viscous dissipation. As quantities of interest, we chose to monitor both the velocity profile u at a selected location and the overall kinetic energy in the computational box, which can be readily obtained from equations (16)–(18) as

$$ \varepsilon = \frac{1}{L^2}\int_0^L\!\!\int_0^L \frac{u^2 + v^2}{2}\, \mathrm{d}x\, \mathrm{d}y = \frac{U_0^2}{4}\, e^{-4\nu k^2 t}. \tag{19} $$
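The analytical fields above are straightforward to evaluate numerically. The sketch below (plain NumPy, with illustrative constants; the value of p0 and the 160 × 160 grid are assumptions made only for this example) generates the reference velocity field and the kinetic-energy decay of equation (19) against which the solvers are compared.

```python
import numpy as np

# Analytical Taylor-Green fields, equations (12)-(18), and the box-averaged kinetic
# energy of equation (19), evaluated on a uniform periodic grid. Constants are illustrative.
L, U0, rho0, p0, nu = 2.0 * np.pi, 1.0, 1.0, 100.0, 0.01
k = 2.0 * np.pi / L
n = 160
x = np.linspace(0.0, L, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")

def tg_fields(t):
    """Velocity and pressure of the decaying Taylor-Green vortex at time t."""
    decay = np.exp(-2.0 * nu * k**2 * t)
    u = -U0 * np.cos(k * X) * np.sin(k * Y) * decay
    v =  U0 * np.sin(k * X) * np.cos(k * Y) * decay
    p = p0 - 0.25 * rho0 * U0**2 * (np.cos(2 * k * X) + np.cos(2 * k * Y)) * decay**2
    return u, v, p

def kinetic_energy(t):
    """Box-averaged kinetic energy; should match (U0^2 / 4) * exp(-4 nu k^2 t)."""
    u, v, _ = tg_fields(t)
    return 0.5 * np.mean(u**2 + v**2)

for t in (0.0, 10.0, 50.0):
    analytic = 0.25 * U0**2 * np.exp(-4.0 * nu * k**2 * t)
    print(f"t = {t:5.1f} s: eps = {kinetic_energy(t):.4f} (analytic {analytic:.4f})")
```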

Figure 6: Relative time τ/τRanks=16 of the weak scaling tests for the ONERA M6 wing case (rhoCentralFoam versus rhoCentralRK4Foam).

Figure 7: Relative time τ/τRanks=64 of the weak scaling tests for the forward-facing step case (rhoCentralFoam versus rhoCentralRK4Foam).


4.2. Comparison of Time-to-Solution. For the analysis of the rhoCentralFoam and rhoCentralRK4Foam solvers, we consider a fixed box size L = 2π m, U0 = 1 m/s, and two values of the Reynolds number Re = LU0/ν to test the effect of viscosity. The computation is done on a 160 × 160 uniform grid. The simulations run up to t = 0.58 s for Re = 628 and t = 1.16 s for Re = 1256, which corresponds to the same nondimensional time t/tref = 0.0147, with tref ≡ L²/ν. At this time, the kinetic energy ε has decreased to 10% of its initial value. As an illustrative example, Figure 8 shows the velocity magnitude contour for the case Re = 628. Figure 9 presents the computed velocity profile u(y) in the middle of the box, x = L/2; rhoC stands for the rhoCentralFoam solver and rhoCRK4 for the rhoCentralRK4Foam solver. It can be observed that rhoCentralRK4Foam shows excellent agreement with the analytical profile for Δt = 10⁻⁴ s (reducing the time step further does not provide any further convergence for the grid used here). To achieve the same level of accuracy with rhoCentralFoam, the time step has to be reduced by a factor of 5 to Δt = 2 × 10⁻⁵ s compared to rhoCentralRK4Foam. The same behavior is observed for the evolution of ε, as shown in Figure 10.

In order to calculate the time-to-solution τ, we first evaluated the CPU time τ0 per time step, per grid point, per core for both solvers.

Figure 8: Contour of the velocity magnitude U ≡ (u² + v²)^(1/2) for the Taylor–Green vortex at t/tref = 0.0147 for the case Re = 628.

Figure 9: Velocity distribution in the Taylor–Green vortex problem at t/tref = 0.0147: (a) Re = 628; (b) Re = 1256.


We used an Intel Xeon E5-2695 v4 processor, installed on the Theta and Bebop platforms at Argonne National Laboratory, and report the results of the tests in Table 6. Even though τ0 depends on the specific processor used, the ratio τ0,rhoCentralRK4/τ0,rhoCentral does not, and so it is a good indication of the relative performance of the two solvers. In the present case, this ratio is consistently found to be about 3 for all Reynolds numbers; we performed additional tests at much higher Reynolds numbers, up to Re ∼ 10⁵, to confirm this (see Table 6). Denoting by tf the physical final time of the simulation, by Niter the number of iterations, and by Npoints the number of grid points, the time-to-solution for the two solvers can be obtained as

$$ \tau = \tau_0 \times N_{\mathrm{iter}} \times N_{\mathrm{points}}, \qquad \text{with } N_{\mathrm{iter}} = \frac{t_f}{\Delta t}, \tag{20} $$

and is reported in Table 7. It is concluded that, to achieve the same level of accuracy, τrhoCentralRK4 is about 1.5 to 1.6 times smaller than τrhoCentral. Additionally, as discussed in the scaling analysis, rhoCentralRK4Foam can achieve up to a 123% improvement in scalability over rhoCentralFoam when using 4096 ranks. Thus, the reduction in time-to-solution for rhoCentralRK4Foam is expected to be even larger for large-scale, time-accurate simulations utilizing thousands of parallel cores.
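As a rough cross-check of equation (20), the sketch below plugs in the Re = 628 values of τ0 from Table 6 together with the time steps quoted above. Because τ0 and the tabulated time-to-solution were measured in separate runs, the result is only expected to land in the same range as Table 7 (a ratio of roughly 1.5 to 1.7), not to reproduce it exactly.

```python
# Equation (20) for the Re = 628 Taylor-Green case: tau = tau0 * N_iter * N_points,
# with N_iter = t_f / dt. The tau0 values are the Re = 628 entries of Table 6; the
# time steps are those quoted in Section 4.2. Exact values depend on the hardware,
# so the output is only expected to be in the same range as Table 7.
n_points = 160 * 160        # uniform 160 x 160 grid
t_final  = 0.58             # physical end time [s] for Re = 628

def time_to_solution(tau0, dt):
    n_iter = t_final / dt
    return tau0 * n_iter * n_points

tau_rcf = time_to_solution(tau0=1.230e-6, dt=2.0e-5)   # rhoCentralFoam
tau_rk4 = time_to_solution(tau0=3.590e-6, dt=1.0e-4)   # rhoCentralRK4Foam
print(f"rhoCentralFoam:    {tau_rcf:6.0f} s")
print(f"rhoCentralRK4Foam: {tau_rk4:6.0f} s")
print(f"ratio: {tau_rcf / tau_rk4:.2f}  (Table 7 reports about 1.5-1.6)")
```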

5. Conclusion

In this study, we presented a new solver, called rhoCentralRK4Foam, for the integration of the Navier–Stokes equations in the CFD package OpenFOAM. The novelty consists in the replacement of the first-order time integration scheme of the native solver rhoCentralFoam with a third-order Runge–Kutta scheme, which is more suitable for the simulation of time-dependent flows. We first analyzed the scalability of the two solvers for a benchmark case featuring transonic flow over a wing on a structured mesh and showed that rhoCentralRK4Foam can achieve a substantial improvement (up to 120%) in strong scaling compared to rhoCentralFoam. We also observed that the scalability becomes better as the problem size grows. Hence, even though OpenFOAM scalability is generally poor, or decent at best, the new solver can at least alleviate this deficiency. We then analyzed the performance of the two solvers in the case of the Taylor–Green vortex decay and compared the time-to-solution, which gives an indication of the workload needed to achieve the same level of accuracy of the solution (i.e., the same numerical error). For the problem considered here, we obtained a reduction of the time-to-solution by a factor of about 1.5 when using rhoCentralRK4Foam compared to rhoCentralFoam, owing to the use of larger time steps to reach the final solution with the same numerical accuracy. This factor will eventually be even larger for larger-scale simulations employing thousands or more cores, thanks to the better speedup of rhoCentralRK4Foam. The proposed solver is potentially a useful alternative for the simulation of compressible flows requiring time-accurate integration, as in direct or large-eddy simulations. Furthermore, the implementation in OpenFOAM can easily be generalized to different schemes of the Runge–Kutta family with minimal code modification.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by Argonne National Laboratory through grant ANL 4J-30361-0030A, titled "Multiscale modeling of complex flows." Numerical simulations were performed using the Director's Discretionary allocation "OF_SCALING" at the Leadership Computing Facility at Argonne National Laboratory.

Figure 10: Evolution of the total kinetic energy ε (rhoCentralFoam at several time steps, rhoCentralRK4Foam with Δt = 10⁻⁴ s, and the analytical solution).

Table 6: CPU time per time step, per grid point, per core, τ0.

               τ0,rhoCentral    τ0,rhoCentralRK4    τ0,rhoCentralRK4/τ0,rhoCentral
Re = 628       1.230e-06 s      3.590e-06 s         2.919
Re = 1256      1.120e-06 s      3.630e-06 s         3.241
Re = 1260      1.189e-06 s      3.626e-06 s         3.046
Re = 12600     1.187e-06 s      3.603e-06 s         3.035
Re = 126000    1.191e-06 s      3.622e-06 s         3.044

Table 7: Time-to-solution τ for rhoCentralFoam and rhoCentralRK4Foam.

               τrhoCentral (Δt = 2 × 10⁻⁵ s)    τrhoCentralRK4 (Δt = 10⁻⁴ s)    τrhoCentral/τrhoCentralRK4
Re = 628       848 s                            532 s                           1.59
Re = 1256      826 s                            539 s                           1.53

References

[1] H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, "A tensorial approach to computational continuum mechanics using object-oriented techniques," Computers in Physics, vol. 12, no. 6, pp. 620–631, 1998.
[2] V. Vuorinen, J.-P. Keskinen, C. Duwig, and B. J. Boersma, "On the implementation of low-dissipative Runge-Kutta projection methods for time dependent flows using OpenFOAM," Computers & Fluids, vol. 93, pp. 153–163, 2014.
[3] C. Shen, F. Sun, and X. Xia, "Implementation of density-based solver for all speeds in the framework of OpenFOAM," Computer Physics Communications, vol. 185, no. 10, pp. 2730–2741, 2014.
[4] D. Modesti and S. Pirozzoli, "A low-dissipative solver for turbulent compressible flows on unstructured meshes, with OpenFOAM implementation," Computers & Fluids, vol. 152, pp. 14–23, 2017.
[5] S. Li and R. Paoli, "Modeling of ice accretion over aircraft wings using a compressible OpenFOAM solver," International Journal of Aerospace Engineering, vol. 2019, Article ID 4864927, 11 pages, 2019.
[6] Q. Yang, P. Zhao, and H. Ge, "ReactingFoam-SCI: an open source CFD platform for reacting flow simulation," Computers & Fluids, vol. 190, pp. 114–127, 2019.
[7] M. Culpo, Current Bottlenecks in the Scalability of OpenFOAM on Massively Parallel Clusters, Partnership for Advanced Computing in Europe, Brussels, Belgium, 2012.
[8] O. Rivera and K. Furlinger, "Parallel aspects of OpenFOAM with large eddy simulations," in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 389–396, Banff, Canada, September 2011.
[9] A. Duran, M. S. Celebi, S. Piskin, and M. Tuncel, "Scalability of OpenFOAM for bio-medical flow simulations," The Journal of Supercomputing, vol. 71, no. 3, pp. 938–951, 2015.
[10] Z. Lin, W. Yang, H. Zhou, et al., "Communication optimization for multiphase flow solver in the library of OpenFOAM," Water, vol. 10, no. 10, pp. 1461–1529, 2018.
[11] R. Ojha, P. Pawar, S. Gupta, M. Klemm, and M. Nambiar, "Performance optimization of OpenFOAM on clusters of Intel Xeon Phi processors," in Proceedings of the 2017 IEEE 24th International Conference on High Performance Computing Workshops (HiPCW), Jaipur, India, December 2017.
[12] C. J. Greenshields, H. G. Weller, L. Gasparini, and J. M. Reese, "Implementation of semi-discrete, non-staggered central schemes in a colocated polyhedral finite volume framework for high-speed viscous flows," International Journal for Numerical Methods in Fluids, vol. 63, pp. 1–21, 2010.
[13] A. Kurganov and E. Tadmor, "New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations," Journal of Computational Physics, vol. 160, no. 1, pp. 241–282, 2000.
[14] A. Kurganov, S. Noelle, and G. Petrova, "Semidiscrete central-upwind schemes for hyperbolic conservation laws and Hamilton–Jacobi equations," SIAM Journal on Scientific Computing, vol. 23, no. 3, pp. 707–740, 2006.
[15] J. A. Heyns, O. F. Oxtoby, and A. Steenkamp, "Modelling high-speed viscous flow in OpenFOAM," in Proceedings of the 9th OpenFOAM Workshop, Zagreb, Croatia, June 2014.
[16] M. S. Liou and C. J. Steffen Jr., "A new flux splitting scheme," Journal of Computational Physics, vol. 107, pp. 23–39, 1993.
[17] D. Drikakis, M. Hahn, A. Mosedale, and B. Thornber, "Large eddy simulation using high-resolution and high-order methods," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1899, pp. 2985–2997, 2019.
[18] K. Harms, T. Leggett, B. Allen, et al., "Theta: rapid installation and acceptance of an XC40 KNL system," Concurrency and Computation: Practice and Experience, vol. 30, no. 1, p. 11, 2019.
[19] S. S. Shende and A. D. Malony, "The TAU parallel performance system," The International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[20] C. Chevalier and F. Pellegrini, "PT-Scotch: a tool for efficient parallel graph ordering," Parallel Computing, vol. 34, no. 6–8, pp. 318–331, 2008.
[21] V. Schmitt and F. Charpin, "Pressure distributions on the ONERA-M6-wing at transonic Mach numbers," in Experimental Data Base for Computer Program Assessment, Report of the Fluid Dynamics Panel Working Group 04, AGARD, Neuilly-sur-Seine, France, 1979.
[22] P. Woodward and P. Colella, "The numerical simulation of two-dimensional fluid flow with strong shocks," Journal of Computational Physics, vol. 54, no. 1, pp. 115–173, 1984.
[23] G. M. Amdahl, "Validity of the single processor approach to achieving large-scale computing capabilities," in Proceedings of the 1967 Spring Joint Computer Conference, AFIPS '67 (Spring), vol. 30, pp. 483–485, Atlantic City, NJ, USA, April 1967.
[24] G. Axtmann and U. Rist, "Scalability of OpenFOAM with large eddy simulations and DNS on high-performance systems," in High-Performance Computing in Science and Engineering, W. E. Nagel, Ed., pp. 413–425, Springer International Publishing, Berlin, Germany, 2016.
[25] A. Shah, L. Yuan, and S. Islam, "Numerical solution of unsteady Navier-Stokes equations on curvilinear meshes," Computers & Mathematics with Applications, vol. 63, no. 11, pp. 1548–1556, 2012.
[26] J. Kim and P. Moin, "Application of a fractional-step method to incompressible Navier-Stokes equations," Journal of Computational Physics, vol. 59, no. 2, pp. 308–323, 1985.
[27] A. Quarteroni, F. Saleri, and A. Veneziani, "Factorization methods for the numerical approximation of Navier-Stokes equations," Computer Methods in Applied Mechanics and Engineering, vol. 188, no. 1–3, pp. 505–526, 2000.


Page 7: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

normalizing with respect to the values obtained for 16 and 64ranks ττRanks 16 for 16 ranks and ττRanks 64 for 64ranks )e relative CPU time are recorded in Tables 4 and 5)e comparison of the two solvers is also plotted in Figures 6and 7 )e weak scaling tests give us indication about howwell does the two solvers scale when the problem size isincreased proportional to process count From Tables 4 and5 and Figures 6 and 7 it can be observed that for lower MPItasks (16 to 64) both of the two solvers scale reasonably wellHowever for higher MPI tasks (128 to 1024) the rho-CentralRK4Foam solver scales better Remarkably we canobserve a distinguishable relative time difference betweenthe two solvers as the number of ranks increases from 512 to1024 )e profiling results from TAU show that in the testcase with 1024 ranks rhoCentralRK4Foam spends around

40 time on computation while rhoCentralFoam only hasaround 20 time spent on computation As for the forward-facing step case we were able to conduct the tests at largercores count (up to 4096 cores) and it can be confirmed thatrhoCentralRK4Foam solver has better scaling performancethan rhoCentralFoam solver Indeed it scales better withlarger grid size and the relative time of rhoCentralRK4Foamsolver maintains at 1063 with 4096 cores (the relative time is1148 with 1024 cores in ONERA M6 wing case) Generallythe rhoCentralRK4Foam solver outperforms the rhoCen-tralFoam solver at large-scale ranks due to communicationhiding

rhoCentralFoam communicationrhoCentralRk4Foam communication

0

20

40

60

80

100

Tim

e per

cent

age (

)

600 1200 1800 2400 3000 3600 42000ranks

(a)

rhoCentralFoam communicationrhoCentralRk4Foam communication

0

20

40

60

80

100

Tim

e per

cent

age (

)

600 1200 1800 2400 3000 3600 42000ranks

(b)

Figure 4 Communication (a) and computation (b) time percentage for the strong scaling analysis (M6 test case Grid 3)

128 256 512 1024 20480

2

4

6

8

10

12

14

489

468

356

519

416

114

2

101

6

883 9

52

766

MPI

wai

tall

()

rhoCentralFoamrhoCentralRK4Foam

Figure 5 MPI waitall() time percentage for rhoCentralFoam andrhoCentralRK4Foam for M6 test case (Grid3) up to 2048 ranks

Table 4 Relative time results ττRanks 16 of weak scaling test(ONERA M6 wing)

Ranks rhoCentralFoam rhoCentralRK4Foam16 1 132 1085 106264 1106 1105128 1148 1057256 1191 1100512 1213 11581024 1255 1148

Table 5 Relative time results ττRanks 64 of the weak scaling test(forward-facing step)

Ranks rhoCentralFoam rhoCentralRK4Foam64 1 1128 1034 1002256 1085 1018512 1153 10311024 1186 10412048 1212 10674096 1203 1063

Scientific Programming 7

4 Results for the Time-to-Solution Analysis

)e scaling analysis is an important exercise to evaluate theparallel performance of the code on multiple cores howeverit does not provide insights into the time accuracy of thenumerical algorithm nor into the time-to-solution needed toevolve the discretized system of equations (equations(1)ndash(3)) up to a final state within an acceptable numericalerror For example in the scaling study conducted on theM6transonic wing the number of iterations was fixed for bothsolvers rhoCentralFoam and rhoCentralRK4Foam andcalculations were performed in the inviscid limit so that theimplicit solver used in rhoCentralFoam to integrate the

viscous fluxes was not active In such circumstancescomparing the two solvers essentially amounts at comparingfirst-order (Euler) and fourth-order forward in time Run-gendashKutta algorithms Hence it is not surprising that theCPU time for rhoCentralRK4Foam is about four times largerthan rhoCentralFoam (for the same number of iterations)since it requires four more evaluations of the right-hand-sideof equations (5) per time step For a sound evaluation of thetime-to-solution however we need to compare the twosolvers over the same physical time at which the errors withrespect to an exact solution are comparable Because the timeaccuracy of the solvers is different time step is also differentand so are the number of iterations required to achieve thefinal state

41 Test Case Description For the time-to-solution analysiswe consider the numerical simulation of TaylorndashGreen (TG)vortex a benchmark problem in CFD [25] for validatingunsteady flow solvers [26 27] requiring time accurate in-tegration scheme as the RungendashKutta scheme implementedin rhoCentralRK4Foam )ere is no turbulence model ap-plied in the current simulation )e TG vortex admits an-alytical time-dependent solutions which allows for precisedefinition of the numerical error of the algorithm withrespect to a given quantity of interest )e flow is initializedin a square domain 0lex yleL as follows

ρ(x y t) ρ0 (11)

u(x y 0) minusU0 cos kx sin ky (12)

v(x y 0) U0 sin kx cos ky (13)

p(x y 0) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky) (14)

where u and v are the velocity components along x and ydirections k equiv 2πL is the wave number and U0 ρ0 and p0are the arbitrary constant velocity density and pressurerespectively )e analytical solution for the initial-valueproblem defined by equations (11)ndash(14) is

ρ(x y t) ρ0 (15)

u(x y t) minusU0 cos kx sin kyeminus 2]tk2

(16)

v(x y t) U0 sin kx cos kyeminus 2]tk2

(17)

p(x y t) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky)eminus 4]tk2

(18)

which shows the velocity decay due to viscous dissipationAs a quantity of interest we chose to monitor both thevelocity profiles u at selected location and the overall kineticenergy in the computational box which can be readilyobtained from equations (16)ndash(18) as

ε 1L2 1113946

L

01113946

L

0

u2 + v2

2dxdy

U204

eminus 4]tk2

(19)

13

12

11

10

0 400ranks

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

800 1200

Figure 6 Relative time ττRanks 16 of weak scaling tests forONERA M6 wing case

0 1000 2000ranks

3000 4000 5000

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

12

11

10

Figure 7 Relative time ττRanks 64 of weak scaling tests forforward-facing step

8 Scientific Programming

42 Comparison of Time-to-Solution For the analysis ofrhoCentralFoam and rhoCentralRK4Foam solvers we con-sider fixed box size L 2πm and U0 1ms and two valuesof Reynolds number Re LU0v to test the effect of vis-cosity )e computation is done on a 160 times 160 pointsuniform grid )e simulations run up to t 058 s for Re

628 and t 116 s for Re 1256 which corresponds to thesame nondimensional time ttref 00147 with tref equiv L2]At this time the kinetic energy ε decreased to 10 from theinitial value as an illustrative example Figure 8 shows thevelocity magnitude contour for the Re 628 Figure 9presents the computed velocity profile u(y) in the middle

of the box x L2 rhoC stands for rhoCentralFoam solverand rhoCRK4 stands for rhoCentralRK4Foam solver It canbe observed that rhoCentralRK4Foam shows an excellentagreement with the analytical profile for Δt 10minus 4 s (re-ducing the time step further does provide any furtherconvergence for the grid used here) In order to achieve thesame level of accuracy with rhoCentralFoam time step hasto be reduced by a factor of 5 to Δt 2 times 10minus 5 s compared torhoCentralRK4Foam )e same behavior is observed for theevolution of ε as shown in Figure 10

In order to calculate the time-to-solution τ we firstevaluated the CPU time τ0 per time step per grid point per

0081275

016255

024383

4939e ndash 10

3251e ndash 01U magnitude

Y

ZX

Figure 8 Contour of velocity magnitude U equiv (u2 + v2)12 for the TaylorndashGreen vortex at ttref 00147 for the case Re 628

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

ndash030

ndash015

000

015

030

u

157 314 471 628000y

(a)

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

157 314 471 628000y

ndash030

ndash015

000

015

030

u

(b)

Figure 9 Velocity distribution in the TaylorndashGreen vortex problem at ttref 00147 (a)Re 628 (b)Re 1256

Scientific Programming 9

core for both solvers We used an Intel Xeon E5-2695v4processor installed on )eta and Bebop platforms atArgonne National Laboratory and reported the results of thetests in Table 6 Even though τ0 depends on the specificprocessor used but the ratio τ0rhoCentralRK4τ

0rhoCentral does not

and so is a good indication of the relative performance of thetwo solvers In the present case this ratio is consistentlyfound to be 3 for all Reynolds number we made additionaltests with much higher Reynolds up to Re sim 105 to confirmthis (see Table 6) Denoting by tf the physical final time ofthe simulation by Niter the number of iterations and byNpoints the number of grid points the time-to-solution forthe two solvers can be obtained as

τ τ0 times Niter times Npoints withNiter tf

Δt (20)

and is reported in Table 7 It is concluded that to achieve thesame level of accuracy τrhoCentralRK4 is about 15 to 16 timessmaller than τrhoCentral Additionally as we discussed in thescaling analysis rhoCentralRK4Foam can achieve up to123 improvement in scalability over rhoCentralFoamwhen using 4096 ranks )us the reduction in time-to-solution for rhoCentralRK4Foam is expected to be evenlarger for large-scale time-accurate simulations utilizingthousands of parallel cores

5 Conclusion

In this study we presented a new solver called rhoCen-tralRK4Foam for the integration of NavierndashStokes equationsin the CFD package OpenFOAM)e novelty consists in thereplacement of first-order time integration scheme of thenative solver rhoCentralFoam with a third-order Run-gendashKutta scheme which is more suitable for the simulationof time-dependent flows We first analyzed the scalability ofthe two solvers for a benchmark case featuring transonicflow over a wing on a structured mesh and showed thatrhoCentralRK4Foam can achieve a substantial improvement(up to 120) in strong scaling compared to rhoCen-tralFoam We also observed that the scalability becomesbetter as the problem size grows Hence even thoughOpenFOAM scalability is generally poor or decent at bestthe new solver can at least alleviate this deficiency We thenanalyzed the performance of the two solvers in the case ofTaylorndashGreen vortex decay and compared the time-to-so-lution which gives an indication of the workload needed toachieve the same level of accuracy of the solution (ie thesame numerical error) For the problem considered here weobtained a reduction of time-to-solution by a factor of about15 when using rhoCentralRK4Foam compared to rho-CentralFoam due to the use of larger time steps to get to thefinal solution with the same numerical accuracy )is factorwill be eventually even larger for larger-scale simulationsemploying thousands or more cores thanks to the betterspeedup of rhoCentralRK4Foam )e proposed solver ispotentially a useful alternative for the simulation of com-pressible flows requiring time-accurate integration like indirect or large-eddy simulations Furthermore the imple-mentation in OpenFOAM can be easily generalized todifferent schemes of RungendashKutta family with minimal codemodification

Data Availability

)e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

)e authors declare that they have no conflicts of interest

Acknowledgments

)is work was supported by Argonne National Laboratorythrough grant ANL 4J-30361-0030A titled ldquoMultiscale

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

000

005

010

015

020

025

ε

01 02 03 04 05 0600Time (s)

Figure 10 Evolution of total kinetic energy

Table 6 CPU time per time step per grid point per core τ0

τ0rhoCentral τ0rhoCentralRK4 τ0rhoCentralRK4τ0rhoCentral

Re 628 1230eminus 06 s 3590eminus 06 s 2919Re 1256 1120eminus 06 s 3630eminus 06 s 3241Re 1260 1189eminus 06 s 3626eminus 06 s 3046Re 12600 1187eminus 06 s 3603eminus 06 s 3035Re 126000 1191eminus 06 s 3622eminus 06 s 3044

Table 7 Time-to-solution τ for rhoCentralFoam and rhoCentral-RK4Foam

τrhoCentral τrhoCentralRK4 τrhoCentralτrhoCentralRK4Δt 2 times 10minus 5 s Δt 10minus 4 s

Re 628 848 s 532 s 159Re 1256 826 s 539 s 153

10 Scientific Programming

modeling of complex flowsrdquo Numerical simulations wereperformed using the Director Discretionary allocationldquoOF_SCALINGrdquo in the Leadership Computing Facility atArgonne National Laboratory

References

[1] H G Weller G Tabor H Jasak and C Fureby ldquoA tensorialapproach to computational continuum mechanics usingobject-oriented techniquesrdquo Computers in Physics vol 12no 6 pp 620ndash631 1998

[2] V Vuorinen J-P Keskinen C Duwig and B J BoersmaldquoOn the implementation of low-dissipative Runge-Kuttaprojection methods for time dependent flows using Open-FOAMrdquo Computers amp Fluids vol 93 pp 153ndash163 2014

[3] C Shen F Sun and X Xia ldquoImplementation of density-basedsolver for all speeds in the framework of openFOAMrdquoComputer Physics Communications vol 185 no 10pp 2730ndash2741 2014

[4] D Modesti and S Pirozzoli ldquoA low-dissipative solver forturbulent compressible flows on unstructured meshes withOpenFOAM implementationrdquo Computers amp Fluids vol 152pp 14ndash23 2017

[5] S Li and R Paoli ldquoModeling of ice accretion over aircraftwings using a compressible OpenFOAM solverrdquo Interna-tional Journal of Aerospace Engineering vol 2019 Article ID4864927 11 pages 2019

[6] Q Yang P Zhao and H Ge ldquoReactingFoam-SCI an opensource CFD platform for reacting flow simulationrdquo Com-puters amp Fluids vol 190 pp 114ndash127 2019

[7] M Culpo Current Bottlenecks in the Scalability of OpenFOAMon Massively Parallel Clusters Partnership for AdvancedComputing in Europe Brussels Belgium 2012

[8] O Rivera and K Furlinger ldquoParallel aspects of OpenFOAMwith large eddy simulationsrdquo in Proceedings of the 2011 IEEEInternational Conference on High Performance Computingand Communications pp 389ndash396 Banff Canada September2011

[9] A Duran M S Celebi S Piskin and M Tuncel ldquoScalabilityof OpenFOAM for bio-medical flow simulationsrdquoCe Journalof Supercomputing vol 71 no 3 pp 938ndash951 2015

[10] Z Lin W Yang H Zhou et al ldquoCommunication optimi-zation for multiphase flow solver in the library of Open-FOAMrdquo Water vol 10 no 10 pp 1461ndash1529 2018

[11] R Ojha P Pawar S Gupta M Klemm and M NambiarldquoPerformance optimization of OpenFOAM on clusters ofIntel Xeon Phitrade processorsrdquo in Proceedings of the 2017 IEEE24th International Conference on High Performance Com-puting Workshops (HiPCW) Jaipur India December 2017

[12] C J Greenshields H G Weller L Gasparini and J M ReeseldquoImplementation of semi-discrete non-staggered centralschemes in a colocated polyhedral finite volume frameworkfor high-speed viscous flowsrdquo International Journal for Nu-merical Methods in Fluids vol 63 pp 1ndash21 2010

[13] A Kurganov and E Tadmor ldquoNew high-resolution centralschemes for nonlinear conservation laws and convection-diffusion equationsrdquo Journal of Computational Physicsvol 160 no 1 pp 241ndash282 2000

[14] A Kurganov S Noelle and G Petrova ldquoSemidiscrete central-upwind schemes for hyperbolic conservation laws andHamiltonndashJacobi equationsrdquo SIAM Journal on ScientificComputing vol 23 no 3 pp 707ndash740 2006

[15] J A Heyns O F Oxtoby and A Steenkamp ldquoModellinghigh-speed viscous flow in OpenFOAMrdquo in Proceedings of the9th OpenFOAM Workshop Zagreb Croatia June 2014

[16] M S Liou and C J Steffen Jr ldquoA new flux splitting schemerdquoJournal of Computational Physics vol 107 no 23 p 39 1993

[17] D Drikakis M Hahn A Mosedale and B )ornber ldquoLargeeddy simulation using high-resolution and high-ordermethodsrdquo Philosophical Transactions of the Royal Society AMathematical Physical and Engineering Sciences vol 367no 1899 pp 2985ndash2997 2019

[18] K Harms T Leggett B Allen et al ldquo)eta rapid installationand acceptance of an XC40 KNL systemrdquo Concurrency andComputation Practice and Experience vol 30 no 1 p 112019

[19] S S Shende and A D Malony ldquo)e TAU parallel perfor-mance systemrdquo Ce International Journal of High Perfor-mance Computing Applications vol 20 no 2 pp 287ndash3112006

[20] C Chevalier and F Pellegrini ldquoPT-Scotch a tool for efficientparallel graph orderingrdquo Parallel Computing vol 34 no 6ndash8pp 318ndash331 2008

[21] V Schmitt and F Charpin ldquoPressure distributions on theONERA-M6-wing at transonic mach numbersrdquo in Experi-mental Data Base for Computer Program Assessment andReport of the Fluid Dynamics Panel Working Group 04AGARD Neuilly sur Seine France 1979

[22] P Woodward and P Colella ldquo)e numerical simulation oftwo-dimensional fluid flow with strong shocksrdquo Journal ofComputational Physics vol 54 no 1 pp 115ndash173 1984

[23] G M Amdahl ldquoValidity of the single processor approach toachieving large-scale computing capabilitiesrdquo in Proceedingsof the 1967 spring joint computer conference onmdashAFIPSrsquo67(Spring) vol 30 pp 483ndash485 Atlantic NJ USA April 1967

[24] G Axtmann and U Rist ldquoScalability of OpenFOAM withlarge-leddy simulations and DNS on high-performancesytemsrdquo in High-Performance Computing in Science andEngineering W E Nagel Ed Springer International Pub-lishing Berlin Germany pp 413ndash425 2016

[25] A Shah L Yuan and S Islam ldquoNumerical solution of un-steady Navier-Stokes equations on curvilinear meshesrdquoComputers amp Mathematics with Applications vol 63 no 11pp 1548ndash1556 2012

[26] J Kim and P Moin ldquoApplication of a fractional-step methodto incompressible Navier-Stokes equationsrdquo Journal ofComputational Physics vol 59 no 2 pp 308ndash323 1985

[27] A Quarteroni F Saleri and A Veneziani ldquoFactorizationmethods for the numerical approximation of Navier-Stokesequationsrdquo Computer Methods in Applied Mechanics andEngineering vol 188 no 1ndash3 pp 505ndash526 2000

Scientific Programming 11

Page 8: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

4 Results for the Time-to-Solution Analysis

)e scaling analysis is an important exercise to evaluate theparallel performance of the code on multiple cores howeverit does not provide insights into the time accuracy of thenumerical algorithm nor into the time-to-solution needed toevolve the discretized system of equations (equations(1)ndash(3)) up to a final state within an acceptable numericalerror For example in the scaling study conducted on theM6transonic wing the number of iterations was fixed for bothsolvers rhoCentralFoam and rhoCentralRK4Foam andcalculations were performed in the inviscid limit so that theimplicit solver used in rhoCentralFoam to integrate the

viscous fluxes was not active In such circumstancescomparing the two solvers essentially amounts at comparingfirst-order (Euler) and fourth-order forward in time Run-gendashKutta algorithms Hence it is not surprising that theCPU time for rhoCentralRK4Foam is about four times largerthan rhoCentralFoam (for the same number of iterations)since it requires four more evaluations of the right-hand-sideof equations (5) per time step For a sound evaluation of thetime-to-solution however we need to compare the twosolvers over the same physical time at which the errors withrespect to an exact solution are comparable Because the timeaccuracy of the solvers is different time step is also differentand so are the number of iterations required to achieve thefinal state

41 Test Case Description For the time-to-solution analysiswe consider the numerical simulation of TaylorndashGreen (TG)vortex a benchmark problem in CFD [25] for validatingunsteady flow solvers [26 27] requiring time accurate in-tegration scheme as the RungendashKutta scheme implementedin rhoCentralRK4Foam )ere is no turbulence model ap-plied in the current simulation )e TG vortex admits an-alytical time-dependent solutions which allows for precisedefinition of the numerical error of the algorithm withrespect to a given quantity of interest )e flow is initializedin a square domain 0lex yleL as follows

ρ(x y t) ρ0 (11)

u(x y 0) minusU0 cos kx sin ky (12)

v(x y 0) U0 sin kx cos ky (13)

p(x y 0) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky) (14)

where u and v are the velocity components along x and ydirections k equiv 2πL is the wave number and U0 ρ0 and p0are the arbitrary constant velocity density and pressurerespectively )e analytical solution for the initial-valueproblem defined by equations (11)ndash(14) is

ρ(x y t) ρ0 (15)

u(x y t) minusU0 cos kx sin kyeminus 2]tk2

(16)

v(x y t) U0 sin kx cos kyeminus 2]tk2

(17)

p(x y t) p0 minusρ0U2

04

(cos 2 kx + cos 2 ky)eminus 4]tk2

(18)

which shows the velocity decay due to viscous dissipationAs a quantity of interest we chose to monitor both thevelocity profiles u at selected location and the overall kineticenergy in the computational box which can be readilyobtained from equations (16)ndash(18) as

ε 1L2 1113946

L

01113946

L

0

u2 + v2

2dxdy

U204

eminus 4]tk2

(19)

13

12

11

10

0 400ranks

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

800 1200

Figure 6 Relative time ττRanks 16 of weak scaling tests forONERA M6 wing case

0 1000 2000ranks

3000 4000 5000

Rela

tive t

ime

rhoCentralFoamrhoCentralRK4Foam

12

11

10

Figure 7 Relative time ττRanks 64 of weak scaling tests forforward-facing step

8 Scientific Programming

42 Comparison of Time-to-Solution For the analysis ofrhoCentralFoam and rhoCentralRK4Foam solvers we con-sider fixed box size L 2πm and U0 1ms and two valuesof Reynolds number Re LU0v to test the effect of vis-cosity )e computation is done on a 160 times 160 pointsuniform grid )e simulations run up to t 058 s for Re

628 and t 116 s for Re 1256 which corresponds to thesame nondimensional time ttref 00147 with tref equiv L2]At this time the kinetic energy ε decreased to 10 from theinitial value as an illustrative example Figure 8 shows thevelocity magnitude contour for the Re 628 Figure 9presents the computed velocity profile u(y) in the middle

of the box x L2 rhoC stands for rhoCentralFoam solverand rhoCRK4 stands for rhoCentralRK4Foam solver It canbe observed that rhoCentralRK4Foam shows an excellentagreement with the analytical profile for Δt 10minus 4 s (re-ducing the time step further does provide any furtherconvergence for the grid used here) In order to achieve thesame level of accuracy with rhoCentralFoam time step hasto be reduced by a factor of 5 to Δt 2 times 10minus 5 s compared torhoCentralRK4Foam )e same behavior is observed for theevolution of ε as shown in Figure 10

In order to calculate the time-to-solution τ we firstevaluated the CPU time τ0 per time step per grid point per

0081275

016255

024383

4939e ndash 10

3251e ndash 01U magnitude

Y

ZX

Figure 8 Contour of velocity magnitude U equiv (u2 + v2)12 for the TaylorndashGreen vortex at ttref 00147 for the case Re 628

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

ndash030

ndash015

000

015

030

u

157 314 471 628000y

(a)

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

157 314 471 628000y

ndash030

ndash015

000

015

030

u

(b)

Figure 9 Velocity distribution in the TaylorndashGreen vortex problem at ttref 00147 (a)Re 628 (b)Re 1256

Scientific Programming 9

core for both solvers We used an Intel Xeon E5-2695v4processor installed on )eta and Bebop platforms atArgonne National Laboratory and reported the results of thetests in Table 6 Even though τ0 depends on the specificprocessor used but the ratio τ0rhoCentralRK4τ

0rhoCentral does not

and so is a good indication of the relative performance of thetwo solvers In the present case this ratio is consistentlyfound to be 3 for all Reynolds number we made additionaltests with much higher Reynolds up to Re sim 105 to confirmthis (see Table 6) Denoting by tf the physical final time ofthe simulation by Niter the number of iterations and byNpoints the number of grid points the time-to-solution forthe two solvers can be obtained as

τ τ0 times Niter times Npoints withNiter tf

Δt (20)

and is reported in Table 7 It is concluded that to achieve thesame level of accuracy τrhoCentralRK4 is about 15 to 16 timessmaller than τrhoCentral Additionally as we discussed in thescaling analysis rhoCentralRK4Foam can achieve up to123 improvement in scalability over rhoCentralFoamwhen using 4096 ranks )us the reduction in time-to-solution for rhoCentralRK4Foam is expected to be evenlarger for large-scale time-accurate simulations utilizingthousands of parallel cores

5 Conclusion

In this study we presented a new solver called rhoCen-tralRK4Foam for the integration of NavierndashStokes equationsin the CFD package OpenFOAM)e novelty consists in thereplacement of first-order time integration scheme of thenative solver rhoCentralFoam with a third-order Run-gendashKutta scheme which is more suitable for the simulationof time-dependent flows We first analyzed the scalability ofthe two solvers for a benchmark case featuring transonicflow over a wing on a structured mesh and showed thatrhoCentralRK4Foam can achieve a substantial improvement(up to 120) in strong scaling compared to rhoCen-tralFoam We also observed that the scalability becomesbetter as the problem size grows Hence even thoughOpenFOAM scalability is generally poor or decent at bestthe new solver can at least alleviate this deficiency We thenanalyzed the performance of the two solvers in the case ofTaylorndashGreen vortex decay and compared the time-to-so-lution which gives an indication of the workload needed toachieve the same level of accuracy of the solution (ie thesame numerical error) For the problem considered here weobtained a reduction of time-to-solution by a factor of about15 when using rhoCentralRK4Foam compared to rho-CentralFoam due to the use of larger time steps to get to thefinal solution with the same numerical accuracy )is factorwill be eventually even larger for larger-scale simulationsemploying thousands or more cores thanks to the betterspeedup of rhoCentralRK4Foam )e proposed solver ispotentially a useful alternative for the simulation of com-pressible flows requiring time-accurate integration like indirect or large-eddy simulations Furthermore the imple-mentation in OpenFOAM can be easily generalized todifferent schemes of RungendashKutta family with minimal codemodification

Data Availability

)e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

)e authors declare that they have no conflicts of interest

Acknowledgments

)is work was supported by Argonne National Laboratorythrough grant ANL 4J-30361-0030A titled ldquoMultiscale

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

000

005

010

015

020

025

ε

01 02 03 04 05 0600Time (s)

Figure 10 Evolution of total kinetic energy

Table 6 CPU time per time step per grid point per core τ0

τ0rhoCentral τ0rhoCentralRK4 τ0rhoCentralRK4τ0rhoCentral

Re 628 1230eminus 06 s 3590eminus 06 s 2919Re 1256 1120eminus 06 s 3630eminus 06 s 3241Re 1260 1189eminus 06 s 3626eminus 06 s 3046Re 12600 1187eminus 06 s 3603eminus 06 s 3035Re 126000 1191eminus 06 s 3622eminus 06 s 3044

Table 7 Time-to-solution τ for rhoCentralFoam and rhoCentral-RK4Foam

τrhoCentral τrhoCentralRK4 τrhoCentralτrhoCentralRK4Δt 2 times 10minus 5 s Δt 10minus 4 s

Re 628 848 s 532 s 159Re 1256 826 s 539 s 153

10 Scientific Programming

modeling of complex flowsrdquo Numerical simulations wereperformed using the Director Discretionary allocationldquoOF_SCALINGrdquo in the Leadership Computing Facility atArgonne National Laboratory

References

[1] H G Weller G Tabor H Jasak and C Fureby ldquoA tensorialapproach to computational continuum mechanics usingobject-oriented techniquesrdquo Computers in Physics vol 12no 6 pp 620ndash631 1998

[2] V Vuorinen J-P Keskinen C Duwig and B J BoersmaldquoOn the implementation of low-dissipative Runge-Kuttaprojection methods for time dependent flows using Open-FOAMrdquo Computers amp Fluids vol 93 pp 153ndash163 2014

[3] C Shen F Sun and X Xia ldquoImplementation of density-basedsolver for all speeds in the framework of openFOAMrdquoComputer Physics Communications vol 185 no 10pp 2730ndash2741 2014

[4] D Modesti and S Pirozzoli ldquoA low-dissipative solver forturbulent compressible flows on unstructured meshes withOpenFOAM implementationrdquo Computers amp Fluids vol 152pp 14ndash23 2017

[5] S Li and R Paoli ldquoModeling of ice accretion over aircraftwings using a compressible OpenFOAM solverrdquo Interna-tional Journal of Aerospace Engineering vol 2019 Article ID4864927 11 pages 2019

[6] Q Yang P Zhao and H Ge ldquoReactingFoam-SCI an opensource CFD platform for reacting flow simulationrdquo Com-puters amp Fluids vol 190 pp 114ndash127 2019

[7] M Culpo Current Bottlenecks in the Scalability of OpenFOAMon Massively Parallel Clusters Partnership for AdvancedComputing in Europe Brussels Belgium 2012

[8] O Rivera and K Furlinger ldquoParallel aspects of OpenFOAMwith large eddy simulationsrdquo in Proceedings of the 2011 IEEEInternational Conference on High Performance Computingand Communications pp 389ndash396 Banff Canada September2011

[9] A Duran M S Celebi S Piskin and M Tuncel ldquoScalabilityof OpenFOAM for bio-medical flow simulationsrdquoCe Journalof Supercomputing vol 71 no 3 pp 938ndash951 2015

[10] Z Lin W Yang H Zhou et al ldquoCommunication optimi-zation for multiphase flow solver in the library of Open-FOAMrdquo Water vol 10 no 10 pp 1461ndash1529 2018

[11] R Ojha P Pawar S Gupta M Klemm and M NambiarldquoPerformance optimization of OpenFOAM on clusters ofIntel Xeon Phitrade processorsrdquo in Proceedings of the 2017 IEEE24th International Conference on High Performance Com-puting Workshops (HiPCW) Jaipur India December 2017

[12] C J Greenshields H G Weller L Gasparini and J M ReeseldquoImplementation of semi-discrete non-staggered centralschemes in a colocated polyhedral finite volume frameworkfor high-speed viscous flowsrdquo International Journal for Nu-merical Methods in Fluids vol 63 pp 1ndash21 2010

[13] A Kurganov and E Tadmor ldquoNew high-resolution centralschemes for nonlinear conservation laws and convection-diffusion equationsrdquo Journal of Computational Physicsvol 160 no 1 pp 241ndash282 2000

[14] A Kurganov S Noelle and G Petrova ldquoSemidiscrete central-upwind schemes for hyperbolic conservation laws andHamiltonndashJacobi equationsrdquo SIAM Journal on ScientificComputing vol 23 no 3 pp 707ndash740 2006

[15] J A Heyns O F Oxtoby and A Steenkamp ldquoModellinghigh-speed viscous flow in OpenFOAMrdquo in Proceedings of the9th OpenFOAM Workshop Zagreb Croatia June 2014

[16] M S Liou and C J Steffen Jr ldquoA new flux splitting schemerdquoJournal of Computational Physics vol 107 no 23 p 39 1993

[17] D Drikakis M Hahn A Mosedale and B )ornber ldquoLargeeddy simulation using high-resolution and high-ordermethodsrdquo Philosophical Transactions of the Royal Society AMathematical Physical and Engineering Sciences vol 367no 1899 pp 2985ndash2997 2019

[18] K Harms T Leggett B Allen et al ldquo)eta rapid installationand acceptance of an XC40 KNL systemrdquo Concurrency andComputation Practice and Experience vol 30 no 1 p 112019

[19] S S Shende and A D Malony ldquo)e TAU parallel perfor-mance systemrdquo Ce International Journal of High Perfor-mance Computing Applications vol 20 no 2 pp 287ndash3112006

[20] C Chevalier and F Pellegrini ldquoPT-Scotch a tool for efficientparallel graph orderingrdquo Parallel Computing vol 34 no 6ndash8pp 318ndash331 2008

[21] V Schmitt and F Charpin ldquoPressure distributions on theONERA-M6-wing at transonic mach numbersrdquo in Experi-mental Data Base for Computer Program Assessment andReport of the Fluid Dynamics Panel Working Group 04AGARD Neuilly sur Seine France 1979

[22] P Woodward and P Colella ldquo)e numerical simulation oftwo-dimensional fluid flow with strong shocksrdquo Journal ofComputational Physics vol 54 no 1 pp 115ndash173 1984

[23] G M Amdahl ldquoValidity of the single processor approach toachieving large-scale computing capabilitiesrdquo in Proceedingsof the 1967 spring joint computer conference onmdashAFIPSrsquo67(Spring) vol 30 pp 483ndash485 Atlantic NJ USA April 1967

[24] G Axtmann and U Rist ldquoScalability of OpenFOAM withlarge-leddy simulations and DNS on high-performancesytemsrdquo in High-Performance Computing in Science andEngineering W E Nagel Ed Springer International Pub-lishing Berlin Germany pp 413ndash425 2016

[25] A Shah L Yuan and S Islam ldquoNumerical solution of un-steady Navier-Stokes equations on curvilinear meshesrdquoComputers amp Mathematics with Applications vol 63 no 11pp 1548ndash1556 2012

[26] J Kim and P Moin ldquoApplication of a fractional-step methodto incompressible Navier-Stokes equationsrdquo Journal ofComputational Physics vol 59 no 2 pp 308ndash323 1985

[27] A Quarteroni F Saleri and A Veneziani ldquoFactorizationmethods for the numerical approximation of Navier-Stokesequationsrdquo Computer Methods in Applied Mechanics andEngineering vol 188 no 1ndash3 pp 505ndash526 2000

Scientific Programming 11

Page 9: ScalabilityofOpenFOAMDensity-BasedSolverwithRunge–Kutta ...downloads.hindawi.com/journals/sp/2020/9083620.pdfmethods in OpenFOAM for transient flow calculations. eseinclude,forexample,numericalalgorithmsforDNS

42 Comparison of Time-to-Solution For the analysis ofrhoCentralFoam and rhoCentralRK4Foam solvers we con-sider fixed box size L 2πm and U0 1ms and two valuesof Reynolds number Re LU0v to test the effect of vis-cosity )e computation is done on a 160 times 160 pointsuniform grid )e simulations run up to t 058 s for Re

628 and t 116 s for Re 1256 which corresponds to thesame nondimensional time ttref 00147 with tref equiv L2]At this time the kinetic energy ε decreased to 10 from theinitial value as an illustrative example Figure 8 shows thevelocity magnitude contour for the Re 628 Figure 9presents the computed velocity profile u(y) in the middle

of the box x L2 rhoC stands for rhoCentralFoam solverand rhoCRK4 stands for rhoCentralRK4Foam solver It canbe observed that rhoCentralRK4Foam shows an excellentagreement with the analytical profile for Δt 10minus 4 s (re-ducing the time step further does provide any furtherconvergence for the grid used here) In order to achieve thesame level of accuracy with rhoCentralFoam time step hasto be reduced by a factor of 5 to Δt 2 times 10minus 5 s compared torhoCentralRK4Foam )e same behavior is observed for theevolution of ε as shown in Figure 10

In order to calculate the time-to-solution τ we firstevaluated the CPU time τ0 per time step per grid point per

0081275

016255

024383

4939e ndash 10

3251e ndash 01U magnitude

Y

ZX

Figure 8 Contour of velocity magnitude U equiv (u2 + v2)12 for the TaylorndashGreen vortex at ttref 00147 for the case Re 628

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

ndash030

ndash015

000

015

030

u

157 314 471 628000y

(a)

rhoC-dt = 1e ndash 4srhoC-dt = 5e ndash 5srhoC-dt = 25e ndash 5s

rhoC-dt = 2e ndash 5srhoCRK4-dt = 1e ndash 4sAnalytical

157 314 471 628000y

ndash030

ndash015

000

015

030

u

(b)

Figure 9 Velocity distribution in the TaylorndashGreen vortex problem at ttref 00147 (a)Re 628 (b)Re 1256

Scientific Programming 9

core for both solvers We used an Intel Xeon E5-2695v4processor installed on )eta and Bebop platforms atArgonne National Laboratory and reported the results of thetests in Table 6 Even though τ0 depends on the specificprocessor used but the ratio τ0rhoCentralRK4τ

0rhoCentral does not

and so is a good indication of the relative performance of thetwo solvers In the present case this ratio is consistentlyfound to be 3 for all Reynolds number we made additionaltests with much higher Reynolds up to Re sim 105 to confirmthis (see Table 6) Denoting by tf the physical final time ofthe simulation by Niter the number of iterations and byNpoints the number of grid points the time-to-solution forthe two solvers can be obtained as

τ τ0 times Niter times Npoints withNiter tf

Δt (20)

and is reported in Table 7 It is concluded that to achieve thesame level of accuracy τrhoCentralRK4 is about 15 to 16 timessmaller than τrhoCentral Additionally as we discussed in thescaling analysis rhoCentralRK4Foam can achieve up to123 improvement in scalability over rhoCentralFoamwhen using 4096 ranks )us the reduction in time-to-solution for rhoCentralRK4Foam is expected to be evenlarger for large-scale time-accurate simulations utilizingthousands of parallel cores

5. Conclusion

In this study, we presented a new solver, called rhoCentralRK4Foam, for the integration of the Navier–Stokes equations in the CFD package OpenFOAM. The novelty consists in the replacement of the first-order time integration scheme of the native solver rhoCentralFoam with a third-order Runge–Kutta scheme, which is more suitable for the simulation of time-dependent flows. We first analyzed the scalability of the two solvers for a benchmark case featuring transonic flow over a wing on a structured mesh and showed that rhoCentralRK4Foam can achieve a substantial improvement (up to 120%) in strong scaling compared to rhoCentralFoam. We also observed that the scalability improves as the problem size grows. Hence, even though OpenFOAM scalability is generally poor or decent at best, the new solver can at least alleviate this deficiency. We then analyzed the performance of the two solvers for the Taylor–Green vortex decay and compared the time-to-solution, which gives an indication of the workload needed to achieve the same level of accuracy of the solution (i.e., the same numerical error). For the problem considered here, we obtained a reduction of the time-to-solution by a factor of about 1.5 when using rhoCentralRK4Foam compared to rhoCentralFoam, owing to the larger time steps that can be used to reach the final solution with the same numerical accuracy. This factor will eventually be even larger for larger-scale simulations employing thousands or more cores, thanks to the better speedup of rhoCentralRK4Foam. The proposed solver is therefore a potentially useful alternative for the simulation of compressible flows requiring time-accurate integration, as in direct or large-eddy simulations. Furthermore, the implementation in OpenFOAM can easily be generalized to different schemes of the Runge–Kutta family with minimal code modification; a generic sketch of this idea is given below.
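The following sketch is a minimal, self-contained illustration of that last point and is not taken from the OpenFOAM implementation: it writes a multi-stage Runge–Kutta update as a loop over a stage-coefficient array, so that switching to another member of the family reduces to changing that array. The coefficients shown (1/4, 1/3, 1/2, 1, a classic four-stage Jameson-type set) and the toy ODE du/dt = −u are our own choices for illustration only.

// Generic, self-contained sketch (not OpenFOAM code) of a multi-stage
// Runge-Kutta update written as a loop over a stage-coefficient array:
// switching to another member of the family only means changing 'alpha'.
#include <cmath>
#include <cstdio>
#include <vector>

// Right-hand side of the toy ODE du/dt = -u; in a flow solver this role
// would be played by the discretized convective and diffusive fluxes
// evaluated at each stage.
static double rhs(double u) { return -u; }

int main() {
    const std::vector<double> alpha = {0.25, 1.0 / 3.0, 0.5, 1.0};  // stage coefficients
    const double dt = 1.0e-2;
    const int nSteps = 100;  // integrate to t = 1

    double u = 1.0;          // initial condition u(0) = 1
    for (int n = 0; n < nSteps; ++n) {
        const double uOld = u;            // state at the start of the step
        for (double a : alpha) {
            u = uOld + a * dt * rhs(u);   // stage k: u_k = u_n + alpha_k * dt * R(u_{k-1})
        }
    }
    std::printf("numerical u(1) = %.6f, exact exp(-1) = %.6f\n", u, std::exp(-1.0));
    return 0;
}

Only the coefficient array and the flux evaluation change between schemes, which is the sense in which the generalization requires minimal code modification.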

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by Argonne National Laboratory through grant ANL 4J-30361-0030A titled "Multiscale modeling of complex flows." Numerical simulations were performed using the Director's Discretionary allocation "OF_SCALING" in the Leadership Computing Facility at Argonne National Laboratory.

Figure 10: Evolution of the total kinetic energy ε with time, comparing rhoCentralFoam (rhoC) at Δt = 10⁻⁴, 5 × 10⁻⁵, 2.5 × 10⁻⁵, and 2 × 10⁻⁵ s, rhoCentralRK4Foam (rhoCRK4) at Δt = 10⁻⁴ s, and the analytical solution.

Table 6: CPU time per time step per grid point per core, τ0.

Re        τ0,rhoCentral    τ0,rhoCentralRK4    τ0,rhoCentralRK4/τ0,rhoCentral
628       1.230e−06 s      3.590e−06 s         2.919
1256      1.120e−06 s      3.630e−06 s         3.241
1260      1.189e−06 s      3.626e−06 s         3.046
12600     1.187e−06 s      3.603e−06 s         3.035
126000    1.191e−06 s      3.622e−06 s         3.044

Table 7: Time-to-solution τ for rhoCentralFoam (Δt = 2 × 10⁻⁵ s) and rhoCentralRK4Foam (Δt = 10⁻⁴ s).

Re      τrhoCentral    τrhoCentralRK4    τrhoCentral/τrhoCentralRK4
628     848 s          532 s             1.59
1256    826 s          539 s             1.53


References

[1] H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, "A tensorial approach to computational continuum mechanics using object-oriented techniques," Computers in Physics, vol. 12, no. 6, pp. 620–631, 1998.
[2] V. Vuorinen, J.-P. Keskinen, C. Duwig, and B. J. Boersma, "On the implementation of low-dissipative Runge–Kutta projection methods for time dependent flows using OpenFOAM," Computers & Fluids, vol. 93, pp. 153–163, 2014.
[3] C. Shen, F. Sun, and X. Xia, "Implementation of density-based solver for all speeds in the framework of OpenFOAM," Computer Physics Communications, vol. 185, no. 10, pp. 2730–2741, 2014.
[4] D. Modesti and S. Pirozzoli, "A low-dissipative solver for turbulent compressible flows on unstructured meshes, with OpenFOAM implementation," Computers & Fluids, vol. 152, pp. 14–23, 2017.
[5] S. Li and R. Paoli, "Modeling of ice accretion over aircraft wings using a compressible OpenFOAM solver," International Journal of Aerospace Engineering, vol. 2019, Article ID 4864927, 11 pages, 2019.
[6] Q. Yang, P. Zhao, and H. Ge, "ReactingFoam-SCI: an open source CFD platform for reacting flow simulation," Computers & Fluids, vol. 190, pp. 114–127, 2019.
[7] M. Culpo, Current Bottlenecks in the Scalability of OpenFOAM on Massively Parallel Clusters, Partnership for Advanced Computing in Europe, Brussels, Belgium, 2012.
[8] O. Rivera and K. Furlinger, "Parallel aspects of OpenFOAM with large eddy simulations," in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 389–396, Banff, Canada, September 2011.
[9] A. Duran, M. S. Celebi, S. Piskin, and M. Tuncel, "Scalability of OpenFOAM for bio-medical flow simulations," The Journal of Supercomputing, vol. 71, no. 3, pp. 938–951, 2015.
[10] Z. Lin, W. Yang, H. Zhou et al., "Communication optimization for multiphase flow solver in the library of OpenFOAM," Water, vol. 10, no. 10, pp. 1461–1529, 2018.
[11] R. Ojha, P. Pawar, S. Gupta, M. Klemm, and M. Nambiar, "Performance optimization of OpenFOAM on clusters of Intel Xeon Phi processors," in Proceedings of the 2017 IEEE 24th International Conference on High Performance Computing Workshops (HiPCW), Jaipur, India, December 2017.
[12] C. J. Greenshields, H. G. Weller, L. Gasparini, and J. M. Reese, "Implementation of semi-discrete, non-staggered central schemes in a colocated polyhedral finite volume framework for high-speed viscous flows," International Journal for Numerical Methods in Fluids, vol. 63, pp. 1–21, 2010.
[13] A. Kurganov and E. Tadmor, "New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations," Journal of Computational Physics, vol. 160, no. 1, pp. 241–282, 2000.
[14] A. Kurganov, S. Noelle, and G. Petrova, "Semidiscrete central-upwind schemes for hyperbolic conservation laws and Hamilton–Jacobi equations," SIAM Journal on Scientific Computing, vol. 23, no. 3, pp. 707–740, 2006.
[15] J. A. Heyns, O. F. Oxtoby, and A. Steenkamp, "Modelling high-speed viscous flow in OpenFOAM," in Proceedings of the 9th OpenFOAM Workshop, Zagreb, Croatia, June 2014.
[16] M. S. Liou and C. J. Steffen Jr., "A new flux splitting scheme," Journal of Computational Physics, vol. 107, no. 1, pp. 23–39, 1993.
[17] D. Drikakis, M. Hahn, A. Mosedale, and B. Thornber, "Large eddy simulation using high-resolution and high-order methods," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1899, pp. 2985–2997, 2019.
[18] K. Harms, T. Leggett, B. Allen et al., "Theta: rapid installation and acceptance of an XC40 KNL system," Concurrency and Computation: Practice and Experience, vol. 30, no. 1, p. 11, 2019.
[19] S. S. Shende and A. D. Malony, "The TAU parallel performance system," The International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[20] C. Chevalier and F. Pellegrini, "PT-Scotch: a tool for efficient parallel graph ordering," Parallel Computing, vol. 34, no. 6–8, pp. 318–331, 2008.
[21] V. Schmitt and F. Charpin, "Pressure distributions on the ONERA-M6-wing at transonic Mach numbers," in Experimental Data Base for Computer Program Assessment, Report of the Fluid Dynamics Panel Working Group 04, AGARD, Neuilly-sur-Seine, France, 1979.
[22] P. Woodward and P. Colella, "The numerical simulation of two-dimensional fluid flow with strong shocks," Journal of Computational Physics, vol. 54, no. 1, pp. 115–173, 1984.
[23] G. M. Amdahl, "Validity of the single processor approach to achieving large-scale computing capabilities," in Proceedings of the 1967 Spring Joint Computer Conference (AFIPS '67 Spring), vol. 30, pp. 483–485, Atlantic City, NJ, USA, April 1967.
[24] G. Axtmann and U. Rist, "Scalability of OpenFOAM with large eddy simulations and DNS on high-performance systems," in High-Performance Computing in Science and Engineering, W. E. Nagel, Ed., pp. 413–425, Springer International Publishing, Berlin, Germany, 2016.
[25] A. Shah, L. Yuan, and S. Islam, "Numerical solution of unsteady Navier-Stokes equations on curvilinear meshes," Computers & Mathematics with Applications, vol. 63, no. 11, pp. 1548–1556, 2012.
[26] J. Kim and P. Moin, "Application of a fractional-step method to incompressible Navier-Stokes equations," Journal of Computational Physics, vol. 59, no. 2, pp. 308–323, 1985.
[27] A. Quarteroni, F. Saleri, and A. Veneziani, "Factorization methods for the numerical approximation of Navier-Stokes equations," Computer Methods in Applied Mechanics and Engineering, vol. 188, no. 1–3, pp. 505–526, 2000.
