
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG
TECHNISCHE FAKULTÄT • DEPARTMENT INFORMATIK

Lehrstuhl für Informatik 10 (Systemsimulation)

Numerical Study of Grid Refinement Techniques for the Lattice Boltzmann Method

Nicolas Krieg

Bachelor Thesis


Examiner: Prof. Dr. U. Rüde
Advisor: C. Rettinger, M.Sc.(hons)
Working period: 7 May 2019 to 7 October 2019

Declaration:

I affirm that I have written this thesis without outside help and without using any sources other than those indicated, and that the thesis has not been submitted in the same or a similar form to any other examination authority and been accepted by it as part of an examination. All passages that were adopted verbatim or in substance are marked as such.

For purposes of research and teaching, the University of Erlangen-Nürnberg, represented by the Chair for System Simulation (Informatik 10), is granted a simple, free-of-charge right of use, unlimited in time and place, to the results of the Bachelor Thesis, including any property rights and copyrights.

Erlangen, 5 October 2019

Contents

1 Introduction

2 The Lattice Boltzmann Method
  2.1 Basic Principles
  2.2 Boundary Treatment and Parallelization Concepts

3 Grid Refinement
  3.1 Non-Uniform Grids
  3.2 Timestep Algorithm for Non-Uniform Grids
  3.3 Communication Schemes
    3.3.1 Equal Level Communication
    3.3.2 Coarse to Fine Communication
    3.3.3 Fine to Coarse Communication
  3.4 Grid Refinement Techniques
  3.5 Homogeneous Distribution
  3.6 Linear Interpolation
  3.7 Compact Interpolation

4 Implementation in waLBerla
  4.1 Existing Techniques
  4.2 Newly Implemented Techniques

5 Validation of Test Cases
  5.1 Homogeneous Distribution
  5.2 Linear Interpolation
  5.3 Compact Interpolation
  5.4 Large Couette Test

6 Conclusion


List of Figures

1 A Lattice Boltzmann simulation with a refined grid.
2 D2Q9 model
3 The basic timestep algorithm
4 Graphical explanation of the no-slip boundary condition
5 A 2D grid with a maximal refinement of n = 2.
6 Communication process where two blocks send their information to a neighboring block.
7 Schematic representation of one coarse step and two fine steps in two neighboring blocks with a different level of refinement. (Idea: [Sch18])
8 An intersection in 2D with three fine and one coarse block.
9 Minimal requirements of pdf data (purple) which needs to be sent when there is a coarse neighbor.
10 The orange cells inside the orange marker are the coarse cells that need to be communicated to the fine block located in the middle.
11 Two fine blocks sending their ghost layer cells to a coarse block.
12 The amount of particle distribution values being sent into the coarse cell.
13 Assigning coarse cells from the fine ghost layer cells in 2D.
14 Assigning fine ghost layer cells from the coarse cells in 2D.
15 Flow diagram of LinearExplosion(). Rectangular boxes represent statements. Trapezoids are for-each loops and diamond-shaped boxes are if conditions. The round boxes declare the inside of a loop, where the iterating element can be found in the box's top left corner.
16 Flow diagram of LinearInterpolation()
17 Flow diagram of LinearExplosion().
18 Visual example of the trilinear interpolation for equations (29).
19 The orange area can be interpolated after receiving the coarse values.
20 The interpolation area (orange and purple) when creating additional coarse points (purple).
21 Some coarse values cannot be created.
22 The current possible area of interpolation.
23 A Couette flow is a linear flow profile that develops between one moving and one stationary surface.
24 The left picture shows the blocks of the domain and the right picture shows the cells inside the domain.
25 The left picture shows the blocks of the domain and the right picture shows the cells inside the domain.
26 The velocity in x-direction after 650 timesteps for test cases H and V.
27 The velocity in x-direction after 650 timesteps for test case V on the left and for test case H on the right.
28 The velocity in x-direction after 650 timesteps for test case V on the fine and coarse grid.
29 The velocity in x-direction after one timestep in test case V.
30 The velocity in z-direction after one timestep in test case V.
31 The velocity in x-direction after two timesteps in test case V.
32 The velocity in z-direction after two timesteps in test case V.
33 The velocity in z-direction after 650 timesteps in test case V.
34 The velocity in x-direction after 650 timesteps in test case V.
35 The absolute error after 650 timesteps in test case V.
36 The velocity in x-direction after 650 timesteps in test case V. The second image is an enlargement of the plot on the left to better showcase the velocity deviation.
37 The velocity in z-direction after 1 timestep in test case V. The velocity is exactly the same as in figure 29.
38 The velocity in z-direction after 2 timesteps in test case V.
39 The velocity in z-direction after 650 timesteps in test case V.
40 The velocity in x-direction after 650 timesteps in test case V. The second image is an enlargement of the plot on the left to better showcase the velocity deviation.
41 The velocity in x-direction after 650 timesteps in the horizontally refined domain.
42 On the left is the ideal distribution of values in the ghost layers. On the right is the current distribution of values.
43 The velocity in x-direction after 650 timesteps in test case V.
44 The velocity in x-direction after 2 timesteps in test case V.
45 The velocity on the border in x-direction after 650 timesteps in test case H.
46 The velocity on the border in x-direction after 650 timesteps in test case V. The second image is an enlargement of the plot on the left to better showcase the velocity deviation.
47 The velocity on the fine and coarse grid in x-direction after 650 timesteps in test case V.
48 The blocks of the large domain.
49 Velocity in x-direction of homogeneous distribution, linear explosion, compact interpolation.
50 The absolute errors in terms of velocity on a line through coarse cells and the finer domain. (1) is the absolute error of homogeneous distribution, (2) of linear explosion, (3) of compact interpolation and (4) of compact interpolation with a trilinear interpolation of the velocity.
51 The absolute errors in terms of velocity on a line through coarse and fine cells that lie on the vertical border of the fine domain. (1) is the absolute error of homogeneous distribution, (2) of linear explosion, (3) of compact interpolation and (4) the error of compact interpolation with a trilinear interpolation of the velocity.
52 Another concept for interpolating the innermost ghost layer.

List of Abbreviations

CFD  Computational fluid dynamics
LBM  Lattice Boltzmann method
pdf  particle distribution function
V    vertically refined domain
H    horizontally refined domain

1 Introduction

This thesis studies the behavior of simulated fluids. Fluid is an umbrella term for a substance which continuously changes its form when internal or external forces are applied.

The formulas describing a fluid are complicated to solve analytically, as the fluid itself is of quite a complex form. Thus, the field of computational fluid dynamics (CFD) has gained popularity. Instead of solving fluid problems analytically, numerical analysis and suitable data structures are used to simulate fluid behavior. Nowadays, CFD is used to solve problems in engineering and research. It is applied in a broad range of fields, from weather simulation and biological engineering all the way to engine combustion, aerospace, and aerodynamic analysis. CFD divides into many different branches and solvers. The Lattice Boltzmann method was selected for fluid simulation in this thesis. Like many fluid simulation methods, the Lattice Boltzmann method discretizes the domain on a grid or lattice. CFD simulations usually carry high computational costs. Grid refinement and parallelization are used by applications to decrease the runtime.

A specific problem for fluid simulation is a very large simulation domain with a very small area of interest. If the grid of the simulation is uniform, the choice of the mesh size is of high importance. On the one hand, a small mesh size increases the accuracy in the area of interest as well as the computational cost. A large mesh size, on the other hand, decreases both the accuracy in the area of interest and the computational cost. This results in the typical trade-off between accuracy and computational cost.

Figure 1: A Lattice Boltzmann simulation with a refined grid.

A simple solution is to choose a large mesh size and refine it only in the area of interest. This achieves a balance between the computational cost and the desired accuracy. This is depicted in figure 1, which shows the flow profile of a Lattice Boltzmann simulation. The area of interest of the simulation is on the right side, where the domain is refined. In the refined area, the flow profile appears much smoother.

The following sections examine grid refinement techniques, focusing on their implementation as well as their validation in certain test cases. The high-performance software framework waLBerla is used for the implementation. Validation results of the chosen grid refinement techniques can vary between software frameworks, as they depend on how the basic mechanisms are implemented. Most importantly, a deeper understanding of multiple grid refinement techniques is provided.


2 The Lattice Boltzmann Method

This thesis showcases and compares grid refinement techniques for the Lattice Boltzmann method (LBM). Hence, a short explanation of the Lattice Boltzmann method is necessary.

The Lattice Boltzmann method is used for simulating fluids in the field of computational fluid dynamics (CFD). It differs from other CFD methods by not directly solving for macroscopic quantities, but rather simulating a particle flow.

In CFD, there are multiple ways to describe and view a fluid. Microscopic, mesoscopic, and macroscopic are terms often referred to in this context. As mentioned in [Kr17], microscopic describes a molecular view of the fluid, whereas macroscopic is a full continuum picture with tangible quantities. Macroscopic quantities are, for example, the fluid density or the fluid velocity. Mesoscopic denotes a view that does not track each individual molecule of the fluid, but rather keeps track of the distributions of the molecules.

One of the best-known equations in computational fluid dynamics is the Navier-Stokes equation, which is based on a macroscopic view. The LBM, on the other hand, is based on a mesoscopic view. Nevertheless, it is possible to simulate the Navier-Stokes equation utilizing the LBM. This is shown in detail in [Kr17].

Furthermore, there are multiple kinds of velocity, namely the particle velocity ξ (microscopic), the mesoscopic velocity c, and the macroscopic velocity u.

For simulating the particle flow, the LBM is divided into two separate steps: a collision step followed by a streaming step. All operations performed by the LBM are based on the mesoscopic view. The LBM discretizes time and space to form a lattice and discretizes the fluid as particles, positioning them in individual cells.

2.1 Basic Principles

The following section describes the mechanics and fundamentals of the LBM. First, the derivation of the method from the Boltzmann equation is stated. Afterward, basic principles like the collision operator and the timestep algorithm are introduced. The translation of mesoscopic values into macroscopic quantities is also given. [Kr17] was the only source for this section.

Dimension and Velocity models

The Lattice Boltzmann method uses a uniform Cartesian discretization of space. It can be applied in one-, two-, and three-dimensional space. Each cell on the lattice has neighbors with which it can interact. The number of neighbors is limited by the dimension of the domain. For 1D and 2D, the usual models are D1Q3 and D2Q9, where D describes the number of dimensions and Q the number of velocities.

Due to the strong increase in computational cost in 3D, there are multiple commonly used velocity sets, e.g., D3Q15, D3Q19, and D3Q27. However, the fewer neighbors/velocity directions are used, the lower the accuracy achieved in the simulation, again resulting in a trade-off between accuracy and computational cost.

Figure 2: D2Q9 model


Figure 2 shows a lattice cell with its distribution function in each direction. Since it is a D2Q9 model, there are nine distribution functions: eight pointing in the cardinal and diagonal directions and one located at the center, which represents the resting part of the particle distribution. In the following parts of the thesis, the direction of a distribution function is addressed via the compass directions. When describing 3D models, directions on the third axis are labeled top in the positive and bottom in the negative direction.
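
The D2Q9 velocity set can be written down explicitly as a minimal sketch. The ordering of the directions and the names `NAMES`, `C`, `W` and `lattice_moments` are assumptions made for this illustration and do not reflect the memory layout used in waLBerla. The weights are the standard D2Q9 values (4/9 for the center, 1/9 for the cardinal and 1/36 for the diagonal directions).

```python
import numpy as np

# D2Q9 velocity set. Index 0 is the rest direction, followed by the
# cardinal directions (E, N, W, S) and the diagonals (NE, NW, SW, SE).
# This ordering is an assumption of the sketch, not waLBerla's layout.
NAMES = ["C", "E", "N", "W", "S", "NE", "NW", "SW", "SE"]
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def lattice_moments(c, w):
    """Return the zeroth, first and second moment of the weights."""
    zeroth = w.sum()                                # sum_i w_i
    first = (w[:, None] * c).sum(axis=0)            # sum_i w_i c_i
    second = np.einsum("i,ia,ib->ab", w, c, c)      # sum_i w_i c_ia c_ib
    return zeroth, first, second


zeroth, first, second = lattice_moments(C, W)
```

For this velocity set the weights sum to one, the first moment vanishes, and the second moment equals one third of the identity matrix, which is the squared lattice speed of sound in lattice units.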

The Boltzmann Equation

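A fluid particle distribution function (pdf) $f$ can be described by the following variables: the position $\mathbf{x}$, the particle velocity $\boldsymbol{\xi}$, and the time $t$. The total differential $\Omega(f) = \frac{Df}{Dt}$ of $f$ is

$$\frac{Df}{Dt} = \left(\frac{\partial f}{\partial t}\right)\frac{dt}{dt} + \left(\frac{\partial f}{\partial x_\beta}\right)\frac{dx_\beta}{dt} + \left(\frac{\partial f}{\partial \xi_\beta}\right)\frac{d\xi_\beta}{dt}. \tag{1}$$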

The term $d\xi_\beta/dt = F_\beta/\rho$ can be derived from Newton's second law. Using $dt/dt = 1$ and $dx_\beta/dt = \xi_\beta$, equation (1) can be simplified and expressed as

$$\frac{\partial f}{\partial t} + \xi_\beta \frac{\partial f}{\partial x_\beta} + \frac{F_\beta}{\rho}\frac{\partial f}{\partial \xi_\beta} = \Omega(f), \tag{2}$$

see [Kr17]. The first term of the equation describes the change of the particle distribution in time, the second one the change of position, and the third one the forces affecting the local velocity. Fluid particles can collide, and $\Omega(f)$ expresses the redistribution of the collided fluid particles. Hence, $\Omega(f)$ is called the collision operator.

By using the Boltzmann equation, it is possible to obtain mesoscopic moment conservation, which can be translated to a conservation of mass, momentum, total energy, and internal energy.

Bhatnagar-Gross-Krook Collision Operator

An often-used approach to describe $\Omega(f)$ is the Bhatnagar-Gross-Krook (BGK) collision operator

$$\Omega_i(f) = -\frac{f_i - f_i^{eq}}{\tau}\,\Delta t. \tag{3}$$

A commonly used synonym for the BGK collision operator is single-relaxation-time (SRT) collision operator, as it possesses only one relaxation time. In equation (3), $f_i^{eq}$ is the equilibrium state and $\tau$ is the relaxation parameter. If a fluid or gas flows for a certain amount of time, it is assumed that the distribution function $f(\mathbf{x}, \boldsymbol{\xi}, t)$ reaches an equilibrium distribution $f^{eq}(\mathbf{x}, \boldsymbol{\xi}, t)$. This means the flow is going to have the macroscopic mean velocity $\mathbf{u}$. The BGK operator relaxes towards this equilibrium state at a rate given by the relaxation parameter $\tau$.

However, one problem with the BGK operator is that it does not achieve the same accuracy as the Boltzmann operator. Exact and invariant conservation of mass is achievable, but the energy equations cannot be conserved. The cause is that the BGK operator relaxes all moments equally.

Besides the SRT collision operator, there are also the two-relaxation-time (TRT) collision operator and the multiple-relaxation-time (MRT) collision operator. These models provide different relaxation parameters for different moments, resulting in higher accuracy. A drawback is their higher computational cost, which again demonstrates the typical trade-off between accuracy and computational cost.
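
The relaxation behavior of the BGK operator can be illustrated with a minimal sketch. It assumes a single population and an equilibrium value that stays fixed, which is a simplification: in a real simulation the equilibrium changes with the local density and velocity. The function names `bgk_collide` and `relax` are hypothetical, and `omega` stands for the ratio of timestep to relaxation time.

```python
def bgk_collide(f, f_eq, omega):
    """One BGK collision: move f towards f_eq by the fraction omega = dt / tau."""
    return f - omega * (f - f_eq)


def relax(f0, f_eq, omega, steps):
    """Apply the BGK collision repeatedly against a fixed equilibrium."""
    history = [f0]
    f = f0
    for _ in range(steps):
        f = bgk_collide(f, f_eq, omega)
        history.append(f)
    return history


# Distance to equilibrium shrinks by the factor (1 - omega) in every step.
history = relax(f0=1.0, f_eq=0.5, omega=0.5, steps=3)
```

With `omega = 0.5` the distance to the equilibrium halves in every collision, so the population approaches 0.5 geometrically.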

Collision and Streaming Step

As stated above, the Lattice Boltzmann method consists of two steps. Each timestep consists of a collision step, directly followed by a streaming step. Both steps are performed on each cell of the grid. In the collision step, the collision described by the collision operator is performed. During the streaming step, the fluid is streamed from the current cell to its neighbors.

The collision step is

$$f_i^{*}(\mathbf{x}, t) = f_i(\mathbf{x}, t) - \frac{\Delta t}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t)\right), \tag{4}$$

where the collision operator is applied to the distribution function of each cell and $f_i(\mathbf{x}, t)$ is the discrete-velocity distribution function. The streaming step is

$$f_i(\mathbf{x} + \mathbf{c}_i\Delta t, t + \Delta t) = f_i^{*}(\mathbf{x}, t). \tag{5}$$

In the streaming process, the particle distribution function of each direction is streamed into the neighboring cell in that direction. Herein, $\mathbf{c}_i$ is a mesoscopic velocity, which is part of a set of velocities $\mathbf{c}$.

$$f_i(\mathbf{x} + \mathbf{c}_i\Delta t, t + \Delta t) = f_i(\mathbf{x}, t) - \frac{\Delta t}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t)\right). \tag{6}$$

In equation (6), the collision and streaming steps are expressed in a single equation. However, the more typical notation for the LBM is

$$f_i(\mathbf{x} + \mathbf{c}_i\Delta t, t + \Delta t) = f_i(\mathbf{x}, t) + \Omega_i(\mathbf{x}, t). \tag{7}$$

Macroscopic Quantities

The particle distribution functions $f$ are mesoscopic, but macroscopic quantities provide an easier interpretation of the simulation; hence the macroscopic quantities have to be calculated. Furthermore, some of the grid refinement techniques described later on need macroscopic values for their algorithm.

The equilibrium is connected with macroscopic quantities like $\rho$ and $\mathbf{u}$, so the equilibrium state $f_i^{eq}$ can be expressed for each direction $i$ as

$$f_i^{eq} = w_i\,\rho\left(1 + \frac{\mathbf{c}_i \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{c}_i \cdot \mathbf{u})^2}{2c_s^4} - \frac{\mathbf{u} \cdot \mathbf{u}}{2c_s^2}\right), \tag{8}$$

with $w_i$ and $c_s$ being constants. Each direction $i$ of the chosen velocity set is assigned a weight $w_i$. $c_s$ is the mesoscopic speed of sound used for the LBM, expressed as $c_s = \frac{\Delta x}{\sqrt{3}\,\Delta t}$. The pressure $p$ is obtained by $p = c_s^2\rho$. The non-equilibrium part is calculated with $f_i^{neq} = f_i - f_i^{eq}$.

$$\sum_i f_i^{eq} = \sum_i f_i = \rho \tag{9}$$

$$\sum_i \mathbf{c}_i f_i^{eq} = \sum_i \mathbf{c}_i f_i = \rho\mathbf{u} \tag{10}$$
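
The relation between the equilibrium distribution and the macroscopic quantities can be checked numerically. The following is a minimal sketch for a single D2Q9 cell in lattice units (mesh size and timestep equal to one, so the squared speed of sound is 1/3). It uses the standard second-order equilibrium, in which the term containing u · u enters with a negative sign. The direction ordering and the function names `equilibrium` and `macroscopic` are assumptions of this sketch.

```python
import numpy as np

# D2Q9 velocities and weights (ordering is an assumption of this sketch).
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)
CS2 = 1.0 / 3.0  # squared speed of sound in lattice units


def equilibrium(rho, u):
    """Second-order equilibrium for one cell; u is a length-2 array."""
    cu = C @ u                      # c_i . u for every direction i
    uu = u @ u                      # u . u
    return W * rho * (1 + cu / CS2 + cu**2 / (2 * CS2**2) - uu / (2 * CS2))


def macroscopic(f):
    """Density and velocity as the zeroth and first moment of f."""
    rho = f.sum()
    u = (f[:, None] * C).sum(axis=0) / rho
    return rho, u


f_eq = equilibrium(1.2, np.array([0.05, -0.02]))
rho, u = macroscopic(f_eq)
```

Taking the moments of the equilibrium returns the density and velocity it was built from, which is what the two moment sums state.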

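The density $\rho$ and the momentum $\rho\mathbf{u}$ are defined in equation (9) and equation (10). The viscosity is obtained by

$$\nu = c_s^2\left(\tau - \frac{\Delta t}{2}\right) = \frac{1}{3}\left(\frac{1}{\omega} - \frac{1}{2}\right)\frac{\Delta x^2}{\Delta t}. \tag{11}$$

The following sections make use of the relaxation rate $\omega$, which is defined as $\omega = \frac{\Delta t}{\tau}$.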

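The viscous stress tensor can be expressed by

$$\sigma_{\alpha\beta} = -\left(1 - \frac{\Delta t}{2\tau}\right)\sum_i c_{i\alpha}c_{i\beta}f_i^{neq}. \tag{12}$$

Finally, the strain rate tensor $S_{\alpha\beta}$ is given by

$$S_{\alpha\beta} = -\frac{3\omega}{2\rho}\sum_{i=1}^{q} c_{i\alpha}c_{i\beta}f_i^{neq}. \tag{13}$$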

The Basic Timestep Algorithm

As stated earlier, the LBM consists of a collision step and a streaming step. The collision step requires two macroscopic quantities, the density $\rho$ and the velocity $\mathbf{u}$, which make it possible to calculate $f_i^{eq}$ for the collision operator. After computing $f_i^{*}(\mathbf{x}, t)$, the streaming step can be executed. These two operations make up one timestep.

A very basic description of the timestep algorithm is given in [Kr17].

1. Calculate $\rho(\mathbf{x}, t)$ and $\mathbf{u}(\mathbf{x}, t)$ from $f_i(\mathbf{x}, t)$.

2. Calculate the equilibrium distribution $f_i^{eq}(\mathbf{x}, t)$.

3. Optionally, calculate the macroscopic fields $\rho(\mathbf{x}, t)$, $\mathbf{u}(\mathbf{x}, t)$ and $\sigma(\mathbf{x}, t)$ and save them to a file for visualization or post-processing. The viscous stress tensor $\sigma$ is obtained via equation (12).

4. Execute the collision step as described in equation (4).

5. Execute the streaming step as described in equation (5).

6. Increment the time from $t$ to $t + \Delta t$ and start again at step 1 until convergence or the step limit has been reached.

A graphical description can be found in figure 3. As depicted, the collision step processes the newly streamed distributions directly after the streaming step has been applied. Subsequently, the same procedure repeats itself. The computation of $\rho(\mathbf{x}, t)$, $\mathbf{u}(\mathbf{x}, t)$ and $f_i^{eq}(\mathbf{x}, t)$ is summarized in the step "update macroscopic quantities".

Figure 3: The basic timestep algorithm
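
The steps of the basic timestep algorithm can be sketched in a few lines. The example below is a simplified illustration and not the waLBerla implementation: it uses the one-dimensional D1Q3 model on a periodic domain in lattice units, the BGK collision with a relaxation rate `omega` equal to the timestep divided by the relaxation time, and omits the optional output step. All function names are hypothetical.

```python
import numpy as np

# D1Q3 model in lattice units: rest, right-moving and left-moving populations.
C = np.array([0, 1, -1])
W = np.array([2 / 3, 1 / 6, 1 / 6])
CS2 = 1.0 / 3.0


def equilibrium(rho, u):
    """Second-order equilibrium; rho and u have shape (nx,), result (3, nx)."""
    cu = C[:, None] * u[None, :]
    return W[:, None] * rho[None, :] * (
        1 + cu / CS2 + cu**2 / (2 * CS2**2) - u[None, :]**2 / (2 * CS2))


def timestep(f, omega):
    rho = f.sum(axis=0)                         # step 1: density
    u = (C[:, None] * f).sum(axis=0) / rho      # step 1: velocity
    f_eq = equilibrium(rho, u)                  # step 2: equilibrium
    f_post = f - omega * (f - f_eq)             # step 4: BGK collision
    f_new = np.empty_like(f)
    for i, ci in enumerate(C):                  # step 5: periodic streaming
        f_new[i] = np.roll(f_post[i], int(ci))
    return f_new


def run(nx=32, steps=200, omega=1.0):
    x = np.arange(nx)
    rho0 = 1.0 + 0.01 * np.sin(2 * np.pi * x / nx)  # small density bump
    f = equilibrium(rho0, np.zeros(nx))
    for _ in range(steps):                      # step 6: advance in time
        f = timestep(f, omega)
    return rho0, f.sum(axis=0)


rho_start, rho_end = run()
```

Since the collision preserves the density of each cell and periodic streaming only moves populations between cells, the total mass at the end equals the total mass at the start.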

2.2 Boundary Treatment and Parallelization Concepts

For the LBM, the behavior on the boundary of the domain needs to be defined. There are several boundary conditions, although only no-slip and periodic boundary conditions are relevant for this thesis.

The no-slip boundary condition prevents fluid from streaming through the boundary. For instance, this boundary can simulate an impermeable wall or object. Periodic boundary conditions simply copy the values from one side of the domain boundary to the other. With periodic boundary conditions, an infinite pipeline can be modeled.

The domain can have different boundary conditions on different parts of the boundary. Boundary treatment is performed after the collision step and before the streaming step, where the pdf values of boundary cells are calculated depending on the conditions. During the following streaming step, these values are pushed to the cells neighboring the boundary. Because the boundary treatment is integrated in this way, the timestep algorithm requires no additional steps for it.


Figure 4: Graphical explanation of the no-slip boundary condition

As figure 4 illustrates, the pdf values for the no-slip condition are calculated just before the streaming step. While the lower cell (blue) is inside the domain (also called border cell), the boundary cell (gray) is outside of the domain. With the no-slip boundary condition, the fluid cannot stream through the boundary, as mentioned before. This is why the particle distribution functions are copied from the directions facing the cell, as seen in part two of figure 4. This is similar to a reflection or a "bouncing back" movement, as the same distribution function is streamed in the opposite direction. The streaming from the border cells into the boundary cells can be discarded because the no-slip boundary treatment relies only on the distribution functions of the neighboring cells. Usually, a cell on the lattice is either a fluid cell or a boundary cell.

Finally, it should be mentioned that it is possible to parallelize the LBM. This is a major advantage because parallelization can improve performance tremendously. The grid refinement described in the following is linked to the parallelization concept mentioned in [Sch18].

In [Sch18] it is proposed to divide the uniform grid into equally sized blocks, where each block can be computed in parallel. Because values are communicated between cells in the streaming step, each block has to communicate with its neighboring blocks and therefore needs ghost layers. Ghost layers are cell layers surrounding a block, providing additional storage for data from the surrounding blocks.
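
The "bouncing back" of the no-slip condition described above can be sketched for a single border cell. This is a simplified view that returns the reflected values directly instead of storing them in a boundary cell and streaming them afterwards. The D2Q9 ordering and the names `OPPOSITE` and `bounce_back` are assumptions of this sketch.

```python
import numpy as np

# D2Q9 velocities: C, E, N, W, S, NE, NW, SW, SE (ordering assumed here).
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])

# OPPOSITE[i] is the index of the direction pointing opposite to direction i.
OPPOSITE = np.array(
    [int(np.flatnonzero((C == -c).all(axis=1))[0]) for c in C])


def bounce_back(f_post, wall_links):
    """Reflect the post-collision pdfs that point from a border cell into a wall.

    f_post:     post-collision pdfs of the border cell, shape (9,)
    wall_links: indices of the directions that point into the wall
    Returns a dict mapping the reflected direction to the value that
    re-enters the border cell.
    """
    return {int(OPPOSITE[i]): float(f_post[i]) for i in wall_links}


# A wall directly above the cell is hit by the N, NE and NW populations.
f_post = np.arange(9, dtype=float)
reflected = bounce_back(f_post, wall_links=[2, 5, 6])
```

In this example the populations leaving towards N, NE and NW come back as S, SW and SE, respectively.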


3 Grid Refinement

Usually, the Lattice Boltzmann method requires a uniform lattice. However, this means that the mesh size for a large domain with a small area of interest has to be uniform, resulting in the typical trade-off between accuracy and performance. For that reason, non-uniform grids are used in the LBM.

The selected refinement scheme is based on the blocks used for parallelization. Since the high-performance software framework waLBerla is chosen for this thesis, its refinement scheme, which relies on [Roh+06], was used as the basis for the grid refinement techniques. The following sections give a detailed account of the refinement and communication scheme, based on [Sch18] and [Roh+06].

3.1 Non-Uniform Grids

The parallelization concept mentioned in section 2.2, where the domain is separated into blocks, serves as a basis for the refinement. A non-uniform grid consists of multiple blocks, each containing a uniform grid, while different blocks may have grids with different mesh sizes. Consequently, the grids of different blocks can be non-uniform with respect to each other. Therefore, it is possible to have multiple grids with different mesh sizes in the same simulation domain.

Figure 5 shows a refined grid. The terms coarse grid and fine grid, used later on, are relative: a coarse grid has a larger mesh size than a fine one.

In [Roh+06] a refinement scheme is presented where the ratio between fine grid and coarse grid is two to one. This means that two fine cells in each axis direction correspond to one cell on the coarse grid. For example, in 1D two fine cells correspond to one coarse cell, in 2D four fine cells correspond to one coarse cell, and in 3D eight fine cells correspond to one coarse cell. When referring to different grid sizes, the term "level" is generally used, describing the degree of refinement. Level zero is the coarsest and level n the finest grid. With the two to one ratio in refinement, the mesh size of a grid at level one is half the mesh size of a grid at level zero, and the mesh size of a grid at level two is a quarter of the mesh size at level zero. This is described by

$$\Delta x_{L+1} = \frac{\Delta x_L}{2} \implies \Delta x_L = \frac{\Delta x_0}{2^L}, \tag{14}$$

where $\Delta x$ is the mesh size and $L$ denotes the level of the current grid. As mentioned by [Sch18], the algorithm implies that the scaling in time is proportional to the scaling in space, thus leading to the following conclusion for the step size in time:

$$\Delta t_{L+1} = \frac{\Delta t_L}{2} \implies \Delta t_L = \frac{\Delta t_0}{2^L}. \tag{15}$$

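Under these assumptions, the speed of sound $c_s = \frac{\Delta x}{\sqrt{3}\,\Delta t}$ remains constant across levels, and the viscosity $\nu$ has to remain constant as well. To keep $\nu$ constant, the relaxation rate $\omega$ has to be adapted on each level:

$$
\begin{aligned}
\nu_L &= \nu_0 \\
\implies\quad c_s^2\left(\frac{1}{\omega_L} - \frac{1}{2}\right)\Delta t_L &= c_s^2\left(\frac{1}{\omega_0} - \frac{1}{2}\right)\Delta t_0 \\
\implies\quad \omega_L &= \frac{2\omega_0}{2^{L+1} + (1 - 2^L)\,\omega_0}
\end{aligned}
\tag{16}
$$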

A more generalized version between levels $K$ and $L$ is given by

$$\omega_L = \frac{2^{K+1}\omega_K}{2^{L+1} + (2^K - 2^L)\,\omega_K}. \tag{17}$$
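
The level scaling can be verified with a short numerical sketch. The helper names below are hypothetical. `omega_on_level` implements the general relation between two levels K and L, and `viscosity` evaluates the viscosity formula of section 2.1 for a given relaxation rate, mesh size and timestep.

```python
def mesh_size(dx0, level):
    """Mesh size on a level, halved with every refinement step."""
    return dx0 / 2**level


def time_step(dt0, level):
    """Timestep on a level, scaled like the mesh size."""
    return dt0 / 2**level


def omega_on_level(omega_k, k, l):
    """Relaxation rate on level l, given the rate omega_k on level k."""
    return 2**(k + 1) * omega_k / (2**(l + 1) + (2**k - 2**l) * omega_k)


def viscosity(omega, dx, dt):
    """Kinematic viscosity for relaxation rate omega, mesh size dx, timestep dt."""
    return (1.0 / 3.0) * (1.0 / omega - 0.5) * dx**2 / dt


omega0 = 1.6
viscosities = [
    viscosity(omega_on_level(omega0, 0, level),
              mesh_size(1.0, level), time_step(1.0, level))
    for level in range(4)
]
```

All entries of `viscosities` agree up to rounding, and converting from level 0 to level 1 and then from level 1 to level 2 gives the same rate as converting from level 0 to level 2 directly.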

Since the ratio of two to one needs to be maintained, it is only possible to refine cells when all their surrounding cells are on the same level. In figure 5, each refined grid is only surrounded by other grids with a maximal level difference of one.


Figure 5: A 2D grid with a maximal refinement of n = 2.

One of the most important aspects of grid refinement for the LBM is the transition between different grid levels. Without refinement, the behavior on the border of each grid is only influenced by the treatment of the boundaries. With grid refinement, however, each grid shares part of its border with a coarser or finer grid. Therefore, those borders need special treatment, because information has to be transferred from the coarse grid to the fine grid and vice versa. The following section elaborates on the interaction of grids and the information transfers.

3.2 Timestep Algorithm for Non-Uniform Grids

As previously explained, a non-uniform grid is split into multiple grids with different levels of refinement, each of which is uniform in itself. Because of the multiple grids with different mesh sizes, the timestep algorithm has to be changed, as its current layout (described in section 2.1) only works when handling a single grid. With multiple grids, a timestep needs multiple forms of communication to transfer information between separate grids.

Since the grid refinement techniques are implemented for waLBerla, two timestep algorithms will be explained, both of which are described in detail in [Sch18]. They are both built on the parallelization concept mentioned in section 2.2, resulting in the partition of a grid into multiple blocks.

The timestep algorithm for non-uniform grids mentioned in [Sch18] and [Roh+06] is a recursive method with a distinct structure. Algorithm 1 shows the structure of the algorithm, in a description similar to the one found in [Sch18]. It starts on the coarsest level, l = 0, and recursively works its way down to the finest level, l = n.


Function RecursiveNonUniformTimestep(l : int)
    PerformCollisionStep(l);
    if l ≠ n then
        RecursiveNonUniformTimestep(l + 1)
    end
    if l ≠ 0 then
        CommunicationCoarserLevel(l, l − 1)
    end
    CommunicationEqualLevel(l);
    PerformStreamingStep(l);
    if l ≠ n then
        CommunicationFinerLevel(l, l + 1)
    end
    if l = 0 then
        return;
    end
    PerformCollisionStep(l);
    if l ≠ n then
        RecursiveNonUniformTimestep(l + 1)
    end
    CommunicationEqualLevel(l);
    PerformStreamingStep(l);
    if l ≠ n then
        CommunicationFinerLevel(l, l + 1)
    end
end

Algorithm 1: Timestep algorithm for non-uniform grids

Since the increment is linear, l + 1, the levels are traversed in a straight top-to-bottom order. The methods PerformCollisionStep(l) and PerformStreamingStep(l) execute the collision and streaming step for all blocks on a certain level l. CommunicationEqualLevel(l), CommunicationCoarserLevel(l, l − 1) and CommunicationFinerLevel(l, l + 1) are methods where each block communicates with blocks on its own level, on the next coarser level l − 1, or on the next finer level l + 1. Communication on equal levels can only occur if neighboring blocks exist that are connected to the current block. The same applies to the communication with blocks of higher or lower levels. In 2D, neighboring blocks are connected via a line or a point. In 3D, the connection between blocks is a face, an edge, or a corner.

The boundary conditions are computed right before the streaming step. It is also possible to achieve a higher level of performance by merging the first streaming and second collision step on the finest level into a stream-collide operation.

This method makes it possible to run a parallel Lattice Boltzmann method with multiple non-uniform grids.

A grid refinement technique can also be chosen for non-uniform grids to achieve a higher performance. Its integration into the timestep will be explained in the next section. This technique is only used on blocks of the current level l.
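
To make the order of operations in the recursive timestep easier to follow, it can be traced with a small sketch. Instead of performing real collision, streaming and communication steps, the function below only records them in a log. The entry names such as `"collide"` and `"comm_finer"` and the helper `operations_per_level` are hypothetical and chosen for this illustration.

```python
from collections import Counter


def recursive_timestep(level, finest, log):
    """Record the operations of the recursive timestep instead of executing them."""
    log.append(("collide", level))
    if level != finest:
        recursive_timestep(level + 1, finest, log)
    if level != 0:
        log.append(("comm_coarser", level, level - 1))
    log.append(("comm_equal", level))
    log.append(("stream", level))
    if level != finest:
        log.append(("comm_finer", level, level + 1))
    if level == 0:
        return
    # Finer levels perform a second collide/stream pair per call.
    log.append(("collide", level))
    if level != finest:
        recursive_timestep(level + 1, finest, log)
    log.append(("comm_equal", level))
    log.append(("stream", level))
    if level != finest:
        log.append(("comm_finer", level, level + 1))


def operations_per_level(finest, name="collide"):
    """Count how often the operation `name` is logged on each level."""
    log = []
    recursive_timestep(0, finest, log)
    return Counter(entry[1] for entry in log if entry[0] == name)


collisions = operations_per_level(finest=2)
```

For a finest level of two, the trace contains one collision on level 0, two on level 1 and four on level 2, which matches the halving of the timestep with every level.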


3.3 Communication Schemes

There are three different types of communication schemes: equal level communication, coarse to fine communication, and fine to coarse communication. With equal level communication, neighboring blocks of the same level of refinement exchange information. With coarse to fine communication, a coarse block sends information to a neighboring block of a finer level of refinement, and with fine to coarse communication, a finer block transfers its information to a neighboring coarser block.

An LBM-specific feature that improves performance is that cells only need information from their direct neighbors to perform the collision and streaming steps. In the context of the parallelization concept, this means that information only has to be transferred across the shared border of neighboring blocks. For the communication between bordering blocks of different levels, more information is needed.

The following communication schemes are based on the block parallelization proposed in [Sch18]. All figures shown in this subsection are similar to those in [Sch18].

Figure 6: Communication process where two blocks send their information to a neighboring block.

Figure 6 illustrates the basic principle of communication between blocks. The purple arrows display the data sent to the neighboring block. In each step, every block sends its information to its directly adjacent neighbors. The receiver block is located on the upper left, and on the right side are two neighboring blocks, which send their information to the ghost layers of the receiver. The continuous lines mark the inner cells of the block, and the dashed lines mark the cells of ghost layers. As can be seen in figure 6, each sender block only sends the information of cells directly bordering the receiver block. The pdfs that are transferred into the ghost layer are the ones pointing in the direction of the receiver block. In the 2D example of figure 6, each block only sends its pdf values pointing west.


Figure 7: Schematic representation of one coarse step and two fine steps in two neighboring blocks with a different level of refinement. (Idea: [Sch18])

The interaction between a coarse block and a fine block is portrayed in figure 7, which is similar to the image in [Sch18], where Schornbaum explains the interaction between different blocks in a timestep graphically. The upper row represents the steps on the coarse grid and the lower row the steps on the fine grid. In those seven steps, one timestep is performed on the coarse block and two on the fine block. Gray arrows stand for pdf values in the coarse ghost layers or other surrounding cells. Orange arrows denote the values in the coarse cells: continuous orange arrows denote the starting values in the southern coarse cell and dashed orange arrows the starting values in the northern coarse cell. Purple arrows are the starting values of the fine cells; their different shapes symbolize that the values within the fine cells may differ. The blue boxes surrounding the values inside a cell mark a content change of the cell.

Each step of the figure is described in the following enumeration. Before the first step, both blocks are in a post-streaming state.

1. Perform a collision step in all cells of both blocks except for the ghost layers.

2. Transfer the data of the two cells from the coarser block into the ghost layers of the finer block.

3. Perform a streaming step in both grids.

4. Perform a collision step on the cells of the fine block.

5. Perform a streaming step on the fine block.

6. Transfer the data of the two innermost ghost layers to the southern coarse cell.

7. Start again at step 1.

Four ghost layers on the fine grid are needed because the streaming steps for the two innermost ghost layers of the fine block need two layers of cells to receive enough information.


3.3.1 Equal Level Communication

Equal level communication takes place between connected blocks of the same level of refinement. For equal level communication, the same principle applies as for the LBM without refinement. Therefore, it is only necessary to exchange one ghost layer. Usually the operations are performed as depicted in figure 6.

There is one exception, however, which applies whenever a coarser block is also located at the same intersection point or line in 2D, or face or edge in 3D.

Figure 8: An intersection in 2D with three fine and one coarse block.

Figure 8 shows a scenario in which the block located in the south-east is the receiver of the communication. The boundaries of the blocks are marked in purple, and the intersection point is depicted as an orange dot. Usually, the normal process for equal level communication would apply to these two neighboring blocks of the same level of refinement. In this case, however, the orange dot marks an intersection with a coarser block, resulting in the exception.

In cases like this, the neighboring equal level blocks have to send at least two layers of data for D2Q9 and D3Q19, and at least three layers for D3Q27. Figure 9 shows the minimal amount of data that needs to be sent when there is a coarser block in the proximity. Black arrows show the direction in which the data is sent to an equal level neighbor. The increased amount of data is required because more data needs to be transmitted during the coarse to fine communication.

Figure 9: Minimal requirements of pdf data (purple) which needs to be sent when there is a coarse neighbor.


3.3.2 Coarse to Fine Communication

Earlier it was stated that fine blocks are required to have at least four ghost layers. With a refinement ratio of two to one, two rows of coarse cells are needed to fill four ghost layers. The particle distribution functions of the cells of the coarser block are transferred to the ghost layer cells of the finer block.

Figure 10: The orange cells inside the orange marker are the coarse cells that need to be communicated to the fine block located in the middle.

In figure 10 there are two coarse blocks that have to send their information to the fine block located in the vertical center. It illustrates that on the fine ghost layer grid, the lower left and upper left corners are filled by coarse neighbor cells, which also border another fine block. Therefore, data is sent to multiple fine neighboring blocks: the two upper right rows of the upper coarse block are communicated to the fine middle block and the fine upper block, while the two upper right rows of the lower coarse block are sent to the fine middle block as well as the fine lower block.

The distribution of the coarse values to the fine ghost layer values will be explained in depth in section 3.4.

In general, coarse to fine communication can be considered a pre-streaming step. Thus, it has to be performed before any grid refinement technique.

3.3.3 Fine to Coarse Communication

The communication from the fine grid to the coarse grid is quite different from equal level and coarse to fine communication. The values are sent from the fine ghost layers directly into the cells of the coarse block. Therefore, the coarse grid needs no ghost layers for fine to coarse communication. The two inner ghost layers send their information to their coarser neighbors in proximity, and their values are written into the coarse border cells of the block. The merging of four ghost layer cells in 2D, or eight ghost layer cells in 3D, into a single coarse cell is described in section 3.4.

It is important to note that not all particle distribution functions of each ghost layer cell are sent. Only pdfs that point in the direction of the intersection between the current fine block and the coarse block are sent. In contrast to coarse to fine communication, there is no information transfer through points or edges if the blocks are connected via a face.


Figure 11: Two fine blocks sending their ghost layer cells to a coarse block.

The scenario in figure 11 illustrates the fine to coarse communication for two fine blocks bordering a coarse block. The amount and position of the cells sent by each block are indicated by the circled cells. It is the same setup as in figure 10 with the communication reversed. In this case, the fine blocks only send data from cells that are located directly at the border.

Figure 12: The amount of particle distribution values being sent into the coarse cell.

If there is a coarse block in proximity and the only intersection is a point or an edge, the fine block will only send those ghost layers that have particle distribution functions pointing in the direction of the coarse block. Figure 12 shows the amount of ghost layers and particle distribution functions that have to be sent into a coarser block for the D2Q9 model. The purple arrow displays values that belong to only a single block when merged in the coarse cell. The orange arrows, on the other hand, signal that data from multiple fine blocks has to be merged into a single value in the coarse cell. Here, the values sent from each block must not overwrite each other.

Fine to coarse communication usually takes place as a post-streaming step.


3.4 Grid Refinement Techniques

Grid refinement techniques revolve around the redistribution of data from coarse cells to their corresponding fine cells and vice versa. When multiple connected grids of different degrees of refinement are declared, one of the most important aspects is the transition between them. It can be difficult to ensure the conservation of mass, momentum, and energy. Therefore, the correct assignment of ghost layers when handling multiple grids is crucial. If the blocks are visualized in the domain, the ghost layers of one block overlap with the other block. The overlap of cells and ghost layers in the domain shows which coarse cells correspond to which fine ghost layer cells. This is illustrated in the following figures.

Figure 13: Assigning coarse cells from the fine ghost layer cells in 2D.

Figure 13 demonstrates data being transferred to the coarse ghost layers. Within the domain, coarse ghost layers overlap with the finer grid. In the 2D example, there are four fine cells for one coarse ghost layer cell, meaning that there are multiple sources of information for the calculation of one cell.

Figure 14: Assigning fine ghost layer cells from the coarse cells in 2D.

The opposite case is presented in figure 14 with the assignment of the fine ghost layers. In 2D, there is only one coarse point for four fine ghost layer cells, resulting in only a single source of information used for the calculation of multiple ghost layer cells. To avoid a loss of accuracy, close attention is paid to the distribution from the coarse grid to the fine grid.

Today there are multiple methods to choose from for the assignment of the ghost layers. Three methods were chosen to be applied and evaluated in the framework waLBerla.

3.5 Homogeneous Distribution

A simple but effective method was proposed in [Roh+06], which was the only source for this subsection. It is a generic mass-conserving grid refinement technique that was also implemented in waLBerla by Schornbaum [Sch18]. A distribution and a merging method are proposed for each type of data transfer, based on a volumetric view. With the volumetric description, conservation of mass can be achieved, as stated in [Roh+06]. A restriction of this method is that it only works between levels that differ by one, e.g., L and L + 1.

When data is transferred from the coarse to the fine grid, the particles of a coarse cell are redistributed homogeneously, as shown by

\[ f_{i,\mathrm{fine}}(x_f, t) = f_{i,\mathrm{coarse}}(x_c, t). \tag{18a} \]

When data is transferred from the fine to the coarse grid, the particles are again merged homogeneously into a coarse cell

\[ f_{i,\mathrm{coarse}}(x_c, t) = \frac{1}{k^D} \sum_{p=1}^{k^D} f_{i,p,\mathrm{fine}}(x_f, t). \tag{18b} \]

In equations (18a) and (18b), $x_f$ denotes a fine cell that corresponds to the coarse cell $x_c$. $k^D$ is the number of fine cells that make up a coarse cell, where $k$ is the number of fine cells per coarse cell along each axis and $D$ denotes the dimension.

This method redistributes particles by using a volumetric description of the transport behaviour of lattice Boltzmann particles, as stated in [Roh+06]. No rescaling of the non-equilibrium part and no second order interpolation are needed, which results in a simple implementation. The particle distributions are viewed as mass distributions that travel between coarse and fine blocks, by which the conservation of mass can be achieved.
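Equations (18a) and (18b) can be illustrated with a minimal Python sketch for a single pdf direction on a 2D grid. The function names and the nested-list layout are assumptions made for this illustration and do not correspond to the waLBerla implementation; k = 2 is assumed, matching the refinement ratio of two to one used in this thesis.

```python
def distribute_coarse_to_fine(coarse, k=2):
    """Equation (18a): every fine cell inherits the value of its coarse parent."""
    rows, cols = len(coarse) * k, len(coarse[0]) * k
    return [[coarse[y // k][x // k] for x in range(cols)] for y in range(rows)]


def merge_fine_to_coarse(fine, k=2):
    """Equation (18b) in 2D: average the k*k fine cells inside each coarse cell."""
    rows, cols = len(fine) // k, len(fine[0]) // k
    return [[sum(fine[y * k + dy][x * k + dx]
                 for dy in range(k) for dx in range(k)) / k ** 2
             for x in range(cols)] for y in range(rows)]


coarse = [[0.5, 1.0], [2.0, 0.25]]            # one pdf direction on a 2 x 2 coarse grid
fine = distribute_coarse_to_fine(coarse)      # 4 x 4 fine grid
round_trip = merge_fine_to_coarse(fine)       # merging undoes the distribution
print(round_trip)
```

Merging directly after distributing returns the original coarse values, since each coarse cell is the average of four identical fine values.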

3.6 Linear Interpolation

This method was first introduced in [Che+06], which states that it conserves mass, momentum, and energy while transferring the coarse cell values onto the fine grid. It has the properties of a linear interpolation. This technique only applies to the coarse to fine data transfer. [Che+06] was the only source for this subsection.

It is stated in [Che+06] that this method makes it possible to achieve first order accuracy of the hydrodynamic solution. In addition, conservation laws can be preserved, and [Che+06] states that linear interpolation in the volumetric formulation can achieve second order numerical accuracy.

\[ f_i^f(\vec{x}_f, t) = f_i^c(\vec{x}_c, t) + (\vec{x}_f - \vec{x}_c) \cdot \vec{F}_i(\vec{x}_c, t) \tag{19} \]

The fundamental formula (19) states that each fine value $f_i^f(\vec{x}_f, t)$ is obtained by adding a term based on $\vec{F}_i(\vec{x}_c, t)$ to the corresponding coarse value $f_i^c(\vec{x}_c, t)$. Here, $f$ is a cell's post-collision state, $i$ is the direction of a certain particle distribution, $\vec{x}_f$ denotes the center of a fine cell, and $\vec{x}_c$ denotes the center of a coarse cell. Furthermore, $\vec{e}_\alpha$ is the unit vector along $\alpha = x, y, z$, and $\Delta_x^c = \Delta_y^c = \Delta_z^c$ is valid because each single grid on each level is still uniform. $\vec{F}_i(\vec{x}_c, t)$ is used to maintain the conservation of mass, momentum, and energy, and its components can be described by

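\[ F_{i\alpha}(\vec{x}_c, t) = \frac{f_i^c(\vec{x}_c + \vec{e}_\alpha \Delta_\alpha^c, t) - f_i^c(\vec{x}_c - \vec{e}_\alpha \Delta_\alpha^c, t)}{2 \Delta_\alpha^c}. \tag{20} \]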

Equation (20) shows that the gradient is calculated for each direction, resulting in $\nabla N_i^c$, where $N_i^c$ denotes the same coarse distribution as $f_i^c$. Therefore, in a 3D case the gradient along all three axes is calculated, which requires three coarse points each in x, y and z. This means that for each fine cell, the information of the corresponding coarse cell and of the surrounding coarse cells is required.


Combining (20) and (19) produces wrong results for a non-uniform spatial distribution of $f_i^c$, see [Che+06]. Therefore, an extension is needed.

\[ \Psi_{i\alpha}(\vec{r}_c, t) = F_{i\alpha}(\vec{r}_c, t) \tag{21} \]

\[ \Psi_{i\alpha}(\vec{r}_c, t) = 0 \tag{22} \]

Equation (21) applies if $\vec{r}_c + \Delta_x^c \vec{e}_\alpha$ and $\vec{r}_c - \Delta_x^c \vec{e}_\alpha$ are both centers of coarse cells; otherwise, equation (22) applies. Using this rule, $\vec{F}$ can be recalculated as

\[ \vec{F}_i^{\mathrm{new}} = \vec{\Psi}_i - \frac{\vec{c}_i (\vec{c}_i \cdot \vec{\Psi}_i)}{c_i^2}. \tag{23} \]

This results in a mass conserving equation:

\[ N_i^f(\vec{r}^f, t) = N_i^c(\vec{r}^c, t) + (\vec{r}^f - \vec{r}^c) \cdot \vec{F}_i^{\mathrm{new}}(\vec{r}^c, t). \tag{24} \]

This volumetric approach is formulated in such a way that the conservation laws are respected between grids of different levels. It has already been implemented in the software PowerFLOW [Che+06] and in the framework waLBerla.
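The interplay of equations (20) to (24) can be sketched in Python for one pdf direction in 2D. All names are hypothetical and the sketch is not taken from PowerFLOW or waLBerla. It assumes a coarse grid spacing of 1, so that the four fine cell centers lie at offsets of ±0.25 from the coarse cell center, and it assumes that the gradient is left unchanged for the rest direction, where the lattice velocity is zero and equation (23) cannot be evaluated.

```python
def psi(field, cell, axis):
    """Equations (20)-(22): central difference, or 0 if a coarse neighbour is missing."""
    lo, hi = list(cell), list(cell)
    lo[axis] -= 1
    hi[axis] += 1
    lo, hi = tuple(lo), tuple(hi)
    if lo in field and hi in field:
        return (field[hi] - field[lo]) / 2.0   # coarse spacing assumed to be 1
    return 0.0


def project(grad, c):
    """Equation (23): remove the component of the gradient along lattice velocity c."""
    c2 = sum(ck * ck for ck in c)
    if c2 == 0:                                # rest direction: assumption, keep gradient
        return list(grad)
    dot = sum(ck * gk for ck, gk in zip(c, grad))
    return [gk - ck * dot / c2 for gk, ck in zip(grad, c)]


def interpolate_to_fine(field, cell, c):
    """Equation (24): values of the four fine cells inside one coarse cell (2D)."""
    g = project([psi(field, cell, axis) for axis in range(2)], c)
    return {(dx, dy): field[cell] + dx * g[0] + dy * g[1]
            for dx in (-0.25, 0.25) for dy in (-0.25, 0.25)}


# One pdf direction on a 3 x 3 coarse patch with a linear profile.
field = {(i, j): 1.0 + 0.1 * i + 0.3 * j for i in range(3) for j in range(3)}
fine = interpolate_to_fine(field, (1, 1), c=(1, 0))
mean_fine = sum(fine.values()) / 4.0           # equals the coarse value up to rounding
print(mean_fine, field[(1, 1)])
```

Because the four offsets are symmetric around the coarse cell center, the added terms cancel in the sum, so the mean of the fine values reproduces the coarse value.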

3.7 Compact Interpolation

Compact interpolation was published in [QKR19]. It is also based on the volumetric view and can only function between two grids whose levels differ by one. This technique only applies to the coarse to fine data transfer, similar to the linear interpolation scheme described before. [QKR19] and [GGK09] were the only sources for this subsection.

All those conditions are met, as already described in section 3.1. Geier, Greiner and Korvink [GGK09] were the first to introduce compact interpolation in 2D. They used a minimum of four source elements for a higher locality. In [QKR19] the idea of compact interpolation was extended to 3D and implemented in the Lattice Boltzmann flow solver "Musubi". Geier, Schönherr, Stiebler and Krafczyk also published a similar algorithm for 3D [Gei+].

The idea of compact interpolation is to interpolate $f^{neq}$ and pressure linearly and to interpolate velocity quadratically by using a compact stencil. It is stated in [QKR19] that there are indications that second order convergence requires a quadratic or cubic interpolation, as well as indications that second order convergence can be reached via linear interpolation with a subgrid scale model.

The basic algorithm is described by [QKR19] as follows:

1. Calculate velocity, pressure, $f^{neq}$ and strain rate for each ghost cell from their corresponding source elements.

2. Interpolate pressure and $f^{neq}$ linearly and interpolate velocity quadratically.

3. Calculate $f^{eq}$ from velocity and pressure.

4. Rescale $f^{neq}$ by $\omega_c f^{neq}_c = \omega_f f^{neq}_f$.

5. Assign the particle distribution function of the target ghost element by $f = f^{eq} + f^{neq}$.
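Steps 4 and 5 of the list above can be sketched as follows. The function name and the list-based layout (one entry per lattice direction) are assumptions for illustration only; the equilibrium values of step 3 are taken as given input here and are not computed.

```python
def assemble_fine_pdf(feq_fine, fneq_coarse, omega_c, omega_f):
    """Rescale f_neq via omega_c * fneq_c = omega_f * fneq_f, then add f_eq."""
    scale = omega_c / omega_f                      # step 4 solved for fneq_f
    fneq_fine = [scale * value for value in fneq_coarse]
    return [eq + neq for eq, neq in zip(feq_fine, fneq_fine)]   # step 5


# Two directions with made-up numbers, purely for illustration.
f_fine = assemble_fine_pdf(feq_fine=[0.5, 0.5], fneq_coarse=[0.04, -0.04],
                           omega_c=1.5, omega_f=1.2)
print(f_fine)
```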

For the linear interpolation in 3D, eight coarse points forming a cube are required. Within this coarse cube, eight fine ghost layer cells exist. The velocity is approximated by a second order spatial polynomial, as in

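\[ \vec{u}(x, y, z) = \begin{pmatrix}
a_0 + a_x x + a_y y + a_z z + a_{xx} x^2 + a_{yy} y^2 + a_{zz} z^2 + a_{xy} xy + a_{yz} yz + a_{xz} xz \\
b_0 + b_x x + b_y y + b_z z + b_{xx} x^2 + b_{yy} y^2 + b_{zz} z^2 + b_{xy} xy + b_{yz} yz + b_{xz} xz \\
c_0 + c_x x + c_y y + c_z z + c_{xx} x^2 + c_{yy} y^2 + c_{zz} z^2 + c_{xy} xy + c_{yz} yz + c_{xz} xz
\end{pmatrix}. \tag{25} \]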

Due to the approximation by a second order spatial polynomial, there is a system of 30 coefficients that has to be solved in order to calculate the velocity. This results in 30 equations for 30 unknown coefficients. Each component of u has ten unknown coefficients, which can be solved by a local interpolation. A finite difference scheme is therefore an intuitive way to determine the coefficients. However, this requires a large number of points for the interpolation of the 30 coefficients. For this reason, it was proposed in [QKR19] to use compact interpolation, which calculates 12 coefficients from a set of velocities and 18 coefficients via a finite difference scheme for the strain rates. This enables the calculation of all 30 coefficients with only 4 source elements.

To solve the first set, 4 source elements are chosen with the local coordinates H(0, 0, 0), K(1, 1, 0), M(1, 0, 1) and N(0, 1, 1), resulting in 4 source elements with 3 velocity components each. These equations make it possible to solve 12 of the 30 coefficients.

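\[ \vec{u}(0, 0, 0) = \begin{pmatrix} a_0 \\ b_0 \\ c_0 \end{pmatrix} \tag{26a} \]

\[ \vec{u}(1, 1, 0) = \begin{pmatrix}
a_0 + a_x + a_y + a_{xx} + a_{yy} + a_{xy} \\
b_0 + b_x + b_y + b_{xx} + b_{yy} + b_{xy} \\
c_0 + c_x + c_y + c_{xx} + c_{yy} + c_{xy}
\end{pmatrix} \tag{26b} \]

\[ \vec{u}(1, 0, 1) = \begin{pmatrix}
a_0 + a_x + a_z + a_{xx} + a_{zz} + a_{xz} \\
b_0 + b_x + b_z + b_{xx} + b_{zz} + b_{xz} \\
c_0 + c_x + c_z + c_{xx} + c_{zz} + c_{xz}
\end{pmatrix} \tag{26c} \]

\[ \vec{u}(0, 1, 1) = \begin{pmatrix}
a_0 + a_y + a_z + a_{yy} + a_{zz} + a_{yz} \\
b_0 + b_y + b_z + b_{yy} + b_{zz} + b_{yz} \\
c_0 + c_y + c_z + c_{yy} + c_{zz} + c_{yz}
\end{pmatrix} \tag{26d} \]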

To solve the second set of coefficients, a finite difference scheme (equations (27)) is used for each strain-rate component $S_{\alpha\beta}$.

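\[ 2 \frac{\partial \Phi}{\partial x} = -\Phi(0, 0, 0) + \Phi(1, 1, 0) + \Phi(1, 0, 1) - \Phi(0, 1, 1) \tag{27a} \]

\[ 2 \frac{\partial \Phi}{\partial y} = -\Phi(0, 0, 0) + \Phi(1, 1, 0) - \Phi(1, 0, 1) + \Phi(0, 1, 1) \tag{27b} \]

\[ 2 \frac{\partial \Phi}{\partial z} = -\Phi(0, 0, 0) - \Phi(1, 1, 0) + \Phi(1, 0, 1) + \Phi(0, 1, 1) \tag{27c} \]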
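The finite difference scheme on the four source elements H, K, M and N can be checked with a short Python sketch. The function name is hypothetical; the signs follow the pattern −H + K + M − N for x, −H + K − M + N for y, and −H − K + M + N for z. For a field that is linear in x, y and z, the scheme reproduces the gradient exactly.

```python
H, K, M, N = (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)   # local source coordinates


def compact_gradient(phi):
    """Gradient estimate of phi from its values at H, K, M and N only."""
    h, k, m, n = phi(*H), phi(*K), phi(*M), phi(*N)
    return ((-h + k + m - n) / 2.0,    # d(phi)/dx
            (-h + k - m + n) / 2.0,    # d(phi)/dy
            (-h - k + m + n) / 2.0)    # d(phi)/dz


def linear_field(x, y, z):
    """Arbitrary linear test field with gradient (2, -3, 0.5)."""
    return 1.0 + 2.0 * x - 3.0 * y + 0.5 * z


gradient = compact_gradient(linear_field)
print(gradient)
```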

The following 18 equations can be used to calculate the missing coefficients.

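\[ 2 \frac{\partial S_{xx}}{\partial x} = 4 a_{xx} = -S_{xx}(H) + S_{xx}(K) + S_{xx}(M) - S_{xx}(N) \tag{28a} \]

\[ 2 \frac{\partial S_{xx}}{\partial y} = 2 a_{xy} = -S_{xx}(H) + S_{xx}(K) - S_{xx}(M) + S_{xx}(N) \tag{28b} \]

\[ 2 \frac{\partial S_{xx}}{\partial z} = 2 a_{xz} = -S_{xx}(H) - S_{xx}(K) + S_{xx}(M) + S_{xx}(N) \tag{28c} \]

\[ 2 \frac{\partial S_{yy}}{\partial x} = 2 b_{xy} = -S_{yy}(H) + S_{yy}(K) + S_{yy}(M) - S_{yy}(N) \tag{28d} \]

\[ 2 \frac{\partial S_{yy}}{\partial y} = 4 b_{yy} = -S_{yy}(H) + S_{yy}(K) - S_{yy}(M) + S_{yy}(N) \tag{28e} \]

\[ 2 \frac{\partial S_{yy}}{\partial z} = 2 b_{yz} = -S_{yy}(H) - S_{yy}(K) + S_{yy}(M) + S_{yy}(N) \tag{28f} \]

\[ 2 \frac{\partial S_{zz}}{\partial x} = 2 c_{xz} = -S_{zz}(H) + S_{zz}(K) + S_{zz}(M) - S_{zz}(N) \tag{28g} \]

\[ 2 \frac{\partial S_{zz}}{\partial y} = 2 c_{yz} = -S_{zz}(H) + S_{zz}(K) - S_{zz}(M) + S_{zz}(N) \tag{28h} \]

\[ 2 \frac{\partial S_{zz}}{\partial z} = 4 c_{zz} = -S_{zz}(H) - S_{zz}(K) + S_{zz}(M) + S_{zz}(N) \tag{28i} \]

\[ 2 \frac{\partial S_{xy}}{\partial x} = a_{xy} + 2 b_{xx} = -S_{xy}(H) + S_{xy}(K) + S_{xy}(M) - S_{xy}(N) \tag{28j} \]

\[ 2 \frac{\partial S_{xy}}{\partial y} = 2 a_{yy} + b_{xy} = -S_{xy}(H) + S_{xy}(K) - S_{xy}(M) + S_{xy}(N) \tag{28k} \]

\[ 2 \frac{\partial S_{xy}}{\partial z} = a_{yz} + b_{xz} = -S_{xy}(H) - S_{xy}(K) + S_{xy}(M) + S_{xy}(N) \tag{28l} \]

\[ 2 \frac{\partial S_{yz}}{\partial x} = b_{xz} + c_{xy} = -S_{yz}(H) + S_{yz}(K) + S_{yz}(M) - S_{yz}(N) \tag{28m} \]

\[ 2 \frac{\partial S_{yz}}{\partial y} = b_{yz} + 2 c_{yy} = -S_{yz}(H) + S_{yz}(K) - S_{yz}(M) + S_{yz}(N) \tag{28n} \]

\[ 2 \frac{\partial S_{yz}}{\partial z} = 2 b_{zz} + c_{yz} = -S_{yz}(H) - S_{yz}(K) + S_{yz}(M) + S_{yz}(N) \tag{28o} \]

\[ 2 \frac{\partial S_{xz}}{\partial x} = a_{xz} + 2 c_{xx} = -S_{xz}(H) + S_{xz}(K) + S_{xz}(M) - S_{xz}(N) \tag{28p} \]

\[ 2 \frac{\partial S_{xz}}{\partial y} = a_{yz} + c_{xy} = -S_{xz}(H) + S_{xz}(K) - S_{xz}(M) + S_{xz}(N) \tag{28q} \]

\[ 2 \frac{\partial S_{xz}}{\partial z} = 2 a_{zz} + c_{xz} = -S_{xz}(H) - S_{xz}(K) + S_{xz}(M) + S_{xz}(N) \tag{28r} \]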

All those equations represent a linear system. First, the coefficients in the strain-rate equations (equations (28)) should be solved. Afterwards, the coefficients in the velocity equations (26) can be solved.

In summary, [QKR19] presents a method that requires multiple interpolations: the trilinear interpolation requires a minimum of eight surrounding points, whereas only four are needed for the quadratic interpolation.
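The two-stage solution order described above can be checked with a self-contained Python sketch. It is an illustration under stated assumptions and not the implementation of [QKR19] or waLBerla: all function names are hypothetical, the strain rate is assumed to be $S_{\alpha\beta} = \frac{1}{2}(\partial_\beta u_\alpha + \partial_\alpha u_\beta)$, which matches the coefficient factors in equations (28), and the velocities and strain rates at H, K, M and N are generated from a known polynomial so that the recovered coefficients can be compared with it.

```python
import random

H, K, M, N = (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)
KEYS = ["0", "x", "y", "z", "xx", "yy", "zz", "xy", "yz", "xz"]


def velocity(c, p):
    """One row of the polynomial (25) for the coefficient dict c at point p."""
    x, y, z = p
    return (c["0"] + c["x"] * x + c["y"] * y + c["z"] * z
            + c["xx"] * x * x + c["yy"] * y * y + c["zz"] * z * z
            + c["xy"] * x * y + c["yz"] * y * z + c["xz"] * x * z)


def gradient(c, p):
    """Analytic derivatives of one velocity component with respect to x, y, z."""
    x, y, z = p
    return (c["x"] + 2 * c["xx"] * x + c["xy"] * y + c["xz"] * z,
            c["y"] + 2 * c["yy"] * y + c["xy"] * x + c["yz"] * z,
            c["z"] + 2 * c["zz"] * z + c["yz"] * y + c["xz"] * x)


def strain(coeffs, a, b, p):
    """Assumed strain-rate definition S_ab = (d_b u_a + d_a u_b) / 2."""
    return 0.5 * (gradient(coeffs[a], p)[b] + gradient(coeffs[b], p)[a])


def fd2(phi):
    """Right-hand sides of (27): twice the x, y and z derivative estimates."""
    h, k, m, n = (phi(q) for q in (H, K, M, N))
    return (-h + k + m - n, -h + k - m + n, -h - k + m + n)


def recover(u_at, s_at):
    """Solve the 18 strain-rate equations (28) first, then the velocity equations (26)."""
    pairs = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2)]
    d = {(a, b): fd2(lambda p, a=a, b=b: s_at(a, b, p)) for a, b in pairs}
    A, B, C = {}, {}, {}
    A["xx"], A["xy"], A["xz"] = d[0, 0][0] / 4, d[0, 0][1] / 2, d[0, 0][2] / 2  # (28a-c)
    B["xy"], B["yy"], B["yz"] = d[1, 1][0] / 2, d[1, 1][1] / 4, d[1, 1][2] / 2  # (28d-f)
    C["xz"], C["yz"], C["zz"] = d[2, 2][0] / 2, d[2, 2][1] / 2, d[2, 2][2] / 4  # (28g-i)
    B["xx"] = (d[0, 1][0] - A["xy"]) / 2        # (28j)
    A["yy"] = (d[0, 1][1] - B["xy"]) / 2        # (28k)
    C["yy"] = (d[1, 2][1] - B["yz"]) / 2        # (28n)
    B["zz"] = (d[1, 2][2] - C["yz"]) / 2        # (28o)
    C["xx"] = (d[0, 2][0] - A["xz"]) / 2        # (28p)
    A["zz"] = (d[0, 2][2] - C["xz"]) / 2        # (28r)
    l, m, q = d[0, 1][2], d[1, 2][0], d[0, 2][1]  # (28l), (28m), (28q) are coupled
    A["yz"], B["xz"], C["xy"] = (l - m + q) / 2, (l + m - q) / 2, (m + q - l) / 2
    for comp, c in enumerate((A, B, C)):        # remaining 12 coefficients from (26)
        c["0"] = u_at(comp, H)
        rk = u_at(comp, K) - c["0"] - c["xx"] - c["yy"] - c["xy"]
        rm = u_at(comp, M) - c["0"] - c["xx"] - c["zz"] - c["xz"]
        rn = u_at(comp, N) - c["0"] - c["yy"] - c["zz"] - c["yz"]
        c["x"], c["y"], c["z"] = (rk + rm - rn) / 2, (rk - rm + rn) / 2, (-rk + rm + rn) / 2
    return [A, B, C]


random.seed(1)
truth = [{k: random.uniform(-1.0, 1.0) for k in KEYS} for _ in range(3)]
found = recover(lambda comp, p: velocity(truth[comp], p),
                lambda a, b, p: strain(truth, a, b, p))
max_error = max(abs(found[c][k] - truth[c][k]) for c in range(3) for k in KEYS)
print(max_error)
```

The recovery is exact up to rounding because the strain rates of a quadratic velocity field are linear in x, y and z, for which the finite difference scheme on the four source elements is exact.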


4 Implementation in waLBerla

For the implementation of test cases for the study of grid refinement techniques, the framework waLBerla (widely applicable Lattice Boltzmann from Erlangen) was chosen. waLBerla is a framework for massively parallel simulations and multi-physics applications, as stated in [Wal]. The areas of application for waLBerla are first and foremost Lattice Boltzmann solvers for hydrodynamic problems, but also multigrid and rigid body dynamics.

waLBerla makes it possible to run simulations at scales ranging from laptops to current supercomputers. It already operates on three tier-1 High-Performance-Computing (HPC) systems: JUQUEEN in Juelich, SuperMUC in Munich, and Hazel Hen in Stuttgart. waLBerla is implemented in C++ and can be compiled with GCC, Visual Studio, Clang, or the Intel C++ Compiler on Windows as well as Linux. A key feature is the block decomposition of the domain, which enables parallel computation. Complex geometry can be divided into blocks, which is very useful for data parallel algorithms such as CG, multigrid, or phasefield models.

Simulating fluids using the LBM in waLBerla by making use of block parallel domain decomposition and grid refinement is crucial for this thesis. In waLBerla, all common collision models such as single-relaxation-time (SRT), two-relaxation-time (TRT) and multiple-relaxation-time (MRT) can be chosen for simulations. It can be coupled to the rigid body physics engine (pe) if needed.

The grid refinement scheme mentioned earlier is already part of waLBerla. Furthermore, the grid refinement techniques described in sections 3.5 and 3.6 have been implemented. In preparation for this thesis, compact interpolation was newly implemented in waLBerla, although it is not included in the current distribution.

To create a simulation in waLBerla, the domain has to be declared first. In addition, the boundary conditions have to be assigned. The different blocks can be created via the blockforest module. A plain grid with corresponding ghost layers is created for each block. After the declaration of the domain, all cell values can be initialized.

After the domain has been specified on a grid, the timeloop is created. The timeloop defines the timestep algorithm. It is possible to add certain steps to the timestep, e.g. saving the whole domain after a timestep, including a grid refinement technique, or adding a time logger. Afterward, the timestep can be executed. The whole simulation can be run in one go, or single timesteps can be performed, for example nested in a for loop.

To actively refine a grid in waLBerla, the refinement areas in the domain can be marked and are then refined automatically.

The source files responsible for refinement can be found in src/lbm/refinement. When implementing grid refinement techniques, Timestep.h, PdfFieldPackInfo.h and TimeStepPdfPackInfo.h play an important role. The timestep algorithm for grid refinement is implemented in Timestep.h. There is an implementation of either an asynchronous timestep model or a basic one. PdfFieldPackInfo.h contains code for the communication methods. When implementing a grid refinement technique with a different initial distribution method, this file needs to be modified.


4.1 Existing Techniques

Implementation of Homogeneous Distribution

In section 3.5, a refinement technique proposed by [Roh+06] was described. This technique was implemented in the course of [Sch18]. It has already been validated in waLBerla and is an official part of the framework. The distribution of coarse to fine values as well as the merging of fine to coarse values is executed as stated in equations (18a) and (18b).

The implementation of those two methods can be found in PdfFieldPackInfo.h. communicateLocalCoarseToFine() and unpackDataCoarseToFine() distribute the coarse pdf values into their corresponding fine cells. First, the coarse block (sender) and a neighboring fine block (receiver) have to be determined. The arguments of communicateLocalCoarseToFine() are the sending block, the receiving block and the direction in which both blocks are connected. Then the cell intervals of the area from which the data has to be sent and of the area where it has to be received are constructed. Subsequently, the communication and distribution of the coarse values is carried out. packDataCoarseToFineImpl() and unpackDataCoarseToFine() are similar methods.

For fine to coarse communication, almost identical methods exist. The same concept is shared by communicateLocalFineToCoarse(), packDataFineToCoarseImpl(), and unpackDataFineToCoarse().

Implementation of Linear Interpolation

For the approach introduced by [Che+06], the whole technique is implemented in the file LinearExplosion.h. Linear explosion is integrated into the timestep module Timestep.h. By calling performLinearExplosion() in the timestep module before the simulation starts, a boolean is set to true so that linear explosion is executed. The integration of grid refinement techniques into the timestep is described in section 3.2. Since this method is only designed for coarse to fine communication, the technique of [Roh+06] is used for fine to coarse communication.

In PdfFieldPackInfo.h the communication methods are slightly optimized. Neither communicateLocalCoarseToFine() nor unpackDataCoarseToFine() writes its values into every cell; instead, each coarse value is written into only one of the corresponding fine cells. In 2D, for example, a coarse value is sent from the coarse block to the neighboring fine block, but it is only copied into one of the four fine cells. Since the values are changed by the technique anyway, they do not have to be distributed to all cells; only the information of the corresponding coarse cell is needed. This improves performance, as fewer write operations are executed.

LinearExplosion.h consists of three methods: LinearExplosion(), fillTemporaryCoarseField() and linearInterpolation(). LinearExplosion() is the main method of the file. When it is called in Timestep.h, a pointer to the current finer block (Block *block), which already holds the coarser values in its ghost layers, is passed as a parameter. Figure 15 is a visual representation of LinearExplosion().


Figure 15: Flow diagram of LinearExplosion(). Rectangular boxes represent statements. Trapezoids are for-each loops and diamond-shaped boxes are if conditions. The round boxes enclose the body of a loop, where the iterating element can be found in the box's top left corner.

First, a coarse interval is defined whose size corresponds to that of the fine grid stored in the current block. An example is a fine block with an inner grid ranging over [0, 3] in x-, y- and z-direction. The corresponding coarse interval is then created with the size [0, 1] in x-, y- and z-direction, because two fine cells equal one coarse cell on each axis. Subsequently, the coarse interval is expanded by two cell layers in each dimension and direction. A fine interval, including the ghost layers, is also created. The coarse and the fine interval now have corresponding sizes and cover an equal area of the current block.

Thereafter, each neighbor block of the current block is examined. If it is coarser than the current one, the coarse and fine intervals are copied and modified so that they have corresponding sizes and cover the corresponding space in the direction of the coarser block. Afterward, the intervals are pushed into a coarse and a fine C++ vector to be saved for later use. If these vectors are empty, the method simply returns.

A coarse buffer field is created that has the size of the coarse interval and thus covers the same area as the current block. Furthermore, a coarse boolean field is created, which has almost the same size as the coarse field; it just has one additional layer in each axis and direction to mark the borders of the grid. First, the temporary coarse field is filled with the coarse values of the current fine grid. This task is performed by fillTemporaryCoarseField(). For further optimization, those lines can be run with OpenMP (Open Multi-Processing). Next, each coarse cell in the coarse boolean field is set to false. Afterward, a for loop runs through all coarse intervals, and the cells that lie inside the domain (all cells in the intervals should lie within the domain) are set to true. The boolean field now contains the area where the distribution of the coarse values takes place.

After these first lines of code, a coarse boolean grid is available to determine the cells where the refinement technique needs to be performed, as well as a temporary coarse grid containing the values that have to be distributed.
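The interval arithmetic of the example above (fine interval [0, 3], coarse interval [0, 1]) can be reproduced with a few lines of Python for a single axis. The helper names are hypothetical and do not mirror the cell interval classes of waLBerla; the four fine ghost layers are taken from section 3.3, and intervals are written as inclusive (min, max) pairs.

```python
def coarse_from_fine(fine):
    """Two fine cells per axis form one coarse cell, so indices are halved (floor)."""
    lo, hi = fine
    return (lo // 2, hi // 2)


def expand(interval, layers):
    """Grow an inclusive interval by a number of cell layers on both sides."""
    lo, hi = interval
    return (lo - layers, hi + layers)


def fine_cells_covered(coarse):
    """Inclusive range of fine indices covered by a coarse interval."""
    lo, hi = coarse
    return (2 * lo, 2 * hi + 1)


fine_inner = (0, 3)
coarse_inner = coarse_from_fine(fine_inner)       # (0, 1)
coarse_expanded = expand(coarse_inner, 2)         # two coarse layers on each side
fine_with_ghosts = expand(fine_inner, 4)          # four fine ghost layers on each side
print(coarse_expanded, fine_with_ghosts, fine_cells_covered(coarse_expanded))
```

The expanded coarse interval covers exactly the fine interval including its ghost layers, which is the sense in which both intervals cover an equal area.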


Figure 16: Flow diagram of LinearInterpolation()

The method linearInterpolation() performs the core task of LinearExplosion(): here the interpolation takes place. For loops, which can be parallelized with OpenMP, go through all fine intervals. Within the for loops, linearInterpolation() is called. First, the function checks whether the current fine cell lies inside the domain; subsequently, it checks whether all the other fine cells that correspond to the coarse cell are part of the domain. Afterwards, the coordinates of the corresponding coarse cell are calculated, followed by those of all the coarse neighbours needed for the interpolation. This results in two arrays, Cell min[3] and Cell max[3], in which all the coarse neighbor cell indices are defined. The datatype Cell can hold exactly three coordinates to determine a cell position on the grid. Only the neighbors in a direct axis direction are calculated. This means that only the cells below and above the current cell in x-, y- and z-direction are stored in the arrays.

In the interpolation, the coarse pdf value is first saved in v, and a vector of size three is created for the gradient. Afterwards, the gradient in each direction is calculated if the boolean field is set to true for the smaller and the larger neighbor. Subsequently, the gradient is normalized and applied to v, which is saved in the corresponding fine cell. Because of the spatial distribution, the gradient components are multiplied by the distance from the coarse to the fine cell, which is either +0.25 or −0.25.

Via ifdef it is also possible to perform an extrapolation on each axis, which was implemented by Schornbaum and is not mentioned in [Che+06]. This modification is not active in the current distribution of waLBerla, but will still be evaluated later on. The extrapolation is performed if there is only one neighbor cell instead of two in an axis direction. In that case, the current coarse value and the other coarse cell are used to perform an interpolation for half of the fine cells and an extrapolation for the other half.

It should be mentioned that in both methods, if there are no interpolation or extrapolation neighbors, the current coarse pdf value is assigned unchanged to the corresponding fine cells.

4.2 Newly Implemented Techniques

Compact interpolation, as shown in [QKR19], was newly implemented in this thesis for testing and evaluation. Like linear explosion, it is a grid refinement technique that is only designed for coarse to fine communication. This approach was integrated into waLBerla in the same way as linear explosion. Again, a boolean value can be set in the timestep module before the simulation starts, which decides whether compact interpolation is executed or not.

This method is split into two files: CompactInterpolation.h and compactInterpolater.h. CompactInterpolation.h implements the main bulk of the functionality, whereas compactInterpolater.h implements everything related to interpolating values. The current implementation is only designed for three-dimensional cases. Additionally, it should be mentioned that the relaxation rate ω is assumed to be constant.

CompactInterpolation() is executed if this grid refinement technique is chosen and marked as true. First, the proper intervals are selected to decide where the distribution of coarse values should take place. This process was taken from LinearExplosion.h. Afterward, a coarse boolean field, a fine boolean field, and a coarse buffer field are created. Since macroscopic quantities like velocity, strain rate, and density need to be calculated, as mentioned in section 3.7, it is more efficient to calculate these values only once per timestep and store them in a field. To avoid the recalculation, coarse fields for density, velocity, strain rate, $f^{eq}$, and $f^{neq}$ are created, resulting in a total of six coarse buffer fields for values needed for interpolation and two temporary boolean fields to determine the cells for interpolation. Additionally, the relaxation parameters ($\omega_c$, $\omega_f$) of the coarse and the current fine block have to be calculated. With neighboring blocks having a maximal level difference of one, all coarse neighbor blocks possess the same relaxation parameter. Afterward, the boolean and temporary fields are filled. This task is separated into multiple methods. A graphical representation of this method is presented in figure 17.


Figure 17: Flow diagram of LinearExplosion().
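The idea of computing the macroscopic quantities once and storing them in coarse buffer fields can be sketched as follows. This is a minimal illustration and not the waLBerla implementation: the names Macroscopic, computeMacroscopic and fillCoarseBuffer are hypothetical, the buffer is a plain std::vector instead of a waLBerla field, and the velocity is assumed to be the first moment divided by the density (the exact definitions are those of section 2.1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Cached macroscopic values of one coarse cell (hypothetical helper type).
struct Macroscopic {
    double density;
    std::array<double, 3> velocity;
};

// Computes density and velocity from the pdfs of a single cell.
// directions[i] is the lattice velocity c_i that belongs to pdfs[i].
// Assumption: velocity = (sum_i c_i f_i) / density.
Macroscopic computeMacroscopic(const std::vector<double>& pdfs,
                               const std::vector<std::array<int, 3>>& directions)
{
    assert(pdfs.size() == directions.size());
    double rho = 0.0;
    std::array<double, 3> momentum{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < pdfs.size(); ++i) {
        rho += pdfs[i];
        for (std::size_t d = 0; d < 3; ++d)
            momentum[d] += pdfs[i] * directions[i][d];
    }
    assert(rho > 0.0);
    return {rho, {momentum[0] / rho, momentum[1] / rho, momentum[2] / rho}};
}

// Fills the buffer once per timestep; later interpolation steps only read it.
std::vector<Macroscopic> fillCoarseBuffer(
    const std::vector<std::vector<double>>& coarseCells,
    const std::vector<std::array<int, 3>>& directions)
{
    std::vector<Macroscopic> buffer;
    buffer.reserve(coarseCells.size());
    for (const auto& pdfs : coarseCells)
        buffer.push_back(computeMacroscopic(pdfs, directions));
    return buffer;
}
```

Each interpolation that needs the density or velocity of a coarse cell then reads the buffer entry instead of summing over the pdfs again.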

At this point, all the information needed to start the interpolation is available. All formulas for computing the macroscopic quantities were given in section 2.1. If there are at least eight neighbors, compactInterpolation() performs compact interpolation; if not, none of the inner cell values are changed (they already contain the corresponding coarse values). As previously stated in section 3.7, $f^{neq}$ and the pressure are interpolated linearly, and the velocity is interpolated quadratically. This scheme was implemented in compactInterpolation() as described by [QKR19]. For testing purposes, two other variants were also implemented based on the approach given in section 3.7.

The method that executes the interpolations first creates coarse and fine vectors for $f^{neq}$ and the density. A vector for the fine values of $f^{eq}$ is also created. The coarse values essential for the current interpolation are copied from the buffer grids into the vectors. All of the following linear interpolations need eight neighbors, whereas the quadratic interpolation proposed by [QKR19] needs only four. After initializing the vectors, performTrilinearInterpolation(),


of the module compactInterpolater.h, is called to interpolate the density and $f^{neq}$. The function performs a trilinear interpolation for all eight fine values, as shown in equations (29). The concept of trilinear interpolation was taken from [Gro18]. Figure 18 is similar to the one proposed in [Gro18].

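\begin{align}
f_{00} &= f_{000}\,(1-\alpha) + f_{100}\,\alpha \tag{29a}\\
f_{01} &= f_{001}\,(1-\alpha) + f_{101}\,\alpha \tag{29b}\\
f_{10} &= f_{010}\,(1-\alpha) + f_{110}\,\alpha \tag{29c}\\
f_{11} &= f_{011}\,(1-\alpha) + f_{111}\,\alpha \tag{29d}\\
f_{0} &= f_{00}\,(1-\beta) + f_{10}\,\beta \tag{29e}\\
f_{1} &= f_{01}\,(1-\beta) + f_{11}\,\beta \tag{29f}\\
f &= f_{0}\,(1-\gamma) + f_{1}\,\gamma \tag{29g}
\end{align}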

Figure 18: Visual Example of the trilinear interpolation for equations (29).
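Equations (29) translate almost directly into code. The following is a minimal sketch and not the actual performTrilinearInterpolation() from compactInterpolater.h: the names Corners, lerp1d and trilinear are hypothetical, and the corner values are assumed to be indexed as f[x][y][z] so that f[1][0][0] corresponds to f100.

```cpp
#include <array>
#include <cassert>

// Corner values indexed as f[x][y][z], matching f000 ... f111 in equations (29).
using Corners = std::array<std::array<std::array<double, 2>, 2>, 2>;

// One-dimensional linear interpolation between a (t = 0) and b (t = 1).
inline double lerp1d(double a, double b, double t)
{
    return a * (1.0 - t) + b * t;
}

// Trilinear interpolation following equations (29a) to (29g).
// alpha, beta, gamma are the normalized positions in x, y and z.
double trilinear(const Corners& f, double alpha, double beta, double gamma)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    assert(beta >= 0.0 && beta <= 1.0);
    assert(gamma >= 0.0 && gamma <= 1.0);

    // (29a)-(29d): interpolate along x on the four edges.
    const double f00 = lerp1d(f[0][0][0], f[1][0][0], alpha);
    const double f01 = lerp1d(f[0][0][1], f[1][0][1], alpha);
    const double f10 = lerp1d(f[0][1][0], f[1][1][0], alpha);
    const double f11 = lerp1d(f[0][1][1], f[1][1][1], alpha);

    // (29e)-(29f): interpolate along y.
    const double f0 = lerp1d(f00, f10, beta);
    const double f1 = lerp1d(f01, f11, beta);

    // (29g): interpolate along z.
    return lerp1d(f0, f1, gamma);
}
```

A field that is linear in x, y and z is reproduced exactly by this scheme. Assuming the fine cell centers lie a quarter of a coarse cell away from the coarse cell centers, the weights for the fine cells inside the cube spanned by eight coarse cell centers would be 0.25 or 0.75 on each axis.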

Afterward, performQuadraticInterpolation() is executed to perform a quadratic interpolation of the velocity. The equation system that is solved to calculate the coefficients for the quadratic interpolation is described in section 3.7. The fine $f^{eq}$ values are computed from the previously interpolated fine density and velocity. Afterward, $f^{neq}$ is rescaled. Lastly, the corresponding fine cells are assigned by $f_i = f_i^{eq} + f_i^{neq}$.

Besides this, two additional configurations of the algorithm were tested. One of these simply performs a trilinear interpolation on the coarse $f^{eq}$ values to obtain the fine $f^{eq}$ values. This configuration saves a lot of computational cost, but the loss of accuracy turned out to be very high. For this reason, it will not be mentioned further.

The other configuration performs a trilinear interpolation of the velocity instead of a quadratic interpolation. This requires eight neighbors instead of four, but since eight neighbors are also required for the interpolation of the density and the non-equilibrium, it does not change which cells can be interpolated. This configuration will be evaluated alongside the actual algorithm described in [QKR19].


Figure 19: The orange area can be interpolated after receiving the coarse values.

The lack of coarse cells poses a problem. As presented in figure 19, coarse cell values (orange dots) are distributed to the fine cells. The orange lines represent the area where compact interpolation can take place. The cells located on the border of the area where the coarse values have to be distributed cannot be interpolated, since there are not enough coarse neighbors present. This is a major problem, as the ghost layer closest to the inner part of the block cannot be interpolated. Those cells just get their corresponding coarse value assigned.

To tackle this, the method fillOuterBoolField() can be executed. The idea is to give more cells the ability to be interpolated by adding more coarse cells to the temporary field at the border of the interval.

Each cell marked true checks whether it is on the border of the boolean field. If so, the fine neighbor cells that make up a single coarse cell are transformed into a coarse cell, as in the fine to coarse communication (equation (18a)). The coarse cell is then written into the temporary coarse field. Therefore, more cells are available for the interpolation. Only coarse cells that are either inside the block or in direct proximity to the block are interpolated.

This is visualized in figure 20. The coarse cell values (orange dots) are distributed onto the fine cells. The orange and purple lines represent the area where compact interpolation is possible. The purple dots are the coarse cell values that were not sent during the coarse to fine communication and were instead built from the existing fine cell values. Within the purple areas, only areas marked as true on the boolean field are interpolated. This results in an interpolation of the ghost layer cells on the marked interval.

Nevertheless, there is the complication that ghost layer values from the equal level communication are needed. For this reason, compact interpolation was tested at a later point in the timestep, where the equal level communication was already done. However, not enough values are exchanged in the equal level communication to create a coarse point. This is illustrated in figure 21. The equal level communication took place as described in section 3.3.1. As a consequence of this scheme, pdf values are missing, so some coarse values on the ghost layer cannot be created. Located above block 1 is a coarse block that sent its values to block 1 (orange dots). Equal level communication between block 1 and block 2 already took place (purple pdf values). The purple dots represent the coarse cells that are needed to interpolate some areas of the innermost ghost layer. The red boxes signal that pdf values are missing. Thus, the creation of the right coarse point is not possible.

The current implementation contains an option to execute the method isThereBoarderCell(). It checks whether there is a coarse value next to the cells on the border of the interval, so that the innermost ghost layer can be interpolated. As not all coarse points next to the interval can be created, this method only interpolates some areas of the innermost ghost layer. The area is depicted in figure 22. If the equal level communication is changed at a later point, this method can easily be modified to select all neighboring coarse points. However, for now, only some


areas of the innermost ghost layer can be interpolated. Therefore, the innermost ghost layer is inconsistent, which makes this approach unusable at the moment.

Figure 20: The interpolation area (orange and purple) when creating additional coarse points (purple).

Figure 21: Some coarse values cannot be created.


Figure 22: The current possible area of interpolation.

Before the methods were validated on the test cases, components such as the trilinear interpolation and the search for cells were carefully tested. To calculate the macroscopic quantities, methods already implemented in waLBerla were used, except for the calculation of the strain rates. Only the ghost layer cells in the area seen in figure 19 were interpolated when running the test cases. In figure 22, purple lines mark the area that can be interpolated at the current level of implementation by activating isThereBoarderCell(). The gray dots represent the coarse cells that cannot be created due to a lack of information in the ghost layers, which are filled in the equal level communication process.


5 Validation of Test Cases

Three test cases are validated in the following. The test cases are deliberately simple. That way, other sources of error can be excluded and the errors that appear can be traced back to the refinement. All test cases possess a 3D Couette flow profile but have different refined areas. There is basically no difference between a 2D and a 3D Couette flow profile.

Figure 23: A Couette flow is a linear flow profile that develops between one moving and one stationary surface.

A Couette flow describes the flow profile of a viscous fluid between two surfaces, where one of the surfaces moves with a certain speed. The moving plate exerts viscous drag on the fluid, resulting in a linear velocity profile between the two plates. Figure 23 illustrates this principle. In the current test cases, both surfaces are parallel to the x-y plane and the upper surface moves along the x-axis. The velocity is therefore described in the following way:

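\begin{equation}
\mathbf{u} = \begin{pmatrix} u_{\mathrm{surface}} \left( \dfrac{z}{z_{\max}} \right) \\ 0 \\ 0 \end{pmatrix}. \tag{30}
\end{equation}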

This physical setup was used for all test cases, although the domain was refined differently for each test case. All domains were refined by one level. For the first test case, the domain was refined horizontally, in the direction of the z-axis, at $z = z_{\max}/2$. The second domain was refined vertically, in the direction of the x-axis, below $x = x_{\max}/2$. The vertical and horizontal refinement are displayed in figure 24 and figure 25. A coarse block contains 4x4x4 cells and a fine block contains 8x8x8 cells.

Figure 24: The left picture shows the blocks of the domain and the right picture shows the cells inside the domain.


Figure 25: The left picture shows the blocks of the domain and the right picture shows the cells inside the domain.

From now on, the vertically refined test case is referred to as test case V and the horizontally refined test case as test case H. The test cases were evaluated and validated using ParaView. To acquire exact results, the function makeAccuracyEvaluationLinePlot() from the field module was called. This method requires the velocity of each cell on a continuous line of cells. Since the exact solution of the Couette flow is known, the absolute and relative error can be calculated. For the horizontally refined case, one line parallel to the z-axis goes through the domain and therefore through the coarse as well as the fine grid. For the vertically refined case, there are three lines parallel to the z-axis: one on the coarser side of the grid, one on the finer side, and one directly on the fine cells bordering the coarse block. The three different positions need to be evaluated since the grid refinement technique influences the whole domain. For each plot, the position of the line within the domain is visible at the top. For test case V, the evaluation points were at (x = 3.5, y = 4.5), (7.5, 8.5) and (13.0, 9.0).

The two surfaces have no-slip boundary conditions, and the other four faces have periodic boundary conditions.
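The error evaluation along a line of cells described above can be sketched as follows. This is only an illustration of the principle and not the waLBerla function makeAccuracyEvaluationLinePlot(): the names couetteVelocityX, LineError and evaluateLine are hypothetical, and the cell center positions and simulated x-velocities are assumed to be given as plain vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact x-velocity of the Couette flow according to equation (30).
double couetteVelocityX(double z, double zMax, double uSurface)
{
    assert(zMax > 0.0);
    return uSurface * (z / zMax);
}

// Largest absolute and relative error found on one evaluation line.
struct LineError {
    double maxAbsolute;
    double maxRelative;
};

// Compares simulated x-velocities on a line parallel to the z-axis
// with the exact solution. zCenters[i] is the z-position of cell i.
LineError evaluateLine(const std::vector<double>& zCenters,
                       const std::vector<double>& simulatedUx,
                       double zMax, double uSurface)
{
    assert(zCenters.size() == simulatedUx.size());
    LineError result{0.0, 0.0};
    for (std::size_t i = 0; i < zCenters.size(); ++i) {
        const double exact = couetteVelocityX(zCenters[i], zMax, uSurface);
        const double absErr = std::abs(simulatedUx[i] - exact);
        result.maxAbsolute = std::max(result.maxAbsolute, absErr);
        // The relative error is undefined where the exact velocity is zero.
        if (exact != 0.0)
            result.maxRelative = std::max(result.maxRelative, absErr / std::abs(exact));
    }
    return result;
}
```

Because the exact velocity approaches zero near the stationary surface, even a small absolute error produces a large relative error there.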

5.1 Homogeneous Distribution

With the homogeneous distribution of pdf values as the refinement technique, both test cases converged just before 500 iterations. To be on the safe side, the data for the validation was analyzed after 650 timesteps. For test case H, the accuracy of the velocity was very high, with no significant errors. As seen in equation (30), the exact velocity of the Couette flow in y- and z-direction should be zero. In the simulation, the velocity in y- and z-direction was in the range between $10^{-19}$ and $10^{-20}$, which can be attributed to numerical inaccuracy. Other than that, the relative error is highest where the fluid meets the non-moving surface. The greatest absolute error occurred at the transition between the coarse and the fine grid. The velocity in x-direction is shown on the right side of figure 27.

Figure 26: The velocity in x-direction after 650 timesteps for test cases H and V.


Figure 27: The velocity in x-direction after 650 timesteps for test case V on the left and for test case H on the right.

In test case V, the velocity in x-direction fits the profile of the Couette flow on the coarse grid as well as on the fine grid. Directly at the border between the fine grid and the coarse grid, the simulated velocity in x-direction is shaped like a staircase function and clearly differs from the ideal result, see figure 27. This is due to the distribution of the coarse values to the fine cells. Since all corresponding fine cells are assigned the same coarse value, there is no linear velocity decline in the fine cells.

Figure 28: The velocity in x-direction after 650 timesteps for test case V on the fine and coarse grid.

As seen in figure 28, there is no serious deviation of the x-velocity on the fine and coarse grid. The velocity in y-direction lies in the range between $10^{-19}$ and $10^{-20}$ on the coarse and the fine grid.

A fundamental problem occurs for the velocity in z-direction. It affects the current and all later evaluated grid refinement techniques.


Figure 29: The velocity in x-direction after one timestep in test case V.

Figure 30: The velocity in z-direction after one timestep in test case V.

The starting velocity for the whole simulation domain is zero, so every cell has an initial velocity of zero while the first timestep is performed. The timestep algorithm's first two streaming and collision steps are performed on the fine grid. It is stated in [Sch18] that only the two innermost ghost layers perform collision steps and are affected by boundary conditions. This implies that the boundary treatment does not affect all the ghost layers and that there is still some non-moving fluid mass after one timestep on the fine grid. This slows down the cells neighboring the coarse grid, see figure 29. As a result of the slowed-down border cells, the cell below such a cell can stream better into the coarse cells. This draft subsequently influences the cell below, which now has a small velocity in the z-direction, see figure 30. Afterward, the velocity in z-direction is transferred onto the coarse grid via the coarse to fine communication.

The same process happens within the second step, as depicted in figures 31 and 32. Here, the velocity in x-direction is not constant because of the inconsistency caused by the first timestep. The inconsistency is then transferred onto the coarse grid, see figure 31.

The erroneous velocity in z-direction appears again at the fourth row of fine cells below the surface, see figure 32.


Figure 31: The velocity in x-direction after two timesteps in test case V.

Figure 32: The velocity in z-direction after two timesteps in test case V.

The inconsistency of the velocity in x-direction on the y-z plane vanishes with increasing iterations, except for the fine border region. The error of the velocity in z-direction appears to be a general problem that does not directly originate from the homogeneous grid refinement technique, see figure 33. This error appeared with all grid refinement techniques tested for the vertically refined case.

Figure 33: The velocity in z-direction after 650 timesteps in test case V.


5.2 Linear Interpolation

When testing the linear interpolation, the simulation for both test cases converged at around 500 timesteps, with the validation data being taken at 650 timesteps.

The results of the horizontally refined test case were identical to the results with the homogeneous distribution. As mentioned in section 4.1, linear explosion needs three coarse points in each axis direction. In x- and y-direction, there are more than enough coarse points for the interpolation. In those directions the gradient is calculated, but it is zero since all pdf values are the same on the x-y plane.

The gradient in z-direction should not be zero, as the cells along the z-axis possess different velocities. However, the gradient is never calculated in this direction, because three coarse points on the z-axis would be needed. Since there are only four fine ghost layers in z-direction, only two coarse points are available. The whole gradient is therefore zero, so the plain coarse value is assigned to each corresponding fine cell, which is the same operation as homogeneous distribution.
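The fallback described above can be illustrated with a one-dimensional sketch. It is not the code from LinearExplosion.h: the function name explodeCell1D is hypothetical, the gradient is assumed to be a central difference over the two coarse neighbors, and the extrapolation variant is left out. The offsets of ±0.25 are the distances from the coarse to the fine cell centers mentioned in section 4.1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the two fine values (lower, upper) that replace coarse cell i
// along one axis. Assumption: the gradient is a central difference and
// is only available if both coarse neighbors exist (three coarse points).
std::array<double, 2> explodeCell1D(const std::vector<double>& coarse, std::size_t i)
{
    assert(i < coarse.size());
    double gradient = 0.0;  // no neighbors on both sides: keep the coarse value
    if (i > 0 && i + 1 < coarse.size())
        gradient = 0.5 * (coarse[i + 1] - coarse[i - 1]);

    // The fine cell centers lie 0.25 coarse cells below and above the
    // coarse cell center.
    return {coarse[i] - 0.25 * gradient, coarse[i] + 0.25 * gradient};
}
```

With three coarse points {1, 2, 3}, the middle cell yields the fine values 1.75 and 2.25. With only two coarse points, as in the z-direction of test case H, no cell has two neighbors, so every fine cell receives the plain coarse value, which is exactly what homogeneous distribution does.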

Figure 34: The velocity in x-direction after 650 timesteps in test case V.

For test case V, the absolute error on the coarse and the fine grid is approximately ten times higher than with homogeneous distribution.

This has two causes. The first is that linear explosion is not performed for the first and the last two fine layers of cells in the x-y plane. The gradient there is zero, as all the coarse cells have the same values in y-direction and the upper coarse cell in z-direction is a boundary cell. In x-direction the gradient cannot be calculated because of a lack of cells in this direction. Linear explosion is first performed between the third and the fourteenth cell layer, where the surrounding neighbors allow a gradient in z-direction. Since the first and last two cell layers do not have interpolated ghost layers at the border, the inconsistency introduces an error. The absolute error peaks as soon as a gradient can be calculated and the ghost layers get interpolated, see figure 35. In figure 36, a significant error increase in x-velocity can be observed at the points where the absolute error peaks.


Figure 35: The absolute error after 650 timesteps in test case V.

Figure 36: The velocity in x-direction after 650 timesteps in test case V. The second image is an enlargement of the plot on the left to better showcase the velocity deviation.

The second reason is the increased error of the velocity in z-direction. With linear interpolation, all values are interpolated, even errors. After one timestep, the error of the velocity in z-direction looks the same as with homogeneous distribution, see figure 30 and figure 37. After the second timestep, more cells have a velocity in z-direction (figure 38) than with homogeneous distribution (figure 32).

This is due to the calculation of the gradient. When a coarse cell without an error is interpolated with a cell containing an error, the error is propagated into the gradient and onto the corresponding fine cells. This means that more fine cells in the ghost layer contain a velocity in z-direction, which is later propagated onto the coarse grid during the coarse to fine communication. This can be seen in figure 39.

The increase in the absolute error on both grids shows that those errors are propagated in each of the grids.


Figure 37: The velocity in z-direction after 1 timestep in test case V. The velocity is exactly the same as in figure 30.




Figure 40: The velocity in x-direction after 650 timesteps in test case V. The second image is an enlargement of the plot on the left to better showcase the velocity deviation.

Directly at the border next to the coarse cells, the absolute error is smaller than with homogeneous distribution. The cause for this is the grid refinement technique. Its advantage can be seen in figure 40, where the velocity fits the exact solution better.

The additional extrapolation mentioned in section 4.1 did not achieve a higher accuracy.

When executing linear explosion with extrapolation in test case H, an additional error in the z-


and y-direction of the velocity occurred. The error is in the range of $10^{-5}$ and appears at the border between the fine and coarse grid. After careful backtracing, the reason for the error was found. Without extrapolation there are not enough neighboring coarse cells for the interpolation in z-direction, but with it the gradient can be calculated. Afterward, the gradient is normalized as described in section 3.6. In this specific example, the normalization also redistributed the gradient in x- and y-direction (the gradients of $f_{11}$ to $f_{18}$ had non-zero elements in either x- or y-direction after being normalized), resulting in a velocity change in x- and y-direction.

In the vertically refined test case, the same error occurred as without extrapolation. Additionally, a velocity in y-direction appeared. This has the same underlying causes as mentioned above. Without extrapolation there would not be enough coarse cells in x-direction, but with extrapolation those cells are also extrapolated and the errors are distributed again.

5.3 Compact Interpolation

Compact interpolation is the last evaluated grid refinement technique. First, the algorithm described in section 3.7 is tested; afterward, the configuration in which the velocity is interpolated linearly instead of quadratically is tested. Again, all simulations converged at around 500 timesteps, and the data for the validation was taken at 650 timesteps.

Figure 41: The velocity in x-direction after 650 timesteps in the horizontally refined domain.

The results for the horizontally refined test case did not meet the expectations, because the velocity in x-direction is too high and the velocity in y- and z-direction has a deviation of around $10^{-8}$. The velocity in y- and z-direction can be explained by looking at the quadratic interpolation. There, the strain-rate tensor derivative $\partial S_{xx}/\partial z$ is not zero, which is to be expected since the velocities change in z-direction. However, the entries $\partial S_{yy}/\partial z$ and $\partial S_{zz}/\partial z$ are also not zero. This introduces an error for the velocity in y- and z-direction.

The velocity increase in x-direction originates from a different problem, which has nothing to do with the interpolation itself, but rather with the area of interpolation.

Since the velocity changes linearly in z-direction, the velocities of the ghost layer cells should adapt to that. The expected distribution in z-direction in the innermost two ghost layers would be that the second row has a slightly higher velocity and the innermost row a slightly lower velocity than the original coarse value. The problem that arises here is that only the second ghost layer has the interpolated values with the slightly higher velocities.

Before going into depth, it should be stated again that two more timesteps are performed on the fine grid than on the coarse grid. This means that the first two ghost layers directly influence the fine cells at the border, see figure 7. The third and fourth ghost layers are needed so that the two innermost ghost layers contain valid values for the fine to coarse communication. Consequently, mainly the two innermost ghost layers influence the values of the bordering cells.

In the current implementation, the first ghost layer row gets the original coarse values assigned, because it is not interpolated (figure 19 shows the interpolation areas). Now the first row


does have a slightly higher velocity than it should and the second row has the interpolated velocity. The problem is that the overall velocity of those two layers is too high, which results in a velocity increase for the fine cells at the border. Figure 43 shows that the velocity peak in x-direction is in the first layer of fine cells.

A visual example of the problem is depicted in figure 42. The ideal distribution is on the left and the actual case on the right. It can be seen that the fine cells on the right side have, in total, a higher velocity than the cells on the left.

Figure 42: On the left is the ideal distribution of values in the ghost layers. On the right is the current distribution of values.

Figure 43: The velocity in x-direction after 650 timesteps in test case V.


Figure 44: The velocity in x-direction after 2 timesteps in test case V.

For the vertically refined test case, the same error of the velocity in y-direction applies. Since the strain-rate tensor is very sensitive to small differences, the interpolation loses accuracy. Figure 44 shows the velocity in x-direction after two timesteps. It is noticeable that the coarse cell next to the fine cells has a higher velocity than the other coarse cells. In the process of coarse to fine communication, this error is propagated onto the fine grid, so the fine cells contain this error after the interpolation. This can be observed in the velocity in x-direction in test case H, which is slower than the velocity of the exact solution, see figure 45. At first, it looks as if the simulation did not converge, but even after 2000 timesteps there was no change in the velocity.

Figure 45: The velocity on the border in x-direction after 650 timesteps in test case H.

That the errors are produced by the sensitivity of the strain-rate tensor is shown in the following. While considering different approaches, the idea came up to perform a trilinear interpolation instead of a quadratic interpolation with the strain-rate tensor. The expectation was that the trilinear interpolation is less accurate but not as sensitive to errors as interpolating with the strain-rate tensor.

In the vertically refined case, the errors of the velocity in y- and x-direction vanish. Looking at the velocity in x-direction at the border between the grids, it becomes clear that it fits the exact solution almost perfectly, see figure 46. The only remaining problem is the error of the velocity in x-direction in the horizontally refined test case. As previously explained, however, this error originates from a lack of interpolated cells rather than from the interpolation itself. In y- and z-direction, the error drops to $10^{-17}$ for the horizontally refined case. The deviation of the velocity in x-direction on the coarse and fine grid can be seen in figure 47.


Figure 46: The velocity on the border in x-direction after 650 timesteps in test case V. The second image is an enlargement of the plot on the left to better showcase the velocity deviation.

Figure 47: The velocity on the fine and coarse grid in x-direction after 650 timesteps in test case V.


5.4 Large Couette Test

A larger domain that combines the two previous refinement styles was chosen as a final test. For this test, the old domain size was doubled, and the domain now contains a cuboid with a finer grid, see figure 48. The large domain has the dimensions 32x32x32. For better visibility, the figure displays only the blocks, not all cells. The refined cuboid ranges over [16,32] in x-direction, [0,32] in y-direction and [8,24] in z-direction.

Figure 48: The blocks of the large domain.

In this test case, the following grid refinement techniques were evaluated: homogeneous distribution, linear explosion (without extrapolation), compact interpolation, and compact interpolation with the modification where the velocity is interpolated only linearly. In general, all previous observations are confirmed in this test case. The simulation converged at around 1300 timesteps, and the data for the evaluation was taken at 2000 timesteps.

For the homogeneous distribution, the velocity in x-direction at the vertical border again differed from the exact solution in a step-function-like manner. The velocity in x-direction was almost exact in the middle of the refined grid. A small deviation of the velocity in x-direction at the horizontal borders can be noticed in the middle of the cuboid. The velocity in y-direction was in the range of $10^{-19}$. Linear explosion also showed a difference of the velocity in x-direction at the vertical border in the form of a step function. The difference between the simulated velocity and the exact velocity was smaller than with homogeneous distribution. In the middle of the cuboid, the velocity in x-direction differed from the exact solution at the horizontal border.


Figure 49: Velocity in x-direction of homogeneous distribution, linear explosion, compact interpolation.

For compact interpolation, the velocity in x-direction is close to the exact solution at the border as well as inside the cuboid. An error in x-direction occurs only at the horizontal border, which is due to the lack of interpolated cells for the horizontally refined grid, as previously described. By interpolating the velocity trilinearly, the velocity error in y-direction drops from $10^{-8}$ to $10^{-19}$. Figure 49 depicts the x-velocity on the vertical border of the domain. The plot on the left shows the x-velocity when using homogeneous distribution, the middle plot when using linear explosion, and the plot on the right when using compact interpolation. All lines went through the vertical fine cells bordering the coarse grid. Linear explosion produces the results closest to the ideal solution, since its errors at the horizontal borders are the least significant.


Figure 50: The absolute errors in terms of velocity on a line through coarse cells and the finer domain. (1) is the absolute error of homogeneous distribution, (2) of linear explosion, (3) of compact interpolation and (4) of compact interpolation with a trilinear interpolation of the velocity.

When simulating in the large domain, the absolute error inside the cuboid was smallest when using linear interpolation. Linear explosion achieved the smallest absolute error directly at the horizontal borders. In terms of the velocity in x-direction, compact interpolation yields a smoother approximation at the vertical border, but there are very serious errors at the horizontal borders, which increase the absolute error. This is best illustrated by the peaks of the absolute error, which occur at the horizontal borders.


Figure 51: The absolute errors in terms of velocity on a line through coarse and fine cells that lie on the vertical border of the fine domain. (1) is the absolute error of homogeneous distribution, (2) of linear explosion, (3) of compact interpolation and (4) of compact interpolation with a trilinear interpolation of the velocity.

All grid refinement techniques tested for the Couette flow in the large domain had difficulties at the edges of the refined grid. The x- and z-velocities show a noticeable error increase there, with linear explosion showing the most accurate results of all.

A general observation is that the absolute error oscillates strongly at the border when using homogeneous distribution. With linear explosion the absolute error becomes smoother, and with compact interpolation it becomes smoother still.


6 Conclusion

In this thesis, three grid refinement techniques for the Lattice Boltzmann method were presented. All are implemented in the same refinement scheme, although they come from very different theoretical backgrounds. Homogeneous distribution assigns each coarse cell value to its corresponding finer cells, disregarding the coarse neighbors. With linear interpolation, the coarse pdf values are interpolated before they are assigned to the finer cells. Compact interpolation relies on interpolating the equilibrium and non-equilibrium with the help of macroscopic quantities prior to the distribution onto the fine cells.

In terms of resource intensity, the computational cost rises with the number of interpolations. Consequently, homogeneous distribution has the lowest computational cost and compact interpolation the highest.

The highest accuracy within the small domain is achieved using homogeneous distribution. Homogeneous distribution and linear explosion achieved the same accuracy in the horizontally refined domain, but in the vertically refined domain homogeneous distribution achieved the highest accuracy.

The most accurate results in the large domain were achieved by linear explosion. Although the velocity in x-direction was less smooth at the vertical border than with compact interpolation, there were no severe errors in the velocity. Compact interpolation, on the other hand, achieved a smoother velocity in x-direction at the vertical border, but still has unresolved issues in the implementation.

Since compact interpolation was newly implemented during this thesis, special attention was paid to solving the problems in the implementation. The major problem of the implementation is that the innermost row of ghost layers is not interpolated. Multiple attempts were made to increase the number of interpolated cells, but no satisfactory results were achieved.

A possible solution for the problem of not having enough data from equal-level neighbors would be to modify the amount of pdf data that is sent to equal-level neighbors if a coarser block is in proximity. Another idea for interpolating the innermost ghost layer is to assign all sent coarse data to their corresponding fine cells. Afterward, new coarse cells are created with a shift of one fine cell. The newly created coarse cells have nothing to do with the neighboring coarse block. With this shift of one, the innermost ghost layer can be interpolated, but as a result, many other interpolation areas are lost. This concept is depicted in figure 52. On the left side, the sent coarse values (gray) are not used for the interpolation. Instead, new coarse values (orange) are created. Now the innermost ghost layer can be interpolated, but at the expense of some of the orange coarse values not being created due to a lack of neighboring information (right side).

Figure 52: Another concept for interpolating the innermost ghost layer.


In the current framework developed by the authors of [QKR19], compact interpolation is performed by interpolating higher mesoscopic moments instead of macroscopic quantities. Furthermore, [Gei+] advises interpolating moments in statistical notation.

This thesis highlights that the choice of the grid refinement technique is crucial for accuracy. This was demonstrated by showcasing the advantages and disadvantages of each technique in simply structured test cases. Besides the techniques chosen for this thesis, other techniques exist; a good overview of grid refinement techniques and their properties is given in [Gen+17]. With real-world applications growing in complexity, grid refinement techniques for the LBM will gain in relevance due to their ability to increase accuracy and performance.


References

[Che+06] H. Chen, O. Filippova, J. Hoch, K. Molvig, R. Shock, C. Teixeira, and R. Zhang. “Grid refinement in lattice Boltzmann methods based on volumetric formulation”. In: Physica A: Statistical Mechanics and its Applications 362.1 (Mar. 2006), pp. 158–167. doi: 10.1016/j.physa.2005.09.036.

[GGK09] M. Geier, A. Greiner, and J. G. Korvink. “Bubble functions for the lattice Boltzmann method and their application to grid refinement”. In: The European Physical Journal Special Topics 171.1 (2009), pp. 173–179. issn: 1951-6401. doi: 10.1140/epjst/e2009-01026-6.

[Gei+] M. Geier, M. Schönherr, M. Stiebler, and M. Krafczyk. Compact second-order accurate momentum interpolation for the Lattice Boltzmann method in three dimensions. http://www.skalb.de/all_pdf/ICMMES_2010_TU-BS_Geier.pdf, last accessed on 2019.09.21.

[Gen+17] Félix Gendre, Denis Ricot, Guillaume Fritz, and Pierre Sagaut. “Grid refinement for aeroacoustics in the lattice Boltzmann method: A directional splitting approach”. In: Physical Review E 96.2 (Aug. 2017). doi: 10.1103/physreve.96.023311.

[Gro18] Dr. Roberto Grosso. Lecture Notes: Applied Visualization. 2018.

[Kr17] Timm Krüger, Halim Kusumaatmaja, Alexandr Kuzmin, Orest Shardt, Goncalo Silva, and Erlend Magnus Viggen. The Lattice Boltzmann Method. Springer International Publishing, 2017. doi: 10.1007/978-3-319-44649-3.

[QKR19] Jiaxing Qi, Harald Klimach, and Sabine Roller. “Implementation of the compact interpolation within the octree based Lattice Boltzmann solver Musubi”. In: Computers & Mathematics with Applications (2019). doi: 10.1016/j.camwa.2016.06.025.

[Roh+06] M. Rohde, D. Kandhai, J. J. Derksen, and H. E. A. van den Akker. “A generic, mass conservative local grid refinement technique for lattice-Boltzmann schemes”. In: International Journal for Numerical Methods in Fluids 51.4 (2006), pp. 439–468. doi: 10.1002/fld.1140. url: https://doi.org/10.1002/fld.1140.

[Sch18] Florian Schornbaum. “Block-Structured Adaptive Mesh Refinement for Simulations on Extreme-Scale Supercomputers”. Doctoral thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2018.

[Wal] waLBerla. url: https://www.walberla.net/.
