

J Supercomput (2012) 61:84–102
DOI 10.1007/s11227-011-0652-y

Parallel computing of 3D smoking simulation based on OpenCL heterogeneous platform

Zhiyong Yuan · Weixin Si · Xiangyun Liao · Zhaoliang Duan · Yihua Ding · Jianhui Zhao

Published online: 13 July 2011
© Springer Science+Business Media, LLC 2011

Abstract Open Computing Language (OpenCL) is an open, royalty-free standard for general-purpose parallel programming across Central Processing Units (CPUs), Graphics Processing Units (GPUs) and other processors. This paper uses OpenCL to implement real-time smoking simulation in a virtual surgery training simulation system. First, Computational Fluid Dynamics (CFD) is adopted to construct the real-time smoking simulation model based on the Navier–Stokes (N-S) equations of an incompressible fluid at normal temperature and pressure. Then we propose an OpenCL-based parallel computing technique to parallelize the smoking simulation model on the CPU and the GPU, respectively. Finally, we render the smoke in real time with a three-dimensional (3D) texture volume rendering method. Experimental results show that the proposed parallel computing technique achieves satisfactory image quality and rendering rate on both the CPU and the GPU.

Keywords Smoking simulation · CFD · OpenCL parallel computing model · 3D texture volume rendering · Real time physical model

1 Introduction

The growing demand for computational performance in science and engineering has led to a strong need for arithmetic-intensive parallel computing [21]. Today's computer systems often include highly parallel CPUs, GPUs and other types of processors, so it is important to take full advantage of these heterogeneous processing platforms. OpenCL is an open, royalty-free standard for general-purpose parallel programming across CPUs, GPUs and other processors, enabling software developers to fully use the power of these heterogeneous processing platforms [2, 14, 23]. Because architectures differ between hardware platforms, OpenCL defines core functionality that all devices support, as well as optional functionality for high-function devices, which guarantees portability and correctness across architectures while taking full advantage of the capabilities of each hardware platform [16].

Z. Yuan (✉) · W. Si · X. Liao · Z. Duan · Y. Ding · J. Zhao
School of Computer, Wuhan University, Wuhan 430072, China
e-mail: [email protected]

Simulating irregular natural phenomena in a physics-based way is a challenging research subject. Among numerous natural phenomena, smoke is difficult to simulate and is an active research topic because of its irregular appearance, matte surface and easily perturbed motion. Foster and Metaxas achieved a significant breakthrough in hydrodynamics-based gas simulation in 1997 [9, 10]. They adopted explicit integration to generate realistic swirling steam, but with low computational performance. In 1999, Stam introduced a semi-Lagrangian method to solve the advection of smoke, which is stable and allows large time steps [20]. In 2001, Fedkiw et al. proposed monotonic cubic interpolation on top of Stam's method to preserve the fluid's swirl [8]. In 2005, Song et al. adopted the Constrained Interpolation Profile method to improve positional precision [19], at the cost of increased execution complexity and computing time. In 2008, Schechter et al. proposed an improved sub-grid smoke animation [18], and Kim et al. proposed wavelet turbulence for smoke simulation [12]. Most existing smoking simulation methods are based on the N-S equations; they follow physical laws and produce realistic smoke, but solving the N-S equations is computationally expensive, which makes real-time requirements hard to meet.

As an active topic in computer simulation, physics-based simulation is widely applied in computer animation, games, surgery simulation and virtual reality. It requires a realistic and robust simulation model, as well as real-time performance, so that users can interact with the virtual environment at a frame rate of at least 25 fps [22]. Here we adopt an OpenCL-based parallel computing technique to realize real-time smoking simulation while guaranteeing portability and correctness on different architectures.

2 OpenCL parallel computing model

2.1 OpenCL parallel computing model

2.1.1 Platform model

The OpenCL platform model consists of a host connected to one or more OpenCL devices, as shown in Fig. 1. An OpenCL device is divided into one or more compute units, which are further divided into one or more processing elements. Parallel computation takes place within the processing elements. An OpenCL device can be any of several hardware platforms, such as a CPU, a GPU or a DSP.

OpenCL applications run on a host and submit commands from the host to executecomputations on the processing elements within a device [11].


Fig. 1 OpenCL platform model

Fig. 2 OpenCL memory model

Table 1 OpenCL memory region properties

Memory type       Read/Write   Access
Global memory     R/W          All work-items
Local memory      R/W          Work-items within a work-group
Private memory    R/W          Unique work-item
Constant memory   R            All work-items

2.1.2 Memory model

Work-items have access to four memory regions when executing a kernel: global memory, local memory, private memory and constant memory. The regions are shown in Fig. 2 and their properties are listed in Table 1. Different memory types can be chosen to minimize access time, depending on the particular OpenCL implementation.


Fig. 3 OpenCL kernel execution model

2.1.3 Execution model

An index space is defined when the host submits a kernel for execution. An instance of the kernel executes for each point in this index space. A work-item is a kernel instance, identified by its point in the index space. Every work-item executes the same code, but the specific execution pathway through the code and the data operated upon can vary per work-item. The kernel execution model is shown in Fig. 3.

Work-items in an index space of size $(G_x, G_y)$ are organized into work-groups of size $(L_x, L_y)$, each of which is assigned a unique work-group ID $(w_x, w_y)$. Each work-item is identified by its global ID $(g_x, g_y)$ or by its local ID $(l_x, l_y)$ within a work-group. The following two equations relate a work-item's global ID and local ID:

$$(g_x, g_y) = (w_x \cdot L_x + l_x,\; w_y \cdot L_y + l_y), \tag{1}$$

$$(l_x, l_y) = (g_x - w_x \cdot L_x,\; g_y - w_y \cdot L_y) \tag{2}$$
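Relations (1) and (2) can be checked with a few lines of Python. This is only an illustrative sketch: the function names `global_id` and `local_id` are hypothetical helpers, not OpenCL API calls, and the work-group size (8, 32) is borrowed from the diffusion section purely as sample data.

```python
def global_id(group_id, local_id_, local_size):
    """Global ID from work-group ID and local ID, following (1)."""
    return tuple(w * size + l for w, l, size in zip(group_id, local_id_, local_size))


def local_id(global_id_, group_id, local_size):
    """Local ID from global ID and work-group ID, following (2)."""
    return tuple(g - w * size for g, w, size in zip(global_id_, group_id, local_size))


# Sample values: work-groups of 8 x 32 work-items, work-group (2, 1), local ID (3, 5).
g = global_id((2, 1), (3, 5), (8, 32))
l = local_id(g, (2, 1), (8, 32))
print(g, l)
```

Applying (2) to the result of (1) returns the original local ID, which is the sense in which the two equations are inverses of each other.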

OpenCL manages kernels, memory and programs through a context. Once a context is created, OpenCL programs can be compiled at runtime by passing the source code to OpenCL compilation functions as arrays of strings. After an OpenCL program is compiled, handles can be obtained for the kernel functions contained in the program. The kernels can then be launched on devices within the OpenCL context. Host-device memory transfers and kernel executions are performed by enqueuing them into a command queue associated with the target device [11].

2.1.4 Programming model

The OpenCL execution model supports data-parallel and task-parallel programming models, as well as hybrids of the two. The primary model driving the design of OpenCL is data parallel [11].

In the OpenCL data-parallel programming model, each instance of a kernel is executed by its corresponding work-item. Work-items within a work-group execute the same kernel in parallel, but each may follow its own execution pathway through the code and operate on its own data. In the OpenCL task-parallel programming model, a single instance of a kernel is executed independently of any index space, which is logically equivalent to executing a kernel on a compute unit with a work-group containing a single work-item [11]. Hybrids of these two models provide a flexible solution for parallel programming, especially for large-scale data processing and multitasking.

Fig. 4 OpenCL programming procedure

The OpenCL programming procedure is divided into the steps shown in Fig. 4.

3 OpenCL based parallel computing technique of 3D smoking simulation

3.1 3D smoking simulation model

Since smoke is a fluid, we adopt computational fluid dynamics to construct the smoking simulation model. In physics, the N-S equations are the mathematical formulation that precisely describes fluid motion [3, 20]. The state of smoke depends on its density, temperature and velocity. We assume that smoke is an incompressible fluid at constant temperature, so the smoking simulation model can be expressed by the incompressible N-S governing equations:

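$$\nabla \cdot \mathbf{u} = 0, \tag{3}$$

$$\frac{\partial \mathbf{u}}{\partial t} = -(\mathbf{u} \cdot \nabla)\mathbf{u} - \frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f} \tag{4}$$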

The right side of (4) contains the advection, pressure gradient, diffusion and external force terms. To calculate a divergence-free velocity, we adopt the Helmholtz–Hodge decomposition theory [6] to reduce (4). The reduced equation is

$$\frac{\partial \mathbf{u}}{\partial t} = \mathbf{P}\left(-(\mathbf{u} \cdot \nabla)\mathbf{u} + \nu \nabla^2 \mathbf{u} + \mathbf{f}\right) \tag{5}$$

where $\mathbf{P}$ is an operator that extracts the divergence-free part of a vector field.

As a scalar quantity, the density is advected by and diffuses through the fluid, and the governing equation of the density field is

$$\frac{\partial \rho}{\partial t} = -(\mathbf{u} \cdot \nabla)\rho + k \nabla^2 \rho + s \tag{6}$$

In (4), (5) and (6), $\mathbf{u}$ is the velocity vector, $\nu$ is the coefficient of kinematic viscosity, $\mathbf{f}$ is the external force, $\rho$ is the smoke density, $k$ is the diffusion coefficient and $s$ is the generalized source.

In the 3D fluid model, the smoke is considered to be contained in a cube that is divided into (N + 2) × (N + 2) × (N + 2) grids [1], and we define the velocity and density on the same grid. We then adopt the Finite Difference Method to solve the smoking simulation model according to (5) and (6). To treat the boundary, the density and velocity of the outermost layer of grids are set to 0. Figure 5 shows the grid division and the boundary treatment.

Fig. 5 Grid division and boundary

We define the velocity field solver operator $\mathbf{S}_u$ and the density field solver operator $\mathbf{S}_\rho$, which denote one solve of the velocity field and of the density field, respectively. The component operators are the external force solver operator $\mathbf{F}$, the diffusion solver operator $\mathbf{D}$, the advection solver operator $\mathbf{A}$ and the projection solver operator $\mathbf{P}$. $\mathbf{S}_u$ and $\mathbf{S}_\rho$ are defined by

$$\mathbf{S}_u = \mathbf{P} \cdot \mathbf{A} \cdot \mathbf{D} \cdot \mathbf{F}(\mathbf{u}, \mathbf{f}), \tag{7}$$

$$\mathbf{S}_\rho = \mathbf{A} \cdot \mathbf{D} \cdot \mathbf{F}(\mathbf{u}, s, \rho) \tag{8}$$
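Read as operator composition, (7) and (8) apply the right-most operator first: external force, then diffusion, then advection and, for velocity only, projection. The following sketch illustrates just that ordering. The stage functions are hypothetical placeholders that record their own name instead of solving anything, and `compose` is an assumed helper, not part of the paper's implementation.

```python
from functools import reduce


def compose(*ops):
    """Compose operators so that the right-most one is applied first."""
    return lambda state: reduce(lambda acc, op: op(acc), reversed(ops), state)


def make_stage(name):
    """Placeholder stage: appends its name to a log instead of solving."""
    def stage(log):
        return log + [name]
    return stage


F, D, A, P = (make_stage(name) for name in "FDAP")

velocity_step = compose(P, A, D, F)   # S_u   = P . A . D . F
density_step = compose(A, D, F)       # S_rho = A . D . F

print(velocity_step([]))
print(density_step([]))
```

In a real solver each placeholder would be replaced by the corresponding kernel launch, with the state being the device buffers rather than a log.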

To show the smoke's motion, we need to render the density field after solving the N-S equations. There are two kinds of volume rendering methods: direct and indirect. Direct volume rendering synthesizes the image from the volume data directly according to an optical model, while indirect volume rendering extracts an iso-surface from the volume data (such as the density field) for surface rendering. To render the smoke in real time, we adopt a direct volume rendering method: hardware-accelerated 3D texture rendering, which was proposed by T.J. Cullip et al. and has been developed further in recent years.

As we only need to consider the gray level of the smoke, the 3D texture volume rendering method [4, 5, 13, 17] achieves the same rendering effect as other volume rendering methods while offering the fastest rendering rate. Some components of the 3D texture volume rendering model, such as texture mapping and tri-linear interpolation, are implemented in hardware. To render the smoking simulation, we first perform volume data preprocessing, slice generation and texture generation. The texture memory data are then submitted to the OpenGL hardware blending function to synthesize images.

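We define the rendering calculation operator $\mathbf{R}$, which denotes one rendering calculation. The component operators are the data preprocessing operator $\mathbf{L}$, the slice generation operator $\mathbf{C}$ and the texture generation operator $\mathbf{T}$. $\mathbf{R}$ is defined by

$$\mathbf{R} = \mathbf{T} \cdot \mathbf{C} \cdot \mathbf{L}(\rho, L, T) \tag{9}$$

where the arguments $L$ and $T$ are the gray level matrix and the texture matrix, respectively.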

3.2 Parallel computing of 3D smoking simulation

The OpenCL parallel computing technique for smoking simulation focuses on the following aspects: solving the N-S equations in parallel, preprocessing the volume data in parallel, and generating slices and textures in parallel.

First, the velocity field and density field on the host are transmitted to the OpenCL device. The velocity field and the density field are then solved in parallel, and the density field is passed to the parallel rendering stages: data preprocessing, slice generation and texture generation. Finally, the slices are transferred back to the host for rendering. Figure 6 illustrates the whole procedure.

3.2.1 Parallel preprocessing

First, the velocity field and density field are transferred from the host to the OpenCL device. As the bandwidth between an OpenCL device and its own memory is higher than that between the device's memory and the host's memory [7, 15], it is necessary to minimize data transmission between the OpenCL device and the host. We adopt three strategies to optimize the parallel computing program.

(1) Minimize the data transmission between the OpenCL device and the host.
(2) If there are loops in the kernel executed on the GPU, use local memory.
(3) Use vector data.

The density field or velocity field is transferred to the OpenCL device as an array in a memory buffer. The density or velocity data of grid (i, j, k) is stored at index IX(i, j, k) of this array. The mapping relation is shown in (10) and Fig. 7.

$$IX(i, j, k) = i + (N + 2) \cdot j + (N + 2)^2 \cdot k, \quad (i, j, k) \in D_0 \tag{10}$$

where

$$D_0 = \{(i, j, k) \mid 1 \le i \le N,\ 1 \le j \le N,\ 1 \le k \le N,\ i, j, k \in \mathbb{Z}^+\}$$
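The index mapping in (10) is a standard row-major flattening of the (N + 2)³ grid. A minimal Python sketch follows; the function name `ix` and the extra parameter `n` (standing for N) are choices made here for illustration.

```python
def ix(i, j, k, n):
    """Flat buffer index of grid (i, j, k) on an (n + 2)^3 grid, following (10)."""
    return i + (n + 2) * j + (n + 2) ** 2 * k


n = 32
first_interior = ix(1, 1, 1, n)          # first grid of D0
last_cell = ix(n + 1, n + 1, n + 1, n)   # last boundary grid, equals (n + 2)^3 - 1
print(first_interior, last_cell)
```

Because i varies fastest, grids that are adjacent in the i direction are adjacent in the buffer, while neighbours in j and k are (N + 2) and (N + 2)² elements apart.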


Fig. 6 OpenCL parallel computing technique for smoking simulation model

Fig. 7 Data mapping relation of device memory and 3D simulation model

We define the quantities stored in OpenCL device memory as follows: velocity field $\mathbf{V}$ with element $\mathbf{v} = (v_x, v_y, v_z)$, density field $\boldsymbol{\rho}$ with element $\rho$, pressure field $\mathbf{P}$ with element $p$, external force field $\mathbf{F}_f$ with element $\mathbf{f} = (f_x, f_y, f_z)$, generalized source $\mathbf{S}_s$ with element $s$, gray level matrix $\mathbf{L}$ with element $L$, and texture matrix $\mathbf{T}$.


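We define the parallel computing operator $Pa$, which denotes executing an operation in parallel. The parallel computing of the velocity field, the density field and the rendering calculation can be described by

$$Pa(\mathbf{S}_u) = Pa\left(\mathbf{P} \cdot \mathbf{A} \cdot \mathbf{D} \cdot \mathbf{F}(\mathbf{V}, \mathbf{F}_f)\right) = Pa(\mathbf{P}) \cdot Pa(\mathbf{A}) \cdot Pa(\mathbf{D}) \cdot Pa(\mathbf{F})(\mathbf{V}, \mathbf{F}_f), \tag{11}$$

$$Pa\left(\mathbf{S}_\rho(\boldsymbol{\rho}, \mathbf{S}_s, \mathbf{V})\right) = Pa\left(\mathbf{A} \cdot \mathbf{D} \cdot \mathbf{F}(\boldsymbol{\rho}, \mathbf{S}_s, \mathbf{V})\right) = Pa(\mathbf{A}) \cdot Pa(\mathbf{D}) \cdot Pa(\mathbf{F})(\boldsymbol{\rho}, \mathbf{S}_s, \mathbf{V}), \tag{12}$$

$$Pa\left(\mathbf{R}(\boldsymbol{\rho}, \mathbf{L}, \mathbf{T})\right) = Pa\left(\mathbf{T} \cdot \mathbf{C} \cdot \mathbf{L}(\boldsymbol{\rho}, \mathbf{L}, \mathbf{T})\right) = Pa(\mathbf{T}) \cdot Pa(\mathbf{C}) \cdot Pa(\mathbf{L})(\boldsymbol{\rho}, \mathbf{L}, \mathbf{T}) \tag{13}$$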

3.2.2 External force

The external force can be a local force or a global force applied to the whole smoke domain. To achieve the rising effect of smoke, we apply an upward buoyant force to every grid. The external force equation is

$$\frac{\partial \mathbf{u}}{\partial t} = \mathbf{f} \tag{14}$$

Discretizing time with time step $\Delta t$ turns the external force equation into the difference form

$$\mathbf{u}_{i,j,k} \mathrel{+}= \mathbf{f}_{i,j,k}\,\Delta t \tag{15}$$

The subscript $i, j, k$ indicates grid (i, j, k). The parallel computing of the external force for velocity and density can be described by

$$Pa(\mathbf{F})(\mathbf{V}) = Pa(\mathbf{V} \mathrel{+}= \mathbf{F}_f) = Pa\left(v_{IX(i,j,k)} \mathrel{+}= f_{IX(i,j,k)},\ (i, j, k) \in D_0\right), \tag{16}$$

$$Pa(\mathbf{F})(\boldsymbol{\rho}, \mathbf{S}_s) = Pa(\boldsymbol{\rho} \mathrel{+}= \mathbf{S}_s) = Pa\left(\rho_{IX(i,j,k)} \mathrel{+}= s_{IX(i,j,k)},\ (i, j, k) \in D_0\right) \tag{17}$$

We use global memory to compute the external force in parallel. After parallelization, with each grid handled by its own work-item, the time complexity of computing the velocity field and the density field both change from $O(N^3)$ to $O(1)$.
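The external force update in (15) is an independent element-wise operation, which is why it parallelizes trivially. The sketch below runs it sequentially in Python; the function name `add_source` and the flat-list layout are assumptions made for illustration. In an OpenCL kernel, the loop body would be executed by one work-item per buffer element.

```python
def add_source(field, source, dt):
    """In-place update field[idx] += dt * source[idx], as in (15).

    Each loop iteration is independent of the others, so every index
    could be processed by a separate work-item.
    """
    for idx in range(len(field)):
        field[idx] += dt * source[idx]
    return field


velocity_x = [0.0, 1.0, 2.0]
buoyancy = [1.0, 1.0, 1.0]
add_source(velocity_x, buoyancy, 0.5)
print(velocity_x)
```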

3.2.3 Diffusion

Each grid exchanges quantities with its 6 adjacent grids through diffusion, and the diffusion equation is:

$$\frac{\partial \mathbf{u}}{\partial t} = \nu \nabla^2 \mathbf{u} \tag{18}$$

Discretizing time with time step $\Delta t$ turns the diffusion equation into the difference form

$$u^{t+\Delta t}_{i,j,k} = \frac{\alpha\, u^{t}_{i,j,k} + \left(u^{t+\Delta t}_{i-1,j,k} + u^{t+\Delta t}_{i+1,j,k} + u^{t+\Delta t}_{i,j-1,k} + u^{t+\Delta t}_{i,j+1,k} + u^{t+\Delta t}_{i,j,k-1} + u^{t+\Delta t}_{i,j,k+1}\right)}{6 + \alpha} \tag{19}$$

where $\alpha = \dfrac{h^2}{\nu \cdot \Delta t}$ and $h = 1/(N + 2)$. Equation (19) is the Gauss–Seidel (G-S) iterative equation. $u^{t}$ is the known velocity field at time $t$, and $u^{t+\Delta t}$ is the velocity field at time $t + \Delta t$. Before the iteration, $u^{t+\Delta t}$ is arbitrary. We obtain $u^{t+\Delta t}$ on the left side of (19) by G-S iteration. Here we set the number of iterations to 20 to solve for $u^{t+\Delta t}$.

The G-S iteration is the main part of solving the diffusion equation, and also the main computational cost of solving the smoking simulation model, so parallelizing it yields a considerable reduction in computing time. During the iteration, the kernel accesses the same data 20 times, so we solve the diffusion step using local memory.
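To make the iteration in (19) concrete, here is a sequential sketch in Python. It is not the paper's OpenCL kernel: it sweeps the grid in place on the host, uses the known field as the initial guess (the text allows an arbitrary one), and takes `alpha` directly as a parameter. The names `ix` and `diffuse` are hypothetical.

```python
def ix(i, j, k, n):
    """Flat index of grid (i, j, k) on an (n + 2)^3 grid, as in (10)."""
    return i + (n + 2) * j + (n + 2) ** 2 * k


def diffuse(u0, n, alpha, iterations=20):
    """Solve (19) for the interior grids by in-place Gauss-Seidel sweeps.

    u0 is the known field at time t; boundary grids keep their values.
    """
    u = list(u0)  # initial guess for the field at time t + dt
    for _ in range(iterations):
        for k in range(1, n + 1):
            for j in range(1, n + 1):
                for i in range(1, n + 1):
                    neighbours = (
                        u[ix(i - 1, j, k, n)] + u[ix(i + 1, j, k, n)]
                        + u[ix(i, j - 1, k, n)] + u[ix(i, j + 1, k, n)]
                        + u[ix(i, j, k - 1, n)] + u[ix(i, j, k + 1, n)]
                    )
                    u[ix(i, j, k, n)] = (alpha * u0[ix(i, j, k, n)] + neighbours) / (6 + alpha)
    return u


# One interior grid surrounded by zero-valued boundary grids.
n = 1
u0 = [0.0] * (n + 2) ** 3
u0[ix(1, 1, 1, n)] = 1.0
u1 = diffuse(u0, n, alpha=2.0)
print(u1[ix(1, 1, 1, n)])
```

With a single interior grid and zero boundaries, every sweep gives α/(6 + α) times the old value, so the example converges immediately; larger grids need the repeated sweeps.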

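To describe logically the data stored in the OpenCL device's local memory, we define an $m \times n$ ($m, n \in \mathbb{Z}^+$) logical memory matrix:

$$\mathbf{S} = \begin{pmatrix} \mathbf{S}_{0,0} & \mathbf{S}_{0,1} & \cdots & \mathbf{S}_{0,n-1} \\ \mathbf{S}_{1,0} & \mathbf{S}_{1,1} & \cdots & \mathbf{S}_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \mathbf{S}_{m-1,0} & \mathbf{S}_{m-1,1} & \cdots & \mathbf{S}_{m-1,n-1} \end{pmatrix},$$

$$\mathbf{S}_{i,j} = \begin{pmatrix} s_{0,0} & s_{0,1} & \cdots & s_{0,q-1} \\ s_{1,0} & s_{1,1} & \cdots & s_{1,q-1} \\ \vdots & \vdots & & \vdots \\ s_{p-1,0} & s_{p-1,1} & \cdots & s_{p-1,q-1} \end{pmatrix}_{i,j}, \quad (i, j) \in D_1$$

where

$$D_1 = \{(i, j) \mid 0 \le i \le m - 1,\ 0 \le j \le n - 1,\ i, j \in \mathbb{Z}_{\ge 0}\}$$

$\mathbf{S}_{0,0}, \mathbf{S}_{0,1}, \ldots, \mathbf{S}_{m-1,n-1}$ are $p \times q$ ($p, q \in \mathbb{Z}^+$) sub-matrices, and the dimensionality of matrix $\mathbf{S}$ is $(m \times p) \times (n \times q)$. Every element of sub-matrix $\mathbf{S}_{i,j}$ is written $(s_{k,l})_{i,j}$ and corresponds to a grid in the 3D smoking simulation model. The mapping relation is shown in (20):

$$(k, l)_{i,j} \rightarrow (k + 8j,\ l,\ i), \quad (i, j) \in D_1,\ (k, l) \in D_2 \tag{20}$$

where

$$D_2 = \{(k, l) \mid 0 \le k \le p - 1,\ 0 \le l \le q - 1,\ k, l \in \mathbb{Z}_{\ge 0}\}$$

$\mathbf{S}^{t}$, $\mathbf{S}^{t}_{i,j}$ and $(s_{k,l})^{t}_{i,j}$ are, respectively, matrix $\mathbf{S}$, sub-matrix $\mathbf{S}_{i,j}$ and its element $(s_{k,l})_{i,j}$ at time $t$.

We define the velocity field matrix $\mathbf{V}$ and the density field matrix $\boldsymbol{\rho}$ to have the same matrix type as $\mathbf{S}$.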

We define each work-group to contain 256 work-items. One layer of the grid consists of 32 × 32 grids, which are allocated to 4 work-groups, so there are 32 × 4 work-groups in total. Each work-group contains 8 × 32 work-items. The mapping relation is shown in Fig. 8.
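With the sizes just given (m = 32, n = 4, p = 8, q = 32), the mapping in (20) can be checked to cover every grid of a 32 × 32 × 32 block exactly once. The sketch below does that check in Python; the constant and function names are hypothetical, and coordinates are 0-based as in (20).

```python
M, N_GROUPS, P, Q = 32, 4, 8, 32   # m x n work-groups, each with p x q work-items


def logical_to_grid(i, j, k, l):
    """Map element (k, l) of sub-matrix S[i][j] to a grid coordinate, following (20)."""
    return (k + P * j, l, i)


cells = {
    logical_to_grid(i, j, k, l)
    for i in range(M)
    for j in range(N_GROUPS)
    for k in range(P)
    for l in range(Q)
}
print(len(cells), M * N_GROUPS, P * Q)
```

The set holds 32³ distinct coordinates, one per work-item, spread over 128 work-groups of 256 work-items each.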


Fig. 8 Mapping relation of Logical memory and 3D smoking simulation model

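The parallel computing of velocity diffusion and density diffusion can be described by

$$Pa(\mathbf{D})(\mathbf{V}) = Pa\left( \begin{aligned} (v_{k,l})^{t+\Delta t}_{i,j} = \frac{1}{6 + \alpha}\Big( &\alpha\,(v_{k,l})^{t}_{i,j} + (v_{k-1,l})^{t+\Delta t}_{i,j} + (v_{k+1,l})^{t+\Delta t}_{i,j} \\ &+ (v_{k,l-1})^{t+\Delta t}_{i,j} + (v_{k,l+1})^{t+\Delta t}_{i,j} + (v_{k,l})^{t+\Delta t}_{i-1,j} + (v_{k,l})^{t+\Delta t}_{i+1,j} \Big), \\ &(i, j) \in D_1,\ (k, l) \in D_2 \end{aligned} \right) \tag{21}$$

$$Pa(\mathbf{D})(\boldsymbol{\rho}) = Pa\left( \begin{aligned} (\rho_{k,l})^{t+\Delta t}_{i,j} = \frac{1}{6 + \beta}\Big( &\beta\,(\rho_{k,l})^{t}_{i,j} + (\rho_{k-1,l})^{t+\Delta t}_{i,j} + (\rho_{k+1,l})^{t+\Delta t}_{i,j} \\ &+ (\rho_{k,l-1})^{t+\Delta t}_{i,j} + (\rho_{k,l+1})^{t+\Delta t}_{i,j} + (\rho_{k,l})^{t+\Delta t}_{i-1,j} + (\rho_{k,l})^{t+\Delta t}_{i+1,j} \Big), \\ &(i, j) \in D_1,\ (k, l) \in D_2 \end{aligned} \right) \tag{22}$$

where $\alpha = \dfrac{h^2}{\nu \cdot \Delta t}$ and $\beta = \dfrac{h^2}{k \cdot \Delta t}$. In (21), the data within sub-matrices $\mathbf{V}_{i,j}$, $\mathbf{V}_{i-1,j}$ and $\mathbf{V}_{i+1,j}$ are transferred to the local memory of a work-group. When dealing with the boundary, all needed data are transferred to the local memory of the same work-group. The density field is computed in parallel in the same way as the velocity field. After parallelization, the time complexity of velocity diffusion and density diffusion both change from $O(d \cdot N^3)$ to $O(d)$, where $d = 20$.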

3.2.4 Advection

The fluid's velocity transports mass, density and other quantities along its direction. We model density as a set of particles. To calculate the velocity field and the density field, we only need to track the particles' trajectories as they pass through the smoking simulation model. For each time step, we place the particle at the center of its nearest grid and calculate the particle's velocity and density from the six adjacent particles, according to the velocity and density at the position it occupied one time step earlier. For the velocity field, the advection equation is

$$\frac{\partial \mathbf{u}}{\partial t} = -(\mathbf{u} \cdot \nabla)\mathbf{u} \tag{23}$$

For a large time step, solving the advection equation with the Finite Difference Method is unstable. Here we adopt the following equation to solve advection:

$$\mathbf{u}_2(\mathbf{x}) = \mathbf{u}_1\big(\mathbf{p}(\mathbf{x}, -\Delta t)\big) \tag{24}$$

Equation (24) states that the velocity of a particle at $\mathbf{x}$ is set to the velocity that the particle, now at $\mathbf{x}$, had at its previous location a time $\Delta t$ ago. Density advection is solved in the same way as velocity advection. The parallel computing of the advection of the velocity field and the density field can be described by

$$Pa(\mathbf{A})(\mathbf{V}) = Pa\left( v^{t}_{IX(i,j,k)}(\mathbf{x}) = v^{t-\Delta t}_{IX(i,j,k)}\big(\mathbf{p}(\mathbf{x}, -\Delta t)\big),\ (i, j, k) \in D_0 \right), \tag{25}$$

$$Pa(\mathbf{A})(\boldsymbol{\rho}) = Pa\left( \rho^{t}_{IX(i,j,k)}(\mathbf{x}) = \rho^{t-\Delta t}_{IX(i,j,k)}\big(\mathbf{p}(\mathbf{x}, -\Delta t)\big),\ (i, j, k) \in D_0 \right) \tag{26}$$

After parallelization, the time complexity of velocity advection and density advection both change from $O(N^3)$ to $O(1)$.
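The backtracing idea behind (24) is easiest to see in one dimension. The sketch below is a deliberately simplified 1D analogue, not the 3D scheme used in the paper: each cell traces back along the local velocity by Δt and linearly interpolates the old field there. The function name `advect_1d`, the clamping at the domain ends and the linear interpolation are all assumptions made for illustration.

```python
def advect_1d(q, u, dt, h):
    """Semi-Lagrangian advection of q by velocity u on a 1D grid with spacing h."""
    n = len(q)
    out = [0.0] * n
    for i in range(n):
        x = i - dt * u[i] / h             # backtraced position, in cell units
        x = min(max(x, 0.0), n - 1.0)     # clamp to the domain
        i0 = int(x)                       # left neighbour (x is non-negative)
        i1 = min(i0 + 1, n - 1)           # right neighbour
        w = x - i0                        # interpolation weight
        out[i] = (1.0 - w) * q[i0] + w * q[i1]
    return out


density = [0.0, 0.0, 1.0, 0.0, 0.0]
velocity = [1.0] * 5
print(advect_1d(density, velocity, dt=1.0, h=1.0))   # pulse moves one cell to the right
print(advect_1d(density, velocity, dt=0.5, h=1.0))   # pulse is split between two cells
```

Each output cell reads only the old field, so all cells can be updated independently, which matches the one-work-item-per-grid parallelization described above.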

3.2.5 Projection

To achieve a divergence-free velocity, $\nabla \cdot \mathbf{u} = 0$, we adopt the Helmholtz–Hodge decomposition theory, which states that any vector field $\mathbf{w}$ can be uniquely decomposed into the sum of a divergence-free vector field and the gradient of a scalar field. That is:

$$\mathbf{w} = \mathbf{u} + \nabla p \tag{27}$$

where $\mathbf{u}$ is the divergence-free field, with $\nabla \cdot \mathbf{u} = 0$, and $p$ is a scalar field. Applying the divergence operator $\nabla \cdot$ to both sides of (27) and using $\nabla \cdot \mathbf{u} = 0$, we have

$$\nabla \cdot \mathbf{w} = \nabla^2 p \tag{28}$$

Equation (28) is a Poisson equation. In 3D space, for any velocity field $\mathbf{w} = (u, v, w)$, discretizing the Poisson equation gives (29), which we solve by G-S iteration:

$$p_{i,j,k} = \frac{\operatorname{div}(i, j, k) + p_{i+1,j,k} + p_{i-1,j,k} + p_{i,j+1,k} + p_{i,j-1,k} + p_{i,j,k+1} + p_{i,j,k-1}}{6} \tag{29}$$

where

$$\operatorname{div}(i, j, k) = -h\,\frac{(u_{i+1,j,k} - u_{i-1,j,k}) + (v_{i,j+1,k} - v_{i,j-1,k}) + (w_{i,j,k+1} - w_{i,j,k-1})}{2} \tag{30}$$


The parallel computing of projection is divided into two parts. First, we compute the discretized Poisson equation in parallel:

$$Pa\left( \begin{aligned} (p_{k,l})_{i,j} = \frac{1}{6}\Big( &\operatorname{div}(k + 8j,\ l,\ i) + (p_{k-1,l})_{i,j} + (p_{k+1,l})_{i,j} + (p_{k,l-1})_{i,j} \\ &+ (p_{k,l+1})_{i,j} + (p_{k,l})_{i-1,j} + (p_{k,l})_{i+1,j} \Big), \\ &(i, j) \in D_1,\ (k, l) \in D_2 \end{aligned} \right) \tag{31}$$

After parallelization, the time complexity of computing the discretized Poisson equation changes from $O(d \cdot N^3)$ to $O(d)$, where $d = 20$.

Every $\operatorname{div}(i, j, k)$ can be computed in parallel by the following equation, and its time complexity changes from $O(N^3)$ to $O(1)$:

$$Pa\left( \operatorname{div}(i, j, k) = -h\,\frac{\Delta v_x(i, j, k) + \Delta v_y(i, j, k) + \Delta v_z(i, j, k)}{2},\ (i, j, k) \in D_0 \right) \tag{32}$$

where

$$\Delta v_x(i, j, k) = v_{IX(i+1,j,k)}(v_x) - v_{IX(i-1,j,k)}(v_x), \tag{33}$$

$$\Delta v_y(i, j, k) = v_{IX(i,j+1,k)}(v_y) - v_{IX(i,j-1,k)}(v_y), \tag{34}$$

$$\Delta v_z(i, j, k) = v_{IX(i,j,k+1)}(v_z) - v_{IX(i,j,k-1)}(v_z) \tag{35}$$

Then we compute the divergence-free velocity field in parallel according to (27), as shown in (36).

$$Pa\left( \begin{aligned} v_{IX(i,j,k)}(v_x) &\mathrel{-}= \frac{p_{IX(i+1,j,k)} - p_{IX(i-1,j,k)}}{2h}, \\ v_{IX(i,j,k)}(v_y) &\mathrel{-}= \frac{p_{IX(i,j+1,k)} - p_{IX(i,j-1,k)}}{2h}, \\ v_{IX(i,j,k)}(v_z) &\mathrel{-}= \frac{p_{IX(i,j,k+1)} - p_{IX(i,j,k-1)}}{2h}, \\ &(i, j, k) \in D_0 \end{aligned} \right) \tag{36}$$

For this part, the time complexity changes from $O(N^3)$ to $O(1)$.
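The two element-wise parts of the projection, the divergence in (32) to (35) and the gradient subtraction in (36), can be sketched sequentially as follows. The pressure solve between them has the same G-S structure as the diffusion step and is omitted. The function names and the flat-list storage of each velocity component are assumptions made for this illustration.

```python
def ix(i, j, k, n):
    """Flat index of grid (i, j, k) on an (n + 2)^3 grid, as in (10)."""
    return i + (n + 2) * j + (n + 2) ** 2 * k


def interior(n):
    """Yield every (i, j, k) in D0."""
    for k in range(1, n + 1):
        for j in range(1, n + 1):
            for i in range(1, n + 1):
                yield i, j, k


def divergence(vx, vy, vz, n, h):
    """Compute div(i, j, k) for all interior grids, following (32)-(35)."""
    div = [0.0] * (n + 2) ** 3
    for i, j, k in interior(n):
        dvx = vx[ix(i + 1, j, k, n)] - vx[ix(i - 1, j, k, n)]
        dvy = vy[ix(i, j + 1, k, n)] - vy[ix(i, j - 1, k, n)]
        dvz = vz[ix(i, j, k + 1, n)] - vz[ix(i, j, k - 1, n)]
        div[ix(i, j, k, n)] = -h * (dvx + dvy + dvz) / 2
    return div


def subtract_gradient(vx, vy, vz, p, n, h):
    """Subtract the central-difference pressure gradient in place, following (36)."""
    for i, j, k in interior(n):
        c = ix(i, j, k, n)
        vx[c] -= (p[ix(i + 1, j, k, n)] - p[ix(i - 1, j, k, n)]) / (2 * h)
        vy[c] -= (p[ix(i, j + 1, k, n)] - p[ix(i, j - 1, k, n)]) / (2 * h)
        vz[c] -= (p[ix(i, j, k + 1, n)] - p[ix(i, j, k - 1, n)]) / (2 * h)


# Sample data: a field whose x-component grows linearly with i.
n = 3
h = 1.0 / (n + 2)
size = (n + 2) ** 3
ramp = [0.0] * size
for k in range(n + 2):
    for j in range(n + 2):
        for i in range(n + 2):
            ramp[ix(i, j, k, n)] = float(i)
zeros = [0.0] * size
div = divergence(ramp, zeros, zeros, n, h)
print(div[ix(2, 2, 2, n)])   # -h for this linear ramp
```

Both loops read only neighbouring values of fields they do not write, so every interior grid can be handled by its own work-item.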

3.2.6 Volume data preprocessing

In the 3D smoking simulation model, we only need to consider the gray level and store it in the texture memory. We define a grid's gray level by its density. Suppose parallel rays with a gray level of 255 penetrate the 3D smoking simulation model from one direction; the rays' gray level decreases because of absorption by the grids they pass through, and the grids take on the decreased gray level. We define the gray level matrix of the 3D smoking simulation model at time $t$ as $L^{t}$; the gray level matrix at the next time step is

$$L^{t+1} = L^{t} - \rho^{t} \cdot 255 \cdot \sigma \tag{37}$$

The initial gray level is 255, $\rho^{t}$ is the density field at time $t$ and $\sigma$ is the attenuation coefficient. To ensure the visual effect of the smoke, we set a bound for each grid (i, j, k): when $100 \le [L^{t}]_{ijk} \le 255$, we compute $[L^{t+1}]_{ijk}$ by (37); when $[L^{t}]_{ijk} < 100$, we set $[L^{t+1}]_{ijk}$ to 100.


Suppose the rays are parallel and uniform; then the rays penetrate the 3D simulation model through at most $3N^2$ grids on three surfaces. The parallel computing of this part can be described by

$$Pa(\mathbf{L})(\boldsymbol{\rho}, \mathbf{L}) = Pa\left( (L_{(i,j,k)})^{t+\Delta t} = (L_{(i,j,k)})^{t} - 255 \cdot \sigma \cdot (\rho_{(i,j,k)})^{t} \right) \tag{38}$$

After parallelization, the time complexity changes from $O(N^2)$ to $O(1)$.
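The per-grid gray level rule, (37) together with the lower bound of 100, can be written as a small function. This is a sketch of the rule as stated in the text; the function name `next_gray_level` and the sample values for density and σ are illustrative assumptions.

```python
def next_gray_level(gray, density, sigma, floor=100.0):
    """Gray level of one grid at the next time step.

    Follows (37) while the current level is at least `floor`;
    below that, the level is reset to `floor`.
    """
    if gray < floor:
        return floor
    return gray - density * 255.0 * sigma


print(next_gray_level(255.0, 0.5, 0.1))   # attenuated by 0.5 * 255 * 0.1
print(next_gray_level(90.0, 0.5, 0.1))    # reset to the floor
```

Since each grid depends only on its own gray level and density, the update maps directly onto one work-item per grid.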

3.2.7 Slices generation

We treat the grids' gray levels as 3D texture data and slice the data in the texture object into a cluster of slices along the viewing direction. The number of slices determines the rendering frequency and rendering quality: in general, the more slices, the better the rendering quality and the lower the rendering frequency. The slice number must therefore be chosen to balance rendering frequency against rendering quality. We compute the intersection points between each slice and the cube in parallel, sort the points counter-clockwise and output the slice as a polygon. With $M$ slices, the parallel slice generation can be described as $Pa(\mathbf{C})$, and the time complexity changes from $O(M)$ to $O(1)$.

3.2.8 Texture generation

We apply texture mapping to the slices, assigning the gray level stored in the texture memory to each vertex of the slices; each slice's gray level is then obtained by interpolation. We create a world coordinate system (x, y, z) whose origin is the center of the volume data, and a texture coordinate system (s, t, r) based on it. The texture coordinate (s, t, r) is calculated by

$$(s, t, r) = \left( \frac{x + 1}{2},\ \frac{y + 1}{2},\ \frac{z + 1}{2} \right) \tag{39}$$
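Equation (39) maps each world coordinate from the range [−1, 1] to the texture range [0, 1]. That the volume spans [−1, 1] on each axis is an inference from the formula, not something the text states. A one-function sketch, with a hypothetical name:

```python
def world_to_texture(x, y, z):
    """Texture coordinate (s, t, r) of a world-space point, following (39)."""
    return ((x + 1) / 2, (y + 1) / 2, (z + 1) / 2)


print(world_to_texture(-1, 0, 1))   # a face point maps to the texture range limits
print(world_to_texture(0, 0, 0))    # the volume center maps to the texture center
```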

Rays penetrate the cube along the ray path, giving the gray level of each grid. Combined with the density field, we then compute the grids' opacity and transmit it to the texture memory. The parallel computing of texture generation can be described as $Pa(\mathbf{T})$. The time complexity changes from $O(N^3)$ to $O(1)$.

The whole procedure of parallel computing of the 3D smoking simulation model is shown in Fig. 9.

4 Experimental results

In this paper, we implement parallel computing of smoking simulation on the CPU and the GPU, respectively. The platform is as follows: (1) hardware: Intel Core(TM) 2 Duo CPU at 2.8 GHz, 2 GB memory; GPU: AMD ATI Radeon HD 5850 (1024 MB); (2) software: Visual Studio 2008, ATI Stream SDK v2.1. The fluid grid scale is 32 × 32 × 32. Some features of the OpenCL devices are shown in Table 2.


Fig. 9 Procedure of parallel computing of 3D smoking simulation model

Table 2 OpenCL device features

OpenCL device   Compute units   Max work-item dimensions   Max work-item sizes   Max work-group size   Allocatable memory size   Local memory size
CPU             2               3                          1024/1024/1024        1024                  512 MB                    32 KB
GPU             18              3                          256/256/256           256                   256 MB                    32 KB

4.1 Comparison of parallel computing time by using scalar float and vector data float4

We compute the external force in parallel on the CPU and the GPU using float and float4, respectively; the comparison of the parallel computing time is shown in Fig. 10.

Experimental results show that the OpenCL implementations on both the CPU and the GPU take less computing time with float4 than with float. With float4, four floats are accessed at a time, which reduces data access time. In addition, x86 CPUs can often make the best use of SSE (Streaming SIMD Extensions) when OpenCL kernels are written with float4. The ATI Radeon HD 5850 GPU uses a vector architecture and achieves its best performance when OpenCL work-items operate on four-element vector types such as float4.

4.2 Comparison of parallel computing time by using global memory and local memory

We compute the G-S iteration within the solutions of diffusion and projection in parallel on CPU and GPU by using global memory and local memory, respectively. We set the number of iterations to 20; the comparison of the parallel computing time is shown in Fig. 11.

Fig. 10 Comparison of parallel computing time on CPU and GPU by using float and float4

Fig. 11 Comparison of parallel computing time on CPU and GPU by using global memory and local memory

Experimental results show that the OpenCL implementation on GPU takes less parallel computing time with local memory than with global memory, whereas the OpenCL implementation on CPU takes more parallel computing time with local memory than with global memory. On the GPU, local memory is a fast, low-latency on-chip memory, so its access time is less than that of global memory. On the CPU, the OpenCL implementation maps all memory spaces onto the same hardware cache, so a kernel that uses local memory might incur more overhead than one that only uses global memory.
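A rough count of global-memory reads illustrates why staging data in local memory can help on the GPU. The sketch below assumes a seven-point stencil (a cell and its six face neighbours) for one relaxation sweep and a hypothetical 8 × 8 × 4 work-group tile that is copied, together with a one-cell halo, into local memory once per sweep. Neither the stencil layout nor the tile shape is taken from the paper; only the 256 work-item and 32 KB limits come from Table 2.

```python
def global_reads_without_tile(tile):
    """Each work-item reads its own cell and six face neighbours from global memory."""
    tx, ty, tz = tile
    return 7 * tx * ty * tz

def global_reads_with_tile(tile):
    """The work-group loads the tile plus a one-cell halo into local memory once."""
    tx, ty, tz = tile
    return (tx + 2) * (ty + 2) * (tz + 2)

def local_bytes(tile, bytes_per_value=4):
    """Local memory needed for the padded tile of single-precision floats."""
    return global_reads_with_tile(tile) * bytes_per_value

TILE = (8, 8, 4)               # hypothetical shape: 256 work-items per group
LOCAL_MEMORY_LIMIT = 32 * 1024  # 32 KB, from Table 2

naive = global_reads_without_tile(TILE)
staged = global_reads_with_tile(TILE)
fits = local_bytes(TILE) <= LOCAL_MEMORY_LIMIT
```

Under these assumptions a work-group issues 600 global reads per sweep instead of 1792, and the padded tile occupies 2400 bytes, well within the 32 KB of local memory. On the CPU the same staging copy is extra work, because local and global memory reside in the same cache hierarchy.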


Table 3 Total parallel computing time on CPU and GPU

| Grid scale | Data processor | Solution time (ms) | Frame rate (fps) | Speed up |
|---|---|---|---|---|
| 32 × 32 × 32 | CPU (nonparallel computing) | 175.046 | 5.5 | – |
| 32 × 32 × 32 | CPU (parallel computing) | 37.196 | 26.6 | 4.8 |
| 32 × 32 × 32 | GPU (parallel computing) | 9.46 | 88.6 | 16.1 |

Fig. 12 Screenshots of some frames of the real-time 3D smoking simulation on GPU

4.3 Real-time and realistic 3D smoking simulation

To obtain precise timings of the smoking simulation, we execute the CPU program of the smoking simulation and the OpenCL program of the smoking simulation on CPU and GPU, each 1000 times, and then work out the average running time in each case. The average times are shown in Table 3, and screenshots of some frames of the real-time 3D smoking simulation on GPU are shown in sub-figures a–f of Fig. 12.

The experimental results in Table 3 and Fig. 12 show that OpenCL-based parallel computing of the 3D smoking simulation on CPU and GPU has a clear advantage over the nonparallel smoking simulation on CPU (speed-ups of 4.8 on CPU and 16.1 on GPU), and that it meets the real-time and visual-effect requirements of the smoking simulation.
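The speed-up figures in Table 3 can be checked with a few lines of arithmetic. The paper does not state how the Speed up column was derived, so the sketch below computes two candidate ratios from the tabulated values: one from the frame rates and one from the solution times. The dictionary layout and function name are illustrative only.

```python
# Values copied from Table 3.
ROWS = {
    "CPU (nonparallel)": {"solution_ms": 175.046, "fps": 5.5},
    "CPU (parallel)": {"solution_ms": 37.196, "fps": 26.6},
    "GPU (parallel)": {"solution_ms": 9.46, "fps": 88.6},
}
BASE = ROWS["CPU (nonparallel)"]

def ratios(row):
    """Return (speed-up by frame rate, speed-up by solution time), rounded to 0.1."""
    by_fps = round(row["fps"] / BASE["fps"], 1)
    by_time = round(BASE["solution_ms"] / row["solution_ms"], 1)
    return by_fps, by_time

cpu_ratios = ratios(ROWS["CPU (parallel)"])
gpu_ratios = ratios(ROWS["GPU (parallel)"])
```

The frame-rate ratios come out as 4.8 and 16.1, which agrees with the Speed up column, while the solution-time ratios are about 4.7 and 18.5. The tabulated speed-ups are therefore consistent with a comparison of frame rates rather than of solver times alone.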


5 Conclusions

In this paper, we first adopt CFD combined with the Navier–Stokes equations of an incompressible fluid under conditions of normal temperature and pressure to construct the real-time smoking simulation model. Then, by introducing the OpenCL parallel computing model, we propose a 3D smoking simulation parallel computing model based on OpenCL and design the parallel algorithm according to the properties of the GPU platform. Finally, we accomplish the massively parallel computing of the 3D smoking simulation. The experimental results show that the proposed smoking simulation model and parallel computing algorithm can meet the timeliness and visual-effect requirements of a real-time physical model.

Acknowledgement This work was financially supported by the National Basic Research Program 973 of China (Grant No. 2011CB707904) and the National Science Foundation of China (Grant No. 61070079).

References

1. Brandvik T, Pullan G (2007) Acceleration of a two-dimensional Euler flow solver using commodity graphics hardware. Proceedings of the Institution of Mechanical Engineers, Part C: J Mech Eng Sci 221(C12):1745–1748

2. Breitbart J, Fohry C (2010) OpenCL - an effective programming model for data parallel computations at the cell broadband engine. In: Proceedings of the 2010 IEEE international symposium on parallel and distributed processing, workshops and PhD forum, IPDPSW 2010

3. Chorin A, Marsden JE (1990) A mathematical introduction to fluid mechanics, 2nd edn. Springer, New York

4. Cullip T, Neumann U (1994) Accelerating volume reconstruction with 3D texture mapping hardware. Dissertation, University of North Carolina

5. Weiskopf D, Schafhitzel T, Ertl T (2007) Texture-based visualization of 3D unsteady flow by real-time advection and volumetric illumination. IEEE Trans Vis Comput Graph 13(3):569–582

6. Denaro FM (2003) On the application of the Helmholtz-Hodge decomposition in projection methods for incompressible flows with general boundary condition. Int J Numer Methods Fluids 43(1):43–69

7. Fatahalian K, Sugerman J, Hanrahan P (2004) Understanding the efficiency of GPU algorithms for matrix-matrix multiplication. In: Proceedings of the SIGGRAPH/EUROGRAPHICS workshop on graphics hardware, pp 133–138

8. Fedkiw R, Stam J, Jensen HW (2001) Visual simulation of smoke. In: Proceedings of the ACM SIGGRAPH conference on computer graphics, pp 15–22

9. Foster N, Metaxas D (1997) Modeling the motion of a hot, turbulent gas. In: Proceedings of the ACM SIGGRAPH conference on computer graphics, pp 181–188

10. Foster N, Metaxas D (1997) Controlling fluid animation. In: Proceedings of computer graphics international conference, CGI, pp 178–188

11. Khronos OpenCL Working Group (2010) The OpenCL Specification, Version: 1.1, Document Revision: 36. http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf. Accessed 15 October 2010

12. Kim T, Thürey N, James D, Gross M (2008) Wavelet turbulence for fluid simulation. ACM Trans Graph 27(3): Article Number 50

13. Kraus M, Strengert M, Klein T et al (2007) Adaptive sampling in three dimensions for volume rendering on GPU. In: Asia-Pacific symposium on visualisation, APVIS 2007, Proceedings, pp 113–120

14. Nottingham A, Irwin B (2009) GPU packet classification using OpenCL: a consideration of viable classification methods. In: Proceedings of the annual research conference of the South African institute of computer scientists and information technologists, pp 160–169

15. Owens JD, Houston M, Luebke D, Green S, Stone JE, Phillips JC (2008) GPU computing. Proc IEEE 96(5):879–899

16. Purnomo B, Rubin N, Houston M (2010) ATI stream profiler: a tool to optimize an OpenCL kernel on ATI radeon GPUs. In: ACM SIGGRAPH 2010 posters, SIGGRAPH ’10


17. Rößler F, Botchen P, Ertl T (2008) Dynamic shader generation for flexible multi-volume visualization. In: IEEE Pacific visualisation symposium 2008, PacificVis proceedings, pp 9–16

18. Schechter H, Bridson R (2008) Evolving sub-grid turbulence for smoke animation. In: Proceedings of the 2008 ACM SIGGRAPH/Eurographics symposium on computer animation, pp 1–7

19. Song O-Y, Shin H, Ko H-S (2005) Stable but nondissipative water. ACM Trans Graph 24(1):81–97

20. Stam J (1999) Stable fluids. In: Proceedings of the 26th annual conference on computer graphics and interactive techniques, SIGGRAPH 99, pp 121–128

21. Stone JE, Gohara D, Shi G (2010) OpenCL: a parallel programming standard for heterogeneous computing systems. Comput Sci Eng 12(3):66–72

22. Yeh TY, Faloutsos P, Reinman G (2006) Enabling real-time physics simulation in future interactive entertainment. In: Proceedings - Sandbox symposium 2006: ACM SIGGRAPH video game symposium, Sandbox ’06, pp 71–81

23. Zhang W, Zhang L, Sun S, Xing Y, Wang Y, Zheng J (2009) A preliminary study of OpenCL for accelerating CT reconstruction and image recognition. In: IEEE nuclear science symposium conference record, NSS/MIC 2009, pp 4059–4063