

Powder Technology 235 (2013) 983–1000

Contents lists available at SciVerse ScienceDirect

Powder Technology

journal homepage: www.elsevier.com/locate/powtec

Towards realistic and interactive sand simulation: A GPU-based framework

Juan-Pierre Longmore ⁎, Patrick Marais, Michelle M. Kuttel

Department of Computer Science, University of Cape Town, Private Bag X3, Rondebosch 7701, South Africa

⁎ Corresponding author. E-mail addresses: [email protected] (J.-P. Longmore), [email protected] (P. Marais), [email protected] (M.M. Kuttel).

0032-5910/$ – see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.powtec.2012.10.056

Article info

Article history:
Received 5 May 2012
Received in revised form 24 October 2012
Accepted 27 October 2012
Available online 5 November 2012

Keywords:
Dynamic simulation
Granular materials
Visualisation
Particle
Computation
Sand simulation

Abstract

We describe a highly efficient method for simulation of particulate materials at the granular level on graphics processing unit (GPU) hardware. Our GPU implementation of a discrete element method (DEM) allows for both rapid visualisation and physically accurate simulation of particulate materials, with a specific focus on sand. Our model represents each granule as a tetrahedral lattice of four particles, thereby implicitly modelling static friction through interlocking of neighbouring granules. Simulations performed with our implementation produce demonstrably realistic granular behaviour with respect to both force characteristics and reactive behaviour of typical sand piles. The implementation is also highly efficient, achieving 256 K tetrahedral granules at 120 milliseconds per frame of animation, and requires only a personal computer equipped with any recent commodity graphics card to accelerate all simulation physics. Further, our model admits subtle real-time lighting effects, such as particle self-shadowing and shadowing among granules and the environment, important for reproduction of the distinctive appearance of granular materials. Our model also supports interaction with a general environment by first point-sampling objects and then treating these as large “granules.” In this way, our simulation naturally handles arbitrary rigid body interaction, thus making it applicable to broader real-time simulation applications.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Particulate materials have unique physical responses to applied forces. Current knowledge suggests that macroscopic granular behaviour arises from fine-scale interactions among mesoscopic granules [1]. The problem, however, lies in extrapolating features at the granule level to the macroscopic properties of the granular assembly [2,3]. Traditionally, much of the research into the behaviour of sand and other particulate materials has been done via physical experiments. For example, photoelastic measurement of forces in a pile of glass beads is one tool for understanding how grain properties might influence granular assembly [4]. However, such manual experimentation is slow, cumbersome, and limited to bead properties that are not easy to vary across different experiments. The ability to rapidly simulate and visualise particulate material behaviour has the potential to remove the complexities and expense of working with granular materials directly. Computer simulation allows for faster experimentation with easy-to-modify parameters and readily accessible experimental results. Simulation also provides ready access to otherwise difficult-to-obtain granular data, such as vector forces, force distributions, and spatial correlations in a sand pile. In addition, a computationally efficient model of particulate material can be used for realistic simulations of the behaviour of sand for visual effects in both motion pictures and games.



The distinct element method (DEM) is a class of numerical techniques which apply the principles of molecular dynamics (MD) to model the motion of large numbers of particles [5]. Granules are typically represented as spherical particles interacting through the forces produced during collision of pairs of particles. However, DEM for granular material differs from true MD in operating at a mesoscopic (granules and powders) or macroscopic (rocks and boulders) scale, at perceivable distances, on a millisecond time-scale and with interactions that remain generally free of thermodynamic effects [6]. In reality, colliding material deforms on impact, with increasing deformation at higher velocities. In a discretely time-stepped simulation, rigid particles instead overlap. Simulation models, therefore, use the depth of overlap, as well as the relative velocity of a collision pair, to estimate deformation and to calculate the repulsive force needed for particles to draw back. These binary forces and the resulting changes to particle velocities together control the large-scale behaviour of the simulated material. An important demonstration of the utility and power of DEM was provided in Cundall and Strack's seminal rock modelling paper [7] and DEM has subsequently been employed for simulation of granular materials such as sand [8–12].

While DEM is well-suited to modelling small-scale granular interaction, it remains computationally expensive. In particular, a DEM simulation has to store and update the motional properties of hundreds of thousands of particles across each time step. In addition, a simulation needs to discover collisions among particles and calculate the forces that result. The high computational cost of a grain-level DEM simulation has led to much interest in the parallelisation of this algorithm. For particulate material, DEM decomposes naturally into force calculations and positional property updates, which are handled simultaneously for each particle. Thus, the usual approach to CPU parallelisation is to divide force calculations and positional updates among processors in a cluster [12]. One approach allocates fixed particle subsets to certain processors. In contrast, the spatial approach performs this allocation dynamically, assigning to a specific processor all particles that presently lie in a specific subdomain [12]. However, CPU-based parallelisations face high network latencies for inter-process communication, which is necessary for exchanging information about particles at the boundaries between subdomains.

1 While general-purpose programming languages such as the Compute Unified Device Architecture (CUDA) [36–38] for NVIDIA GPUs are available, our use of GLSL allows calculation within the rendering context, thereby providing access to updated positional data as it becomes available. In addition, our implementation is not limited to NVIDIA GPUs. Abstraction from the graphics context for the force calculation might have been carried out with a general-purpose language, such as OpenCL [39], Brook [16], or Thrust [40], which would transparently compile to graphics context codes. However, we have avoided these, as they hide important hardware details, which we have exploited to optimise the DEM simulation specifically for granular sand.

Despite much progress in parallel DEM technologies and algorithms, few applications provide a satisfactory solution for large-scale industrial problems, which need both physical accuracy and timely results. Some of the better results in the 2D case [13,14] include simulation of 200 K monodisperse particles in cylindrical and rectangular containers using 50 processors. Similar results follow for the 3D case, although usually with a pronounced decrease in simulation speed and particle count. The results are worse for polydisperse material, purportedly due to the added complexity of resolving contact boundaries among different sized particles. Nevertheless, polydisperse material simulation remains an important goal, as such material evinces a greater variety of important granular properties. Recent work that simulates 20 K to 30 K polydisperse particles is promising [12].

A straightforward approach to parallelisation is to allocate a single thread to each particle (or grain). However, it is not possible to do this efficiently for large numbers of particles on a CPU: deep pipelines and complicated data cache and flow-control logic [15] limit thread count and increase thread switching costs on a CPU. We therefore advocate the graphics processing unit (GPU) as the parallel hardware for DEM-based sand simulation. The modern GPU has support for massive thread-level SIMD parallelism, including lightweight thread contexts. In particular, a streaming architecture allows data elements to move through the constrained processing pipeline without stalls [16]. The GPU also comes equipped with specialised fast on-board texture memory that supports cached access for spatially local access patterns. In addition, GPU memory has an order of magnitude more bandwidth than many CPU-based systems [17,18]. While previously limited to graphics applications, the increased programmability of the GPU makes it applicable to general-purpose computation [19]. Moreover, GPU performance is increasing rapidly, with peak single-precision arithmetic rates starting at more than 300 GFLOPS and increasing to 1350 GFLOPS on high-end cards [17,18]. In contrast, a modern high-end multicore CPU is rated at 50 GFLOPS peak performance. The strong high-volume market for computer games also ensures a low price per FLOP relative to speciality CPU hardware [20].

The massive parallelism now available with programmable GPU hardware has already shown its utility with applications to various material simulations, including fluids, fire, hair, and cloth. Specific to particulate material, much early work in particle systems in computer graphics arguably applies to granular simulation. One of the first important results in this area was presented by Kolb et al. [21], who described a GPU-based particle system simulator able to render a dynamically growing particle system of up to one million particles in real-time. Advancement in GPU programmability has led to more sophisticated simulations on the GPU, beyond 1D particle systems and 2D DEM. Harada et al. [22] and Venetillo et al. [23] have demonstrated 3D DEM simulations on the order of millions of particles running in real-time. They perform collision detection, force calculation, and motion updates entirely on the GPU. The results of the simulation are visualised in real-time as well. While producing particulate DEM simulation, these approaches are unable to produce physically accurate sand behaviour. Recent work by Yasuda et al. [24] has gone some way in addressing granular material acceleration on the GPU. However, they do not account for static friction behaviour, which precludes many of the characteristic and important sand behaviours vital to accurate simulation.

Here we describe a GPU-based DEM framework using hypothesised force-based interactions among granules [25,26]. A major problem facing single particle models of granular material is the computationally expensive handling of static friction, which is necessary to achieve sand pile formation. We propose a multiparticle model, based on the work of Bell et al. [27], tailored to address this problem. For grains, we choose the smallest possible multiparticle arrangement having three-dimensional symmetry: a regular tetrahedron comprising four particles. Its symmetry, in particular, allows us to accelerate orientation and direction dependent physically-based calculation. Our spheropolygon granule model [28,29] represents each granule as a tetrahedral lattice of four particles and allows for nonlinear interactions among granules. In this new model, static friction is a natural consequence of the interlocking of neighbouring granules and produces demonstrably physically-accurate granular material behaviour, resulting in the emergence of important macroscopic properties such as dune formation.

Our framework uses GPU-based algorithms for physics and granule collision detection. In contrast to previous approaches [22,23,30], this leads to physically realistic simulations in real-time. Further, our implementation requires only a personal computer equipped with any recent commodity graphics card supporting programmable stages in the rendering pipeline to accelerate all simulation physics. The implementation also supports real-time visualisation of the simulation results, which follows from having simulation data immediately available within the graphics pipeline.

Our approach has been implemented with C++, OpenGL [31–33], and the OpenGL Shading Language (GLSL) [34,35] and applied to simulating over 200,000 tetrahedral granules in real-time on the NVIDIA 8800 GTX GPU [41] (see footnote 1). This in many cases exceeds the granule counts simulated in previous distributed computing work on granular simulation [42]. We have employed techniques such as texture mapping, frame-buffer binding, multiple render-buffer surfaces, and render-to-texture, as well as spatially local access patterns, to achieve high performance.

Our framework also supports interaction with a general environment. Point sampling of environmental surfaces allows arbitrary objects to be modelled as large “granules”, permitting environmental interaction with little modification to the underlying simulation. In this way, our simulation naturally handles arbitrary rigid body interaction, thus making it applicable to broader real-time simulation applications. In addition, we are able to support real-time visualisation of simulation evolution that includes subtle environmental lighting effects, such as granules that shadow themselves, their neighbours, and surrounding objects. This is important for reproducing the subtle and distinctive visual appearance of granular material. Importantly, our model demonstrates performance linear in granule count, suggesting seamless scaling with future increases in GPU multiprocessor density.

2. GPU architecture

Conventionally, in computer graphics, the GPU acts as a programmable processor for accelerating common and highly parallelisable graphics routines collectively called the graphics pipeline. As illustrated in Fig. 1, this pipeline contains fixed stages that translate a 3D scene into a final on-screen image. The first stage, called vertex processing, transforms a three-dimensional scene, comprised of geometry, namely, vertices and associated connectivity information, into a two-dimensional screen space. In addition, this stage evaluates colour and texture coordinates associated with each vertex. The next stage scan-converts the two-dimensional geometry into a collection of fragments in screen space. Each fragment holds the information necessary to update its associated pixel in the target frame buffer. The third stage, called fragment processing, uses the texture coordinates of each fragment to fetch associated texels (texture elements) from one or more textures. Such information can be combined mathematically to decide the final colour of the associated pixel. Finally, textures and the frame buffer storing the pixels support an RGBA colour format with a 32-bit floating-point value per colour channel.

Fig. 1. Simplified graphics rendering pipeline.

2.1. General-purpose programming on the GPU

We can exploit the inherent parallelism in this hardware using the analogy given in Fig. 2, which shows how rendering pipeline behaviour can form the basis for familiar thread-based parallelism. Initially, a quadrilateral enters the setup phase (vertex processor) where it is aligned and scaled to screen space. This allows the following generation phase (scan-conversion) to produce fragments aligned in a one-to-one correspondence with matching texels. As processing happens independently for each fragment, we think of fragments as individual threads (execution contexts). In this case, as each fragment knows its assigned matching texel coordinate, the associated data for a thread typically sits at a matching position in one or more data buffers (textures). A thread accesses these data by issuing suitable texel fetches (buffer reads).

Fig. 2. Parallel computation by exploiting parallelism in the rendering pipeline. A single execution cycle starts with the passing of a quadrilateral to the GPU for “rendering”. Later in the pipeline, threads (fragments) are spawned to carry out the parallel work, the results of which they write to their respective positions in the output buffer. The latter buffer, in fact, includes one or more data buffers (textures) explicitly defined as output targets in the present cycle. These may not be read during the same cycle, but can be set as readable (that is, not writable) for a following cycle.

Through fragment generation we are able to spawn threads efficiently. In addition, this scheme allows for a general-purpose gather operation, as each fragment is able to fetch multiple texels, not only from a matching position in all the textures but also from arbitrary positions on the same texture. However, we cannot arbitrarily write to the output buffer, as fragments must write their output to matching positions in one or more data buffers.

An alternative solution is to utilise the vertex processing stage, as illustrated in Fig. 3. Instead of issuing four vertices specifying a quadrilateral, we send as many vertices as the number of threads we wish to spawn. During vertex processing, we not only scale vertices to screen space but also position them independently of one another in a one-to-one fashion with the expected fragment generation points. Importantly, we are able to resize vertices so they overlap a large enough area in screen space to cause a single fragment generation per vertex. After fragment generation, the thread processing paradigm continues as before.

Fig. 3. Initiating parallel thread execution using vertices only. This design permits nontrivial mapping between positions in an input buffer, as referenced by a thread (fragment) with a given id, and its output buffer position. Such remapping allows for random access writes, though this remains limited to one and only one output coordinate in one or more buffers.

2 This is an efficient representation, being the smallest arrangement of two or more particles with each primary axis having the same moment of inertia, a fact used later to simplify rotational motion calculations.

The result here is an almost general-purpose scatter operation. However, absent is an ability for one vertex-fragment pair to write to multiple positions in the same data buffer. Such behaviour requires issuing multiple vertices per target fragment, which can significantly add to the already heavy transfer load between CPU and GPU. A better solution is possible with the arrival of the programmable geometry stage in the rendering pipeline. This stage, lying between vertex processing and scan-conversion, allows for one incoming vertex to spawn multiple outgoing vertices. As both vertex and geometry processing stages are now programmable and allow arbitrary texture reads, we are thus able, for each input vertex, to write to one or more positions in the output buffer, as needed. Conceptually, this merely results in an array of threads, as was shown in Fig. 3, but with multiple threads mapping to the same array position.

One disadvantage of the now general scatter remains the relatively large CPU-GPU transfer load. Fortunately, newer hardware allows spawning of large numbers of vertices within the pipeline initialisation stage. Each vertex receives a unique id from a user-specified starting id, incremented by one for each vertex.
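The gather and scatter access patterns described in this section can be illustrated with a minimal CPU-side sketch. This is only an analogy, not the GLSL implementation: plain Python lists stand in for textures, loop iterations stand in for fragments and vertices, and the names `gather_pass`, `scatter_pass`, and `destination_of` are hypothetical.

```python
# CPU-side analogy of the two access patterns; lists stand in for textures.
# All names here are illustrative assumptions, not part of the framework.

def gather_pass(src, kernel_radius=1):
    """Fragment-style pass: thread i writes only out[i] but may read anywhere."""
    n = len(src)
    out = [0.0] * n
    for i in range(n):  # one "fragment" per output texel
        lo, hi = max(0, i - kernel_radius), min(n, i + kernel_radius + 1)
        out[i] = sum(src[lo:hi]) / (hi - lo)  # arbitrary reads, fixed write
    return out


def scatter_pass(src, destination_of, out_size):
    """Vertex-style pass: thread i chooses where its single result lands."""
    out = [None] * out_size
    for i, value in enumerate(src):  # one "vertex" per thread id
        out[destination_of(i)] = value  # write position computed by the thread
    return out


smoothed = gather_pass([0.0, 3.0, 6.0, 9.0])
reversed_data = scatter_pass([10, 20, 30], lambda i: 2 - i, 3)
```

In the gather case the write position is fixed by the thread id, whereas in the scatter case each thread computes one output position, mirroring the single-write limitation noted for the vertex-only design.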

2.2. Collision detection data structures on the GPU

Achieving general-purpose programming on the GPU requires a means to represent and access data structures in GPU memory. The lack of memory pointers in NVIDIA 8800 GTX and older hardware makes structures such as dynamic linked lists difficult to implement. Instead, indirect referencing must be used. This involves storing texture coordinates in textures themselves. An uncomplicated reading of these index textures provides texel-encoded coordinates specifying positions to read from in data-storing textures. However, if a GPU-computable injective mapping exists for reading data, then this is preferred over indirection, as memory access is bandwidth bound, while calculation is cheaper and carried out as SIMD. This is the case for simple spatial data structures, such as a grid (see Section 4.1).
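As an illustration of a computable injective mapping, the following sketch maps cells of a uniform 3D grid to texels of a 2D texture by tiling the z-slices side by side. The tiled layout, the function names, and the parameters `n` and `tiles_per_row` are assumptions made for this example; the layout actually used by the framework is described in Section 4.1 and may differ.

```python
import math

# Hypothetical layout: an n*n*n grid stored as z-slices tiled across a 2D
# texture, with tiles_per_row slices per texture row.

def position_to_cell(position, origin, cell_size):
    """Return the integer grid cell containing a world-space position."""
    return tuple(math.floor((p - o) / cell_size) for p, o in zip(position, origin))


def cell_to_texel(i, j, k, n, tiles_per_row):
    """Map grid cell (i, j, k) to a 2D texel (u, v) by pure arithmetic."""
    tile_x, tile_y = k % tiles_per_row, k // tiles_per_row
    return tile_x * n + i, tile_y * n + j


def texel_to_cell(u, v, n, tiles_per_row):
    """Inverse mapping, showing that cell_to_texel is injective."""
    tile_x, i = divmod(u, n)
    tile_y, j = divmod(v, n)
    return i, j, tile_y * tiles_per_row + tile_x


cell = position_to_cell((0.25, 0.9, 1.6), origin=(0.0, 0.0, 0.0), cell_size=0.5)
texel = cell_to_texel(*cell, n=4, tiles_per_row=2)
```

Because the texel address is computed rather than fetched from an index texture, a lookup of this kind costs arithmetic only and needs no extra memory read.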

For efficiency, we need to store the identifiers for four particles in one 4-vector, while avoiding over-writing existing data. To do this, we order the ids using the method of Harada et al. [43], which uses depth testing to impose a per-fragment ordering.

Our complete threading model (Fig. 4) requires modern GPU hardware with programmable vertex, geometry, and fragment processing stages, which we now refer to as the output alignment, output multiplier, and thread processing stages, respectively. To saturate the GPU, this model requires large numbers of threads, which can be generated either by the fragment-based (Fig. 2) or vertex-based (Fig. 3) approach.

This approach has three levels of parallelism. Firstly, multiple stages execute in parallel. Secondly, at each stage, vertices and fragments (threads) are handled simultaneously and independently of one another. Finally, SIMD instructions exist for performing arithmetic operations on vector data within each thread context, thus supporting instruction-level parallelism.

3. Simulation

The conventional DEM model of sand uses a spherical particle representation for granules. However, theory suggests that tangential forces, specifically static and dynamic friction, dominate in heap formation [44]. Accounting for this in a spherical model often requires sacrificing significantly more memory and compute time to store and track originating points of contact between granules [27]. Instead of exacerbating an already computationally expensive process, we adopt a spheropolygonal representation that models each granule as a regular tetrahedral arrangement of four particles, one at each corner (see footnote 2). The hypothesis is that non-trivial granular boundaries will lead to computationally cheaper static friction modelling through interlocking behaviour among granules. We demonstrate through testing that this is a physically correct assumption.

In our model, the forces governing granule-granule interaction are those used successfully in previous single particle models [25,26]. In particular, we follow the work of Bell et al. [27], which distinguishes issues at the granule and particle levels, specifically in reference to the tetrahedral granule and its particle constituents. For the present discussion, we shall assume a fixed set of particles and grains represented respectively by index sets P and G. Also, we assume a function f: P → G that maps indices of child particles to the indices of their respective parent granules. In addition, we take a general approach that supports not only tetrahedral granules but also arbitrary granule structure.

Fig. 4. Threading paradigm. (Diagram labels: alignment code, scatter-write code, thread code; output alignment, output multiplier, thread issue, thread processor; initiate single execution cycle; 2D position, threads, 4-vectors, per-element storage selection; data buffers, multi-buffer target, read buffer(s) for each thread.)

Particle and granule properties, such as their position and velocity, evolve through time based on the forces they experience. Forces that act on particles over the interval Δt are summed to give the total force applied to their parent granules. For particle p in P we express mathematically the total force acting on it as

$$\mathbf{F}_p = \sum_{i \in P \setminus \{p\}} \mathbf{F}_i + m\mathbf{g}, \qquad (1)$$

where $m$ is particle mass and $\mathbf{g}$ is gravity. Given the full set of particles $p_1, \ldots, p_k$ mapping to a parent granule $g$ in $G$, we can calculate the total force experienced by the granule as

$$\mathbf{F}_g = \sum_{j=1}^{k} \mathbf{F}_j, \qquad (2)$$

$$\mathbf{T}_g = \sum_{j=1}^{k} \mathbf{r}_j \times \mathbf{F}_j, \qquad (3)$$

where we distinguish the total normal force $\mathbf{F}_g$ from the torque $\mathbf{T}_g$ that act orthogonally on granule $g$. The value $\mathbf{r}_j$ represents the relative vector from the centre of the granule to its child particle $j$. Dividing $\mathbf{F}_g$ by total granule mass gives its acceleration $\mathbf{a}$ at time $t$. We can derive the granule's new position $\mathbf{x}$ and velocity $\mathbf{v}$ at time $t + \Delta t$ using a simple first-order integration scheme, such as the forward Euler method [45].
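The accumulation in Eqs. (2) and (3), followed by a forward Euler update, can be sketched as below. This is a minimal CPU-side illustration under stated assumptions: vectors are plain 3-tuples, the per-particle forces are taken as already computed, and the helper names (`granule_force_and_torque`, `euler_step`) are hypothetical rather than taken from the framework.

```python
# Minimal sketch of Eqs. (2)-(3) plus a forward Euler step.
# Vectors are 3-tuples; all function names are illustrative assumptions.

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def granule_force_and_torque(rel_positions, particle_forces):
    """Sum child-particle forces (Eq. 2) and their torques r x F (Eq. 3)."""
    total_force = (0.0, 0.0, 0.0)
    total_torque = (0.0, 0.0, 0.0)
    for r, f in zip(rel_positions, particle_forces):
        total_force = add(total_force, f)
        total_torque = add(total_torque, cross(r, f))
    return total_force, total_torque


def euler_step(x, v, force, mass, dt):
    """Forward Euler: advance position with the old velocity, then velocity."""
    a = scale(force, 1.0 / mass)
    return add(x, scale(v, dt)), add(v, scale(a, dt))


# Two opposing forces on opposite sides of the centre: no net force, pure torque.
force, torque = granule_force_and_torque(
    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)],
)
x_new, v_new = euler_step((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, -2.0), 2.0, 0.5)
```

The example pair of forces cancels in Eq. (2) yet contributes twice to Eq. (3), which is why force and torque have to be accumulated separately.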

To represent granule orientation, we use a quaternion [46], which is a 4-vector offering four degrees of freedom with the single constraint that the quaternion be of unit length. This requires that we decompose the torque using Newton's equation into

$$\mathbf{T}_g(t) = I(t)\,\boldsymbol{\omega}_g(t), \qquad (4)$$

where $I$ is a three-by-three matrix called the mass moment of inertia for a grain, while $\boldsymbol{\omega}_g$ and $\mathbf{T}_g$ are the angular velocity and total torque acting on the grain. For a tetrahedral granule, $I$ is a diagonal matrix with equal diagonal entries. This means we can replace matrix–vector multiplication with cheaper scalar-vector multiplication. The next step expresses the quaternion $q$ with respect to a constant angular velocity $\boldsymbol{\omega}$ acting over a short time-step $\Delta t$ as a recurrence relation

$$q(t + \Delta t) = q(t) + \tfrac{1}{2}\,[\boldsymbol{\omega}(t), 0]\,q(t)\,\Delta t, \qquad (5)$$

where the last term involves, from left to right, quaternion-quaternion multiplication, then quaternion-vector multiplication. Having described the rotational and translational motion for the grain, we can derive the updated properties of its particle constituents. However, we need $\boldsymbol{\omega}_g(t)$, which entails calculating $I(t)^{-1}$. Fortunately, a simple relation exists for converting from the local coordinate system of the granule to a world coordinate system using

$$I(t)^{-1} = R(t)\,I(0)^{-1}\,R(t)^{T}, \qquad (6)$$

where $R(t)$ is the rotation matrix describing the grain's orientation at time $t$. In addition, conversion between a quaternion and rotation matrix is necessary. A straightforward relationship exists between the quaternion and its rotation matrix form. For details, we refer the reader to Shoemake [46]. Importantly, while the need to convert between these rotational representations might obviate the computational gain in avoiding matrix–vector multiplication, a quaternion uses just four, as opposed to nine, floating point values. This saves significantly on storage costs where memory access is an expensive operation on the GPU.
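The recurrence of Eq. (5) and the quaternion-to-matrix conversion can be sketched in Python as follows. This is a minimal sketch under stated assumptions: quaternions are stored scalar-last as (x, y, z, w) to match the $[\boldsymbol{\omega}, 0]$ notation, and the result is renormalised after every step to maintain the unit-length constraint. The renormalisation and all function names are illustrative choices, not details confirmed by the text.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)


def quat_step(q, omega, dt):
    """Eq. (5): q(t + dt) = q + 0.5 * [omega, 0] * q * dt, then renormalise."""
    wq = quat_mul((omega[0], omega[1], omega[2], 0.0), q)
    q_new = tuple(qi + 0.5 * wi * dt for qi, wi in zip(q, wq))
    norm = math.sqrt(sum(c * c for c in q_new))
    return tuple(c / norm for c in q_new)  # assumption: explicit renormalisation


def quat_to_matrix(q):
    """Rotation matrix (row-major nested lists) of a unit quaternion."""
    x, y, z, w = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
            [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
            [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)]]
```

Because the inertia matrix of a tetrahedral granule is a scalar multiple of the identity, $R\,I(0)^{-1}R^{T}$ in Eq. (6) reduces to $I(0)^{-1}$ itself, so the matrix produced here is needed mainly for updating particle positions.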

We can derive the relative position $\mathbf{r}_i^g$ of a particle constituent $i$ with respect to the centre of the granule parent $g$ and the rotation matrix $R_g$ representing the change in the granule's orientation. Given these, the new absolute position of particle $i$ is given by

$$\mathbf{x}_i' = \mathbf{x}_g + R_g\,\mathbf{r}_i^g, \qquad (7)$$

where $\mathbf{x}_i'$ and $\mathbf{x}_g$ are the respective positions of particle and parent at time $t$. Velocity follows a similar pattern:

$$\mathbf{v}_i' = \mathbf{v}_g + \boldsymbol{\omega}_g \times \mathbf{r}_i^g. \qquad (8)$$
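Eqs. (7) and (8) can be illustrated with a short Python sketch that derives one particle's state from its parent granule. It assumes that the cross product in Eq. (8) uses the relative vector after rotation into world space; that reading, like the function and parameter names, is an assumption rather than something the text states explicitly.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def update_particle(x_g, v_g, omega_g, R_g, r_local):
    """Return the particle's position (Eq. 7) and velocity (Eq. 8)."""
    # Assumption: r_local is stored in the granule's local frame.
    r_world = mat_vec(R_g, r_local)
    x_i = tuple(a + b for a, b in zip(x_g, r_world))
    v_i = tuple(a + b for a, b in zip(v_g, cross(omega_g, r_world)))
    return x_i, v_i
```

On the GPU, one such update would run per particle, with the parent's properties looked up through the granule id stored alongside the particle.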

The final result depends on the initial force applied to particles. Since time is discretised, force is taken to act over some time interval $\Delta t$ between time steps. This leads easily to a soft sphere model in which interpenetration between spherical particles serves as a proxy for deformation during collision. In particular, interpenetration depth approximates the size of the reciprocal force acting in reaction to the collision. Provided our time stepping is small enough, this model can reliably estimate continuous behaviour [47]. The difficulty lies in deriving the correct force model, which ultimately drives behaviour of the entire granular system.

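The forces used in our granular model of sand are given by the following equations [27]:

$$F_n = -\gamma_n\,\xi_n^{1/2}\,\dot{\xi}_n - k_n\,\xi_n^{3/2}, \qquad (9)$$

$$\mathbf{F}_s = -\min\!\left(\mu F_n,\ \gamma_s \lVert \mathbf{v}_s' \rVert\right) \frac{\mathbf{v}_s'}{\lVert \mathbf{v}_s' \rVert}, \qquad (10)$$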

where $\mathbf{v}_s'$ is the particle velocity in the shearing direction (orthogonal to the normal force) and $\xi_n$ is the penetration depth. The vector version of the normal force can be written as $\mathbf{n}F_n$. The constants $\gamma_n$ and $k_n$ are the damping and stiffness coefficients, respectively, both of which are functions of the particle's effective mass [27]. The duration $t_n$, which sets the upper bound for $\Delta t$, is derived as

$$t_n = \pi \left( \frac{k_n}{m_{\mathrm{eff}}} - \left( \frac{\gamma_n}{2 m_{\mathrm{eff}}} \right)^{2} \right)^{-1/2}, \qquad (11)$$

where $m_{\mathrm{eff}}$ is the effective mass of the particle.

Fig. 5. Mapping granules and particle properties to buffer elements. Here we associate the first granule $g_0$ with its four particle children $p_j$ for $j = 0, 1, 2, 3$. [Panels: granule data buffer; particle data buffer.]

Fig. 6. Explicit storage of particle-granule mappings. On the left, we see how the ids stored in a particle's position and velocity (previously read from a buffer or received as input) can be used to access related properties, that is, from the same buffer position on another particle [Panels: particle properties; granule properties. Elements are 4-vectors: (rx, ry, rz, p) for position and (vx, vy, vz, g) for velocity.]
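The force model of Eqs. (9) to (11) can be sketched in Python as below. Two choices here are assumptions rather than statements from the text: the shear limit uses the magnitude of the normal force, $|F_n|$, so that the friction bound stays non-negative under the sign convention of Eq. (9), and no force is returned when particles do not overlap. The interpretation of $\mu$ as a friction coefficient and $\gamma_s$ as a shear damping coefficient is likewise assumed, as are all function names.

```python
import math


def normal_force(xi, xi_dot, k_n, gamma_n):
    """Eq. (9): scalar normal force for penetration depth xi."""
    if xi <= 0.0:
        return 0.0  # assumption: no contact force without overlap
    return -gamma_n * math.sqrt(xi) * xi_dot - k_n * xi ** 1.5


def shear_force(v_s, f_n, mu, gamma_s):
    """Eq. (10): shear force opposing the shearing velocity v_s."""
    speed = math.sqrt(sum(c * c for c in v_s))
    if speed == 0.0:
        return (0.0, 0.0, 0.0)
    # Assumption: the friction limit uses |F_n| so the bound is non-negative.
    magnitude = min(mu * abs(f_n), gamma_s * speed)
    return tuple(-magnitude * c / speed for c in v_s)


def contact_duration(k_n, gamma_n, m_eff):
    """Eq. (11): contact duration t_n, an upper bound for the time step."""
    return math.pi / math.sqrt(k_n / m_eff - (gamma_n / (2.0 * m_eff)) ** 2)
```

In practice the time step would be chosen as a fraction of the value returned by `contact_duration`, so that each collision is resolved over several steps.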

4. GPU implementation

We want to make granule positions, momenta, and their other physical properties available to code running on the GPU. In particular, these data must be rapidly accessible, which means storing them in GPU memory. To achieve this, we use data buffers as a GPU-memory-resident data store.

4.1. Granule and particle storage

Grain storage requires that we assign to each granule a unique identification number $g_i$. Each $g_i$ is mapped one-to-one to a unique two-dimensional data buffer coordinate, as shown in Fig. 5. For tetrahedral granules, every quartet of particles, starting from the bottom left of the buffer and moving right, is associated with a single granule. Thus, granule properties like velocity are ordered in the same way as the matching properties of their particle constituents. The internal ordering of particles in the quartet, however, is unimportant, as long as they are stored contiguously in the buffer.

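We first note the equation used to map back and forth between the one-dimensional ids of both granules and particles and their two-dimensional surface position:

$$(x, y) = \left( \mathit{id} - w \cdot \left\lfloor \frac{\mathit{id}}{w} \right\rfloor,\ \left\lfloor \frac{\mathit{id}}{w} \right\rfloor \right), \qquad (12)$$

$$\mathit{id} = x + y \cdot w. \qquad (13)$$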

Here $(x, y)$ and $w$ are the data buffer coordinate and width, while $\mathit{id}$ holds the value of either $g_i$ or $p_i$, depending on the buffer being considered. The floor function “⌊ ⌋” is defined as $\lfloor x \rfloor = \max\{n \in \mathbb{Z} \mid n \le x\}$ for a floating-point value $x$. We now note that, provided only tetrahedral granules are stored in the buffers, the particles associated with some granule $g_i$ are simply $p_{4i}, p_{4i+1}, p_{4i+2}, p_{4i+3}$.
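The id mapping of Eqs. (12) and (13), together with the quartet rule for tetrahedral granules, can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this example.

```python
def id_to_coord(idx, width):
    """Eq. (12): map a one-dimensional id to a 2D buffer coordinate (x, y)."""
    y = idx // width          # floor(id / w)
    return (idx - width * y, y)


def coord_to_id(x, y, width):
    """Eq. (13): map a 2D buffer coordinate back to its one-dimensional id."""
    return x + y * width


def particle_ids(granule_id):
    """Ids of the four particles belonging to tetrahedral granule g_i."""
    return [4 * granule_id + j for j in range(4)]
```

For a buffer of width 4, id 10 maps to coordinate (2, 2), and granule 3 owns particles 12 to 15.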

Thus, to each granular property, namely position, linear velocity, angular velocity, and orientation, we allocate and assign a data buffer, where the same two-dimensional coordinate on each buffer corresponds to a different property for the same granule. Similarly for particles, we store their positions and velocities.³

The problem remains that the GPU cannot infer a 1D ordering of ids, because threads run in a 2D grid, which means we must store this information somewhere. Fortunately, GPUs are optimised to read, write, and manipulate 4-vector data. Storing properties as 4-vectors gives us a 3-vector for the property and a final element for the id information. Given that we have two independently stored properties for both particles and granules, we can now store sufficient information to fully support the mapping presented earlier in Fig. 5. Specifically, each particle can store its own id and the id of its parent granule, while each granule records its id and the id of its first child particle, as illustrated in Fig. 6.

³ Particles do not require angular velocity or orientation information, as the former is calculable at runtime, while the latter is not relevant to particles with three degrees of freedom.

4.2. General rigid body storage

A general rigid body can be viewed as a grain with an arbitrary particle count. Thus, nongrain rigid bodies have their positions, linear velocities, angular velocities, and orientations stored in the same textures as granule properties. Similarly, the positions, velocities, and relative positions of particles matched to nongrain rigid bodies are stored alongside the properties of the granule particles. Nothing need change except that we now need to store more information to fully recover the properties of a granule on the GPU at runtime. Specifically, we need the number of particles constituting a granule. This value can be stored in the final element of the angular velocity property of the matching granule. This assumes rigid body mass is a direct multiple of the particle count constituting it. Otherwise, we simply store it directly as mass for granules and rigid bodies alike.

In this way, we are able to represent granules, arbitrary particle-sampled rigid bodies, and their particle constituents all on the GPU. The properties associated with each are easily read or derived on the GPU at runtime, as illustrated in Fig. 7.

4.3. Collision detection

To efficiently handle collision detection we use a 3D grid structure. For each cell in this grid, we store the ids of the particles whose centres lie within the cell. By choosing particle diameters carefully, we can ensure that under stable simulation conditions, at most four particles can share a single cell. This again enables a simple 4-vector data buffer store.

Two complications remain. First, we need a means to map between a 3D cell location and a 2D data buffer element. Second, a mechanism is needed for writing four elements to the same vector element in a data buffer. Fortunately, the latter problem has been solved by using the auxiliary depth and stencil buffers to enable a four-pass write to a 4-vector data buffer element, as discussed in Section 2.2.

Fig. 6 (continued). ...texture. The associated granule id g can be used to find parent granule properties as well. An analogous idea applies to a given granule position and velocity (illustrated on the right), except that the particle id p is for the first particle constituent associated with the granule id g.

Fig. 7. Collision detection for an implicitly represented wall or floor. The implicit representation has a finite plane with vectors for the normal n, origin s and sides a and b. The magnitudes of the latter two vectors indicate the lengths of these sides. Given the radius $r_p$ of the plane particles and the radius $r_g$ of the grain particle, we can find those plane particles colliding with the grain. [Labels: particle at position; plane origin.]

Fig. 9. GPU-based DEM algorithm. Shaded boxes represent GPU-based data buffers that hold old and updated values of their matching properties. Stored values include grain position (P), velocity (V), angular velocity (Av), and orientation (O), as well as particle positions and velocities. Each dotted line indicates reading of a previous state value, while a solid line refers to a single write of an updated property for the current time-step. Unshaded boxes represent a single thread of execution on the GPU. [Box labels: particle positions; particle velocities; grain P/V/Av/O; particle forces; grid; grid construction; force update; grain update; particle update.]

The map between grid and data buffer is handled by transforming from an arbitrary position $\mathbf{p} = (p_x, p_y, p_z)$ relative to the origin of the grid in simulation space to a 2D data buffer coordinate $(s, t)$ relative to the buffer's lower left corner. The first step is to normalise $\mathbf{p}$ into a discrete grid position matched to all positions in a particular cell. Choosing the cell corner closest to the origin of the grid, we can derive this corner as $\mathbf{q} = (\lfloor p_x/c_x \rfloor, \lfloor p_y/c_y \rfloor, \lfloor p_z/c_z \rfloor)$, where $c_x, c_y, c_z$ are the dimensions of a grid cell. Importantly, we have assumed cells are uniform in size and cubic in shape to simplify calculations in this section. To find $(s, t)$ we need only compute

$$s = (q_z \,\%\, T)\,k_w + q_x, \qquad (14)$$

$$t = k_h \left\lfloor \frac{q_z}{T} \right\rfloor + q_y, \qquad (15)$$

where ‘%’ represents the modulus operation, which returns the remainder after integer division. Thus, Eqs. (14) and (15) enable the simulation to write a particle's id, based on its cell position in the grid, to a matching buffer position. This allows the simulation to update the grid data structure, as shown in Fig. 8.
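A minimal Python sketch of the grid mapping in Eqs. (14) and (15) follows. It assumes that each z-slice of the 3D grid is laid out as a tile of $k_w \times k_h$ elements and that $T$ is the number of tiles per row of the 2D buffer; these readings of $T$, $k_w$, and $k_h$ are assumptions, since the symbols are not defined in this section. The dictionary-based grid with a capacity of four ids per cell is a CPU stand-in for the 4-vector buffer element, and the function names are hypothetical.

```python
import math


def cell_of(p, cell_size):
    """Discrete cell index q = floor(p / c), taken per component."""
    return tuple(math.floor(pc / cc) for pc, cc in zip(p, cell_size))


def buffer_coord(q, tiles_per_row, k_w, k_h):
    """Eqs. (14) and (15): map a 3D cell index to a 2D buffer coordinate."""
    qx, qy, qz = q
    s = (qz % tiles_per_row) * k_w + qx
    t = k_h * (qz // tiles_per_row) + qy
    return s, t


def insert_particle(grid, coord, particle_id, capacity=4):
    """Store an id in a cell; refuse the write once the 4-vector is full."""
    slot = grid.setdefault(coord, [])
    if len(slot) >= capacity:
        return False
    slot.append(particle_id)
    return True
```

With unit cells, four tiles per row, and 8×8 tiles, the position (2.5, 1.2, 5.9) falls in cell (2, 1, 5) and is written to buffer coordinate (10, 9).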

The general simulation loop now amounts to the flow control given in Fig. 9. The buffers pictured are assumed to be initialised properly prior to the simulation start. In addition, the diagram indicates the processing performed by exactly four threads. For example, the grid construction stage indicates reads and writes for a single thread. In reality, there are as many threads as grid cells, all of which execute the same reads and writes simultaneously, but on different data.

Fig. 8. Mapping particle positions to the grid structure. The exhaustive coordinate list (for which we use a vertex buffer object [48] under the hood) contains 2D coordinates that we set to point to unique and used buffer elements in the particle position buffer. These are sent to the GPU where the Output Alignment (OA) processing retrieves from the position buffer the matching stored particle position. This allows for thread realignment to repoint to the correct position in the grid. Thread Processing (TP) then handles writing the result to the correct position in the grid buffer. [Labels: coordinate list; position buffer; grid buffer; OA; TP.]

5. Results and discussion

The benchmark test case involves a 100d domain (where ‘d’ indicates size in relation to particle diameter) containing a cylinder of radius 50d and height 100d (Fig. 10). The walls of the cylinder are smooth, while the floor is “rough”, or particle-sampled. Sand is dropped into the cylinder in batches of 16,384 grains, where each batch consists of 32×32×16 grains. Grains in each batch are spaced with their centres about two particle diameters apart, admitting small random perturbation in position, as well as randomised orientations. Each batch is centred above the cylinder's floor and dropped from roughly 15d above the current pile's top, excepting the first batch, which starts 50d above the floor. For each batch dropped, we wait for both the mean collision count and the mean frame time to stabilise for 30 seconds wall-clock time, after which the mean values are recorded and the next batch is dropped.
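The batch layout described above can be sketched in Python as follows. The sketch assumes that the 32×32 extent is horizontal and the 16 layers are stacked vertically, that the perturbation is uniform within a tenth of a particle diameter, and that random orientations are drawn by normalising four Gaussian samples; none of these details is given in the text, and the function name and parameters are hypothetical.

```python
import math
import random


def make_batch(nx=32, ny=32, nz=16, diameter=1.0, spacing=2.0,
               jitter=0.1, drop_height=50.0, seed=0):
    """Return a list of (centre, orientation) pairs for one batch of grains."""
    rng = random.Random(seed)
    step = spacing * diameter  # centres about two particle diameters apart
    grains = []
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                # Centre the batch on the cylinder axis, stacked upward in z.
                centre = (
                    (i - (nx - 1) / 2.0) * step + rng.uniform(-jitter, jitter) * diameter,
                    (j - (ny - 1) / 2.0) * step + rng.uniform(-jitter, jitter) * diameter,
                    drop_height * diameter + k * step + rng.uniform(-jitter, jitter) * diameter,
                )
                # Random unit quaternion from four normalised Gaussian samples.
                q = [rng.gauss(0.0, 1.0) for _ in range(4)]
                norm = math.sqrt(sum(c * c for c in q))
                grains.append((centre, tuple(c / norm for c in q)))
    return grains
```

With the default arguments the batch spans roughly 64d horizontally, which fits inside a cylinder of radius 50d.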

Importantly, since grains are added progressively to the simulation, reallocating storage and copying data into new textures would needlessly waste GPU compute time. Instead, we pre-allocate enough space to contain all granules, up to those in the last batch. Furthermore, since the implied particle and granule ids for empty texture space are numerically zero, they are ignored in physically based calculations and in particle and granule updates. Finally, most of our tests quantify independent variable performance with respect to either elapsed frame time or the variable's percentage contribution to total frame time. The former measure enables comparison across different trials, while the latter measure allows for in-trial comparisons.

The effect of sand volume on simulation time is a key determinant of framework performance. In Fig. 11, we show that increases in total simulation time are in direct proportion to increases in grain quantity: adding sand grains into the simulation produces a predictable linear increase in the total time spent simulating the entire sand volume.

The question remains as to what causes the increases in total simulation time seen with larger grain counts. Given that grain-based processing alone determines this relationship, we argue a priori for four dominant causes, viz. greater cost maintaining particle and grain properties; longer grid construction time; more grid queries; and more physically based calculation.

Fig. 10. The cylinder setup used to hold the simulated sand during testing. Cylinder surfaces are represented implicitly, with a particle-sampled base adding “roughness” and smooth walls preventing periodic effects. The plunger is used in later tests to apply an axial stress to the material. The suffix ‘d’ indicates units of particle diameter. The image on the right shows the actual simulation arrangement, including 65,536 grains, with part of the cylinder removed for clarity. [Labels: 40d; 100d; plunger; smooth wall; particulate base.]

Firstly, for particles and grains, we have already eliminated allocation costs by preallocating the total texture space. This leaves the cost of updating the property textures. We measure this by disabling force calculation and grid operations. Property updates contribute as little as 4 milliseconds (2.6%) to the total cost of simulating 256 K grains (Fig. 11). This contribution is manifestly less than other variables considered.

Next, the computational cost of grid construction is found by disabling all physically-based calculation and skipping particle and granule updates. We ignore the once-off cost of grid allocation and instead measure only grid update performance. Our results, seen in Fig. 11, include initial and subsequent testing on correspondingly older and newer generation hardware. Grid update time increases linearly with increasing sand volume, while consistently contributing less than 20 milliseconds to total simulation time for up to 256 K grains (one million particles).

Fig. 11. Simulation performance measured by comparing computational time versus sand volume, showing time spent updating the grid, performing grid queries, calculating particle forces, and updating particle and granule properties. The total simulation time is shown along with its linear fit (dotted line). For the NVIDIA GTX 8800, we find linear scaling in frame time with increasing grain count. Subsequent testing on newer hardware, namely the NVIDIA GTX 470, which has three times as many computational cores as the GTX 8800, reduces frame time by a factor of approximately 2.5 at matching grain counts. We again find linear scaling in frame time versus grain count. [Axes: grain count (×1,024), horizontal; frame time (milliseconds), vertical. Legend: GTX 470; GTX 8800.]

A related influence on simulation time is the cost of querying a fully-constructed and updated grid. Computing this cost during a simulation is problematic, as it calls for eliminating collision calculations that otherwise prevent particle interpenetration. Such unrestrained interpenetration soon results in an invalid grid, which affords little meaningful dynamic information on runtime performance. Instead, we record static performance by disabling not only force calculations but also particle and granule updates. This freezes particle and granule positions and produces a typical (pile-like) particle-id distribution in the grid. Constructing the grid once and reusing it for queries produces the result shown in Fig. 11. For 256 K grains, we find a 3 millisecond query cost, representing a 1.9% contribution to total simulation time. This result is surprising, as every shader instance, for each of the one million particles, queries its own particle's voxel and twenty-six surrounding voxels. The low cost here may follow from our query algorithm, which reads the grid in layers. Specifically, nine voxels are read from a single layer of the grid, which match to nine bordering texels on the grid texture. This may facilitate prefetching from the texture units, making local queries fast. This reinforces the negligible contribution of grid operations, including both queries and updates, to the total simulation cost, especially as sand volume increases.
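The layered query pattern described above (three layers of nine voxels, 27 in total) can be sketched in Python. The dictionary keyed by cell index stands in for the grid texture, and the function names are hypothetical; the sketch illustrates only the access order, not the shader implementation.

```python
def neighbour_offsets():
    """Return the 27 voxel offsets grouped as three z-layers of nine each."""
    return [[(dx, dy, dz) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            for dz in (-1, 0, 1)]


def query_neighbours(grid, cell):
    """Collect particle ids from a cell and its 26 surrounding voxels.

    The result includes the ids stored in the queried cell itself.
    """
    cx, cy, cz = cell
    found = []
    for layer in neighbour_offsets():      # one grid layer at a time
        for dx, dy, dz in layer:           # nine bordering voxels per layer
            found.extend(grid.get((cx + dx, cy + dy, cz + dz), ()))
    return found
```

Reading each layer's nine voxels together mirrors the locality the text credits for the low query cost, since those voxels map to bordering texels.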

Finally, since sand (partially) fills the cylinder, additional grains are expected to produce more collision events and more physically-based force calculations. The relationship between grain count and collision count is visualised in Fig. 12, where we observe linear scaling. This result may be explained, in part, by a geometric limit in the maximum number of spheres that can be in contact with one another at any given time, since most collisions occur in the bulk of the material, where this limiting behaviour plays a role. In turn, this suggests a possible causal role of increasing collision count on increasing simulation time. This is supported in Fig. 11, where we find that force calculation is the dominant contributor to total simulation cost.

Fig. 12. Collision count versus grain count for grains added to a cylinder. [Axes: grain count (×1,024), horizontal; collision count (×10^6), vertical.]


Table 1. The significant correlations found between GPU hardware counters. Correlations are shown with their matching p-values in brackets. For example, the correlation between collisions and fragment shader activity is read as 0.8562 (p < 0.05). We measured GPU performance across sixteen stationary piles with grain quantities from 16 K to 128 K inclusive, at 16 K increments. For each pile, we recorded GPU data for 20 seconds wall-clock time and averaged the results to obtain the final readings. The recorded data included frame time; fragment, vertex, and geometry shader activity; and the percentage of GPU compute time shaders spent waiting for texture units. We also included collision counts as measured previously. The table shows only the significant correlations we found.

Fragment shader   Texture busy   Texture waits   Frame time
Collisions   0.8562 (0.05)   0.8118 (0.05)   0.9999 (0.0001)
Fragment shader   n/a   0.8883 (0.005)   0.8614 (0.01)
Texture busy   n/a   0.7766 (0.05)   0.8143 (0.05)


We find further support for the importance of force calculation by directly analysing hardware performance, viz. fragment shader, vertex shader, geometry shader, and texture unit use, for increasing collision counts. The results of this analysis, performed on stationary sand piles of varying size, are given in Table 1.

In comparison, most vertex shaders perform simple coordinate transformations. However, the most important of these is selecting the positions to write particle ids to in the grid texture. Yet, our results show no significant correlations relating to vertex shaders, which again confirms the negligible contribution grid operations make to simulation time.

The correlations found across different collision counts support the argument that increasing sand pile volume (in the cylinder) lengthens frame time by increasing the amount of physically-based collision calculation performed. We further validate this assertion by recording the activity of an ad hoc geometry shader added to the grid construction program. Specifically, we replaced the vertex shader used in grid construction with a trivial (pass-through) shader and employed a geometry shader to handle the coordinate transform work. Importantly, the geometry shader replicates all the functionality of the vertex shader, with little modification to the underlying code. This allows us to distinguish grid construction performance from other influences by observing geometry shader activity relative to vertex and pixel shader activity. The activity of the geometry shader in Fig. 13 corroborates the lack of influence of grid construction, though it does not discount the effect of grid queries. However, we observe pixel shader activity and frame time both increasing and leveling off at similar times during the test, which again supports the relationship between simulation time and physically based calculation.

Fig. 13. Hardware performance measured by frame time and resource use versus wall-clock time. Frame time, indicated by the triangle, is plotted with respect to the left vertical axis. All other variables (indicated in the legend) are measured against the right vertical axis, as percentage use of their respective resources. [Axes: time (seconds), horizontal; frame time (milliseconds), left vertical; percentage, right vertical. Legend: frame time; CPU; GPU; pixel shader; vertex shader; geometry shader; texture units busy; shader waits for texture.]

In addition, Fig. 13 shows maximal use of the CPU, with erratic behaviour due to both averaging of dual-CPU activity and CPU-GPU communication time. However, the GPU is used maximally as well, demonstrating efficient resource use and CPU-GPU transfer. In the case of inefficient transfer, we would see less GPU activity as resources are stalled by data transfer. The results of this section together suggest specific hardware limits affecting framework performance. We speculate that, if future hardware trends continue to include more shader cores, more sand could be simulated with similar frame time results. However, simulating the same volume of sand at a higher rate would need larger texture memory bandwidth; faster texture access; improvement to the underlying simulation algorithm; or some combination of these.

Fig. 14. Example of general rigid body simulation running in our framework without sand.

Table 3. Sand features to test for in the sand simulation and the expected result of each test.

Feature | Expected findings
Force distribution^a | Exponential fit, as well as a Gaussian fit to the tail
Force distribution under compression^b | Gamma function fits the entire range
Force distribution for tangential forces^c | Exponential fit; and possible Gaussian fit to the tail; with the tail decreasing slower than for normal contact forces
Force–force spatial correlation | Only a small local spatial correlation is seen, less than a few particle diameters
Contact angle distribution | Friction has only a weak influence on the distribution
Contact network | Weak distinction found between “weak” and “strong” force-bearing structures
Stress–strain behaviour | Elastic hysteresis and two-thirds strain recovery observed during unloading
AOR | Friction-dependent behaviour obeys a power law
Orifice flow (fixed D)^d | Constant mean flow rate, with slower

5.1. Comparison to previous work

Providing meaningful comparison to previous work is not straightforward. In particular, different simulations use different time step lengths and in most cases lack results for specific grain counts, complicating frame time and frame rate comparisons.

We resolve these difficulties by basing performance comparisons on the Cundall number, which normalises differences in computation time, grain quantity, and iteration count. Specifically, we compute $C = N_t N / T$, where $N_t$ is the number of simulation time steps, $N$ is the number of rigid bodies, and $T$ is the computational time taken by the CPU or GPU [30]. Therefore, the Cundall number $C$ can be thought of as the number of granule time steps per computational second.
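The Cundall number defined above is a one-line computation; the Python sketch below makes the units explicit. The function name and the sample figures are hypothetical and are not measurements from this work.

```python
def cundall_number(time_steps, bodies, seconds):
    """C = N_t * N / T: granule time steps per second of computation."""
    if seconds <= 0:
        raise ValueError("computational time must be positive")
    return time_steps * bodies / seconds


# Hypothetical run: 100 time steps of 1,000 rigid bodies in 2 seconds.
example = cundall_number(100, 1000, 2.0)
```

Because $C$ divides by total computational time, a simulation whose cost grows faster than linearly with grain count reports a lower $C$ at larger counts.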

Comparing different simulations, based on their (fixed) Cundall numbers, assumes an ideal situation, namely, that these simulations demonstrate linear scaling between computational time and grain count. If a simulation exceeds linear scaling, such as with larger grain counts, then its Cundall number decreases for these values. Therefore, favouring previous work, we calculate their Cundall numbers based on available data for the smallest grain count (usually implying the smallest computational time). This scheme ensures that any comparison to previous work represents the strongest criticism of our results. In Table 2, Cundall numbers are used to compare performance results with previous work. We have used our worst Cundall number over the range of 16 K to 256 K grains.

The Cundall numbers demonstrate the sand simulation performs two orders of magnitude more granule time steps compared to a CPU-based DEM simulation, while matching the performance of previous GPU-based simulations. The discrepancy in Cundall numbers between our GPU-based multiparticle rigid-body simulation and the monodisperse single-particle simulation described by Venetilo et al. [23] indicates that monodisperse simulation performs better. The latter's advantage arises from ignoring three degrees of rotational freedom, which means less calculation, but at the cost of less diverse behaviour. Finally, while the GPU outperforms the CPU for monodisperse simulation, we note that polydisperse granular simulation has only been addressed on the CPU. Nevertheless, most of the behavioural diversity associated with polydisperse simulation is handled adequately by our multiparticle sand framework, as corroborated by our physical validation.

6. Granular physics validation

Confirming physically valid behaviour in the sand simulation framework requires that we first identify characteristic features of static and dynamic granular assemblies (sandpiles). In particular, we seek features that admit comparison to previous DEM simulations and empirical work on real-world sand. In the former case, we must distinguish between simulations that use monodisperse particles, where all particles have the same shape, size, and mass, as opposed to polydisperse particles, which vary in these properties. Furthermore, we are interested in the distinction between multiparticle and single particle models of sand. The physical features requiring validation, as identified in this section, are summarised in Table 3. In all cases, the literature provides quantitative data for simulated or real-world sand. Confirming the physical correctness of the simulation framework amounts to careful comparison of our results to this previous data. The testing methodology for extracting these results from the simulation framework is the topic of the next section.

Table 2. Performance and feature comparison of the sand framework to other DEM simulations.

Property | Framework | Harada et al. | Venetilo et al. | Ferrez
Parallelism | GPU | GPU | GPU | SMP
Particle sizes | Uniform | Uniform | Uniform | Multiple
Cundall number | 1.490×10^6 | 6.652×10^5 | 4.736×10^6 | 2.0234×10^4

6.1. Testing methodology

Physically validating the sand simulation requires assaying five aspects of the sandpile: force distributions and force–force correlation; contact structures; stress–strain behaviour; AOR with respect to friction; and granule flow rate through an orifice. In this section, we construct test cases able to measure these features and thereby give an indication of whether the simulated sand meets the expectations summarised in Table 3. Note that since we are primarily interested in assessing the behaviour of granular material under our model, we do not insist on a single set of parameters for all experiments. Instead, we choose parameters that are appropriate for each scenario and that clearly demonstrate that the behaviour we claim is indeed replicated.

6.1.1. Forces

To analyse the force characteristics of our sand simulation, we adopt the experimental setup of Silbert et al. [49]. This choice is justified by our framework's particle-level similarity to their granular model, where they use uniformly-sized spherical particles. They differ in allocating each particle as a separate 3D grain. Nevertheless, their force model similarly assumes cohesionless, inelastic particles, which interact via either the Hookean (linear) spring or Hertzian contact laws, where our simulation employs the latter at the particle level.

The multiparticle framework presented here, however, stands in contrast to much of the previous work in DEM simulation, including Silbert et al., where single-particle granules are used. In particular, the latter simulations usually need explicit static-friction modelling, which may influence measured physical properties, such as the distribution of force. Therefore, we need to collect physical data on granules and their particle constituents, thus ensuring that any deviation from physically valid behaviour, at the grain- or particle-level, is detectable.

Table 3 (continued)

Feature | Expected findings
Orifice flow (fixed D)^d, continued | throughput for increasing interparticle friction
Orifice flow (variable D)^e | Relationship between flow rate and orifice diameter D obeys Beverloo's law

^a Normal contact force distribution for sandpile at rest in cylinder.
^b Normal contact force distribution for sandpile under uniaxial compression.
^c Tangential contact force distribution for sandpile at rest in cylinder.
^d Orifice diameter is fixed across all simulations.
^e Orifice diameter is fixed for the duration of a simulation, but varied across simulations.


Fig. 15. Test case setup for angle of repose. A sandpile consisting of 64000 granules is placed into the reservoir. The outlet or scupper, running the entire 100d depth of the container, is then opened, thus releasing granules to flow into the sump below. After approximately 24000 granules have passed into the sump, the outlet is closed and the AOR of the material in the sump is measured. The result of an actual simulation run is shown on the right.


Similar to Silbert et al. [49], we use a cylindrical container, with sur-faces set to the same frictional and elastic properties as the grains them-selves. As illustrated in Fig. 10, the cylinder is constructed from a “rough”(particle-sampled) circular base,while a smooth implicit surface is used tomodel the cylindrical walls. While both rough and smooth boundariesare addressed in Silbert et al., the rough case requires depth-averagenormalisation of forces. The latter is needed to account for periodic pack-ings due to “granular” (particle-sampled) sidewalls, in contrast to thesmooth case. Nevertheless, they find no significant differences betweenthe two approaches.

For simulation of a stationary sandpile, under previously stated conditions, we record both the normal and tangential contact force magnitudes experienced by 64 K grains and 256 K particles, during a single simulation time step. Since particle forces are stored in the force texture, we simply copy particle force data directly from GPU texture memory to CPU memory and save the data to disk. For grains, however, the total force is calculated and used only during grain dynamics, but not stored in a texture. Thus, we use an ad hoc texture image in the simulation and set this as an additional render target in the fragment shader handling grain property updates.

In each case, the probability distributions for the recorded forces are computed by a general procedure that we apply to any input data for which a probability distribution must be derived. We begin by calculating the Kaplan-Meier estimate [50] of the cumulative distribution function (cdf) from input values (either grain or particle forces, in the present case), to produce an empirical cdf. The resulting cdf values, calculated at discrete points, are binned according to the Freedman–Diaconis rule [51]. The result is “differentiated” by applying finite differences to the cdf, to produce values for the matching probability density function, that is, the probability distribution of the input data.

Measuring forces in a sandpile under uniaxial compression requires using a plunger in the simulation (see Fig. 10) to add a compressive stress at the top of the pile. Instead of dynamic compression, however, we apply a vertical force of fixed magnitude and wait for the pile to stabilise. The latter means ensuring all grains have near-zero linear and rotational velocity. When this condition is met, all force magnitudes are recorded in a single simulation time step. Producing the normal contact force distributions from these data follows the same procedure as described above for a sandpile under standard conditions.

6.1.2. Contact geometry

A similar methodology to spatial correlation is followed to produce the distribution of contact angles, which provides information on the contact geometry within the pile. Here, we are most interested in the distribution of grain contact angles, as grains are the smallest mobile rigid units comprising the sandpile. Nevertheless, for comparison, we

Fig. 16. Test case for flow rate through funnel. An inverted cone with an aperture of 45 degrees is sliced orthogonally to its central axis, thereby removing its apex and producing a “funnel” with a hole of radius 8d. The funnel is made sufficiently large to comfortably contain 64 K granules and is placed with its hole at a height of 50d above the floor. The funnel itself is modeled implicitly, as a smooth surface.


[Figure 17: four panels, each with a logarithmic vertical axis from 10^−5 to 10^0 and a horizontal axis from 0 to 8. Three panels plot P(fn) against fn and one plots P(ft) against ft. Legends: particle and grain normal force with exponential and Gaussian fits (two panels); particle and grain normal force with gamma fits; particle and grain tangential force with Gaussian fits, shown together with the normal-force Gaussian fits.]

Fig. 17. Results for normal and tangential force distributions, under standard conditions, with normal forces recorded under uniaxial compression as well.

[Figure 18: F(r), from 0 to 12, plotted against r/d, from 0 to 4. Legend: μ = 0.0, normal; μ = 0.5, normal; μ = 0.5, tangential.]

Fig. 18. Spatial force-pair correlation function for normal and tangential contact forces. The results for μ=0 and μ=0.5 are plotted against a distance r normalised by the fixed particle diameter d. The dashed black line indicates F≡1, while the red dashed line is at F≡1.269.


compute the distribution of particle contact angles as well. The latter entails using neighbourhood queries to locate particle-level contacts among distinct grains, while ignoring same-grain particle contacts, as these do not inform about the interaction between moving components. Thus, in deriving both distributions, we need to resolve particle

Table 4
Comparison of the parameters for fitting functions across test cases. The values for particles and granules are from the simulated material. Test cases involving friction use μ=0.5.

Test case          Exponential     Gaussian    Gamma
                   a  b  β         α           a  b

Particles
  Friction         2.10  0.61  1.29  1.51
  No friction      2.78  0.72  1.46
  Compression      3.36  0.30

Granules
  Friction         2.14  0.66  1.24  2.45
  No friction      10.05  1.01  2.13
  Compression      6.14  0.16

Silbert et al.
  Friction         2.55  0.65  1.35

ids into their parent granules' ids, which is possible through the mapping described in Section 4.1.

Global features of intergranule contact, namely, the contact network, can be examined indirectly. For comparison to the result of Silbert et al. [49], we quantify the contribution of the “weak” and “strong” subnetworks to the average normal contact force in the bulk of the material. In particular, using a variable threshold force, fcut, we may calculate the fraction of contacts remaining in the force network whose contact force is greater than fcut (for strong forces) or less than fcut (for weak forces). Similarly, we are able to calculate the percentage contribution each network makes to the average force.
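The threshold calculation can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function name `network_split`, the normalisation by the mean force, and the choice to count contacts lying exactly at the threshold as weak are all assumptions made for illustration.

```python
def network_split(forces, f_cut):
    """Split contacts into strong (f > f_cut) and weak (f <= f_cut) subnetworks.

    Hypothetical helper: forces are first normalised by their mean, so
    f_cut is expressed in units of the mean normal contact force.
    """
    mean = sum(forces) / len(forces)
    f = [v / mean for v in forces]            # mean normalised force is now 1
    strong = [v for v in f if v > f_cut]      # assumption: ties count as weak
    weak = [v for v in f if v <= f_cut]
    total = sum(f)
    return {
        "strong_fraction": len(strong) / len(f),      # share of contacts
        "weak_fraction": len(weak) / len(f),
        "strong_contribution": sum(strong) / total,   # share of the average force
        "weak_contribution": sum(weak) / total,
    }
```

Evaluating the function over a range of threshold values yields, for each subnetwork, the fraction of contacts remaining and the fraction of the average force those contacts carry.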

6.1.3. Stress–strain

The global response of a material to a large uniform stress is an important mechanical consideration, since this determines, in part, how the simulated material will interact with arbitrary objects in the environment. Using the same test setup as before (see Fig. 10), including the plunger for compression, we simulate the stress–strain experiment of Coetzee et al. [52]. Sand is compressed initially by a small force to produce a flattened top, after which the plunger is eased upwards to the lowest height at which zero compression is recorded. The experiment then begins by recording the deformation of sand in response to increasing force applied downwards by the piston. The compression rate used is small to prevent dynamic effects and allow the sandpile adequate time to respond.

6.1.4. AOR

Another global feature of the sandpile is its AOR. Numerous methods exist to measure the AOR [53], at least two of which employ direct measurement of a stationary pile, formed by either grain injection [54] or

[Figure 19: P(θ), from 0 to 3, plotted against θ, from 0 to 90. Legend: μ = 0.0; μ = 0.5; μ = 0.5, fn > 2.]

Fig. 19. Probability distribution P(θ) for grain-grain contact angles. Here, θ is given relative to the local spherical system and defined as the angle the contact pair makes with the vertical, θ=0. Thus, horizontal contacts are given by θ=90. The dotted line indicates the distribution for forces larger than twice the mean force.


[Figure 20: horizontal axis fcut, from 0 to 6; left vertical axis “Contribution to normal force”, from 0 to 1; right vertical axis “Remaining contacts”, from 0% to 100%. Curves: fn < fcut and fn > fcut.]

Fig. 20. Fractional contribution to the bulk normal contact force (solid lines) and the percentage of particle contacts that make up that contribution (dashed lines), as a function of imposed contact force threshold fcut for a sand pile with μ=0.5. The blue lines (dotted and solid) indicate contributions from normal forces larger than the cutoff force, while black lines are for forces smaller than the cutoff. The arrow indicates that 50% of particle contacts contribute to 81% of the bulk average contact force.


grain discharge [9]. Alternatively, indirect measurement is achieved by tilting a leveled heap, typically in a rotating drum, until avalanche occurs [55].

We use an injection scheme, as shown in Fig. 15, modeled on the discharge scheme of Zhou et al. [56], but differing in two crucial ways. Firstly, our test case discharges sand from a reservoir, through a single scupper, into the sump below. Since the AOR is measured for the pile in the sump, our approach is technically an injection scheme. Zhou et al., in contrast, measure the AOR of sand released into the reservoir. This measurement is made once excess runoff has discharged via two scuppers located on either side of the reservoir. We do not use this approach in our case, since friction and interlocking between non-spherical granules may hasten scupper discharge by “pulling” otherwise stationary granules into the outflow. The resulting AOR, therefore, fails to accurately model pile formation under conditions of more gradual runoff and, instead, estimates the resulting AOR after avalanching. Secondly, Zhou et al. use a container with an initial depth of four particle-diameters, which is varied across experiments. Instead, we choose to use a much larger container depth to reduce sidewall effects.

6.1.5. Granular flow

The final simulation property to measure is the flow rate of sand through an orifice, such as present between the “bulbs” of an hourglass. While arbitrary hourglass designs are not guaranteed to produce constant mean flow rate, specific boundary conditions admitting this behaviour have been found empirically [57]. In particular, an inverted

[Figure 21: axial stress (kPa), from −35 to 0, plotted against axial strain (%), from −3.5 to 0. Curves: loading and unloading.]

Fig. 21. Uniaxial stress–strain behaviour for a sandpile in serial loading and unloading phases. Hysteresis is seen as the unloading phase returning along a different path. In addition, unloading does not return to the origin, showing an incomplete recovery of the initial strain.

cone (hourglass) exhibits constant mean flow rate while the material it holds stays at a height of two-and-a-half times its width (W) or more. In addition, the width of the material needs to conform to the inequality W > D + 30d, where D is the diameter of the outlet [8]. Finally, the flow rate itself is relatively smooth provided 4d < D < 6d, where values of D smaller than the lower bound often produce blockage [8].
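The three empirical conditions (material height of at least 2.5 W, material width W exceeding D + 30d, and outlet diameter D between 4d and 6d) can be collected into a small check. The sketch below is only an illustration: the function name `hourglass_conditions`, its argument names and the dictionary keys are hypothetical, and all lengths are assumed to share one unit.

```python
def hourglass_conditions(W, H, D, d):
    """Evaluate the three empirical flow conditions quoted in the text.

    W: width of the material, H: height of the material above the outlet,
    D: outlet diameter, d: particle diameter (all in the same length unit).
    """
    return {
        "constant_mean_flow_height": H >= 2.5 * W,  # height of at least 2.5 W
        "material_width": W > D + 30 * d,           # W > D + 30d
        "smooth_flow": 4 * d < D < 6 * d,           # 4d < D < 6d
    }
```

Returning the conditions separately, rather than as one boolean, makes it clear which constraint a given funnel design violates.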

Considering these conditions, we construct the test case as shown in Fig. 16. In particular, we examine two determinants of flow rate, namely, interparticle friction and the outlet diameter D. In the latter case, the outlet diameter takes on values in the interval [8d,24d].

6.2. Results and analysis

In the previous section, testing procedures for five aspects of the sandpile were outlined. In this section, we give the results of those tests. For forces, we give the distributions for normal and tangential contact under standard conditions and under uniaxial compression, as well as the results for force–force spatial correlation. The contact geometry among granules is quantified indirectly, through contact angle distributions and changes in intergranule contact count over different force thresholds. We show the strain in a sandpile relative to an applied stress. Finally, we quantify the effect friction has on sandpile AOR and granule flow rate. In the latter case, we also observe flow rate across different outlet diameters.

6.2.1. Forces

Fig. 17(a) shows the distributions of normal contact force magnitudes for particles and granules, in an unloaded sandpile, using a coefficient of friction μ=0.5. For both particles and granules, the matching exponential fits show good agreement with the exponential fit of Silbert et al. [49]. Interestingly, while the fit parameters for particles and granules are numerically close to one another (see Table 4), their graphs show a marked deviation from one another, beginning with forces two to three times the mean force, that is, fn>2. In addition, while the exponential fit appears to give a closer approximation to the particle distribution, the latter distribution also starts deviating from the fit, near fn=4.

The deviation between particle and granule distribution is not unexpected, as some of the large forces that act on particles act tangentially to the parent grain's centre. In this way, they may neutralise one another through conversion into opposing angular accelerations. Of greater interest is the finding of a disagreement between the exponential fit and the distributions around fn=3. For comparison, we have included in Fig. 17(a) the Gaussian approximations to the shape tails of the distributions. Surprisingly, in both cases, the Gaussian fit, with exponents in the expected ranges (see Table 4), provides a visibly better approximation over the entire distribution, not just the tails. This result suggests that the distribution of small forces (fn < 1), which are not included in the Gaussian fit calculation, is predicted by the distribution of larger forces.

While agreeing strongly in the tails of P(fn), the smaller force regions appear to lack an obvious peak or plateau near fn=1. However, we found disagreement among previous experimental work and DEM simulation concerning the behaviour of P(fn) as fn approaches 0 [49]. In addition, we face the burden of single-precision floating point arithmetic on the GPU, where rounding or numerical imprecision may invalidate comparisons based on small force magnitudes.

Nevertheless, previous work in DEM simulation finds, for forces less than the mean, the distribution bends downwards when μ=0 and upwards when friction is present in the simulation [49]. We find the same result, as seen in Fig. 17(b), where both distributions evidence a peak near fn=1. In addition, the Gaussian fits to the tail data provide better approximations to these distributions, as before. This result again shows that smaller forces can be estimated from tail data alone. It also suggests the Gaussian fit is useful across different coefficients of friction.


[Figure 22: angle of repose (degrees), from 0 to 25, plotted against the coefficient of sliding friction, from 0 to 0.8.]

Fig. 22. Angle of repose versus the coefficient of sliding friction. Particle, wall, and floor restitution is set to 0.5. Friction at the walls and floor is fixed at 0.5. The line represents a power function fit to the data with an exponent of 0.51.

[Figure 24: Wg (× 10^4), from 0 to 1.4, plotted against D, from 8 to 26.]

Fig. 24. Flow rate Wg of grains versus the aperture diameter D of the inverted cone. Wg is the number of grains fallen per unit time. The dashed line indicates the best fit, which is given by $W_g = 7.94\,(D - 4.107)^{5/2}$, where the first constant incorporates the density and gravitational coefficients.


Our results for uniaxial compression are shown in Fig. 17(c), along with a gamma distribution fit in each case. The gamma fits for the particle and grain force distributions appear to closely match the data (see also Table 4). We also observe clear peak formation near fn=1 and a much smaller force range than seen in previous distributions. The latter two observations may be explained by the presence of force bearing structures in the sandpile, each conducting forces of different magnitudes. When these structures are overwhelmed by a much larger applied force, a more uniform force structure is produced in the pile. Thus, a larger mean force dominates among the recorded forces, producing peak behaviour at the mean and a smaller normalised force range.

For completeness, we show the distribution of tangential forces P(ft) in Fig. 17(d), which includes the Gaussian fits to the normal contact force distributions for comparison. As before, normalised forces have been used. The results show that P(ft) decays more slowly than P(fn), which is in agreement with previous DEM simulation [49].

Finally, all force distributions in Fig. 17 appear to flatten out at the end of their tails. This is partly a result of the velocity and penetration constraints employed by the simulation, which prevent particles and grains from disobeying the CFL condition [58], thus guaranteeing stable simulation behaviour. This amounts to preventing excessive penetration during a simulation time step, which would otherwise produce larger forces than expected and numerical instability. Thus, large forces arise only sporadically in the simulation and correspond to unit-sized bins at the far end of the force histogram. Conversion of the latter into a distribution of unit area results in the apparent flatness at the lower end of the logarithmic scale.
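The flat tail follows directly from how a density is obtained from binned data. The sketch below illustrates the estimation procedure of Section 6.1 (empirical cdf, Freedman–Diaconis binning, finite differences) under stated assumptions: the function name `empirical_pdf` is hypothetical, the data contain no censored samples (in which case the Kaplan-Meier estimate reduces to the ordinary empirical cdf), and the bins are equal-width and span the sample range. It is not the code used to produce Fig. 17.

```python
import bisect
import math
import statistics

def empirical_pdf(values):
    """Estimate a probability density from samples via a binned empirical cdf."""
    x = sorted(float(v) for v in values)
    n = len(x)
    q1, _, q3 = statistics.quantiles(x, n=4)      # quartiles of the samples
    h = 2.0 * (q3 - q1) / n ** (1.0 / 3.0)        # Freedman-Diaconis bin width
    span = x[-1] - x[0]
    if h <= 0.0 or span <= 0.0:
        raise ValueError("samples need a non-zero spread")
    n_bins = math.ceil(span / h)
    width = span / n_bins                         # equal-width bins over the range
    edges = [x[0] + i * width for i in range(n_bins)] + [x[-1]]
    # Empirical cdf evaluated at the bin edges (no censoring assumed).
    cdf = [bisect.bisect_right(x, e) / n for e in edges]
    cdf[0] = 0.0                                  # first bin keeps the smallest sample
    # Finite differences of the cdf give the density in each bin.
    pdf = [(cdf[i + 1] - cdf[i]) / width for i in range(n_bins)]
    centres = [x[0] + (i + 0.5) * width for i in range(n_bins)]
    return centres, pdf
```

A bin holding a single sample receives the density 1/(n × width), the smallest non-zero value the estimate can take. Sporadic large forces therefore all land on the same floor value, which appears as a flat segment on a logarithmic axis.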

[Figure 23: grains remaining (× 10^4), from 0.2 to 1.6, plotted against simulation time (seconds), from 0 to 9. Legend: μ = 0.0; μ = 0.2; μ = 0.4; μ = 0.6; μ = 0.8.]

Fig. 23. Flow rate through an “hour glass” (inverted cone). 16 K grains, each weighing 0.0143 kg, pass through a hole of diameter 8d in the bottom of an inverted cone having an aperture of 45 degrees and a particle-wall friction coefficient μ=0.2. Since grains are dropped into the cone at the start of simulation, we discount initial dynamic effects by collecting data only once 1 K grains have passed through the hole. The resolution of the recorded data is five times denser than indicated by the plot marks.

In Fig. 18, we give the spatial force–force correlation derived from the bulk of the simulated material. Distances in the diagram are normalised by particle diameter and are found to be in the range $[(1 - \tfrac{1}{10})d,\ 4d]$. The upper bound was chosen beforehand for comparison with the range used in previous work. However, the lower bound results from a physical constraint set in the simulation that limits particle penetration depth to one tenth of a particle diameter. This prevents the production of very large forces and consequent numerical instability.

The latter penetration constraint is reflected as a large increase in F for r near $(1 - \tfrac{1}{10})d$, where forces of similar magnitude direct particles away from one another. In addition, we see a negligible effect of friction on the correlation function, where the presence of friction resulted in only a very slight increase in local correlation in previous DEM simulation [49]. For all three cases in the figure, localised correlation is observed extending roughly two particle diameters into the bulk, suggesting an unlocalised structure to the force transmission network.

We derive the previous observations based on the reference line F≡1.269, which we found through a linear fit, constrained by a zero gradient. This produces the expected tapering off of F beyond r=2.5d. Previous work finds the same behaviour, but relative to the line F≡1 [49,59]. One possible cause for this anomaly is that we base our calculations on particles. The interlocking between granules might produce long range networks of contact, leading to a slight uniform increase in F.

6.2.2. Contact geometry

The contact angle distributions are given in Fig. 19, where we give results for the presence and absence of friction in the simulation. For the frictional case, we include a second distribution of contact angles, but only among grains experiencing force magnitudes larger than twice the mean force. Importantly, these distributions are derived from intergranule contacts, thereby helping to discern geometric structure between mobile elements in the sandpile. The results show that friction does not play a significant role in the relative orientations of grains. In the presence of friction, previous work has found large forces concentrated at lower angles, where we observe a small increase in concentration of large forces at large angles, that is, in the horizontal plane. This suggests interlocking among irregular granules, which results in contact “networks” that dissipate forces to the sides of the container.

In Fig. 20, we show the fraction of contacts remaining in the force network, for contact forces greater than or less than the cutoff force fcut. This indicates the size of the “strong” and “weak” force bearing parts of the sandpile, respectively, supporting the cutoff force. Also shown is the percentage contribution these structures make to average force, given a specific cutoff force.

The main result, indicated by the arrow in the figure, is that half of the contacts in the “strong” force network (where fn > fcut) contribute approximately 81% to the average contact force. The latter distinction is not as large as one might expect. However, this result is in


Fig. 25. Stable dune formation with an uneven surface.


agreement with Silbert et al. [49], who found no clear evidence of a distinction between “weak” and “strong” force phases in the force-bearing structures.

6.2.3. Stress–strain

In Fig. 21, we show the change in stress and strain in a confined sandpile produced by serial loading and unloading. The results show elastic hysteresis at the start of the unloading phase, as expected by elasticity theory and corroborated by experiment [52]. We also see roughly two-thirds of the strain recovered at the end of the unloading phase, which agrees with the experimental findings of Coetzee et al. [52]. Surprisingly, their numerical simulation, which used dual-particle granules, failed to show this behaviour. This suggests that the advantage of our model arises from interlocking among complex granules, possessing pronounced convex and concave surface features. Such geometrical “roughness” may facilitate irreversible sliding, when grains experience enough compression to rearrange into a pattern that prevents particle-concavity disengagement.

6.2.4. AOR

In Fig. 22, the results from the AOR measurements across different coefficients of sliding friction are shown. Importantly, the AOR measurement involves a large degree of uncertainty (with a 95% confidence interval of ten degrees), caused by nonlinear slope features. We attempt to mitigate some of the difficulty in computing the slope by using a robust linear least-squares regression with bisquare weighting on surface granules. This minimises the effect of outliers, while providing a good fit over the entire range of surface granules. As evidenced in Fig. 22, the resulting AORs appear to follow a power law, which is in agreement with findings of Zhou et al. [56].

6.2.5. Granular flow

In Fig. 23, the number of granules remaining in the cone (upper bulb) is given as a function of time, for different values of the particle-particle friction coefficient. We observe a constant mean flow rate in the outflow from the inverted funnel, visualised as a strong linear relationship in grain quantity and simulation time. The latter finding and the observation that increased interparticle friction leads to longer drainage time, are in accordance with experimental results [60].

In Fig. 24, we provide simulation results for mass flow rate in relation to aperture diameter. We see good agreement with Beverloo et al. [61] and previous numerical simulation [60]. Importantly, this test was performed with 64 K particles, which conforms to the height requirement for material above the orifice, as discussed in Section 6.1.5.
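The fit reported in the caption of Fig. 24 has the Beverloo form Wg = C (D − k)^(5/2), with C = 7.94 and k = 4.107. Raising both sides to the power 2/5 makes the relation linear in D, so C and k can be recovered by a straight-line fit. The sketch below illustrates this linearisation; the function names are hypothetical, D is assumed to be expressed in particle diameters, and the method is not necessarily the fitting procedure used for the figure.

```python
def beverloo(D, C, k):
    """Flow rate C * (D - k)^(5/2); only meaningful for D > k."""
    return C * (D - k) ** 2.5

def fit_beverloo(diameters, rates):
    """Recover (C, k) from a straight-line fit of rate^(2/5) against D.

    Since W^(2/5) = C^(2/5) * (D - k), the gradient is C^(2/5) and the
    intercept is -C^(2/5) * k.
    """
    y = [w ** 0.4 for w in rates]
    n = len(diameters)
    md = sum(diameters) / n
    my = sum(y) / n
    slope = (sum((D - md) * (v - my) for D, v in zip(diameters, y))
             / sum((D - md) ** 2 for D in diameters))
    intercept = my - slope * md
    return slope ** 2.5, -intercept / slope
```

For example, evaluating `beverloo` with the caption's constants at outlet diameters between 8 and 24 and passing the results to `fit_beverloo` returns the same two constants, up to rounding.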

7. Discussion

The main aim of testing was to corroborate valid granular behaviour in the simulation. In line with this goal, we quantified key physical properties of the simulated sand and compared our results to previous work.

Testing began with measurement of force distributions in the sandpile. Comparison of the parameter values of the Gaussian and exponential fits, as well as our subjective interpretation of the distribution shapes, indicated a close match to previous work [49]. Moreover, our results corroborated recent granular science research that advocates the superiority of the Gaussian fit, relative to the exponential fit, for a sandpile under standard conditions [62,63,4,64,65]: we saw the Gaussian matching closely to and interpolating the empirical distributions over most of their force ranges, where the exponential fits failed beyond forces two to three times the mean. For a sandpile under uniaxial compression, we found the Gamma fit interpolated most of the data, supporting its use in modelling compressed granular heaps, as suggested by previous work [10].

Digging deeper into sandpile morphology, we tested force–force spatial correlations. The results revealed localised correlation, which was not unexpected [49]. The unexpected, however, did occur in the distribution of intergranule contact angles. This useful indirect measure of contact geometry showed that our granules appeared to distribute forces horizontally, to the sides of the container, rather than vertically, as with spherical granules [49]. However, both localisation and supporting sidewalls are supported by the idea of localised stress tracts in sandpiles, which are implicated in the arching behaviour of heaped granules [66].

Frictional behaviour among granule contacts also influences the angle of repose (AOR) of a sandpile, as our testing has shown. This relationship


Fig. 26. Sand flowing out of an inverted cone. Sand is initially dropped into the top of the inverted funnel, which mimics an hourglass in supporting constant mean flow rate through the bottom opening. In addition, pile formation is demonstrated underneath the opening.


is expected, since sliding friction among particles and between particles and container walls plays a key role in controlling the linear and rotational motions of heaped particles [56]. In the case of multiparticle granules, our results indicate a possible power law between sliding friction and AOR, agreeing with the uniparticle granule simulation of Zhou et al. [56]. However, we stress that while these conditions are necessary, they are not sufficient to recognise a power law relationship. Full corroboration would require extensive testing, using varied grain sizes and other orthogonal features. Nevertheless, without further testing, the simulation evidences compelling AOR-dependent behaviour, such as dune formation, as seen in Fig. 25. Here, sand is dropped in two columns into a container, resulting in a typical dune-like surface.

In another test, pushing down on the sandpile and then removing this compression produced an initial elastic hysteresis, followed by two-thirds recovery of the strain by the end of unloading. This simulation result reproduces the physical experiment of Coetzee et al. [52], where their numerical simulation did not. The advantage of our sand simulation framework is likely due to the irregular grain geometry, which may model interlocking among granules, and thus better inhibit full recovery of the initial strain.

Finally, we examined another well-known feature of sand and similar granular materials: its constant mean flow rate through an outlet, which makes simple hourglasses possible. Our experiments reproduced constant mean flow rate, across different sliding friction coefficients, as well as in accordance with Beverloo's law, across different outlet diameters. An example of the hourglass simulation is given in Fig. 26, where the sand in the funnel evidences a meniscus (central depression) due to faster outflow at the centre.

Thus, for modelling the physically valid behaviour of real sand, lying in a container or interacting with its environment, including being compressed from above or flowing out from below, the simulated sand has demonstrated its applicability in all cases. Moreover, our findings show that a multiparticle model of irregular grain geometry is able to mimic many interesting and characteristic behaviours of sand. The only question left to answer is whether the computational cost involved in producing physically compelling behaviour admits real-time performance.

In these examples, we show rigid bodies interacting with sand. In Fig. 27, an avalanche causes a small structure to topple. The two-block structure experiences a force at its bottom, causing it to fall slightly towards the oncoming deluge. When the sand settles, the blocks sway backwards, with momentum causing the structure to topple. In Fig. 14, we give an example of rigid body interaction without sand.

8. Conclusions

A major theoretical contribution of this paper is the development of a GPU-based framework which allows for real-time granular material simulation. Simulations done with this framework reproduce many important physical properties of real sand: dynamic behaviours, including both elastic hysteresis and strain-loss in the unloading phase of compression; constant mean flow rate of grains in an hourglass; and pile formation with an angle of repose predictably dependent on intergranule friction. In particular, the simulation framework has been shown to agree with previous work on force distributions in sand, spatial force–force correlations, and internal sandpile geometry.

Furthermore, our simulation supports interaction with a general environment. We have described a novel approach for representing large simple surfaces implicitly, as particles. The objects are point-sampled and then treated as large “granules”. In this way, our simulation naturally handles arbitrary rigid body interaction, thus making it applicable to broader real-time simulation applications. An additional advantage of this approach is decreased texture memory consumption, resulting in increased memory space for storing explicitly represented bodies and granules.

A further practical contribution of this work is an illumination model that mitigates the computationally expensive lighting required with large grain numbers. We accelerate granule rendition by exploiting the underlying particle representation. Specifically, we have used an implicit sphere representation to produce real-time surface reflection and shadowing, where previously lighting of complex explicit geometry was needed. Our simulation thus admits real-time lighting, which includes granular self-shadowing and shadowing among granules and the environment. This is important for reproducing the subtle and distinctive visual appearance of granular material.

Our efficient GPU implementation of this DEM particle model accelerates the expensive particle physics calculations entirely on the GPU to allow for realistic real-time simulation of granular material. Collision detection and grain property updates are performed entirely in the GPU rendering pipeline. In addition, the implementation supports real-time visualisation of the simulation results, which follows from having simulation data immediately available within the graphics pipeline.

For real-time visual effects involving granular materials, the framework's appeal lies in the linear scaling between grain quantity and frame time. In addition, the ability to handle large sand volumes (up to one million granules) facilitates larger offline applications, such as visual effects for film. Provided GPU hardware development continues to add more processing (shader) cores and larger supporting texture memory interfaces, we expect the framework will handle and evolve much larger sand volumes at a similar rate.

Our results support the use of the sand simulation framework as a means to produce physically valid granular behaviour, which conceivably admits real-world applications. These include diverse


Fig. 27. An avalanche toppling a structure.


modelling applications, from settling of corn grains during transport and storage to the toning process in electro-photographic copiers.

References

[1] H.M. Jaeger, S.R. Nagel, Physics of the granular state, Science 255 (5051) (1992) 1523–1531.

[2] S. Luding, From microscopic simulations to macroscopic material behavior, Computer Physics Communications 147 (1) (2002) 134–140.

[3] S. Luding, M. Lätzel, W. Volk, S. Diebels, H. Herrmann, From discrete element simulations to a continuum model, Computer Methods in Applied Mechanics and Engineering 191 (1–2) (2001) 21–28.

[4] T. Majmudar, R. Behringer, Contact force measurements and stress-induced anisotropy in granular materials, Nature 435 (7045) (2005) 1079–1082.

[5] J. Williams, G. Hocking, G. Mustoe, The theoretical basis of the discrete element method, in: NUMETA-85, 1985.

[6] J.T. Jenkins, S.B. Savage, A theory for the rapid flow of identical, smooth, nearly elastic, spherical particles, Journal of Fluid Mechanics 130 (1983) 187–202.

[7] P.A. Cundall, O.D.L. Strack, A discrete numerical model for granular assemblies, Geotechnique 29 (1979) 47–65.

[8] D. Hirshfeld, D. Rapaport, Granular flow from a silo: discrete-particle simulations in three dimensions, The European Physical Journal E: Soft Matter and Biological Physics 4 (2) (2001) 193–199.

[9] Y. Zhou, B. Xu, A. Yu, P. Zulli, Numerical investigation of the angle of repose of monosized spheres, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 64 (2) (2001) 021301.

[10] C. Radeke, K. Bagi, B. Palancz, D. Stoyan, On probability distributions of contact force magnitudes in loaded dense granular media, Granular Matter 6 (1) (2004) 17–26.

[11] R. Balevičius, R. Kačianauskas, Z. Mroz, I. Sielamowicz, Microscopic and macroscopic analysis of granular material behaviour in 3D flat-bottomed hopper by the discrete element method, Archives of Mechanics 59 (2007) 231–257.

[12] R. Kačianauskas, A. Maknickas, A. Kačeniauskas, D. Markauskas, R. Balevičius, Parallel discrete element simulation of poly-dispersed granular material, Advances in Engineering Software 41 (1) (2010) 52–63.

[13] J. Landry, G. Grest, L. Silbert, S. Plimpton, Confined granular packings: structure, stress, and forces, Physical Review E 67 (4) (2003) 41303.

[14] J. Landry, G. Grest, S. Plimpton, Discrete element simulations of stress distributions in silos: crossover from two to three dimensions, Powder Technology 139 (3) (2004) 233–239.

[15] J. Shen, M. Lipasti, Modern Processor Design: Fundamentals of Superscalar Processors, McGraw-Hill, 2005.

[16] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, P. Hanrahan, Brook for GPUs: Stream Computing on Graphics Hardware, in: ACM SIGGRAPH 2004 Papers, ACM, 2004, pp. 777–786.

[17] GeForce 8800 GTX specifications, http://www.nvidia.com/object/product_geforce_gtx_8800.html, 2008.

[18] GeForce 480 GTX specifications, http://www.nvidia.com/object/product_geforce_gtx_480_us.html, 2010.

[19] J. Owens, GPU Architecture Overview, in: International Conference on Computer Graphics and Interactive Techniques, ACM Press, New York, NY, USA, 2007.

[20] Z. Fan, F. Qiu, A. Kaufman, S. Yoakum-Stover, GPU cluster for high performance computing, in: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, IEEE Computer Society, 2004, p. 47.

[21] A. Kolb, L. Latta, C. Rezk-Salama, Hardware-based simulation and collision detection for large particle systems, Graphics Hardware (2004) 123–132.

[22] T. Harada, S. Koshizuka, Y. Kawaguchi, Sliced Data Structure for Particle-Based Simulations on GPUs, in: Proceedings of the 5th International Conference on Computer Graphics and Interactive Techniques in Australia and Southeast Asia, ACM, New York, NY, USA, 2007, pp. 55–62.

[23] J. Venetillo, W. Celes, GPU-based particle simulation with inter-collisions, The Visual Computer 23 (9) (2007) 851–860.

[24] R. Yasuda, T. Harada, Y. Kawaguchi, Real-time simulation of granular materials using graphics hardware, International Conference on Computer Graphics, Imaging and Visualization (2008) 28–31.

[25] J. Schäfer, S. Dippel, D. Wolf, Force schemes in simulations of granular materials, Journal de Physique I (France) 6 (5) (1996) 5–20.

[26] L. Brendel, S. Dippel, Lasting contacts in molecular dynamics simulations, Physics of Dry Granular Media (1998) 313–318.

[27] N. Bell, Y. Yu, P.J. Mucha, Particle-based simulation of granular materials, in: Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2005, pp. 77–86.

[28] F. Alonso-Marroquin, Spheropolygons: a new method to simulate conservative and dissipative interactions between 2D complex-shaped rigid bodies, Europhysics Letters 83 (2008) 14001.

[29] F. Alonso-Marroquín, Y. Wang, An efficient algorithm for granular dynamics simulations with complex-shaped objects, Granular Matter 11 (5) (2009) 317–329.

[30] P. Cleary, Ball motion, axial segregation and power consumption in a full scale two chamber cement mill, Minerals Engineering 22 (2009) 809–820.

[31] M. Woo, J. Neider, T. Davis, OpenGL Programming Guide: The Official Guide to Learning OpenGL Version 1.1, Addison-Wesley, 1997.

[32] E. Angel, Interactive Computer Graphics: A Top-Down Approach Using OpenGL, 4th ed., Addison-Wesley, 2006.

[33] M. Segal, A. Akeley, The OpenGL Graphics System: A Specification (Version 3.0), http://www.opengl.org/registry/doc/glspec30.20080811.pdf, 2008.

[34] R. Rost, The OpenGL Shading Language, 2nd ed., Addison-Wesley Professional, January 25, 2006.

[35] J. Kessenich, D. Baldwin, R. Rost, The OpenGL Shading Language, vol. 1, 2008 (http://www.opengl.org/registry/doc/GLSLangSpec.Full.1.30.08.pdf).

[36] Compute Unified Device Architecture, http://developer.nvidia.com/cuda, 2008.

[37] J. Sanders, E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley Professional, 2010.

[38] D.B. Kirk, W.W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010.


[39] A. Munshi, The OpenCL Specification, http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf, 2010.

[40] M. Joshi, Graphical Asian options, Wilmott Journal 2 (2) (2010) 97–107.

[41] NVIDIA GPU Programming Guide: GeForce 8 and 9 Series, Revision 1.0, http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf, 2008.

[42] J. Ferrez, Dynamic Triangulations for Efficient 3D Simulation of Granular Materials, Ph.D. thesis, École Polytechnique Fédérale de Lausanne, 2001.

[43] T. Harada, Real-time rigid body simulation on GPUs, GPU Gems 3 (2007) 611–632.

[44] D. Wolf, F. Radjai, S. Dippel, Dissipation in granular materials, Philosophical Magazine Part B 77 (5) (1998) 1413–1425.

[45] W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge Univ. Press, 1992.

[46] K. Shoemake, Animating rotation with quaternion curves, ACM SIGGRAPH Computer Graphics 19 (3) (1985) 245–254.

[47] D. Baraff, An Introduction to Physically Based Modeling: Rigid Body Simulation II: Nonpenetration Constraints, SIGGRAPH Course Notes, 1997.

[48] R. Hammerstone, M. Craighead, K. Akeley, Vertex buffer object extension specification, http://www.opengl.org/registry/specs/ARB/vertex_buffer_object.txt.

[49] L.E. Silbert, G.S. Grest, J.W. Landry, Statistics of the contact network in frictional and frictionless granular packings, Physical Review E 66 (6) (2002) 061303, http://dx.doi.org/10.1103/PhysRevE.66.061303.

[50] E. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, Journal of the American Statistical Association (1958) 457–481.

[51] D. Freedman, P. Diaconis, On the histogram as a density estimator: L2 theory, Probability Theory and Related Fields 57 (4) (1981) 453–476.

[52] C. Coetzee, D. Els, Calibration of granular material parameters for DEM modelling and numerical verification by blade–granular material interaction, Journal of Terramechanics 46 (1) (2009) 15–26.

[53] M.A. Carrigy, Experiments on the angles of repose of granular materials, Sedimentology 14 (3–4) (1970) 147–158 (ISSN 00370746).

[54] Y. Grasselli, H.J. Herrmann, On the angles of dry granular heaps, Physica A: Statistical and Theoretical Physics 246 (3–4) (1997) 301–312.

[55] N. Pohlman, B. Severson, J. Ottino, R. Lueptow, Surface roughness effects in granular matter: influence on angle of repose and the absence of segregation, Physical Review E 73 (2006) 031304.

[56] Y.C. Zhou, B.H. Xu, A.B. Yu, P. Zulli, An experimental and numerical study of the angle of repose of coarse spheres, Powder Technology 125 (1) (2002) 45–54 (ISSN 0032–5910).

[57] R. Nedderman, U. Tüzün, S. Savage, G. Houlsby, The flow of granular materials I: discharge rates from hoppers, Chemical Engineering Science 37 (11) (1982) 1597–1609.

[58] R. Courant, K. Friedrichs, H. Lewy, On the partial difference equations of mathematical physics, IBM Journal of Research and Development 11 (2) (1967) 215.

[59] D. Mueth, H. Jaeger, S. Nagel, Force distribution in a granular medium, Physical Review E 57 (3) (1998) 3164–3169.

[60] C. Mankoc, A. Janda, R. Arévalo, J. Pastor, I. Zuriguel, A. Garcimartín, D. Maza, The flow rate of granular materials through an orifice, Granular Matter 9 (6) (2007) 407–414.

[61] W. Beverloo, H. Leniger, J. van de Velde, The flow of granular material through orifices, Chemical Engineering Science 15 (1961) 260–269.

[62] C.S. O'Hern, S.A. Langer, A.J. Liu, S.R. Nagel, Random packings of frictionless particles, Physical Review Letters 88 (7) (2002) 075507, http://dx.doi.org/10.1103/PhysRevLett.88.075507.

[63] J. Brujic, S.F. Edwards, I. Hopkinson, H.A. Makse, Measuring the distribution of interdroplet forces in a compressed emulsion system, Physica A: Statistical Mechanics and its Applications 327 (3–4) (2003) 201–212.

[64] J. Zhou, S. Long, Q. Wang, A.D. Dinsmore, Measurement of forces inside a three-dimensional pile of frictionless droplets, Science 312 (5780) (2006) 1631–1633.

[65] A.R.T. van Eerd, W.G. Ellenbroek, M. van Hecke, J.H. Snoeijer, T.J.H. Vlugt, Tail of the contact force distribution in static granular materials, Physical Review E 75 (060302) (2007) 060302(R).

[66] H.M. Jaeger, S.R. Nagel, R.P. Behringer, Granular solids, liquids, and gases, Reviews of Modern Physics 68 (4) (1996) 1259–1273, http://dx.doi.org/10.1103/RevModPhys.68.1259.