Real Time Depth Sorting of Transparent Fragments

AXEL LEWENHAUPT

Master's Thesis at NADA
Supervisor (KTH): Christopher Peters
Supervisor (Fatshark): Axel Kinner
Examiner: Johan Håstad


Abstract

Using transparency is an important technique for modelling partial coverage in computer graphics and is used for a variety of effects such as hair, fur, glass, smoke and particle effects. Replacing per object sorting, the dominant algorithm used in real time 3D engines, with a new algorithm with fewer rendering artifacts could let artists use more transparency and effects in their scenes.

This thesis evaluates two algorithms as possible replacements for the per object sorting algorithm, the per pixel linked list and weighted blended order-independent transparency (WBOIT), in a visual comparison and a performance evaluation.

The results show that neither of the algorithms is suitable as a unified and general replacement, but there are cases where they can be used. The per pixel linked list has good visual quality but lacks performance and can be used in a limited scope such as hair rendering. The WBOIT algorithm has good performance, but the approximation used can give noticeable color errors when there is a large difference in depth between transparent objects.


Referat

Sortering av transparenta fragment i realtid (Real-time sorting of transparent fragments)

In computer graphics, transparency is an important technique for mimicking partial coverage and is used in several areas, such as rendering hair, fur, glass, smoke and particle effects. If per object sorting, the most widely used algorithm in real time 3D engines, can be replaced with a new algorithm with fewer rendering errors, it becomes possible for 3D artists to use more transparency and effects in their scenes.

This report evaluates and compares two algorithms, per pixel linked lists and weighted blended order-independent transparency, against per object sorting in a visual and a performance evaluation. The implementation was done in the game engine Stingray.

The results show that neither of the algorithms is a worthy candidate as a universal and general transparency algorithm, but there are situations where their strengths can be used. Per pixel linked lists maintain high visual quality but do not perform well enough and can therefore only be used for limited areas, such as hair rendering. Weighted blended order-independent transparency instead performs well but can produce noticeable visual errors when there are large differences in depth between objects.


Contents

1 Introduction 1

1.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Scientific question . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Scope and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Layout of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background 7

2.1 Physical foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 Physical notation . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Hardware and software overview . . . . . . . . . . . . . . . . . . . . 11

2.2.1 Shader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.2 Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.3 Atomic counter . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.4 Rendering pipeline . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.5 Rendering techniques . . . . . . . . . . . . . . . . . . . . . . 13

2.2.6 Blending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.3 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3.1 Per object sorting . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3.2 Depth peeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3.3 Stochastic transparency . . . . . . . . . . . . . . . . . . . . . 17

2.3.4 Weighted blended order-independent transparency . . . . . . 18

2.3.5 Per pixel linked list . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3.6 Hybrid transparency . . . . . . . . . . . . . . . . . . . . . . . 19

2.4 Summary of the background . . . . . . . . . . . . . . . . . . . . . . . 19

3 Implementation 21

3.1 Stingray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2 Choice of algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.3 Weighted blended order-independent transparency . . . . . . . . . . 23

3.4 Per pixel linked list . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.5 Visualization and evaluation . . . . . . . . . . . . . . . . . . . . . . . 28

3.5.1 Fragment counting . . . . . . . . . . . . . . . . . . . . . . . . 28


3.5.2 Fragment layers . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5.3 The overall work . . . . . . . . . . . . . . . . . . . . . . . . . 29

4 Evaluation 31

4.1 Evaluation set up and details . . . . . . . . . . . . . . . . . . . . . . 31

4.1.1 Renderings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.3 Counting transparent fragments . . . . . . . . . . . . . . . . 33

4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.1 Visual comparisons . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.3.1 Visual comparison . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.3 Transparent fragments . . . . . . . . . . . . . . . . . . . . . . 40
4.3.4 General discussion . . . . . . . . . . . . . . . . . . . . . . . . 41

5 Conclusions 43

5.1 Reflections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Bibliography 45

Appendices 46

A Figures 47


Chapter 1

Introduction

Chapter 1 gives a short introduction to the master thesis by describing the problem, scope and limitations of the thesis. The chapter also describes the methodology of the evaluation and the layout of the report.

Many modern computer games and 3D visualization tools make use of transparency to improve the visual quality of effects such as fog, particles, hair and non-refractive glass. It has become possible to increase the usage of these effects with the increased computing power of today's graphics cards, but it also introduces a rendering problem: the render order of transparent fragments. When rendering transparent materials or effects, the fragments need to be rendered in back to front order to give a correct result, unlike rendering opaque objects where only the front most fragment needs to be rendered [8].

Transparency is used for surfaces which partly cover the light from sources behind the surface. This can be used to mimic light extinction in a participating medium, such as fog, where the amount of light which is able to pass through depends on the distance the light has to travel in the medium [7]. Transparency is also an effective way of approximating thin or small meshes which only cover a small part of a pixel, such as hair or fur. When rendering hair, a single strand of hair usually covers only part of a pixel, so instead of rendering it opaque or not at all, it is better to render it with transparency proportional to the amount it is visible [23]. Another usage is when blending sprites, usually from particle effects, where it is possible to create smooth transitions between particles [3].

If the rendering order problem of transparent fragments can be solved, or at least reduced, it would increase the artists' possibility to add more transparent objects and effects to a scene [17]. In the current state with per object sorting, transparency is used as little as possible since any overlapping transparency creates visible artifacts [5].

The master thesis is carried out at the game company Fatshark AB1. The company focuses on multiplayer 3D games developed with the 3D engine Stingray2, which was a product of Fatshark until 2014 when it was acquired by Autodesk3 4.

1http://www.fatsharkgames.com/

Figure 1.1: Different types of effects which all need transparency for better visual quality. From left to right: a fire particle effect, some different objects made of glass, smoke and fog, and in the last image hair. Images 1 and 3 are taken in Stingray, image 2 is created by Gilles Tran and image 4 is taken in AMD TressFX Hair.

1.1 Problem statement

The nature of transparency, where multiple layers are needed, creates many problems when using rasterized 3D rendering. The wrong depth order of objects is maybe the most noticeable problem, see Figure 1.2, where objects closer to the camera are wrongly rendered behind objects further away from the camera. The default solution to the problem is to render the objects back to front by first sorting them, but sorting in 3D is not a trivial problem since objects can intersect, or a smaller object can be inside a larger one, like a glass ball inside a glass vase. In the latter case, rendering the vase before the ball makes it look like the ball is behind the vase rather than inside it, and rendering the ball first makes it look like it is in front of the vase [5].

At the same time transparency imposes a performance problem. Even without taking the sorting into account, transparency needs multiple layers and requires multiple shader invocations per pixel. These extra shader calls can require significant extra computation time and leave even less room for a better sorting of the transparent fragments. In some cases it is possible to use a lower resolution target for the transparency, where all the layers of transparency are aggregated into a single layer. The layer is then applied at full resolution to the output texture. This is done to reduce the number of calls per pixel and keep the rendering time for particles with many layers within reasonable levels [6].

2http://www.autodesk.com/products/stingray
3http://www.autodesk.com/
4http://news.autodesk.com/press-release/media-and-entertainment/autodesk-launches-stingray-game-engine-gdc-europe-2015

a: Correct sorting b: Bad sorting

Figure 1.2: The figures show a rendering of a trash can and a fire hydrant, where the fragment sorting is correct in the left image (a) and wrong in the right (b). The error makes it look like the trash can is in midair in front of the fire hydrant, while in fact it is standing on the ground behind the fire hydrant, as seen in image a. While bad sorting of opaque geometry is rare, this error is common for transparency. Opaque objects were chosen instead of transparent ones, since this makes the error easier to spot, especially in still images.

1.2 Scientific question

Per object sorting has been used in many 3D applications [10], and this thesis investigates the possibility of replacing per object sorting with a new robust algorithm for a real time 3D engine.

- Is there a better algorithm than per object sorting for ordering multiple layers of transparency in a real time 3D engine, with respect to visual quality and performance?

The two main parts of the question are the visual quality and the performance. The visual part can be split into two questions which need to be answered, giving information about the error within a single frame and about any problems which arise when the rendering changes a lot between two frames:

- How much does the rendering differ from a ground truth rendering on a per pixel level?

- Is the sorting stable over time, without large visual changes between two adjacent frames?

The performance part of the question can be specified to give a better measurement of the performance of transparency:

- How much GPU time is needed to sort the transparent fragments, relative to the number of transparent fragments in the frame?


1.3 Scope and limitations

There are many other problems with using transparency in 3D rendering besides the depth sorting. The problems arise from the need for multilayered information, compared to the single layer for opaque fragments. Post processing uses the aggregated information from previous geometry renderings to add effects such as depth of field, motion blur or volumetric rendering, but these would require information for each transparency layer, while it is only feasible to store one layer of information without using too much memory. Shadow maps also require multiple layers to correctly handle transparency, since the strength of a shadow depends on the materials the light needs to pass through. This thesis will not focus on these problems of transparency; each of these cases can often be handled by algorithms separate from the transparency sorting algorithm.

When this thesis started in September 2015, DirectX 12 was a new graphics API which could make new, better algorithms possible for handling sorting, but due to the limited support for the API in 3D engines, only algorithms available for DirectX 11 have been considered.

Due to time constraints, only two algorithms in addition to per object sorting will be implemented and tested, even though there are many different kinds of algorithms which all try to solve the sorting problem.

1.4 Methodology

To find candidate algorithms to be evaluated, two algorithms are selected from the state of the art to fit the scope and limitations of this master thesis; more about the state of the art algorithms can be found in the related work (section 2.3). One of the two algorithms should be able to give a correct result to be used as a ground truth for the visual evaluation, and the choice fell upon the per pixel linked list algorithm, which fulfils this requirement. For the second algorithm there are fewer limitations, and weighted blended order-independent transparency was picked. More about the choice of algorithms can be found in section 3.2.

In a visual evaluation, where the result sometimes can be subjective, it is important to keep the comparisons as objective as possible and to help the reader understand what the correct result is. Knowing what is correct is rarely obvious, especially when representing changes over time with still images. To overcome these problems, a ground truth image, which is a correct rendering, is always supplied in the visual comparisons, together with enlarged sections of interest to make it easier to spot the more fine grained errors. For the cases with a lot happening on screen, difference images between the ground truth and the evaluated algorithm's result are supplied. The results can be found in subsection 4.2.1.

The amount of transparency can vary a lot between different frames and setups; therefore a way to measure the performance under these conditions, and a good way to compare the different algorithms, is needed. To this end the performance evaluation measures the time spent in a frame for each algorithm with respect to the number of transparent fragments in the frame. Each algorithm runs, one at a time, with the same camera movement in a deterministic scene with a fixed time step, which makes it possible to create graphs for each algorithm which can be easily compared. The performance graphs can be found in the results in subsection 4.2.2.

1.5 Layout of thesis

The thesis is divided into five chapters, where this chapter has described the problems of transparency and the benefits of finding a replacement for per object sorting. The next chapter describes the needed background, with an overview of the physics of light, rendering and transparency, an overview of the software and hardware used, and ends with related work of interest. The most interesting algorithms of the related work are then described in Chapter 3, where the implemented algorithms and the implementation needed for the evaluation are described. Chapter 4 contains the evaluation of these algorithms, describing the different setups and the results from them, and Chapter 5, the last chapter, contains a discussion about the results, reflections about the work in this thesis and future work in the area of transparency sorting.


Chapter 2

Background

Chapter 2 explains the needed physics foundation, the hardware used for accelerating 3D rendering, an overview of the rendering pipeline and some of the software used for rendering. After the background comes a short description of related work with different transparency algorithms and a summary of how the algorithms work.

In real time 3D rendering, both knowledge about the physics of light, to be able to replicate how light works in the real world, and knowledge about hardware, to be able to perform the renderings in real time, is needed. In the following sections a short introduction is given to the physics of lighting, the different simplified models used and some examples of how different materials can be rendered, followed by an overview of the specialized hardware and software used for fast 3D rendering, where the hardware is called graphics cards and the software is called graphics drivers.

The last part of this chapter describes related work, where six sorting algorithms for transparent fragments are explained: both algorithms commonly used in real time 3D engines today and algorithms which may be used in coming engines. Three of the algorithms are evaluated in the thesis.

2.1 Physical foundation

In many 3D rendering applications the main objective is to replicate how light behaves in the real world, or at least make the rendering a believable substitute for reality. To achieve this, the models used are based on the physics of lighting, where the transport of light, including reflection, transmission, absorption and refraction, forms some of the main parts of the model. How these components of the lighting model work can be seen in Figure 2.2.

In real time 3D rendering, where rendering time is a limiting factor, a simplified model of light called ray optics is used, which is enough to give a good representation of how visible light works in the real world, but where all lighting phenomena cannot be replicated. A better model could include wave optics, electromagnetic optics or even quantum optics, but these would require additional computing time while only having a minor impact on the final result [7].

2.1.1 Physical notation

Radiant flux is the amount of radiant power, measured in watt (W) and denoted Φ.

Irradiance is the incoming amount of radiant power over a surface, measured in watt per square meter (W m−2):

$$E(x) = \frac{d\Phi(x)}{dA(x)}$$

Radiance is the amount of light that is received from a solid angle (d~ω) per projected area (dA⊥). This can be seen as the amount of light that is reflected, emitted or transmitted from a point on a surface and hits an observer, e.g. the eye. It is measured in watt per steradian per square meter (W sr−1 m−2):

$$L(x, \vec{\omega}) = \frac{d^2\Phi(x, \vec{\omega})}{d\vec{\omega}\, dA^{\perp}(x)}$$

Bidirectional reflectance distribution function (BRDF)

A material's BRDF describes how the incoming light from a light source is reflected off the surface in a direction. The BRDF gives a material properties like color and a glossy, mirror like or diffuse appearance. To be able to represent every kind of material, many types of BRDF are used, and examples of how materials with different BRDFs can look can be seen in Figure 2.1. The following paragraph gives the mathematical definition of the BRDF.

The bidirectional reflectance distribution function (BRDF), fr(x, ~ωi → ~ωo), takes a point x on a surface, the incoming direction vector ~ωi of light and the outgoing direction vector ~ωo of light, and returns the amount of light reflected in the outgoing direction. The incoming and outgoing vectors must be part of the same normal oriented hemisphere (Ω) [7].

Bidirectional transmittance distribution function (BTDF)

The bidirectional transmittance distribution function (BTDF) is closely related to the BRDF and takes the same arguments, ft(x, ~ωt → ~ωo), but describes the light passing through the material, like light transmitted through glass. This adds one requirement on one of the arguments: the outgoing direction vector ~ωo must be on the opposite normal oriented hemisphere from the incoming vector ~ωt [7].

The rendering equation

When rendering a pixel, or rather a point x on a surface, the light which radiates from that point in the direction of the eye or a camera is used as the color of the pixel. To calculate the amount of light radiating from x, an equation describing the different ways light can be emitted from the point x is needed; it is called the rendering equation. The rendering equation gives the total light going from a point on a surface into an observer, where L(x → ~ω) is the outgoing radiance from the point x in the direction ~ω, and L(x ← ~ω) is the incoming radiance to the point x from the direction ~ω.

Figure 2.1: Three important parameters of physically based rendering are albedo color, metallic and roughness, but many more are sometimes used to be able to represent even more materials [2, 9]. The figure shows a grey ball with varying metallic (0.0 to 1.0) and roughness (0.0 to 1.0) values. The metallic value increases how well the material reflects light while the roughness blurs the reflected light.

$$\underbrace{L(x \to \vec{\omega}_o)}_{\text{outgoing}} = \underbrace{L_e(x \to \vec{\omega}_o)}_{\text{emitted}} + \underbrace{L_r(x \to \vec{\omega}_o)}_{\text{reflected}} + \underbrace{L_t(x \to \vec{\omega}_o)}_{\text{transmitted}}$$

$$L_r(x \to \vec{\omega}_o) = \int_{\Omega} f_r(x, \vec{\omega}_i \to \vec{\omega}_o)\, L(x \leftarrow \vec{\omega}_i)\, (\vec{n} \bullet \vec{\omega}_i)\, d\vec{\omega}_i$$

$$L_t(x \to \vec{\omega}_o) = \int_{-\Omega} f_t(x, \vec{\omega}_t \to \vec{\omega}_o)\, L(x \leftarrow \vec{\omega}_t)\, (\vec{n} \bullet \vec{\omega}_t)\, d\vec{\omega}_t$$

The different components can be seen in Figure 2.2, where the reflected and transmitted components are given by the amount of incoming light from the respective hemisphere, the BRDF, the BTDF and the angle from the normal. The BRDF describes the reflective properties and the BTDF the transmissive properties of the material [7].


a: Emission b: Reflection c: Transmission

Figure 2.2: The lighting model for transparent objects. Emission is when the material creates light, like a flame. Reflection is light coming from other light sources which reflects in the direction of an observer. Transmission is light passing through a material, like glass, which is transmitted in the direction of an observer. The reflection is often split into two separate parts, the diffuse lighting and the specular lighting.

Transparency model

In real time rendering a simplified model is usually used, where the BTDF is set to only transmit light in the same direction as the incoming light. In reality the light also scatters in the transport medium, but this effect is mostly ignored unless it has a large impact on the result, as in fog like effects.

Opacity is used to describe the amount of light which is transmitted through a material, or how much the material partially covers the light. Opacity 1 means that the material is fully opaque while 0 means fully transparent, and opacity is often denoted alpha (α). To calculate the resulting color the over operator is used, where Cf is the resulting color, C0 is the background and C1 is the foreground [14]:

$$C_f = C_1 + (1 - \alpha_1)\,C_0 \qquad (2.1)$$

The over operator can be applied iteratively by using Cf as the new background in Equation 2.1 when there are many layers, which creates Equation 2.2:

$$C_f = \Big[C_n + (1-\alpha_n)\big[\cdots\big[C_2 + (1-\alpha_2)\big[C_1 + (1-\alpha_1)C_0\big]\big]\cdots\big]\Big] \qquad (2.2)$$

It is worth noticing that using an opacity value as a description of the occlusion is a simplification of reality, where multiple layers of cover can be aligned or not. E.g. if you have some sticks and want to cover the light from a lamp, you can either cover as much as possible by putting the sticks side by side, or you can put them in front of each other to cover as little as possible of the light. The same reasoning can be applied in the real world but at a smaller level, with particles or even atoms instead of sticks. This means that the opacity value assumes that "partially-covered locations are distributed in a statistically independent manner between surfaces" [18].
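As a concrete illustration (a sketch in plain Python rather than shader code, and not taken from the thesis), the over operator of Equation 2.1 can be folded over a back to front list of layers exactly as Equation 2.2 describes. Colors are assumed to be premultiplied by alpha, which matches the form Cf = C1 + (1 − α1)C0:

```python
def over(src, src_alpha, dst):
    # Equation 2.1: C_f = C_1 + (1 - alpha_1) * C_0, with premultiplied colors.
    return tuple(c1 + (1.0 - src_alpha) * c0 for c1, c0 in zip(src, dst))

def composite(layers, background):
    # Equation 2.2: apply the over operator from the back layer to the front,
    # using the intermediate result as the new background each time.
    result = background
    for color, alpha in layers:              # layers ordered back to front
        result = over(color, alpha, result)
    return result

# Two half-transparent layers over a black background.
layers = [((0.5, 0.0, 0.0), 0.5),            # back layer: premultiplied red
          ((0.0, 0.25, 0.0), 0.5)]           # front layer: premultiplied green
print(composite(layers, (0.0, 0.0, 0.0)))    # (0.25, 0.25, 0.0)
```

Reordering the two layers gives a different result, which is exactly why the fragments must be sorted before blending.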



Figure 2.3: The memory access overview for a shader. A shader can access constantbuffers, textures and buffers.

2.2 Hardware and software overview

For a computer program to be able to output the large amount of pixels required by 3D rendering to a screen many times per second, specialized hardware is almost always used. The hardware can either be a discrete or an integrated graphics processing unit (GPU). This specialized hardware is able to handle large amounts of floating point operations in parallel and can in many cases handle a thousand times more calculations compared to a CPU when the programs are optimized for the GPU.

When using a GPU there are two ways to interact: either using a compute shader or the rendering pipeline. A compute shader is a program that takes data in a binary buffer, does some calculations and then returns the result in a new buffer. The program is written in High Level Shading Language (HLSL) or OpenGL Shading Language (GLSL), two C like languages specialized for graphics programming, and is then compiled and uploaded to the hardware. When the rendering pipeline is used, a number of these programs are chained together, taking care of the different steps of the rendering, starting with the geometry and ending with outputting the color of the pixels to a texture, and also using some fixed function steps which transform the data between the shader steps, as can be seen in Figure 2.4.

2.2.1 Shader

To extract the maximum available performance from the graphics hardware when executing a shader, some knowledge of the hardware is needed. AMD and Nvidia use wavefronts and warps respectively; both are different names for the same mechanic. These are bundles of execution threads where a number of threads (depending on the hardware, 32 or more) are executed at the same time, all executing the same code. If any of the threads uses a different code path, due to branching, the other threads need to wait while this code is executed [19].
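A rough mental model of this lockstep execution (a simplified hypothetical sketch, not taken from the thesis and not how any vendor's hardware is literally implemented) is that a branch inside a warp makes every lane run both code paths, with a per-lane mask selecting which result is kept, so a divergent branch costs roughly the sum of both paths:

```python
def warp_execute(lanes, predicate, then_fn, else_fn):
    # Every lane evaluates the predicate, producing a per-lane mask.
    mask = [predicate(v) for v in lanes]
    # Both paths are executed by all lanes (this is the cost of divergence);
    # the mask decides which result each lane actually keeps.
    then_results = [then_fn(v) for v in lanes]
    else_results = [else_fn(v) for v in lanes]
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]

# Lanes with even values take the then-path, odd values the else-path.
print(warp_execute([1, 2, 3, 4], lambda v: v % 2 == 0,
                   lambda v: v * 10, lambda v: -v))  # [-1, 20, -3, 40]
```

If all lanes agree on the predicate, real hardware can skip the untaken path entirely, which is why branches that are uniform across a warp are cheap.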

The performance of the code depends on utilizing the hardware in a good way, and to improve it further there are three different metrics that decide the bottleneck of the performance: arithmetic logic unit (ALU) utilization, texture fetch latency and memory access latency. If the code is bound by any of these, trying to improve the others will not improve the overall performance of the code [24].

[Figure 2.4 diagram: Input-Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader (with a Stream Output branch) → Rasterizer → Pixel Shader → Output-Merger.]

Figure 2.4: The Direct3D 11 graphics pipeline. The squared boxes are fixed pipeline steps which are controlled by supplying hints to the driver while the boxes with rounded corners are fully programmable shaders.

2.2.2 Buffers

For a shader to retrieve data, different kinds of binary buffers are used to transfer the data. For graphics programming it often makes sense to see them as vector or scalar arrays, where each vector has 1 to 4 float components. The buffers can also be arrays of any primitive data type, or of structs of primitive data, where the arrays can have 1 to 3 dimensions.

To improve the performance of the buffers it is possible to bind them to a program in different ways, such as read only mode or random access mode, and it is also possible to tell the driver if the buffer will be accessed once or many times before it is updated on the CPU side.

2.2.3 Atomic counter

To allow for dynamically sized data structures it is possible to attach an atomic1 counter to a buffer. This makes it possible to get a unique index into the buffer which is not used by any other thread.
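The idea can be sketched on the CPU, with a Python lock standing in for the hardware atomic (the class name and structure are illustrative, not from the thesis or any graphics API): each thread atomically increments the counter and thereby claims a slot in the buffer that no other thread can get.

```python
import threading

class AppendBuffer:
    """Fixed-size buffer whose next free slot is handed out atomically."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self._counter = 0
        self._lock = threading.Lock()   # stands in for the hardware atomic

    def append(self, value):
        # The read and increment happen as one indivisible step, so every
        # thread receives a unique index even under concurrent appends.
        with self._lock:
            index = self._counter
            self._counter += 1
        self.data[index] = value
        return index

buf = AppendBuffer(capacity=8)
threads = [threading.Thread(target=buf.append, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(buf.data[:4]))  # [0, 1, 2, 3]
```

This is the same mechanism the per pixel linked list relies on: every incoming fragment atomically claims one slot in a large fragment buffer.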

2.2.4 Rendering pipeline

An overview of the rendering pipeline can be seen in Figure 2.4. The pipeline has three main objectives: to manipulate the geometry and its vertices, then to rasterize the triangles into pixels, and then to perform lighting calculations for those pixels and draw the result to a texture.

1Atomic means that methods affecting this counter either happen completely or not at all, even if the method does multiple operations. This also means the method is thread safe.

Input-Assembler Supplies triangles, lines and points to the Vertex Shader.

Vertex Shader A shader run for each vertex of the geometry. In this shader the3D to 2D projection and transform are applied to the vertices.

Hull Shader Creates control points for the tessellation stage.

Tessellator Tessellation subdivides triangles, lines or points to give a smoother surface or new geometry.

Domain Shader Calculates the resulting positions of the vertices created fromthe Tessellator.

Geometry Shader The geometry shader works on whole triangles, lines or pointsand can also use information of neighboring geometry. It is able to discard or outputnew geometry from this data.

Rasterizer Takes the geometry and creates a raster of pixels.

Pixel Shader A shader run for each pixel of the geometry, deciding the resulting color and/or other properties of the pixel.

Output-Merger Merges the result of the pixel shader with other information suchas depth, stencil and blending.

2.2.5 Rendering techniques

Forward rendering

In forward rendering, see Figure 2.5a, the geometry and its materials are rendered directly to the output texture with fully calculated lighting. In the case of overlapping geometry a depth buffer is used, keeping track of the closest rendered fragment for each pixel; fragments which would be behind the closest rendered fragment for a pixel are discarded, making sure only the closest geometry is rendered to the screen.

The benefits of this technique are that a different shader can be used for each object to calculate the lighting, and that it can handle multiple layers if the geometry can be drawn in the correct order. Problems with this technique are that it is slow to calculate the lighting multiple times per pixel and that the lighting calculation cannot use information from nearby pixels.
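The depth test described above can be sketched in plain Python (an illustrative model, not engine code; a smaller depth value is assumed to mean closer to the camera):

```python
def render_forward(fragments, width, height):
    """Keep only the closest opaque fragment per pixel via a depth buffer."""
    far = float("inf")
    depth = [[far] * width for _ in range(height)]    # depth buffer
    color = [[None] * width for _ in range(height)]   # output texture
    for x, y, z, c in fragments:                      # (x, y, depth, color)
        if z < depth[y][x]:       # depth test: a closer fragment wins
            depth[y][x] = z
            color[y][x] = c
        # fragments behind the stored depth are simply discarded
    return color

# Two fragments land on pixel (0, 0); the closer one (z = 0.2) survives,
# regardless of the order in which they are submitted.
frags = [(0, 0, 0.7, "red"), (0, 0, 0.2, "blue")]
print(render_forward(frags, 2, 2)[0][0])  # blue
```

Note that this discard-the-farther-fragment rule is exactly what transparency breaks: a transparent fragment behind another still contributes to the pixel, so a single depth value per pixel is no longer enough.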

Deferred rendering

To reduce the lighting cost compared to forward rendering, a new render technique was invented called deferred rendering, see Figure 2.5b. It stores the material properties of each pixel in a buffer and then calculates the lighting once the closest geometry is known. This mainly gives an advantage in speed but introduces a cost in memory usage to store all the material properties in a buffer [20].


CHAPTER 2. BACKGROUND

[Figure 2.5: two pipeline diagrams, a: Geometry → GBuffer → Lighting (deferred rendering) and b: Geometry → Lighting (forward rendering).]

Figure 2.5: Deferred and forward are two types of rendering used in rasterized rendering, where the difference is the extra gbuffer step done in deferred rendering. The gbuffer step is added to accumulate the properties which are needed to calculate the lighting, such as color, depth and normal for each pixel. Instead of calculating the lighting once per fragment it is done once per pixel, which can increase the performance in frames with overlapping geometry, compared to forward rendering.

2.2.6 Blending

In the last step of the rendering pipeline, the Output-Merger in Figure 2.4, the result of the pixel shader is drawn to the output 2D buffer, controlled by a blending operation. The blending is done using the red, green, blue and alpha (rgba) values of the texture and the rgba output from the pixel shader, combining them into a new color which is used as the resulting color of the pixel. The alpha channel is used to control the transparency of the pixel [21]: the alpha value tells how transparent the color is, where 0 is fully transparent and 1 is fully opaque. More about transparency can be found in Figure 2.6. The blending can be done in many different ways to suit different needs, see Figure 2.7.

Normal Transparency With normal transparency the alpha channel decides the amount of color taken from the source color and from the destination color; a value of 0.6 means 60% of the source color and 40% of the destination color, as can be seen in Equation 2.3, which is derived from Equation 2.1.

Additive Blending Same as the normal transparency but the source alpha does


Figure 2.6: The same dragon rendered with four different opacity (α) values, α = 0.00, 0.33, 0.66 and 1.00, where opacity is the amount of the material's color which is drawn and 1 − opacity is the amount of color shining through, e.g. α = 0 is completely transparent and α = 1 is opaque.

R1rgb = (Sr, Sg, Sb) · Sα + (Dr, Dg, Db) · (1 − Sα),    R1α = Sα + Dα · (1 − Sα)    (2.3)

R2rgb = (Sr, Sg, Sb) · Sα + (Dr, Dg, Db),    R2α = Sα + Dα · (1 − Sα)    (2.4)

R3rgb = (Sr, Sg, Sb) + (Dr, Dg, Db) · (1 − Sα),    R3α = Sα + Dα · (1 − Sα)    (2.5)

[Figure 2.7: blending examples with source colors S1 = (0, 1, 0, 0.8), S2 = (0.8, 0, 0, 0.8) and destination colors D1 = (0.6, 0, 0, 0.6), D2 = (0, 0.6, 0, 0.6), producing the results R1, R2 and R3.]

Figure 2.7: Three different blendings: Normal (R1), Additive (R2), Premultiplied (R3). Blending a source color (S) with a destination color (D) creates a resulting color (R). The result of R1 is the same as R3 in this example; the difference is that the input in R3 is premultiplied, which is not the case in R1.

not affect the amount of color taken from the destination, see Equation 2.4. The color of the source is simply added to the destination; this can result in color values greater than 1 and therefore the result does not follow the transparency model completely. In many cases this approach is still useful, since the draw order does not affect the final result, and it is used as a cheaper way of drawing particles.

Premultiplied Alpha Works in the same way as normal transparency but with one exception: the source alpha is applied beforehand and stored in the texture data, which means this operation is left out in the blending stage, see Equation 2.5. A texture with premultiplied alpha gives a better color average when the texture is downscaled, since the opacity is included in the interpolation.
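The three blend modes can be sketched directly in code. The following Python functions (function names and the example colors are illustrative; the thesis itself configures these modes as HLSL blend states, not Python) mirror Equations 2.3–2.5 and check the R1 = R3 relationship from Figure 2.7:

```python
def normal_blend(src, dst):
    """Equation 2.3: source weighted by its alpha, destination by (1 - source alpha)."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    return (sr * sa + dr * (1 - sa),
            sg * sa + dg * (1 - sa),
            sb * sa + db * (1 - sa),
            sa + da * (1 - sa))

def additive_blend(src, dst):
    """Equation 2.4: the source color is simply added, so values may exceed 1."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    return (sr * sa + dr, sg * sa + dg, sb * sa + db, sa + da * (1 - sa))

def premultiplied_blend(src, dst):
    """Equation 2.5: the source rgb is assumed to be pre-multiplied by its alpha."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    return (sr + dr * (1 - sa),
            sg + dg * (1 - sa),
            sb + db * (1 - sa),
            sa + da * (1 - sa))

# Normal blending of S over D equals premultiplied blending of the
# pre-multiplied source over D: the R1 = R3 relationship in Figure 2.7.
S, D = (0.0, 1.0, 0.0, 0.8), (0.6, 0.0, 0.0, 0.6)
S_pre = (S[0] * S[3], S[1] * S[3], S[2] * S[3], S[3])
assert normal_blend(S, D) == premultiplied_blend(S_pre, D)
```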


2.3 Related work

This section gives a short overview of the candidate algorithms and lists the different types of algorithms used for sorting of transparency; the following sections then give a more in-depth description with more technical information about the algorithms.

The most naive algorithm is per object sorting, where the objects to be rendered are simply sorted by their distance to the camera. This algorithm gives many sorting artifacts when the objects cannot be sorted correctly [5] but is still in use by many engines due to its speed and simplicity.

To improve over per object sorting, new types of algorithms have been suggested which are able to do a correct sorting of the transparency, of which depth peeling [5] was the first. Depth peeling does one rendering for each layer of transparency, in which the previous and next layers of transparency are culled, leaving only one layer at a time to be rendered. Depth peeling is a very slow algorithm [26] but is still used for its correctness, and many improved versions have been suggested: dual depth peeling [1], bucket sort depth peeling [13], multi-layer depth peeling via fragment sort [12] and more. Later came a different algorithm which could render the fragments in correct order while retaining good performance by only requiring a single draw of the geometry, called the per pixel linked list.

For real time 3D rendering, performance gives a hard limit and therefore many types of approximation heuristics have been tested. One type of approximation is the blended heuristics, which try to make the blending operation commutative, meaning the order the fragments are rendered in does not affect the result [18]. A couple of algorithms try this approach, among them the memory-hazard-aware K-buffer algorithm for order-independent transparency rendering [26] and weighted blended order-independent transparency [18]. A second type of approximation heuristics uses random samples per pixel to estimate the color contribution from transparency to a pixel; one of these algorithms is called stochastic transparency [3]. A common attribute of the approximation algorithms is that their time complexity is linear in the number of transparent fragments.

The last type of heuristics is the hybrids, combining multiple algorithms to give a correct sorting for the closest n fragments in each pixel and then using an approximation algorithm for the fragments behind those n closest fragments. This can give both good performance and good visual quality but requires newer graphics APIs. Two algorithms using this approach are hybrid transparency [16] and adaptive transparency [22].

As noted in the previous paragraphs there is a multitude of different algorithms, and the following sections describe some of them, chosen for being the most interesting or the most descriptive of the concepts of their type.


2.3.1 Per object sorting

Per object sorting is the coarsest type of transparency sorting and is done, as the name suggests, per object. It has been used extensively in real time engines since it is fast, uses little memory and fits the normal rendering path where one object at a time is rendered. The algorithm suffers from rendering artifacts where parts of objects or whole objects can be drawn in the wrong order [15].

1. Sort the objects by distance from the camera to the object’s center.

2. Draw the sorted objects one at a time, back to front.
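The two steps above amount to a single sort. A minimal Python sketch (the (name, center) tuples and function name are illustrative, not engine code):

```python
import math

def per_object_sort(objects, camera_pos):
    """Per object sorting: order renderables by distance from the camera to each
    object's center, farthest first, so they can be drawn back to front."""
    def dist(obj):
        _, (x, y, z) = obj
        cx, cy, cz = camera_pos
        return math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
    return sorted(objects, key=dist, reverse=True)  # farthest first

draw_order = per_object_sort([("smoke", (0, 0, 2)), ("glass", (0, 0, 5))], (0, 0, 0))
assert [name for name, _ in draw_order] == ["glass", "smoke"]
```

Note that the object center is a single representative point, which is exactly why large or interpenetrating objects can end up in the wrong order.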

2.3.2 Depth peeling

There are multiple suggested algorithms for correct rendering order of transparent objects, but these usually greatly increase the cost for each additional layer. One such algorithm is depth peeling [5, 11], where one layer is rendered at a time. In depth peeling two depth buffers are used, one to discard fragments already drawn in a previous layer and one to determine the closest next layer. The steps of the algorithm are:

1. Render the first layer of transparency by only using the fragments with the lowest depth (closest to the camera).

2. Same as in step 1, but using the depth buffer of step 1 to discard the fragments drawn in the previous iterations, by discarding fragments with a depth lower than or equal to the depth in the last depth buffer.

3. Repeat step 2 until all layers of transparency have been rendered.
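For a single pixel, the peeling loop can be sketched on the CPU as follows (the fragment tuples are illustrative; a real implementation runs this as repeated render passes with two depth buffers):

```python
def depth_peel(fragments):
    """CPU sketch of depth peeling for one pixel: each pass keeps only the
    closest fragment strictly behind the previously peeled depth, yielding
    one transparency layer per pass, front to back."""
    peeled = []
    last_depth = -float("inf")
    while True:
        # fragments at or in front of the last peeled depth are discarded
        candidates = [f for f in fragments if f[0] > last_depth]
        if not candidates:
            return peeled
        closest = min(candidates, key=lambda f: f[0])  # next layer
        peeled.append(closest)
        last_depth = closest[0]

layers = depth_peel([(0.7, "red"), (0.2, "green"), (0.5, "blue")])
assert layers == [(0.2, "green"), (0.5, "blue"), (0.7, "red")]
```

The cost of the real algorithm follows directly from the loop: one full geometry pass per transparency layer.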

2.3.3 Stochastic transparency

With an algorithm called stochastic transparency [3], it is suggested to use a Monte Carlo algorithm to improve rendering speed at the cost of correctness. It works by picking multiple random samples for each pixel to sample the transparency layers. Finally, the average color of all the samples is used as the color of the pixel. This technique requires many samples to give a good estimate of the final color, and the randomness introduces noise (which is possible to reduce to some degree). Stochastic transparency requires a lot of resources initially but has a linear cost in the number of transparent pixels and a fixed memory usage. The algorithm uses four steps:

1. Use a high resolution texture and depth buffer with n times the pixels. In the high resolution texture, each group of n pixels represents the samples for a single pixel in the normal texture.

2. Render the geometry and for each fragment drawn, draw a random pattern to the subpixels of that pixel, covering the same amount as the opacity, see Figure 2.8. A fragment with 20% opacity will draw to 20% of the subpixels of


Figure 2.8: An example of the drawing phase for stochastic transparency, where two triangles are drawn, one at a time, on a white background with 16 samples per pixel. The first image of a and b is the geometry, the second image of a and b is the samples, and c is the averaged final result. The red triangle has an opacity of 6/16 while the green has an opacity of 9/16, and they are drawn to 6 and 9 samples per pixel respectively. Only fragments closer to the camera are drawn in each step, so the green samples cover some of the red samples in b. Since the depth is used to cull the fragments, the order in which a and b are drawn does not affect the result.

the pixel and also draw to the high resolution depth buffer for those subpixels. For the following layers, use the depth buffer for the depth test. In a depth test, fragments further away from the camera than the previously closest drawn fragment for that pixel are discarded.

3. In a fullscreen pass: use the average of the colors of the subpixels as the final color of the pixel.

4. In a fullscreen pass: apply a noise reduction algorithm to remove artifacts created by the randomness.
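Steps 1–3 can be sketched per pixel on the CPU. In this Python sketch the integer fragment id stands in for the screen-space random pattern seed used on the GPU, and all names are illustrative:

```python
import random

def sample_mask(frag_id, alpha, n=16):
    """Alpha-proportional random subsample mask, seeded per fragment so the
    mask does not depend on draw order (frag_id is a stand-in for the
    per-primitive seed a GPU implementation would use)."""
    return set(random.Random(frag_id).sample(range(n), round(alpha * n)))

def stochastic_pixel(fragments, n=16):
    """CPU sketch of stochastic transparency for one pixel. Fragments are
    (frag_id, depth, rgb, alpha) tuples; each writes its mask with a
    per-subsample depth test, and the pixel color is the subsample average."""
    depth = [float("inf")] * n
    color = [(1.0, 1.0, 1.0)] * n  # white background
    for frag_id, d, c, a in fragments:
        for s in sample_mask(frag_id, a, n):
            if d < depth[s]:  # depth test per subsample
                depth[s], color[s] = d, c
    return tuple(sum(ch) / n for ch in zip(*color))

red = (0, 0.5, (1.0, 0.0, 0.0), 6 / 16)
green = (1, 0.3, (0.0, 1.0, 0.0), 9 / 16)
# as in Figure 2.8, draw order does not change the result
assert stochastic_pixel([red, green]) == stochastic_pixel([green, red])
```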

2.3.4 Weighted blended order-independent transparency

The weighted blended order-independent transparency approximates the transparency by using a function for the transparency that commutes, i.e. the order in which the transparent fragments are drawn does not matter. Instead, a weighting function is used for each fragment to estimate its contribution to the total transparency of the pixel [18]. The algorithm works as follows:

1. Prepass to calculate the depth of the transparent objects. (Optional)

2. Draw all the transparent objects to two fullscreen buffers, the first using additive blending to store the color and the second storing the total occlusion of the transparency as a float in [0, 1].

3. In a fullscreen pass: apply the transparency color from the first buffer, using the occlusion from the second buffer to normalize the color.


The algorithm requires only a small amount of additional code to implement but can require a lot of tweaking to make it look good. The algorithm is chosen as one of the algorithms to be evaluated; more implementation details can be found in section 3.3.

2.3.5 Per pixel linked list

With new hardware and driver support a new algorithm became possible: the per pixel linked list (PPLL), requiring only one draw pass of the geometry for correct transparency. Only using one draw pass can greatly increase the speed, since the vertex and geometry shaders only have to be run once. To be able to do it in one pass without using a fixed amount of memory per pixel, fast thread synchronization is used in the form of an atomic buffer [25]. The per pixel linked list is the second algorithm to be evaluated and a more detailed description of the algorithm can be found in the implementation chapter in section 3.4.

2.3.6 Hybrid transparency

As a further improvement of the per pixel linked list, a hybrid algorithm between the per pixel linked list and weighted blended order-independent transparency was suggested by Maule et al. [16]. In DirectX 12 a new feature was introduced, adding synchronization per pixel, which makes it possible to access the previous elements in the pixel's list in a synchronized manner. This is used by hybrid transparency to keep a sorted list of fixed length with the closest n fragments of transparency; any fragment further away is merged into a special bucket using a similar technique as weighted blended order-independent transparency. This results in n layers of correct transparency and an estimate of the layers behind.

2.4 Summary of the background

To be able to understand the reasoning and the choices made in this thesis, it is important to have some knowledge of the area of real time 3D rendering, even when the problem to be solved, sorting transparent fragments, may at a first glance seem to only require algorithmic knowledge. In reality, visual quality is also required for a good sorting algorithm in 3D rendering. To give this understanding, this chapter has given an introduction to 3D rendering with the physics of light, how light behaves when interacting with surfaces and materials' BRDF/BTDF, and how the final lighting of a pixel is calculated using the rendering equation. Since the area of interest is transparency, additional focus is dedicated to this area by describing how the blending of colors works and the transparency models used in 3D rendering.

Another important part is the speed and execution of the algorithms. The platform is unlike a normal program run on a PC, using specialized hardware in the form of a graphics card and relying heavily on parallelism to be able to output


the large number of pixels needed. To understand how improvements can be made to these algorithms, knowledge about the hardware and software is needed. To this end, an overview is given of the hardware and the rendering pipeline used by the graphics driver, and of how a program is run on the graphics card as a shader. More specific details are described of the features and limitations, such as the buffers, counters, the graphics driver and the API, which are used in the algorithms to be evaluated.

There are many different types of sorting algorithms for transparency, and in the last section of the background the different algorithms are explained. The algorithms use sorting per object, sorting per pixel, approximations for sorting, no sorting at all or hybrids, all with the goal of good rendering quality and/or speed. In the next chapter the implementations of two of these algorithms are described, and how they are integrated into the game engine Stingray.


Chapter 3

Implementation

Chapter 3 gives an overview of Stingray and how the two algorithms, weighted blended order-independent transparency and per pixel linked list, are implemented in Stingray, and why these algorithms were chosen for evaluation. In addition to the algorithms, implementation details of the visualization tools used in the evaluation are given.

When implementing graphics shaders it is important to have a good framework and platform to evaluate the different algorithms on. There is no default choice, but for more advanced setups a 3D engine is needed which can handle the different resources used in modern 3D scenes, such as models, lighting, materials, cameras and more. For this thesis, Stingray was chosen to fulfil these requirements and make it possible to focus on the algorithms and the evaluation rather than creating tools which already exist.

The chosen algorithms, weighted blended order-independent transparency and the per pixel linked list, test two different types of transparency algorithms. The per pixel linked list gives a correct rendering, which makes it easier to do a visual comparison, and the weighted blended order-independent transparency with its simplicity is a good candidate to test.

In addition to the algorithms, some extra resources used in the evaluation are described. These help with visualizing the transparency layers and counting the transparent fragments in a frame.

A more thorough explanation of the implementation of the algorithms, why they were chosen and the tools used follows in the next sections.

3.1 Stingray

To reduce the amount of code to implement, a 3D engine called Stingray was used. The use of an engine gives many benefits and reduces the amount of implementation needed to be able to test new algorithms. Some of the areas where Stingray is a useful tool are: setting up scenes, importing models, setting materials, interfaces


Figure 3.1: The Stingray editor with one of the test scenes and four debug views (at the bottom of the main view), showing the three algorithms and the layers. In the image the following elements can be seen, left to right: the level viewport, the flow scripting editor, and the entity and property editor.

and an API for scripting, setting up shader resources such as buffers, render targets and textures, setting up cameras and more. There are many different engines capable of doing these things, but Stingray was chosen due to being used at Fatshark, which made it possible to use their knowledge of the engine.

Stingray contains many good tools, some of which are extra useful when evaluating new algorithms: a profiler which is able to measure the performance of selected shaders, and a system to switch between different shader setups. These switches are important to make sure an algorithm is turned completely on or off, so that all the resources used by it and the compiled code are optimized for the current algorithm under test.

3.2 Choice of algorithms

In the evaluation, three algorithms were used, and per object sorting had a natural place among these: it both sets the lower bar to be improved from and is already available in Stingray. The other two algorithms needed more consideration. With the limitation of using DirectX 11, the most modern algorithms, which use DirectX 12, are out of reach. Algorithms such as adaptive transparency and hybrid transparency make use of the per pixel synchronization introduced in DirectX 12. These algorithms are in theory possible to implement in DirectX 11, but with a major performance penalty.


[Figure 3.2: diagram of the Stingray rendering pipeline, with blocks for shadow maps, motion vectors, reflection probes, the geometry pass (g-buffers, decals), lighting, emissive, skydome, distortion, transparency, and a fullscreen post processing pass (bloom, depth of field, motion blur, temporal anti aliasing, anti aliasing, color grading, apply transparency).]

Figure 3.2: The rendering pipeline in Stingray. This figure shows the order of execution of the shaders and resource generation. The order of these passes also gives a hint of the dependencies between them. The transparency pass is usually done after all other geometry passes to make use of the z-buffer from the opaque geometry for depth culling, removing the need for an alpha channel on the render target. As can be seen, the post processing effects are still dependent on the transparency block.

One important aspect to consider is also to have at least one algorithm to use as the ground truth for the visual evaluation. For the ground truth the per pixel linked list was chosen, since it is a lot faster than the depth peeling algorithm and could be fast enough to use in real time rendering.

The weighted blended order-independent transparency was chosen for its simplicity. It did not require additional changes to the engine to work and could be used to learn and test the engine. From the simplicity comes its speed, which makes it a worthy candidate, and speed is really one of the most important aspects when picking an algorithm for transparency sorting. Transparency can already be an expensive effect when there are many layers in a rendering, and using an algorithm which can compare with per object sorting in performance could be a good bet, if it is possible to get good visual quality.

The two algorithms chosen are also a good start for future work with DirectX 12 algorithms, where the per pixel linked list and weighted blended order-independent transparency could be combined into hybrid transparency.

3.3 Weighted blended order-independent transparency

Weighted blended order-independent transparency (WBOIT) was implemented using a reference solution by McGuire and Bavoil, and an overview of the algorithm can be found in subsection 2.3.4. The main idea of the algorithm is to create an


[Figure 3.3: plots of the four weight functions, with weight (0–3500) against z-depth 0–50 m for a, b and c, and against view depth 0.992–1 for d.]

a = wz(z) = 100/((z/5)^2 + (z/200)^6)
b = wz(z) = 100/((z/10)^3 + (z/200)^6)
c = wz(z) = 100/((z/200)^4)
d = wz(zv) = 0.3/(10^10 · (1 − zv)^3)

Figure 3.3: The four different weight functions used, clamped between 0.01 and 3000. In a, b and c the depth is linear and uses the camera's z-depth (wz(z)), while the d function uses the normalized view depth zv in [0, 1] (wz(zv)). The total weight is w(z, α) = max(0.01, min(3000, wz(z))) + wα(α), where wα(α) = (α + 0.01)^4.
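As a sketch, weight function (a) combined with the alpha term can be written out directly; the clamp range 0.01–3000 and the constants are the thesis's, while treating the combination as a sum follows the caption as printed (McGuire and Bavoil's paper should be consulted for the authoritative form):

```python
def wboit_weight(z, alpha):
    """Weight function (a) from Figure 3.3, clamped to [0.01, 3000],
    plus the alpha term w_alpha(alpha) = (alpha + 0.01)^4."""
    wz = 100.0 / ((z / 5.0) ** 2 + (z / 200.0) ** 6)
    wz = max(0.01, min(3000.0, wz))
    return wz + (alpha + 0.01) ** 4

# fragments close to the camera receive a much larger weight
assert wboit_weight(1.0, 0.5) > wboit_weight(40.0, 0.5)
```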

approximation for Equation 2.2 in the form of a function which is commutative, which means the order of the operands can be changed without changing the result of the calculation. There is already one part of the calculation which is commutative, the alpha part, where the following equation is applied recursively [18].

αf = α1 + (1− α1) ∗ α0 (3.1)

To make it easier to see that the function is commutative we use the inverse opacityα′ = 1− α.

α′f = α′n ∗ α′n−1 ∗ . . . ∗ α′1 ∗ α′0 (3.2)

As can be seen in Equation 3.2, all the operators are commutative and this value is the correct occlusion. The same cannot be done with the rgb part; instead a weighting function (Figure 3.3) is used to estimate the color contribution of each fragment. It uses information about the distance from the camera to the fragment to give fragments closer to the camera a higher contribution than fragments further from the camera. The weight function is applied to all four components. The alpha component is used to gather how much total weight has been applied, which makes it possible to normalize the values after all the fragments are rendered. This is done by storing the correct αf using Equation 3.2 and the weighted alpha Cα, and then


[Figure 3.4: the three passes and their buffers: depth prepass (opaque depth → transparency depth), render transparency (opacity, transparency color), apply transparency (opaque color → final result).]

Figure 3.4: The three passes of the weighted blended order-independent transparency algorithm, where the first pass is optional and, if used, stores the depth of the closest layer of transparency, which is then used in the next pass. In the render transparency pass the transparent geometry is rendered to two buffers, opacity and transparency color, and these are then applied to the opaque color in the last pass: apply transparency.

normalizing the weighted color Cw:

Cf = (Cw / Cα) ∗ αf (3.3)

There is a variation of the algorithm using an extra prepass to find the distance to the closest transparent fragment for each pixel, giving these fragments additional weight in the weight function. To find the minimum distance the same shaders as for a normal transparency pass are used, but with an empty pixel shader, and the depth is stored in a z-buffer where only values lower than the previous value are stored. To reduce the writes even further, the initial values for the z-buffer are copied from the z-buffer for the opaque objects, since these will always be behind or occlude the transparent fragments.

The prepass (if used) is then followed by a normal transparency rendering, with some changes to the output in the fragment shader. The weight function, see Figure 3.3, is applied to the rgba values before drawing them with additive blending, and an extra opacity buffer is used as a render target to which the unweighted alpha (αf) is drawn.

In a final fullscreen pass the normalized color (Cf) is drawn on top of the opaque layer, using information from the transparency color buffer and the opacity buffer; example buffers can be seen in Figure 3.4.
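The passes can be condensed into a per-pixel CPU sketch. The accumulation mirrors the render transparency pass, and the resolve divides the accumulated weighted color by the accumulated weighted alpha and scales by the true opacity αf, which is our reading of the normalization described above (names and fragment tuples are illustrative, not engine code):

```python
def wboit_composite(fragments, background, weight):
    """CPU sketch of WBOIT for one pixel. Fragments are (z, rgb, alpha)
    tuples; `weight` is a w(z, alpha) function such as those in Figure 3.3."""
    Cw = [0.0, 0.0, 0.0]   # weighted premultiplied color (additive blending)
    Ca = 0.0               # weighted alpha accumulation
    revealage = 1.0        # product of (1 - alpha), so alpha_f = 1 - revealage
    for z, (r, g, b), a in fragments:
        w = weight(z, a)
        Cw[0] += r * a * w; Cw[1] += g * a * w; Cw[2] += b * a * w
        Ca += a * w
        revealage *= 1.0 - a   # Equation 3.2: commutative alpha product
    alpha_f = 1.0 - revealage
    if Ca == 0.0:
        return background
    # resolve: normalize the weighted color, then composite over the opaque layer
    return tuple((cw / Ca) * alpha_f + bg * (1.0 - alpha_f)
                 for cw, bg in zip(Cw, background))

# every operation above is commutative, so draw order does not matter
frags = [(1.0, (1.0, 0.0, 0.0), 0.5), (4.0, (0.0, 1.0, 0.0), 0.5)]
w = lambda z, a: max(0.01, min(3000.0, 100.0 / ((z / 5.0) ** 2 + (z / 200.0) ** 6)))
assert wboit_composite(frags, (0.0, 0.0, 0.0), w) == \
       wboit_composite(frags[::-1], (0.0, 0.0, 0.0), w)
```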

The extra memory needed is 4 bytes for an opacity buffer and 8 bytes for an extra rgba color buffer, in total 12 bytes/pixel, plus 4 bytes for the depth buffer if the optional prepass is used. E.g. a display with a resolution of 1920x1080 uses an extra 24.88 MB of memory compared to the per object sorting algorithm.


Listing 3.1: Per Pixel Linked List Struct

struct PPLL_STRUCT
{
    float4 color;
    float depth;
    uint uNext;
};

[Figure 3.5: the head buffer (2D, initialized to -1) and the data buffer (1D) before (counter: 5, elements 0–4 at depth 0.32 with next = -1) and after (counter: 8, elements 5–7 at depth 0.12 with next pointers 1, -1 and 2) a green triangle is drawn.]

Figure 3.5: The construction of a per pixel linked list. The head buffer has, for each pixel on the screen, a pointer to the head of the linked list in the data buffer. The data buffer has an array of elements where each element has a pointer to the next element in the linked list, or a null pointer (-1 in the figure) if it is the last element. The image shows the state of the buffers before and after a green triangle is drawn, and the arrows show the pointers of the linked list of the last drawn pixel (from the head buffer (3, 3) to element 7 in the data buffer to element 2).

3.4 Per pixel linked list

To implement the per pixel linked list, an article by Yang et al. [25] was used as reference and then adapted to be suitable for Stingray. The algorithm does a correct sorting of all the transparent fragments by first rendering all the transparent geometry and storing the transparent fragments in a buffer. In a second step the algorithm sorts the fragments per pixel, calculates the final color and draws it on top of the opaque layer. To be able to perform this in real time and utilize the parallelism of graphics cards, a single synchronization point is used: an atomic counter. The counter gives a unique location in the buffer where the fragment can be stored, and the fragments are stored in the form of a linked list, as can be seen in Figure 3.5.

The algorithm uses read-write (RW) buffers, one 2D buffer for the head pointers


a: Normal blending

separate_blend = "true"
blend_op = "blend_op_add"
src_blend = "blend_src_alpha"
dst_blend = "blend_inv_src_alpha"
alpha_blend_op = "blend_op_add"
alpha_src_blend = "blend_one"
alpha_dst_blend = "blend_inv_src_alpha"

b: Additive blending

separate_blend = "true"
blend_op = "blend_op_add"
src_blend = "blend_src_alpha"
dst_blend = "blend_one"
alpha_blend_op = "blend_op_add"
alpha_src_blend = "blend_zero"
alpha_dst_blend = "blend_one"

Figure 3.6: The blending settings for normal blending (a) and additive blending (b) which are converted to premultiplied blending.

and one 1D RW buffer for the data. The data buffer uses an additional feature, an atomic counter. The counter is set to 0 at the start of each frame and is used to give each fragment a unique location in the data buffer. To get the new index the atomic function IncrementCounter()1 is used, which increments the counter and returns the new value of the counter. To store the index in the head pointer buffer the atomic function InterlockedExchange()2 is used, which sets the value and returns the old one, so it can be used in the linked list; see Figure 3.5 and Listing 3.1 for the data layout.

The atomic operations and RW buffers required by the algorithm were introduced in HLSL 5.0, for which there was no built-in support in the engine. To solve this, the shader compiler was forced to always compile for HLSL 5.0 and the RW buffers were added to the resource manager in Stingray. A clear flag was also added to be able to clear the counter (set it to 0) at the start of each frame.

To be able to use old scenes without changing resources, support for normal blending, additive blending and premultiplied blending was added by changing the shaders to always output premultiplied rgba (Figure 3.6) into the per pixel linked list (PPLL).

In the final stage of the algorithm, the data in the PPLL is retrieved by following the pointers and then stored in a thread local array for faster access. The data is then sorted using insertion sort on the depth and finally blended and output to the image.
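The build and resolve passes can be sketched on the CPU for one pixel. In this Python sketch a dictionary and a list stand in for the RW head and data buffers, and the list length plays the role of the atomic counter (names are illustrative, not engine code):

```python
def ppll_insert(head, data, pixel, depth, color):
    """Build pass for one fragment: append to the data buffer (the list index
    acts as IncrementCounter()) and swap the head pointer, as
    InterlockedExchange does on the GPU."""
    index = len(data)
    data.append({"color": color, "depth": depth, "next": head.get(pixel, -1)})
    head[pixel] = index

def ppll_resolve(head, data, pixel, background):
    """Resolve pass: walk the pixel's list, sort by depth, then blend back to
    front with premultiplied alpha over the opaque background color."""
    frags, i = [], head.get(pixel, -1)
    while i != -1:                       # follow the linked list
        frags.append(data[i])
        i = data[i]["next"]
    frags.sort(key=lambda f: f["depth"], reverse=True)  # farthest first
    out = background
    for f in frags:
        r, g, b, a = f["color"]          # premultiplied rgba
        out = tuple(c + o * (1.0 - a) for c, o in zip((r, g, b), out))
    return out

head, data = {}, []
ppll_insert(head, data, (3, 3), 0.32, (0.5, 0.0, 0.0, 0.5))  # red, behind
ppll_insert(head, data, (3, 3), 0.12, (0.0, 0.5, 0.0, 0.5))  # green, in front
assert ppll_resolve(head, data, (3, 3), (0.0, 0.0, 0.0)) == (0.25, 0.5, 0.0)
```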

The memory used by the algorithm is a 4-byte head link buffer and 24n bytes for the data buffer, where n is the maximum average number of transparency layers per pixel to store, for a total of 4 + 24n bytes/pixel. E.g. a display with the resolution 1920x1080 and 4 layers of transparency would use an extra 207.36 MB of graphics memory compared to per object sorting.

1https://msdn.microsoft.com/en-us/library/windows/desktop/ff471497.aspx (2016-02-17)

2https://msdn.microsoft.com/en-us/library/windows/desktop/ff471411.aspx (2016-02-17)


3.5 Visualization and evaluation

When implementing graphics algorithms, an important step to debug and gather information is to create good visualizations of the results and partial results of the algorithm. It can be cumbersome to debug faulty implementations due to the sheer number of calculations done in parallel, and it is only possible to extract binary data or textures from the GPU. To make it easy to spot errors in the fragment shaders, a common strategy is to draw distinct colors to the output buffer when something goes wrong in a pixel shader. E.g. this can result in images where some pixels are red, and these red pixels denote that there is an error in the shader for that pixel.

To help visualize and count the transparent fragments, two shaders were implemented, which are described in the following subsections.

3.5.1 Fragment counting

Since the number of transparent fragments in a frame impacts the rendering speed, a good method for capturing this number is needed. A simple solution is to use the index counter in the per pixel linked list algorithm. When the frame is completed, the index counter has the same value as the number of drawn pixels in the frame.

It may seem like a trivial problem to write the fragment count to a file, but extracting any kind of data from the GPU requires a lot of extra work. In this case it would require these extra steps: an extra buffer in which the index counter could be stored, since the counter only exists as an extra value on a buffer which cannot be read or written directly from the CPU; the new buffer must be set up as a CPU-readable buffer; then the buffer must be mapped to a part of main memory so it can be read. This mechanism must be added to Stingray so there is a way to trigger it at the correct time.

The described solution requires too many implementation parts in the engine part of Stingray. Instead a solution using only shaders was developed, where a larger buffer is created with space to store a uint for each frame in the recording. During recording, each frame's transparency count is stored in the buffer: the first frame's transparent count is stored at index 1, the second frame's at index 2, and so on. When the full recording is done, the values stored in the buffer can be read using a debug program like RenderDoc3.
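The shader-only recording can be mimicked on the CPU in a few lines; the buffer size and function names below are illustrative, not Stingray's actual shader code:

```python
# CPU-side sketch of the shader-only fragment-count recording.
# RECORDING_FRAMES and record_frame are illustrative names, not Stingray's API.

RECORDING_FRAMES = 1024  # capacity of the recording buffer, one uint per frame

recording_buffer = [0] * RECORDING_FRAMES

def record_frame(frame_index, index_counter):
    """Store one frame's transparent-fragment count in the frame's slot.

    index_counter stands in for the per pixel linked list's atomic index
    counter, which at end of frame equals the number of transparent
    fragments drawn that frame.
    """
    if frame_index < RECORDING_FRAMES:
        recording_buffer[frame_index] = index_counter

# Example: three frames with varying transparency load.
for frame, count in enumerate([12000, 15500, 9800]):
    record_frame(frame, count)

print(recording_buffer[:3])  # the values a tool like RenderDoc would read back
```

In the real implementation both the write and the counter live on the GPU, so no CPU readback is needed until the recording is inspected in RenderDoc.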

3.5.2 Fragment layers

To better visualize the number of transparent fragments for each pixel, a heat map is used. To get the number of fragments per pixel, the per pixel linked list's

3RenderDoc is a graphics debug program which can record all the graphic commands andbuffers sent to the graphics card and then at a later time replay the same steps. The replay can bepaused at any time and RenderDoc extracts the rendering results and values used for each step.RenderDoc can be found at: https://renderdoc.org/builds


linked lists can be used. For each pixel, the length of the linked list is the number of transparent fragments for that pixel. The range is then mapped to a color and drawn to the screen; an example of this can be seen in the bottom row of Figure 4.1. For the heat maps used in the results, the value was instead converted into the range 0–1 and drawn in grey scale, so that it could be stored as a texture for extraction. The value is later mapped to a color using a gradient map; an example of this can be seen in Figure 4.2d.
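As a sketch, the grey-scale storage and the later gradient mapping might look like the following; the gradient stops are illustrative, not the ones used for the figures:

```python
# Sketch of the layer heat map: per-pixel linked-list lengths are normalized
# to 0-1 grey for storage, then mapped to a color with a gradient map.
# The gradient stops below are illustrative choices.

def normalize(layer_count, max_layers):
    """Map a fragment count to the 0-1 range for grey-scale storage."""
    return min(layer_count / max_layers, 1.0)

def gradient_map(value):
    """Map a 0-1 value to an RGB color: black -> red -> yellow -> white."""
    stops = [(0.0, (0, 0, 0)), (0.33, (255, 0, 0)),
             (0.66, (255, 255, 0)), (1.0, (255, 255, 255))]
    for (t0, c0), (t1, c1) in zip(stops, stops[1:]):
        if value <= t1:
            f = (value - t0) / (t1 - t0)
            return tuple(round(a + f * (b - a)) for a, b in zip(c0, c1))
    return stops[-1][1]

# A pixel with no transparency stays black; a fully loaded pixel is white.
print(gradient_map(normalize(0, 8)))  # (0, 0, 0)
print(gradient_map(normalize(8, 8)))  # (255, 255, 255)
```

Storing the normalized grey value as a texture keeps the extraction path simple; the colorful gradient is applied only when the figure is produced.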

3.5.3 The overall work

Overall for the thesis, two new algorithms were implemented: the weighted blended order independent transparency and the per pixel linked list. Both are interesting candidates as replacements for per object sorting, and the second of the two also gives a correct rendering which can be used for comparison. In addition to the algorithms, two new tools were developed which enable evaluation and visualization of the algorithms, in the form of fragment counting and a layer heat map.

These implementations make it possible to compare the new algorithms to the per object sorted one. More about the evaluation, together with the set up of the testing scenes, can be found in the next chapter.


Chapter 4

Evaluation

Chapter 4 shows the results of the thesis in the form of performance graphs and visual comparisons between the per pixel linked list, weighted blended order-independent transparency and per object sorting.

When evaluating a sorting algorithm for transparent fragments in real time applications, run time performance and visual quality are the two most important aspects to measure. This chapter focuses on these two evaluations, where the visual comparisons are done using ground truth images rendered by the per pixel linked list, as described in section 3.4 of the previous chapter.

The ground truth does not always render a correct image in the sense of following the physical model, but it gives a good estimate and it captures the most important aspect for the comparison: it always gives a correct sorting of the transparent fragments in a pixel. For the best looking render, rather than a ground truth for sorting evaluations, using a pixel as the smallest unit does not give high enough resolution to capture all the details. Moreover, when rendering transparency a simplified model is used which assumes that the particles covering the light are uniformly distributed within the pixel, which in reality rarely is the case and which can lead to alignment errors [18].
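What "a correct sorting" means for the final color can be illustrated with the over operator [21] applied back to front; the single-channel colors below are a simplification:

```python
# Minimal sketch of why sorting matters: the same fragments composited with
# the over operator give different colors depending on the blend order.

def composite_back_to_front(fragments, background):
    """fragments: list of (depth, color, alpha); colors are single channels
    here for brevity. Sorting by depth (far to near) before blending is what
    the per pixel linked list guarantees and per object sorting can get wrong."""
    result = background
    for _, color, alpha in sorted(fragments, key=lambda f: -f[0]):
        result = alpha * color + (1.0 - alpha) * result  # the over operator
    return result

frags = [(2.0, 1.0, 0.5), (5.0, 0.0, 0.5)]  # near white, far black
correct = composite_back_to_front(frags, background=0.5)
wrong = 0.5 * 0.0 + 0.5 * (0.5 * 1.0 + 0.5 * 0.5)  # same fragments, reversed
print(correct, wrong)  # 0.625 vs 0.375: the sorting error changes the color
```

Since the over operator is not commutative, any algorithm that blends fragments in the wrong order produces a visibly different color, which is exactly the artifact the ground truth avoids.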

For the speed test, the comparison plots performance against the number of transparent fragments, to better see how well the algorithms handle varying transparency usage in a scene during a camera movement.

4.1 Evaluation set up and details

The scene selected for the benchmark, called Sponza1, is chosen to be used as a background environment for the transparent models and effects. The scene is commonly used in computer graphics research because it has enough complexity,

1The Sponza scene and Stanford dragon can be found at: http://graphics.cs.williams.edu/data/meshes.xml


its lighting is good for evaluating indirect lighting algorithms, and it is available under a free-to-use license.

To get a variation of test models, primitive geometry such as spheres and cubes is used, together with one more advanced model, the Stanford dragon, which contains a lot of detailed geometry, and a smoke particle effect from one of Stingray's test scenes. The models are chosen to be commonly available and used in computer graphics research. The same cannot be said about the particle effect; most game engines use their own particle system, so one available for Stingray which has a lot of transparency layers is used.

All the performance evaluations are performed on a computer with an i7-4790K, 16 GB RAM and an Nvidia GTX 970, at 1920x1080 resolution. The profiling is done using Stingray's profiler, with the frame rate locked at 30 fps.

The visual comparison uses the same computer and resolution, with the following post effects enabled: bloom, screen space reflection and color grading. To enhance the visual comparison, the brightness levels of the images are edited afterwards in an image editor, by increasing the levels of all images in the same comparison by the same amount.

The evaluations are done using different scene setups to measure the different cases: the performance relative to the number of transparent fragments, the visual quality compared to the ground truth, and the performance of the per pixel linked list when changing the number of transparent fragments per pixel.

4.1.1 Renderings

Four cases are used to capture situations where the algorithms show different results, and one additional case shows a more general scenario.

Of the four specialized cases, the first is used to show the sorting error of per object sorting and how well the other two algorithms handle the situation. To show this error, three Stanford dragons are placed next to each other and the camera is directed at a precise angle where the error is most noticeable. In complex models such as the Stanford dragon, the sorting is also wrong per triangle inside the model itself, but this kind of error is harder to see. The results can be seen in Figures 4.2 and 4.1.

The same set up with the three dragons, viewed from a different angle, is also used to compare different weighting methods of the weighted blended order-independent transparency; it can be seen in Figure 4.5. To show the error when transparent objects are far away and rendered with weighted blended order-independent transparency, a setup with a particle system and a sphere at two distances is used; it can be seen in Figure 4.3.

For the general case, a scene is set up with lots of different transparency, both geometry and a particle effect, to show how well the different algorithms handle heavy transparency usage. The results can be seen in Figure 4.4.


4.1.2 Performance

The main objective of the performance set up is to measure the performance relative to the number of transparent fragments in the scene. To get a good overview of different cases, a longer camera sequence is used where different amounts of transparency are visible in a single frame. During this sequence both simple geometry and heavier effects which make use of transparency are visible. Each algorithm is run one at a time and the results are stored to file.

To make sure that every recording takes exactly the same path, so that the corresponding frames of the algorithms match, multiple actions are taken. The particle effects are deterministic, to guarantee the same number of particles in each run, and the recording is triggered automatically to make sure the same frames are captured. The scene time is set to move forward by a fixed amount each frame, so no matter the rendering time of a frame, the scene time always advances by 33 ms (1/30 second) per frame.
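The fixed-step scene clock can be sketched as follows; the function name is illustrative:

```python
# Sketch of the fixed-step scene clock used for deterministic recordings:
# scene time advances by exactly 1/30 s per rendered frame, regardless of
# how long the frame actually took to render.

FIXED_DT = 1.0 / 30.0  # 33 ms per frame, matching the 30 fps lock

def scene_time_at(frame_index):
    """Scene time is a pure function of the frame index, so frame N shows
    the same simulation state in every run of every algorithm."""
    return frame_index * FIXED_DT

# Frame 30 always corresponds to one second of scene time.
print(scene_time_at(30))
```

Because scene time depends only on the frame index, a slow algorithm cannot drift out of sync with a fast one: frame N is identical across runs.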

To capture the number of transparent fragments in the frames, the technique described in subsection 3.5.1 is used in a separate run, so as not to affect the performance.

The performance results can be seen in Figure 4.6.

4.1.3 Counting transparent fragments

To test the performance of the per pixel linked list when changing the number of transparent fragments per pixel, a new set up is used. In this set up, a single quad is added each frame at a random depth within a small range, to make sure every quad still covers the same number of pixels on the screen. The rendering time of each frame is recorded and stored to file after the test. The result is displayed in Figure 4.7.


4.2 Results

The following subsections present the results from the different setups described in the previous section, starting with a visual comparison, followed by performance, and then the performance of the per pixel linked list for a varying number of transparent fragments per pixel.

4.2.1 Visual comparisons

[Figure 4.1 layout: rows labelled “Per object”, “Per pixel”, “WBOIT” and “Layers”; columns labelled frames 5–8.]

Figure 4.1: Four frames (frames 5 to 8) of a camera panning by 5° per frame. Notice how the sorting of the blue dragon changes from in front to behind between frames 6 and 7 in per object sorting (top row, middle two columns), while the sorting is stable in the other two algorithms. The brightness of the images is increased afterwards in an image editing program to better visualize the error. In the layers (bottom) row, black means no transparency, and the brighter the colors, the more transparency layers.


[Figure 4.2 panels: (a) per pixel sorted, (b) per object sorted, (c) difference between a and b, (d) layers of transparency with a 0–8 scale.]

Figure 4.2: Three Stanford dragons (0.78 alpha) rendered in the Sponza scene. The per pixel sorted rendering (a) is used as ground truth. As can be seen in the difference image (c), there is both a large portion where the green dragon is erroneously rendered in front of the red dragon, and many small parts where the triangles within a single object (dragon) are rendered in the wrong order.

Figure 4.3: The same smoke occluding a transparent ball at two different distances, rendered with the WBOIT algorithm. The result looks good in the left image, where the smoke and the ball are near each other in depth, while in the right image the weighting function gives the smoke a too large color contribution, making the pixels it covers grey.


[Figure 4.4 panels: (a) per pixel sorted, (b) layers of transparency with a 0–30 scale, (c) object sorted, (d) object sorted difference, (e) weighted blended, (f) WBOIT difference.]

Figure 4.4: The Sponza scene with added spheres and particle effects. The difference images are computed against the per pixel sorted rendering (a). As seen in the object sorted difference (d), it gives more concentrated errors compared to the WBOIT difference (f), which gives small errors everywhere.


[Figure 4.5 layout: rows “Depth weight”, “No depth weight” and “PPLL”; columns a–d for the four weight functions; layers scale 0–10.]

Figure 4.5: Three dragons rendered with WBOIT using four different weight functions, with and without depth weighting, together with a PPLL rendering as ground truth for comparison. The depth weight adds a large weight to the front most layer, as can be seen in the top row of the figure, where the front most blue dragon is much more dominant than in the bottom row. The results differ a lot depending on the weighting function and on whether the depth weight is used, and in this particular case the depth weighting gives a better result than not using it.


4.2.2 Performance

[Figure 4.6 graph: GPU time (ms) and transparent fragments per pixel plotted against frame number, with series for the transparency count and for each algorithm.]

Figure 4.6: A tour of the Sponza scene with added transparent spheres and some particle effects. The scene has about 1.54 million triangles. The graph shows the GPU time spent per frame and the average number of transparent fragments per pixel per frame. The images at the top show three of the frames of the recording together with a heat map of the number of transparent layers in each frame. As can be seen in the graphs, the weighted blended order-independent transparency gives a small overhead compared to the per object sorted, while the per pixel sorted transparency uses a lot more GPU time.


[Figure 4.7 graph: time (ms) against number of layers, with series for the sorting pass, the forward pass and the total.]

Figure 4.7: The graph shows the performance of the two stages of the per pixel linked list algorithm: the forward pass, where the geometry is rendered and the linked lists are constructed, and the sorting pass, where the linked lists are sorted and rendered to the screen.

4.3 Discussion

The results contain lots of images with visual comparisons and graphs of performance measurements, which can make it hard to see the full picture. This section discusses the results to help the reader grasp the details.

4.3.1 Visual comparison

In the visual comparison it can be seen that the per object sorting has discontinuities in sorting between frames, where large parts of the screen can change between two adjacent frames, creating very noticeable rendering artifacts, as seen in Figure 4.1. The other two algorithms do not suffer from the same type of problems and remain stable between frames.

One thing to take into consideration is that visual quality is in many cases a subjective comparison, and looking at Figure 4.4 gives an idea of this. In this rendering with lots of transparency on the screen, it is hard to see the errors even though some parts are rendered in the wrong order, as seen in the difference images Figures 4.4f and 4.4d. But not all scenes have this much transparency, and in cases where there are other objects which can be used as a reference point, the errors are easier to spot. The same goes for the “popping effect” when the sorting changes between two frames, which makes the error easy to spot; if the same error is displayed in two still images next to each other, it can be much harder to see.

The weighted blended order-independent transparency exploits the fact that it is in many cases hard to know what a correct rendering looks like, and it always keeps a stable sorting, making it less likely that the illusion of a correct rendering is broken. But the illusion comes at a price: the colors are not always correct, which can make


the geometry and particle effects have a look which was not intended by the designer. As can be seen in Figure 4.5, it is possible to choose a weighting function which gives a close approximation to the real rendering, but this is not always the case: as can be seen in Figure 4.3, the errors can be quite distinct when objects are moved out of the weighting function's effective distance.
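As a sketch of why WBOIT is order independent and where its color errors come from, the composite from McGuire and Bavoil [18] can be written as a weighted average plus a revealage term; the weight function below is a simplified depth weight for illustration, not one of the functions from the paper:

```python
# Sketch of the WBOIT composite: every fragment contributes to a weighted
# average independent of order, plus a revealage (visibility) product.
# The depth weight here is a simplified illustrative choice.

def wboit(fragments, background, weight):
    """fragments: (depth, color, alpha) with single-channel colors."""
    accum_c = accum_a = 0.0
    revealage = 1.0
    for depth, color, alpha in fragments:  # any order: sums and products commute
        w = weight(depth, alpha)
        accum_c += color * alpha * w
        accum_a += alpha * w
        revealage *= (1.0 - alpha)
    avg = accum_c / max(accum_a, 1e-5)
    return avg * (1.0 - revealage) + background * revealage

def depth_weight(depth, alpha):
    """Illustrative weight: nearer fragments count more."""
    return alpha / (1e-2 + depth ** 2)

frags = [(2.0, 1.0, 0.5), (5.0, 0.0, 0.5)]
a = wboit(frags, 0.5, depth_weight)
b = wboit(list(reversed(frags)), 0.5, depth_weight)
print(a, b)  # identical regardless of fragment order
```

The weighted average replaces true sorted compositing, which is exactly where the color errors of Figure 4.3 come from: when the weight function misjudges the relative importance of fragments far apart in depth, distant fragments contribute too much or too little to the average.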

Instead of relying on hiding errors, the per pixel linked list always gives a correct sorting which is both stable and true to the original design made by a model or level designer. The result of the per pixel linked list is optimal from a visual quality perspective.

4.3.2 Performance

Figure 4.6 shows the performance of the different algorithms relative to the average number of transparent fragments per pixel. The per object sorting is used as a baseline, since it uses no extra resources in the form of buffers or post effects to apply the transparency. The performance follows the average number of transparent fragments, but there are some inconsistencies which can be due to the amount of particles and geometry in each frame: the particles have few triangles per transparent fragment generated, making them cheaper.

As can be seen in the figure, the weighted blended order-independent transparency is almost as fast as the per object sorted in many cases, and its performance follows the per object sorted, having exactly the same strengths and weaknesses. If the depth prepass is skipped, it should have exactly the same graph but with a small offset for the extra cost of rendering to a separate buffer.

The per pixel linked list follows the same pattern as the other two algorithms but is a lot more expensive, and in cases with lots of transparency in a frame it uses a lot of GPU processing time. To make a rendering real time it must use less than 33 ms per frame, and spending up to 16 ms per frame on the transparency alone is a heavy price to pay. The algorithm can be optimized further to increase the speed, but given the speed increase needed, it looks like a long way before it can be used as a general transparency algorithm in real time applications.

Overall, the per object sorted and the weighted blended order-independent transparency have good enough performance to be used, and the per pixel linked list might be usable performance-wise on high end systems, or as a real time visual reference to test other algorithms against.

4.3.3 Transparent fragments

Even though the per pixel linked list's time complexity is quadratic in the number of fragments per pixel, in many real world cases this does not have a large impact on the performance compared to a linear algorithm, especially with the limit on the average number of transparent fragments enforced by the fixed linked list buffer size used.
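Assuming an insertion-style sort per pixel, as the quadratic complexity suggests, the comparison cost can be sketched as follows; the counting is illustrative, since the shader sorts fragments on the GPU:

```python
# Sketch of why the per pixel cost is quadratic: an insertion sort over a
# pixel's fragment list makes O(n^2) comparisons in the worst case, which
# stays small for the fragment counts typical of real scenes.

def insertion_sort_comparisons(depths):
    """Sort fragment depths ascending, returning (sorted list, comparisons)."""
    items = list(depths)
    comparisons = 0
    for i in range(1, len(items)):
        j = i
        while j > 0:
            comparisons += 1
            if items[j - 1] <= items[j]:
                break
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return items, comparisons

# Worst case (reversed input): n fragments cost n*(n-1)/2 comparisons.
for n in (4, 8, 16):
    _, cost = insertion_sort_comparisons(range(n, 0, -1))
    print(n, cost)
```

With the buffer-enforced cap on fragments per pixel, n stays small enough that n²/2 comparisons are cheap compared to the memory traffic of walking the list.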


4.3.4 General discussion

With three different algorithms tested, each with its respective strengths and weaknesses, it is important to make a full comparison to find out which algorithm is the most suitable to use as a general transparency algorithm in a real time application.

The per object sorting has many strengths which have made it the most used algorithm, among them its simplicity, its performance and its almost nonexistent extra memory usage. On the other hand, it gives rendering artifacts when transparent fragments are sorted and drawn in the wrong order.

The weighted order-independent transparency removes the problem of bad sorting but introduces a new type of error where the color is not correct. The colors are usually carefully selected by an artist to create a certain feeling and look, and an algorithm which does not respect these selected values could have a bad impact. By letting the artist select the weighting function, it could also be possible to work around this and incorporate the effect into the design process. The speed of the algorithm is fast enough for it to be used, and it imposes no problem except when there is lots of transparency in a frame, but all transparency algorithms suffer from this problem.

With the last algorithm, the per pixel linked list, the visual quality of the sorting is perfect and the algorithm is only limited by its speed. Speed is very important in a real time application: if the rendering takes a long time and reduces the frame rate too much, it can drastically impact the experience of the game or application. Another problem can be the memory usage: the algorithm requires a lot of memory to store all the fragments before sorting, although the memory usage can be reduced to some extent using smart ways of storing the data [17].

Of these three algorithms, the choice would still be to use the per object sorted algorithm. The speed problem of the per pixel linked list makes it unsuitable to use, and the weighted blended order-independent transparency changes one visual problem into another while adding more work to make it look good. Still, there are special cases where the different algorithms can be used, where it is possible to use their strengths without suffering from their weaknesses. Some of these cases could be using the per pixel linked list for hair renderings, or using the weighted order-independent transparency in spaces where the maximum distance is known.


Chapter 5

Conclusions

Chapter 5 sums up the thesis by describing which conclusions can be drawn from the results, what can be improved upon, and which other algorithms are of interest for further research. The chapter also contains a reflection on the work done in the thesis, with thoughts on what could have been done differently.

Even though per object sorting has been the most used sorting technique for transparent fragments in real time 3D rendering for a long time, a new fast general algorithm with better visual quality than per object sorting could change the usage of transparency in games and visualization applications, without the need to use different algorithms for different situations. The weighted blended order-independent transparency's quality depends a lot on where it is used, as can be seen in Figure 4.5, and it would require an artist to decide when it should be used or not, to limit the amount of visual artifacts. As suggested by McGuire and Bavoil, in situations where the distance is known, the weighting function can give a good estimate for the sorting of transparency, and it does not use enough of the performance budget to be a problem [18].

When using the other algorithm, the per pixel linked list, other types of problems arise. The main concern is the performance when there is a lot of transparency on the screen, like when particle systems are close to the camera. Also, since the per pixel linked list's buffers have a fixed size, there is a risk of overflow, which can create lots of visual artifacts and inconsistencies between frames. This makes it more suitable for situations where there is more control over the amount of transparency visible. Examples of such situations can be hair or fur renderings [4], where an order-independent transparency algorithm is needed. Another usage could be objects further away: since these objects cover less of the screen, performance is less of a problem, but it is hard to motivate usages like this when objects near the camera have the largest impact on the final rendering, and trying to limit the use to a small part of the distance dimension creates problems when an object moves over the threshold of what is an acceptable distance for the different algorithms.

Neither of the two additional algorithms has the required qualities of a general transparency sorting algorithm; they can only be used situationally or for limited purposes. Instead the hope lies in coming algorithms, such as hybrid transparency or adaptive transparency. But even with algorithms performing as well as the per object sorting and without its distinct visual artifacts, other kinds of algorithms will still be used, if only because they increase the speed in special cases where the visual quality is not as important, since in real time 3D rendering speed is everything.

5.1 Reflections

The complexity of a full-fledged game engine, without any documentation and without working with the authors of the code, became a problem when implementing the additions needed in the form of read write buffers. Also, the maturity of the engine, which has had a lot of changes since the acquisition from Fatshark, made it hard to find documentation for uncommon features. That said, the documentation for the more common things and the API was excellent, and having access to all the tools for setting up scenes, switching between different algorithms and profiling was really helpful.

5.2 Future work

There are still many other algorithms which were not tested here, and there are some improvements which could be made to the evaluated algorithms. The most interesting are the new algorithms available for DirectX 12, hybrid transparency [16] and adaptive transparency [22], which on paper could both be fast and give a good approximation of the transparency. Both of them make use of the new per pixel synchronization to keep a fixed number of samples per pixel while aggregating the least important samples during the rendering. Stochastic transparency could also be a good choice, but would require better hardware to reduce the impact of the algorithm's high performance base cost.

The performance of the per pixel linked list could be improved by using a mask for the sorting, using multiple sorting algorithms depending on how many fragments there are per pixel. In multiple passes, the first pass would handle all the pixels with a single fragment, the next pass those with two fragments, then three and four, and so on. It could also be possible to render some parts, such as particle effects, to a buffer using no sorting and then render the buffer into the per pixel linked list, to remove some of the layer complexity of particle effects, since it is often particle effects that have many layers of transparency.
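The proposed multi-pass scheme could start with a bucketing step along these lines; the names and bucketing are illustrative sketches of the idea, not part of the implementation:

```python
# Sketch of the proposed multi-pass sorting: pixels are grouped by fragment
# count so each pass can use a sort tuned for that exact count (a single
# fragment needs no sort, two need one compare, and so on).

from collections import defaultdict

def bucket_pixels_by_count(pixel_fragment_counts):
    """Group pixel indices by their number of transparent fragments."""
    buckets = defaultdict(list)
    for pixel, count in enumerate(pixel_fragment_counts):
        if count > 0:  # pixels without transparency need no sorting pass
            buckets[count].append(pixel)
    return buckets

counts = [0, 1, 3, 1, 2, 0, 3]
buckets = bucket_pixels_by_count(counts)
print(sorted(buckets.items()))  # [(1, [1, 3]), (2, [4]), (3, [2, 6])]
```

Each bucket could then be dispatched as its own pass with a fixed-size sorting network, avoiding the divergent loops of a general per-pixel sort.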

For the weighted blended order-independent transparency, some way to better identify situations where it can be used efficiently would be needed.


Bibliography

[1] Louis Bavoil and Kevin Myers. “Order Independent Transparency with DualDepth Peeling”. In: 2008.

[2] Brent Burley. “Physically-Based Shading at Disney”. In: 2012.

[3] E. Enderton et al. “Stochastic Transparency”. In: Visualization and ComputerGraphics, IEEE Transactions on 17.8 (Aug. 2011), pp. 1036–1047. issn: 1077-2626.

[4] Wolfgang Engel. GPU Pro 5: Advanced Rendering Techniques. 2014. isbn:9781482208634.

[5] Cass Everitt. Interactive Order-Independent Transparency. 2001.

[6] Jon Jansen and Louis Bavoil. “Fast rendering of opacity-mapped particlesusing DirectX 11 tessellation and mixed resolutions”. In: (2011). issn: 0097-8930.

[7] Wojciech Jarosz. “Efficient Monte Carlo Methods for Light Transport in Scat-tering Media”. PhD thesis. UC San Diego, Sept. 2008.

[8] Norman P. Jouppi and Chun-Fa Chang. “Z3: An Economical Hardware Tech-nique for High-Quality Antialiasing and Transparency”. In: SIGGRAPH/Eu-rographics Workshop on Graphics Hardware. Ed. by A. Kaufmann et al. TheEurographics Association, 1999. isbn: 1-58113-170-4.

[9] Brian Karis. “Real Shading in Unreal Engine 4”. In: 2013.

[10] Pyarelal Knowles, Geoff Leach, and Fabio Zambetta. “Fast sorting for exactOIT of complex scenes”. English. In: The Visual Computer 30.6-8 (2014),pp. 603–613. issn: 0178-2789.

[11] Baoquan Liu et al. “Multi-layer depth peeling via fragment sort”. In:Computer-Aided Design and Computer Graphics, 2009. CAD/Graphics ’09.11th IEEE International Conference on. Aug. 2009, pp. 452–456.

[12] B. Liu et al. “Multi-layer depth peeling via fragment sort”. In: Computer-Aided Design and Computer Graphics, 2009. CAD/Graphics ’09. 11th IEEEInternational Conference on. Aug. 2009, pp. 452–456.

[13] Fang Liu et al. “Efficient Depth Peeling via Bucket Sort”. In: Proceedings ofthe Conference on High Performance Graphics 2009. HPG ’09. New Orleans,Louisiana: ACM, 2009, pp. 51–57. isbn: 978-1-60558-603-8.


[14] A. Mammen. “Transparency and antialiasing algorithms implemented withthe virtual pixel maps technique”. In: IEEE Computer Graphics and Applica-tions 9.4 (July 1989), pp. 43–55. issn: 0272-1716.

[15] Marilena Maule et al. “A survey of raster-based transparency techniques”. In:Computers & Graphics 35.6 (2011), pp. 1023–1034. issn: 0097-8493.

[16] Marilena Maule et al. “Hybrid Transparency”. In: Proceedings of the ACMSIGGRAPH Symposium on Interactive 3D Graphics and Games. I3D ’13.Orlando, Florida: ACM, 2013, pp. 103–118. isbn: 978-1-4503-1956-0.

[17] M. Maule et al. “Memory-Efficient Order-Independent Transparency with Dy-namic Fragment Buffer”. In: Graphics, Patterns and Images (SIBGRAPI),2012 25th SIBGRAPI Conference on. Aug. 2012, pp. 134–141.

[18] Morgan McGuire and Louis Bavoil. “Weighted Blended Order-IndependentTransparency”. In: Journal of Computer Graphics Techniques (JCGT) 2.2(Dec. 18, 2013), pp. 122–141. issn: 2331-7418.

[19] D. Moth. warp or wavefront of GPU threads. https://blogs.msdn.microsoft.com/nativeconcurrency/2012/03/26/warp-or-wavefront-of-gpu-threads/. Accessed: 2016-03-20. 2012.

[20] A. L. Petrescu et al. “Virtual Deferred Rendering”. In: 2015 20th InternationalConference on Control Systems and Computer Science. May 2015, pp. 373–378. doi: 10.1109/CSCS.2015.49.

[21] Thomas Porter and Tom Duff. “Compositing Digital Images”. In: SIGGRAPHComput. Graph. 18.3 (Jan. 1984), pp. 253–259. issn: 0097-8930.

[22] Marco Salvi, Jefferson Montgomery, and Aaron Lefohn. “Adaptive Trans-parency”. In: Proceedings of the ACM SIGGRAPH Symposium on High Per-formance Graphics. HPG ’11. Vancouver, British Columbia, Canada: ACM,2011, pp. 119–126. isbn: 978-1-4503-0896-0.

[23] Erik Sintorn and Ulf Assarsson. “Real-time Approximate Sorting for SelfShadowing and Transparency in Hair Rendering”. In: Proceedings of the 2008Symposium on Interactive 3D Graphics and Games. I3D ’08. Redwood City,California: ACM, 2008, pp. 157–162. isbn: 978-1-59593-983-8.

[24] R. Taylor and X. Li. “A Micro-benchmark Suite for AMD GPUs”. In: Paral-lel Processing Workshops (ICPPW), 2010 39th International Conference on.Sept. 2010, pp. 387–396.

[25] Jason C. Yang et al. “Real-time Concurrent Linked List Construction on theGPU”. In: Proceedings of the 21st Eurographics Conference on Rendering.EGSR’10. Saarbrücken, Germany: Eurographics Association, 2010, pp. 1297–1304.

[26] Nan Zhang. “Memory-Hazard-Aware K-Buffer Algorithm for Order-IndependentTransparency Rendering”. In: Visualization and Computer Graphics, IEEETransactions on 20.2 (Feb. 2014), pp. 238–248. issn: 1077-2626.


Appendix A

Figures

Figure A.1: Ten frames recorded with three different algorithms. In frame 6 there is a sorting error in the per object sorted algorithm, where the blue dragon is erroneously rendered in front (compare with the per pixel sorted). Four of the frames are used in Figure 4.1.
