(Some) Algorithm & Data-structure in HDRP - Unite Seoul 2019
uniteseoul.com/2019/PDF/D1T2S4.pdf


  • (Some) Algorithm & Data-structure in HDRP

    How (some configuration of) HDRP renders a frame

    Victor S.C. Lui, Field Engineer, Unity Technologies

  • Agenda

    3

  • Agenda

    4

    — Look at how HDRP renders progressively more complex scenes

    — Start with HDRP in one of its simplest configurations

    — Add Alpha-tested material, GI, SSAO, SSR

    — Add Forward Opaque / Transparent material, and Postprocessing Effects such as Bloom

    — Add decal, light list building, feature categorization

  • Agenda

    5

    — We’ll analyze frames using Nsight Graphics from Nvidia - a nice graphics debugging/profiling tool - chosen because this computer has an Nvidia GPU

    — GPU frame capture will become more and more complex, like this

    — We’ll try to understand it!

  • Agenda

    6

    — We’ll also look at (basic) Subsurface Scattering! (…if we have enough time)

    — And area lights
    — And volumetric lighting
    — And refraction
    — And hair/eye/fabric/coating…
    — And RTX

    — We are only covering a small part of HDRP, but this is the foundation upon which more advanced topics are built

  • Specs

    7

    — Project version: Unity 2019.1

    — HDRP version: 5.7.2

    — Linear color space

    — Windows Standalone Player, DirectX-11

  • Frame One

    8

  • Frame One

    9

    — Almost everything is turned off in our HDRenderPipelineAsset

    — Lit Shader Mode: Deferred Only

    — Just renders Opaque Objects with Shadow Map under one directional light

    — We also need Exposure Control, otherwise the scene easily looks too dark or too bright

  • Frame One

    10

    — Ball on top has high metallic, low smoothness value

    — Ball underneath has high smoothness, low metallic value

  • Frame One

    11

    — This will be a standard deferred rendering process:

    G-Buffer -> Shadow map -> Deferred lighting

  • Lit shader BRDF

    12

    — For direct lighting on a material using Lit shader:

  • Lit shader BRDF

    13

    — HDRP Lit shader uses a modified Burley diffuse term [BSDF.hlsl – DisneyDiffuseNoPI]

    — The equation is not important per se; what is important is that we need these surface parameters for calculating direct diffuse lighting:

    - Diffuse color
    - Roughness
    - Normal

  • Lit shader BRDF

    14

    — For specular, HDRP Lit shader uses SmithJointGGX [Lit.hlsl – BSDF]

    …Let’s skip the equation! What is important is that it requires one more surface parameter:

    - F0

  • Lit shader BRDF

    15

    User specifies in Material:

    Albedo
    Metallic
    Smoothness
    Normal

  • Lit shader BRDF

    16

    But BRDF needs:

    Diffuse Color
    F0
    Roughness
    Normal

  • Lit shader BRDF

    17

    User specifies in Material: -> BRDF needs: (HDRP transforms for us)

    Albedo, Metallic, Smoothness -> Diffuse Color, F0, Roughness
    Normal -> Normal

  • Lit shader BRDF

    18

    User specifies in Material: -> BRDF needs:

    Normal -> Normal (same value)
    Smoothness -> Roughness (“some inverse relation”)
    Albedo, Metal -> Diffuse Color, F0 (???)

  • Lit shader BRDF

    19

    From: Albedo and Metal

    To: Diffuse Color and F0

  • Lit shader BRDF

    20

    — For a highly metallic surface, F0 is close to your albedo map; your albedo map mainly impacts specular lighting.

    — For a barely metallic surface, Diffuse Color is close to your albedo map; your albedo map mainly impacts diffuse lighting.

    From: Albedo and Metal

    To: Diffuse Color and F0
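The mapping described above can be sketched as follows. This is a toy Python sketch of the standard metallic-workflow conversion, with a made-up function name; the dielectric F0 of roughly 0.04 is the usual convention, not a value quoted on the slide, and HDRP's actual code is HLSL.

```python
def convert_metallic_workflow(albedo, metallic, dielectric_f0=0.04):
    """Standard metallic-workflow conversion (function name made up;
    0.04 is the usual dielectric reflectance convention)."""
    # Diffuse color fades out as the surface becomes metallic.
    diffuse_color = tuple(a * (1.0 - metallic) for a in albedo)
    # F0 blends from a fixed dielectric reflectance toward the albedo.
    f0 = tuple(dielectric_f0 * (1.0 - metallic) + a * metallic
               for a in albedo)
    return diffuse_color, f0
```

With metallic = 1 the diffuse color is black and F0 equals the albedo, so the albedo map only drives specular; with metallic = 0 it is the other way around, matching the two bullets above.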

  • G-Buffer (for standard Lit shader)

    21

    We save the BRDF inputs in the G-Buffer:
    - Avoids doing the conversion in the lighting pass, reducing its register pressure
    - The G-Buffer pass is simpler, so it can afford the extra register pressure

    * “SpecularOcclusion”, “coatMask” – not used, not in scope of this talk
    * “featureID”, “bakedDiffuseLighting” – will look at them later

  • G-Buffer (for standard Lit shader)

    22

    Normal Encoding:

    Map the normal vector from the unit sphere to the unit square like this ->

    Store the (x, y) coordinates as two 12-bit floats in the G-Buffer

    Gives higher precision with the same memory footprint
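The sphere-to-square mapping described above is commonly known as octahedral encoding. A sketch in Python (hypothetical function names; HDRP's real encoder is HLSL, and the 12-bit quantization step is omitted here):

```python
import math

def sign(v):
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    """Map a unit vector to the unit square (octahedral mapping)."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)   # project onto the octahedron |x|+|y|+|z| = 1
    x, y, z = x / s, y / s, z / s
    if z < 0.0:                    # fold the lower hemisphere over the edges
        x, y = (1.0 - abs(y)) * sign(x), (1.0 - abs(x)) * sign(y)
    return x, y

def oct_decode(e):
    u, v = e
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:                    # unfold the lower hemisphere
        u, v = (1.0 - abs(v)) * sign(u), (1.0 - abs(u)) * sign(v)
    l = math.sqrt(u * u + v * v + z * z)
    return u / l, v / l, z / l
```

Without quantization, decode(encode(n)) recovers the normal exactly; the precision win on the slide comes from spending both stored channels on the square instead of three channels on the sphere.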

  • Stencil usage: Lighting Mask

    23

    Stencil buffer pixels will be marked with “should receive lighting” flag as we do draw calls in G-Buffer pass.

    During the deferred lighting pass, we only do lighting calculation on those pixels.

    This excludes doing deferred lighting on sky pixels.

  • Deferred Lighting

    24

    — Lights are stored in HLSL StructuredBuffer [ShaderVariablesLightLoop.hlsl]

    — They are first sorted by type (spotlight, point light…) then sent to GPU[LightLoop.cs]

    This makes GPU threads more likely to execute the same branch.
    *Two GPU threads in the same wave-front taking different branches is bad for performance
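The sort-before-upload idea can be sketched like this (hypothetical type tags and field names; the real ordering lives in LightLoop.cs):

```python
# Hypothetical type tags; HDRP's real ordering lives in LightLoop.cs.
TYPE_ORDER = {"directional": 0, "spot": 1, "point": 2, "area": 3}

def sort_lights_for_upload(lights):
    # Grouping lights by type keeps neighbouring GPU threads on the same
    # branch of the per-type switch inside the light loop.
    return sorted(lights, key=lambda light: TYPE_ORDER[light["type"]])
```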

  • Deferred Lighting

    25

    — During deferred Lighting, we do a whole screen draw call, and each pixel goes through the LightLoop.[LightLoop.hlsl]

    — In a loop, look up all the uploaded lights and accumulate their illumination, using the BRDF defined by the shader.

  • Frame Two

    26

  • Frame Two

    27

    — Enabled Screen Space Ambient Occlusion (SSAO)

    — Enabled Screen Space Reflection (SSR)

    — Enabled Shadow-mask

  • Frame Two

    28

    — Turn on Baked Global Illumination (GI) with Shadow-mask

    — One directional mixed light

  • Frame Two

    29

    — Added grass (Alpha-tested material)

    — Added a very smooth mirror-like material
    - This is the only material in the scene with the “Receive SSR” option turned on

    — Added reflection probes

    — Added light probes

  • Alpha-Tested Materials

    30

    — Alpha testing during the G-Buffer pass is expensive
    - It prevents early-Z, and costs a texture fetch even for fragments that end up discarded

    — Solution: Alpha test in the Depth Prepass instead
    - The Depth Prepass has close to zero pixel shader cost, so it is okay if we cannot early-Z there

    — Then we draw in the G-Buffer pass with ZTest Equal, without enabling alpha test [BaseUnlitUI.cs]

  • Alpha-Tested Materials

    31

    — Depth prepass with alpha-tested material

    — G-Buffer pass with normal opaque material (writes to depth buffer as well of course)

    — Alpha-tested materials render with ZTest Equal and no alpha testing

  • Shadowmask

    32

    — Shadowmask map

    — Records direct shadowing of mixed lights cast by static meshes onto static meshes

    — Stored in extra G-Buffer, let’s call it G-Buffer 4.

  • Global illumination – Indirect Diffuse

    33

    — Baked directional light map + Light probes

    — At the G-Buffer pass, we use these to calculate the indirect diffuse lighting from the pixel toward the camera

    — Stored in G-Buffer 3

  • Global illumination – Indirect Specular (aka reflection)

    34

    — Environment map: captures the light coming from each direction of the sky

    It is basically a cube map sampled by the deferred lighting pass

    — Convolution, projection: out of scope of this talk

  • Global illumination – Indirect Specular (aka reflection)

    35

    — Baked reflection probes: record sharp lighting received by a point in the scene from each direction

    It is basically a cube map sampled by the deferred lighting pass

    — Blending, convolution, projection: out of scope of this talk

    — There is also planar reflection probe:out of scope of this talk

  • Global illumination – Indirect Specular (aka reflection)

    36

    — We also create a screen-space reflection texture to be sampled by the deferred lighting pass

    — Shoot a ray from the camera to each pixel in the depth buffer

    — Reflect and march the ray to see if it hits some other pixel in the depth buffer

    — If so, record the hit pixel’s color

  • Global illumination – Indirect Specular (aka reflection)

    37

    — In HDRP, we have a reflection hierarchy: try SSR first; if it can’t find a hit, fall back to reflection probes; if those can’t find one either, fall back to the sky environment map.

  • Global illumination – Ambient Occlusion

    38

    — Aim: find how much each pixel is surrounded by solid instead of air

    — The more it is surrounded by solid, the less indirect light rays it is able to get

    — This affects indirect diffuse and indirect specular by darkening their illumination

  • Global illumination – Ambient Occlusion

    39

    — In our talk, occlusion only comes from the SSAO pass, a real-time screen-space algorithm.

    — Artists can also specify detailed occlusion data using the Mask map G channel and a Bent Normal map.

    — We basically “use the darker of the two sources”

  • Global illumination – Ambient Occlusion

    40

    — In theory, direct lighting should not care about ambient occlusion; direct lighting should use shadow mapping instead.

    — In practice, artists can ask that direct diffuse lighting also be affected by ambient occlusion by setting the Direct Lighting Strength parameter to a non-zero value in a Volume component [MaterialEvaluation.hlsl]

  • Global illumination – Ambient Occlusion

    41

    — Also, we use micro-shadowing. [The Technical Art of Uncharted 4 - Brinck and Maximov 2016]

    — More influence from AO to direct lighting!

    — Another “hack that looks good” [SurfaceShading.hlsl]

  • Global illumination

    42

    — All this extra information is now used in our deferred lighting pass

  • Global illumination

    43

    No wonder frame two looks so much better!

  • Screen Space Reflection

    44

    — Ray-marching on depth buffer is expensive!

    — SSR cost grows linearly with the number of pixels using SSR

    — User can specify whether to “Receive SSR” for each material

    — If “Receive SSR” is not on, we mark in the stencil buffer that SSR should not be performed.

  • Screen Space Reflection

    45


    1. Take the final color rendered in the previous frame, and down-sample it using a Gaussian blur:

  • Screen Space Reflection

    46


    2. For each pixel not marked with the Don’t Receive SSR flag in the stencil buffer, shoot a ray from the camera and march the reflected ray through the depth buffer; when we hit something, record its screen-space position (x, y):

  • Screen Space Reflection

    47


    3. Depending on the smoothness of the surface, choose a mipmap level (e.g. a rough surface uses a more blurred image).

    Sample our previous-frame Color Gaussian MIP chain at the appropriate position and mipmap level.

  • Screen Space Reflection - Optimizations

    48

    Hi-Z ray-marching

    We generate a MIP chain of the depth buffer in which each lower-resolution texel holds the closest-to-camera depth of the texels it covers.

    Then we perform the Hi-Z ray-marching algorithm.

  • Screen Space Reflection - Optimizations

    49

    Hi-Z ray-marching algorithm [Thanks, Frostbite 2015 SIGGRAPH]

    level = 0;
    while (level > -1)
        step through current cell;
        if (above Z plane) ++level;
        if (below Z plane) --level;
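A minimal 1-D sketch of the loop above in Python, assuming the mip chain stores the closest (minimum) depth per cell; this is an illustration, not HDRP's actual implementation:

```python
def build_min_depth_mips(depth):
    """Mip chain where each coarser texel stores the closest (minimum)
    depth of the finer texels it covers."""
    mips = [list(depth)]
    while len(mips[-1]) > 1:
        prev = mips[-1]
        mips.append([min(prev[i], prev[min(i + 1, len(prev) - 1)])
                     for i in range(0, len(prev), 2)])
    return mips

def hiz_march(mips, x0, z0, slope, max_level=None):
    """March a ray (1-D, +x direction, depth z0 + slope * distance)
    against the min-depth mip chain. Returns the hit pixel, or None."""
    n = len(mips[0])
    if max_level is None:
        max_level = len(mips) - 1
    level, x = 0, x0
    while x < n:
        cell = x >> level
        cell_end = (cell + 1) << level        # first pixel after this cell
        ray_z = z0 + slope * (cell_end - x0)  # ray depth when leaving the cell
        if ray_z < mips[level][cell]:
            # Ray stays in front of everything in the cell ("above Z plane"):
            # skip the whole cell, then try an even coarser level.
            x = cell_end
            level = min(level + 1, max_level)
        else:
            # Possible hit somewhere in the cell ("below Z plane"): refine.
            if level == 0:
                return x
            level -= 1
    return None
```

Against depth = [10, 10, 10, 10, 5, 5, 2, 2], a ray from pixel 0 at depth 1 with slope 0.5 hits pixel 6, skipping the empty pixels in coarse-level steps rather than one pixel at a time.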


  • Screen Space Ambient Occlusion

    55

    Down-sampling: the simplest method is fine, no need to blur

  • Screen Space Ambient Occlusion

    56

    Calculate AO at each down-sampled resolution:

    Increasing range of occluder detection, decreasing accuracy

  • Screen Space Ambient Occlusion

    57

    Calculate AO: How?

    “Like this” [Volumetric Obscurance, Loos and Sloan 2010]

  • Screen Space Ambient Occlusion

    58

    Combine these maps using bilinear upscaling

  • Screen Space Ambient Occlusion

    59

    Combine these maps using bilinear upscaling[AmbientOcclusionUpsample.compute]

    “When upscaling, receive less influence from lower resolution AO map if difference in depth value is big”
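The depth-aware weighting can be sketched as follows (a hypothetical weight function for illustration; the real upsample lives in AmbientOcclusionUpsample.compute):

```python
def bilateral_upsample(lo_ao, lo_depths, hi_depth, eps=1e-4):
    """Blend low-resolution AO samples into one high-resolution pixel,
    trusting a sample less the more its depth disagrees with the
    high-resolution pixel's depth. Hypothetical weighting for illustration."""
    weights = [1.0 / (eps + abs(hi_depth - d)) for d in lo_depths]
    total = sum(weights)
    return sum(w * ao for w, ao in zip(weights, lo_ao)) / total
```

A low-resolution sample whose depth sits far behind the high-resolution pixel (e.g. it straddles an object edge) contributes almost nothing, which is exactly the rule quoted above.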

  • Frame Three

    60

  • Frame Three

    61

    — Added support for rendering transparent material

    — Added Bloom and Vignette post-processing effects

    — Camera has FXAA turned on

  • Frame Three

    62

    — Transparent material ball

    — Fabric shader-graph material ball
    - A material that cannot be rendered on the deferred path

  • Deferred/Forward Hybrid Architecture

    — Transparent materials are rendered using the forward rendering path, blended on top of the opaque image

    — Complex opaque materials (such as Fabric, Skin…) have a lot of surface properties. If we encoded them all in the G-Buffer, the G-Buffer would be too large!
    Solution: use the forward rendering path

  • Deferred/Forward Hybrid Architecture

    Forward opaque material (Fabric)

    Forward transparent material

  • Deferred/Forward Hybrid Architecture

    — Before Deferred Lighting, we may need output from screen-space pre-lighting algorithms such as SSAO, SSR…

    These algorithms may need the frame’s depth and normal buffers!

    — So we need to guarantee this:

    After the G-Buffer pass, we have all normal/depth information about opaque objects in the scene.

    — Solution:
    1. Forward opaque materials need to do a depth prepass too.
    2. During the depth prepass, forward opaque materials also output their normal to the G-Buffer.

  • Deferred/Forward Hybrid Architecture

    Forward Opaque depth prepass: outputs to depth AND normal buffer. It runs AFTER the alpha-tested material depth prepass, because Forward Opaque fragments may then early-Z.

  • Post processing: FXAA

    “Original resolution color buffer blur, but keep edges sharp”

    Happens at Final Pass:[FinalPass.shader]

  • Post processing: Vignettes

    “Makes edge of screen darker – help player focus on center of screen”

    Happens at Post-processing Uber shader:

  • Post processing: Bloom

    1. Down-sample and blur
    2. Up-sample and merge
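The two steps can be sketched on a 1-D image (simple averaging stands in for the real Gaussian; function names are made up):

```python
def downsample(img):
    # Halve the resolution by averaging neighbour pairs
    # (a stand-in for a real Gaussian down-sample).
    return [(img[i] + img[min(i + 1, len(img) - 1)]) * 0.5
            for i in range(0, len(img), 2)]

def upsample(img, size):
    # Nearest-neighbour up-sample back to `size` pixels.
    return [img[min(i // 2, len(img) - 1)] for i in range(size)]

def bloom(img, levels, intensity=0.5):
    # 1. Down-sample and blur: build a chain of smaller, blurrier images.
    chain = [img]
    for _ in range(levels):
        chain.append(downsample(chain[-1]))
    # 2. Up-sample and merge: walk back up, adding each level in,
    #    then add the accumulated glow onto the original image.
    acc = chain[-1]
    for level in reversed(chain[1:-1]):
        acc = [a + b for a, b in zip(level, upsample(acc, len(level)))]
    return [o + intensity * u for o, u in zip(img, upsample(acc, len(img)))]
```

A single bright pixel ends up brighter and bleeds onto its neighbours, which is the visible bloom halo.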

  • Post processing: Bloom

    Q: Why not use Color pyramid generated for SSR?

    A: The color pyramids generated for SSR and Bloom are down-sampled differently. Currently our post-processing team has no plan to unify the two.

  • Time to talk about LDS

    Generating Gaussian Color Pyramid, SSAO, Bloom all have one thing in common:

    REPEATED NEIGHBOURING PIXEL TEXTURE FETCH

    All of these use compute shaders and cache neighboring pixels in LDS (Local Data Share)

    Significant performance boost : )

  • Frame Four

    72

  • Frame Four

    73

    — Enabled decals; we will use them in the simplest way: modifying color/normal/smoothness

    — “Metal and Ambient Occlusion Properties” is turned off

  • Frame Four

    74

    — 3 sets of decal projectors on opaque meshes:
    - Only Base Map (affects albedo)
    - Plus Normal Map (+ affects normal)
    - Plus Mask Map (+ affects smoothness)

  • Frame Four

    75

    — Decal Projector with “Affects Transparent Material” turned on

  • D Buffer

    76

    — Do a full depth prepass for all opaque materials

    — Issue an (instanced) draw call for each decal projector, rendering to a separate D Buffer.

    Each fragment generated by a decal draw call finds the screen-space position of the opaque pixel it needs to modify by sampling the depth buffer.

    Each fragment can thus decide the appropriate pending decal modification and output the result to the D Buffer.

    — During the G-Buffer pass and forward opaque draws later, we can then easily use the D Buffer modifications as input in the fragment shader.

  • D Buffer

    77

    — In our current configuration, D Buffer has 3 render targets as components:

    — Color modification

    — Normal modification

    — Smoothness modification

  • D Buffer

    78

    — Problem: What if one decal modification is on top of another?

  • D Buffer

    79

    — Requirement: We need to keep this rule:

    blending the decal modifications together in the D Buffer, and then blending the result with the G Buffer,

    should give the same result as

    blending each decal modification with the G Buffer directly, one after another.

    — Solution: To achieve this, we use alpha compositing [GPU Gems 3 Chapter 23]

  • D Buffer

    80

    — Alpha Compositing [GPU Gems 3 Chapter 23]

    Applying (x1,a1), (x2,a2), (x3,a3) on x using Blend SrcAlpha OneMinusSrcAlpha, step by step:

    x
    x(1-a1) + x1·a1
    x(1-a1)(1-a2) + x1·a1(1-a2) + x2·a2
    x(1-a1)(1-a2)(1-a3) + x1·a1(1-a2)(1-a3) + x2·a2(1-a3) + x3·a3

    Applying (x1,a1), (x2,a2), (x3,a3) on an empty buffer (0, 1) instead:

    RGB channels, Blend SrcAlpha OneMinusSrcAlpha:
    x1·a1
    x1·a1(1-a2) + x2·a2
    x1·a1(1-a2)(1-a3) + x2·a2(1-a3) + x3·a3  := X

    Alpha channel, Blend Zero OneMinusSrcAlpha:
    (1-a1)
    (1-a1)(1-a2)
    (1-a1)(1-a2)(1-a3)  := A

    Applying (X, A) on x using Blend SrcAlpha One:
    x(1-a1)(1-a2)(1-a3) + x1·a1(1-a2)(1-a3) + x2·a2(1-a3) + x3·a3

    …the same result, so decals can be composited in the D Buffer first.
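The identity on this slide can be checked numerically. Plain Python, mirroring the blend states named above:

```python
def over(dst, src, alpha):
    # "Blend SrcAlpha OneMinusSrcAlpha"
    return src * alpha + dst * (1.0 - alpha)

def composite_dbuffer(decals):
    # Accumulate decals into an empty D Buffer starting from (0, 1):
    #   RGB:   Blend SrcAlpha OneMinusSrcAlpha
    #   alpha: Blend Zero OneMinusSrcAlpha
    X, A = 0.0, 1.0
    for xi, ai in decals:
        X = over(X, xi, ai)
        A = A * (1.0 - ai)
    return X, A

decals = [(0.8, 0.5), (0.2, 0.25), (0.6, 0.9)]
x = 0.3                      # the G Buffer value underneath

# Path 1: blend each decal straight onto the G Buffer, one after another.
direct = x
for xi, ai in decals:
    direct = over(direct, xi, ai)

# Path 2: composite into the D Buffer first, then apply once; with the
# stored alpha A, the final blend computes X + x * A.
X, A = composite_dbuffer(decals)
via_dbuffer = X + x * A

assert abs(direct - via_dbuffer) < 1e-9
```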

  • D Buffer

    81

    — Problem:
    1. Forward materials output normals in the depth prepass
    2. The D Buffer is built after the depth prepass
    3. Just after the G-Buffer pass, we need to guarantee the normal/depth buffers are complete

    … Wouldn’t the normal buffer just after the G-Buffer pass fail to reflect the decal modifications?

    — Solution: We use an extra Decal Normal Patch pass, between the D Buffer pass and the G-Buffer pass, to apply the decal modifications to the normal buffer

  • D Buffer

    82

    — Decal Normal Patch

    Wouldn’t this be expensive? We minimize the cost as best we can:

    - Decals write “Here is a decal!” on the stencil buffer
    - Forward materials write “Here is a normal from a forward material!” on the stencil buffer
    - Only perform the Decal Normal Patch on pixels where both flags are set

    *This kind of complexity comes from our insistence that the forward and deferred paths have feature parity

  • D Buffer : Optimizations

    83

    — H-Tile: a low-resolution render target. One pixel represents an 8x8 tile of the D Buffer

    — Records flags such as:
    “This tile has color modification”
    “This tile has normal modification”
    “This tile has smoothness modification”

    — When the G-Buffer shader / forward material shader looks up the D Buffer, instead of always doing 3 texture fetches, it first does an H-Tile query to find out which textures really need to be fetched.

  • D Buffer: Why?

    84

    — The most well-known way of implementing decals (if you have a G-Buffer pass) is Screen-space Deferred Decals: [SIGGRAPH 2012 - Screen Space Decals in Warhammer 40,000]

    1. Do the G-Buffer pass and copy the depth buffer to a texture.

    2. Each decal issues a draw call of its Oriented Bounding Box into the G-Buffer.

    3. Each resulting fragment decides what modification to apply to the G-Buffer by sampling the depth buffer.

    4. Apply the modification to the G-Buffer by blending.

  • 85

    — The most well-known way of implementing decals (if you have a G-Buffer pass) is Screen-space Deferred Decals [SIGGRAPH 2012 - Screen Space Decals in Warhammer 40,000]

    BUT:

    1. Does not support applying decals on forward opaque materials

    2. Constrains the G-Buffer layout:
    - Must be blendable (no complex encoding)
    - Constrains storing data in the G-Buffer alpha channel

    D Buffer: Why?

  • 86

    — HDRP is all about visual quality

    — We chose the D Buffer, the higher-quality (but more memory-hungry) approach.

    D Buffer: Why?

  • 87

    What about Decal for Transparent material?

    — For transparent material, we use a separate system.

    — “Treat each decal as a light”

    1. Upload to GPU:
    - color/normal/smoothness maps to a Decal Atlas
    - each decal projector’s transform data and Decal Atlas key to a Decal Data Array

    2. When rendering a transparent material, in its fragment shader, loop through the Decal Data Array and for each entry modify the fragment appropriately.

    — Transparent decals are as expensive as a light (or more, because of the texture fetches)!

    Decal Atlas for our frame

  • Frame Five

    88

  • Frame Five

    89

    — Enabled Async-Compute (not supported in DX11)

    — Enabled Big Tile Prepass + Deferred Tile

  • 90

    Problem: Looping through all the lights for every pixel?

  • 91

    Tiled Lighting

    — A technique that helps opaque fragments avoid doing lighting calculations for lights that don’t really affect them.

    — How?

  • 92

    Tiled Lighting

    — Make sure the depth buffer is completely filled before lighting.

    — Upload the lights in the scene to the GPU as a list. For each light, define a bounding volume and upload those too.

    — Divide the camera frustum into many small tiles. For each tile, find the max and min depth values. This gives us a bounding volume for the opaque fragments being shaded in the tile.

    — For each tile, build a tile light index list: the indices of the lights potentially affecting fragments within the tile, found by intersection tests between the tile’s bounding volume and each light’s bounding volume.

    — When calculating lighting for a fragment, instead of going through the whole light list, go through the tile light index list of the tile containing the fragment.
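The steps above can be sketched with AABBs for both tiles and lights. This is a simplification: as later slides show, HDRP also uses bounding spheres and tight per-type volumes.

```python
def build_tile_light_lists(tiles, lights):
    """tiles:  (min_xyz, max_xyz) bounds per screen tile (from min/max depth)
    lights: (min_xyz, max_xyz) bounds per light (simplified to AABBs here)."""
    def overlaps(a, b):
        # Two AABBs intersect iff they overlap on every axis.
        return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))
    return [[i for i, light in enumerate(lights) if overlaps(tile, light)]
            for tile in tiles]
```

Per-fragment shading then loops only over its own tile's (usually tiny) index list instead of the whole scene light list.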

  • 93

    Fine Pruned Tiled Lighting

    — A version of Tiled Lighting where we aggressively remove false positives, really minimizing the size of the tile light index list.

    — The most important difference is an extra pass that prunes the original tile light index list:

    For each pixel (x, y) in the tile, find the depth value z of the pixel. Find the list of lights whose bounding volume includes (x, y, z). Merge these per-pixel lists across the tile. The result is a pruned tile light index list.

    — In reality, we do the same thing with a more efficient compute shader algorithm.

    (In the example pictured: all 9 lights pass the normal Tiled Lighting test, but only 1 passes FPTL.)

  • 94

    Clustered Lighting

    — Tiled lighting only helps opaque fragments ignore irrelevant lights.

    — This is because it prunes lights based on the assumption that we only render fragments at (x, y) with the depth value of sample_depth_buffer(x, y).

    — What about fragments coming from transparent draws? We use Clustered Lighting.

  • 95

    Clustered Lighting

    — For each light, define a bounding volume.

    — Divide camera frustum along all 3 dimensions into a lot of clusters.

    — For each cluster, perform intersection test between the light bounding volume and the cluster to build a cluster light index list.


  • 96

    Light List Building Algorithm in Action, aka “We prune very hard”

    How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

  • 97

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    1. Preparation

    As we upload the list of light data to the GPU, we also upload a Bounding Sphere and a Bounding Trapezoid per light. Same format for all light types. [LightLoop.cs]

  • 98

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    1. Preparation

    As we upload the list of light data to the GPU, we also upload a Tight Bounding Volume per light. Different format for each light type. [LightLoop.cs]

  • 99

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    2. AABB Generation Pass

    For each light, clip each face of its Bounding Trapezoid. (Clipping is complex, but necessary)

    After clipping, let the (still convex) volume have vertices v1,…,vn in screen space. We use these vertices v1,…,vn to build a screen space AABB. [scrbound.compute]


  • 101

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    3. Big Tile List Generation Pass

    Divide the screen into many 64x64 “Big Tiles”. For each Big Tile:
    - Find all lights whose 2D AABB (generated in the last pass) overlaps the big tile (2D)
    - Prune away all lights whose Bounding Sphere does not overlap the big tile (2D)
    - We get the coarse list to be used by later passes
    [lightlistbuild-bigtile.compute]

  • 102

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    4. FPTL Pass

    For each 16x16 tile:
    - Sample the depth buffer to get a 3D bound of our tile
    - Take the coarse list of the 64x64 Big Tile containing our 16x16 tile
    - Prune the coarse list using a 3D AABB (light) – AABB (tile) intersection test
    - Further prune using a 2D Bounding Sphere (light) – AABB (tile) intersection test
    - Perform an intersection test between the 3D screen-space position of every pixel in the tile and the Tight Bounding Volume of each still-unpruned light; remove lights not intersecting any pixel in the tile.

    [lightlistbuild.compute]

  • 103

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    4. FPTL Pass (continued)

    The last step is expensive, involving a lot of texture fetches, so we prune a lot with cheap methods beforehand (4 rounds) to minimize the expensive work. [lightlistbuild.compute]

  • 104

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    5. Clustered Lighting Pass

    For each 32x32 tile:
    - Take the coarse list of the 64x64 Big Tile containing our 32x32 tile
    - Prune the coarse list using a 2D AABB (light) – AABB (tile) intersection test
    - Further prune using a 2D Bounding Sphere (light) – AABB (tile) intersection test
    - Let’s call the result our clustered-tile light index list.

    For each cluster in the 32x32 tile:
    - Take the clustered-tile light index list and prune each light in it by an intersection test between the light’s bounding trapezoid and the cluster boundary.
    - This gives us the clustered list.

    Very similar to the FPTL pass. [lightlistbuild-clustered.compute]

  • 105

    Light List Building Algorithm in Action

    — We prune a lot. However, if we are on DX12, that is okay: DX12 supports Async Compute, so compute work and draw-call work can happen side by side.

  • 106

    Light List Building Algorithm in Action

    — The light-pruning compute shader is computation heavy, while shadow mapping (basically a lot of depth-only draws) is rasterization heavy. Having them run in parallel is a great fit.

    The light list generation is almost free in this case. [GPU Pro 7, Mikkelsen]

  • 107

    Light List Building Algorithm in Action

    — However, on DX11, without Async Compute, this aggressive pruning may not be as worthwhile if you do not have a lot of lights.

    We are aware of this and are working on it.

    In the meantime… you can customize HDRP yourself if it’s really necessary.

  • 108

    Light List Building Algorithm in Action

    This really makes projects much more scalable at handling a lot of lights, especially spotlights.

    Remember, a transparent decal is like a light; clustered lighting works for transparent decal projectors as well!

    Result: fewer unnecessary light lookups (especially for transparent decals) and a faster deferred lighting pass!

  • Frame Six

    109

  • Frame Six

    110

    — Enable “Compute Light Evaluation”

    — Consider both light and material variants

  • 111

    Deferred Shader Bloating Problem

    — The Lit shader supports a lot of lighting models:
    - Standard Opaque
    - Standard Transparent
    - Anisotropy
    - Subsurface Scattering
    - Iridescence…

    — LightLoop.hlsl deals with different kinds of lights:
    - Directional
    - Spotlight
    - Point light
    - Area light
    - Environment light…

    — This makes the shader code become “bloated” and include quite a lot of conditional logic

  • 112

    Deferred Shader Bloating Problem

    — The shader code includes quite a lot of conditional logic.

    — Before Frame Six, deferred lighting is a whole-screen fragment shader pass.

    — Although branching is not always super expensive, if two threads on the same compute unit take divergent branches, shader performance suffers! (Unnecessary VGPR pressure is added, too)

    — A whole-screen shader pass has no guarantee that similar branching behavior is grouped on the same compute unit of the GPU.

  • 113

    Solution: Lit Feature Categorization

    — We’re doing FPTL anyway. On top of this system, we can categorize what each 16x16 tile looks like:

    - Which lights influence the tile?

    - What type of material is in the tile?

    — 16x16 is not too big; it is reasonably likely that only one type of material/light covers the whole tile.

    [Thanks Garawany 2016]

  • 114

    Solution: Lit Feature Categorization

    — Make a number of deferred compute shader variants each supporting only a subset of features. [Lit.hlsl]

    — “Simplest shader possible to draw the 16x16 tile”[Garawany 2016]

  • 115

    Solution: Lit Feature Categorization

    — Make a table that maps which tiles each compute shader variant can cover. [builddispatchindirect.compute]

  • 116

    Solution: Lit Feature Categorization

    — We need a worst-case variant with all features to deal with tiles that cannot be categorized this way. [Lit.hlsl]

    — Result: one pixel shader -> many compute shaders. Less unnecessary conditional logic. Faster!

  • 117

    Lit Feature Categorization

    — Idea: Consider modifying the kFeatureVariantFlags table in Lit.hlsl so it categorizes features better for your project

  • Frame Seven

    118

  • Frame Seven

    119

    — Enable “Subsurface Scattering”

    — But we do not enable “Transmission” to simplify things

  • Frame Seven

    120

    — Ball on top has SSS material with Subsurface Mask value 0 (weak scattering)

    — Ball underneath has SSS material with Subsurface Mask value 1 (strong scattering)

  • Subsurface Scattering?

    121

    One pixel

    Direct Diffuse

  • Subsurface Scattering?

    122

    What about this?

  • Subsurface Scattering?

    123

    “Subsurface Scattering”

  • Subsurface Scattering?

    124

    What about this?

  • Subsurface Scattering?

    125

    “Transmission”

    HDRP supports it, but it is not in the scope of this talk

  • Subsurface Scattering (SSS)

    126

    Our diffusion behavior is encoded in a Diffusion Profile. It is just a scriptable object storing an SSS configuration.

    For SSS, this asset specifies how much light (of each color) is transferred from one place to another during the SSS pass.

    This is visualized by the “Diffusion Profile Preview”.

    For simplicity, let’s use the Pre-and-Post-Scatter Texturing mode.

  • Subsurface Scattering (SSS)

    127

    The “Diffusion Profile Preview” is radially symmetrical, and its value as a function of radius is governed by

    Burley’s Normalized Diffusion Model
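Burley's normalized diffusion profile is R(r) = s(e^(-sr) + e^(-sr/3)) / (8πr); "normalized" means it integrates to 1 over the plane, so the diffusion redistributes light energy without gaining or losing any. A quick numerical check in Python:

```python
import math

def burley_profile(r, s):
    # R(r) = s * (exp(-s*r) + exp(-s*r/3)) / (8 * pi * r)
    return s * (math.exp(-s * r) + math.exp(-s * r / 3.0)) / (8.0 * math.pi * r)

# Numerically integrate R over the plane: integral of R(r) * 2*pi*r dr.
s, dr = 2.0, 1e-4
total = 0.0
for i in range(200000):           # midpoint rule, out to r = 20
    r = dr * (i + 0.5)
    total += burley_profile(r, s) * 2.0 * math.pi * r * dr
```

The parameter s controls how tight the profile is, which is what the Diffusion Profile asset exposes per color channel.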

  • Subsurface Scattering (SSS)

    128

    In the G-Buffer pass, if the Lit shader is in Subsurface Scattering mode, it tags the stencil with the Split-Lighting flag, aka “I’m not only lit, I also want SSS!”.

  • Subsurface Scattering (SSS)

    129

    During Deferred lighting, for pixels tagged with stencil flag Split-Lighting:

    - Output its specular lighting to the color buffer, as usual (right)
    - Output its diffuse lighting to a separate SSS Diffuse Lighting Buffer (left)

  • Subsurface Scattering (SSS)

    130

    We not only output the diffuse lighting to a separate buffer, we also square-root its value (it looks whiter, less colorful).

    [SubsurfaceScattering.hlsl]

  • Subsurface Scattering (SSS)

    131

    We “transfer” lighting around the surface in screen space according to the diffusion profile.

    Then we multiply the transferred color by the square root of the diffuse lighting at the exit pixel.

    Entry lighting: “absorbed only half of the color”

    Transferred + exit lighting: “absorbed half + half = the full color”

    Transfer radius big

    Transfer radius tiny
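The half-and-half absorption can be sketched like this. All names here are made up for illustration; it is a sketch of the pre-and-post-scatter idea, not HDRP's shader.

```python
import math

def sss_pixel(entry_light, entry_tint, exit_tint, transfer_weight):
    """Split the color absorption into two square-root factors: one applied
    at the entry pixel (before the screen-space transfer), one at the exit
    pixel (after it). Hypothetical names, for illustration only."""
    pre = entry_light * math.sqrt(entry_tint)    # stored in the SSS diffuse buffer
    transferred = pre * transfer_weight          # diffusion-profile transfer
    return transferred * math.sqrt(exit_tint)    # applied at the exit pixel
```

When entry and exit land on the same pixel (a tiny transfer radius), sqrt(t) * sqrt(t) = t and we recover plain diffuse shading, which is why a Subsurface Mask of 0 looks like an ordinary Lit material.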

  • Subsurface Scattering (SSS)

    132

    combine pass

  • Subsurface Scattering (SSS)

    The most important part of SSS is the light transfer stage:

  • Subsurface Scattering (SSS)

    “Here is the code that, for each pixel, samples neighboring diffuse lighting to transfer from.” [SubsurfaceScattering.hlsl]

    The code is complex, with a lot of tricks such as clever LDS caching and importance sampling. Not in the scope of this talk.

  • Closing Remarks

    135

  • — This is only a small part of HDRP, we didn’t cover:

    - Motion vectors / TAA
    - Mesh decals
    - Area lighting
    - Volumetric lighting
    - Thick/Thin Transmission for the SSS shader
    - Other advanced lighting models (translucent, hair, fabric…)
    - RTX

    Closing Remarks

    136

  • — For more detail about HDRP, check out Sebastien Lagarde’s talk in SIGGRAPH 2018:

    The Road Toward Unified Rendering with Unity’s High Definition Render Pipeline

    — For more detail about SSS, check out Evgenii Golubev’s talk in SIGGRAPH 2018:

    Efficient Screen-Space Subsurface Scattering Using Burley’s Normalized Diffusion in Real Time

    Closing Remarks

    137

    — HDRP was slightly customized for this presentation to simplify things, so you may not reproduce what is shown here 100%

    — If there are any inaccuracies in the presentation, which is more than likely, please send corrections to us; for instance, one contact point is [email protected]

    Closing Remarks

    138


  • Q&A

    139