(Some) Algorithm & Data-structure in HDRP - Unite Seoul 2019
uniteseoul.com/2019/PDF/D1T2S4.pdf


  • (Some) Algorithm & Data-structure in HDRP

    How (some configuration of) HDRP renders a frame

    Victor S.C. Lui, Field Engineer, Unity Technologies

  • Agenda

    3

  • Agenda

    4

    — Look at how HDRP renders progressively more complex scenes

    — Start with HDRP in one of its simplest configurations

    — Add Alpha-tested material, GI, SSAO, SSR

    — Add Forward Opaque / Transparent material, and Postprocessing Effects such as Bloom

    — Add decal, light list building, feature categorization

  • Agenda

    5

    — We’ll analyze frames using Nsight Graphics from Nvidia - a nice graphics debugging/profiling tool - chosen because this computer has an Nvidia GPU

    — GPU frame capture will become more and more complex, like this

    — We’ll try to understand it!

  • Agenda

    6

    — We’ll also look at (basic) Subsurface Scattering! (…if we have enough time)

    — And area lights
    — And volumetric lighting
    — And refraction
    — And hair/eye/fabric/coating…
    — And RTX

    — We are only covering a small part of HDRP, but this is the foundation upon which more advanced topics are built

  • Specs

    7

    — Project version: Unity 2019.1

    — HDRP version: 5.7.2

    — Linear color space

    — Windows Standalone Player, DirectX-11

  • Frame One

    8

  • Frame One

    9

    — Almost everything is turned off in our HDRenderPipelineAsset

    — Lit Shader Mode: Deferred Only

    — Just renders Opaque Objects with Shadow Map under one directional light

    — We also need Exposure Control, otherwise the scene easily looks too dark or too bright

  • Frame One

    10

    — Ball on top has high metallic, low smoothness value

    — Ball underneath has high smoothness, low metallic value

  • Frame One

    11

    — This will be a standard deferred rendering process:

    G-Buffer -> Shadow map -> Deferred lighting

  • Lit shader BRDF

    12

    — For direct lighting on a material using Lit shader:

  • Lit shader BRDF

    13

    — HDRP Lit shader uses a modified Burley diffuse term [BSDF.hlsl – DisneyDiffuseNoPI]

    — The equation is not important per se; what is important is that we need these surface parameters for calculating direct diffuse lighting:

    - Diffuse color
    - Roughness
    - Normal

  • Lit shader BRDF

    14

    — For specular, HDRP Lit shader uses SmithJointGGX [Lit.hlsl – BSDF]

    …Let’s skip the equation! What is important is that it requires one more surface parameter:

    - F0

  • Lit shader BRDF

    15

    User specifies in Material:

    Albedo
    Metallic
    Smoothness
    Normal

  • Lit shader BRDF

    16

    But BRDF needs:

    Diffuse Color
    F0
    Roughness
    Normal

  • Lit shader BRDF

    17

    User specifies in Material: -> BRDF needs: (HDRP transforms for us)

    Albedo, Metallic, Smoothness -> Diffuse Color, F0, Roughness
    Normal -> Normal

  • Lit shader BRDF

    18

    User specifies in Material: -> BRDF needs:

    Normal -> Normal (same value)
    Smoothness -> Roughness (“some inverse relation”)
    Albedo, Metal -> Diffuse Color, F0 (???)

  • Lit shader BRDF

    19

    From: Albedo and Metal

    To: Diffuse Color and F0

  • Lit shader BRDF

    20

    — For a highly metallic surface, F0 is close to your albedo map; your albedo map mainly impacts specular lighting.

    — For a barely metallic surface, Diffuse Color is close to your albedo map; your albedo map mainly impacts diffuse lighting.

    From: Albedo and Metal

    To: Diffuse Color and F0
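The mapping described above can be sketched as follows. This is a toy Python sketch of the standard metallic-workflow conversion, with a made-up function name; the dielectric F0 of roughly 0.04 is the usual convention, not a value quoted on the slide, and HDRP's actual code is HLSL.

```python
def convert_metallic_workflow(albedo, metallic, dielectric_f0=0.04):
    """Standard metallic-workflow conversion (function name made up;
    0.04 is the usual dielectric reflectance convention)."""
    # Diffuse color fades out as the surface becomes metallic.
    diffuse_color = tuple(a * (1.0 - metallic) for a in albedo)
    # F0 blends from a fixed dielectric reflectance toward the albedo.
    f0 = tuple(dielectric_f0 * (1.0 - metallic) + a * metallic
               for a in albedo)
    return diffuse_color, f0
```

With metallic = 1 the diffuse color is black and F0 equals the albedo, so the albedo map only drives specular; with metallic = 0 it is the other way around, matching the two bullets above.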

  • G-Buffer (for standard Lit shader)

    21

    We save the BRDF inputs in the G-Buffer:
    - Avoids doing the conversion in the lighting pass, reducing its register pressure
    - The G-Buffer pass is simpler, so it can afford the extra register pressure

    * “SpecularOcclusion”, “coatMask” – not used, not in scope of this talk
    * “featureID”, “bakedDiffuseLighting” – will look at them later

  • G-Buffer (for standard Lit shader)

    22

    Normal Encoding:

    Map the normal vector from the unit sphere to the unit square like this ->

    Store the (x, y) coordinates as two 12-bit floats in the G-Buffer

    Gives higher precision with the same memory footprint
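The sphere-to-square mapping described above is commonly known as octahedral encoding. A sketch in Python (hypothetical function names; HDRP's real encoder is HLSL, and the 12-bit quantization step is omitted here):

```python
import math

def sign(v):
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    """Map a unit vector to the unit square (octahedral mapping)."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)   # project onto the octahedron |x|+|y|+|z| = 1
    x, y, z = x / s, y / s, z / s
    if z < 0.0:                    # fold the lower hemisphere over the edges
        x, y = (1.0 - abs(y)) * sign(x), (1.0 - abs(x)) * sign(y)
    return x, y

def oct_decode(e):
    u, v = e
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:                    # unfold the lower hemisphere
        u, v = (1.0 - abs(v)) * sign(u), (1.0 - abs(u)) * sign(v)
    l = math.sqrt(u * u + v * v + z * z)
    return u / l, v / l, z / l
```

Without quantization, decode(encode(n)) recovers the normal exactly; the precision win on the slide comes from spending both stored channels on the square instead of three channels on the sphere.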

  • Stencil usage: Lighting Mask

    23

    Stencil buffer pixels will be marked with “should receive lighting” flag as we do draw calls in G-Buffer pass.

    During the deferred lighting pass, we only do lighting calculation on those pixels.

    This excludes doing deferred lighting on sky pixels.

  • Deferred Lighting

    24

    — Lights are stored in HLSL StructuredBuffer [ShaderVariablesLightLoop.hlsl]

    — They are first sorted by type (spotlight, point light…) then sent to GPU[LightLoop.cs]

    This makes GPU threads more likely to execute the same branch.
    *Two GPU threads in the same wave-front taking different branches is bad for performance
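The sort-before-upload idea can be sketched like this (hypothetical type tags and field names; the real ordering lives in LightLoop.cs):

```python
# Hypothetical type tags; HDRP's real ordering lives in LightLoop.cs.
TYPE_ORDER = {"directional": 0, "spot": 1, "point": 2, "area": 3}

def sort_lights_for_upload(lights):
    # Grouping lights by type keeps neighbouring GPU threads on the same
    # branch of the per-type switch inside the light loop.
    return sorted(lights, key=lambda light: TYPE_ORDER[light["type"]])
```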

  • Deferred Lighting

    25

    — During deferred Lighting, we do a whole screen draw call, and each pixel goes through the LightLoop.[LightLoop.hlsl]

    — In a loop, look up all the uploaded lights and accumulate their illumination, using the BRDF defined by the shader.

  • Frame Two

    26

  • Frame Two

    27

    — Enabled Screen Space Ambient Occlusion (SSAO)

    — Enabled Screen Space Reflection (SSR)

    — Enabled Shadow-mask

  • Frame Two

    28

    — Turn on Baked Global Illumination (GI) with Shadow-mask

    — One directional mixed light

  • Frame Two

    29

    — Added grass (Alpha-tested material)

    — Added a very smooth mirror-like material
    - This is the only material in the scene with the “Receive SSR” option turned on

    — Added reflection probes

    — Added light probes

  • Alpha-Tested Materials

    30

    — Alpha testing during the G-Buffer pass is expensive
    - It prevents early-Z, and costs a texture fetch even for fragments that end up discarded

    — Solution: Alpha test in the Depth Prepass instead
    - The Depth Prepass has close to zero pixel shader cost, so it is okay if we cannot early-Z there

    — Then we draw in the G-Buffer pass with ZTest Equal, without enabling alpha test [BaseUnlitUI.cs]

  • Alpha-Tested Materials

    31

    — Depth prepass with alpha-tested material

    — G-Buffer pass with normal opaque material (writes to depth buffer as well of course)

    — Alpha-tested materials render with ZTest Equal and no alpha testing

  • Shadowmask

    32

    — Shadowmask map

    — Records direct shadowing of mixed lights cast by static meshes onto static meshes

    — Stored in extra G-Buffer, let’s call it G-Buffer 4.

  • Global illumination – Indirect Diffuse

    33

    — Baked directional light map + Light probes

    — At the G-Buffer pass, we use these to calculate the indirect diffuse lighting from the pixel toward the camera

    — Stored in G-Buffer 3

  • Global illumination – Indirect Specular (aka reflection)

    34

    — Environment map: captures the light coming from each direction of the sky

    It is basically a cube map sampled by the deferred lighting pass

    — Convolution, projection: out of scope of this talk

  • Global illumination – Indirect Specular (aka reflection)

    35

    — Baked reflection probes: record sharp lighting received by a point in the scene from each direction

    It is basically a cube map sampled by the deferred lighting pass

    — Blending, convolution, projection: out of scope of this talk

    — There is also planar reflection probe:out of scope of this talk

  • Global illumination – Indirect Specular (aka reflection)

    36

    — We also create a screen-space reflection texture to be sampled by the deferred lighting pass

    — Shoot a ray from the camera to each pixel in the depth buffer

    — Reflect and march the ray to see if it hits some other pixel in the depth buffer

    — If so, record the hit pixel’s color

  • Global illumination – Indirect Specular (aka reflection)

    37

    — In HDRP, we have a reflection hierarchy: try SSR first; if it can’t find a hit, fall back to reflection probes; if those can’t find one either, fall back to the sky environment map.

  • Global illumination – Ambient Occlusion

    38

    — Aim: find how much each pixel is surrounded by solid instead of air

    — The more it is surrounded by solid, the less indirect light rays it is able to get

    — This affects indirect diffuse and indirect specular by darkening their illumination

  • Global illumination – Ambient Occlusion

    39

    — In our talk, occlusion only comes from the SSAO pass, a real-time screen-space algorithm.

    — Artists can also specify detailed occlusion data using the Mask map G channel and a Bent Normal map.

    — We basically “use the darker of the two sources”

  • Global illumination – Ambient Occlusion

    40

    — In theory, direct lighting should not care about ambient occlusion; direct lighting should use shadow mapping instead.

    — In practice, artists can ask that direct diffuse lighting also be affected by ambient occlusion by setting the Direct Lighting Strength parameter to a non-zero value in a Volume component [MaterialEvaluation.hlsl]

  • Global illumination – Ambient Occlusion

    41

    — Also, we use micro-shadowing. [The Technical Art of Uncharted 4 - Brinck and Maximov 2016]

    — More influence from AO to direct lighting!

    — Another “hack that looks good” [SurfaceShading.hlsl]

  • Global illumination

    42

    — All this extra information is now used in our deferred lighting pass

  • Global illumination

    43

    No wonder frame two looks so much better!

  • Screen Space Reflection

    44

    — Ray-marching on depth buffer is expensive!

    — SSR cost grows linearly with the number of pixels using SSR

    — User can specify whether to “Receive SSR” for each material

    — If “Receive SSR” is not on, we mark in the stencil buffer that SSR should not be performed.

  • Screen Space Reflection

    45


    1. Take the final color rendered in the previous frame, and down-sample it using a Gaussian blur:

  • Screen Space Reflection

    46


    2. For each pixel not marked with the Don’t Receive SSR flag in the stencil buffer, shoot a ray from the camera and march the reflected ray through the depth buffer; when we hit something, record its screen-space position (x, y):

  • Screen Space Reflection

    47


    3. Depending on the smoothness of the surface, choose a mipmap level (e.g. a rough surface uses a more blurred image).

    Sample our previous-frame Color Gaussian MIP chain at the appropriate position and mipmap level.

  • Screen Space Reflection - Optimizations

    48

    Hi-Z ray-marching

    We generate a MIP chain of the depth buffer in which each lower-resolution texel holds the closest-to-camera depth of the texels it covers.

    Then we perform the Hi-Z ray-marching algorithm.

  • Screen Space Reflection - Optimizations

    49

    Hi-Z ray-marching algorithm [Thanks, Frostbite 2015 SIGGRAPH]

    level = 0;
    while (level > -1)
        step through current cell;
        if (above Z plane) ++level;
        if (below Z plane) --level;
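A minimal 1-D sketch of the loop above in Python, assuming the mip chain stores the closest (minimum) depth per cell; this is an illustration, not HDRP's actual implementation:

```python
def build_min_depth_mips(depth):
    """Mip chain where each coarser texel stores the closest (minimum)
    depth of the finer texels it covers."""
    mips = [list(depth)]
    while len(mips[-1]) > 1:
        prev = mips[-1]
        mips.append([min(prev[i], prev[min(i + 1, len(prev) - 1)])
                     for i in range(0, len(prev), 2)])
    return mips

def hiz_march(mips, x0, z0, slope, max_level=None):
    """March a ray (1-D, +x direction, depth z0 + slope * distance)
    against the min-depth mip chain. Returns the hit pixel, or None."""
    n = len(mips[0])
    if max_level is None:
        max_level = len(mips) - 1
    level, x = 0, x0
    while x < n:
        cell = x >> level
        cell_end = (cell + 1) << level        # first pixel after this cell
        ray_z = z0 + slope * (cell_end - x0)  # ray depth when leaving the cell
        if ray_z < mips[level][cell]:
            # Ray stays in front of everything in the cell ("above Z plane"):
            # skip the whole cell, then try an even coarser level.
            x = cell_end
            level = min(level + 1, max_level)
        else:
            # Possible hit somewhere in the cell ("below Z plane"): refine.
            if level == 0:
                return x
            level -= 1
    return None
```

Against depth = [10, 10, 10, 10, 5, 5, 2, 2], a ray from pixel 0 at depth 1 with slope 0.5 hits pixel 6, skipping the empty pixels in coarse-level steps rather than one pixel at a time.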


  • Screen Space Ambient Occlusion

    55

    Down-sampling: the simplest method is fine, no need to blur

  • Screen Space Ambient Occlusion

    56

    Calculate AO at each down-sampled resolution:

    Increasing range of occluder detection, decreasing accuracy

  • Screen Space Ambient Occlusion

    57

    Calculate AO: How?

    “Like this” [Volumetric Obscurance, Loos and Sloan 2010]

  • Screen Space Ambient Occlusion

    58

    Combine these maps using bilinear upscaling

  • Screen Space Ambient Occlusion

    59

    Combine these maps using bilinear upscaling[AmbientOcclusionUpsample.compute]

    “When upscaling, receive less influence from lower resolution AO map if difference in depth value is big”
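The depth-aware weighting can be sketched as follows (a hypothetical weight function for illustration; the real upsample lives in AmbientOcclusionUpsample.compute):

```python
def bilateral_upsample(lo_ao, lo_depths, hi_depth, eps=1e-4):
    """Blend low-resolution AO samples into one high-resolution pixel,
    trusting a sample less the more its depth disagrees with the
    high-resolution pixel's depth. Hypothetical weighting for illustration."""
    weights = [1.0 / (eps + abs(hi_depth - d)) for d in lo_depths]
    total = sum(weights)
    return sum(w * ao for w, ao in zip(weights, lo_ao)) / total
```

A low-resolution sample whose depth sits far behind the high-resolution pixel (e.g. it straddles an object edge) contributes almost nothing, which is exactly the rule quoted above.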

  • Frame Three

    60

  • Frame Three

    61

    — Added support for rendering transparent material

    — Added Bloom and Vignette post-processing effects

    — Camera has FXAA turned on

  • Frame Three

    62

    — Transparent material ball

    — Fabric shader-graph material ball
    - A material that cannot be rendered on the deferred path

  • Deferred/Forward Hybrid Architecture

    — Transparent materials are rendered using the forward rendering path, blended on top of the opaque image

    — Complex opaque materials (such as Fabric, Skin…) have a lot of surface properties. If we encoded them all in the G-Buffer, the G-Buffer would be too large!
    Solution: use the forward rendering path

  • Deferred/Forward Hybrid Architecture

    Forward opaque material (Fabric)

    Forward transparent material

  • Deferred/Forward Hybrid Architecture

    — Before Deferred Lighting, we may need output from screen-space pre-lighting algorithms such as SSAO, SSR…

    These algorithms may need the frame’s depth and normal buffers!

    — So we need to guarantee this:

    After the G-Buffer pass, we have all normal/depth information about opaque objects in the scene.

    — Solution:
    1. Forward opaque materials need to do a depth prepass too.
    2. During the depth prepass, forward opaque materials also output their normal to the G-Buffer.

  • Deferred/Forward Hybrid Architecture

    Forward Opaque depth prepass: outputs to depth AND normal buffer. It runs AFTER the alpha-tested material depth prepass, because Forward Opaque fragments may then early-Z.

  • Post processing: FXAA

    “Original resolution color buffer blur, but keep edges sharp”

    Happens at Final Pass:[FinalPass.shader]

  • Post processing: Vignettes

    “Makes edge of screen darker – help player focus on center of screen”

    Happens at Post-processing Uber shader:

  • Post processing: Bloom

    1. Down-sample and blur
    2. Up-sample and merge
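The two steps can be sketched on a 1-D image (simple averaging stands in for the real Gaussian; function names are made up):

```python
def downsample(img):
    # Halve the resolution by averaging neighbour pairs
    # (a stand-in for a real Gaussian down-sample).
    return [(img[i] + img[min(i + 1, len(img) - 1)]) * 0.5
            for i in range(0, len(img), 2)]

def upsample(img, size):
    # Nearest-neighbour up-sample back to `size` pixels.
    return [img[min(i // 2, len(img) - 1)] for i in range(size)]

def bloom(img, levels, intensity=0.5):
    # 1. Down-sample and blur: build a chain of smaller, blurrier images.
    chain = [img]
    for _ in range(levels):
        chain.append(downsample(chain[-1]))
    # 2. Up-sample and merge: walk back up, adding each level in,
    #    then add the accumulated glow onto the original image.
    acc = chain[-1]
    for level in reversed(chain[1:-1]):
        acc = [a + b for a, b in zip(level, upsample(acc, len(level)))]
    return [o + intensity * u for o, u in zip(img, upsample(acc, len(img)))]
```

A single bright pixel ends up brighter and bleeds onto its neighbours, which is the visible bloom halo.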

  • Post processing: Bloom

    Q: Why not use Color pyramid generated for SSR?

    A: The color pyramids generated for SSR and Bloom are down-sampled differently. Currently our post-processing team has no plan to unify the two.

  • Time to talk about LDS

    Generating Gaussian Color Pyramid, SSAO, Bloom all have one thing in common:

    REPEATED NEIGHBOURING PIXEL TEXTURE FETCH

    All of these use compute shaders and cache neighboring pixels in LDS (Local Data Share)

    Significant performance boost : )

  • Frame Four

    72

  • Frame Four

    73

    — Enabled decals; we will use them in the simplest way: modifying color/normal/smoothness

    — “Metal and Ambient Occlusion Properties” is turned off

  • Frame Four

    74

    — 3 sets of decal projectors on opaque meshes:
    - Only Base Map (affects albedo)
    - Plus Normal Map (+ affects normal)
    - Plus Mask Map (+ affects smoothness)

  • Frame Four

    75

    — Decal Projector with “Affects Transparent Material” turned on

  • D Buffer

    76

    — Do a full depth prepass for all opaque materials

    — Issue an (instanced) draw call for each decal projector, rendering to a separate D Buffer.

    Each fragment generated by a decal draw call finds the screen-space position of the opaque pixel it needs to modify by sampling the depth buffer.

    Each fragment can thus decide the appropriate pending decal modification and output the result to the D Buffer.

    — During the G-Buffer pass and forward opaque draws later, we can then easily use the D Buffer modifications as input in the fragment shader.

  • D Buffer

    77

    — In our current configuration, D Buffer has 3 render targets as components:

    — Color modification

    — Normal modification

    — Smoothness modification

  • D Buffer

    78

    — Problem: What if one decal modification is on top of another?

  • D Buffer

    79

    — Requirement: We need to keep this rule:

    blending the decal modifications together in the D Buffer, and then blending the result with the G Buffer,

    should give the same result as

    blending each decal modification with the G Buffer directly, one after another.

    — Solution: To achieve this, we use alpha compositing [GPU Gems 3 Chapter 23]

  • D Buffer

    80

    — Alpha Compositing [GPU Gems 3 Chapter 23]

    Applying (x1,a1), (x2,a2), (x3,a3) on x using Blend SrcAlpha OneMinusSrcAlpha, step by step:

    x
    x(1-a1) + x1·a1
    x(1-a1)(1-a2) + x1·a1(1-a2) + x2·a2
    x(1-a1)(1-a2)(1-a3) + x1·a1(1-a2)(1-a3) + x2·a2(1-a3) + x3·a3

    Applying (x1,a1), (x2,a2), (x3,a3) on an empty buffer (0, 1) instead:

    RGB channels, Blend SrcAlpha OneMinusSrcAlpha:
    x1·a1
    x1·a1(1-a2) + x2·a2
    x1·a1(1-a2)(1-a3) + x2·a2(1-a3) + x3·a3  := X

    Alpha channel, Blend Zero OneMinusSrcAlpha:
    (1-a1)
    (1-a1)(1-a2)
    (1-a1)(1-a2)(1-a3)  := A

    Applying (X, A) on x using Blend SrcAlpha One:
    x(1-a1)(1-a2)(1-a3) + x1·a1(1-a2)(1-a3) + x2·a2(1-a3) + x3·a3

    …the same result, so decals can be composited in the D Buffer first.
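The identity on this slide can be checked numerically. Plain Python, mirroring the blend states named above:

```python
def over(dst, src, alpha):
    # "Blend SrcAlpha OneMinusSrcAlpha"
    return src * alpha + dst * (1.0 - alpha)

def composite_dbuffer(decals):
    # Accumulate decals into an empty D Buffer starting from (0, 1):
    #   RGB:   Blend SrcAlpha OneMinusSrcAlpha
    #   alpha: Blend Zero OneMinusSrcAlpha
    X, A = 0.0, 1.0
    for xi, ai in decals:
        X = over(X, xi, ai)
        A = A * (1.0 - ai)
    return X, A

decals = [(0.8, 0.5), (0.2, 0.25), (0.6, 0.9)]
x = 0.3                      # the G Buffer value underneath

# Path 1: blend each decal straight onto the G Buffer, one after another.
direct = x
for xi, ai in decals:
    direct = over(direct, xi, ai)

# Path 2: composite into the D Buffer first, then apply once; with the
# stored alpha A, the final blend computes X + x * A.
X, A = composite_dbuffer(decals)
via_dbuffer = X + x * A

assert abs(direct - via_dbuffer) < 1e-9
```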

  • D Buffer

    81

    — Problem:
    1. Forward materials output normals in the depth prepass
    2. The D Buffer is built after the depth prepass
    3. Just after the G-Buffer pass, we need to guarantee the normal/depth buffers are complete

    … Wouldn’t the normal buffer just after the G-Buffer pass fail to reflect the decal modifications?

    — Solution: We use an extra Decal Normal Patch pass, between the D Buffer pass and the G-Buffer pass, to apply the decal modifications to the normal buffer

  • D Buffer

    82

    — Decal Normal Patch

    Wouldn’t this be expensive? We minimize the cost as best we can:

    - Decals write “Here is a decal!” on the stencil buffer
    - Forward materials write “Here is a normal from a forward material!” on the stencil buffer
    - Only perform the Decal Normal Patch on pixels where both flags are set

    *This kind of complexity comes from our insistence that the forward and deferred paths have feature parity

  • D Buffer : Optimizations

    83

    — H-Tile: a low-resolution render target. One pixel represents an 8x8 tile of the D Buffer

    — Records flags such as:
    “This tile has color modification”
    “This tile has normal modification”
    “This tile has smoothness modification”

    — When the G-Buffer shader / forward material shader looks up the D Buffer, instead of always doing 3 texture fetches, it first does an H-Tile query to find out which textures really need to be fetched.

  • D Buffer: Why?

    84

    — The most well-known way of implementing decals (if you have a G-Buffer pass) is Screen-space Deferred Decals: [SIGGRAPH 2012 - Screen Space Decals in Warhammer 40,000]

    1. Do the G-Buffer pass and copy the depth buffer to a texture.

    2. Each decal issues a draw call of its Oriented Bounding Box into the G-Buffer.

    3. Each resulting fragment decides what modification to apply to the G-Buffer by sampling the depth buffer.

    4. Apply the modification to the G-Buffer by blending.

  • 85

    — The most well-known way of implementing decals (if you have a G-Buffer pass) is Screen-space Deferred Decals [SIGGRAPH 2012 - Screen Space Decals in Warhammer 40,000]

    BUT:

    1. Does not support applying decals on forward opaque materials

    2. Constrains the G-Buffer layout:
    - Must be blendable (no complex encoding)
    - Constrains storing data in the G-Buffer alpha channel

    D Buffer: Why?

  • 86

    — HDRP is all about visual quality

    — We chose the D Buffer, the higher-quality (but more memory-hungry) approach.

    D Buffer: Why?

  • 87

    What about Decal for Transparent material?

    — For transparent material, we use a separate system.

    — “Treat each decal as a light”

    1. Upload to GPU:
    - color/normal/smoothness maps to a Decal Atlas
    - each decal projector’s transform data and Decal Atlas key to a Decal Data Array

    2. When rendering a transparent material, in its fragment shader, loop through the Decal Data Array and for each entry modify the fragment appropriately.

    — Transparent decals are as expensive as a light (or more, because of the texture fetches)!

    Decal Atlas for our frame

  • Frame Five

    88

  • Frame Five

    89

    — Enabled Async-Compute (not supported in DX11)

    — Enabled Big Tile Prepass + Deferred Tile

  • 90

    Problem: Looping through all the lights for every pixel?

  • 91

    Tiled Lighting

    — A technique that helps opaque fragments avoid doing lighting calculations for lights that don’t really affect them.

    — How?

  • 92

    Tiled Lighting

    — Make sure the depth buffer is completely filled before lighting.

    — Upload the lights in the scene to the GPU as a list. For each light, define a bounding volume and upload those too.

    — Divide the camera frustum into many small tiles. For each tile, find the max and min depth values. This gives us a bounding volume for the opaque fragments being shaded in the tile.

    — For each tile, build a tile light index list: the indices of the lights potentially affecting fragments within the tile, found by intersection tests between the tile’s bounding volume and each light’s bounding volume.

    — When calculating lighting for a fragment, instead of going through the whole light list, go through the tile light index list of the tile containing the fragment.
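The steps above can be sketched with AABBs for both tiles and lights. This is a simplification: as later slides show, HDRP also uses bounding spheres and tight per-type volumes.

```python
def build_tile_light_lists(tiles, lights):
    """tiles:  (min_xyz, max_xyz) bounds per screen tile (from min/max depth)
    lights: (min_xyz, max_xyz) bounds per light (simplified to AABBs here)."""
    def overlaps(a, b):
        # Two AABBs intersect iff they overlap on every axis.
        return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))
    return [[i for i, light in enumerate(lights) if overlaps(tile, light)]
            for tile in tiles]
```

Per-fragment shading then loops only over its own tile's (usually tiny) index list instead of the whole scene light list.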

  • 93

    Fine Pruned Tiled Lighting

    — A version of Tiled Lighting where we aggressively remove false positives, really minimizing the size of the tile light index list.

    — The most important difference is an extra pass that prunes the original tile light index list:

    For each pixel (x, y) in the tile, find the depth value z of the pixel. Find the list of lights whose bounding volume includes (x, y, z). Merge these per-pixel lists across the tile. The result is a pruned tile light index list.

    — In reality, we do the same thing with a more efficient compute shader algorithm.

    (In the example pictured: all 9 lights pass the normal Tiled Lighting test, but only 1 passes FPTL.)

  • 94

    Clustered Lighting

    — Tiled lighting only helps opaque fragments ignore irrelevant lights.

    — This is because it prunes lights based on the assumption that we only render fragments at (x, y) with the depth value of sample_depth_buffer(x, y).

    — What about fragments coming from transparent draws? We use Clustered Lighting.

  • 95

    Clustered Lighting

    — For each light, define a bounding volume.

    — Divide camera frustum along all 3 dimensions into a lot of clusters.

    — For each cluster, perform intersection test between the light bounding volume and the cluster to build a cluster light index list.


  • 96

    Light List Building Algorithm in Action, aka “We prune very hard”

    How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

  • 97

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    1. Preparation

    As we upload the list of light data to the GPU, we also upload a Bounding Sphere and a Bounding Trapezoid per light. Same format for all light types. [LightLoop.cs]

  • 98

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    1. Preparation

    As we upload the list of light data to the GPU, we also upload a Tight Bounding Volume per light. Different format for each light type. [LightLoop.cs]

  • 99

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    2. AABB Generation Pass

    For each light, clip each face of its Bounding Trapezoid. (Clipping is complex, but necessary)

    After clipping, let the (still convex) volume have vertices v1,…,vn in screen space. We use these vertices v1,…,vn to build a screen space AABB. [scrbound.compute]


  • 101

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    3. Big Tile List Generation Pass

    Divide the screen into many 64x64 “Big Tiles”. For each Big Tile:
    - Find all lights whose 2D AABB (generated in the last pass) overlaps the big tile (2D)
    - Prune away all lights whose Bounding Sphere does not overlap the big tile (2D)
    - We get the coarse list to be used by later passes
    [lightlistbuild-bigtile.compute]

  • 102

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    4. FPTL Pass

    For each 16x16 tile:
    - Sample the depth buffer to get a 3D bound of our tile
    - Take the coarse list of the 64x64 Big Tile containing our 16x16 tile
    - Prune the coarse list using a 3D AABB (light) – AABB (tile) intersection test
    - Further prune using a 2D Bounding Sphere (light) – AABB (tile) intersection test
    - Perform an intersection test between the 3D screen-space position of every pixel in the tile and the Tight Bounding Volume of each still-unpruned light; remove lights not intersecting any pixel in the tile.

    [lightlistbuild.compute]

  • 103

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    4. FPTL Pass (continued)

    The last step is expensive, involving a lot of texture fetches, so we prune a lot with cheap methods beforehand (4 rounds) to minimize the expensive work. [lightlistbuild.compute]

  • 104

    Light List Building Algorithm in Action

    — How exactly does HDRP build the FPTL/Clustered-Lighting light lists?

    5. Clustered Lighting Pass

    For each 32x32 tile:
    - Take the coarse list of the 64x64 Big Tile containing our 32x32 tile
    - Prune the coarse list using a 2D AABB (light) – AABB (tile) intersection test
    - Further prune using a 2D Bounding Sphere (light) – AABB (tile) intersection test
    - Let’s call the result our clustered-tile light index list.

    For each cluster in the 32x32 tile:
    - Take the clustered-tile light index list and prune each light in it by an intersection test between the light’s bounding trapezoid and the cluster boundary.
    - This gives us the clustered list.

    Very similar to the FPTL pass. [lightlistbuild-clustered.compute]

  • 105

    Light List Building Algorithm in Action

    — We prune a lot. However, if we are on DX12, that is okay: DX12 supports Async Compute, so compute work and draw-call work can happen side by side.

  • 106

    Light List Building Algorithm in Action

    — The light-pruning compute shader is computation heavy, while shadow mapping (basically a lot of depth-only draws) is rasterization heavy. Having them run in parallel is a great fit.

    The light list generation is almost free in this case. [GPU Pro 7, Mikkelsen]

  • 107

    Light List Building Algorithm in Action

    — However, on DX11, without Async Compute, this aggressive pruning may not be as worthwhile if you do not have a lot of lights.

    We are aware of this and are working on it.

    In the meantime… you can customize HDRP yourself if it’s really necessary.

  • 108

    Light List Building Algorithm in Action

    This really makes projects much more scalable at handling a lot of lights, especially spotlights.

    Remember, a transparent decal is like a light; clustered lighting works for transparent decal projectors as well!

    Result: fewer unnecessary light lookups (especially for transparent decals) and a faster deferred lighting pass!

  • Frame Six

    109

  • Frame Six

    110

    — Enable “Compute Light Evaluation”

    — Consider both light and material variants

  • 111

    Deferred Shader Bloating Problem

    — The Lit shader supports a lot of lighting models:
    - Standard Opaque
    - Standard Transparent
    - Anisotropy
    - Subsurface Scattering
    - Iridescence…

    — LightLoop.hlsl deals with different kinds of lights:
    - Directional
    - Spotlight
    - Point light
    - Area light
    - Environment light…

    — This makes the shader code become “bloated” and include quite a lot of conditional logic

  • 112

    Deferred Shader Bloating Problem

    — The shader code includes quite a lot of conditional logic.

    — Before Frame Six, deferred lighting is a whole-screen fragment shader pass.

    — Although branching is not always super expensive, if two threads on the same compute unit take divergent branches, shader performance suffers! (Unnecessary VGPR pressure is added, too)

    — A whole-screen shader pass has no guarantee that similar branching behavior is grouped on the same compute unit of the GPU.

  • 113

    Solution: Lit Feature Categorization

    — We’re doing FPTL anyway. On top of this system, we can categorize what each 16x16 tile looks like:

    - Which lights influence the tile?

    - What type of material is in the tile?

    — 16x16 is not too big; it is reasonably likely that only one type of material/light covers the whole tile.

    [Thanks Garawany 2016]

  • 114

    Solution: Lit Feature Categorization

    — Make a number of deferred compute shader variants each supporting only a subset of features. [Lit.hlsl]

    — “Simplest shader possible to draw the 16x16 tile”[Garawany 2016]

  • 115

    Solution: Lit Feature Categorization

    — Make a table that maps which tiles each compute shader variant can cover. [builddispatchindirect.compute]

  • 116

    Solution: Lit Feature Categorization

    — We need a worst-case variant with all features to deal with tiles that cannot be categorized this way. [Lit.hlsl]

    — Result: one pixel shader -> many compute shaders. Less unnecessary conditional logic. Faster!

  • 117

    Lit Feature Categorization

    — Idea: Consider modifying the kFeatureVariantFlags table in Lit.hlsl so it categorizes features better for your project

  • Frame Seven

    118

  • Frame Seven

    119

    — Enable “Subsurface Scattering”

    — But we do not enable “Transmission” to simplify things

  • Frame Seven

    120

    — Ball on top has SSS material with Subsurface Mask value 0 (weak scattering)

    — Ball underneath has SSS material with Subsurface Mask value 1 (strong scattering)

  • Subsurface Scattering?

    121

    One pixel

    Direct Diffuse

  • Subsurface Scattering?

    122

    What about this?

  • Subsurface Scattering?

    123

    “Subsurface Scattering”

  • Subsurface Scattering?

    124

    What about this?

  • Subsurface Scattering?

    125

    “Transmission”

    HDRP supports it, but it is not in the scope of this talk

  • Subsurface Scattering (SSS)

    126

    Our diffusion behavior is encoded in a Diffusion Profile. It is just a scriptable object storing an SSS configuration.

    For SSS, this asset specifies how much light (of each color) is transferred from one place to another during the SSS pass.

    This is visualized by the “Diffusion Profile Preview”.

    For simplicity, let’s use the Pre-and-Post-Scatter Texturing mode.

  • Subsurface Scattering (SSS)

    127

    The “Diffusion Profile Preview” is radially symmetrical, and its value as a function of radius is governed by

    Burley’s Normalized Diffusion Model
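Burley's normalized diffusion profile is R(r) = s(e^(-sr) + e^(-sr/3)) / (8πr); "normalized" means it integrates to 1 over the plane, so the diffusion redistributes light energy without gaining or losing any. A quick numerical check in Python:

```python
import math

def burley_profile(r, s):
    # R(r) = s * (exp(-s*r) + exp(-s*r/3)) / (8 * pi * r)
    return s * (math.exp(-s * r) + math.exp(-s * r / 3.0)) / (8.0 * math.pi * r)

# Numerically integrate R over the plane: integral of R(r) * 2*pi*r dr.
s, dr = 2.0, 1e-4
total = 0.0
for i in range(200000):           # midpoint rule, out to r = 20
    r = dr * (i + 0.5)
    total += burley_profile(r, s) * 2.0 * math.pi * r * dr
```

The parameter s controls how tight the profile is, which is what the Diffusion Profile asset exposes per color channel.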

  • Subsurface Scattering (SSS)

    128

    In the G-Buffer pass, if the Lit shader is in Subsurface Scattering mode, it tags the stencil with the Split-Lighting flag, aka “I’m not only lit, I also want SSS!”.

  • Subsurface Scattering (SSS)

    129

    During Deferred lighting, for pixels tagged with stencil flag Split-Lighting:

    - Output its specular lighting to the color buffer, as usual (right)
    - Output its diffuse lighting to a separate SSS Diffuse Lighting Buffer (left)

  • Subsurface Scattering (SSS)

    130

    We not only output the diffuse lighting to a separate buffer, we also square-root its value (it looks whiter, less colorful).

    [SubsurfaceScattering.hlsl]

  • Subsurface Scattering (SSS)

    131

    We “transfer” lighting around the surface in screen space according to the diffusion profile.

    Then we multiply the transferred color by the square root of the diffuse lighting at the exit pixel.

    Entry lighting: “absorbed only half of the color”

    Transferred + exit lighting: “absorbed half + half = the full color”

    Transfer radius big

    Transfer radius tiny
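The half-and-half absorption can be sketched like this. All names here are made up for illustration; it is a sketch of the pre-and-post-scatter idea, not HDRP's shader.

```python
import math

def sss_pixel(entry_light, entry_tint, exit_tint, transfer_weight):
    """Split the color absorption into two square-root factors: one applied
    at the entry pixel (before the screen-space transfer), one at the exit
    pixel (after it). Hypothetical names, for illustration only."""
    pre = entry_light * math.sqrt(entry_tint)    # stored in the SSS diffuse buffer
    transferred = pre * transfer_weight          # diffusion-profile transfer
    return transferred * math.sqrt(exit_tint)    # applied at the exit pixel
```

When entry and exit land on the same pixel (a tiny transfer radius), sqrt(t) * sqrt(t) = t and we recover plain diffuse shading, which is why a Subsurface Mask of 0 looks like an ordinary Lit material.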

  • Subsurface Scattering (SSS)

    132

    combine pass

  • Subsurface Scattering (SSS)

    The most important part of SSS is the light transfer stage:

  • Subsurface Scattering (SSS)

    “Here is the code that, for each pixel, samples neighboring diffuse lighting to transfer from.” [SubsurfaceScattering.hlsl]

    The code is complex, with a lot of tricks such as clever LDS caching and importance sampling. Not in the scope of this talk.

  • Closing Remarks

    135

  • — This is only a small part of HDRP, we didn’t cover:

    - Motion vectors / TAA
    - Mesh decals
    - Area lighting
    - Volumetric lighting
    - Thick/Thin Transmission for the SSS shader
    - Other advanced lighting models (translucent, hair, fabric…)
    - RTX

    Closing Remarks

    136

  • — For more detail about HDRP, check out Sebastien Lagarde’s talk in SIGGRAPH 2018:

    The Road Toward Unified Rendering with Unity’s High Definition Render Pipeline

    — For more detail about SSS, check out Evgenii Golubev’s talk in SIGGRAPH 2018:

    Efficient Screen-Space Subsurface Scattering Using Burley’s Normalized Diffusion in Real Time

    Closing Remarks

    137

    — HDRP was slightly customized for this presentation to simplify things, so you may not reproduce what is shown here 100%

    — If there are any inaccuracies in the presentation, which is more than likely, please send corrections to us; for instance, one contact point is [email protected]

    Closing Remarks

    138


  • Q&A

    139