STEPHAN HODES
DEVELOPER TECHNOLOGY ENGINEER, AMD
INTRODUCING FIDELITYFX VARIABLE SHADING
AMD Public | Introducing FidelityFX Variable Shading | November 2020
INTRODUCTION TO VARIABLE RATE SHADING
• Variable Rate Shading (VRS) is a feature of DirectX® 12 Ultimate
• The goal of VRS is to save GPU work where it does not contribute significantly to the final frame
• Games today are usually played at very high resolutions
• Pixels are very small on screen
• Adjacent pixels often have similar colors (if they belong to the same primitive)
• Post-processing effects like anti-aliasing, depth of field, or motion blur further reduce the difference between adjacent pixels
THE CONCEPT OF VRS
• Without VRS, four pixel shader (PS) threads are generated for every quad in which at least one pixel is covered by a primitive
• For every pixel whose sample position is covered by the primitive, the PS result is written to the render target
• Example: 10 quads / 40 PS threads (27 active)
• With VRS, one or more pixels form a coarse pixel (2x2 in this example)
• Example: 4 quads / 16 PS threads (10 active)
• VRS only reduces shading quality within a triangle; geometry edges are preserved
• Make sure to use centroid interpolation!
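The quad and thread counts in the examples above come from a specific figure, but the counting rule itself can be sketched on the CPU. The following is a minimal model, not AMD's or the hardware's actual logic: it assumes quads are aligned to a regular grid, that a coarse pixel counts as active when any of its pixels is covered, and that the whole primitive uses one uniform rate. `PsWork` and `CountPsWork` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not part of any AMD or D3D12 API.
struct PsWork {
    uint32_t quads;          // 2x2 groups of (coarse) pixels that get launched
    uint32_t threads;        // PS threads launched (always 4 per quad)
    uint32_t activeThreads;  // threads whose (coarse) pixel has any coverage
};

// coverage[y][x] == '#' means the sample position of pixel (x, y) is covered.
// rate is the coarse pixel edge length in pixels: 1 (no VRS) or 2 (2x2 VRS).
PsWork CountPsWork(const std::vector<std::string>& coverage, uint32_t rate) {
    assert(rate == 1 || rate == 2);
    const uint32_t height = static_cast<uint32_t>(coverage.size());
    const uint32_t width = height ? static_cast<uint32_t>(coverage[0].size()) : 0;
    const uint32_t quadSize = 2 * rate;  // a quad spans 2x2 coarse pixels
    PsWork work{0, 0, 0};
    for (uint32_t qy = 0; qy < height; qy += quadSize) {
        for (uint32_t qx = 0; qx < width; qx += quadSize) {
            uint32_t activeInQuad = 0;
            // Visit the (up to) four coarse pixels of this quad.
            for (uint32_t cy = qy; cy < std::min(qy + quadSize, height); cy += rate) {
                for (uint32_t cx = qx; cx < std::min(qx + quadSize, width); cx += rate) {
                    bool covered = false;
                    for (uint32_t y = cy; y < std::min(cy + rate, height); ++y)
                        for (uint32_t x = cx; x < std::min(cx + rate, width); ++x)
                            covered = covered || coverage[y][x] == '#';
                    activeInQuad += covered ? 1 : 0;
                }
            }
            if (activeInQuad > 0) {  // a quad is launched if anything in it is covered
                work.quads += 1;
                work.threads += 4;
                work.activeThreads += activeInQuad;
            }
        }
    }
    return work;
}
```

Calling `CountPsWork(mask, 1)` and `CountPsWork(mask, 2)` on the same coverage mask gives a rough feel for how much PS work a 2x2 rate removes for a given triangle shape.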
[Figure: pixels, 2x2 coarse pixels, quads, pixel centers, and sample positions]
VRS ON RDNA2
• VRS has multiple ways to control the shading rate
• Per draw call (VRS Tier 1)
• Per primitive (VRS Tier 2, VS/GS output)
• Per screen tile (VRS Tier 2, image based)
• 8x8 pixel tile size
• Small tile size provides fine-grained control
• Additional shading rates are not supported
• At standard resolutions, 4x can hardly be used without generating visual artifacts
• Additional shading rates make image generation more complex
• The VRS image gets copied into H-tile on bind
• Small (but not negligible) overhead when binding the VRS image
• No overhead during rendering!
[Figure: VRS tiles showing pixels and coarse pixels at 2x2, 2x1, 1x2, and 1x1 shading]
DEMO
IMPLEMENTING VRS (INITIALIZATION)
o Query hardware details:
• Is VRS supported, and with which shading rates?
• Supporting the 4x4 shading rate makes the image generation shader more complex
• 4x4 is likely to cause visible quality degradation at common resolutions
• Is Tier 2 (shader or image-based VRS) supported?
• For image-based VRS: what is the tile size?
o Create the VRS image and generation shader
void OnCreate(Device *pDevice, ResourceViewHeaps *pResourceViewHeaps, DynamicBufferRing *pConstantBufferRing, StaticBufferPool *pStaticBufferPool, DXGI_FORMAT overlayOutputFormat);
void OnDestroy();
void OnCreateWindowSizeDependentResources(uint32_t w, uint32_t h);
void OnDestroyWindowSizeDependentResources();
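The hardware query from the initialization steps can be sketched as a small decision function. In D3D12 the inputs come from `ID3D12Device::CheckFeatureSupport` with `D3D12_FEATURE_D3D12_OPTIONS6` (fields `VariableShadingRateTier`, `AdditionalShadingRatesSupported`, `ShadingRateImageTileSize`). To keep the sketch portable, the code below uses a stand-in struct instead of the real D3D12 types; `VrsCaps`, `VrsSetup`, and `ChooseVrsSetup` are hypothetical names, not part of the sample or of D3D12.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the values reported in D3D12_FEATURE_DATA_D3D12_OPTIONS6.
// Hypothetical type for illustration only.
struct VrsCaps {
    int tier;                     // 0 = not supported, 1 = Tier 1, 2 = Tier 2
    bool additionalShadingRates;  // true if 2x4, 4x2, and 4x4 are available
    uint32_t imageTileSize;       // tile edge in pixels, 0 without image support
};

// What the application decides to enable (hypothetical type).
struct VrsSetup {
    bool usePerDrawRate;      // Tier 1: per-draw shading rate
    bool useVrsImage;         // Tier 2: image-based shading rate
    bool useAdditionalRates;  // whether the generation shader must handle 4x4
    uint32_t tileSize;        // tile size to compile the generation shader for
};

VrsSetup ChooseVrsSetup(const VrsCaps& caps, bool wantAdditionalRates) {
    assert(caps.tier >= 0 && caps.tier <= 2);
    VrsSetup setup{};
    setup.usePerDrawRate = caps.tier >= 1;
    setup.useVrsImage = caps.tier >= 2 && caps.imageTileSize > 0;
    // Only opt in to additional rates if the hardware reports them and the
    // application accepts the more complex image generation shader.
    setup.useAdditionalRates = wantAdditionalRates && caps.additionalShadingRates;
    setup.tileSize = setup.useVrsImage ? caps.imageTileSize : 0;
    return setup;
}
```

With capabilities matching what the deck describes for RDNA2 (Tier 2, 8x8 tiles, no additional shading rates), this sketch enables the VRS image with a tile size of 8 and leaves additional rates off regardless of what the application asked for.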
IMPLEMENTING VRS (RENDERING)
• Compute VRS image
• Bind VRS image
• Set base shading rate and combiners
• Unbind VRS image when done
• [Render VRS image as overlay for debugging]
[Figure: BaseShadingRate, VS/GS output, and VRS image combined through two combiners into the final rate]
void ComputeVrsMap(ID3D12GraphicsCommandList*, CBV_SRV_UAV*);
void SetShadingRate(D3D12_SHADING_RATE, const D3D12_SHADING_RATE_COMBINER*, ID3D12GraphicsCommandList*);
void StartVrsRendering(ID3D12GraphicsCommandList*);
void EndVrsRendering(ID3D12GraphicsCommandList*);
void DrawOverlay(ID3D12GraphicsCommandList*);
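How the base shading rate, the VS/GS output, and the VRS image turn into a final rate can be modeled in a few lines. In D3D12, the first combiner merges the per-draw base rate with the per-primitive rate, and the second merges that result with the image rate. The sketch below is a simplified model under stated assumptions: rates are kept as per-axis log2 values, the types are local stand-ins rather than the D3D12 enums, and the SUM combiner and any sanitizing of unsupported rates are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Per-axis rate in log2 form: 0 = 1x, 1 = 2x. Hypothetical stand-in type.
struct Rate {
    uint32_t log2X;
    uint32_t log2Y;
};

// Subset of the D3D12 combiner operations (SUM intentionally omitted).
enum class Combiner { Passthrough, Override, Min, Max };

// Combines input A (result so far) with input B (the newly introduced source).
Rate Combine(Combiner c, Rate a, Rate b) {
    switch (c) {
        case Combiner::Passthrough: return a;  // ignore the new source
        case Combiner::Override:    return b;  // take the new source as-is
        case Combiner::Min: return {std::min(a.log2X, b.log2X), std::min(a.log2Y, b.log2Y)};
        case Combiner::Max: return {std::max(a.log2X, b.log2X), std::max(a.log2Y, b.log2Y)};
    }
    return a;
}

// combiners[0]: base rate vs. per-primitive rate.
// combiners[1]: that result vs. the VRS image rate.
Rate FinalRate(Rate base, Rate perPrimitive, Rate image, const Combiner combiners[2]) {
    assert(combiners != nullptr);
    const Rate afterPrimitive = Combine(combiners[0], base, perPrimitive);
    return Combine(combiners[1], afterPrimitive, image);
}
```

In this model, a 1x1 base rate with both combiners set to passthrough yields 1x1 no matter what the image contains, which is one way to switch VRS off for a few draws without unbinding the image.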
FIDELITYFX VARIABLE SHADING (CPP)
struct FFX_Variable_Shading_CB {
    uint32_t width, height;
    uint32_t tileSize;
    float varianceCutoff;
    float motionFactor;
};
static void FFX_VariableShading_GetVrsImageResourceDesc(const uint32_t rtWidth, const uint32_t rtHeight, const uint32_t tileSize, CD3DX12_RESOURCE_DESC& VRSImageDesc);
static void FFX_VariableShading_GetDispatchInfo(const FFX_Variable_Shading_CB* cb, const bool useAditionalShadingRates, uint32_t& numThreadGroupsX, uint32_t& numThreadGroupsY);
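To give a feel for the numbers these helpers deal with, here is a rough sketch of the size arithmetic. It is not the FidelityFX implementation. It assumes one VRS image texel per tile, thread groups of 8x8 threads, one thread per 2x2 coarse pixel, and no additional shading rates, so each group covers 16x16 pixels. `VrsDispatch`, `DivideRoundingUp`, and `ComputeVrsDispatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Integer division that rounds up (hypothetical helper).
constexpr uint32_t DivideRoundingUp(uint32_t a, uint32_t b) {
    return (a + b - 1) / b;
}

struct VrsDispatch {
    uint32_t imageWidth;   // VRS image width in tiles (texels)
    uint32_t imageHeight;  // VRS image height in tiles (texels)
    uint32_t groupsX;      // thread groups to dispatch in X
    uint32_t groupsY;      // thread groups to dispatch in Y
};

// Assumption: 8x8 threads per group, each thread handling one 2x2 coarse
// pixel, so one group covers 16x16 pixels (4 tiles of 8x8 or 1 tile of 16x16).
VrsDispatch ComputeVrsDispatch(uint32_t rtWidth, uint32_t rtHeight, uint32_t tileSize) {
    assert(tileSize == 8 || tileSize == 16);
    const uint32_t pixelsPerGroupEdge = 8 * 2;
    VrsDispatch d{};
    d.imageWidth = DivideRoundingUp(rtWidth, tileSize);
    d.imageHeight = DivideRoundingUp(rtHeight, tileSize);
    // Cover the full tile-aligned area so every VRS image texel gets written.
    d.groupsX = DivideRoundingUp(d.imageWidth * tileSize, pixelsPerGroupEdge);
    d.groupsY = DivideRoundingUp(d.imageHeight * tileSize, pixelsPerGroupEdge);
    return d;
}
```

Under these assumptions, a 1920x1080 render target with 8x8 tiles needs a 240x135 VRS image and 120x68 thread groups.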
FIDELITYFX VARIABLE SHADING (HLSL)
// Define: FFX_VARIABLE_SHADING_TILESIZE
// Optional: FFX_VARIABLE_SHADING_ADDITIONALSHADINGRATES
// Constant buffer
cbuffer FFX_Variable_Shading_CB0 {
    int2 g_Resolution;
    uint g_TileSize;
    float g_VarianceCutoff;
    float g_MotionFactor;
}
// Forward declarations of functions that need to be implemented
// by shader code using this technique
float FFX_VariableShading_ReadLuminance(int2 pos);
float2 FFX_VariableShading_ReadMotionVec2D(int2 pos);
void FFX_VariableShading_WriteVrsImage(int2 pos, uint value);
FIDELITYFX VARIABLE SHADING (HLSL USAGE)
// Define FFX_VARIABLE_SHADING_TILESIZE on compile!
// May define FFX_VARIABLE_SHADING_ADDITIONALSHADINGRATES
RWTexture2D<uint> imgDestination : register(u0);
Texture2D texColor : register(t0);
Texture2D texVelocity : register(t1);
#define FFX_HLSL 1
#include "ffx_Variable_Shading.h"
float FFX_VariableShading_ReadLuminance(int2 pos) {
    float3 color = texColor[pos].xyz;
    return dot(color, float3(0.30, 0.59, 0.11));
}
float2 FFX_VariableShading_ReadMotionVec2D(int2 pos) {
    return texVelocity[pos].xy * float2(0.5f, -0.5f) * g_Resolution;
}
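For checking values on the CPU, the two callbacks above can be mirrored in plain C++. The luminance weights are the ones from the HLSL snippet. The motion conversion assumes that the velocity texture stores deltas in normalized device coordinates (x right, y up), which is inferred from the `float2(0.5f, -0.5f)` scale and is not stated in the deck. `Float2`, `Float3`, `Luminance`, and `MotionInPixels` are hypothetical names.

```cpp
#include <cassert>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Same weighting as the dot product in FFX_VariableShading_ReadLuminance.
float Luminance(Float3 color) {
    return color.x * 0.30f + color.y * 0.59f + color.z * 0.11f;
}

// Scales a velocity from NDC units (assumed: full screen spans 2 units, y up)
// to pixels (y down), matching velocity * float2(0.5, -0.5) * resolution.
Float2 MotionInPixels(Float2 ndcVelocity, int width, int height) {
    assert(width > 0 && height > 0);
    return {ndcVelocity.x * 0.5f * static_cast<float>(width),
            ndcVelocity.y * -0.5f * static_cast<float>(height)};
}
```

Under that assumption, an NDC velocity of (0.1, 0.1) at 1920x1080 corresponds to roughly 96 pixels to the right and 54 pixels up on screen.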
HOW THE SHADER WORKS
1. One threadgroup computes between 1 and 4 tiles
• Without additional shading rates and with an 8x8 tile size, each group of 8x8 threads computes the shading rate for 4 tiles
2. Analyze pairs of pixels within a 2x2 region
• Using LDS
• Each thread also takes pixels outside the 2x2 box into account. This avoids burn-in (i.e. a low VRS rate because it was low in the last frame)
• Reduce luminance delta by motion influence
3. Compute the largest luminance delta within the tile
• Using wave intrinsics
4. Compare to threshold
5. Write out the VRS image
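The steps above can be approximated on the CPU to see how a per-tile decision might fall out. This is a deliberately simplified sketch and not the FidelityFX shader: it runs serially instead of using LDS and wave intrinsics, ignores the motion term, and uses its own assumed rule for picking a rate (the largest horizontal and vertical luminance deltas are compared to the cutoff separately). `ShadingRate` and `ClassifyTile` are hypothetical names; the enum values match the corresponding `D3D12_SHADING_RATE` values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values match D3D12_SHADING_RATE for these four rates.
enum ShadingRate : uint32_t { Rate1x1 = 0x0, Rate1x2 = 0x1, Rate2x1 = 0x4, Rate2x2 = 0x5 };

// luma is a row-major width*height luminance image.
ShadingRate ClassifyTile(const std::vector<float>& luma, uint32_t width, uint32_t height,
                         uint32_t tileX, uint32_t tileY, uint32_t tileSize,
                         float varianceCutoff) {
    assert(luma.size() == static_cast<std::size_t>(width) * height);
    const uint32_t x0 = tileX * tileSize;
    const uint32_t y0 = tileY * tileSize;
    const uint32_t x1 = std::min(x0 + tileSize, width);
    const uint32_t y1 = std::min(y0 + tileSize, height);
    float maxDeltaX = 0.0f;  // largest difference between horizontal neighbors
    float maxDeltaY = 0.0f;  // largest difference between vertical neighbors
    for (uint32_t y = y0; y < y1; ++y) {
        for (uint32_t x = x0; x < x1; ++x) {
            const float center = luma[y * width + x];
            // Neighbors may lie just outside the tile, as long as they are on screen.
            if (x + 1 < width)
                maxDeltaX = std::max(maxDeltaX, std::fabs(center - luma[y * width + x + 1]));
            if (y + 1 < height)
                maxDeltaY = std::max(maxDeltaY, std::fabs(center - luma[(y + 1) * width + x]));
        }
    }
    const bool coarseX = maxDeltaX < varianceCutoff;  // safe to halve horizontal rate
    const bool coarseY = maxDeltaY < varianceCutoff;  // safe to halve vertical rate
    if (coarseX && coarseY) return Rate2x2;
    if (coarseX) return Rate2x1;
    if (coarseY) return Rate1x2;
    return Rate1x1;
}
```

A flat tile ends up at 2x2 in this model, a tile that only varies along one axis keeps full resolution along that axis, and a noisy tile stays at 1x1.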
[Figure: VRS tile, coarse pixel (1 thread), and the area analyzed by 1 thread]
VRS OVERLAY
• Display the VRS image as an overlay for debugging and tweaking
• Ready-to-use code in VrsOverlay.hlsl
• Easy to integrate:
1. Build a VS/PS pipeline and bind it
2. Provide a constant buffer containing resolution and tile size (same as for VRS image generation)
3. Draw a single triangle (no vertex or index buffers required)
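Drawing a single triangle without vertex or index buffers usually relies on deriving positions from the vertex index alone. The sketch below shows that common full-screen triangle trick in C++ for clarity. Whether VrsOverlay.hlsl uses exactly this formula is an assumption; `ClipPos` and `FullscreenTriangleVertex` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct ClipPos { float x, y; };

// Derives a clip-space position from the vertex index (0, 1, or 2) only,
// the way a vertex shader would from SV_VertexID.
ClipPos FullscreenTriangleVertex(uint32_t vertexId) {
    assert(vertexId < 3);
    const float u = static_cast<float>((vertexId << 1) & 2);  // 0, 2, 0
    const float v = static_cast<float>(vertexId & 2);         // 0, 0, 2
    // Map u, v in {0, 2} to clip space so the triangle covers [-1, 1] x [-1, 1].
    return {u * 2.0f - 1.0f, 1.0f - v * 2.0f};
}
```

The three resulting vertices are (-1, 1), (3, 1), and (-1, -3): one oversized triangle whose clipped area is exactly the screen, so a draw call with three vertices and no bound buffers is enough.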
CAVEATS
Since VRS works by reducing the number of PS executions:
• No benefit in depth/stencil-only passes
• No benefit in fill-rate-bound scenarios
• No benefit in compute passes
• Very little benefit if the average triangle size is very small (think of quad utilization)
[Figure: example of a 2x2 shading rate with pixels and shading regions marked; 10 active PS threads shade 28 pixels]
CAVEATS
Features that cause the shading rate to drop to 1x1:
• Depth export
• Post-depth coverage
• Raster Ordered Access Views
• 16xAA
Minimize the number of times per frame the VRS image gets bound or unbound!
o If VRS needs to be disabled for a few draw calls while the same depth buffer is in use (e.g. to render alpha-tested geometry), the best practice is to leave the VRS image bound and disable VRS by modifying the combiners.
TAKEAWAY
• Easy to integrate
• Free performance
• VRS preserves triangle edges
• Also depth/stencil information
• Experiments showed that a 2x2 shading rate is ideal for commonly used resolutions
DISCLAIMER
Gameplay footage captured from DIRT® 5 footage courtesy of Codemasters®. System configuration used: AMD Ryzen 9
3900X Twelve-Core Processor, Corsair Hydro Series, H100x, 240mm Radiator, Dual 120mm PWM Fans, Liquid CPU Cooler,
Gigabyte X570 Gaming X AMD AM4 X570 Chipset ATX Motherboard, Corsair 64GB Vengeance LPX DDR4 3200 MHz
RAM/Memory Kit 4x16GB, Corsair Force Series Gen.4 PCIe MP600 2TB NVME M.2 SSD w/ Heatsink, Samsung 860 Evo
1TB Solid State Drive/SSD, WD Blue 3TB 3.5" Desktop Hard Drive (HDD), EVGA SuperNOVA 850 G3 Power Supply,
Windows 10 OEM.
The information contained herein is for informational purposes only, and is subject to change without notice. While every
precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and
typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro
Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this
document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness
for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein.
No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and
limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or
in AMD's Standard Terms and Conditions of Sale.
© 2020 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Epyc, Radeon, RDNA, Ryzen, and
combinations thereof are trademarks of Advanced Micro Devices, Inc. DirectX and Xbox are registered trademarks of
Microsoft Corporation in the US and other jurisdictions. Other product names used in this publication are for identification
purposes only and may be trademarks of their respective companies.