35
© ARM 2017 Vulkan Multipass mobile deferred done right Hans-Kristian Arntzen GDC 2017 Senior Engineer 02 / 28 / 2017

Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

Title 44pt sentence case

Affiliations 24pt sentence case

20pt sentence case

© ARM 2017

Vulkan Multipass mobile deferred done right

Hans-Kristian Arntzen

GDC 2017

Senior Engineer

02 / 28 / 2017

Page 2: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 2

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Content

What is multipass?

What multipass allows ...

A driver to do versus MRT

Developers to do

Transient images and lazy memory

Case studies

Baseline app

Sponza

«Lofoten» demo

Page 3: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 3

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

What is Multipass?

Renderpasses can have multiple subpasses

Subpasses can have dependencies between each other

Render pass graphs

Subpasses refer to subset of attachments

Page 4: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 4

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Improved MRT deferred shading in Vulkan Classic deferred has two render passes

G-Buffer pass, render to ~4 textures

Lighting pass, read from G-Buffer, accumulate light

Lighting pass only reads G-Buffer at gl_FragCoord

Rethinking this in terms of Vulkan multipass

Two subpasses

Dependencies

COLOR | DEPTH -> INPUT_ATTACHMENT | COLOR | DEPTH_READ

VK_DEPENDENCY_BY_REGION_BIT

Page 5: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 5

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Tile-based GPUs 101

Tile-based GPUs batch up and bin all primitives in a render pass to tiles

In fragment processing later, render one tile at a time

Hardware knows all primitives which cover a tile

Advantage, framebuffer is now in fast and small SRAM!

Having framebuffer in on-chip SRAM has practical benefits

Read/write to it is cheap, no external bandwidth cost

Main memory is written to when tile is complete

DDR

GPU Vertex

Shader

Vertex

Attributes

Tiler

Geometry Working Set

(Tile List and Varyings) Framebuffer Textures

Local

Tile Memory

Fragment

Shader

Page 6: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 6

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Tile-based GPU subpass fusing Subpass information is known ahead of time

VkRenderPass

Driver can find two or more sub-passes which have …

BY_REGION dependencies

no external side effects which might prevent fusing

Fuse G-Buffer and Lighting passes

Combine draw calls from G-Buffer and Lighting into one “render pass”

G-Buffer content can remain in on-chip SRAM

Reading G-Buffer data in lighting pass just needs to read tile buffer

vkCmdNextSubpass essentially becomes a noop

Page 7: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 7

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Vulkan GLSL subpassLoad()

Reading from input attachments in Vulkan is special

Special image type in SPIR-V

On vkCreateGraphicsPipelines we know

renderPass

subpassIndex

subpassLoad() either becomes

texelFetch()-like if subpasses were not fused

This is why we need VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT

magicReadFromTilebuffer() if subpasses were fused

Compiler knows ahead of time

No last-minute shader patching required

Page 8: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 8

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Transient attachments After the lighting pass, G-Buffer data is not needed anymore

G-Buffer data only needs to live on the on-chip SRAM

Clear on render pass begin, no need to read from main memory

storeOp is DONT_CARE, so never actually written out to main memory

Vulkan exposes lazily allocated memory

imageUsage = VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT

memoryProperty = VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT

On tilers, no need to back these images with physical memory

Page 9: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 9

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Multipass benefits everyone Deferred paths essentially same for mobile and desktop

Same Vulkan code (*)

Same shader code (*)

VkRenderPass contains all information it needs

Desktop can enjoy more informed scheduling decisions

Latest desktop GPU iterations seem to be moving towards tile-based

At worst, it’s just classic MRT

(*) Minor tweaking to G-Buffer layout may apply

Page 10: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 11: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 11

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Baseline test

Basic multipass sample

One renderpass

Light on geometry

~8 large lights

Simple shading

Benchmark

Multipass (subpass fusing)

MRT

Overall performance

Bandwidth

Page 12: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 13: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 13

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Baseline test data

Measured on Galaxy S7 (Exynos)

4096x2048 resolution

Hit V-Sync at native 1440p

~30% FPS improvement

~80% bandwidth reduction

Only using albedo and normals

Saving bandwidth is vital for mobile GPUs

Multipass

MRT

Page 14: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 14

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Sponza test

Mid-high complexity geometry for mobile

~200K tris drawn / frame

~610 spot lights

Full PBR

Shadowmaps on some of the lights

8 reflection probes

Directional light w/ shadows

Full HDR pipeline

Adaptive tonemapping

Bloom

Page 15: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 16: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 17: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 18: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 19: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 19

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Stencil culling

Update stencil state

Render light with front face culling

Update stencil state

Render light with back face culling

Page 20: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 20

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Clustered stencil culling

Classic method of per light stencil culling involves a lot of state toggling

Can be expensive even on Vulkan

Usually performs poorly on GPU

Instead, cluster local lights along camera Z axis

Each light is assigned to a cluster using conservative depth

Stencil is written for all local lights in a single pass

Peel depth for an extra 2-bit “depth buffer” in stencil

Lights are finally rendered using stencil with double sided depth test

Page 21: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 22: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 22

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Sponza test data

Far, far heavier than sensible mobile content

1440p (native)

Overkill

~50-60% bandwidth reduction

~18% FPS increase

Multipass

MRT

Page 23: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 24: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 24

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

«Lofoten»

City scene with ~2.5 million primitives

Relies heavily on CPU based culling techniques to reduce geometry load

100 spot lights and point lights w/shadow maps

Reflection probes

Sun light with cascaded shadow maps

Atmospheric scattering

Ocean simulation

FFT running GPU compute

Separate refraction pass

Screen space reflections

Bloom post-processing w/adaptive luminance tone mapping

Page 25: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 25

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Not your typical mobile graphics pipeline

FFT update Render

shadow depth G-buffer pass Lighting pass

Bloom threshold

Compute average

luminance Multi-rate blur

Tonemap/ composite

Page 26: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 26

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Implementation details

G-buffer pass

On tile-based GPUs, fill rate != bandwidth

Emissive/forward materials written directly to light buffer

Stencil set to mark reflection influence

Depth peeling for clustered stencil culling

Lighting pass

Lighting accumulated using additive blending to lighting attachment

Clustered stencil culling used for local lights

Transparent objects

Fog applied after shading is complete

Only commits light buffer to memory

Page 27: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 27

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Render passes

Early decision to also make the high level interface explicit in terms of defining

render passes

Possible to back-port to OpenGL etc

BeginRenderPass()

NextSubPass()

EndRenderPass()

Page 28: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 28

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Integration for transient and lazy images Add support for “virtual” attachments

Keep a pool of images allocated with TRANSIENT_ATTACHMENT_BIT

Actual image handles not visible to the API user

multipass.virtualAttachments = { RGBA8, RGBA8, RGB10A2 }; mutipass.numSubpasses = 2; multipass.subpass[0].colorTargets = { RT0, VIRTUAL0, VIRTUAL1, VIRTUAL2 }; multipass.subpass[1].colorTargets = { RT0 }; multipass.subpass[1].inputs = { VIRTUAL0, VIRTUAL1, VIRTUAL2, DEPTH }; renderpass.multipass = &multipass; BeginRenderPass(&renderpass);

Page 29: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 29

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Multipass – virtual attachments

// Render G-buffer

pCB->BeginRenderPass(pCommandBuffer,

KINSO_RENDERPASS_LOAD_CLEAR_ALL | KINSO_RENDERPASS_STORE_COLOR,

Vec4(0.0f), 1.0f, 0, &m_Multipass);

DrawGeometry();

// Increment to the next subpass.

pCB->NextSubpass();

// Render lights additively.

DrawLightGeometry();

// Finally apply environment effects

ApplyEnv();

pCB->EndRenderPass();

Page 30: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 30

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

«Lofoten» test data

Even heavier than Sponza

1440p (native)

Overkill

~50% bandwidth save

~12% FPS increase

~25% GPU energy save!

Multipass

MRT

Page 31: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines
Page 32: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 33

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Performance considerations for tilers With on-chip SRAM, per-pixel buffer size is limited

G-Buffer size is therefore limited

On current Mali hardware, 128 bits color targets per pixel

May vary between GPUs and vendors

Smaller tiles may allow for larger G-Buffer

At a quite large performance penalty

Fewer threads active, worse occupancy on shader core

Need to scan through tile list more

Page 33: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 34

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Engine integration for multiple APIs Render pass concept in engine is a must

Cannot express multiple subpasses otherwise

subpassLoad() is unique to Vulkan GLSL

Solution #1, Vulkan GLSL is main shading language

SPIRV-Cross can remap subpassLoad() to MRT-style texelFetch

Solution #2, HLSL or similar

Make your own “intrinsic” which emits subpassLoad in Vulkan

Unroll multipass to multiple passes in other APIs

Change render targets on NextSubpass()

Bind input attachments for subpass to texture units

Statically remap input_attachment_index -> texture unit in shader

Page 34: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

© ARM 2017 35

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Handling image layouts G-Buffer images are by design only used temporarily

No need to track layouts

Application should not have direct access to these images!

Can use external subpass dependencies for transition

VkSubpassDependency dep = {}; dep.srcSubpass = VK_SUBPASS_EXTERNAL; dep.dstSubpass = 0; attachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED; reference.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; dep.srcAccessMask = COLOR_ATTACHMENT_WRITE; dep.dstAccessMask = COLOR_ATTACHMENT_READ | WRITE; dep.srcStageMask = COLOR_ATTACHMENT_OUTPUT; dep.dstStageMask = COLOR_ATTACHMENT_OUTPUT;

Page 35: Vulkan Multipass Mobile Deferred Done Right · 2017. 4. 27. · Vulkan GLSL subpassLoad() Reading from input attachments in Vulkan is special Special image type in SPIR-V On vkCreateGraphicsPipelines

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited

(or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be

trademarks of their respective owners.

Copyright © 2017 ARM Limited

© ARM 2017

Questions?