gameworks.nvidia.com | GDC 2015
VR Direct: How NVIDIA Technology Is Improving the VR Experience
Nathan Reed, Developer Technology Engineer, NVIDIA
Dean Beeler, Software Engineer, Oculus
Who We Are
Nathan Reed: NVIDIA DevTech, 2 yrs. Previously: game graphics programmer at Sucker Punch
Dean Beeler: Oculus, 2 yrs. Previously: emulation, drivers, mobile dev, kernel
Hard Problems of VR
Headset design
Input
Rendering performance
Experience design
Latency
Motion to photons in ≤ 20 ms
Image credits: Scott W. Vincent, Franklin Heijnen
Stereo Rendering
Two eyes, same scene
What Is VR Direct?
Various NV hardware & software technologies
Targeted at VR rendering performance: reduce latency, accelerate stereo rendering
VR Direct Components
In this talk: Asynchronous Timewarp, VR SLI
Latency
Frame Queuing
Timewarp
Late-Latching Constants
Asynchronous Timewarp
Frame Queuing
[Diagram: CPU, CPU queue, GPU, and scanout timelines over time; frames N−1, N, and N+1 wait in the queue before the GPU renders them and they scan out]
Frame Queuing
[Diagram: the same CPU, GPU, and scanout timelines with no CPU queue row, so fewer frames (N−1, N, N+1) are in flight]
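To put rough numbers on why queued frames hurt, here is a minimal model. It is our own simplification, not something from the talk. It assumes a GPU-bound 90 Hz pipeline in which every queued frame adds one full refresh interval of delay. `queue_latency_ms`, `budget_left_ms`, and the 90 Hz rate are hypothetical choices made for illustration.

```python
REFRESH_HZ = 90.0   # hypothetical headset refresh rate for the example
BUDGET_MS = 20.0    # motion-to-photon target from the talk

def queue_latency_ms(queued_frames: int, refresh_hz: float = REFRESH_HZ) -> float:
    """Latency added by frames waiting in the CPU->GPU queue.

    Simplifying assumption: the GPU is the bottleneck, so every frame sitting
    in the queue adds one full refresh interval before a newly submitted
    frame reaches the GPU.
    """
    frame_interval_ms = 1000.0 / refresh_hz
    return queued_frames * frame_interval_ms

def budget_left_ms(queued_frames: int, refresh_hz: float = REFRESH_HZ) -> float:
    """What remains of the 20 ms budget for CPU, GPU, and scanout after queuing."""
    return BUDGET_MS - queue_latency_ms(queued_frames, refresh_hz)

for depth in (3, 2, 1):
    print(f"{depth} queued frame(s): +{queue_latency_ms(depth):.1f} ms queuing, "
          f"{budget_left_ms(depth):.1f} ms of budget left")
```

Under these assumptions, three queued frames alone overshoot the 20 ms target. A single queued frame still leaves roughly 9 ms for CPU, GPU, and scanout.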
Timewarp
Timewarp Pros & Cons
Very effective at reducing latency... of rotation! Fortunately, that’s the most important
Doesn’t help translation!
Doesn’t help other input latency
Doesn’t help if vsync is missed
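A rotation-only timewarp can be sketched as follows. The sketch assumes head orientations are unit quaternions (w, x, y, z) mapping view space to world space, and a right-handed view space looking down −Z. Each ray of the image about to be displayed is rotated by the delta between the latest pose and the pose used for rendering. That rotated ray is then looked up in the rendered image. The helper names are hypothetical; a real implementation does this on the GPU for every output pixel.

```python
import math

Quat = tuple[float, float, float, float]  # (w, x, y, z), assumed unit length
Vec3 = tuple[float, float, float]

def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_conj(q: Quat) -> Quat:
    w, x, y, z = q
    return (w, -x, -y, -z)  # inverse of a unit quaternion

def rotate(q: Quat, v: Vec3) -> Vec3:
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), quat_conj(q))
    return (x, y, z)

def yaw(degrees: float) -> Quat:
    half = math.radians(degrees) / 2
    return (math.cos(half), 0.0, math.sin(half), 0.0)  # rotation about +Y

def timewarp_ray(ray_now: Vec3, pose_at_render: Quat, pose_now: Quat) -> Vec3:
    """Map a view-space ray of the image being displayed into the view space
    the frame was rendered with. Only the rotation delta is used, so head
    translation since render time is not corrected."""
    delta = quat_mul(quat_conj(pose_at_render), pose_now)
    return rotate(delta, ray_now)

# Frame rendered facing straight ahead; head has since turned 10 degrees left.
ray = timewarp_ray((0.0, 0.0, -1.0), yaw(0.0), yaw(10.0))
print(ray)  # the center pixel now samples a point left of the old image center
```

Only the rotation delta enters the mapping, so head translation since render time has no effect on the result. That is the "doesn't help translation" limitation listed above.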
Timewarp Pipeline Bubbles
[Diagram: CPU, GPU, and scanout timelines; after frame N, the GPU waits to run timewarp just before vsync, leaving idle gaps before frame N+1]
Late-Latching Constants
[Diagram: CPU, GPU, and scanout timelines for frames N−1, N, and N+1, with a timewarp pass before vsync]
Late-Latching Constants
Update constants after render commands are queued
NO_OVERWRITE / persistently-mapped buffer
GPU sees latest data when it renders
Still doesn’t help with missed vsync
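The idea behind late latching can be modeled on the CPU. A conventional draw captures its constants when it is queued. A late-latched draw instead refers to mapped memory that the CPU keeps updating until the GPU actually executes it. The classes below are hypothetical stand-ins, not Direct3D or OpenGL types.

```python
from dataclasses import dataclass

@dataclass
class MappedConstants:
    """Stand-in for a persistently mapped constant buffer: the CPU writes it
    directly, and the GPU reads whatever it holds when the draw executes."""
    head_yaw_deg: float

@dataclass
class QueuedDraw:
    name: str
    latched: MappedConstants   # late-latched: a reference, read at execution time
    copied_yaw_deg: float      # conventional: value captured when the draw was queued

def queue_draw(queue: list, name: str, buf: MappedConstants) -> None:
    queue.append(QueuedDraw(name, buf, buf.head_yaw_deg))

def gpu_execute(queue: list) -> list:
    """Pretend GPU: reports the constant each path would see at execution."""
    return [(d.name, d.copied_yaw_deg, d.latched.head_yaw_deg) for d in queue]

constants = MappedConstants(head_yaw_deg=10.0)  # pose when commands were recorded
queue: list = []
queue_draw(queue, "scene", constants)
constants.head_yaw_deg = 12.5                    # newer pose written after queuing
print(gpu_execute(queue))  # [('scene', 10.0, 12.5)]
```

In a real renderer the CPU write and the GPU read can race. An implementation therefore needs some way to publish a complete pose atomically, for example by writing a fresh slot and then updating an index. This sketch ignores that.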
Asynchronous Timewarp
[Diagram: CPU, GPU, and scanout timelines; timewarp runs before every vsync, even while frame N+1 is still rendering]
Space vs Time
[Diagram: GPU resources (space) on one axis, time on the other]
Space-Multiplexing
[Diagram: GPU resources (space) vs time; main rendering and timewarp run side by side on separate portions of the GPU, with vsyncs marked]
Time-Multiplexing
[Diagram: GPU resources (space) vs time; main rendering uses the whole GPU, and timewarp briefly takes over the entire GPU before each vsync]
Async Timewarp Pros & Cons
Prevents worst case: stuck image on headset
Patches up occasional stutters
Doesn’t help translation
Doesn’t help other input latency
Doesn’t help animation stuttering due to low FPS
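The safety-net behavior can be illustrated with a small, hypothetical simulation. The render thread produces frames as fast as it manages, and just before each vsync the warp thread re-projects the newest finished frame with the latest head pose. The 90 Hz interval, the 1 ms warp cost, and the frame timings are assumptions for the example.

```python
def simulate_async_timewarp(render_ms, vsync_ms=1000 / 90, warp_ms=1.0):
    """For each vsync, return the index of the rendered frame that the warp
    thread re-projects with the latest head pose (None if none is ready).

    Hypothetical model: frame i is meant for vsync i+1, so the render thread
    starts it at vsync i or when frame i-1 finishes, whichever is later. The
    warp thread wakes warp_ms before each vsync and uses the newest finished
    frame.
    """
    finish = []
    prev_finish = 0.0
    for i, cost in enumerate(render_ms):
        start = max(prev_finish, i * vsync_ms)
        prev_finish = start + cost
        finish.append(prev_finish)

    shown = []
    last_vsync = int(finish[-1] // vsync_ms) + 1
    for k in range(1, last_vsync + 1):
        warp_start = k * vsync_ms - warp_ms  # warp thread wakes just before vsync k
        ready = [i for i, t in enumerate(finish) if t <= warp_start]
        shown.append(ready[-1] if ready else None)
    return shown

shown = simulate_async_timewarp([9, 9, 16, 9, 9])
print(shown)  # [0, 1, 1, 2, 3, 4]: frame 1 is re-warped at vsync 3
```

The 16 ms frame misses its vsync, so frame 1 is shown twice. Each showing is re-warped, so head rotation stays responsive. Its content still repeats, though, and that repeat is the animation stutter async timewarp cannot hide.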
High-Priority Context
NV driver supports high-priority graphics context
Time-multiplexed: takes over entire GPU
Main rendering → normal context
Timewarp rendering → high-pri context
Async Timewarp With High-Pri Context
[Diagram: render thread, warp thread, and GPU timelines; before each vsync the warp thread's work preempts rendering of frame N or N+1 on the GPU]
Preemption
Fermi, Kepler, Maxwell: draw-level preemption
Can only switch at draw call boundaries! Long draw will delay context switch
Future GPU: finer-grained preemption
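The cost of draw-level preemption can be estimated with a few lines of Python. The sketch assumes draws execute back to back and that the context switch waits for the next draw call boundary. The workload numbers are hypothetical.

```python
import bisect
from itertools import accumulate

def preemption_wait_ms(draw_ms, request_ms):
    """Time a high-priority context waits for the GPU under draw-level
    preemption: the switch can only happen at the next draw call boundary.

    draw_ms: durations of draws executing back to back from t = 0
    (a hypothetical workload).
    """
    boundaries = [0.0, *accumulate(draw_ms)]
    i = bisect.bisect_left(boundaries, request_ms)
    if i == len(boundaries):
        return 0.0  # request arrives after the last draw: GPU is already idle
    return boundaries[i] - request_ms

scene = [0.5] * 10 + [4.0]  # ten short draws, then one 4 ms full-screen pass
print(preemption_wait_ms(scene, 2.2))  # lands among the short draws: short wait
print(preemption_wait_ms(scene, 5.5))  # lands inside the long draw: 3.5 ms wait
```

A request that arrives partway through the 4 ms pass waits 3.5 ms. That is enough to push a timewarp scheduled shortly before vsync past it.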
Direct3D High-Priority Context
NvAPI_D3D1x_HintCreateLowLatencyDevice()
Applies to next D3D device created
Fermi, Kepler, Maxwell / Windows Vista+
NDA developer driver available now
OpenGL High-Priority Context
EGL_IMG_context_priority
Adds priority attribute to eglCreateContext
Available on Tegra K1, X1, including SHIELD console
Only for EGL (Android) at present; WGL (Windows), GLX (Linux) to come
Developer Guidance
Still try to render at headset native framerate!
Async timewarp is a safety net: hide occasional hitches / perf drops
Not for upsampling framerate
Developer Guidance
Avoid long draw calls: current GPUs only preempt at draw call boundaries, so async timewarp can get stuck behind long draws
Split up draws that take >1 ms or so, e.g. heavy postprocessing: split into screen-space tiles
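Splitting a heavy full-screen pass into tiles can be sketched like this. The sketch assumes the pass cost is proportional to pixel count. It also assumes each tile is issued as its own draw, for instance with a matching viewport or scissor rectangle. `split_into_tiles` and the 1 ms default are illustrative, not part of any NVIDIA API.

```python
import math

def split_into_tiles(width, height, est_pass_ms, max_draw_ms=1.0):
    """Split a full-screen pass into a grid of screen-space tiles, one draw
    call each, so that no single draw is expected to exceed max_draw_ms.

    Assumption: the pass cost scales with pixel count, so a tile costs about
    est_pass_ms times its share of the screen. Returns (x, y, w, h) rectangles
    that could serve as the viewport/scissor rect for each draw.
    """
    min_tiles = max(1, math.ceil(est_pass_ms / max_draw_ms))
    cols = math.ceil(math.sqrt(min_tiles))
    rows = math.ceil(min_tiles / cols)
    tiles = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles

# Hypothetical 3.5 ms postprocess on a 2048x1024 target -> 2x2 grid of tiles.
for rect in split_into_tiles(2048, 1024, est_pass_ms=3.5):
    print(rect)
```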
Future Work
Translation warping: using depth buffer, layered images, etc.
Motion extrapolation: using velocity buffer
G-SYNC: tricky with low-persistence display
Latency TL;DR
Reduce queued frames to 1
Timewarp: adjusts rendered image for late head rotation
Async timewarp: safety net for missed vsync
NVIDIA enables async timewarp via high-pri context
Stereo Rendering
Multiview Rendering
VR SLI
Frame Pipeline
Which stages must be done twice for stereo?
CPU: find visible objects, submit render commands, driver internal work
GPU: transform geometry, rasterization, shading
Flexibility vs Optimizability
More flexible: all stages separate
[Diagram: separate left and right pipelines]
Flexibility vs Optimizability
More optimizable: some stages shared
[Diagram: left and right pipelines with some stages shared]
Stereo Views
Almost the same visible objects
Almost the same render commands
Almost the same driver internal work
Almost the same geometry rendered
Other Multi-View Scenarios
Cubemaps: 6 faces
Shadow maps: several lights in one scene; slices of a cascaded shadow map
Light probes for GI: many probe positions in one scene
Multiview Rendering
Submit scene render commands once
All draws, states, etc. broadcast to all views
API support for limited per-view state
Saves CPU rendering cost
Maybe GPU too, depending on impl!
Shader Multiview
[Diagram: API → two full pipelines (VS, Tess & GS, Rast, PS), one with ViewID = 0 and one with ViewID = 1]
Hardware Multiview
[Diagram: API → VS, Tess & GS (once) → two Rast/PS paths, using ViewMatrix[0] and ViewMatrix[1]]
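The difference between the two multiview styles above can be modeled by counting how often the geometry stages run. In this hypothetical sketch, `vertex_stage` stands in for the view-independent VS/Tess/GS work. Shader multiview repeats that work per ViewID. Hardware multiview runs it once and applies ViewMatrix[i] afterward. The matrices and the ±0.032 eye offsets are made-up example values.

```python
Mat4 = list[list[float]]
Vec4 = tuple[float, ...]

def mul(m: Mat4, v: Vec4) -> Vec4:
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def translate(tx: float, ty: float = 0.0, tz: float = 0.0) -> Mat4:
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def vertex_stage(model: Mat4, v: Vec4) -> Vec4:
    # Stand-in for view-independent VS/Tess/GS work (skinning, model transform...).
    return mul(model, v)

def shader_multiview(verts, model, view_mats):
    """Geometry stages run once per view; the shader selects its matrix by ViewID."""
    out, runs = [], 0
    for view_id, view in enumerate(view_mats):
        for v in verts:
            runs += 1
            out.append((view_id, mul(view, vertex_stage(model, v))))
    return out, runs

def hardware_multiview(verts, model, view_mats):
    """Geometry stages run once; a per-view matrix (ViewMatrix[i]) is applied after."""
    shared = [vertex_stage(model, v) for v in verts]
    out = [(view_id, mul(view, p)) for view_id, view in enumerate(view_mats) for p in shared]
    return out, len(verts)

triangle = [(0.0, 0.0, -2.0, 1.0), (1.0, 0.0, -2.0, 1.0), (0.0, 1.0, -2.0, 1.0)]
eyes = [translate(+0.032), translate(-0.032)]  # hypothetical per-eye view offsets
a, shader_runs = shader_multiview(triangle, translate(0.0, 0.0, -1.0), eyes)
b, hw_runs = hardware_multiview(triangle, translate(0.0, 0.0, -1.0), eyes)
print(a == b, shader_runs, hw_runs)  # True 6 3
```

Both paths produce identical output. The shader path, however, runs the geometry work once per vertex per view (6 times for 3 vertices and 2 views), versus once per vertex (3 times).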
Shading Reuse
[Diagram: API → VS, Tess & GS → two Rast/PS paths that share work]
VR SLI
[Diagram: the API sends one shared command stream to both the left and right paths]
Interlude: AFR SLI
[Diagram: CPU, GPU0, GPU1, and scanout timelines; GPU0 renders frames N−2, N, N+2 while GPU1 renders alternate frames N−1, N+1, N+3]
VR SLI
[Diagram: CPU, GPU0, GPU1, and scanout timelines; for each frame (N−2, N, N+1), GPU0 renders the left eye and GPU1 the right eye]
VR SLI
Same resources & commands
[Diagram: GPU 0 memory and GPU 1 memory]
VR SLI
Per-GPU state: constant buffers, viewports
[Diagram: API and engine feeding both the left (L) and right (R) GPUs]
VR SLI
Blit GPU1→GPU0 over PCIe bus
[Diagram: the right-eye image (R) from GPU1 is copied next to the left-eye image (L) on GPU0]
VR SLI Scaling
View-independent work (e.g. shadow maps) is duplicated
Scaling depends on proportion of view-dependent work
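Here is a back-of-the-envelope model of this scaling. It assumes a GPU-bound frame where S is view-independent work (such as shadow maps) and V is the per-eye view-dependent work, and it ignores the cross-GPU blit. The function name and the timings are hypothetical.

```python
def vr_sli_speedup(view_independent_ms: float, per_eye_ms: float) -> float:
    """Ideal VR SLI speedup over one GPU rendering both eyes.

    One GPU:  S + 2V  (shared work once, then both eyes)
    VR SLI:   S + V   (each GPU repeats the shared work and renders one eye)
    Cross-GPU blit and other overheads are ignored in this sketch.
    """
    single_gpu = view_independent_ms + 2 * per_eye_ms
    per_gpu = view_independent_ms + per_eye_ms
    return single_gpu / per_gpu

for shared in (0.0, 2.0, 4.0, 8.0):
    print(f"S = {shared} ms, V = 4 ms -> {vr_sli_speedup(shared, 4.0):.2f}x")
```

The larger the view-independent share of the frame, the further the speedup falls below 2x.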
Cross-GPU Blit
Blitting between GPUs uses PCIe bus
PCIe 2.0 x16: ~8 GB/sec = ~1 ms / eye view
PCIe 3.0 x16: ~16 GB/sec = ~0.5 ms / eye view
Dedicated copy engine: non-dependent rendering can continue during blit
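The per-eye estimates above come from dividing the eye buffer size by the link bandwidth. The sketch below assumes a hypothetical 2048x1024 RGBA8 eye buffer and a link that sustains its nominal rate for the whole copy.

```python
def blit_ms(width: int, height: int, bytes_per_pixel: int, gb_per_s: float) -> float:
    """Time to copy one eye's render target across PCIe, assuming the link
    sustains gb_per_s (decimal GB) for the whole transfer."""
    return width * height * bytes_per_pixel / (gb_per_s * 1e9) * 1e3

# Hypothetical eye buffer: 2048 x 1024, RGBA8 (4 bytes/pixel), about 8.4 MB.
for link, bw in (("PCIe 2.0 x16", 8.0), ("PCIe 3.0 x16", 16.0)):
    print(f"{link}: {blit_ms(2048, 1024, 4, bw):.2f} ms per eye")
```

This gives about 1.05 ms and 0.52 ms, in line with the slide's rough figures.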
Distortion vs SLI
Distortion before or after cross-GPU blit?
After: lower latency; future-compatible with Oculus SDK updates
Before: distortion uses both GPUs; 40% less data to transfer
API Availability
Currently D3D11 only
Fermi, Kepler, Maxwell / Windows 7+
Developer driver available now
OpenGL and other APIs: to come
Developer Guidance
Teach your engine the concept of a “multiview set”
Related views that will be rendered together
Currently:
for (each view)
    find_objects();
    for (each object)
        update_constants();
        render();
Developer Guidance
Multiview:
find_objects();
for (each object)
    for (each view)
        update_constants();
        render();
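The two loop structures above can be made runnable with stub engine functions that only count how often each step runs. `find_objects`, `update_constants`, and `render` are hypothetical placeholders for engine code, not a real API.

```python
from collections import Counter

calls = Counter()  # how often each engine step runs

def find_objects(views):
    """Hypothetical culling stub; a real engine would test the scene against
    every view in `views` and return the union of visible objects."""
    calls["find_objects"] += 1
    return ["terrain", "tree", "house"]

def update_constants(obj, view):
    calls["update_constants"] += 1   # e.g. write per-view transforms

def render(obj, view):
    calls["render"] += 1             # e.g. issue the draw call

def render_per_view(views):
    # Current structure: the whole scene traversal repeats for every view.
    for view in views:
        for obj in find_objects([view]):
            update_constants(obj, view)
            render(obj, view)

def render_multiview_set(views):
    # Multiview structure: cull once for the set, then loop over views
    # innermost so any per-object setup can be shared across the views.
    for obj in find_objects(views):
        for view in views:
            update_constants(obj, view)
            render(obj, view)

render_per_view(["left", "right"])
print(dict(calls))   # culling ran twice
calls.clear()
render_multiview_set(["left", "right"])
print(dict(calls))   # culling ran once
```

Both versions issue the same six per-object, per-view updates and draws. The multiview version, however, culls once for the whole set instead of once per view.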
Developer Guidance
Keep track of which render targets store stereo data
May need to be marked or set up specially, or allocated as a texture array, etc.
Keep track of sync points: where you need all views finished before continuing
May need to blit between GPUs
Stereo Rendering TL;DR
Multiview: submit scene once, save CPU overhead
Requires some engine integration
Range of possible implementations: trade off flexibility vs optimizability
VR SLI: a GPU per eye
VR Direct Recap
Variety of VR-related APIs coming in near future
Reduce latency: reduced frame queuing; enable async timewarp & other improvements
Accelerate stereo rendering: multiview APIs, VR SLI
VR Direct API Availability
Fermi, Kepler, Maxwell
D3D11: context priorities and VR SLI; NDA developer driver available now
Android: EGL_IMG_context_priority
Other APIs/platforms: to come
What Next?
All this stuff is hot out of the oven!
Will need more iterations before it settles: see what works, revise APIs as needed, consolidate & standardize across industry
Questions & Comments?
Email us: [email protected] [email protected]
Slides will be posted: https://developer.nvidia.com/gdc-2015