49
Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011

Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

  • Upload
    others

  • View
    12

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Real-Time Realistic Rendering

Michael DoggettDocent

Department of Computer ScienceLund university

30-5-2011

Page 2: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Visually realistic goal “…force[d] us to completely rethink the entire rendering process.”Cook et al., 1987The Reyes Image Rendering Architecture

Slide courtesy Jonathan Ragan-KelleyImage courtesy and copyright Pixar

Page 3: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Outline

• GPU architecture

• 2001-2009 ATI/AMD - Boston, U.S.A.

• XBOX360, Radeon 2xxx-6xxx

• Decoupled Sampling

• Analytical Motion Blur

Page 4: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Graphics Group

• Tomas Akenine-Möller• Michael Doggett• Lennart Ohlsson• Magnus Andersson• Rasmus Barringer

• Per Ganestam• Carl Johan Gribel• Björn Johnsson• Jim Rasmusson• Philip Buchanan

Page 5: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Shared L3 cache

Core

cachecache cache cache

cachecache cache cache

Core Core Core

L2 L2L2L2CPU

GPU

Page 6: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

128 fragments in parallel

6

16 cores = 128 ALUs , 16 simultaneous instruction streams Slide courtesy Kayvon Fatahalian

Page 7: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

128 [ ] in parallel

7

vertices/fragmentsprimitives

OpenCL work itemsCUDA threads

fragments

vertices

primitives

Slide courtesy Kayvon Fatahalian

Page 8: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

GPU design parameters

• Competition

• Currently 2 strong competitors

• AMD (ATI) and nVidia

• Performance/Dollar

• Moore’s law

• Number of transistors on a chip doubles every two years

Page 9: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

GPU design parameters

• RTL design

• nVidia ALUs from DX10 on are full custom

• Architecture changes hidden by API

• Fixed function uses programmable hardware

• Many custom units

• Backwards compatible

Page 10: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Before programmable GPUs

• ATI

• Founded 1985 started

• nVidia

• Founded 1993

• DirectX 6

Texturing

Rasterization

FrameBuffer

Z & Alpha

Textures

Image

Page 11: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Hardware Transform, Clipping and Lighting

• Transformation requires 32bit float 4x4 matrix multiplication

• Texturing for 4 component (RGBA) 8bit pixels

• Low precision math

• DirectX 7

• NV10 ’99 (Nvidia GeForce 256)

• R100 ’00 (ATI Radeon 7500)

Texturing

Transform

Rasterization

FrameBuffer

Z & Alpha

Vertices

Textures

Image

Page 12: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

TexturingPixel shader

TransformVertex shader

Programmable GPUs1st Generation

Rasterization

FrameBuffer

Z & Alpha

Vertices

Shaders

Textures

Shaders

Image

•Shaders run programs that do what fixed function hardware did

•Makes GPUs simpler than CPUs

Page 13: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

TexturingPixel shader

TransformVertex shader

Programmable GPUs1st Generation

Rasterization

FrameBuffer

Z & Alpha

Vertices

Shaders

Textures

Shaders

Image

• DirectX8

• Multiple versions of Pixel shaders, 1.1, 1.3, 1.4

• 13-22 instructions

• assembler language

Page 14: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

TexturingPixel shader

TransformVertex shader

Programmable GPUs1st Generation

Rasterization

FrameBuffer

Z & Alpha

Vertices

Shaders

Textures

Shaders

Image

• NV20 ’01

• Nvidia GeForce 3

• [Lindholm01]

• R200 ’01

• ATI Radeon 8500

• 2-wide Vertex shader

• 4-wide SIMD Pixel shader

• Fixed point ~16bits

Page 15: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

TexturingPixel shader

TransformVertex shader

Programmable GPUs2nd Generation

Rasterization

FrameBuffer

Z & Alpha

Vertices

Shaders

Textures

Shaders

Image

• R300 ’02

• ATI Radeon 9700

• 4-wide Vertex shader

• 8-wide SIMD Pixel shader

• 24bit floating point math

Page 16: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

TexturingPixel shader

TransformVertex shader

Programmable GPUs2nd Generation

Rasterization

FrameBuffer

Z & Alpha

Vertices

Shaders

Textures

Shaders

Image

• NV30 ’03

• Nvidia GeForce FX 5800

• 16 and 32 bit float

• Cg (C for graphics)

• NV40 ’04 GeForce 6800 [Montrym05]

• DirectX 9

• High Level Shading Language (HLSL)

Page 17: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

17

Unified Shader

XBox360 GPU architecture - ’05

Vertex Pipeline

Pixel Pipeline

Display PixelsRasterizer

Index Stream Generator

Tessellator

OutputBuffer

GPU

DAUGHTERDIE

Memory Export

UNIFIEDMEMORY

Texture/Vertex Fetch

Transform

Texturing

VertexShaders

PixelShaders

Page 18: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

ATI Radeon 4870(R770) Die

• 2008

• 260mm2

• 956 MTransistors

Page 19: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

ATI Radeon 4870(R770) Die

• 260mm2

• 956 MTransistors

• Red

• 10 SIMDs

• Orange

• 64 z/stencil

• 40 texture

Page 20: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Multi-Graphics coreNVIDIA Fermi

• GeForce GTX 480 (GF100)

• Released March, 2010

• Shader Load/Store L1/L2 cache

• 4x Triangle rate

• 4 rasterization engines

• 529mm2, core i7 263mm2

Page 21: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

NVIDIA GeForce GTX 480 “SM”

21

“Shared” memory(16+48 KB)

Execution contexts(128 KB)

Fetch/Decode

Fetch/Decode

15 Streaming Multiprocessors

“Shared” memory(32 KB)

Execution contexts(256 KB)

Fetch/Decode

ATI Radeon HD 5870 “SIMD-engine”

20 SIMD-engines

Page 22: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Intel’s Knights Corner/Ferry (Larrabee)

• 32 Pentium Cores

• 16 wide SIMD with 32 bit floating point MAD

• Programmable and debuggable

Images courtesy [Seiler08]

Page 23: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Analytical Motion Blur Rasterization with Compression

Carl Johan Gribel, Michael Doggett, Tomas Akenine-Möller

High-Performance Graphics 2010

Page 24: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

• Human visual system designed to detect motion

• This fails when presented too few images, or too much motion per image- motion gets jumpy

• Motion blur aids the motion detection of the visual system

Motion Blur Motivation

Page 25: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

• Compute Edge Equations and exact exposure intervals- analytic inside-test- visibility management

• Compute the time integral

• Store and compress the intervals

Analytical Motion Blur Rasterization

e1(t) = (p2(t)× p0(t)) · (x0, y0, 1)

= (((1− t)q2 + tr2)× ((1− t)q0 + tr0)) · (x0, y0, 1)

= (f t2 + gt+ h) · (x0, y0, 1)

Page 26: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

t=0.0

t=1.0

1.0t

0.33

depth function

t=0.0

t=1.0

Moving triangle edge functions

Page 27: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Ground Truth

Stochastic 12 spp

Analytical, compression 8

Results

Page 28: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel
Page 29: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Decoupled Sampling for Real-Time Graphics PipelinesJonathan Ragan-Kelley, Jiawen Chen, Jaakko Lehtinen, Michael Doggett, Fredo Durand (collab. with MIT)

ACM TOG 2011, to be presented at SIGGRAPH 2011http://bit.ly/DecoupledSampling

Page 30: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Decoupled Shading

Rendering : - Visibility - compute what is visible - Shading - compute color for each pixelComplex visibility - many stochastic point samples in 5D (space, time, lens aperture)Complex shading - expensive evaluation can be pre!lteredModern GPUs use multisample AA for decoupling

Page 31: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Our technique:Post-visibility Decoupled Sampling

1. Separate visibility from shading samples.

2. Define an explicit mapping from visibility to shading space.

3. Use a cache to manage irregular shading-visibility relationships, without precomputation.

Slide courtesy Jonathan Ragan-Kelley

Page 32: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

shading grid

t = 0.0t = 0.25t = 0.5t = 0.75 foreach primitive:foreach vis sample:

skip if not visiblemap to shading sampleif not in cache:

shade and cache

else:use cached value

visibility samples(screen space)

Decoupled Samplingwith motion blur

Slide courtesy Jonathan Ragan-Kelley

Page 33: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

shading grid

t = 0.0t = 0.25t = 0.5t = 0.75

visibility samples(screen space)

foreach primitive:foreach vis sample:

skip if not visiblemap to shading sampleif not in cache:

shade and cache

else:use cached value

Decoupled Samplingwith motion blur

Slide courtesy Jonathan Ragan-Kelley

Page 34: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

shading grid

t = 0.0t = 0.25t = 0.5t = 0.75

visibility samples(screen space)

foreach primitive:foreach vis sample:

skip if not visiblemap to shading sampleif not in cache:

shade and cache

else:use cached value

Decoupled Samplingwith motion blur

Slide courtesy Jonathan Ragan-Kelley

Page 35: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

shading grid

t = …

visibility samples(screen space)

foreach primitive:foreach vis sample:

skip if not visiblemap to shading sampleif not in cache:

shade and cache

else:use cached value

Decoupled Samplingwith motion blur

Slide courtesy Jonathan Ragan-Kelley

Page 36: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

shading grid

t = 0.0t = 0.25t = 0.5t = 0.75

visibility samples(screen space)

foreach primitive:foreach vis sample:

skip if not visiblemap to shading sampleif not in cache:

shade and cache

else:use cached value

Decoupled Samplingwith motion blur

Slide courtesy Jonathan Ragan-Kelley

Page 37: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

shading grid

visibility samples(screen space)

foreach primitive:foreach vis sample:

skip if not visiblemap to shading sampleif not in cache:

shade and cache

else:use cached value

Decoupled Samplingwith motion blur

Slide courtesy Jonathan Ragan-Kelley

Page 38: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Decoupled Sampling

Xform Rast Depth-Stencil Map FB

transformedprimitives

coveredsubpixels

visiblesubpixels

coloredsubpixels

shadingrequests

Cache

Shade shadedresults

Slide courtesy Jonathan Ragan-Kelley

Page 39: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Decoupled Sampling

Xform Rast Depth-Stencil Map FB

transformedprimitives

coveredsubpixels

visiblesubpixels

coloredsubpixels

Cache

Shadeshadingrequests

shadedresults

Slide courtesy Jonathan Ragan-Kelley

Page 40: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Results

Slide courtesy Jonathan Ragan-Kelley

Page 41: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

0

1

2

3

4

5

Sort Last

●●●● ● ● ● ● ● ● ● ● ●

●●● ● ● ● ● ● ● ● ● ● ●

●●● ● ● ● ● ● ● ● ● ● ●

0 50 100 150 200 250

Tiled

●●● ●

● ● ● ● ● ● ● ●●

●●●

●●

●●

●●

●●●

●●

0 50 100 150 200 250

shad

ing

rate 64  samples/pixel

27  samples/pixel8  samples/pixel

blur area (pixels)

Blur vs. shading rate: defocusmoderateblur

no blur heavyblur

Half-Life 2, Episode 21280x720, 27 samples/pixel

4.5-43x less shading than ideal supersampling

Slide courtesy Jonathan Ragan-Kelley

Page 42: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Blur vs. shading rate: motion

shad

ing

rate

blur area (pixels)

0

5

10

15

Sort Last

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●●

●●

● ● ● ●●

●●

●●

0 20 40 60 80 100 120

Tiled

●●

●●

●●

●●

●●

0 20 40 60 80 100 120

64  samples/pixel27  samples/pixel8  samples/pixel

moderateblur

noblur

heavyblur

Team Fortress 21280x720, 27 samples/pixel

3-40x less shading than ideal supersampling

Slide courtesy Jonathan Ragan-Kelley

Page 43: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Blur vs. Shading Rate

blu

r ar

ea (p

ixel

s)sh

adin

g ra

te

0

1

2

3

4Sort Last

0 10 20 30 40 500

10

20

30

40

50

Tiled

0 10 20 30 40 50

frame

1000

2000

3000

4000

5000

6000

7000

0 10 20 30 40 50

64  samples/pixel27  samples/pixel8  samples/pixel

Slide courtesy Jonathan Ragan-Kelley

Page 44: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Depth of field results

Valve’s Half-Life 2 : Episode 2

Page 45: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Graphics Hardware• Hardware on which 3D graphics is created in real-time

• Traditionally custom hardware

• Increasing programmability

• More complex rendering algorithms

• Programmable hardware available on a wide range

• Mobile - PowerVR, Desktop - AMD Radeon/Nvidia GeForce

• Heterogeneous processors - CPU with integrated graphics

• Sandy Bridge

• Create new algorithms that make use of architectural features

Page 46: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Device properties• Number and type of cores

• Task parallel

• OOO/In-order

• Data parallel

• SIMD/SIMT, width, VLIW/scalar

• Caches

• Size, bandwidths, hierarchy, coherency

• Adaptable to new varieties of architectures

Page 47: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Challenges

• Wide range of changing hardware

• Efficient programming and utilization

• CUDA, OpenCL, DirectCompute

• Fixed custom scheduling (currently)

• Quest for ever increasing realism

• Analytical visibility

• Decoupled sampling

Page 48: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Summary

• GPUs have evolved into massively parallel processors

• Analytical Motion Blur improves quality leading to more realistic images

• Decoupled shading enables real-time realistic rendering

• Need new approaches to adapt to wide variation and complex scheduling

Page 49: Real-Time Realistic Renderingfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/... · The Reyes Image Rendering Architecture ... • 4-wide SIMD Pixel shader ... Vertex Pipeline Pixel

Thanks for listeningand

Questions