Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Fast Tessellated
Rendering on Fermi
GF100Hot3D @ HPG 2010
Tim Purcell
Outline
DX10 state of the art
DX11 tessellation
Fermi GF100 architecture
Results
Demo
2
State of the Art: DX10 pipeline
3
Image from Far Cry® 2,
courtesy of Ubisoft
Pixels are meticulously shaded
Input
Assembler
Vertex
Shader
Rasterizer/
Interpolator
Pixel
Shader
Geometry
Shader
Output
Merger
Image from Far Cry® 2,
courtesy of Ubisoft
State of the Art: DX10 pipeline
4
Pixels are meticulously shaded,
but geometric detail is modest
Tessellation in DirectX 11
Hull shader
Computes the
tessellation factor
Runs pre-expansion
Explicitly parallel across
control points
Vertex
Patch
Assembly
Hull
Tessellator
Domain
Primitive
Assembly
Geometry
From input assembly
Control
points
Tessellation in DirectX 11
Fixed function
tessellation stage
Configured by LOD
output from Hull Shader
Produces triangles and
lines
Vertex
Patch
Assembly
Hull
Tessellator
Domain
Primitive
Assembly
Geometry
From input assembly
Control
points
Tessellation in DirectX 11
Domain shader
Runs post-expansion
Maps (u,v) to (x,y,z,w)
Implicitly parallel
Vertex
Patch
Assembly
Hull
Tessellator
Domain
Primitive
Assembly
Geometry
From input assembly
Control
points
GS
VS
GS
VS
GS
VS
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Setup
Viewport
Transform
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
GS
VS
GS
VS
GS
VS
GS
VS
GS
VS
GS
VS
GS
VS
GS
VS
GS
VS
Primitive Distributor
NVIDIA DX10 Logical Pipeline
8
Data
flow
Vertex Shader
Geometry Shader
Pixel Shader
Primitive Distributor
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Setup
Viewport
Transform
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
TS
VS
GS
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
PS
Raster
DX10 Logical Pipeline + Tessellation
9
Data
flow
Tessellation Shader
(includes Hull Shader,
Tessellator, and Domain
Shader)
Problems With DX10 + Tess. (1)
Data expansion in TS
can lead to excessive
buffering requirements
after GS
Have to drain GS0 before
GS1 to maintain API
ordering
10
TS0
GS0
TSn
GSn
…
…
Problems With DX10 + Tess. (2)
Setup limited prim rate
Already a problem for
CPU-generated
geometry
Tessellation generates
even more primitives
11
TS1 TSn
GS1 GSn
Viewport /
Setup
…
…
Raster0 Rasterm…
TS0
GS0
…
Crossbar
Primitive Distributor
Work Distribution Crossbar
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Task Distributor
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
Work Distribution Crossbar
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Fermi GF100 Logical Pipeline
12
Hull Shader
Data
flow
Domain Shader
Raster Engine
(includes Edge Setup,
Rasterizer, Zcull )
Crossbar
Primitive Distributor
Work Distribution Crossbar
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Task Distributor
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
Work Distribution Crossbar
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Fermi GF100 Logical Pipeline
13
Data
flowPolymorph Engine
(includes Tessellator,
Viewport Transform,
and Attribute Setup)
Crossbar
Primitive Distributor
Work Distribution Crossbar
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Task Distributor
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
Work Distribution Crossbar
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Fermi GF100 Tessellation Features 1
Task Distributor
Task ~= Hull Shader
output
Control points plus
tessellation factor
Pre-expansion
Distributes tasks for
subsequent geometry
processing
Expand patch into
primitives
Optional GS
Reduces buffering
requirements
14
Fermi GF100 Tessellation Features 2
Parallel Viewport
Transform
Includes clipping and
culling
Eliminate serialization
point
15
Crossbar
Primitive Distributor
Work Distribution Crossbar
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Task Distributor
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
Work Distribution Crossbar
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Fermi GF100 Tessellation Features 3
Parallel Rasterization
Including Edge Setup,
Rasterization, and
Zcull
Improved primitive
throughput
Multiple primitives
per clock
Screen mapped for
load balancing
16
Crossbar
Primitive Distributor
Work Distribution Crossbar
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Task Distributor
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
Work Distribution Crossbar
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Screen Mapped Rasterization
Each block is a tile of pixels
Rasterizers uniquely own pixel tiles
Note: small primitives can overlap multiple tiles
17
0 1 32 0 1 32 0 1 2
3 0 1 32 0 1 32 0 1
1 32 0 1 32 0 1 32
32 0 1 32 0 1 32 0
0 1 32 0 1 32 0 1 2
3 0 1 32 0 1 32 0 1
1 32 0 1 32 0 1 32
AB
0
1
3
2
Rasterizer 0
Rasterizer 3
Rasterizer 1
Rasterizer 2
Challenge: Maintaining API Order
Parallelization is easy
Maintaining API
ordering is a
challenge
Work Distribution
Crossbar (WDX)
Similar to
Pomegranate
[Eldridge et al. 2000]
18
Crossbar
Primitive Distributor
Work Distribution Crossbar
Crossbar
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
ROP
FB
Task Distributor
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
VS
HS
Work Distribution Crossbar
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Tessellator
Viewport
Transform
DS
GS
Work Distribution Crossbar
OWDX
Uses primitive
bounding box to
determine which
rasterizers get which
primitives
SWDX
reconstructs API order
Rasterizers uniquely
own pixels
No subsequent sorting
for ordering required
19
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
SWDX
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
SWDX
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
SWDX
Raster
Engine
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
Attribute
Setup
PS
SWDX
Work Distribution Crossbar
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Tessellator
Viewport
Transform
DS
GS
OWDX
Handling Load Imbalance
Load imbalance is a concern for all region-based
architectures because they are non adaptive
Static imbalance
Different screen regions get differing amounts of work
Mitigated by relatively small pixel tiles
Dynamic imbalance
Lots of tiny primitives in same screen region at the same time
Mitigated by buffering
– We can only buffer so much…
20
Results
21
Historical Triangle Rate
22
GF100
0
250
500
750
1000
1250
1500
1750
2000
Jan-00 Jan-01 Jan-02 Jan-03 Jan-04 Jan-05 Jan-06 Jan-07 Jan-08 Jan-09 Jan-10
M t
ris/s
ec
Chip Release
Results – Synthetic Performance Test
23
0
0.5
1
1.5
2
2.5
3
0 2 4 6 8 10 12 14 16
Triangle
s p
er
clo
ck
Triangle Size (pixels)
Z + per-vertex color
Z-only
Results – Island Demo
24Frame average of 1.94 triangles per clock
Results – Unigine Heaven
252.34 triangles per clock in highly tessellated areas
Demo Time
26
Summary
Tessellation
Increases geometric realism without burdening the CPU
Fermi GF100 is designed for tessellation
Eliminates serialization points
Dramatically improved primitive throughput
Next-generation visual fidelity
27
Acknowledgements
Steve Molnar
Ziyad Hakura
Andreas Dietrich (demo help)
Entire Fermi GF100 team
28
Questions?
29