Upload
others
View
19
Download
0
Embed Size (px)
Citation preview
Fermi GF100 Graphics Processing Unit (GPU)
Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu
Acknowledgements
We would like to thank Jonah Alben, John Robinson, Mark Daly, Henry Moreton, and the entire Fermi GF100 Development team for success of this project
Outline
GF100 SpecificationsGF100 ArchitectureTessellationPhysicsComputational GraphicsDemos
GF100 Specifications
3 Billion Transistors in 40 nm process (TSMC)Up to 512 CUDA / unified shader cores384-bit GDDR5 memory interface, 6GB capacityGeForce GTX480: Graphics Enthusiast
Full Microsoft DirectX 11 Shader Model 5.032x coverage sample antialiasing (AA)128-bit floating point high dynamic -range lighting with AA
Tesla C2070: Compute HPCCUDA programming: C, C++, OpenCL, DirectCompute, or Fortran515 GigaFLOPS double -precision peak floating pointECC Register Files, L1/L2 caches, shared memory and DRAM
GF100 Architecture
512 CUDA cores16 PolyMorph Engines4 raster units64 texture units48 ROP units384-bit GDDR5
L2 Cache
Me
mory C
ontroller
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SMM
em
ory Controlle
rM
em
ory Controlle
r
Me
mor
y C
ontr
olle
rM
em
ory
Con
trol
ler
Me
mor
y C
ontr
olle
r
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
Polymorph Engine
Host Interface
GigaThread Engine
GF100 SM Architecture
SM – Streaming Multiprocessor32 CUDA cores: 4x GT200
GT200 – previous high -end GPU by NVIDIA
48 or 16KB of shared memory: 3x GT20016 or 48KB of L1 cache: No L1 on GT200ISA improvements
32-bit integer operationsFMA (fused multiply add) IEEE -754 2008
4 Texture units1 PolyMorph Engine
Cache Architecture for Graphics
Data stays on die
L1 cacheRegister spillingStack opsGlobal LD/ST
L2 CacheVertex, SM, Texture and ROP Data
DRAM
Vertex Fetch Vertex ShaderHull, Domain &
Geomtry ShadersRasterizer Pixel Shader ROP
L1 & L2 Caches
Chip
Cache Architecture Comparison
GT200 GF100 Benefit
L1 Texture Cache(per SM)
12 KB 12 KB Fast t ext ure f i l t ering
Dedicated L1 LD/ ST Cache
✘ 16 or 48 KB Ef f icient physics and
ray t racing
Total Shared Memory
16KB 16 or 48 KB More dat a reuse
among t hreads
L2 Cache 256KB
(TEX read only)
768 KB
(al l cl ient s
read/ writ e)
Great er t ext ure
coverage, robust
comput e performance
DX10 Game Problems, too little detail
Geometry throughput is challenge
Flat head
Silhouette results from coarse triangulationThere are other artifacts
Image from Far Cry® 2, courtesy of Ubisoft
Offline Film Rendering usesTessellation
Tessellation + displacement mappingmodelling standard
Rich geometric detailMore Geometry, Takes More time
© Disney Enterprises, Inc. and Jerry Bruckheimer, I nc.All rights reserved. Image courtesy Industrial Ligh t & Magic.
Tessellation & Displacement MapsAdd Geometric Detail
http:/ / en.wikipedia.org/ wiki/ File:Displacement. jpg
OriginalGeometry
DisplacementMap
TessellatedGeometry
Final Geometry
“v”“u”
“x”“y”
“z”Displaced vertices
Tessellation in DirectX 11
Hull shaderinput new patch primitiveOutputs tessFactors, modesRuns pre -expansion (1 arrow)
Fixed function tessellation stageinput TessFactors and modesOutputs triangles and linesOn chip data expansion (3 arrows)
Vertex
PatchAssembly
Hull
Tessellator
Domain
Primit ive Assembly
Geometry
From input assembly
Cont rolpoint s
New pipeline stages
Tessellation in DirectX 11, cont
Domain shaderInput TessFactors, (u,v) domain pointsOutputs vertices in (x,y,z,w)
Vertex
PatchAssembly
Hull
Tessellator
Domain
Primit ive Assembly
Geometry
From input assembly
Cont rolpoint s
“x”
“y”
“z”
“v”
“u”
“z”
L2 Cache
Me
mory C
ontroller
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Me
mory C
ontroller
Me
mory C
ontroller
Me
mor
y C
ontr
olle
rM
em
ory
Con
trol
ler
Me
mor
y C
ontr
olle
r
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
Polymorph Engine
Host Interface
GigaThread Engine
GF100 Scalable Parallel Implementation
L2 Cache
Me
mory C
ontroller
GPCSM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
GPCSM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Me
mory C
ontroller
Me
mory C
ontroller
Me
mor
y C
ontr
olle
rM
em
ory
Con
trol
ler
Me
mor
y C
ontr
olle
rGPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
GPC
SM
Raster Engine
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
SM
Polymorph Engine
Polymorph Engine
Host Interface
GigaThread Engine
PolyMorph Engine
Vertex Fetch Tessellator Viewport Transform
Attribute Setup Stream Output
Raster Engine
Edge Setup Rasterizer Z-Cull
8x geometry performance vs. GT200
Enables > 2.5 triangles/ clock
Distributed, parallel geometry4 Raster Engines
16 PolyMorph Engines
Tessellation provides interactive smooth silhouetteFilm like capabilities in games
More geometry detail —3D Stereo viewing benefitsDynamic shadows3D Vision™
Memory footprint & BW savingsStore coarse geometry, expand on -demandEnables more complex animations
ScalabilityDynamic LOD allows for performance/quality tradeoffsScale into the future – resolution, compute power
GF100 Enables Geometric Realism
© Kenneth Scott , id Software 2008
Dramatic Increase in Geometry Capability
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Ti4600 6800 Ultra 8800GTX GTX285 GTX480
Shader TeraFLOP/s triangle/s
2002 20062004 2008 2010
GeForce GTX 480
Geometry
Shader
Advance in Geometric Complexity
2003 2010
-
2
4
6
8
10
12
14
16
18
20 M
illio
ns o
f pol
ygon
s pe
r fr
ame
Physics: PhysX ® Example: Fluid Simulation
PhysX ® is NVIDIA physics engine Uses CUDA SW stackUsed in over 150 computer games
Particle -based fluid simulation (SPH)128,000 particlesIncludes surface tension
3x speed up going from GT200 to GF10067 to 200 fps
Games fluid for water splashes, mud, blood…
Computational Graphics: Image Processing – Motion Blur
Color Buffer Previous
Model Position
Vertex Shader
Multiple render target (MRT)
Back Buffer
Motion Blur Shader
Current Camera Position
Current Model
Position
Previous Camera Position
Pixel Shader
Object Shading
Screen Space velocity of every pixel
Velocity Buffer
Unblurred Image
Final Blurred Image
Detected Zones
Computational Graphics: Ray Tracing
Combine Rasterization with RaytracingRasterize primary raysRaytrace shadows and reflections
4x faster than GT200Efficient use of cache architecture
GF100 GPU Compute Performance
© Capcom, Inc.0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
PhysX Fluid Dark Void Ray Tracing AI
GF100GTX285GT200
Demo: SuperSonic Sled --tessellation, physics, and computational graphics
SuperSonic Sled: Tessellation - Terrain
Terrain Displacement Map
SuperSonic Sled: Physics – Smoke, Bridge, Fireballs
GPU Dust Particles
Large Pieces
CPU Objects
(1500 Objects)
Small Chunks
GPU Debris
f
(p,v,m)
128
128
SuperSonic Sled: Computational Graphics -Motion Blur
Demo
Conclusions
GF100 Improves Geometry processing up to 8X over GT200New Architecture and New family of chipsGF100 Architecture breaks 1 triangle per clock barrier
More than 2.5 Triangles/clockHigh throughput tessellation has benefits for interactive gaming
Demonstration of TessellationPhysicsComputational Graphics