27
Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu

Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

  • Upload
    others

  • View
    19

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Fermi GF100 Graphics Processing Unit (GPU)

Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu

Page 2: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Acknowledgements

We would like to thank Jonah Alben, John Robinson, Mark Daly, Henry Moreton, and the entire Fermi GF100 Development team for success of this project

Page 3: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Outline

GF100 SpecificationsGF100 ArchitectureTessellationPhysicsComputational GraphicsDemos

Page 4: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

GF100 Specifications

3 Billion Transistors in 40 nm process (TSMC)Up to 512 CUDA / unified shader cores384-bit GDDR5 memory interface, 6GB capacityGeForce GTX480: Graphics Enthusiast

Full Microsoft DirectX 11 Shader Model 5.032x coverage sample antialiasing (AA)128-bit floating point high dynamic -range lighting with AA

Tesla C2070: Compute HPCCUDA programming: C, C++, OpenCL, DirectCompute, or Fortran515 GigaFLOPS double -precision peak floating pointECC Register Files, L1/L2 caches, shared memory and DRAM

Page 5: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

GF100 Architecture

512 CUDA cores16 PolyMorph Engines4 raster units64 texture units48 ROP units384-bit GDDR5

L2 Cache

Me

mory C

ontroller

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SMM

em

ory Controlle

rM

em

ory Controlle

r

Me

mor

y C

ontr

olle

rM

em

ory

Con

trol

ler

Me

mor

y C

ontr

olle

r

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

Polymorph Engine

Host Interface

GigaThread Engine

Page 6: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

GF100 SM Architecture

SM – Streaming Multiprocessor32 CUDA cores: 4x GT200

GT200 – previous high -end GPU by NVIDIA

48 or 16KB of shared memory: 3x GT20016 or 48KB of L1 cache: No L1 on GT200ISA improvements

32-bit integer operationsFMA (fused multiply add) IEEE -754 2008

4 Texture units1 PolyMorph Engine

Page 7: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Cache Architecture for Graphics

Data stays on die

L1 cacheRegister spillingStack opsGlobal LD/ST

L2 CacheVertex, SM, Texture and ROP Data

DRAM

Vertex Fetch Vertex ShaderHull, Domain &

Geomtry ShadersRasterizer Pixel Shader ROP

L1 & L2 Caches

Chip

Page 8: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Cache Architecture Comparison

GT200 GF100 Benefit

L1 Texture Cache(per SM)

12 KB 12 KB Fast t ext ure f i l t ering

Dedicated L1 LD/ ST Cache

✘ 16 or 48 KB Ef f icient physics and

ray t racing

Total Shared Memory

16KB 16 or 48 KB More dat a reuse

among t hreads

L2 Cache 256KB

(TEX read only)

768 KB

(al l cl ient s

read/ writ e)

Great er t ext ure

coverage, robust

comput e performance

Page 9: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

DX10 Game Problems, too little detail

Geometry throughput is challenge

Flat head

Silhouette results from coarse triangulationThere are other artifacts

Image from Far Cry® 2, courtesy of Ubisoft

Page 10: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Offline Film Rendering usesTessellation

Tessellation + displacement mappingmodelling standard

Rich geometric detailMore Geometry, Takes More time

© Disney Enterprises, Inc. and Jerry Bruckheimer, I nc.All rights reserved. Image courtesy Industrial Ligh t & Magic.

Page 11: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Tessellation & Displacement MapsAdd Geometric Detail

http:/ / en.wikipedia.org/ wiki/ File:Displacement. jpg

OriginalGeometry

DisplacementMap

TessellatedGeometry

Final Geometry

“v”“u”

“x”“y”

“z”Displaced vertices

Page 12: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Tessellation in DirectX 11

Hull shaderinput new patch primitiveOutputs tessFactors, modesRuns pre -expansion (1 arrow)

Fixed function tessellation stageinput TessFactors and modesOutputs triangles and linesOn chip data expansion (3 arrows)

Vertex

PatchAssembly

Hull

Tessellator

Domain

Primit ive Assembly

Geometry

From input assembly

Cont rolpoint s

New pipeline stages

Page 13: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Tessellation in DirectX 11, cont

Domain shaderInput TessFactors, (u,v) domain pointsOutputs vertices in (x,y,z,w)

Vertex

PatchAssembly

Hull

Tessellator

Domain

Primit ive Assembly

Geometry

From input assembly

Cont rolpoint s

“x”

“y”

“z”

“v”

“u”

“z”

Page 14: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

L2 Cache

Me

mory C

ontroller

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Me

mory C

ontroller

Me

mory C

ontroller

Me

mor

y C

ontr

olle

rM

em

ory

Con

trol

ler

Me

mor

y C

ontr

olle

r

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

Polymorph Engine

Host Interface

GigaThread Engine

GF100 Scalable Parallel Implementation

L2 Cache

Me

mory C

ontroller

GPCSM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

GPCSM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Me

mory C

ontroller

Me

mory C

ontroller

Me

mor

y C

ontr

olle

rM

em

ory

Con

trol

ler

Me

mor

y C

ontr

olle

rGPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

GPC

SM

Raster Engine

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

SM

Polymorph Engine

Polymorph Engine

Host Interface

GigaThread Engine

PolyMorph Engine

Vertex Fetch Tessellator Viewport Transform

Attribute Setup Stream Output

Raster Engine

Edge Setup Rasterizer Z-Cull

8x geometry performance vs. GT200

Enables > 2.5 triangles/ clock

Distributed, parallel geometry4 Raster Engines

16 PolyMorph Engines

Page 15: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Tessellation provides interactive smooth silhouetteFilm like capabilities in games

More geometry detail —3D Stereo viewing benefitsDynamic shadows3D Vision™

Memory footprint & BW savingsStore coarse geometry, expand on -demandEnables more complex animations

ScalabilityDynamic LOD allows for performance/quality tradeoffsScale into the future – resolution, compute power

GF100 Enables Geometric Realism

© Kenneth Scott , id Software 2008

Page 16: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Dramatic Increase in Geometry Capability

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Ti4600 6800 Ultra 8800GTX GTX285 GTX480

Shader TeraFLOP/s triangle/s

2002 20062004 2008 2010

GeForce GTX 480

Geometry

Shader

Page 17: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Advance in Geometric Complexity

2003 2010

-

2

4

6

8

10

12

14

16

18

20 M

illio

ns o

f pol

ygon

s pe

r fr

ame

Page 18: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Physics: PhysX ® Example: Fluid Simulation

PhysX ® is NVIDIA physics engine Uses CUDA SW stackUsed in over 150 computer games

Particle -based fluid simulation (SPH)128,000 particlesIncludes surface tension

3x speed up going from GT200 to GF10067 to 200 fps

Games fluid for water splashes, mud, blood…

Page 19: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Computational Graphics: Image Processing – Motion Blur

Color Buffer Previous

Model Position

Vertex Shader

Multiple render target (MRT)

Back Buffer

Motion Blur Shader

Current Camera Position

Current Model

Position

Previous Camera Position

Pixel Shader

Object Shading

Screen Space velocity of every pixel

Velocity Buffer

Unblurred Image

Final Blurred Image

Detected Zones

Page 20: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Computational Graphics: Ray Tracing

Combine Rasterization with RaytracingRasterize primary raysRaytrace shadows and reflections

4x faster than GT200Efficient use of cache architecture

Page 21: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

GF100 GPU Compute Performance

© Capcom, Inc.0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

PhysX Fluid Dark Void Ray Tracing AI

GF100GTX285GT200

Page 22: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Demo: SuperSonic Sled --tessellation, physics, and computational graphics

Page 23: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

SuperSonic Sled: Tessellation - Terrain

Terrain Displacement Map

Page 24: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

SuperSonic Sled: Physics – Smoke, Bridge, Fireballs

GPU Dust Particles

Large Pieces

CPU Objects

(1500 Objects)

Small Chunks

GPU Debris

f

(p,v,m)

128

128

Page 25: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

SuperSonic Sled: Computational Graphics -Motion Blur

Page 26: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Demo

Page 27: Fermi GF100 Graphics Processing Unit (GPU)...Fermi GF100 Graphics Processing Unit (GPU) Craig M. Wittenbrink, Emmett Kilgariff, Arjun Prabhu Acknowledgements We would like to thank

Conclusions

GF100 Improves Geometry processing up to 8X over GT200New Architecture and New family of chipsGF100 Architecture breaks 1 triangle per clock barrier

More than 2.5 Triangles/clockHigh throughput tessellation has benefits for interactive gaming

Demonstration of TessellationPhysicsComputational Graphics