A Dynamic Adaptive Multi-resolution GPU Data Structure
Adaptive Shadow Maps, Octree 3D Paint, Adaptive PDE Solver
Aaron Lefohn
University of California, Davis

Problem Statement
• Goal
  • Dynamic, adaptive, multi-resolution GPU data structure
  • Efficient read, write, structure change
  • Adaptive shadow maps, octree 3D paint, adaptive PDE solver
• Challenges
  • All operations must be data-parallel
  • Trees are difficult to update and cause incoherent accesses
• Solution
  • Leverage virtual memory research from computer architecture
  • Page-table-based structure
  • Decouple levels of indirection from resolution levels
  • Easy implementation with the Glift template library

Collaborators
• Joe Kniss, University of Utah
• Robert Strzodka, CAESAR Research Institute
• Shubhabrata Sengupta, University of California, Davis
• John Owens, University of California, Davis

Assumptions
• This talk relies heavily on the contents of the “Glift” generic data structure talk

Is This GPGPU Programming?
• Yes
  • Inseparable mix of GPGPU stream programming and traditional graphics
  • High-quality interactive rendering
  • Updating complex GPU data structures

Previous Work
• Binotto et al.
• Carr et al.
• Coombe et al.
• Ertl et al.
• Lefebvre et al.
• Purcell et al.

Why a New Structure?
• What’s missing?
  • Fully GPU-based adaptive multi-resolution structure
  • GPU-based address translator
  • GPU-based updates of the address translator
  • Trilinear/quadrilinear mipmap filtering support
  • Uniform, coherent memory accesses

Applications
• Adaptive shadow maps
• Octree 3D paint
• Adaptive partial differential equation solver
• …

Adaptive Shadow Maps
• Fernando et al., ACM SIGGRAPH 2001
  • Elegant solution to shadow map aliasing
  • Quadtree of small shadow maps
  • Many recent (2004) shadow papers cite ASMs as a high-quality solution, but one not possible on graphics hardware

ASM Data Structure Requirements
• Adaptive
• Multiresolution
• Fast, parallel random-access read
  • 2x2 native Percentage Closer Filtering (PCF)
  • Trilinear interpolated mipmapped PCF
• Fast, parallel write
• Fast, parallel insert and erase

Octree 3D Paint
• Problem
  • Apply paint to a non-parameterized surface
  • Complex topology
  • Implicit surface
• Solution
  • Octree textures, brick maps, etc.
  • Benson & Davis and DeBry et al., SIGGRAPH 2002

Octree 3D Paint Requirements
• Adaptive
• Multiresolution
• Fast, parallel random-access read
  • 2x2x2 native trilinear filtering
  • Quadrilinear interpolated mipmapping
• Fast, parallel write
• Fast, parallel insert and erase

Adaptive PDE Solver
• WARNING: Work in progress…
• Problem
  • Large 3D partial differential equation solvers are slow
• Solution
  • Adaptive solver that focuses computation on regions of interest
  • Octree simulation domain
  • Losasso et al., SIGGRAPH 2004

Adaptive PDE Solver Requirements
• Adaptive
• Multiresolution?
• Fast, parallel neighborhood read
• Fast, parallel write
  • Efficient stream processing of octree nodes
• Fast, parallel insert and erase

GPU Dynamic, Adaptive Data Structure
• The three applications have nearly identical requirements
• The structure is described here in 2D, for the ASM

ASM Virtual Domain
• Shadow map coordinates
[Figure: the virtual domain, a unit square with corners (0,0), (1,0), (0,1), (1,1)]

ASM Physical Domain
• Paged 2D texture memory
  • All physical pages are the same size (very important!)
[Figure: virtual domain → ? → physical domain]

ASM Address Translator
• Mipmapped page table
[Figure: virtual domain → mipmapped page table → physical domain]

ASM Address Translator
• Start with a page table
  • Coarse, uniform discretization of the virtual domain
  • Very common in GPU structures
  • LOTS of architecture literature
  • O(N) memory, O(1) insert, O(1) computation, O(1) erase
  • Uniform consistency, partial (sparse) mapping

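ASM Address Translator
• Page table example
[Figure: Virtual Domain → Page Table → Physical Memory]
    vpn = va / pageSize
    ppa = pageTable(vpn)
    off = va % pageSize
    pa  = ppa + off
where va is the virtual address, vpn the virtual page number, ppa the physical page address, off the offset within the page, and pa the physical address.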
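The lookup above can be emulated on the CPU to make the arithmetic concrete. This is a minimal C++ sketch, not Glift code: `Int2`, `PageTable`, `entry`, and `translate` are hypothetical names, and it assumes non-negative integer texel addresses that lie inside the virtual domain.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 2D integer texel address (virtual or physical).
struct Int2 { int x, y; };

// Minimal 1-level page table: a coarse, uniform grid over the virtual domain.
// Each entry stores the origin (ppa) of a physical page; all pages are
// pageSize x pageSize texels.
struct PageTable {
    int pageSize;
    int pagesPerSide;
    std::vector<Int2> ppa;

    PageTable(int pageSize, int pagesPerSide)
        : pageSize(pageSize), pagesPerSide(pagesPerSide),
          ppa(pagesPerSide * pagesPerSide, Int2{-1, -1}) {}  // {-1,-1} = unmapped

    Int2& entry(Int2 vpn) {
        assert(vpn.x >= 0 && vpn.x < pagesPerSide && vpn.y >= 0 && vpn.y < pagesPerSide);
        return ppa[vpn.y * pagesPerSide + vpn.x];
    }

    // vpn = va / pageSize; ppa = pageTable(vpn); off = va % pageSize; pa = ppa + off
    Int2 translate(Int2 va) {
        Int2 vpn{va.x / pageSize, va.y / pageSize};
        Int2 off{va.x % pageSize, va.y % pageSize};
        Int2 base = entry(vpn);
        return Int2{base.x + off.x, base.y + off.y};
    }
};
```

For example, with 8 x 8 pages and virtual page (1, 2) mapped to the physical page whose origin is (16, 0), virtual texel (13, 21) has offset (5, 5) and translates to physical texel (21, 5).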

ASM Address Translator
• Adaptive page table
  • Map multiple virtual pages to a single physical page
[Figure: Virtual Domain → Page Table → Physical Memory]
    vpn = va / pageSize
    s   = pageTable(vpn).s()
    ppa = pageTable(vpn).ppa()
    off = (va * s) % pageSize
    pa  = ppa + off
where s, stored in each page-table entry, scales the virtual address so that several virtual pages can share one physical page.
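One way to read the adaptive formula is sketched below in C++. Assumptions not stated in the talk: an entry stores `s = 1/k` when one physical page holds a k x k block of virtual pages at 1/k resolution, that entry is written into every page-table slot the block covers, and blocks are aligned to multiples of k pages. `Float2`, `PTE`, `AdaptivePageTable`, and `mapBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Float2 { float x, y; };

// Hypothetical adaptive page-table entry: physical page origin and scale s.
// s = 1/k means one physical page holds a k x k block of virtual pages,
// stored at 1/k of the finest resolution.
struct PTE { Float2 ppa; float s; };

struct AdaptivePageTable {
    float pageSize;    // texels per page side
    int pagesPerSide;  // virtual pages per side of the virtual domain
    std::vector<PTE> table;

    AdaptivePageTable(float pageSize, int pagesPerSide)
        : pageSize(pageSize), pagesPerSide(pagesPerSide),
          table(pagesPerSide * pagesPerSide, PTE{{-1.0f, -1.0f}, 1.0f}) {}

    // Map a k x k block of virtual pages (origin aligned to a multiple of k)
    // to one physical page by writing the same entry into every covered slot.
    void mapBlock(int vx, int vy, int k, Float2 ppa) {
        assert(k > 0 && vx % k == 0 && vy % k == 0);
        assert(vx + k <= pagesPerSide && vy + k <= pagesPerSide);
        for (int y = vy; y < vy + k; ++y)
            for (int x = vx; x < vx + k; ++x)
                table[y * pagesPerSide + x] = PTE{ppa, 1.0f / k};
    }

    // vpn = va / pageSize; s = pageTable(vpn).s(); ppa = pageTable(vpn).ppa();
    // off = (va * s) % pageSize; pa = ppa + off
    Float2 translate(Float2 va) const {
        int vx = static_cast<int>(va.x / pageSize);
        int vy = static_cast<int>(va.y / pageSize);
        assert(vx >= 0 && vx < pagesPerSide && vy >= 0 && vy < pagesPerSide);
        const PTE& e = table[vy * pagesPerSide + vx];
        Float2 off{std::fmod(va.x * e.s, pageSize), std::fmod(va.y * e.s, pageSize)};
        return Float2{e.ppa.x + off.x, e.ppa.y + off.y};
    }
};
```

With 8 x 8 pages, mapping the 2 x 2 block of virtual pages that starts at page (2, 0) to the physical page at (32, 0) sends virtual texel (20, 12) to physical texel (34, 6): relative to the block origin (16, 0) the texel sits at (4, 12), which is (2, 6) at half resolution. The alignment assumption is what lets a single modulo produce that block-relative offset.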

ASM Address Translator
• Multiresolution page table
[Figure: Virtual Domain → Mipmap Page Table → Physical Memory]
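A multiresolution page table can be pictured as one page table per mipmap level. The C++ sketch below assumes power-of-two sizes, a separate array per level, and addresses given in level-0 texels; `MipPageTable` and its members are hypothetical, whereas the talk stores the levels as a mipmapped texture.

```cpp
#include <cassert>
#include <vector>

struct Int2 { int x, y; };

// Hypothetical mipmapped page table: level L covers the same virtual domain as
// level 0 at 1/2^L the resolution, so it needs (pagesPerSide >> L)^2 entries.
struct MipPageTable {
    int pageSize;
    std::vector<int> pagesPerSide;          // page-table width per level
    std::vector<std::vector<Int2>> levels;  // physical page origins per level

    MipPageTable(int pageSize, int basePagesPerSide) : pageSize(pageSize) {
        for (int n = basePagesPerSide; n >= 1; n /= 2) {
            pagesPerSide.push_back(n);
            levels.emplace_back(n * n, Int2{-1, -1});  // {-1,-1} = unmapped
        }
    }

    Int2& entry(int level, Int2 vpn) {
        assert(level >= 0 && level < static_cast<int>(levels.size()));
        int n = pagesPerSide[level];
        assert(vpn.x >= 0 && vpn.x < n && vpn.y >= 0 && vpn.y < n);
        return levels[level][vpn.y * n + vpn.x];
    }

    // va is a level-0 texel address; one level-L texel covers 2^L x 2^L of them.
    Int2 translate(Int2 va, int level) {
        assert(level >= 0 && level < static_cast<int>(levels.size()));
        Int2 vaL{va.x >> level, va.y >> level};
        Int2 vpn{vaL.x / pageSize, vaL.y / pageSize};
        Int2 base = entry(level, vpn);
        return Int2{base.x + vaL.x % pageSize, base.y + vaL.y % pageSize};
    }
};
```

With 8 x 8 pages and a 4 x 4 level-0 table, the constructor builds three levels (4 x 4, 2 x 2, 1 x 1). Looking up level-0 texel (20, 10) at level 1 uses level-1 texel (10, 5), which lies in level-1 page (1, 0) at offset (2, 5).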

ASM Data Structure Requirements
• How to support bilinear filtering?
  • Duplicate 1 column and 1 row of texels in each page
• Mipmapped trilinear?
  • “By-hand” interpolation between mipmap levels
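The two answers above can be sketched in C++. Assumptions: a page stores `(pageSize + 1)^2` texels whose extra column and row duplicate the neighboring pages' edge texels, texel centers sit at integer coordinates, and the by-hand trilinear step is a linear blend of two per-level bilinear results. In PCF the blended values would be depth-comparison results. `Page`, `bilinear`, and `lerpLevels` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical physical page storing (pageSize + 1)^2 texels: the extra column
// and row duplicate the neighboring pages' first column and row, so a 2x2
// bilinear footprint never has to leave the page.
struct Page {
    int pageSize;
    std::vector<float> texels;  // row-major, row stride pageSize + 1

    float at(int x, int y) const { return texels[y * (pageSize + 1) + x]; }

    // Bilinear sample at offset (u, v) in [0, pageSize).
    float bilinear(float u, float v) const {
        assert(u >= 0.0f && u < pageSize && v >= 0.0f && v < pageSize);
        int x0 = static_cast<int>(std::floor(u));
        int y0 = static_cast<int>(std::floor(v));
        float fx = u - x0, fy = v - y0;
        float top = (1.0f - fx) * at(x0, y0) + fx * at(x0 + 1, y0);
        float bottom = (1.0f - fx) * at(x0, y0 + 1) + fx * at(x0 + 1, y0 + 1);
        return (1.0f - fy) * top + fy * bottom;
    }
};

// "By-hand" mipmap interpolation: blend bilinear results from the two nearest
// levels by the fractional part of the level of detail.
float lerpLevels(float fineSample, float coarseSample, float lod) {
    float t = lod - std::floor(lod);
    return (1.0f - t) * fineSample + t * coarseSample;
}
```

A sample near the page's far edge, such as (1.5, 1.5) in a 2 x 2 page, reads the duplicated column and row instead of fetching from a neighboring page that may be stored anywhere in physical memory.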

How to Define the ASM Structure in Glift?
• Start with the generic page table AddrTrans
  • Use a mipmapped PhysMem for the page table
  • Change a template parameter to add adaptivity
• Write a page allocator
  • alloc_pages, free_pages
• Finally…

    typedef PageTableAddrTrans<…>                   PageTable;
    typedef PhysMemGPU<vec2f, vec1s>                PMem2D;
    typedef VirtMemGPU<PageTable, PMem2D>           VPageTable;
    typedef AdaptiveMem<VPageTable, PageAllocator>  ASM;
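Because every physical page has the same size, a page allocator can be as simple as a free list of page indices. The sketch below borrows the slide's `alloc_pages` and `free_pages` names and the `PageAllocator` type name, but the class body and signatures are hypothetical, not Glift's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-list page allocator. All physical pages are the same size,
// so allocating memory reduces to handing out page indices.
class PageAllocator {
public:
    explicit PageAllocator(int numPages) {
        // Push in reverse so that pages are handed out as 0, 1, 2, ...
        for (int i = numPages - 1; i >= 0; --i) freeList_.push_back(i);
    }

    // Returns n page indices, or an empty vector if fewer than n are free.
    std::vector<int> alloc_pages(std::size_t n) {
        std::vector<int> pages;
        if (n > freeList_.size()) return pages;
        for (std::size_t i = 0; i < n; ++i) {
            pages.push_back(freeList_.back());
            freeList_.pop_back();
        }
        return pages;
    }

    // Returns pages to the free list for reuse.
    void free_pages(const std::vector<int>& pages) {
        for (int p : pages) {
            assert(p >= 0);
            freeList_.push_back(p);
        }
    }

    std::size_t num_free() const { return freeList_.size(); }

private:
    std::vector<int> freeList_;
};
```

Equal-size pages are what make this O(1) per page with no fragmentation; a variable-size allocator would need coalescing and best-fit search instead.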

ASM Data Structure Usage

    // Cg fragment program: shade a pixel using the ASM shadow lookup
    float4 main( uniform VMem2D adaptiveSM,
                 float3 shadowCoord,
                 float4 litColor ) : COLOR
    {
        const float4 black = float4( 0, 0, 0, 1 );
        float isInLight = adaptiveSM.vTex2Ds( shadowCoord );
        return lerp( black, litColor, isInLight );
    }

    // C++ host side
    adaptiveSM.bind_for_read( … );
    adaptiveSM.bind_for_write( … );
    adaptiveSM.alloc_pages( … );
    adaptiveSM.free_pages( … );
    …

Adaptive Shadow Map Algorithm
• Faithful to Fernando et al. 2001
• Refinement algorithm
  • Identify shadow pixels with a resolution mismatch (GPU)
  • Compact these pixels into a small stream (GPU)
  • CPU reads back the compacted stream (GPU→CPU)
  • Allocate pages
  • Draw new page-table entries (PTEs) into the mipmapped page tables (CPU→GPU)
  • Draw depth into the ASM for each new page (GPU)
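The CPU portion of a refinement pass (read back, allocate pages, emit new page-table entries) can be sketched as follows. This assumes each compacted request names a virtual page and mipmap level, and that resident pages are tracked in a set; `Request`, `NewPTE`, and `refine` are hypothetical names, and the GPU steps before and after are omitted.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical request read back after compaction: a virtual page (x, y) at a
// mipmap level whose resolution is too coarse for some shadow pixel.
struct Request { int level, x, y; };

// Hypothetical page-table entry to be drawn back into the mipmapped page tables.
struct NewPTE { int level, x, y, physPage; };

// CPU side of one refinement pass: skip requests for pages that are already
// resident (including duplicates within this batch), allocate a physical page
// for each remaining one, and return the new entries.
std::vector<NewPTE> refine(const std::vector<Request>& requests,
                           std::set<std::tuple<int, int, int>>& resident,
                           std::vector<int>& freePages) {
    std::vector<NewPTE> ptes;
    for (const Request& r : requests) {
        assert(r.level >= 0);
        std::tuple<int, int, int> key(r.level, r.x, r.y);
        if (resident.count(key)) continue;  // already mapped or duplicate
        if (freePages.empty()) break;       // out of physical pages
        int page = freePages.back();
        freePages.pop_back();
        resident.insert(key);
        ptes.push_back(NewPTE{r.level, r.x, r.y, page});
    }
    return ptes;
}
```

Duplicate requests are expected because many shadow pixels can fall in the same page; the set lookup filters them out before any physical memory is spent.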

Stream CompactionStream Compaction

• Daniel Horn, GPU Gems 2, ch. 36
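Stream compaction can be expressed as a prefix sum over keep flags, followed by moving each kept element to its scanned position. The C++ sketch below emulates that idea serially, writing the scan as log2(n) Hillis-Steele passes, each of which could be a single data-parallel pass. It is one common formulation, not necessarily the exact method of Horn's chapter, and it uses a scatter for brevity; on hardware without scatter, the final step would have to be rewritten as a gather. `inclusiveScan` and `compact` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive Hillis-Steele scan: each pass reads only the previous pass's
// output, so every pass could run as one data-parallel GPU pass.
std::vector<int> inclusiveScan(std::vector<int> a) {
    for (std::size_t step = 1; step < a.size(); step *= 2) {
        std::vector<int> next = a;
        for (std::size_t i = step; i < a.size(); ++i) next[i] = a[i] + a[i - step];
        a.swap(next);
    }
    return a;
}

// Compact elements whose keep flag is 1 into a dense output array.
template <typename T>
std::vector<T> compact(const std::vector<T>& in, const std::vector<int>& keep) {
    assert(in.size() == keep.size());
    std::vector<int> pos = inclusiveScan(keep);  // 1-based output slot per kept element
    std::vector<T> out(pos.empty() ? 0 : pos.back());
    for (std::size_t i = 0; i < in.size(); ++i)
        if (keep[i]) out[pos[i] - 1] = in[i];    // scatter to the scanned slot
    return out;
}
```

In the ASM, the flagged elements would be shadow pixels that need refinement, and compacting them first means the CPU reads back a short stream rather than the whole image.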

[Thanks to Yong Kil for the tree model]

ASM: effective resolution 131,072² (37 MB); SM: 2048²
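To put the figure's numbers in perspective, the arithmetic below computes what a dense (non-adaptive) shadow map would need. It assumes 4 bytes per texel (e.g. 32-bit depth), which the talk does not state; `denseBytes` is a hypothetical helper.

```cpp
#include <cstdint>

// Bytes for a dense, square depth map. bytesPerTexel is an assumption
// (4 bytes, e.g. 32-bit depth); the talk does not state the format.
constexpr std::uint64_t denseBytes(std::uint64_t side, std::uint64_t bytesPerTexel) {
    return side * side * bytesPerTexel;
}

// 131,072 = 2^17, so 131,072^2 texels = 2^34; at 4 bytes that is 2^36 bytes (64 GiB).
static_assert(denseBytes(131072, 4) == (std::uint64_t{1} << 36), "dense ASM-sized map");
// 2048 = 2^11, so 2048^2 texels = 2^22; at 4 bytes that is 2^24 bytes (16 MiB).
static_assert(denseBytes(2048, 4) == (std::uint64_t{1} << 24), "conventional shadow map");
```

Under that assumption, a dense 131,072² map would need 64 GiB, against the 37 MB reported for the ASM, while a 2048² map needs 16 MiB.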

“Octree” 3D Paint
• 3D version of the ASM data structure
• Differs from previous work:
  • Quadrilinear filtering
  • O(1), uniform access
  • Interactive with effective resolutions between 64³ and 2048³
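Extending the page table to 3D adds one more axis to the same arithmetic. The C++ sketch below is a 1-level version only (no adaptivity or mipmapping), with hypothetical names `Int3` and `PageTable3D`, and it assumes non-negative integer texel addresses. Quadrilinear filtering would then blend two trilinear lookups from adjacent mipmap levels, just as the 2D case blends two bilinear lookups.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Int3 = std::array<int, 3>;  // (x, y, z) texel address

// Hypothetical 1-level 3D page table: a uniform grid of pageSize^3 bricks over
// the virtual volume; each entry holds the origin of a physical brick.
struct PageTable3D {
    int pageSize;
    int pagesPerSide;
    std::vector<Int3> ppa;

    PageTable3D(int pageSize, int pagesPerSide)
        : pageSize(pageSize), pagesPerSide(pagesPerSide),
          ppa(static_cast<std::size_t>(pagesPerSide) * pagesPerSide * pagesPerSide,
              Int3{{-1, -1, -1}}) {}  // {-1,-1,-1} = unmapped

    Int3& entry(const Int3& vpn) {
        for (int i = 0; i < 3; ++i) assert(vpn[i] >= 0 && vpn[i] < pagesPerSide);
        std::size_t idx = (static_cast<std::size_t>(vpn[2]) * pagesPerSide + vpn[1]) * pagesPerSide + vpn[0];
        return ppa[idx];
    }

    // Per axis: vpn = va / pageSize, off = va % pageSize, pa = ppa + off.
    Int3 translate(const Int3& va) {
        Int3 vpn, pa;
        for (int i = 0; i < 3; ++i) vpn[i] = va[i] / pageSize;
        const Int3& base = entry(vpn);
        for (int i = 0; i < 3; ++i) pa[i] = base[i] + va[i] % pageSize;
        return pa;
    }
};
```

With 4³ bricks, virtual texel (5, 2, 7) falls in virtual brick (1, 0, 1) at offset (1, 2, 3).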

Adaptive PDE Solver
• Work in progress…
  • Key feature is defining GPU iterators
• Iterator
  • Vertex buffer object of quads (one per page)
  • Create iterators with render-to-vertex-array (RTVA)
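The iterator idea (draw one quad per page so that a fragment program touches only allocated pages) can be sketched on the CPU as below. The talk generates these quads on the GPU with render-to-vertex-array. This sketch instead builds the vertex list in C++, assuming physical pages are numbered row by row in a grid `pagesPerRow` pages wide. `Vertex` and `buildPageQuads` are hypothetical, and uploading the array into a vertex buffer object is omitted.

```cpp
#include <cassert>
#include <vector>

struct Vertex { float x, y; };

// Hypothetical CPU-side "iterator": one quad (4 vertices) per resident physical
// page, covering that page's texels in physical memory. Drawing these quads
// runs a fragment program over the allocated pages only, not the whole texture.
std::vector<Vertex> buildPageQuads(const std::vector<int>& residentPages,
                                   int pagesPerRow, float pageSize) {
    assert(pagesPerRow > 0);
    std::vector<Vertex> verts;
    verts.reserve(residentPages.size() * 4);
    for (int p : residentPages) {
        assert(p >= 0);
        float x0 = (p % pagesPerRow) * pageSize;  // page column → x origin
        float y0 = (p / pagesPerRow) * pageSize;  // page row → y origin
        verts.push_back({x0, y0});
        verts.push_back({x0 + pageSize, y0});
        verts.push_back({x0 + pageSize, y0 + pageSize});
        verts.push_back({x0, y0 + pageSize});
    }
    return verts;
}
```

Generating the quads on the GPU, as the talk does, avoids a CPU round trip whenever pages are inserted or erased.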

Demo

ASM Performance Results
• Fernando results
  • 5 fps (asynchronous, incremental refinement)
  • Fixed light
  • 31K polys, 512² image, 65K² - 524K² ASMs
• Our results
  • 15-20 fps while moving the camera, including refinement
  • 7-12 fps while moving the light
  • 45K polys, 512² image, 131K² ASM
  • Lookup time compared to a 2048² shadow map:
    • Bilinear filtered: 90%
    • Trilinear filtered mipmapped: 73%

Page Table Memory Coherency
• 1- and 2-level page tables are bandwidth bound for pages smaller than 8 x 8
  (RGBA8 textures, NVIDIA GeForce 6800 GT, NVIDIA driver 75.22, Cg 1.4a)

[Figure: bandwidth (MB/s, 0 to 10,000) vs. page size (1 to 1000) for 0 indirections (SEQ), 1 indirection, 2 indirections, 4 indirections, 1-level page table, and 2-level page table]

Data Structure Limitations
• Assumes page-level coherency
• Page table memory consumption
  • Trade more levels of indirection for memory
• Depth-limited tree
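The memory cost of a 1-level mipmapped page table can be estimated directly: the finest level has one entry per virtual page, and each coarser level has a quarter as many. The helper below is a hypothetical sketch assuming square, power-of-two sizes.

```cpp
#include <cassert>
#include <cstdint>

// Total entries in a 1-level mipmapped page table over a square virtual domain
// virtualSide texels wide, with square pages pageSize texels wide.
std::uint64_t mipPageTableEntries(std::uint64_t virtualSide, std::uint64_t pageSize) {
    assert(pageSize > 0 && virtualSide % pageSize == 0);
    std::uint64_t total = 0;
    // Level 0 has (virtualSide / pageSize)^2 entries; each coarser level halves both sides.
    for (std::uint64_t n = virtualSide / pageSize; n >= 1; n /= 2) total += n * n;
    return total;
}
```

For example, a 131,072² virtual domain with an assumed page size of 32 needs 22,369,621 entries (4096² + 2048² + … + 1). This is the cost that an extra level of indirection trades for additional lookups.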

Conclusions
• Dynamic adaptive multiresolution data structure
  • Coherent accesses if pages are larger than 8 x 8
• Decouple levels of indirection from levels of resolution
  • Page table literature
  • Continuum all the way from a 1-level page table to a full tree
  • Based on the assumption that accesses are coherent within a page

Conclusions
• Adaptive Shadow Maps
  • Interactive adaptive refinement
  • Effective shadow map resolution up to 131,072²
• Octree 3D paint
  • Interactive GPU-based octree 3D painting
  • Effective paint resolution up to 2048³
• Adaptive PDE solver
  • Work in progress…

Acknowledgements
• Craig Kolb, Nick Triantos (NVIDIA)
• Fabio Pellacini (Cornell/Pixar)
• Adam Moerschell, Yong Kil, Serban Porumbescu, Chris Co, … (UC Davis)
• National Science Foundation Graduate Fellowship
• Department of Energy
• Pixar Animation Studios

More Information
• ACM SIGGRAPH Sketches 2005
  • “Dynamic Adaptive Shadow Maps”
  • “Octree Textures on Graphics Hardware”
  • “GPU Programming,” Thursday, 1:45pm
• Upcoming ACM Transactions on Graphics paper
  • “Glift: An Abstraction for Generic, Efficient GPU Data Structures”
• Google “Lefohn GPU”