A Dynamic Adaptive Multi-resolution GPU Data Structure
Adaptive Shadow Maps, Octree 3D Paint, Adaptive PDE Solver
Aaron Lefohn
University of California, Davis
Problem Statement
• Goal
  • Dynamic, adaptive, multi-resolution GPU data structure
  • Efficient read, write, and structure change
  • Adaptive shadow maps, octree 3D paint, adaptive PDE solver
• Challenges
  • All operations must be data-parallel
  • Trees are difficult to update and cause incoherent accesses
• Solution
  • Leverage virtual memory research from computer architecture
  • Page-table-based structure
  • Decouple levels of indirection from resolution levels
  • Easy implementation with the Glift template library
Collaborators
• Joe Kniss, University of Utah
• Robert Strzodka, CAESAR Research Institute
• Shubhabrata Sengupta, University of California, Davis
• John Owens, University of California, Davis
Assumptions
• This talk relies heavily on the contents of the “Glift” generic data structure talk
Is This GPGPU Programming?
• Yes
  • An inseparable mix of GPGPU stream programming and traditional graphics
  • High-quality interactive rendering
  • Updating complex GPU data structures
Previous Work
• Binotto et al.
• Carr et al.
• Coombe et al.
• Ertl et al.
• Lefebvre et al.
• Purcell et al.
Why a New Structure?
• What’s missing?
  • A fully GPU-based adaptive multi-resolution structure
  • GPU-based address translator
  • GPU-based updates of the address translator
  • Trilinear/quadlinear mipmap filtering support
  • Uniform, coherent memory accesses
Applications
• Adaptive shadow maps
• Octree
  • 3D paint
• Adaptive partial differential equation solver
• …
Adaptive Shadow Maps
• Fernando et al., ACM SIGGRAPH 2001
  • Elegant solution to shadow map aliasing
  • Quadtree of small shadow maps
• Many recent (2004) shadow papers cite ASMs as a high-quality solution that is not possible on graphics hardware
ASM Data Structure Requirements
• Adaptive
• Multiresolution
• Fast, parallel random-access read
  • 2x2 native Percentage Closer Filtering (PCF)
  • Trilinear interpolated mipmapped PCF
• Fast, parallel write
• Fast, parallel insert and erase
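Percentage closer filtering compares the receiver's depth against every texel in the filter footprint and then filters the binary results, instead of filtering the depths themselves. A minimal CPU sketch of 2x2 PCF on a dense depth array follows; `DepthMap` and `pcf2x2` are hypothetical names, and it assumes texel centers at integer + 0.5, clamp-to-edge addressing, and that a smaller stored depth means closer to the light.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical dense depth map, used only for illustration.
struct DepthMap {
    int width, height;
    std::vector<float> depth;  // row-major, width * height
    float at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);  // clamp-to-edge addressing
        y = std::clamp(y, 0, height - 1);
        return depth[y * width + x];
    }
};

// 2x2 PCF: compare first, then bilinearly weight the four 0/1 results.
// (u, v) are in texel units; the result is the lit fraction in [0, 1].
float pcf2x2(const DepthMap& m, float u, float v, float receiverDepth) {
    float fu = u - 0.5f, fv = v - 0.5f;  // shift so texel centers are integers
    int x0 = static_cast<int>(std::floor(fu));
    int y0 = static_cast<int>(std::floor(fv));
    float tx = fu - x0, ty = fv - y0;    // bilinear weights
    auto lit = [&](int x, int y) {
        return receiverDepth <= m.at(x, y) ? 1.0f : 0.0f;
    };
    float top = (1 - tx) * lit(x0, y0) + tx * lit(x0 + 1, y0);
    float bot = (1 - tx) * lit(x0, y0 + 1) + tx * lit(x0 + 1, y0 + 1);
    return (1 - ty) * top + ty * bot;
}
```

Sampling halfway between an occluding texel and a non-occluding one returns 0.5, which is what gives PCF its soft edge.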
Octree 3D Paint
• Problem
  • Apply paint to a non-parameterized surface
    • Complex topology
    • Implicit surface
• Solution
  • Octree textures, brick maps, etc.
  • Benson & Davis and DeBry et al., SIGGRAPH 2002
Octree 3D Paint Requirements
• Adaptive
• Multiresolution
• Fast, parallel random-access read
  • 3x3 native trilinear filtering
  • Quadlinear interpolated mipmapping
• Fast, parallel write
• Fast, parallel insert and erase
Adaptive PDE Solver
• WARNING: Work in progress…
• Problem
  • Large 3D partial differential equation solvers are slow
• Solution
  • Adaptive solver that focuses computation on regions of interest
  • Octree simulation domain
  • Losasso et al., SIGGRAPH 2004
Adaptive PDE Solver Requirements
• Adaptive
• Multiresolution?
• Fast, parallel neighborhood read
• Fast, parallel write
  • Efficient stream processing of octree nodes
• Fast, parallel insert and erase
GPU Dynamic, Adaptive Data Structure
• The three applications have nearly identical requirements
• Describe the structure in 2D for ASM
ASM Virtual Domain
• Shadow map coordinates
[Figure: virtual domain as the unit square with corners (0,0), (1,0), (0,1), (1,1)]
ASM Physical Domain
• Paged 2D texture memory
  • All physical pages are the same size (very important!)
[Figure: virtual domain and physical domain, with the mapping between them marked "?"]
ASM Address Translator
• Mipmapped page table
[Figure: virtual domain mapped to physical domain through a mipmapped page table]
ASM Address Translator
• Start with a page table
  • Coarse, uniform discretization of the virtual domain
  • Very common in GPU structures
  • LOTS of architecture literature
  • O(N) memory, O(1) insert, O(1) computation, O(1) erase
  • Uniform consistency, partial mapping (sparse)
ASM Address Translator
• Page table example (va: virtual address, vpn: virtual page number, off: offset within page, ppa: physical page address, pa: physical address)
[Figure: virtual domain → page table → physical memory]
vpn = va / pageSize
off = va % pageSize
ppa = pageTable(vpn)
pa = ppa + off
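The four translation lines above can be made concrete with a minimal 1D, integer-addressed sketch. `PageTable`, `kPageSize` and the use of `-1` for unmapped pages are illustrative assumptions; the GPU version keeps the table in a texture and does the same arithmetic per fragment.

```cpp
#include <cassert>
#include <optional>
#include <vector>

constexpr int kPageSize = 16;  // texels per page (assumed)

// Hypothetical 1-level page table: one entry per virtual page holding the
// physical page address (ppa), or -1 if the page is not resident.
struct PageTable {
    std::vector<int> ppa;

    std::optional<int> translate(int va) const {  // va assumed non-negative
        int vpn = va / kPageSize;                  // virtual page number
        int off = va % kPageSize;                  // offset within the page
        int base = ppa.at(vpn);                    // physical page address
        if (base < 0) return std::nullopt;         // sparse: unmapped page
        return base + off;                         // physical address
    }
};
```

Every lookup costs one table read regardless of how many pages are mapped, which is the O(1) access listed on the previous slide.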
ASM Address Translator
• Adaptive page table
  • Map multiple virtual pages to a single physical page
[Figure: virtual domain → adaptive page table → physical memory]
vpn = va / pageSize
ppa = pageTable(vpn).ppa()
s = pageTable(vpn).s()
off = (va * s) % pageSize
pa = ppa + off
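The adaptive lookup differs from the plain page table only in the per-entry scale `s`. A minimal 1D sketch, assuming each entry stores how many virtual pages `k` its physical page covers (so `s = 1/k`, applied with integer division) and that a shared run of `k` virtual pages starts at a virtual page number divisible by `k`; `PTE` and `translateAdaptive` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

constexpr int kPageSize = 8;  // texels per physical page (assumed)

// Hypothetical adaptive page table entry: k virtual pages share one physical
// page, so the slide's scale factor is s = 1/k (k = 1 is full resolution).
struct PTE {
    int ppa;  // physical page address
    int k;    // virtual pages covered by this physical page
};

int translateAdaptive(const std::vector<PTE>& table, int va) {
    int vpn = va / kPageSize;
    const PTE& e = table.at(vpn);
    int off = (va / e.k) % kPageSize;  // (va * s) % pageSize with s = 1/k
    return e.ppa + off;
}
```

With two virtual pages sharing one physical page (`k = 2`), virtual texels 0 to 15 land on physical texels 100 to 107, i.e. that region is stored at half resolution.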
ASM Address Translator
• Multiresolution page table
[Figure: virtual domain → mipmapped page table → physical memory]
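A 1D sketch of the multiresolution lookup, assuming one page table per mip level of the virtual domain, with level `l` covering the same domain at 1/2^l resolution. `MipPageTable` and its `-1` convention for non-resident pages are hypothetical; in 2D each level would be a 2D table.

```cpp
#include <cassert>
#include <vector>

constexpr int kPageSize = 4;  // texels per page (assumed)

// Hypothetical mipmapped page table: levels[l][vpn] is the physical page
// address for virtual page vpn at level l, or -1 if not resident there.
struct MipPageTable {
    std::vector<std::vector<int>> levels;

    // va is a level-0 (finest) virtual texel coordinate.
    int translate(int va, int level) const {
        int vaL = va >> level;  // same position at the requested level
        int vpn = vaL / kPageSize;
        int off = vaL % kPageSize;
        int ppa = levels.at(level).at(vpn);
        return ppa < 0 ? -1 : ppa + off;
    }
};
```

Because each level has its own table, a region can be resident at a coarse level while its finer pages are still unmapped.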
ASM Data Structure Requirements
• How to support bilinear filtering?
  • Duplicate one column and one row of texels in each page
• Mipmapped trilinear filtering?
  • “By-hand” interpolation between mipmap levels
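A 1D sketch of both tricks, assuming samples sit at integer coordinates (a simplification of GPU texel centers) and that each physical page stores one extra texel copied from the first texel of the next virtual page. `Page`, `makePages`, `sampleLinear` and `sampleTrilinear` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int kPageSize = 4;  // interior texels per page (assumed)

// A page holds kPageSize texels plus one duplicated border texel, so a
// linear (bilinear in 2D) fetch never crosses into another physical page.
using Page = std::vector<float>;  // size kPageSize + 1

std::vector<Page> makePages(const std::vector<float>& signal) {
    std::vector<Page> pages;
    for (std::size_t start = 0; start < signal.size(); start += kPageSize) {
        Page p;
        for (int i = 0; i <= kPageSize; ++i)  // i == kPageSize is the border copy
            p.push_back(signal[std::min(start + i, signal.size() - 1)]);
        pages.push_back(p);
    }
    return pages;
}

float sampleLinear(const std::vector<Page>& pages, float x) {
    int i = static_cast<int>(std::floor(x));
    float t = x - i;
    const Page& p = pages.at(i / kPageSize);
    int off = i % kPageSize;
    return (1 - t) * p[off] + t * p[off + 1];  // off + 1 <= kPageSize
}

// "By-hand" mipmap interpolation: filter each level, then lerp by LOD fraction.
float sampleTrilinear(const std::vector<Page>& fine, const std::vector<Page>& coarse,
                      float x, float lodFrac) {
    float a = sampleLinear(fine, x);
    float b = sampleLinear(coarse, x * 0.5f);  // coarser level: half resolution
    return (1 - lodFrac) * a + lodFrac * b;
}
```

The duplicated texel costs one extra row and column per page but lets the hardware filter stay inside a single page; only the lerp between levels has to be written in the shader.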
How to Define the ASM Structure in Glift?
• Start with a generic page table AddrTrans
  • Use a mipmapped PhysMem for the page table
  • Change a template parameter to add adaptivity
• Write a page allocator
  • alloc_pages, free_pages
• Finally…
typedef PageTableAddrTrans<…> PageTable;
typedef PhysMemGPU<vec2f, vec1s> PMem2D;
typedef VirtMemGPU<PageTable, PMem2D> VPageTable;
typedef AdaptiveMem<VPageTable, PageAllocator> ASM;
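Because all physical pages have the same size, a page allocator can be as simple as a free list of page indices. The class below is a hypothetical stand-in for the `PageAllocator` template argument; its method names mirror the slide, but the signatures are assumptions, not Glift's interface.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical fixed-size page allocator: identical page sizes mean there is
// no fragmentation to manage, only a list of free physical page indices.
class PageAllocator {
public:
    explicit PageAllocator(int numPages) {
        for (int p = numPages - 1; p >= 0; --p) freeList_.push_back(p);
    }

    // All-or-nothing: a failed request leaves the allocator unchanged.
    std::optional<std::vector<int>> alloc_pages(int n) {
        if (n < 0 || static_cast<int>(freeList_.size()) < n) return std::nullopt;
        std::vector<int> pages;
        for (int i = 0; i < n; ++i) {
            pages.push_back(freeList_.back());
            freeList_.pop_back();
        }
        return pages;
    }

    void free_pages(const std::vector<int>& pages) {
        for (int p : pages) freeList_.push_back(p);
    }

    int num_free() const { return static_cast<int>(freeList_.size()); }

private:
    std::vector<int> freeList_;  // indices of unused physical pages
};
```

Allocation and release are O(1) per page, matching the O(1) insert and erase listed for the page table.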
ASM Data Structure Usage
// Cg fragment program ("asm" is a reserved word, so the map is named shadowAsm)
float4 main( uniform VMem2D shadowAsm,
             float3 shadowCoord,
             float4 litColor ) : COLOR
{
  const float4 black = float4( 0, 0, 0, 1 );
  // Shadow-compared lookup through the page table: 0 = shadowed, 1 = lit
  float isInLight = shadowAsm.vTex2Ds( shadowCoord );
  return lerp( black, litColor, isInLight );
}

// C++ host-side operations
shadowAsm.bind_for_read( … );
shadowAsm.bind_for_write( … );
shadowAsm.alloc_pages( … );
shadowAsm.free_pages( … );
…
Adaptive Shadow Map Algorithm
• Faithful to Fernando et al. 2001
• Refinement algorithm
  • Identify shadow pixels with a resolution mismatch (GPU)
  • Compact these pixels into a small stream (GPU)
  • CPU reads back the compacted stream (GPU→CPU)
  • Allocate pages
  • Draw new page table entries (PTEs) into the mipmap page tables (CPU→GPU)
  • Draw depth into the ASM for each new page (GPU)
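The first step of the refinement loop can be sketched as a per-pixel test followed by request generation. This is a simplified 1D stand-in, not Fernando et al.'s exact criterion: it assumes each eye pixel's projected footprint in normalized shadow-map coordinates is known and positive, picks the coarsest level whose texel is no larger than that footprint, and requests a page wherever the resident level is coarser. All names and constants are hypothetical, and the `std::set` stands in for the compaction and CPU-side de-duplication.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <set>
#include <utility>
#include <vector>

// Level 0 is finest: kFinestRes texels across [0, 1); level l has
// kFinestRes >> l texels, i.e. texel size 2^l / kFinestRes.
constexpr int kFinestRes = 1 << 10;
constexpr int kPageSize = 32;
constexpr int kMaxLevel = 5;  // coarsest level: kFinestRes >> 5 == kPageSize

struct EyeSample {
    float u;          // shadow-map coordinate in [0, 1)
    float footprint;  // projected eye-pixel size in shadow-map coordinates (> 0)
};

// Coarsest level whose texel size does not exceed the footprint.
int requiredLevel(float footprint) {
    int l = static_cast<int>(std::floor(std::log2(footprint * kFinestRes)));
    return std::clamp(l, 0, kMaxLevel);
}

// (level, virtual page) requests wherever the resident level is too coarse.
// residentLevel(u) reports the finest level currently mapped at u.
template <class ResidentFn>
std::set<std::pair<int, int>> refinementRequests(const std::vector<EyeSample>& samples,
                                                 ResidentFn residentLevel) {
    std::set<std::pair<int, int>> requests;  // the set drops duplicate requests
    for (const EyeSample& s : samples) {
        int need = requiredLevel(s.footprint);
        if (need < residentLevel(s.u)) {
            int texel = static_cast<int>(s.u * (kFinestRes >> need));
            requests.insert({need, texel / kPageSize});
        }
    }
    return requests;
}
```

Many eye pixels usually fall in the same missing page, which is why the stream is compacted and de-duplicated before pages are allocated.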
Stream Compaction
• Daniel Horn, GPU Gems 2, ch. 36
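The idea behind compaction can be sketched sequentially with an exclusive prefix sum over "keep" flags: each kept element's output slot is the number of kept elements before it. This sketch scatters directly, which 2005-era fragment processors could not do, so it illustrates the concept rather than Horn's GPU implementation; `compact` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stream compaction: exclusive scan of keep-flags gives output indices,
// then kept elements are scattered to those indices in order.
template <class T, class Pred>
std::vector<T> compact(const std::vector<T>& in, Pred keep) {
    std::vector<int> index(in.size());
    int count = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {  // exclusive prefix sum
        index[i] = count;
        if (keep(in[i])) ++count;
    }
    std::vector<T> out(count);
    for (std::size_t i = 0; i < in.size(); ++i)    // scatter kept elements
        if (keep(in[i])) out[index[i]] = in[i];
    return out;
}
```

Compacting before readback means the CPU reads only the handful of pixels that need refinement instead of the whole frame.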
[Thanks to Yong Kil for the tree model]
ASM: effective resolution 131,072² (37 MB); SM: 2048²
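For scale, the arithmetic below uses only the two resolutions in the caption. The 4 bytes per texel in the last line is an assumed depth format, not a figure from the talk; it is there only to contrast a dense map at this resolution with the 37 MB reported for the ASM.

```latex
\begin{aligned}
131{,}072^2 &= \left(2^{17}\right)^2 = 2^{34} \approx 1.72 \times 10^{10}\ \text{texels} \\
\frac{131{,}072^2}{2048^2} &= \left(2^{17-11}\right)^2 = 2^{12} = 4096 \\
2^{34}\ \text{texels} \times 4\ \text{B/texel} &= 2^{36}\ \text{B} = 64\ \text{GiB}
\end{aligned}
```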
“Octree” 3D Paint
• 3D version of the ASM data structure
• Differs from previous work:
  • Quadrilinear filtering
  • O(1), uniform access
• Interactive with effective resolutions between 64³ and 2048³
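Quadrilinear filtering is trilinear filtering in two adjacent mip levels followed by one more lerp between them. By analogy with the 2D case, presumably only that final lerp is done by hand. The sketch below uses dense volumes instead of pages to keep the filter visible; `Volume`, `trilinear` and `quadrilinear` are hypothetical names, and samples are assumed to sit at integer coordinates with clamp-to-edge addressing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical dense n x n x n volume, x fastest, then y, then z.
struct Volume {
    int n;
    std::vector<float> data;
    float at(int x, int y, int z) const {
        auto c = [&](int i) { return std::clamp(i, 0, n - 1); };  // clamp-to-edge
        return data[(c(z) * n + c(y)) * n + c(x)];
    }
};

// Trilinear filtering: weighted sum of the 2x2x2 neighbouring texels.
float trilinear(const Volume& v, float x, float y, float z) {
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    int z0 = static_cast<int>(std::floor(z));
    float tx = x - x0, ty = y - y0, tz = z - z0;
    float r = 0;
    for (int dz = 0; dz <= 1; ++dz)
        for (int dy = 0; dy <= 1; ++dy)
            for (int dx = 0; dx <= 1; ++dx) {
                float w = (dx ? tx : 1 - tx) * (dy ? ty : 1 - ty) * (dz ? tz : 1 - tz);
                r += w * v.at(x0 + dx, y0 + dy, z0 + dz);
            }
    return r;
}

// Quadrilinear: trilinear in the finer and coarser levels, lerped by LOD fraction.
// Coordinates are given at the finer level; the coarser one has half resolution.
float quadrilinear(const Volume& fine, const Volume& coarse,
                   float x, float y, float z, float lodFrac) {
    float a = trilinear(fine, x, y, z);
    float b = trilinear(coarse, x * 0.5f, y * 0.5f, z * 0.5f);
    return (1 - lodFrac) * a + lodFrac * b;
}
```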
Adaptive PDE Solver
• Work in progress…
  • Key feature is defining GPU iterators
• Iterator
  • Vertex buffer object of quads (one per page)
  • Create iterators with render-to-vertex-array (RTVA)
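The iterator idea can be sketched on the CPU: emit one quad per allocated page, so that rasterizing the buffer runs a fragment program over resident pages only. With RTVA the GPU would produce this vertex data itself; here it is built in C++ instead, and `Vertex`, `PageOrigin` and `buildPageIterator` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Vertex { float x, y; };   // position in physical-memory texel coordinates
struct PageOrigin { int px, py; };  // lower-left corner of an allocated page

// One quad (4 vertices, counter-clockwise) per resident page.
std::vector<Vertex> buildPageIterator(const std::vector<PageOrigin>& pages, int pageSize) {
    std::vector<Vertex> vb;
    vb.reserve(pages.size() * 4);
    for (const PageOrigin& p : pages) {
        float x0 = static_cast<float>(p.px), y0 = static_cast<float>(p.py);
        float x1 = x0 + pageSize, y1 = y0 + pageSize;
        vb.push_back({x0, y0});
        vb.push_back({x1, y0});
        vb.push_back({x1, y1});
        vb.push_back({x0, y1});
    }
    return vb;
}
```

Since the buffer grows and shrinks with the set of allocated pages, computation follows the adaptive domain without touching unallocated memory.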
ASM Performance Results
• Fernando results
  • 5 fps (asynchronous, incremental refinement)
  • Fixed light
  • 31K polys, 512² image, 65K² to 524K² ASMs
• Our results
  • 15-20 fps while moving the camera, including refinement
  • 7-12 fps while moving the light
  • 45K polys, 512² image, 131K² ASM
  • Lookup time compared to a 2048² shadow map:
    • Bilinear filtered: 90%
    • Trilinear filtered mipmapped: 73%
Page Table Memory Coherency
• 1- and 2-level page tables are bandwidth-bound below 8 x 8 pages
[Figure: bandwidth (MB/s, 0 to 10,000) vs. page size (1 to 1000, log scale) for 0 indirections (SEQ), 1, 2, and 4 indirections, and 1- and 2-level page tables. RGBA8 textures, NVIDIA GeForce 6800 GT, NVIDIA driver 75.22, Cg 1.4a]
Data Structure Limitations
• Assumes page-level coherency
• Page table memory consumption
  • Trade more levels of indirection for memory
• Depth-limited tree
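The memory-for-indirection trade can be sketched with a two-level table in 1D: a directory holds one pointer per group of virtual pages, and a second-level table exists only for groups that contain a mapped page. This is one possible layout under stated assumptions (`kPageSize`, `kTableSize`, and the class itself are hypothetical), not the structure benchmarked in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr int kPageSize = 8;    // texels per page (assumed)
constexpr int kTableSize = 16;  // entries per second-level table (assumed)

// Two-level page table: one extra indirection buys sparse table storage.
class TwoLevelPageTable {
public:
    explicit TwoLevelPageTable(int numVirtualPages)
        : dir_((numVirtualPages + kTableSize - 1) / kTableSize) {}

    void map(int vpn, int ppa) {
        auto& t = dir_.at(vpn / kTableSize);
        if (!t) t = std::make_unique<std::vector<int>>(kTableSize, -1);
        (*t)[vpn % kTableSize] = ppa;
    }

    // Physical address, or -1 if the page is unmapped.
    int translate(int va) const {
        int vpn = va / kPageSize, off = va % kPageSize;
        const auto& t = dir_.at(vpn / kTableSize);  // first indirection
        if (!t) return -1;
        int ppa = (*t)[vpn % kTableSize];           // second indirection
        return ppa < 0 ? -1 : ppa + off;
    }

    // Entries stored: directory plus allocated second-level tables.
    std::size_t entries() const {
        std::size_t n = dir_.size();
        for (const auto& t : dir_) if (t) n += kTableSize;
        return n;
    }

private:
    std::vector<std::unique_ptr<std::vector<int>>> dir_;
};
```

With 1024 virtual pages and a single resident page, this layout stores 80 entries instead of the 1024 a 1-level table needs, at the cost of a second dependent read per lookup.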
Conclusions
• Dynamic adaptive multiresolution data structure
  • Coherent accesses if pages are larger than 8 x 8
• Decouple levels of indirection from levels of resolution
  • Page table literature
  • Continuum all the way from a 1-level page table to a full tree
  • Based on the assumption that accesses are coherent within a page
Conclusions
• Adaptive Shadow Maps
  • Interactive adaptive refinement
  • Effective shadow map resolution up to 131,072²
• Octree 3D paint
  • Interactive GPU-based octree 3D painting
  • Effective paint resolution up to 2048³
• Adaptive PDE solver
  • Work in progress…
Acknowledgements
• Craig Kolb, Nick Triantos (NVIDIA)
• Fabio Pellacini (Cornell/Pixar)
• Adam Moerschell, Yong Kil (UC Davis)
  Serban Porumbescu, Chris Co, …
• National Science Foundation Graduate Fellowship
• Department of Energy
• Pixar Animation Studios
More Information
• ACM SIGGRAPH Sketches 2005
  • “Dynamic Adaptive Shadow Maps”
  • “Octree Textures on Graphics Hardware”
  • “GPU Programming,” Thursday, 1:45pm
• Upcoming ACM Transactions on Graphics paper
  • “Glift: An Abstraction for Generic, Efficient GPU Data Structures”
• Google “Lefohn GPU”