
SPU Shaders

● Mike Acton
  Engine Director
  Insomniac Games

State of Affairs

● Engine Systems on SPUs
● SPU Optimization Understood
● Remaining Systems Planned Out
● Still Have SPU Time to Spare
● PPU Still Driving The Game
● Arguing for More Parallelism
● Want More Customization, Iteration

● Insomniac Games, 2007

Needed Strategy

● Bridging the “Cell Gap”
  – Reducing synchronization points
  – Better data flow
  – Encouraging SPU use by more systems and programmers.
  – Keeping SPU code straightforward, fast and optimizable.
  – Get as much on the SPUs as possible.
● More customization, iteration.
● Ultimately: Kill the main update loop.

What are SPU Shaders?

● SPU Shaders are:
  – Fragments of code used in a larger system
  – Code is injected at location pre-determined by system.
  – Custom for any particular system.
  – Custom modifications of system data.
  – Feedback to other systems outside the scope of the current system.

What are SPU Shaders?

● Like scripts...
● Like callbacks...
● Like messages...
● Like overlays...
● Like iterators...
● BUT! On the SPU, and very simple.

What are SPU Shaders?

● SPU Shaders are NOT:
  – Generic, general purpose system.
  – A system of any kind, actually.

What are SPU Shaders?

● Why is it called a “shader”?
  – Shares important similarities to GPU shaders.
    ● Native code fragments
    ● Part of a larger system
    ● In-context execution
    ● Independently optimizable
  – Most important: Concept is approachable.

What are SPU Shaders?

● SPU Shaders as policy:
  – Simplicity by force.
● “Don't try to solve everyone's problems”
  – Solutions that try to solve all problems tend to cause more problems than they solve.

Advantages

● Easy to implement
● Put the programmer in the right place at the right time.
● Programmer writes to SPU, not to software layer.
● Core performance issues still managed by systems programmers.
● Fragments are optimizable.

Advantages

● Systems don't need to provide functionality for every possible case
  – More power in the hands of the gameplay programmers.
  – More communication between gameplay and engine.
  – Less work-around and more work-within.

Advantages

● Add functionality without modifying the system.
  – Less risk to core.
  – Less risk to other systems.
  – Less risk to other shaders.

Advantages

● And the obvious...
  – It's not on the PPU.
  – It's faster.

Costs

● Forced to think about data layout and synchronization.
  – (But that's a good thing.)
  – Haphazard access isn't going to work.
● Debugging can be trickier.
● Need to manage shaders
  – As code in project, linked-in?
  – As data stored with instance data?
  – How to reference shaders?

Easy To Implement

● Pick stage(s) in system kernel to inject shaders.
● Define available inputs and outputs.
● Collect common functions.
● Compile shaders as data.
● Sort instance data based on shader type(s)
● Load shader on-demand based on data select.
● Call shaders.
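The last three steps (sort, select, call) can be sketched on a host machine. This is a minimal illustration, not the Insomniac implementation: `instance_t`, `shader_fn`, `run_sorted` and the two toy shaders are hypothetical names, and ordinary functions stand in for fragments that a real kernel would load on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical instance record: which shader to run, plus a payload. */
typedef struct {
    uint32_t shader_index;
    float    value;
} instance_t;

/* Hypothetical shader signature: transform `count` contiguous instances. */
typedef void shader_fn(instance_t *instances, size_t count);

static int compare_by_shader(const void *a, const void *b)
{
    uint32_t sa = ((const instance_t *)a)->shader_index;
    uint32_t sb = ((const instance_t *)b)->shader_index;
    return (sa > sb) - (sa < sb);
}

/* Sort by shader type, then call each shader once per contiguous run.
 * Returns the number of shader calls made. */
static size_t run_sorted(instance_t *inst, size_t count,
                         shader_fn *const *shaders, size_t shader_count)
{
    size_t calls = 0;
    assert(inst != NULL || count == 0);
    qsort(inst, count, sizeof inst[0], compare_by_shader);
    for (size_t begin = 0; begin < count; ) {
        size_t end = begin + 1;
        while (end < count && inst[end].shader_index == inst[begin].shader_index)
            ++end;
        if (inst[begin].shader_index < shader_count) {
            /* A real kernel would load the fragment here if it is not resident. */
            shaders[inst[begin].shader_index](inst + begin, end - begin);
            ++calls;
        }
        begin = end;
    }
    return calls;
}

/* Two toy shaders, for illustration only. */
static void shader_double(instance_t *in, size_t n)
{
    for (size_t i = 0; i < n; ++i) in[i].value *= 2.0f;
}

static void shader_negate(instance_t *in, size_t n)
{
    for (size_t i = 0; i < n; ++i) in[i].value = -in[i].value;
}
```

Sorting first means each shader is selected once per run of instances rather than once per instance, which is what makes on-demand loading affordable.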

Easy To Implement

● What data is being transformed?
  – What are the inputs?
  – What are the outputs?
  – What can be modified?

Easy To Implement

● Collect the common functions...
  – Always loaded by the system
  – e.g.
    ● Dma wrapper functions
    ● Debugging functions
    ● Common transformation functions

    struct CommonFunctions
    {
        PrintfProc*       m_print_proc;
        PrintVectorProc*  m_print_vector;
        PrintIntegerProc* m_print_integer;
        PrintFloatProc*   m_print_float;
        PrintMatrix4Proc* m_print_mtx4;
        PrintMatrix3Proc* m_print_mtx3;
        DmaGetProc*       m_dma_get;
        DmaPutProc*       m_dma_put;
    };

    struct common_t
    {
        void (*print_str)(const char *str);
        void (*dma_wait)(uint32_t tag);
        void (*dma_send)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
        void (*dma_recv)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);

        char*    ls;
        uint32_t ls_size;
        uint32_t data_ea;
        uint32_t data_size;
    };
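A shader written against `common_t` touches the outside world only through the table, so it can be exercised on a host with stub functions. The sketch below assumes that `dma_recv` copies main RAM into local store and `dma_send` copies the other way (the slides do not say which is which); `fake_main_ram`, the `host_*` stubs and `increment_shader` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the common_t shown on the slide. */
typedef struct common_t {
    void (*print_str)(const char *str);
    void (*dma_wait)(uint32_t tag);
    void (*dma_send)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
    void (*dma_recv)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
    char    *ls;
    uint32_t ls_size;
    uint32_t data_ea;
    uint32_t data_size;
} common_t;

/* Host stand-in for main RAM; an "EA" here is just an offset into it. */
static unsigned char fake_main_ram[256];
static int           print_calls;

static void host_print(const char *str) { (void)str; ++print_calls; }
static void host_wait(uint32_t tag)     { (void)tag; /* memcpy is synchronous */ }

static void host_send(void *ls, uint32_t ea, uint32_t size, uint32_t tag)
{
    (void)tag;
    memcpy(fake_main_ram + ea, ls, size);   /* assumed direction: LS -> main RAM */
}

static void host_recv(void *ls, uint32_t ea, uint32_t size, uint32_t tag)
{
    (void)tag;
    memcpy(ls, fake_main_ram + ea, size);   /* assumed direction: main RAM -> LS */
}

/* Hypothetical shader: fetch its data, add 1 to every byte, write it back. */
static void increment_shader(common_t *c)
{
    const uint32_t tag = 0;
    assert(c->data_size <= c->ls_size);
    c->dma_recv(c->ls, c->data_ea, c->data_size, tag);
    c->dma_wait(tag);
    for (uint32_t i = 0; i < c->data_size; ++i)
        c->ls[i] = (char)(c->ls[i] + 1);
    c->dma_send(c->ls, c->data_ea, c->data_size, tag);
    c->dma_wait(tag);
    c->print_str("increment_shader done");
}
```

Because the shader never calls a DMA or print routine directly, the same source can run under any job manager, or under a test harness like this one.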

Easy To Implement

● System Shader Configuration...
  – System knows where the fragments are.
  – System knows when to call the fragments.
  – System doesn't know what the fragments do.
  – Fragments are in main RAM.
  – Fragments don't need to be fixed.

    struct config_t
    {
        uint32_t max_frag_size; // so we can do double buffering
        uint32_t frag_count;    // number of fragments in the list
        uint32_t frags_ea;      // EA of list of fragments
        uint32_t pad_0;
    };

Easy To Implement

● System Shader Configuration.
● Manage fragment memory:
  – Simplest method:
    ● Double buffer,
    ● On-demand,
    ● Fixed maximum size,
    ● By-index from array,...

    struct fragment_t
    {
        uint32_t code_ea;     // fragment's code EA
        uint32_t code_size;
        uint32_t entry_point; // in bytes, relative to code_ea
        uint32_t data_ea;     // data in main RAM the fragment wants to process
        uint32_t data_size;
        uint32_t pad[3];
    };
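The double-buffered, fixed-maximum-size scheme can be simulated on a host. This is a sketch under stated assumptions: `memcpy` stands in for an asynchronous DMA get, `main_ram` stands in for main memory, and the hypothetical `run_fn` hook replaces the actual jump into fragment code, which only makes sense on an SPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct fragment_t {
    uint32_t code_ea;
    uint32_t code_size;
    uint32_t entry_point;
    uint32_t data_ea;
    uint32_t data_size;
    uint32_t pad[3];
} fragment_t;

enum { MAX_FRAG_SIZE = 64 };   /* plays the role of config_t::max_frag_size */

static unsigned char main_ram[1024];               /* host stand-in for main RAM */
static unsigned char code_buf[2][MAX_FRAG_SIZE];   /* the two fragment buffers   */

/* Stand-in for an asynchronous DMA get; memcpy completes immediately. */
static void fetch_code(int buf, const fragment_t *f)
{
    assert(f->code_size <= MAX_FRAG_SIZE);
    memcpy(code_buf[buf], main_ram + f->code_ea, f->code_size);
}

/* Hypothetical hook: on an SPU this would jump to code + entry_point. */
typedef void run_fn(const unsigned char *code, const fragment_t *f, void *user);

static void run_fragments(const fragment_t *frags, uint32_t count,
                          run_fn *run, void *user)
{
    if (count == 0) return;
    fetch_code(0, &frags[0]);                    /* prime the first buffer */
    for (uint32_t i = 0; i < count; ++i) {
        int cur = (int)(i & 1);
        if (i + 1 < count)
            fetch_code(cur ^ 1, &frags[i + 1]);  /* start fetching the next fragment */
        /* Real code would wait on the DMA tag for buffer `cur` here. */
        run(code_buf[cur], &frags[i], user);
    }
}

/* Example hook for host testing: log the byte found at each entry point. */
typedef struct { unsigned char seen[8]; uint32_t n; } entry_log_t;

static void log_entry_byte(const unsigned char *code, const fragment_t *f, void *user)
{
    entry_log_t *log = (entry_log_t *)user;
    if (log->n < 8) log->seen[log->n++] = code[f->entry_point];
}
```

The fixed maximum size is what allows the two buffers to be allocated once, up front, with no allocator involved.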

Easy To Implement

● System Shader Configuration.
● Manage fragment memory:
  – Alternate methods:
    ● Allocated, cached fragments (Lots of small shaders)
    ● Fixed locations (Offline analysis)
  – ... The best solution is specific to the system.

Easy To Implement

● Create the shader code...
● “Code is just data”
  – No special distinguishing feature on the SPUs
● Overlays or additional jobs are too complex and heavyweight.
  – Just want load and execute.
  – No special system needed.

Easy To Implement

● Create the shader code..
  – Method 1: Shader as PPU header*
    ● Compile shader as normal, to obj file.
    ● Dump obj file using spu-objdump
    ● Convert dump to header using script.

* This is what we're using now.
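The slides do not show the conversion script or the header it produces. As a rough idea of the final step, the sketch below turns a block of code bytes into a C array definition; the function name `bytes_to_header` and the output layout are assumptions, not the format Insomniac used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write `size` bytes as a C array definition into `out` (capacity `cap`).
 * Returns the number of characters written, or 0 if the text did not fit. */
static size_t bytes_to_header(const unsigned char *bytes, size_t size,
                              const char *name, char *out, size_t cap)
{
    size_t used;
    int n;

    assert(bytes != NULL && out != NULL && size > 0);
    assert(name != NULL && strlen(name) > 0);

    n = snprintf(out, cap, "static const unsigned char %s[%lu] = {",
                 name, (unsigned long)size);
    if (n < 0 || (size_t)n >= cap) return 0;
    used = (size_t)n;

    for (size_t i = 0; i < size; ++i) {
        /* Eight bytes per line, each line indented by four spaces. */
        n = snprintf(out + used, cap - used, "%s0x%02x,",
                     (i % 8 == 0) ? "\n    " : " ", bytes[i]);
        if (n < 0 || (size_t)n >= cap - used) return 0;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, "\n};\n");
    if (n < 0 || (size_t)n >= cap - used) return 0;
    return used + (size_t)n;
}
```

A header generated this way is compiled into the PPU side, which then holds the SPU fragment as a plain byte array ready to be transferred.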

Easy To Implement

● Create the shader code..
  – Method 2: Shader as data
    ● Strip code from obj file and store as data.
    ● Put into data pipeline not code pipeline.

Easy To Implement

● Create the shader code..
  – Method 3: Use obj file
    ● Just pull code from obj file at runtime
  – Method 4: Use elf file
    ● Requires extra compile step, but probably more debugger friendly.
  – Other methods too, use whatever works for you.

Easy To Implement

● Calling the shader...
● Nothing could be easier.
  – ShaderEntry* shader = (addr of fragment);
  – shader( data, common );
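Spelled out in C, the two lines above amount to computing an address inside the buffer that holds the fragment and calling through a function-pointer type. The `ShaderEntry` signature and helper names here are assumptions; each system fixes its own parameter list.

```c
#include <assert.h>
#include <stdint.h>

struct common_t;   /* the common function table handed to every shader */

/* Hypothetical entry signature for one system. */
typedef void ShaderEntry(void *data, struct common_t *common);

/* Address of the entry point inside the local-store buffer holding the code. */
static void *fragment_entry_address(void *code_ls, uint32_t entry_point,
                                    uint32_t code_size)
{
    assert(entry_point < code_size);
    return (unsigned char *)code_ls + entry_point;
}

/* On the SPU the call would then look like:
 *
 *   ShaderEntry *shader =
 *       (ShaderEntry *)fragment_entry_address(buf, f->entry_point, f->code_size);
 *   shader(data, common);
 *
 * ISO C does not define conversions between object and function pointers;
 * this relies on code and data sharing one local store on the SPU. */
static void call_shader(ShaderEntry *shader, void *data, struct common_t *common)
{
    shader(data, common);
}

/* Ordinary function standing in for a loaded fragment when testing on a host. */
static void example_shader(void *data, struct common_t *common)
{
    (void)common;
    *(int *)data += 1;
}
```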

Debugging Shaders

● Fragments are small
● Fragments have well defined inputs and outputs.
● Ideal for unit tests in separate framework.
● Test on PS3/Linux box.

Debugging Shaders

● Runtime debugging:
● Currently shaders have no debug info.
● Step through assembly. Often OK.
  – Shaders are simple anyway,
  – If written with intrinsics, not much difference.
● Alternatives:
  – Debug on PPU (intrinsics are portable)
  – Temporarily link in shader.

SPU Shader Rules

● Rule 1: Don't Manage Data for Shaders
  – Just give shaders a buffer and fixed size.
  – Shaders should depend on size, so leave room for system changes.
  – Best size depends on system.
    ● (Maybe 4K, maybe 32K)
  – Don't read or write from/to shader buffer.
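One way a shader can depend on the size it is given, rather than on a hard-coded constant, is to derive its batch size from the buffer. A minimal sketch, assuming a hypothetical 16-byte `particle_t` instance type:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instance type processed by the shader. */
typedef struct { float x, y, z, w; } particle_t;

/* How many particles fit into the buffer the system handed over. */
static size_t batch_capacity(uint32_t ls_size)
{
    return ls_size / sizeof(particle_t);
}

/* Number of batches needed to cover `count` particles with that buffer. */
static size_t batch_count(size_t count, uint32_t ls_size)
{
    size_t cap = batch_capacity(ls_size);
    assert(cap > 0);   /* buffer must hold at least one instance */
    return (count + cap - 1) / cap;
}
```

If the system later changes the buffer from 4K to 32K, the shader simply runs fewer, larger batches and needs no source change.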

SPU Shader Data

● System-specific
  – Multiple lists of instances to modify or transform
  – Context data
● Shader-internal (“local”)
  – EA passed by system
  – Fixed buffer
● Shader shared (“global”)
  – EA passed by system
  – Zero'd on system initialization

SPU Shader Rules

● Rule 2: Don't Manage DMA for Shaders
  – Give fixed number of DMA tags to shader
    ● (Grab them in the entry function and pass down)
    ● Avoid: GetDmaTagFromParentSystem()
  – Give DMA functions to shaders
    ● To allow system to run with any job manager, or none
  – Don't use shader tags for other purposes
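"Grab them in the entry function and pass down" might look like the following. The `shader_dma_t` block and helper names are hypothetical; the only hardware fact relied on is that MFC tag groups are numbered 0 to 31 and tag-status masks carry one bit per tag.

```c
#include <assert.h>
#include <stdint.h>

enum { SHADER_TAG_COUNT = 2 };   /* fixed number of tags, chosen by the system */

/* Hypothetical block filled in once by the shader's entry function. */
typedef struct {
    uint32_t tags[SHADER_TAG_COUNT];
} shader_dma_t;

/* Bit mask covering every tag the shader owns (one bit per tag). */
static uint32_t shader_tag_mask(const shader_dma_t *dma)
{
    uint32_t mask = 0;
    for (int i = 0; i < SHADER_TAG_COUNT; ++i) {
        assert(dma->tags[i] < 32);
        mask |= 1u << dma->tags[i];
    }
    return mask;
}

/* Helpers receive the tags as a parameter instead of asking the parent system. */
static uint32_t tag_for_buffer(const shader_dma_t *dma, uint32_t buffer_index)
{
    return dma->tags[buffer_index % SHADER_TAG_COUNT];
}
```

Passing the block down keeps every helper free of calls back into the parent system, which is the dependency the "Avoid" bullet warns about.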

SPU Shader Rules

● Rule 3: Enforce fixed maximum size for Shader code.
  – System can be maintained.
● Rule 4: Shaders are always called in a clear, well defined context.
  – i.e. Part of a larger system.

SPU Shader Rules

● Rule 5: Fixed parameter list for shaders, per-system (or sub-system)
  – Don't want to re-compile all shaders.
  – Don't want to manage dynamic parameter lists.
● Rule 6: Shaders should be given as many instances as possible.
  – More optimizable.

SPU Shader Rules

● Rule 7: Don't break the rules.
  – You'll end up with a new job manager.
  – You'll end up re-inventing SPURS.
  – You'll end up with a big headache.

SPU Shader Guidelines

● Keep instance data contiguous
  – Can pack/compress data.
● Keep instance data separate from PPU data.
  – Don't fight with synchronization issues
● Keep shaders simple.
  – Shaders should perform one simple transformation (preferably branch-free)
  – Switch shaders for new function.
  – e.g. One shader per AI state.

SPU Shader Guidelines

● Don't use globals in shaders.
  – Requires code fixup
  – Makes size less predictable
  – Only use buffer provided by the system.
● Roll frequently used functions back into kernel
  – Give space back to shaders.

Integration with schedulers

● “How are fragments scheduled?”
  – They aren't. The parent system is.
  – Fragments are loaded and used on-demand.
● “So, how is the parent system scheduled?”
  – Enter SPURS rant..

Scheduling with SPURS

● SPURS...
  – Wants to be a general purpose scheduler.
  – Wants to solve all the allocation issues.
  – Wants to be flexible for every possible case.
  – Is overcomplicated.
  – Is unnecessary.
  – (With apologies to the SPURS team.)

Scheduling with SPURS

● How many parent systems are actually being scheduled?
  – Dozens at best. Not thousands.
● There are always higher-level scheduling needs.
● Exactly like a general purpose memory allocator.
  – Any scheduling needs, just like any dynamic memory needs, are going to be very simple.
● * Decide where in the frame the system will run, and for how long. Then stop.

Transitioning to Shaders

● Example from RCF: (FastPathFollower)
● Started with typical CPP update pattern:
  – On PPU
  – FastPathFollower* was Update instance
  – All Updates called in a (sorted) list.
    ● (Sorted for icache hits)
  – * Are classes derived from FastPathFollower

Transitioning to Shaders

● “Where there's one, there's more than one”
● Grouped all FastPathFollowers together.
● Removed from Update list.
● Made single function to update all together.
● Update function now ignored.
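The change from one `Update` call per object to one function over the whole group can be sketched as below. The `follower_t` fields are invented for illustration; the real FastPathFollower state is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified follower state. */
typedef struct {
    float distance;   /* distance travelled along the path */
    float speed;      /* units per second */
} follower_t;

/* One call advances every follower; there is no per-instance Update(). */
static void update_all_followers(follower_t *f, size_t count, float dt)
{
    assert(f != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        f[i].distance += f[i].speed * dt;
}
```

With all instances in one contiguous array and one loop, the work is already in the shape an SPU kernel wants: a block of data in, a block of data out.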

Transitioning to Shaders

● “Know the inputs and outputs”
● Most inputs were read-only
● Path update was read-write, but only here.
● Outputs: e.g.
  – State (Exploded, Crashed, etc.)
  – Animation (Change anim, frame, etc.)
  – Collision database (Move, Update)
  – A few more...

Transitioning to Shaders

● “Minimize Synchronization Points”
● Collected all outputs to command buffers
  – (Arrays of minimum data to execute)
  – Advantage: Static cost limit
    ● e.g. Maximum number of “exploded” states
  – Disadvantage: Always used space for worst-case (But small, so OK)
    ● e.g. Non-optional state changes need max instance count x command size.
● Executed command buffers after update.

Transitioning to Shaders

● “Minimize Synchronization Points”
● Put FPF Update in separate PPU thread.
● Overlapped with standard Update hierarchy
  – (Yes, time was completely swallowed – but only because Update hierarchy has many dcache misses)

Transitioning to Shaders

● “Customization” i.e. Proto-Shader
● Added callback function for customizations
● Moved all derived update functionality into callbacks.
● During update, registered callbacks placed into new command buffer for second pass.
  – (Lose some dcache coherency, but overall better due to...)

Transitioning to Shaders

● “Limit Customization”
● Static limit on custom callback command buffer size.
● So - Custom callbacks may not be called every frame.
  – i.e. Selectively derived classes based on performance.
    ● (And can potentially select LOD callbacks.)

Transitioning to Shaders

● “Proto-Shader to Real Shader”
● Now have async update on PPU with proto-shader (w/ minimal code changes)
● Used as template for complete re-write:
● Today: AsyncMobyUpdate (Joe Valenzuela)
  – More generic (Not just one class)
  – Update loop on SPU
  – Lessons from Physics shaders
  – And Joe's own contributions to the concept

Use Cases

● Middleware as SPU Shaders
  – i.e. SPURS is not the only solution to Middleware.
● Middleware can provide:
  – PPU library (init, state changes, etc.)
  – SPU shader(s) for update
● User provides:
  – SPU entry (main)
  – SPU LS buffer, dma tags, dma functions, and EA to main RAM data.

Use Cases

● Middleware as SPU Shaders
  – Advantages:
    ● User can use any job manager (or none)
    ● Middleware only does its job.
    ● User in control of scheduling.
  – Disadvantages:
    ● User spends less time bitching about SPURS.
    ● ...oh wait.

Use Cases

● System Pipeline Stage Management
● e.g. IgPhysics (Eric Christensen)
  – Fixed pipeline, each stage as a shader.
  – Shaders loaded in order by the kernel.
  – Certain stages will themselves load shaders to manage specific data types.
  – Result – Completely deferred system, 2x speedup overall.
    ● (Mostly due to reduction of sync points)
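A fixed pipeline whose stages are run in order by a kernel can be reduced to a table walked front to back. This is only a shape sketch with made-up stages; IgPhysics itself is not shown in the slides, and ordinary functions again stand in for fragments loaded from main RAM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage signature: every stage transforms the same data block. */
typedef void stage_fn(float *data, size_t count);

/* Two toy stages, for illustration only. */
static void stage_add_one(float *d, size_t n)
{
    for (size_t i = 0; i < n; ++i) d[i] += 1.0f;
}

static void stage_double(float *d, size_t n)
{
    for (size_t i = 0; i < n; ++i) d[i] *= 2.0f;
}

/* The kernel: run each stage over the whole data set, in table order. */
static void run_pipeline(stage_fn *const *stages, size_t stage_count,
                         float *data, size_t count)
{
    assert(stages != NULL || stage_count == 0);
    for (size_t s = 0; s < stage_count; ++s) {
        /* A real kernel would load stage s as a fragment before calling it. */
        stages[s](data, count);
    }
}
```

Because the order is fixed in the table, the only synchronization point is the end of the pipeline rather than the end of every stage's caller.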

Use Cases

● Others:
● Precondition data for GPU shaders
● Special Effects
● Animation Customizations
● Most PPU Code :)

Contact Info

● Mike Acton

[email protected]@insomniacgames.com

See also: www.insomniacgames.com/tech

Credit

● Thanks guys for putting the SPU Shader ideas to the test and for your comments:
● Eric Christensen
  – Principal Engine Programmer
● Joe Valenzuela
  – Engine Programmer
● André De Leiradella
  – Consultant for Engine Team

Credit

● Also thanks for the feedback from the guys on the Beyond3D CellPerformance forums:
● ebola, minty, patsu, Shifty Geezer (if just to tell me he didn't like the name), and LordOfThePing.