artem-gerasimovich
How to reach 60 fps in iPhone games
Advanced Mobile Optimizations
How to go to 60 fps after you have removed all Sleep calls ;-)
Disclaimer
• The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity
Optimization Mindset
• you can't just make your game faster
– there is no magic bullet
– very specific stuff
• not the same as scripting a character
Optimization Mindset
• in no specific order:
• know
• think
• measure
Optimization Mindset
• You can't avoid any of that
– no, really
Optimization Mindset
• know + think = shoot in the dark
– you just write code hoping for the best
• know + measure = shoot in the dark
– you are missing the "understand" part
• think + measure = shoot in the dark
– you solve an abstract problem, not the real one
Optimization Mindset: know + think
• hardware is more complex than you think
– highly parallel
– deep pipelining
– even when you write asm, it is already high-level
Optimization Mindset: know + measure
• knowledge is static
• knowledge comes from the past
• knowledge is general
Optimization Mindset: know + measure
• qsort vs bubble sort
– sure, qsort is faster
• but you are missing the point
– maybe radix?
– maybe no need to sort?
– maybe insertion?
– parallel sorting network?
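To make the "maybe radix?" option concrete, here is a minimal sketch of an LSD radix sort. It assumes the sort keys are plain 32-bit unsigned integers (the slides do not say what is being sorted), and `radix_sort_u32` is a hypothetical name chosen for illustration. Whether it beats qsort on your data is exactly the kind of thing you have to measure.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for 32-bit unsigned keys: four passes, one byte per pass.
// Assumption: keys are plain uint32_t values with no attached payload.
void radix_sort_u32(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Histogram of the current byte.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t k : keys) ++count[(k >> shift) & 0xFF];

        // Exclusive prefix sum turns counts into start offsets.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }

        // Stable scatter into the scratch buffer, then swap buffers.
        for (std::uint32_t k : keys) scratch[count[(k >> shift) & 0xFF]++] = k;
        keys.swap(scratch);
    }
}
```

The access pattern is linear reads plus a 256-bucket scatter, with no comparisons and no branches that depend on the data, which tends to suit in-order cores; the cost is a second buffer of the same size.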
Optimization Mindset: think + measure
• solving an abstract problem
– example: GPU
• optimizing for RIVA TNT and GTX is different
Optimization Mindset
• well, if you are missing two of the three
– no comment
Know
• your hardware
• your data
– knowing your data is interleaved with thinking
– we will talk more about it in "Think"
Know your hardware
• GPU
• CPU
• whatever
– e.g. disk load speed
Know your hardware: GPU
• Pipeline
– meaning: slow step = slow everything
– you are as slow as your bottleneck
• Know your pipeline
• Won't go into full pipeline spec
– see the Resources section
• Just common/biggest problems
Know your hardware: GPU Geometry
• pre/post-T&L cache
– should use indexed geometry or not
• cache hit rate
– strips vs tri list
• memory throughput
– vertex size
• fetch cost (memory)
– pack attributes or not
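The "vertex size" and "pack attributes or not" points can be illustrated with two layouts for the same vertex. This is only a sketch: the attribute set, the names `FatVertex`, `PackedVertex` and `pack_snorm8`, and the chosen encodings are assumptions for illustration, and whether packing pays off depends on the GPU, so measure on the target.

```cpp
#include <cmath>
#include <cstdint>

// Everything stored as 32-bit floats: simple, but 48 bytes per vertex.
struct FatVertex {
    float position[3];
    float normal[3];
    float uv[2];
    float color[4];
};

// Same attributes with lower-precision encodings: 24 bytes per vertex.
struct PackedVertex {
    float         position[3];  // kept at full precision
    std::int8_t   normal[4];    // signed normalized bytes, 4th byte is padding
    std::uint16_t uv[2];        // unsigned normalized 16-bit
    std::uint8_t  color[4];     // unsigned normalized bytes
};

static_assert(sizeof(FatVertex) == 48, "unexpected padding in FatVertex");
static_assert(sizeof(PackedVertex) == 24, "unexpected padding in PackedVertex");

// Encode a float in [-1, 1] as a signed normalized byte.
inline std::int8_t pack_snorm8(float v) {
    if (v < -1.0f) v = -1.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<std::int8_t>(std::lround(v * 127.0f));
}
```

Halving the vertex size halves the memory traffic per fetched vertex; the price is reduced precision and, on some hardware, extra unpacking work in the vertex program.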
Know your hardware: GPU Textures
• Texture Cache
– swizzle
– compression
– mip-maps
• Biggest memory hog
Know your hardware: GPU Shaders
• VertexProgram vs FragmentShader
– balancing
– attributes
• Unified Shaders
– load balancing
• Precision
– GLES: highp/mediump/lowp
– Cg: float/half/fixed (iirc)
Know your hardware: GPU Rasterization
• Fillrate (memory speed)
– alpha
• 2x2 samples (or more)
– why geometry LOD matters
Know your hardware: CPU
• Mobile = in-order RISC
– for stupid code far worse than CISC
• 2 main issues:
– Memory speed
– Computation speed
Know your hardware: CPU Memory
• This is the single most important factor
– memory access is far slower than computation
• Latency vs Throughput
• Caches
– fast memory
– your best friend
– L1/L2/whatever
• LHS (load-hit-store)
Know your hardware: CPU Computations
• SIMD
– better memory usage
– better arithmetic usage (4 vals instead of 1)
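As a rough illustration of "4 vals instead of 1", the sketch below computes `out[i] = a[i] * scale + b[i]` four floats at a time with NEON intrinsics when they are available, and falls back to a scalar loop elsewhere. The function name `scale_add` and the operation itself are made up for illustration; this is not code from Unity.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// out[i] = a[i] * scale + b[i]
void scale_add(const float* a, const float* b, float scale,
               float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Vector path: one load, one multiply-accumulate and one store
    // handle four floats per iteration.
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmlaq_n_f32(vb, va, scale));  // vb + va * scale
    }
#endif
    // Scalar path: handles the tail, or everything when NEON is absent.
    for (; i < n; ++i) out[i] = a[i] * scale + b[i];
}
```

Note that the inputs are plain contiguous float arrays: the vector loads only pay off when the data is laid out so that four useful values sit next to each other in memory.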
Know your target hardware
• Those were general rules
• But you are running on that particular piece of sh... hardware
Know your target hardware: PowerVR
• TBDR
– perfect hidden surface removal
– Alpha-Test/discard
• shader precision
• unified shaders
• Tegra / ATI-AMD / Adreno more common
Know your target hardware: ARM
• VFP = FPU on steroids (not real SIMD)
– scalar instructions at same speed as vectorized
• NEON = SIMD
– more registers
– awesome load/store instructions
– not as cool as AltiVec but cool enough for mobiles
Know your target hardware: ARM
• Conditional execution of most instructions
• Fold shifts and rotates into the "data processing" instructions
– load structure from array by index
• Thumb + float = disaster
– switch back and forth between Thumb mode and regular 32-bit mode
Know your hardware: Resources
• RTR (Real-Time Rendering)
• lots of whitepapers:
– PowerVR (ImgTec), Tegra (NVIDIA), Adreno (Qualcomm)
– AMD/ATI: basically the same as X360, but much smaller tiles
• ARM dev center
Think
• Think about your data
• Think about your algorithms
• Think about your constraints
• Think about your hardware
Think Basics
• CPU vs GPU
– e.g. draw calls: pure CPU cost
• CPU:
– memory vs arithmetic: memory is slower
• GPU:
– vprog vs fshader
– memory vs arithmetic
Think Memory
• fragmentation
• data organization
– AoS vs SoA
– hot/cold split
• data structures
– linear vs random
– array vs list
– map vs hashtable
– allocators
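The AoS vs SoA and hot/cold points can be sketched as follows. The actor fields and the names `ActorAoS`, `ActorsSoA`, `add_actor` and `integrate` are hypothetical; the idea is only that the per-frame loop should touch nothing but the data it needs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// AoS: hot data (position, velocity) and cold data (name, score) share
// a struct, so the per-frame loop drags cold bytes through the cache.
struct ActorAoS {
    float x, y, z;
    float vx, vy, vz;
    std::string name;  // cold: rarely read
    int score;         // cold: rarely read
};

// SoA with a hot/cold split: every hot field is its own contiguous array,
// and cold fields live in separate arrays the update loop never touches.
struct ActorsSoA {
    std::vector<float> x, y, z;     // hot
    std::vector<float> vx, vy, vz;  // hot
    std::vector<std::string> name;  // cold
    std::vector<int> score;         // cold
};

void add_actor(ActorsSoA& a, float x, float y, float z,
               float vx, float vy, float vz, const std::string& name) {
    a.x.push_back(x);   a.y.push_back(y);   a.z.push_back(z);
    a.vx.push_back(vx); a.vy.push_back(vy); a.vz.push_back(vz);
    a.name.push_back(name);
    a.score.push_back(0);
}

// Per-frame update: linear reads and writes over the hot arrays only.
void integrate(ActorsSoA& a, float dt) {
    const std::size_t n = a.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        a.x[i] += a.vx[i] * dt;
        a.y[i] += a.vy[i] * dt;
        a.z[i] += a.vz[i] * dt;
    }
}
```

The SoA layout is also the one that maps directly onto SIMD loads; the trade-off is clumsier code whenever you need all fields of one actor at once.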
Think Constraints
• GPU: will you see the difference?
– really?
– on mobile screen?
– on that one small thingy in the corner?
• CPU: will you need that?
– e.g. physics in casual game?
• Memory: will you need that?
– will you need more than XXX actors?
Measure
• you didn't optimize anything if you didn't measure the difference
• you can't optimize if you don't know what needs to be optimized
– if you can't measure what takes time
Measure Tools
• there are lots of tools
– Instruments (iOS)
– PerfHUD (Tegra)
– Adreno Profiler (Qualcomm)
– some more probably
• Poor man's profiler
– timers
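A poor man's profiler can be as small as a scoped timer. The sketch below uses `std::chrono::steady_clock` from the standard library; the class name `ScopedTimer` and the optional output pointer are assumptions for illustration.

```cpp
#include <chrono>
#include <cstdio>

// Measures wall-clock time from construction to destruction and prints it.
// If out_ms is non-null, the elapsed milliseconds are also stored there.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label, double* out_ms = nullptr)
        : label_(label),
          out_ms_(out_ms),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        if (out_ms_) *out_ms_ = ms;
        std::printf("%s: %.3f ms\n", label_, ms);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* label_;
    double* out_ms_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrap the suspect code in a block with a `ScopedTimer` at the top, run it on the target device, and compare the numbers before and after a change. Printing has its own cost, so for hot paths it is better to accumulate into `out_ms` and report once per frame.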
Unity use case: random bits
• Mobile shaders
– specialized versions of the usual built-ins
• Skinning
– full NEON/VFP implementation
– usually 10-15% of the C-code time
– and we are not done optimizing it ;-)
• Rej's material-to-texture baking and, coming soon, BRDF-to-texture baking
Unity use case: random bits
• Remote Profiler
– run on target hw, data is transferred over wifi
– collect in Editor and show pretty graphs ;-)
• Sort alpha-test *after* opaque
• check *lots* of extensions
• LODs: almost done
• Vertex Cache optimization: after LODs ;-)
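The "sort alpha-test after opaque" point amounts to ordering the draw list by pass type, so that on a TBDR GPU the alpha-tested/discard geometry does not get in the way of hidden surface removal for the opaque geometry. A minimal sketch, assuming a hypothetical `DrawItem` record and a three-way pass split (this is not Unity's actual render queue):

```cpp
#include <algorithm>
#include <vector>

// Pass order used for sorting: opaque first, then alpha-tested geometry,
// then blended (transparent) geometry.
enum class Pass { Opaque = 0, AlphaTest = 1, Transparent = 2 };

struct DrawItem {
    int  id;    // hypothetical handle to mesh + material
    Pass pass;
};

// Stable sort keeps the existing relative order inside each pass,
// e.g. a back-to-front order already established for transparent items.
void sort_draw_items(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return static_cast<int>(a.pass) <
                                static_cast<int>(b.pass);
                     });
}
```

A real queue would usually add secondary keys (material, depth) within each pass; they are left out here to keep the ordering rule visible.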
Closing Words
• Know hardware
• Know data
• Think data
• Think constraints
• Measure always
– You better know earlier
• You should always be optimizing
Questions