3/23/2009

Game Developers Conference 2009

Programming Tips For Scalable Graphics Performance

March 25, 2009, Room 2010

Luis Gimenez, Graphics Architect

Ganesh Kumar, Application Engineer

Katen Shah, Graphics Architect

Agenda

• Why Optimize for Scalable Graphics

• Intel® GMA Series Architecture and Tools

• Balance Work Load Between CPU and GPU

• Minimize Runtime and Driver Overhead

• Optimize Shader Performance

• Case Study

• Q&A


Developing for Integrated Graphics Allows You to Sell Your Game to More Customers!

[Chart: PC Graphics Market Segment. Y axis: Millions (0 to 300). X axis: 2007 to 2013. Series: Desktop Integrated, Desktop Discrete, Mobile Integrated, Mobile Discrete.]

Source: Mercury Research (Q4’08)

Scale Your Game!

Intel® Integrated Graphics (IIG) Architecture

[Block diagram. Labeled blocks: Cmd Streamer, VF, VS, GS, Clip, Setup, Rast / Early-Z, SO, Pixel Ops, Thread Dispatch, Array of Execution Units (Row 0 to Row N, each with EU0, EU1, ... EUn), I$ Cache, Texture Cache, Sampler, Render Cache, Video Processing, 2D Display, Memory / Cache, Internal buses, Memory, Commands.]

Intel® GMA 3 & GMA 4 Series support SM4

Intel’s New Graphics Performance Analyzers

Today 2:30 PM – 3:30 PM in Room 3004, West Hall

[Screenshots: FRAME ANALYZER, SYSTEM ANALYZER]

Optimization Hints For Intel® Integrated Graphics

How to avoid frequent pitfalls found in testing integrated graphics playability over numerous games every year

• Balance Workload Between CPU and GPU

• Minimize Runtime and Driver Overhead

• Optimize Shader Performance

Balance The Workload between the CPU and the GPU

OCEAN FOG DEMO

GPU:

• Massive Data Parallelism

• Per Pixel Lighting

• Shadows

• Post Processing

• Blending

CPU:

• Complex Algorithms

• Physics/AI

• Simulation

• Animation

• Pre-computing

Pre-computing the Perlin textures on the CPU and using the GPU for rendering nearly doubled the frame rate

http://software.intel.com/en-us/articles/ocean-fog-using-direct3d-10/

Maximize CPU and GPU Utilization: Avoid Stalling the Pipeline!

To avoid stalling the CPU minimize …

• CPU data read-back

• Serializing Event Queries

[Diagram: read-back through a staging resource. The command buffer holds Render, Command, Command. 1. CopyResource: the GPU copies its output to a staging resource. 2. The CPU calls Map() on the resource. 3. The CPU stalls until flush.]

Put space between locks …

• Synchronize to N-1 to N-2 frames

[Timeline diagrams: CPU frames (F0 to F5) and GPU frames (F0 to F4). The unspaced timeline is marked STALL and STUTTERING; the spaced timeline is marked N-2 SYNCH at CPU F3.]
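The "synchronize to N-1 to N-2 frames" advice can be sketched as plain bookkeeping. This is a minimal model, not Direct3D code: `DeferredReadbackRing` and its `submit` method are hypothetical names, and in a real Direct3D 10 renderer each slot would own a staging resource or query that is mapped or polled only once it is old enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: remembers which frame issued a read-back in each slot,
// so the CPU only consumes results that are kLatency frames old.
template <std::size_t kLatency>
class DeferredReadbackRing {
    static_assert(kLatency > 0, "latency must be at least one frame");

    struct Slot {
        std::uint64_t frame = 0;  // frame that issued the copy or query
        bool issued = false;      // false until the slot is used once
    };

public:
    // Call once per frame, with consecutive frame numbers, after submitting
    // that frame's copy or query. Returns true and sets readyFrame when a
    // result from kLatency frames ago can be read without waiting on the
    // frame that was just submitted.
    bool submit(std::uint64_t frame, std::uint64_t& readyFrame) {
        Slot& slot = slots_[frame % kLatency];
        const bool ready = slot.issued;
        if (ready) {
            readyFrame = slot.frame;  // oldest outstanding read-back
        }
        slot.frame = frame;           // reuse the slot for the new frame
        slot.issued = true;
        return ready;
    }

private:
    Slot slots_[kLatency];
};
```

With `DeferredReadbackRing<2>`, the first two frames return nothing to read, and from frame 2 onward each call hands back the result issued two frames earlier. The trade-off is that the application works with data that is two frames old.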

• The IIG driver optimizes the workload before sending it to the GPU

Reduce CPU work and optimize driver performance by reducing …

• State Changes

• Creation and Destruction of Resources

[Pipeline diagram. Software stack: App, Direct3D, Intel Driver. Pipeline stages: Cmd Parser, Vertex Shader, Geometry Shader, Stream Out, Clipper, Setup / Rasterization, Pixel Shader, Output Merger. Memory resources: Commands, Vertex Buffers, Index Buffer, Texture…, Buffer, Depth / Color, Display Buffer.]


Minimizing Runtime and Driver Overhead

Manage Your DirectX 10 Resources!

• DirectX 10 manages resources based on USAGE and CPU_ACCESS_FLAG

• The best memory location is decided by OS/driver/memory manager

[Table: DX10 Usage / Update Freq, Access, Resource Update, Use]

NON-MAPPABLE

IMMUTABLE
  Update frequency: Never
  Access: GPU read
  Resource update: Create…(); load at creation, never updated
  Use: Static VBs / IBs / Textures

DEFAULT
  Update frequency: <= 1 per frame
  Access: GPU read-write
  Resource update: Copy…(), Update…(); use only for CBs and small textures
  Use: VBs / IBs / CBs / Textures

MAPPABLE

DYNAMIC
  Update frequency: > 1 per frame
  Access: CPU write, GPU read
  Resource update: Copy(); Map() w. WRITE_NO_OVERWRITE for partial update of VBs/IBs; WRITE_DISCARD for full update or CBs
  Use: Dynamic update of VBs / IBs / CBs

STAGING (transfer data to the GPU)
  Access: CPU read-write, GPU indirect read/write
  Resource update: Map() for write to mapped memory, WRITE / DO_NOT_WAIT_FLAG to avoid stalls; Copy…() from staging resource to video memory
  Use: Texture updates

STAGING (CPU read-back from GPU)
  Access: CPU read-write, GPU indirect read/write
  Resource update: Copy() GPU output to staging resource; Map() for read w. DO_NOT_WAIT_FLAG to avoid stall
  Use: Surfaces for read-back
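The usage table can be read as a small decision rule. The sketch below is one hedged way to encode it: `Usage`, `AccessPattern` and `chooseUsage` are hypothetical names rather than Direct3D types, and the rule covers only the update-frequency and read-back columns (the staging upload path for texture updates is left out).

```cpp
#include <cassert>

// Hypothetical mirror of the four DirectX 10 usage classes in the table.
enum class Usage { Immutable, Default, Dynamic, Staging };

// Hypothetical description of how the application touches a resource.
struct AccessPattern {
    double cpuUpdatesPerFrame;  // 0 means filled once at creation
    bool   cpuReadsBack;        // CPU needs to read GPU output
};

// Picks a usage class from the update frequency, following the table:
// never -> IMMUTABLE, at most once per frame -> DEFAULT,
// more than once per frame -> DYNAMIC, CPU read-back -> STAGING.
inline Usage chooseUsage(const AccessPattern& p) {
    if (p.cpuReadsBack) {
        return Usage::Staging;
    }
    if (p.cpuUpdatesPerFrame <= 0.0) {
        return Usage::Immutable;
    }
    if (p.cpuUpdatesPerFrame <= 1.0) {
        return Usage::Default;
    }
    return Usage::Dynamic;
}
```

For example, a static level mesh maps to `Usage::Immutable`, while a particle vertex buffer rewritten several times per frame maps to `Usage::Dynamic`.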

Minimizing Runtime and Driver Overhead

Optimize Your Constants Access!

• The IIG driver optimizes the most frequently used constants for DX9/10

  – Avoid global constants

  – Limit dynamically indexed constants, such as C[a0] and C[r]

• In DX10, when a constant changes the complete buffer gets updated

  – Group cbuffers by frequency of updates

  – Organize cbuffers based on feature scaling

  – Inside a cbuffer, put constants in access sequence

  – Inside cbuffers, pack data into float4 boundaries

[Image: Fog Demo]

http://software.intel.com/en-us/articles/directx-constants-optimizations-for-intel-integrated-graphics/
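The cbuffer grouping and float4 packing advice can be illustrated on the CPU side. The sketch below assumes the application mirrors each cbuffer with a C++ struct; all struct and field names are hypothetical, and the split into per-frame and per-object data is only one example of grouping by update frequency.

```cpp
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };
struct Float3 { float x, y, z; };

// Updated once per frame: camera and global lighting (hypothetical contents).
struct alignas(16) PerFrameConstants {
    Float4 viewProjRow[4];  // four full float4 registers
    Float3 sunDirection;    // three floats ...
    float  fogDensity;      // ... plus one scalar fill the fifth float4
};

// Updated for every draw call, so kept in its own buffer (hypothetical contents).
struct alignas(16) PerObjectConstants {
    Float4 worldRow[4];
    Float4 tintColor;
};

// Each buffer is a whole number of float4 (16-byte) registers.
static_assert(sizeof(PerFrameConstants) % 16 == 0, "pad to a float4 boundary");
static_assert(sizeof(PerObjectConstants) % 16 == 0, "pad to a float4 boundary");
// The scalar shares a register with the float3 instead of starting a new one.
static_assert(offsetof(PerFrameConstants, fogDensity) == 76, "unexpected layout");
```

Because a change to any constant updates the complete buffer, keeping the per-object data out of `PerFrameConstants` means a new object transform does not force the camera and lighting constants to be uploaded again.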

Minimizing Runtime and Driver Overhead

Batch Your Primitives!

• Use large batches >200-1K primitives

• Minimize State Changes between batches

• Use Instancing for Small Batches

http://software.intel.com/en-us/articles/rendering-grass-with-instancing-in-directx-10/
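One common way to minimize state changes between batches is to sort the draw list by state before submission. The sketch below is a hedged illustration with hypothetical types: the ids stand in for real shader, texture and mesh objects, and `countStateChanges` only estimates how often state would be re-bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t meshId;
};

// Orders draws so that items sharing a shader, then a texture, are adjacent.
// stable_sort keeps the original order among draws with identical state.
inline void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) {
            return std::tie(a.shaderId, a.textureId) <
                   std::tie(b.shaderId, b.textureId);
        });
}

// Counts how many shader or texture re-binds the list would need
// if it were submitted in its current order.
inline std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].shaderId != items[i - 1].shaderId) ++changes;
        if (items[i].textureId != items[i - 1].textureId) ++changes;
    }
    return changes;
}
```

Draws that end up adjacent with identical state are also natural candidates for merging into one larger batch or an instanced draw.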


Optimizing Shader Performance

Skip Computes that do not Render!

• Test for visibility to reject objects that fall outside the view frustum

• Maximize Use of Early-Z (cost 4 pixels/clock hardware)

  – Avoid modified Z value (oDepth) in the pixel shader

• Use Occlusion Query for complex scenes

• Use LOD to reduce complexity for objects that are distant
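The visibility test and LOD selection above can be sketched on the CPU as follows. This is a minimal example under stated assumptions: objects are bounded by spheres, the six frustum planes have unit-length normals pointing into the frustum, and the LOD distance thresholds are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0, normal pointing into the frustum.
struct Plane { Vec3 normal; float d; };

struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.normal.x * p.x + pl.normal.y * p.y + pl.normal.z * p.z + pl.d;
}

// Returns false only when the sphere lies completely outside one plane,
// so the object can be skipped before any draw call is issued.
inline bool isVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& pl : frustum) {
        if (signedDistance(pl, s.center) < -s.radius) {
            return false;
        }
    }
    return true;
}

// Picks a level of detail from the view distance (0 = most detailed).
// The thresholds are hypothetical and would be tuned per game.
inline int selectLod(float distance) {
    if (distance < 20.0f) return 0;
    if (distance < 60.0f) return 1;
    return 2;
}
```

The test is conservative: a sphere near a frustum corner may pass even though it is not visible, which costs some rendering work but never drops a visible object.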

Optimizing Shader Performance

Optimize the Use of the Intel Integrated Graphics HW!

[Block diagram repeated from the architecture slide: Cmd Streamer, VF, VS, GS, Clip, Setup, Rast / Early-Z, SO, Pixel Ops, Thread Dispatch, Array of Execution Units, I$ Cache, Texture Cache, Sampler, Render Cache.]

• For best EU utilization, minimize register usage

• Keep a ratio of more than 4:1 instructions per texture sample

• Large shaders impact performance due to the limited number of registers

• Smart Usage of Flow Control

• Mask alpha when not needed

• Minimize use of transcendentals like LOG, POW, EXP etc.

• Pre-load Shaders to avoid Mid-Scene Compiles

• Avoid Mid-Scene texture changes

Optimizing Shader Performance

Scale Your Pixel Shader and Textures!

• Keep your Textures under 256x256 and same format if possible

• Prefer Multi-texture over Multi-Pass

• Use Compressed Textures and mip-maps

• Use Texture arrays / Texture Atlas

• Minimize Lock/Blit of Z and/or Stencil Buffer

• Use Shadow Maps for IIG and Stencil Shadows as scalable feature

• Minimize Clear() surfaces

• Minimize post processing passes

Optimizing for IIG: Demigod

Key Lessons Learned from Optimizing Demigod for IIG

Be Wary of ‘Clear’ Calls

Why:

- Costlier than you might think

- Affects every pixel on surface

Recommendations:

- Make sure unused surfaces don’t get cleared unnecessarily

- Consider reducing surface resolution when in lower LOD

- Clear Color, Stencil and Z-Buffer in the same API call

Prune Costly Clear Calls

Reduce the Number of Texture Fetches

• Texture cache is limited on integrated graphics

• Reducing Texture sizes alone doesn’t help as much

• Optimize Shaders by reducing texture fetches in Low Fidelity modes

• Balance Texture load instructions with arithmetic instructions if possible

Simplify Post Processing Effects

• Post Processing Effects that use multiple passes

  – Bloom

  – Motion Blur

  – Depth of Field

  – High Dynamic Range

• Balance visual quality with speed by reducing the number of passes

Demigod Bloom Effect

[Images labeled Before and After. Captions: Bloom turned Off, Bloom On with Fewer Passes.]

Avoid Pixel Overdraw

• Render opaque objects from Front to Back

  - Render UI and other HUDs first

  - Render Sky and Terrain last

• Early-Z architecture eliminates occluded pixels early in the pipeline
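Front-to-back ordering of opaque geometry can be sketched as a sort on view depth. The record type below is hypothetical, and the sketch assumes each draw already carries a distance from the camera; a real renderer would likely combine this with state-based ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical opaque draw record; viewDepth is the distance from the camera.
struct OpaqueDraw {
    int   id;
    float viewDepth;
};

// Nearest objects first, so that Early-Z can reject pixels of the
// farther objects drawn later.
inline void sortFrontToBack(std::vector<OpaqueDraw>& draws) {
    std::sort(draws.begin(), draws.end(),
        [](const OpaqueDraw& a, const OpaqueDraw& b) {
            return a.viewDepth < b.viewDepth;
        });
}
```

With this ordering a HUD element at depth 0 is drawn first and a sky dome at the far plane is drawn last, matching the two sub-bullets above.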

Example of Back to Front Rendering

Moving Terrain Rendering to the End

Lastly, Add Benchmark Mode to Your Game for Performance Profiling!

It helps to characterize the workload

Four key requirements a benchmark must provide:

1. Accurately reflect real workload

2. Repeatability

3. Ability to run standalone without Internet

4. Ability to automate: built-in demo, command-line execution and output to a log file
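The automation requirement (built-in demo, command-line execution, log output) could look roughly like the sketch below. Everything here is an assumption for illustration: the flag names `--benchmark`, `--demo` and `--log`, the `BenchmarkConfig` fields, and the log line format are not taken from any particular game.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical benchmark settings.
struct BenchmarkConfig {
    bool        enabled  = false;
    std::string demoName = "builtin";        // built-in demo to replay
    std::string logPath  = "benchmark.log";  // where results are written
    unsigned    seed     = 12345u;           // fixed seed keeps runs repeatable
};

// Parses command-line style arguments (flag names are illustrative).
inline BenchmarkConfig parseArgs(const std::vector<std::string>& args) {
    BenchmarkConfig cfg;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "--benchmark") {
            cfg.enabled = true;
        } else if (args[i] == "--demo" && i + 1 < args.size()) {
            cfg.demoName = args[++i];
        } else if (args[i] == "--log" && i + 1 < args.size()) {
            cfg.logPath = args[++i];
        }
    }
    return cfg;
}

// Formats one log line from recorded frame times in milliseconds.
inline std::string summarize(const BenchmarkConfig& cfg,
                             const std::vector<double>& frameMs) {
    const double total = std::accumulate(frameMs.begin(), frameMs.end(), 0.0);
    const double avgMs = frameMs.empty() ? 0.0 : total / frameMs.size();
    std::ostringstream out;
    out << "demo=" << cfg.demoName
        << " frames=" << frameMs.size()
        << " avg_ms=" << avgMs;
    return out.str();
}
```

The game would write the string returned by `summarize` to `cfg.logPath` at the end of the run, so a script can launch the benchmark, wait for exit, and collect results without user input.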

Summary

Scale Your Game for Integrated!

• Balance CPU and GPU Workload, Avoid Stalls

• Minimize Run Time and Driver Overhead

• Optimize your shader performance by scaling your game

• Analyze your game, find your most expensive call

• Balance your visual effects against performance penalties

• Add benchmark mode to your game

Additional Resources

Developers Guide for Intel® Integrated Graphics

• http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide

Articles Mentioned in this Presentation

• http://software.intel.com/en-us/articles/ocean-fog-using-direct3d-10

• http://software.intel.com/en-us/articles/directx-constants-optimizations-for-intel-integrated-graphics/

• http://software.intel.com/en-us/articles/rendering-grass-with-instancing-in-directx-10

Intel® Graphics Performance Analyzer

• www.intel.com/software/gpa

Intel® Graphics Community

• http://softwarecommunities.intel.com/communities/visualcomputing

Integrated Graphics Software Development Forum

• http://softwarecommunities.intel.com/isn/Community/en-US/forums/2414/ShowForum.aspx

Intel® Laptop Gaming TDK

• http://softwarecommunities.intel.com/articles/eng/1017.htm

The gateway to Intel’s worldwide technology, engineering and go-to-market support for Visual Computing developers

• Enhance Your Products and Your Business

• Get the “Story Behind the Story”

• Training the Next Generation

• Investing in Talent and Technology

• See What’s New

• Developers Connecting with Intel Engineers

www.intel.com/software/visualadrenaline

For More Information

http://www.intel.com/software/gdc

Contact info

See Intel at GDC:

- Intel Booth at Expo, North Hall

- Intel Interactive Lounge – West Hall 3rd floor

Take a collateral DVD

- Here in the room!

- Intel Booth or Interactive Lounge

Intel @ GDC

Wednesday, March 25

• Programming Tips for Scalable Graphics
  10:30 AM – 11:30 AM in Room 2010, West Hall

• Threaded AI For the Win!
  12:00 PM – 1:00 PM in Room 2011, West Hall

• Intel’s New Graphics Performance Analyzers
  2:30 PM – 3:30 PM in Room 3004, West Hall

• Kaboom: Real-Time Multi-Threaded Fluid Simulation for Games
  4:00 PM – 5:00 PM in Room 2011, West Hall

Thursday, March 26

• Who Moved the Goalposts? The Rapidly Changing World of CPU’s and Optimization
  1:30 PM – 2:30 PM in Room 2011, West Hall

• Taming Your Game Production Demons: the Offset approach
  3:00 PM – 4:00 PM in Room 2011, West Hall

• Optimizing Game Architectures with Intel Threading Building Blocks
  4:30 PM – 5:30 PM in Room 2011, West Hall

Last of Intel @ GDC

Friday, March 27

• Procedural and Multi-Core Techniques to take Visuals to the Next Level
  9:00 AM – 10:00 AM in Room 2010, West Hall

• Rasterization on Larrabee: A First Look at the Larrabee New Instructions (LRBni) in Action
  9:00 AM – 10:00 AM in Room 135, North Hall

• SIMD Programming on Larrabee: A Second Look at the Larrabee New Instructions (LRBni) in Action
  10:30 AM – 11:30 AM in Room 3002, West Hall

Risk Factors

This presentation contains forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent Form 10-Q or 10-K filing available on our website for more information on the risk factors that could cause actual results to differ.

Rev. 4/17/07

Backup Slides


Both Intel GMA 3 and 4 support DirectX 10

Make your Scaling API Independent!

[Diagram: Game Scaling. Columns: DX8, DX9, DX10. Rows: High Detail, Standard Detail, Low Detail. Vertical label: Recommendation.]

Both Intel GMA 3 and 4 support all required D3D10 Features

• D3D10 Optional Features

  - MSAA: only single sample supported

  - 32-bit FP Filtering: not supported

  - 16-bit UNORM Blending: supported in GMA X4XXX and beyond

  - RGB32 RT: not supported

  - Use ID3D10Device::CheckFormatSupport to check for supported formats

• Other D3D10 performance considerations

  – Limit use of GS; make it a scalable feature

  – Use different Stream Out buffers for different SO formats

Check for optional features before using them