Programmability Features of Graphics Hardware

Preview:

Citation preview

Programmability Features of Graphics Hardware

Michael DoggettATI Research

MDoggett@ati.com

2GH Programmability

Michael Doggett

OutlineGraphics HardwareTransform Stage

Vertex EngineOpenGL ARB vertex program

Fragment StagePixel Pipeline

OpenGL ARB fragment programExamples

MandelbrotFFTDisplacement Mapping

Programming OptionsOpenGL Shading Language overview

Computation on GPUs

3GH Programmability

Michael Doggett

Graphics Hardware

Hardware pipeline

Rasterization

Surface

FrameBuffer

Fragment

Transform

4GH Programmability

Michael Doggett

Graphics Hardware

Hardware pipelineBased on RADEON 9800

380MHzSIMD stages

TransformFragment

Highly parallel4 component vector registers and operationsHigh computation/bandwidth ratio

Arithmetic intensity

Rasterization

Surface

Fragment

TransformT T T T

F F F FF F F F

FragmentAA A AA A A AMemory

5GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex data

Rasterization

Surface

FrameBuffer

Fragment

TransformVertices

Surface

6GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex dataVertex and Fragment programs Rasterization

Surface

FrameBuffer

Fragment

Transform

Programs

Programs

7GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex dataVertex and Fragment programsTextures

Random memory access

Rasterization

Surface

FrameBuffer

Fragment

Transform

Textures

8GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex dataVertex and Fragment programsTextures

Random memory access

Output – 2D ImageColor, depth and stencil

Rasterization

Surface

FrameBuffer

Fragment

Transform

Image

9GH Programmability

Michael Doggett

Graphics Hardware

Focus on programmable stages

TransformFragment

Rasterization

Surface

FrameBuffer

Fragment

Transform

Textures

Vertices

Surface

Programs

Programs

Image

10GH Programmability

Michael Doggett

Graphics Hardware Evolution

StateStateStateFrameBuffer

32 tex/ 64 alus16e7 Floating PointPS 2.0

12 tex/16 alus3.12 Fixed pointPS 1.4

StateFragmentStateStateStateRasterization

VS 2.0256 instrsControl Flow

VS 1.1128 instrs

StateTransformCPU/GPUCPU/GPUCPUSurface

Stages987DirectX

9700 (R300)8500 (R200)7500 (R100)ATI RADEON200220012000Year

11GH Programmability

Michael Doggett

Surface Generation Stage

Newest stageDX8 NPatches

State controlledGeometric calculations based on complex surfaces

R

S

FB

F

T

S

12GH Programmability

Michael Doggett

Transform Stage

Includes:ModelView TransformationVertex LightingPerspective TransformationTweening/Skinning

Per-vertex operationsOriginally microcoded on DSPs or CPU

Controlled by fixed function state

Programmable Vertex EngineLindholm et al., SIGGRAPH 2001

R

S

FB

F

T

S

13GH Programmability

Michael Doggett

Vertex Engine

4 parallel vertex enginesInput

Vertex stream Constants

OutputPosition, Color, Tex Coords

Program (Shader) has up to 256 instructions

VertexEngine

Vertex Data

Output

1616

14GH Programmability

Michael Doggett

Vertex EngineRegisters

Four component floating point vectors

12 Read-write temp registersOutput registers

position, color, fogcoord, pointsize, texcoord

Vertex shader outputs are pixel shader inputs

VertexEngine

Vertex Data

Address

Constant

Temp

Output

11

256256

1212

15GH Programmability

Michael Doggett

Vertex Program InstructionsBasic arithmetic operators

ADD, MAD, MUL, SUB

ComparisonMAX, MIN, SGE, SLT

Dot and cross ProductDP3, DP4, DPH, XPD

Exponential functionsEX2, EXP, LG2, LOG

Other ABS, ARL, DST, FLR, FRC, LIT, MOV, POW, RCP, RSQ, SWZ

16GH Programmability

Michael Doggett

Register Modifiers

Source swizzle selects which components to use

iPosition.[xyzw][xyzw][xyzw][xyzw]e.g. iPosition.yzxw

Destination mask to select individual componentoColor.{x}{y}{z}{w}

Source negation-iNormal

17GH Programmability

Michael Doggett

Simple Vertex Program!!ARBvp1.0ATTRIB iPos = vertex.position;PARAM mvp[4] = { state.matrix.mvp };PARAM ambientCol = state.lightprod[0].ambient;OUTPUT oPos = result.position;OUTPUT oColor = result.color;

# Transform the vertex to clip coordinates.DP4 oPos.x, mvp[0], iPos;DP4 oPos.y, mvp[1], iPos;DP4 oPos.z, mvp[2], iPos;DP4 oPos.w, mvp[3], iPos;

# Write out a color.MOV oColor, ambientCol;END

!!ARBvp1.0ATTRIB iPos = vertex.position;PARAM mvp[4] = { state.matrix.mvp };PARAM ambientCol = state.lightprod[0].ambient;OUTPUT oPos = result.position;OUTPUT oColor = result.color;

# Transform the vertex to clip coordinates.DP4 oPos.x, mvp[0], iPos;DP4 oPos.y, mvp[1], iPos;DP4 oPos.z, mvp[2], iPos;DP4 oPos.w, mvp[3], iPos;

# Write out a color.MOV oColor, ambientCol;END

18GH Programmability

Michael Doggett

RADEON 9800 Vertex Shader

DirectX 9.0 Vertex Shader 2.0Same arithmetic instructionsConstant based control flow capabilitiesLoops, branches, subroutines

CALL, LOOP, ENDLOOP, JUMP, JNZ, LABEL, REPEAT, ENDREPEAT, RETURN16 Integer constants16 Boolean constants Loop counter

19GH Programmability

Michael Doggett

Rasterization Stage

Includes:Triangle SetupViewport ClippingViewport TransformRasterization

Not programmable, some control through stateHigh precision vertex parameter interpolators

Position, normal, color, tex coords

R

S

FB

F

T

S

20GH Programmability

Michael Doggett

Fragment Stage

Includes:Texturing

arbitrary memory fetch (Gather)fixed point filtering

Point, Linear, Bi-linear, Tri-linear, Anisotropic Fragment (Per-Pixel) Lighting

RADEON 9700 introduced floating point32 Texture and 64 ALU instructionsCo-issue vector and scalar instruction

R

S

FB

F

T

S

21GH Programmability

Michael Doggett

PP

Pixel Pipeline

8 parallel floating point pixel pipelinesInput

Color, TexcoordsOutput

Color, DepthOutput surfaces

16, 32 bit float16 bit fixed

Color TextureCoords

88

Color

22GH Programmability

Michael Doggett

Pixel Pipeline

RegistersFour component 24bit floating point

performance and precision

12 read-write temp registers16 Texture imagesTexture instructions

KIL, TEX, TXB, TXP

PP

Color

ConstantRegisters

TextureCoords

3232

88

TextureSamplers

1616

Color

Temp

1212

23GH Programmability

Michael Doggett

Simple Fragment Program!!ARBfp1.0 TEMP temp; ATTRIB tex0 = fragment.texcoord[0]; # input registerATTRIB col0 = fragment.color; PARAM half = 0.5, 0.5, 0.5, 0.5; # constant

OUTPUT out = result.color;

#Fetch textureTEX temp, tex0, texture[0], 2D; #modulate and write out colorMUL out, col0, temp; END

!!ARBfp1.0 TEMP temp; ATTRIB tex0 = fragment.texcoord[0]; # input registerATTRIB col0 = fragment.color; PARAM half = 0.5, 0.5, 0.5, 0.5; # constant

OUTPUT out = result.color;

#Fetch textureTEX temp, tex0, texture[0], 2D; #modulate and write out colorMUL out, col0, temp; END

24GH Programmability

Michael Doggett

Computing the Mandelbrot set

Test each point on the complex number planeX dimension is the real componentY dimension is the imaginary componentZ’ = Z2 + CC = starting position on complex planeIterate until Z > 2

25GH Programmability

Michael Doggett

Mandelbrot main loopMUL pos.xy, curr, curr; # real componentADD pos.x, pos.x, -pos.y; # x2 – y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary componentMAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitudeSUB magnitude, magnitude, four; # compare magnitude to 4CMP escape.x, magnitude, 0, 1; ADD pos.z, pos.z, escape.x;MOV curr, pos; # ready next iteration

26GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

27GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

28GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

29GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

30GH Programmability

Michael Doggett

Mandelbrot multipass demo

Main loop is 11 instructionsNeed multiple iterations to get detailMultipass

Render to texturePass 1: Render output to framebufferSet framebuffer data as texturePass 2 to N: Read in previous result as texture Pass N+1: Draw texture on screen

31GH Programmability

Michael Doggett

Mandelbrot demo

32GH Programmability

Michael Doggett

Render to texture

Render data

Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 1

33GH Programmability

Michael Doggett

Render to texture

Render dataSet data as texture input to Fragment stage Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 1

34GH Programmability

Michael Doggett

Render to texture

Render dataSet data as texture input to Fragment stageRender to second output

Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 1

Output 2

35GH Programmability

Michael Doggett

Render to texture

Render dataSet data as texture input to Fragment stageRender to second outputSwitch output and texture bufferPing-pong buffers

Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 3

Output 2

36GH Programmability

Michael Doggett

Fast Fourier Transform

Transform image into frequency domainComplex weights for a summation of sine waves

Separable FilterCooley and Tukey 65, decimation in time

log2 (width) + log2 (height) + 2 rendering passesPing-pong between two floating point renderabletextures

37GH Programmability

Michael Doggett

Fast Fourier TransformHorizontal scramble

dependent texture read using precalculated position bit invertlog2(width) butterfly passes

texture read offset, betaVertical scramblelog2(height) butterfly passesMore details

ShaderX2 chapter, “Advanced Image Processing with DirectX 9 Pixel Shaders”, Mitchell, Ansari, Hart RNL seperable 50x50 gaussian blur filter

Jason Mitchell’s course notes from course 7. Real-Time Shading

38GH Programmability

Michael Doggett

FFT demo

39GH Programmability

Michael Doggett

Multiple Render Targets

Multiple render targets4 possible output buffersAll same X, Y and bit resolutionDifferent formats

ATI_draw_buffers extensionadds result.color[n] where n=0,1,2,3

OUTPUT oColor1 = result.color[1];

40GH Programmability

Michael Doggett

Render to texture

pbuffersOpenGL ARB SuperBuffers

Adds GLmem typecan be used as

framebuffertexturevertex array

41GH Programmability

Michael Doggett

Displacement mappingusing render to vertex array

2 Pass algorithm1st pass

Displacement fragment programOutput to SuperBuffer Vertex Array

using ATI_draw_buffersVertex, Normal, TexCoord

2nd pass Vertex program uses displaced Vertex BufferFragment program does bump mapped lighting

42GH Programmability

Michael Doggett

Displacement Mapping Demo

43GH Programmability

Michael Doggett

Programming OptionsLow or High level

Assembler or High Level Shading languagesAssembler

OpenGL Vertex Program and Fragment ProgramDirectX Vertex Shader and Pixel Shader

Shading Languagesthe OpenGL Shading LanguageDirectX9 HLSLNVIDIA’s Cg

ToolsATI’s RenderMonkey

44GH Programmability

Michael Doggett

the OpenGL Shading Language

C like language for programming GPUsBasically the same language for vertex and fragment shadersInputs and outputs similar to Vertex Engine and Pixel Pipeline

different syntax e.g. gl_TexCoord0, gl_FragColorType qualifiers

const // compile-time constantuniform // constant across primitive

Typesfloat, int, bool, vec4, ivec3

45GH Programmability

Michael Doggett

the OpenGL Shading Language

Structures and arraysOperators work with 1-4 component typesFlow controlUser defined functionsBuilt-in functions

sin, cos, normalize, noise,

46GH Programmability

Michael Doggett

Simple GLSL examplevertex shadervec4 LightPos = vec4 ( 0.4, 0.6, 0.6, 0.0 );vec4 DiffuseColor = vec4( 0.50, 0.15, 0.25, 1.0 ); vec4 LightColor = vec4( 0.8, 0.5, 0.8, 1.0 );

void main(void){

vec3 eyeNormal;

gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;eyeNormal = gl_NormalMatrix * gl_Normal;

gl_TexCoord0 = vec4( eyeNormal, 0.0 ); // eye space normalgl_TexCoord1 = LightPos;gl_TexCoord2 = DiffuseColor; gl_TexCoord3 = LightColor;

}

47GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

48GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

49GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

50GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

51GH Programmability

Michael Doggett

Computation on GPUsFast Computation of Generalized Voronoi Diagrams Using Graphics Hardware

Hoff et al. 99use polygon scan-conversion and depth comparison

Interactive Multi-Pass Programmable ShadingPeercy00Treat the OpenGL architecture as a general SIMD computer

Physically-Based Visual Simulation on Graphics Hardware

Mark Harris et. al., GH02Coupled map lattice stored in a texture

continuous values on a discrete latticesimulations of convection, reaction-diffusion, and boiling

52GH Programmability

Michael Doggett

Computation on GPUsSimulation and Computation, GH03 session

Simulation of Cloud Dynamics on Graphics HardwareHarris et al.

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

Goodnight et al. The FFT on a GPU

Moreland et al.

GPUs as Stream Processors, GH03 panelData Parallel Computing on Graphics Hardware, Ian Buck“Brook” stream API

objective is to abstract computational functionality

53GH Programmability

Michael Doggett

Computation on GPUs

SIGGRAPH2003 sessionThursday, 3.45-5.30Bolz et. al., Sparse matrix solversKrüger et. al., Krüger, Westermann

Linear Algebra Operators for GPU Implementation of Numerical AlgorithmsFramework for linear algebra operatorsNavier-Stokes Demo

54GH Programmability

Michael Doggett

Summary

GPU ishighly parallelvery programmable

low and high levels

increasing in performance and maintaining a low costdriven by a demanding consumer application

55GH Programmability

Michael Doggett

Questions ?

More informationwww.ati.com/developerdevrel@ati.comCourse 22. The OpenGL Shading Language

Monday, Half Day, 1:45 - 5:30 pm SuperBuffers and other GL extensions

Tuesday, exhibit hall C, 1 - 3pm.www.opengl.org

AcknowledgementsEvan Hart, Marwan Ansari, Bill Licea-Kane, Arcot Preetham, James Peercy

Recommended