Transform and Lighting


NVIDIA Corporation

Executive Summary

NVIDIA has created a discontinuity in the PC market with the introduction of the GeForce 256™ GPU (Graphics Processing Unit). The introduction of the GPU with integrated hardware transform and lighting engines is a 3D graphics breakthrough that enables a new class of applications.

The GPU delivers a level of realism that was previously impossible without multimillion-dollar graphics supercomputers. Animated characters running on a GPU become lifelike, detailed characters with complex facial expressions and smooth movement. The worlds these characters inhabit can now be lush with organic life (trees, bushes, plants) and with the architectural and structural details that we take for granted in the real world.

In the pre-GPU era, walls for buildings and rooms were placed with the fewest details possible, which resulted in a 3D vacuum. This vacuum will now be filled by the GPU with the details of the real world: realistic furniture, appliances, lights, consumer items, clothing and vehicles. In short, the GPU world will approach what we see in the real world.

• CPU power, which was previously held hostage to calculating the sparse pre-GPU world, will now be freed for other tasks, such as physics, inverse kinematics and sophisticated artificial intelligence. The combination of a GPU with a highly utilized CPU will enable a new class of user experience.

• This paper covers the transform and lighting capabilities of the GPU and how users will experience them.

Transform and Lighting

Transform and lighting (T&L) are the first two of the four major steps a GPU computes in the 3D graphics pipeline. They are highly compute intensive, with a set of very specific mathematical instructions performed billions of times per second to render a scene.

To understand the role of T&L, it is helpful to have a general understanding of the entire process of creating 3D graphics, starting with the application itself. The major functional steps are shown in Figure 1.

The Role of the Transform Engine

The process of describing and displaying 3D graphics objects and environments is complex. To reduce this complexity, it is useful to describe the 3D data according to different frames of reference, or different coordinate systems, at different times. These different frames of reference are referred to as “spaces”, such as world space, eye space and screen space. Each of these spaces is convenient for one or more operations that must be performed as a 3D image is created. World space is used for holding all of the 3D objects that are part of the 3D world. Eye space is used for lighting and culling, and screen space is used for storing the scene in the graphics frame buffer. However, because these spaces use different coordinate systems, 3D data must be converted or “transformed” from one space to another as it moves through the 3D pipeline. The transform engine performs all of these mathematical transformations. Figures 2 and 3 give visual examples of world space and screen space.
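The movement of a vertex between spaces can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the paper: the camera is axis-aligned and looks down the negative Z axis, the projection is a plain perspective divide, and the function names, focal length and 640x480 screen size are hypothetical.

```python
def world_to_eye(point, camera_pos):
    """Translate a world-space point so that the camera sits at the origin.

    Assumption: the camera is not rotated and looks down the -Z axis.
    """
    return tuple(p - c for p, c in zip(point, camera_pos))


def eye_to_screen(eye_point, focal_length, width, height):
    """Perspective-project an eye-space point to pixel coordinates."""
    x, y, z = eye_point
    if z >= 0:
        raise ValueError("point is behind the camera")
    # Dividing by the depth makes distant points land nearer the screen centre.
    sx = focal_length * x / -z
    sy = focal_length * y / -z
    # Map the [-1, 1] range to pixels; screen Y grows downward.
    px = (sx + 1.0) * 0.5 * width
    py = (1.0 - sy) * 0.5 * height
    return px, py


camera = (0.0, 0.0, 5.0)
vertex_world = (1.0, 1.0, 0.0)
vertex_eye = world_to_eye(vertex_world, camera)            # (1.0, 1.0, -5.0)
vertex_screen = eye_to_screen(vertex_eye, 1.0, 640, 480)   # approximately (384, 192)
print(vertex_eye, vertex_screen)
```

A real transform engine folds both steps into 4x4 matrix multiplications, as described in Appendix A; the separate functions here only make the change of reference frame visible.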


Figure 1. The 3D Graphics Pipeline

Application Tasks: The application controls the movement of objects (including the camera) and their interaction in the 3D world. Problems such as realistic physics calculations (objects have momentum) and collision detection (did my race car bump the wall or bump another car?) affect how an object moves in the scene.

Scene-level Tasks (culling, LOD, display list): Scene-level tasks include object-level culling (car #3 is completely behind the camera, so it doesn’t need to be sent to the next stage), selecting the appropriate detail level (car #1 is far away, so a low-detail model is best), and creating the list of all objects in the current camera view.

Transform: The transform engine converts 3D data from one frame of reference to a new frame of reference. The system must transform the data to the current view before performing the following steps (lighting, triangle setup and rendering). Every object that is displayed and some objects that are not displayed must be transformed every time the scene is redrawn.

Lighting: Lighting is the next step in the 3D pipeline and provides high visual impact. Lighting effects are essential for enhancing the realism of a scene and bringing rendered images one step closer to our perception of the real world.

Triangle Setup and Clipping: The triangle setup engine is a floating-point math processor that receives vertex data and calculates all of the parameters that the rendering engine will require. This unit ‘sets up’ and ‘clips’ the triangle for the rendering engine.

Rendering: Rendering is calculating the correct color for each pixel on the screen, given all of the information delivered by the setup engine. The rendering engine must consider the color of the object, the color and direction of light hitting the object, whether the object is translucent, and what textures apply to the object. If rendering is done in multiple stages, it is called multi-pass rendering.



Figure 2. World Space

Image courtesy of Pixar.

World space includes everything. The position and orientation of all items are needed to accurately calculate transformations into screen space. (Figure labels: X, Y and Z axes of the world coordinate space, light sources, screen, and view point or camera.)

Figure 3. Screen Space

Screen space, or view space, is the view presented to the user. One of the first processes in the graphics pipeline is to transform the scene (with light sources) into screen space. (Figure labels: world coordinate space, screen coordinate space.)

3/31/98 3D Tutorial - Copyright NVIDIA Corp.


The Role of the Lighting Engine

The lighting engine is similar to the transform engine because it has a set of mathematical functions that it must perform. The GeForce 256 GPU uses separate transform and lighting engines so that each engine can run at maximum efficiency. Without discrete engines, transform performance is limited by sharing compute time with the lighting computation. The lighting engine calculates distance vectors from lights to objects in the 3D scene as well as distance vectors from objects to the viewer’s eyes. A vector, by definition, contains information about direction and distance. The lighting engine must also separate the length or distance information from the direction information because that simplifies future steps in the 3D pipeline. Lighting calculations are used for vertex lighting, but they are also critical for other effects, such as advanced fog effects, that are based on the eye-to-object distance rather than just the Z-value of the object.
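The difference between fog driven by the Z-value and fog driven by the true eye-to-object distance can be illustrated with a short Python sketch. The linear fog formula, the fog start and end distances, and the function names are assumptions chosen for illustration; the paper does not specify how the GeForce 256 computes fog.

```python
import math


def linear_fog(depth, fog_start, fog_end):
    """Fraction of the object's colour that survives the fog (1 = clear, 0 = fully fogged)."""
    t = (fog_end - depth) / (fog_end - fog_start)
    return max(0.0, min(1.0, t))


def eye_distance(eye_point):
    """Length of the eye-to-object vector for a point given in eye space."""
    return math.sqrt(sum(c * c for c in eye_point))


centre = (0.0, 0.0, -10.0)   # straight ahead of the viewer
edge = (10.0, 0.0, -10.0)    # same Z depth, but farther from the eye

# Z-based fog cannot tell the two points apart.
z_fog_centre = linear_fog(-centre[2], 5.0, 25.0)
z_fog_edge = linear_fog(-edge[2], 5.0, 25.0)

# Distance-based fog makes the point at the edge of the view foggier.
range_fog_centre = linear_fog(eye_distance(centre), 5.0, 25.0)
range_fog_edge = linear_fog(eye_distance(edge), 5.0, 25.0)

print(z_fog_centre, z_fog_edge, range_fog_centre, range_fog_edge)
```

The distance-based version needs exactly the vector length that the lighting engine already computes, which is why the two features share hardware well.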

Why Users Want Integrated Transform Engines in a GPU

Separate transform and lighting engines integrated into one chip, such as in the GeForce 256 GPU, are absolutely necessary for graphics processors to continue to advance the user experience. 3D graphics performance has scaled rapidly over the past four years, with the primary emphasis placed on pixel fill rate and texture-mapping capabilities. This emphasis has rewarded users with fast frame rates for fluid navigation in 3D environments, but it has left the geometry transform and lighting calculations to the host CPU. The problem is that CPUs only double in speed every 18 months, while graphics processors advance eight times in the same period. Geometry processing (T&L) performance has now become the most significant barrier to more sophisticated 3D graphics on a typical PC. The graphics processor spends too much time waiting for the CPU! This bottleneck has forced software developers to limit the geometric complexity of their 3D characters and environments, sacrificing image quality and elaborate 3D environments in order to maintain expected performance. Integrated transform engines are the most cost-effective way to alleviate the geometry performance bottleneck and open up new opportunities for software developers to add greater geometric detail to their 3D worlds.

Transform performance dictates how finely software developers may tessellate the 3D objects they create, how many objects they can put in a scene and how sophisticated the 3D world itself can be. Tessellation is the process of converting curved lines into a series of line segments that approximate the original curve. This idea extends to three-dimensional objects as curved surfaces are converted into polygons that approximate the surface. This creates a classic performance-versus-quality trade-off for the software developer because finer tessellation will result in more polygons and slower performance, but with the reward of higher quality. Figure 4 shows a simple example of a sphere tessellated by different degrees.
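The trade-off is easy to see in two dimensions. The following Python sketch tessellates a unit circle into straight line segments and reports how far the segment chain falls short of the true circumference. The function names and the segment counts are illustrative choices, not values from the paper.

```python
import math


def tessellate_circle(radius, segments):
    """Return `segments` points evenly spaced on a circle.

    Joining consecutive points with straight lines approximates the curve.
    """
    return [
        (radius * math.cos(2 * math.pi * i / segments),
         radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]


def perimeter(points):
    """Total length of the closed chain of line segments through `points`."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += math.hypot(x1 - x0, y1 - y0)
    return total


true_length = 2 * math.pi * 1.0
for n in (8, 16, 64):
    approx = perimeter(tessellate_circle(1.0, n))
    error_pct = 100 * (true_length - approx) / true_length
    print(f"{n:3d} segments: length error {error_pct:.2f}%")
```

Every additional segment is another vertex to transform, so the error falls only as the geometry workload rises.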


Figure 4. Tessellated Spheres

Each of the images in Figure 4 represents the same sphere, but the image on the far right is clearly the most realistic of the three. It also has almost five times as many polygons as the image on the far left and three times as many polygons as the middle image. This means that the ‘sphere’ on the right requires almost five times as much transform performance as the leftmost sphere to maintain a given frame rate for the user. Computer users prefer frame rates in the 30 fps to 60 fps range and will often adjust the ‘quality’ settings of the application (rendering features, resolution, etc.) to make sure that the frame rate is in this range.

This simple “performance-versus-quality” trade-off becomes more complex when one considers that a current 3D scene requires hundreds to thousands of objects that must share the transform and lighting ability of the CPU. Software developers must budget how much detail to put in each model and how many models to use in the scene so that the overall quality of the scene is maximized and performance is maintained within an acceptable range. If a jungle scene is the goal, lots of trees and bushes are necessary; a single tree and bush does not convey the idea of a jungle.
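The budgeting exercise described above is simple arithmetic, sketched here in Python. All of the numbers (a transform throughput of 3,000,000 polygons per second and a scene of 200 objects) are hypothetical values chosen for illustration, not measurements of any CPU or GPU.

```python
def polygons_per_frame(throughput_per_second, frames_per_second):
    """Polygons that can be transformed in one frame at the target frame rate."""
    return throughput_per_second // frames_per_second


def polygons_per_object(frame_budget, object_count):
    """Average detail available to each object if the budget is split evenly."""
    return frame_budget // object_count


THROUGHPUT = 3_000_000   # hypothetical polygons transformed per second
OBJECTS = 200            # hypothetical number of objects in the scene

for fps in (30, 60):
    frame_budget = polygons_per_frame(THROUGHPUT, fps)
    per_object = polygons_per_object(frame_budget, OBJECTS)
    print(f"{fps} fps: {frame_budget} polygons per frame, {per_object} per object")
# 30 fps: 100000 polygons per frame, 500 per object
# 60 fps: 50000 polygons per frame, 250 per object
```

Doubling the frame rate or the object count halves the detail available to each model, which is the compromise the text describes.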

The trade-off between performance and higher geometric complexity (to create higher image quality) would not be severe if the transform performance of today’s PCs were dramatically higher. The GPU’s goal is to deliver performance and quality that push the limits of human perception. The reality today is that even the fastest CPU forces software developers to make severe compromises in both performance and quality to deliver 30 fps to 60 fps.

Why Users Want Integrated Lighting Engines in a GPU

Integrated lighting is a critical requirement in 3D graphics, because it has such a dramatic impact on image quality. Lighting calculations are an effective way of adding subtle and not-so-subtle changes in brightness to 3D objects in a manner that mimics our real-world experiences. That ability to render realistic scenes is crucial because it makes 3D graphics appeal to a broad audience. The more realistic 3D graphics scenes become, the more impact and applicability they will have. 3D applications run the gamut from entertainment to education to science, industry, medicine, data analysis and more. Lighting effects add impact to each of those applications because they create realistic scene detail. The human eye is actually more sensitive to changes in brightness than it is to changes in color.


Simply put, lighting effects take greater advantage of the natural abilities of the human eye. This means that an image with lighting effects will communicate more information to the viewer in the same amount of time. Figure 5 shows an architectural scene that demonstrates the potential of lighting effects to add realism to a scene. Without integrated lighting, objects cannot react accurately to complex changes in eyepoint relative to light sources.

Figure 5. Lighting Example

Image courtesy of Digital Illusion.

Lighting in 3D graphics is divided into two major components to simplify the task of modeling the physics of light. Those two components are diffuse lighting and specular lighting. Diffuse lighting assumes the light hitting an object scatters in all directions equally, so the brightness of the reflected light does not depend at all on the position of the viewer. Sunlight on a playground is a real-world example of diffuse lighting. The brightness of an object in the scene is primarily determined by the diffuse lighting calculations. Specular lighting is different because it does depend on the position of the viewer as well as the direction of light and orientation of the triangle being rendered. Shining a spotlight into a dark corner of a room onto a TV set and looking at the hot spots on the picture tube will show you specular lighting. Specular lighting captures the mirror-like properties of an object so effects such as reflection and glare are achievable. Figures 6 and 7 show two examples of a space station from the 3D WinBench® benchmark that demonstrate the differences between diffuse and specular lighting effects.
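The viewer dependence that separates the two components can be shown with a minimal Python sketch. The paper does not name specific lighting equations, so this example assumes two common textbook models: the Lambert cosine term for diffuse light and the Phong reflection term for specular light. The function names and the shininess exponent of 32 are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse(normal, to_light):
    """Lambert term (assumed model): depends only on the surface and the light."""
    return max(0.0, dot(normal, to_light))


def specular(normal, to_light, to_viewer, shininess):
    """Phong term (assumed model): also depends on where the viewer stands."""
    n_dot_l = dot(normal, to_light)
    if n_dot_l <= 0.0:
        return 0.0  # the light is behind the surface
    # Mirror the light direction about the surface normal.
    reflected = tuple(2.0 * n_dot_l * n - l for n, l in zip(normal, to_light))
    return max(0.0, dot(reflected, to_viewer)) ** shininess


normal = (0.0, 0.0, 1.0)
to_light = normalize((1.0, 0.0, 1.0))
viewer_on_mirror_path = normalize((-1.0, 0.0, 1.0))
viewer_elsewhere = normalize((1.0, 0.0, 1.0))

# The diffuse term takes no viewer argument, so it is identical for both viewers.
print("diffuse:", diffuse(normal, to_light))
# The specular highlight is visible only near the mirror direction.
print("specular on mirror path:", specular(normal, to_light, viewer_on_mirror_path, 32))
print("specular elsewhere:", specular(normal, to_light, viewer_elsewhere, 32))
```

Because the specular term changes whenever the viewer moves, it has to be recomputed every frame, which is the workload a hardware lighting engine takes on.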

Figures 6 and 7. Specular Highlights

Specular highlights move on the object if the viewer or the object moves relative to the light source. For this reason they cannot be pre-computed or static. Specular lighting is particularly useful for two effects in a 3D scene: displacement mapping and creating the appearance of different materials for objects.

Specular lighting is also important for representing different materials for objects in a 3D scene. A silk shirt looks different than a cotton shirt, even if they are the same color. A major difference is how the two materials reflect light, which is captured with specular lighting. Another example is polished stone such as marble compared to the same material before polishing. The polishing doesn’t change the color or pattern in the marble, but it does affect the way light is reflected. Specular lighting combined with texture mapping creates more realistic objects because they have the visual properties of real materials.

Specular lighting can be done without a dedicated hardware lighting engine, but only with a severe loss of performance. Texture mapping can be used for some specular lighting effects, but if the viewer moves around in the scene, then the environment maps must be re-calculated, which is a time-consuming process unless the graphics hardware supports cube environment mapping. Sphere mapping is the common alternative, but it suffers from performance and quality problems as the viewer moves around in the scene, making it unattractive for interactive 3D environments.

Panels in Figures 6 and 7: Diffuse Lighting Only; Diffuse and Specular Lighting


Where Transform & Lighting (T&L) Work is Done:

The Migration from CPU to GPU

All of the work in the 3D graphics pipeline is divided between the CPU and the graphics processor. The line that divides the CPU tasks from those performed on the graphics processor moves as the capabilities of the graphics processor continue to grow. 1999 is the year when graphics processors with integrated T&L engines can be sold at mainstream PC price points and create a compelling value proposition for any PC user. Figure 8 graphically shows the growing role in the last few years of the dedicated graphics processors for mainstream PCs. The complete graphics pipeline is now computed by the graphics processing unit, hence the term GPU.

Figure 8. The 3D Graphics Pipeline

Stage                        | 1996               | 1997               | 1998               | 1999
Application tasks            | CPU                | CPU                | CPU                | CPU
Scene-level calculations     | CPU                | CPU                | CPU                | CPU
Transform                    | CPU                | CPU                | CPU                | GPU
Lighting                     | CPU                | CPU                | CPU                | GPU
Triangle Setup and Clipping  | CPU                | Graphics Processor | Graphics Processor | GPU
Rendering                    | Graphics Processor | Graphics Processor | Graphics Processor | GPU

Application tasks (move objects according to application, move/aim camera) and scene-level calculations (object-level culling, select detail level, create object mesh) belong to the 3D application and API. The remaining four stages form the 3D graphics pipeline.

The transform and lighting functions of the 3D pipeline traditionally have been performed by general purpose microprocessors such as those used as CPUs in PCs. This was driven by the traditionally high cost of duplicating the floating-point functions required for T&L in the graphics processor coupled with the fact that most CPUs already contained complete floating-point units (FPUs). The cost of implementation has been a traditional barrier to adding these features to graphics products in mainstream PCs.

The cost equation was different for exotic Unix workstations, where non-integrated transform and lighting processors were available to those with unlimited budgets. As costs declined in the past 2 years, Windows NT® workstations were the next business opportunity, but those solutions were relegated to multi-board graphics add-in cards costing thousands of dollars. Prices continued to decline as semiconductor manufacturing allowed more transistors to be put on a single die. In 1999, the affordable GPU is possible with dedicated hardware T&L acceleration because it can now be integrated onto the same die with the other components of the 3D graphics pipeline. Silicon process technology has shattered this cost barrier and enabled NVIDIA’s GPU architects to integrate all functions on a single die.

Now that the GPU is available to high-volume PC designs, system designers will choose this new path to deliver a stunning visual experience to all users. The two key architectural benefits are:

• 1) Higher graphics performance – GPUs with integrated T&L engines will process those functions at two to four times the speed of the leading CPUs.

• 2) CPU power can be better utilized for functions such as physics calculations, artificial intelligence, other application functions, and managing system resources.

GPUs can easily outperform a CPU because a dedicated graphics engine focuses on the critical math functions only. The floating-point unit in a CPU must provide many other functions because its general-purpose nature requires that versatility. GPUs are not as versatile; however, they are extremely efficient in executing a specific set of graphics functions.

The mathematics requirements for a GPU engine are specific. T&L calculations rely heavily on matrix mathematics, specifically 4x4 matrix multiplication, and a few other vector operations. The focused number of operations that a T&L engine must support makes it straightforward to optimize the design for maximum silicon efficiency. That efficiency makes the throughput of a GPU highly predictable too. This predictability stems from the fact that buffers and pipelines can be optimized for these very specific requirements. When the CPU performs T&L calculations, the throughput is more variable because it varies with the rest of the workload imposed on the CPU. Most CPUs today offer some type of matrix math extensions such as MMX, SSE or 3DNow!®, but these extensions of the instruction set are, to a degree, general purpose as well. Those extensions must be able to process a variety of multimedia functions including audio, video and communications algorithms.

A hit 3D game running on a GPU also requires more than pretty graphics, or “eye candy.” It requires sophisticated artificial intelligence (AI), realistic physics and more complex game elements. The need for heavy artificial intelligence in strategy and sports games has limited pre-GPU titles to characters that are poorly defined. Developers have had to sacrifice realistic-looking graphics for basic game play since the CPU is required to handle both AI and graphics tasks. The GPU allows realistic soccer, football and basketball teams to have players that look, run, jump and play like real players. Realistic physics modeling enhances 3D applications by more accurately mimicking the real world. Objects in the real world have momentum, so objects in a 3D application should too. These functions alone can overwhelm a leading-edge CPU today, and the CPU still has all of its traditional functions to do as well, such as managing bus traffic, servicing the operating system’s requirements, and handling all application-level requirements. These functions cannot be offloaded to the graphics processor, but the GPU operations will be offloaded to improve both the overall graphics performance and the overall application performance.


The PC User Is the Ultimate Winner

PC users are the ultimate winners as T&L operations move from the CPU to GPU because the new architecture will deliver dramatically better performance at the same price. The higher performance comes from more efficient partitioning of tasks between the different processors in the PC system. The benefits are significant and include:

• Existing 3D applications can run faster.

• Software developers can use more advanced effects and graphics techniques, so the graphics in their applications will be more informative and more captivating.

• New applications using 3D as a medium emerge in the same way that the two-dimensional bit-mapped Graphical User Interface (GUI) enabled WYSIWYG desktop publishing rather than just word processing with text-based interfaces.

These benefits increase with the overall graphics performance of the PC. GPUs like the GeForce 256 will create a new level of price/performance for the typical PC user. This trend starts this year, in 1999, and will continue growing as every graphics processor includes integrated geometry acceleration.

© 1999 NVIDIA Corporation

NVIDIA, the NVIDIA logo, RIVA TNT, RIVA TNT2 and NVIDIA Vanta are trademarks of NVIDIA Corporation. RIVA, RIVA 128 and RIVA 128ZX are trademarks of NVIDIA Corporation and STMicroelectronics. Other company and product names may be trademarks or registered trademarks of the respective companies with which they are associated.


Appendix A

How Transform and Lighting Works: The Mathematics Behind the Magic

Transform and lighting operations are similar because they both require a set of mathematical functions that are repeated over and over again millions of times each second. The actual math capabilities required in a GPU are described here so the reader can appreciate why these functions lend themselves to implementation in highly optimized, dedicated processors rather than general purpose CPUs.

A transform operation is a 4x4 matrix-multiply operation. A vector representing 3D data, typically a vertex or a normal vector, is multiplied by a 4x4 matrix called the transform matrix and the result is the transformed vector. To do this, the transform engine uses standard linear algebra to multiply the matrices. Figure 9 shows a generic example of this operation.

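Figure 9. Matrix Multiplication

\[
\underbrace{\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{bmatrix}}_{\text{Transform Matrix}}
\times
\underbrace{\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}}_{\text{Original Vector}}
=
\begin{bmatrix} ax + by + cz + dw \\ ex + fy + gz + hw \\ ix + jy + kz + lw \\ mx + ny + oz + pw \end{bmatrix}
=
\underbrace{\begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix}}_{\text{Transformed Vector}}
\]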

Before a vector can be transformed, a transform matrix must be constructed. This matrix contains all of the information necessary to convert vector data to the new coordinate system. An interim matrix must be created for each action (scaling, rotation and translation) that should be performed on the vector and those interim matrices are multiplied together to create a single matrix that represents the combined effect of all of those actions. That single matrix is called the transform matrix and it can be re-used without being re-created. For example, once the transform matrix is constructed, it can be used to transform one vector or one million vectors. This ability to re-use the transform matrix effectively amortizes the setup time required to create the matrix over all the times it is used.
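A minimal Python sketch of this construct-once, re-use-many-times pattern follows. It assumes the column-vector convention of Figure 9 (the matrix is on the left, so the rightmost interim matrix is applied first); the helper names, the rotation about the Z axis and the sample values are illustrative choices rather than anything specified in the paper.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector (x, y, z, w)."""
    return tuple(sum(m[r][k] * v[k] for k in range(4)) for r in range(4))


def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


# Built once: scale by 2, then rotate 90 degrees about Z, then move 10 units along X.
transform_matrix = mat_mul(translation(10.0, 0.0, 0.0),
                           mat_mul(rotation_z(math.pi / 2), scaling(2.0)))

# Re-used for every vertex; the setup cost above is paid only once.
vertices = [(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)]
transformed = [transform(transform_matrix, v) for v in vertices]
print(transformed)  # approximately (10, 2, 0, 1) and (8, 0, 0, 1)
```

Each vertex costs one matrix-vector multiplication regardless of how many interim matrices were folded into `transform_matrix`.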

Intuitively, one might think that a 3x3 matrix is sufficient because this is a 3D graphics calculation. However, the original vector has a fourth component, w, which is used as a scaling factor for perspective correction and drives the transform matrix to be 4x4. A thorough discussion of w is beyond the scope of this paper.

Note that the number of actual mathematics operations for a single transform is substantial: 16 multiplication operations and 12 addition operations. The mathematics required for transforms is also very simple: only multiplication and addition are used.
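The operation count can be checked by instrumenting a straightforward matrix-vector multiplication. This Python sketch is only a counting aid with a hypothetical function name; it says nothing about how the hardware schedules the work.

```python
def transform_counting(matrix, vector):
    """Multiply a 4x4 matrix by a 4-vector and count the arithmetic operations used."""
    multiplies = 0
    adds = 0
    result = []
    for row in matrix:
        # The first product starts the running total, so it needs no addition.
        total = row[0] * vector[0]
        multiplies += 1
        for m, v in zip(row[1:], vector[1:]):
            total += m * v
            multiplies += 1
            adds += 1
        result.append(total)
    return result, multiplies, adds


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
result, multiplies, adds = transform_counting(identity, [3.0, 4.0, 5.0, 1.0])
print(result, multiplies, adds)  # [3.0, 4.0, 5.0, 1.0] 16 12
```

Each of the four rows contributes four multiplications and three additions, which gives the totals of 16 and 12 stated above.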

Part of the magic of using matrix multiplication for transform operations is that scaling, rotation and translation all take the same amount of time to perform. They also can be done simultaneously. This makes the performance of a dedicated transform engine predictable and consistent. Predictability allows software developers to know how their application will perform with a given amount of geometry, allowing them to make informed decisions regarding performance and quality.

Lighting calculations are computationally expensive. Typical lighting calculations compute distances and directions between light sources and objects. Lighting engines also perform some data conversion functions such as creating the normal vectors for triangles and vertices as well as “normalizing” those vectors. Normalizing a vector is the process of converting it to a new vector that has a length of one unit and points in the same direction as the original vector. A ‘normal’ vector to a vertex or to a triangle by definition is a vector that is perpendicular to the triangle or the vertex described. Perpendicularity to a vertex is defined as perpendicular to the surface that the vertex describes.
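One common way to create a triangle’s normal vector is to take the cross product of two of its edges and then normalize the result. The paper does not say which method the lighting engine uses, so the cross-product approach and the function names in this Python sketch are assumptions made for illustration.

```python
import math


def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    """Cross product: a vector perpendicular to both a and b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Return a unit-length vector pointing in the same direction as v."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def triangle_normal(p0, p1, p2):
    """Unit normal of a triangle whose vertices are listed counter-clockwise."""
    return normalize(cross(subtract(p1, p0), subtract(p2, p0)))


# A triangle lying in the XY plane has a normal along the Z axis.
normal = triangle_normal((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0))
print(normal)  # (0.0, 0.0, 1.0)
```

The raw cross product here is (0, 0, 6); normalizing keeps its direction and discards its length, which is what the lighting calculations need.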

The mathematics in a simplified lighting model is described here. The calculation of difference vectors involves only addition (subtraction is simply adding a negative value) of the xyz data for each of the objects and therefore requires three addition operations. For example, the difference vector between light source #1 at position (x1, y1, z1) and object #2 at position (x2, y2, z2) is:

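\[
\text{Difference Vector} =
\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} -
\begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} =
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
\]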

These vectors must be processed further into a scalar distance value and a normalized vector before leaving the lighting engine. These operations are more complex but very necessary because representing the difference vectors as separate values for distance and direction simplifies future steps in the 3D pipeline. Calculating the distance is a matter of squaring the x, y and z values, adding them and taking the square root of the sum, as shown in Equation 1.

Equation 1:

\[
\text{distance} = (x^2 + y^2 + z^2)^{1/2}
\]

Normalizing the vector requires dividing each of the x, y and z values by the distance. So the lighting engine must be able to multiply, add, take square roots and divide.
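The full sequence (difference vector, distance from Equation 1, then normalization) can be sketched in Python as follows. The function names and the sample positions are hypothetical; the arithmetic follows the steps described above.

```python
import math


def difference_vector(light_pos, object_pos):
    """Object position minus light position: three subtractions."""
    return tuple(o - l for l, o in zip(light_pos, object_pos))


def distance_and_direction(diff):
    """Split a difference vector into a scalar distance and a unit direction."""
    x, y, z = diff
    # Equation 1: three multiplications, two additions and one square root.
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        raise ValueError("light and object are at the same position")
    # Normalization: three divisions by the distance.
    return distance, (x / distance, y / distance, z / distance)


light_1 = (1.0, 2.0, 3.0)     # hypothetical light source position
object_2 = (4.0, 6.0, 3.0)    # hypothetical object position

diff = difference_vector(light_1, object_2)          # (3.0, 4.0, 0.0)
distance, direction = distance_and_direction(diff)   # 5.0 and (0.6, 0.8, 0.0)
print(diff, distance, direction)
```

Keeping `distance` and `direction` as separate values is the split that the text says simplifies later pipeline stages.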
