Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron. CUDA Lecture 2 History of GPUs

Slide 2: Graphics in a Nutshell

  • Make great images
      • Intricate shapes
      • Complex optical effects
      • Seamless motion
  • Make them fast
      • Invent clever techniques
      • Use every trick imaginable
      • Build monster hardware

Eugene d’Eon, David Luebke, Eric Enderton, in Proc. EGSR 2007 and GPU Gems 3

Slide 3: The Graphics Pipeline

  • Vertex Transform & Lighting
  • Triangle Setup & Rasterization
  • Texturing & Pixel Shading
  • Depth Test & Blending


Slide 5: The Graphics Pipeline

  • Vertex Transform & Lighting
      • Transform from “world space” to “image space”
      • Compute per-vertex lighting
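
The two steps of the vertex stage can be sketched on the CPU. This is a minimal illustration in plain Python rather than shader code. The 4x4 row-major matrix, the function names, and the simple diffuse (N · L) lighting term are assumptions made for the example; the slide does not say which lighting model is used.

```python
# Sketch of "vertex transform & lighting" on the CPU (illustrative only).

def mat_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def transform_vertex(mvp, position):
    """Take a world-space (x, y, z) point to image space.

    Applies the combined transform, then the perspective divide by w.
    """
    x, y, z, w = mat_vec4(mvp, [position[0], position[1], position[2], 1.0])
    return (x / w, y / w, z / w)

def lambert(normal, light_dir):
    """Per-vertex diffuse intensity: max(0, N . L) for unit vectors."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

# Hypothetical transform: uniform scale by 0.5, then translate x by +1.
MVP = [
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
light = (0.0, 0.0, 1.0)  # unit vector pointing along +z

# The same transform and lighting rule are applied to every vertex.
image_space = [transform_vertex(MVP, v) for v in vertices]
intensity = [lambert(n, light) for n in normals]
```

With these made-up inputs the three vertices land at (1, 0, 0), (2, 0, 0) and (1, 1, 0), and each gets intensity 1.0 because every normal points straight at the light.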

Slide 6: The Graphics Pipeline

  • Triangle Setup & Rasterization
      • Convert geometric representation (vertex) to image representation (fragment)
      • Interpolate per-vertex quantities across pixels
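
One common way to do both steps is with edge functions and barycentric weights. The sketch below uses that approach as an assumption; the slide does not name a method, and real hardware differs in many details. Function names and the sample triangle are hypothetical.

```python
# Sketch of rasterization: cover pixels, interpolate per-vertex values.

def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return {(x, y): value} for pixel centers inside the triangle.

    v0..v2 are 2-D image-space vertices; c0..c2 are scalar per-vertex
    quantities (for example a lighting intensity).
    """
    area = edge(v0, v1, v2)
    fragments = {}
    if area == 0:
        return fragments  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights: each is a sub-triangle area ratio.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                fragments[(x, y)] = w0 * c0 + w1 * c1 + w2 * c2
    return fragments

# Right triangle on a 4x4 grid; only vertex v1 carries the value 1.0.
frags = rasterize((0, 0), (4, 0), (0, 4), 0.0, 1.0, 0.0, 4, 4)
```

Here the triangle produces 10 fragments, and the interpolated value rises from 0.125 at pixel (0, 0) to 0.875 at pixel (3, 0) as the samples move toward v1.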

Slide 8: The Graphics Pipeline

  • Key abstraction of real-time graphics
  • Hardware used to look like this
      • One chip/board per stage
      • Fixed data flow through pipeline

[Pipeline diagram; surviving labels: Test & Blend, Framebuffer]

Slide 9: The Graphics Pipeline

  • Everything fixed function with a certain number of modes
      • Number of modes for each stage grew over time
      • Hard to optimize hardware
  • Developers always wanted more flexibility

Slide 10: The Graphics Pipeline

  • Remains a key abstraction
  • Hardware used to look like this
  • Vertex and pixel processing became programmable, new stages added
  • GPU architecture increasingly centers around shader execution

Slide 11: The Graphics Pipeline

  • Exposing an (at first limited) instruction set for some stages
  • Limited instructions and instruction types and no control flow at first
  • Expanded to full ISA

Slide 12: Why GPUs Scale So Nicely

  • Workload and programming model provide lots of parallelism
  • Applications provide large groups of vertices at once
      • Vertices can be processed in parallel
      • Apply same transform to all vertices
  • Triangles contain many pixels
      • Pixels from a triangle can be processed in parallel
      • Apply same shader to all pixels
  • Very efficient hardware to hide serialization bottlenecks
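
The point that one shader is applied independently to every pixel can be shown with a small sketch. The `shade` function and the pixel values are made up for the example. Python threads are used only to show that the work can be split freely; they are not meant to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(pixel):
    """Hypothetical per-pixel shader: invert an 8-bit gray value."""
    return 255 - pixel

pixels = list(range(0, 256, 16))  # 16 made-up gray values

# Serial reference: one pixel at a time.
serial = [shade(p) for p in pixels]

# Same shader, spread over 4 workers. No pixel depends on another,
# so the work can be split any way without changing the result.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(shade, pixels))
```

Both lists are identical. Because no pixel reads another pixel's result, adding more workers changes only how the work is divided, never the answer.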

Slide 13: With Moore’s Law…

[Diagram; surviving labels: Vertex, Vrtx 0, Pixel 0, Pixel 1, Pixel 2, Pixel 3]

Slide 14: More Efficiency

  • Note that we do the same thing for lots of pixels/vertices
  • A warp = 32 threads launched together
      • Usually execute together as well
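
As a small illustration of the grouping, the sketch below maps a linear thread index to a warp and a position within it, assuming consecutive thread indices share a warp. The helper names are hypothetical; only the warp size of 32 comes from the slide.

```python
WARP_SIZE = 32  # from the slide: a warp = 32 threads

def warp_of(thread_id):
    """Return (warp index, lane within the warp) for a linear thread id."""
    return thread_id // WARP_SIZE, thread_id % WARP_SIZE

def warps_needed(num_threads):
    """Number of warps needed to hold num_threads (rounds up)."""
    return (num_threads + WARP_SIZE - 1) // WARP_SIZE
```

For example, thread 31 is the last lane of warp 0, thread 32 is the first lane of warp 1, and 100 threads occupy 4 warps with the last one only partly filled.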
















Slide 15: What Is (Historical) GPGPU?

  • All this performance attracted developers
  • To use GPUs, they re-expressed their algorithms as general-purpose computations, using GPUs and the graphics API in applications other than 3-D graphics
      • Pretend to be graphics: disguise data as textures or geometry, disguise the algorithm as render passes
      • Fool the graphics pipeline into doing computation, to take advantage of the GPU’s massive parallelism
  • GPU accelerates the critical path of the application
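
The "disguise" can be emulated on the CPU to show the shape of the idea. In this sketch the input arrays play the role of textures, the per-element computation plays the role of a fragment program, and one call to a hypothetical `render_pass` helper stands in for drawing a full-screen quad. None of these names come from a real graphics API, and the SAXPY computation is just a convenient example.

```python
def render_pass(fragment_program, width, height, *textures):
    """Run fragment_program once per output pixel and collect results.

    Emulates drawing a full-screen quad: every pixel (x, y) of the
    render target gets exactly one invocation.
    """
    return [[fragment_program(x, y, *textures) for x in range(width)]
            for y in range(height)]

def saxpy_fragment(x, y, tex_x, tex_y):
    """Fragment program computing a*X + Y at one texel (a fixed at 2.0)."""
    a = 2.0
    return a * tex_x[y][x] + tex_y[y][x]

# Input arrays "disguised" as 2x2 textures.
X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[10.0, 20.0], [30.0, 40.0]]

result = render_pass(saxpy_fragment, 2, 2, X, Y)
```

The render target ends up holding [[12.0, 24.0], [36.0, 48.0]], which is the answer to a numerical problem that never involved an image.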

Slide 16: General Purpose GPUs (GPGPUs)

  • Data parallel algorithms leverage GPU attributes
      • Large data arrays, streaming throughput
      • Fine-grain SIMD parallelism
      • Low-latency floating point (FP) computation
  • Applications: see http://GPGPU.org
      • Game effects (FX) physics, image processing
      • Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting

Slide 17: Previous GPGPU Constraints

  • Dealing with graphics API
      • Working with the corner cases of the graphics API
  • Addressing modes
      • Limited texture size/dimension
  • Shader capabilities
      • Limited outputs
  • Instruction sets
      • Lack of integer & bit ops
  • Communication limited
      • Between pixels
      • Scatter: a[i] = p

[Diagram: Fragment Program with Input Registers, Temp Registers, Output Registers, and FB Memory; legend: per thread, per shader, per context]
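
The scatter limitation is easier to see next to its workaround. A fragment program of that era wrote only to its own output pixel, so a write to a computed address (a[i] = p) had to be recast as a gather, where each output position reads from computed addresses instead. The sketch below contrasts the two; the function names and data are hypothetical, and the brute-force search is only meant to show the extra work involved.

```python
def scatter(values, index, size):
    """Scatter: element i chooses where to write, out[index[i]] = values[i]."""
    out = [None] * size
    for i, v in enumerate(values):
        out[index[i]] = v
    return out

def scatter_as_gather(values, index, size):
    """Same result using only reads from computed addresses.

    Each output position j searches for the input that targets it,
    as a fragment program fixed to output pixel j would have to.
    """
    out = [None] * size
    for j in range(size):
        for i, target in enumerate(index):
            if target == j:
                out[j] = values[i]
    return out

values = ['a', 'b', 'c', 'd']
index = [2, 0, 3, 1]  # a permutation, so every slot is written once

direct = scatter(values, index, 4)
emulated = scatter_as_gather(values, index, 4)
```

Both calls return ['b', 'd', 'a', 'c'], but the gather version does work proportional to inputs times outputs, which hints at why this constraint was painful.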

Slide 18: Summary: Early GPGPUs

  • To use GPUs, re-expressed algorithms as graphics computations
      • Very tedious, limited usability
      • Still had some very nice results
  • This was the lead up to CUDA

Slide 19: Compute Unified Device Architecture (CUDA)

  • General purpose programming model
      • User kicks off batches of threads on the GPU
      • GPU = dedicated super-threaded, massively data parallel co-processor
  • Targeted software stack
      • Compute oriented drivers, language, and tools
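
What "kicking off a batch of threads" means can be emulated on the CPU. This sketch is plain Python, not CUDA: the `launch` helper is a hypothetical stand-in for a kernel launch, and the block and thread index parameters loosely mirror the indices that CUDA supplies to each thread. A real GPU would run the threads concurrently; here they run one after another.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical stand-in for a kernel launch.

    Calls kernel once per (block, thread) pair. A real GPU runs these
    concurrently; this loop runs them one after another.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def vec_add(block_idx, block_dim, thread_idx, a, b, c, n):
    """Every thread runs this same code on a different element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # the last block may have spare threads
        c[i] = a[i] + b[i]

n = 10
a = list(range(n))
b = [10 * v for v in a]
c = [0] * n

block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
launch(vec_add, grid_dim, block_dim, a, b, c, n)
```

With 10 elements and 4 threads per block, 3 blocks are launched, 2 of the 12 threads do nothing, and `c` ends up as [0, 11, 22, ..., 99]. Unlike the earlier graphics-based approach, nothing here has to pretend to be a texture or a render pass.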

Slide 20: Compute Unified Device Architecture (CUDA)

  • Driver for loading computation programs into GPU
      • Standalone driver, optimized for computation
      • Interface designed for compute: graphics-free API
      • Data sharing with OpenGL buffer objects
      • Guaranteed maximum download & readback speeds
      • Explicit GPU memory management

Slide 21: Example of Physical Reality behind CUDA

[Diagram; surviving label: GPU w/ local DRAM]


Slide 22: Parallel Computing on a GPU

  • 8-series GPUs deliver 25 to 200+ GFLOPS on compiled parallel C applications
      • Available in laptops, desktops, and clusters
  • GPU parallelism is doubling every year
  • Programming model scales transparently

[Pictured: GeForce 8800, Tesla D870]

Slide 23: Parallel Computing on a GPU

  • Programmable in C with CUDA tools
  • Multithreaded SPMD model uses application data parallelism and thread parallelism

[Pictured: Tesla S870]

Slide 24: Final Thoughts

  • GPUs evolve as hardware and software evolve
  • Five stage graphics pipelining
  • An example of GPGPU
  • Intro to CUDA

Slide 25: End Credits

  • Reading: Chapter 2, “Programming Massively Parallel Processors” by Kirk and Hwu.
  • Based on original material from
      • The University of Illinois at Urbana-Champaign: David Kirk, Wen-mei W. Hwu
      • The University of Minnesota: Weijun Xiao
      • Stanford University: Jared Hoberock, David Tarjan
  • Revision history: last updated 5/24/2011.