GPU Performance Trends
David Angulo Rubio, FAMU CIS Grad Student
Introduction
The GPU (Graphics Processing Unit) on video cards has evolved rapidly in recent years. GPUs have become extremely powerful and flexible:
Programmability
Precision
Power
GPGPU computing is an emerging field whose objective is to harness GPUs for general-purpose computation.
GPU Performance Trends
Motivation: Flexible and Precise
Modern GPUs are deeply programmable:
Programmable pixel, vertex, and video engines.
Solidifying high-level language support.
Modern GPUs support high-precision 32-bit floating point throughout the pipeline:
High enough for many (not all) applications.
Newest GPUs have 64-bit support.
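To see why 32-bit floating point is enough for many but not all applications, here is a minimal CPU-side sketch in Python. It assumes nothing about any particular GPU: the helper `to_float32` is a hypothetical name that rounds a 64-bit value to the nearest IEEE 754 32-bit float using the standard `struct` module.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back.

    Hypothetical helper for illustration; it only models the rounding
    that storing a value in a 32-bit register or buffer would cause.
    """
    return struct.unpack("f", struct.pack("f", x))[0]

# 2**24 is the largest value up to which float32 represents every integer.
big = 16_777_216.0

# In 32-bit precision the "+ 1" is rounded away.
lost_in_32_bit = to_float32(big + 1.0) == big

# In 64-bit precision the "+ 1" survives.
kept_in_64_bit = (big + 1.0) != big

print(lost_in_32_bit, kept_in_64_bit)
```

Long accumulations and large counters are the kind of workload where this rounding starts to matter, which is one reason 64-bit support is of interest.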
Stream Programming Abstraction
Streams:
A stream is a collection of data records.
All data is expressed in streams.
Kernels:
Inputs and outputs are streams.
Kernels perform computation on streams.
Kernels can be chained together.
[Diagram: stream -> KERNEL -> stream]
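The stream and kernel idea can be sketched on the CPU with Python generators. This is only a model of the abstraction, not GPU code. The kernel names `scale` and `offset` and the helper `chain` are hypothetical, chosen for illustration.

```python
from typing import Callable, Iterable, Iterator

def scale(stream: Iterable[float], factor: float = 2.0) -> Iterator[float]:
    """Kernel: consumes a stream, produces a stream of scaled records."""
    for record in stream:
        yield record * factor

def offset(stream: Iterable[float], delta: float = 1.0) -> Iterator[float]:
    """Kernel: consumes a stream, produces a stream of shifted records."""
    for record in stream:
        yield record + delta

def chain(*kernels: Callable) -> Callable:
    """Chain kernels so that the output stream of one feeds the next."""
    def run(stream):
        for kernel in kernels:
            stream = kernel(stream)
        return stream
    return run

pipeline = chain(scale, offset)
result = list(pipeline([1.0, 2.0, 3.0]))
print(result)  # [3.0, 5.0, 7.0]
```

Each kernel sees only streams at its input and output, which is what makes chaining possible.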
Stream Programming Abstraction
[Figure: dolphin triangle mesh]
Stream Programming Abstraction
Benchmark Funnel: In this simulation, a cloth falls into a funnel and passes through it under the pressure of a ball. This model has 47K vertices, 92K triangles, and a lot of self-collisions. Our novel GPU-based CCD algorithm takes 4.4 ms and 10 ms per frame to compute all the collisions on an NVIDIA GeForce GTX 480 and an NVIDIA GeForce GTX 285, respectively.
Stream Programming Abstraction
Why Streams?
Ample computation by exposing parallelism:
Streams expose data parallelism: multiple stream elements can be processed in parallel.
Pipeline (task) parallelism: multiple tasks can be processed in parallel.
Kernels yield high arithmetic intensity.
Efficient communication:
Producer-consumer locality.
Predictable memory access pattern.
Optimize for throughput of all elements, not latency of one:
Processing many elements at once allows latency hiding.
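Data parallelism means each stream element can be processed independently of the others. A rough sketch, assuming a hypothetical per-element `kernel` and using Python's standard thread pool as a stand-in for parallel hardware. Python threads will not speed up this arithmetic, so the example shows only the independence of elements, not GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x: float) -> float:
    """Hypothetical per-element kernel: depends only on its own input."""
    return x * x + 1.0

def run_kernel_parallel(stream, workers: int = 4):
    """Apply the kernel to every stream element, in any execution order.

    pool.map returns results in input order even though the elements
    may be processed concurrently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, stream))

stream = [float(i) for i in range(8)]
parallel_out = run_kernel_parallel(stream)
serial_out = [kernel(x) for x in stream]
print(parallel_out == serial_out)  # True
```

Because no element depends on another, the result is the same whatever order the elements are processed in, which is the property a throughput-oriented processor relies on.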
CPU-GPU Analogies
Stream / data array = Texture
Memory read = Texture sample
Structuring a GPU Program
The CPU assembles the input data.
The CPU transfers the data to the GPU (GPU “main memory” or “device memory”).
The CPU calls the GPU program (or set of kernels). The GPU runs out of GPU main memory.
When the GPU finishes, the CPU copies the results back into CPU memory.
Recent interfaces allow overlap.
What lessons can we draw from this sequence of operations?
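The sequence above can be sketched with a mock device in plain Python. `MockDevice` and its methods `upload`, `launch`, and `download` are hypothetical names, not part of any real GPU API. The class only models the fact that device memory is separate from CPU memory.

```python
class MockDevice:
    """Hypothetical stand-in for a GPU that owns its own memory."""

    def __init__(self):
        self.memory = {}

    def upload(self, name, host_data):
        # Copy: the device never shares storage with the host.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # The kernel reads and writes device memory only.
        self.memory[dst] = [kernel(x) for x in self.memory[src]]

    def download(self, name):
        # Copy results back into host memory.
        return list(self.memory[name])

def double(x):
    """Hypothetical kernel applied to every element."""
    return 2 * x

host_input = [1, 2, 3, 4]              # 1. CPU assembles the input data
device = MockDevice()
device.upload("in", host_input)        # 2. CPU transfers data to the GPU
device.launch(double, "in", "out")     # 3. CPU calls the GPU program
host_output = device.download("out")   # 4. CPU copies the results back
print(host_output)  # [2, 4, 6, 8]
```

One possible reading of the sketch: every launch is bracketed by two copies, so the amount of work done per transferred element is worth watching.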
Kernels
[Diagram: CPU and GPU columns; example step: ADVECT]
Kernel / loop body / algorithm step = Fragment program
You write one program. It runs on every vertex/fragment.
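The analogy between a CPU loop body and a fragment program can be sketched as follows. The advection rule used here (trace back along the velocity and take the nearest sample, in 1D) is an assumption made for illustration and is not taken from the slides. `advect_loop`, `advect_kernel`, and `run_on_every_fragment` are hypothetical names.

```python
def advect_loop(field, velocity, dt):
    """CPU style: an explicit loop whose body does the per-element work."""
    out = [0.0] * len(field)
    for i in range(len(field)):
        src = int(round(i - velocity[i] * dt))     # trace back along velocity
        src = max(0, min(len(field) - 1, src))     # clamp to the domain
        out[i] = field[src]
    return out

def advect_kernel(i, field, velocity, dt):
    """GPU style: the loop body alone, written once for a single element."""
    src = int(round(i - velocity[i] * dt))
    src = max(0, min(len(field) - 1, src))
    return field[src]

def run_on_every_fragment(kernel, n, *args):
    """Stand-in for the hardware invoking the program on every fragment."""
    return [kernel(i, *args) for i in range(n)]

field = [0.0, 1.0, 2.0, 3.0]
velocity = [1.0, 1.0, 1.0, 1.0]
loop_result = advect_loop(field, velocity, 1.0)
kernel_result = run_on_every_fragment(advect_kernel, len(field), field, velocity, 1.0)
print(loop_result, kernel_result)
```

The kernel version contains no loop: iteration over elements is left to whatever runs the program, which is the role the GPU plays.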
Conclusion
Can we apply these techniques to more general problems?
GPUs should excel at tasks that require ample computation, regular computation, and efficient communication.