
David Angulo Rubio, FAMU CIS Grad Student

Introduction

GPUs (Graphics Processing Units) on video cards have evolved rapidly in recent years. They have become extremely powerful and flexible, with gains in programmability, precision, and power.

GPGPU computing is an emerging field whose objective is to harness GPUs for general-purpose computation.

GPU Performance Trends

Motivation: Flexible and Precise

Modern GPUs are deeply programmable: they have programmable pixel, vertex, and video engines, and high-level language support is solidifying.

Modern GPUs support high-precision 32-bit floating point throughout the pipeline. This is high enough for many (not all) applications. The newest GPUs have 64-bit support.
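To illustrate why 32-bit floating point is enough for many but not all applications, here is a minimal sketch that rounds values to 32-bit precision on the CPU. The helper name `to_float32` is an assumption made for this example. It only mimics the storage format and does not run on a GPU.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 2**24 is the last point at which float32 can still count in steps of 1.
big = to_float32(16_777_216.0)

sum32 = to_float32(big + 1.0)  # the sum is rounded back to 32 bits
sum64 = big + 1.0              # the same sum kept in 64 bits

print(sum32)  # 16777216.0  (the +1 is lost)
print(sum64)  # 16777217.0
```

Accumulating many small values into a large one is a typical case where the extra bits of 64-bit support matter.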

Stream Programming Abstraction

Streams: a stream is a collection of data records. All data is expressed in streams.

Kernels: a kernel's inputs and outputs are streams. Kernels perform computation on streams and can be chained together.

[Figure: stream -> KERNEL -> stream]
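The stream and kernel abstraction can be sketched in a few lines of ordinary Python. This is only a CPU-side illustration under stated assumptions: `make_kernel`, `chain`, `scale`, and `offset` are hypothetical names introduced here, and generators stand in for streams.

```python
from typing import Callable, Iterable, Iterator

Stream = Iterable[float]
Kernel = Callable[[Stream], Iterator[float]]

def make_kernel(fn: Callable[[float], float]) -> Kernel:
    """Wrap a per-element function as a kernel: stream in, stream out."""
    def kernel(stream: Stream) -> Iterator[float]:
        return (fn(x) for x in stream)
    return kernel

def chain(*kernels: Kernel) -> Kernel:
    """Chain kernels so the output stream of one feeds the next."""
    def run(stream: Stream) -> Iterator[float]:
        for k in kernels:
            stream = k(stream)
        return iter(stream)
    return run

scale = make_kernel(lambda x: 2.0 * x)
offset = make_kernel(lambda x: x + 1.0)

pipeline = chain(scale, offset)
result = list(pipeline([1.0, 2.0, 3.0]))
print(result)  # [3.0, 5.0, 7.0]
```

Each kernel sees only its input stream, which is what makes chaining possible without shared state.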

Stream Programming Abstraction

Dolphin Triangle Mesh

Stream Programming Abstraction

Benchmark Funnel: In this simulation, a cloth falls into a funnel and passes through it under the pressure of a ball. This model has 47K vertices, 92K triangles, and many self-collisions. Our novel GPU-based CCD algorithm takes 4.4 ms and 10 ms per frame to compute all the collisions on an NVIDIA GeForce GTX 480 and an NVIDIA GeForce GTX 285, respectively.

Stream Programming Abstraction

Why Streams?

Ample computation by exposing parallelism: streams expose data parallelism, so multiple stream elements can be processed in parallel. Pipeline (task) parallelism means multiple tasks can be processed in parallel. Kernels yield high arithmetic intensity.

Efficient communication: producer-consumer locality and predictable memory access patterns.

Optimize for the throughput of all elements, not the latency of one. Processing many elements at once allows latency hiding.
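Data parallelism works because each stream element is processed independently, so the order and degree of parallelism do not change the result. The sketch below shows this with a thread pool on the CPU. The `kernel` function and the pool size are assumptions for the example, and Python threads only illustrate the independence, not real GPU throughput.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x: float) -> float:
    """Per-element computation; it reads no other element."""
    return x * x + 1.0

stream = [float(i) for i in range(8)]

# One element at a time.
sequential = [kernel(x) for x in stream]

# Several elements in flight at once; map() preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(kernel, stream))

print(parallel == sequential)  # True
```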

CPU-GPU Analogies

Stream / data array = Texture

Memory read = Texture sample
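The analogy can be made concrete with a small CPU-side model of a texture. The `Texture` class and its `sample` method are hypothetical names for this sketch, and nearest-neighbor lookup with normalized coordinates is assumed as the simplest sampling mode.

```python
class Texture:
    """A 2D data array addressed the way a texture is."""

    def __init__(self, rows):
        self.rows = rows
        self.height = len(rows)
        self.width = len(rows[0])

    def sample(self, u: float, v: float):
        """Memory read: map normalized (u, v) in [0, 1] to one array cell."""
        x = min(int(u * self.width), self.width - 1)
        y = min(int(v * self.height), self.height - 1)
        return self.rows[y][x]

data = Texture([[0, 1],
                [2, 3]])

print(data.sample(0.75, 0.25))  # 1  (column 1, row 0)
print(data.sample(0.25, 0.75))  # 2  (column 0, row 1)
```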

Structuring a GPU Program

The CPU assembles the input data.

The CPU transfers the data to the GPU (GPU “main memory” or “device memory”).

The CPU calls the GPU program (or set of kernels). The GPU runs out of GPU main memory.

When the GPU finishes, the CPU copies the results back into CPU memory.

Recent interfaces allow overlap.

What lessons can we draw from this sequence of operations?
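The sequence above can be sketched with a stand-in for the device. `FakeDevice` and its `upload`, `launch`, and `download` methods are hypothetical and exist only for this example. They are not part of any real GPU API, and everything here runs on the CPU.

```python
class FakeDevice:
    """Hypothetical stand-in for a GPU with its own separate memory."""

    def __init__(self):
        self.memory = {}

    def upload(self, name, host_data):
        # Copy: host and device do not share memory.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # The kernel reads and writes device memory only.
        self.memory[dst] = [kernel(x) for x in self.memory[src]]

    def download(self, name):
        # Copy the results back to the host.
        return list(self.memory[name])

# 1. The CPU assembles the input data.
host_input = [1.0, 2.0, 3.0, 4.0]

device = FakeDevice()
device.upload("in", host_input)                # 2. transfer to device memory
device.launch(lambda x: 0.5 * x, "in", "out")  # 3. run the kernel on the device
host_output = device.download("out")           # 4. copy results back

print(host_output)  # [0.5, 1.0, 1.5, 2.0]
```

One lesson this makes visible is that two of the four steps are pure data movement, so a kernel has to do enough work per element to justify the transfers.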

Kernels

CPU-GPU analogy (figure labels: CPU, GPU, ADVECT)

Kernel / loop body / algorithm step = Fragment program

You write one program. It runs on every vertex/fragment.
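A minimal sketch of the loop-body analogy, written in Python on the CPU: the body of a nested loop is lifted out into a function that computes one output element, and a driver applies it to every position. The names `cpu_loop`, `fragment_program`, and `run_on_every_fragment` are assumptions for this example, and the doubling operation is arbitrary.

```python
def cpu_loop(grid):
    """CPU style: the computation lives inside explicit loops."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = 2.0 * grid[y][x]  # the loop body
    return out

def fragment_program(grid, x, y):
    """GPU style: the loop body alone, producing one output element."""
    return 2.0 * grid[y][x]

def run_on_every_fragment(program, grid):
    """Stand-in for the hardware, which invokes the program per fragment."""
    h, w = len(grid), len(grid[0])
    return [[program(grid, x, y) for x in range(w)] for y in range(h)]

grid = [[1.0, 2.0],
        [3.0, 4.0]]

print(run_on_every_fragment(fragment_program, grid) == cpu_loop(grid))  # True
```

The programmer writes only `fragment_program`; the iteration over positions is supplied by the system.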

Conclusion

Can we apply these techniques to more general problems?

GPUs should excel at tasks that require ample computation, regular computation, and efficient communication.
