Exploiting Parallelism on GPUs



Matt Mukerjee, David Naylor

Parallelism on GPUs

• $100 NVIDIA video card → 192 cores
  – (Build Blacklight for ~$2000?)
• Incredibly low power
• Ubiquitous
• Question: use it for general computation?
  – General-Purpose GPU (GPGPU)


GPU Hardware

• Very specific constraints
  – Designed to be SIMD (e.g. shaders)
  – Zero-overhead thread scheduling
  – Little caching (compared to CPUs)
• Constantly stalled on memory access
• MASSIVE # of threads / core
• Much finer-grained threads (“kernels”)
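The fine-grained threading model can be sketched as follows: rather than a few heavyweight CPU threads, the GPU covers a large array by launching one lightweight thread per element (the kernel and variable names here are hypothetical, for illustration only):

```cuda
#include <cstdio>

// Each thread handles exactly one element: massive thread counts
// let the hardware hide memory-access stalls by swapping in other
// threads at zero scheduling overhead.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)               // guard: the grid may be larger than n
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // ~1M elements, one thread each
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // 256 threads per block; enough blocks to cover all n elements
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```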

CUDA Architecture

Thread Blocks

• GPUs are SIMD
• How does multithreading work?
• Threads that branch are halted, then run
• Single Instruction Multiple….?

CUDA is an SIMT architecture

• Single Instruction, Multiple Thread
• Threads in a block execute the same instruction

(Slide diagram: multi-threaded instruction unit)
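A minimal sketch of what SIMT execution means for kernel code (hypothetical kernel; the warp size of 32 is standard on NVIDIA hardware):

```cuda
// Every thread runs this same code. The hardware groups threads into
// warps of 32 that issue one instruction at a time: one instruction,
// multiple threads, each operating on its own data element.
__global__ void vecAdd(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // All 32 lanes of a warp execute this load/add/store together,
    // differing only in the index i they computed above.
    if (i < n)
        out[i] = a[i] + b[i];
}
```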

Observation

Fitting the data structures needed by the threads in one multiprocessor requires application-specific tuning.
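One place this tuning shows up is in per-block shared memory, which lives on the SM and is small (tens of KB). A sketch, with an assumed tile size that would have to be tuned per application:

```cuda
#define TILE 256  // assumed tile size; tune so TILE * sizeof(float)
                  // (plus any other per-block state) fits in the SM

// Block-level tree reduction over a shared-memory tile.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];          // one copy per thread block
    int i = blockIdx.x * TILE + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                      // wait until the tile is loaded

    // Halve the active threads each step, summing pairs in shared memory
    for (int s = TILE / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];        // one partial sum per block
}
```

If the working set exceeds what one SM can hold, the kernel must be restructured (smaller tiles, more passes), which is exactly the application-specific tuning the slide refers to.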

Example: MapReduce on CUDA

(Slide callout: too big for cache on one SM!)

Problem

Only one code branch within a block executes at a time.
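The branch-divergence problem can be sketched in a few lines (hypothetical kernel): when threads of one warp take different branches, the hardware runs each branch path serially with the other lanes masked off.

```cuda
__global__ void divergent(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Even and odd threads sit in the same warp, so this branch
    // serializes: first the even lanes run, then the odd lanes.
    if (i % 2 == 0)
        x[i] = x[i] * 2.0f;
    else
        x[i] = x[i] + 1.0f;

    // Branching on a value that is uniform across the block keeps
    // every lane of a warp on the same path: no serialization.
    if (blockIdx.x % 2 == 0)
        x[i] += 0.5f;
}
```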

Enhancing SIMT

Problem

If two multiprocessors share a cache line, there are more memory accesses than necessary.

Data Reordering
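The slides don't spell out the reordering scheme; as one common instance of the idea (not necessarily the authors' exact technique), laying data out so that adjacent threads touch adjacent addresses lets each warp's loads coalesce into few memory transactions instead of one per thread. All names below are hypothetical:

```cuda
struct Point { float x, y; };

// Array-of-structures: thread i loads pts[i].x, so consecutive
// threads read addresses 8 bytes apart (strided, poorly coalesced).
__global__ void aosNorm(const Point *pts, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = pts[i].x * pts[i].x + pts[i].y * pts[i].y;
}

// Structure-of-arrays after reordering: thread i loads xs[i],
// contiguous with its neighbors, so a warp's 32 loads coalesce.
__global__ void soaNorm(const float *xs, const float *ys,
                        float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = xs[i] * xs[i] + ys[i] * ys[i];
}
```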
