Exploiting Parallelism on GPUs



Matt Mukerjee, David Naylor

Parallelism on GPUs

• $100 NVIDIA video card → 192 cores
  – (Build Blacklight for ~$2000?)
• Incredibly low power
• Ubiquitous
• Question: use it for general computation?
  – General-Purpose GPU (GPGPU)


GPU Hardware

• Very specific constraints
  – Designed to be SIMD (e.g. shaders)
  – Zero-overhead thread scheduling
  – Little caching (compared to CPUs)
• Constantly stalled on memory access
• MASSIVE # of threads / core
• Much finer-grained threads (“kernels”)
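The fine-grained threading model can be sketched as follows: rather than a few heavyweight CPU threads, the GPU covers a large array by launching one lightweight thread per element (the kernel and variable names here are hypothetical, for illustration only):

```cuda
#include <cstdio>

// Each thread handles exactly one element: massive thread counts
// let the hardware hide memory-access stalls by swapping in other
// threads at zero scheduling overhead.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)               // guard: the grid may be larger than n
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // ~1M elements, one thread each
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // 256 threads per block; enough blocks to cover all n elements
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```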

CUDA Architecture

Thread Blocks

• GPUs are SIMD
• How does multithreading work?
• Threads that branch are halted, then run
• Single Instruction Multiple….?

CUDA is an SIMT architecture

• Single Instruction, Multiple Thread
• Threads in a block execute the same instruction

(Slide diagram: multi-threaded instruction unit)
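A minimal sketch of what SIMT execution means for kernel code (hypothetical kernel; the warp size of 32 is standard on NVIDIA hardware):

```cuda
// Every thread runs this same code. The hardware groups threads into
// warps of 32 that issue one instruction at a time: one instruction,
// multiple threads, each operating on its own data element.
__global__ void vecAdd(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // All 32 lanes of a warp execute this load/add/store together,
    // differing only in the index i they computed above.
    if (i < n)
        out[i] = a[i] + b[i];
}
```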

Observation

Fitting the data structures needed by the threads in one multiprocessor requires application-specific tuning.
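One place this tuning shows up is in per-block shared memory, which lives on the SM and is small (tens of KB). A sketch, with an assumed tile size that would have to be tuned per application:

```cuda
#define TILE 256  // assumed tile size; tune so TILE * sizeof(float)
                  // (plus any other per-block state) fits in the SM

// Block-level tree reduction over a shared-memory tile.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];          // one copy per thread block
    int i = blockIdx.x * TILE + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                      // wait until the tile is loaded

    // Halve the active threads each step, summing pairs in shared memory
    for (int s = TILE / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];        // one partial sum per block
}
```

If the working set exceeds what one SM can hold, the kernel must be restructured (smaller tiles, more passes), which is exactly the application-specific tuning the slide refers to.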

Example: MapReduce on CUDA

(Slide callout: too big for cache on one SM!)

Problem

Only one code branch within a block executes at a time.
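The branch-divergence problem can be sketched in a few lines (hypothetical kernel): when threads of one warp take different branches, the hardware runs each branch path serially with the other lanes masked off.

```cuda
__global__ void divergent(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Even and odd threads sit in the same warp, so this branch
    // serializes: first the even lanes run, then the odd lanes.
    if (i % 2 == 0)
        x[i] = x[i] * 2.0f;
    else
        x[i] = x[i] + 1.0f;

    // Branching on a value that is uniform across the block keeps
    // every lane of a warp on the same path: no serialization.
    if (blockIdx.x % 2 == 0)
        x[i] += 0.5f;
}
```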

Enhancing SIMT

Problem

If two multiprocessors share a cache line, there are more memory accesses than necessary.

Data Reordering
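The slides don't spell out the reordering scheme; as one common instance of the idea (not necessarily the authors' exact technique), laying data out so that adjacent threads touch adjacent addresses lets each warp's loads coalesce into few memory transactions instead of one per thread. All names below are hypothetical:

```cuda
struct Point { float x, y; };

// Array-of-structures: thread i loads pts[i].x, so consecutive
// threads read addresses 8 bytes apart (strided, poorly coalesced).
__global__ void aosNorm(const Point *pts, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = pts[i].x * pts[i].x + pts[i].y * pts[i].y;
}

// Structure-of-arrays after reordering: thread i loads xs[i],
// contiguous with its neighbors, so a warp's 32 loads coalesce.
__global__ void soaNorm(const float *xs, const float *ys,
                        float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = xs[i] * xs[i] + ys[i] * ys[i];
}
```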
