CUDA 5.0 By Peter Holvenstot CS6260




Page 1:

CUDA 5.0

By Peter Holvenstot, CS6260

Page 2:

CUDA 5.0

Latest iteration of CUDA toolkit

Requires Compute Capability 3.0 (dynamic parallelism requires 3.5)

Compatible Kepler cards are being installed at WMU

Page 3:

Major New Features

GPUDirect: allows Direct Memory Access (DMA)

GPU object linking: libraries for GPU code

Dynamic parallelism: kernels inside kernels

Page 4:

GPUDirect

Allows Direct Memory Access to PCIe bus

Third-party device access now supported

Requires use of pinned memory

DMA transfers can be chained across the network

Page 5:

GPUDirect

Page 6:

Pinned Memory

malloc() - unpinned, can be paged out

cudaHostAlloc() - pinned

Cannot be paged out

Takes longer to allocate, but allows features requiring DMA and increases copy performance
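The contrast above can be sketched in host code; a minimal example, assuming a hypothetical 1 MB transfer buffer:

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const size_t bytes = 1 << 20;  // 1 MB (arbitrary example size)

    // Pageable allocation: the OS may page it out, so the GPU's DMA
    // engine cannot target it directly.
    float *pageable = (float *)malloc(bytes);

    // Pinned (page-locked) allocation: slower to allocate, but eligible
    // for DMA and faster host<->device copies.
    float *pinned = NULL;
    cudaHostAlloc((void **)&pinned, bytes, cudaHostAllocDefault);

    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, bytes);

    // Async copies only overlap with computation when the host buffer is pinned.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(d_buf, pinned, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(pinned);  // pinned memory is freed with cudaFreeHost, not free()
    free(pageable);
    return 0;
}
```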

Page 7:

Kernel Linking

Device code now supports compilation to object files

Allows compiling into/against static libraries

Allows closed-source distribution of libraries
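A minimal sketch of separate compilation, assuming two hypothetical files util.cu and main.cu:

```cuda
// util.cu - device code compiled separately into its own object file.
__device__ float square(float x) { return x * x; }

// main.cu - calls square(), which is defined in a different translation unit.
extern __device__ float square(float x);

__global__ void kernel(float *out) {
    out[threadIdx.x] = square((float)threadIdx.x);
}

// Build with relocatable device code, then let nvcc device-link the objects:
//   nvcc -arch=sm_30 -dc util.cu main.cu     (produces util.o, main.o)
//   nvcc -arch=sm_30 util.o main.o -o app    (device link + host link)
// The same objects can also be archived into a static library (nvcc -lib).
```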

Page 8:

Dynamic Parallelism

CUDA 4.1: __device__ functions may make recursive calls

However, __global__ functions/kernels cannot be launched from device code

CUDA 5: kernels may launch additional kernels from the GPU
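A kernel-launching-a-kernel can be sketched as follows; the kernel names and launch shapes are illustrative only. Requires Compute Capability 3.5 and compilation with -rdc=true -lcudadevrt:

```cuda
#include <cstdio>

// Child kernel, launched from the GPU rather than the host.
__global__ void child(int parentBlock) {
    printf("child of block %d, thread %d\n", parentBlock, threadIdx.x);
}

// Parent kernel: one thread per block launches a child grid.
__global__ void parent(void) {
    if (threadIdx.x == 0) {
        child<<<1, 4>>>(blockIdx.x);
    }
}

int main(void) {
    parent<<<2, 32>>>();
    cudaDeviceSynchronize();  // host waits for parent and all child grids
    return 0;
}

// Build (example): nvcc -arch=sm_35 -rdc=true demo.cu -lcudadevrt
```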

Page 9:

Dynamic Parallelism

Most important feature in release

Reduces need for synchronization

Allows program flow to be controlled by GPU

Allows recursion and subdivision of problems

Page 10:

Dynamic Parallelism

CPU code can now become a kernel

Kernel calls can be used as tasks

GPU controls kernel launch/flow/scheduling

Increases practical thread count to thousands

Page 11:

Dynamic Parallelism

Interesting data is not uniformly distributed

Dynamic parallelism can launch additional threads in interesting areas

Allows higher resolution in critical areas without slowing down others

Page 12:

Source: NVIDIA

Page 13:

Dynamic Parallelism

Nested Dependencies

Source: NVIDIA

Page 14:

Dynamic Parallelism

Scheduling can be controlled by streams

No new concurrency guarantees

Kernels launched into the same stream execute in order; ordering across streams is undefined

Named streams allow (but do not guarantee) concurrent execution

Page 15:

Dynamic Parallelism

Nested Dependencies - cudaDeviceSynchronize()

Can be used inside a kernel

Synchronizes on all launches made by any thread in the block

Does NOT imply __syncthreads()!
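The two synchronization scopes can be sketched together; kernel names and sizes here are illustrative:

```cuda
__global__ void child(int *data) {
    data[threadIdx.x] *= 2;
}

__global__ void parent(int *data) {
    if (threadIdx.x == 0) {
        child<<<1, 32>>>(data);
        // Waits for child grids launched by this block to finish.
        cudaDeviceSynchronize();
    }
    // cudaDeviceSynchronize() does NOT imply __syncthreads():
    // the other threads of this block must still be barriered
    // before they can safely read the child's writes.
    __syncthreads();
    // All threads in the parent block now see data[i] doubled.
}
```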

Page 16:

Dynamic Parallelism

Kernel launch implies memory sync operation

Child sees state at time of launch

Parent sees child writes after sync

Local and shared memory are private, cannot be shared with children
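These visibility rules can be summarized in one illustrative kernel (names are hypothetical; the commented-out launches show what is disallowed):

```cuda
__global__ void child(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

__global__ void parent(int *global_buf) {
    __shared__ int tile[32];
    int local_val = 0;
    (void)tile; (void)local_val;

    if (threadIdx.x == 0) {
        child<<<1, 32>>>(global_buf);   // OK: global memory is visible to the child
        // child<<<1, 32>>>(tile);      // ILLEGAL: shared memory is private to this block
        // child<<<1, 32>>>(&local_val);// ILLEGAL: local memory is private to this thread
        cudaDeviceSynchronize();        // after this, the parent sees the child's writes
    }
}
```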

Page 17:

Questions?