Research in GPU Computing
Cao Thanh Tung
19-Jan-2011, Computing Students talk
Outline
● Introduction to GPU Computing
– Past: Graphics Processing and GPGPU
– Present: CUDA and OpenCL
– A bit on the architecture
● Why GPU?
● GPU vs. Multi-core and Distributed
● Open problems
● Where does this go?
Introduction to GPU Computing
● Who has access to 1,000 processors?
YOU
● In the past
– GPU = Graphics Processing Unit
● In the past
– GPGPU = General-Purpose computation using GPUs
● Now
– GPU = General Processing Unit (no longer just Graphics)
__device__ float3 collideCell(int3 gridPos, uint index, ...)
{
    uint gridHash = calcGridHash(gridPos);
    ...
    for (uint j = startIndex; j < endIndex; j++) {
        if (j != index) {
            ...
            force += collideSpheres(...);
        }
    }
    return force;
}
● Now
– We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
● A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
– Very simple cores (even simpler than the Intel Atom)
– Little cache
Why GPU?
● Performance
● People have used it, and it works:
– Bio-Informatics
– Finance
– Fluid Dynamics
– Data-mining
– Computer Vision
– Medical Imaging
– Numerical Analytics
● A new, promising area
– Fast-growing
– Ubiquitous
– New paradigm → new problems, new challenges
GPU vs. Multi-core
● A lot more threads of computation are required:
– The GPU has many more cores than a multi-core CPU.
– A GPU core is nowhere near as powerful as a CPU core.
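To make the thread-count point concrete, here is a minimal sketch (not from the talk) of how a CUDA launch routinely creates far more threads than a CPU has cores: one thread per array element, so a million threads is perfectly normal.

```cuda
// Each thread scales one element; the grid supplies as many
// threads as there are elements (plus a partially-used tail block).
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail block
        data[i] *= factor;
}

// Host side: 1,048,576 elements -> 4096 blocks of 256 threads each.
// scale<<<4096, 256>>>(d_data, 2.0f, 1 << 20);
```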
● Challenges:
– Not all problems can easily be broken into many small sub-problems to be solved in parallel.
– Race conditions are much more serious.
– Atomic operations are still doable, but locking is a performance killer; lock-free algorithms are much preferable.
– Memory access is a bottleneck (memory is not that parallel).
– Debugging is a nightmare.
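As a hedged illustration of the atomics point above (an example of mine, not the talk's): a histogram kernel where many threads may hit the same bin. A hardware atomic serializes only the colliding updates, whereas a lock would stall entire warps.

```cuda
// One thread per input byte; colliding increments on the same bin
// are resolved by the hardware atomic, with no lock needed.
__global__ void histogram(const unsigned char *in, int n, unsigned int *bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[in[i]], 1u);  // atomic read-modify-write on the bin
}
```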
GPU vs. Distributed
● GPU allows much cheaper communication between different threads.
● GPU memory is still limited compared to a distributed system.
● GPU cores are not completely independent processors:
– Need fine-grained parallelism.
– Reaching the scalability of a distributed system is difficult.
Open problems
● Data structures
● Algorithms
● Tools
● Theory
● Data structures
– Requirement: able to handle a very high level of concurrent access.
– Common data structures like dynamic arrays, priority queues, and hash tables are not very suitable for the GPU.
– Some existing work: kD-trees, quad-trees, read-only hash tables...
● Algorithms
– Most sequential algorithms need serious re-design to make good use of such a huge number of cores.
● Our computational geometry research: use discrete-space computation to approximate the continuous-space result.
– Traditional parallel algorithms may or may not work.
● Usual assumption: an infinite number of processors.
● No serious study on this so far!
● Tools
– Programming language: a better language or model to express parallel algorithms?
– Compiler: optimize GPU code? Auto-parallelization?
● There is some work on translating OpenMP to CUDA.
– Debugging tools? Maybe a whole new "art of debugging" is needed.
– Software engineering is currently far behind the hardware development.
● Theory
– Some traditional approaches:
● PRAM (CRCW, EREW): too general.
● SIMD: too restricted.
– Big-O analysis may not be good enough.
● Time complexity is relevant, but work complexity is more important.
● Most GPU computing works only report actual running time.
– Performance modeling for the GPU, anyone?
Where does this go?
● Intel/AMD already have 6-core, 12-thread processors (maybe more).
● SeaMicro has a server with 512 dual-core Atom processors.
● AMD Fusion: CPU + GPU.
● The GPU may not stay forever, but massively multi-threaded hardware is definitely the future of computing.
Where to start?
● Check your PC.
– If it's not old enough to go to primary school, there's a high chance it has a GPU.
● Go to the NVIDIA/ATI website, download a development toolkit, and you're ready to go.
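Once the toolkit is installed, a first program can be as small as the vector addition below. This is a sketch of mine (hypothetical file name `hello.cu`), compiled with `nvcc hello.cu -o hello`; it runs on any CUDA-capable GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: c[i] = a[i] + b[i].
__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float h_a[1024], h_b[1024], h_c[1024];
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device buffers and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);  // 4 blocks of 256 threads

    // Copy the result back (cudaMemcpy implicitly waits for the kernel).
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", h_c[0]);  // 1.0 + 2.0 = 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```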
19-Jan-2011 Computing Students talk 29
THANK YOU
● Any questions? Just ask.
● Any suggestions? What are you waiting for?
● Any problem or solution to discuss? Let's have a private talk somewhere (j/k)