Research in GPU Computing
Cao Thanh Tung
19-Jan-2011, Computing Students talk
Outline
● Introduction to GPU Computing
– Past: Graphics Processing and GPGPU
– Present: CUDA and OpenCL
– A bit on the architecture
● Why GPU?
● GPU vs. Multi-core and Distributed
● Open problems
● Where does this go?
Introduction to GPU Computing
● Who has access to 1,000 processors?
YOU
● In the past
– GPU = Graphics Processing Unit
● In the past
– GPGPU = General-Purpose computation using GPUs
● Now
– GPU = General Processing Unit (no longer just Graphics)
__device__ float3 collideCell(int3 gridPos, uint index, ...)
{
    uint gridHash = calcGridHash(gridPos);
    ...
    for (uint j = startIndex; j < endIndex; j++) {
        if (j != index) {
            ...
            force += collideSpheres(...);
        }
    }
    return force;
}
● Now
– We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
● A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
– Very simple cores (even simpler than the Intel Atom)
– Little cache
Why GPU?
● Performance
● People have used it, and it works:
– Bio-Informatics
– Finance
– Fluid Dynamics
– Data-mining
– Computer Vision
– Medical Imaging
– Numerical Analytics
● A new, promising area
– Fast-growing
– Ubiquitous
– New paradigm → new problems, new challenges
GPU vs. Multi-core
● A lot more threads of computation are required:
– The GPU has many more cores than a multi-core CPU.
– A GPU core is nowhere near as powerful as a CPU core.
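To make the thread-count point concrete, here is a minimal sketch (not from the talk) of how a CUDA launch routinely creates far more threads than a CPU has cores: one thread per array element, so a million threads is perfectly normal.

```cuda
// Each thread scales one element; the grid supplies as many
// threads as there are elements (plus a partially-used tail block).
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail block
        data[i] *= factor;
}

// Host side: 1,048,576 elements -> 4096 blocks of 256 threads each.
// scale<<<4096, 256>>>(d_data, 2.0f, 1 << 20);
```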
● Challenges:
– Not all problems can easily be broken into many small sub-problems to be solved in parallel.
– Race conditions are much more serious.
– Atomic operations are still doable, but locking is a performance killer; lock-free algorithms are much preferable.
– Memory access is a bottleneck (memory is not that parallel).
– Debugging is a nightmare.
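As a hedged illustration of the atomics point above (an example of mine, not the talk's): a histogram kernel where many threads may hit the same bin. A hardware atomic serializes only the colliding updates, whereas a lock would stall entire warps.

```cuda
// One thread per input byte; colliding increments on the same bin
// are resolved by the hardware atomic, with no lock needed.
__global__ void histogram(const unsigned char *in, int n, unsigned int *bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[in[i]], 1u);  // atomic read-modify-write on the bin
}
```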
GPU vs. Distributed
● GPU allows much cheaper communication between different threads.
● GPU memory is still limited compared to a distributed system.
● GPU cores are not completely independent processors:
– Need fine-grained parallelism.
– Reaching the scalability of a distributed system is difficult.
Open problems
● Data structures
● Algorithms
● Tools
● Theory
● Data structures
– Requirement: able to handle a very high level of concurrent access.
– Common data structures like dynamic arrays, priority queues, and hash tables are not very suitable for the GPU.
– Some existing work: kD-trees, quad-trees, read-only hash tables...
● Algorithms
– Most sequential algorithms need serious re-design to make good use of such a huge number of cores.
● Our computational geometry research: use discrete-space computation to approximate the continuous-space result.
– Traditional parallel algorithms may or may not work.
● Usual assumption: an infinite number of processors.
● No serious study on this so far!
● Tools
– Programming language: a better language or model to express parallel algorithms?
– Compiler: optimize GPU code? Auto-parallelization?
● There is some work on translating OpenMP to CUDA.
– Debugging tools? Maybe a whole new "art of debugging" is needed.
– Software engineering is currently far behind the hardware development.
● Theory
– Some traditional approaches:
● PRAM (CRCW, EREW): too general.
● SIMD: too restricted.
– Big-O analysis may not be good enough.
● Time complexity is relevant, but work complexity is more important.
● Most GPU computing works only report actual running time.
– Performance modeling for the GPU, anyone?
Where does this go?
● Intel/AMD already have 6-core, 12-thread processors (maybe more).
● SeaMicro has a server with 512 dual-core Atom processors.
● AMD Fusion: CPU + GPU.
● The GPU may not stay forever, but massively multi-threaded hardware is definitely the future of computing.
Where to start?
● Check your PC.
– If it's not old enough to go to primary school, there's a high chance it has a GPU.
● Go to the NVIDIA/ATI website, download a development toolkit, and you're ready to go.
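Once the toolkit is installed, a first program can be as small as the vector addition below. This is a sketch of mine (hypothetical file name `hello.cu`), compiled with `nvcc hello.cu -o hello`; it runs on any CUDA-capable GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: c[i] = a[i] + b[i].
__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float h_a[1024], h_b[1024], h_c[1024];
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device buffers and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);  // 4 blocks of 256 threads

    // Copy the result back (cudaMemcpy implicitly waits for the kernel).
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", h_c[0]);  // 1.0 + 2.0 = 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```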
19-Jan-2011 Computing Students talk 29
THANK YOU
● Any questions? Just ask.
● Any suggestions? What are you waiting for?
● Any problem or solution to discuss? Let's have a private talk somewhere (j/k)