SPECIAL COURSE ON COMPUTER ARCHITECTURES

~GPU PROGRAMMING CONTEST~

2012/06/22

Email: nomura@am.ics.keio.ac.jp

Contents

GPU (Graphic Processing Unit)
CUDA Programming
Target: Clustering with Kmeans
How to use toolkit1.0
Towards the fastest program

GPU (Graphic Processing Unit)

Multicore processor
○ Several hundred cores
○ SP (Streaming Processor): a core in the GPU
○ SM (Streaming Multiprocessor): composed of SPs

High memory bandwidth

[Figure: a GPU containing several SMs, each composed of SPs, connected to Global Memory]

Table: Specification of GeForce GTX280

SP                 240
SM                 30 (each of them has 8 SPs)
Memory bandwidth   141.7 GB/s

Flow of CUDA Program

1. Allocate GPU memory: cudaMalloc()
2. Transfer input data: cudaMemcpy()
3. Execute kernel
4. Transfer result data: cudaMemcpy()
5. Free GPU memory: cudaFree()

[Figure: the Host (CPU, Main Memory) holds an array of inputs (input 1, input 2, …, input N). A data transfer copies the array to the Global Memory of the Device (GPU), kernels run on the SPs, and a second data transfer copies the outputs (output 1, output 2, …, output N) back to the Host.]

Target application: clustering with Kmeans

A famous method for clustering.

A program that implements the kmeans method on a host processor is given. Modify it so that it runs on the GPU as fast as possible.

The GeForce Tesla (GTX280) in Amano Lab. can be used for this contest.

Kmeans method (1/5)

Initial state: Nodes, each with a certain color, are distributed randomly. (Here, 100 nodes with 5 colors are shown.)

STEP1: The centre of gravity is computed for each set of nodes with the same color. (Each X in the figure marks a centre.)
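
STEP1 can be sketched on the host as follows. This is only an illustration, not the code shipped in the toolkit: the function name compute_centres, the bound MAX_COLORS, the 2D float coordinates, and the rule that an empty color keeps its previous centre are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COLORS 64   /* hypothetical upper bound on the number of colors */

/* STEP1: compute the centre of gravity of each color.
 * x, y   : node coordinates, n entries each (2D nodes are an assumption)
 * color  : color index of each node, in [0, k)
 * cx, cy : centres, k entries each; a color with no nodes keeps its old centre */
void compute_centres(const float *x, const float *y, const int *color,
                     size_t n, int k, float *cx, float *cy)
{
    assert(k > 0 && k <= MAX_COLORS);
    float sum_x[MAX_COLORS] = {0}, sum_y[MAX_COLORS] = {0};
    int count[MAX_COLORS] = {0};

    /* Accumulate coordinate sums and node counts per color. */
    for (size_t i = 0; i < n; i++) {
        int c = color[i];
        assert(c >= 0 && c < k);
        sum_x[c] += x[i];
        sum_y[c] += y[i];
        count[c]++;
    }

    /* The centre of gravity is the mean of the member coordinates. */
    for (int c = 0; c < k; c++) {
        if (count[c] > 0) {
            cx[c] = sum_x[c] / (float)count[c];
            cy[c] = sum_y[c] / (float)count[c];
        }
    }
}
```

Note that many nodes add into the same per-color sum, so a parallel version of this step needs some form of reduction rather than a plain one-thread-per-node loop.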

Reference URL: http://d.hatena.ne.jp/nitoyon/20090409/kmeans_visualise

Kmeans method (2/5)

STEP2: The color of each node is changed to that of the nearest centre.
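
A minimal host-side sketch of STEP2, under the same assumptions as before (2D float coordinates; the name assign_nearest is hypothetical and does not come from the toolkit). It returns how many nodes changed color, which is convenient for checking the termination condition later.

```c
#include <assert.h>
#include <stddef.h>

/* STEP2: recolor every node with the color of its nearest centre.
 * Returns the number of nodes whose color changed. */
size_t assign_nearest(const float *x, const float *y, size_t n,
                      const float *cx, const float *cy, int k, int *color)
{
    assert(k > 0);
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        int best = 0;
        float best_d2 = 0.0f;
        for (int c = 0; c < k; c++) {
            float dx = x[i] - cx[c];
            float dy = y[i] - cy[c];
            float d2 = dx * dx + dy * dy;  /* squared distance is enough to compare */
            if (c == 0 || d2 < best_d2) {
                best = c;
                best_d2 = d2;
            }
        }
        if (color[i] != best) {
            color[i] = best;
            changed++;
        }
    }
    return changed;
}
```

Each iteration of the outer loop reads only its own node and the shared centres, so the nodes can be processed independently of one another. That independence is what makes this step a natural first candidate for parallelization.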

STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

Kmeans method (3/5)

STEP2: Again, the color of each node is changed to that of the nearest centre.

STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

Kmeans method (4/5)

STEP2: Again, the color of each node is changed to that of the nearest centre.

STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

Kmeans method (5/5)

STEP2: Again and again, the color of each node is changed to that of the nearest centre.

Termination condition: Every node already has the color of its nearest centre, so no color needs to change. → Terminate.
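
Putting the two steps and the termination condition together gives a loop like the one below. It is a self-contained sketch under stated assumptions, not the toolkit's cpuKMeans: the names step1_centres, step2_recolor and kmeans_loop, the bound MAX_COLORS, the max_rounds safety limit, and the 2D float coordinates are all hypothetical. The caller is assumed to initialize cx and cy, because a color with no nodes keeps its previous centre.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COLORS 16   /* hypothetical upper bound on k for this sketch */

/* STEP1: centre of gravity per color; an empty color keeps its old centre. */
static void step1_centres(const float *x, const float *y, const int *color,
                          size_t n, int k, float *cx, float *cy)
{
    float sx[MAX_COLORS] = {0}, sy[MAX_COLORS] = {0};
    int cnt[MAX_COLORS] = {0};
    for (size_t i = 0; i < n; i++) {
        sx[color[i]] += x[i];
        sy[color[i]] += y[i];
        cnt[color[i]]++;
    }
    for (int c = 0; c < k; c++) {
        if (cnt[c] > 0) {
            cx[c] = sx[c] / (float)cnt[c];
            cy[c] = sy[c] / (float)cnt[c];
        }
    }
}

/* STEP2: recolor each node to its nearest centre; returns how many changed. */
static size_t step2_recolor(const float *x, const float *y, size_t n,
                            const float *cx, const float *cy, int k, int *color)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        int best = 0;
        float best_d2 = 0.0f;
        for (int c = 0; c < k; c++) {
            float dx = x[i] - cx[c], dy = y[i] - cy[c];
            float d2 = dx * dx + dy * dy;
            if (c == 0 || d2 < best_d2) { best = c; best_d2 = d2; }
        }
        if (color[i] != best) { color[i] = best; changed++; }
    }
    return changed;
}

/* Repeat STEP1 and STEP2 until no node changes color (or max_rounds is hit).
 * Returns the number of rounds executed. */
int kmeans_loop(const float *x, const float *y, size_t n, int k,
                int *color, float *cx, float *cy, int max_rounds)
{
    assert(k > 0 && k <= MAX_COLORS);
    int rounds = 0;
    while (rounds < max_rounds) {
        step1_centres(x, y, color, n, k, cx, cy);
        rounds++;
        if (step2_recolor(x, y, n, cx, cy, k, color) == 0)
            break;   /* termination condition: no color changed */
    }
    return rounds;
}
```

For example, four nodes at x = 0, 1, 10, 11 (all with y = 0) that start with the colors 0, 1, 0, 1 settle after two rounds with the colors 0, 0, 1, 1.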

How to start

Log in with ssh 131.113.69.98.

Your account is already available. If you have not received mail about your account, please send mail to [email protected].

Download kmeans.tar.gz and extract it. There is useful sample code in kmeans.

Mission 1: Make a GPU version based on the CPU version.
○ Write gpuKMeans in kmeans.cu.
○ cpuKMeans in main.cu is a CPU version for reference.

Mission 2: Optimize the GPU code so that it runs as fast as possible.

Toolkit1.0

kmeans.cu
○ Where the K-means program for the GPU is written
○ Please modify this file

main.cu
○ Reads the input data and contains the CPU program
○ Modification forbidden

check.c
○ Visualizes the output data with OpenCV

gen.c
○ Generates the input data

Makefile

data/
○ Input data

result/
○ Output data

How to use Toolkit1.0

$ make
  Compile

$ make gpu
  Execute GPU program

$ make cpu
  Execute CPU program

$ ./gen SEED (SEED = 0,1,2,…)
  Generate input data

Sample Code

Vector addition program for GPU
  $ make : Compile
  $ ./main : Run the program

Points
  Memory allocation on GPU
    ○ cudaMalloc(), cudaFree()
  Data transfer between CPU and GPU
    ○ cudaMemcpy()
  Format of GPU kernel function

Towards the fastest program

Minimum requirement
○ Implement the K-means program on the GPU
○ Parallelize STEP1 or STEP2 in K-means

How to optimize the program
○ Parallelize both STEP1 and STEP2
○ Shared memory, constant memory
○ Coalesced memory access, etc.

Web sites
○ NVIDIA GPU Computing Documentation: http://developer.nvidia.com/nvidia-gpu-computing-documentation
○ Fixstars CUDA Information Site: http://gpu.fixstars.com/index.php/

Announcement:

If you do not have an account, send mail to [email protected]. Your name should be included in the mail.

Deadline: 7/22 (Fri) 24:00

Copy the following into ~/comparch:
○ Source code and a simple report

Please check the web site. Additional information will be posted on it.

If you have any questions about the contest, please send mail to: [email protected]