
OpenCL

Hucai Huang

Introduction

Today's computing environments are becoming more multifaceted, exploiting the capabilities of a range of multi-core microprocessors: central processing units (CPUs), digital signal processors (DSPs), reconfigurable hardware, and graphics processing units (GPUs).

Introduction

Because of the growing importance of multi-core and many-core processors, a unified programming language spanning all kinds of GPUs and CPUs has become a pressing need for programmers.

Introduction

OpenCL stands for Open Computing Language. It is supported by Apple, AMD, NVIDIA, and several other vendors. The Khronos Group, the consortium behind OpenGL, created OpenCL, and it needed only six months to produce the specification.

OpenCL

1. Royalty-free.
2. Supports both task-parallel and data-parallel programming models.
3. Works on all kinds of GPGPUs.
4. Works on all kinds of multi-core CPUs.
5. Works on Cell processors.
6. Works on handhelds and mobile devices.
7. Works with the C language (based on C99).

OpenCL

Programs can query the available devices and build a context from them.

This means programmers can write code more freely, for any kind of device.

Their applications will also survive even if the hardware changes in the future.

OpenCL

OpenCL is a low-level language, even lower-level than CUDA.

Some high-level languages such as RapidMind are planning to implement OpenCL in their programming environments.

RapidMind is already a member of the OpenCL working group.

OpenCL Platform Model

OpenCL Memory Model

Basic OpenCL Program Structure

Vector addition (VecAdd)

__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
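The kernel body runs once per work-item. As a plain-C sketch (not OpenCL code), the loop below is the sequential equivalent of launching vec_add over n global IDs; the name vec_add_host is hypothetical, used only for illustration:

```c
#include <stddef.h>

/* Sequential sketch of what the vec_add kernel computes: each
 * loop iteration corresponds to one work-item, with gid playing
 * the role of get_global_id(0). */
void vec_add_host(const float *a, const float *b, float *c, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        c[gid] = a[gid] + b[gid];
}
```

On a device, OpenCL executes these iterations as independent work-items rather than a loop, which is what makes the kernel data-parallel.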

VecAdd: Context, Devices and Queue

// create the OpenCL context on a GPU device
cl_context context = clCreateContextFromType(NULL,  // properties (must be NULL)
                                             CL_DEVICE_TYPE_GPU,
                                             NULL,  // error callback
                                             NULL,  // user data
                                             NULL); // error code

// get the list of GPU devices associated with the context
size_t cb;
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
cl_device_id *devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);

// create a command-queue
cl_command_queue cmd_queue = clCreateCommandQueue(context,
                                                  devices[0],
                                                  0,     // default properties
                                                  NULL); // error code

VecAdd: Create Memory Objects

cl_mem memobjs[3];

// allocate input buffer memory objects
memobjs[0] = clCreateBuffer(context,
                            CL_MEM_READ_ONLY |    // flags
                            CL_MEM_COPY_HOST_PTR,
                            sizeof(cl_float)*n,   // size
                            srcA,                 // host pointer
                            NULL);                // error code
memobjs[1] = clCreateBuffer(context,
                            CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                            sizeof(cl_float)*n, srcB, NULL);

// allocate output buffer memory object
memobjs[2] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                            sizeof(cl_float)*n, NULL, NULL);
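A note on CL_MEM_COPY_HOST_PTR: the runtime allocates its own buffer and copies the host data into it at creation time, so srcA and srcB can be modified or freed afterwards without affecting the buffers. A plain-C analogy of that semantics (create_buffer_copy is a hypothetical illustration, not an OpenCL API):

```c
#include <stdlib.h>
#include <string.h>

/* Plain-C analogy for CL_MEM_COPY_HOST_PTR: allocate a private
 * buffer and copy the host data into it, so later changes to the
 * host array do not affect the copy. */
float *create_buffer_copy(const float *src, size_t n)
{
    float *buf = malloc(n * sizeof(float));
    if (buf != NULL)
        memcpy(buf, src, n * sizeof(float));
    return buf;
}
```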

VecAdd: Program and Kernel

// create the program
cl_program program = clCreateProgramWithSource(context,
                                               1,               // string count
                                               &program_source, // program strings
                                               NULL,            // string lengths
                                               NULL);           // error code

// build the program
cl_int err = clBuildProgram(program,
                            0,     // num devices in device list
                            NULL,  // device list
                            NULL,  // options
                            NULL,  // notifier callback function ptr
                            NULL); // user data

// create the kernel
cl_kernel kernel = clCreateKernel(program, "vec_add", NULL);

VecAdd: Set Kernel Arguments

// set "a" vector argument
err = clSetKernelArg(kernel,
                     0,                    // argument index
                     sizeof(cl_mem),       // argument data size
                     (void *)&memobjs[0]); // argument data

// set "b" vector argument
err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);

// set "c" vector argument
err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&memobjs[2]);

VecAdd: Invoke Kernel, Read Output

// set work-item dimensions
size_t global_work_size[1] = { n };

// execute kernel
err = clEnqueueNDRangeKernel(cmd_queue, kernel,
                             1,     // work dimensions
                             NULL,  // must be NULL (work offset)
                             global_work_size,
                             NULL,  // automatic local work size
                             0,     // no events to wait on
                             NULL,  // event wait list
                             NULL); // event for this kernel

// read output array
err = clEnqueueReadBuffer(cmd_queue, memobjs[2],
                          CL_TRUE,            // blocking
                          0,                  // offset
                          n*sizeof(cl_float), // size
                          dst,                // pointer
                          0, NULL, NULL);     // events

Conclusion

OpenCL should attract HPC programmers because it is a long-term strategy for GPUs and other accelerators.

Conclusion

It may seem a complicated language for short applications, but it is very useful for more complex ones (see the N-Body example).

Conclusion

There are some restrictions in OpenCL, but they do not affect the language's reliability.

Conclusion

There will be other implementations of OpenCL in higher-level languages, which will make it easier for everyday programmers.

Conclusion

In the end, you might find OpenCL very difficult. But once you master it, you will be a master of parallel computing.

Sources

Takizawa, Hiroyuki and Kobayashi, Hiroaki. Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing. Springer Science, 2006.

Nickson, Christopher. Apple Planning Snow Leopard Surprise? Digital Trends - Computing News. Dec 18, 2008. <http://news.digitaltrends.com/news-article/18693/apple-planning-snow-leopard>.

Sutter, Herb. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. March 30, 2005. <http://www.gotw.ca/publications/concurrency-ddj.htm>.

HPC Wire. OpenCL on the Fast Track. Nov 4, 2008. <http://www.hpcwire.com/blogs/OpenCL_On_the_Fast_Track_33608199.html>.

HPC Wire. OpenCL Makes It Official. Dec 9, 2008. <http://www.hpcwire.com/blogs/OpenCL-Makes-It-Official-35841524.html>.

West, John. OpenCL: To GPGPU and Beyond. HPC Wire. Dec 11, 2008. <http://www.hpcwire.com/features/OpenCL-To-GPGPU-and-Beyond-36016144.html>.

Wolfe, Michael. Compilers and More: A GPU and Accelerator Programming Model. HPC Wire. Dec 9, 2008. <http://www.hpcwire.com/specialfeatures/sc08/features/Compilers_and_More_A_GPU_and_Accelerator_Programming_Model.html?viewAll=y>.

McCool, Michael D. and Du Toit, Stefanus. OpenCL Updates. HPC Wire. Nov 21, 2008. <http://www.hpcwire.com/specialfeatures/sc08/features/OpenCL_Update_34860779.html>.