Image Processing using CUDA Anders Eklund, PhD Virginia Tech Carilion Research Institute [email protected]

Outline

• Storing an image in memory

• 2D Convolution

• Interpolation

• Calculating a similarity measure between two images

• Image registration

Storing an image

• How is an image stored in memory?

• There are at least two possibilities

• Row major order, column major order

Storing an image

• Row major order (C programming)

• A = [1 2 3; 4 5 6] (two rows, three columns)

• Values are stored in memory as 1, 2, 3, 4, 5, 6

• Pixel at location (x,y) is accessed as x + y * WIDTH where WIDTH is the number of columns

Storing an image

• Column major order (Matlab)

• A = [1 2 3; 4 5 6] (two rows, three columns)

• Values are stored in memory as 1, 4, 2, 5, 3, 6

• Pixel at location (x,y) is accessed as y + x * HEIGHT where HEIGHT is the number of rows
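
The two index formulas can be checked on the 2 x 3 example matrix. The following is a minimal C++ sketch; the function names `rowMajorIndex`, `columnMajorIndex` and `toColumnMajor` are made up for this illustration and are not part of the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear index of pixel (x, y) when rows are stored one after another (C convention).
inline std::size_t rowMajorIndex(std::size_t x, std::size_t y, std::size_t width) {
    return x + y * width;
}

// Linear index of pixel (x, y) when columns are stored one after another (Matlab convention).
inline std::size_t columnMajorIndex(std::size_t x, std::size_t y, std::size_t height) {
    return y + x * height;
}

// Copies a row-major image into a column-major buffer of the same size.
std::vector<float> toColumnMajor(const std::vector<float>& rowMajor,
                                 std::size_t width, std::size_t height) {
    assert(rowMajor.size() == width * height);
    std::vector<float> out(rowMajor.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            out[columnMajorIndex(x, y, height)] = rowMajor[rowMajorIndex(x, y, width)];
        }
    }
    return out;
}
```

Calling `toColumnMajor` on the values 1, 2, 3, 4, 5, 6 with width 3 and height 2 reorders them to 1, 4, 2, 5, 3, 6, matching the two slides.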

Storing an image

• Why is this important?

• When reading/writing data from/to global memory it is important to use coalesced reads and writes, for optimal performance

• Coalesced operation = the threads of a warp read/write consecutive memory locations

• Use the Nvidia profiler to check for uncoalesced memory operations

Storing an image

• Assume the image is stored in row major order

• We use 2D thread blocks, 64 along x, 2 along y

• int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;

• int idx = y + x * HEIGHT; (wrong)

• Image[idx] = 3.0f; Uncoalesced writes

• int idx = x + y * WIDTH; (correct)

• Image[idx] = 3.0f; Coalesced writes

Storing an image

• int idx = y + x * HEIGHT (wrong)

• Image[idx] = 3.0f; Uncoalesced writes

• Indices are not consecutive

• threadIdx.y = 0

• idx = 0, HEIGHT, 2*HEIGHT, 3*HEIGHT, 4*HEIGHT, …

• threadIdx.y = 1

• idx = 1, 1+HEIGHT, 1+2*HEIGHT, 1+3*HEIGHT, 1+4*HEIGHT, …

Storing an image

• int idx = x + y * WIDTH (correct)

• Image[idx] = 3.0f; Coalesced writes

• Indices are consecutive

• threadIdx.y = 0

• idx = 0, 1, 2, 3, 4, …

• threadIdx.y = 1

• idx = WIDTH, 1+WIDTH, 2+WIDTH, 3+WIDTH, …
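
The index sequences on the last two slides can be reproduced on the CPU. The sketch below lists the indices touched by consecutive threads along x in block (0, 0) for both formulas and checks whether they are consecutive. The helper names are hypothetical, and the image size of 512 used in the usage note is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices written by threads threadIdx.x = 0..count-1 with threadIdx.y = ty
// in block (0, 0), using the (wrong) formula idx = y + x * HEIGHT.
std::vector<int> indicesColumnStyle(int count, int ty, int height) {
    assert(count > 0);
    std::vector<int> idx;
    for (int tx = 0; tx < count; ++tx) idx.push_back(ty + tx * height);
    return idx;
}

// Same threads, using the (correct) formula idx = x + y * WIDTH.
std::vector<int> indicesRowStyle(int count, int ty, int width) {
    assert(count > 0);
    std::vector<int> idx;
    for (int tx = 0; tx < count; ++tx) idx.push_back(tx + ty * width);
    return idx;
}

// True when every index is exactly one larger than its predecessor.
bool isConsecutive(const std::vector<int>& idx) {
    for (std::size_t i = 1; i < idx.size(); ++i) {
        if (idx[i] != idx[i - 1] + 1) return false;
    }
    return true;
}
```

For a warp of 32 threads and a 512 x 512 image, `isConsecutive(indicesRowStyle(32, 1, 512))` holds, while the column-style indices are 512 elements apart.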

Multiplying two images

• __global__ void Multiply(float* Result, const float* Image1, const float* Image2, int DATA_W, int DATA_H)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if ((x >= DATA_W) || (y >= DATA_H))
        return;

    int idx = x + y * DATA_W;
    Result[idx] = Image1[idx] * Image2[idx];
}

Multiplying two images

• The kernel is completely bound by memory bandwidth: each thread performs two reads and one write for a single multiplication

• Uncoalesced memory operations make a big difference

• (In this specific kernel we could have used 1D thread blocks)
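
The bounds check in the Multiply kernel matters because the grid usually covers more threads than there are pixels. The following CPU sketch emulates the launch with loops over blocks and threads; `blocksNeeded` and `multiplyEmulated` are hypothetical names, and the 64 x 2 block size is taken from the earlier slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements with blockSize threads per block (rounds up).
inline int blocksNeeded(int n, int blockSize) {
    assert(n > 0 && blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// CPU emulation of the Multiply kernel: visits every (block, thread) pair the GPU would launch.
void multiplyEmulated(std::vector<float>& result,
                      const std::vector<float>& image1,
                      const std::vector<float>& image2,
                      int dataW, int dataH, int blockW = 64, int blockH = 2) {
    assert(result.size() == static_cast<std::size_t>(dataW) * dataH);
    assert(image1.size() == result.size() && image2.size() == result.size());
    const int gridW = blocksNeeded(dataW, blockW);
    const int gridH = blocksNeeded(dataH, blockH);
    for (int by = 0; by < gridH; ++by)
        for (int bx = 0; bx < gridW; ++bx)
            for (int ty = 0; ty < blockH; ++ty)
                for (int tx = 0; tx < blockW; ++tx) {
                    const int x = bx * blockW + tx;
                    const int y = by * blockH + ty;
                    // Same bounds check as in the kernel: surplus threads do nothing.
                    if (x >= dataW || y >= dataH) continue;
                    const int idx = x + y * dataW;
                    result[idx] = image1[idx] * image2[idx];
                }
}
```

For a 100 x 3 image this gives a 2 x 2 grid, that is 512 emulated threads for 300 pixels; without the check the surplus threads would write outside the buffers.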

Image processing with C/C++

• We will use the CImg library to read and write images using C++ objects

• The CImg library is open source and consists of a single header file (CImg.h)

• Works on Windows, Linux, Mac

• cimg.sourceforge.net

Convolution

• Convolution = scalar product between filter values and pixel values in each neighbourhood

• Slide the filter over all pixels, save each result in the center pixel

• Note the minus signs (means that the filter is rotated 180 degrees)!
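
For reference, the standard definition of discrete 2D convolution is given below, where the minus signs appear in the image coordinates. The symbols f (image), h (filter) and g (filter response) are chosen here for illustration.

```latex
g(x, y) = (f * h)(x, y) = \sum_{u} \sum_{v} f(x - u,\, y - v)\, h(u, v)
```

Replacing both minus signs with plus signs gives correlation instead, where the filter is not rotated. For symmetric filters the two operations give the same result.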

Task 1

• Open imageprocessing_convolution.cu and imageprocessing_kernel.cu

• Complete the code for the kernel Convolution_2D_Texture

• The code reads an image from file, sends it to the GPU, copies back the filter response, writes the filter response to a new image

• The code then compares your result to a convolution performed with CImg

Constant memory

• Constant memory is normally 64 KB

• Each multiprocessor on the GPU has a constant memory cache (8 KB)

• Put the filter kernel in constant memory

• __device__ __constant__ float c_Filter_2D[11][11];

• The filter will be in the cache during the whole execution, saves reads from global memory

Texture memory

• Texture memory is cached for spatially local reads

• Hardware support for reading outside the image

• Read the value at position (x,y): value = tex2D(tex_Image, x + 0.5f, y + 0.5f);

• Note the addition of 0.5f! With unnormalized coordinates, the center of pixel (x,y) is located at (x + 0.5, y + 0.5)

Compiling the code

• See the top of each file for how to compile the code

Checking results

• Compare the images filteredImageCUDA.bmp and filteredImageCImg.bmp, difference is given in difference.bmp

• Copy images from your account to your own computer, to be able to see the images

• scp [email protected]:/home/aeklund/GPULab/*.bmp .

• display filteredImageCUDA.bmp &

Checking results

• For the convolution, the maximum error compared to CImg should be something like 0.000015

• The total error should be something like 0.09

Convolution – First part

• __global__ void Convolution_2D_Texture(float* Result, int DATA_W, int DATA_H, int FILTER_W, int FILTER_H)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if ((x >= DATA_W) || (y >= DATA_H))
        return;

    float sum = 0.0f;

Convolution – Second part

•   float yoffset = -((float)FILTER_H - 1.0f) / 2.0f + 0.5f;
    // Loop backwards over the filter, which rotates it 180 degrees
    for (int fy = FILTER_H - 1; fy >= 0; fy--)
    {
        // xoffset must be a float; an int would truncate the 0.5f offset
        float xoffset = -((float)FILTER_W - 1.0f) / 2.0f + 0.5f;
        for (int fx = FILTER_W - 1; fx >= 0; fx--)
        {
            sum += tex2D(tex_Image, x + xoffset, y + yoffset) * c_Filter[fy][fx];
            xoffset += 1.0f;
        }
        yoffset += 1.0f;
    }

    int idx = x + y * DATA_W;
    Result[idx] = sum;
}
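
To check a GPU result without CImg, a plain CPU convolution can serve as a reference. This is a minimal sketch, not the code used in the course: the names `pixelOrZero` and `convolve2D` are hypothetical, the filter is assumed to have odd width and height, and pixels outside the image are assumed to be zero (the texture may be configured differently, for example with clamping).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads pixel (x, y) from a row-major image; returns 0 outside the image (assumption).
inline float pixelOrZero(const std::vector<float>& image, int x, int y, int width, int height) {
    if (x < 0 || y < 0 || x >= width || y >= height) return 0.0f;
    return image[x + y * width];
}

// 2D convolution of a row-major image with a row-major filter of odd size.
std::vector<float> convolve2D(const std::vector<float>& image, int width, int height,
                              const std::vector<float>& filter, int filterW, int filterH) {
    assert(filterW % 2 == 1 && filterH % 2 == 1);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(filter.size() == static_cast<std::size_t>(filterW) * filterH);
    std::vector<float> result(image.size(), 0.0f);
    const int halfW = filterW / 2;
    const int halfH = filterH / 2;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int fy = 0; fy < filterH; ++fy) {
                for (int fx = 0; fx < filterW; ++fx) {
                    // Minus signs: the filter is rotated 180 degrees relative to the image.
                    sum += pixelOrZero(image, x - (fx - halfW), y - (fy - halfH), width, height)
                           * filter[fx + fy * filterW];
                }
            }
            result[x + y * width] = sum;
        }
    }
    return result;
}
```

Convolving an image that is zero everywhere except for a single 1 reproduces the filter itself around that pixel, which is a quick way to confirm that the rotation is handled correctly.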

Texture memory

• The texture memory has hardware support for linear interpolation

• So far we have only used the texture memory for fast reading from global memory (using the texture cache)

• Let's use the texture memory for interpolation
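
What the hardware computes for a linearly filtered 2D texture read can be emulated on the CPU. The sketch below follows the bilinear formula described in the CUDA programming guide for unnormalized coordinates, where 0.5 is subtracted from the coordinate before interpolating. The names `pixelClamped` and `sampleLinear` are hypothetical, clamping at the image border is an assumption, and the reduced precision the hardware uses for the interpolation weights is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reads pixel (x, y) from a row-major image, clamping coordinates to the image (assumption).
inline float pixelClamped(const std::vector<float>& image, int x, int y, int width, int height) {
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    return image[x + y * width];
}

// Bilinear sample at texture coordinate (x, y); pixel centers are at half-integer coordinates.
float sampleLinear(const std::vector<float>& image, int width, int height, float x, float y) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    const float xb = x - 0.5f;
    const float yb = y - 0.5f;
    const float fi = std::floor(xb);
    const float fj = std::floor(yb);
    const int i = static_cast<int>(fi);
    const int j = static_cast<int>(fj);
    const float a = xb - fi;  // weight of the right neighbour
    const float b = yb - fj;  // weight of the lower neighbour
    return (1.0f - a) * (1.0f - b) * pixelClamped(image, i, j, width, height)
         + a * (1.0f - b) * pixelClamped(image, i + 1, j, width, height)
         + (1.0f - a) * b * pixelClamped(image, i, j + 1, width, height)
         + a * b * pixelClamped(image, i + 1, j + 1, width, height);
}
```

Sampling at (x + 0.5, y + 0.5) returns exactly the value of pixel (x, y), which is why the kernels add 0.5f; sampling at integer coordinates returns the average of the surrounding pixels.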

Rotating an image

• Use the rotation matrix

$$R = \begin{pmatrix} \cos(\text{angle}) & -\sin(\text{angle}) \\ \sin(\text{angle}) & \cos(\text{angle}) \end{pmatrix}$$

to transform each (x,y) coordinate, then read from the new coordinate using texture memory

Rotating an image

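$$\begin{pmatrix} x_{new} \\ y_{new} \end{pmatrix} = \begin{pmatrix} \cos(\text{angle}) & -\sin(\text{angle}) \\ \sin(\text{angle}) & \cos(\text{angle}) \end{pmatrix} \begin{pmatrix} x_{old} \\ y_{old} \end{pmatrix}$$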

Rotating an image

• cos and sin in CUDA use double precision

• cosf and sinf use single precision

• All functions use radians and not degrees

Task 2

• Open imageprocessing_transformation.cu and imageprocessing_kernel.cu

• Complete the code for the kernel RotateImage, which rotates an image using texture memory for interpolation

• Extra task: rotate the image around its center, instead of around the corner

Checking results

• Compare the images transformedImageCUDA.bmp and transformedImageCImg.bmp, difference is given in difference.bmp

Checking results

• For the rotation, the maximum error compared to CImg should be something like 5.23

• The total error should be something like 12331.2

• Interpolation in CImg is probably performed slightly differently

Rotating an image

• First part of the transformation kernel is the same as for the convolution kernel

• float xf = (float)x;
  float yf = (float)y;

• float angler = angled / 180.0f * pi;

• float xnew = cosf(angler) * xf - sinf(angler) * yf;

• float ynew = sinf(angler) * xf + cosf(angler) * yf;

• float value = tex2D(tex_Image, xnew + 0.5f, ynew + 0.5f);

• TransformedImage[idx] = value;

Rotating an image around its center

• float xf = (float)x - (float)IMAGE_WIDTH / 2.0f;
  float yf = (float)y - (float)IMAGE_HEIGHT / 2.0f;

• float angler = angled / 180.0f * pi;

• float xnew = cosf(angler) * xf - sinf(angler) * yf;

• float ynew = sinf(angler) * xf + cosf(angler) * yf;

• xnew += (float)IMAGE_WIDTH / 2.0f;

• ynew += (float)IMAGE_HEIGHT / 2.0f;

• float value = tex2D(tex_Image, xnew + 0.5f, ynew + 0.5f);

• TransformedImage[idx] = value;
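
The coordinate transformation can be tested on the CPU before it goes into the kernel. The sketch below only computes the rotated coordinate and leaves out the texture read; `Point2f` and `rotateAroundCenter` are hypothetical names, and the angle is given in degrees as in the slides.

```cpp
#include <cassert>
#include <cmath>

struct Point2f {
    float x;
    float y;
};

// Rotates (x, y) by angleDegrees around the image center and returns the new coordinate.
Point2f rotateAroundCenter(float x, float y, float angleDegrees, int imageWidth, int imageHeight) {
    assert(imageWidth > 0 && imageHeight > 0);
    const float pi = 3.14159265358979f;
    const float angle = angleDegrees / 180.0f * pi;  // the math functions expect radians
    const float cx = static_cast<float>(imageWidth) / 2.0f;
    const float cy = static_cast<float>(imageHeight) / 2.0f;
    // Move the origin to the center, rotate, then move the origin back.
    const float xf = x - cx;
    const float yf = y - cy;
    Point2f p;
    p.x = std::cos(angle) * xf - std::sin(angle) * yf + cx;
    p.y = std::sin(angle) * xf + std::cos(angle) * yf + cy;
    return p;
}
```

The center of the image maps to itself for every angle, which is a simple sanity check for the kernel as well.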

Image registration

• Image registration relies on the concept of optimizing a similarity measure

• Find the translations and rotations that maximize the similarity between two images

• Normalized cross correlation (NCC) is one of the most common similarity measures

Normalized cross correlation (NCC)

• Correlation between variables x and y

Normalized cross correlation (NCC)

• NCC can be calculated using vectors x and y as

• Only three scalar products are needed, between x and y, between x and x and between y and y (remove the mean of x and y first)
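
For reference, the standard correlation coefficient between x and y, and the vector form that needs only three scalar products, can be written as follows. The notation (bars for mean values, bold letters for mean-removed vectors) is chosen here for illustration.

```latex
\rho = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
            {\sqrt{\sum_{i} (x_i - \bar{x})^2}\; \sqrt{\sum_{i} (y_i - \bar{y})^2}}
\qquad
\mathrm{NCC} = \frac{\mathbf{x}^T \mathbf{y}}
                    {\sqrt{(\mathbf{x}^T \mathbf{x})\,(\mathbf{y}^T \mathbf{y})}}
```

The two expressions are equal once the mean has been subtracted from each vector, since each sum then becomes a scalar product.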

Representing an image as a long vector

CUBLAS

• CUBLAS has many functions for matrix algebra

• Matrix-matrix multiplications, matrix-vector multiplications, vector-vector multiplications

• The function cublasSdot can be used to calculate the scalar product between two vectors with float values

• Look in the CUDA documentation to see how it works

Task 3

• Open imageprocessing_similarity.cu

• Complete the code for how to calculate the correlation between two images, using the CUBLAS library

• Each image is treated as a vector of length IMAGE_WIDTH * IMAGE_HEIGHT

• The mean values have already been removed

• You have to allocate memory on the GPU and copy data to the GPU

• Your correlation value is compared to a correlation calculated using regular C code

Calculating correlation using CUBLAS

• #include <cublas_v2.h>

• First create a handle to CUBLAS, already provided in the code

• cublasStatus_t status;

• cublasHandle_t handle;

• status = cublasCreate(&handle);

Calculating correlation using CUBLAS

• cublasStatus_t cublasSdot(cublasHandle_t handle, int n, const float *x, int incx, const float *y, int incy, float *result)

• n is the length of the vectors, in our case IMAGE_WIDTH * IMAGE_HEIGHT

• x is the pointer to the first image, d_Image1

• y is the pointer to the second image, d_Image2

• incx and incy are the strides between consecutive elements in the vectors, in our case 1

• result is the pointer to the calculated scalar product

Calculating correlation using CUBLAS

• float productAB, productAA, productBB;

• status = cublasSdot(handle, IMAGE_WIDTH * IMAGE_HEIGHT, d_Image1, 1, d_Image2, 1, &productAB);

• status = cublasSdot(handle, IMAGE_WIDTH * IMAGE_HEIGHT, d_Image1, 1, d_Image1, 1, &productAA);

• status = cublasSdot(handle, IMAGE_WIDTH * IMAGE_HEIGHT, d_Image2, 1, d_Image2, 1, &productBB);

• float correlationGPU = productAB / (sqrt(productAA * productBB));
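
The GPU value can be compared against a CPU computation of the same three scalar products. This is a minimal sketch with hypothetical function names (`dotProduct`, `removeMean`, `correlation`), not the reference code that ships with the task. It removes the mean itself, whereas in the task the mean values have already been removed.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Scalar product of two vectors of equal length (CPU counterpart of cublasSdot).
float dotProduct(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0f);
}

// Subtracts the mean value from every element.
void removeMean(std::vector<float>& v) {
    assert(!v.empty());
    const float mean = std::accumulate(v.begin(), v.end(), 0.0f) / static_cast<float>(v.size());
    for (float& value : v) value -= mean;
}

// Normalized cross correlation between two images stored as long vectors.
float correlation(std::vector<float> a, std::vector<float> b) {
    removeMean(a);
    removeMean(b);
    const float productAB = dotProduct(a, b);
    const float productAA = dotProduct(a, a);
    const float productBB = dotProduct(b, b);
    return productAB / std::sqrt(productAA * productBB);
}
```

Single-precision sums over a whole image accumulate rounding errors, so small differences between the CPU and GPU values are to be expected.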

Image registration

• We now have the two most important building blocks for image registration

• Calculation of a similarity measure

• Applying a transformation to an image

Image registration

• Let's combine these two functions to perform a very simple form of image registration

• Register two images by finding the optimal rotation to apply to one of the images

Task 4

• Open imageprocessing_registration.cu and complete the code to perform a registration between the two images

• Make a for loop that in each iteration applies a rotation and calculates the correlation between the fixed image and the rotated image

• Print the rotation that gives the highest correlation

• Do not copy data to/from the GPU in each iteration!
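
The structure of the search loop can be sketched independently of the GPU code. In the sketch below the callable `correlationForAngle` is a stand-in for "rotate the image on the GPU and calculate the correlation with CUBLAS"; it, `RegistrationResult` and `findBestRotation` are hypothetical names, and the angle range and step size are left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct RegistrationResult {
    float bestAngleDegrees;
    float bestCorrelation;
};

// Exhaustive search: evaluates the similarity for each candidate angle and keeps the best one.
RegistrationResult findBestRotation(const std::function<float(float)>& correlationForAngle,
                                    float minAngle, float maxAngle, float step) {
    assert(step > 0.0f && minAngle <= maxAngle);
    const int steps = static_cast<int>(std::floor((maxAngle - minAngle) / step));
    RegistrationResult best;
    best.bestAngleDegrees = minAngle;
    best.bestCorrelation = correlationForAngle(minAngle);
    for (int i = 1; i <= steps; ++i) {
        const float angle = minAngle + static_cast<float>(i) * step;
        const float c = correlationForAngle(angle);
        if (c > best.bestCorrelation) {
            best.bestAngleDegrees = angle;
            best.bestCorrelation = c;
        }
    }
    return best;
}
```

In the real task, both images stay in GPU memory for the whole loop; only the angle changes between iterations, and only the scalar correlation value comes back to the host.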

Checking results

• The best rotation should be -30 degrees, giving a correlation of 0.861115