Hetero Lecture Slides 002 Lecture 1 Lecture-1-7-Kernel-multidimension


Kernel-based Parallel Programming
- Multidimensional Kernel Configuration

Lecture 1.7



Objective

• To understand multidimensional grids
  • Multi-dimensional block and thread indices
  • Mapping block/thread indices to data indices



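A Multi-Dimensional Grid Example

[Figure: the host launches Kernel 1 on the device as Grid 1, a 2D grid of blocks such as Block (0,0) and Block (1,0). A second grid, Grid 2, is also shown. One block is expanded to show its 3D thread indices: Thread (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), and so on.]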
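The CUDA Programming Guide gives the thread ID of thread (x, y, z) in a block of size (Dx, Dy, Dz) as x + y*Dx + z*Dx*Dy. In other words, x varies fastest. The plain C sketch below lets you check that arithmetic on the host without a GPU. It assumes a hypothetical `Dim3` struct standing in for CUDA's `dim3`, and the helper names are illustrative only.

```c
#include <assert.h>

/* Hypothetical host-side stand-in for CUDA's dim3; not the CUDA type. */
typedef struct {
    unsigned x, y, z;
} Dim3;

/* Linear ID of a thread inside its block: x varies fastest, then y, then z. */
static unsigned thread_id_in_block(Dim3 threadIdx, Dim3 blockDim)
{
    assert(threadIdx.x < blockDim.x && threadIdx.y < blockDim.y &&
           threadIdx.z < blockDim.z);
    return threadIdx.x
         + threadIdx.y * blockDim.x
         + threadIdx.z * blockDim.x * blockDim.y;
}

/* Total number of threads in one block. */
static unsigned threads_per_block(Dim3 blockDim)
{
    return blockDim.x * blockDim.y * blockDim.z;
}
```

For a 2×2×2 block, thread (1,0,1) gets ID 1 + 0*2 + 1*2*2 = 5. The 16×16×1 blocks used later in this lecture hold 256 threads each.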



Processing a Picture with a 2D Grid

[Figure: a 62×76 picture covered by a 2D grid of 16×16 blocks.]



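Row-Major Layout

A 4×4 matrix M:

M0,0 M0,1 M0,2 M0,3
M1,0 M1,1 M1,2 M1,3
M2,0 M2,1 M2,2 M2,3
M3,0 M3,1 M3,2 M3,3

is stored in memory one row after another:

M0,0 M0,1 M0,2 M0,3 M1,0 M1,1 M1,2 M1,3 M2,0 M2,1 M2,2 M2,3 M3,0 ...
M0   M1   M2   M3   M4   M5   M6   M7   M8   M9   M10  M11  M12  ...

Element M2,1 is therefore at offset Row*Width+Col = 2*4+1 = 9, i.e. M9.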
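C and C++ lay out 2D arrays in row-major order. When a kernel receives a flat pointer such as `d_Pin`, it must compute the offset Row*Width+Col itself. Below is a minimal C sketch of that mapping. The helper names `row_major_index` and `get_element` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row, col) in a row-major matrix with `width` columns
   (hypothetical helper). */
static size_t row_major_index(size_t row, size_t col, size_t width)
{
    assert(col < width); /* a column index must stay inside one row */
    return row * width + col;
}

/* Read element (row, col) from a matrix stored as a flat 1D array. */
static float get_element(const float *flat, size_t row, size_t col, size_t width)
{
    return flat[row_major_index(row, col, width)];
}
```

If the 16 elements of the 4×4 matrix hold the values 0 through 15 in memory order, `get_element(flat, 2, 1, 4)` returns the value at offset 9, matching M2,1 above.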



Source Code of a PictureKernel

__global__ void PictureKernel(float* d_Pin, float* d_Pout,
                              int n, int m)
{
  // Calculate the row # of the d_Pin and d_Pout element to process
  int Row = blockIdx.y*blockDim.y + threadIdx.y;

  // Calculate the column # of the d_Pin and d_Pout element to process
  int Col = blockIdx.x*blockDim.x + threadIdx.x;

  // each thread computes one element of d_Pout if in range
  if ((Row < m) && (Col < n)) {
    // 2.0f keeps the multiply in single precision
    d_Pout[Row*n+Col] = 2.0f*d_Pin[Row*n+Col];
  }
}

Scale every pixel value by 2.0.
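To see that the Row/Col mapping plus the bounds test writes every pixel exactly once, you can run the kernel body sequentially on the host. The sketch below does this in plain C. It assumes a hypothetical `Dim3` struct in place of `dim3` and passes `blockIdx`, `threadIdx` and `blockDim` explicitly, because plain C has no built-in variables. It is an emulation for checking the index math, not how CUDA actually schedules threads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical host-side stand-in for CUDA's dim3; not the CUDA type. */
typedef struct { int x, y, z; } Dim3;

/* PictureKernel's body for one thread, rewritten as plain C.
   Returns 1 if this thread wrote an output pixel, 0 if it was idle. */
static int picture_thread(Dim3 blockIdx, Dim3 threadIdx, Dim3 blockDim,
                          const float *d_Pin, float *d_Pout, int n, int m)
{
    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;
    if ((Row < m) && (Col < n)) {
        d_Pout[Row * n + Col] = 2.0f * d_Pin[Row * n + Col];
        return 1;
    }
    return 0;
}

/* Run every thread of a 2D grid one after another (a GPU would run them
   in parallel). Returns the number of threads that wrote a pixel. */
static int emulate_launch(Dim3 gridDim, Dim3 blockDim,
                          const float *d_Pin, float *d_Pout, int n, int m)
{
    int written = 0;
    for (int by = 0; by < gridDim.y; ++by)
        for (int bx = 0; bx < gridDim.x; ++bx)
            for (int ty = 0; ty < blockDim.y; ++ty)
                for (int tx = 0; tx < blockDim.x; ++tx) {
                    Dim3 b = {bx, by, 0};
                    Dim3 t = {tx, ty, 0};
                    written += picture_thread(b, t, blockDim, d_Pin, d_Pout, n, m);
                }
    return written;
}

/* Hypothetical self-check: fill an m x n picture with pixel k = k, launch
   with block x block threads per block, and confirm every pixel was written
   exactly once with twice its input value. Returns 1 on success. */
static int check_picture(int m, int n, int block)
{
    size_t count = (size_t)m * (size_t)n;
    float *in = malloc(count * sizeof *in);
    float *out = malloc(count * sizeof *out);
    int ok = 0;
    if (in != NULL && out != NULL) {
        for (size_t k = 0; k < count; ++k) {
            in[k] = (float)k;
            out[k] = -1.0f; /* sentinel: never a valid output */
        }
        Dim3 blockDim = {block, block, 1};
        Dim3 gridDim = {(n - 1) / block + 1, (m - 1) / block + 1, 1};
        ok = (emulate_launch(gridDim, blockDim, in, out, n, m) == (int)count);
        for (size_t k = 0; k < count && ok; ++k)
            ok = (out[k] == 2.0f * in[k]);
    }
    free(in);
    free(out);
    return ok;
}
```

`check_picture(62, 76, 16)` exercises the lecture's 62×76 picture, read here as m = 62 rows by n = 76 columns. Because the write count must equal m*n and no sentinel survives, every pixel is written once and only once.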



Host Code for Launching PictureKernel

// assume that the picture is m×n:
// m pixels in the y dimension and n pixels in the x dimension
// input d_Pin has been allocated on and copied to the device
// output d_Pout has been allocated on the device

// round up so that partially filled 16×16 tiles still get a block
dim3 DimGrid((n-1)/16 + 1, (m-1)/16 + 1, 1);
dim3 DimBlock(16, 16, 1);
PictureKernel<<<DimGrid,DimBlock>>>(d_Pin, d_Pout, n, m);
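For extent ≥ 1, the expression (extent-1)/16 + 1 is integer ceiling division. It rounds extent/16 up, so the last partial tile still gets a block. A small C check with the picture's numbers follows; `blocks_needed` is a hypothetical helper. Reading 62×76 as m = 62 and n = 76, the grid is 5 blocks in x and 4 in y.

```c
#include <assert.h>

/* Blocks of size `block` needed to cover `extent` elements: the same
   rounding-up expression as the host code. Requires extent >= 1, because
   C division truncates toward zero and (0 - 1)/block + 1 would give 1. */
static int blocks_needed(int extent, int block)
{
    assert(extent >= 1 && block >= 1);
    return (extent - 1) / block + 1;
}
```

Here `blocks_needed(76, 16)` is 5 and `blocks_needed(62, 16)` is 4, so the grid has 80×64 threads for a 76-wide, 62-tall picture. An exact multiple such as 64 needs no extra block: `blocks_needed(64, 16)` is 4.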



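Covering a 62×76 Picture with 16×16 Blocks

Not all threads in a block will follow the same control flow path. Threads whose Row or Col falls outside the picture fail the (Row < m) && (Col < n) test and do no work.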
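Counting how many threads in each block pass the bounds test makes the divergence concrete. The plain C sketch below assumes the picture is m = 62 rows by n = 76 columns, and its helper names are hypothetical. Interior blocks are fully active. The corner block (4,3) covers rows 48 to 63 and columns 64 to 79, so only 14×12 = 168 of its 256 threads do work.

```c
#include <assert.h>

/* Threads of block (bx, by) whose Row/Col pass PictureKernel's
   (Row < m) && (Col < n) test (hypothetical helper). */
static int active_threads_in_block(int bx, int by, int block, int m, int n)
{
    int active = 0;
    for (int ty = 0; ty < block; ++ty)
        for (int tx = 0; tx < block; ++tx) {
            int Row = by * block + ty;
            int Col = bx * block + tx;
            if (Row < m && Col < n)
                ++active;
        }
    return active;
}

/* Threads in the whole grid that fail the test and stay idle. */
static int idle_threads(int block, int m, int n)
{
    int gx = (n - 1) / block + 1;
    int gy = (m - 1) / block + 1;
    int active = 0;
    for (int by = 0; by < gy; ++by)
        for (int bx = 0; bx < gx; ++bx)
            active += active_threads_in_block(bx, by, block, m, n);
    return gx * gy * block * block - active;
}
```

For the 62×76 picture, the 5×4 grid launches 5120 threads for 4712 pixels. `idle_threads(16, 62, 76)` therefore returns 408.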



To learn more, …
