Page 1:

CS4402 – Parallel Computing

Lecture 7

Parallel Graphics – More Fractals

Scheduling

Page 2:

FRACTALS

Page 3:

Fractals

A fractal is a set of points such that:

- its fractal dimension exceeds its topological dimension [infinite detail at every point].

- it satisfies self-similarity: any part of the fractal is similar to the whole fractal.

Generating a fractal is an iterative process:

- start from P0

- iteratively generate P1=F(P0), P2=F(P1), …, Pn=F(Pn-1), …

P0 is a set of initial points

F is a transformation:

Geometric transformations: translations, rotations, scaling, …

Non-linear coordinate transformations.

Page 4:

We work with two rectangular areas.

The user space:

- real coordinates (x,y)

- bounded by [xMin,xMax] × [yMin,yMax]

The screen space:

- integer coordinates (i,j)

- bounded by [0,w-1] × [0,h-1]

- is upside down: the Oy axis points downward

How do we squeeze the user space into the screen space?

How do we translate (x,y) into (i,j)?

Points vs Pixels
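Below is a minimal sketch of the point-to-pixel translation, assuming the names w, h, xMin, xMax, yMin, yMax from above; the function pixel_to_point is an illustrative name, not code from the lecture.

void pixel_to_point(int i, int j, int w, int h,
                    double xMin, double xMax, double yMin, double yMax,
                    double *x, double *y)
{
    // column i is mapped linearly onto [xMin, xMax]
    *x = xMin + i * (xMax - xMin) / (w - 1);

    // row j is mapped onto [yMin, yMax]; we subtract from yMax because
    // the screen's Oy axis points downward
    *y = yMax - j * (yMax - yMin) / (h - 1);
}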

Page 5:

Julia Sets – Self-Squaring Fractals

Consider the generating function F(z) = z^2 + c, where z, c ∈ C.

Sequence of complex numbers: z_0 ∈ C and z_{n+1} = z_n^2 + c.

Chaotic behaviour, but two attractors for |z_n|: 0 and +∞.

For a given c ∈ C, the Julia set J_c consists of all the points whose orbit remains finite (bounded).

Page 6:

Julia Sets – Algorithm

Inputs:

c ∈ C the complex parameter; [xmin,xmax] × [ymin,ymax] a region in the plane.

Niter the number of iterations per orbit; R a threshold for the attractor.

Output: Jc the Julia set of c

Algorithm

For each pixel (i,j) on the screen

translate (i,j) into (x,y)

construct z_0 = x + y·i (i here denotes the imaginary unit)

find the orbit of z_0 [first Niter elements]

if (all the orbit points are under the threshold R) draw the pixel (i,j)
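The code on the next slide calls func and CompAbs without defining them. One plausible implementation is sketched here, assuming a simple Complex struct; these definitions are illustrative, not taken from the lecture.

#include <math.h>

typedef struct { double re, im; } Complex;

// F(z) = z^2 + c
Complex func(Complex z, Complex c)
{
    Complex r;
    r.re = z.re*z.re - z.im*z.im + c.re;
    r.im = 2*z.re*z.im + c.im;
    return r;
}

// |z|
double CompAbs(Complex z)
{
    return sqrt(z.re*z.re + z.im*z.im);
}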

Page 7:

for (i = 0; i < width; i++)
for (j = 0; j < width; j++) {

    int k = 0;

    // construct the orbit of z
    z.re = XMIN + i*STEP;
    z.im = YMIN + j*STEP;
    for (k = 0; k < NUMITER; k++) {
        z = func(z, c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed bounded (the inner loop never broke out)
    if (k > NUMITER-1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW);
        MPE_Update(graph);
    }
    else {
        MPE_Draw_point(graph, i, j, MPE_RED);
        MPE_Update(graph);
    }
}

Page 8:

Julia Sets – || Algorithm

Remark 1.

The double for loop on (i,j) can be split across the processors, e.g.:

uniform block or cyclic on i.

uniform block or cyclic on j.

There is no communication at all between the processors, therefore this is an embarrassingly || computation.

Remark 2.

Each processor draws a block of the fractal, or several rows, on the XGraph.

Processor P rank knows the area it has to draw.

Page 9:

// uniform block on i; three alternative partitionings are left commented out:
for (i = rank*width/size; i < (rank+1)*width/size; i++)
for (j = 0; j < width; j++) {
// for (i = rank; i < width; i += size) for (j = 0; j < width; j++) {                    // cyclic on i
// for (i = 0; i < width; i++) for (j = rank*width/size; j < (rank+1)*width/size; j++) { // block on j
// for (i = 0; i < width; i++) for (j = rank; j < width; j += size) {                    // cyclic on j

    int k = 0;

    // construct the orbit of z
    z.re = XMIN + i*STEP;
    z.im = YMIN + j*STEP;
    for (k = 0; k < NUMITER; k++) {
        z = func(z, c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed bounded
    if (k > NUMITER-1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW);
        MPE_Update(graph);
    }
    else {
        MPE_Draw_point(graph, i, j, MPE_RED);
        MPE_Update(graph);
    }
}

Page 12:

The Mandelbrot Set

THE MANDELBROT FRACTAL IS AN INDEX FOR JULIA FRACTALS

The Mandelbrot set contains all the points c ∈ C such that

z_0 = 0 and z_{n+1} = z_n^2 + c has a finite (bounded) orbit.

Inputs: [xmin,xmax] × [ymin,ymax] a region in the plane.

Niter the number of iterations per orbit; R a threshold for the attractor.

Output: M the Mandelbrot set.

Algorithm

For each (x,y) in [xmin,xmax] × [ymin,ymax]

c = x + i*y;

find the orbit of z_0 = 0 while under the threshold.

if (all the orbit points are under the threshold) draw the pixel for c = (x,y)

Page 13:

for (i = 0; i < width; i++)
for (j = 0; j < width; j++) {

    int k = 0;

    // construct the point c
    c.re = XMIN + i*STEP;
    c.im = YMIN + j*STEP;

    // construct the orbit of 0
    z.re = z.im = 0;
    for (k = 0; k < NUMITER; k++) {
        z = func(z, c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed bounded
    if (k > NUMITER-1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW);
        MPE_Update(graph);
    }
    else {
        MPE_Draw_point(graph, i, j, MPE_RED);
        MPE_Update(graph);
    }
}

Page 14:

The Mandelbrot Set – || Algorithm

Remark 1.

The double for loop on (i,j) can be split across the processors, e.g.:

uniform block or cyclic on i.

uniform block or cyclic on j.

There is no communication at all between the processors, therefore this is an embarrassingly || computation.

Remark 2.

When the orbit goes to infinity, escaping after k steps, we can draw the pixel (i,j) with the k-th colour from a palette, as sketched below.

Bands with the same colour contain points with the same escape behaviour.
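A minimal sketch of Remark 2, assuming a palette array colors[0..NUMITER-1] of MPE colours (e.g. built with MPE_Make_color_array); the palette is an assumption, not part of the original code.

// inside the per-pixel loop, after the orbit loop has set k
if (k > NUMITER-1) {
    // bounded orbit: the point belongs to the set
    MPE_Draw_point(graph, i, j, MPE_BLACK);
}
else {
    // escaped after k steps: colour by escape time
    MPE_Draw_point(graph, i, j, colors[k]);
}
MPE_Update(graph);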

Page 16:

Fractal and Prime Numbers

Prime numbers can generate fractals.

Remarks:

- If p > 5 is prime then p%5 is 1, 2, 3 or 4.
- The values 1, 2, 3, 4 represent directions to move, e.g. left, right, up, down.
- The fractal has the sizes w and h.

Step 1. Initialise a matrix of colours with 0.

Step 2. For each number p > 5:

if p is prime then
    if (p%5==1) x = (x-1)%w;
    if (p%5==2) x = (x+1)%w;
    if (p%5==3) y = (y-1)%h;
    if (p%5==4) y = (y+1)%h;
    increase the colour of (x,y)

Step 3. Draw the pixels with the colour matrix.

Page 17:

Simple Remarks

The prime number set is infinite; furthermore, it has no pattern.

prime: 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, …

move: 3, 0, 2, 1, 3, 2, 4, 3, 4, 1, 2, …

The set of moves satisfies:

- it does not have any pattern: the moves are quite random.

- the numbers of 1-s, 2-s, 3-s and 4-s moves are quite similar,

hence the central pixels are reached more often.

The primality testing inside the for loop is the most expensive operation.
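The code on the next slide relies on an isPrime function that the slides do not define. A simple trial-division sketch (an assumption, not the lecture's version) makes the cost visible: each test is O(√p), which is why the loop dominates the running time.

// trial division: O(sqrt(p)) per test
int isPrime(long p)
{
    if (p < 2) return 0;
    for (long d = 2; d*d <= p; d++)
        if (p % d == 0) return 0;
    return 1;
}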

Page 18:

// initialise the matrix with 0
for (i = 0; i < width; i++)
    for (j = 0; j < width; j++)
        map[i][j] = 0;

// start from the image centre
posX = posY = width/2;

// traverse the odd numbers and keep the primes
for (i = 0; i < n; i++) {

    if (isPrime(2*i+1)) {

        // move to a new position on the map and increment it
        // (adding width before % keeps the index non-negative in C)
        move = (2*i+1) % 5;
        if (move == 1) posX = (posX-1+width) % width;
        if (move == 2) posX = (posX+1) % width;
        if (move == 3) posY = (posY-1+width) % width;
        if (move == 4) posY = (posY+1) % width;

        map[posY][posX]++;
    }
}

Page 19:

Parallel Computation: Simple Remarks

Processor rank gets some primes to test, using some partitioning of the candidates.

Processor rank therefore traverses the pixels according to its own sequence of moves.

Processor rank has to work with its own matrix map.

The maps must be reduced (summed elementwise) onto processor 0 to find the total number of hits.

Page 20:

Parallel Computation: Simple Remarks

The parallel computation of processor rank follows the steps:

1. Initialise the matrix map.

2. For each prime number assigned to rank do

1. Find the move and go to a new location

2. Increment the map

3. Reduce the matrix map.

4. If processor 0 then draw the map.

Page 21:

Splitting Loops

How to split the sequential loop if we have size processors?

Maths: n iterations & size processors ⇒ n/size iterations per processor.

for (i = 0; i < n; i++) {
    // body of loop
    loop_body(data, i);
}

Page 22:

Splitting Loops in Similar Blocks

P rank gets the iterations rank*n/size, rank*n/size+1,…, (rank+1)*n/size-1

for (i = rank*n/size; i < (rank+1)*n/size; i++) {
    // acquire the data for this iteration
    loop_body(data, i);
}


Page 23:

Splitting Loops in Cycles

P rank gets the iterations rank, rank+size, rank+2*size, …

for (i = rank; i < n; i += size) {
    // acquire the data for this iteration
    loop_body(data, i);
}


Page 24:

Splitting Loops in Variable Blocks

P rank gets the iterations l[rank], l[rank]+1,…, u[rank]

for (i = l[rank]; i <= u[rank]; i++) {
    // acquire the data for this iteration
    loop_body(data, i);
}
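How l[rank] and u[rank] are chosen is the subject of the balanced-workload slides below; as a preview, here is a hedged sketch of one way to compute them when the workloads w[0..n-1] are known in advance (the function and array names are assumptions).

void balanced_blocks(const double *w, int n, int size, int *l, int *u)
{
    double total = 0, acc = 0;
    int i, p = 0;

    for (i = 0; i < n; i++) total += w[i];

    l[0] = 0;
    for (i = 0; i < n; i++) {
        acc += w[i];
        // close block p once the cumulative workload passes its quota
        while (p < size-1 && acc >= total * (p+1) / size) {
            u[p] = i;
            l[p+1] = i + 1;
            p++;
        }
    }
    u[size-1] = n - 1;   // the last block takes whatever remains
}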


Page 25:

// initialise the matrix with 0
for (i = 0; i < width; i++)
    for (j = 0; j < width; j++)
        map[i][j] = 0;

// start from the image centre
posX = posY = width/2;

// traverse this processor's block of prime candidates
for (i = rank*n/size; i < (rank+1)*n/size; i++) {

    if (isPrime(p = 2*i+1)) {

        // move to a new position on the map and increment it
        move = p % 5;
        if (move == 1) posX = (posX-1+width) % width;
        if (move == 2) posX = (posX+1) % width;
        if (move == 3) posY = (posY-1+width) % width;
        if (move == 4) posY = (posY+1) % width;

        map[posY][posX]++;
    }
}

// sum the local maps on processor 0 (map and globalMap must be contiguous,
// e.g. static arrays, for the reduction over width*width elements)
MPI_Reduce(&map[0][0], &globalMap[0][0], width*width, MPI_LONG, MPI_SUM, 0,
           MPI_COMM_WORLD);

if (rank == 0) {
    for (i = 0; i < width; i++)
        for (j = 0; j < width; j++)
            MPE_Draw_point(graph, i, j, colors[globalMap[i][j]]);
}

Page 27:

Scheduling

Page 28:

Parallel Loops

Parallel loops represent the main source of parallelism.

Consider a system with p processors P1, P2, …, Pp and the loop:

for i=1, n do

call loop_body(i)

end for

Scheduling Problem:

Map the iterations {1,2,…,n} onto processors so that:

- the execution time is minimal.

- the execution times per processor are balanced.

- the processors' idle time is minimal.

Page 29:

Parallel Loops

Suppose that the workload of loop_body is known and given by w1, w2, …, wn.

For processor PJ the set of iterations is SJ = {i1, i2, …, ik}, so:

- the execution time of processor PJ is T(PJ) = ∑{wi : i ∈ SJ}.

- the execution time of the parallel loop is T = max{T(PJ) : J = 1, 2, …, p}.

For example, for n = 4 iterations with workloads w = (1, 2, 3, 4) on p = 2 processors, the block partition {1, 2}, {3, 4} gives T = max(3, 7) = 7, while the pairing {1, 4}, {2, 3} gives T = max(5, 5) = 5.

Static scheduling: the partition is found at compile time.

Dynamic scheduling: the partition is found at run time.

Page 30:

Data Dependency

A dependency exists between program statements when the order of statement execution affects the results of the program.

A data dependency results from multiple uses of the same storage location(s) by different tasks: the data produced by one task is input for another.

Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.

Loops whose iterations carry data dependencies cannot be scheduled in parallel.

Example: The following for loop contains a data dependency: iteration i reads a[i-1], which is written by iteration i-1.

for i=1, n do

a[i]=a[i-1]+1

end for

Page 31:

Load Balancing

Load balancing refers to the practice of distributing work among the processors so that all processors are kept busy all of the time.

If all the processor execution times are the same, then a perfect load balance is achieved.

Load imbalance is the most important overhead of parallel computation and reflects the case when the processors' execution times differ.

Page 34:

Useful Rules:

- If the workloads are similar, then use static uniform block scheduling.

- If the workloads increase/decrease, then use static cyclic scheduling.

- If the workloads are known and simple, then use them to guide the load balance.

- If the workloads are not known, then use dynamic methods.

Page 35:

Balanced Workload Block Scheduling

w1, w2, …, wn are the workloads of the iterations:

- the total workload is w1 + w2 + … + wn

- the average per processor is W = (w1 + w2 + … + wn) / size

Each processor gets consecutive iterations:

- l_rank, u_rank – the lower and upper indices of the block

- the workload of the block is w_{l_rank} + w_{l_rank+1} + … + w_{u_rank} ≈ W

Page 36:

Balanced Workload Block Scheduling

It is simpler to work with integrals: model the workload as a continuous function w(i).

The average workload per processor is

W = (1/size) · ∫_0^n w(i) di

Each processor id gets a block [x_id, x_{id+1}] holding an average share of the workload:

∫_{x_id}^{x_{id+1}} w(i) di = W

Summing the first id blocks, the cut points satisfy

∫_0^{x_id} w(i) di = id · W
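As a worked example (an assumption for illustration, not from the slides), take a linearly growing workload w(i) = i. Then ∫_0^x w(i) di = x^2/2, so W = n^2/(2·size), and solving x_id^2/2 = id·W gives x_id = n·√(id/size).

#include <math.h>

// cut point x_id for the workload w(i) = i: x_id = n * sqrt(id / size)
int cut(int id, int n, int size)
{
    return (int)(n * sqrt((double)id / size));
}

// processor rank would then execute the iterations
// cut(rank, n, size) .. cut(rank+1, n, size) - 1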

Page 42:

Granularity

Granularity is the ratio of computation to communication.

Periods of computation are typically separated from periods of communication by synchronization events.

Fine-grain Parallelism: relatively small amounts of computational work are done between communication events.

It facilitates load balancing, but implies high communication overhead and less opportunity for performance enhancement.

Coarse-grain Parallelism: relatively large amounts of computational work are done between communication/synchronization events. It is harder to load balance efficiently.