MA/CS 471
Lecture 15, Fall 2002
Introduction to Graph Partitioning
Graph (or mesh) Partitioning
We have so far implemented a finite element Poisson solver.
The implementation is serial and is not immediately suited to parallel computing.
We have begun making the algorithm more suitable by switching from the LU factorization approach for solving the linear system to a conjugate gradient (iterative) algorithm, which does not have the same bottlenecks to parallel computation.
Next Step To Parallelism
Now that we have made sure there are no intrinsically serial computation steps in the system solve, we are free to divide up the work between processes.
We will proceed by deciding which finite-element triangle goes to which processor.
Mesh Partitioning
So far, I have supplied files which include information on which triangle goes to which processor.
These files were generated using pmetis: http://www-users.cs.umn.edu/~karypis
pmetis is a serial routine; however, Karypis has written a parallel version which can be used as a library. The library is called parmetis.
Team Project Continued
Now we are ready to progress towards making the serial Poisson solver work in parallel.
This task divides into a number of steps:
Conversion of umDriver, umMESH, umStartUp, umMatrix and umSolve
Adding a routine to read in a partition file (or call parMetis to obtain a partition vector)
umDriver modification
This code should now initialize MPI
This code should call the umPartition routine
It should be modified to find the number of processes and the local process ID (stored in your struct/class)
This code should finalize MPI
umPartition
This code should read in a partition from a file
The input should be the name of the partition file, the current process ID (rank) and the number of processes (size)
The output should be a list of the elements belonging to this process
umMESH Modifications
This routine should now be fed a partition file determining which elements it should read in from the .neu input mesh file
You should replace the elmttoelmt part with a piece of code that goes through the .neu file, reads in which element/face lies on the boundary, and uses this to mark whether a node is known or unknown
Each process should send a list of its "known" vertices' global numbers to every other process, so that all nodes can be correctly identified as lying on the boundary or not
umStartUp modification
Remains largely unchanged (depending on how you read in umVertX, umVertY, elmttonode).
umMatrix modification
This routine should be modified so that instead of creating the mat matrix it is fed a vector vec and returns mat*vec
IT SHOULD NOT STORE THE GLOBAL MATRIX AT ALL!!
I strongly suggest creating a new routine (umMatrixOP) and, for debugging, comparing its output with the result of using umMatrix to build the matrix and multiply some vector
umSolve modification
The big change here is the replacement of umAinvB with a call to your own conjugate gradient solver
Note: the rhs vector is filled up here with a global gather of the elemental contributions, so this will have to be modified to account for the elements on other processes.
umCG modification
umCG is the routine which should take a rhs and return an approximate solution using CG.
Each step of the CG algorithm needs to be analyzed to determine the inter-process data dependency
For the matrix*vector steps a certain amount of data swapping is required
For the dot products an allreduce is required.
I strongly suggest creating the exchange sequence before the iterations start.
Work Partition
Here's the deal: there are approximately six unequal chunks of work to be done. I suggest the following code split-up:
umDriver, umCG
umPartition, umSolve
umMESH, umStartUp
umMatrixOP
However, you are free to choose.
Try to minimize the amount of data stored on multiple processes (but do not make the task too difficult by not sharing anything)
Discussion and Project Write-Up
This is a little tricky, so now is the time to form a plan and to ask any questions.
This will be due on Tuesday 22nd October
As usual I need a complete write-up.
This should include parallel timings and speed-up tests (i.e. for a fixed grid, find the wall clock time of umCG for Nprocs = 2, 4, 6, 8, 10, 12, 14, 16 and compare in a graph)
Test the code to make sure it is giving the same results (up to convergence tolerance) as the serial code
Profile your code using upshot
Include pictures showing the partition (use a different colour per partition) and the parallel solution.