4th ECCOMAS Computational Fluid Dynamics Conference, Athens, 7-11 September 1998. Presentation at the Mini-Symposium “Dynamic Balancing: Current Status and Recent Progress”

Dynamic Re-Allocation of Meshes for parallel Finite Element Applications

DRAMA

Project No. 24953 of the European Commission's ESPRIT Programme (Long Term Research)

Project Partners and contributing personnel:

CEMEF (Ecole des Mines/ARMINES): T. Coupez, H. Digonnet
Engineering Systems International S.A.: J. Clinckemaillie, G. Thierry
K.U. Leuven: B. Maerten, D. Roose
NEC Europe Ltd., C&C Research Laboratories: A. Basermann, J. Fingberg, G. Lonsdale
Transvalor S.A.: R. Ducloux

Contact for further information: G. Lonsdale (lonsdale @ ccrl-nece.technopark.gmd.de)
Project Web-page: http://www.cs.kuleuven.ac.be/cwis/research/natw/DRAMA.html

Project Overview

Background to the developments

The ESPRIT project DRAMA has been initiated to support the take-up of large scale parallel simulation in industry by dealing with the main problem which restricts the use of message-passing simulation codes: the inability to perform dynamic load balancing. The particular focus of the project is on the requirements of industrial Finite Element codes, but codes using Finite Volume formulations will also be able to make use of the project results. The focus on the message-passing approach corresponds to the target of addressing large scale and thus highly scalable parallel applications.

The most obvious cases where message-passing codes require dynamic load balancing are those where parallelisation via mesh partitioning is combined with adaptive meshing (as in local mesh refinement and coarsening) or adaptive re-meshing. However, as will be seen when considering the applications included within the DRAMA project, a need for dynamic load balancing also arises in applications with fixed meshes where computational and/or communications costs vary greatly as the simulation progresses.

Major advances have been made in recent years in the two areas which form the starting point for the project activities: the development of parallel mesh-partitioning algorithms suitable for dynamic re-partitioning (re-allocation of sub-meshes to processors at run-time), and the migration and optimisation of industrial-strength simulation codes to HPC platforms using the message-passing paradigm. However, most industrial-strength parallel simulations using large processor numbers are performed with static partitioning and non-adaptive meshing or, when adaptive meshing is used, with a sequentialised re-partitioning phase which greatly reduces the parallel performance. Thus, much of the exploitation within the end-user industry can currently be categorised as “exploratory installations”. The DRAMA project aims to bring together the developments in parallel partitioning and parallel FE applications to ensure that the potential of scalable computing can be achieved for fully-functional industrial simulation, which includes efficient adaptive meshing (and re-meshing) options. The parallel dynamic re-partitioning routines should also be able to handle the full complexity and range of finite elements as used in industrial structural mechanics codes, as exemplified by the applications within the project.

The DRAMA Approach

The central product of the project will be the DRAMA Library, comprising various tools for dynamic re-partitioning of unstructured finite element applications. The core library functions will perform a parallel computation of a mesh re-allocation that will re-balance the costs of the application code based on the DRAMA cost model. The DRAMA cost model is able to take account of dynamically changing computational and communications requirements. Furthermore, it is formulated in such a way that all information can be provided by the application based on its actual local data and measured costs (via code instrumentation). The library will provide support information to enable an efficient migration of the re-allocated data between processors.

Via the DRAMA Library, dynamic load balancing may be achieved which will enable scalable, efficient parallel FE applications, even with adaptive mesh refinement (coarsening) and re-meshing. As a by-product of this approach, fully parallel mesh generation will be enabled via exploitation of the parallel re-partitioning of adaptively generated meshes.

The mesh re-allocation approach to dynamic load balancing will be demonstrated and validated by the leading industrial codes PAM-CRASH (for crashworthiness simulation), PAM-STAMP (for metal stamping / deep-drawing and related simulations) and FORGE-3 (for forging with viscoplastic incompressible materials). Despite this emphasis on the validation codes within the project, the library has been designed to be general purpose. Since the final DRAMA library will be put into the public domain, it is hoped that a wide range of applications will be able to make use of the project results.

The DRAMA Applications

While the technology to be developed has an impact for a wide range of applications, one possible ‘classical’ application being the adaptive shock-capturing features in aerodynamics codes, the DRAMA project focuses on structural mechanics codes whose large deformation simulations highlight the importance of the dynamic handling. The industrial simulation codes chosen for the validation of the DRAMA approach and library are representative of the wide-ranging finite element simulation codes which have a natural requirement for a re-partitioning library as a parallelisation aid. All the DRAMA applications use time-marching as the basic solution procedure, and both explicit (PAM-CRASH/-STAMP) and implicit (FORGE3) methods are included. Causes of load imbalance, and the resulting degradation of scalability, are: (a) a dynamic behaviour of computational cost per element and of the communication patterns; (b) meshes which are changing during the calculation, through adaptive meshing or re-meshing, including reshaping, refinement and coarsening. The self-impacting contact-impact algorithms used in PAM-CRASH are extreme cases of the former. Adaptive meshing is essential for codes like FORGE3 or PAM-STAMP, where the large deformations would otherwise result in extremely severe distortions of the mesh elements.

FORGE-3 & Parallel adaptive re-meshing

FORGE3 from Transvalor is an implicit finite element code designed for the simulation of three-dimensional metal forming. It is able to simulate the large deformations of viscoplastic incompressible materials with unilateral contact conditions. The code is based on a stable mixed velocity/pressure formulation using tetrahedral unstructured meshes and employs an implicit time stepping technique. Central to the Newton iteration dealing with the non-linearity arising from the behaviour of the material and the unilateral contact condition is an iterative procedure based on a conjugate residual method for the solution of the large linear system.

The parallelisation of the full code, including adaptive re-meshing, was done within the EUROPORT project ([1]) employing a mesh partitioning approach. For forging simulations, the capability for re-meshing is a unique, competitive advantage of the FORGE3 code. A functioning 3-D parallel re-meshing procedure has been established which requires a repartitioning stage ('element migration'), not only to avoid load imbalance but also to deal with the interface re-meshing. This has to date been achieved via a centralised re-allocation process, which becomes a bottleneck, especially for large problems or when a large number of processors are used. Figure 1 shows an example of the evolving mesh as local re-meshing is followed by re-partitioning and (subsequent) interface re-meshing as the local re-meshing procedure is performed at later time-steps.

Figure 1: Adaptive re-meshing and re-partitioning for a crankshaft forging simulation with FORGE3: (A) initial mesh and partition; (B) parallel meshing without repartitioning; (C) repartitioning; (D) mesh and partition after several further increments.

The reader is referred to [2,3] and the references therein for further information.

PAM-CRASH & PAM-STAMP

The PAM-CRASH and PAM-STAMP codes are two of the ESI/PSI Group products that are built around the PAM-SOLID core solver libraries. This means that they share the same basic algorithms and computational kernels, but include different algorithms and routines for application-specific functions. The most crucial components within the crashworthiness code PAM-CRASH are the contact-impact algorithms whose main feature, from the DRAMA viewpoint, is the dynamically changing computation and communications costs. Contact-impact algorithms are also crucial to the simulations performed by PAM-STAMP, but the much more significant parallelisation requirement is the efficient handling of adaptive meshing, since around 90% of stamping applications rely on the adaptive meshing features. In contrast to the re-meshing approach adopted by FORGE3, PAM-STAMP uses a mesh-refinement (and coarsening) strategy based on the original user-defined mesh. An example of such a mesh can be seen in Figure 2.

Leaving details of the algorithms and their parallelisation to [6,7] and the references therein, a summary of the PAM-SOLID-based codes is as follows:

The non-linear explicit finite element method employed uses a Lagrangian formulation of the equations of motion of the nodes of the unstructured mesh constructed by the replacement of the physical model by an interconnected set of mechanical elements. Modelling of the materials involved in the physical model is done on an element level. This locality of discretisation enables all stress-strain calculations to be performed element-wise and the use of a simple central difference time-marching scheme for the thus diagonalised equations of motion. For most industrial models, the majority of elements employed are 4-node thin shell (reduced integration) elements. For PAM-CRASH, these are supplemented by a whole range of, in part highly specialised, mechanical elements.

The two dominant (in terms of CPU time) computational components are the element-wise (and thus, with mesh partitioning, highly local) stress-strain calculations and the contact-impact calculations. The contact algorithms serve to detect and correct penetration of structural components and have, in contrast to the stress-strain calculations, a pseudo-global nature. They first perform a proximity and penetration search, followed by a penetration correction procedure. An implementation (or practical usage) issue which affects parallelisation is that the contact calculations are performed only within user-defined (and not necessarily disjoint) areas, referred to as “slide-lines”.
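
The central difference time-marching described above can be sketched in a few lines. This is an illustration only and not PAM-SOLID code: the function names, the `internal_force` callback and the single-spring test problem are assumptions made for the example. It is meant to show why a diagonal (lumped) mass matrix turns each time step into independent per-node updates, which is what makes the stress-strain part highly local under mesh partitioning.

```python
def central_difference_step(u, v_half, mass, internal_force, dt):
    """Advance displacements by one explicit central-difference step.

    u              -- displacements at time t_n, one value per degree of freedom
    v_half         -- velocities at time t_(n-1/2)
    mass           -- lumped (diagonal) mass, one value per degree of freedom
    internal_force -- hypothetical callback returning internal forces for u
    """
    f = internal_force(u)
    # Diagonal mass matrix: every acceleration is an independent division,
    # no linear system has to be solved (no external load in this sketch).
    a = [-fi / mi for fi, mi in zip(f, mass)]
    v_half = [vi + dt * ai for vi, ai in zip(v_half, a)]
    u = [ui + dt * vi for ui, vi in zip(u, v_half)]
    return u, v_half


def spring_force(u, stiffness=1.0):
    # Toy "element": one linear spring attached to a fixed wall.
    return [stiffness * ui for ui in u]


def simulate(steps=1000, dt=0.1):
    """Run the toy oscillator and return the largest displacement seen."""
    u, v_half = [1.0], [0.0]
    peak = 0.0
    for _ in range(steps):
        u, v_half = central_difference_step(u, v_half, [1.0], spring_force, dt)
        peak = max(peak, abs(u[0]))
    return peak
```

With a time step well below the stability limit the oscillation amplitude stays close to its initial value, which is the behaviour expected from this scheme.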

The current message-passing version of PAM-CRASH, further developed from the prototypes produced within the CAMAS-EUROPORT ([1]) and EUROPORT-D ([8]) projects, employs a static partitioning approach. This can lead to greatly reduced scalability due to the dynamically changing costs within the contact-impact slide-lines, whose distribution across processes may in any case be unbalanced.

Figure 2: Adaptive meshing as occurring in PAM-STAMP simulations

The DRAMA Library

The DRAMA Library is designed to be called by parallel message-passing (MPI) finite element applications. The “expectation” of such applications is for the rapid provision of information about: a re-partitioning of the mesh which balances the costs occurring in the application; and the interaction between processes required to achieve the re-partitioning. Given the normal complexity and application dependence of such algorithms, the actual data migration would not be expected of the library. Thus, the DRAMA library and its re-partitioning algorithms must be efficient, parallel (operating on distributed data) and must also take the current partition into account, in order to avoid high communication costs during the resulting data migration. Furthermore, the re-partitioning should be based on actual occurring costs, rather than some abstract heuristic. The current library design and the re-partitioning modules included have taken these requirements into account by the careful definition of the cost model and library interface. A summary of this strategy would be: “The DRAMA Library is designed to balance in parallel the actual costs occurring on the application's finite element mesh”.

The DRAMA Cost Model & Library Interface

An introduction to the initial definition of the DRAMA Cost Model and the interface has been given in some detail in [9]. Full details of the library interface will be provided (November '98) in the publicly available project deliverable [10]. In the following, an overview of the features included in the cost model will be given, followed by the components of the library and a simple example of the mesh information transfer between the code and the DRAMA library. The DRAMA Library is written in C and message-passing exploits MPI. The library may be called by applications written in both Fortran and C.

The interface between the application code and the library is designed around the DRAMA cost model (which results in an objective cost function for the load-balancing re-partitioning algorithms) and the instrumentation of the application code to specify current and future computational and communication costs.

The DRAMA cost model provides a measure of the quality of the current distribution and allows the prediction of the effect on the computation of moving some parts of the mesh to other sub-domains. Calculation and communication speeds of the processors are taken into account by a combination of hardware specific parameters and costs which are based on time measurements and enumeration provided by application code instrumentation. Heterogeneous machine architectures can also be taken into account in this way. The essential feature is that the cost model is mesh-based, so that it is able to take account of the various workload contributions and communication dependencies that can occur in finite element applications. Being mesh-based, the DRAMA cost model includes both per element and per node computational costs. Indeed, within a finite element code, part of the computations may be performed element-wise, for example a matrix-assembly phase, while other operations are node based, such as the update of physical variables and nodal co-ordinates or the solution of systems of linear equations. Furthermore, the inter-sub-domain communication is frequently carried out using node lists. Therefore, the cost model includes element-element, node-node, and element-node data dependencies.

In addition to data dependencies between neighbouring elements and nodes in the mesh, dependencies between arbitrary parts of the mesh can occur. For the PAM-CRASH code, such data dependencies originate within the contact-impact algorithms when the penetration of mesh segments by non-connected nodes is detected and corrected. The DRAMA cost model (and of course the library interface) allows the construction of “virtual elements” which represent the occurring costs of such dependencies.
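
To make the ingredients of a mesh-based cost model concrete, the following sketch evaluates a simplified per-process cost for a given partition. It is not the DRAMA cost function, whose exact form is given in [9,10]; the function names, the flat `comm_cost` per cut dependency and the imbalance measure (maximum over mean) are assumptions chosen for illustration. Each process is charged for its elements, its nodes and for every data dependency that crosses a sub-domain boundary.

```python
def partition_costs(elem_owner, node_owner, elem_cost, node_cost, deps,
                    comm_cost=1.0):
    """Return {process: cost} for a partition (simplified, assumed model).

    deps is a list of dependencies between tagged mesh objects, for example
    (("elem", 1), ("node", 2)); element-element, node-node and element-node
    pairs are all allowed.
    """
    owner = {("elem", e): p for e, p in elem_owner.items()}
    owner.update({("node", n): p for n, p in node_owner.items()})
    cost = {p: 0.0 for p in set(owner.values())}
    for e, p in elem_owner.items():      # per-element computation
        cost[p] += elem_cost[e]
    for n, p in node_owner.items():      # per-node computation
        cost[p] += node_cost[n]
    for a, b in deps:                    # communication for cut dependencies
        if owner[a] != owner[b]:
            cost[owner[a]] += comm_cost
            cost[owner[b]] += comm_cost
    return cost


def imbalance(cost):
    """Maximum process cost divided by the mean process cost (1.0 is ideal)."""
    return max(cost.values()) / (sum(cost.values()) / len(cost))


def example_costs():
    # Element 99 plays the role of a "virtual element": it links node 0 and
    # node 3, which are not neighbours in the mesh (as in a contact search).
    elem_owner = {0: 0, 1: 0, 2: 1, 99: 1}
    elem_cost = {0: 2.0, 1: 2.0, 2: 2.0, 99: 1.0}
    node_owner = {0: 0, 1: 0, 2: 1, 3: 1}
    node_cost = {n: 0.5 for n in node_owner}
    deps = [(("elem", 1), ("elem", 2)), (("elem", 1), ("node", 2)),
            (("node", 0), ("node", 1)), (("elem", 99), ("node", 0)),
            (("elem", 99), ("node", 3))]
    return partition_costs(elem_owner, node_owner, elem_cost, node_cost, deps)
```

In an instrumented application the element and node costs would come from time measurements rather than constants, and a re-partitioner would compare the cost before and after a proposed move of elements.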

The current library design includes several types of mesh re-partitioners that may be selected by the application: mesh-migration, graph partitioning and co-ordinate partitioning. An overview of these approaches will be given in the next sub-section. The library builds upon these partitioning options with modules to provide the interface to the full DRAMA input mesh and the cost monitoring parameters and to deliver the full DRAMA output mesh and data migration information (old ↔ new mesh relationships). An overview of the library design is given in Figure 3.

Figure 3: The DRAMA Library Design

The reader is referred to [10] for details of the interface format. The following example will focus on the input and output DRAMA meshes and mesh relationships and will omit the definition of weights per element and node type and also the definition of timing and enumerated operation/communication counts.

The numbering format used within the DRAMA library interface is a dual numbering which is globally unique: it combines local node and element numbering with a unique processor number to which the node/element is “assigned”. Typical finite element applications with replicated nodes on sub-domain boundaries or with overlap/halo regions will be able to conform to this numbering, though mapping to and from this numbering will have to be performed by the application. The simple original mesh with its partitioning on two processes (using the above dual numbering) is shown in Figure 4a together with the two parts of the input mesh provided to the DRAMA Library (in parallel by the two calling application processes). The horizontal line within the table is a demarcation between the two parallel inputs. The resulting partition (in the updated numbering system) is shown in Figure 4b. Figure 4c shows the output from the DRAMA Library: tables giving each process its new partition and the data migration relationships.
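
The mapping that an application has to perform to reach such a dual numbering might look like the sketch below. The actual interface format is defined in [10]; everything here is an assumption made for illustration, in particular the `DualId` type, the 0-based local numbers and the rule that a node replicated on several processes is assigned to the lowest-numbered one.

```python
from typing import NamedTuple


class DualId(NamedTuple):
    proc: int   # process to which the node is "assigned"
    local: int  # local number on that process


def assign_dual_ids(nodes_by_proc):
    """Map application-global node ids to unique (proc, local) pairs.

    nodes_by_proc -- {process: [global node ids held, including replicas]}
    A replicated node receives exactly one DualId (assumed rule: the
    lowest-numbered process holding it becomes the owner).
    """
    dual = {}
    next_local = {p: 0 for p in nodes_by_proc}
    for p in sorted(nodes_by_proc):
        for g in nodes_by_proc[p]:
            if g not in dual:
                dual[g] = DualId(p, next_local[p])
                next_local[p] += 1
    return dual


def example_ids():
    # Node 12 lies on the sub-domain boundary and is held by both processes.
    return assign_dual_ids({0: [10, 11, 12], 1: [12, 13]})
```

Because the pair is unique across all processes, it can be exchanged between processes without a global renumbering step; the inverse dictionary gives the mapping back to the application's own numbering.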

Figure 4a: Existing partitioned example mesh and the DRAMA input mesh

Figure 4b: The resulting re-partitioned mesh

[Figure 4a table label: DRAMA input mesh (on the two processes)]

Figure 4c: DRAMA Output information

[Figure 4c table label: New mesh, i.e. new partition]

Re-Partitioning Modules

As mentioned above, several types of mesh re-partitioners are available for selection by the application: mesh-migration, graph partitioning and co-ordinate partitioning.

The mesh-migration approach uses the DRAMA cost model directly as a cost function when applying an iterative procedure in which processor pairs perform load balancing (or more precisely cost function balancing) by the logical exchange of elements between their sub-domains. While theoretical convergence proofs show that in worst cases the number of iterative steps grows with the square of the number of processes, practical experience with finite element meshes shows that load balance is achieved after only small numbers of iterations. For further information see [11,12].
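
The pairwise exchange idea can be illustrated with a much reduced sketch. It is not the DRAMA mesh-migration module: it balances a plain sum of element costs instead of the full cost function, ignores mesh adjacency (a real implementation would exchange elements across the shared interface) and runs sequentially. The function name and arguments are hypothetical.

```python
def pairwise_balance(parts, cost, pairs, max_sweeps=10):
    """Balance element costs by exchanges between neighbouring process pairs.

    parts -- {process: [element ids]}, modified in place and returned
    cost  -- {element id: cost}
    pairs -- list of (process, process) pairs that share an interface
    """
    def load(p):
        return sum(cost[e] for e in parts[p])

    for _ in range(max_sweeps):
        moved = False
        for a, b in pairs:
            heavy, light = (a, b) if load(a) >= load(b) else (b, a)
            while parts[heavy]:
                e = parts[heavy][-1]
                # Moving e reduces the pair's imbalance only if the load
                # difference exceeds the cost of the element itself.
                if load(heavy) - load(light) > cost[e]:
                    parts[light].append(parts[heavy].pop())
                    moved = True
                else:
                    break
        if not moved:   # no pair could improve: converged
            break
    return parts
```

For three processes in a chain holding 6, 1 and 1 unit-cost elements, a few sweeps spread the load to 3, 3 and 2, which matches the observation that only a small number of iterations is typically needed.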

‘Classical’ graph partitioning methods employing weighted graphs derived from either element or nodal mesh connections would be unable to fully account for the costs arising in a finite element application in general. The mesh-to-graph module of the DRAMA library constructs an appropriate weighted graph from the distributed mesh. Depending on the properties and the needs of the application, the resulting graph can be an 'element graph', a 'node graph', or a combined 'element-node graph'. The latter contains all possible relevant cost contributions for finite element codes. For a given partition, edges between nodes, elements or elements and nodes represent different communication requirements between processors. For instance, edges between elements and nodes lead to communication when a sub-domain possesses an element but not all its nodes. The combination of the mesh-to-graph module with a suitable graph partitioner results in a mesh partitioner based on the DRAMA cost model.

Within the current version of the DRAMA library, the subsequent graph partitioning is carried out by calling routines from PARMETIS, the software package developed by Karypis et al., University of Minnesota ([13,14]). PARMETIS contains several strategies for graph re-partitioning, in particular a multilevel method based on 'diffusing' load to adjacent partitions. The idea behind this multilevel technique is that from the original graph a hierarchy of coarser graphs is generated (by merging graph vertices to 'supervertices'). A careful re-partitioning of the coarsest graph is computed, and then this new partitioning is successively 'projected' onto the next finer graph and improved. The latter is achieved by a load diffusion scheme.
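
A combined element-node graph of the kind described above can be sketched as follows. The construction is an assumption made for illustration (unweighted edges, serial, elements given as tuples of node ids) and says nothing about how the DRAMA mesh-to-graph module is actually implemented; the function names are hypothetical. Vertices are tagged as elements or nodes, and the three edge types correspond to the element-node, node-node and element-element dependencies.

```python
from itertools import combinations


def element_node_graph(elements):
    """Build the edge set of a combined element-node graph.

    elements -- {element id: tuple of node ids}
    Returns a set of edges between tagged vertices ("elem", i) / ("node", j).
    """
    edges = set()
    node_to_elems = {}
    for e, nodes in elements.items():
        for n in nodes:
            edges.add((("elem", e), ("node", n)))       # element-node
            node_to_elems.setdefault(n, []).append(e)
        for n1, n2 in combinations(sorted(nodes), 2):
            edges.add((("node", n1), ("node", n2)))     # node-node
    for elems in node_to_elems.values():
        for e1, e2 in combinations(sorted(elems), 2):
            edges.add((("elem", e1), ("elem", e2)))     # element-element
    return edges


def cut_edges(edges, owner):
    """Edges whose end points lie on different processes for a partition."""
    return [(a, b) for a, b in edges if owner[a] != owner[b]]


def example_graph():
    # Two triangles sharing the edge between nodes 1 and 2.
    edges = element_node_graph({0: (0, 1, 2), 1: (1, 2, 3)})
    owner = {("elem", 0): 0, ("elem", 1): 1,
             ("node", 0): 0, ("node", 1): 0, ("node", 2): 0, ("node", 3): 1}
    return edges, cut_edges(edges, owner)
```

In the example, element 1 is owned by process 1 while two of its nodes are owned by process 0, so the corresponding element-node edges appear in the cut, mirroring the case of a sub-domain that possesses an element but not all its nodes.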

The co-ordinate partitioning option refers to the standard recursive co-ordinate bisection approach. In addition to providing a possible default partitioning scheme, it is also included to provide a possibility for a later investigation of a dual-partitioning approach (see [15]) for the contact-impact phase of the calculations within PAM-CRASH.
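
For readers unfamiliar with recursive co-ordinate bisection, a minimal serial sketch is given below. It is a textbook version under simplifying assumptions (unweighted points, cut by point count along the axis of largest extent) and is not taken from the DRAMA library; the function name `rcb` is hypothetical.

```python
def rcb(coords, nparts):
    """Recursive co-ordinate bisection.

    coords -- {point id: tuple of co-ordinates}
    nparts -- number of parts (need not be a power of two)
    Returns {point id: part number}.
    """
    out = {}

    def split(ids, first_part, k):
        if not ids:
            return
        if k == 1:
            for i in ids:
                out[i] = first_part
            return
        dim = len(coords[ids[0]])
        # Cut perpendicular to the axis along which the points spread most.
        axis = max(range(dim),
                   key=lambda d: max(coords[i][d] for i in ids)
                   - min(coords[i][d] for i in ids))
        ordered = sorted(ids, key=lambda i: coords[i][axis])
        k_left = k // 2
        cut = len(ordered) * k_left // k   # proportional share of the points
        split(ordered[:cut], first_part, k_left)
        split(ordered[cut:], first_part + k_left, k - k_left)

    split(list(coords), 0, nparts)
    return out
```

Because the method uses only co-ordinates and no connectivity, it can group geometrically close but topologically unconnected parts of a mesh, which is one reason it is of interest for contact searches.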

Future Outlook

Following the library design and initial implementation phase, investigations are underway to verify that the re-partitioning defined by the library, based on the parameters and costs provided by code instrumentation via the library interface, leads to a maintenance of load balance within the continuing simulations. Results of very preliminary tests with the self-impacting box beam model for the PAM-CRASH code (which includes the use of virtual elements defined by the contact-impact algorithms) are shown in Figure 5.

Figure 5: Preliminary re-partitioning results with PAM-CRASH interfaced to DRAMA

In the near future, validation and performance benchmarking will be carried out with both PAM-CRASH and FORGE3 simulations using (limited size) industrial examples. In addition, the possibilities for parallel mesh generation based on the combination of parallel remeshing and re-partitioning through mesh migration will be demonstrated.

In the latter stages of the project, the expectation is to achieve high scalability with large scale industrial modelling with all three DRAMA application codes, including adaptive meshing and remeshing.

In addition to continuing modification of the re-partitioning modules currently included in the library, a co-operation with the University of Greenwich will lead to the inclusion of a DRAMA interface to a modified version of the Jostle mesh partitioning software ([16]).

[Figure 5 panels: initial partition; after 1st re-partitioning; after 2nd re-partitioning]

References

[1] K. Stüben, H. Mierrendorff, C.-A. Thole and O. Thomas, Parallel industrial Fluid Dynamics and Structural Mechanics codes, 90-98, in [3], 1996

[2] J. A. Elliott, S. H. Brindle, A. Colbrook, D. G. Green and F. Wray, Real industrial HPC applications, 29-35, in [3], 1996

[3] H. Liddell, A. Colbrook, B. Hertzberger and P. Sloot (Eds.), Proceedings of the HPCN '96 Conference, Lecture Notes in Computer Science 1067, Springer-Verlag, 1996

[4] T. Coupez and S. Marie, From a direct solver to a parallel iterative solver in 3D forming simulation, Int. J. Supercomputer Applications and High Performance Computing, 11(4), 205-211, 1997

[5] T. Coupez, S. Marie and R. Ducloux, Parallel 3D simulation of forming processes including parallel remeshing and reloading, Numerical Methods in Engineering '96 (Proceedings of 2nd ECCOMAS Conference, J.-A. Désidéri et al. Editors), 738-743, Wiley, 1996

[6] J. Clinckemaillie, B. Elsner, G. Lonsdale, S. Meliciani, S. Vlachoutsis, F. de Bruyne and M. Holzner, Performance issues of the parallel PAM-CRASH code, Int. J. Supercomputer Applications and High Performance Computing, 11(1), 3-11, 1997

[7] G. Lonsdale, A. Petitet, F. Zimmermann, J. Clinckemaillie, S. Meliciani and S. Vlachoutsis, Programming crashworthiness simulation for parallel platforms, Mathematical and Computer Modelling, to appear.

[8] EUROPORT-D ESPRIT HPCN Project No. 21102, World-Wide Web Document: http://www.gmd.de/SCAI/europort-d/

[9] B. Maerten, A. Basermann, J. Fingberg, G. Lonsdale, D. Roose, Parallel dynamic mesh re-partitioning in FEM codes, Advances in Computational Mechanics with High Performance Computing (Proceedings of the 2nd Euro-Conference on parallel and distributed computing for computational mechanics, B.H.V. Topping Ed.), Saxe-Coburg, 163-167, 1998

[10] The DRAMA Consortium, Library Interface Definition, DRAMA Project Deliverable D1.2a, 1998

[11] T. Coupez, Parallel adaptive remeshing in 3D moving mesh finite element, Numerical Grid Generation in Computational Field Simulation, Vol. 1 (B.K. Soni et al. Editors), 783-792, Mississippi University, 1996

[12] C. Ozturan, H. L. de Cougny, M. S. Shephard and J. E. Flaherty, Parallel adaptive mesh refinement and redistribution on distributed memory computers, Comp. Meth. Mech. Engnrg., 119, 123-137, 1994

[13] G. Karypis, K. Schloegel and V. Kumar, PARMETIS Parallel graph partitioning and sparse matrix ordering library, Version 1.0, Dept. of Computer Science, University of Minnesota, 1997

[14] K. Schloegel, G. Karypis and V. Kumar, Multilevel diffusion schemes for repartitioning of adaptive meshes, J. Parallel and Distributed Computing, 47, 109-124, 1997

[15] S. A. Attaway, E. J. Barragy, K. H. Brown, D. R. Gardner, B. A. Hendrickson and S. J. Plimpton, Transient solid dynamics simulations on the Sandia/Intel Teraflop computer, Supercomputing '97 (Proceedings on CD-ROM), Technical Paper, 1997

[16] C. Walshaw, M. Cross and M. Everett, Dynamic load-balancing for parallel adaptive unstructured meshes, Parallel processing for scientific computing (M. Heath et al. Eds.), SIAM, 1997