October 25, 2007

P_HUGG and P_OPT: An Overview of Parallel Hierarchical Cartesian Mesh Generation and Optimization-based Smoothing

Presented at NASA Langley, Hampton, VA, by

Steve L. Karman, Jr.

Vincent C. Betro

Outline

• Parallel Hierarchical Unstructured Cartesian Mesh Generation

- Terminology and Strategy

- Partitioning

- Results

• Parallel Optimization-based Smoothing

- Terminology and Strategy

- Partitioning

- Results

• Conclusions

• Future Work

P_HUGG: Terminology and Strategy

• Develop an algorithm for generating a high-quality mesh

– Create hybrid or general cut polyhedral meshes with body-conforming cut elements using closed loops/shells

– Allow user to define refinement spacing, which may be larger or smaller than the existing geometry spacing

– Modify spacing based on curvature and intersection tests

– Speed process by using MPI and grid partitioning

• Implement various C++ class structures for compact communication during meshing

• Validate mesh quality by testing on several geometries and optimizing the final mesh with parallel optimization-based smoothing


P_HUGG uses isotropic refinement to build the Octree structure.

The resulting uniformity makes data structures more consistent and communication more efficient.


• The building block of a hierarchical Cartesian mesh is the voxel, which is short for “volumetric pixel”.

• Voxels are indexed using a processor-index pair, to aid in parallel communication

• Each voxel contains information pointing to its relational location in the mesh, but no physical coordinates

– cell-to-node hash table

– parent index

– neighbor indices

– child indices

– boundary facet list/boundary element shell list
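The voxel record described above can be pictured with a minimal C++ sketch. All type and member names here (`ProcIndex`, `Voxel`, `isLeaf`) are hypothetical, the neighbor ordering is an assumption, and a plain vector stands in for the cell-to-node hash table; the actual P_HUGG classes are not shown in the slides.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical processor-index pair: the owning rank plus the entity's
// index in that rank's local storage.
struct ProcIndex {
    int proc = -1;
    std::int64_t index = -1;
    bool valid() const { return proc >= 0 && index >= 0; }
};

// Relational data only: the voxel stores no physical coordinates.
struct Voxel {
    ProcIndex self;                      // identity of this voxel
    ProcIndex parent;                    // stays invalid for the super cell
    std::array<ProcIndex, 6> neighbors;  // -x, +x, -y, +y, -z, +z (assumed order)
    std::array<ProcIndex, 8> children;   // all invalid while the voxel is a leaf
    std::vector<ProcIndex> nodes;        // stand-in for the cell-to-node hash table
    std::vector<int> boundaryFacets;     // geometry facets touching this voxel

    // A voxel is a leaf when none of its child slots has been filled.
    bool isLeaf() const {
        for (const ProcIndex& c : children)
            if (c.valid()) return false;
        return true;
    }
};
```

Because every reference is a processor-index pair rather than a pointer, the same record can describe a voxel owned locally or one owned by another processor.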


Physical nodes…

• are also indexed using a processor-index pair

• are assigned ownership by the lowest processor that owns a voxel which contains the node

• contain the physical coordinates of each node created as part of refining a voxel

• can be ignored until the general cutting process when tolerances dictate the snapping of nodes
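The ownership rule for nodes can be written down in a few lines. This is only a sketch: the function name `nodeOwner` is hypothetical, and it assumes the ranks of the voxels sharing the node have already been gathered into a list.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return the rank that owns a node, given the ranks that own the voxels
// containing it: the lowest rank wins.
int nodeOwner(const std::vector<int>& voxelOwnerRanks) {
    assert(!voxelOwnerRanks.empty());  // a node always belongs to some voxel
    return *std::min_element(voxelOwnerRanks.begin(), voxelOwnerRanks.end());
}
```

Since every processor applies the same deterministic rule, all of them arrive at the same owner without negotiating.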


Super Cell Creation

In order to begin recursive refinement, a Cartesian super cell is created around the existing geometry, unless the outer boundary is initially a cube, in which case the super cell and the outer boundary are coincident and there will be no “external” voxels to be turned off during cutting.
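One way to build such a super cell is to take the bounding box of the geometry points and grow it into a cube. The sketch below assumes the cube is centered on the bounding box and fits it exactly; the slides do not say how P_HUGG sizes or places its super cell, and the names `Cube` and `superCell` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Point = std::array<double, 3>;

struct Cube {
    Point lo;     // minimum corner
    double side;  // edge length
};

// Smallest axis-aligned cube, centered on the bounding box of the points,
// that contains every point (centering is an assumption of this sketch).
Cube superCell(const std::vector<Point>& pts) {
    assert(!pts.empty());
    Point lo = pts[0], hi = pts[0];
    for (const Point& p : pts)
        for (int d = 0; d < 3; ++d) {
            lo[d] = std::min(lo[d], p[d]);
            hi[d] = std::max(hi[d], p[d]);
        }
    double side = 0.0;
    for (int d = 0; d < 3; ++d) side = std::max(side, hi[d] - lo[d]);
    Cube c;
    c.side = side;
    for (int d = 0; d < 3; ++d)
        c.lo[d] = 0.5 * (lo[d] + hi[d]) - 0.5 * side;
    return c;
}
```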


Spawning to Multiple Processors

Once the super cell has been refined into as many (or more) voxels as there exist processors, each processor receives one (or more) voxel(s). Ownership of the voxel is reassigned to the processor to which it is spawned, and the nodes’ processor-index pairs are then updated.
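If the pre-spawn refinement is uniform (an assumption; the slides only require at least as many voxels as processors), each isotropic level multiplies the voxel count by eight, so the number of levels needed before spawning follows directly. The function name `levelsBeforeSpawn` is hypothetical.

```cpp
#include <cassert>

// Smallest number of uniform isotropic refinement levels k such that the
// super cell has been split into 8^k >= nprocs voxels.
int levelsBeforeSpawn(long nprocs) {
    assert(nprocs >= 1);
    int levels = 0;
    long voxels = 1;  // the super cell itself
    while (voxels < nprocs) {
        voxels *= 8;  // each voxel splits into eight children
        ++levels;
    }
    return levels;
}
```

For 8 processors one level suffices and each processor receives exactly one voxel; for 9 processors two levels give 64 voxels, so some processors must receive more than one.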


Refinement occurs on each processor simultaneously and…

• new nodes are created at the mid-edges, mid-faces, and centroids of existing voxels

• nodes are guaranteed to be unique, since all nodes created on a partition boundary are communicated and a common index is established

• neighbors are re-calculated based on the tree structure

• lineage of voxels is passed along in the processor-index pair

• voxels are tagged for refinement based on spacing parameters and mesh quality constraints (including cell size gradation parameter)
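The node creation step can be illustrated by enumerating the 3x3x3 lattice of points on a voxel: 8 are the existing corners, and the remaining 19 (12 mid-edge, 6 mid-face, 1 centroid) are the candidates for new nodes. This is a serial sketch with hypothetical names; it ignores the parallel step in which nodes on a partition boundary are communicated and given a common index.

```cpp
#include <array>
#include <vector>

using Point = std::array<double, 3>;

// The enumerator values equal the number of coordinates at a midpoint.
enum class NodeKind { Corner = 0, MidEdge = 1, MidFace = 2, Centroid = 3 };

struct LatticeNode {
    Point xyz;
    NodeKind kind;
};

// Enumerate the 27 lattice points of a voxel with minimum corner `lo` and
// edge length `side`, classifying each by how many coordinates are midpoints.
std::vector<LatticeNode> refinementNodes(const Point& lo, double side) {
    std::vector<LatticeNode> out;
    out.reserve(27);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) {
                const int idx[3] = {i, j, k};
                int mids = 0;
                LatticeNode n;
                for (int d = 0; d < 3; ++d) {
                    n.xyz[d] = lo[d] + 0.5 * side * idx[d];
                    if (idx[d] == 1) ++mids;
                }
                n.kind = static_cast<NodeKind>(mids);
                out.push_back(n);
            }
    return out;
}
```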


Mesh quality is enforced by determining unacceptable voxel configurations:

One face connecting more than three different levels of refinement

Opposite neighbors both at a higher level of refinement than the current voxel
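The second configuration lends itself to a simple test. The sketch below checks only that case (a voxel whose two opposite face neighbors are both finer than it is); the first criterion is not modeled. The neighbor ordering and the function name are assumptions.

```cpp
#include <array>

// neighborLevel[i] is the refinement level of the face neighbor in direction
// i, ordered -x, +x, -y, +y, -z, +z (assumed); use -1 where no neighbor exists.
// Returns true when, along some axis, both opposite neighbors are at a higher
// level of refinement than the voxel itself.
bool sandwichedByFinerNeighbors(int level, const std::array<int, 6>& neighborLevel) {
    for (int axis = 0; axis < 3; ++axis) {
        const int minusSide = neighborLevel[2 * axis];
        const int plusSide = neighborLevel[2 * axis + 1];
        if (minusSide > level && plusSide > level) return true;
    }
    return false;
}
```

A voxel for which this returns true would be a candidate for tagging so that it is refined to match its neighbors.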


Ghost voxels…

• are integral in assuring that refinement is consistent on borders between processors

• are denoted by having a different processor set as owner in the processor-index pair than the processor on which they reside

• allow new nodes created during refinement and cutting to be indexed correctly and not be duplicated

• exist in the normal neighboring positions to a voxel as well as at the corners

• contain no information about non-bordering children or the results of the cutting process
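The ghost test in the second bullet amounts to comparing the owner in the processor-index pair with the resident processor. A minimal sketch, with hypothetical names and no MPI calls:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical processor-index pair for a voxel.
struct VoxelId {
    int ownerProc;
    long index;
};

// A voxel is a ghost when its owner differs from the processor it resides on.
bool isGhost(const VoxelId& v, int residentProc) {
    return v.ownerProc != residentProc;
}

// Count the ghost voxels held in one processor's local list.
std::size_t countGhosts(const std::vector<VoxelId>& local, int residentProc) {
    std::size_t n = 0;
    for (const VoxelId& v : local)
        if (isGhost(v, residentProc)) ++n;
    return n;
}
```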


• The voxels shaded in orange are in the upper left corner of a given processor.

• The voxels shaded in green are the finest level ghost voxels used in the neighbor tables on that processor.

• The voxels shaded in blue are the ghost parents of ghost voxels, but only show the children directly bordering the processor in question.


Once a mesh has been generated around a geometry, the elements (voxels and nodes) that are outside the computational domain must be “turned off”. Then, body-conforming shells are generated with the remainders of voxels that have been “cut” by the geometry.


Mark in and out status of nodes during shell creation; use flood fill to mark remaining nodes.

Uncut inside voxels are stored as hexahedra or polyhedra.

Within a voxel, polygons with common boundaries are merged.

Create shells using the cut polygons and exposed voxel faces.

Eliminate collinear points to minimize the number of edges on the final polygonal elements.
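The flood fill used to mark the remaining nodes can be sketched as a breadth-first search that spreads the in/out status from the nodes already marked during shell creation. The sketch assumes the adjacency lists contain only edges that are not cut by the geometry, so a status can never leak across the surface; the names are hypothetical and the code is serial.

```cpp
#include <queue>
#include <vector>

enum class Status { Unknown, Inside, Outside };

// adjacency[n] lists the nodes joined to n by an edge that the geometry does
// NOT cut (assumption). Nodes with a known status act as seeds.
void floodFill(const std::vector<std::vector<int>>& adjacency,
               std::vector<Status>& status) {
    std::queue<int> frontier;
    for (int n = 0; n < static_cast<int>(status.size()); ++n)
        if (status[n] != Status::Unknown) frontier.push(n);

    while (!frontier.empty()) {
        const int n = frontier.front();
        frontier.pop();
        for (int m : adjacency[n])
            if (status[m] == Status::Unknown) {
                status[m] = status[n];  // inherit the neighbor's status
                frontier.push(m);
            }
    }
}
```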

Triangular facets are passed down the Octree to the finest level, clipped by the bounds of each


Tolerance

The tolerance used in P_HUGG is a user-specified factor multiplied by the edge length of a voxel on the finest level. This tolerance is used for snapping cutting intersections to already created points without significant loss of accuracy in reconstructing the original geometry. The proper size for this factor has proven difficult to determine from case to case.
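As a sketch of how such a tolerance might be applied: with isotropic refinement the finest edge length is the super cell edge divided by 2 raised to the finest level, and a new intersection point is snapped to an existing point when it lies within the tolerance. The function names and the linear search are assumptions made for brevity, not the P_HUGG implementation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::array<double, 3>;

// Tolerance = user factor * edge length of a voxel on the finest level.
double snapTolerance(double factor, double superCellSide, int finestLevel) {
    return factor * std::ldexp(superCellSide, -finestLevel);  // side / 2^level
}

// Return the index of an existing point within `tol` of p; otherwise append
// p and return its new index.
int snapOrAdd(std::vector<Point>& pts, const Point& p, double tol) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double dist2 = 0.0;
        for (int d = 0; d < 3; ++d) {
            const double diff = pts[i][d] - p[d];
            dist2 += diff * diff;
        }
        if (dist2 <= tol * tol) return static_cast<int>(i);
    }
    pts.push_back(p);
    return static_cast<int>(pts.size()) - 1;
}
```

A factor that is too small leaves near-duplicate points behind, while one that is too large moves intersections far enough to distort the geometry, which is consistent with the difficulty noted above.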

P_HUGG: Partitioning

• In P_HUGG2D, round robin partitioning was used. This can cause a sizable increase in surface area and load balancing issues.

• In P_HUGG, the partitioning is based on a weight factor computed as the ratio of facet areas to user spacing parameters within each pre-spawn voxel.

• This weight factor rectifies the load-balancing issues; Metis will be implemented to assist in reducing surface area.
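The effect of weighting can be seen by comparing round robin with a weight-aware assignment. In the sketch below the per-voxel weights are taken as given (in P_HUGG they would come from the facet-area-to-spacing ratio), and the greedy heaviest-first rule is an assumed stand-in, since the slides do not state how the weights are turned into an assignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Round robin: voxel i goes to processor i mod nprocs, ignoring weights.
std::vector<int> roundRobin(std::size_t nVoxels, int nprocs) {
    std::vector<int> owner(nVoxels);
    for (std::size_t i = 0; i < nVoxels; ++i)
        owner[i] = static_cast<int>(i % nprocs);
    return owner;
}

// Greedy heuristic (assumption): visit voxels from heaviest to lightest and
// give each one to the processor with the smallest load so far.
std::vector<int> weightedGreedy(const std::vector<double>& weight, int nprocs) {
    std::vector<std::size_t> order(weight.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return weight[a] > weight[b]; });
    std::vector<double> load(nprocs, 0.0);
    std::vector<int> owner(weight.size());
    for (std::size_t i : order) {
        const int p = static_cast<int>(
            std::min_element(load.begin(), load.end()) - load.begin());
        owner[i] = p;
        load[p] += weight[i];
    }
    return owner;
}

// Largest total weight carried by any single processor.
double maxLoad(const std::vector<double>& weight, const std::vector<int>& owner,
               int nprocs) {
    std::vector<double> load(nprocs, 0.0);
    for (std::size_t i = 0; i < weight.size(); ++i) load[owner[i]] += weight[i];
    return *std::max_element(load.begin(), load.end());
}
```

Neither scheme considers where the voxels sit in space, so both can produce large interfaces between processors; that is the surface area problem a graph partitioner such as Metis is meant to address.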

8 procs (P_HUGG)

The color coding corresponds to the domain owned by each of the processors. A disjoint domain is a distinct possibility when the number of processors is not a power of eight or the facet area to spacing parameter ratio is applied.

8 procs (P_HUGG2D)


The geometry to processor distribution on the surface of the cube without the adjacent mesh demonstrates the equal distribution obtained by the use of the facet area to defined spacing parameters ratio in correcting load balancing issues.


• The 64-processor distribution of the mesh around the sphere shows that while load balancing has been greatly improved, the surface area issues need to be corrected with Metis.

• The same can be said of the 8-processor distribution of the mesh around the hull.


Example of a non-trivial partition of the M6 wing, y and z planes

P_HUGG: Partitioning

Example of a non-trivial partition of the M6 wing, with and without mesh

P_OPT: Terminology and Strategy

In order to remove high aspect ratio elements (sliver cells) and get improved results from the flow solver, optimization-based smoothing is performed on the mesh.

• Each node is perturbed based on a cost function calculated using Jacobians and condition numbers of the surrounding elements.

• If the perturbation improves the cost function for the node, the node is moved permanently to the new position.

• Nodes continue to be moved until no perturbation improves the cost function.
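The perturb-and-accept loop can be sketched in two dimensions. Everything here is an assumption made to keep the example small: triangles instead of polyhedra, a cost of (l1² + l2² + l3²) / (4√3 A) per triangle (which equals 1 for an equilateral triangle), the worst incident triangle as the node cost, and trial moves of a fixed step `h` along the axes. The cost function actually used in P_OPT is not given in the slides.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <limits>
#include <vector>

using Vec2 = std::array<double, 2>;
using Tri = std::array<int, 3>;  // node indices, counter-clockwise

// Assumed cost: 1 for an equilateral triangle, growing as it degenerates,
// and infinite for an inverted or zero-area triangle.
double triCost(const Vec2& a, const Vec2& b, const Vec2& c) {
    const double area =
        0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]));
    if (area <= 0.0) return std::numeric_limits<double>::infinity();
    auto len2 = [](const Vec2& p, const Vec2& q) {
        return (p[0] - q[0]) * (p[0] - q[0]) + (p[1] - q[1]) * (p[1] - q[1]);
    };
    return (len2(a, b) + len2(b, c) + len2(c, a)) / (4.0 * std::sqrt(3.0) * area);
}

// Node cost: the worst triangle touching node n.
double nodeCost(int n, const std::vector<Vec2>& x, const std::vector<Tri>& tris) {
    double worst = 0.0;
    for (const Tri& t : tris)
        if (t[0] == n || t[1] == n || t[2] == n)
            worst = std::max(worst, triCost(x[t[0]], x[t[1]], x[t[2]]));
    return worst;
}

// Try four axis-aligned moves of size h; keep the best one only if it lowers
// the node cost. Returns true when the node was moved.
bool perturbNode(int n, std::vector<Vec2>& x, const std::vector<Tri>& tris, double h) {
    const Vec2 dirs[4] = {{h, 0.0}, {-h, 0.0}, {0.0, h}, {0.0, -h}};
    const Vec2 start = x[n];
    Vec2 bestPos = start;
    double best = nodeCost(n, x, tris);
    for (const Vec2& d : dirs) {
        x[n] = {start[0] + d[0], start[1] + d[1]};
        const double c = nodeCost(n, x, tris);
        if (c < best) { best = c; bestPos = x[n]; }
    }
    x[n] = bestPos;
    return bestPos != start;
}

// Sweep over the free nodes until no node moves or maxSweeps is reached.
int smooth(std::vector<Vec2>& x, const std::vector<Tri>& tris,
           const std::vector<int>& freeNodes, double h, int maxSweeps) {
    int sweeps = 0;
    bool moved = true;
    while (moved && sweeps < maxSweeps) {
        moved = false;
        for (int n : freeNodes) moved = perturbNode(n, x, tris, h) || moved;
        ++sweeps;
    }
    return sweeps;
}
```

Returning an infinite cost for inverted triangles means a trial move that would tangle the mesh is never accepted. The sweep limit is a safeguard, because improving one node can worsen the cost seen by its neighbors.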

P_OPT: Terminology and Strategy

• On the M6 wing, optimization was used to spread out cells that bunch up around a change in voxel level.

• Before optimization, the M6 wing mesh would not be suitable for a flow solver because of sliver cells.

*case run in serial

P_OPT: Partitioning

• Metis is used to either decompose a mesh on one machine and feed it back into the optimizer as parallel mesh files or decompose the mesh while the code is running and feed it to other procs through communication

• Nodes are currently partitioned without weighting, using compressed row storage; eventually they will be weighted according to whether or not they are part of the geometry facets

• A standard CGNS file format is used in the parallel mesh files with the addition of a partition to global node map at the end of the file which includes the owner of each node on the process, the local node number of the node on the owning process, and the global node number
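Compressed row storage of the node graph, together with one entry of the partition-to-global node map, might look like the sketch below. The array names `xadj` and `adjncy` follow the convention used in the Metis manual; the struct names, the field names, and the choice of an edge list as input are assumptions, and no Metis or CGNS calls are made.

```cpp
#include <utility>
#include <vector>

// Node graph in compressed row storage: the neighbors of node i are
// adjncy[xadj[i]] .. adjncy[xadj[i + 1] - 1].
struct CsrGraph {
    std::vector<int> xadj;    // size nNodes + 1
    std::vector<int> adjncy;  // size 2 * number of edges
};

CsrGraph buildCsr(int nNodes, const std::vector<std::pair<int, int>>& edges) {
    CsrGraph g;
    g.xadj.assign(nNodes + 1, 0);
    // Count the degree of every node, shifted by one slot.
    for (const auto& e : edges) {
        ++g.xadj[e.first + 1];
        ++g.xadj[e.second + 1];
    }
    // A prefix sum turns the degrees into row offsets.
    for (int i = 0; i < nNodes; ++i) g.xadj[i + 1] += g.xadj[i];
    g.adjncy.resize(g.xadj[nNodes]);
    // Fill each row, storing every undirected edge in both directions.
    std::vector<int> next(g.xadj.begin(), g.xadj.end() - 1);
    for (const auto& e : edges) {
        g.adjncy[next[e.first]++] = e.second;
        g.adjncy[next[e.second]++] = e.first;
    }
    return g;
}

// One entry of a partition-to-global node map (hypothetical field names).
struct NodeMapEntry {
    int ownerProc;      // processor that owns the node
    int localOnOwner;   // local node number on the owning processor
    long globalNumber;  // global node number
};
```

Adding a per-node weight array alongside `xadj` and `adjncy` would be the natural place for the planned geometry-facet weighting.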

P_OPT: Results

Two Processor Optimized Cube

P_OPT: Results

Four Processor Optimized Cube

P_OPT: Results

Sixteen Processor Optimized Cube

Conclusions

P_HUGG

• The algorithm now exists to generate large, high-quality meshes on complex three-dimensional geometries in parallel

• The meshes generated can be either hybrid or composed of general polyhedra

• The use of general cutting allows for very precise, body-conforming meshes

• User-defined spacing allows flexibility in mesh generation without loss of mesh quality

• The use of the Cartesian, hierarchical Octree structure allows for ease of initial mesh generation and future adaptation

P_OPT

• The algorithm now exists to optimize large meshes in parallel using node perturbation and cost function analysis

• Users may either supply a serial CGNS file or multiple parallel CGNS files with the addition of a partition-to-global map

• Metis is used to partition the mesh such that both surface area and load balancing are optimal

Future Work

P_HUGG

• Implement Metis to assist in load balancing and decreasing surface area

• Evaluate parallel performance and test robustness

• Tackle multiply connected polyhedra issues

P_OPT

• Implement ParMetis to further load balance previously parallelized mesh files

• If a node is part of a geometry facet, apply an extra weighting to the node when passing it to Metis or ParMetis, to improve load balancing and decrease surface area

• Run parallel efficiency tests on a cluster to improve the algorithm itself and to test its robustness
