
Physics of the Earth and Planetary Interiors 163 (2007) 2–22

Toward an automated parallel computing environment for geosciences

Huai Zhang a,b,∗, Mian Liu b, Yaolin Shi a, David A. Yuen c,d, Zhenzhen Yan a, Guoping Liang e

a Laboratory of Computational Geodynamics, Graduate University of Chinese Academy of Sciences, Beijing, PR China
b Department of Geological Sciences, University of Missouri, Columbia, MO 65211, USA
c Department of Geology and Geophysics, University of Minnesota, Minneapolis, MN 55455, USA
d Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA

e Institute of Mathematics, Chinese Academy of Sciences, Beijing, PR China

Received 22 January 2007

Abstract

Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, taking full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate the integration of high-performance computing with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
© 2007 Published by Elsevier B.V.

Keywords: Parallel computing; Finite element method; Geodynamics; Cyber-infrastructure; GEON

1. Introduction

With rapid growth of affordable parallel computers, especially the Beowulf class PC- and workstation-clusters, parallel computing has become a powerful

∗ Corresponding author at: Laboratory of Computational Geodynamics, Graduate University of Chinese Academy of Sciences, Beijing, PR China.

E-mail address: [email protected] (H. Zhang).

0031-9201/$ – see front matter © 2007 Published by Elsevier B.V. doi:10.1016/j.pepi.2007.05.008


tool in the studies of geodynamo (Glatzmaier and Roberts, 1995), seismic wave propagation (Komatitsch and Tromp, 2002a,b), mantle convection (Kameyama and Yuen, 2006; Matyska and Yuen, 2005), lithospheric dynamics (Surussavadee and Staelin, 2006), and other fields of geosciences during the past decade. Powerful as it is, parallel computing is often time-consuming and

requires multidisciplinary expertise: the understanding of the fundamental physics governing the geological processes, the ability to mathematically formulate the physics in terms of proper partial differential equations (PDEs) and initial and boundary conditions, the expertise of developing parallel numerical algorithms to solve the PDEs on parallel machines, and the know-how of benchmarking the model, optimizing the codes for the computer platforms, and visualizing the massive model outputs. Thus, developing parallel computing models usually entails a team effort with combined expertise in physical sciences and software engineering. In some fields where a general model is central to the community, such as air-circulation in the atmosphere or convection in the mantle, collective effort has been devoted to developing community parallel computing models. One example is the MM5 parallel computing software package (http://www.mmm.ucar.edu/mm5/) for weather forecasting. In other fields, such as lithospheric dynamics, where modeling demands are diverse and complicated, parallel computing remains inaccessible to many researchers.

Geoscientists around the world are taking various measures to overcome this difficulty. For example, the Computational Infrastructure for Geodynamics (CIG, www.geodynamics.org) is a membership-governed organization that develops, maintains, and disseminates open-source software packages for typical geodynamic models, such as mantle convection and global seismic wave propagation. In mineral physics, the Virtual Laboratory for Planetary Materials (VLAB, http://vlab.msi.umn.edu) allows users to run large-scale quantum-mechanical calculations using web-services technology. The GeoFEM project (http://geofem.tokyo.rist.or.jp/) provides a multi-purpose, multi-physics parallel finite element solving environment, in which specific types of models, including seismic wave propagation, fluid dynamics, and structure mechanics, can be plugged into a general supporting system. Escript, developed by the Earth Systems Science Computational Centre (ESSCC) at The University of Queensland, is a system designed to implement PDE-based geoscience models using a computational modeling language based on object-oriented Python scripts (Davies et al., 2004). Numerous commercial finite element packages, such as PDE2D (http://members.aol.com/pde2d/) and the Finite Element program GENerator (FEGEN) (http://www.fegensoft.com), now allow users to generate finite element codes using high-level scripts (modeling languages) in the input files (drivers) that specify the PDEs and model-specific properties.

In this paper, we introduce a prototype of an automated parallel computing environment. Instead of providing static packages of specific models, this system enables geoscientists to model their own problems using high-level scripts, and automatically generates the complete, machine-independent Fortran source codes which can be run on parallel machines. This system may free geoscientists from most of the time-consuming and error-prone coding for parallel computation. It has the potential to seamlessly integrate data grids with distributed high-performance computing facilities in the new generation of cyber-infrastructures. In the following, we first introduce this system, then describe the modeling language user interface and the automation of generating finite element codes. We then provide examples of using this system to model a variety of geological processes to show its versatility.

2. Overview of the automated parallel computing environment

The work flow of a geosciences investigation usually consists of the following steps: first, from the observational data we formulate a physical model that attempts to capture the fundamental physics responsible for the observations. Second, we describe the physical model using a set of mathematical equations, usually in the form of partial differential equations, with proper initial and boundary conditions as well as model-specific properties such as the rheology and other physical parameters. Finally, we solve these mathematical equations with the proper constraints; the model is tested and explored by comparing the model predictions with observational data (Fig. 1).

Most physical models in Earth sciences are complicated and require solving non-linear, coupled, or time-dependent PDEs in spatial and temporal domains. For these problems, numerical solutions are often needed, and the finite element method (FEM) is by far the most popular method. FEM discretizes the model domain into a finite number of elements with simple geometry; for each element the partial differential equations can be approximated by a set of algebraic equations. Thus, FEM turns a problem of solving a system of partial differential equations with complex initial and boundary conditions into a much easier problem of solving a system of algebraic equations that can be performed by computers.
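In compact terms (a standard Galerkin statement, given here for orientation rather than quoted from the paper): expanding the unknown over the element shape functions, u ≈ Σ_j u_j φ_j, and testing against each φ_i turns a problem such as Eq. (3.1) of Section 3 into the linear algebraic system

    K u = f,    with    K_{ij} = (L\varphi_j, \varphi_i),    f_i = (f, \varphi_i),

whose large sparse matrix K is exactly the kind of system the parallel solvers of Section 4 are designed to handle.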

Although simple in principle, using FEM is usually complicated and time-consuming, especially when dealing with coupled, non-linear problems or parallel algorithms. The tasks can be challenging even to skilled modelers. The automated parallel computing environment to be introduced here is aimed at lowering the technical hurdles of using parallel FEM. This system includes two key aspects: (1) the user interface based on a high-level, English-like modeling language; and (2) an automated system for generating and assembling finite element codes.


Fig. 1. Flow chart showing how the modeling language based automated parallel computing environment may ease the work flow of scientific investigations in geosciences.

Similar to C, Java, Fortran and other high-level computer languages, the modeling language is a set of expressions to describe various PDEs and mathematical algorithms using familiar mathematical notations and specific variables. It uses advanced interpreters and interfaces to numerically describe the physical model and instruct the system to generate the desired Fortran source codes. The automation of generating FEM codes is possible because the major part of a FEM code, which is solving a large system of matrices, is similar for many FEM models; thus preexisting subprograms and algorithms can be reused. This approach has been used in some commercial software packages, such as PDE2D (http://members.aol.com/pde2d/), which allows the user to answer a set of questions about the model region, partial differential equations, boundary conditions, and preferred solver and graphical output options through an interactive driver; the finite element codes are then generated and compiled automatically, sparing the user much labor of finite element coding.

3. Modeling language expressions for partial differential equations

Here we explain the modeling language used in our automated parallel computing environment. All partial differential equations fall into three main categories: elliptic, parabolic, and hyperbolic equations. For designing numerical solutions, we group these equations into another three types: static equations, and equations with first-order and second-order partial derivatives of u with respect to time t, respectively:

    Lu = f    (3.1)

    C \frac{\partial u}{\partial t} + Lu = f    (3.2)

    M \frac{\partial^2 u}{\partial t^2} + C \frac{\partial u}{\partial t} + Lu = f    (3.3)

where u denotes the unknown, and L, C, and M are the linear or non-linear operators (or coefficients) of u, ∂u/∂t, and ∂²u/∂t², respectively. The right-hand-side term f can be either linear or non-linear. If L, C, M or f is non-linear, then these PDEs are non-linear. Eqs. (3.2) and (3.3) are also called evolutionary equations.
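To make the three types concrete (standard instances, echoed later in the paper): taking L = −∇·(k∇) makes (3.1) the steady heat-conduction problem of Eq. (D.1) and (3.2) a transient heat equation akin to Eq. (5.4), while taking M = ρ makes (3.3) a damped wave equation akin to Eq. (5.1):

    -\nabla \cdot (k \nabla u) = f, \qquad C \frac{\partial u}{\partial t} - \nabla \cdot (k \nabla u) = f, \qquad \rho \frac{\partial^2 u}{\partial t^2} + c \frac{\partial u}{\partial t} + Lu = f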

In FEM, the Ritz and Galerkin methods are commonly used to convert PDEs to a system of high dimensional linear equations, which may then be projected to a lower dimensional system by using variational principles. A variety of numerical techniques can be applied to manage non-linearity at each timestep, such as the Newton–Raphson method (Dettmer and Peric, 2006) for non-linear iterations, and the Crank–Nicolson (Honda et al., 1993) and Newmark schemes (Komatitsch et al., 1999; Zampieri and Pavarino, 2006) for time-dependent equations.
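As an illustration of how such a scheme acts on Eq. (3.2), the Crank–Nicolson scheme with time step Δt averages the operator between time levels n and n+1 (standard form, shown here for concreteness rather than quoted from the paper):

    \frac{C}{\Delta t} (u^{n+1} - u^{n}) + \frac{1}{2} L (u^{n+1} + u^{n}) = f^{\,n+1/2}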

In our modeling environment, each term of these equations, as well as the algorithms, can be specified by the modeling language, which links them to proper numerical segments and assembles them into a coherent numerical scheme. This works even for systems of coupled physics. We illustrate here the use of the modeling language in three examples.

3.1. Describing a linear problem with the modeling language

We first show how to use the modeling language to specify and describe a linear problem, such as the Dirichlet problem (Krishnamoorthy, 1995) defined by Laplace's equation in a two-dimensional domain:

    -\Delta u = f \quad \text{(in } \Omega\text{)}, \qquad u = u_0 \quad \text{(on } \partial\Omega\text{)}    (3.4)

where f is given and u is an unknown function of x and y in a two-dimensional connected open domain Ω whose boundary ∂Ω is a continuous manifold or a polygon, and Δ is the Laplacian operator denoting the second derivatives of u with respect to x and y. The weak or variational form of Eq. (3.4) can be written as follows:

    (\nabla u, \nabla \bar{u})_\Omega - \oint_{\partial\Omega} \frac{\partial u}{\partial n} \bar{u} \, ds = (f, \bar{u})_\Omega    (3.5)

where ∇ is the gradient, (·, ·) the inner product in the two-dimensional plane, and ū represents the virtual displacement of u. Furthermore, posing the problem on a suitable space of "once differentiable" functions on Ω that are zero on ∂Ω makes the boundary integral vanish, and (3.5) can be rewritten as:

    (\nabla u, \nabla \bar{u})_\Omega = (f, \bar{u})_\Omega    (3.6)

In our parallel computing environment, we describe this problem using the following script in the modeling language:
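The script listing itself did not survive in this copy of the paper. Judging from the keywords explained below, a minimal '.pde' script for this problem might look like the following sketch; the layout, the 'disp' line naming the unknown, and the exact spelling of the expressions are illustrative assumptions rather than the authors' original:

    defi
      disp u
      coor x y
      shap %1 %2
      gaus %3
      func gux guy
    func
      gux = [u/x]
      guy = [u/y]
    stif
      dist = [gux;gux] + [guy;guy]
    load = [f;u]

Here the 'dist' line carries the element stiffness expression of the weak form (3.6) and 'load' its right-hand side, with 'gux' and 'guy' the two declared derivative functions.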

The ‘defi’ section defines the unknown u and the coor-dination. The “shap” section specifies the finite elementdiscretization: the type of the shape function and thenode number of each element. For example, replacing‘%1’ and ‘%2’ with ‘quadrilateral’ and ‘4’ means that weselect a four-node quadrilateral element for discretiza-tion of the domain Ω. ‘gaus %3’ defines the method ofGaussian quadrature rule in each coordinate (Kronrod,1965), and ‘func gux guy’ defined two functions, whoseexpressions are defined in the ‘func’ section; ‘[u/x]’denotes the derivative of u with respect to x. In the‘stiff’ section, the actual Laplace equations’ weak formexpression is given. The symbol [·; ·] denotes the scalar

inner product of two functions. The function before thesemicolon is the unknown function or its derivative, thefunction behind the semicolon is the virtual displacementof the unknown function or its derivative. ‘dist’ means

h and P

ui+1,

6 H. Zhang et al. / Physics of the Eart

that the element stiff matrix will be calculated and storedin distributed manner. The right hand function is givenvia ‘load’ keyword.

Having described the PDEs and their weak form (Eqs. (3.4)–(3.6)) in the modeling language, we need to instruct the system what method we wish to use to solve this Laplace equation, along with other options for the numerical scheme. These instructions are given in two input files: '.gio' and '.gcn'. The '.gio' file tells the system the element type and auxiliary items, such as input data for material coefficients. The '.gcn' file specifies what kind of solving method we want to use. For this example, the contents of the '.gcn' file include these lines:

In the ‘defi’ section, ‘a’ defines a field named ‘a’,and ‘ell’ tell the system to use elliptic type algorithm tosolve this problem. This algorithm is one of the standardalgorithms built in the system. Users can also define theirown algorithms by modifying the ‘.gcn’ file.

3.2. Describing a non-linear problem with the modeling language

For a non-linear problem, the procedure is slightly different. For the PDEs:

    -\Delta u + u^3 = -4 + (x^2 + y^2)^3 \quad \text{(in } \Omega\text{)}, \qquad u = x^2 + y^2 \quad \text{(on } \partial\Omega\text{)}    (3.7)

The corresponding Galerkin weak form is

    (-\Delta u, \bar{u})_\Omega + (u^3, \bar{u})_\Omega = (f, \bar{u})_\Omega    (3.8)

Integrating by parts on the first term of the left-hand side of (3.8), we obtain

    (\nabla u, \nabla \bar{u})_\Omega + (u^3, \bar{u})_\Omega = (f, \bar{u})_\Omega + \left( \frac{\partial u}{\partial n}, \bar{u} \right)_{\partial\Omega}    (3.9)


where f is the right-hand side of (3.7):

    f = -4 + (x^2 + y^2)^3    (3.10)

lanetary Interiors 163 (2007) 2–22

where ū represents the virtual displacement of u. Furthermore, on a suitable space of "once differentiable" functions on Ω that are zero on ∂Ω, the boundary term vanishes, and (3.9) can be rewritten as:

    (\nabla u, \nabla \bar{u})_\Omega + (u^3, \bar{u})_\Omega = (f, \bar{u})_\Omega    (3.11)

To apply Newton’s method (Ypma, 1995), we have tolinearize (3.11) by defining the functional F as:{

F (u) = (∇u, ∇u)Ω + (u3, u)Ω = (f, u)Ωui+1 = ui + �ui+1

(3.12)

The linearized form is

    (\nabla \Delta u_{i+1}, \nabla \bar{u})_\Omega + (3 u_i^2 \, \Delta u_{i+1}, \bar{u})_\Omega = -(\nabla u_i, \nabla \bar{u})_\Omega - (u_i^3, \bar{u})_\Omega + (f, \bar{u})_\Omega    (3.13)

Adding F(u_i) to each side of (3.13), we have:

    (\nabla \Delta u_{i+1}, \nabla \bar{u})_\Omega + (3 u_i^2 \, \Delta u_{i+1}, \bar{u})_\Omega + (\nabla u_i, \nabla \bar{u})_\Omega + (u_i^3, \bar{u})_\Omega - (f, \bar{u})_\Omega = 0    (3.14)

Finally, we obtain

    (\nabla u_{i+1}, \nabla \bar{u})_\Omega + (3 u_i^2 \, u_{i+1}, \bar{u})_\Omega = (f + 2 u_i^3, \bar{u})_\Omega    (3.15)

where u_{i+1} denotes the unknown after the (i+1)-th non-linear iteration, u_i the unknown after the i-th iteration, and Δu_{i+1} the unknown increment at the (i+1)-th iteration, respectively.
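In practice, the iteration defined by (3.15) is repeated until the increment becomes negligible, e.g. under a criterion of the form (the choice of norm and the tolerance ε are illustrative, not specified by the paper):

    \| \Delta u_{i+1} \| \le \varepsilon \, \| u_{i+1} \|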

In our FEM modeling environment these FEM algorithms can be expressed using the modeling language (Appendix A), similar to the linear problems. Following a definition section, an 'equation' section describes the formation of the linear system and the desired linear solvers, such as the Gauss elimination method or the pre-conditioned Krylov subspace BiCGStab method. Finally, a 'solution' section describes the algorithm for non-linear iteration schemes, such as the modified Newton–Raphson method (Appendix A).

3.3. Describing a non-linear coupled problem using the modeling language

A multi-field or coupled problem generally consists of several partial differential equations, and the unknowns are interdependent. In this case a coupled numerical scheme is needed to specify the inter-connections between different unknown fields and to control the iteration order. This can be expressed explicitly in the input file '.gcn' as shown in the following example:
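The example listing is missing from this copy; from the explanation that follows, its shape is roughly the sketch below. The line "a ell b &" is quoted in the text, while the initialization keyword 'init' and the exact form of the 'solv' command lines are guesses (only 'sin' is documented):

    defi
      a ell b &
      b ell
    init a
    init b
    solv b sin
    solv a sin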

The keyword ‘defi’ specifies the algorithms for eacheld and their interrelationship. In this example, therere two fields: ‘a’ and ‘b’. The line “a ell b &” meanshat the algorithm ‘ell’ (algorithm for solving ellipticype of PDE for a single field, a standard linear staticlgorithm that is available from the algorithm library)ill be used for solving fields ‘a’ and ‘b’. Furthermore,eld ‘a’ depends on the results of field ‘b’. Note that thelgorithm for each field can be either directly taken fromhe algorithm library or provided by users.

The second section consists of four command lines. The first two lines initialize fields 'a' and 'b', respectively. The third and fourth lines solve fields 'b' and 'a', respectively. The solution of field 'b' is obtained at each non-linear step before the calculation of field 'a'. The character string 'sin' in the command lines means that a symmetric solver is adopted.

4. Automatic generation of parallel FEM codes

A finite element model usually consists of some basic parts: pre-processing, mesh and data partition, computing the element matrices and load vectors, assembling and solving large sparse matrices, and post-processing. Many of these parts are similar, even identical, in different FEM models. In an automated FEM modeling system such as the one introduced here, these "static" parts are written as standardized FEM segment source codes, which can be specified using the modeling language described above, and the final FEM source code may be assembled by plugging the necessary segments of the source codes into a program stencil (Fig. 2).

In our parallel computing environment, the program stencils were designed based on the domain decomposition method. We designed a cluster of program stencils for different types of PDEs. We have also included some of the new program stencils that have emerged in recent years (Gross et al., 2005), with enriched intrinsic commands.

4.1. Domain decomposition

Parallel computing takes a "divide and conquer" strategy to solve complicated problems. In our system we use the domain decomposition method (DDM) (Quarteroni and Valli, 1999) to divide the modeling domain into sub-domains, and solve for the unknown fields in these sub-domains simultaneously using multiple computer processors. Fig. 3 illustrates the use of this method in solving a boundary value problem in domain Ω, which is divided into inter-connected sub-domains (Ω_i, i = 1, 2, . . ., n). In doing so the original problem is turned into a group of relatively simpler and smaller boundary value problems that can be solved simultaneously on different computer nodes. The inner boundaries between the sub-domains, shown by lines Γ1 and Γ2 in Fig. 3, can be non-overlapped or overlapped, and various preconditioners (Ainsworth, 1996) can be used to treat the inner boundaries (Cao and Sun, 2005; Korneev and Jensen, 1999; Rakowsky, 1999; Zumbusch, 1996). An array of methods, including the Lagrange multiplier based substructuring method (Mandel and Tezaur, 1996), preconditioned Krylov subspace iteration (Axelsson et al., 2004; Ito and Toivanen, 2006; Prudencio et al., 2006), parallel direct solvers (Agullo et al., 2006), and multigrid (Douglas, 1996; Jung, 1997; Lv et al., 2006) and multi-level (Axelsson and Larin, 1998) solvers, can be used to solve the parallel FEM problems.

4.2. Mesh and data partition

To decompose the model domain for parallel computing, edge- or vertex-based partition methods, with or without overlapping regions, need to be selected in advance. For the partitioning scheme, we also need to define variables for both local and global interfaces. After the computing domain is partitioned, local variables are re-ordered, and then the global ordering of the variables must be updated systematically.

In our parallel computing environment, we have developed a subsystem to directly generate an unstructured sparse graph (Michel et al., 1997) from a hybrid finite element mesh (Fig. 4). This process can be separated from the parallel computing program as a pre-processing step. We have also developed subroutines which can be assembled with the parallel computing program and generate a sparse graph in parallel. We have used the programs Metis and ParMetis (Karypis, 2003) for the partition of the unstructured sparse graph, with or without inner boundary overlapping. The mesh partition and data partition can be executed based on the partition result of the unstructured sparse graph.


Fig. 2. Automated generation of Fortran source codes in the modeling language based modeling environment. The left column shows the input using the modeling language; the upper part is the expressions of partial differential equations, and the lower part shows the solving algorithmic expressions of the elliptic type PDEs. The system generates program segments (center column) according to these expressions. All the program segments are inserted into a program stencil for assembling as a complete Fortran source code (the right column).


Fig. 3. A sketch of the domain decomposition method.


4.3. Parallel solvers and the solver interface

Large sparse linear systems arising from many geoscientific models may have several millions of equations. Solving such large systems is challenging because of the high demand on computational time and memory. Iterative solution techniques based on Krylov subspace methods and preconditioning methods (multi-grid and approximate LU factorization techniques) are commonly used. Because the Krylov subspace methods require only matrix–vector products, the role of the domain decomposition method (DDM) is restricted to the development of efficient parallel preconditioners. Our system provides two preconditioners that use iterative solvers based on the generalized minimal residual method (GMRES) to form an approximate solution of the Schur complement system.
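For a two-block splitting into interior unknowns u_I and inner-boundary unknowns u_Γ, the assembled system and its Schur complement take the textbook form (the paper does not spell out its exact variant):

    \begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix} \begin{pmatrix} u_I \\ u_\Gamma \end{pmatrix} = \begin{pmatrix} f_I \\ f_\Gamma \end{pmatrix}, \qquad S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma}

GMRES is then applied to S u_Γ = f_Γ − A_{ΓI} A_{II}^{-1} f_I; the solves with A_{II} decouple over the sub-domains and therefore run in parallel.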

Our parallel computing environment allows users to select different parallel solvers from a catalog.


Fig. 4. Transfer of hybrid finite element unstructured meshes (left) to unstructured sparse graph, also known as Voronoi diagrams or Voronoi tessellation (right).

The catalog includes parallel direct solvers and parallel iterative solvers. We have developed some of the parallel solvers, such as the Lagrange multiplier-based domain decomposition method solvers; the system provides these as default solvers. In recent years, numerous large-scale linear system parallel solvers have become available in the public domain. These include the multifrontal massively parallel sparse direct solver (MUMPS) (Amestoy et al., 2001), SuperLU (Li and Demmel, 2003), Aztec (Shadid and Tuminaro, 1992), and the parallel Algebraic Recursive Multi-level Solvers (pARMS) (Li et al., 2003). Our parallel computing environment provides interfaces to these solvers.

In each sub-domain, we first compute the sparse graph of the gross matrix. Then the memory requirement of the gross matrix can be determined. We can design different program stencils (see below) to link to different parallel solvers. Users can easily explore multiple variations of a parallel algorithm and select a parallel solver from the menu provided by the system.

Fig. 5. Hierarchical structure of the automated parallel modeling environment.

4.4. Hierarchical structure of the FEM modeling environment

Fig. 5 shows the hierarchical structure of our automated parallel modeling system. This system has three hierarchical layers. The top layer is the user interface consisting of three input files: '.pde', '.gio' and '.gcn'. In these files the user uses the modeling language to specify, respectively, the PDEs, the element type and other auxiliary items, such as input data for material coefficients, and the solvers to be used, as discussed in Section 3. The second layer is the source code generator and supporting facilities. After receiving the input files, the system runs a script file, named '.fem' (Appendix B), to generate the program segments and insert them into the program stencils. These codes can be compiled directly and linked with the libraries (libs) on the bottom layer to form executable programs, including programs for pre-processing and mesh generation, and post-processing programs. These built-in libraries include sequential and parallel solvers for large-scale sparse linear systems, and auxiliary libraries for shape functions and for calculating the Jacobian matrix and its inverse. The libraries for pre-processing and unstructured finite element mesh generation include subroutines for Delaunay triangulation and quadrilateral unstructured mesh generation in two dimensions (Owen and Saigal, 2000; Rebay, 1993), tetrahedral mesh generation in three dimensions (Lo and Wang, 2005), the Voronoi diagram (dual partition of the Delaunay triangulation) for generating the sparse graph (Gold and Angel, 2006), and sparse graph partition. The post-processing programs can arrange data in formats ready for serial or parallel visualization tools, such as GiD, OpenDX (http://www.opendx.org), and ParaView (http://www.paraview.org).

The system can build and provide finished model libraries for various kinds of PDEs, with choices of shape functions and operators in Cartesian, spherical, and cylindrical coordinates in 1D, 2D, and 3D. From the user menu, one can choose the library files by simply clicking the options in the user interface. The user has the flexibility to modify the codes with the desired element subroutine and algorithm implementation programs, including shape functions and algorithm expressions within these libraries. The libraries of finished FEM models will include the following:

• Solid mechanics of linear elastic, viscoelastic, and visco-plastic–elastic materials, finite deformation analysis, and Lagrangian multipliers for discontinuous deformation models.
• Fluid dynamics of Navier–Stokes equations, shallow water equations, Stokesian fluids, Newtonian and non-Newtonian fluids.
• Rock mechanics.
• Heat transfer: diffusion, convection, and radiation in steady-state and time-dependent models.
• Porous media: steady-state and transient transport processes in porous media.
• Electromagnetic problems: static, time-harmonic, and fully time-dependent dynamic models.
• Typical mathematical models: linear, non-linear, coupled, and time-dependent mathematical and physical models, with many corresponding algorithm schemes included.

5. Geosciences applications

The automated parallel modeling environment is built on the core of a commercial serial finite element modeling package (http://www.fegensoft.com), which has been widely used in engineering. The parallel computing environment is still under development. We show here some examples to demonstrate its versatility. A complete example of using the automated parallel computing environment is provided in Appendix D.

5.1. Parallel computing of lithospheric dynamics

We have used this system to model complex lithospheric dynamics in recent years; some of the results are presented by Liu et al. (this volume). In one study we constructed a subcontinental-scale model for active tectonics in the western United States. With over half a million elements, our 3D lithospheric model allowed inclusion of all major geological boundaries and first-order lithospheric structure. Using our parallel modeling system, we were able to construct and fine-tune this sophisticated model in a few weeks, a task that would normally take months or even years. The model runs efficiently on different PC-clusters, from our small 32-processor cluster to large 512-processor clusters. Such models provide a geodynamic framework to incorporate detailed 3D lithospheric structures, such as those expected from EarthScope (www.earthscope.org), for advanced simulation of 4D continental evolution.

In another example (Liu et al., this volume), we developed a parallel FEM model for the entire San Andreas Fault system. The parallel algorithms built into our system allow simulation with a more realistic viscoelasto-plastic rheology for lithospheric deformation in the plate boundary zone (Li and Liu, 2006), and the computing power of parallel machines allowed us to explore multi-timescale faulting, from rupture in seconds to secular slip over thousands of years, in a single self-consistent model.

5.2. Seismic wave propagation in the Fuzhou basin

We have also used the parallel modeling system to construct a preliminary model for seismic wave propagation. Regional seismic wave propagation can be complicated by heterogeneities in the crust and upper mantle; in sedimentary basins, small-scale heterogeneities in the sedimentary cover may significantly affect the wave propagation and amplify ground motion (Cardenas-Soto and Chavez-Garcia, 2003). Whereas numerous commercial and public domain software packages are available, such as SPECFEM3D BASIN (http://www.geodynamics.org), they are inadequate if we need to consider complex basin structures.


Fig. 6. 2D and 3D results of FE simulation of seismic wave propagation in the Fuzhou basin.


As part of a national effort in China to study potential earthquake hazards around major cities, we attempted to model wave propagation in the Fuzhou basin, a 50 km × 60 km region in China's coastal province of Fujian with a bustling economy and dense population (Fig. 6). In this case, a high-resolution finite element model is needed to explore the impact on seismic wave propagation by the heterogeneous basin and lithospheric structure. The equation for wave propagation is given by:

    \rho \ddot{u} + c \dot{u} - \left( \mu \nabla \cdot (\nabla u) + (\lambda + \mu) \nabla (\nabla \cdot u) \right) + f = 0    (5.1)

where ρ is the density, c the damping coefficient, λ and μ the Lamé constants, and dots over u indicate time derivatives.

For this second-order time-dependent hyperbolic problem, we used the Newmark scheme (Artuzi, 2005) in the time domain, and incorporated the perfectly matched layer (PML) boundary condition into the formulation to absorb the unwanted radiation out of the computational domain (Obayya et al., 2000; Wang and Tang, 2003). It took us only a few days to prepare the input information and generate all the source codes needed for serial and parallel simulation (see Appendix C for the input files for generating all the source codes of this model). The full 3D numerical results are shown in Fig. 6. This model has about 1.2 million unstructured grid points. The preliminary results show that the maximum amplitudes of ground acceleration are closely related to the heterogeneous basin structure. Thus, detailed wave propagation modeling may provide useful guidance for urban planning and preparation for earthquake hazards.
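For reference, the Newmark scheme referred to above updates displacements and velocities from the accelerations at two successive time levels as (standard textbook form, not quoted from the paper; β and γ are the scheme parameters, and the parameter o = 0.5 mentioned in Appendix C plausibly plays the role of γ):

    u_{n+1} = u_n + \Delta t \, \dot{u}_n + \Delta t^2 \left[ \left( \tfrac{1}{2} - \beta \right) \ddot{u}_n + \beta \, \ddot{u}_{n+1} \right], \qquad \dot{u}_{n+1} = \dot{u}_n + \Delta t \left[ (1 - \gamma) \, \ddot{u}_n + \gamma \, \ddot{u}_{n+1} \right]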

5.3. Mantle convection coupled with lithospheric dynamics

In the final example, we use this modeling system to solve fluid mechanics related to mantle convection, coupled with plate tectonics. Tectonic plates may be viewed as the top thermo-mechanical boundary layer of the convective mantle (McKenzie et al., 1974). However, modeling the coupled system of lithospheric dynamics and mantle convection is challenging because of the different rheology and physical processes involved. We have attempted to model such a system using our parallel modeling environment (Sun et al., 2002). Approximating the mantle as an incompressible and iso-viscous fluid in a spherical shell, the governing equations include the conservation of momentum:

    0 = -\nabla p + \mu \Delta v    (5.2)

    \nabla \cdot v = 0    (5.3)

and the conservation of energy:

    \frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) - v \cdot \nabla T + q    (5.4)

where ∇ denotes the gradient operator, ∇· the divergence operator, v the velocity, T the temperature, t the time, p the deviatoric pressure, μ the mean rheology coefficient, k the thermal diffusivity, and q the heat source term, respectively.


The lithospheric shell assumes a power-law rheology

    \bar{\varepsilon} = A \bar{\sigma}^n    (5.5)

where A represents the mean viscosity coefficient of the lithospheric shell, and \bar{\varepsilon} represents the mean strain rate,

    \bar{\varepsilon} = \sqrt{\tfrac{2}{3} \, \varepsilon_{ij} \varepsilon_{ij}}    (5.6)

in which

    \varepsilon_{ij} = \tfrac{1}{2} \, (u_{i,j} + u_{j,i} - u_{k,i} u_{k,j})    (5.7)

and the mean stress is defined as

    \bar{\sigma} = \sqrt{\tfrac{3}{2} \, \sigma_{ij} \sigma_{ij}}    (5.8)

Solving Eqs. (5.2)–(5.4) requires different iteration approaches because different types of PDEs are involved, and the computational problem is especially suited for parallel computers. Using our parallel FEM modeling environment, we were able to generate complete Fortran source codes for this model (Sun et al., 2002).

We used Eqs. (5.2)–(5.4) to simulate mantle convection, and Eqs. (5.5) and (5.3) to model lithospheric deformation.

Fig. 7. The model of coupled plate–mantle convection. (a and b) show the unstructured mesh generation and domain decomposition. (c) Assembled FE mesh. (d) Predicted surface velocity. All the unstructured meshes are generated in parallel by different processors.


Eqs. (5.2) and (5.4) are computed separately using different non-linear algorithms. The coupling between Eqs. (5.2) and (5.4) is fulfilled through exchanging and updating the velocity and temperature fields after each step of convergence. The Lagrange multiplier domain decomposition method (LMDDM) and the Lagrange multiplier discontinuous deformation analysis (LMDDA) were used for interactions between neighboring plates (Sun et al., 2002). After each time step, the velocity field of the lower layer of the lithosphere is updated with values derived from the mantle convection simulation. The domain decomposition and some of the computing results are shown in Fig. 7. We used 17 computer nodes for this simulation. One of them works as the master node, which controls the parallel computing process by distributing data to different slave nodes and gathering and updating the global values after each iteration step and time step. These automatically generated parallel finite element codes are scalable with the processors available and worked well on different parallel machines.


h and P

6. Discussion

The automated parallel FEM modeling environment presented here can be readily integrated into the emerging cyber-infrastructures for geosciences that are driven by both grid computational technology and the unprecedented growth of geosciences data. The ever-increasing internet connectivity and fast development of grid middleware technology have stimulated the fast growth of cyber-infrastructure in geosciences, such as the Geoscience Network (GEON, http://www.geongrid.org), which integrates multidisciplinary databases with application tools, including data manipulation and visualization. Modeling often provides the critical step to turn observation into understanding and knowledge. The automated parallel computing environment introduced here makes FEM modeling easy and accessible. It eases the tasks of finite element modeling and helps scientists to take full advantage of the increasingly affordable parallel and grid computing hardware to solve complicated multi-scale, multi-physical problems. Using the English-like modeling language to describe the PDEs and algorithms, scientists can be liberated from most of the time-consuming and error-prone tasks of finite element coding. Anyone with a solid understanding of the physics to be investigated and some basic knowledge of FEM modeling can use this system. When incorporated into the emerging collaboratory cyber-infrastructures, our modeling system can facilitate collaborative data exploration and interpretation. Fig. 8 shows the basic concept of integrating this automated parallel computing environment with grid-based geoscience cyber-infrastructure.

Fig. 8. Sketch showing the integration of the parallel computing environment with geosciences cyber-infrastructure. The parallel environment system includes five parts: preprocessing, element subroutine generation, parallel solver, non-linear algorithm and iteration method. It can interface with visualization and other network resources.

One of the distinguishing advantages of this system is that one can manipulate all the facilities provided by the distributed cyber-infrastructure through grid middleware. Moreover, it allows users to access an intelligent problem-oriented or service-oriented architecture and take full advantage of all the resources, such as geological data, high-performance computing power, computing-intensive or memory-intensive model analysis, 4D visualization of multidisciplinary information or computing results, as well as a convenient web-based interface.


All the generated source code, program stencils, and libraries of different PDEs and algorithms are accessible to the users. This is an ongoing effort; a preliminary system is available through a Common Gateway Interface (CGI) on the website http://www.fegensoft.com, which allows users to become developers. We aspire to make this system an open-source community computing environment in the near future.

7. Conclusions

We have constructed a prototype automated parallel computing environment which automatically generates finite element Fortran source codes from user input using a high-level modeling language. This system can spare researchers from much of the time-consuming and error-prone FEM coding, and greatly shorten the time needed for developing most FEM models.

This system involves three steps commensurate with the current workflow of geoscience research. In the first step, this system translates physical models described using the English-like modeling language into numerical segments. The parallel computing environment then automatically generates source codes following instructions in the user input files. The numerical results can then be output to visualization facilities.

In contrast to most problem-specific finite element packages, this system is dynamic. It can be used to solve various kinds of physical problems described by PDEs with optional algorithms. The open architecture of this system could facilitate the integration of data grids with high-performance computing facilities in a comprehensive cyber-infrastructure for geosciences.

Acknowledgements

This work is supported by the National Science Foundation of China under Grant Numbers 40474038, 40574021, 40374038 and by the National Basic Research Program of China under Grant Number 2004cb418406. Zhang's visit to the University of Missouri-Columbia was partially supported by the NSF-EAR 0225546 grant to Liu. This work was conducted as part of the visualization working group at the laboratory of computational geodynamics supported by the Graduate University of Chinese Academy of Sciences; we thank Shi Chen and Shaolin Chen in the visualization working group who provided some of the figures. David A. Yuen thanks NSF for support from the CMG and ITR programs.


Appendix A. Finite element modeling language for a non-linear problem

The modeling language expressions for describing the non-linear problem in Section 3.2 are shown here:

The scripts are similar to those for the linear Laplace equation in Section 3.1. Only the 'dist' and 'load' sections are different.
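Concretely, matching the linearized weak form (3.15) and using the [· ; ·] notation of Section 3.1, the changed sections would take roughly this shape (a sketch only: the name 'ui' for the previous iterate and the arithmetic syntax are assumptions, not the authors' listing):

    stif
      dist = [gux;gux] + [guy;guy] + 3*ui*ui*[u;u]
    load = [f + 2*ui*ui*ui; u]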

The non-linear algorithm discussed here specifies the method of solving the PDEs, including how to linearize a non-linear differential equation, how to discretize the time variable for a time-dependent problem, and how to control the incremental value of each iteration step (i.e., the relaxation factor) and the convergence precision for a non-linear problem. For a coupled multi-field problem, it also includes the iteration sequence and the coupling schedule of different equations by means of 'coef u'.


Appendix B. A typical sample of the ‘.fem’ file

In the automated parallel modeling environment, the '.fem' script file links the user input files and the libraries to generate all the source codes (Fig. 5). The following is a typical sample of this file:
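(The sample listing was lost in this copy. From the section-by-section explanation below, its skeleton has one '#' keyword per section; every value shown after a keyword in this sketch is an illustrative guess except 'sin' and 'outcore', which the text documents:)

    #dir
    nonlinear 2d cartesian
    #schem
    ell 1
    #nfe
    newton.nfe
    #ges
    quad.ges
    #solv
    sin
    #method
    outcore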


Each ‘#’ marks one section. The ‘#dir’ section tellsthe system that the problem is non-linear in a two-dimensional Cartesian system. The ‘#schem’ sectionspecifies the PDE to be elliptic type with one unknown.The ‘#nfe’ section defines the non-linear algorithm to beused. The ‘#ges’section will link all the subroutines andlibrary files related to quadrilateral mesh to the system.The ‘#solv’ section is the user input to specify the solverto be used, ‘sin’ means that we select a symmetric LUdecomposition solver. The ‘#method’ section is anotheruser option, here ‘outcore’ tells the system to store thegross stiff matrix on hard-disk, rather than in the RAM.

Appendix C. Input files for the model of seismic wave propagation in the Fuzhou basin

Here we show the complete input files which are used for generating the finite element source codes for the model of seismic wave propagation discussed in Section 5.2. Only two input files are needed here: the file '.pde' for describing the PDE of Eq. (5.1), and the file '.nfe' for defining the algorithms.


C.1. Input PDE file seismic.pde

he ‘.pde’ file contains three main parts. They are indicated by the keywords ‘defi’, ‘func’, and ‘stif’, respectively:


C.2. Input algorithm definition file newmark.nfe

The following lines are the input ‘.nfe’ file for the Newmark algorithm.


From these two input files the system generated all the source codes for the model shown in Section 5.2. The mass matrix and damping matrix are defined in lumped form by default. The seismic.pde file is similar to the '.pde' file for the linear case in Section 3.1. The keyword 'fortran' tells the system that we want to insert a Fortran program segment into the source code directly. The o, aa, a0, . . ., a7 are parameters for the Newmark scheme (Artuzi, 2005). By default, we use o = 0.5, which is common in many applications. The whole scheme of the Newmark method is described jointly by the 'equation' part and the 'fortran' sections.

Appendix D. A complete example of using the modeling-language based parallel FEM modeling environment

We provide here a complete example of using the automated parallel FEM modeling environment to solve a physical problem, which in this example is steady-state heat conduction in two dimensions (2D). We show how to use modeling language expressions to describe the partial differential equations and to generate parallel finite element codes in this system. We also provide testing results to validate and benchmark our parallel computing system.


Consider a solid disk with constant thermal properties. The outer rim is held at a constant temperature, while the center of the disk is connected to a rod through which heat is conducted away. The 2D steady-state temperature field is given by

    -\nabla \cdot (k \nabla u) = q    (D.1)

where u is the temperature, k the thermal conductivity, q the heat source/sink term, and ∇· and ∇ are the divergence and gradient operators, respectively.

This is a Dirichlet problem. The weak form of (D.1) based on the Galerkin finite element method can be written as

    (k \nabla u, \nabla \bar{u}) = (q, \bar{u})    (D.2)

where ū is the virtual displacement of u. To solve this problem using the finite element method, we first specify the problem and the desired solving algorithms in two input files using the modeling language expressions. To describe the PDE and the weak form, we write the 'heat.pde' file as follows:
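The 'heat.pde' listing did not survive extraction. Following the pattern of Section 3.1 and the keywords explained below, a sketch of its content is shown here; the 'disp' line, the placement of 'mate k q', and the multiplication syntax are assumptions for illustration:

    defi
      disp u
      coor x y
      mate k q
      shap %1 %2
      gaus %3
      func gux guy
    func
      gux = [u/x]
      guy = [u/y]
    stif
      dist = k*([gux;gux] + [guy;guy])
    load = [q;u]

In this sketch 'dist' carries the left-hand side (k∇u, ∇ū) of (D.2) and 'load' its right-hand side (q, ū).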


The keyworks ‘coor’, ‘shap’, ‘gaus’, ‘func’ are identicalto those we described in Section 3. The keyword ‘mate’provides the system default material factors. ‘Func’ and‘stif’ sections are still definition of functions definedafter keyword ‘func’ and the weak form of Eq. (D.2).

We need a ‘.nfe’ file to tell our system explicitly howto solve this equation. Because Eq. (D.1) is a typicalelliptic type PDE, we can copy the default ‘.nfe’ file fromour system algorithm library and rename it to ‘heat.nfe’.We list this file as following.

Finally, we need a file that describes the computing domain, the actual finite element type, the input and output data structures, etc. This file can be generated automatically by selecting from the user menu provided by the system; here it is named 'heat.gio'.


From the input files ‘heat.pde’ and ‘head.nfe’, thesystem generates automatically the source Fortran codesfor serial and parallel finite element computation. Thesystem also generated interface source codes for pre-and post-processing from the ‘heat.gio’ file.

Take the non-dimensional value of the disk's radius to be 10, and the non-dimensional temperature on the outer rim to be 120. If q in (D.1) has a value of −4 (a uniform heat sink, consistent with heat being conducted away at the center) and k equals 1, then the analytical solution of Eq. (D.1) is

    u = x^2 + y^2 + 20    (D.3)
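A quick check confirms (D.3): for u = x² + y² + 20,

    -\nabla \cdot (k \nabla u) = -k \, \Delta u = -4 = q \quad (k = 1), \qquad u\big|_{x^2+y^2=10^2} = 100 + 20 = 120,

so both the equation and the boundary condition on the rim are satisfied.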

Table 1 shows the analytical solution compared with the numerical solutions from serial and parallel FEM computations using the FEM codes generated by the system. The results are shown in Fig. 9. See Table 1 for the details of the numerical comparison and benchmark results.


Table 1
Comparison of analytical and FEM results of Eq. (D.1)

1  20      31.111  42.222  53.333  64.444  75.556  86.667  97.778  108.89  120
2  19.996  31.107  42.219  53.330  64.442  75.554  86.665  97.777  108.89  120
3  20.003  31.114  42.225  53.334  64.445  75.557  86.668  97.779  108.89  120
4  20      31.111  42.222  53.333  64.444  75.556  86.667  97.778  108.89  120
5  20      31.111  42.222  53.333  64.444  75.556  86.667  97.778  108.89  120

1, analytical solution; 2, serial code, 8842 triangle elements; 3, serial code, 19,086 triangle elements; 4, serial and parallel code, 293,170 triangle elements; 5, parallel code, 809,568 triangle elements.


Fig. 9. These two graphs are a description of our computing domain and the parallel finite element computing result, which has a value of 20 at the center of the disk and increases non-linearly to 120 at the outer rim.


All the parallel and serial source codes generated for this example, and the executable files, are available from http://hpcc.gucas.ac.cn/benchmark/temp.

References

Agullo, E., Guermouche, A., L'Excellent, J.Y., 2006. A preliminary out-of-core extension of a parallel multifrontal solver. In: Euro-Par 2006 Parallel Processing. Lect. Notes Comput. Sci., 1053–1063.
Ainsworth, M., 1996. A preconditioner based on domain decomposition for h-p finite-element approximation on quasi-uniform meshes. SIAM J. Numer. Anal. 33 (4), 1358–1376.
Amestoy, P.R., Duff, I.S., L'Excellent, J.Y., Koster, J., 2001. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23 (1), 15–41.
Artuzi, W.A., 2005. Improving the Newmark time integration scheme in finite element time domain methods. IEEE Microw. Wireless Comp. Lett. 15 (12), 898–900.
Axelsson, O., Bai, Z.Z., Qiu, S.X., 2004. A class of nested iteration schemes for linear systems with a coefficient matrix with a dominant positive definite symmetric part. Numer. Algorithms 35 (2–4), 351–372.
Axelsson, O., Larin, M., 1998. An algebraic multilevel iteration method for finite element matrices. J. Comput. Appl. Math. 89 (1), 135–153.
Cao, J., Sun, J., 2005. An efficient and effective nonlinear solver in a parallel software for large scale petroleum reservoir simulation. Int. J. Numer. Anal. Model. 2, 15–27.
Cardenas-Soto, M., Chavez-Garcia, F., 2003. Regional path effects on seismic wave propagation in central Mexico. Bull. Seismol. Soc. Am. 93 (3), 973–985.

rallel finite element computing result, which has a value of 20 at the

Davies, M., Gross, L., Muhlhaus, H.-B., 2004. Scripting high performance earth systems simulations on the SGI Altix 3700. In: Seventh International Conference on High Performance Computing and Grid in the Asia Pacific Region, pp. 244–251.
Dettmer, W., Peric, D., 2006. A computational framework for fluid–structure interaction: finite element formulation and applications. Comput. Methods Appl. Mech. Eng. 195 (41–43), 5754–5779.
Douglas, C.C., 1996. Multigrid methods in science and engineering. IEEE Comput. Sci. Eng. 3 (4), 55–68.
Glatzmaier, G.A., Roberts, P.H., 1995. A 3-dimensional convective dynamo solution with rotating and finitely conducting inner-core and mantle. Phys. Earth Planet. Inter. 91 (1–3), 63–75.
Gold, C., Angel, P., 2006. Voronoi hierarchies. In: Geographic Information Science, Proceedings. Lecture Notes in Computer Science, pp. 99–111.
Gross, L., Mora, P., Saez, E., Weatherley, D., Xing, H., 2005. Software infrastructure for solving non-linear partial differential equations and its application to modelling crustal fault systems. ANZIAM J. 46 (E), 1141–1154.
Honda, S., Balachandar, S., Yuen, D.A., Reuteler, D., 1993. 3-Dimensional mantle dynamics with an endothermic phase transition. Geophys. Res. Lett. 20 (3), 221–224.
Ito, K., Toivanen, J., 2006. Preconditioned iterative methods on sparse subspaces. Appl. Math. Lett. 19 (11), 1191–1197.
Jung, M., 1997. On the parallelization of multi-grid methods using a non-overlapping domain decomposition data structure. Appl. Numer. Math. 23 (1), 119–137.

Kameyama, M., Yuen, D.A., 2006. 3-D convection studies on the thermal state in the lower mantle with post-perovskite phase transition. Geophys. Res. Lett. 33 (12).
Karypis, G., 2003. Multi-constraint mesh partitioning for contact/impact computations. In: Proceedings of SC'03, Phoenix, Arizona, USA.
Komatitsch, D., Tromp, J., 2002a. Spectral-element simulations of global seismic wave propagation—I. Validation. Geophys. J. Int. 149 (2), 390–412.
Komatitsch, D., Tromp, J., 2002b. Spectral-element simulations of global seismic wave propagation—II. Three-dimensional models, oceans, rotation and self-gravitation. Geophys. J. Int. 150 (1), 303–318.
Komatitsch, D., Vilotte, J.P., Vai, R., Castillo-Covarrubias, J.M., Sanchez-Sesma, F.J., 1999. The spectral element method for elastic wave equations—application to 2-D and 3-D seismic problems. Int. J. Numer. Methods Eng. 45 (9), 1139–1164.
Korneev, V.G., Jensen, S., 1999. Domain decomposition preconditioning in the hierarchical p-version of the finite element method. Appl. Numer. Math. 29 (4), 479–518.
Krishnamoorthy, C.S., 1995. Finite Element Analysis—Theory and Programming. McGraw-Hill Publication, New Delhi, 535 pp.
Kronrod, A.S., 1965. Nodes and Weights of Quadrature Formulas. Sixteen-Place Tables. Consultants Bureau, New York, 341 pp.
Li, Q.S., Liu, M., 2006. Geometrical impact of the San Andreas Fault on stress and seismicity in California. Geophys. Res. Lett. 33 (8).
Li, X.Y.S., Demmel, J.W., 2003. SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans. Math. Software 29 (2), 110–140.
Li, Z.Z., Saad, Y., Sosonkina, M., 2003. pARMS: a parallel version of the algebraic recursive multilevel solver. Numer. Lin. Algebra Appl. 10 (5–6), 485–509.
Lo, S.H., Wang, W.X., 2005. Generation of tetrahedral mesh of variable element size by sphere packing over an unbounded 3D domain. Comput. Methods Appl. Mech. Eng. 194 (48–49), 5002–5018.
Lv, X., Zhao, Y., Huang, X.Y., Xia, G.H., Wang, Z.J., 2006. An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3D unsteady compressible flows with moving objects. J. Comput. Phys. 215 (2), 661–690.
Mandel, J., Tezaur, R., 1996. Convergence of a substructuring method with Lagrange multipliers. Numer. Math. 73 (4), 473–487.
Matyska, C., Yuen, D.A., 2005. The importance of radiative heat transfer on superplumes in the lower mantle with the new post-perovskite phase change. Earth Planet. Sci. Lett. 234 (1–2), 71–81.

McKenzie, D.P., Roberts, J.M., Weiss, N.O., 1974. Convection in the earth's mantle: towards a numerical solution. J. Fluid Mech. 62, 465–538.
Michel, J., Pellegrini, F., Roman, J., 1997. Unstructured graph partitioning for sparse linear system solving. In: Solving Irregularly Structured Problems in Parallel. Lect. Notes Comput. Sci., 273–286.
Obayya, S.S.A., Rahman, B.M.A., El-Mikati, H.A., 2000. New full-vectorial numerically efficient propagation algorithm based on the finite element method. J. Lightwave Technol. 18 (3), 409–415.
Owen, S.J., Saigal, S., 2000. H-Morph: an indirect approach to advancing front hex meshing. Int. J. Numer. Methods Eng. 49 (1–2), 289–312.
Prudencio, E.E., Byrd, R., Cai, X.C., 2006. Parallel full space SQP Lagrange–Newton–Krylov–Schwarz algorithms for PDE-constrained optimization problems. SIAM J. Sci. Comput. 27 (4), 1305–1328.
Quarteroni, A., Valli, A., 1999. Domain Decomposition Methods for Partial Differential Equations. Oxford Science Publications, 412 pp.
Rakowsky, N., 1999. The Schur complement method as a fast parallel solver for elliptic partial differential equations in oceanography. Numer. Lin. Algebra Appl. 6 (6), 497–510.
Rebay, S., 1993. Efficient unstructured mesh generation by means of Delaunay triangulation and Bowyer–Watson algorithm. J. Comput. Phys. 106 (1), 125–138.
Shadid, J.N., Tuminaro, R.S., 1992. Sparse iterative algorithm software for large-scale MIMD machines—an initial discussion and implementation. Concurr. Pract. Exp. 4 (6), 481–497.
Sun, X., Zhang, H., Liang, G., 2002. Mantle flow beneath the Asian continent and its force to the crust. Acta Seismol. Sinica 15 (3), 241–246.
Surussavadee, C., Staelin, D.H., 2006. Comparison of AMSU millimeter-wave satellite observations, MM5/TBSCAT predicted radiances, and electromagnetic models for hydrometeors. IEEE Trans. Geosci. Remote Sens. 44 (10), 2667–2678.
Wang, T., Tang, X.M., 2003. Finite-difference modeling of elastic wave propagation: a nonsplitting perfectly matched layer approach. Geophysics 68 (5), 1749–1755.
Ypma, T.J., 1995. Historical development of the Newton–Raphson method. SIAM Rev. 37 (4), 531–551.
Zampieri, E., Pavarino, L.E., 2006. Approximation of acoustic waves by explicit Newmark's schemes and spectral element methods. J. Comput. Appl. Math. 185 (2), 308–325.
Zumbusch, G.W., 1996. Schur Complement Domain Decomposition Methods in Diffpack, 471 pp.