
44 (1999) APPLICATIONS OF MATHEMATICS No. 6, 435-458

ALGEBRAIC DOMAIN DECOMPOSITION SOLVER

FOR LINEAR ELASTICITY

ALES JANKA, Plzen

Abstract. We generalize the overlapping Schwarz domain decomposition method to problems of linear elasticity. The convergence rate independent of the mesh size, coarse-space size, Korn's constant and essential boundary conditions is proved here. Abstract convergence bounds developed here can be used for an analysis of the method applied to singular perturbations of other elliptic problems.

Keywords: algebraic multigrid, zero energy modes, convergence theory, finite elements, computational mechanics, iterative solvers

MSC 1991: 65F10, 65N55

1. INTRODUCTION

This paper is concerned with the development and analysis of a black-box algebraic solver suitable for problems of structural mechanics. The presented method is an overlapping Schwarz domain decomposition ([7]) with a coarse space given by smoothed aggregations ([11, 15, 12]). For scalar elliptic problems, such a method is proposed and analyzed in [5]. Here, we adapt the algorithm by employing a coarse space created using zero-energy modes ([15]), and prove the optimal convergence rate independent of the $H^1$-coercivity (Korn's constant) and the essential boundary conditions. In a certain sense, the convergence bounds given here are also independent of the computational domain. The indirect dependence arises only through subdomains that can be generated based on the nature of the solved problem.

The direct generalization of the estimates presented in [5] to problems of linear elasticity is possible only to a certain extent. More precisely, by following the spirit of the analysis presented there, it is impossible to prove the practically very important independence of the convergence rate of the above-mentioned problem data.

This difficulty is avoided by establishing abstract convergence bounds that use separate assumptions on the nonsmoothed coarse space and the prolongator smoother. When applied to a particular problem, one has to prove approximation properties of the nonsmoothed coarse space, which is geometrically very simple. The independence of such an approximation property of Korn's constant and essential boundary conditions comes as a direct consequence of the fact that aggregates are disjoint sets of nodes.

The assumptions of the abstract theory are verified for problems discretized on quasiuniform P1 finite element meshes. Using algebraic tools developed here, the generalization of the theory to problems discretized by using more complex shape-regular elements and problems with jumps in coefficients is quite straightforward.

The paper is organized as follows: in Section 2 we give an outline of an algebraic Schwarz domain decomposition method with an abstract convergence theorem. Section 3 presents the convergence theorem for our method together with the assumptions on the components of the algorithm, which are verified in Section 4. The main convergence theorem is then re-stated in Section 5. Section 6 deals with computational complexity of the proposed algorithm and Section 7 contains the results of the numerical experiments on industrial models.

2. ABSTRACT SCHWARZ DOMAIN DECOMPOSITION

We are interested in solving a system of linear algebraic equations

Analogously for the coarse space,

where $A$ is a symmetric positive definite matrix corresponding to a finite element discretization of a continuous problem by P1 elements.

Our method is a standard multiplicative Schwarz overlapping domain decomposition with a coarse space given by smoothed aggregations as in [15].

In an abstract setting, the overlapping Schwarz method is determined by a coarse space $\mathrm{Rng}(P)$ and by local subdomain spaces $\mathrm{Rng}(N_i)$, $i = 1, \dots, J$, where $P: \mathbb{R}^{n_2} \to \mathbb{R}^{n_1}$, $n_2 \ll n_1$, is the prolongator and $N_i$ are 0/1 matrices specifying subdomains understood as sets of degrees of freedom.

Based on the stiffness matrix $A$, let us define subdomain selections $A_i$ and local correction operators $R_i$, $i = 1, \dots, J$, by

Further, let us set

Note that $T_0$ and $T_i$, $i = 1, \dots, J$, are orthogonal projections in the $A$-inner product onto $\mathrm{Rng}(P)$ and $\mathrm{Rng}(N_i)$, respectively.
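The projection property can be checked numerically. The following is a minimal sketch, assuming the standard subspace-correction choices $A_i = N_i^T A N_i$, $R_i = N_i A_i^{-1} N_i^T$ and $T_i = R_i A$; these formulas, the 1D model matrix and all function names are assumptions made for illustration, not the paper's own definitions.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y


def laplacian_1d(n):
    """Hypothetical SPD model matrix: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def local_correction(A, dofs, r):
    """Apply R_i = N_i (N_i^T A N_i)^{-1} N_i^T to r (assumed definition).

    `dofs` lists the degrees of freedom selected by the 0/1 matrix N_i.
    """
    A_i = [[A[p][q] for q in dofs] for p in dofs]
    y = solve(A_i, [r[p] for p in dofs])
    out = [0.0] * len(r)
    for p, yp in zip(dofs, y):
        out[p] = yp
    return out


def apply_T(A, dofs, v):
    """Apply T_i = R_i A to v (assumed definition)."""
    return local_correction(A, dofs, matvec(A, v))
```

Applying `apply_T` twice gives the same vector as applying it once, and `v - apply_T(A, dofs, v)` is $A$-orthogonal to `apply_T(A, dofs, v)`, which is what being an orthogonal projection in the $A$-inner product means.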

For the sake of parallelism we introduce a coloring $\{C_i\}_{i=1}^{n_c}$ of the set $\{1, \dots, J\}$ satisfying

$$\cos(\mathrm{Rng}(N_j), \mathrm{Rng}(N_k)) = 0 \quad \text{for every } j, k \in C_i,\ i = 1, \dots, n_c,$$

where the cosine is measured in the $A$-inner product. Let us further define the constant

Obviously, $\cos(\mathrm{Rng}(N_j), \mathrm{Rng}(N_k)) = 0$ if the subdomains corresponding to the matrices $N_j$, $N_k$ are not adjacent or overlapping.
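One way such a coloring could be produced is a greedy pass over the subdomains. The sketch below is an illustration under stated assumptions: two subdomains are treated as conflicting when the block $N_j^T A N_k$ has a nonzero entry, and the function names and the 1D model matrix are hypothetical.

```python
def laplacian_1d(n):
    """Hypothetical SPD model matrix: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def conflicts(A, dofs_j, dofs_k):
    """True if N_j^T A N_k has a nonzero entry, i.e. the two subdomain
    spaces are not A-orthogonal (they overlap or are adjacent)."""
    return any(A[p][q] != 0.0 for p in dofs_j for q in dofs_k)


def greedy_coloring(A, subdomains):
    """Assign each subdomain to the first color class it does not
    conflict with; open a new class if none fits."""
    colors = []  # list of color classes, each a list of subdomain indices
    for j, dofs_j in enumerate(subdomains):
        for cls in colors:
            if not any(conflicts(A, dofs_j, subdomains[k]) for k in cls):
                cls.append(j)
                break
        else:
            colors.append([j])
    return colors
```

For four overlapping intervals on a 1D chain this yields two classes, with alternating subdomains sharing a color. A greedy pass gives a valid coloring but not necessarily the smallest number of colors.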

With such a coloring of subdomains we are able to solve subdomain problems partially in parallel by the following algorithm.

Algorithm 2.1. Given a vector $x^i$, the method returns $x^{i+1}$ computed as follows:

1. Set $z^0 := x^i$.

2. For $i = 1, \dots, n_c$ perform local corrections:

3. Perform one coarse-level correction:

4. (optional) Set $z^{n_c} := z^0$. For $i = n_c, \dots, 1$ do

5. Set $x^{i+1} := z^0$.

Note that this algorithm can be viewed as either a multiplicative or a hybrid one. One can easily see that for the error-propagation operator $E: \mathbb{R}^{n_1} \to \mathbb{R}^{n_1}$ we have

Here, the first term is the error-propagation operator of the colored multiplicative algorithm while the second term is $E$ for a hybrid algorithm.
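The structure of Algorithm 2.1 can be sketched in a few lines. This is an illustration under assumptions, not the author's implementation: local corrections are taken as exact solves with $N_j^T A N_j$, the coarse correction as an exact solve with $P^T A P$, corrections within one color are computed from the same residual, and every function name is hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y


def residual(A, f, z):
    return [fi - azi for fi, azi in zip(f, matvec(A, z))]


def color_correction(A, f, z, color, subdomains):
    """Add the local corrections of all subdomains of one color,
    all computed from the same residual (they could run in parallel)."""
    r = residual(A, f, z)
    z_new = list(z)
    for j in color:
        dofs = subdomains[j]
        A_j = [[A[p][q] for q in dofs] for p in dofs]
        y = solve(A_j, [r[p] for p in dofs])
        for p, yp in zip(dofs, y):
            z_new[p] += yp
    return z_new


def coarse_correction(A, f, z, P_cols):
    """Add P (P^T A P)^{-1} P^T (f - A z); P is given by its columns."""
    r = residual(A, f, z)
    AP = [matvec(A, c) for c in P_cols]
    A_c = [[dot(ci, APj) for APj in AP] for ci in P_cols]
    y = solve(A_c, [dot(c, r) for c in P_cols])
    return [zi + sum(yk * c[i] for yk, c in zip(y, P_cols))
            for i, zi in enumerate(z)]


def schwarz_step(A, f, x, colors, subdomains, P_cols, symmetric=True):
    """One iteration: forward color sweep, coarse correction and,
    optionally, the reverse sweep of step 4."""
    z = list(x)
    for color in colors:
        z = color_correction(A, f, z, color, subdomains)
    z = coarse_correction(A, f, z, P_cols)
    if symmetric:
        for color in reversed(colors):
            z = color_correction(A, f, z, color, subdomains)
    return z


def laplacian_1d(n):
    """Hypothetical SPD model matrix: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]
```

With `symmetric=True` the step is a symmetric operator and can serve as a preconditioner for conjugate gradients, which is how the method is used in the numerical experiments.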

It is a well-known fact (see e.g. Lions' lemma in Mandel, Bjørstad [2], Theorem 3.2, combined with Bramble, Pasciak, Wang, Xu [4], Theorem 2.2, improved by Xu [16], [17]) that the convergence estimates of Algorithm 2.1 are governed by the following theorem:

Theorem 2.2. Let $Q_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(P)$, $Q_i: \mathbb{R}^{n_1} \to \mathrm{Rng}(N_i)$, $i = 1, \dots, J$, be the mappings decomposing unity in $\mathbb{R}^{n_1}$, and let $C_L > 1$ be a constant such that

Then for the error propagation operator E of Algorithm 2.1 we have

where K is the overlap bound in (2).

3. SMOOTHED AGGREGATION COARSE SPACE

The smoothed aggregation technique consists in constructing the prolongator $P$ in the form

$$P = S\hat{P}. \tag{6}$$

Here, $\hat{P}$ is the tentative prolongator responsible for the approximation properties of the coarse space. The purpose of the prolongator smoother $S$ is to enforce the smoothness of the coarse space functions. The construction of such a smoother aims at reducing the energy of the tentative coarse space basis. The precise definitions of $\hat{P}$ and $S$ will be given in Sections 4.1 and 4.2, respectively.

To get a picture of the method, it is worth mentioning in advance that the Schwarz subdomains $\mathrm{Rng}(N_i)$ can be derived from the nonzero structure of the prolongator $P$.

The purpose of this section is to prove abstract convergence bounds under assumptions on the prolongator smoother $S$, the tentative prolongator $\hat{P}$ and the Schwarz domain decomposition subspaces $\mathrm{Rng}(N_i)$.

Assumption 3.1. Let us denote an available upper bound of $\varrho(S^TAS)$ by $\bar{\varrho}_{S^TAS}$. Assume that the prolongator smoother $S: \mathbb{R}^{n_1} \to \mathbb{R}^{n_1}$ satisfies

Further, let $\hat{Q}_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(\hat{P})$ be an operator for which the weak approximation property of the tentative coarse space is satisfied in the following form:

Also, we assume that there exist decomposition operators $\hat{Q}_i: \mathbb{R}^{n_1} \to \mathrm{Rng}(N_i)$ that decompose the unity on $\mathbb{R}^{n_1}$ to Schwarz domain decomposition subspaces $\mathrm{Rng}(N_i)$, $i = 1, \dots, J$, so that for all $w \in \mathbb{R}^{n_1}$ we have

Remark 3.2. Notice that in the weak approximation property (9) of the tentative coarse space we employ an estimate of $\varrho(S^TAS)$, rather than using $\varrho(A)$. As $\varrho(S^TAS) \ll \varrho(A)$, we are here making a weaker assumption than in the standard algebraic multigrid theory (e.g. by Bramble [3], Assumption A.6, p. 40).

The assumption (10) on Schwarz domain decomposition subspaces (cf. e.g. [7]) is here in the role of a smoothing property. If we had just a pure multiplicative Schwarz algorithm without any coarse space, the condition (10) would reduce to the assumption from Lions' lemma

ensuring (together with the overlap bound $K$) convergence of the algorithm. The convergence rate of such an algorithm would, however, depend on the geometry of the problem through the dependence of $C_s$ on the boundary conditions, on the computational domain and on the geometry of the Schwarz subdomains. By adding the coarse grid correction to the multiplicative Schwarz domain decomposition we make a weaker assumption on the Schwarz subdomains by inserting the $\ell^2$ penalizing term into the right-hand side.

In fact, the inequality (10) indicates how wide a range of low-energy errors is reduced by the coarse space correction and what errors have to be dealt with on the fine level by the Schwarz domain decomposition; all this with respect to the smoothing properties of the prolongator smoother $S$. It is obvious that the more "powerful" $S$ we have, the smaller band of low-energy errors we can reduce by the coarse space correction. However, as we will see later, we do not choose too "powerful" smoothers; we are content with a prolongator smoother $S$ which just ensures the energy stability of the smoothed coarse space.

Theorem 3.3. Under Assumption 3.1, Algorithm 2.1 with the prolongator $P$ of the form (6) converges in the energy norm at the rate

where $E$ is the error-propagation operator in (3), $K$ is the overlap bound (2), and $C_L > 1$ is a constant of the form $C_L = (1 + 2C_3)(C_2 + 1)^2 + C_3(C_1 + C_2)^2 + 2C_3$.

Before giving the proof of Theorem 3.3, we have to see what the assumptions (7), (8), and (9) imply for the properties of the smoothed coarse space. For this, we employ an operator splitting introduced by Vaněk, Brezina and Mandel in [12].

Definition 3.4. Having the operator $\hat{Q}_0$ from (9), $S$ from (7)-(8) and decomposition operators $\{\hat{Q}_i\}$ as in (10), we define mappings $Q_i: \mathbb{R}^{n_1} \to \mathrm{Rng}(N_i)$, $i = 1, \dots, J$, and $Q_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(P)$ by

Lemma 3.5. Let the operator $\hat{Q}_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(\hat{P})$ satisfy (9) and let the smoother $S$ satisfy (7) and (8). Then for the smoothed operator $Q_0 = S\hat{Q}_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(S\hat{P})$ we have

and

Proof. We start with the weak approximation property (11) of the smoothed coarse space: applying the triangle inequality, assumption (7), assumption (9) and assumption (8) in this order we get

Verifying the energy stability (12), we introduce the splitting of $S\hat{Q}_0$, then apply the triangle inequality, assumption (7), and the fact that $\|Sw\|_A^2 \le \varrho(S^TAS)\,\|w\|^2$ for all $w \in \mathbb{R}^{n_1}$. By assumption (9) we then get

This concludes the proof.

Lemma 3.6. Under Assumption 3.1 the operators $Q_i$, $i = 0, \dots, J$, from Definition 3.4 satisfy

and for all $u \in \mathbb{R}^{n_1}$ the inequality

holds with the constant $C_L = (1 + 2C_3)(C_2 + 1)^2 + C_3(C_1 + C_2)^2 + 2C_3$.

Proof. The decomposition property of $\{Q_i\}$ in (13) follows directly from the decomposition property of $\{\hat{Q}_i\}$. The energy stability (14) of the decomposition mappings $\{Q_i\}$ can be shown by using Definition 3.4, estimate (12), assumptions (10), (7), and estimates (11) and (12):

which concludes the proof. □

Now we are ready to prove the Convergence Theorem 3.3.

Proof. The proof follows from Assumption 3.1, Lemma 3.6 and the Abstract Convergence Theorem 2.2, taking the operators $Q_i$ of Definition 3.4 as the decomposition mappings in Theorem 2.2. □

4. COMPONENTS OF THE ALGORITHM

In the following sections we will confine ourselves to linear elasticity problems and we will discuss the three components of the algorithm: the properties of the tentative coarse space, the prolongator smoother and the Schwarz domain decomposition subspaces.

Let $\Omega \subset \mathbb{R}^d$ be the bounded computational domain and $\mathcal{T}$ the quasiuniform triangulation of $\Omega$. Further, let $V_i = P1(\mathcal{T})$, $i = 1, \dots, d$, be the corresponding P1 finite element spaces for the $i$-th displacement field and

$$V = \{u \in V_1 \times \dots \times V_d : B_k u = 0 \text{ on selected boundary nodes } k\},$$

where $B_k$ is an arbitrary $d \times d$ matrix.

We assume that the matrix $A$ is a stiffness matrix obtained by discretizing the continuous problem:

$$\text{for given } f \in V \text{ find } u \in V: \quad (u, v)_\varepsilon = (f, v) \quad \forall v \in V,$$

where

The induced energy semi-norm is then $|\cdot|_\varepsilon = (\cdot, \cdot)_\varepsilon^{1/2}$.

4.1. Tentative prolongator.

The aim of this section is to give a precise definition of the tentative prolongator $\hat{P}$ and to verify that such a tentative prolongator satisfies assumption (9).

The tentative prolongator $\hat{P}$ will be created by purely algebraic means based on the aggregation technique in [15]. It rests on the aggregation of all active finite element nodes to a system of disjoint aggregates $\{A_i\}$, $A_i \cap A_j = \emptyset$ for all $i \ne j$.

Definition 4.1. Let us denote by RBM the kernel of $|\cdot|_\varepsilon$ in $[H^1(\Omega)]^d$,

Further, let $\mathbf{RBM}$ be its discrete form, i.e. if we use the finite element interpolator $\Pi_h: \mathbb{R}^{n_1} \to V$, we can find for each $u \in \mathrm{RBM}$ some $\mathbf{u} \in \mathbf{RBM}$ such that $\Pi_h \mathbf{u} = u$ on $\Omega$ apart from one layer of finite elements around the essential boundary conditions.

The reason for excluding the elements around this part of the boundary is that the finite element space $V$ is not able to support RBM there.

Algorithm 4.2. Let $(r_k)_{k=1}^{n_K}$, $n_K = \dim(\mathbf{RBM})$, be a basis of $\mathbf{RBM}$. We construct the tentative prolongator $\hat{P}$ by the following algorithm:

For $i = 1, \dots, J$ do

  Orthonormalize $(r_k^i)_{k=1}^{n_K}$ to get $(\hat{r}_k^i)$.
  Put $\hat{r}_k^i$ to be the $[n_K \cdot i + k]$-th column of $\hat{P}$, $k = 1, \dots, n_K$.

end.

Remark 4.3. The orthonormalization step in Algorithm 4.2 is not required by the theory; it just improves the conditioning of the matrix of the coarse problem.

In the next part, we recall some known facts about Korn's inequality which can be found for example in [9], [8], and [6]. The need for using Korn's inequality stems from the necessity to verify assumption (9) on our tentative coarse space. We want to avoid using Korn's inequality on the whole computational domain $\Omega$, because Korn's constant $C_K$ depends on the domain. Instead, we propose to use (15) on the tentative coarse space restricted to continuous envelopes $B_i$ of the nodal aggregates $A_i$, whose shape we can control in the code.

This of course means that the tentative coarse space restricted to $A_i$ must be able to support exactly the functions from RBM, so that we can project them out in the left-hand side of (15). Note that the tentative coarse space $\mathrm{Rng}(\hat{P})$ with $\hat{P}$ constructed by Algorithm 4.2 has this property.

Definition 4.4. Suppose a domain $D$ is star-shaped with respect to a ball $B$, i.e. for all $x \in D$, the closed convex hull of $\{x\} \cup B$ is a subset of $D$. Let

$$\varrho_{\max} = \sup\{\varrho : D \text{ is star-shaped with respect to a ball of diameter } \varrho\}.$$

Then the chunkiness parameter of $D$ is defined by

To provide estimates independent of the geometry of $\Omega$, we have to make some assumptions on the shape of the aggregates or their envelopes:

Assumption 4.5. For each aggregate $A_i$, let there be a continuous subdomain $B_i \subset \Omega$ which contains all nodes listed in $A_i$ and

1. there is an integer constant $N_A > 0$ such that every point $x \in \Omega$ belongs to at most $N_A$ subdomains $B_i$ (bounded overlaps),

2. $\operatorname{diam}(B_i) \le CH$, $i = 1, \dots, J$,

3. $\operatorname{meas}(B_i) \ge CH^d$, $i = 1, \dots, J$,

4. each $B_i$ is the union of at most $N_\omega$ star-shaped domains $\omega_i^k$ with

   (a) uniformly bounded chunkiness parameter $\gamma(\omega_i^k) \le \gamma_\omega$,

   (b) $\operatorname{meas}\Big(\omega_i^{k+1} \cap \bigcup_{j=1}^{k} \omega_i^j\Big) \ge c_\omega \min\Big\{\operatorname{meas}(\omega_i^{k+1}),\ \operatorname{meas}\Big(\bigcup_{j=1}^{k} \omega_i^j\Big)\Big\}$, $k < N_\omega$.

A simple greedy algorithm (cf. [5]) can be proposed to create the aggregates $\{A_i\}$ which satisfy the above assumption. However, note that here a "hidden" geometrical dependence infiltrates our estimates, due to geometrical dependence of the envelopes $B_i$ along the boundary of $\Omega$: if the boundary of $\Omega$ has "chunks" of a characteristic size less than the size of an aggregate, then the aggregate (and also its envelope $B_i$) adjacent to this part of the boundary will have a large chunkiness parameter.

Lemma 4.6. (Poincaré-Korn) Under Assumption 4.5 on $B_i$, we have

Proof. It is well known [8] for a star-shaped domain $\Omega$ that Korn's constant $C_K(\Omega)$ on the factor space modulo rigid body modes (RBM's) can be controlled by the chunkiness parameter $\gamma(\Omega)$. Furthermore, for two Lipschitz domains $\Omega_1$ and $\Omega_2$, Korn's constant $C_K$ on $\Omega_1 \cup \Omega_2$ can be estimated [6] as

Now, we will find $h_i \in \mathrm{RBM}$ and a constant function $k_i$ such that

Because $(h_i + k_i) \in \mathrm{RBM}$, we can write using the scaled Poincaré inequality, Korn's inequality, and Assumption 4.5 together with the above argument on $C_K$

Having the Poincaré-Korn inequality, we can find a mapping $\hat{Q}_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(\hat{P})$ for which we will verify the weak approximation property of the tentative coarse space $\mathrm{Rng}(\hat{P})$.
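The construction of Algorithm 4.2 can be illustrated by a short sketch. It assumes that the step preceding the orthonormalization restricts each basis vector of the discrete rigid body modes to the degrees of freedom of the aggregate (zero elsewhere), and it uses Gram-Schmidt for the orthonormalization; both choices, the toy modes and all names are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def tentative_prolongator(n, aggregates, modes):
    """Return the columns of the tentative prolongator.

    n          -- number of fine-level degrees of freedom
    aggregates -- disjoint lists of degree-of-freedom indices
    modes      -- global vectors spanning the zero-energy modes
    For each aggregate the modes are restricted to its dofs and
    orthonormalized; each result becomes one column.
    """
    columns = []
    for agg in aggregates:
        local = []
        for m in modes:
            r = [0.0] * n
            for p in agg:
                r[p] = m[p]  # restriction of the mode to the aggregate
            # Gram-Schmidt against columns already built for this aggregate
            for q in local:
                c = dot(q, r)
                r = [ri - c * qi for ri, qi in zip(r, q)]
            norm = math.sqrt(dot(r, r))
            if norm > 1e-12:  # skip modes that are dependent on this aggregate
                local.append([ri / norm for ri in r])
        columns.extend(local)
    return columns
```

Because the aggregates are disjoint, columns belonging to different aggregates have disjoint supports, so the whole set of columns is orthonormal and every mode is reproduced exactly by a combination of the columns.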

Lemma 4.7. Let $A_i$ and $B_i$ be as in Assumption 4.5. Then a mapping $\hat{Q}_0: \mathbb{R}^{n_1} \to \mathrm{Rng}(\hat{P})$, defined locally on each $A_i$ as an $\ell^2$-orthogonal projection onto RBM, i.e.

$$(u - \hat{Q}_0 u, v)_{\ell^2(A_i)} = 0 \quad \forall v \in \mathrm{RBM},\ \forall u \in \mathbb{R}^{n_1},\ i = 1, \dots, J, \tag{17}$$

satisfies

where $C$ depends only on $N_\omega$, $\gamma_\omega$, $c_\omega$, and $N_A$ from Assumption 4.5. Here $d$ is the dimension of the problem.

Proof. For every $A_i$ let $k_i \in \mathrm{RBM}$ be the minimizer of

Using the fact that $\hat{Q}_0$ is an $\ell^2$-projection in the local sense as in (17), and that $k = \hat{Q}_0 k$ for all $k \in \mathrm{RBM}$, together with the Poincaré-Korn inequality (Lemma 4.6) we get

Now, summing over $i = 1, \dots, J$ and using Assumption 4.5 1) on the bounded overlaps of $\{B_i\}$ we get

Lemma 4.7, as we will see later, verifies the weak approximation property (9) of the tentative coarse space.

4.2. Prolongator smoother.

This section is devoted to the specification of our prolongator smoother $S$.

As $S$ is supposed to take care of the energy properties of the smoothed coarse space, it is clear that $S$ must be a function of $A$. For convenience of construction, let us take $S$ to be a polynomial in $A$. It follows from the assumption (8) by taking $u \in \mathrm{RBM}$ that the absolute term of $S$ must be the identity matrix.

In the next part we present the optimal form of the polynomial $S$ and verify its smoothing properties required by Assumption 3.1.

Algorithm 4.8. Let us have an available upper bound $\bar{\varrho}_A$ of $\varrho(A)$, and let us denote by $d_s$ the desired degree of the polynomial smoother. Also, let us have $n_K = \dim(\mathrm{RBM})$ and the number of subdomains $J$. We obtain the smoothed prolongation operator $P = S\hat{P}$ by the following algorithm:

For $i = 1, \dots, J \cdot n_K$

  Set $w$ to be the $i$-th column of $\hat{P}$.
  For $k = 1, \dots, d_s$

    Set $w := w - r_k^{-1} A w$, where $r_k = \bar{\varrho}_A \sin^2\left(\frac{k\pi}{2d_s + 1}\right)$.

  end.
  Set the $i$-th column of $P$ to $w$.

end.

The particular choice of $r_k$ is justified in the next lemma proposed by Mandel: out of the set of polynomials of a fixed degree $d_s$ with an absolute term equal to one we choose the one with the best smoothing properties.

Lemma 4.9. Let $P_n$ be a set of polynomials of degree $n$ such that $p(0) = 1$ for all $p(x) \in P_n$. Then for any integer $n > 0$, there is a unique polynomial $s(x) \in P_n$ such that

The polynomial $s(x)$ is given by

In addition, the polynomial $s(x)$ satisfies

Proof. It follows from the minimization properties of the Chebyshev polynomials that the polynomial

must be a scaled and shifted Chebyshev polynomial of degree $(2n+1)$ such that it satisfies (19) and the constraint $s(0) = 1$. Using this argument one arrives at

where $T_{2n+1}$ is the Chebyshev polynomial and from $s(0) = q'(0)$ we have

This proves (19) and (21).

The polynomial $q(x)$ vanishes at the points $x$ where $T_{2n+1}(1 - 2x) = 1$, that is, where $1 - 2x = \cos(2k\pi/(2n+1))$. The value $k = 0$ gives the simple root $r_0 = 0$ of $q(x)$, while $k = 1, \dots, n$ yield double roots given by (20).

Now we prove (22). This is equivalent to proving $s^2(x) \le 1$ for all $x \in [0, 1]$. Using (24) we will show that

After a substitution for $1 - 2x$ it becomes

which is the well-known fact that the graph of the Chebyshev polynomial lies above its tangent at $x = 1$.

It remains to show (23). We know that $s(x)$ is a polynomial of degree $n$ with real roots only. Further, we know that $q(x)$ is bounded by the line $y = (2n+1)^{-2}$ and that this bound is attained at all local maxima of $q(x)$. Also, $s(0) = 1$. From these facts one deduces that $s(x)$ is bounded within $y = \pm \frac{1}{(2n+1)\sqrt{x}}$ and that the bound is attained $(n+1)$ times. Also, it is clear that $s(x)$ is convex for $x \in [0, r_1]$, where $r_1$ is the smallest root. It follows that the graph of $s(x)$ lies for $x \in [0, 1]$ above its tangent at $x = 0$, i.e.

Now, using (25), $s'(0) = -\sum_{k=1}^{n} r_k^{-1}$, the estimate $\sin x \ge \frac{2}{\pi} x$, $x \in [0, \frac{\pi}{2}]$, and $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$, we can prove (23):

which concludes the proof.

Lemma 4.10. The polynomial prolongator smoother $S = s\left(\frac{1}{\bar{\varrho}_A} A\right)$ as in Algorithm 4.8, where $\bar{\varrho}_A$ is an available upper bound of $\varrho(A)$, commutes with $A$ and satisfies

and

Proof. The proof follows directly from Lemma 4.9 by using the spectral mapping theorem. □

We will see later that for one particular choice of $d_s$ Lemma 4.10 verifies the assumptions (7) and (8).
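The smoothing loop of Algorithm 4.8 and the scalar polynomial behind it can be sketched as follows. The sketch assumes the roots $\sin^2(k\pi/(2d_s+1))$, $k = 1, \dots, d_s$, which follow from the condition $1 - 2x = \cos(2k\pi/(2n+1))$ in the proof of Lemma 4.9; the function names are hypothetical and the dense list-of-rows matrix is only for illustration.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def smoother_roots(ds):
    """Roots of the scalar polynomial s on [0, 1] (assumed form)."""
    return [math.sin(k * math.pi / (2 * ds + 1)) ** 2 for k in range(1, ds + 1)]


def s_poly(x, ds):
    """Evaluate s(x) = prod_k (1 - x / x_k), so that s(0) = 1."""
    value = 1.0
    for xk in smoother_roots(ds):
        value *= 1.0 - x / xk
    return value


def smooth_column(A, w, rho_bar, ds):
    """Apply S = s(A / rho_bar) to one column w of the tentative
    prolongator by ds steps w := w - A w / r_k."""
    for xk in smoother_roots(ds):
        r_k = rho_bar * xk
        Aw = matvec(A, w)
        w = [wi - awi / r_k for wi, awi in zip(w, Aw)]
    return w
```

On a grid of points in $[0, 1]$ one can observe that $x\,s^2(x)$ stays below $(2d_s+1)^{-2}$ and that $|s(x)| \le 1$, in line with the bounds used in the proof of Lemma 4.9. Each smoothing step costs one multiplication by $A$, so the setup cost grows linearly with the degree $d_s$.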

4.3. Schwarz subspaces.

It remains to specify the Schwarz subdomains and verify the assumption (10).

Assumption 4.11. Each computational subdomain

satisfies

1. $\operatorname{diam}(\Omega_i) \le CH$,

2. for all $x \in \Omega$ there exists $\Omega_k$ such that $x \in \Omega_k$ and $\operatorname{dist}(x, \partial\Omega_k \setminus \partial\Omega) \ge cH$,

3. there exists an integer constant $N_\Omega$ such that every point $x \in \Omega$ belongs to at most $N_\Omega$ subdomains $\Omega_j$ (bounded overlaps).

Remark 4.12. For example, the computational subdomains can be derived from the supports of coarse-space basis functions. Information about the coarse-space basis functions is contained in $n_K$-tuples of columns of $P$, $n_K = \dim(\mathrm{RBM})$. For this choice of the computational subdomains, one can show that Assumption 4.11 on the geometry of the computational subdomains follows from Assumption 4.5 on the aggregates $\{A_i\}$ and from the choice of the degree of the prolongator smoother $\deg(S) \approx H/h$.

Now, we will construct the unity-decomposing mappings $\hat{Q}_i$ that satisfy the assumption (10). For this purpose we define a set of weight functions $\{\psi_k\}$ which will be applied as masks to any finite element function $w \in V$ to decompose $w$ to the subdomains.

Lemma 4.13. From Assumption 4.11 it follows that there is a set of functions

By Assumption 4.11 1) and 3), and from Assumption 4.11 2) we have

Proof. We will provide just an outline of this proof; for the full version see [13] or [7]. For each $\Omega_k$ we define

Further, one can show that

Let us set

For $w(x)$ one can show that

and from (28) using the known equivalence $|\cdot|_{\mathrm{Lip}} \approx |\cdot|_{W^{1,\infty}}$ one also has

The function $\psi_k(x)$ is then given by $\psi_k(x) = w(x)\,\tilde{\psi}_k(x)$, with $\tilde{\psi}_k$ the function defined for $\Omega_k$ above, which trivially satisfies statements 2) and 3) of the lemma. Finally, from (28), (29), and (30) it follows that

almost everywhere on $\Omega$, which is statement 1) of the lemma. □

Now, we have to deal with the fact that for a P1 finite element function $w \in V$ the product $\psi_k w$ is not a P1 finite element function: we define an operator $I_h: [H^1(\Omega)]^d \to V$,

where the sum is taken over all nodes $k$ in the finite element mesh, $x_k$ are their coordinates in $\Omega$ and $\{v_k\}$ is a finite element P1 basis.

Lemma 4.14. The operator $I_h$ defined by (31) satisfies

where $C$ is independent of $h$ and depends on the aspect ratio of the finite elements.

Proof. Let us denote by $\mathcal{T}$ the finite element P1 triangulation understood as a set of finite elements $T_j$. Then for every $v \in [H^1(\Omega)]^d$,

where $k_j$ is the minimizer of $\min_{k \in \mathrm{RBM}(\Omega)} \|v - k\|_{[H^1(T_j)]^d}$.

Let us prescribe function values $(f_1, \dots, f_N)$ at all $N$ vertices $(x_1, \dots, x_N)$ of one element $T_j$. Then it is obvious that the minimizer of

is a linear function, i.e. for all $v \in [H^1(T_j)]^d$ we have $|I_h v|_{[H^1(T_j)]^d} \le |v|_{[H^1(T_j)]^d}$.

Using this argument, the fact that rigid body modes are approximated exactly on P1 elements, and Korn's inequality we have for all $v \in [H^1(\Omega)]^d$ on all finite elements $T_j$ the inequality

Note that Korn's constant depends on the chunkiness parameter of each of the P1 finite elements $T_j$, i.e. on their aspect ratio.

Finally, when we substitute (34) into (33) we get

which completes the proof. □

Having still in mind the verification of the assumption (10), we are now ready to give a constructive proof of the following lemma.

Lemma 4.15. Under Assumption 4.11 there exist operators $\hat{Q}_i: V \to V_i$, $i = 1, \dots, J$, decomposing the unity in $V$ such that

with a constant $C$ which depends only on the aspect ratio of the elements and on the overlap bound $N_\Omega$.

Proof. Let us set

$$\hat{Q}_k = I_h \psi_k, \quad k = 1, \dots, J,$$

where $I_h: [H^1(\Omega)]^d \to V$ is as in (31) and $\psi_k$ is as in Lemma 4.13.

The decomposition of unity of Lemma 4.13 2) immediately yields the decomposition of unity by $\hat{Q}_k$.

Using the stability of $I_h$ in $|\cdot|_\varepsilon$ proved in Lemma 4.14 we get

Investigating only the integrand and applying Lemma 4.13, we obtain

Substituting into (36), we conclude

This inequality together with bounded overlaps in Assumption 4.11 3) gives

which proves (35).

5. CONVERGENCE THEOREM

In this section we sum up the properties of the three components of the algorithm (the tentative prolongator, the prolongator smoother and the Schwarz domain decomposition) to verify the abstract Assumption 3.1 and to state the convergence theorem for the Schwarz domain decomposition algorithm with a coarse space by smoothed aggregation.

Theorem 5.1. Consider Algorithm 2.1 with a coarse space by smoothed aggregation using the tentative prolongator $\hat{P}$ from Algorithm 4.2 satisfying Assumption 4.5, together with the prolongator smoother $S$ constructed by Algorithm 4.8, for which $cH/h \le \deg(S) \le CH/h$, and the computational subdomains satisfying Assumption 4.11. Such an algorithm converges at the rate

where $E$ is the error-propagation operator as in (3), $K$ is the overlap bound (2), and $C_L > 1$ is a constant which depends only on $N_\omega$, $\gamma_\omega$, $c_\omega$, $N_\Omega$, $N_A$ and the aspect ratio of the finite elements.

Proof. By the Gershgorin estimate of $\varrho(A)$ we set $\bar{\varrho}_A = Ch^{d-2}$. The choice of $\bar{\varrho}_{S^TAS}$ follows from (26). Then the assumptions (7) and (8) on the smoother $S$ from the abstract convergence theory are verified by Lemma 4.10. The assumption (9) on the tentative prolongator $\hat{P}$ is verified in Lemma 4.7 under Assumption 4.5 on the aggregates. The assumption (10) on the Schwarz subspaces follows from Lemma 4.15 under Assumption 4.11 on the computational subdomains. □

6. COMPUTATIONAL COMPLEXITY

Following Vaněk and Brezina in [5], we will now give an asymptotic bound on the amount of floating point operations needed to carry out the iteration to reduce the error to the truncation level. We will give estimates for the implementation on both serial and parallel architectures.

Let $N_{es}$ denote the typical number of elements per subdomain, let $d$ be the dimension of the space on which the continuous problem is cast, and $n$ the number of all degrees of freedom in the whole system.

Let us first compute the amount of work needed for the setup. On a machine with a single CPU, we need $O(\deg(S)\,n)$ operations to compute the prolongator $P = S\hat{P}$. Taking into account that $\deg(S) \approx \frac{H}{h} \approx N_{es}^{1/d}$, this becomes $O(N_{es}^{1/d}\,n)$. Further, we need $O\big(\frac{n}{N_{es}} N_{es}^{\frac{3d-2}{d}}\big)$ and $O\big(\big(\frac{n}{N_{es}}\big)^{\frac{3d-2}{d}}\big)$ operations to compute Cholesky factorizations of the local and coarse level matrices, respectively. We also need $O(n)$ operations to compute the coarse level matrix, but this number can be neglected, as it is dominated by other expenditures.

Each step of the iteration requires $O\big(\frac{n}{N_{es}} N_{es}^{\frac{2d-1}{d}}\big)$ and $O\big(\big(\frac{n}{N_{es}}\big)^{\frac{2d-1}{d}}\big)$ operations to compute the back substitutions in the local and coarse spaces, respectively. The amount of work required to compute the defect, the corrections and the restriction is $O(n)$ and hence negligible.

Taking into account all the above listed expenditures, we use trivial calculus to conclude that the optimal value of the number of elements per subdomain is $N_{es}^{opt} = n^{\frac{2d-2}{5d-4}}$. That is, $N_{es}^{opt} = n^{1/3}$ for 2D problems and $N_{es}^{opt} = n^{4/11}$ for 3D problems. The total amount of work involved in the setup and the iterations for these optimal values is $O(n^{4/3})$ and $O(n^{49/33})$ in 2D and 3D, respectively.

The reason why we have introduced the coloring classes $C_i$ in the algorithm was to facilitate the use of modern parallel architecture computers. For simplicity, we assume that we have at least $n^{1/2}$ processors. Then most of the procedures can take advantage of parallel implementation. In the evaluations of computational work we omit all operations costing $O(n)$ operations.

The setup will then require $O(\deg(S)\,n^{1/2})$ operations to compute the prolongator $P = S\hat{P}$. If we assume that the local Cholesky decompositions are performed in parallel, we need $O\big(N_{es}^{\frac{3d-2}{d}}\big)$ and $O\big(\big(\frac{n}{N_{es}}\big)^{\frac{3d-2}{d}}\big)$ operations to compute Cholesky factorizations of the local and coarse level matrices, respectively.

Each step of the iteration will require $O\big(N_{es}^{\frac{2d-1}{d}}\big)$ and $O\big(\big(\frac{n}{N_{es}}\big)^{\frac{2d-1}{d}}\big)$ operations to compute the back substitutions in the local and coarse spaces, respectively.

Balancing these values, we obtain that the optimal size of a subdomain is about $n^{1/2}$ in both 2D and 3D. The resulting computational complexity can be then bounded by $O(n)$ in 2D and by $O(n^{7/6})$ in 3D.

The above estimates show that the amount of work required to complete the whole iterative process (including its setup) is asymptotically lower even than the back substitution step of direct methods based on matrix factorization, which would be $O(n^{3/2})$ and $O(n^{5/3})$ in 2D and 3D, respectively.
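The balancing argument can be reproduced with exact rational arithmetic. The sketch below assumes that, in the serial case, the dominant setup terms are the local and coarse Cholesky factorizations with exponents $(3d-2)/d$, and that the optimum is found by equating them; in the parallel case the local and coarse factorization costs are equated directly. The function names are hypothetical.

```python
from fractions import Fraction


def serial_exponents(d):
    """Return (alpha, beta): subdomain size n**alpha, total work n**beta.

    Balances n * N**((2d-2)/d) (all local factorizations) against
    (n / N)**((3d-2)/d) (coarse factorization) for N = n**alpha.
    """
    alpha = Fraction(2 * d - 2, 5 * d - 4)
    beta = 1 + alpha * Fraction(2 * d - 2, d)
    return alpha, beta


def parallel_exponents(d):
    """Return (alpha, beta) when local factorizations run in parallel.

    Balances N**((3d-2)/d) against (n / N)**((3d-2)/d), giving N = n**(1/2).
    """
    alpha = Fraction(1, 2)
    beta = alpha * Fraction(3 * d - 2, d)
    return alpha, beta


def direct_back_substitution_exponent(d):
    """Exponent of a banded back substitution, n**((2d-1)/d)."""
    return Fraction(2 * d - 1, d)
```

For $d = 2$ and $d = 3$ the serial work exponents stay below the corresponding back-substitution exponents of a banded direct solver, which is the comparison made in the closing paragraph of this section.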


7. NUMERICAL EXPERIMENTS

7.1. Cube-test.

We have tested numerical stability on the following example: let us pose a linear elasticity problem on a geometry as in Fig. 1 consisting of a cube with a truss attached to one of its sides. On one end of the truss we set the displacement to be zero and on the opposite face of the cube we place a load-force pointing downwards. Let us consider a variable width of the truss. The model is discretized by P1 finite elements and aggregation is performed to create aggregates of radius 2. The degree of the prolongator smoother is also 2. All subspace problems (on Schwarz subdomains and on the coarse space) are solved by LU decomposition. Table 1 compares the numerical performance of a non-preconditioned conjugate gradient method (CGM) to the conjugate gradient method preconditioned by our algorithm (PCGM). We stop the CGM iteration loop once the relative preconditioned residual satisfies

where $C$ denotes the domain decomposition preconditioner, $\mathrm{cond}(C, A)$ is the condition number estimate computed at runtime and $e_i$ is the error after the $i$-th iteration. In this case we have used $\varepsilon = 1.0 \cdot 10^{-6}$.

Fig. 1. Geometry of cube-test.

| width $a$ | cond(A) | iters. CGM | cond(C, A) | iters. PCGM | no. of nodes |
|---|---|---|---|---|---|
| no truss | 4.96 · 10^4 | 588 | 9.57 | 18 | 65025 |
| 12 | 2.62 · 10^7 | 1215 | 7.99 | 16 | 66621 |
| 6 | 2.98 · 10^8 | 1118 | 7.99 | 16 | 65557 |
| 4 | 1.40 · 10^9 | 1509 | 8.16 | 19 | 65310 |
| 2 | 1.69 · 10^10 | 2281 | 8.01 | 18 | 65139 |

Table 1. Comparison of CGM vs. PCGM in cube-test.

Observe that as the truss is made thinner we lose control over the condition number of the non-preconditioned problem.

7.2. Industrial models.

In this section we give results of numerical experiments on some industrial models obtained by courtesy of the University of Colorado in Denver. Again, we have used the symmetric Schwarz overlapping domain decomposition algorithm as a preconditioner of the conjugate gradient method with the same stopping condition (38), in this case with $\varepsilon = 1.0 \cdot 10^{-5}$. All experiments have been performed by courtesy of the Supercomputer Centre at the University of West Bohemia in Plzen, Czech Republic, on a Digital AlphaServer 8400, kirke.zcu.cz, at 3330-4800 MFlops/s with 2 GB of RAM, using all of its 8 processors.

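| name | no. of nodes | dofs on node | ‖A‖∞ | input data | mesh |
|---|---|---|---|---|---|
| test1 | 12,125 | 3 | 4.35 · 10^8 | 16.5 MB | n.a. |
| solid1 | 25,058 | 3 | 2.88 · 10s | 27.7 MB | Fig. 2 |
| solid2 | 40,329 | 3 | 1.49 · 10^7 | 75.9 MB | n.a. |
| shell | 9,915 | 6 | 3.18 · 10^5 | 20.0 MB | n.a. |

Table 2. Description of experimental input data

| aggregates radius = deg(S) | no. of domains | colors | iter. | cond | setup CPU/wall time [s] | iteration CPU/wall time [s] | RAM [MB] |
|---|---|---|---|---|---|---|---|
| 1 | 620 | 18 | 13 | 8.08 | 193.4/302 | 26.9/133 | 47.6 |
| 2 | 206 | 14 | 12 | 7.56 | 45.27/93 | 36.8/166 | 83.0 |

Table 3. test1

| aggregates radius = deg(S) | no. of domains | colors | iter. | cond | setup CPU/wall time [s] | iteration CPU/wall time [s] | RAM [MB] |
|---|---|---|---|---|---|---|---|
| 1 | 1,433 | 25 | 6 | 1.83 | 155/178 | 25/36 | 212 |
| 2 | 346 | 21 | 6 | 2.28 | 108/143 | 44/79 | 445 |
| 3 | 140 | 18 | 7 | 3.04 | 210/378 | 68/156 | 765 |

Table 4. solid1

| aggregates radius = deg(S) | no. of domains | colors | iter. | cond | setup CPU/wall time [s] | iteration CPU/wall time [s] | RAM [MB] |
|---|---|---|---|---|---|---|---|
| 1 | 1,082 | 20 | 13 | 7.04 | 136/250 | 120/193 | 465 |
| 2 | 295 | 20 | 13 | 8.80 | 850/3297 | 410/493 | 1,300 |

Table 5. solid2

| aggregates radius = deg(S) | no. of domains | colors | iter. | cond | setup CPU/wall time [s] | iteration CPU/wall time [s] | RAM [MB] |
|---|---|---|---|---|---|---|---|
| 1 | 1,215 | 9 | 9 | 3.47 | 652/793 | 25/84 | 78.6 |
| 2 | 450 | 12 | 9 | 4.02 | 108/393 | 22/161 | 84.8 |
| 3 | 235 | 11 | 9 | 4.57 | 49/95 | 28/117 | 125 |

Table 6. shell

Fig. 2. solid1, courtesy of Prof. Charbel Farhat, University of Colorado at Boulder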

References

[1] R. A. Adams: Sobolev Spaces. Academic Press, London, 1975.

[2] P. E. Bjørstad, J. Mandel: On the spectra of sums of orthogonal projections with applications to parallel computing. BIT 31 (1991), 76-88.

[3] J. H. Bramble: Multigrid Methods. Pitman Res. Notes Math. Ser. 296, Longman Scientific and Technical, 1993.

[4] J. H. Bramble, J. E. Pasciak, J. Wang, J. Xu: Convergence estimates for product iterative methods with applications to domain decomposition and multigrid. Math. Comp. 57 (1991), 1-21.

[5] M. Brezina, P. Vaněk: One Black-box Iterative Solver. University of Colorado, Denver, 1997, to appear.

[6] C. M. Dafermos: Some remarks on Korn's inequality. ZAMP 19 (1968), 913-920.

[7] M. Dryja, O. Widlund: An additive variant of the Schwarz method for the case of many subregions. Technical Report. Courant Institute of Mathematical Sciences 339, 1987.

[8] V. A. Kondratiev, O. A. Oleinik: Hardy's and Korn's type inequalities and their applications. Rend. Mat. Appl. (7) 10 (1990), 641-666.

[9] J. Nečas, I. Hlaváček: Introduction to the Mathematical Theory of Elastic and Elasto-plastic Bodies. TKI, SNTL Praha, 1983. (In Czech.)

[10] K. Rektorys: Variational Methods in Engineering and Problems of Mathematical Physics. TKI, SNTL Praha, 1974. (In Czech.)

[11] P. Vaněk: Acceleration of convergence of a two-level algorithm by smoothing transfer operator. Appl. Math. 37 (1992), 265-274.

[12] P. Vaněk, M. Brezina, J. Mandel: Convergence of Algebraic Multigrid Based on Smoothed Aggregation. University of Colorado, March 1998, to appear.

[13] P. Vaněk, M. Brezina, R. Tezaur: Two-level method for solids on unstructured meshes. To appear in SIAM J. Sci. Comput.

[14] P. Vaněk, J. Křížková: Two-level Method on Unstructured Meshes With Convergence Rate Independent of the Coarse-Space Size. University of West Bohemia, Plzen, preprint no. 70, Jan 1995.

[15] P. Vaněk, J. Mandel, M. Brezina: Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 55 (1996), 179-196.

[16] J. Xu: Iterative methods by space decomposition and subspace correction. SIAM Review 34, 4 (1992), 581-613.

[17] J. Xu: An Introduction to Multilevel Methods. VII. Numerical Analysis Summer School. University of Leicester, UK, to be published by Oxford University Press.

Author's address: Ales Janka, Zapadoceska univerzita, katedra matematiky fakulty Aplikovanych ved, Univerzitni 22, 30614 Plzen, Czech Republic, e-mail: janka3Qkma.zcu.cz.
