27
A Sparse H-Matrix Arithmetic. Part II: Application to Multi-Dimensional Problems W. Hackbusch and B. N. Khoromskij, Leipzig Received March 3, 1999 Abstract The preceding Part I of this paper has introduced a class of matrices (H-matrices) which are data- sparse and allow an approximate matrix arithmetic of almost linear complexity. The matrices discussed in Part I are able to approximate discrete integral operators in the case of one spatial dimension. In the present Part II, the construction of H-matrices is explained for FEM and BEM applications in two and three spatial dimensions. The orders of complexity of the various matrix operations are exactly the same as in Part I. In particular, it is shown that the applicability of H-matrices does not require a regular mesh. We discuss quasi-uniform unstructured meshes and the case of composed surfaces as well. AMS Subject Classifications: 65F05, 65F30, 65F50, 65N38, 68P05, 45B05, 35C20. Key Words: Fast algorithms, hierarchical matrices, hierarchical block partitioning, sparse matrices, matrix inversion, BEM, FEM. 1. Introduction In Part I [6], the class of H-matrices is introduced and it is shown that this technique provides an ecient tool for sparse hierarchical approximation to large and fully populated stiness matrices arising in BEM (boundary element method) and FEM 1 applications. In particular, the storage, the matrix-vector multiplica- tion and standard matrix operations like the (truncated) matrix-matrix product and matrix inversion of H-matrices have a complexity between On and On log 2 n, where n is the problem size 2 . For example, the arithmetic of H- matrices can be applied to Schur-complements of H-matrices. The construction of H-matrices involves the same cluster tree of the underlying domain as the panel clustering technique (see [10] or [4, 11, 13]). The panel clus- tering matrix representation uses a row-wise clustering procedure and provides a matrix-vector multiplication of the complexity On log d 1 n for boundary element problems posed in R d ; d 2; 3: However, these panel clustering matrices cannot Computing 64, 21–47 (2000) 1 In the FEM case, the inverse matrix is the full one which needs a data-sparse representation. 2 If rank-k matrices are used for the block matrices, the constant in O... depends on k.

A Sparse H-Matrix Arithmetic

Embed Size (px)

Citation preview

Page 1: A Sparse H-Matrix Arithmetic

A Sparse H-Matrix Arithmetic.

Part II: Application to Multi-Dimensional Problems

W. Hackbusch and B. N. Khoromskij, Leipzig

Received March 3, 1999

Abstract

The preceding Part I of this paper has introduced a class of matrices (H-matrices) which are data-sparse and allow an approximate matrix arithmetic of almost linear complexity. The matrices discussedin Part I are able to approximate discrete integral operators in the case of one spatial dimension.

In the present Part II, the construction of H-matrices is explained for FEM and BEM applications intwo and three spatial dimensions. The orders of complexity of the various matrix operations areexactly the same as in Part I. In particular, it is shown that the applicability of H-matrices does notrequire a regular mesh. We discuss quasi-uniform unstructured meshes and the case of composedsurfaces as well.

AMS Subject Classi®cations: 65F05, 65F30, 65F50, 65N38, 68P05, 45B05, 35C20.

Key Words: Fast algorithms, hierarchical matrices, hierarchical block partitioning, sparse matrices,matrix inversion, BEM, FEM.

1. Introduction

In Part I [6], the class of H-matrices is introduced and it is shown that thistechnique provides an e�cient tool for sparse hierarchical approximation to largeand fully populated sti�ness matrices arising in BEM (boundary element method)and FEM1 applications. In particular, the storage, the matrix-vector multiplica-tion and standard matrix operations like the (truncated) matrix-matrix productand matrix inversion of H-matrices have a complexity between O�n� andO�n log2n�, where n is the problem size2. For example, the arithmetic of H-matrices can be applied to Schur-complements of H-matrices.

The construction of H-matrices involves the same cluster tree of the underlyingdomain as the panel clustering technique (see [10] or [4, 11, 13]). The panel clus-tering matrix representation uses a row-wise clustering procedure and provides amatrix-vector multiplication of the complexity O�n logd�1 n� for boundary elementproblems posed in Rd ; d � 2; 3: However, these panel clustering matrices cannot

Computing 64, 21±47 (2000)

1 In the FEM case, the inverse matrix is the full one which needs a data-sparse representation.2 If rank-k matrices are used for the block matrices, the constant in O�. . .� depends on k.

Page 2: A Sparse H-Matrix Arithmetic

cheaply be multiplied or inverted. For this purpose, the H-matrices are based ona further block-cluster tree, which leads to a rather general block decompositionof the matrix. Such a block decomposition is discussed in Part I [6] for a regularone-dimensional mesh. Here we concentrate on the corresponding construction ofH-matrices for 2D and 3D applications. In particular, these matrices approxi-mate dense matrices arising in 2D and 3D boundary element Galerkin/collocationmethods. Our assumption (12) on the kernel is also typical for the reliability ofwavelet approximation techniques (cf. [1, 2, 15]). However, di�erent from waveletapplications, we do not require (global or piecewise) smoothness of the surface(normal direction).

The practical implementation of the H-matrices is uniquely de®ned by thecluster tree and the choice of the far ®eld condition (9). A general constructionof the block-cluster trees is presented in x2 and x3. The reliability of H-matrixapproximations in BEM will be brie¯y discussed in x3.5. The complexityanalysis is prepared in several steps. First, the case of a regular 2D tensor-product mesh is analysed, see x4. In a second step, quasi-uniform and shape-regular unstructured triangulations are admitted (x5.1). In this way, the resultsobtained for the tensor-product meshes are used for the construction of anasymptotically optimal approximation to the minimal admissible cluster tree onthe given unstructured grid. More complicated case like manifolds composedfrom smooth patches are discussed in x5.2. Concerning the 3D case, we describethe H-matrices for regular 3D meshes in x6. Generalisations to unstructuredmeshes are completely analogous to the 2D case. Although the same techniquescan be applied also to any dimension d > 3, we omit this case.

The discussion of the H-matrix arithmetic is to completed by further papers3

about the numerical performance, the applicability to adaptive (non-quasi uni-form) grids, and the analysis of anisotropic kernels etc.

2. The Cluster Tree

We recall the de®nitions of an H-tree and of the particular partitionings intro-duced in [6]. Let I be a ®nite index set. Consider the vector space KI consisting ofvectors v � �vi�i2I over the ®eld K 2 fR;Cg. The usual block partitioning of avector is described by a (®xed) partitioning of I into disjoint subsets, i.e.,P � fIj : 1 � j � kg with

I � �[k

j�1Ij: �1�

In the following, we consider many di�erent partitionings including (locally)coarse or ®ne partitionings. The set of these partitionings is hierarchically struc-tured and is uniquely de®ned by the tree T � T �I�. The name H-tree is due to itshierarchical structure. The exact description of T is given in De®nition 2.1.Therein we use the notation

3 Recent, already available papers are [7±9].

22 W. Hackbusch and B. N. Khoromskij

Page 3: A Sparse H-Matrix Arithmetic

S�t� :� fs 2 T : s is son of tg for t 2 T �2�

for the sons of a vertex. A leaf is characterised by S�t� � ; �or #S�t� � 0�.De®nition 2.1. Let I be an index set. A tree T is called an H-tree (based on I) if thefollowing conditions hold:

(i) I 2 T .(ii) If t 2 T is no leaf, S�t� contains disjoint subsets of I and t is the union of its sons,i.e.,

t � �[s2S�t�

s: �3�

We conclude that I is always the root of T and t � I holds for all t 2 T . Usually,the tree is constructed such that #S�t� 6� 1, i.e., either t is a leaf or it has at leasttwo sons (cf. De®nition 2.1 in [6]). Note that #S�t� � 1 and s 2 S�t� imply s � tbecause of (3). However, in view of later theoretical constructions, we do notrequire #S�t� 6� 1 in De®nition 2.1.

Since the subsets t � I which form the vertices of T , are called clusters, the H-treeT is also named cluster tree.

As in [6], the set of all leaves is denoted by

L�T � :� ft 2 T : S�t� � ;g:In the following, we restrict all partitionings to those which are built by setscontained in the tree T . For these we use the name H-partitioning (or, T-parti-tioning, if we want to refer to the tree T ), i.e., P � fIj : 1 � j � kg with (1) is aT -partitioning of I if Ij 2 T (or equivalently, P � T �. The set of all suchT -partitionings is denoted by P�T �.

Remark 2.2. (i) P�T � � fL�T 0� : T 0 is a subtree of T and an H-treeg: (ii) Thereis a one-to-one mapping between T-partitionings and H-subtrees T 0; given byT 0 7! P :�L�T 0� 2 P�T �.

Proof: Part (i) follows from (ii). For the proof of (ii) let anH-subtree T 0 be given.Then (3) ensures I � �Ss2L�T 0� s; hence, P :�L�T 0� is a T -partitioning. If a T -partitioning P is given, consider the subtree T 0 of T consisting of all t 2 T witht\ Ij � Ij or t\ Ij � ; for all Ij 2 P . (

Remark 2.2 (ii) allows to de®ne a partial ordering.

De®nition 2.3. If P 0 �L�T 0� and P 00 �L�T 00� are two T-partitionings, P 0 is called®ner (coarser) than P 00 if and only if T 0 � T 00 �T 0 � T 00�.

So far, we have admitted arbitrary subsets of I as vertices of the tree T . Later,each index i 2 I will carry a position xi 2 Rd (e.g., d � 3). We may also identifythe index i with the position xi. This allows to de®ne a diameter

A Sparse H-Matrix Arithmetic. Part II 23

Page 4: A Sparse H-Matrix Arithmetic

diam�t� :� maxi;j2tkxi ÿ xjk for t 2 T �4�

using the Euclidean norm in Rd . Since we want to have many indices in t with tpossessing a diameter as small as possible, the naming cluster for t 2 T makessense and leads to the name cluster tree for T . Further, we will need the distance oftwo clusters:

dist�s; t� :� mini2s;j2t

kxi ÿ xjk for s; t 2 T : �5�

Remark 2.4. For Galerkin discretisations it is more reasonable to associate eachindex i with the support4 Xi � Rd of the corresponding basis functions. We in-troduce the notation

X �t� :�[i2t

Xi for t 2 T : �6�

In this case, the de®nitions (4) and (5) become

diam�t� � maxx;y2X �t�

kxÿ yk for t 2 T ; �7�

dist�s; t� � minx2X �t�;y2X �s�

kxÿ yk for s; t 2 T : �8�

The construction of the cluster tree T is the essential part of the H-matrix con-struction. In x4 we describe the cluster tree for a particular two-dimensional grid.This gives rise to a more general cluster tree for general two-dimensional mani-folds (x5). The three-dimensional case is discussed in x6. Other algorithms forgenerating the cluster tree can be considered as well (compare, e.g., Lage [12]).

3. The Block-Cluster Tree

While the vector components are indexed by i 2 I , the matrix entries have indicesfrom the index set I � I . The block-cluster tree is nothing but the cluster tree forI � I instead of I . Its notation is T2 � T �I � I�, while we write T1 � T �I� for theprevious cluster tree corresponding to I . We will describe a mapping s : T1 7!T2

which constructs the block-cluster tree T2 in a unique way from the cluster tree T1

discussed above. Therefore, the block-cluster tree T2 is ®xed as soon as the clustertree T1 is de®ned. In any case, the vertices of T2 belong to the Cartesian productT � T .

The H-matrices will be constructed on the basis of a particular (optimal) blockpartitioning P2 � T2.

4 Replace Xi � Rd by Xi � C � Rd�1 in the case of a manifold C.

24 W. Hackbusch and B. N. Khoromskij

Page 5: A Sparse H-Matrix Arithmetic

3.1. Construction of T2 from T

We describe two mappings s : T 7! T2. The simpler one is given in

Construction 3.1. Start with I � I 2 T2 and de®ne the sons of b � �t1; t2� 2 T2

(where t1; t2 2 T ) recursively by

� �s1; s2� with s1 2 S�t1�; s2 2 S�t2�, provided these sons exist,

� �t1; s2� with s2 2 S�t2�; if S�t1� � ; and S�t2� 6� ;;� �s1; t2� with s1 2 S�t1�; if S�t1� 6� ; and S�t2� � ;,� S2�b� � ; if S�t1� � ; and S�t2� � ;.In the latter case, S2 denotes the set function (2) for T2 instead of T1 � T �I�. Wecollect some trivial results in the next remark.

Remark 3.2. a� The depth of the tree T2 equals the depth of T.

b� If all branches of T have the same length k; only the ®rst and fourth cases ofConstruction 3:1 occur.c� Assume the case of �b�. If T is a binary tree, then T2 is a quadtree.

Due to Part (c) of the remark, we might like to modify Construction 3.1. Thesecond construction tries to ensure that the components t1; t2 2 T in b � �t1; t2�2 T2 do not have too di�erent diameters, while the degree of the verticesb � �t1; t2� equals the degree of either t1 or t2 2 T . This leads to the

Construction 3.3. Start with I � I 2 T2 and de®ne the sons of b � �t1; t2� 2 T2

�where t1; t2 2 T � recursively by

� �s1; t2� with s1 2 S�t1�; if S�t2� � ; or diam�t1� � diam�t2�; provided S�t1� 6� ;,� �t1; s2� with s2 2 S�t2�; if S�t1� � ; or diam�t1� < diam�t2�; provided S�t2� 6� ;,� S2�b� � ; if S�t1� � ; and S�t2� � ;.Similarly, one can replace the goal diam�t1� � diam�t2� by other options, e.g.,#t1 � #t2 (i.e., t1 and t2 should contain a similar number of indices). In the lattercase, a binary tree T1 leads to a binary tree T2.

It is an easy exercise to check that T2 satis®es the conditions of De®nition 2.1, i.e.,in both cases we obtain an H-partitioning of the product set I � I .

3.2. Admissible Blocks, Admissible T2-Partitionings

To guarantee a su�cient approximation, we need the admissibility condition

minfdiam�t1�; diam�t2�g � 2g dist�t1; t2� �9�

for the block b � �t1; t2�. Here, g < 1 is a constant which will be ®xed later (see,e.g., (16)).

A Sparse H-Matrix Arithmetic. Part II 25

Page 6: A Sparse H-Matrix Arithmetic

De®nition 3.4. A block b � �t1; t2� 2 T2 is called admissible if either b is a leaf or�9� holds.

In x2, we have introduced a T2-partitioning of I � I . It can be regarded as the setL�T 0�, where T 0 is a subtree of T2 with the properties I � I 2 T 0 and (3) withrespect to I � I . Another name for the T2-partitioning would be a covering ofI � I , since it is a subset P � fb1; . . . ; bpg � T2 of disjoint blocks with[1�i�pbi � I � I .

De®nition 3.5. A T2-partitioning P of I � I is called admissible, if all blocks t 2 Pare admissible.

A trivial example for an admissible T2-partitioning is P �L�T2�.

Remark 3.6. Let P 0 �L�T 0� and P 00 �L�T 00� be two di�erent admissible T2-partitionings, then the intersection T 0 \ T 00 yields an admissible T2-partitioningP �L�T 0 \ T 00�, which is ®ner than P 0 and P 00 in the sense of De®nition 2:3.Furthermore #P < minf#P 0;#P 00g holds for the number of blocks.

Proof: Since I � I 2 T 0 \ T 00, it remains to check that T 0 \ T 00 satis®es (3). Lett 2 T 0 \ T 00 be no leaf. Then it has (common) sons in T 0 and T 00. Due to (3), the setof sons in T 0 and T 00 must be the complete set S2�t� in both cases; hence, S2�t� isalso the set of sons in T 0 \ T 00, so that (3) holds. (

Due to this remark, we can ask for the smallest admissible partitioning. This leads to

De®nition 3.7. The minimal admissible T2-partitioning of I � I is the admissible T2-partitioning with the minimal number of blocks.

The minimal admissible T2-partitioning can be obtained by a simple search in thetree T2.

Algorithm 3.8. The construction of the minimal admissible T2-partitioningPmin of I � I is obtained as Pmin : U�fI � Ig� with U from

function U�P �; comment P � T2;

begin P 0 :� P ;

for all vertices t 2 P do

if t is not admissible then P 0 :� �P 0nftg� [ U�S2�t��;U :� P 0

end;

�10�

Proof: The proof that (10) yields the minimal admissible T2-partitioning is basedon the following observation: If b � �t1; t2� 2 T2 is admissible, then also all sons oft are admissible. This is due to the fact that the left-hand side in (9) weakly

26 W. Hackbusch and B. N. Khoromskij

Page 7: A Sparse H-Matrix Arithmetic

decreases if t1 or t2 are replaced by the (smaller) sons, while the right-hand sidedist(t1; t2) weakly increases. (

Remark 3.9. If one likes to replace the admissibility condition �9� by anothercondition Adm�b� �a Boolean-valued function�, one should ensure thatAdm�b� ) Adm�s� holds for all s 2 S2�b�.

Remark 3.10. The minimal admissible T2-partitioning is coarser �cf. De®nition2:3� than any other admissible T2-partitioning.

3.3. Complexity Considerations

In [6] we described two particular partitionings P2 � T2. Similarly, we will describea T2-partitioning in x4. This partitioning is admissible (the minimality is notdiscussed but can be shown if g is of appropriate size). For the ®xed partitioning,one can study the complexity of the various arithmetical operations.

In the general case, the T2-partitioning is determined as the minimal admissible T2-partitioning resulting from Algorithm 3.8. In order to ensure the desired com-plexity, it is su�cient to prove the complexity for some admissible T2-partitioning.The existence of such a T2-partitioning is su�cient, a constructive description isnot needed. The proof uses Remark 3.10.

Lemma 3.11. Assume that �i� the computational work increases if the T2-partitioningbecomes ®ner �cf. De®nition 2:3�, �ii� the complexity of some admissible T2-parti-tioning is known. Then the complexity of the minimal admissible T2-partitioning is atleast as good.

3.4. Hierarchical Matrices

In the following de®nition, P2 is a general T2-partitioning of I � I , although inpractical applications we shall use only admissible T2-partitionings P2. Each b 2 P2

corresponds to a location of a matrix block. Given a matrix M � �mij��i;j�2I�I 2KI�I , the matrix block corresponding to b is denoted by Mb � �mij��i;j�2b.

De®nition 3.12. Let P2 be a block partitioning of I � I and k 2 N. The underlying®eld of the vector space of matrices is K. The set of H-matrices induced by P2 is

MH;k�I � I ; P2� :� fM 2 KI�I : each block Mb; b 2 P2; satisfies rank�Mb� � kg:�11�

We call a matrix A an Rk-matrix if rank�A� � k. The properties of Rk- and, inparticular, R1-matrices are discussed in [6].

A Sparse H-Matrix Arithmetic. Part II 27

Page 8: A Sparse H-Matrix Arithmetic

Remark 3.13. All considerations about H-matrices do not refer to a specialordering of the unknowns. The index set is allowed to possess no ordering at all.Only if we visualise the block partitioning as in x4:3; we introduce a numbering ofthe blocks.

Remark 3.14. In De®nition 3.12 the upper bound of the rank is assumed to be thesame for all submatrices. One may consider variable bounds. Then k is a functionk : P2 ! N of the block and the inequality in �11� becomes rank�Mb� � k�b�. Forthe sake of simplicity, we regard k as a constant for the rest of the paper.

3.5. Approximation by H-Matrices

The reliability of H-matrices for the approximation of the integral operators

�Au��x� �Z

Rk�x; y�u�y�dy; x 2 R;

is essentially based on smoothness properties of the kernel5 k(x,y). In the boun-dary element method, integral operators occur with k�x; y� being Green's functionassociated with the partial di�erential equation under consideration or withk�x; y� replaced by a suitable directional derivatives Dk of k�x; y�. Here R is eithera bounded d-dimensional manifold (surface) C � Rd�1 or a bounded domain X inRd ; d � 2; 3: The single layer potential for the Laplace equation in R3 gives thefamiliar example k�x; y� :� 1

4p jxÿ yjÿ1 for x; y 2 R. The smoothness of k�x; y� withrespect to both x and y depends in a typical manner on the distance jxÿ yj. Notethat both the panel clustering method and the H-partitioning approach exploitonly the approximation of k�x; y� by a degenerate kernel (cf. [5, De®nition 3.3.3]).This holds for k�x; y� as well as for @k�x; y�=@n�x� or @k�x; y�=@n�y� (double layerkernel and its adjoint; cf. [5, (8.1.31a,b)]) even if the normal direction n is non-smooth because of the non-smoothness of the surface C, since only the smooth-ness properties of the singularity function k�x; y� are involved. More precisely, weassume that the singularity function k�x; y� satis®es6.

j@ax@

by k�x; y�j � c�jaj; jbj�jxÿ yjÿjajÿjbjjk�x; y�j for all a; b 2 Nd

0 ; x; y 2 Rd ; �12�

where a; b are multi-indices with jaj � a1 � � � � � ad and N0 � N [ f0g. Note thatsimilar assumptions are usually required in the wavelet or multi-resolutiontechnique (cf. [1, 2, 15]).

By De®nition 3.12, H-matrices consist locally (blockwise) of rank-k matrices. Asin the panel clustering method, these low rank matrices can be constructed via a

5 Note that the rank k 2 N and the kernel function k�x; y� are both written as k.6 Estimate (12) is a bit simpli®ed. It covers most of the situations, e.g., the case of the singularityfunction 1

4p jxÿ yjÿ1 for d � 3. As soon as logarithmic terms appear (as for d � 2; k�x; y� �log�xÿ y�=2p�, one has to modify (12).

28 W. Hackbusch and B. N. Khoromskij

Page 9: A Sparse H-Matrix Arithmetic

Taylor expansion7 of k�x; y�. Let x; y vary in the respective sets X �tx� and X �ty� (cf.(6)) corresponding to the clusters tx; ty 2 T and assume without loss of generalitythat diam�X �ty�� � diam�X �tx��. The optimal centre of expansion is the Cheby-shev centre8 y� of X �ty�, since then ky ÿ y�k � 1

2 diam�X �ty�� for all y 2 X �ty�. TheTaylor expansion reads k�x; y� � ~k�x; y� � R with the polynomial

~k�x; y� �Xmÿ1jmj�0

1

m!�y� ÿ y�m @

mk�x; y��@ym

�13�

and the remainder R, which can be estimated by

jRj � jk�x; y� ÿ ~k�x; y�j � 1

m!ky� ÿ ykm max

f2X �ty�;jcj�m

@ck�x; f�@fc

���� ����: �14�

Lemma 3.15. Assume �12� and �9� involving the su�ciently small parameter g < 1.Then for m � 1; the remainder �14� satis®es the estimate

jk�x; y� ÿ ~k�x; y�j � c�m�gmjk�x; y�j for x 2 X �tx�; y 2 X �ty�: �15�

Proof: The estimate ky ÿ y�k � 12 diam�X �ty�� for y 2 X �ty� is already stated. All

x 2 X �tx� and y; f 2 X �ty� satisfy kxÿ fk � dist�X �tx�;X �ty�� � 12gminfdiam

�X �tx��; diam�X �ty��g � 12g diam�X �ty�� � 1

g ky ÿ y�k.

Therefore,

jRj � jk�x; y� ÿ ~k�x; y�j � c�0;m�m!

ky� ÿ ykmjk�x; f�jkxÿ fkm � c�0;m�jk�x; f�j

m!gm

for some f 2 X �ty�. In the upper estimate, we may choose f with maxfjk�x;g�j : g 2 tyg � jk�x; f�j and use kk�x; y�j ÿ jk�x; f�k � jk�x; y�ÿ k�x; f�j � jyÿ fj @kj�x; ~f�=@yj � c�0; 1��jy ÿ fj=jxÿ ~fj�jk�x; ~f�j for some ~f between y and f. Togetherwith jy ÿ fj=jxÿ ~fj � 1

2 diamX �ty�=dist�X �tx�;X �ty�� � g, we conclude thatjk�x; f�j � jk�x; y�j=�1ÿ c�0; 1�g�. Hence, (15) holds with c�m� :� c�0;m�=�m!�1ÿ c�0; 1�g��. (

Let ~A be the integral operator with k�x; y� replaced by ~k�x; y�, provided that�tx; ty� 2 T2 is an admissible block and no leaf (i.e., (9) holds). Construct thecollocation or Galerkin system matrix from ~A instead of A. The perturbation ofthe matrix induced by ~Aÿ A yields a perturbed discrete solution. The e�ect of thisperturbation is studied in several papers on the panel clustering method (cf. [10,13]). Perturbations in the case of a negative order operator A are considered in [3].

7 This does not require that the practical implementation has to use the Taylor expansion. Analternative is the tensor-product interpolation (by polynomials or other functions) of k�x; y�. If thesingular-value decomposition technique from [6] is applied, the estimates are at least as good as theparticular ones for the Taylor expansion.8 Given a set X , the Chebyshev sphere is the minimal one containing X . Its centre is called theChebyshev centre.

A Sparse H-Matrix Arithmetic. Part II 29

Page 10: A Sparse H-Matrix Arithmetic

3.6. On the Choice of g and m

In order to obtain a small error, gm � e < 1 must be ensured for a suitable e.The rank k corresponding to the expansion (13) equals k � #fm 2 Nd

0 :0 � jmj � mÿ 1g � md . We may ®x m (and k) and choose g � e1=m. On the otherhand, g can be ®xed while the polynomial degree m is chosen: m � log e= log g.The ®rst case reminds to the h-version of the FEM, while the latter corresponds tothe p-version. The optimal choice is determined by the arising cost of the H-matrix operations. In the case of quasi-uniform meshes, one may conclude from[10] (therein (3.9a)) that the number of admissible clusters �t1; t2� 2 P2 on eachlevel ` may be estimated by O�gÿd2d`� (see also x4.1). This result implies that theleading term in the cost has a factor proportional to gÿdmd � gÿdk. Hence, onehas to minimise gÿdmd under the side condition gm � e. Allowing for simplicityreal-valued m, the result is

m � j log ej and g � 1=e: �16�The constant value of g expresses the fact that the choice (16) corresponds to thep-version.

4. The Two-Dimensional Model Case

In X � �0; 1� � �0; 1� we consider the regular grid

I � f�i; j� : 1 � i; j � Ng; N � 2p: �17�

Each index �i; j� 2 I is associated with the (collocation) point nij � ��iÿ 12�h;�jÿ 1

2�h� 2 R2, where h :� 1=N : The positions nij are used in (4) and (5).

4.1. The Cluster Tree T1 � T �I�The natural partitioning of I uses a division of the underlying squares into fourquarters. The clusters

t`a;b :� f�i; j� : 2pÿ`a� 1 � i � 2pÿ`�a� 1�; 2pÿ`b� 1 � j � 2pÿ`�b� 1�g �18�

with a; b 2 f0; . . . ; 2` ÿ 1g belong to level `. Hence, the tree T consisting of allclusters of level ` 2 f0; . . . ; pg is a quadtree. The number of clusters on level `equals O�22`�.Each index �i; j� 2 I is associated with the square9

Xij : f�x; y� : �iÿ 1�h � x � ih; �jÿ 1�h � y � jhg; �19�

9 The grid can also be associated with a regular triangulation and, e.g., the supports Xij of piecewiselinear functions. This would lead to another cluster tree. The asymptotic complexity bounds turn out tobe the same as for the present choice.

30 W. Hackbusch and B. N. Khoromskij

Page 11: A Sparse H-Matrix Arithmetic

which may be regarded as the support of the piecewise constant function for theindex �i; j�. Note that on level ` � 0 �t000 � I� we have one big square X �t000� (cf.(6)), while for ` � p we have 4p tiny squares X �tp

a;b�. Using the de®nitions (7) and(8), we obtain the diameter

diam�t� ����2p

2pÿ`h ����2p

=2` �20�for clusters of level `. Let t; t0 be two clusters of level ` characterised by �a; b� and�a0; b0� (cf. (18)). Then

dist�t; t0� � 2ÿ`����������������������������������������������d�aÿ a0�2 � d�bÿ b0�2

qwith d�k� :� maxf0; jkj ÿ 1g: �21�

4.2. The Block-Cluster Tree T2 � T �I � I�Let T2 � T �I � I� be de®ned according to Construction 3.1. An obvious result isstated in

Remark 4.1. Let b � �t1; t2� 2 T �I � I�. Then t1; t2 2 T belong to the same level` 2 f0; . . . ; pg.

Using minfdiam�t1�; diam�t2�g ����2p

=2` and dist�t1; t2� from (21), we observe thatb 2 T �I � I� is admissible for the choice 2g � ���

2p

, if the squares t1; t2 2 T �I� havea relative position as indicated in Fig. 1a: The square X1 corresponding to t1 is thecrossed square, while X2 must be outside the bold area . In the case of g � 1=

���2p

,the admissible T2-partitioning P2 is described in the following subsection.

4.3. H-Matrices

The admissible T2-partitioning P2 is to be de®ned. Because of the regular structureof the grid, the block partitioning P2 �L�T 02� corresponds to a well-structuredsubtree T 02 � T2. This allows the direct constructive de®nition of the T2-parti-tioning P2 and the corresponding H-matrices based on the following recursiveprocedure (cf. [6, Subsection 2.3]).

In [6, Section 5], we have introduced H-matrices which could be explained by thethree formats H (diagonal format), N (right-neighbour format) and N* (left-

Figure 1. Unacceptable clusters for a given cluster ``�'' depending on the threshold constant g

A Sparse H-Matrix Arithmetic. Part II 31

Page 12: A Sparse H-Matrix Arithmetic

neighbour format). Now the diagonal format is denoted by the symbol ( andinstead of two `neighbour formats' we have eight types denoted by the directions!; ; "; #&;.;-;%.

For any t 2 T1;H-matrices over the index set t � t have the (-format de®ned asfollows. If t is a leaf (level ` � p;#t � 1),matrix is 1� 1.Otherwise, t has four sons si

(1 � i � 4). The related square X �t� splits into the four smaller squares Xi :� X �si�.In the following visualisations, the 4 sons of a square are numbered as follows

X1 X2

X4 X3:

Correspondingly, the matrix A( over t � t has a 4� 4-block structure:

A( �

b11 b12 b13 b14

b21 b22 b23 b24

b31 b32 b33 b34

b41 b42 b43 b44

( ! & # ( # .- " ( " % ! (

: �22�

If t is of level ` � p ÿ 1, all blocks bij are of level p and trivial size 1� 1. In thefollowing, we assume that the bij are nontrivial, i.e., ` < p ÿ 1.

All squares X �si�, and X �sj� touch by at least one corner point,hence dist�X �si�;X �sj�� � 0. Therefore, the vertices �si; sj� 2 T2 are not admissibleand deserve a further decomposition. The type of block decomposition dependson the relative position of si; sj.

The diagonal blocks bii �1 � i � 4� belong to the index pairs �si; si� and haveagain format (.

The block b12 has a block format denoted by the arrow! directing from s1 to theright neighbour s2.

The squares X �s1� and X �s3� are diagonally neighboured. The correspondingsymbol of block b13 is &.

The block b14 corresponds to the squares X �s1� and X �s4� (the latter one is situ-ated below the former one). This leads to the #-format.

Similarly, the formats of bij �i � 2� are determined (see (22)).

Next, we have to describe the formats di�erent from(.We startwith the!-format:

A! �

baa bab bac bad

bba bbb bbc bbd

bca bcb bcc bcd

bda bdb bdc bdd

R R R R

! R R &% R R !R R R R

: �23�

32 W. Hackbusch and B. N. Khoromskij

Page 13: A Sparse H-Matrix Arithmetic

The matrix A! corresponds to the index pair �s; s0� 2 T2, where X �s0� is the rightneighbouring square of X �s�. The sons fa; b; c; dg of s and the sons fa; b; c; dg of s0

correspond to the squares situated as follows:

a b a b

d c d c:

The squares X �a� and X �a� satisfy diam�a�= diam�a� � ���2p

h`ÿ1�h`ÿ1 � 21ÿ`� anddist�a; a� � h`ÿ1. Hence, (9) holds with g � ���

2p

=2 and the pair �a; a� 2 T2 is ad-missible. By de®nition, the block baa can be represented by an Rk-matrix (thisformat is denoted by `R'). A di�erent situation arises for �b; a� 2 T2, where X �a� isthe direct right neighbouring square of b. Therefore, the block bba has the !-format. The complete result is described in (23).

The format

A �

R . R

R R R R

R R R R

R - R

�24�

is transposed to (23). Similarly,

A" �

R R % "R R " -R R R R

R R R R

; A# �

R R R R

R R R R

. # R R

# & R R

: �25�

The format of A% is even simpler. The blocks baa;� � � correspond to pairs ofsquares situated as follows:

a b

d c:

a b

d c

Only the �b; d�-block leads to dist�X �b�;X �d�� � 0 and requires a further de-composition. All other pairs of squares have a su�ciently large distance; there-fore, those blocks baa; � � � are de®ned to be Rk-matrices:

A Sparse H-Matrix Arithmetic. Part II 33

Page 14: A Sparse H-Matrix Arithmetic

A% �

R R R R

R R R %R R R R

R R R R

; A. �

R R R R

R R R R

R R R R

R . R R

: �26�

Similarly,

A- �

R R - R

R R R R

R R R R

R R R R

; A& �

R R R R

R R R R

& R R R

R R R R

: �27�

The recursions (22)±(27) de®ne a subtree T 02 of T2. The root I � I is of type ((level ` � 0� and has 16 sons (the 16 blocks of A(). According to (22), 4 sons areof type ( (level `ÿ 1), 2 sons of each of the types!; ; "; # and 1 son of each ofthe types&;.;-;% (see Fig. 2a). A vertex of type! has 12 sons of rank-k-type(R), 2 sons of type ! and 1 son of each of the types &;% (see Fig. 2b). Figures2c, d show the tree structure for the remaining types. The leaves of the subtree T 02are reached if the vertex has type R (i.e., condition (9) satis®ed) or if level p isreached (blocks of size 1� 1). In this particular situation, the T2-partitioningP2 �L�T 02� is the minimal admissible partitioning which also results from Al-gorithm 3.8 for the choice g � ���

2p

=2.

Figure 2 gives rise to the graph of Fig. 3, whose vertices are the formats. Thisgraph is a tree except the cycles induced by the edges of all formats 6� R to itself.

Figure 2a±d. The subtrees of the diagonal and typical auxiliary formats

34 W. Hackbusch and B. N. Khoromskij

Page 15: A Sparse H-Matrix Arithmetic

The edges were weighted by the multiplicity already shown in Fig. 2. The dis-cussion in the next Subsection will demonstrate the following remark.

Remark 4.2. a�For the following complexity considerations it is essential that onlythe format ( has a self-reference with weight 4; whereas all other weights are � 3.b�The complexity order does not depend on the number of di�erent formats. Forinstance, choosing g smaller than 1=

���2p

(as in Fig. 1b� one would need moreformats, but again only format ( has a self-reference with weight 4.

Formally, the recursions (22)±(27) must be used to de®ne the T2-partitioning P2,while in a second step the H-matrix set MH;k�I � I ; P2� is de®ned by De®nition3.12. Instead, we can give a direct de®nition of the matrix sets M�

`;p with upperindex � 2 fR;(;!; ; "; #;&;.;-;%g and level number 0 � ` � p 2 N0 (notethat by (17), the index set I depends on p). First we de®ne

M`;p�t1; t2� :� Kt1�t2 ; where t1; t2 2 T1 belong to level `;

i.e., i 2 t1 are the row indices and j 2 t2 the column indices of A 2M`�t1; t2� (notethat by Remark 4.1, the block matrices are of this kind). For ` � 0; t1 � t2 � I isthe only vertex of that level, but in general t1 6� t2 is possible. The level-`-matricesand the corresponding Rk-matrices are denoted by

M`;p :� fA 2M`;p�t1; t2� : t1; t2 2 T1 belong to level `g;MR

`;p :� fA 2M`;p : rank�A� � kg:

Here k � 1 is ®xed. The following recursive de®nition starts from ` � p and endswith ` � 0. Since #t1 � #t2 � 1 for level ` � p, we have that

Figure 3. The graph of the involved formats

A Sparse H-Matrix Arithmetic. Part II 35

Page 16: A Sparse H-Matrix Arithmetic

M�p;p is the set of 1� 1-matrices for all � 2 fR;(;!; ; "; #;&;.;-;%g:

For ` < p, the sons S�t1� � fa; b; c; dg; S�t2� � fa; b; c; dg of the vertices t1; t2 areassumed to have the geometric constellation as described in the beginning of thisSubsection (i.e., b �b� is the right neighbour of a �a�, etc.).� De®nition of M%

`;p: For ` < p; a matrix A 2M`;p�t1; t2� belongs to M%`;p, if its

block matrices in A � fAijgi2fa;b;c;dg;j2fa;b;c;dg satisfy Ab;d 2M%`�1;p and

Aij 2MR`�1;p; otherwise (cf. (26)).

� Similarly, M&`;p;M

-`;p;M

.`;p are de®ned (cf. (26), (27)).

� De®nition of M!`;p: For ` < p, a matrix A 2M`;p�t1; t2� belongs to M!

`;p, if itsblock matrices in A � fAijgi2fa;b;c;dg;j2fa;b;c;dg satisfy Aba;Acd 2M!

`�1;p;Aca 2M%

`�1;p; Abd 2M&`�1;p; and Aij 2MR

`�1;p otherwise (cf. (23)).

� Similarly, M `;p;M

#`;p;M

"`;p are de®ned.

� De®nition of M(`;p: For ` < p, a matrix A 2M`;p�t1; t1� belongs to M(

`;p, if itsblock matrices in A � fAijgi;j;2fa;b;c;dg satisfy Aii 2M(

`�1;p; Aab;Adc 2M!`�1;p;

Ab a;Ac d 2M `�1;p;Aa d ;Ab c 2M#

`�1;p;Aa c 2M&`�1;p;Ac a 2M-

`�1;p;Ab d 2M.`�1;p;

Ad b 2M%`�1;p.

Then MH;k�I � I ; P2� �M(0;p holds.

When using the grid (17) for di�erence or ®nite element discretisations of dif-ferential equations, we obtain a ®ve-, seven-, or nine-point formula as discreti-sation matrix. The next lemma implies that such a matrix can be exactlyrepresented by an H-matrix (see also the later Lemma 5.7).

Lemma 4.3. If the matrix A has a nine-point or an even sparser pattern; it is in theset MH;k�I � I ; P2� for any k � 1.

Proof: By de®nition, a nine-point matrix has non-zero entries only for index pairs�p; q� 2 I � I , where Xp \ Xq 6� ; holds for the squares introduced in (19). Letb � �t1; t2� 2 P2 � I � I be the block of the partitioning with �p; q� 2 b. The pre-vious characterisation yields dist�t1; t2� � dist�Xp;Xq� � 0. Hence, condition (9)cannot be satis®ed. Since b belongs to an admissible partitioning, it must be leaf,i.e., it is a 1� 1 block. Obviously, a 1� 1 block represents the matrix entry Apq

exactly. (

4.4. Complexity

In the following, we discuss the storage requirementsN(st and the costN(

MV of thematrix-vector multiplication. The complexity discussion for the format ( ®rst re-quires the study of the expenses for the other formats. Since matricesfrom M%

`;p;M&`;p;M

-`;p;M

.`;p behave similarly, we denote their format by the

collective symbol ``�'' (diagonal neighbourhood), while ``+'' refers to M!`;p;

M `;p;M

#`;p;M

"`;p.

36 W. Hackbusch and B. N. Khoromskij

Page 17: A Sparse H-Matrix Arithmetic

Note that the maximal level number p does not exceed O�j log hj�. The ranknumber k is chosen to be k � 1, in order to present concrete constants in theleading terms.

4.4.1. Storage

Below, the numberN�st�p� describes the storage requirements of an matrix

A 2M�0;p of the format � 2 f(;�;�g.

Lemma 4.4. Let k � 1 and n � # I � 4p. The storage size of matrices of thedi�erent formats amounts to

N(st �p� � �1� 54p�n� O�p�;

N�st �p� � 22n� O�1�;

N�st �p� � 10n� O�1�:

Proof: Note that NR1st �p� � 2n � 2 � 4p. Due to De®nition 3.12, we obtain the

recurrence formulae

N�st �p� �N�

st �p ÿ 1� � 15NR1st �p ÿ 1�;

N�st �p� � 2N�

st �p ÿ 1� � 2N�st �p ÿ 1� � 12NR1

st �p ÿ 1�;N(

st �p� � 4N(st �p ÿ 1� � 8N�

st �p ÿ 1� � 4N�st �p ÿ 1�;

�28�

with starting value Nst�0� � 1 for all formats. The ®rst equation in (28) impliesN�

st �p� � 10nÿ 9. Inserting this result into the second recurrence yields N�st �p� �

22n� O�1�. Therefore, the last recurrence becomes N(st �p� � 4N(

st �p ÿ 1��54n� O�1�. Its solution is N(

st �p� � �1� 54p�n� O�p�. (

4.4.2. Matrix-Vector Multiplication

Lemma 4.5. The cost for the matrix-vector multiplication is

N(MV �p� � �1� 82p�n� O�p�;

N�MV �p� � 30n� O�1�;

N�MV �p� � 19n� O�1�:

�29�

Proof: We recall NR1MV �p� � 3n. Consider type `�'. The multiplication of the 16

blocks at level p ÿ 1 with the (partial) vector costs N�MV �p ÿ 1� � 15NR1

MV �p ÿ 1�.The summation of the results costs 3n additions. This leads to N�

MV �p� �N�

MV �p ÿ 1� � 15NR1MV �p ÿ 1� � 3n �N�

MV �p ÿ 1� � 574 np and N�

MV �0� � 1. Itssolution is N�

MV �p� � 19nÿ 18.

Similarly, N�MV �p� � 2N�

MV �p ÿ 1� � 2N�MV �p ÿ 1� � 12NR1

MV �p ÿ 1� � 3n yieldsN�

MV �p� � 30n� O�1�. Finally, N(MV �p� � 4N(

MV �p ÿ 1� � 8N�MV �p ÿ 1��

A Sparse H-Matrix Arithmetic. Part II 37

Page 18: A Sparse H-Matrix Arithmetic

4N�MV �p ÿ 1� � 3n � 4N(

MV �p ÿ 1� � 82n� O�1� implies the result of theLemma. (

The estimate (29) is similar to the bound NMV �p� � 11pn� O�n� obtained in[6] for the 1D index set I with g � 1

2. Clearly, the corresponding constant in(29) depends on the spatial dimension (compare also Theorem 6.2 for the 3Dcase).

4.4.3. Matrix Addition, Multiplication and Inversion

As in [6], one can introduce the approximate addition �(, multiplication �(, andinversion of matrices from M(

p;p retaining the corresponding hierarchical matrixstructure. The formatted operations �( and �( are de®ned similarly to the caseof 1D-H-matrices considered in [6]. In fact, the complexity analysis of �( israther simple and yields N(�(�p� � O�pn�.The proof of N(�(�p� � O�p2n� is more lengthy, since various combinations offactors occur.

The inversion is based on blockwise Gauû-elimination involving the addition andmultiplication addressed above. In the 1D-case, it is fully described in [6, Section3.5]. While in the case of [6] the H-matrix was treated as a 2� 2 block matrix,the matrix (22) has now a 4� 4 block pattern. The necessary modi®cation isobvious and does not change the complexity order NInversion�p� � O�p2n�obtained in [6].

5. Construction for General 2D-Meshes

We consider an (unstructured) quasi-uniform triangulation T�h of X � R2 char-acterised by the maximal mesh size �h :� maxf�ds : s 2T�hg, where �ds is thediameter of the Chebyshev sphere of the triangle s (cf. Footnote 7). Assuming alsoshape regularity, there are generic constants c1; c2 > 0 such that

c1�ds � �h � c2ds for all s 2T�h; �30�where ds denotes the diameter of the inscribed circle for an (closed) element s ofT�h. In fact, we are not restricted to triangles s. Any elements satisfying (30) areallowed (isoparametric triangles, quadrangles, etc.).

For simplicity, we consider piecewise constant functions on s 2T�h. Then eachindex a 2 I corresponds to a basis function with support Xa � sa 2T�h. TheChebyshev centre of s is denoted by ns (or na if s � sa).

In order to constructH-matrix structures, we have to de®ne a suitable cluster treeT �I� (cf. Subsection 4.1). Proposals can be found in [12]. Here, we give a con-struction based on the uniform tensor-product grid discussed in the previoussection. Since the regular grid is needed only for reference, we call it the ®ctitiousgrid. We do not claim that the presented construction of T �I� is optimal, but itleads to a straightforward proof of the complexity bounds.

38 W. Hackbusch and B. N. Khoromskij

Page 19: A Sparse H-Matrix Arithmetic

5.1. How to Map the Fictitious Hierarchy onto the Unstructured Grid

Without loss of generality we may assume X � Xf :� �0; 1� � �0; 1� and

l�X� � cl�Xf � � c > 0; �31�where l denotes the two-dimensional measure. In Xf we consider the uniformtensor-product grid Th from Section 4. Its index set is denoted by If :�f�i; j� : 1 � i; j � Ng;N � 2p (the superscript `f ' stands for `®ctitious'), while I isthe index set of the unknowns of the unstructured grid.

The grid size of Th is assumed to be the largest h � 2ÿp satisfying

h <1

2���2p min

a;b;2I ;sa\sb 6�;�da � db� �32�

with da :� dsafrom (30). For each index a 2 I , the Chebyshev centre na belongs to

at least one of the squares Xij of Th��i; j� 2 If ; cf. (19)). Selecting one of thepossible indices in the multiple case, we are able to de®ne a mappingF : I ! If �a 7!F �a� � �i; j� via na 2 XF �a�. The following remark allows us to de-®ne F ÿ1 on F �I� � If .

Remark 5.1. Under condition (32), the mapping F is injective.

Proof: Let a 6� b. Then jna ÿ nbj � 12 �da � db� >

���2p

h � diamXij contradictsF �a� � F �b� � �i; j� 2 If . (

For any subset tf � If (not only for tf � F �I��, we de®ne

F ÿ1�tf � :� fa 2 I : F �a� 2 tf g � I :

Since If is the regular grid from Section 4, the cluster tree T �If � is already de-scribed. F gives rise to the cluster tree for the index set I :

T �I� � fF ÿ1�tf � : tf 2 T �If �g:

The arising tree T �I� meets the conditions of De®nition 2.1, but is unusual sincesome of the vertices t 2 T �I� may represent the empty set �F ÿ1�tf � � ;if tf \ F �I� � ;�. Moreover, if only one of the sons s 2 S�t� is non-empty, this sons represents the same subset as the father t. Although, in practice, this tree T �I�could be simpli®ed, we use the tree in the given form since then T �I� and T �If � areisomorphic.

As seen in Section 3.1, the cluster tree T �I� determines the block-cluster treeT2 � T �I � I�, which de®nes the H-matrix structure. The elements of T2 are pairs�t1; t2� with t1; t2 2 T �I�. In the case of Construction 3.1, T2 � F ÿ1�T f

2 � holds,where T f

2 � T �If � If � and F ÿ1��t1; t2�� :� �F ÿ1�t1�; F ÿ1�t2�� for t1; t2 � If . Oth-erwise, we use T2 :� F ÿ1�T f

2 � as de®nition for T2.

A Sparse H-Matrix Arithmetic. Part II 39

Page 20: A Sparse H-Matrix Arithmetic

Let P f2 2 T �If � If � be any admissible H-partitioning for the ®ctitious grid sat-

isfying (9) with the constant gf < 1. Below we will characterise the toleranceconstant g � gf needed for the de®nition of admissible clusters from the inducedpartitioning P2 2 T �I � I�.

Lemma 5.2. All t1; t2; t 2 T1 � T �I� satisfydiam t � diam F �t� � �h; �33�

dist�t1; t2� � dist�F �t1�; F �t2�� ÿ �h: �34�

Proof: Let x 2 sx 2T�h and y 2 sy 2T�h for triangles sx; sy � t with the Cheby-shev centres nx 2 sx; ny 2 sy . Then jnx ÿ ny j � diam F �t�, while jnx ÿ xj �diam�sx�=2 � �h=2 and jny ÿ yj � diam�sy�=2 � �h=2. Hence, jxÿ yj � diam F �t���h yields (33). Similarly, (34) is proved. (

Since the image F �t� � fF �a� : a 2 tg of a cluster t 2 T �I� is in general di�erent fromthe clusters in T �If �, we introduce mappings F` for all levels 0 � ` � p. The sets

T `�If � :� ftf 2 T �If � : tf is a cluster of level `g �0 � ` � p�

can also be de®ned by T 0�If � :� fIf g; T `�1�If � :� Stf2T `�If � S�tf � for 0 � ` < pand yield a level-wise decomposition of the tree T �If � � S0�`�p T `�If �. For t � I ,we de®ne level�t� :� minf0 � ` � p : F �t� � tf for some tf 2 T `�If �g.The mapping F` is de®ned on all subsets t � I with level�t� � ` and its value isF`�t� � tf if tf 2 T `�If � satis®es F �t� � tf . Hence, F`�t� denotes the `rounding up'of F �t� to an If -cluster of level `. Note that F ÿ1�F`�t�� � t.

We recall that �F`�t1�; F`�t2�� 2 P f2 is either a leaf of T f

2 � T �If � If � or satis®es theadmissibility condition (9): min�diam F`�t1�; diam F`�t2�� � 2gf dist�F`�t1�; F`�t2��for the corresponding level `. Because of Remark 4.1, the clusters ti �i � 1; 2�belong to the same level (say `). Assuming the latter inequality, we are interested inthe question whether �t1; t2� also satis®es condition (9) for a suitable parameter g.

Lemma 5.3. Assume t1; t2 2 T2; level�t1� � level�t2� � ` and���2p

=2` � �1� 4gf ��h. Then the admissibility condition

min�diam F`�t1�; diam F`�t2�� � 2gf dist�F`�t1�; F`�t2��

implies min�diam�t1�; diam�t2�� � 2g dist�t1; t2� for g :� 2gf .

Proof: Set A :� diam F`�t1� � diam F`�t2� ����2p

=2` (cf. (20)) and B :� dist�F`�t1�;F`�t2��. The inequalities (33) and (34) together with diam F �ti� � Aand dist�F �t1�; F �t2�� � B show

min�diam�t1�;diam�t2��dist�t1; t2� � A� �h

Bÿ �h:

40 W. Hackbusch and B. N. Khoromskij

Page 21: A Sparse H-Matrix Arithmetic

Note that A=B � 2gf . The assumption A � �1� 4gf ��h allows us to bound

A� �hBÿ �h

ÿ AB� �h

A� BB�Bÿ �h� �

A=B� 1

B=�hÿ 1� 1� 2gf

A=�2gf�h� ÿ 1

by 2gf . Hence, min�diam�t1�; diam�t2�� � 4gf dist�t1; t2� � 2g dist�t1; t2� for thechoice g � 2gf . (

Corollary 5.4. The modi®ed assumption diam F �t1� � diam F �t2� � �1e��2� 2e gf ���h for some e > 0 leads to min�diam�t1�; diam�t2�� � 2g dist�t1; t2� with

g :� �1� e�gf . Hence; any nf < 1 allows a choice g < 1.

The condition���2p

=2` � �1� 4gf ��h from Lemma 5.3 is not satis®ed in general,e.g., for ` � p we have

���2p

=2p � ���2p

h < �h (cf. (32)). However, there is a constantdp such that all clusters t of level ` � p ÿ dp ful®l

���2p

=2` � �1� 4gf ��h as stated inthe next lemma, where we may insert c :� 5 � 1� 4gf .

Lemma 5.5. Given a constant c, there is a constant dp 2 N independent of p so that���2p

=2` � c�h for all clusters t 2 T of level ` � p ÿ dp.

Proof: Let t be of level ` � p ÿ dp. The de®nition of h � 2ÿp by (32) together with(30) yields 21ÿp � 2h � 1

2��2p min�da � db� � 1

c2��2p �h. Hence,

���2p

=2pÿdp � �2dpÿ1=c2��h.Choose dp such that 2dpÿ1=c2 � c. (

We have to describe an admissible T2-partitioning P2, where the parameter g from(9) is de®ned by g � 2gf (see Lemma 5.3). The ®rst trial is to use P ��2 :� F ÿ1�P f

2 �,where P f

2 is the admissible T 02-partitioning corresponding to gf . Due to the pre-ceding lemmata, this leads to admissible blocks �t1; t2�, provided they belong to alevel ` � p ÿ dp. It remains to modify P ��2 at the levels ` with p ÿ dp < ` � p. ByRemark 2.2, there is a subtree T ��2 of T �I � I� with P ��2 �L�T ��2 �. Construct thesmaller tree T �2 in the following way:

1) Delete all vertices belonging to levels ` > p ÿ dp and

2) insert the sons �i; j�; i 2 t1; j 2 t2 for all non-admissible blocks �t1; t2� 2 T ��2 atlevel ` � p ÿ dp.

Then the ®nal T2-partitioning P �2 is P �2 �L�T �2 �. The matrix interpretation is thatall non-admissible blocks of level p ÿ dp are full submatrices.

Finally, M� :�MH;k�I � I ; P �2 � de®nes the H-matrix set corresponding to theunstructured mesh (cf. (11)).

Lemma 5.6. There holds nf � const n, where nf :� #If and n � #I . Moreover,

N�st�n� � O�n log n� and N�

MV �n� � O�n log n�

are the respective costs of the storage and the matrix-vector multiplication formatrices from M�.

A Sparse H-Matrix Arithmetic. Part II 41

Page 22: A Sparse H-Matrix Arithmetic

Proof: First we consider the auxiliary partitioning P ��2 � F ÿ1�P f2 � �L�T ��2 � from

above. Since T ��2 is isomorphic to a subtree of T f2 :� T �If � If �, the expenses

N��st ;N

��MV corresponding to the format M�� :�MH;k�I � I ; P ��2 � are less or equal

to the bounds O�nf log nf � in Lemmata 4.4±4.5. Since dp is a constant, the costsN�

st;N�MV are also bounded by O�nf log nf � (note that the same recurrence for-

mulae hold, but the starting value may be increased).

It remains to replace the ®ctitious dimension nf in the latter bound by the truedimension n. By (31) we have nf � hÿ2 � l�Xf � � hÿ2 � �l�X�=c�hÿ2. The proof ofLemma 5.5 has shown 2h � �h=�c2

���2p �, so that nf � l�X��8c22=c��hÿ2. The left in-

equality in (30) yields l�X� �Ps2T�hl�s� �Ps2T�h

�p=4��d2s � n�p=4c21��h2. The last

two estimates prove nf � const � n with const � �2p=c��c2=c1�2. (

Lemma 4.3 and its proof generalise to all FE sti�ness matrices.

Lemma 5.7. A ®nite element sti�ness matrix belongs to the set M� �MH;k�I ; P �2 �for any k � 1.

5.2. Two-Dimensional Manifolds

The above de®ned matrix formats M(0;p and M� enable data-sparse H-approxi-

mations for a wide class of ®nite element sti�ness matrices corresponding toboundary value problems in X � R2. Applications in boundary element methods(BEM) are based on manifolds (surfaces) instead of ¯at domains.

In a ®rst step we study the surface of a polyhedron (x5.2.1). Curvilinear surfacesare considered in x5.2.2.

5.2.1. H-Formats for Polyhedrons

Consider a polyhedron C � R3 composed of M plane faces Ci �1 � i � M�. Oneach Ci a quasi-uniform mesh is given which meets the conditions required in x5.We assume that all pairs of adjacent faces form an angle x 2 �x0; 2pÿ x0�, where0 < x0 � p.

To begin with, we assume that for a given admissibility parameter g < 1 theinequality

min�diamCi; diamCj� � 2g dist�Ci;Cj� �35�

holds for all disjoint and non-adjacent faces Ci;Cj. Note that the distance ismeasured by the Euclidean distance in R3.

In this situation we construct the cluster tree as follows. Let T i be the H-tree forthe face Ci, i.e., the root of T i is the index set I i corresponding to the unknowns10

associated with Ci �1 � i � M�. The global set of indices is I � [Mi�1I i. The cluster

10 According to the example of piecewise constant functions, we assume the index sets I i to be disjoint.If I i \ Ij 6� ; �i 6� j� due to unknowns belonging to the edges, obvious modi®cations are required.

42 W. Hackbusch and B. N. Khoromskij

Page 23: A Sparse H-Matrix Arithmetic

tree T1 � T �I� is de®ned as the union of the disjoint trees T i together with the newroot I possessing the M sons I i �1 � i � M�. The block-cluster tree is again de-noted by T2.

In the following, we propose a matrix format corresponding to the index set I .Given a block b � �t1; t2� 2 T2, three di�erent cases can occur:

(i) t1; t2 belong to the same face, i.e., t1; t2 � I i for some i � M .

(ii) t1; t2 belong to adjacent faces.

(iii) t1; t2 belong to disjoint and non-adjacent faces.

Case (i) corresponds to the plane case of x5.In Case (ii), two faces C1;C2 with a common edge e are involved. Turning C2 intothe plane of C1, we obtain C1 and ~C2 contained in X :� C1 [ ~C2 � R2. Choose thematrix structure as in x5 with admissibility parameter g sin x

2, where x is the anglebetween C1 and C2. Let t1 � C1; t2 � C2 and denote the corresponding cluster inthe rotated copy ~C2 by ~t2. One checks that min�diam t1; diam ~t2� �2g sin x

2 dist�t1;~t2� implies min�diam t1; diam t2� � 2g dist�t1; t2�. Therefore, thechosen partitioning is admissible.

Case (iii) is trivial, since assumption (35) ensures min�diam t1; diam t2� �2g dist�t1; t2�.Altogether, we have obtained an admissible partitioning P2 comparable to theplane case of x5 with g (partially) replaced by the smaller parameter g sin x

2.

It remains to discuss the case, where the chosen constant g does not satisfy (35). Inthis case divide each face into several smaller ones C0i �1 � i � M 0� with M 0 > M .If the subdivision is ®ne enough, we have

min�diamC0i; diamC0j� � 2g dist�C0i;C0j� for C0i � Ci; C0j � Cj;

where Ci;Cj are disjoint and non-adjacent (unre®ned) faces. Since g is a ®xedconstant, also M 0 is ®xed. The same arguments as above can be used to constructan admissible partitioning with similar structure as in the plain case. As inLemmata 4.4 and 4.5 we derive

Corollary 5.8. Under the assumption from above, the costs for storage, matrix-vector multiplication, and the further operations have the same complexity as in theplane case of x5.

The assumption x 2 �x0; 2pÿ x0� may lead to the impression that small anglescause di�culties. This is not the case. An important example is a slender aerofoil.Here it is well-known that the cluster tree must be constructed di�erently: Theclusters should contain the neighbouring parts from the upper and lower side ofthe wing.

Finally, we mention a special case, where the complexity is even better thanmentioned before.

A Sparse H-Matrix Arithmetic. Part II 43

Page 24: A Sparse H-Matrix Arithmetic

Remark 5.9. Consider the double-layer potential for the second order PDEs withconstant coe�cients in the case of piecewise ¯at surfaces. De®ne the structure ofthe approximating H-matrix as before. Then Nst and NMV are of the order O�n�instead of O�n log n�.

The reason is the fact that the kernel function satis®es @k�x; y�=@n�x� � 0 forx; y 2 Ci on any plane face Ci of the surface C.

5.2.2. Curved Manifold

A general manifold is described by an atlas of mappings. The usual practice is tostart from a (reference) polyhedron Cref and to de®ne a bi-Lipschitz mappingu : Cref ! C (cf. [3]) with Lipschitz constants c1; c2:

c1jxÿ yj � ju�x� ÿ u�y�j � c2jxÿ yj for all x; y 2 Cref :

Let T ref be the cluster tree from x5.2.1 for the reference boundary Cref . Thecorresponding tree for C is then de®ned by T � u�T ref �. Choose an admissibleT ref -partitioning P ref

2 with admissibility parameter gref . Then the resulting T -partitioning P2 � u�P ref

2 � is admissible with parameter g � c2c1

gref , as

min�diamu�t1�; diamu�t2�� � c2 min�diam�t1�; diam�t2�� � 2c2gref dist�t1; t2�� 2c2

c1gref dist�u�t1�;u�t2��:

Since the matrix-formats corresponding to P ref2 and P2 are identical, we obtain the

same complexity bounds of the computational cost as in x5.2.1.

6. The Three-Dimensional Case

In this section, we introduce the formats for matrices operating in the vector spaceassociated with an index set I for the cell-centred tensor product gridI3h � Ih � Ih � Ih in X � �0; 1�3 with the mesh size h � 2ÿp and #I � 8p. Similar tothe 2D-case, the cluster tree T � T1 is de®ned by the regular re®nement (subdi-vision into eight equal parts) of the initial index set I . T1�I� gives rise to the block-cluster tree T �I � I�, in which we determine the admissible partitioning accordingto the admissibility condition (9). In De®nition 6.1 below, we choose the constantg � ���

3p

=2 which corresponds to the 3D counterpart of Fig. 1a.

The natural notation of indices from I3h uses triples �i; j; k� 2 N3 with1 � i; j; k � 2p. As in the 2D case, we can describe the partitioning by a number of

formats Ma;b;c`;p , where �a; b; c� with a; b; c 2 fÿ1; 0; 1g indicates the shift in the

following sense. Let b � �t; t0� be a block, where t; t0 � I are clusters. If t � t0, wehave a diagonal block and the shift is given by �a; b; c� � �0; 0; 0�. For these blockswe introduce the `top format' M0;0;0

`;p . If t � �i0; j0; k0� � f�i; j; k� : 1 �i; j; k � 2pÿ`g and t0 � �i0 � 2pÿ`; j0; k0�� f�i; j; k� : 1 � i; j; k � 2pÿ`g are two

44 W. Hackbusch and B. N. Khoromskij

Page 25: A Sparse H-Matrix Arithmetic

clusters (cubes of length 2pÿ` in Z3�, their relation is given by the shift (1,0,0)indicating the direct neighbourhood in x-direction. Then, for b � �t; t0� we use theformat M1;0;0

`;p . Similarly, the other formats Mÿ1;0;0`;p ;M0;�1;0`;p ;M0;0;�1

`;p (``next neigh-

bours''), M1;1;0`;p ; . . . (``2D-diagonal neighbours'') and M�1;�1;�1

`;p (``3D-diagonalneighbours'') are involved. In De®nition 6.1 these formats contain the sameformat at the next level (``self-reference'') and other formats as depicted in thegraph corresponding to Fig. 3:

(see also Fig. 4b). Let r � fa; b; c; d; e; f ; g; hg be the set of the eight sons of acluster situated as shown in Fig. 4a. For example the block-matrix with columnsfrom a and rows from b is denoted by Aab.

In the following, we de®ne the matrix formats Ma;b;c`;p recursively with respect to

the degree jaj � jbj � jcj. The notations M`;p�t1; t2�;M`;p, and MR`;p have the same

meaning as in Subsection 4.3.

De®nition 6.1. a� For ` � p;Ma;b;cp;p is the set of 1� 1-matrices for all

a; b; c 2 fÿ1; 0; 1g. For ` < p, the formats are described in b±e).

b� Let r � fa; b; c; d; e; f ; g; hg be the 8 cubes as indicated in Fig. 4a andr0 � fa0; b0; c0; d 0; e0; f 0; g0; h0g the similar set of clusters shifted in the (1,1,1)-direc-

Figure 4. a Indexing for the 3D clusters, where a=(0,0,0) b = (1,0,0), c � �0;ÿ1; 0�, d � �0; 0;ÿ1�,e � �ÿ1;ÿ1; 0�, f � �1; 0;ÿ1�, g � �0;ÿ1;ÿ1�, h � �1;ÿ1;ÿ1�. b Graph of the 3D formats

Mm �0 � m � 3� corresponding to jaj � jbj � jcj � m

top format (0,0,0) self-reference = 8# &

next neighbours: (1,0,0) � � � self-reference = 4# &

2D-diagonal neighbours: (1,1,0) � � � self-reference = 2# &

3D-diagonal neighbours: (1,1,1) � � � self-reference = 1#

leaves Rk self-reference = 0

A Sparse H-Matrix Arithmetic. Part II 45

Page 26: A Sparse H-Matrix Arithmetic

tion so that b and g0 have one corner point in common. A matrix A 2M`;p�r; r0�belongs to M1;1;1

`;p , if its block matrices in A � fAijgi2r;j2r0 satisfy Abg0 2M1;1;1`�1;p and

Aij 2MR`�1;p, otherwise. Similarly, one de®nes Ma;b;c

`;p for other combinations subject

to jaj � jbj � jcj � 1.

c� Let r0 � fa0; b0; c0; d 0; e0; f 0; g0; h0g result from a shift of r � fa; b; c; d; e; f ; g; hg inthe direction (1,1,0) so that the pairs (b; c0), (f ; g0) of cubes have a common edge.

Then A � fAijgi2r;j2r0 2M1;1;0`;p holds if the submatrices have the formats Ab;c0 ,

Af ;g0 2M1;1;0`�1;p, Af ;c0 2M1;1;1

`�1;p, Ab;g0 2M1;1;ÿ1`�1;p , and Aij 2MR

`�1;p, otherwise. Simi-

larly; one de®nes Ma;b;c`;p for other combinations with jaj � jbj � jcj � 2.

d� Let r0 � fa0; b0; c0; d 0; e0; f 0; g0; h0g be resulting from a shift of r � fa; b; c; d; e; f ;g; hg in the direction (1,0,0) so that, e.g., b and a0 have a common face. ThenA � fAijgi2r;j2r0 2M1;0;0

`;p holds if Ab;a0 ;Af ;d 0 ;Ae;c0 ;Ah;g0 2 M1;0;0`�1;p;Ae;a0 ;Ah;d 0 2

M1;1;0`�1;p;Ab;c0 ;Af ;g0 2M1;ÿ1;0

`�1;p ;Ab;d 0 ;Ae;g0 2M1;0;ÿ1`�1;p ;Af ;a0 ; Ah;c0 2M1;0;1

`�1;p;Ah;a0 2M1;1;1

`�1;p;Af ;c0 2M1;ÿ1;1`�1;p ;Ab;g0 2M1;ÿ1;ÿ1

`�1;p ;Ae;d 0 2M1;1;ÿ1`�1;p ; and Aij 2MR

`�1;p; other-

wise. Similarly; for other combinations with jaj � jbj � jcj � 1.

e� Finally; let r0 � r. Then A � fAijgi;j;2r 2M0;0;0l;p holds if Aii 2M0;0;0

`�1;p;Aab;Ace;Adf ;Agh 2M1;0;0

`�1;p;Aca;Aeb;Agd ;Ahf 2M0;1;0`�1;p;Ada; Ahe;Afb;Agc 2M0;0;1

l�l;p;

Acb;Agf 2M1;1;0`�1;p;Aae;Adh 2M1;ÿ1;0

`�1;p ;Aga;Ahb 2M0;1;1`�1;p;Acd ;Aef 2M0;1;ÿ1

`�1;p ;Aaf ;

Ach 2M1;0;ÿ1`�1;p ;Adb;Age 2M1;0;1

`�1;p;Agb 2M1;1;1`�1;p;Acf 2M1;1;ÿ1

`�1;p ;Aah 2M1;ÿ1;ÿ1`�1;p ;Ade 2

M1;ÿ1;1`�1;p ; and Aji 2Mÿa;ÿb;ÿc

`�1;p if Aij 2Ma;b;c`�1;p.

The calculation of the storage and matrix-vector multiplication complexity for thedescribed formats is a result of four staggered recurrence formulae. Below, we givethe results (with exact constants) for the corresponding matrix-vector multipli-cation11. Here, we use the notation Mm

p :� [fMa;b;cqÿp;q : q � l; jaj � jbj � jcj � mg.

Note that all A 2Mmp are 8p � 8p matrices.

Theorem 6.2. Let n � 8p and let k � 1 be the rank of the Rk-blocks. Then for Mmp -

matrices the matrix-vector multiplication costs MmMV �p� equal

N3MV �p� � 35 � n� O�1�; N2

MV �p� � 51 � n� O�1�;N1

MV �p� � 187 � n� O�1�; N0MV �p� � 756 � pn� O�p�:

Proof: The desired estimate for N0MV �p� follows from the recurrence

N0MV �p� � 8N0

MV �p ÿ 1� � 24N1MV �p ÿ 1� � 24N2

MV �p ÿ 1� � 8N3MV �p ÿ 1� � 7n

taking into account N0MV �0� � 1 and substituting the results for the auxiliary

formats. (

In the case of a general ®nite element mesh in a 3D-domain X � R3, we canextend the considerations of x5 to three dimensions as well. We conclude that theH-matrix format is also applicable to general 3D ®nite element problems as wellas for volume integral formulations.

11More details are given in [7].

46 W. Hackbusch and B. N. Khoromskij

Page 27: A Sparse H-Matrix Arithmetic

References

[1] Dahmen, W., ProÈ ssdorf, S., Schneider, R.: Wavelet approximation methods for pseudodi�er-ential equations II: Matrix compression and fast solution. Adv. Comput. Math. 1, 259±335(1993).

[2] Dahmen, W., ProÈ ssdorf, S., Schneider, R.: Wavelet approximation methods for pseudodi�er-ential equations I: Stability and convergence. Math. Z. 215, 583±620 (1994).

[3] Graham, I. G., Hackbusch, W., Sauter, S. A.: Discrete boundary element methods on generalmeshes in 3D. Bath Mathematics Preprint 97/19, University of Bath, November 1997. To appearin Numer. Math.

[4] Hackbusch, W.: The panel clustering algorithm. MAFELAP 1990 (Whiteman, J. R., ed.),pp. 339±348. London: Academic Press, 1990.

[5] Hackbusch, W.: Integral equations. Theory and numerical treatment. ISNM 128. Basel:BirkhaÈ user 1995.

[6] Hackbusch, W.: A sparse matrix arithmetic based on H-matrices. Part I: Introduction toH-matrices. Computing 62, 89±108 (1999).

[7] Hackbusch, W., Khoromskij, B. N.: A sparse H-matrix arithmetic: general complexity estimates.To appear in: Numerical analysis in the 20th century. Vol. 6: Ordinary di�erential and integralequations (Psyce, Van der Bargh, Baker, Monegato, eds.). Elsevier, 2000. ± Leipzig, Max-Planck-Institut Mathematik, Preprint No 66 1999.

[8] Hackbusch, W., Khoromskij, B. N.: H-matrix approximation on graded meshes. To appear in:MAFELAP 1999 (Whiteman, J. R., ed.). Leipzig: Preprint No 54, Max-Planck-InstitutMathematik, 1999 (http://www.mis.mpg.de/cgi-bin/preprints.pl).

[9] Hackbusch, W., Khoromskij, B. N., Sauter, S. A.: On H2-matrices. In: Lectures on AppliedMathematics (Bungartz, Hoppe, Zenger, eds.). Berlin Heidelberg New York Tokyo: Springer,1999 ± Leipzig: Preprint No 50, Max-Planck-Institut Mathematik, 1999 (http://www.mis.mpg.de/cgi-bin/preprints.pl).

[10] Hackbusch, W., Nowak, Z. P.: On the fast matrix multiplication in the boundary element methodby panel clustering. Numer. Math. 54, 463±491 (1989).

[11] Hackbusch, W., Sauter, S. A.: On the e�cient use of the Galerkin method to solve Fredholmintegral equations. Appl. Math. 38, 301±322 (1993).

[12] Lage, C.: Softwareentwicklung zur Randelementmethode: Analyse und Entwurf e�zienterTechniken. Doctoral thesis, UniversitaÈ t Kiel, 1996.

[13] Sauter, S. A.: UÈ ber die e�ziente Verwendung des Galerkin-Verfahrens zur LoÈ sung Fredholm-scher Integralgleichungen. Doctoral thesis, UniversitaÈ t Kiel, 1992.

[14] Tyrtyshnikov, E. E.: Mosaic-skeleton approximations. Calcolo 33, 47±57 (1996).[15] von Petersdor�, T., Schwab, C.: Wavelet approximations for ®rst kind boundary integral

equations. Numer. Math. 74, 479±516 (1996).

W. Hackbusch B. N. KhoromskijMax-Planck-Institut Max-Planck-InstitutMathematik in den Naturwissenschaften Mathematik in den NaturwissenschaftenInselstr. 22-26 Inselstr. 22-26D-04103 Leipzig D-04103 LeipzigGermany Germanyemail: [email protected] email: [email protected]

A Sparse H-Matrix Arithmetic. Part II 47