A PRECONDITIONED CONJUGATE GRADIENT METHOD
FOR SOLVING A CLASS OF NON-SYMMETRIC LINEAR SYSTEMS

by

J. J. Dongarra, G. K. Leaf, and M. Minkoff

ARGONNE NATIONAL LABORATORY, ARGONNE, ILLINOIS

Prepared for the U.S. DEPARTMENT OF ENERGY




Distribution Category:
Mathematics and Computers (UC-32)

DISCLAIMER

ANL-81-71

ARGONNE NATIONAL LABORATORY
9700 South Cass Avenue
Argonne, Illinois 60439

A PRECONDITIONED CONJUGATE GRADIENT METHOD FOR SOLVING
A CLASS OF NON-SYMMETRIC LINEAR SYSTEMS

J.J. Dongarra, G.K. Leaf, and M. Minkoff

    Applied Mathematics Division

    October 1981

ABSTRACT

This report describes a conjugate gradient preconditioning scheme for solving a certain system of equations which arises in the solution of a three-dimensional partial differential equation. The problem involves solving systems of equations where the matrices are large, sparse, and non-symmetric.

A Preconditioned Conjugate Gradient Method for Solving a Class of Non-Symmetric Linear Systems

J.J. Dongarra, G.K. Leaf, and M. Minkoff

    Argonne National Laboratory

Summary - This report describes a conjugate gradient preconditioning scheme for solving a certain system of equations which arises in the solution of a three-dimensional partial differential equation. The problem involves solving systems of equations where the matrices are large, sparse, and non-symmetric.

1. STATEMENT OF THE PROBLEM

We are concerned with the solution of large scale linear systems where the coefficient matrix arises from a finite difference approximation (seven-point) to an elliptic partial differential equation in three dimensions. Such systems can arise in the modeling of three-dimensional, transient, two-phase flow systems [1,2]. The elliptic equation governs the pressure (or pressure difference) and has to be solved numerically at each time step in the course of the transient analysis. Because the physical problem involves convection, the elliptic equation contains first order partial derivatives, which implies that the usual seven-point finite difference approximation leads to a non-symmetric coefficient matrix. Since these matrices arise from a finite difference approximation in three dimensions, the matrices will tend to be large. For example, a finite difference mesh of 15x15x30 mesh cells leads to a linear system of order 6750, with about 45,000 non-zero coefficients in the matrix, for a density of about .001. Thus the class of matrices will be large, sparse, and non-symmetric. In this report we intend to review several possible approaches for solving systems of this type, describe an implementation based on the use of incomplete LU factorization coupled with a conjugate gradient method, and finally describe some numerical results for this algorithm.

2. MATRIX GENERATION

In general, we are interested in solving linear systems Ax = b where A is large, sparse, and non-symmetric. A simple finite difference approximation to a Poisson equation with convection (or transport) terms can be used as the model problem for generating such matrices. The matrices used in this report will be generated in the following manner.

Consider the rectangular domain

    D = [0,L_x] x [0,L_y] x [0,L_z]

* Work supported in part by the Applied Mathematical Sciences Research Program (KC-04-02) of the Office of Energy Research of the U.S. Department of Energy under Contract W-31-109-Eng-38.


in three dimensions and the elliptic equation

    -∇²φ + σ(x,y,z)φ + V(x,y,z)·∇φ = F(x,y,z)                    (2.1)

subject to the boundary conditions ∂φ/∂n = 0 on the four vertical faces (∂/∂n denotes the normal derivative), and either

    φ = G_L(x,y,0)   or   ∂φ/∂n = 0                               (2.2)

on the bottom face, and either

    φ = G_U(x,y,L_z)   or   ∂φ/∂n = 0                             (2.3)

on the top face. If a Neumann condition is imposed on all faces, then we shall specify the value of φ at some point in order to ensure uniqueness. We shall assume that σ, V, and F are given bounded functions with σ ≥ 0 everywhere in D.

For any three integers N_x, N_y, N_z we generate a simple mesh-centered difference approximation of the seven-point type, using centered differences on the convective terms. Let Δx = L_x/N_x, Δy = L_y/N_y, Δz = L_z/N_z, and x_i = iΔx for 0 ≤ i ≤ N_x, with y_j and z_k defined similarly.


Since we are primarily interested in generating matrices, we order the unknowns φ_ijk using a (k,i,j) ordering with the linear index

    m = k + (i-1)N_z + (j-1)N_z N_x,    1 ≤ k ≤ N_z, 1 ≤ i ≤ N_x, 1 ≤ j ≤ N_y.
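As a concrete illustration of this construction, the following sketch (our own, not the report's generator) assembles a seven-point matrix of this type with scipy. It assumes a constant velocity vector and homogeneous Dirichlet conditions on all faces, rather than the mixed conditions (2.2)-(2.3):

    import scipy.sparse as sp

    def seven_point_matrix(nx, ny, nz, v=(1.0, 1.0, 1.0)):
        # Uniform mesh on the unit cube; unknowns ordered with k fastest,
        # a zero-based analogue of the (k,i,j) ordering above.
        hx, hy, hz = 1.0/nx, 1.0/ny, 1.0/nz
        def idx(i, j, k):
            return k + i*nz + j*nz*nx
        A = sp.lil_matrix((nx*ny*nz, nx*ny*nz))
        for j in range(ny):
            for i in range(nx):
                for k in range(nz):
                    m = idx(i, j, k)
                    # -Laplacian: 2/h^2 on the diagonal, -1/h^2 off it;
                    # centered convection adds -/+ v/(2h), which is what
                    # makes the matrix non-symmetric.
                    A[m, m] = 2.0/hx**2 + 2.0/hy**2 + 2.0/hz**2
                    if i > 0:    A[m, idx(i-1, j, k)] = -1.0/hx**2 - v[0]/(2*hx)
                    if i < nx-1: A[m, idx(i+1, j, k)] = -1.0/hx**2 + v[0]/(2*hx)
                    if j > 0:    A[m, idx(i, j-1, k)] = -1.0/hy**2 - v[1]/(2*hy)
                    if j < ny-1: A[m, idx(i, j+1, k)] = -1.0/hy**2 + v[1]/(2*hy)
                    if k > 0:    A[m, idx(i, j, k-1)] = -1.0/hz**2 - v[2]/(2*hz)
                    if k < nz-1: A[m, idx(i, j, k+1)] = -1.0/hz**2 + v[2]/(2*hz)
        return A.tocsr()

On a 15x15x30 mesh this produces a system of order 6750 with at most seven non-zeros per row, consistent with the density quoted in section 1.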


One class of methods splits A into its symmetric and skew-symmetric parts, forming M = ½(A + Aᵀ) and N = ½(Aᵀ - A), so that M = Mᵀ, N = -Nᵀ, and Ax = f becomes Mx = Nx + f. The form of the matrix M, it is hoped, is such that systems of the form My = b are easy to solve. If this is not the case, then one must resort to a scheme in which an inner and an outer iteration are used: the inner iteration solves the systems involving M, and the outer iteration finds the solution to the original problem.

In a paper by Manteuffel [9], a method is described based on Tchebychev polynomials in the complex plane. For the solution of the linear system Ax = f the following iterative method can be used:

    x_{i+1} = -α_i Ax_i + (1 + β_i)x_i - β_i x_{i-1} + α_i f.

Manteuffel shows that this method converges to the solution of Ax = f if the spectrum of A is enclosed in an ellipse, with foci d - c and d + c, in the right half plane, and if α_i and β_i are chosen to be

    α_i = 2T_i(d/c) / (c T_{i+1}(d/c)),    β_i = T_{i-1}(d/c) / T_{i+1}(d/c),

where T_i is the i-th Tchebychev polynomial

    T_i(z) = cosh(i cosh⁻¹(z))

for z complex. Manteuffel [10] has provided an algorithm for estimating the parameters d and c adaptively.
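The recurrence is short enough to state in code. The sketch below is ours, with fixed real parameters d and c supplied by the user (Manteuffel's adaptive estimation is not reproduced); it tracks the Tchebychev values through the ratio T_{i-1}(d/c)/T_i(d/c), which is algebraically identical to the α_i, β_i above but avoids overflow of the exponentially growing T_i:

    import numpy as np

    def tchebychev(A, f, x0, d, c, maxiter=200, tol=1e-10):
        # Assumes the spectrum of A lies in the ellipse with foci d-c, d+c
        # in the right half plane, with d, c real and d/c > 1.
        z = d / c
        x_old = x0
        x = x0 + (1.0/d) * (f - A @ x0)      # first step: residual poly 1 - t/d
        ratio = 1.0 / z                       # T_0(z) / T_1(z)
        for _ in range(1, maxiter):
            r = f - A @ x
            if np.linalg.norm(r) <= tol * np.linalg.norm(f):
                break
            denom = 2.0*z - ratio             # T_{i+1}(z) / T_i(z)
            alpha = 2.0 / (c*denom)           # alpha_i = 2 T_i / (c T_{i+1})
            beta = ratio / denom              # beta_i  = T_{i-1} / T_{i+1}
            x, x_old = (1.0 + beta)*x - beta*x_old + alpha*r, x
            ratio = 1.0 / denom               # becomes T_i / T_{i+1}
        return x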

In order to improve the speed of convergence of the algorithm, one may use a preconditioning matrix and solve the resulting system. Results involving a preconditioning of Manteuffel's algorithm are given by van der Vorst and van Kats [11]. They considered the use of a fast Poisson solver, an incomplete Crout factorization (with a parameter included to avoid partial pivoting), and an incomplete Cholesky factorization as preconditioners. Fill-in on stripes adjacent to the bands in the original matrix is also allowed in the Cholesky approach. Results for a discretization of a convective-diffusion equation in two space dimensions are provided. Van der Vorst and van Kats conclude that Manteuffel's algorithm with incomplete Crout preconditioning provides a competitive approach.

In two papers [12,13] Axelsson has studied iterative methods for the numerical solution of boundary value problems. In particular he extends the use of conjugate gradient methods to nonlinear problems and to constrained boundary value problems. In [13] he studies various incomplete factorizations, i.e. LU factorizations which approximate A. If, during factorization, we allow no fill-in or accept fill-in only in certain entries, we might expect to obtain a good incomplete factorization. However, it is possible that the factorization process may be unstable, since the dropping of fill-in may cause pivot elements to become negative (assuming the matrix was originally positive definite). In [14] Meijerink and van der Vorst show that for M-matrices the process is in fact stable. This is, however, not the case in general. To overcome this problem Gustafsson [15] presents a modification in which neglected fill-in elements are added to the diagonal. He proves this method to be stable for diagonally dominant matrices (although the method is again unstable for general matrices). Munksgaard and Axelsson [16] extend this approach by adding neglected elements to the diagonal or to other elements in the row. A major property of this technique is that row sums are preserved. This is important since for second order Laplace problems using constant mesh spacing h [15] the condition number of A is O(h⁻²) whereas the


condition number of the preconditioned matrix is O(h⁻¹). This result holds if the row sums of A and the preconditioning matrix differ by no more than O(h). Thus by preserving row sums we reduce the condition number of the preconditioned matrix, which, in turn, reduces the number of conjugate gradient iterations. In addition to the above preselection of fill-in elements, Munksgaard and Axelsson [16] have also considered the use of dynamic fill-in. In this approach fill-in elements are selected during factorization based on their magnitude.

Another approach to dealing with the instability of incomplete decomposition is presented by Kershaw [5]. Kershaw identifies the unstable pivots as the decomposition proceeds, and these pivots are perturbed to create a stable approximate factorization. He characterizes these pivots in terms of the number of significant digits available on the machine and determines a correction that provides a "best possible" inverse. A bound on the resulting error matrix is also provided. His approach is applicable to both complete and incomplete factorizations. Finally, Kershaw shows that for a complete factorization, if p pivots have been perturbed by his procedure, then with exact arithmetic only 2p + 1 conjugate gradient iterations are needed.

When considering splittings of the matrix A = M - N to be used with conjugate gradient algorithms, we would like to have a criterion for selecting a "best" splitting. In [3] Greenbaum considers splittings when A is positive definite and symmetric. A sharp upper bound on the error is given, and a comparison test is presented for determining when one splitting always yields a smaller error than another splitting.

In [17] Greenbaum considers the effect of finite precision arithmetic on the conjugate gradient algorithm. She shows that the effect of finite precision versus exact arithmetic is to cause the finite precision conjugate gradient algorithm to periodically resemble the exact arithmetic algorithm associated with an earlier step and a different starting point.

4. BACKGROUND TO CONJUGATE GRADIENT AND PRECONDITIONING

The method of conjugate gradients has been known for some time [7]. In theory the algorithm terminates after at most n steps when the matrix, say A, of order n is symmetric positive definite. Much of the early interest in the method was tarnished because the finite termination of the algorithm no longer holds in the presence of roundoff errors, and as a direct method the algorithm is not competitive with Gaussian elimination with respect to either accuracy or number of operations. The algorithm has a simple form and can be easily translated into a computer program. A conjugate gradient algorithm can be stated as follows:

Given a system Ax = b of n linear equations, where the matrix A is symmetric positive definite, and an initial vector x_0, form the corresponding residual

    r_0 = b - Ax_0.

Set p_0 = r_0 and for i = 0, 1, 2, ... find x_{i+1}, r_{i+1}, p_{i+1}, a_i, and b_i using the equations

    a_i = ||r_i||² / p_iᵀAp_i,
    x_{i+1} = x_i + a_i p_i,
    r_{i+1} = r_i - a_i Ap_i,                                      (4.1)
    b_i = ||r_{i+1}||² / ||r_i||²,
    p_{i+1} = r_{i+1} + b_i p_i.

Termination is controlled by monitoring r_i or p_i.
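Written out, (4.1) translates almost line for line into code; the routine below is a generic sketch, not the report's implementation:

    import numpy as np

    def conjugate_gradient(A, b, x0, tol=1e-13, maxiter=1000):
        # Basic C-G for a symmetric positive definite A, following (4.1).
        x = x0.copy()
        r = b - A @ x                  # r_0 = b - A x_0
        p = r.copy()                   # p_0 = r_0
        rr = r @ r
        for _ in range(maxiter):
            if np.sqrt(rr) <= tol * np.linalg.norm(b):
                break
            Ap = A @ p
            a = rr / (p @ Ap)          # a_i = ||r_i||^2 / p_i^T A p_i
            x = x + a * p              # x_{i+1} = x_i + a_i p_i
            r = r - a * Ap             # r_{i+1} = r_i - a_i A p_i
            rr_new = r @ r
            bi = rr_new / rr           # b_i = ||r_{i+1}||^2 / ||r_i||^2
            p = r + bi * p             # p_{i+1} = r_{i+1} + b_i p_i
            rr = rr_new
        return x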

Some of the basic properties of the C-G method are [7]:

(i) r_iᵀp_j = 0, i > j;

(ii) p_iᵀAp_j = 0, i ≠ j;

(iii) when p_0 = r_0, then r_iᵀr_j = 0, i ≠ j;

(iv) when p_0 = r_0, the sets {p_j, j = 0,...,i} and {r_j, j = 0,...,i} each form a basis for S_i = span{r_0, Ar_0, ..., Aⁱr_0}, and the residual vector r_i minimizes the quadratic form Q(r) = rᵀA⁻¹r over all vectors of the form r = r_0 + As, s ∈ S_{i-1}. Since S_{i-1} ⊂ S_i, we see that Q(r_i) will monotonically decrease as i increases.

When dealing with very large and sparse matrices, C-G has a very attractive property from an algorithmic standpoint: only inner products need be performed. The inner product calculations involve a row of the matrix and a full vector, and when the matrix is sparse the amount of work is diminished substantially.

Another attractive feature of the C-G method is that, unlike some of the other iterative methods for finding solutions to linear systems, one does not need an estimate of the extreme eigenvalues or other parameters for convergence.

In order to accelerate the convergence of the C-G method, a widely used technique is to precondition the matrix A in such a way as to cluster the eigenvalues or, said another way, to produce a matrix which is in some sense close to the identity matrix.

We turn now to the use of preconditioning techniques combined with C-G applied to non-symmetric matrices. We first observe from the C-G algorithm (4.1) that if A = I, then the C-G method converges in one step. In this case we have

    r_0 = b - x_0,
    p_0 = r_0,                                                     (4.2)
    a_0 = ||r_0||² / ||r_0||² = 1,
    x_1 = x_0 + p_0 = x_0 + b - x_0 = b.

Thus, intuitively, we will converge rapidly when the coefficient matrix is close, in some sense, to the identity matrix. This is the guiding principle in the use of preconditioning.

Consider a linear system

    Bx = f,                                                        (4.3)

where B is not necessarily symmetric but is well conditioned. Suppose we have an approximate factorization

    B ≈ LU;                                                        (4.4)

then the following relations are immediate:


    B(LU)⁻¹ ≈ I,      (LU)⁻ᵀBᵀ ≈ I,                               (4.5)

    (LU)⁻¹B ≈ I,      Bᵀ(LU)⁻ᵀ ≈ I,                               (4.6)

    L⁻¹BU⁻¹ ≈ I,      U⁻ᵀBᵀL⁻ᵀ ≈ I.                               (4.7)

These three sets of relations can be used to generate six variations of a preconditioned C-G method for non-symmetric matrices.

For the first three variations we start from the system of the form

    BᵀBx = Bᵀf.                                                    (4.8)

Our goal is to derive a system Ay = g where A = DᵀD with D ≈ I. We observe that equation (4.8) is equivalent to the system

    (LU)⁻ᵀBᵀB(LU)⁻¹ LUx = (LU)⁻ᵀBᵀf.

Hence, setting

    A = (LU)⁻ᵀBᵀB(LU)⁻¹ = [B(LU)⁻¹]ᵀ[B(LU)⁻¹],                    (4.9)
    y = LUx,
    g = (LU)⁻ᵀBᵀf,

we have the system

    Ay = g,

with A ≈ I and symmetric positive definite.

The second variation uses relation (4.6) and starts from equation (4.3) by multiplying by (LU)⁻¹. Then

    (LU)⁻¹Bx = (LU)⁻¹f,

from which we find

    Bᵀ(LU)⁻ᵀ(LU)⁻¹Bx = Bᵀ(LU)⁻ᵀ(LU)⁻¹f.

Setting

    A = Bᵀ(LU)⁻ᵀ(LU)⁻¹B = [(LU)⁻¹B]ᵀ[(LU)⁻¹B],
    y = x,                                                         (4.10)
    g = Bᵀ(LU)⁻ᵀ(LU)⁻¹f,

we have a system Ay = g of the desired type.

The third variation starts with equation (4.3) and uses (4.7) to obtain

    U⁻ᵀBᵀL⁻ᵀL⁻¹BU⁻¹ Ux = U⁻ᵀBᵀL⁻ᵀL⁻¹f.

Thus, setting

    A = U⁻ᵀBᵀL⁻ᵀL⁻¹BU⁻¹ = [L⁻¹BU⁻¹]ᵀ[L⁻¹BU⁻¹],                    (4.11)
    y = Ux,
    g = U⁻ᵀBᵀL⁻ᵀL⁻¹f,

we have a system of the desired type. This variation was suggested in the appendix of [5].

The last three variations are based on the equations

    BBᵀz = f,    x = Bᵀz.                                          (4.12)

We are drawn to this formulation by an error analysis of equation (4.12) by Paige [19]. In the analysis, Paige shows that the square of the condition number of B does not enter into the error bounds, only the condition number itself.

In this case, our goal is to obtain a system Ay = g where A = DDᵀ with D ≈ I. Starting with Bx = f, it follows that

    B(LU)⁻¹ LUx = f,

and, setting

    LUx = (LU)⁻ᵀBᵀy,

we have

    B(LU)⁻¹(LU)⁻ᵀBᵀy = f.

Hence, setting

    A = B(LU)⁻¹(LU)⁻ᵀBᵀ = [B(LU)⁻¹][B(LU)⁻¹]ᵀ,
    x = (LU)⁻¹(LU)⁻ᵀBᵀy,                                           (4.13)
    g = f,

we have a system Ay = g.

Next, starting with equation (4.12) and using relation (4.6), we find

    (LU)⁻¹BBᵀ(LU)⁻ᵀ(LU)ᵀz = (LU)⁻¹f.

Thus we can set

    A = (LU)⁻¹BBᵀ(LU)⁻ᵀ = [(LU)⁻¹B][(LU)⁻¹B]ᵀ,                    (4.14)
    y = (LU)ᵀz,    x = Bᵀz,
    g = (LU)⁻¹f,

to obtain a system of the desired type.

Finally, starting with Bx = f and having relation (4.7) in mind, we find

    L⁻¹BU⁻¹ Ux = L⁻¹f;

then, setting Ux = U⁻ᵀBᵀL⁻ᵀy, we find

    L⁻¹BU⁻¹U⁻ᵀBᵀL⁻ᵀy = L⁻¹f.

Setting

    A = L⁻¹BU⁻¹U⁻ᵀBᵀL⁻ᵀ = [L⁻¹BU⁻¹][L⁻¹BU⁻¹]ᵀ,
    g = L⁻¹f,                                                      (4.15)
    x = U⁻¹U⁻ᵀBᵀL⁻ᵀy,

we have a system Ay = g of the desired type.

For each of the six variants, the basic C-G algorithm (4.1) can be applied. For the first three variants A = DᵀD, while for the last three A = DDᵀ, where the matrix D is the preconditioned matrix of the form B(LU)⁻¹, (LU)⁻¹B, or L⁻¹BU⁻¹. To summarize, when we apply the C-G algorithm to each of these variants, we obtain the following algorithms.


                       I                    II                   III

     D            B(LU)⁻¹              (LU)⁻¹B              L⁻¹BU⁻¹
     y            y = LUx              y = x                y = Ux
     g            g = Dᵀf              g = Dᵀ(LU)⁻¹f        g = DᵀL⁻¹f

     Initial      y_0 = LUx_0          y_0 = x_0            y_0 = Ux_0
     phase        r_0 = f - Bx_0       r_0 = f - Bx_0       r_0 = f - Bx_0
                  R_0 = Dᵀr_0          R_0 = Dᵀ(LU)⁻¹r_0    R_0 = DᵀL⁻¹r_0
                  p_0 = R_0            p_0 = R_0            p_0 = R_0

     Iterative    a_i = ||R_i||² / ||Dp_i||²
     phase        y_{i+1} = y_i + a_i p_i
                  R_{i+1} = R_i - a_i DᵀDp_i
                  b_i = ||R_{i+1}||² / ||R_i||²
                  p_{i+1} = R_{i+1} + b_i p_i

     Final
     phase        x = (LU)⁻¹y          x = y                x = U⁻¹y

                                  Table 4.1

                       IV                   V                    VI

     D            B(LU)⁻¹              (LU)⁻¹B              L⁻¹BU⁻¹
     y            y = z                y = (LU)ᵀz           y = z
     g            g = f                g = (LU)⁻¹f          g = L⁻¹f
     x            x = (LU)⁻¹(LU)⁻ᵀBᵀy  x = Bᵀ(LU)⁻ᵀy        x = U⁻¹U⁻ᵀBᵀL⁻ᵀy

     Initial      y_0 = x_0            y_0 = x_0            y_0 = x_0
     phase        R_0 = g - DDᵀy_0     R_0 = g - DDᵀy_0     R_0 = g - DDᵀy_0
                  p_0 = R_0            p_0 = R_0            p_0 = R_0

     Iterative    a_i = ||R_i||² / ||Dᵀp_i||²
     phase        y_{i+1} = y_i + a_i p_i
                  R_{i+1} = R_i - a_i DDᵀp_i
                  b_i = ||R_{i+1}||² / ||R_i||²
                  p_{i+1} = R_{i+1} + b_i p_i

     Final
     phase        x = (LU)⁻¹(LU)⁻ᵀBᵀy  x = Bᵀ(LU)⁻ᵀy        x = U⁻¹U⁻ᵀBᵀL⁻ᵀy

                                  Table 4.2

We note that when the algorithms are written in this form, the iterative portion of each algorithm is the same. Also note that the last three variants operate in the y-space and thereby require an initial estimate y_0. If these latter three algorithms were imbedded in some larger iterative procedure based in the x-space, the y-vector would have to be stored between calls in order to use the previous time step solution as an initial approximation for the next time step. We also observe that for the last three variants we have equated a guess x_0 in the x-space to a guess y_0 in the y-space, ignoring the fact that x_0 ≠ y_0; they are in fact related by the expressions shown in the final phase of Table 4.2.

Considering the first three algorithms in Table 4.1, we observe that in each case five vectors are needed: the given vectors x and f together with the three auxiliary vectors r, p, and s. In terms of computational work we see that the first three variants differ only in the final phase, with variant II having the least work. For the last three variants in Table 4.2, we observe that in general we will have to keep six vectors: x and f together with the four auxiliary vectors y, r, p, and s. The last three variants do differ in the work involved in the initial and final phases. However, multiplying a vector by the matrices L⁻¹ or U⁻¹ involves only O(n) operations, so all three variants involve essentially the same amount of work.

5. PROVIDING FILL-IN FOR INCOMPLETE FACTORIZATION

The incomplete factorization that has been described in section 4 allowed storage for elements of L and U in the same positions as in the original matrix A. No additional elements were allowed to be formed during the decomposition. If this condition is relaxed, and fill-in is allowed to occur in positions other than those occupied by the original matrix, a family of factorizations can be produced [4].

Intuitively, it is hoped that the approximate factors L and U will more closely approximate the "true" factors of the original matrix as more and more fill-in is provided. Thus DᵀD or DDᵀ will more closely approximate the identity, and fewer iterations will be expected in order to obtain a solution. On the other hand, as the fill-in increases, the computation time per iteration increases, so that the overall computation time may not decrease even though the number of iterations is reduced. When dealing with fill-in, there are three aspects to be considered: first, the amount of fill-in; second, the location of the fill-in; and third, the nature of the fill-in. We have restricted our fill-in to occur on diagonal stripes adjacent to the original pattern, and with this fill-in pattern we have considered the effect of providing additional fill-in stripes. (In our approach, we have provided additional storage for fill-in to occur on these diagonal stripes.)
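In a modern setting the same experiment can be run with scipy's spilu, which is threshold-based rather than stripe-based (a substitution on our part) but exposes the identical trade-off through its fill_factor and drop_tol parameters:

    import scipy.sparse.linalg as spla

    def ilu_family(B, fill_factors=(1.0, 2.0, 5.0)):
        # Larger fill_factor keeps more fill-in: L and U then approximate the
        # true factors better (fewer C-G iterations), but the factorization
        # and each triangular solve cost more per iteration.
        return {ff: spla.spilu(B.tocsc(), fill_factor=ff, drop_tol=1e-4)
                for ff in fill_factors}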

6. ERROR ANALYSIS

The iterative phase of the above algorithms is centered around the computation of f = DᵀDp or f = DDᵀp, where D is one of the following three quantities: A(LU)⁻¹, (LU)⁻¹A, or L⁻¹AU⁻¹. In examining the behavior of the algorithm it is important to understand the nature of the errors made in this computation.

We will concentrate on the equation f = DᵀDp where D = A(LU)⁻¹; the analysis follows in a similar fashion for the other definitions of D and f. We let B = LU, and the equation for f can be written as

    f = B⁻ᵀAᵀAB⁻¹p.

We are interested in determining how the computed f, say f̄, is affected by the condition of B.

We will perform the following operations to compute f:


    Bp_1 = p       (solve for p_1)
    p_2 = Ap_1     (form p_2)
    p_3 = Aᵀp_2    (form p_3)
    Bᵀf = p_3      (solve for f).
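Coded directly (a sketch, with L and U taken to be the incomplete factors stored as sparse CSR triangular matrices), the four steps read:

    import scipy.sparse.linalg as spla

    def apply_DtD(A, L, U, p):
        # B p1 = p : solve for p1 (B = LU, so L z = p, then U p1 = z)
        z  = spla.spsolve_triangular(L, p, lower=True)
        p1 = spla.spsolve_triangular(U, z, lower=False)
        p2 = A @ p1                      # p2 = A p1    : form p2
        p3 = A.T @ p2                    # p3 = A^T p2  : form p3
        # B^T f = p3 : solve for f (B^T = U^T L^T, so U^T y = p3, then L^T f = y)
        y = spla.spsolve_triangular(U.T.tocsr(), p3, lower=True)
        return spla.spsolve_triangular(L.T.tocsr(), y, lower=False)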

There are four sources of error, one in each of the above steps. Using a Wilkinson [18] rounding error analysis, we know that the computed solution f̄ will satisfy

    (B + E_1)p_1 = p,
    p_2 = (A + E_2)p_1,
    p_3 = (Aᵀ + E_3)p_2,
    (Bᵀ + E_4)f̄ = p_3,

where ||E_i|| ≤ ε_i||A|| for i = 2 and 3, and ||E_i|| ≤ ε_i||B|| for i = 1 and 4.

The error bound for f̄ can be stated as:

Theorem: If ||A|| ≤ α||B||, then ||f̄ - f|| / ||p|| ≤ 6ε̄κα||AB⁻¹|| + 9ε̄²κ²α², where κ = ||B|| ||B⁻¹|| and ε̄ reflects the machine precision.

Proof:

The computed f̄ can be written as

    f̄ = (B⁻ᵀ + F_1)(Aᵀ + E_3)(A + E_2)(B⁻¹ + F_2)p,

where ||E_i|| ≤ ε_i||A||, ||F_i|| ≤ ε_i||B⁻¹||, and ε_i ≤ g(n)ε, with g(n) a modest function of n, the order of the matrix, and ε the machine precision. Expanding, we get

    f̄ = (B⁻ᵀAᵀ + B⁻ᵀE_3 + F_1Aᵀ + F_1E_3)(AB⁻¹ + AF_2 + E_2B⁻¹ + E_2F_2)p.

If we let

    H = AB⁻¹,
    G_1 = B⁻ᵀE_3 + F_1Aᵀ + F_1E_3,
    G_2 = AF_2 + E_2B⁻¹ + E_2F_2,

then

    f̄ = (Hᵀ + G_1)(H + G_2)p.

We can now determine bounds for G_1 and G_2. If we assume ||A|| ≤ α||B||, then

    ||G_1|| ≤ ε_3||A|| ||B⁻¹|| + ε_4||Aᵀ|| ||B⁻¹|| + ε_3ε_4||A|| ||B⁻¹||
           ≤ (2ε̄ + ε̄²)κα ≤ 3ε̄κα,

    ||G_2|| ≤ ε_2||A|| ||B⁻¹|| + ε_1||A|| ||B⁻¹|| + ε_1ε_2||A|| ||B⁻¹||
           ≤ (2ε̄ + ε̄²)κα ≤ 3ε̄κα,

where κ = ||B|| ||B⁻¹|| and ε̄ = max_i ε_i. Then we can write the error made in computing f as

    ||f̄ - f|| / ||p|| ≤ 6ε̄κα||H|| + 9ε̄²κ²α².

If ε̄κ < 1, then the order of magnitude of the error bound is the same as that for the direct solution. Whenever the square of the condition number occurs in the error bound for the solution, it is effectively multiplied by the square of the precision or something smaller. The quantity ||H|| is a reflection of how effective the preconditioner is; it is expected to be close to order unity.

7. RESULTS

The matrices used in this study were generated from the finite difference equations described in section 2. We chose the domain to be the unit cube (L_x = L_y = L_z = 1), and throughout the study we used the following data:

    V_x(x,y,z) = 800x(1-x)y(1-y)z R_x(x,y,z),
    V_y(x,y,z) = 800x(1-x)y(1-y)z R_y(x,y,z),                      (7.1)
    V_z(x,y,z) = 4xyz²,

where R_x and R_y are given factors. When R_x = R_y = 1, the problem type will be referred to as non-rotational, while the other case will be referred to as rotational. For our problem the absorption distribution σ(x,y,z) is zero, and the source distribution is F(x,y,z) = xyz. For the boundary conditions, when Dirichlet conditions are used,

    G_L(x,y,0) = 1   and   G_U(x,y,1) = 2.

We also allow Neumann conditions to be imposed on either the top or the bottom or both. When Neumann conditions are used on both the top and bottom, the resulting singular matrix is modified by zeroing out the first row and column and placing a constant on the diagonal. This corresponds to fixing the value of the solution in the first mesh cell.

Within the class of matrices generated in this manner, we have the following parameters at our disposal:

a) Size of the mesh, i.e. the order of the matrix.


b) Combination of boundary conditions used; for example, Dirichlet top and bottom, Dirichlet top and Neumann bottom, Neumann top and bottom, etc.

c) Use of a rotational or non-rotational velocity field.

In this numerical study we have focused our attention on the following three areas:

1. Differences in the behavior of the six variants discussed in section 4.

2. Effects of fill-in along stripes on the computational times for achieving a solution.

3. Comparison of these six variants with an algorithm designed and implemented by Manteuffel [9,10] based on the use of Tchebychev iteration for non-symmetric linear systems.

To illustrate the differences among the six variants, we used a problem based on a 7x7x7 mesh with no rotation in the velocity field. We used two types of boundary conditions: the first case used Dirichlet conditions on the top and bottom, while the second case used Neumann conditions on the top and bottom with the solution value fixed in the first mesh cell. In both cases we compared the six variants using no fill-in and using one inner and one outer stripe for fill-in. The results are shown in Tables 7.1 and 7.2. (The computations were run on a VAX 11/780 under UNIX, and the timings are reported in seconds. For all the tables, D-D refers to Dirichlet conditions imposed on the top and bottom, and N-N refers to Neumann conditions imposed on the top and bottom.)

             no fill-in            one inner and one outer
             (storage 2287)        stripe of fill-in (storage 3551)
  variant    iterations   time     iterations   time
     1           40        31          32        30
     2           36        27          28        27
     3           46        33          36        33
     4           39        29          31        30
     5           35        27          27        27
     6           45        32          35        33

                            Table 7.1
                               D-D

             no fill-in            one inner and one outer
             (storage 2287)        stripe of fill-in (storage 3551)
  variant    iterations   time     iterations   time
     1           62        43          50        46
     2           50        36          42        38
     3           70        48          57        50
     4           60        41          49        44
     5           48        35          41        38
     6           66        4           53        48

                            Table 7.2
                               N-N

In all cases we see that the performance of the 2nd and 5th variants is the best, while the 3rd and 6th variants perform the worst. Moreover, the variations in performance are significant, with a reduction factor of at least 0.75 in all cases.

Using the same two test problems, we studied the effects of fill-in on the number of iterations and ultimately on the iteration time. The results for the 2nd variant are shown in Tables 7.3 and 7.4, which give the number of iterations needed to achieve convergence for a given pair (m,n), where m is the number of inner fill-in stripes and n is the number of outer fill-in stripes. The results are surprising and discouraging: they show that the number of iterations needed for convergence is extremely insensitive to the number of fill-in stripes. Clearly there is no point in considering fill-in beyond the (1,1) pattern when considering iteration time. From Tables 7.5 and 7.6 it is clear that for these cases there is no point in using any fill-in.

                    number of inner stripes
  outer      0     1     2     3     4     5
    0       36    34    34    34    34    33
    1       34    28    29    29    29    29
    2       34    29    28    29    29    29
    3       34    29    29    29    29    29
    4       34    29    29    29    29    29
    5       34    29    29    29    29    29

                            Table 7.3
            Number of iterations for D-D, 2nd variant

                    number of inner stripes
  outer      0     1     2     3     4     5
    0       50    47    47    47    47    47
    1       47    42    42    42    42    43
    2       47    42    42    42    43    43
    3       47    42    42    43    43    43
    4       47    42    42    43    43    43
    5       47    42    42    42    42    43

                            Table 7.4
            Number of iterations for N-N, 2nd variant

One observation made in the course of these experiments is that although the number of iterations decreases as the amount of fill-in allowed in the matrix increases, the overall time to solve the problem never decreases. This can be understood by looking at the iterative portion of the algorithm, which is based on matrix-vector products: as the matrix is allowed to have more non-zero elements, the time to do a matrix-vector product increases. This can be seen in Tables 7.5 and 7.6.


                number of inner stripes
  outer      0     1
    0       27    29
    1       29    27

                            Table 7.5
            Iteration time for D-D, 2nd variant

                number of inner stripes
  outer      0     1
    0       36    38
    1       37    38

                            Table 7.6
            Iteration time for N-N, 2nd variant

We have compared the method as described above with a non-symmetric linear equation solver based on the use of Tchebychev polynomials in the complex plane. The Tchebychev iteration is based on the work of Manteuffel [9,10] and is adaptive in its estimation of the optimal iteration parameters. This method will converge whenever the spectrum of A can be enclosed in an ellipse that does not contain the origin. It can be shown that the Tchebychev iteration is optimal over all other polynomial-based methods. The software to do the Tchebychev iteration was provided by Manteuffel.

In the comparison we took a finite difference mesh of 15x15x30 mesh cells, which led to a system of order 6750 with 46288 non-zero coefficients in the matrix, for a density of about 0.1 per cent. In the first case we used Dirichlet conditions on top and bottom (D-D), and in the second case we used Neumann conditions on top and bottom (N-N), with a point fixed to ensure a unique solution as before. The preconditioner was the same in both cases.

                          method
  conditions         CG              Tchebychev
     D-D           65 min             51 min
                  168 iter           144 iter
     N-N           82 min            346 min
                  248 iter          2686 iter

                            Table 7.7
                 Comparison of CG vs. Tchebychev

Table 7.7 presents results for the total computation time and iterations. When considering time, note that the Tchebychev method requires periodic reevaluation of the eigenvalue estimates. In Figure 7.1, graphs of the iteration count versus the residual are given. Figure 7.1a gives results for D-D and Figure 7.1b gives results for N-N. In Figure 7.1c the N-N results are graphed for just the first 400 iterations. Note that the CG residual is not a monotonically decreasing function, since the only residual guaranteed to decrease is R_i for the second variant in Table 4.2 and the graph deals with the residual in equation (4.1).


[Figure 7.1a: iterations vs. residual, D-D. Figure 7.1b: iterations vs. residual, N-N. Figure 7.1c: iterations vs. residual, N-N, first 400 iterations only.]

With both methods we required the residual to be less than 10⁻¹³ before convergence was signaled. The results from this problem show that the CG variant used (variant II of Table 4.1) is competitive with the Tchebychev iteration for



Dirichlet conditions. When Neumann conditions are imposed, the Tchebychev approach is very slow to converge. (It should be noted that the default parameters were used in the Tchebychev code.) If no information is known about the problem, the CG variant would be the better choice.

    8. REFERENCES

1. F.H. Harlow and A.A. Amsden, Flow of Interpenetrating Material Phases, J. Comp. Phys. 18, 440-464 (1975).

2. V.L. Shah, et al., Some Numerical Results with the COMMIX-2 Computer Code, Technical Memorandum ANL-CT-79-30 (1979).

    3. A. Greenbaum, Comparison of Splittings Used with the Conjugate GradientAlgorithm, Numer. Math. 33, 181-194 (1979).

4. O. Axelsson and N. Munksgaard, A Class of Preconditioned Conjugate Gradient Methods for the Solution of a Mixed Finite Element Discretization of the Biharmonic Operator, Inter. J. for Num. Meth. Eng. 14, 1001-1019 (1979).

5. D. Kershaw, On the Problem of Unstable Pivots in the Incomplete LU-Conjugate Gradient Method, J. Comp. Phys. 38, 114-123 (1980).

6. J. Daniel, The Approximate Minimization of Functionals, Prentice-Hall, 1971.

    7. M.R. Hestenes and E. Stiefel, Methods of Conjugate Gradients for SolvingLinear Systems, NBS J. Res. 49, 409-436 (1952).

8. P. Concus, G. Golub, and D. O'Leary, A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations, in Sparse Matrix Computations, Ed. J. Bunch and D. Rose, Academic Press (1976).

9. T. Manteuffel, The Tchebychev Iteration for Nonsymmetric Linear Systems, Numer. Math. 28, 307-327 (1977).

10. T. Manteuffel, Adaptive Procedure for Estimating Parameters for the Nonsymmetric Tchebychev Iteration, Numer. Math. 31, 183-208 (1978).

11. H.A. van der Vorst and J.M. van Kats, Manteuffel's Algorithm with Preconditioning for the Iterative Solution of Certain Sparse Linear Systems with a Nonsymmetric Matrix, Academisch Computer Centrum Report TR-11, Utrecht, The Netherlands, August 1979.

12. O. Axelsson, On Optimization Methods in the Numerical Solution of Boundary Value Problems: A Survey, Univ. of Texas Center for Numerical Analysis Report CNA-137, Austin, Texas, 1978.

13. O. Axelsson, Conjugate Gradient Type Methods for Unsymmetric and Inconsistent Systems of Linear Equations, Linear Algebra and its Applications 29, 1-16 (1980).

14. J.A. Meijerink and H.A. van der Vorst, An Iterative Solution Method for Linear Systems of which the Coefficient Matrix is a Symmetric M-Matrix, Math. of Comp. 31, 148-162 (1977).

15. I. Gustafsson, A Class of First Order Factorization Methods, BIT 18, 142-156 (1978).

16. N. Munksgaard and O. Axelsson, Analysis of Incomplete Factorizations with Fixed Storage Allocation, submitted to SIAM Jour. on Scientific and Statistical Computing.

17. A. Greenbaum, Behavior of the Conjugate Gradient Algorithm in Finite Precision Arithmetic, Lawrence Livermore Laboratory Report UCRL-85752, March 1981.

18. J.H. Wilkinson, Rounding Errors in Algebraic Processes, Notes on Applied Science No. 32, Her Majesty's Stationery Office, London; Prentice-Hall, New Jersey (1963).

19. C.C. Paige, An Error Analysis of a Method for Solving Matrix Equations, Math. Comp. 27, 355-359 (1973).