Direct and Iterative Methods for Sparse Linear Systems
Shirley Moore, svmoore@utep.edu
CPS5401 Fall 2013, svmoore.pbworks.com
November 26, 2013
Learning Objectives
Describe advantages and disadvantages of direct and iterative methods for solving sparse linear systems.
Describe the general methodology underlying the method of Conjugate Gradients (CG).
Apply an appropriate method from a solver library to solve a particular sparse linear system, including both symmetric positive definite and nonsymmetric matrices.
Be able to find and make use of documentation on sparse solver libraries.

Direct vs. Iterative Methods
In a direct method, the matrix of the initial linear system is transformed or factorized into a simpler form that can be solved easily. Ignoring numerical rounding errors, the exact solution is obtained in a finite number of arithmetic operations.
Iterative methods compute a sequence of approximate solutions that converges to the exact solution in the limit. In practice, the iteration stops once a desired accuracy is obtained.
Direct vs. Iterative Methods (cont.)
Direct methods have traditionally been preferred to iterative methods for solving linear systems, mainly because of their simplicity and robustness. However, the emergence of conjugate gradient methods and Krylov subspace iterations has provided an efficient alternative to direct solvers.
Nowadays, iterative methods are almost mandatory in complex applications, mainly because memory and computational requirements prohibit the use of direct methods.
Iterative methods usually involve a matrix-vector multiplication that is cheap to compute on modern computer architectures.
When the matrix A is very large and most of its elements are zero, the LU factorization can contain many more nonzero coefficients than the matrix A itself.
Nonetheless, in some particular applications, very ill-conditioned matrices arise that may require a direct method for solving the problem at hand.
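The cheap matrix-vector product mentioned above can be sketched with compressed sparse row (CSR) storage, where only the nonzero entries are stored and visited. This is a minimal illustration in plain Python; the 4x4 matrix and the names `values`, `col_idx`, `row_ptr`, and `csr_matvec` are assumptions made for the example, not part of the slides or of any solver library.

```python
# CSR storage of the 4x4 matrix
#   [[4, 0, 0, 1],
#    [0, 3, 0, 0],
#    [0, 0, 2, 0],
#    [1, 0, 0, 5]]
# All names below are illustrative, not taken from a library.
values = [4.0, 1.0, 3.0, 2.0, 1.0, 5.0]  # nonzero entries, row by row
col_idx = [0, 3, 1, 2, 0, 3]             # column index of each stored entry
row_ptr = [0, 2, 3, 4, 6]                # row i is values[row_ptr[i]:row_ptr[i + 1]]


def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A x, touching only the stored nonzeros of A."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0])
print(y)
```

The work is proportional to the number of stored nonzeros (6 here) rather than to the 16 entries of the full matrix, which is why each iteration of an iterative method stays inexpensive for sparse A.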
Direct Solvers for Sparse Linear Systems
Direct solvers for sparse matrices involve much more complicated algorithms than those for dense matrices. The main complication is the need to handle the fill-in in the factors L and U efficiently.
A typical sparse solver consists of four distinct steps, as opposed to two in the dense case:
1. An ordering step that reorders the rows and columns so that the factors suffer little fill, or so that the matrix has special structure such as block triangular form.
2. An analysis step, or symbolic factorization, that determines the nonzero structures of the factors and creates suitable data structures for them.
3. A numerical factorization that computes the L and U factors.
4. A solve step that performs forward and back substitution using the factors.
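The effect of the ordering step on fill-in can be seen on a small "arrowhead" matrix, which has a diagonal plus one full row and column. The sketch below is a toy illustration in plain Python using dense LU without pivoting; the helper names `arrowhead`, `lu_packed`, and `count_nonzeros` are hypothetical, and real sparse solvers use far more sophisticated orderings and data structures.

```python
def arrowhead(n, hub):
    """n x n matrix with a diagonal plus a full row and column at index hub."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][hub] = 1.0
        A[hub][i] = 1.0
    for i in range(n):
        A[i][i] = float(n)  # diagonally dominant, so no pivoting is needed
    return A


def lu_packed(A):
    """Doolittle LU without pivoting; L (below diagonal) and U share one array."""
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            if M[i][k] != 0.0:
                M[i][k] /= M[k][k]  # multiplier l_ik
                for j in range(k + 1, n):
                    M[i][j] -= M[i][k] * M[k][j]
    return M


def count_nonzeros(M):
    return sum(1 for row in M for v in row if v != 0.0)


n = 6
hub_first = arrowhead(n, 0)      # full row and column eliminated first
hub_last = arrowhead(n, n - 1)   # same matrix, reordered so the hub comes last

print(count_nonzeros(hub_first), count_nonzeros(lu_packed(hub_first)))
print(count_nonzeros(hub_last), count_nonzeros(lu_packed(hub_last)))
```

Both orderings start with 16 nonzeros. Eliminating the full row and column first fills the factors completely (36 nonzeros), while placing it last produces no fill at all (16 nonzeros).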
Direct Solver Packages
SuperLU: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
SuperLU for sequential machines
SuperLU_MT for shared memory parallel machines
SuperLU_DIST for distributed memory parallel machines
See the survey of direct solvers by Xiaoye Li: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/SparseDirectSurvey.pdf
See also research by Tim Davis: http://www.cise.ufl.edu/~davis/welcome.html
Iterative Methods
Use successive approximations to obtain more accurate solutions to a linear system.
Suitable for large sparse linear systems.
Stationary methods are older and simple to understand and implement, but not as effective. They perform the same operation at each iteration.
Nonstationary methods are more recent and harder to understand, but highly effective. They have iteration-dependent coefficients.
Iterative methods typically use a transformation matrix called a preconditioner that improves convergence of the method.
Reference: Templates book, http://www.netlib.org/linalg/html_templates/report.html

Stationary Methods
Jacobi
Based on solving for every variable locally with respect to the other variables. One iteration of the method corresponds to solving for every variable once. The resulting method is easy to understand and implement, but convergence is slow.
Gauss-Seidel
Like the Jacobi method, except that it uses updated values as soon as they are available. In general, if the Jacobi method converges, the Gauss-Seidel method will converge faster than the Jacobi method, though still relatively slowly.
Successive Overrelaxation (SOR)
Can be derived from the Gauss-Seidel method by introducing an extrapolation parameter ω. For the optimal choice of ω, SOR may converge faster than Gauss-Seidel by an order of magnitude.
Symmetric Successive Overrelaxation (SSOR)
No advantage over SOR as a stand-alone iterative method; however, it is useful as a preconditioner for nonstationary methods.

Nonstationary Methods
Conjugate Gradient (CG)
The conjugate gradient method derives its name from the fact that it generates a sequence of conjugate (or orthogonal) vectors. These vectors are the residuals of the iterates. They are also the gradients of a quadratic functional, the minimization of which is equivalent to solving the linear system. CG is an extremely effective method when the coefficient matrix is symmetric positive definite, since storage for only a limited number of vectors is required.
Minimum Residual (MINRES) and Symmetric LQ (SYMMLQ)
These methods are computational alternatives to CG for coefficient matrices that are symmetric but possibly indefinite. SYMMLQ will generate the same solution iterates as CG if the coefficient matrix is symmetric positive definite.
Conjugate Gradient on the Normal Equations: CGNE and CGNR
These methods are based on the application of the CG method to one of two forms of the normal equations for Ax = b: CGNE solves $(A A^T) y = b$ for y and then computes $x = A^T y$, while CGNR solves $(A^T A) x = A^T b$ for x. When the coefficient matrix A is nonsymmetric and nonsingular, the normal equations matrices $A A^T$ and $A^T A$ will be symmetric and positive definite, and hence CG can be applied. The convergence may be slow, since the spectrum of the normal equations matrices will be less favorable than the spectrum of A.

Nonstationary Methods (cont.)
Generalized Minimal Residual (GMRES)
The Generalized Minimal Residual method computes a sequence of orthogonal vectors (like MINRES) and combines these through a least-squares solve and update. However, unlike MINRES (and CG), it requires storing the whole sequence, so a large amount of storage is needed. For this reason, restarted versions of this method are used. In restarted versions, computation and storage costs are limited by specifying a fixed number of vectors to be generated. This method is useful for general nonsymmetric matrices.
BiConjugate Gradient (BiCG)
The Biconjugate Gradient method generates two CG-like sequences of vectors, one based on a system with the original coefficient matrix $A$ and one on $A^T$. Instead of orthogonalizing each sequence, they are made mutually orthogonal, or "bi-orthogonal". This method, like CG, uses limited storage. It is useful when the matrix is nonsymmetric and nonsingular; however, convergence may be irregular, and there is a possibility that the method will break down. BiCG requires a multiplication with the coefficient matrix and with its transpose at each iteration.
Quasi-Minimal Residual (QMR)
The Quasi-Minimal Residual method applies a least-squares solve and update to the BiCG residuals, thereby smoothing out the irregular convergence behavior of BiCG, which may lead to more reliable approximations. In its full form, it has a look-ahead strategy built in that avoids the BiCG breakdown. Even without look-ahead, QMR largely avoids the breakdown that can occur in BiCG. On the other hand, it does not effect a true minimization of either the error or the residual, and while it converges smoothly, it often does not improve on BiCG in terms of the number of iteration steps.

Nonstationary Methods (cont.)
Conjugate Gradient Squared (CGS)
The Conjugate Gradient Squared method is a variant of BiCG that applies the updating operations for the $A$-sequence and the $A^T$-sequence both to the same vectors. Ideally, this would double the convergence rate, but in practice convergence may be much more irregular than for BiCG, which may sometimes lead to unreliable results. A practical advantage is that the method does not need the multiplications with the transpose of the coefficient matrix.
Biconjugate Gradient Stabilized (Bi-CGSTAB)
The Biconjugate Gradient Stabilized method is a variant of BiCG, like CGS, but uses different updates for the $A^T$-sequence in order to obtain smoother convergence than CGS.
Chebyshev Iteration
The Chebyshev Iteration recursively determines polynomials with coefficients chosen to minimize the norm of the residual in a min-max sense. The coefficient matrix must be positive definite, and knowledge of the extremal eigenvalues is required. This method has the advantage of requiring no inner products.

Conjugate Gradient Method
Popular iterative method for solving large systems of sparse linear equations Ax = b.
A is known, square, symmetric, and positive-definite.
Reference: Shewchuk
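A minimal sketch of the CG iteration for a symmetric positive-definite system, written in plain Python with dense lists so that it is self-contained. The helper names (`matvec`, `dot`, `conjugate_gradient`) and the 2x2 test system are illustrative assumptions, not taken from the slides or from any solver library; in practice one would call a library routine and use a sparse matrix format.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = b[:]              # residual b - A x for the initial guess x = 0
    p = r[:]              # first search direction is the residual
    rs_old = dot(r, r)
    iterations = 0
    while iterations < max_iter and rs_old ** 0.5 > tol:
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                       # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        iterations += 1
    return x, iterations


A = [[4.0, 1.0], [1.0, 3.0]]   # small SPD example; exact solution is [1/11, 7/11]
b = [1.0, 2.0]
x, iterations = conjugate_gradient(A, b)
print(x, iterations)
```

Only a handful of vectors (`x`, `r`, `p`, `Ap`) are kept between iterations, which is the limited storage the method is known for, and the matrix enters only through matrix-vector products.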
Quadratic Form
Quadratic form: $f(x) = \frac{1}{2} x^T A x - b^T x + c$
If A is symmetric and positive-definite, f(x) is minimized by the solution to Ax = b.
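The link between minimizing f and solving the linear system follows from a short gradient computation. The following is a sketch, assuming the standard quadratic form $f(x) = \frac{1}{2} x^T A x - b^T x + c$ and the notation $f'(x)$ for the gradient, as in Shewchuk's notes:

```latex
\begin{align*}
f(x)  &= \tfrac{1}{2} x^T A x - b^T x + c \\
f'(x) &= \tfrac{1}{2} A^T x + \tfrac{1}{2} A x - b \\
      &= A x - b \qquad \text{(since } A^T = A\text{)} \\
f'(x) &= 0 \iff A x = b \\
f(x + e) &= f(x) + \tfrac{1}{2} e^T A e \qquad \text{(when } A x = b\text{)}
\end{align*}
```

The last line shows why positive-definiteness matters: $e^T A e > 0$ for every $e \neq 0$, so the critical point is the unique minimizer rather than a saddle point or a maximum.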
Method of Steepest Descent
Choose the direction in which f decreases the most: the direction opposite to the gradient $f'(x_{(i)})$.
Error: $e_{(i)} = x_{(i)} - x$ is a vector that indicates how far we are from the solution.
Residual: $r_{(i)} = b - A x_{(i)}$ indicates how far we are from the correct value of b.
Think of the residual as the error transformed by A into the same space as b: $r_{(i)} = -A e_{(i)}$.
Think of the residual as the direction of steepest descent: $r_{(i)} = -f'(x_{(i)})$.

Line Search
A line search chooses $\alpha$ to minimize f along a line: $x_{(1)} = x_{(0)} + \alpha r_{(0)}$.
$\alpha$ minimizes f when the directional derivative $\frac{d}{d\alpha} f(x_{(1)}) = f'(x_{(1)})^T r_{(0)}$ is equal to zero.
Setting this to zero, $\alpha$ should be chosen so that $r_{(0)}$ and $f'(x_{(1)})$ are orthogonal.
Bottom figure: f is minimized where the projection of the gradient onto the line is zero.

Summary
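$r_{(1)}^T r_{(0)} = 0$
$(b - A x_{(1)})^T r_{(0)} = 0$
$(b - A (x_{(0)} + \alpha r_{(0)}))^T r_{(0)} = 0$
$(b - A x_{(0)})^T r_{(0)} - \alpha (A r_{(0)})^T r_{(0)} = 0$
$\alpha = \frac{r_{(0)}^T r_{(0)}}{r_{(0)}^T A r_{(0)}}$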
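A single line-search step can be checked numerically. The sketch below assumes the 2x2 example from Shewchuk's notes (A = [[3, 2], [2, 6]], b = [2, -8], c = 0) and the step length $\alpha = r^T r / r^T A r$; the slides do not state which system their figures use, so treat these values and the helper names as assumptions.

```python
A = [[3.0, 2.0], [2.0, 6.0]]   # assumed example system (Shewchuk's notes)
b = [2.0, -8.0]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x, with the constant c taken as 0."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


x0 = [-2.0, -2.0]
r0 = [bi - yi for bi, yi in zip(b, matvec(A, x0))]   # residual = -f'(x0)
alpha = dot(r0, r0) / dot(r0, matvec(A, r0))         # exact line-search step
x1 = [xi + alpha * ri for xi, ri in zip(x0, r0)]
r1 = [bi - yi for bi, yi in zip(b, matvec(A, x1))]


def f_on_line(a):
    """Value of f at x0 + a * r0."""
    return f([xi + a * ri for xi, ri in zip(x0, r0)])


print("alpha =", alpha)
print("r1 . r0 =", dot(r1, r0))                      # zero up to rounding
print(f_on_line(alpha) < f_on_line(alpha - 0.01),
      f_on_line(alpha) < f_on_line(alpha + 0.01))    # alpha is the minimizer
```

The new residual is orthogonal to the old one up to rounding error, and nudging the step length in either direction increases f.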
Starting at [-2,-2]
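The full steepest descent iteration from this starting point can be sketched as follows. The system A = [[3, 2], [2, 6]], b = [2, -8] is the example from Shewchuk's notes and is assumed here, since the slide text does not give the matrix; the function name `steepest_descent` and its tolerance are likewise illustrative.

```python
A = [[3.0, 2.0], [2.0, 6.0]]   # assumed example system; exact solution is [2, -2]
b = [2.0, -8.0]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Repeat: r = b - A x, alpha = r.r / r.A r, x = x + alpha r."""
    x = list(x0)
    path = [list(x)]                       # iterates, for plotting or inspection
    for _ in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        rr = dot(r, r)
        if rr ** 0.5 < tol:                # stop when the residual is small
            break
        alpha = rr / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        path.append(list(x))
    return x, path


x, path = steepest_descent(A, b, [-2.0, -2.0])
print("steps:", len(path) - 1)
print("solution:", x)
```

Each step is orthogonal to the previous one, so the iterates zigzag toward the solution [2, -2]. This needs many more steps than the two that CG would take on a 2x2 system.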