Preconditioning Techniques for a Newton–Krylov Algorithm for the Compressible Navier–Stokes Equations
by
John Gatsis
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Institute for Aerospace Studies
University of Toronto
Copyright © 2013 by John Gatsis
Abstract
PRECONDITIONING TECHNIQUES FOR A NEWTON–KRYLOV ALGORITHM
FOR THE COMPRESSIBLE NAVIER–STOKES EQUATIONS
John Gatsis
Doctor of Philosophy
Graduate Department of Institute for Aerospace Studies
University of Toronto
2013
An investigation of preconditioning techniques is presented for a Newton–Krylov algorithm that is used
for the computation of steady, compressible, high Reynolds number flows about airfoils. A second-
order centred-difference method is used to discretize the compressible Navier–Stokes (NS) equations that
govern the fluid flow. The one-equation Spalart–Allmaras turbulence model is used. The discretized
equations are solved using Newton's method, with the generalized minimal residual (GMRES) Krylov
subspace method used to solve the linear system approximately at each Newton iteration. These
preconditioning techniques are first applied to the solution of the discretized steady convection-diffusion equation.
Various orderings, iterative block incomplete LU (BILU) preconditioning, and multigrid preconditioning
are explored. The baseline preconditioner is a BILU factorization of a lower-order discretization
of the system matrix in the Newton linearization. An ordering based on the minimum discarded fill
(MDF) ordering is developed and compared to the widely popular reverse Cuthill–McKee (RCM) ordering.
An evolutionary algorithm is used to investigate and enhance this ordering. The MDF-based ordering
performs well for the convection-diffusion equation, while RCM is superior for the NS equations.
Experiments for inviscid, laminar, and turbulent cases are presented to show the effectiveness of
iterative BILU preconditioning in terms of reducing the number of GMRES iterations, and hence the
memory requirements of the Newton–Krylov algorithm. Multigrid preconditioning also reduces the
number of GMRES iterations. The framework for the iterative BILU and BILU-smoothed multigrid
preconditioning algorithms is presented in detail.
Acknowledgements
It is said that it takes a village to raise a child. This analogy serves well in the sense that without the
help and support of many people this thesis would not have been possible.
I would like to thank my parents, Peter and Angela, without whom I would not have been able to
reach this milestone.
Professor Zingg has been an incredible supervisor. A few words that best describe him include:
patient, brilliant and supportive. Through the highs and lows of this journey, he was a source of wisdom
and encouragement. I truly think of him as one of the most important mentors in my life.
Thank you to the members of the doctoral examination committee including its chair Professor
Clinton Groth and Professor Hugh Liu. Professor Groth, I truly appreciate our many discussions on the
progress of this research and your encouragement. Thank you also to Professor Christina Christara for
taking the time to meet and discuss important aspects of this research. Professor Anthony Straatman,
thank you for offering your expertise in your capacity as external reviewer.
I’d like to thank the members of both the computational aerodynamics and computational propulsion
groups that jointly share the research facility at UTIAS. From the past members, I’d especially like to
thank David Kam, Dr. Peterson Wong, Dr. Jai Sachdev and Dr. Tim Leung for their friendship and
Dr. Todd Chisholm for his help early in this research. From the current members, I’d like to thank
David Del Ray Fernandez and Michal Osusky for being sounding boards, as well as Mo Tabesh, Hugo
Gagnon, Ramy Rashad, Nasim Shahbazian, and Lana Olague for their friendship. Michal, thank you for
taking the time to integrate the reordering strategy to the 3D code and subsequently exploring it. I’d
like to thank Oleg Chernukhin for his efficient introduction on evolutionary algorithms. Thank you also
to Dr. Marc Charest and Dr. James McDonald for answering and tending to all computing questions.
Last and certainly not least, I’d like to thank Dr. Jason Hicken for his friendship and for lending me his
incredible talent to help understand some of the more advanced concepts in graph theory and discrete
mathematics. To all of you, as well as those not mentioned, I wholeheartedly wish the best of luck in
their current work and future endeavours.
Thank you to the entire UTIAS staff, especially Peter, Gail, Joan, Clara, Nora and Rosanna. Peter,
from day one you have been a great friend and mentor to me. I would also like to thank all of the
professors and instructors at UTIAS.
Thank you Anna and Joe for all of your moral support and encouragement over the years. Also,
thank you to all of my friends, loved ones, and mentors, not already mentioned. I truly appreciate all of
your help.
I would also like to acknowledge and thank the Government of Canada, the Government of Ontario,
and the University of Toronto for their financial support.
CONTENTS
1 INTRODUCTION 1
  1.1 Solution Methods for Linear Systems 3
    1.1.1 Iterative Methods 3
    1.1.2 Multigrid Acceleration 5
  1.2 Preconditioning 6
    1.2.1 Incomplete Factorizations 7
    1.2.2 Parallel Preconditioning 10
    1.2.3 Multilevel Preconditioning 12
  1.3 The Newton–Krylov Method 13
    1.3.1 Multigrid Preconditioning 15
  1.4 Motivation and Objectives 16
  1.5 Organization of this Thesis 17
2 GOVERNING EQUATIONS 19
  2.1 The Navier–Stokes Equations 19
    2.1.1 Generalized Curvilinear Coordinate Transformation 21
    2.1.2 Thin-Layer Approximation 21
  2.2 The Spalart–Allmaras Turbulence Model 23
    2.2.1 Generalized Curvilinear Coordinate Transformation 25
  2.3 Boundary Conditions 25
  2.4 The Convection-Diffusion Equation 25
    2.4.1 The Steady 1D Convection-Diffusion Equation 26
    2.4.2 The Steady 2D Convection-Diffusion Equation 26
3 SPATIAL DISCRETIZATION 29
  3.1 The Navier–Stokes Equations 30
  3.2 The Spalart–Allmaras Turbulence Model 30
  3.3 Boundary Conditions 31
    3.3.1 Airfoil Body 32
    3.3.2 Inflow and Outflow Boundaries 32
    3.3.3 Wakecut Interface 33
  3.4 The Jacobian of the Nonlinear System 33
  3.5 The Convection-Diffusion Equation 35
    3.5.1 The Grid Peclet Number 35
    3.5.2 The 2D Convection-Diffusion Equation 36
    3.5.3 The Jacobian of the Discretized Equations 38
4 SOLUTION ALGORITHM 39
  4.1 Solving the Nonlinear System: Newton's Method 40
  4.2 Newton Globalization: Pseudo-Transient Continuation 40
  4.3 Solving the Linear System: GMRES Krylov Subspace Method 42
    4.3.1 Projection Methods 42
    4.3.2 GMRES Algorithm 43
    4.3.3 Convergence of GMRES 44
    4.3.4 Practical Aspects of the Newton–GMRES Algorithm 45
5 PRECONDITIONING 47
  5.1 Scaling 49
    5.1.1 Jacobian-Vector Products in GMRES 50
  5.2 Incomplete LU (ILU) Preconditioning 51
    5.2.1 Effect of Scaling 53
  5.3 Ordering 55
    5.3.1 Graph Theory 56
    5.3.2 Minimum Degree (MD) Ordering 56
    5.3.3 Reverse Cuthill–McKee (RCM) Ordering 57
    5.3.4 Minimum Discarded Fill (MDF) Ordering 59
  5.4 Multigrid Preconditioning 64
    5.4.1 Stationary Iterative Methods 64
    5.4.2 ILU(p) as a Smoother 65
    5.4.3 Iterative ILU(p) as a Preconditioner 68
    5.4.4 ILU(p)-Smoothed Geometric Multigrid as a Preconditioner 69
    5.4.5 Reordering and Scaling 74
  5.5 Chapter Summary and Highlights of Contributions 76
6 RESULTS 79
  6.1 Convection-Diffusion Equation 80
    6.1.1 GMRES convergence and Peclet number 81
    6.1.2 Iterative ILU(p) preconditioning 84
    6.1.3 ILU(p) and multigrid preconditioning 84
    6.1.4 Orderings 85
    6.1.5 Further investigation of MDF 88
  6.2 Euler and Navier–Stokes Equations 97
    6.2.1 Test cases 97
    6.2.2 Orderings 103
    6.2.3 Iterative BILU(p) preconditioning 105
    6.2.4 BILU(p) and multigrid preconditioning 108
7 CONCLUSIONS, CONTRIBUTIONS AND RECOMMENDATIONS 117
  7.1 Conclusions 117
    7.1.1 Convection-Diffusion Equation 117
    7.1.2 Euler and Navier–Stokes Equations 119
  7.2 Contributions 121
  7.3 Recommendations 122
A OTHER PRECONDITIONING TECHNIQUES 125
  A.1 Domain Decomposition 125
  A.2 Sparse Approximate Inverse Preconditioning 127
REFERENCES 129
LIST OF FIGURES
1.1 Geometric versus algebraic multigrid. 6
2.1 Curvilinear coordinate transformation courtesy of Lomax, Pulliam, and Zingg [1]. 22
2.2 A C-topology grid about a NACA0012 airfoil (units are in chord lengths). 22
2.3 The solution to the 1D convection-diffusion equation for several Peclet numbers. 27
3.1 Normal and tangential directions at the boundaries. 32
3.2 Sparsity pattern of sample A1 and A2 Jacobians using a natural ordering. 35
3.3 Close-up view of the numerical solution to the 1D convection-diffusion equation for various grid Peclet numbers on a 101-node computational grid. 36
5.1 Contributions to ajk from pivot aii in the elimination algorithm. 54
5.2 A four-grid, multigrid V-cycle. 70
5.3 Full-weighting restriction operator. 71
5.4 Full-weighting prolongation operator. 71
6.1 Convergence of GMRES, solution, and eigenvalues of system matrix with and without ILU(1) preconditioning of a uniform grid case with a Peclet number of 0.001. 81
6.2 Convergence of GMRES, solution, and eigenvalues of system matrix with and without ILU(1) preconditioning of a uniform grid case with a Peclet number of 1000. 82
6.3 Initial system matrix for a 5×5-node grid with a Peclet number of 10^9. Upward- and downward-facing triangles represent positive and negative values, respectively. 90
6.4 System matrix after very small entries are discarded. Upward- and downward-facing triangles represent positive and negative values, respectively. 90
6.5 Resulting matrix after MDF ordering. Upward- and downward-facing triangles represent positive and negative values, respectively. 91
6.6 Resulting matrix after a random permutation. Upward- and downward-facing triangles represent positive and negative values, respectively. 91
6.7 LU-factorization of the randomly-permuted matrix. 92
6.8 Resulting matrix after MDF for the randomly-permuted matrix. Upward- and downward-facing triangles represent positive and negative values, respectively. 92
6.9 Convergence and solution for the subsonic inviscid case, E1. 100
6.10 Convergence and solution for the transonic inviscid case, E2. 100
6.11 Convergence and solution for the laminar case, L1. 101
6.12 Convergence and solution for the subsonic turbulent case, T1. 101
6.13 Convergence and solution for the transonic turbulent case, T2. 102
LIST OF TABLES
1.1 A history of popular Krylov subspace and related methods (1950–1999). 4
3.1 Errors for various Peclet numbers for various discretizations of the 1D convection-diffusion equation on a uniform 101-node computational grid. 37
4.1 Continuation parameters for Newton's method. 42
5.1 SGS (left) and ILU(0) (right) iterations on a 21×21-node grid for various initial error frequencies. 67
5.2 SGS (left) and ILU(0) (right) iterations on a 41×41-node grid for various initial error frequencies. 67
5.3 ILU(0) (left) and ILU(1) (right) iterations on a 41×41-node grid for various initial error frequencies. 67
5.4 ILU(0) (left) and ILU(1) (right) iterations on an 81×81-node grid for various initial error frequencies. 67
5.5 ILU(0) (left) and ILU(0)+MG (right) iterations on a 41×41-node grid for various initial error frequencies. 72
5.6 ILU(0) (left) and ILU(0)+MG (right) iterations on an 81×81-node grid for various initial error frequencies. 72
5.7 ILU(0) (left) and ILU(0)+MG (right) iterations on a 161×161-node grid for various initial error frequencies. 72
5.8 ILU(1) (left) and ILU(1)+MG (right) iterations on a 41×41-node grid for various initial error frequencies. 73
5.9 ILU(1) (left) and ILU(1)+MG (right) iterations on an 81×81-node grid for various initial error frequencies. 73
5.10 ILU(1) (left) and ILU(1)+MG (right) iterations on a 161×161-node grid for various initial error frequencies. 73
6.1 GMRES iterations and CPU times with (left) ILU(0) and (right) ILU(1) preconditioning on a 129×129-node grid for various Peclet numbers. 83
6.2 GMRES iterations and CPU times with (left) ILU(0) and (right) ILU(1) preconditioning on a 129×129-node grid for a Peclet number of 0.001. 83
6.3 GMRES iterations and CPU times with (left) ILU(0) and (right) ILU(1) preconditioning on a 129×129-node grid for a Peclet number of 1000. 84
6.4 GMRES iterations for various multigrid preconditioners with ILU(0) smoothing (Pe = 0.001). 85
6.5 GMRES iterations for various multigrid preconditioners (Pe = 1000). 86
6.6 GMRES iterations for various orderings using ILU(0) multigrid preconditioning (129×129 nodes and Pe = 0.001). 86
6.7 GMRES iterations for various orderings using ILU(0) preconditioning (129×129 nodes and Pe = 1000). 87
6.8 GMRES iterations for various orderings using ILU(0) multigrid preconditioning (257×257 nodes and Pe = 0.001). 87
6.9 GMRES iterations for various orderings using ILU(1) multigrid preconditioning (257×257 nodes and Pe = 0.001). 88
6.10 GMRES iterations for various orderings using ILU(1) preconditioning (257×257 nodes and Pe = 1000). 88
6.11 Locations of root nodes that correspond to minimized discarded fill using an evolutionary algorithm for convection-dominated (Pe = 10^9) and diffusion-dominated (Pe = 10^-9) cases for a 5×5-node grid with flow angle of θ = 15. 94
6.12 GMRES iterations for MDF-ILU(p) preconditioners (Pe = 1000) and a comparison to RCM. Note: The most upstream node is (1,1). 96
6.13 Computational grids for Euler and Navier–Stokes calculations. 97
6.14 Test cases for Euler and Navier–Stokes calculations. 98
6.15 Baseline Newton (IN) iterations, GMRES (IG) iterations, and CPU times for all Euler and Navier–Stokes test cases solved using BILU(p) preconditioning. 99
6.16 Performance of Newton–Krylov algorithm using BILU(p) with various orderings. 104
6.17 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid test cases. 106
6.18 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for laminar and turbulent test cases. 107
6.19 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid subsonic test case, E1. 108
6.20 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for turbulent transonic test case, T2. 109
6.21 Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning for inviscid test cases. 110
6.22 Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning for laminar and turbulent test cases. 111
6.23 Performance of Newton–Krylov algorithm using BILU(p) and 2- or 3-level multigrid preconditioning for inviscid, laminar and turbulent test cases. 113
6.24 Finer grid cases for Euler and Navier–Stokes calculations. 114
6.25 Performance of Newton–Krylov algorithm using BILU(p) and 2-, 3- or 4-level multigrid preconditioning for finer-grid test cases. 114
NOTATION
ABBREVIATIONS
AF approximate factorization
AINV approximate inverse
AMG algebraic multigrid
ARMS algebraic recursive multilevel solver
BFILU block-filled incomplete lower-upper
BILU block-incomplete lower-upper
Bi-CG bi-conjugate gradient
Bi-CGStab bi-conjugate gradient-stable
CFD computational fluid dynamics
CG conjugate gradient
CGS conjugate gradient squared
CGW Concus–Golub–Widlund
CM Cuthill–McKee
DG discontinuous Galerkin
FAS full-approximation storage
FMG full multigrid
FOM full orthogonalization method
GA genetic algorithm
GCR generalized conjugate residual
GCROT generalized conjugate residual with inner orthogonalization and outer truncation
GMG geometric multigrid
GMRES generalized minimal residual
IC incomplete Cholesky
ILU incomplete lower-upper
ILUM incomplete lower-upper multilevel
ILUT incomplete lower-upper truncated
IOM incomplete orthogonalization method
LU lower-upper
MD minimum degree
MDF minimum discarded fill
MG multigrid
MINRES minimal residual
MR minimal residual
MRILU matrix renumbering ILU
NGILU nested grids ILU
NK Newton–Krylov
NKS Newton–Krylov–Schwarz
NP non-deterministic polynomial-hard
NS Navier–Stokes
QMR quasi minimal residual
RCM reverse Cuthill–McKee
SA Spalart–Allmaras
SGS symmetric Gauss–Seidel
SPAI sparse approximate inverse
TFQMR transpose-free quasi-minimal residual
ALPHANUMERIC
AD artificial dissipation
D destructive turbulent term
E inviscid flux (x)
E transformed inviscid flux (x)
Ev viscous flux (x)
F inviscid flux (y)
F transformed inviscid flux (y)
Fv viscous flux (y)
J metric Jacobian
M Mach number; convective turbulent term
N diffusive turbulent term
Q conservative flow variables
Q transformed conservative flow variables
P productive turbulent term
R radius, residual
Rkv ratio of residuals
S entropy, vorticity
S transformed thin-layer viscous flux
U contravariant velocity (x)
V contravariant velocity (y)
Vn normal velocity component
Vt tangential velocity component
a speed of sound; continuation parameter
a∞ free-stream speed of sound
a∞ dimensional free-stream speed of sound
b linear system right hand side; continuation parameter
c chord
cp specific heat at constant pressure
cv specific heat at constant volume
dw Spalart–Allmaras wall distance
e total energy; error
e dimensional total energy
f source term
fti Spalart–Allmaras transition functions
fvi Spalart–Allmaras functions
fw Spalart–Allmaras destructive function
hi,m Hessenberg matrix entry
k ILU fill-in parameter
kstart continuation iterations threshold
mi viscous flux vector variables
n number of nodes
p pressure
r residual vector
rm residual vector
t time
u x-component of velocity
u dimensional x-component of velocity
v y-component of velocity
v dimensional y-component of velocity
vm Krylov search direction
ym Krylov update coefficient
zm preconditioned Krylov search direction
CALLIGRAPHIC
A linear system matrix, Newton Jacobian
A1 first-order Jacobian
A2 second-order Jacobian
Ah fine grid operator
A2h coarse grid operator
B domain decomposition submatrix
D diagonal
E domain decomposition submatrix
E error matrix
F domain decomposition submatrix
G domain decomposition submatrix; iteration matrix
Hm mth upper–Hessenberg matrix
I identity matrix
I2hh restriction operator
Ih2h prolongation (interpolation) operator
K Krylov subspace
L lower factorization; Krylov left subspace
L incomplete lower factorization
M preconditioning matrix; approximate inverse
Ml left preconditioning matrix
Mr right preconditioning matrix
N splitting matrix
O order
P relaxation splitting matrix; smoother
Pr Prandtl number
Pe Peclet number
PBGS backward Gauss–Seidel smoother
PFGS forward Gauss–Seidel smoother
PGS Gauss–Seidel smoother
PILU ILU smoother
PSGS symmetric Gauss–Seidel smoother
Q relaxation splitting matrix
Re Reynolds number
S relaxation iteration matrix; Schur complement; sparsity pattern
S1 row scaling matrix
S2 column scaling matrix
Sc row scaling matrix
Sr column scaling matrix
T relaxation scaled right hand side
U upper factorization
U incomplete upper factorization
V basis of Krylov subspace
W basis of Krylov left subspace
X matrix of eigenvectors
GREEK
Γ local preconditioner
∆ change
Λ matrix of eigenvalues
Υ scalar dissipation pressure switch
α angle of attack; continuation parameter
β continuation parameter
γ ratio of specific heats
ε finite-difference perturbation constant
εj,k scalar dissipation ratio function
η curvilinear normal coordinate
θx wave angle in the x-direction
θy wave angle in the y-direction
κ condition number
κi scalar dissipation coefficients
κe condition number for the Euler equations
κt thermal conductivity
λmax maximum eigenvalue
λmin minimum eigenvalue
µ dynamic viscosity
µt turbulent dynamic eddy viscosity
ν Spalart–Allmaras working variable
νt kinematic eddy viscosity
ξ curvilinear tangential coordinate
ρ density; spectral radius
ρ dimensional density; spectral radius
ρ∞ free-stream density
ρ∞ dimensional free-stream density
σ spectral radius of the flux Jacobian
τ curvilinear time transformation
τij viscous stress tensor
φ ILU factorization pivot ratio; scalar quantity
Chapter 1
INTRODUCTION
Η γνώση είναι η αληθινή άποψη. (Knowledge is true opinion.)
– Πλάτων (Plato)
Fluid dynamics affects everyone. It governs systems as large as the currents of the Earth's oceans
and atmosphere and as small as the human cardiovascular system. It arises in both the natural and
the technological world. For the latter, examples include aircraft aerodynamics, the drag of automobiles
and ships, the flow through a pipeline, and the performance of wind and hydroelectric turbines.
Predicting the aerodynamic performance of aircraft is essential to their design. Theoretical results can
only be taken so far and usually apply to simplified models of real flows. Experimental results for
scaled prototypes, while realistic, are expensive and slow to generate. In this regard, and with increasing
computational resources, computational fluid dynamics (CFD) has come to the forefront as an essential
part of aircraft design. CFD algorithms are simply called flow solution algorithms, or flow solvers for
short.
A paramount goal in aerospace is to develop flow solvers that provide efficient numerical solutions
of the compressible Navier–Stokes equations. Having a fast and reliable flow solver is arguably the
most essential component in an optimization framework. Both gradient-based [2] and gradient-free [3]
optimization algorithms require many function evaluations, or flow solutions. Hence, improving the
efficiency of a flow solver will yield a compounded improvement in the efficiency of an optimization
algorithm.
The compressible Navier–Stokes equations are highly nonlinear. Aerodynamic flows are turbulent,
typically have high Reynolds numbers, and, especially in take-off, landing, or transonic conditions, are
extremely complicated. These conditions only add to the computational cost of the flow solver.
A popular approach to solving the discretized compressible Navier–Stokes equations is the Newton–
Krylov method. Newton–Krylov methods are discussed in detail in [4–6]. Some other fields where the
Newton–Krylov algorithm can be applied include combustion, magnetohydrodynamics, and structural
analysis. In most cases, each Newton iteration requires the solution of a large, sparse linear system.
A necessary component in an efficient Newton algorithm is a fast linear system solver. An iterative
approach is preferred in this work. Krylov methods are popular iterative methods for solving large,
sparse linear systems.
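The structure just described, an outer Newton iteration whose linear subproblem is handed to a Krylov solver, can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the algorithm of this thesis: the 1D model problem, the `newton_krylov` helper, and the fixed inner settings are hypothetical stand-ins.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

def newton_krylov(residual, jacobian, q0, tol=1e-9, max_newton=50):
    """Outer Newton loop; each step solves J dq = -R approximately with GMRES."""
    q = q0.copy()
    for _ in range(max_newton):
        r = residual(q)
        if np.linalg.norm(r) < tol:
            break
        J = jacobian(q)                               # large, sparse Jacobian of R(q)
        dq, _ = gmres(J, -r, restart=50, atol=1e-12)  # inner Krylov solve
        q = q + dq
    return q

# Hypothetical 1D model problem: u'' = u^2 - 1 with u = 0 at both ends,
# discretized by second-order centred differences on n interior nodes.
n = 50
h = 1.0 / (n + 1)
L = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2

def residual(u):
    return L @ u - (u**2 - 1.0)

def jacobian(u):
    return L - diags(2.0 * u)

u = newton_krylov(residual, jacobian, np.zeros(n))
```

In a realistic flow solver the critical ingredients are precisely the ones this sketch glosses over: globalization of the Newton iteration, the inner tolerance, and, above all, preconditioning of the Jacobian system.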
The generalized minimal residual (GMRES) method [7] is one of the most popular Krylov subspace
methods. While GMRES is fast compared to other classical techniques such as approximate
factorization [8], other challenges arise. Practical applications involve solving poorly-conditioned
nonsymmetric linear systems, for which effective preconditioning is an essential component of the
solution process. Current preconditioners contribute significantly to the overall computational cost of
the Newton–Krylov method; an effective way to improve the performance of a solution algorithm is
therefore to improve the preconditioner.
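To make the role of the preconditioner concrete, the following sketch compares GMRES iteration counts with and without an incomplete-LU preconditioner on a small nonsymmetric convection-diffusion-type matrix. This is an illustration only: SciPy's `spilu` stands in for the BILU factorizations studied in this thesis, and the test matrix and its parameters are hypothetical.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags, eye, kron
from scipy.sparse.linalg import LinearOperator, gmres, spilu

# Hypothetical test system: centred-difference discretization of a 2D
# convection-diffusion problem on an n x n interior grid. The first-derivative
# (convection) term makes the matrix nonsymmetric.
n = 64
h = 1.0 / (n + 1)
c = 25.0  # convection coefficient; larger values degrade conditioning
d2 = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2    # -d2/dx2
d1 = diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(n, n)) / (2 * h)  # d/dx
A = csc_matrix(kron(eye(n), d2) + kron(d2, eye(n)) + c * kron(eye(n), d1))
b = np.ones(n * n)

iters = {"none": 0, "ILU": 0}

def counter(key):
    def cb(_):
        iters[key] += 1  # called once per GMRES iteration
    return cb

# GMRES without preconditioning.
x0, info0 = gmres(A, b, restart=30, maxiter=500, callback=counter("none"))

# GMRES preconditioned by an incomplete LU factorization of A.
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)
x1, info1 = gmres(A, b, M=M, restart=30, maxiter=500, callback=counter("ILU"))
```

On examples of this kind the preconditioned iteration count is dramatically smaller than the unpreconditioned one, which is exactly the effect the preconditioners in this thesis aim to achieve for the Newton linear systems.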
Preconditioning of the linear problem is the focus of this thesis. It spans many areas of applied
mathematics: from classical iterative methods to advanced iterative methods; from incomplete factorizations
to sparse approximate inverses; from multigrid to general multilevel methods; from simple orderings to
parallelization; and from any of the aforementioned methods to combinations thereof.
In order to make sense of how all of these preconditioning methods interrelate, some of the history
of solution methods for linear systems is presented. From there, the discussion shifts to a delineation
of preconditioning and related methods. In particular, incomplete factorizations, parallel
preconditioners, sparse approximate inverses, and multilevel preconditioners are discussed. Preconditioning is then
connected to the Newton–Krylov method. Once the background is presented, preconditioners that are
selected for this research are identified. This chapter concludes with a list of objectives, supported by
additional motivation. Although a brief survey of parallel preconditioning is presented, the focus of this
thesis is on serial aspects of preconditioning.
1.1 Solution Methods for Linear Systems
1.1.1 Iterative Methods
Classical iterative methods can be traced back to the work of Gauss and Seidel in the 19th century.
Approaches based on matrix splittings were developed and/or analyzed in the mid 20th century by
Richardson, Frankel, Young, Peaceman, Rachford, Varga, Stone, Kendall, and Dupont. Saad [9] provides
an excellent history of early iterative methods and is the key source of this review on contributors to
classical iterative methods.
Two key milestones in the advancement of iterative methods are the introduction of projection
methods as early as the 1930s and Chebyshev acceleration in the 1950s. Key contributors to the latter
include Frankel, Gavurin [9], Young, Lanczos, Golub and Varga. These milestones opened the door to
the very powerful family of iterative methods used to this day (and in this thesis): Krylov subspace
methods.
Simoncini and Szyld [10], van der Vorst [11] and Saad [9] discuss many of the details of Krylov
subspace methods. For completeness, the most popular Krylov methods that were introduced from their
inception in the 1950s until the 1990s are listed in Table 1.1. For symmetric systems, the conjugate
gradient (CG) method [12] is a very popular choice. A milestone solver by Meijerink and van der
Vorst [13] couples the CG method with an incomplete Cholesky (IC) factorization preconditioner; they
called the method ICCG. For nonsymmetric problems, the bi-conjugate gradient (Bi-CG) [14], generalized
minimal residual (GMRES) [7], conjugate gradient squared (CGS) [15], bi-conjugate gradient-stable
(Bi-CGStab) [16], and generalized conjugate residual with inner orthogonalization and outer truncation
(GCROT) [17] methods are all popular choices. New Krylov methods (mostly based on hybridization
of existing methods) are developed regularly. A recent reference by Abe and Sleijpen [18] demonstrates
this.
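To make the family concrete, the CG method mentioned above can be written in a few lines. The following is a pedagogical Python/NumPy sketch on a small SPD model matrix; the matrix, tolerance, and sizes are illustrative choices, not those used in this work:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal unpreconditioned CG for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x

# Small SPD test: 1D Laplacian model matrix
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))  # small residual
```

In exact arithmetic CG terminates in at most n iterations; in practice the preconditioners discussed below are what make such methods competitive on ill-conditioned systems.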
Krylov subspace methods require additional measures to improve their robustness and speed, and to
satisfy the memory limitations imposed by computers. There are (at least) four measures that can be
taken to improve the performance of these methods: restarting, truncating, deflating/augmenting, and
preconditioning. The last is of most concern in this research and is discussed in detail below.
Flexible variants of Krylov subspace methods that facilitate advanced preconditioning are discussed in
Section 1.2.
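Restarting is the simplest of these measures to demonstrate. The sketch below uses SciPy's gmres, whose restart parameter bounds the size of the stored Krylov basis; the diagonally dominant test matrix and the value m = 20 are illustrative assumptions only:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Nonsymmetric convection-diffusion-like test matrix (illustrative)
n = 200
A = diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

# GMRES(m): the Krylov basis is discarded and rebuilt every m = 20
# iterations, bounding memory at the cost of slower convergence;
# maxiter bounds the number of restart cycles.
x, info = gmres(A, b, restart=20, maxiter=5000)
print(info, np.linalg.norm(A @ x - b))  # info == 0 signals convergence
```

Truncated and deflated variants such as GCROT and LGMRES keep or recycle selected directions instead of discarding the whole basis.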
Restarting limits the Krylov subspace to a maximum allowable size, as in GMRES(m) [38]. Truncating
the orthogonalization procedure inherent to these methods reduces its cost. Examples include incomplete
orthogonalization methods (IOMs) based on the full orthogonalization method (FOM) [27], the
aforementioned GCROT method, and the LGMRES [39] method, which truncates GMRES.
Deflating and augmenting the subspace with vectors from outer cycles can be used in conjunction with a
restarted algorithm to improve performance. Morgan [40] presented an example of a restarted GMRES
algorithm with deflation for block systems. In addition to these improvements, it is possible to adapt
Table 1.1: A history of popular Krylov subspace and related methods (1950–1999).
Year Creator(s) Method Reference
1950 Lanczos Lanczos [19]
1951 Arnoldi Arnoldi [20]
1952 Hestenes and Stiefel CG [12]
1952 Lanczos Lanczos (CG) [21]
1975 Paige and Saunders MINRES [22]
1975 Paige and Saunders SYMMLQ [22]
1975 Fletcher Bi-CG [14]
1976 Concus and Golub CGW [23]
1977 Vinsome ORTHOMIN [24]
1977 Meijerink and van der Vorst ICCG [13]
1978 Widlund CGW [25]
1980 Jea and Young ORTHODIR [26]
1981 Saad FOM [27]
1982 Paige and Saunders LSQR [28]
1983 Eisenstat et al. GCR [29]
1986 Saad and Schultz GMRES [7]
1989 Sonneveld CGS [15]
1991 Freund and Nachtigal QMR [30]
1992 van der Vorst Bi-CGStab [16]
1993 Gutknecht Bi-CGStab2 [31]
1993 Sleijpen and Fokkema Bi-CGStab(l) [32]
1994 Freund TFQMR [33]
1994 Weiss GMERR [34]
1994 Chan et al. QMR-BiCGStab [35]
1995 Kasenally and Ebrahim GMBACK [36]
1996 Fokkema et al. CGS2 [37]
1999 de Sturler GCROT [17]
Krylov methods to efficiently solve linear systems involving multiple right-hand sides [41–43].
Barrett et al. [44] clarified the various stopping criteria that are used in many Krylov subspace
methods. Other recent advancements in Krylov subspace methods can be found in [10]; they include,
but are not limited to, applications to general complex matrices and parametrized systems.
1.1.2 Multigrid Acceleration
In order to understand the concept of multigrid, one must understand the concept of a smoother. At each
iteration of an iterative method there remains an error in the solution, which is the difference between the
current iterate and the exact solution. Consider the error as divided into high- and low-frequency
components. A smoother is an operator that rapidly reduces the high-frequency error. For example,
the Gauss-Seidel operator is an excellent smoother for problems with symmetric positive definite (SPD)¹
matrices. Entire methods can serve as smoothers; Krylov subspace methods are an example [45].
Multigrid is used to accelerate these smoothers. The multigrid method effectively reduces the low-
frequency errors by projecting the problem onto a coarser domain. On this domain, the projected
low-frequency errors appear as higher-frequency errors due to the increase in grid spacing (or cell or
element size in other discretizations). The smoother is therefore effective when applied to the coarsened
problem. This process is usually repeated recursively over several levels. A
very attractive property of the multigrid method is that for certain problems it is an algorithmically-
scalable [46], or equivalently, an O(n) method.
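The interplay of smoothing and coarse-grid correction described above can be sketched with a two-grid cycle for the 1D Poisson model problem. This is a pedagogical Python/NumPy sketch: the weighted Jacobi smoother, full-weighting restriction, linear interpolation, and Galerkin coarse operator are standard textbook choices, not the components of the solver developed in this thesis:

```python
import numpy as np

def laplacian(n):
    """1D Poisson model matrix tridiag(-1, 2, -1) on n interior points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def weighted_jacobi(A, x, b, sweeps=3, w=2.0 / 3.0):
    """Smoother: damps the high-frequency half of the error spectrum."""
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + w * (b - A @ x) / d
    return x

def interpolation(n, nc):
    """Linear interpolation from nc coarse points to n = 2*nc + 1 fine points."""
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def two_grid_cycle(A, b, x):
    n = A.shape[0]
    nc = (n - 1) // 2
    P = interpolation(n, nc)
    R = 0.5 * P.T                      # full-weighting restriction
    Ac = R @ A @ P                     # Galerkin coarse operator
    x = weighted_jacobi(A, x, b)       # pre-smooth
    rc = R @ (b - A @ x)               # restrict the residual
    ec = np.linalg.solve(Ac, rc)       # coarse-grid (direct) solve
    x = x + P @ ec                     # prolong and correct
    return weighted_jacobi(A, x, b)    # post-smooth

n = 63                                  # so the coarse grid has 31 points
A = laplacian(n)
b = np.ones(n)
x = np.zeros(n)
for _ in range(15):
    x = two_grid_cycle(A, b, x)
print(np.linalg.norm(b - A @ x))        # drops by a roughly constant factor per cycle
```

Replacing the direct coarse solve with a recursive call to the same cycle yields the usual multilevel V-cycle.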
Multigrid methods are part of a broader classification called multilevel methods which will be discussed
in more detail in Section 1.2.3. Briefly, multilevel methods generalize the idea of accelerating the
solution of a problem on a fine domain by using one or more smaller, coarser domains. Multigrid
methods, such as the full approximation storage (FAS) scheme, can also be applied to nonlinear problems.
Some excellent references for multigrid include works by Briggs et al. [47], Wesseling [48], Trottenberg
et al. [45], Wagner [49], Stuben [45,50,51], Lomax et al. [52], and a recent paper by Yavneh [53].
Linear multigrid is of particular interest in this research. There are two popular approaches to using
linear multigrid: geometric multigrid and algebraic multigrid. Geometric multigrid (GMG) is based on
using existing grid hierarchies to obtain coarse- and inter-grid operators. In algebraic multigrid (AMG),
these operators are obtained by using the matrix entries and no known grid hierarchy is needed. GMG
is simple to understand, but its implementation must be tailored to each problem. AMG is more difficult
to set up but can then be treated as a black box for many problems. AMG cannot generally match
geometric multigrid in speed; however, through its abstraction from the physical domain, AMG can
handle a wider range of problems than geometric multigrid, making it much more robust. In this
research, GMG is considered for three key reasons: AMG has a much higher setup cost that is difficult
to amortize over the linear iterations; it requires more memory; and the problems under investigation
here have been traditionally solved with GMG. Figure 1.1 provides a concise comparison of GMG and
AMG. There are other ways to classify linear multigrid methods. In the finite element community for
instance, one can compare h-multigrid (based on a grid hierarchy) to p-multigrid (based on the degree
of an element) to hp-multigrid.
Early contributors to multigrid include Southwell (as indicated by Saad [54]), Fedorenko [55–57],
¹A symmetric positive definite matrix, A, is a matrix that, in addition to being symmetric, satisfies xᵀAx > 0 for all x ≠ 0.
[Figure 1.1: Geometric versus algebraic multigrid. Geometric multigrid uses a fixed hierarchy based on
PDEs or geometry and operates on grid equations; algebraic multigrid uses a variable hierarchy based
on the algebraic system and operates on algebraic systems.]
and Bakhvalov [58]. Brandt [59] is generally credited as the first to use multigrid for practical
applications. Algebraic multigrid was introduced in the 1980s by pioneers such as Brandt [60,61] and
Ruge and Stuben [50,51,62]. Brandt [61] also credited McCormick as an early contributor to AMG.
Extensive research on the use of multigrid (both linear and nonlinear) in CFD applications has been
performed by researchers such as Jameson et al. [63–65], Mavriplis [66–68], Moinier and Giles [69], Zeng
and Wesseling [70], Allmaras [71], Weiss et al. [72], Griebel et al. [73], Ollivier–Gooch [74], Morano et
al. [75], Thomas et al. [76], Bordner and Saied [77], Lassaline [78,79], Manzano [80], and Chisholm [81].
Luksch [82] provides a brief online introduction to AMG. Raw [83,84] investigated AMG as a solver
for the 3D Navier–Stokes equations. Cleary et al. [85] investigated the robustness and scalability of
AMG for a broad range of problems. Brezina et al. [86] and Chartier [87] used AMG to solve problems
that are discretized by using finite elements, and Haase et al. [88] investigated the parallelization of such
applications.
1.2 Preconditioning
In this section, some of the most popular preconditioning methods are presented. These methods
introduce novel ideas and also inherit aspects of the linear solution methods already discussed. They include incomplete
factorizations, parallel preconditioners, and multilevel preconditioners. The use of multigrid, a specific
multilevel method, is discussed in detail in the context of preconditioning.
Krylov subspace methods have been adapted to facilitate more advanced preconditioners. These
preconditioners can themselves be methods and can vary from one Krylov subspace iteration to the
next. Flexible Krylov methods, that is, Krylov subspace methods with flexible preconditioning, emerged
in the early 1990s out of the need to handle these more advanced preconditioners. Examples include: the
flexible conjugate gradient method by Axelsson and Vassilevski [89]; flexible GMRES by Saad [90];
another flexible GMRES variant by van der Vorst and Vuik [91]; and flexible QMR by Szyld and
Vogel [92]. More recent examples include flexible Bi-CG and Bi-CGStab methods by Vogel [93] and a
flexible GCROT method by Hicken and Zingg [94].
1.2.1 Incomplete Factorizations
Incomplete factorizations date back to the early 1960s to Buleev [95], Varga [96], Oliphant [97, 98],
and Dupont [99]. In 1977, Meijerink and van der Vorst [13] investigated the incomplete Cholesky
(IC) factorization as a preconditioner for the CG method. This is regarded as the first instance of an
incomplete LU (ILU) factorization being used as a preconditioner. Note that an ILU factorization of an
M-matrix² is equivalent to an IC factorization. Manteuffel [100] investigated a similar preconditioner for
the conjugate gradient method for symmetric positive-definite (SPD) systems using a shifted incomplete
Cholesky factorization. Eisenstat [101] is also credited as having one of the earliest implementations of
an efficient form of ILU.
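The zero-fill ILU idea can be sketched directly. The following pedagogical Python/NumPy sketch stores the factors densely for clarity (production implementations work on sparse storage); on a tridiagonal M-matrix the pattern admits no fill, so ILU(0) coincides with the exact LU factorization:

```python
import numpy as np

def ilu0(A):
    """ILU(0): LU factors confined to the sparsity pattern of A.
    On return, the strict lower triangle of F holds L (unit diagonal
    implied) and the upper triangle holds U."""
    n = A.shape[0]
    F = A.copy()
    nz = (A != 0.0)                    # original sparsity pattern
    for i in range(1, n):
        for k in range(i):
            if not nz[i, k]:
                continue
            F[i, k] /= F[k, k]         # multiplier l_ik
            for j in range(k + 1, n):
                if nz[i, j]:           # fill outside the pattern is discarded
                    F[i, j] -= F[i, k] * F[k, j]
    return F

def ilu_solve(F, r):
    """Apply M^{-1} r via forward/backward substitution with L, U in F."""
    n = F.shape[0]
    y = r.copy()
    for i in range(n):                 # solve L y = r (unit lower triangular)
        y[i] -= F[i, :i] @ y[:i]
    x = y.copy()
    for i in range(n - 1, -1, -1):     # solve U x = y
        x[i] = (x[i] - F[i, i + 1:] @ x[i + 1:]) / F[i, i]
    return x

# On a tridiagonal M-matrix, ILU(0) equals the exact LU factorization,
# so the preconditioner solve recovers the exact solution.
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = ilu0(A)
b = np.ones(n)
print(np.linalg.norm(A @ ilu_solve(F, b) - b))
```

For matrices whose exact factors do generate fill, the discarded entries are what make M only an approximation to A, and the quality of that approximation is what the variants below try to control.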
A key issue for ILU is stability. In general, it is not known a priori whether an ILU factorization will
break down for a specific matrix. Meijerink and van der Vorst [13], Elman [102], and Bruaset et al. [103]
investigated the stability of ILU and concluded that ILU is only guaranteed to be stable for M-matrices.
Benzi [46] explained some measures that can be taken to improve ILU for more general matrices. His
focus was on reducing instability due to small pivots and unstable triangular solves. He introduced
measures of accuracy and stability.
Chisholm [4] used a condition number estimate (as well as other approaches) to measure the quality of
an incomplete factorization. Chow and Saad [104] experimented with various approaches for avoiding
instability in the ILU algorithm. Aspects such as pivoting, reordering, scaling, diagonal perturbation
and symmetrical preservation were explored. Recently, Gopaul et al. [105] investigated the stability of
ILU(0) for a nine-point high-order compact discretization of the convection-diffusion equation in two
dimensions.
There are several variants of ILU. They include the introduction of fill-in, the use of a drop tolerance,
modification of the diagonal, block representation, and parallel implementation. The fill-in of a matrix
refers to where entries of the factorization are located relative to the original matrix sparsity pattern.
The original ILU approach is a zero fill-in approach. A non-zero fill-in was introduced by Gustafsson [106]
and later generalized by Watts [107]. Meijerink and van der Vorst [108] are also early contributors to
this concept. Chapman et al. [109] investigated high-accuracy ILU preconditioners which use a larger
than usual amount of fill-in. Much like fill-in can be used to control the newly-introduced entries in
the factorization based on the sparsity pattern, a drop-tolerance strategy controls what new entries
are permitted based on size. Zlatev [110] first introduced the concept of using a threshold for ILU.
Other contributors to this concept include Young et al. [111], Gallivan et al. [112], and D’Azevedo
²An M-matrix is a matrix with positive diagonal elements and non-positive off-diagonal elements. Hence, its inverse only contains non-negative elements.
et al. [113, 114]. Saad [115] implemented a dual-parameter ILU strategy where both fill-in and drop-
tolerance control the entries of the factorization. Jones and Plassman [116] implemented a black-box
threshold-ILU strategy that automates the drop-tolerance strategy.
In fill-in and drop-tolerance ILU, entries that do not satisfy the retention criteria are simply
discarded. In modified ILU (or MILU), such entries are instead added (in some sense) to the main diagonal. MILU
dates back to the late 1970s, and key contributors include Gustafsson [106], van der Vorst [117], Axelsson
and Lindskog [118], and Elman [119]. Wittum and Liebau [120] introduced a so-called truncated ILU
which is somewhat related to MILU. Benzi [46] pointed out that MILU tends to perform poorly on
nonmodel problems because it is more susceptible to rounding errors. This conclusion was drawn based
on the work of van der Vorst [121].
Block variations of ILU cater to matrices that arise from the discretization of a system of PDEs.
Examples include Underwood [122], Concus et al. [123,124], Axelsson [125], Magolu [126], and Yun [127].
Block ILU, or BILU, is different from block-fill ILU, or BFILU. In block ILU, the blocks in the matrix
are treated as matrices and are inverted during the factorization. In block-fill ILU, a fill-in level of zero
is assigned to the block pattern, but the factorization is performed on the matrix that is populated by
scalars. Examples of implementations for the latter include Pueyo [5] and Orkwis [128]. Chisholm [4]
used a block ILU preconditioner in his compressible Navier–Stokes equations solver.
The quality of an ILU factorization is very sensitive to the ordering of the unknowns. However, finding
an ordering that minimizes fill-in is an NP-complete problem [129]. The orderings developed over the
years therefore settle for improving ILU for specific applications without necessarily attaining the
fill-minimizing ordering.
Heuristic ordering methods have been around since the 1950s. Ordering methods were originally
intended to minimize storage costs for direct solvers. This is achieved through the minimization of
fill-in. Orderings can be divided into two classes: graph-based and matrix-based.
Graph-based orderings include the original orderings intended to save memory for direct solvers.
The minimum degree algorithm is considered to be the earliest example. It dates back to the 1950s
to Markowitz [130], who looked at minimizing products within the elimination algorithm. Tinney and
Walker [131] later generalized this approach using graph theory and eliminated the column in the factor-
ization that has the fewest neighbours (i.e. minimum degree). Duff and Ucar [129] discussed the many
variations of the minimum degree ordering that have been examined over the years. The minimum degree
algorithms are inherently local since the choice of the next pivot is not connected to future elimination
steps.
Global (graph-based) ordering strategies enforce bounds on fill-in by applying strategic permutations
to the matrix. Examples (in chronological order) include: Rosen [132] ordering; Cuthill-McKee (CM)
ordering [133]; reverse Cuthill-McKee (RCM) ordering, by George [134]; nested dissection ordering by
George [135]; Gibbs ordering by Gibbs et al. [136]; Sloan ordering [137]; double ordering by Baumann
et al. [138]; snake ordering by Hassan et al. [139]; and orderings by Martin (found in [140]). Additional
ordering methods are presented in [141]. Of these orderings, the two most important would arguably be
nested dissection and RCM. The nested dissection ordering removes column(s) to decouple the system
matrix into two separate parts. It is inherently recursive and parallel. Hendrickson and Rothberg [142]
implemented a hybrid ordering of nested dissection and minimum degree. RCM attempts to minimize
the bandwidth (or profile) of the matrix, thus confining the fill-in to a narrower band. More recent
graph-based orderings use broader portions of the matrix graph, called cliques and supernodes [46]. The
latter leads to parallel implementation of ILU. A recent example of a transition to parallel ILU through
the use of supernodes is the work done by Henon et al. [143].
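The effect of RCM on bandwidth is easy to demonstrate. The sketch below uses SciPy's reverse_cuthill_mckee on a 5-point Laplacian whose natural ordering has been deliberately scrambled; the grid size and random seed are arbitrary, illustrative choices:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the nonzeros of sparse matrix A."""
    coo = A.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# 2D 5-point Laplacian on an m-by-m grid
m = 20
n = m * m
I = sparse.identity(m)
T = sparse.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
A = (sparse.kron(I, T) + sparse.kron(T, I)).tocsr()

rng = np.random.default_rng(0)
p = rng.permutation(n)                 # scramble the natural ordering
A_bad = A[p][:, p].tocsr()

perm = reverse_cuthill_mckee(A_bad, symmetric_mode=True)
A_rcm = A_bad[perm][:, perm]

# RCM confines the nonzeros (and hence the fill of an ILU) near the diagonal
print(bandwidth(A_bad), bandwidth(A_rcm))
```

A smaller bandwidth means the fill generated during (incomplete) factorization stays close to the diagonal, which is precisely why RCM is such a common companion to ILU.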
Since the 1990s, matrix-based orderings have increased in popularity. These orderings incorporate the
size of the entries in the matrix in some sense as well as the graph to the decision process for subsequent
pivots. Clift and Tang [144] compared several of these orderings. Benzi [46] discussed the permutation
of large entries to the main diagonal. The minimum discarded fill (MDF) ordering [113,114] introduced
by D’Azevedo, Forsyth, and Tang is of particular interest in this research. In this ordering, a pivot is
chosen so as to minimize the discarded fill-in for that pivoting step over all remaining candidate pivots.
The ordering is relatively slow compared to RCM but improvements and simplifications to the algorithm
have made it more competitive.
Persson and Peraire [145] recently implemented a reordering algorithm that is related to MDF and
that is used to form their ILU preconditioner for the Newton–Krylov solver for the Navier–Stokes equa-
tions. Their research is of particular interest and will be discussed in Section 1.3.1.
Numerous studies on orderings have been conducted over the years. Here, some important ones are
pointed out. Liu and Sherman [146] compared CM to RCM and found that RCM is at least as good as
CM in terms of storage and computational cost. They suggested it may have to do with the fact that
RCM minimizes the distance between the next vertex and its ordered neighbours, whereas CM minimizes
the distance between the next vertex and its unordered neighbours. Duff and Meurant [141] found that
for some simple SPD problems, the ICCG method is not improved by orderings such as minimum
degree and RCM unless a higher level of fill-in is allowed. However, Dutto [140] later investigated the
effect of various orderings (including minimum degree, (reverse) Cuthill-McKee, Gibbs and snake) for
incomplete factorization preconditioning for the compressible Navier-Stokes equations and found that
reordering the system favourably can greatly improve the quality of the preconditioner. She found
that for GMRES, an RCM reordering is an excellent choice. Clift and Tang [144] modified the RCM
algorithm so that nodes that tie are sorted by their ascending degree. Pueyo [5] further validated this
point in his comparison of various ordering strategies where he found RCM best improves his Newton–
Krylov flow solver’s performance. Benzi et al. [147] investigated the use of various orderings to construct
preconditioners for nonsymmetric linear problems, most notable of which was RCM. They used these
preconditioners for GMRES, Bi-CGSTAB and transpose-free QMR. Other orderings they considered
included CM and multiple minimum degree. ILU and variations were considered as preconditioners
for the iterative methods. They also determined that ordering methods originally intended for direct
methods perform competitively depending on how symmetric and diagonally dominant the system matrix
is. Pollul and Reusken [148] investigated orderings for preconditioners for an NK algorithm for the Euler
equations. Chisholm and Zingg [149] discussed the importance of root-node selection and tie-breaking
strategy for RCM for an NK algorithm for the compressible Navier–Stokes equations.
In the past decade, novel ordering methods have been investigated. For example, Bondarabady and
Kaveh [150] used a genetic algorithm to find an ordering that optimizes various graph properties. In
their survey paper, Duff and Ucar [129] discussed another family of preconditioners based on a support
graph. In this approach, combinatorics was used to determine the best reordering of vertices to generate
a matrix splitting.
1.2.2 Parallel Preconditioning
Parallel preconditioning has emerged as an important aspect of linear solvers. Numerous packages exist
that use parallel preconditioners and have been used widely in the applied mathematics community.
Of particular interest here are the parallel implementations of ILU preconditioners. Another branch of
inherently parallel preconditioners called sparse approximate inverses (SPAIs) gained popularity in the
1980s. In this section parallel ILU and SPAI preconditioners are discussed.
Originally, the parallelism of ILU was not obvious. It took investigations into the incomplete Cholesky
factorization (with a fill-in of zero) to change that opinion [46]. An example of this was the research
conducted by Dubois et al. [151]. They created a parallel preconditioner based on an approximation
to the inverse of a matrix. Two popular approaches to parallelizing ILU are through colouring (i.e.
ordering) and domain decomposition.
Various ordering and colouring techniques have been explored to parallelize ILU. George [135] in-
troduced the nested dissection ordering which is amenable to parallel applications. In the early 1980s,
van der Vorst [152] introduced his ordering, and it proved to be very scalable for the ICCG method. He
then focused on the parallelization of the forward and backward solves using a level scheduling or wave-
front approach [153]. Anderson and Saad [154] concurrently investigated a similar approach and found
good scalability of their algorithm. Elman and Golub [155] introduced their famous red/black ordering
in the early 1990s. Since then, more intricate colouring algorithms have been explored. For example,
Adams et al. [156] investigated a four-colour ordering strategy. More recently, Hysom and Pothen have
investigated the parallel application of ILU with zero fill-in [157] and with nonzero fill-in [158]. They
concluded that the algorithm is very scalable; however, in the latter case it is difficult to deal with
fill-in between subdomains.
The concept of domain decomposition first originated in the work by Schwarz [159] from as early as
the 1870s. He used it to prove the existence of the solution of the Dirichlet problem on irregular domains.
Many decades later, Miller [160] revisited the idea and applied it to solving systems of equations.
Variations in the way the restructured system is solved yield the classification into Schwarz, full matrix,
and Schur complement methods. Further details can be found in the review given by
Saad [38]. A basic mathematical introduction into domain decomposition is provided in Appendix A.1.
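A minimal additive Schwarz preconditioner illustrates the domain decomposition idea: the residual is restricted to overlapping subdomains, each local problem is solved independently (hence the parallelism), and the corrections are summed. The 1D model matrix and two-subdomain split below are illustrative assumptions, not the configuration of any solver discussed here:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# 1D Laplacian split into two overlapping subdomains (illustrative)
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

# Overlapping index sets: [0, 60) and [40, 100)
subs = [np.arange(0, 60), np.arange(40, 100)]
# Pre-factor each local (restricted) problem
local_inv = [np.linalg.inv(A[np.ix_(s, s)]) for s in subs]

def additive_schwarz(r):
    """M^{-1} r = sum_i R_i^T (R_i A R_i^T)^{-1} R_i r: solve each
    subdomain problem independently and add the corrections."""
    z = np.zeros_like(r)
    for s, Ainv in zip(subs, local_inv):
        z[s] += Ainv @ r[s]
    return z

M = LinearOperator((n, n), matvec=additive_schwarz)
x, info = gmres(A, b, M=M, restart=30, maxiter=500)
print(info, np.linalg.norm(A @ x - b))
```

Multiplicative Schwarz applies the subdomain solves sequentially instead, and a coarse-grid correction is usually added to make the iteration count independent of the number of subdomains.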
Domain decomposition methods have been a popular component of solvers for practical applications,
especially in the past 20 years. Saad and van der Vorst [9] provide an extensive review, whereas here only
some examples are provided. Mandel [161] investigated domain decomposition preconditioning for finite
element applications. He also found some qualitative analogies to multigrid. Knoll et al. [162] compared
a domain-based Schwarz preconditioner to ILU for compressible combustion problems and found it to
be superior. Fischer et al. [163] investigated the use of a Schwarz preconditioner using overlapping
pressure subdomains for the incompressible Navier–Stokes equations. Saad et al. [164] implemented a
parallel ILU preconditioning method using the Schur complement. They found that the recursive Schur
preconditioner is difficult to parallelize. Gropp et al. [165] investigated the use of domain decomposition
preconditioners for various parallel applications. Their work is an example of a Newton–Krylov–Schwarz
(NKS) method. More recently, Hicken and Zingg [166] investigated additive Schwarz and approximate
Schur preconditioners and ILU for the 3D simulation of inviscid aerodynamic flows.
The standard ILU preconditioner approximates the inverse of the system matrix indirectly: it
approximates the matrix and then applies the forward and back substitutions of its LU decomposition.
Sparse approximate inverse (SPAI) preconditioning is attractive because it constructs an approximation
to the inverse of the system matrix directly. Appendix A.2 outlines this form
of preconditioning in more detail. Benzi [46] provides an excellent discussion on SPAI preconditioners.
SPAI preconditioning traces back to work by Benson [167] and later Benson and Frederickson [168]
in the early 1980s. In its earliest formulation, the following minimization is executed:
    min_{M ∈ S} ||AM − I||                                                    (1.1)

where A is the system matrix, M is the approximate inverse preconditioner, and S is a predefined sparsity
pattern that limits the growth of fill in M. Various norms and approximations lead to a family of
methods called non-factored SPAIs. Additional breadth in the topic is created by the choice of S. The
minimization problem decouples column by column and is therefore conceptually parallel. Factored-form
SPAI preconditioners, which are based on the incomplete conjugation of the unit basis vectors, are referred
to as AINV (approximate inverse) preconditioners. The original contribution for these preconditioners
is by Benzi et al. [169]. The approach approximates the generalized Gram-Schmidt process used in
forming A = LDU. The AINV approach is sensitive to ordering, whereas its non-factored counterpart
is not [46].
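A minimal non-factored SPAI in the sense of the minimization (1.1) can be sketched as follows: with S fixed to the sparsity pattern of A, each column of M is obtained from a small, independent least-squares problem. The test matrix, the pattern choice, and the Frobenius-norm formulation are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

# Illustrative nonsymmetric tridiagonal system
n = 150
Ad = diags([-1.0, 3.0, -1.5], [-1, 0, 1], shape=(n, n)).toarray()
b = np.ones(n)

# Non-factored SPAI with S fixed to the sparsity pattern of A:
# each column m_j solves  min || A m_j - e_j ||_2  over the entries in S,
# a small independent least-squares problem (hence the parallelism).
M = np.zeros((n, n))
for j in range(n):
    pattern = np.nonzero(Ad[:, j])[0]      # allowed entries of column j
    # rows of A touched by those allowed entries
    rows = np.nonzero(np.any(Ad[:, pattern] != 0.0, axis=1))[0]
    e = (rows == j).astype(float)          # e_j restricted to those rows
    mj, *_ = np.linalg.lstsq(Ad[np.ix_(rows, pattern)], e, rcond=None)
    M[pattern, j] = mj

x, info = gmres(Ad, b, M=LinearOperator((n, n), matvec=lambda r: M @ r),
                restart=30, maxiter=500)
print(info, np.linalg.norm(Ad @ x - b))
```

Note that applying M is a plain matrix-vector product, with no triangular solves; this is what makes SPAI attractive on parallel hardware compared with ILU.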
Kolotilina and Yeremin [170,171] studied the use of approximate inverses as parallel preconditioners
for elliptic boundary value problems using the finite element method. Additional early research was
conducted by Grote and Simon [172], Cosgrove and Fowler [173], Cosgrove et al. [174], Chow and
Saad [175], and Huckle and Grote [176, 177]. Benzi et al. [169] investigated the effectiveness of an
approximate inverse preconditioner for the conjugate gradient method. Chow and Saad [178] and later
Barnard and Grote [179] implemented a block version of SPAIs. In a later study they found their SPAI
preconditioner was slower but more robust than ILU [180]. Sosonkina [181] implemented an approximate
LU technique SPAI preconditioner. Huckle [182] studied the effect of sparsity pattern restriction on the
approximate inverse for positive definite matrices.
SPAI preconditioners have been used for a variety of applications, such as electromagnetics [183,184].
Furthermore, SPAI preconditioners have been connected to other preconditioning methods or used as
components in other methods. Guillaume et al. [185] used a rational approximation preconditioner which
parallels the idea of sparse approximate inverses. Chow [186] connected SPAIs to Schur complements.
Tang and Wan [187] and Broker et al. [188] used a SPAI smoother for multigrid. The latter found that a
SPAI is an attractive alternative to Gauss-Seidel. Bollhoefer and Saad [189] demonstrated a connection
between SPAI and ILU for certain matrices.
SPAI preconditioning and many of the other preconditioning methods discussed in this dissertation
have blended into hybrids, and the distinctions between them are increasingly fading. Recent work by
Huckle et al. [190] is a good example. They improved the classical SPAI approach by using elements
from the Schur complement method, as well as other domain decomposition methods. The resulting
preconditioner was used in image deblurring applications.
1.2.3 Multilevel Preconditioning
A multilevel method seeks to solve a problem by coarsening or partitioning it and hence solving it on a
smaller domain. Multilevel preconditioning has been applied to ILU and SPAIs. Multilevel ILU can be
thought of as an ordering. The multigrid method is a multilevel method by definition. In this section
the history of these methods is reviewed.
Multilevel ILU first traces back to work by Axelsson and Vassilevski [191, 192]. They looked at a
multilevel ILU preconditioner for finite element applications. Less than a decade later, a famous paper
by van der Ploeg, Botta, and Wubs [193] was published. Their so-called nested grids ILU (NGILU)
method is based on the repeated use of red-black ordering. Botta and Wubs [194] also implemented
another technique called matrix renumbering ILU (MRILU) which reorders the matrix based on the size
of its elements. Saad [195] demonstrates in his paper on ILUM that multilevel ILU can be viewed as an
ordering. Furthermore, Vassilevski [196] found that ILU and AMG can be thought of as approximate
Schur complement methods. Bank and Wagner [197] drew a close comparison between multilevel ILU
and the classic multigrid algorithm for simple elliptic problems. The work by Saad and Zhang [198,199] is
somewhat related to the block multilevel ILU preconditioner developed by Botta and Wubs. They tested
these preconditioners on finite element applications. In the latter reference, they considered ILUT as the
base preconditioner and used a Schur complement approach. Later, Saad et al. [200–202] developed an
algebraic recursive multilevel solver (ARMS) and tested it on various CFD applications. They considered
various ordering strategies within their solver. Shen and Zhang [203] found their block multilevel ILU
preconditioner to be superior to a parallel two-level Schur preconditioner for some convection-diffusion
and Navier–Stokes matrices. Shen et al. [204] later improved on this preconditioner by increasing its
parallelism.
Recent work continues to improve multilevel ILU preconditioning and incorporates other precon-
ditioners. Gu et al. [205] used sparse approximate inverses in their multilevel ILU preconditioner.
Saad [206] improved his ILUM preconditioner by adding diagonal dominance as a criterion in the con-
struction of the multilevel ordering. Mayer [207] recently developed a dual-pivoting strategy including
a novel dropping strategy for a multilevel ILU preconditioner. This approach was found to have compa-
rable performance to preconditioning software developed by Bollhoefer and Saad [208] and Saad [201].
Sparse approximate inverses are inherently parallel. The incorporation of multilevel approaches,
however, is a more recent phenomenon. Examples of multilevel sparse approximate inverse implementations
include [209,210] for the SPAI method, and [211] for the AINV method.
Multigrid preconditioning has been used to accelerate standalone linear solvers and solvers that exist
within a larger nonlinear solution algorithm. Multigrid preconditioning for the Newton–Krylov method
is deferred to Section 1.3.1.
Axelsson and Vassilevski [212] provided a mathematical derivation for two-level methods and applied
their method to symmetric and nonsymmetric problems. Oosterlee and Washio [213] developed a parallel
multigrid preconditioner for problems exhibiting multiple scales. Braess [214] investigated AMG as a
preconditioner for linear systems with positive definite matrices. Hager and Lee [215] showed that
GMRES preconditioned with ILU and MG is efficient in solving the Euler equations. Oliveira and
Deng [216] compared variations of MG preconditioning for GMRES and CGS to ILU(0) preconditioning
for solving transport equations. Oosterlee and Washio [213, 217–219] investigated the use of AMG as a preconditioner in parallel computing applications that are solved using domain decomposition methods. Wienands et al. [220] provided a Fourier analysis for GMRES preconditioned by multigrid
with Gauss–Seidel as its smoother. The model problems they explored were diffusive in nature including
the Poisson equation in 3D and a case with mixed derivatives. Tuminaro et al. [221] implemented a
two-level Schwarz preconditioner for Navier–Stokes calculations on unstructured meshes. More recently,
Wang and Joshi [222] developed an agglomeration-type AMG preconditioner for a finite-volume Krylov
solver for 3D incompressible viscous channel and cavity flows. Pennachio [223] used AMG preconditioning
to accelerate solutions for the reaction-diffusion equation.
1.3 The Newton–Krylov Method
The Newton–Krylov (NK) method is a popular choice for solving the discretized compressible Navier–
Stokes equations; it dates back to the mid-1980s. Pueyo [5] provides an excellent description of the early history of Newton–Krylov methods. Wigton et al. [224] are thought to have been the first to implement the NK method; they modeled inviscid 2D flows.
Later, Venkatakrishnan [225] used the method to solve inviscid and viscous aerodynamic problems.
Dutto [140] and Johan et al. [226] applied the method to inviscid and laminar viscous flows. Ajmani et
al. [227,228] used the NK method to model hypersonic flow around a cylinder and transonic flow through
a turbine. Venkatakrishnan and Mavriplis [229] used ILU preconditioning in their Newton-GMRES
algorithm to solve the discretized steady Navier–Stokes equations. Orkwis [128] compared Newton,
quasi-Newton, and inexact-Newton methods coupled with CGS to solve the Navier–Stokes equations.
McHugh and Knoll [230] compared matrix-free and matrix-present approaches in the treatment of the
system Jacobian. Barth and Linton [231] used the NK method to solve turbulent cases in 3D with the
aid of grid sequencing. Nielsen et al. [232] solved the Euler equations on 3D unstructured domains.
Cai et al. used a Newton–Krylov–Schwarz (NKS) method for inviscid calculations. Anderson [233, 234]
compared the NK method to multigrid for inviscid turbulent calculations and found multigrid to be faster.
Dawson [235] modeled multiphase flows using the NK method. Wille [236] investigated a globalization
strategy based on mesh sequencing for his ILU-preconditioned NK solver. Blanco and Zingg [237] used a
block ILU preconditioner based on a low-order approximation to the system Jacobian for their Newton-
Krylov method on unstructured grids. The preconditioner was reordered using RCM. Similarly, Pueyo
and Zingg [5, 8, 238] used a block-fill ILU preconditioner on a low-order Jacobian for their NK method
on structured grids, resulting in a very fast algorithm. They used approximate factorization to assist in
the globalization. Geuzaine [6,239,240] used the NK method on unstructured grids to solve the Navier–
Stokes equations coupled with the Spalart-Allmaras turbulence model. Gropp et al. [241] investigated
a globalization strategy for their NKS solver. Chisholm and Zingg [4, 149, 242–245] implemented an
effective NK solver and resolved many issues pertaining to the globalization of the method, especially
in terms of the Spalart–Allmaras turbulence model. Nemec and Zingg [2, 246–249, 249–252] extended
the work of Pueyo and developed an optimization framework for 2D turbulent airfoil design. Their
Krylov solver used for Newton’s method was also applied to the gradient evaluation problem. Gatsis
and Zingg [253, 254] created a novel, fully-coupled algorithm for aerodynamic shape optimization in
which the Navier–Stokes equations, adjoint equations and optimality conditions (i.e. Karush–Kuhn–
Tucker conditions) were solved using a Newton–GMRES algorithm. The flow system was solved only
once in their algorithm. Olawsky [255] applied the NK method to supersonic and hypersonic flows.
Harrison [256] used the method in modeling chemistry. Vanderchkove [257] implemented the NK method
for other physics applications. Nichols [258,259] extended the NK method to a structured solver for the
3D Euler equations. Groth and Northrup [260] implemented an NKS algorithm for 2D, steady Euler
calculations. They used a block, parallel additive Schwarz preconditioner with ILU on local blocks.
Bellavia and Berrone [261] improved the globalization for the NK method for Navier–Stokes calculations
using a finite element discretization. Nejat, Michalak, and Ollivier–Gooch [262–266] implemented an
NK method for an Euler solver that uses a high-order spatial discretization. Hicken and Zingg [267]
improved on the work of Nichols and developed a optimization framework and parallel solver for the
3D Euler equations. Northrup and Groth [268] extended their parallel, adaptive mesh refinement NK
algorithm to 3D, large-eddy simulation (LES) for reactive flows. Osusky and Zingg [269, 270] extended
Hicken’s algorithm to laminar and turbulent viscous calculations in 3D. Lucas, van Zuijlen, and Bijl
recently investigated the use of a Jacobian-free NK algorithm for unsteady flows [271]. They found their
algorithm to be superior to nonlinear multigrid, especially for more difficult cases.
1.3.1 Multigrid Preconditioning
Multigrid preconditioning is of particular interest in this research. An early record of multigrid precon-
ditioning for a Newton–Krylov method is Brieger and Lecca [272]. They used a parallel implementation
of multigrid to solve a subsurface hydrology problem. Piquet and Vasseur [273] used a simple multigrid
preconditioner to solve the 3D incompressible Navier–Stokes equations.
In 1999, Geuzaine et al. [239] compared their finite-volume nonlinear multigrid solver to a newly-
developed NK solver with ILU(0)-smoothed multigrid preconditioning. The incomplete factorization
was on a low-order discretization of the system Jacobian. They found the two solvers to be competitive
for Euler and Navier–Stokes calculations. Concurrently, Knoll et al. used a multigrid preconditioner for their NK solver for the radiation-diffusion equation and found that their NK-MG algorithm was superior to NK-ILU and nonlinear multigrid. Knoll et al. [274, 275] extended the solver to multimaterial
equilibrium radiation diffusion.
Knoll and Mousseau [276] implemented a NK solver with AMG preconditioning for a finite-volume
incompressible Navier–Stokes solver. They used a block-SGS smoother. Knoll and Rider [277] used a
geometric multigrid preconditioner for their incompressible Navier–Stokes flow solver. They also solved
Burgers' equation in 1D and 2D. Damped Jacobi and SGS were used as smoothers, and a low-order discretization of the convective terms in the smoother was found to be effective.
In the past decade, there has been continued research into multigrid-preconditioned Newton–Krylov
methods. Jones and Woodward [278] used a red-black Gauss–Seidel smoother for the NK-MG algorithm
to solve Richards' equation for variably saturated flow. Pernice and Tocci [279] solved the incompressible Navier–
Stokes equations using the pressure-correction method as a smoother. Mavriplis [68] continued his
earlier research by looking at line Jacobi and Gauss–Seidel smoothers. He solved the radiation diffusion
equation as well as the Navier–Stokes equations. Wu et al. [280] used a NK method with multigrid
preconditioning, smoothed by a tridiagonal matrix, for battery simulation problems. Syamsudhuha and
Silvester [281] solved the Navier–Stokes equations using a NK-MG method. Elman et al. [282] solved
the incompressible Navier–Stokes equations. In 2004, Knoll and Keyes [283] published a review paper
on Jacobian–free Newton–Krylov methods including the use of multigrid as a preconditioner.
The discontinuous-Galerkin (DG) finite-element formulation has gained popularity in recent years for
solving the Navier–Stokes equations. Persson and Peraire [145] and Diosady and Darmofal [284] looked
to improve their DG algorithms by using a Newton–Krylov method. Persson and Peraire compared block-Jacobi, block-Gauss–Seidel, and a multilevel preconditioner and found them not to be robust for real
flows. They implemented a multigrid preconditioner that was superior to the aforementioned ones. The
preconditioner consisted of a coarse-scale correction based on a low-order polynomial (p-multigrid) and a
block-ILU(0) post smoother. Ordering was critical to the performance of the method. Specifically, they
used MDF-like ordering. They concluded that the multigrid correction was important for diffusion and
ILU was important for convection; however, minimal theoretical explanation was provided. Diosady and
Darmofal explored additional orderings that are better-suited to the unstructured computational space.
Specifically, line orderings (in the streamwise direction) were found to be important. These orderings
were found to improve the effectiveness of their Jacobi and block-ILU(0) smoothers.
1.4 Motivation and Objectives
The scope of this thesis is to investigate preconditioning in the context of a flow solution algorithm.
In particular, the algorithm in this research uses a finite difference formulation on structured grids.
The most general equations that are solved are the discretized, compressible, thin-layer Navier–Stokes
equations with the one-equation Spalart–Allmaras turbulence model. The desired properties of this solver
are fast and reliable simulation of steady flow around wing sections and the prediction of aerodynamic
forces and moments around those shapes. This algorithm also serves as a function evaluation in a
gradient-based optimization framework, in which gradient evaluations require the solution of large linear
systems that are closely-related to the linear systems in the flow solver.
Preconditioning of the linear system is arguably the most important component in the Newton-Krylov
algorithm. For practical aerospace applications, an unpreconditioned linear system will not converge if it
is solved iteratively with GMRES. Furthermore, preconditioning encompasses orderings, which can greatly impact the solution process.
A Newton–Krylov method is used to solve the nonlinear system of equations. However, Newton's method requires a continuation approach to progress safely from the transient phase to the steady state. The baseline continuation method is approximate factorization (AF). Although AF is very robust through the
transient phase, using it is undesirable since it is essentially a second flow solution algorithm that must
be maintained. Therefore, a pseudo-transient continuation method was implemented to globalize the
Newton algorithm. The approach followed the work of Chisholm [149].
The baseline preconditioner is the inverse of an incomplete LU factorization of the matrix in the
linear system. For simplicity, it is referred to as an ILU preconditioner. ILU preconditioning has its
drawbacks: it scales poorly with increasing problem size; for practical applications it is not guaranteed
to be stable; and it is sensitive to the ordering of the system matrix. The baseline ordering that is
used is reverse Cuthill–McKee (RCM). RCM is first performed and subsequently the ILU factorization
is determined.
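As a rough illustration of this baseline recipe, reordering with RCM before computing an incomplete factorization that preconditions GMRES, the following sketch uses SciPy's generic sparse tools on a small stand-in matrix; it is not the block ILU implementation of this thesis, and the tridiagonal test matrix is purely illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spilu, gmres, LinearOperator

# Stand-in system: a 1D convection-diffusion-like matrix, not a flow Jacobian.
n = 50
A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

# Step 1: reverse Cuthill-McKee ordering of the matrix graph.
perm = reverse_cuthill_mckee(A, symmetric_mode=False)
Ap = A[perm, :][:, perm].tocsc()

# Step 2: incomplete LU of the reordered matrix as the preconditioner.
ilu = spilu(Ap, drop_tol=1e-4, fill_factor=10)
M = LinearOperator((n, n), matvec=ilu.solve)

# Step 3: preconditioned GMRES on the permuted system.
xp, info = gmres(Ap, b[perm], M=M)
x = np.empty(n)
x[perm] = xp  # map the solution back to the original ordering
```

The key point is the sequencing: the permutation is fixed first, and the incomplete factorization is then computed on the permuted matrix, since the fill discarded by ILU depends strongly on the ordering.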
The foremost objective of this research is to investigate preconditioning. First a review of precon-
ditioning methods is conducted. From this review, promising candidate preconditioners are selected
and compared to the baseline preconditioner in order to determine which is the best preconditioning
approach.
The literature review led to two promising approaches. The first is the use of a multigrid
preconditioner. The second is the use of an integrated ordering/factorization strategy, in which the
ordering and the factorization are determined in a coupled manner. The minimum discarded fill (MDF)
ordering is an example of such an approach. Thus, the goal of this thesis is to determine from all
combinations of preconditioning and ordering which is the best approach to precondition the Navier–
Stokes flow solver of interest. There are additional variations in preconditioning and investigations that
are conducted to support the decision process.
The second objective of this thesis is to understand on a more fundamental level the effect of these
preconditioners and orderings. To facilitate this, a PDE solver is developed that solves the steady
convection-diffusion equation. All preconditioning investigations are first conducted using the convection-
diffusion solver, followed by the Navier–Stokes solver.
Superior preconditioning can greatly improve the flow solution process. To summarize, the objective
of this thesis is to identify candidate preconditioners, to determine the best possible preconditioner from
a set of preconditioners (including ILU), to perform a more fundamental investigation of preconditioning
using the discretized convection–diffusion equation, and finally to conduct supporting studies that enrich
the understanding of preconditioning on a more fundamental level.
1.5 Organization of this Thesis
In the next chapter, the governing equations are described in detail. Specifically, the thin-layer, com-
pressible Navier–Stokes equations are described, including the boundary conditions and the curvilinear
coordinate transformation that are used. The Spalart–Allmaras turbulence model PDE and the steady
convection-diffusion equation are also introduced.
In Chapter 3, the spatial discretization used to transform the partial differential equations into a
nonlinear system of ordinary differential equations is presented. This includes the boundary equations
and the turbulence model. The linearization of this system is also described, since Newton’s method is
used to solve this system.
In Chapter 4, the Newton–Krylov algorithm is described. Important aspects such as continuation
of the transient behaviour of the nonlinear variables and a thorough description of the GMRES Krylov
subspace method are presented. Preconditioning is essential to the performance of GMRES. Since
preconditioning is the focus of this research, the subsequent chapter is dedicated entirely to it.
There are several aspects of preconditioning that are described in Chapter 5. These aspects can be
summarized in three categories: incomplete LU preconditioning, orderings, and multigrid precondition-
ing. Preconditioners that are developed are summarized by detailed formulas or algorithms, supported
by theoretical exploration where possible.
Chapter 6 is the results chapter. The reader is encouraged to read Chapter 5 in detail to have a
complete understanding of the results presented in this chapter. Chapter 6 is divided into two distinct
components. First, the results relating to the convection-diffusion equation are presented. Next, the
results for the Navier–Stokes equations are described. For each respective solver, test cases are introduced
and various preconditioners are tested for those cases.
In Chapter 7 conclusions are summarized for both the convection-diffusion and Navier–Stokes equa-
tions. The performance of each preconditioner is assessed. A detailed summary of contributions made
follows these conclusions. Finally, recommendations are made for future research to extend the ideas
presented in this research.
Chapter 2
GOVERNING EQUATIONS
In this chapter the governing equations that are used to model aerodynamic flows are presented. Specif-
ically, these are the compressible, thin-layer Navier–Stokes equations coupled with the Spalart–Allmaras
one-equation turbulence model. A curvilinear coordinate transformation is applied to this nonlinear sys-
tem of partial differential equations to facilitate finite-difference calculations based on curvilinear grids.
Boundary conditions are also described.
The convection-diffusion equation is a simpler, linear partial differential equation that governs similar
physical processes to the Navier–Stokes equations: convection and diffusion. To facilitate some expla-
nations and investigations that are discussed later in this dissertation, this chapter concludes with the
convection-diffusion equation, supplemented by related theory, definitions, and basic insights.
2.1 The Navier–Stokes Equations
Before presenting the governing equations, the dimensional variables, density, ρ, velocities, u and v, and
total energy, e, are scaled using the free-stream density, ρ∞, and sound speed, a∞:
ρ =ρ
ρ∞u =
u
a∞v =
v
a∞e =
e
ρ∞a2∞
(2.1)
In two dimensions, the compressible Navier–Stokes equations are
∂Q/∂t + ∂E/∂x + ∂F/∂y = (1/Re)(∂Ev/∂x + ∂Fv/∂y)    (2.2)

where

Q = [ρ, ρu, ρv, e]ᵀ    (2.3)

and Re is the Reynolds number. The convective flux vectors are

E = [ρu, ρu² + p, ρuv, u(e + p)]ᵀ  and  F = [ρv, ρvu, ρv² + p, v(e + p)]ᵀ    (2.4)
The viscous flux vectors are

Ev = [0, τxx, τxy, ϕ1]ᵀ  and  Fv = [0, τxy, τyy, ϕ2]ᵀ    (2.5)

with

τxx = (µ + µt)(4ux − 2vy)/3
τxy = (µ + µt)(uy + vx)
τyy = (µ + µt)(−2ux + 4vy)/3    (2.6)
ϕ1 = uτxx + vτxy + (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ∂x(a²)
ϕ2 = uτxy + vτyy + (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ∂y(a²)
where the variables τxx, τxy, and τyy are elements of the symmetric viscous stress tensor. A Newtonian fluid is assumed. The dynamic viscosity is µ, and its eddy-viscosity counterpart for the turbulent formulation is µt. Pr is the Prandtl number. The ratio of specific heats is γ = cp/cv, which has a value of 1.4 for air. The pressure, p, is related to the flow variables by the equation of state for a perfect gas:

p = (γ − 1)[e − ½ρ(u² + v²)]    (2.7)

The speed of sound is

a = √(γp/ρ) = √(γRT)    (2.8)
Sutherland's law is used to relate the dynamic viscosity, µ, to temperature:

µ = a³(1 + S*/T∞)/(a² + S*/T∞)    (2.9)

where T∞ denotes the freestream temperature, assumed to be 460.0 °R, and the constant S* is 198.6 °R for air. The non-dimensional, laminar Prandtl number is defined as

Pr ≡ cpµ/κt    (2.10)

where κt denotes thermal conductivity. The laminar and turbulent Prandtl numbers are 0.72 and 0.90, respectively. The Reynolds number is defined as

Re ≡ ρ∞ c a∞/µ∞    (2.11)
2.1.1 Generalized Curvilinear Coordinate Transformation
A curvilinear coordinate transformation is used to map the physical grid space onto a uniform computa-
tional domain. This is illustrated in Figure 2.1. A C-topology grid that is used in this work is shown in
Figure 2.2. The generalized transformation involves the introduction of two new directions and a time
parameter given by
τ = t
ξ = ξ(x, y, t) (2.12)
η = η(x, y, t)
The governing equations now operate on the state vector
Q = J−1Q = J−1
ρ
ρu
ρv
e
(2.13)
where the metric Jacobian of the transformation is given by
J−1 = xξyη − xηyξ (2.14)
2.1.2 Thin-Layer Approximation
For attached or mildly separated aerodynamic flows at high Reynolds numbers, the compressible Navier–
Stokes equations can be simplified by using a thin-layer approximation. This is because viscous effects
that occur in the streamwise direction along the body are much smaller when compared to those that
occur normal to the body. The compressible, thin-layer Navier–Stokes equations are
∂Q̂/∂τ + ∂Ê/∂ξ + ∂F̂/∂η = Re⁻¹ ∂Ŝ/∂η    (2.15)
Figure 2.1: Curvilinear coordinate transformation, courtesy of Lomax, Pulliam, and Zingg [1].

Figure 2.2: A C-topology grid about a NACA0012 airfoil (units are in chord lengths): (a) full-grid view; (b) close-up of grid.
where the convective flux vectors are
Ê = J⁻¹[ρU, ρUu + ξxp, ρUv + ξyp, (e + p)U − ξtp]ᵀ,  F̂ = J⁻¹[ρV, ρVu + ηxp, ρVv + ηyp, (e + p)V − ηtp]ᵀ    (2.16)
the contravariant velocities are

U = ξt + ξxu + ξyv    (2.17)

V = ηt + ηxu + ηyv    (2.18)

and the viscous flux vector is

Ŝ = J⁻¹[0, ηxm1 + ηym2, ηxm2 + ηym3, ηx(um1 + vm2 + m4) + ηy(um2 + vm3 + m5)]ᵀ    (2.19)

with

m1 = (µ + µt)(4ηxuη − 2ηyvη)/3
m2 = (µ + µt)(ηyuη + ηxvη)
m3 = (µ + µt)(−2ηxuη + 4ηyvη)/3    (2.20)
m4 = (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ηx ∂η(a²)
m5 = (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ηy ∂η(a²)
2.2 The Spalart–Allmaras Turbulence Model
The dynamic eddy viscosity, µt, in (2.6) accounts for the effects of turbulence. The Spalart–Allmaras turbulence model [285] is used to determine the value of µt. This one-equation transport model, written in non-dimensional and non-conservative form, is given by

Dν̃/Dt = (cb1/Re)(1 − ft2) S̃ ν̃ + (1/(σRe)){(1 + cb2) ∇·[(ν + ν̃)∇ν̃] − cb2 (ν + ν̃)∇²ν̃} − (1/Re)(cw1fw − (cb1/κ²) ft2)(ν̃/dw)² + Re ft1 ΔU²    (2.21)

where ν̃ is the non-dimensional working variable. The kinematic eddy viscosity, νt = µt/ρ, is obtained from

νt = ν̃ fv1    (2.22)

where

fv1 = χ³/(χ³ + cv1³)    (2.23)

and

χ = ν̃/ν    (2.24)

The production term is given by

S̃ = S Re + (ν̃/(κ²dw²)) fv2    (2.25)
where

S = |∂v/∂x − ∂u/∂y|    (2.26)

is the magnitude of the vorticity, dw is the distance to the closest wall, and

fv2 = 1 − χ/(1 + χfv1)    (2.27)

The destruction function is given by

fw = g[(1 + cw3⁶)/(g⁶ + cw3⁶)]^(1/6)    (2.28)

where

g = r + cw2(r⁶ − r)    (2.29)

and

r = ν̃/(S̃κ²dw²)    (2.30)
The functions ft1 and ft2 control transition; for fully-turbulent flow, these functions are zero. For flow with transition, they become

ft1 = ct1 gt exp[−ct2 (ωt²/ΔU²)(d² + gt² dt²)]    (2.31)

ft2 = ct3 exp(−ct4 χ²)    (2.32)

where dt is the distance to the nearest trip point, ωt is the vorticity at the wall at the trip point, ΔU is the difference between the velocity at the trip point and at the field point under consideration, and gt = min(0.1, |ΔU|/(ωt∆x)), where ∆x is the spacing along the wall at the trip point. The remaining parameters are given by

cb1 = 0.1355,  cb2 = 0.622,  κ = 0.41,  σ = 2/3
cw1 = cb1/κ² + (1 + cb2)/σ,  cw2 = 0.3,  cw3 = 2.0,  cv1 = 7.1,  cv2 = 5.0
ct1 = 5,  ct2 = 2,  ct3 = 1.2,  ct4 = 0.5
Details about these parameters can be found in [286]. Currently, the algorithm presented is for fully-
turbulent flow.
Ashford's [287] suggested modifications to the Spalart–Allmaras turbulence model are implemented. The quantity fv2 is redefined as

fv2 = (1 + χ/cv2)⁻³    (2.33)

and a new quantity, fv3, is introduced:

fv3 = (1 + χfv1)(1 − fv2)/χ    (2.34)

The modified production term, S̃, is given by

S̃ = S Re fv3 + (ν̃/(κ²dw²)) fv2    (2.35)
2.2.1 Generalized Curvilinear Coordinate Transformation
In the transformation of (2.21), any terms containing mixed derivatives are neglected. Using (2.12), the Spalart–Allmaras turbulence model becomes

∂ν̃/∂τ + U ∂ν̃/∂ξ + V ∂ν̃/∂η = (1/Re){cb1 S̃ ν̃ − cw1fw (ν̃/dw)² + (1/σ)[(1 + cb2)T1 − cb2T2]}    (2.36)
where

T1 = ξx ∂/∂ξ[(ν + ν̃) ξx ∂ν̃/∂ξ] + ηx ∂/∂η[(ν + ν̃) ηx ∂ν̃/∂η] + ξy ∂/∂ξ[(ν + ν̃) ξy ∂ν̃/∂ξ] + ηy ∂/∂η[(ν + ν̃) ηy ∂ν̃/∂η]    (2.37)

and

T2 = (ν + ν̃)[ξx ∂/∂ξ(ξx ∂ν̃/∂ξ) + ηx ∂/∂η(ηx ∂ν̃/∂η) + ξy ∂/∂ξ(ξy ∂ν̃/∂ξ) + ηy ∂/∂η(ηy ∂ν̃/∂η)]    (2.38)
2.3 Boundary Conditions
The boundary conditions must be specified for the entire computational domain. Examples of bound-
ary conditions include: inflow, outflow, body, and boundary interface. Inflow and outflow boundary
calculations are performed using Riemann invariants and/or extrapolations.
For inviscid flow, flow tangency is enforced at a solid wall. For viscous flow, a no-slip condition is required, and the normal pressure gradient is set to zero. Under the assumption of adiabatic flow, the latter condition also enforces a zero normal density gradient.
Finally, for a C-topology mesh wake cut, the interfaces are averaged in the normal direction. Each
conservative variable is averaged except for the energy: pressure is averaged instead.
For lifting bodies a circulation correction is used to minimize the far-field boundary effects. The
details of this correction are described by Pulliam [288].
Numerical implementation of the boundary conditions is described in Section 3.3.
2.4 The Convection-Diffusion Equation
The linear convection-diffusion equation describes the evolution of a scalar quantity φ subject to the
processes of convection and diffusion. If a source term is also considered, then
∂φ/∂t + ∇·[v⃗φ − µ∇φ] = f    (2.39)
The divergence operator acts on two terms. The first term describes convection and therefore contains a
velocity vector, ~v. The second term models diffusion. In this formulation, it contains a spatially-varying
diffusion coefficient, µ > 0. Finally, f is a source term.
For fixed µ and v⃗, the Peclet number is defined as

Pe = |v⃗|L/µ    (2.40)

where L is a length scale of the problem. It is a measure of the relative strength of convection to diffusion: for convection-dominated flows, Pe → ∞, and for diffusion-dominated flows, Pe → 0.
2.4.1 The Steady 1D Convection-Diffusion Equation
Consider the steady, one-dimensional convection-diffusion equation with no source term:

u dφ/dx − µ d²φ/dx² = 0    (2.41)

Furthermore, assume that the velocity and the diffusion coefficient are constant. The solution of (2.41) on the domain x ∈ [0, 1] with Dirichlet boundary conditions φ(0) = 0 and φ(1) = 1 is

φ(x) = (e^(Pe·x) − 1)/(e^Pe − 1)    (2.42)

where the Peclet number is Pe = u/µ. Figure 2.3 shows this solution for various Peclet numbers. When the Peclet number is small, the solution is dominated by diffusion. When the Peclet number is large, there is a convection-dominated solution with a thin boundary-layer-like region of diffusion.
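The exact solution (2.42) is easy to evaluate numerically; a minimal sketch follows. Using expm1 keeps the formula accurate in the small-Pe limit, where φ(x) → x.

```python
import numpy as np

def phi_exact(x, Pe):
    # Exact solution (2.42) of the steady 1D convection-diffusion equation
    # on [0, 1] with phi(0) = 0 and phi(1) = 1.
    return np.expm1(Pe * x) / np.expm1(Pe)

print(phi_exact(0.5, 1e-8))   # ~0.5: nearly linear, diffusion dominated
print(phi_exact(0.9, 100.0))  # ~0: sharp boundary layer near x = 1
```

This reproduces the behaviour in Figure 2.3: for small Pe the profile is nearly linear, while for large Pe the solution stays near zero until a thin layer adjacent to x = 1.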
2.4.2 The Steady 2D Convection-Diffusion Equation
The steady convection-diffusion equation (2.39) in 2D Cartesian coordinates is

∂/∂x(a ∂φ/∂x) + ∂/∂y(b ∂φ/∂y) + ∂/∂x(cφ) + ∂/∂y(dφ) + eφ = f    (2.43)

where the coefficients a, b, c, d, e, and f are functions of x and y. In particular, the terms containing a and b model the process of diffusion, and the terms containing c and d model the process of convection. The velocity field, v⃗ = [c, d], is assumed to be divergence free,

∂c/∂x + ∂d/∂y = 0    (2.44)

and a = b = µ(x, y). The term eφ is included as a generalization. Dirichlet boundary conditions are specified on the upstream boundaries.
Figure 2.3: The solution to the 1D convection-diffusion equation for several Peclet numbers (Pe = 0.01, 1, 10, and 100).
Coordinate transformation
A curvilinear coordinate transformation is also used here to transform (2.43) from the physical domain
(x,y) to a uniform, computational domain (ξ,η). Using the chain rule, the first derivatives can be written
as
∂/∂x = ξx ∂/∂ξ + ηx ∂/∂η    (2.45)

∂/∂y = ξy ∂/∂ξ + ηy ∂/∂η    (2.46)
The work of Pulliam [288] is followed to obtain the metrics of the transformation:
ξx = Jyη (2.47)
ηx = −Jyξ (2.48)
ξy = −Jxη (2.49)
ηy = Jxξ (2.50)
where J is the metric Jacobian of the transformation defined in (2.14).
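The metric relations (2.47)-(2.50) can be sketched numerically. The following is a minimal illustration that approximates the grid derivatives with centred differences via np.gradient, assuming unit spacing in the computational coordinates; the sheared test grid is hypothetical.

```python
import numpy as np

def metrics(x, y):
    """Metric terms (2.47)-(2.50) from node coordinates x(j,k), y(j,k),
    with axis 0 of the arrays taken as xi and axis 1 as eta."""
    x_xi, x_eta = np.gradient(x)
    y_xi, y_eta = np.gradient(y)
    J = 1.0 / (x_xi * y_eta - x_eta * y_xi)  # reciprocal of (2.14)
    xi_x = J * y_eta      # (2.47)
    eta_x = -J * y_xi     # (2.48)
    xi_y = -J * x_eta     # (2.49)
    eta_y = J * x_xi      # (2.50)
    return xi_x, xi_y, eta_x, eta_y, J

# A hypothetical sheared grid: x = xi, y = eta + 0.5*xi.
jj, kk = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing='ij')
xi_x, xi_y, eta_x, eta_y, J = metrics(jj, kk + 0.5 * jj)
```

For this linear grid the centred differences are exact, so the metrics are constant: ξx = 1, ηx = −0.5, ξy = 0, ηy = 1, and J = 1.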
The terms in (2.43) are written in the computational domain’s coordinates using (2.47-2.50). For
simplicity, it helps to consider the diffusive terms together and the convective terms together:
∂/∂x(a ∂φ/∂x) + ∂/∂y(b ∂φ/∂y) = ∂/∂ξ(ā ∂φ/∂ξ) + ∂/∂ξ(ḡ ∂φ/∂η) + ∂/∂η(ḡ ∂φ/∂ξ) + ∂/∂η(b̄ ∂φ/∂η)    (2.51)

∂/∂x(cφ) + ∂/∂y(dφ) = ∂/∂ξ(c̄φ) + ∂/∂η(d̄φ)    (2.52)
where

ā = ξx²a + ξy²b    (2.53)
b̄ = ηx²a + ηy²b    (2.54)
ḡ = ξxηxa + ξyηyb    (2.55)

and

c̄ = ξxc + ξyd    (2.56)
d̄ = ηxc + ηyd    (2.57)

are the contravariant velocities.
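The coefficient transformations (2.53)-(2.57) are purely algebraic and can be sketched directly; this is a minimal illustration operating pointwise (scalars or NumPy arrays), with hypothetical coefficient values used only for checking.

```python
def transformed_coefficients(a, b, c, d, xi_x, xi_y, eta_x, eta_y):
    # Diffusion coefficients (2.53)-(2.55) and contravariant velocities
    # (2.56)-(2.57) of the convection-diffusion equation in computational
    # coordinates; inputs may be scalars or same-shaped NumPy arrays.
    a_bar = xi_x**2 * a + xi_y**2 * b        # (2.53)
    b_bar = eta_x**2 * a + eta_y**2 * b      # (2.54)
    g_bar = xi_x * eta_x * a + xi_y * eta_y * b  # (2.55)
    c_bar = xi_x * c + xi_y * d              # (2.56)
    d_bar = eta_x * c + eta_y * d            # (2.57)
    return a_bar, b_bar, g_bar, c_bar, d_bar
```

A quick check: for identity metrics (ξx = ηy = 1, ξy = ηx = 0), the transformed coefficients reduce to the physical ones and the cross term ḡ vanishes.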
Chapter 3
SPATIAL DISCRETIZATION
The thin-layer compressible Navier–Stokes equations are a system of nonlinear partial differential equa-
tions. When discretized in space, they yield a system of nonlinear ordinary differential equations (ODEs).
This chapter deals with the spatial discretization of these equations along with the Spalart–Allmaras
turbulence model. The spatial discretization follows the work of the NASA Ames ARC2D algorithm [288], along with several others, of which the most relevant are Nemec [252] and Chisholm [4]. Further details about the discretization of the Spalart–Allmaras turbulence model are described in [285].
Newton’s method is used to solve the nonlinear system of equations. An approximation to the
Jacobian of the nonlinear system is computed to facilitate the Newton algorithm and the preconditioner
of the subsequent linear system. The linearization of the nonlinear system used to create the Jacobian
follows the work of Nemec [252] and Chisholm [4].
For the convection-diffusion equation a spatial discretization is used for convective and diffusive
derivatives that is analogous to the inviscid and viscous derivatives in the Navier–Stokes equations. The
operators are found in Pulliam’s report [288].
3.1 The Navier–Stokes Equations
For inviscid fluxes, a second-order centered-difference operator with second- and fourth-difference scalar
artificial dissipation is used. For the ξ direction,
∂E/∂ξ ≈ (E_{j+1,k} − E_{j−1,k})/2 − ∇ξ AD    (3.1)

where

AD = d(2)_{j+1/2,k} ∆ξ Q_{j,k} − d(4)_{j+1/2,k} ∆ξ∇ξ∆ξ Q_{j,k}    (3.2)

d(2)_{j+1/2,k} = 2(ε σ J⁻¹)_{j+1/2,k}    (3.3)

d(4)_{j+1/2,k} = max[0, 2κ4 (σ J⁻¹)_{j+1/2,k} − d(2)_{j+1/2,k}]    (3.4)

σ_{j,k} = |U| + a√(ξx² + ξy²)    (3.5)

ε_{j,k} = κ2 [0.5 Υ*_{j,k} + 0.25 (Υ*_{j−1,k} + Υ*_{j+1,k})]    (3.6)

Υ*_{j,k} = max(Υ_{j+1,k}, Υ_{j,k}, Υ_{j−1,k})    (3.7)

Υ_{j,k} = |p_{j+1,k} − 2p_{j,k} + p_{j−1,k}| / |p_{j+1,k} + 2p_{j,k} + p_{j−1,k}|    (3.8)
and ∆ξ and ∇ξ are the first-order forward and backward difference operators, respectively. The artificial dissipation constants are κ2 and κ4. The coefficient κ4 is usually much smaller than κ2; for example, they can have values of 0.01 and 1.0, respectively. The spectral radius of the flux Jacobian matrix is given by σ. The pressure switch, Υ_{j,k}, is used to control the use of first-order dissipation in the presence of shock waves. Values at half nodes are averages along the direction of the required derivative. The dissipation stencil requires two points on each side of the interior grid node quantity being differenced; therefore, modifications must be made to this stencil at the first and last interior nodes. These modifications can be found in [252, 288].

The above process is repeated for the inviscid flux derivative in the η direction; however, (3.7) is not used.
The viscous terms in the thin-layer compressible Navier–Stokes equations resemble

∂η(α_{j,k} ∂η β_{j,k})    (3.9)

and are discretized using a compact, three-point stencil:

∇η(α_{j,k+1/2} ∆η β_{j,k}) = α_{j,k+1/2}(β_{j,k+1} − β_{j,k}) − α_{j,k−1/2}(β_{j,k} − β_{j,k−1})    (3.10)
3.2 The Spalart–Allmaras Turbulence Model
The Spalart–Allmaras turbulence model is presented in the steady-state form

J⁻¹[M(ν̃) − P(ν̃) + D(ν̃) − N(ν̃)] = 0    (3.11)
where J⁻¹ is the metric Jacobian in (2.14). The terms M(ν̃), P(ν̃), D(ν̃), and N(ν̃) are the convective, production, destruction, and diffusive terms, respectively. Without considering transition, the terms are as follows:

M(ν̃) = U ∂ν̃/∂ξ + V ∂ν̃/∂η    (3.12)

P(ν̃) = (cb1/Re) S̃ ν̃    (3.13)

D(ν̃) = (cw1fw/Re)(ν̃/dw)²    (3.14)

N(ν̃) = (1/(σRe))[(1 + cb2)T1 − cb2T2]    (3.15)
The production and destruction terms are source terms and therefore do not require differencing. The convective term is differenced using a first-order upwind scheme; for example, in the ξ direction,

M(ν̃)_{j,k} = ½(U_{j,k} + |U_{j,k}|)(ν̃_{j,k} − ν̃_{j−1,k}) + ½(U_{j,k} − |U_{j,k}|)(ν̃_{j+1,k} − ν̃_{j,k})    (3.16)
A similar term is formed for the η direction. The diffusive term is differenced using (3.10), since it resembles the viscous terms. Finally, the vorticity is computed approximately using centered differences:

S ≈ ½ |(v_{j+1,k} − v_{j−1,k})(ξx)_{j,k} + (v_{j,k+1} − v_{j,k−1})(ηx)_{j,k} − (u_{j+1,k} − u_{j−1,k})(ξy)_{j,k} − (u_{j,k+1} − u_{j,k−1})(ηy)_{j,k}|    (3.17)
3.3 Boundary Conditions
The normal and tangential velocities are required when computing boundary conditions. The normal velocity is perpendicular to the boundary. The normal and tangential directions increase along the respective ξ and η directions at each boundary individually. Figure 3.1 illustrates these directions.
According to this convention, the normal and tangential velocity components are expressed differently
for the various boundaries. At k = 1 and k = kmax, we have:
Vn = (ηx u + ηy v) / √(ηx² + ηy²)   (3.18)
Vt = (ηy u − ηx v) / √(ηx² + ηy²)   (3.19)
which reduce to Vn = ηx u + ηy v and Vt = ηy u − ηx v when the metric gradients are normalized to unit magnitude.
At j = 1 and j = jmax, we have:
Vn = (ξx u + ξy v) / √(ξx² + ξy²)   (3.20)
Vt = (−ξy u + ξx v) / √(ξx² + ξy²)   (3.21)
which simplify in the same way for unit-magnitude metrics.
Figure 3.1: Normal and tangential directions at the boundaries.
3.3.1 Airfoil Body
At an airfoil body, the boundary condition has two possibilities depending on whether the flow is inviscid
or viscous. For inviscid flow, flow tangency is enforced. Therefore, the normal velocity component is
zero. The tangential velocity component and pressure are extrapolated from the interior. The stagnation
enthalpy is set to the freestream value.
For viscous flow, a no-slip condition is required. Hence, u = 0 and v = 0. Subsequently, both
components of momentum at the body are zero. The normal pressure gradient component is set to zero.
Furthermore, the flow is assumed to be adiabatic and act as a perfect gas. This results in a flow that
has a zero normal density gradient component [289, 290]. For the Spalart–Allmaras turbulence model,
the turbulent state variable, ν, is set to zero. The equations are summarized as:
ρj,1 − ρj,2 = 0 (3.22)
(ρu)j,1 = 0 (3.23)
(ρv)j,1 = 0 (3.24)
pj,1 − pj,2 = 0 (3.25)
νj,1 = 0 (3.26)
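A minimal sketch of applying (3.22)–(3.26) at the wall row (hypothetical data layout with a zero-based wall index k = 0; not the thesis implementation):

```python
def apply_wall_bc(rho, rho_u, rho_v, p, nu, j):
    """Viscous (no-slip, adiabatic) wall conditions (3.22)-(3.26).
    Arrays are indexed [j][k] with k = 0 at the wall."""
    rho[j][0] = rho[j][1]    # zero normal density gradient (3.22)
    rho_u[j][0] = 0.0        # no-slip: zero momentum components (3.23)
    rho_v[j][0] = 0.0        # (3.24)
    p[j][0] = p[j][1]        # zero normal pressure gradient (3.25)
    nu[j][0] = 0.0           # turbulent working variable (3.26)
```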
3.3.2 Inflow and Outflow Boundaries
At the far-field boundary, extrapolations are performed using Riemann invariants. Depending on whether
the flow is subsonic or supersonic, certain extrapolations are performed from the interior. A complete
discussion can be found in the work of Nemec [252] or Pueyo [5].
For subsonic inflow,

(Vn − 2a/(γ − 1))_{j,kmax} − (Vn − 2a/(γ − 1))_∞ = 0   (3.27)
(Vn + 2a/(γ − 1))_{j,kmax} − (Vn + 2a/(γ − 1))_{j,kmax−1} = 0   (3.28)
(ρ^γ/p)_{j,kmax} − S_∞ = 0   (3.29)
(Vt)_{j,kmax} − (Vt)_∞ = 0   (3.30)
ν_{j,kmax} − ν_∞ = 0   (3.31)
where ν_∞ = 0.001. For subsonic outflow,

(Vn − 2a/(γ − 1))_{j,kmax} − (Vn − 2a/(γ − 1))_∞ = 0   (3.32)
(Vn + 2a/(γ − 1))_{j,kmax} − (Vn + 2a/(γ − 1))_{j,kmax−1} = 0   (3.33)
(ρ^γ/p)_{j,kmax} − (ρ^γ/p)_{j,kmax−1} = 0   (3.34)
(Vt)_{j,kmax} − (Vt)_{j,kmax−1} = 0   (3.35)
ν_{j,kmax} − ν_{j,kmax−1} = 0   (3.36)
For viscous outflow, a zeroth-order extrapolation is used
ρ1,k − ρ2,k = 0 (3.37)
(ρu)1,k − (ρu)2,k = 0 (3.38)
(ρv)1,k − (ρv)2,k = 0 (3.39)
p1,k − p2,k = 0 (3.40)
ν1,k − ν2,k = 0 (3.41)
3.3.3 Wakecut Interface
For a C-topology grid, the conservative flow variables and the turbulent working variable are averaged
across the wakecut using
Q_{j,1} − ½ (Q_{j,2} + Q_{jmax−j+1,2}) = 0   (3.42)
3.4 The Jacobian of the Nonlinear System
Newton’s method normally requires the formation of the Jacobian of the nonlinear system of equations
that arises after the spatial discretization. When coupled with a Krylov subspace iterative method
that solves the resulting linear system of equations, the Jacobian does not have to be explicitly formed.
However, an approximation to the Jacobian is needed in the continuation of the Newton algorithm
(especially for turbulent flows) and to construct a preconditioner.
Nemec [252] uses a second-order and a first-order approximation to the Jacobian. The former is used
in his implementation of a discrete-adjoint optimization algorithm. The incomplete factorization of the
latter is used in the construction of the preconditioner of the linear system. The approach follows the
novel work by Pueyo and Zingg [238].
For this research, the second-order Jacobian is used mainly as an analysis tool. For example, its eigenvalues are studied along with the eigenvalues of related iteration matrices. It has a nine-point stencil.
The first-order Jacobian is used to construct the baseline incomplete LU-factorization preconditioner of
the linear system in Newton’s method. It is obtained by collapsing the nine-point, fourth-difference dissipation stencil onto the five-point stencil that contains the second-difference dissipation. The relationship
is given by
εl2 = εr2 + σεr4 (3.43)
where σ is a parameter. The method is described in detail by Pueyo and Zingg [238].
The Jacobian has components relating to the interior nodes and the boundary nodes. Body, far-
field, outflow and wake-cut boundaries must be treated individually. Furthermore, the Spalart–Allmaras
turbulence model is also linearized.
For the remainder of this dissertation, the first- and second-order Jacobians will be
referred to as A1 and A2, respectively. Figure 3.2 shows example A1 and A2 Jacobians arising
from a very coarse grid. The sparsity patterns depend on the ordering of the grid nodes. In these plots
a natural ordering is used where the nodes are ordered along the η-direction first and then along the
ξ-direction of the computational grid. For this ordering, there are pronounced entries that correspond
to the wake cut. The bandwidth is extremely large for this particular type of ordering. Furthermore,
there is a block-pentadiagonal component for A2 compared to a block-tridiagonal component for A1 at
the main diagonal. The former is smaller due to the collapsed stencil (3.43). Ordering plays a central
part in this research and its discussion is deferred to a later chapter.
The block entries in the Jacobian for a given node contain entries relating to the mean-flow (i.e.
mass, momentum, and energy) and the turbulence model equations. Zero values on the diagonals of the
diagonal block entries of the Jacobian are possible due to the linearization of the boundary conditions for the
mean-flow equations. For precautionary reasons (relating later to the formation of the preconditioner),
the rows of the Jacobian (for a given node) are exchanged in such a manner that the diagonal entry is
nonzero. It can be shown that this is always possible for the linearization used in this research. For
further details, see Pueyo [5].
(a) A1 Jacobian (b) A2 Jacobian
Figure 3.2: Sparsity pattern of sample A1 and A2 Jacobians using a natural ordering.
3.5 The Convection-Diffusion Equation
3.5.1 The Grid Peclet Number
In the convection-diffusion equation (2.39), the spatial derivatives are first and second order. The
former correspond to the convective terms in the equation. The stability of the numerical discretization
depends greatly on how these convective terms are modeled. As the Peclet number increases, the relative
importance of convection increases.
A more effective quantity that is used to define a stability limit on the discretization is the grid Peclet
number. Consider the 1D convection-diffusion equation. The grid Peclet number [291] is defined as
Peh = |v| h / μ   (3.44)
where h is the grid spacing.
Second-order centered differences require Peh < 2 for the entire computational domain to avoid
oscillations. Both second-order centered differences with added scalar artificial dissipation and first-
order upwinding provide results with fewer oscillations for Peh > 2 than second-order centered differences
alone.
Figure 3.3 shows the solution to the 1D convection-diffusion equation using second-order centered
differences, first-order upwinding, and second-order centered differences with scalar artificial dissipation.
The second-difference artificial dissipation model is defined in the next subsection and its coefficient is set
to 0.5 for this example. Figures 3.3(a)-3.3(c) show solutions of the 1D convection-diffusion equation for
various grid Peclet numbers. Specifically, Figure 3.3(d) illustrates that second-order centered differences
with artificial dissipation can be used to extend the stability of second-order centered-differences beyond
a grid Peclet number of 2. For this particular dissipation model (and dissipation coefficient) that is
[Figure 3.3 panels: (a) Peh = 0.1 (Pe = 20); (b) Peh = 0.5 (Pe = 100); (c) Peh = 1 (Pe = 200); (d) Peh = 2.5 (Pe = 500). Each panel plots φ(x) near x = 1, comparing the exact solution with second-order centered differences, first-order upwinding, and centered differences with dissipation ε = 0.5.]
Figure 3.3: Close-up view of the numerical solution to the 1D convection-diffusion equation for various
grid Peclet numbers on a 101-node computational grid.
added to the second-order centered-differences discretization, the results are consistent with first-order
upwinding. Table 3.1 shows the solution errors for the cases presented.
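The behaviour described above can be reproduced with a short self-contained script (an illustrative sketch, not the thesis code; here Peh = uh/μ = 2.5 is obtained with u = 250, μ = 1, and h = 0.01, so the Pe values differ from the table's nondimensionalization):

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system by the Thomas algorithm."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i - 1] * cp[i - 1]
        cp[i] = sup[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i - 1] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_conv_diff(n, u, mu, scheme):
    """Solve u*phi' = mu*phi'' on [0,1] with phi(0)=0, phi(1)=1."""
    h = 1.0 / (n - 1)
    m = n - 2                            # number of interior unknowns
    if scheme == "centered":
        W = -u / (2 * h) - mu / h**2     # coefficient of phi_{i-1}
        P = 2 * mu / h**2                # coefficient of phi_i
        E = u / (2 * h) - mu / h**2      # coefficient of phi_{i+1}
    else:                                # first-order upwind (assumes u > 0)
        W = -u / h - mu / h**2
        P = u / h + 2 * mu / h**2
        E = -mu / h**2
    rhs = [0.0] * m
    rhs[-1] -= E * 1.0                   # right Dirichlet value phi(1) = 1
    interior = thomas([W] * (m - 1), [P] * m, [E] * (m - 1), rhs)
    return [0.0] + interior + [1.0]

phi_c = solve_conv_diff(101, 250.0, 1.0, "centered")   # Peh = u*h/mu = 2.5
phi_u = solve_conv_diff(101, 250.0, 1.0, "upwind")
```

For this grid Peclet number the centered solution oscillates near the boundary layer (its minimum is well below zero), while the upwind solution is monotone.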
3.5.2 The 2D Convection-Diffusion Equation
The computational domain (ξ, η) is uniform with a grid-spacing of unity (∆ξ = ∆η = 1). Finite
differences are used to model each of the derivatives. For the diffusion terms, a compact second-order
centered-difference scheme is used. For the convection terms, a second-order centered-difference scheme
with artificial dissipation is used.
The following discretization is used to approximate each of the terms in (2.43). The differencing
Table 3.1: Errors for various Peclet numbers for various discretizations of the 1D convection-diffusion
equation on a uniform 101-node computational grid.
Error (‖φ − φexact‖):

Pe     Peh    Centered Differences    Centered Differences w/ ε = 0.5
20     0.1    0.0037                  0.1059
100    0.5    0.0446                  0.1976
200    1.0    0.1366                  0.2217
500    2.5    0.4804                  0.1624
stencils that are used for the diffusive terms are:
∂/∂ξ (a ∂φ/∂ξ) ≈ a_{j+1/2,k} (φ_{j+1,k} − φ_{j,k}) − a_{j−1/2,k} (φ_{j,k} − φ_{j−1,k})   (3.45)
∂/∂η (b ∂φ/∂η) ≈ b_{j,k+1/2} (φ_{j,k+1} − φ_{j,k}) − b_{j,k−1/2} (φ_{j,k} − φ_{j,k−1})   (3.46)
∂/∂ξ (g ∂φ/∂η) ≈ (1/4) [g_{j+1,k} φ_{j+1,k+1} − g_{j+1,k} φ_{j+1,k−1} − g_{j−1,k} φ_{j−1,k+1} + g_{j−1,k} φ_{j−1,k−1}]   (3.47)
∂/∂η (g ∂φ/∂ξ) ≈ (1/4) [g_{j,k+1} φ_{j+1,k+1} − g_{j,k+1} φ_{j−1,k+1} − g_{j,k−1} φ_{j+1,k−1} + g_{j,k−1} φ_{j−1,k−1}]   (3.48)
where
a_{j+1/2,k} = (a_{j,k} + a_{j+1,k})/2   (3.49)
a_{j−1/2,k} = (a_{j,k} + a_{j−1,k})/2   (3.50)
b_{j,k+1/2} = (b_{j,k} + b_{j,k+1})/2   (3.51)
b_{j,k−1/2} = (b_{j,k} + b_{j,k−1})/2   (3.52)
For convection, a second-order central-differencing stencil with artificial dissipation is used
∂/∂ξ (cφ) ≈ [(cφ)_{j+1,k} − (cφ)_{j−1,k}]/2 − Dξ   (3.53)
∂/∂η (dφ) ≈ [(dφ)_{j,k+1} − (dφ)_{j,k−1}]/2 − Dη   (3.54)
where
Dξ = ε (c_{j,k}/|c_{j,k}|) [(cφ)_{j+1,k} − 2(cφ)_{j,k} + (cφ)_{j−1,k}]   (3.55)
Dη = ε (d_{j,k}/|d_{j,k}|) [(dφ)_{j,k+1} − 2(dφ)_{j,k} + (dφ)_{j,k−1}]   (3.56)
and ε is the artificial dissipation coefficient. For a uniform square mesh with a constant velocity and
diffusion field, if ε = 0.5, then second-order centered differences with artificial dissipation produces the
same operator as first-order upwinding.
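This equivalence can be verified directly in one dimension (an illustrative check assuming a constant positive velocity, so that c_{j,k}/|c_{j,k}| = 1 in (3.55)):

```python
def centered_with_dissipation(cphi, j, eps, sign_c=1.0):
    """Centered difference of d(c*phi)/dxi minus the dissipation D_xi of
    (3.55), on a unit-spaced grid; sign_c stands for c/|c|."""
    central = 0.5 * (cphi[j + 1] - cphi[j - 1])
    d_xi = eps * sign_c * (cphi[j + 1] - 2.0 * cphi[j] + cphi[j - 1])
    return central - d_xi

def first_order_upwind(cphi, j):
    """Backward difference of c*phi, valid when the velocity is positive."""
    return cphi[j] - cphi[j - 1]
```

With ε = 0.5 the two stencils agree to machine precision at every interior node, confirming the statement above for this model setting.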
For boundaries that are not prescribed by a Dirichlet condition, the difference stencils for the various
derivatives are adjusted accordingly. The convection derivatives are modeled by first-order upwinding,
and the diffusion derivatives are adjusted in locations where the use of a half node is not necessary.
3.5.3 The Jacobian of the Discretized Equations
The use of an incomplete factorization of the Jacobian matrix as a preconditioner is an important area
of interest in this research. The sparsity pattern of the Jacobian of the discretized convection-diffusion
equation is similar to the Jacobian shown in Figure 3.2(b) (where block entries are replaced by scalars
and without the wake-cut interface). Since the matrix is banded and not triangular, any factorization
will introduce fill-in outside of the sparsity pattern of the original matrix. Hence, the structure of the
Jacobian is ideal for the study of incomplete factorizations. In contrast, if an upwinding discretization is
used, the resulting Jacobian is triangular and its factorization has the same triangular sparsity pattern.
Incomplete factorizations of the Jacobian are discussed in great detail in Chapter 5.
Chapter 4
SOLUTION ALGORITHM
The spatial discretization results in a nonlinear system of ordinary differential equations in time. Since
steady-state calculations are of interest, it would initially make sense to discard the time derivatives
and solve the nonlinear system of algebraic equations. Newton’s method is used to solve the resulting
nonlinear system of equations. However, a pseudo-transient continuation method (e.g. implicit Euler) is
applied to ensure a robust overall algorithm. In this work, each iteration of Newton’s method is referred
to as an outer iteration.
At each Newton iteration, a large and sparse linear system of equations must be solved. The gener-
alized minimum residual (GMRES) Krylov subspace method is used to solve this system. Each iteration
of GMRES is referred to as an inner iteration. The conditioning of this system is poor. Precondition-
ing is used to improve the conditioning of the linear system and hence the performance of GMRES.
Preconditioning is discussed in detail in the next chapter.
4.1 Solving the Nonlinear System: Newton’s Method
The discretized, steady, compressible Navier–Stokes equations form a system of nonlinear equations that can be written in the form
R(Q) = 0 (4.1)
where R is a residual vector and Q defines the state, or flow, variables.
Newton’s method is used to solve (4.1) for Q. At the future iteration k + 1
R^{k+1} ≡ R(Q^{k+1}) = 0   (4.2)
is desired. Using the first-order linearization
R^{k+1} ≈ R^k + (∂R/∂Q)^k ∆Q^k = 0   (4.3)
a Newton iteration consists of solving
(∂R/∂Q)^k ∆Q^k = −R^k   (4.4)
for ∆Q^k and updating the current state variables, Q^k, with
Q^{k+1} = Q^k + ∆Q^k   (4.5)
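For a small system with a hand-coded Jacobian, the iteration (4.3)–(4.5) looks like the following sketch (an illustrative 2 × 2 example, not the flow solver):

```python
import math

def newton_2x2(residual, jacobian, q, tol=1e-12, max_iter=50):
    """Newton's method (4.4)-(4.5) for two equations; the 2x2 update
    system J * dq = -r is solved by Cramer's rule."""
    for _ in range(max_iter):
        r = residual(q)
        if math.hypot(r[0], r[1]) < tol:
            break
        (a, b), (c, d) = jacobian(q)
        det = a * d - b * c
        dq0 = (-r[0] * d + r[1] * b) / det
        dq1 = (-r[1] * a + r[0] * c) / det
        q = [q[0] + dq0, q[1] + dq1]       # update (4.5)
    return q

# R(Q) = [Q0^2 + Q1^2 - 4, Q0 - Q1] has a root at (sqrt(2), sqrt(2))
res = lambda q: [q[0] ** 2 + q[1] ** 2 - 4.0, q[0] - q[1]]
jac = lambda q: [[2.0 * q[0], 2.0 * q[1]], [1.0, -1.0]]
root = newton_2x2(res, jac, [1.0, 2.0])
```

From a reasonable initial guess the iterates converge quadratically, which is precisely the attraction of Newton's method once the continuation phase has brought the state close enough to the solution.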
4.2 Newton Globalization: Pseudo-Transient Continuation
The reference Newton–Krylov code [252] that is used for this research depends on an approximately-factored [288] continuation algorithm. This essentially means that any necessary modifications to the
formulation must be done to two distinct algorithms. Hence, an effective continuation technique for
Newton’s method eliminates the need to maintain two algorithms.
The continuation procedure follows the work of Hicken and Zingg [292]. It is divided into two phases.
For the first kstart Newton iterations, a reference time step is calculated using the power-law function
∆t^k_ref = a b^k   (4.6)
where a and b are parameters. Before introducing the second phase, the ratio of the norms of the
nonlinear residuals at Newton iterations k and 1 is defined as
R^k_v ≡ ‖R^k‖₂ / ‖R^1‖₂   (4.7)
The reference time step for the second phase is given by
∆t^k_ref = max(α (R^k_v)^{−β}, ∆t^{k−1}_ref)   (4.8)
where α = a b^{kstart} (R^{kstart}_v)^β, and β is a parameter.
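The two-phase schedule (4.6)–(4.8) can be sketched as follows (illustrative; the residual history is synthetic and the parameters are the inviscid values from Table 4.1):

```python
def reference_time_step(k, dt_prev, Rv_k, Rv_kstart, a, b, beta, kstart):
    """Two-phase reference time step: power law (4.6) for k <= kstart,
    then the residual-based form (4.8) thereafter."""
    if k <= kstart:
        return a * b ** k
    alpha = a * b ** kstart * Rv_kstart ** beta
    return max(alpha * Rv_k ** (-beta), dt_prev)

# synthetic, monotonically decreasing relative residual history
history = [0.5 ** k for k in range(12)]
kstart, a, b, beta = 5, 1.0, 1.8, 2.0
dt, dts = 0.0, []
for k in range(12):
    dt = reference_time_step(k, dt, history[k], history[kstart], a, b, beta, kstart)
    dts.append(dt)
```

Note that the definition of α makes the two phases match at k = kstart, and the max(·) in (4.8) prevents the time step from shrinking if the residual temporarily stalls.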
For the mean-flow equations, the geometric time step is given by
∆t^k_q = ∆t^k_ref / (1 + √J)   (4.9)
The geometric time step for turbulent calculations is given by
∆t^k_ν = τ ∆t^k_q   (4.10)
where τ is a constant. The time step vector at each node is given by
∆t = (∆t^k_q, ∆t^k_q, ∆t^k_q, ∆t^k_q, ∆t^k_ν)^T   (4.11)
For the remainder of this discussion, the index k denoting the Newton iteration will be omitted.
Once the time step is computed for the mean flow equations and the turbulence model, the Newton
system is modified to that of the implicit Euler method,
(∂R/∂Q + (1/∆t) I)^k ∆Q^k = −R^k   (4.12)
where I is the identity matrix. The operation (1/∆t) I is performed on an element-by-element basis using
(4.11). For the boundary equations, ∆t is infinite.
It has been shown that the enforcement of a positive turbulent working variable is important in
terms of the stability of the Spalart–Allmaras turbulence model for various solution methods. Ilinca and
Pelletier [293] and Chisholm and Zingg [244] investigate this significant aspect of the turbulence model
solution algorithm.
A Jacobian-free version of Newton’s method is used, although for the early stages of the continuation
method an approximate–Jacobian-present approach can also be employed [149,237,294]. The details are
discussed in the subsequent section. Essentially, the linear system does not require the explicit formation
of the Jacobian on the left hand side of the Newton iteration system. Since the Jacobian is not explicitly
formed, it is not possible to modify the actual Jacobian to ensure a positive update for ν. A stabilization
technique is used here that follows the method presented by Chisholm and Zingg [149].
Consider a single equation for the working turbulent variable ν in (4.12). Furthermore, approximate the row of the Jacobian, ∂R/∂Q, by its diagonal entry, J_d. Let R_d be the corresponding scalar residual on the right hand side. The resulting equation is
(1/∆t_ν + J_d) ∆ν = −R_d   (4.13)
If the time step is infinite,
∆ν = −R_d/J_d   (4.14)
Based on this limiting condition, Chisholm suggests
|∆ν| < |r| max(ν, 1)   (4.15)
Table 4.1: Continuation parameters for Newton’s method.

Parameter    Inviscid    Laminar    Turbulent subsonic    Turbulent transonic
kstart       5           5          30                    30
a            1           1          0.1                   0.1
b            1.8         1.8        1.2                   1.15
β            2.0         2.0        1.1                   1.1
to keep ν positive and therefore stable, where r is a constant. Applying (4.15) to (4.13) gives
(1/∆t_ν + J_d) |r| max(ν, 1) = −R_d   (4.16)
and, when ∆t_ν is isolated,
∆t_ν = [−R_d/(|r| max(ν, 1) sign(∆ν)) − J_d]^{−1}   (4.17)
Hence, if the proposed turbulent equation time step (4.10) violates condition (4.15), the more stable
time step of (4.17) is used. The values used for τ and r are 1 and 0.4, respectively.
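For a single unknown, the limiting logic of (4.13)–(4.17) reduces to the following sketch (a hypothetical scalar helper with r = 0.4 as in the text):

```python
def limited_turbulence_time_step(dt_nu, J_d, R_d, nu, r=0.4):
    """Return dt_nu if the scalar update implied by (4.13) satisfies the
    positivity bound (4.15); otherwise return the limited step (4.17)."""
    d_nu = -R_d / (1.0 / dt_nu + J_d)            # update from (4.13)
    limit = abs(r) * max(nu, 1.0)
    if abs(d_nu) < limit:
        return dt_nu
    sign = 1.0 if d_nu >= 0.0 else -1.0
    return 1.0 / (-R_d / (limit * sign) - J_d)   # eq. (4.17)

# a case where the proposed step is far too aggressive
dt = limited_turbulence_time_step(1.0e6, 2.0, 10.0, 1.0)
```

By construction, re-evaluating (4.13) with the limited time step produces an update whose magnitude is exactly the bound |r| max(ν, 1).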
The values for all continuation parameters for all cases are given in Table 4.1.
4.3 Solving the Linear System: GMRES Krylov Subspace Method
Equation (4.12) is a linear system that can be solved either directly, using for example LU decomposition,
or iteratively using one of several methods. Directly solving (4.12) for ∆Qk would require an enormous
computational effort for large systems. Hence, an iterative method is used to solve the system. There are
popular algorithms that can be used to solve this nonsymmetric system including CGS [15], BiCGStab
[16], GCROT [17], and GMRES [7]. The GMRES Krylov subspace method is used in this research.
The following sections briefly outline the basic theory of projection methods, Krylov subspace methods, and GMRES. Furthermore, a detailed explanation of the GMRES algorithm is presented, along
with practical aspects such as implementation, convergence acceleration (i.e. preconditioning) and a
convergence bound estimate. Further details can be found in [7, 38, 295, 296]. Other theoretical aspects
associated with Newton–Krylov methods in general can be found in [297–299].
4.3.1 Projection Methods
The objective of a projection method is to obtain an approximate solution of the linear system
Ax = b (4.18)
where A ∈ Rn×n. The system given in (4.12) is of this form. The approximate solution, x, is contained
in subspace K of dimension m ≤ n. A constraint subspace, L, also of dimension m, is defined and the
residual
r = b−Ax (4.19)
is made orthogonal to it. Choosing K = L results in an orthogonal projection method, while K ≠ L yields an oblique projection method. With an initial guess, x0, the projection method formulation is:
Find x ∈ x0 +K such that b−Ax ⊥ L. (4.20)
If bases V = [v1 . . . vm] ∈ Rn×m and W = [w1 . . . wm] ∈ Rn×m are chosen for K and L, respectively,
then the problem statement is:
Find x = x0 + Vy such that W^T (r0 − AVy) = 0.   (4.21)
The vector r0 denotes the initial residual corresponding to x0. The projection method takes r0 and projects it orthogonally with respect to the constraint subspace, L, to obtain the next residual, r. The problem statement is equivalent to finding
x = x0 + V (W^T A V)^{−1} W^T r0   (4.22)
which means that having an invertible W^T A V ∈ R^{m×m} is a necessary condition for any projection method. When the matrix A is known to be positive definite, it can be shown that it is sufficient to have W = V (i.e., L = K) to ensure that (W^T A V)^{−1} exists. However, if it is only known that the matrix A is invertible, then W = AV (i.e., L = AK). The latter approach is an example of an oblique projection
method.
4.3.2 GMRES Algorithm
A Krylov subspace method is an oblique projection method that uses the Krylov subspace for K. A
Krylov subspace of dimension m is
Km(A; b) = span{b, Ab, A²b, . . . , A^{m−1}b}   (4.23)
If qm−1(A) denotes a polynomial of degree m− 1 then the iterate xm is a polynomial approximation to
the exact solution
x = A−1b ≈ xm = x0 + qm−1(A) b (4.24)
The generalized minimal residual (GMRES) Krylov subspace method is a popular approach that is
used to solve linear systems where the system matrix A is nonsymmetric. Hence, an oblique projection
method is used, and the basis for the constraint subspace is given by Wm = AVm. The algorithm
consists of four basic components: initialization, Krylov subspace orthogonalization, solving a least-
squares problem, and updating the solution. The orthogonalization and least-squares problem solution
occurs as each new Krylov subspace direction is introduced. The following is an outline of the basic
GMRES algorithm:
1. Start: Choose x0, compute r0 = b−Ax0 and v1 = r0/β where β = ‖r0‖2.
2. Iterate: For m = 1, 2, . . .
Precondition the search direction vector, vm, and generate the next Krylov subspace search
direction, vm+1, by the Arnoldi orthogonalization process and form the next column of the
upper-Hessenberg matrix, Hm ∈ R(m+1)×m
w_m = A v_m
h_{i,m} = w_m^T v_i,   i = 1, 2, . . . , m
v_{m+1} = w_m − Σ_{i=1}^{m} h_{i,m} v_i
h_{m+1,m} = ‖v_{m+1}‖₂
v_{m+1} = v_{m+1} / h_{m+1,m}
Solve the least-squares problem
y_m = argmin_y ‖β e1 − H̄_m y‖₂
where the minimum value is ρm and e1 denotes the first column of the identity matrix
I ∈ R^{(m+1)×(m+1)}. Note that the function being minimized is the linear residual norm, ‖r_m‖₂. A QR factorization converts H̄_m to upper-triangular form, making the minimization problem inexpensive.
If ρm ≤ ηk‖r0‖2 then exit loop. Note ηk is a relative tolerance that is defined in the outer
iteration.
3. Update the solution: Update the solution xm = x0 + Vm ym.
4.3.3 Convergence of GMRES
Consider an arbitrary diagonalizable n × n matrix A = XΛX^{−1}. If λ1, . . . , λv are the eigenvalues of A with non-positive real parts and λ_{v+1}, . . . , λn are the remaining eigenvalues, bound in a circle centered at C > 0 with radius R < C, then Christara [295] states that the residual of GMRES, at iteration m, satisfies
‖r_m‖₂ / ‖r0‖₂ ≤ ‖X‖₂ ‖X^{−1}‖₂ (R/C)^{m−v} max_{j=v+1,...,n} ∏_{i=1}^{v} |λi − λj| / |λi|   (4.25)
This upper convergence bound for GMRES is related to the conditioning of A. It is precisely related to
the condition number of the matrix of eigenvectors, X . The condition number is inherently related to
the convergence of GMRES as well as other iterative methods.
4.3.4 Practical Aspects of the Newton–GMRES Algorithm
If A is invertible, the GMRES algorithm will converge fully in at most n iterations. However, this would
be a very slow and expensive process since n search directions and minimization problems would be
solved, not to mention the enormous storage cost. The storage cost for the search directions alone would equal that of a dense R^{n×n} matrix. Therefore, aspects such as preconditioning and
restarting are necessary to make the GMRES algorithm more robust, less expensive on memory, and
faster. Also, one generally does not need to converge the solution fully.
Jacobian-free GMRES
In matrix-present form, the matrix A is explicitly formed and stored. For the compressible Navier–
Stokes equations, this computation is difficult and expensive. Since GMRES only requires the product
of A with a vector, v, the formation of A (i.e. the Jacobian plus a diagonal continuation matrix) can be
avoided by using the first-order finite-difference approximation
Av = (∂R/∂Q + (1/∆t) I) v ≈ [R(Q + εv) − R(Q)]/ε + (1/∆t) v   (4.26)
where
ε = εm / ‖v‖₂²   (4.27)
and εm is machine zero. This formulation is referred to as Jacobian-free GMRES. Other variations of
(4.26) are possible [300]. Chisholm [4] experimented with values of ε producing a finite difference that
has low roundoff and truncation error.
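A sketch of the matrix-free product (4.26) (illustrative; the perturbation here is taken as √εm/‖v‖₂, one common choice among the variations noted above, rather than the exact form (4.27)):

```python
import math

def jacobian_free_matvec(R, Q, v, dt, eps_machine=2.2e-16):
    """Approximate (dR/dQ + I/dt) v by the first-order finite
    difference of the residual function R, as in (4.26)."""
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    eps = math.sqrt(eps_machine) / norm_v   # illustrative perturbation choice
    R0 = R(Q)
    R1 = R([qi + eps * vi for qi, vi in zip(Q, v)])
    return [(r1 - r0) / eps + vi / dt for r1, r0, vi in zip(R1, R0, v)]

# check against a linear residual R(Q) = A*Q, whose Jacobian is exactly A
A = [[4.0, 1.0], [2.0, 3.0]]
R = lambda q: [sum(a * x for a, x in zip(row, q)) for row in A]
Q, v = [1.0, -2.0], [0.5, 1.5]
approx = jacobian_free_matvec(R, Q, v, dt=2.0)
exact = [sum(a * x for a, x in zip(row, v)) + vi / 2.0 for row, vi in zip(A, v)]
```

For a linear residual the finite difference is exact up to roundoff, which makes this a convenient sanity check of a matrix-free implementation.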
Restarted GMRES
In order to reduce the memory requirements of GMRES, a restarted algorithm can be used and is referred
to as GMRES(m). In the restarted algorithm, after a predefined number of Krylov search directions
have been formed, the update is computed, the subspace is discarded, and the algorithm is restarted
with an updated initial guess. Restarted GMRES is not used in this thesis for efficiency reasons.
Inexact Newton method
For practical problems, GMRES is typically not converged to machine zero. In this approach, a relative
tolerance, ηk, is imposed on the L2-norm of the linear residual. This tolerance can vary from one Newton
iteration to the next. The inexact tolerance is set to ηk = 0.1, except for the first 15–20 iterations of
turbulent cases, where it is set to 10−5. Since the linear system is not converged to machine zero, the
method is referred to as an inexact Newton method.
In the early stages of Newton’s method, one can also modify the Jacobian and/or use a lower-order
approximation to it to make the algorithm more stable. This is referred to as an approximate Newton
method. In this work, an approximate Newton method is considered where the matrix is a first-order
approximation to the Jacobian, A1.
Approximate Newton method
Newton globalization is further improved by using an approximate Jacobian at the onset of the nonlinear
algorithm. Specifically, the A1–Jacobian is used. The baseline preconditioner is derived from this matrix,
meaning no additional cost is incurred for using it. Once the nonlinear residual is below a certain
tolerance, the nonlinear algorithm reverts to a Jacobian-free formulation of the linear system.
Preconditioning
A major obstacle when using GMRES to solve (4.18) for x is that the conditioning of A is typically poor
for aerodynamic problems governed by the Navier–Stokes equations, at least for the discretization considered in this research. However, with a suitable preconditioner the linear solver performs much
better. One can use left preconditioning, right preconditioning, or a combination of both. Soulaimani
et al. [301,302] and Saad [303] discuss preconditioning techniques for GMRES for CFD applications.
Preconditioning is paramount to the iterative solution process. Whether in a continuation phase or
a Newton phase, a poorly-conditioned linear system will have slow convergence or diverge. Hence, the
objective is to find a preconditioner, or set thereof, that makes the linear system Ax = b easily solvable.
It is important to have a preconditioner whose inverse, or approximation to its inverse, is known or
sparse. Techniques for constructing preconditioners range from the mathematical to the heuristic. A good measure of the quality of the preconditioned system matrix is a reduced condition number compared to that of the original system matrix.
The combined right- and left-preconditioned system is
(M_l^{−1} A M_r^{−1})(M_r x) = M_l^{−1} b   (4.28)
where Ml and Mr are the left and right preconditioners, respectively. The solution, x, of the original
system (4.18) is identical to that of the preconditioned system (4.28). Right preconditioning is considered
in this research. Specifically, Ml is replaced by the identity matrix. Hence, the right-preconditioned
system is given by
(A M_r^{−1})(M_r x) = b   (4.29)
It is essential that M_r be invertible and relatively inexpensive to compute. Furthermore, M_r^{−1} ≈ A^{−1}.
Chapter 5
PRECONDITIONING
In the previous chapter, the Newton–Krylov method was introduced to solve the nonlinear system of
equations corresponding to steady-state solutions of the discretized, compressible Navier–Stokes equa-
tions. The GMRES Krylov subspace method is used to solve the linear systems.
It is well known that for realistic CFD problems, such as the ones investigated here, GMRES requires
acceleration through preconditioning. Right preconditioning is used. It transforms the original linear
system
Ax = b (5.1)
into another linear system
(A M^{−1})(M x) = b   (5.2)
that has the same solution, but is more easily solvable. The right-preconditioned system can be thought of as solving
(A M^{−1}) y = b,   x = M^{−1} y
An advantage of right-preconditioning is that it renders the residual of the linear system
r = b−Ax (5.3)
unchanged. The right-preconditioned GMRES algorithm is presented in Algorithm 1.

Algorithm 1 Right-Preconditioned GMRES(A, M, b, x, m)

1. Initialize: Choose x0, compute r0 = b − Ax0 and v1 = r0/β where β = ‖r0‖₂.

2. Generate the Krylov polynomial and compute its coefficients:
for m = 1, 2, . . . do
    Precondition:
        z_m = M^{−1} v_m   (5.4)
    Form the next Krylov subspace vector: w_m = A z_m
    Augment the upper-Hessenberg matrix: h_{i,m} = (w_m, v_i), i = 1, 2, . . . , m
    Orthogonalize: v_{m+1} = w_m − Σ_{i=1}^{m} h_{i,m} v_i
        h_{m+1,m} = ‖v_{m+1}‖₂
        v_{m+1} = v_{m+1} / h_{m+1,m}
    Solve the least-squares problem:
        ρ_m = min_{y_m} ‖r_m‖₂ = min_{y_m} ‖β e1 − H̄_m y_m‖₂
    if ρ_m ≤ TOL then exit loop
end for

3. Update: Compute u_m = V_m y_m and update the solution x_m = x0 + u_m.
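Algorithm 1 can be condensed into a short program (a self-contained sketch using a diagonal Jacobi preconditioner and a dense normal-equations least-squares solve; a production implementation would update a QR factorization of the Hessenberg matrix instead):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def gauss_solve(M, c):
    """Small dense solve with partial pivoting (for the normal equations)."""
    n = len(c)
    M = [row[:] for row in M]
    c = c[:]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for j in range(i, n):
                M[r][j] -= f * M[i][j]
            c[r] -= f * c[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gmres_right(A, b, minv, tol=1e-10, max_m=50):
    """Right-preconditioned GMRES (Algorithm 1); minv applies M^{-1}."""
    n = len(b)
    beta = math.sqrt(dot(b, b))        # x0 = 0, so r0 = b
    V = [[bi / beta for bi in b]]
    H = []                             # columns of the upper-Hessenberg matrix
    y = []
    for m in range(max_m):
        w = matvec(A, minv(V[m]))      # precondition (5.4), then multiply by A
        h = []
        for vi in V:                   # modified Gram-Schmidt orthogonalization
            hi = dot(w, vi)
            h.append(hi)
            w = [wj - hi * vj for wj, vj in zip(w, vi)]
        hlast = math.sqrt(dot(w, w))
        h.append(hlast)
        H.append(h)
        # solve min ||beta*e1 - Hbar*y||_2 via the normal equations
        rows, cols = m + 2, m + 1
        Hbar = [[H[j][i] if i < len(H[j]) else 0.0 for j in range(cols)]
                for i in range(rows)]
        g = [beta] + [0.0] * (rows - 1)
        N = [[sum(Hbar[i][a] * Hbar[i][c] for i in range(rows))
              for c in range(cols)] for a in range(cols)]
        rhs = [sum(Hbar[i][a] * g[i] for i in range(rows)) for a in range(cols)]
        y = gauss_solve(N, rhs)
        res = [g[i] - sum(Hbar[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        if math.sqrt(dot(res, res)) <= tol * beta or hlast < 1e-14:
            break                      # converged or lucky breakdown
        V.append([wj / hlast for wj in w])
    u = [sum(y[j] * V[j][i] for j in range(len(y))) for i in range(n)]
    return minv(u)                     # x = x0 + M^{-1} V_m y_m with x0 = 0
```

A small diagonally dominant system converges in a handful of iterations with the Jacobi preconditioner `lambda v: [vi / 4.0 for vi in v]`.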
Recall that the GMRES algorithm searches for an approximate solution within the Krylov subspace
Km(A; b) = span{b, Ab, A²b, . . . , A^{m−1}b}   (5.5)
For the right-preconditioned linear system (5.2), GMRES searches for a solution within the Krylov
subspace
Km(AM^{−1}; b) = span{b, (AM^{−1}) b, (AM^{−1})² b, . . . , (AM^{−1})^{m−1} b}   (5.6)
Within GMRES, the subspace vectors are generated and orthogonalized. Before orthogonalization, the
subsequent Krylov subspace vector, vi+1, is obtained by multiplying the previous Krylov subspace vector,
vi, by AM−1. The process is outlined by the following pattern:
v1 → M^{−1} v1 → A M^{−1} v1 → (orthogonalize) → v2
v2 → M^{−1} v2 → A M^{−1} v2 → (orthogonalize) → v3
v3 → . . .
where v1 is a vector that is related to b. Every other step in this process involves the multiplication of
a vector by M−1. This is the preconditioning step. The operator M−1, or preconditioner, can be as
simple as a matrix splitting or as elaborate as an iterative method. In any case, the preconditioning step
in GMRES, (5.4), is equivalent to solving a system
Mz = v (5.7)
for a (preconditioned) vector z, where v is its unpreconditioned counterpart. To extend the preconditioning step beyond a simple matrix splitting, one can interpret (5.7) as the first iteration in a stationary
iterative method with an initial guess of zero that solves the system
Az = v (5.8)
The three preconditioners of interest in this work are: an incomplete LU factorization with fill-in,
or simply ILU(p); an ILU(p)-smoothed relaxation method; and an ILU(p)-smoothed (linear) geometric
multigrid method. Furthermore, a comparison is made between the minimum discarded fill (MDF)
ordering (with block modifications for systems of PDEs) and the popular reverse Cuthill–McKee ordering
for the aforementioned preconditioners of interest. The effects of scaling and permutation matrices (for
these orderings) are also considered for these preconditioners.
To facilitate the description of these preconditioners and orderings, this chapter is structured as follows: First, scaling is introduced. Next, ILU(p) is discussed and the effect of scaling on it
is touched on. The discussion then moves to graph theory and orderings. The minimum degree (MD)
ordering algorithm is briefly introduced using graph theory notation, followed by RCM. A thorough
discussion of MDF concludes the discussion on ordering.
The remainder of this chapter is devoted to multigrid preconditioning. First a relaxation method is
defined, followed by a demonstration of the effectiveness of ILU(p) as a smoother. Some mathematics
follows regarding the implementation of an ILU(p)-smoothed relaxation method as a preconditioner.
Finally, the geometric multigrid (GMG) preconditioner and its related inter-grid operators are described.
Attention is also paid toward scaling and reordering operators that can possibly be encountered at the
various grid levels.
5.1 Scaling
We are interested in iteratively solving a linear system
Ax = b (5.9)
corresponding to an iteration of Newton’s method, where A is the flow Jacobian plus a diagonal contin-
uation matrix. This system is scaled in order to improve the performance of the iterative method. In
general, row- and column-scaling matrices, S1 and S2 respectively, can be used, resulting in the system
S1 A S2 (S2⁻¹ x) = S1 b    (5.10)
Note that S1 and S2 are diagonal matrices. In the absence of round-off error, this system has the same
solution as the unscaled system.
Chisholm and Zingg [149] explain that row and column scaling influence the residual and state
vectors, respectively. For example, consider row scaling. If the residual vector at each node (containing
the discretized mass, momenta, energy, and turbulence model equations) is not scaled well, GMRES will
encourage the convergence of some of the equations, while allowing other equations to diverge. This
usually results in a very poor solution, x, that corresponds to a very poor update for the nonlinear
state in Newton’s method. Recall that an inexact Newton method is used and therefore GMRES is
not fully converged. While a large relative tolerance in GMRES can circumvent this problem, it means
that GMRES would take many more iterations, making the overall algorithm slow. The key disparity in
equation scaling can be due to the presence of the turbulence model, whose scaling can differ by several
orders of magnitude compared to its mean-flow equation counterparts. Chisholm [4] explains that various
scalings can be used to bring these equations to a closer relative size. Examples include scaling in terms
of the Reynolds number, the metric Jacobian, or a constant. In this research, the discretized turbulence
equation is scaled by the Reynolds number.
Another cause of disparity in scaling is due to the equations that correspond to boundary nodes.
The linearizations of the various boundary conditions (e.g. body, wakecut, farfield, and outflow) are
not of the same order as those of the interior nodes, whose Jacobian entries are of O(1). Therefore,
the boundary equations in the linear system are scaled by the diagonal of the Jacobian corresponding
to each equation at each node.
The scalar diagonal elements of the Jacobian are well scaled prior to the aforementioned row scalings
that are applied to the equations. Therefore, the column scaling is set to
S2 = S1⁻¹    (5.11)
to preserve the original scaling of the diagonal elements (as well as the spectrum).
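As a concrete check of this choice, the following sketch (toy NumPy code, not from the thesis) verifies that with S2 = S1⁻¹ the scaled matrix S1 A S1⁻¹ is a similarity transform of A, so the diagonal entries and the eigenvalue spectrum are preserved:

```python
import numpy as np

# Toy check: S2 = S1^{-1} makes the scaling a similarity transform.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)   # diagonally dominant stand-in for the Jacobian
s = rng.uniform(0.5, 2.0, size=5)                   # arbitrary row scalings
S1 = np.diag(s)
S2 = np.diag(1.0 / s)                               # S2 = S1^{-1}, eq. (5.11)

A_scaled = S1 @ A @ S2

# Diagonal entries are untouched: s_i * a_ii * (1/s_i) = a_ii.
assert np.allclose(np.diag(A_scaled), np.diag(A))

# The eigenvalue spectrum is preserved under the similarity transform.
eig = np.sort_complex(np.linalg.eigvals(A))
eig_scaled = np.sort_complex(np.linalg.eigvals(A_scaled))
assert np.allclose(eig, eig_scaled)
```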
5.1.1 Jacobian-Vector Products in GMRES
Since scaling transforms the linear system, its effect must be accounted for in two important areas of
GMRES: the Jacobian-vector product and the preconditioner. Here the former is discussed in some
detail.
The scaled system (5.10) is solved using GMRES, which only requires matrix-vector products with the
matrix A; these products are formed using a Frechet derivative (4.26), as discussed in the previous
chapter. For simplicity, A is assumed to be only the flow Jacobian, and the diagonal continuation
matrix is ignored. With right preconditioning, the Frechet derivative becomes

AM⁻¹v = ( R[Q + εM⁻¹v] − R[Q] ) / ε    (5.12)
The right-preconditioned linear system with row and column scaling is given by

[ (S1AS2)(S1MS2)⁻¹ ] [ (S1MS2)(S2⁻¹x) ] = S1b    (5.13)

If row and column scaling matrices are used, the preconditioned Jacobian-vector product becomes

(S1AS2)(S1MS2)⁻¹(S1v) = S1 ( R[Q + ε(S1MS2)⁻¹(S1v)] − R[Q] ) / ε    (5.14)
Note that v itself must also have its rows scaled by S1.
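The matrix-free product can be sketched as follows; the residual R here is a hypothetical two-equation stand-in for the discretized flow equations, used only to illustrate the finite-difference Jacobian-vector product (scaling omitted for clarity):

```python
import numpy as np

# Sketch of the matrix-free (Frechet) Jacobian-vector product used by GMRES:
# A v ~= (R[Q + eps*v] - R[Q]) / eps.
def R(Q):
    # Hypothetical nonlinear residual, NOT the thesis flow residual.
    return np.array([Q[0]**2 + Q[1], np.sin(Q[0]) + 3.0 * Q[1]])

def jacvec(R, Q, v, eps=1e-7):
    return (R(Q + eps * v) - R(Q)) / eps

Q = np.array([0.3, -0.2])
v = np.array([1.0, 2.0])

# Exact Jacobian of the toy residual at Q, for comparison.
J = np.array([[2.0 * Q[0], 1.0],
              [np.cos(Q[0]), 3.0]])

assert np.allclose(jacvec(R, Q, v), J @ v, atol=1e-5)
```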
5.2 Incomplete LU (ILU) Preconditioning
The baseline preconditioner used in this thesis is the inverse of the incomplete LU (ILU) factorization
of a first-order approximation to the Jacobian matrix, A1. Specifically,

M⁻¹ = (LU)⁻¹    (5.15)
where L and U are the incomplete lower and upper factors, respectively. The subscript on A1 is dropped
for the remainder of this chapter since the theory applies to more general matrices in linear systems.
In this approach, the level of fill-in, p, with respect to the pattern of A is controlled. This method is
referred to as ILU(p). In its simplest form, ILU(0) refers to a factorization whose sparsity pattern is the
same as that of the matrix A; that is, the sparsity pattern of L + U is identical to the sparsity pattern of A. Any
entries that are outside of this pattern are discarded during the factorization. ILU(1) allows additional
fill-in from entries in the original matrix pattern. ILU(2) allows additional fill-in from entries within the
pattern of ILU(1). In general, the fill-in for ILU(p) is kept if it is due to entries from ILU(p− 1).
There are several variants to the ILU(p) factorization. Similarly, there are several variants to incom-
plete factorizations in general. They were reviewed in detail in Chapter 1.
The IKJ variant used in the SPARSKIT [304] package traverses the matrix in a row-wise sense
starting with row i = 1 until i = n, where n is the total number of rows in the matrix. For a given
row i, contributions from previous rows k are factored into its entries. The level of fill, LFIL or p, must
be considered if a zero entry in the matrix is to be modified. Since L and U are generated in a row-wise
manner, and they only rely on previous rows’ information, this variant of the algorithm is amenable to
the sparse-row storage formats used in SPARSKIT, such as the compressed sparse row (CSR) format.
Algorithm 2 offers a more detailed description of ILU(p) using an IKJ indexing strategy.
The Crout variation of the IKJ-indexed incomplete LU factorization modifies the original matrix
entries in the sub-block whose indices follow the pivot index. This is different than Algorithm 2 used
in SPARSKIT. The explanation provided for the minimum discarded fill (MDF) algorithm uses the
Crout variation of the IKJ-ILU factorization. Refer to Algorithm 3.
Algorithm 2 SPARSKIT [304] ILU(p) factorization algorithm

Define a shifted level of fill-in: p = p + 1
for i = 1, n do
  for k = 1, i − 1 do
    if lev_ik <= p then
      φ = a_ik / a_kk
      for j = 1, n do
        lev*_ij = lev_ik + lev_kj
        if lev_ij == 0 then
          % Fill is unassigned
          if lev*_ij <= p then
            a_ij = −φ a_kj
            lev_ij = lev*_ij
          end if
        else
          % Existing fill
          a_ij ← a_ij − φ a_kj
          lev_ij ← min(lev_ij, lev*_ij)
        end if
      end for % Index j
    end if
  end for % Index k
end for % Index i
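The level-of-fill bookkeeping can be sketched on a dense matrix as follows. This is illustrative toy code, not the SPARSKIT routine, and it uses the unshifted convention (original nonzeros at level 0, fill kept when its level is at most p), which is equivalent to Algorithm 2 up to the level shift:

```python
import numpy as np

# Dense-matrix sketch of level-of-fill ILU(p) in IKJ order (toy code).
def ilu_p(A, p):
    n = A.shape[0]
    a = A.astype(float).copy()
    # Level 0 for original nonzeros, "infinity" elsewhere.
    lev = np.where(a != 0.0, 0, np.iinfo(np.int64).max // 2)
    for i in range(1, n):
        for k in range(i):                      # IKJ: eliminate with rows k < i
            if lev[i, k] <= p:
                phi = a[i, k] / a[k, k]
                a[i, k] = phi                   # store the L multiplier in place
                for j in range(k + 1, n):
                    lev_star = lev[i, k] + lev[k, j] + 1
                    # Update existing entries; create fill only if its level is allowed.
                    if lev_star <= p or lev[i, j] <= p:
                        a[i, j] -= phi * a[k, j]
                        lev[i, j] = min(lev[i, j], lev_star)
    return a, lev

# 1D Laplacian: tridiagonal, so ILU(0) keeps exactly the original pattern.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F, lev = ilu_p(A, p=0)
assert np.array_equal(F != 0.0, A != 0.0)       # ILU(0): same sparsity as A

# For a tridiagonal matrix the true LU produces no fill, so ILU(0) is exact.
L = np.tril(F, -1) + np.eye(n)
U = np.triu(F)
assert np.allclose(L @ U, A)
```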
A good measure of the quality of ILU is in terms of the preconditioned error of the factorization.
The incomplete factorization of A can be written as
A = LU + E (5.16)
where E is the error of the factorization. However, it is the preconditioned error that should be close to
zero. The preconditioned matrix can be written as
A(LU)−1 = I + E(LU)−1 (5.17)
where E(LU)−1 is the preconditioned error. When the preconditioner (5.15) is applied to the matrix
A, it should bring it closer to the identity matrix. For example, the eigenvalues of A(LU)−1 should be
closer to unity.
A major drawback of ILU is its failure to handle zero pivots. This problem can often be alleviated
by an intelligent pivoting strategy. Fortunately, the system considered here can always be ordered in
Algorithm 3 Crout ILU(p) factorization algorithm

Define a shifted level of fill-in: p = p + 1
for i = 1, n − 1 do
  for k = i + 1, n do
    φ = a_ki / a_ii
    for j = i + 1, n do
      lev*_kj = lev_ki + lev_ij
      if lev_kj == 0 then
        % Fill is unassigned
        if lev*_kj <= p then
          a_kj = −φ a_ij
          lev_kj = lev*_kj
        end if
      else
        % Existing fill
        a_kj ← a_kj − φ a_ij
        lev_kj ← min(lev_kj, lev*_kj)
      end if
    end for % Index j
  end for % Index k
end for % Index i
such a manner that avoids zero pivots. A second shortcoming of ILU is that it scales poorly with
increasing problem size.
For discretized systems of PDEs, block forms of ILU are preferred. For example, a block-fill ILU(p),
or BFILU(p), algorithm can be used. Orkwis [128] and Pueyo [238] employ this preconditioner. In
BFILU(p), all entries within a block of A (corresponding to a single grid node) are assigned a fill-in
value of zero. The factorization proceeds as ILU(p) would with scalar quantities.
A block ILU(p), or BILU(p), algorithm is used in this work, where each block corresponds to a single
grid node. In contrast to BFILU(p), this approach is a true block incomplete factorization, where divisions
and multiplications in the scalar algorithm are replaced with matrix inversions and multiplications.
Chisholm [4] and Hicken [305] used BILU(p) for their Navier-Stokes and Euler simulations, respectively.
5.2.1 Effect of Scaling
It is important to determine the sensitivity of ILU(p) with respect to scaling. This is especially true
when considering multigrid preconditioning, since different scaling operators can be used on the various
Figure 5.1: Contributions to ajk from pivot aii in the elimination algorithm.
grid levels. In the absence of round-off errors,

ILU(S1AS2) = S1 · ILU(A) · S2    (5.18)
The proof is outlined below.
The ILU (and the more general LU) factorization process uses the elimination step

a_jk ← a_jk − a_ji a_ik / a_ii    (5.19)

Figure 5.1 shows the contributions of the various elements of the matrix A to the entry a_jk.
First consider the more general row and column scaling matrices S1 and S2 respectively. Next,
consider only the entries in S1, S2, and A that relate to rows i and j and columns i and k of A,
respectively. The relevant entries are

S1 = diag(. . . , s1ii, . . . , s1jj, . . .),
A : entries a_ii, a_ik, a_ji, a_jk,
S2 = diag(. . . , s2ii, . . . , s2kk, . . .)    (5.20)

The product S1 A S2 then contains the entries

s1ii a_ii s2ii,  s1ii a_ik s2kk,
s1jj a_ji s2ii,  s1jj a_jk s2kk    (5.21)
Hence, the ILU elimination step applied to (5.21) becomes

s1jj a_jk s2kk ← s1jj a_jk s2kk − (s1jj a_ji s2ii)(s1ii a_ik s2kk) / (s1ii a_ii s2ii)    (5.22)

and simplifies (through the cancellation of s2ii and s1ii) to

s1jj a_jk s2kk ← s1jj (a_jk − a_ji a_ik / a_ii) s2kk    (5.23)
Thus the ILU elimination step is insensitive to row and column scalings.
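This cancellation can be confirmed numerically with a one-pivot toy example (not thesis code): eliminating the first pivot in S1 A S2 gives exactly the scaled version of eliminating in A.

```python
import numpy as np

# Numerical check of (5.23): one elimination step commutes with row/column scaling.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 4.0 * np.eye(3)
S1 = np.diag(rng.uniform(0.5, 2.0, 3))
S2 = np.diag(rng.uniform(0.5, 2.0, 3))

def eliminate_first_pivot(M):
    M = M.copy()
    for j in range(1, 3):
        for k in range(1, 3):
            M[j, k] -= M[j, 0] * M[0, k] / M[0, 0]   # a_jk <- a_jk - a_ji a_ik / a_ii
    return M

lhs = eliminate_first_pivot(S1 @ A @ S2)
rhs = S1 @ eliminate_first_pivot(A) @ S2
assert np.allclose(lhs, rhs)   # the scaling factors cancel, eq. (5.23)
```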
5.3 Ordering
The ordering of the equations and the unknowns in (5.9) is crucial to the solution algorithm. The incomplete
factorization preconditioner depends heavily on ordering in terms of quality and stability. For example,
if a pivoting strategy fails to counter a zero pivot, ILU will break down. Furthermore, the amount of
fill-in that is discarded is heavily dependent on the ordering.
Fortunately, the matrix A does not contain zero pivots with the current discretization. The only
locations in A that possibly contain zeros on the diagonal correspond to the boundary conditions for a
given node. This is easily rectified by reordering the mass, momenta, and energy equations.
The second type of ordering that impacts the quality of BILU is on a nodal level. Larger blocks can
also be used, but are not considered in this research. The computational domain has a default ordering.
This is referred to as a grid-based ordering. Examples of grid-based orderings include natural and double-
bandwidth (for meshes with a wakecut). Natural ordering is a lexicographical ordering that traverses
one direction before another. For a C-topology mesh, the natural ordering can be in the normal direction
first and then in the streamwise direction. Proceeding in the normal direction first is preferred since the
normal direction typically has a smaller amount of nodes compared to the streamwise direction, leading
to a tighter clustering of bands around the main diagonal. Figure 3.2 shows the sparsity pattern for A2
and A1 matrices using this specific natural ordering. The bandwidth is very poor, however, because of
the entries in the upper-right and lower-left corners of the matrix resulting from the discretization across
the wakecut. Double-bandwidth ordering is a better grid-based ordering for C-topology meshes since it
traverses across the wakecut.
Nevertheless, reordering of the nodes can improve on these initial orderings. Reordering strategies
fall into two main categories: graph-based and matrix-based. The former includes minimum degree [131]
(MD) and reverse Cuthill–McKee [134] (RCM). A matrix-based ordering that is researched in detail in
this work is minimum discarded fill (MDF) [114]. Only symmetric nodal reordering strategies are used in this work
so as to preserve the favourable (block) diagonal entries in A. The rest of this section outlines these
reordering approaches. The MD reordering is discussed since it aids in the explanation of RCM. The
reorderings are well described by using terminology from graph theory. Therefore, a brief review of the
basic aspects of graph theory is also presented.
The idea of domain decomposition is closely related to nodal reordering. The objective of domain
decomposition is to break the larger problem domain into smaller subproblem domains. The approach is
inherently parallel. Appendix A.1 discusses the general theory of domain decomposition and a detailed
literature review was provided in Chapter 1. Serial aspects of preconditioning are emphasized in this
work and therefore, domain decomposition will not be discussed any further.
5.3.1 Graph Theory
Graph theory is used to better describe the two key orderings that are compared in this thesis: reverse
Cuthill-McKee (RCM) and minimum discarded fill (MDF). Here, some of the basics of graph theory are
briefly outlined including notation. We limit our discussion to the graph associated with a given matrix
A ∈ Rn×n that has nonzero diagonal entries. A concise description of graph theory can be found in
works by Liu and Sherman [146], Dutto [140], and Kaveh et al. [306].
A graph G = 〈V,E〉 of a matrix A consists of a set of n vertices

V = {v1, v2, . . . , vn}    (5.24)

and edges

E = { {vi, vj} : i ≠ j, aij ≠ 0 and vi, vj ∈ V }    (5.25)

formed by adjacent vertices. The adjacency set of a given vertex, adjG(vi), contains all vertices that
share edges with vi. The cardinality of a set R is the number of elements contained in that set and is
written as |R|. Hence, the degree of a node vi is

degG(vi) = |adjG(vi)|    (5.26)
Finally, f denotes the numbering of the graph, i.e. the index assigned to each vertex.
An additional property of the graph G(A), relevant to many matrix reordering algorithms, is the
bandwidth of A:

b(A) = max { |i − j| : aij ≠ 0 }    (5.27)
5.3.2 Minimum Degree (MD) Ordering
The minimum degree (MD) ordering is entirely based on the graph G = 〈V,E〉 of a matrix A. First,
a root node of minimal degree is chosen. This node’s influence on the rest of the graph is erased by
deleting it from the graph and updating the edges and vertices of the graph. Next, another node of
minimal degree is chosen and the process is repeated until the graph is depleted. Algorithm 4 shows the
MD ordering. There are two ambiguities that exist in the ordering: root node selection and tie-breaking
strategy.
At the first iteration, there may be several nodes that have a minimal degree. For the classical MD
algorithm that is presented, the choice of root node is arbitrary. Furthermore, at each subsequent node
Algorithm 4 Minimum Degree: MINDEG(A)

Define the graph G = 〈V,E〉 associated with the matrix A.
while V ≠ ∅ do
  Select a node v ∈ V of minimum degree in G and order v as next.
  Let Vv be the subset of V with the vertex v removed:
    Vv = V − {v}
  Define Ev as the remaining set of edges in G that do not contain the removed node v:
    Ev = { {a, b} ∈ E : a, b ∈ Vv } ∪ { {a, b} : a ≠ b and a, b ∈ adjG(v) }
  Redefine the graph G as the graph that would remain after vertex v is removed. That is, set V = Vv,
  E = Ev, and update G = 〈V,E〉.
end while
selection, there may also be ties. The tie-breaking strategy in the classical algorithm is also arbitrary.
The MD ordering is not used in this work. However, it clearly illustrates the ambiguities that are also
relevant to the discussion of the reverse Cuthill–McKee ordering strategy.
5.3.3 Reverse Cuthill–McKee (RCM) Ordering
The reverse Cuthill–McKee (RCM) ordering is designed to minimize the bandwidth of a matrix by using
its graph. The ordering is the Cuthill–McKee (CM) ordering, but reversed. Like the MD ordering, CM
also begins with a root node that is of minimal degree. From there, adjacent nodes are selected until the
entire graph is traversed. The algorithm visually appears to be a wavefront that emanates from the root
node and advances through the graph. Algorithm 5 outlines the classical RCM ordering, as presented
by George [134].
RCM suffers from the same ambiguities as MD. In particular, the selection of the root node is arbitrary
in the classical algorithm. Furthermore, tie-breaking between nodes of equal degree is also arbitrary.
For matrices arising from the discretization of the Navier–Stokes equations, the quality of the RCM-
reordered matrix depends greatly on the root-node selection and tie-breaking strategies. Chisholm [4]
investigated some root-node selection and tie-breaking strategies. For 2D inviscid and viscous cases,
he found that a good choice for the root node is downstream and in the middle of the grid. Various
tie-breaking strategies were investigated. Specifically, x- and y-position, up- or down-wind position, and
grid indices were considered. In most cases the choice was not as crucial, but his conclusion was to break
ties by selecting the upwind node first. The root-node selection and tie-breaking strategies used in this
research are discussed in the next chapter.
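SciPy's `reverse_cuthill_mckee` can illustrate the bandwidth reduction; note that SciPy chooses its own root node and tie-breaking strategy, so the ordering differs in detail from the strategies discussed above. A 5-point Laplacian with a scrambled ordering is used here as a stand-in for a poorly ordered system matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    i, j = A.nonzero()
    return int(np.max(np.abs(i - j)))            # b(A) = max |i - j| over a_ij != 0

# 2D 5-point Laplacian on a 10x10 grid (toy stand-in for the flow Jacobian).
n = 10
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()

# Scramble the nodal ordering to destroy the band structure.
rng = np.random.default_rng(2)
p0 = rng.permutation(A.shape[0])
B = A[p0, :][:, p0]

# RCM: symmetric permutation that tightens the band.
perm = reverse_cuthill_mckee(B, symmetric_mode=True)
B_rcm = B[perm, :][:, perm]

assert bandwidth(B_rcm) < bandwidth(B)
```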
Algorithm 5 Reverse Cuthill–McKee: RCM(A)

Define the graph G = 〈V,E〉 associated with the matrix A. Let Q be the input queue containing all of
the nodes in an arbitrary order. Let R and S be working queues that are initially empty. Let TOP(P)
denote the first entry of any queue P.

Choose a root node v1 ∈ V from Q.
Set i = 1
Q ← Q − {v1}
R ← adj(v1) ∩ Q
Q ← Q − R
loop
  while R ≠ ∅ do
    i ← i + 1
    vi = TOP(R)
    R ← R − {vi}
    Place vi at the end of S
  end while
  if Q == ∅ then stop
  z = TOP(S)
  S ← S − {z}
  R ← adj(z) ∩ Q
  Q ← Q − R
end loop
Reverse the ordering of the nodes.
5.3.4 Minimum Discarded Fill (MDF) Ordering
Crout LU factorization
We first consider the LU factorization of the matrix A ∈ R^{n×n}. Using the Crout form of the
factorization, the first iteration is written as

A = A0 = [ d1, β1ᵀ ; α1, B1 ]    (5.28)

where d1 ∈ R is the first diagonal entry of A0 (i.e. the pivot), β1ᵀ ∈ R^{1×(n−1)} is the remaining
part of the first row of A0, α1 ∈ R^{(n−1)×1} is the remaining part of the first column of A0, and
B1 ∈ R^{(n−1)×(n−1)} is the submatrix of A0 after the removal of the first row and column. The initial
matrix is written as

A0 = L1U1    (5.29)

where

L1 = [ 1, 0 ; α1/d1, I_{n−1} ]    (5.30)

and

U1 = [ d1, β1ᵀ ; 0, B1 − α1β1ᵀ/d1 ]    (5.31)

The lower-right submatrix in U1 is defined as A1.

The factorization of the matrix A0 is therefore given by

A0 = L1U1 = [ 1, 0 ; α1/d1, I_{n−1} ] [ d1, β1ᵀ ; 0, A1 ]    (5.32)

Without any permutations of rows or columns, the factorization proceeds as

[ 1, 0 ; αk/dk, I_{n−k} ] [ dk, βkᵀ ; 0, Ak ]    (5.33)

where, ∀ k = 1, . . . , n − 1,

Ak = Bk − αkβkᵀ/dk    (5.34)

Defining

Ck = [c^(k)_ij] ≡ αkβkᵀ/dk    (5.35)

gives

Ak = Bk − Ck    (5.36)
If the LU factorization is performed in such a manner as not to drop any fill, then (5.36) represents the
exact factorization applied to each respective submatrix of the original matrix, A = A0.
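Equations (5.32) and (5.34) can be verified numerically on a toy matrix: the first Crout step produces the Schur complement as the trailing submatrix, and the factors recompose A exactly.

```python
import numpy as np

# Numerical check of the first Crout elimination step (toy example).
rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # safely nonzero pivots

d1 = A[0, 0]
beta1 = A[0, 1:]
alpha1 = A[1:, 0]
B1 = A[1:, 1:]

A1 = B1 - np.outer(alpha1, beta1) / d1            # eq. (5.34) with k = 1

# Assemble L1 and U1 as in (5.30)-(5.31) and recompose A per (5.32).
L1 = np.eye(n)
L1[1:, 0] = alpha1 / d1
U1 = np.zeros((n, n))
U1[0, 0] = d1
U1[0, 1:] = beta1
U1[1:, 1:] = A1

assert np.allclose(L1 @ U1, A)                    # A0 = L1 U1
```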
Discarded fill
For incomplete factorizations there is information that is lost during the factorization process. Essen-
tially, some entries are discarded to prevent the accumulation of large amounts of fill. This fill-in that is
lost at each step of the factorization can be represented in the kth iteration of the algorithm by the
matrix Fk. The iteration becomes

A_{k−1} = [ 1, 0 ; αk/dk, I_{n−k} ] ( [ dk, βkᵀ ; 0, Bk − αkβkᵀ/dk − Fk ] + [ 0, 0 ; 0, Fk ] )    (5.37)

Referring to (5.36), the submatrix Ak can be redefined as

Ak = Bk − Ck − Fk    (5.38)
where Fk is a matrix containing the discarded fill.
MDF
The minimum discarded fill [113, 114] algorithm is a reordering strategy (i.e. pivoting) coupled with
an incomplete factorization process that minimizes the amount of fill that is dropped. Since the MDF
algorithm is an incomplete factorization (like ILU or IC), a fill and/or drop-tolerance strategy can be
employed. The former is used, and LFIL is defined as the maximum allowable fill per designated entry
in the matrix Ak.
The fill-in level of the matrix entries is updated using the formula

lev^(k)_ij ≡ min( lev^(k−1)_im + lev^(k−1)_mj + 1, lev^(k−1)_ij )    (5.39)

The discarded fill that results from the choice of a pivot in the factorization is given by the matrix

Fk = [f^(k)_ij] ≡ { 0,          b^(k)_ij ≠ 0
                    −c^(k)_ij,  lev^(k)_ij > LFIL
                    0,          otherwise    (5.40)
where lev^(k)_ij is the fill-in level of a particular entry in the matrix Ak. The equivalent
representation for Ak is

Ak = [a^(k)_ij] ≡ { b^(k)_ij − c^(k)_ij,  b^(k)_ij ≠ 0
                    b^(k)_ij,             lev^(k)_ij > LFIL
                    b^(k)_ij − c^(k)_ij,  otherwise    (5.41)
In the MDF algorithm, at iteration k, the subsequent pivot node is chosen such that the Frobenius norm
of Fk is minimized. Refer to Algorithms 6 and 7. D’Azevedo et al. [114] discuss variations to their
original MDF algorithm. For example, the threshold MDF algorithm modifies (5.40) to

Fk = [f^(k)_ij] ≡ { 0,          b^(k)_ij ≠ 0
                    −c^(k)_ij,  lev^(k)_ij > LFIL or |c^(k)_ij| < ε min(Ri, Rj)
                    0,          otherwise    (5.42)
Algorithm 6 Minimum discarded fill: MDF(A)

Initialization:
Set: A0 ≡ A.
Set:
  lev^(0)_ij ≡ { 0,  aij ≠ 0
                 ∞,  otherwise
Compute the discard value for all nodes vj in the graph of A0 using Algorithm 7.
for k = 1, . . . , n − 1 do
  1. Choose the next pivot node vm such that it has a minimal discard(vm). The tie-breaking strategy
     hierarchy is: (a) minimum deficiency, (b) minimum degree, and (c) minimum lexicographical
     ordering index.
  2. Update the incomplete factorization Ak using (5.38) with the defined maximum allowable fill
     level, LFIL.
  3. Define the permutation matrix Pk to exchange vm to the first position in Ak.
  4. Update the fill level of the elements in Ak using (5.39). Specifically,
     for each neighbour vi of vm, where (vi, vm) ∈ E_{k−1} do
       for each neighbour vj of vm, where (vm, vj) ∈ E_{k−1} do
         lev^(k)_ij ≡ min( lev^(k−1)_im + lev^(k−1)_mj + 1, lev^(k−1)_ij )
       end for
     end for
  5. Update the discard values of vm's neighbours using the following iteration:
     for each neighbour vi of vm, where (vi, vm) ∈ E_{k−1} do
       Using Algorithm 7, re-compute its discard value discard(vi) = ||F_{k+1}||_F,
       where F_{k+1} is obtained from
         P_{k+1} Ak P_{k+1}ᵀ = [ d_{k+1}, β_{k+1}ᵀ ; α_{k+1}, B_{k+1} ]
       and P_{k+1} is the permutation matrix that exchanges vi to the first position in Ak.
     end for
end for
where ε is a tolerance and

Ri = max_{m=1,...,n} |aim| = ||ai*||∞    (5.43)
Another variation of MDF is the minimum update matrix (MUM) [114] algorithm, which is related to
classic work by Markowitz [130].
Algorithm 7 Compute the discard value for node vi

Initialize: discard(vi) ≡ 0
Refer to Figure 5.1. Compute the discard value, discard(vi) = ||Fk||_F, using the following iterations:
for each neighbour vj of vi in Ek do
  for each p such that a^(k)_ip ≠ 0, a^(k)_jp = 0, lev^(k+1)_jp > LFIL (see note) do
    discard(vi) ← discard(vi) + ( a^(k)_ji a^(k)_ip / a^(k)_ii )²
  end for
end for
discard(vi) ← sqrt( discard(vi) )

Note: the condition a^(k)_ip ≠ 0, a^(k)_jp = 0, lev^(k+1)_jp > LFIL simply means that if a nonzero
entry a^(k)_ip is to introduce some new fill into the matrix in entry a^(k)_jp and it exceeds the
allowable fill-in level limit, it should be treated as discarded fill.
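The discard value can be sketched for the scalar case with an ILU(0)-style limit, where every entry that would appear outside the current pattern counts as discarded fill (toy code, not the thesis implementation). The path-graph example shows why MDF prefers to pivot on end nodes first:

```python
import numpy as np

# Sketch of the Algorithm 7 discard value with all new fill counted as discarded.
def discard_value(A, i):
    n = A.shape[0]
    s = 0.0
    for j in range(n):
        if j == i or A[j, i] == 0.0:
            continue                              # vj must be a neighbour of vi
        for p in range(n):
            if p == i or p == j:
                continue
            if A[i, p] != 0.0 and A[j, p] == 0.0:
                # Fill a_jp introduced by pivoting on vi would be dropped.
                s += (A[j, i] * A[i, p] / A[i, i]) ** 2
    return np.sqrt(s)

# Path graph 1-2-3: pivoting on the middle node would create fill between its
# two neighbours; pivoting on an end node creates none.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
assert discard_value(A, 0) == 0.0
assert discard_value(A, 2) == 0.0
assert discard_value(A, 1) > 0.0   # MDF would pick an end node first
```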
Greedy MDF
The approach of Persson and Peraire [145] is followed to extend the MDF algorithm to a system of PDEs.
It is called the greedy MDF algorithm. Although they use the discontinuous Galerkin finite element
method, their approach is directly applicable to the finite-difference discretization used in this work. In the
greedy MDF algorithm, the blocks in the system matrix are approximated as scalars by taking their
Frobenius norms.
Beginning with the original system matrix, A, a block-scaled matrix
B = (AD)⁻¹A    (5.44)

is formed, where AD is the block diagonal of A. The block-diagonal entries of B are identity
matrices with dimensions equal to the block size. The reduced system matrix, C, has scalar entries equal
to the Frobenius norms of the block entries in B
Cij = ||Bij ||F (5.45)
The MDF algorithm is then performed on the reduced system matrix and the nodal reordering is
obtained. The block reduction is summarized in Algorithm 8.
Effect of scaling
The effect of scaling on the greedy MDF algorithm was investigated. Specifically, row and column
scaling were examined, and it was found that MDF is insensitive to row scaling but sensitive to
column scaling. The proof is as follows:
Consider a 2 × 2 block matrix

A = [ A11, A12 ; A21, A22 ]    (5.46)
Algorithm 8 Block reduction for greedy MDF

for i = 1, nblocks do
  Compute the inverse of the diagonal block i of A, ADi⁻¹
  Scale block-row i of A by ADi⁻¹ and store it as block-row i of B
  for j = 1, nblocks do
    cij ← ||Bij||F, where C = [cij] is the reduced matrix
  end for
end for
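The block reduction can be sketched directly (toy block sizes and values, not thesis code); since the diagonal blocks of B are identities, the diagonal of the reduced matrix equals the square root of the block size:

```python
import numpy as np

# Sketch of Algorithm 8: block-diagonal scaling followed by Frobenius norms.
rng = np.random.default_rng(4)
nb, bs = 3, 2                                  # 3x3 blocks of size 2x2 (arbitrary)
A = rng.standard_normal((nb * bs, nb * bs)) + 6.0 * np.eye(nb * bs)

C = np.zeros((nb, nb))
for i in range(nb):
    Di = A[i*bs:(i+1)*bs, i*bs:(i+1)*bs]
    Di_inv = np.linalg.inv(Di)                 # invert diagonal block i
    for j in range(nb):
        Bij = Di_inv @ A[i*bs:(i+1)*bs, j*bs:(j+1)*bs]
        C[i, j] = np.linalg.norm(Bij, 'fro')   # c_ij = ||B_ij||_F

# Diagonal blocks of B are identities, so C_ii = ||I||_F = sqrt(block size).
assert np.allclose(np.diag(C), np.sqrt(bs))
```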
For the greedy MDF algorithm, a diagonal block-row scaling given by (5.44) is applied. This results in
the matrix

B = [ I11, A11⁻¹A12 ; A22⁻¹A21, I22 ]    (5.47)

This matrix is then reduced to a scalar equivalent and the discard values are obtained.

Consider a diagonal row scaling matrix, Sr, that is partitioned into blocks with dimensions equivalent
to those of I11 and I22. Hence,

Sr = [ Sr1, 0 ; 0, Sr2 ]    (5.48)
If (5.46) is scaled by (5.48), then

Sr A = [ Sr1, 0 ; 0, Sr2 ] [ A11, A12 ; A21, A22 ]    (5.49)
     = [ Sr1A11, Sr1A12 ; Sr2A21, Sr2A22 ]    (5.50)

Applying the diagonal block-row scaling based on this matrix gives

B_row scale = [ I11, (Sr1A11)⁻¹Sr1A12 ; (Sr2A22)⁻¹Sr2A21, I22 ]    (5.51)
            = [ I11, A11⁻¹Sr1⁻¹Sr1A12 ; A22⁻¹Sr2⁻¹Sr2A21, I22 ]    (5.52)
            = [ I11, A11⁻¹I11A12 ; A22⁻¹I22A21, I22 ]    (5.53)
            = [ I11, A11⁻¹A12 ; A22⁻¹A21, I22 ]    (5.54)
            = B    (5.55)

which is the original diagonal block-row scaled matrix given in (5.47). The diagonal block-row scaling
matrices cancel out in this formulation. Hence the greedy MDF algorithm is insensitive to row scaling.
However, the algorithm is influenced by column scaling. If a diagonal column scaling matrix Sc
(partitioned into blocks with dimensions equivalent to those of I11 and I22) is considered and the same
approach as the above derivation is followed, the following diagonal block-column scaled matrix is
obtained:

B_column scale = [ I11, Sc1⁻¹A11⁻¹A12Sc2 ; Sc2⁻¹A22⁻¹A21Sc1, I22 ] ≠ B    (5.56)

Hence, the greedy MDF ordering is sensitive to column scaling.
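This asymmetry can be confirmed numerically with a toy 2 × 2 block example (arbitrary blocks and scalings, not thesis code):

```python
import numpy as np

# Toy check: the greedy-MDF reduced matrix is unchanged by block-row scaling
# but changed by block-column scaling.
rng = np.random.default_rng(5)
bs = 2
A = rng.standard_normal((2 * bs, 2 * bs)) + 5.0 * np.eye(2 * bs)

def block_row_scale(A, bs):
    # B = (A_D)^{-1} A, applied block-row by block-row, eq. (5.44).
    B = np.zeros_like(A)
    nb = A.shape[0] // bs
    for i in range(nb):
        Di_inv = np.linalg.inv(A[i*bs:(i+1)*bs, i*bs:(i+1)*bs])
        B[i*bs:(i+1)*bs, :] = Di_inv @ A[i*bs:(i+1)*bs, :]
    return B

S = np.diag(rng.uniform(0.5, 2.0, 2 * bs))   # arbitrary diagonal scaling

B = block_row_scale(A, bs)
B_row = block_row_scale(S @ A, bs)           # row scaling: cancels exactly
B_col = block_row_scale(A @ S, bs)           # column scaling: does not cancel

assert np.allclose(B_row, B)
assert not np.allclose(B_col, B)
```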
5.4 Multigrid Preconditioning
The remainder of this chapter focuses on the use of multigrid as a preconditioner for GMRES. Multigrid
consists of a smoother and a coarse-grid correction. The smoothers in this work are based on ILU(p), and
therefore belong to the family of stationary iterative methods that are based on matrix splittings. This
section begins with a brief review of matrix splittings, followed by a demonstration that ILU(p) is indeed
a good smoother. From there the discussion shifts to the iterative use of ILU(p) and ILU(p)-smoothed
multigrid as a preconditioner. Inter-grid operators, reordering, and scaling are also considered in the
discussion of multigrid preconditioning.
5.4.1 Stationary Iterative Methods
The linear system (5.9) is solved using GMRES. Here, a stationary iterative method is considered for
solving the linear system, thus introducing the concept of the smoother. In turn, that smoother will be
accelerated by multigrid, leading to a multigrid preconditioner.
Consider the splitting

A = M + N    (5.57)

where M is cheaper to invert than A. A classic relaxation method based on this matrix splitting is
used to solve (5.9) and is defined by
xm+1 = xm +M−1rm (5.58)
where
rm = b−Axm (5.59)
is the residual. A damping parameter, ω, can be introduced into (5.58) resulting in
xm+1 = xm + ωM−1rm (5.60)
If x is the exact solution, then the error at iteration m is
em = x− xm (5.61)
and
Aem = rm (5.62)
Furthermore, the iteration matrix for the damped method is

G = I − ωM⁻¹A    (5.63)
It can be easily shown that

em = G em−1    (5.64)

and

em = G^m e0    (5.65)

where e0 is the initial error. Using the properties of norms,

||em|| ≤ ||G^m|| ||e0||    (5.66)
Convergence for the relaxation method is guaranteed if
lim_{m→∞} ||G^m|| = 0    (5.67)
or equivalently, if the spectral radius of G satisfies
ρ(G) < 1 (5.68)
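A minimal sketch of the damped relaxation (5.60) with a Jacobi splitting (M = diag(A), a stand-in for the ILU(p) splitting used later) confirms that the iteration converges when ρ(G) < 1:

```python
import numpy as np

# Toy diagonally dominant system; Jacobi splitting M = diag(A).
rng = np.random.default_rng(6)
n = 8
A = rng.standard_normal((n, n)) + 2.0 * n * np.eye(n)
b = rng.standard_normal(n)
x_exact = np.linalg.solve(A, b)

M_inv = np.diag(1.0 / np.diag(A))     # inverse of the Jacobi splitting matrix
omega = 0.9

# Iteration matrix G = I - omega * M^{-1} A and its spectral radius.
G = np.eye(n) - omega * M_inv @ A
rho = np.max(np.abs(np.linalg.eigvals(G)))
assert rho < 1.0                      # convergence criterion (5.68)

# Damped relaxation (5.60): x <- x + omega * M^{-1} (b - A x).
x = np.zeros(n)
for _ in range(200):
    x = x + omega * M_inv @ (b - A @ x)
assert np.linalg.norm(x - x_exact) < 1e-8
```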
5.4.2 ILU(p) as a Smoother
A relaxation method has a corresponding splitting matrix (or set thereof). For example, the Richardson
[307] and Jacobi methods have splitting matrices of MR = I and MJ = DA, respectively. I is the
identity matrix and DA is the diagonal of A. The Gauss–Seidel method uses more information from the
system matrix, A, and its corresponding splitting matrix is either
MGS = LA (5.69)
or
MGS = UA (5.70)
where LA and UA are the lower- and upper-triangular parts of A, respectively.
In order for a relaxation method to be effectively accelerated by multigrid, it must exhibit a good
smoothing behaviour. That is, the method must efficiently damp high-frequency errors, thus making it
amenable to coarse-grid corrections. The symmetric Gauss–Seidel (SGS) method, with splitting matrix
MSGS = LAUA, is an example of a good smoother. SGS alternates the forward and backward solves
indicated by the operators LA and UA.
In addition to being a preconditioner for GMRES, ILU(p) can be a good smoother of high-frequency
errors and thus be accelerated by multigrid. ILU(p) can be represented as the matrix splitting
MILU(p) = LU (5.71)
where L and U are the incomplete factors of A.
A study was conducted to investigate the effectiveness of ILU(p) as a smoother. The study later
was extended to compare ILU(p) to SGS and to measure the importance of the coarse-grid correction.
A linear system was constructed using the convection-diffusion equation matrix operator, ACD, with a
Peclet number of 0.01 and a flow angle of θ = 45°. Specifically, the operator was generated using an
n×n-node square grid (for n = 21, 41, 81, 161) using second-order centred differences. The right-hand
side was set to zero, resulting in the linear system:
ACD φ = 0 (5.72)
with an exact solution of φ = 0. This system was solved using a stationary iterative method with ILU(0)
as a smoother. For comparison, the same problem was solved using an SGS smoother. In order to
determine the effectiveness of each smoother, the initial guess for the iterative method was set to

φ0 = e0(θx, θy) = sin(θx x) sin(θy y)    (5.73)
where larger values of θx and θy correspond to increasing frequencies in each respective direction. Since
the exact solution is zero, the convergence of the method depends solely on the specific initial frequencies
related to θx and θy.
Tables 5.1 and 5.2 compare the number of iterations for the ILU(0) and SGS methods to converge the
linear residual by ten orders of magnitude for n = 21 and n = 41, respectively. Both methods require
significantly fewer iterations for high-frequency initial errors, and this effect increases with n.
Since ILU(0) has better coupling, it generally requires fewer iterations than SGS. Furthermore, the CPU
time for ILU(0) is much lower than SGS for all cases.
Tables 5.3 and 5.4 compare the smoothing effectiveness of ILU(0) to ILU(1) for n = 41 and n = 81,
respectively. ILU(1) is also an effective smoother and requires fewer iterations overall than ILU(0).
Table 5.1: SGS and ILU(0) iterations on a 21 × 21–node grid for various initial error frequencies.

SGS:
θx \ θy    low      medium   high
low        20284    15119    10092
medium     15119    10563     4402
high       10092     4402     4951

ILU(0):
θx \ θy    low      medium   high
low          289      215      140
medium       215      150       83
high         140       83       82
Table 5.2: SGS (left) and ILU(0) (right) iterations on a 41 × 41–node grid for various initial error
frequencies.
SGS                                       ILU(0)
θx \ θy      low   medium     high        θx \ θy     low   medium   high
low       274027   175009   100319        low         986      627    336
medium    175009    89515    28499        medium      627      316    108
high      100319    28499    11329        high        336      108     43
Table 5.3: ILU(0) (left) and ILU(1) (right) iterations on a 41 × 41–node grid for various initial error
frequencies.
ILU(0)                                  ILU(1)
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low         986      627    336         low         379      235    129
medium      627      316    108         medium      242      116     49
high        336      108     43         high        155       46     20
Table 5.4: ILU(0) (left) and ILU(1) (right) iterations on an 81 × 81–node grid for various initial error
frequencies.
ILU(0)                                  ILU(1)
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        3466     1802    730         low        1326      664    210
medium     1802      370     80         medium      692      126     36
high        730       80     20         high        342       36     10
Algorithm 9 Relaxation: RELAX(A,M,z,v,ν)
for i = 1, ν do
Compute the residual: r = v − Az
Solve MΔz = r for the update Δz.
Update the solution: z ← z + ∆z
end for % Index i
5.4.3 Iterative ILU(p) as a Preconditioner
Pueyo [5] examined the use of ILU(p) to solve the nonlinear system. Here, the repeated use of ILU(p) as
a preconditioner for GMRES is of interest for two reasons: it is a generalization of ILU(p) to an iterative
method, offering more flexibility in its use and tuning, and it constitutes the smoothing component
of a more general multigrid preconditioner.
Consider the iterative solution of (5.8) using an ILU(p)-smoothed stationary method, where the
initial guess to the solution is z0 = 0. The baseline preconditioning step in GMRES (5.4) is consistent
with one iteration of this stationary method. Algorithm 9 summarizes this iterative method, where M
is the ILU(p) factorization of the system matrix, A.
Iteration r of Algorithm 9 can be written in terms of the initial guess, z0, and the unpreconditioned
Krylov subspace vector, v, as
z_r = G^r z_0 + (I − G^r) A^-1 v   (5.74)
where
G = I − M^-1 A   (5.75)
is the iteration matrix. Although the matrix A^-1 appears in (5.74), A is not inverted in the iterative
method. Using the initial guess z0 = 0, equation (5.74) simplifies to
z_r = (I − G^r) A^-1 v   (5.76)
Using the notation of Algorithm 1, the preconditioning step at iteration m of GMRES is therefore
given by
w_m = (I − G^r) A^-1 v_m   (5.77)
The smoothing operator is independent of v_m. Furthermore, for r = 1, the baseline ILU(p)
preconditioning step (5.4) is recovered:
w_m = (I − G) A^-1 v_m   (5.78)

⇒ w_m = (I − (I − M^-1 A)) A^-1 v_m   (5.79)

⇒ w_m = (M^-1 A) A^-1 v_m   (5.80)

⇒ w_m = M^-1 v_m   (5.81)
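The chain (5.78)–(5.81), and equation (5.76) more generally, can be verified numerically. The sketch below is an assumption-based illustration: it uses a small diagonally dominant random matrix with a Jacobi-style splitting M = diag(A) standing in for the ILU(p) factors, since these identities hold for any splitting matrix M:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # diagonally dominant
M = np.diag(np.diag(A))       # Jacobi splitting, standing in for ILU(p)
v = rng.standard_normal(n)
G = np.eye(n) - np.linalg.solve(M, A)                    # G = I - M^-1 A

# One sweep of Algorithm 9 from z0 = 0 returns M^-1 v, i.e. eq. (5.81):
z1 = np.zeros(n)
z1 = z1 + np.linalg.solve(M, v - A @ z1)
assert np.allclose(z1, np.linalg.solve(M, v))

# A second sweep matches (I - G^2) A^-1 v, i.e. eq. (5.76) with r = 2:
z2 = z1 + np.linalg.solve(M, v - A @ z1)
assert np.allclose(z2, (np.eye(n) - G @ G) @ np.linalg.solve(A, v))
```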
5.4.4 ILU(p)-Smoothed Geometric Multigrid as a Preconditioner
The right-preconditioned GMRES algorithm searches for a solution within the Krylov subspace (5.6).
In the previous section, it was shown that the baseline ILU(p) preconditioner is a matrix splitting of A,
and its application is equivalent to the first iteration of a stationary iterative method. If an iterative
method is used as a preconditioner, the preconditioned Krylov subspace can be represented as
K_m(A M_Iter^-1; b) = span{ b, (A M_Iter^-1) b, (A M_Iter^-1)^2 b, ..., (A M_Iter^-1)^(m-1) b }   (5.82)

where the operator M_Iter^-1 represents the iterative preconditioner.
Earlier, ILU(p) was shown to be an excellent smoother of high-frequency errors; however, it is not as
effective at damping low-frequency errors. A coarse-grid correction can be used to remove the remaining
error by projecting the relationship (5.62) onto a coarse grid. On the coarse grid, the remaining error
waveform is represented by fewer nodes, thus appearing to have a higher frequency. The smoother can
be applied to effectively reduce the coarse-grid error, and this error can then be interpolated back to the
fine grid.
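A runnable sketch of this smoother-plus-coarse-grid division of labour is given below. For brevity it substitutes a 1D Poisson operator and weighted Jacobi smoothing for the 2D operator and ILU(p); all names and parameter values are illustrative assumptions:

```python
import numpy as np

def poisson_1d(m):
    # 1D Poisson operator on m interior nodes of the unit interval.
    h = 1.0 / (m + 1)
    return (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

def jacobi(A, z, v, nu, omega=2.0 / 3.0):
    # Weighted Jacobi smoothing, standing in for the ILU(p) smoother.
    d = np.diag(A)
    for _ in range(nu):
        z = z + omega * (v - A @ z) / d
    return z

def interp_1d(mc):
    # Linear-interpolation prolongation from mc coarse to 2*mc + 1 fine nodes.
    P = np.zeros((2 * mc + 1, mc))
    for j in range(mc):
        P[2 * j + 1, j] = 1.0          # coincident node copied
        P[2 * j, j] += 0.5             # in-between nodes averaged
        P[2 * j + 2, j] += 0.5
    return P

def two_grid(Af, Ac, R, P, v, nu1=2, nu2=2):
    # One two-grid V-cycle with an exact coarse solve.
    zf = jacobi(Af, np.zeros_like(v), v, nu1)    # pre-smooth
    zc = np.linalg.solve(Ac, R @ (v - Af @ zf))  # restrict + coarse solve
    zf = zf + P @ zc                             # prolong the correction
    return jacobi(Af, zf, v, nu2)                # post-smooth

mc = 15
mf = 2 * mc + 1
Af, Ac = poisson_1d(mf), poisson_1d(mc)
P = interp_1d(mc)
R = 0.5 * P.T                                    # full weighting
xf = np.arange(1, mf + 1) / (mf + 1)
phi = np.sin(np.pi * xf)                         # smooth error, exact solution 0
for _ in range(3):                               # three V-cycles as a stationary method
    phi = phi + two_grid(Af, Ac, R, P, -(Af @ phi))
smoothed = jacobi(Af, np.sin(np.pi * xf), np.zeros(mf), 12)  # same smoothing work
# The coarse-grid correction removes the smooth error the smoother barely touches.
assert np.linalg.norm(phi) < 1e-2 * np.linalg.norm(smoothed)
```

Three V-cycles reduce the smooth error by orders of magnitude, whereas the same total number of smoothing sweeps alone barely reduces it.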
The original linear system (5.1) with multigrid preconditioning can be written as
A M_MG^-1 M_MG x = b   (5.83)

and the solution found by using GMRES lies within the Krylov subspace

K_m(A M_MG^-1; b) = span{ b, (A M_MG^-1) b, (A M_MG^-1)^2 b, ..., (A M_MG^-1)^(m-1) b }   (5.84)
A geometric multigrid (GMG) approach is used to determine the coarse-grid operators in this research.
The operators on the coarser grid levels are generated directly on each grid, in the same fashion as the
fine-grid operator. A V-cycle is used throughout. Figure 5.2 shows a four-grid V-
cycle. Algorithm 10 shows a two-grid V-cycle multigrid preconditioner. The letters f and c denote the
fine and coarse grids, respectively. The restriction operator, Icf , interpolates the fine grid residual to the
coarse grid, and the prolongation operator, Ifc , interpolates the coarse grid correction to the remaining
error to the fine grid.
The effectiveness of multigrid depends on the restriction and prolongation operators. For geometric
multigrid, the inter-grid operators must satisfy the order rule [308]
mIcf + mIfc > morder   (5.85)
where mIcf and mIfc are the orders of the restriction and prolongation operators, respectively, plus
one. For the problems considered in this work, morder = 2, since, at most, second-order operators are
considered for the discretization of the PDEs (e.g. the convection-diffusion and Navier–Stokes equations).
Bilinear interpolation (or full weighting) is used for the 2D restriction and prolongation operators
and satisfies (5.85). Although not used, restriction by injection combined with linear-interpolation
prolongation also satisfies the aforementioned order rule criterion. Figures 5.3 and 5.4 illustrate the restriction
Figure 5.2: A four-grid, multigrid V-cycle (legend: smooth, restrict, prolong).
Algorithm 10 Multigrid V-cycle: MGV2(Af ,Mf ,zf ,vf ,Ac,Mc,ν1,ν2,νc)
Initialize: zf = 0
Perform ν1 pre-smoothing iterations: RELAX(Af ,Mf ,zf ,vf ,ν1)
Compute the residual: rf = vf − Af zf
Restrict the residual: vc = Icf rf
Initialize the coarse-grid correction: zc = 0
if νc == 0 then
Solve the coarse-grid system exactly: Aczc = vc
else
Solve the coarse-grid system inexactly: RELAX(Ac,Mc,zc,vc,νc)
end if
Prolong the coarse-grid correction: zf ← zf + Ifc zc
Perform ν2 post-smoothing iterations: RELAX(Af ,Mf ,zf ,vf ,ν2)
and prolongation operators, respectively. Higher-order interpolation operators do not improve the
performance. Furthermore, other treatments at the boundaries were examined; however, they perform
worse.
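In 1D, the linear-interpolation and full-weighting stencils, and the order rule they satisfy, can be sketched as follows (a minimal illustration with assumed function names; the 2D bilinear operators are built analogously):

```python
import numpy as np

def prolong_1d(mc):
    # Linear-interpolation prolongation from mc coarse to 2*mc + 1 fine
    # interior nodes; coarse node j coincides with fine node 2j + 1.
    P = np.zeros((2 * mc + 1, mc))
    for j in range(mc):
        P[2 * j + 1, j] = 1.0       # coincident node copied
        P[2 * j, j] += 0.5          # neighbours averaged
        P[2 * j + 2, j] += 0.5
    return P

def restrict_1d(mc):
    # Full weighting: the [1/4, 1/2, 1/4] stencil, i.e. half the transpose
    # of linear interpolation (the variational pairing).
    return 0.5 * prolong_1d(mc).T

P, R = prolong_1d(7), restrict_1d(7)
xf = np.arange(1, 16) / 16.0        # fine interior coordinates
xc = xf[1::2]                       # coarse interior coordinates
# Both operators are exact on linear data wherever the stencil does not touch
# the (implicitly zero) right boundary, confirming both have order 2, so that
# mIcf + mIfc = 4 > morder = 2:
assert np.allclose(R @ xf, xc)
assert np.allclose((P @ xc)[:-1], xf[:-1])
```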
Tables 5.5–5.7 show the effectiveness of a coarse-grid correction for ILU(0) for the convection-diffusion
problem of Section 5.4.2. Tables 5.3 and 5.4 show that ILU(0) alone scales poorly with increasing problem
size, whereas ILU(0) accelerated by a coarse-grid correction scales nearly optimally with increasing grid
size. For this case, information is projected to the coarse grid using a simple injection operator and the
error correction is projected to the fine grid using a weighted prolongation operator. These are classical
results that are expected of multigrid for diffusion-dominated problems. Recall that the Peclet number
for this case is 0.01, which is characteristic of a diffusion-dominated flow. Tables 5.8–5.10 present similar
results with ILU(1) as a smoother.
Figure 5.3: Full-weighting restriction operator.
Figure 5.4: Full-weighting prolongation operator.
Table 5.5: ILU(0) (left) and ILU(0)+MG (right) iterations on a 41 × 41–node grid for various initial
error frequencies.
ILU(0)                                  ILU(0)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low         986      627    336         low          21       19     16
medium      627      316    108         medium       19       15     13
high        336      108     43         high         16       13     11
Table 5.6: ILU(0) (left) and ILU(0)+MG (right) iterations on an 81 × 81–node grid for various initial
error frequencies.
ILU(0)                                  ILU(0)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        3466     1802    730         low          21       19     16
medium     1802      370     80         medium       19       14     12
high        730       80     20         high         16       12     10
Table 5.7: ILU(0) (left) and ILU(0)+MG (right) iterations on a 161× 161–node grid for various initial
error frequencies.
ILU(0)                                   ILU(0)+MG
θx \ θy      low   medium   high         θx \ θy     low   medium   high
low        11703     4669   1091         low          22       19     16
medium      4669      256     46         medium       19       14     12
high        1091       46     10         high         16       12      9
Table 5.8: ILU(1) (left) and ILU(1)+MG (right) iterations on a 41 × 41–node grid for various initial
error frequencies.
ILU(1)                                  ILU(1)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low         379      235    129         low          17       15     13
medium      242      116     49         medium       14       10     10
high        155       46     20         high         13        9      8
Table 5.9: ILU(1) (left) and ILU(1)+MG (right) iterations on an 81 × 81–node grid for various initial
error frequencies.
ILU(1)                                  ILU(1)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        1326      664    210         low          18       15     13
medium      692      126     36         medium       15       10      9
high        342       36     10         high         13        9      7
Table 5.10: ILU(1) (left) and ILU(1)+MG (right) iterations on a 161× 161–node grid for various initial
error frequencies.
ILU(1)                                  ILU(1)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        4470     1666    364         low          18       15     13
medium     1795      119     21         medium       14        9      8
high        487       21      7         high         12        9      7
Algorithm 11 Relaxation (reordered): RELAX(A,M,P,z,v,ν)
for i = 1, ν do
Compute the residual: Pr = Pv − (PAPT)Pz
Solve (PMPT)(PΔz) = Pr for the update PΔz.
Update the solution: Pz ← Pz + P∆z
end for % Index i
5.4.5 Reordering and Scaling
Accounting for reordering
For the convection-diffusion equation, relaxation with an ILU splitting can be accelerated by nodal
reordering. Essentially, the reordering leads to a more effective splitting matrix, M. For example, the
criterion of minimum bandwidth for the system matrix A leads to a (reverse) Cuthill–McKee reordering.
An ILU factorization can be performed on the reordered matrix, leading to a popular choice of
preconditioner.
If a pivoting strategy is used during the incomplete factorization process, then the reordering and the
factorization are done simultaneously. Therefore, a permutation must be applied to the operators used in
the solution process after the factorization is performed. An example of an incomplete LU factorization
process that uses a pivoting strategy is the minimum discarded fill (MDF) ILU strategy. In MDF-ILU,
pivot choices are made in order to minimize the amount of discarded fill.
After the factorization process is completed, operators such as the system matrix, A, and the
restriction and prolongation operators for the multigrid preconditioner, Icf and Ifc, must be reordered using
the permutation matrix that was obtained during the factorization process. The reordering is achieved
using the permutation matrix P and its transpose. Note that P is an orthogonal matrix (P^-1 = PT).
In practice, the matrix P is stored as a vector.
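As a small illustration (a hypothetical 5 × 5 example, not thesis code), storing P as a vector p, so that row i of P is the unit vector e_p(i), reduces the products Pv and PAPT to array indexing:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
v = rng.standard_normal(5)
p = np.array([3, 0, 4, 1, 2])     # permutation stored as a vector
P = np.eye(5)[p]                  # explicit matrix: row i is e_{p[i]}

assert np.allclose(P @ P.T, np.eye(5))            # P is orthogonal, P^-1 = P^T
assert np.allclose(P @ v, v[p])                   # (Pv)_i = v_{p(i)}
assert np.allclose(P @ A @ P.T, A[np.ix_(p, p)])  # PAP^T reorders rows and columns
```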
In order to better understand how these permutations affect the multigrid preconditioner, Algorithms
9 and 10 are re-written using the permutation matrix, P. Algorithm 11 shows the effect of the reordering
on the smoothing process. The matrix PMPT is the only operator that has a built-in reordering since it
was subject to the pivoting process during or prior to the incomplete factorization. The system operator,
A has its rows and columns reordered according to P, leading to a new operator PAPT.
The permutations can be extended to the multigrid preconditioner. Algorithm 12 shows how the
permutations are used in the two-grid cycle preconditioner. The key operators that need to be considered
in terms of reordering are the system matrices on the fine and coarse grids, PfAfPTf and PcAcPTc, and
the restriction and prolongation operators, PcIcfPTf and PfIfcPTc.
Algorithm 12 Multigrid V-cycle (reordered): MGV2(Af ,Mf ,Pf ,zf ,vf ,Ac,Mc,Pc,ν1,ν2,νc)
Initialize: Pfzf = 0
Perform ν1 pre-smoothing iterations: RELAX(PfAfPTf, PfMfPTf, Pfzf, Pfvf, ν1)
Compute the residual: Pfrf = Pfvf − (PfAfPTf )Pfzf
Restrict the residual: Pcvc = (PcIcfPTf ) Pfrf
Initialize the coarse-grid correction: Pczc = 0
if νc == 0 then
Solve the coarse-grid system exactly: (PcAcPTc )Pczc = Pcvc
else
Solve the coarse-grid system inexactly: RELAX(PcAcPTc, PcMcPTc, Pczc, Pcvc, νc)
end if
Prolong the coarse-grid correction: Pfzf ← Pfzf + (PfIfc PTc ) Pczc
Perform ν2 post-smoothing iterations: RELAX(PfAfPTf, PfMfPTf, Pfzf, Pfvf, ν2)
Algorithm 13 Multigrid V-cycle (scaled): MGV2(Af ,Mf ,S1,f ,S2,f ,zf ,vf ,Ac,Mc,S1,c,S2,c,ν1,ν2,νc)
Initialize: S2,f^-1 zf = 0
Perform ν1 pre-smoothing iterations: RELAX(S1,f Af S2,f, S1,f Mf S2,f, S2,f^-1 zf, S1,f vf, ν1)
Compute the residual: S1,f rf = S1,f vf − (S1,f Af S2,f) S2,f^-1 zf
Restrict the residual: S1,c vc = (S1,c Icf S1,f^-1) S1,f rf
Initialize the coarse-grid correction: S2,c^-1 zc = 0
if νc == 0 then
Solve the coarse-grid system exactly: (S1,c Ac S2,c) S2,c^-1 zc = S1,c vc
else
Solve the coarse-grid system inexactly: RELAX(S1,c Ac S2,c, S1,c Mc S2,c, S2,c^-1 zc, S1,c vc, νc)
end if
Prolong the coarse-grid correction: S2,f^-1 zf ← S2,f^-1 zf + (S2,f^-1 Ifc S2,c) S2,c^-1 zc
Perform ν2 post-smoothing iterations: RELAX(S1,f Af S2,f, S1,f Mf S2,f, S2,f^-1 zf, S1,f vf, ν2)
Accounting for scaling
If a row and column scaling is applied to the general system that arises in the preconditioning step (5.8)
of the GMRES algorithm, the system becomes
(S1 Af S2)(S2^-1 zf) = S1 vf   (5.86)
Algorithm 13 shows a two-grid-level multigrid V-cycle preconditioner that incorporates row and column
scalings for both the fine and coarse grid levels.
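A minimal numerical check of (5.86), with random stand-in matrices and arbitrary positive diagonal scalings (all names are illustrative): solving the scaled system for S2^-1 zf and then multiplying by S2 recovers the solution of the unscaled system.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Af = 4.0 * np.eye(n) + rng.standard_normal((n, n))  # nonsingular stand-in
vf = rng.standard_normal(n)
S1 = np.diag(rng.uniform(0.5, 2.0, n))   # row scaling
S2 = np.diag(rng.uniform(0.5, 2.0, n))   # column scaling

# Solve (S1 Af S2)(S2^-1 zf) = S1 vf, then undo the column scaling:
y = np.linalg.solve(S1 @ Af @ S2, S1 @ vf)
zf = S2 @ y
assert np.allclose(zf, np.linalg.solve(Af, vf))
```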
Accounting for reordering and scaling
In order to avoid a bookkeeping nightmare, all reordering and scaling operators are absorbed into the
restriction and prolongation operators. This means that the development of a multigrid preconditioner
can be done first without row and column scaling, as well as reordering. Once preliminary results are
obtained, the algorithm can be extended to include first reordering and then scaling(s).
In the case where reordering is performed before scaling, the restriction and prolongation operators
are
S1,c Pc Icf PTf S1,f^-1   and   S2,f^-1 Pf Ifc PTc S2,c   (5.87)

In the case where scaling is performed before reordering, the inter-grid transfer operators are

Pc S1,c Icf S1,f^-1 PTf   and   Pf S2,f^-1 Ifc S2,c PTc   (5.88)
5.5 Chapter Summary and Highlights of Contributions
This chapter outlines the various preconditioning techniques that were explored and developed. Pre-
conditioning is the focus of this thesis; hence, a summary is presented with particular focus on the
contributions that were made during this research.
In the first section, the right-preconditioned GMRES algorithm was presented. Particular focus was
made on how scaling of the linear system affects the algorithm.
In the second section, the ILU(p) preconditioner was introduced. In particular, the Crout formulation
was discussed in detail with particular focus on the error of the incomplete factorization. A proof was
presented, demonstrating that the incomplete factorization is insensitive to row and column scaling of
the system matrix.
The focus of the subsequent section was on orderings. Essential definitions from graph theory were
presented and used to briefly outline the minimum degree, reverse Cuthill–McKee (RCM) and minimum
discarded fill (MDF) orderings. RCM ordering is a baseline ordering in this research and is based on
minimizing the bandwidth of the system matrix. In contrast, the MDF ordering minimizes the fill-in
that is discarded during an incomplete factorization process. The MDF ordering, originally developed
for the system of equations resulting from discretized linear PDEs, was extended to the system of
equations resulting from the discretized Navier–Stokes equations. An approach was developed that
reduces the blocks in the Jacobian matrix in Newton’s method, using a similar approach to that of
Persson and Peraire [145] for finite-element discretizations. The resulting ordering algorithm was proven
to be insensitive to row scaling, but sensitive to column scaling. The latter phenomenon therefore
required particular consideration of how the linear system was scaled.
The final section of this chapter outlined the process of developing a linear multigrid preconditioner
by extending the baseline ILU(p) preconditioning step to an entire method. First, stationary methods
were introduced as well as the concept of a smoother. To facilitate and justify the use of ILU(p) in
the multigrid algorithm, its effectiveness as a smoother was demonstrated for the convection-diffusion
equation. It was shown that ILU(0) and ILU(1) exhibit excellent smoothing properties for high-frequency
errors. It was demonstrated that ILU(p) has greater coupling in its incomplete factors relative to lower-
and upper-triangular matrices that are used in classical symmetric Gauss–Seidel relaxation.
The ILU(p) preconditioning step was extended to an iterative method using a clearly defined
algorithm. This algorithm was then embedded into a broader multigrid preconditioning algorithm. It is
believed that this algorithm is potentially one of the most clearly-defined approaches for a researcher
to transition from a simple preconditioning step (e.g. ILU(p)) in a linear system solver to the linear
(geometric) multigrid method as a preconditioner. The use of ordering and scaling in the multigrid
preconditioning process was also described in detail, with special attention paid to the restriction and
prolongation operators.
Chapter 6
RESULTS
This chapter is divided into two sections: results from the convection-diffusion equation and results from
the Euler and Navier–Stokes equations. Within each section, several studies are presented in a manner
that mirrors the investigations conducted throughout this research. The studies presented here are a
subset of a larger set and represent those most relevant and novel with respect to preconditioning.
In the section presenting the results from the convection-diffusion equation, the investigations include:
the impact of Peclet number on the performance of GMRES; the effect of the ILU(p) preconditioner
both with and without a multigrid correction on the performance of GMRES; the effect of iterative
ILU(p) preconditioning; the effect of ordering in the formation of the preconditioner on the performance
of GMRES; a comparison of MDF reordering for matrices arising from centered-difference discretizations
to matrices arising from upwinding; and the use of an evolutionary algorithm in identifying a root node
and a tie-breaking strategy for the MDF algorithm.
In the section presenting the results for the Euler and Navier–Stokes equations, the studies include:
the effect of ILU(p) preconditioning on GMRES; a comparison of ILU(p) preconditioning to iterative
ILU(p) preconditioning and ILU(p)-smoothed multigrid preconditioning; and a comparison of various
orderings including natural, RCM and MDF.
All cases (unless specified otherwise) are run on a desktop computer with an Intel® Dual-Core™ i3
530 processor, with a 1.60 GHz CPU per core and 4 GB of RAM.
6.1 Convection-Diffusion Equation
A sequence of studies is presented here for the convection-diffusion equation. Many parameters
are constant for some studies and variable for others. Therefore, this section begins with a description
of the parameters and constructs of the convection-diffusion solver that remain unchanged, apply to
all cases, or serve as defaults.
The 2D convection-diffusion equation has constants Pe, ~v, and µ, which represent the Peclet number,
the velocity vector, and the diffusion coefficient, respectively. For simplicity, a unit velocity vector and
length scale are assumed, with the velocity inclined at an angle θ. Therefore, Pe and θ are the physical
constants that define the flow, and µ is defined implicitly by the definition of the Peclet number (2.40).
The Peclet number changes from one study to the next, and a baseline value of θ = 22° is used.
The boundary conditions used in (2.43) for all cases include the following Dirichlet conditions on the
upstream boundaries:
φ(x, 0) = [4x(x − 1)]²   (6.1)

φ(0, y) = [4y(y − 1)]²   (6.2)
The problem is discretized on the domain [0, 1] × [0, 1], using a uniform grid with a number of
nodes in each direction that facilitates a desired number of coarser grids for multigrid. A coarse grid is
derived from a finer grid by removing its even-numbered nodes in each direction. The n-th grid level is
denoted as CUn, where n = 0, 1, ..., 7, and the number of nodes for a given grid level is (2^(9−n) + 1)². The
finest grid level (n = 0) has 513² nodes and the coarsest grid level (n = 7) has 5² nodes.
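The grid hierarchy can be checked directly from this definition (a one-line illustration, not thesis code):

```python
# Number of nodes on grid level CU_n, n = 0, ..., 7: (2^(9-n) + 1)^2
sizes = {n: (2 ** (9 - n) + 1) ** 2 for n in range(8)}
assert sizes[0] == 513 ** 2 == 263169   # finest grid
assert sizes[7] == 5 ** 2 == 25         # coarsest grid
# Removing the even-numbered nodes halves each direction: 2^(9-n)+1 -> 2^(8-n)+1
assert all((2 ** (9 - n) + 1) // 2 + 1 == 2 ** (8 - n) + 1 for n in range(7))
```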
Second-order centred differences are used to discretize both the first and second derivatives. An
artificial dissipation coefficient of ε = 0.1 is used for (3.55) and (3.56).
Based on preliminary tests, the baseline GMRES parameters include a restart value of 400 iterations,
a total of 1200 iterations, and a relative tolerance of 10^-8. In the case of multigrid preconditioning,
defaults of ν1 = 1 pre-smoothing and ν2 = 0 post-smoothing iterations are used in the general
V-cycle as defined in Algorithm 10.
MATLAB R© R2011a is used for all convection-diffusion simulations. It contains its own ILU(0)
routine, in addition to other drop-tolerance and modified routines. Although its ILU(0) routine is
optimized and fast, it is not used because it does not permit non-zero fill-in levels, which would make
for an unfair comparison with an external ILU(p) routine. In lieu of MATLAB's ILU(0) routine, an external
ILU(p) algorithm was developed using the Crout formulation presented in Algorithm 3. This formulation
is amenable to the MDF algorithm, which is an essential component of this research.
Figure 6.1: Convergence of GMRES, solution, and eigenvalues of the system matrix with and without
ILU(1) preconditioning for a uniform grid case with a Peclet number of 0.001. Panels: (a) GMRES
convergence (residual vs. GMRES iteration); (b) solution; (c) eigenvalues of the unpreconditioned
matrix, cond(A) = 1130517; (d) eigenvalues of the iteration matrix, ρ(G_ILU(1)) = 9.9863e−01.
6.1.1 GMRES convergence and Peclet number
This investigation looks at the effectiveness of ILU(p) as a preconditioner for GMRES across a broad
range of Peclet numbers. Fill-in levels of 0 and 1 are used. All cases are run using grid CU2. A natural
ordering of grid nodes is used. Figures 6.1 and 6.2 show the convergence of GMRES and solutions
for Peclet numbers of 0.001 and 1000, respectively. For the diffusion-dominated case, information is
spread out from the Dirichlet boundary condition, and for the convection-dominated case, information
is propagated to the outflow boundary along the direction of the velocity.
The eigenvalue spectrum of the unpreconditioned system matrix, A, for a Peclet number of 0.001
is shown in Figure 6.1(c). The conditioning of this matrix is quite poor, thus warranting the use of a
Figure 6.2: Convergence of GMRES, solution, and eigenvalues of the system matrix with and without
ILU(1) preconditioning for a uniform grid case with a Peclet number of 1000. Panels: (a) GMRES
convergence (residual vs. GMRES iteration); (b) solution; (c) eigenvalues of the unpreconditioned
matrix, cond(A) = 3062; (d) eigenvalues of the iteration matrix, ρ(G_ILU(1)) = 8.8273e−02.
preconditioner. Figure 6.1(d) shows the eigenvalue spectrum of the iteration matrix, G = I − M^-1 A.
The iteration matrix is used instead of the preconditioned matrix to facilitate the computation of a
spectral radius. For this particular case, the spectral radius is 0.99863 (i.e. less than 1), meaning ILU
will damp all error modes during the preconditioning step in GMRES, albeit slowly.
Similar eigenvalue spectra are shown in Figures 6.2(c) and 6.2(d) for a Peclet number of 1000. For
this case, the condition number of the unpreconditioned matrix is not as poor as for the much lower
Peclet number. Furthermore, the spectral radius is 0.088273, meaning ILU will rapidly dampen all error
modes. This is evident from the convergence information for this case. Only 15 GMRES iterations
are required for convergence, compared to the case with a Peclet number of 0.001, which required 127
iterations.
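The spectral radii quoted above can be reproduced in spirit with a stand-alone sketch. The version below applies a dense ILU(0) (rather than the ILU(1) of the figures) to a coarser 21 × 21-node discretization without the artificial-dissipation terms, so the numbers differ from the figures; the qualitative point is that ρ(G) < 1 for the diffusion-dominated operator:

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular

def conv_diff_matrix(n, peclet, theta=np.pi / 4):
    # Second-order centred differences on the interior of an n x n node grid.
    m, h = n - 2, 1.0 / (n - 1)
    mu = 1.0 / peclet
    vx, vy = np.cos(theta), np.sin(theta)
    I = sp.identity(m)
    D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / h**2
    D1 = sp.diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(m, m)) / (2 * h)
    return (-mu * (sp.kron(I, D2) + sp.kron(D2, I))
            + vx * sp.kron(I, D1) + vy * sp.kron(D1, I)).toarray()

def ilu0(A):
    # IKJ-ordered ILU(0) on a dense copy: updates only where A is nonzero.
    n = A.shape[0]
    pat = A != 0
    LU = A.astype(float).copy()
    for i in range(n):
        for k in np.flatnonzero(pat[i, :i]):      # k in increasing order
            LU[i, k] /= LU[k, k]
            js = np.flatnonzero(pat[i, k + 1:]) + k + 1
            LU[i, js] -= LU[i, k] * LU[k, js]
    return np.tril(LU, -1) + np.eye(n), np.triu(LU)

A = conv_diff_matrix(21, peclet=0.001)
L, U = ilu0(A)
Minv_A = solve_triangular(U, solve_triangular(L, A, lower=True))
rho = np.abs(np.linalg.eigvals(np.eye(A.shape[0]) - Minv_A)).max()
assert rho < 1.0   # ILU(0) damps every error mode for this M-matrix
```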
Table 6.1: GMRES iterations and CPU times with ILU(0) (top) and ILU(1) (bottom) preconditioning
on a 129 × 129-node grid for various Peclet numbers.

ILU(0):
  Pe       GMRES Iter.   Form (s)   Solve (s)   Total (s)
  10^-3        186          16.3       10.0        26.3
  10^-2        190          16.3       10.7        27.0
  10^-1        194          16.3       11.0        27.3
  1            198          16.3       11.5        27.8
  10^1         167          16.4        8.2        24.6
  10^2          68          16.4        1.8        18.2
  10^3          63          16.3        1.7        18.0

ILU(1):
  Pe       GMRES Iter.   Form (s)   Solve (s)   Total (s)
  10^-3        127          32.3       10.8        43.1
  10^-2        127          32.2        5.1        37.3
  10^-1        127          32.6        5.1        37.7
  1            126          32.7        5.2        37.9
  10^1         103          32.6        3.8        36.4
  10^2          43          32.5        0.9        33.4
  10^3          15          32.4        0.2        32.6
Table 6.2: GMRES iterations and CPU times with ILU(0) (top) and ILU(1) (bottom) preconditioning
on a 129 × 129-node grid for a Peclet number of 0.001.

ILU(0):
  ILU(0) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1            186          16.2       10.1        26.3
        2            120          16.4        4.9        21.3
        3             98          16.6        3.9        20.5
        4             85          16.3        3.2        19.5
        5             76          16.2        2.9        19.1
        6             69          16.6        2.5        19.1
        7             65          16.5        2.5        19.0
        8             61          16.4        2.3        18.7
        9             57          16.5        2.4        18.9

ILU(1):
  ILU(1) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1            127          32.8        5.3        38.1
        2             83          32.4        2.8        35.2
        3             68          32.3        2.2        34.5
        4             59          32.4        1.9        34.3
        5             53          32.5        1.7        34.3
        6             48          32.4        1.5        33.9
        7             44          32.7        1.7        34.4
        8             42          32.6        1.6        34.2
        9             39          32.6        1.5        34.1
Table 6.1 summarizes the results over a broad range of Peclet numbers. The number of iterations for
the convection-dominated cases is much lower than for the diffusion-dominated cases, with a minimum
at a Peclet number of 1000. ILU(1) is a better preconditioner than ILU(0) in terms of iterations; however,
its formation takes roughly twice as long, resulting in a slower solver overall. CPU times are not
emphasized because the implementation of ILU(p) is not efficient and accounts for the majority of the
overall CPU time.
Table 6.3: GMRES iterations and CPU times with ILU(0) (top) and ILU(1) (bottom) preconditioning
on a 129 × 129-node grid for a Peclet number of 1000.

ILU(0):
  ILU(0) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1             61          16.1        1.7        17.8
        2          *1200          16.3      127.2       143.5
        3            505          16.4       47.0        63.4

ILU(1):
  ILU(1) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1             15          32.6        0.2        32.8
        2              9          33.1        0.2        33.3
        3              6          32.5        0.1        32.6
6.1.2 Iterative ILU(p) preconditioning
The focus of this investigation is on how iterative ILU(p) preconditioning affects the performance of
GMRES for both convection- and diffusion-dominated flows. Specifically, fill-in values of p = 0 and
p = 1 are considered. All cases are run using grid CU2 with a natural nodal ordering. Table 6.2
summarizes the results for the diffusion-dominated case with a Peclet number of 0.001. For this case,
the general trend is that as the number of ILU(p) iterations increases, the number of GMRES iterations
decreases. In terms of CPU time, it should be noted that an efficient implementation of ILU(p) is not
used here. Therefore, the formation time of the linear system and the ILU(p) preconditioner accounts
for the majority of the computational time.
Table 6.3 shows the results for the convection-dominated cases with a Peclet number of 1000. For a
fill-in level of p = 0, additional ILU iterations in the preconditioning step do not improve the performance
of GMRES. It is believed that this is caused by the presence of unstable eigenvalues in the preconditioned
matrix whose modes are effectively damped by GMRES if a single iteration of ILU(0) is used; when
multiple ILU(0) iterations are used, GMRES is unable to counteract the relative growth of these unstable
error modes. For a fill-in level of p = 1, in contrast, the number of GMRES iterations is reduced. The
reference ILU(1) preconditioner for this problem is already quite good, since GMRES requires only 15
iterations, compared with 127 for the diffusion-dominated case. Therefore, multiple ILU(1)
preconditioning iterations do not provide a substantial reduction in CPU time. For more practical
problems, there is potential for a greater reduction, since the baseline number of GMRES iterations is
larger.
6.1.3 ILU(p) and multigrid preconditioning
This investigation focuses on the effectiveness of ILU(p)-smoothed multigrid preconditioning on GMRES.
For this study, a set of uniform grids and a natural ordering of the grid nodes are used.
Table 6.4 shows the results for the diffusion-dominated case with a Peclet number of 0.001. As
expected, multigrid dramatically reduces the number of GMRES iterations as the number of grid
nodes increases. On grid CU0, for the case with one grid level, the number of iterations exceeds the
Table 6.4: GMRES iterations for various multigrid preconditioners with ILU(0) smoothing (Pe = 0.001).
Grid   Nodes          Grid Levels
                   1      2      3      4      5      6      7      8
CU5    17²        25     15     13      -      -      -      -      -
CU4    33²        49     24     16     14      -      -      -      -
CU3    65²        95     45     25     18     16      -      -      -
CU2    129²      186     88     45     27     19     18      -      -
CU1    257²      373    172     88     47     29     21     19      -
CU0    513²    *1200    347    173     91     54     50     46     22
maximum allowable number of GMRES iterations, indicated by *1200 in the table. Furthermore, the
relative increase in multigrid-preconditioned iterations (for the maximum number of grid levels) with
increasing grid size is quite small. This compares well with the theory, which estimates a complexity of
order n.
Table 6.5 shows the results for a much larger Peclet number of 1000. For the finest grid, CU2, the
number of iterations required by ILU(0)-preconditioned GMRES is approximately one third of that for
the diffusion-dominated case (61 iterations versus 186). A very important observation is that multigrid
acceleration of the ILU(0) preconditioner does not improve the performance of GMRES, and this
phenomenon is exacerbated with increasing grid size.
The baseline pre- and post-smoothing parameters in the multigrid preconditioner were used in this
study. Investigations were conducted to determine the optimal number of smoothing iterations before
and after the coarse-grid correction(s) for the diffusion-dominated cases on all uniform grids. It was
found that additional smoothing does not improve the performance of the multigrid preconditioner in
terms of CPU time. Furthermore, a comparison was made between solving the coarsest grid level problem
directly or by a smoothing iteration. For coarse grids, a direct solve was faster; however, with increasing
grid size, the performance of multigrid-preconditioned GMRES is insensitive to this choice.
6.1.4 Orderings
Two key investigations in this research are multigrid preconditioning and orderings. Multigrid
preconditioning significantly reduces the number of GMRES iterations for diffusion-dominated cases. However,
for convection-dominated cases, the results are not as promising. In this section, ordering strategies are
investigated and particular attention is paid to convection-dominated flows. Multigrid preconditioning
is also considered with these orderings for the diffusion-dominated cases.
For this study, the orderings considered include: natural; reverse; reverse Cuthill–McKee [146]
Table 6.5: GMRES iterations for various multigrid preconditioners (Pe = 1000).
Grid   Nodes        Grid Levels
                  1       2      3      4      5      6
CU5    17²       22      24     22      -      -      -
CU4    33²       25      29     26     27      -      -
CU3    65²       52     425    433    447    321      -
CU2    129²      61   *1200    419    431    437    436
Table 6.6: GMRES iterations for various orderings using ILU(0) multigrid preconditioning (129 × 129
nodes and Pe = 0.001).
Ordering         Grid Levels
               1     2     3     4     5     6
natural      186    88    45    27    19    18
reverse      188    88    47    29    21    19
RCM          186    88    45    27    19    18
MDF          186    88    45    27    19    18
(RCM); and minimum discarded fill [114] (MDF). MATLAB’s RCM routine is used in this work. In this
routine, the node of minimum degree with the lowest initial index is chosen as the root node. Ties are
broken by selecting the node with the lowest initial index. MDF was implemented in this research using
key aspects from the literature as well as novel contributions to the approach, most importantly to the
tie-breaking strategy. Various tie-breaking strategies for MDF are compared in a later section. For all
cases considered, the default parameters are used in the formation of the linear system and the GMRES
algorithm. Furthermore, the uniform CU2 and CU1 grids are used for these cases.
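For illustration, the same type of reordering can be reproduced with SciPy's RCM routine in place of the MATLAB one; the 5-point stencil graph built below is an assumed stand-in for the actual system matrices:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

n = 5                      # illustrative n x n grid, natural ordering
N = n * n
rows, cols = [], []
for j in range(n):
    for i in range(n):
        k = j * n + i
        # connect each node to its east and north neighbours (both directions)
        if i + 1 < n:
            rows += [k, k + 1]; cols += [k + 1, k]
        if j + 1 < n:
            rows += [k, k + n]; cols += [k + n, k]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(N, N))

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm, :][:, perm]    # symmetrically permuted matrix
C = B.tocoo()
bw = int(np.abs(C.row - C.col).max())   # bandwidth after reordering
```

The returned array `perm` is applied symmetrically to rows and columns, preserving the symmetry of the adjacency structure while reducing the bandwidth.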
Tables 6.6 and 6.7 summarize the GMRES iterations on grid CU2 for the diffusion-dominated case, with ILU(0) used for smoothing within the multigrid preconditioner, and for the convection-dominated case, with ILU(0) preconditioning. For the diffusion-dominated case, multigrid preconditioning is also considered. For this grid, little variation is observed in the number of GMRES iterations for the various orderings.
Tables 6.8 and 6.9 summarize the results for the diffusion-dominated case for the various orderings
with ILU(0), ILU(1) and multigrid preconditioning on the finer CU1 grid. For ILU(0), the performance of GMRES, with or without multigrid preconditioning, is virtually insensitive to the ordering that is used. However, for ILU(1), MDF is the clear winner, requiring roughly 20% fewer iterations than the
Table 6.7: GMRES iterations for various orderings using ILU(0) preconditioning (129 × 129 nodes and Pe = 1000).
Ordering   GMRES Iterations
natural          61
reverse          62
RCM              62
MDF              61
Table 6.8: GMRES iterations for various orderings using ILU(0) multigrid preconditioning (257 × 257 nodes and Pe = 0.001).
Ordering         Grid Levels
             1      2      3      4      5      6      7
natural    373    172     88     47     29     21     19
reverse    375    172     89     49     31     23     21
RCM        373    172     88     47     29     21     19
MDF        373    172     88     47     29     21     19
RCM, natural, and reverse orderings.
Table 6.10 summarizes the results for the convection-dominated case for the various orderings with ILU(1) preconditioning on the finer CU1 grid. For this case, as for grid CU2, MDF yields the fewest iterations, requiring roughly 40% of the iterations of the natural ordering and two-thirds of those of RCM.
Although MDF requires the fewest iterations, its cost of formation is much larger than that of RCM.
In the literature, there are suggestions to modify this algorithm to make it more efficient. In the main
focus of this research (nonlinear systems solved by a Newton-GMRES algorithm), the higher CPU cost
of MDF could be amortized over many linear system solves.
The relative performance of MDF with respect to the other orderings improves with a fill-in level of 1.
This suggests that the advantage of minimizing the discarded fill over a matrix bandwidth minimization
(RCM) is even more relevant as the allowable nonzero sparsity pattern of the factorization increases.
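The notion of the allowable sparsity pattern for a given fill level can be made concrete with the classical symbolic level-of-fill rule; the dense scalar sketch below is illustrative and is not the block variant used in the solver:

```python
import numpy as np

def ilu_fill_pattern(A, p):
    """Symbolic ILU(p): original nonzeros receive level 0; a fill entry
    created when eliminating column k receives level
    lev(i,k) + lev(k,j) + 1, and only entries of level <= p are kept."""
    n = A.shape[0]
    big = n * n                      # stands in for an infinite level
    lev = np.where(A != 0, 0, big)
    for k in range(n):
        for i in range(k + 1, n):
            if lev[i, k] <= p:
                for j in range(k + 1, n):
                    lev[i, j] = min(lev[i, j], lev[i, k] + lev[k, j] + 1)
    return lev <= p

# 5-point Laplacian on a 3 x 3 grid: ILU(0) keeps exactly the original
# pattern, while ILU(1) admits the first generation of fill.
n = 3
N = n * n
A = np.zeros((N, N))
for j in range(n):
    for i in range(n):
        k = j * n + i
        A[k, k] = 4.0
        if i > 0: A[k, k - 1] = -1.0
        if i < n - 1: A[k, k + 1] = -1.0
        if j > 0: A[k, k - n] = -1.0
        if j < n - 1: A[k, k + n] = -1.0
pat0 = ilu_fill_pattern(A, 0)
pat1 = ilu_fill_pattern(A, 1)
```

Increasing p enlarges the pattern on which the factorization is allowed to store entries, which is the sense in which MDF's discard minimization becomes more relevant at higher fill levels.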
Table 6.9: GMRES iterations for various orderings using ILU(1) multigrid preconditioning (257 × 257 nodes and Pe = 0.001).
Ordering         Grid Levels
             1      2      3      4      5      6      7
natural    252    114     59     32     20     16     15
reverse    254    116     61     34     22     17     17
RCM        252    114     59     32     20     16     15
MDF        195     86     44     24     15     12     11
Table 6.10: GMRES iterations for various orderings using ILU(1) preconditioning (257 × 257 nodes and Pe = 1000).
Ordering   GMRES Iterations
natural          24
reverse          24
RCM              15
MDF              10
6.1.5 Further investigation of MDF
The MDF reordering strategy showed promise in preliminary investigations compared to the other orderings, especially for convection-dominated cases on finer grids with ILU(1). Two key components of this ordering, as of other orderings such as RCM, are the selection of the root node and the tie-breaking strategy. In the previous subsection, ties were broken by choosing the node with the lowest index. A novel contribution of this research is to incorporate the physics and geometry of the problem into the reordering strategy. This arises from the observation that MDF can produce a reordering whose matrix sparsity pattern resembles one arising from an upwind discretization.
This subsection is divided into three components. First, a connection is made between MDF and the
flow direction (i.e. upwinding). Next, results from an evolutionary algorithm provide insight into root
node selection and tie-breaking strategies for a simple convection-dominated case. Finally, distance-
and line-distance-based tie-breaking strategies are compared to the baseline index-based tie-breaking
strategy.
Row and column scaling of the system matrix were also investigated. It was experimentally confirmed
that MDF is insensitive to row scaling, as proven in Chapter 5. Furthermore, experiments with column
scaling found that the MDF reordering of the original (unscaled) system matrix is superior to a matrix
whose columns are scaled by the diagonal entries or the square root of the diagonal entries of the matrix.
Connection between MDF and upwinding
The purpose of this investigation is to demonstrate a connection between MDF and the flow direction.
Furthermore, it clearly shows that depending on the initial ordering of the matrix, MDF can lead to
multiple orderings that are equally good.
Consider the matrix that arises from the discretization of the convection-diffusion equation on a 5 × 5-node grid. Specifically, the Peclet number is 10⁹, the flow angle is 45°, and a second-order centred-difference discretization with a dissipation coefficient of ε = 0.5 is used. Figure 6.3 illustrates the sparsity pattern of this matrix, along with the relative size and sign of each entry; upward- and downward-facing triangles indicate positive and negative entries, respectively. For this convection-dominated case, the upper-triangular entries are very small. If the Peclet number were infinite, they would be zero.
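This behaviour can be reproduced with a small sketch; the first-order upwind-equivalent stencil below is an assumption for illustration and is not necessarily the thesis's exact discretization (boundary conditions are ignored):

```python
import numpy as np

def conv_diff_matrix(n=5, pe=1e9, theta=np.radians(45.0)):
    """Convection-diffusion operator on an n x n node grid with a
    natural (lexicographic) ordering: first-order upwind convection
    (equivalent to centred differencing with a dissipation coefficient
    of 0.5) and centred diffusion scaled by 1/Pe."""
    h = 1.0 / (n - 1)
    a, b = np.cos(theta), np.sin(theta)   # positive velocity components
    N = n * n
    A = np.zeros((N, N))
    d = 1.0 / (pe * h * h)                # diffusive coefficient
    for j in range(n):
        for i in range(n):
            k = j * n + i
            A[k, k] = (a + b) / h + 4.0 * d
            if i > 0:
                A[k, k - 1] = -a / h - d
            if j > 0:
                A[k, k - n] = -b / h - d
            if i < n - 1:
                A[k, k + 1] = -d          # upper triangle: diffusion only
            if j < n - 1:
                A[k, k + n] = -d          # upper triangle: diffusion only
    return A

A = conv_diff_matrix()
upper = np.abs(np.triu(A, 1)).max()       # vanishes as Pe -> infinity
lower = np.abs(np.tril(A)).max()
```

At Pe = 10⁹ the upper-triangular entries are smaller than the convective entries by roughly nine orders of magnitude, and in the infinite-Peclet limit they vanish identically.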
Now consider the system matrix with the small upper triangular entries removed, as shown in Figure
6.4. The resulting matrix resembles one that is obtained after a first-order upwinding discretization in
each spatial direction. Since the matrix is lower-triangular, its ILU(0) factorization is exact.
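This property is easy to verify with a minimal dense ILU(0) sketch (Gaussian elimination restricted to the original sparsity pattern; the matrices used here are illustrative):

```python
import numpy as np

def ilu0(A):
    """ILU(0): IKJ Gaussian elimination restricted to the sparsity
    pattern of A, so no fill-in is ever stored. Returns unit-lower L
    and upper U. A dense sketch for illustration only."""
    n = A.shape[0]
    F = A.astype(float).copy()
    pattern = A != 0
    for i in range(1, n):
        for k in range(i):
            if pattern[i, k]:
                F[i, k] /= F[k, k]
                for j in range(k + 1, n):
                    if pattern[i, j]:
                        F[i, j] -= F[i, k] * F[k, j]
    return np.tril(F, -1) + np.eye(n), np.triu(F)

# For a lower-triangular matrix every elimination update stays inside
# the pattern, so the factorization is exact: ||A - LU|| = 0.
rng = np.random.default_rng(1)
A = np.tril(rng.random((6, 6))) + 6.0 * np.eye(6)
L, U = ilu0(A)
```

Here L carries the scaled subdiagonal entries and U reduces to the diagonal, so LU reproduces A with zero discard.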
Figure 6.5 shows the resulting matrix after the MDF reordering algorithm is applied. MDF again yields a lower-triangular matrix, and hence an exact ILU(0) factorization, even though the reordered matrix does not have the same sparsity pattern as the original. Furthermore, the reordering does not simply correspond to an upwind discretization with a natural ordering in the direction alternate to the original ordering.
Now consider the application of a random permutation to the matrix in Figure 6.4, shown in Figure
6.6. The LU-factorization of this matrix, shown in Figure 6.7, clearly illustrates that there is additional
fill-in. Hence, ILU(0) will have an associated error, resulting from discarded entries.
The application of MDF to the randomly-permuted matrix leads to the reordering of the system matrix shown in Figure 6.8. The MDF algorithm again produces an ordering with zero discard. Interestingly, the resulting matrix is not lower-triangular, a structure that would have guaranteed zero discard. Furthermore, the boundary nodes are ordered first.
Depending on the initial ordering, MDF has led to different orderings that, for this study, all have
zero discard. The tie-breaking strategy used for this investigation is simply choosing the node with the
lowest initial index. This also applies in the selection of the initial (or root) node. Hence, it is important
to consider both root-node selection and tie-breaking strategies in more detail. Furthermore, the result-
ing multiple optimal orderings motivate answering a much more difficult and broad question: Which
orderings minimize discard? An evolutionary algorithm was developed to help answer this question as
well as to determine statistically the best root-node location(s) over a range of Peclet numbers.
Figure 6.3: Initial system matrix for a 5 × 5-node grid with a Peclet number of 10⁹ (N = 25, Nnz = 81; 24 entries with |aij| < 10⁻⁴, 9 with 10⁻² ≤ |aij| < 1, and 48 with |aij| ≥ 1). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.4: System matrix after very small entries are discarded (N = 25, Nnz = 57; 9 entries with 10⁻² ≤ |aij| < 1 and 48 with |aij| ≥ 1). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.5: Resulting matrix after MDF ordering (N = 25, Nnz = 57). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.6: Resulting matrix after a random permutation (N = 25, Nnz = 57). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.7: LU-factorization of the randomly-permuted matrix (N = 25, Nnz = 57; the A, L, and U sparsity patterns are shown).
Figure 6.8: Resulting matrix after MDF for the randomly-permuted matrix (N = 25, Nnz = 57). Upward- and downward-facing triangles represent positive and negative values, respectively.
Evolutionary algorithm and MDF
The goal of this study is to identify which orderings (with a particular focus on root nodes) correspond to the lowest amount of discarded fill for ILU(0).
Consider the problem from the previous section: 25 equations and unknowns resulting from the
discretization of the convection-diffusion equation on a 5 × 5-node grid. To fully determine which ordering leads to the least discarded fill, one would need to investigate 25!, or approximately 1.6 × 10²⁵, permutations. A deterministic approach is prohibitive in terms of CPU time and memory; therefore, a stochastic approach is used to find the optimum. Specifically, an evolutionary
algorithm is developed and used to achieve this end. From the previous subsection, it is apparent that
there can be multiple orderings that will lead to a minimized discarded fill.
The order-based evolutionary algorithm developed for this research briefly consists of the following
components: Each member of the population is an array of natural numbers from 1 to 25, representing an
ordering of 25 nodes. The fitness function is the discarded fill-in corresponding to the ILU(0) factorization
resulting from the convection-diffusion system matrix. Specifically, the norm of the difference between
the original matrix and its ILU(0) factors, ||A − LU||, is minimized. This matrix difference represents
the discarded fill.
There is an allowance of pass-through to the next generation for the most fit members. Tournament
selection, crossover and mutation govern the survival and progress of the remaining population members.
Crossover is performed using an approach presented by Davis [309]. Mutation is performed by exchanging
the position of two random entries in a population member. For this study, the PDE parameters investigated are the Peclet number and the flow angle. A population size of 50 is used over 25 generations, with a crossover probability of 90%; these parameters are based on preliminary studies. It is important to note that in the preliminary studies, for each given case, multiple
orderings resulted in the lowest amount of discarded fill. (The minimum value is small for large Peclet
numbers, and optimization tolerances are discussed in the next paragraph.) Since root nodes that
corresponded to these optima were of interest, it was important to conduct many optimizations in order
to identify all of these root nodes.
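The components above can be sketched as follows. The population size, generation count, and crossover probability follow the text, while the tournament size, elite count, mutation rate, and the toy fitness in the demonstration are assumptions for illustration:

```python
import random

def order_crossover(p1, p2, rng):
    """Davis-style order crossover: copy a random slice from p1 and
    fill the remaining positions with p2's genes in relative order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(child[a:b + 1])
    fill = iter(g for g in p2 if g not in taken)
    return [c if c is not None else next(fill) for c in child]

def swap_mutation(perm, rng):
    """Exchange the positions of two random entries."""
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]

def evolve(fitness, n, pop_size=50, generations=25, p_cross=0.9,
           n_elite=2, tourn=3, p_mut=0.2, seed=0):
    """Minimize fitness(perm) over permutations of range(n)."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [p[:] for p in pop[:n_elite]]              # elitist pass-through
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, tourn), key=fitness)  # tournament
            p2 = min(rng.sample(pop, tourn), key=fitness)
            child = (order_crossover(p1, p2, rng)
                     if rng.random() < p_cross else p1[:])
            if rng.random() < p_mut:
                swap_mutation(child, rng)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

# Toy fitness: total displacement from the identity ordering. In the
# thesis's application the fitness is the ILU(0) discard ||A - LU||.
disp = lambda p: sum(abs(g - i) for i, g in enumerate(p))
best = evolve(disp, 8)
```

The returned member is always a valid permutation, since order crossover and swap mutation both preserve the permutation property.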
Trends for 100 and 1000 converged optimizations showed enough qualitative convergence in their
pattern to suggest that 1000 converged optimizations would be sufficient for each case investigated.
Specifically, preliminary runs for Peclet numbers of 10⁻⁹, 1, 10⁴ and 10⁹ were executed for flow angles of 0°, 15°, and 45°. The angle 15° is somewhat arbitrary, since its performance was similar to that of other angles between 0° and 45°. Ultimately, the 0° angle was rejected since it corresponds to a velocity along only one direction, and the 45° angle was rejected since it would lead to possible symmetry in the discretization. Hence, the 15° angle was chosen for the test cases, together with the extremes of the Peclet numbers investigated: Pe = 10⁻⁹ and 10⁹ exhibit the most contrast in the discretization between a diffusion and a convection operator, respectively.
case, 1000 converged optimizations were used. Many optimizations were considered since each case had
Table 6.11: Locations of root nodes that correspond to minimized discarded fill using an evolutionary algorithm for the convection-dominated (Pe = 10⁹, entries marked c) and diffusion-dominated (Pe = 10⁻⁹, entries marked d) cases for a 5 × 5-node grid with a flow angle of θ = 15°.
                x index
y index     1      2      3      4      5
   1       d c    d c    d c    d c    d c
   2       d c    d      d      d      d
   3       d c    d      d      d      d
   4       d c    d      d      d      d
   5       d c    d      d      d      d c
more than one optimal ordering. The optimization tolerances were 1 and 10⁻⁸ for Peclet numbers of 10⁻⁹ and 10⁹, respectively. It was possible to achieve a smaller tolerance for the convection-dominated cases because the upper-triangular entries in the initial matrix decrease in magnitude as the Peclet number increases.
Table 6.11 summarizes the root-node locations that correspond to a minimized amount of discarded fill for both convection (Pe = 10⁹) and diffusion (Pe = 10⁻⁹). For diffusion, all nodes can be root nodes
that lead to an optimal ordering. This is consistent with results by d’Azevedo et al. [114] when studying
Laplace’s equation (i.e. a Peclet number of zero). For convection, the upstream boundary nodes and the
downstream corner node are the only root nodes that correspond to a minimal discarded fill.
The results of these studies indicate that an intelligent root-node selection strategy is important to
an MDF reordering strategy for convection-dominated flows. This is also true for RCM. Experiments also showed that nodes neighbouring the root node were subsequently chosen as the second and third nodes for MDF. This observation was incorporated into the development of more intelligent
root-node selection and tie-breaking strategies.
Tie-breaking strategy
Results from the studies involving the evolutionary algorithm indicate the importance of root-node
selection and tie-breaking strategy in the MDF algorithm, especially for convection-dominated flows, for
a 5×5–node grid. For earlier studies, ties were broken by selecting the node with the lowest index, where
the index was based on a natural ordering of the grid nodes. In this section, this strategy is compared to two novel strategies, referred to as the distance and line-distance tie-breaking strategies.
In the distance tie-breaking strategy, the most upstream node is chosen; therefore, the ordering routine is provided with the physical location of each grid node and the freestream velocity.
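This selection amounts to choosing the tied node with the smallest projection onto the freestream direction; a minimal sketch (the node indices and coordinates below are illustrative):

```python
import numpy as np

def most_upstream(tied_nodes, coords, v_inf):
    """Distance tie-breaking: among nodes tied on minimum discarded
    fill, pick the one whose position has the smallest projection
    onto the freestream velocity, i.e. the most upstream node."""
    proj = [float(np.dot(coords[w], v_inf)) for w in tied_nodes]
    return tied_nodes[int(np.argmin(proj))]

# Three tied nodes along the x-axis with the freestream in +x:
coords = {7: np.array([0.2, 0.0]), 3: np.array([0.9, 0.0]),
          5: np.array([0.5, 0.0])}
node = most_upstream([7, 3, 5], coords, np.array([1.0, 0.0]))
```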
Algorithm 14 Downstream-line tie-breaking strategy for MDF
Define W as the set of nodes that have tied for minimum discarded fill. Let v∞ be the freestream velocity vector and let Ps ∈ R³ be the physical location of node ws ∈ W.
while W ≠ ∅ do
    Choose the most upstream node wt ∈ W as the next node to be ordered.
    W ← W − wt
    while wt has downstream neighbours in W do
        Let wn be the closest downstream node to wt:
            wn = argmin over ws ∈ W of |PtPs| such that PtPs · v∞ > 0
        Order wn as the next node.
        wt ← wn
        W ← W − wt
    end while
end while
The line-distance tie-breaking strategy, shown in Algorithm 14, begins by selecting a node that is
most upstream. However, it then chooses subsequent nodes that are downstream of the previous node.
If no other downstream nodes exist, the algorithm selects the most upstream node next. If there are
many nodes that tie, the resulting ordering of those nodes will be a collection of lines that progress in
the downstream direction.
Table 6.12 compares the performance of the index, distance, and line-distance tie-breaking strategies
for the MDF algorithm. Results for the RCM reordering are also shown for comparison. For all cases
presented, MDF outperforms RCM in terms of iterations. For the coarsest grid, CU3, there is no
noticeable difference in the performance of the various tie-breaking strategies. For the finer grids, the
line-distance tie-breaking strategy outperforms the index- and distance-based approaches for a fill-in level of p = 1.
Table 6.12: GMRES iterations for MDF-ILU(p) preconditioners (Pe = 1000) and a comparison to RCM. Note: the most upstream node is (1,1).
Grid   Ordering   Root Node   Tie-Breaking Strategy   ILU(0)   ILU(1)
CU3    RCM        (1,1)       index                       30       14
CU3    MDF        (1,1)       index                       29       12
CU3    MDF        (1,1)       distance                    29       12
CU3    MDF        (1,1)       distance / line             29       10
CU2    RCM        (1,1)       index                       62       16
CU2    MDF        (1,1)       index                       61       11
CU2    MDF        (1,1)       distance                    60       11
CU2    MDF        (1,1)       distance / line             61        9
CU1    RCM        (1,1)       index                        -       15
CU1    MDF        (1,1)       index                        -       10
CU1    MDF        (1,1)       distance                     -       10
CU1    MDF        (1,1)       distance / line              -        8
Table 6.13: Computational grids for Euler and Navier–Stokes calculations.
Grid   Geometry    Nodes     JMAX    KMAX   JTAIL1   JTAIL2   Off-wall Spacing
I0     NACA 0012    10,045     245     41       33      213   1 × 10⁻³
I1     NACA 0012     2,583     123     21       17      107   2 × 10⁻³
I2     NACA 0012       682      62     11        9       54   4 × 10⁻³
V0     RAE 2822     18,785     289     65       33      257   2 × 10⁻⁶
V1     RAE 2822      4,785     145     33       17      129   4 × 10⁻⁶
V2     RAE 2822      1,241      73     17        9       65   8 × 10⁻⁶
V3     RAE 2822        333      37      9        5       33   2 × 10⁻⁵
Waux   RAE 2822    263,425   1,025    257      129      897   6 × 10⁻⁷
W0     RAE 2822     66,177     513    129       65      449   1 × 10⁻⁶
W1     RAE 2822     16,705     257     65       33      225   2 × 10⁻⁶
W2     RAE 2822      4,257     129     33       17      113   5 × 10⁻⁶
W3     RAE 2822      1,105      65     17        9       57   1 × 10⁻⁵
W4     RAE 2822        297      33      9        5       29   2 × 10⁻⁵
W5     RAE 2822         85      17      5        3       15   5 × 10⁻⁵
6.2 Euler and Navier–Stokes Equations
In the first half of this chapter, preconditioning and related topics were explored for the convection-
diffusion equation. Specifically, BILU(p) preconditioning, multigrid preconditioning, and orderings were
of particular interest. In this half of the chapter, many of the ideas explored thus far are extended to a
Newton–Krylov algorithm for the Euler and compressible Navier–Stokes equations.
6.2.1 Test cases
Details of the computational grids used for the test cases are shown in Table 6.13. The family of grids Ik is used for the inviscid cases, where the index k = 0 refers to the finest grid level and the geometry is a NACA 0012 airfoil. Similarly, for the viscous cases, flow around the RAE 2822 airfoil is simulated on the family of grids denoted Vk. The coarser grids are used for multigrid preconditioning, algorithm development, and eigenvalue computations. An additional family of finer viscous grids, denoted Wk, is used for grid studies for multigrid preconditioning.
Table 6.14 shows the test cases that are studied. Specifically, case E1 simulates inviscid, subsonic
flow, whereas E2 simulates transonic flow. There is one laminar test case L1, for which the Reynolds
Table 6.14: Test cases for Euler and Navier–Stokes calculations.
Case   Finest Grid   Flow        Mach Number   Angle of Attack   Reynolds Number
E1     I0            inviscid    0.3           0°                -
E2     I0            inviscid    0.76          0°                -
L1     V0            laminar     0.3           0°                500
T1     V0            turbulent   0.3           0°                3.0 × 10⁶
T2     V0            turbulent   0.729         2.31°             6.5 × 10⁶
number is 500. Finally, there are two turbulent test cases, T1 and T2, that simulate subsonic and
transonic flow.
The baseline parameters and features of the Newton–Krylov algorithm are as follows. A Jacobian-free Newton method is used to solve the nonlinear problem with a pseudo-transient globalization approach, outlined in detail in Chapter 4. To improve the robustness of the early stages of the Newton algorithm, an approximate Jacobian is used (i.e. approximate Newton) until the L2-norm of the nonlinear residual is below 1 × 10⁻⁵, and for a minimum of 5 nonlinear iterations. The latter condition applies to the inviscid and laminar cases, for which the L2-norm of the nonlinear residual is already below 1 × 10⁻⁵ during these early iterations. GMRES is used to converge the linear problem by two orders of magnitude (ηk = 10⁻²) outside of the transient phase. A maximum of 80 Krylov subspace directions is permitted to achieve the desired residual reduction, and no restarting of the GMRES algorithm is used. Reverse Cuthill–McKee ordering is used for the grid nodes, and the linear system is preconditioned by BILU(p). For the inviscid and viscous cases, the fill-in levels are p = 3 and p = 4, respectively.
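The structure of such a Jacobian-free Newton-GMRES iteration can be sketched as below; the finite-difference step size, the toy problem, and the omission of preconditioning and pseudo-transient continuation are simplifications relative to the thesis's algorithm:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk(F, u0, max_newton=20, fd_eps=1e-7, tol=1e-10):
    """Jacobian-free Newton-GMRES: the Jacobian-vector product is
    approximated by a forward difference of the residual, so the
    Jacobian matrix is never formed or stored."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        def jv(v, u=u, r=r):
            return (F(u + fd_eps * v) - r) / fd_eps
        J = LinearOperator((u.size, u.size), matvec=jv, dtype=float)
        du, info = gmres(J, -r)       # inexact linear solve
        u += du
    return u

# A mildly nonlinear system, u + 0.1 u^3 = b, solved from u = 0.
b = np.array([1.0, 2.0, 3.0])
F = lambda u: u + 0.1 * u**3 - b
u = jfnk(F, np.zeros(3))
```

Each nonlinear iteration solves the linearized system only approximately, which is the inexact-Newton setting that the two-orders-of-magnitude GMRES tolerance above realizes.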
Table 6.15 shows the performance of the Newton–Krylov solver for the five test cases. The convergence criterion is a reduction of the L2-norm of the nonlinear residual to 1 × 10⁻¹³. In the table, IG and IN refer to the number of GMRES and Newton iterations, respectively. The ratio of IG to IN and the CPU times are also shown; CPU times are based on the average of 5 runs for each case.
Figure 6.9 shows the convergence and Mach contours for case E1. The Mach contours show the presence of a stagnation point at the leading edge of the NACA 0012 airfoil, as expected for inviscid flow. Furthermore, the contours are symmetric since the airfoil encounters the flow at a zero angle of attack. Figure 6.9(a) shows the convergence of Newton's method as a solid blue line. The residual norm presented in this figure is based on the continuity equation; for all cases, the residuals corresponding to mass, momentum, and energy were each monitored. Superimposed on this information is the convergence of GMRES, indicated by red circles. The independent variable is inner iterations, which are used here instead of CPU time so that comparisons can be made across various computers. The initial residual of GMRES for each nonlinear iteration matches the nonlinear residual,
Table 6.15: Baseline Newton (IN) iterations, GMRES (IG) iterations, and CPU times for all Euler and Navier–Stokes test cases solved using BILU(p) preconditioning.
Case   Fill-in, p    IG    IN   IG / IN   Time (s)
E1     3            197    12      16.4        4.3
E2     3            201    16      12.6        5.0
L1     4            354    13      27.2       14.5
T1     4            652    88       7.4       75.3
T2     4            522   107       4.9       83.2
since the nonlinear residual vector is the right-hand side of the linear system and an initial guess of zero
is used for GMRES.
Figure 6.10 shows the convergence and Mach contours for case E2. This case also represents symmetric flow about a NACA 0012 airfoil at a zero angle of attack. In contrast to case E1, a transonic region, terminated by a shock, is present. The convergence for cases E1 and E2 is similar, especially during the continuation phase. The transonic case requires additional nonlinear iterations in the middle phase of the Newton algorithm, during which the transonic region develops.
During this initial nonlinear solution phase GMRES is converged enough to ensure that the residuals
of all governing equations are being reduced at each node. If this preventative measure is not executed in
a satisfactory manner, undersolving will occur, leading to instability of the nonlinear algorithm. Once the
nonlinear residual is sufficiently small, the system is no longer influenced by the continuation parameter
(i.e. ∆t) and nonlinear convergence improves dramatically.
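One common realization of this continuation is switched evolution relaxation (SER), in which the pseudo-time step grows as the nonlinear residual falls; this generic sketch may differ in detail from the Chapter 4 strategy:

```python
def ser_timestep(dt, res_prev, res_now, dt_max=1e6):
    """SER update: grow the pseudo-time step in inverse proportion to
    the change in the nonlinear residual norm, capped at dt_max. As
    the residual becomes small, dt -> dt_max and the iteration
    approaches a pure Newton step."""
    return min(dt * res_prev / res_now, dt_max)
```

A halved residual doubles the time step, while a growing residual shrinks it, which automatically removes the influence of ∆t once convergence sets in.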
Figures 6.11, 6.12 and 6.13 show the convergence and Mach contours for the viscous cases L1, T1 and T2, respectively. The laminar case, L1, exhibits different nonlinear convergence behaviour compared to case E2: the nonlinear residual decreases significantly during the initial phase of the solution process. The turbulent cases, T1 and T2, involve the most complex flows simulated in the entire suite of test cases; the latter includes phenomena such as a shock, a boundary layer, and their interaction. During the continuation phase for the turbulent cases, the nonlinear residual does not decrease significantly as the working turbulent variable and the mean-flow variables evolve. The residual of the turbulence equation is monitored separately from the residual of the mean-flow equations.
When measured in terms of inner iterations, the continuation method described in Chapter 4 contributes a large fraction of the overall solution cost: roughly one-half for the inviscid and laminar cases and two-thirds for the turbulent cases. These fractions are roughly representative of the CPU time as well.
Figure 6.9: Convergence and solution for the subsonic inviscid case, E1. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.10: Convergence and solution for the transonic inviscid case, E2. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.11: Convergence and solution for the laminar case, L1. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.12: Convergence and solution for the subsonic turbulent case, T1. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.13: Convergence and solution for the transonic turbulent case, T2. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
6.2.2 Orderings
In this section three nodal orderings are compared. First, a natural ordering is considered for the C-
topology grid based on the arrangement of the nodes in the normal direction starting from the first
wake-cut, followed by the airfoil surface, and ending with the second wake-cut (see Figure 2.2). Second,
the reverse Cuthill–McKee (RCM) ordering is considered. Finally, the minimum discarded fill (MDF)
ordering is used. The continuation parameters outlined in Table 4.1 are used for all three of these
orderings.
Before discussing the results, some additional details are noted here regarding root-node selection for each ordering and, for the RCM and MDF orderings, the tie-breaking strategy. The natural ordering has a
root node of (1,1) on the C-topology mesh, which lies at the downstream boundary along the wake cut
boundary interface. There is no tie-breaking strategy for natural ordering since the nodes are arranged
in order along the grid lines. Nodes are arranged first along the normal direction to minimize the matrix bandwidth for rows that have no information relating to the wake cut; this holds because there are typically fewer nodes in the normal direction than in the streamwise direction.
The RCM reordering comes from the reversal of the Cuthill–McKee (CM) ordering. Hence, root-node
and tie-breaking strategies for the CM ordering are described here. Similar to natural ordering, the root
node for the CM ordering is the downstream node (1,1) along the wake-cut boundary. Ties are broken
by selecting the node that has the lowest initial index based on the natural ordering. The CM ordering
produces a final node that is located at the upstream farfield boundary. For a symmetric airfoil with
a symmetric C-topology mesh and an odd number of nodes in the streamwise direction, the final node
is specifically located at the upstream boundary and lies along the line extended from the chord in the
upstream direction. For RCM, the first node is therefore at the upstream boundary and the final node
is at the downstream corner along the wake-cut boundary.
Similar to CM, MDF has a root node that is at the downstream node (1,1) along the wake cut.
The criteria for selecting a root node for MDF are, in order: discard, degree, and initial index. These
criteria are discussed in detail in Chapter 5. Discard refers to information that would be lost during
the incomplete factorization, degree refers to the number of neighbours that a node possesses, and the initial index refers to the ordering used prior to MDF (i.e. natural ordering in this work unless otherwise specified). Ties are broken for MDF based on these same three criteria. Additional tie-breaking strategies were explored for MDF, including replacing the initial-index approach with the distance and line-distance approaches; however, they did not yield an improvement. This observation is in contrast to the results presented earlier for the convection-diffusion equation.
Table 6.16 compares the performance of the natural, RCM, and MDF orderings for the five test cases.
The performance measures include the total number of GMRES iterations, IG, the average number of
GMRES iterations required per Newton iteration IG / IN , and CPU time measured in seconds. RCM
outperforms the other two orderings in terms of all three performance measures.
The MDF reordering strategy did not yield an improvement compared to RCM. Recall that for the
Table 6.16: Performance of Newton–Krylov algorithm using BILU(p) with various orderings.
Case   Fill-in, p   Ordering   Tie Break     IG     IN   IG / IN   Time (s)
E1     3            natural    -             243     16      15.2       6.3
E1     3            RCM        index         197     12      16.4       4.3
E1     3            MDF        index         911     17      53.6      25.4
E2     3            natural    -             220     23       9.6       7.4
E2     3            RCM        index         201     16      12.6       5.0
E2     3            MDF        index         995     23      43.3      30.1
L1     4            natural    -            1568     28      56.0      64.0
L1     4            RCM        index         354     13      27.2      14.5
L1     4            MDF        index           -      -         -         -
T1     4            natural    -            1878    119      15.8     172.0
T1     4            RCM        index         652     88       7.4      75.3
T1     4            MDF        index        3121    125      25.0     424.5
T2     4            natural    -            1188    153       7.8     177.0
T2     4            RCM        index         522    107       4.9      83.2
T2     4            MDF        index        2294    158      14.5     452.5
convection-diffusion equation, MDF outperformed RCM in terms of GMRES iterations. MDF exhibits
poor performance in the mid-late Newton stage of the nonlinear algorithm where the linear system
matrix is most stiff. It is believed that aspects of the C-topology mesh that were not encountered with the rectangular mesh for the convection-diffusion equation contribute to the poor performance of MDF. Since RCM performed so well relative to the natural and MDF orderings, the
tie-breaking strategy for MDF was modified to include using the index resulting from RCM instead of
the index resulting from natural ordering. However, this resulted in a decrease in the performance of
MDF. It is believed that this is because RCM and MDF have fundamentally different objectives. RCM
is a bandwidth minimization algorithm, whereas MDF is a local discard minimization algorithm. A key
difference between MDF and RCM is that RCM traverses across the wake-cut boundary and into the
interior as early as when it selects the first four nodes. In contrast, MDF selects all of the boundary
nodes first and only then works its way into the interior.
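The bandwidth-reduction objective of RCM can be illustrated on a small model problem. The sketch below is not the thesis code: it applies SciPy's reverse Cuthill–McKee routine to a deliberately scrambled 5-point Laplacian, where the grid size and the scrambling are arbitrary choices for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the nonzero entries of a sparse matrix."""
    C = A.tocoo()
    return int(np.max(np.abs(C.row - C.col)))

# 5-point Laplacian on an n-by-n grid as a stand-in for a structured mesh.
n = 20
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()

# Scramble the node numbering to mimic an unfavourable initial ordering.
rng = np.random.default_rng(0)
p = rng.permutation(n * n)
A_bad = A[p, :][:, p]

# RCM renumbers the nodes to pull the nonzeros back toward the diagonal.
perm = reverse_cuthill_mckee(A_bad, symmetric_mode=True)
A_rcm = A_bad[perm, :][:, perm]
print(bandwidth(A_bad), bandwidth(A_rcm))
```

Note that RCM's objective is bandwidth, not discarded fill, which is precisely the difference in objectives between RCM and MDF noted above.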
6.2.3 Iterative BILU(p) preconditioning
In this investigation, the performance of BILU(p) preconditioning is compared to iterative BILU(p)
preconditioning. RCM ordering is used for this study, since it yielded the best performance in the
previous section. All continuation parameters remain unchanged and GMRES is converged by two orders
of magnitude for all cases.
Tables 6.17 and 6.18 compare the performance of the iterative BILU(p) preconditioner to the baseline
BILU(p) preconditioner for the Euler and Navier–Stokes equations. Similar to the ordering study,
performance is measured in terms of the total number of GMRES iterations, IG, the average number of
GMRES iterations per Newton iteration, IG / IN (i.e. the ratio of inner to outer iterations), and the CPU
time in seconds. One preconditioning cycle is equivalent to the baseline preconditioner.
A damping parameter is introduced for the iterative BILU(p) preconditioner in this study. For a
single iteration of BILU(p), the damped and undamped preconditioners have similar performance. However,
for multiple iterations of BILU(p) there is a noticeable difference in the performance of the preconditioner.
The damping parameters for the baseline cases, E1, E2, L1, T1 and T2, are 0.7, 0.7, 0.9, 0.6 and
1.0, respectively.
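The damped iterative preconditioner can be sketched as a stationary inner iteration, z ← z + ω M⁻¹(r − Az), wrapped as a GMRES preconditioner. The following scalar (non-block) analogue uses SciPy's ILU on a 2D Laplacian model problem; the matrix, the damping value ω = 0.9 and the cycle counts are illustrative assumptions, not the thesis's BILU(p) setup.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def iterative_ilu(A, ilu, cycles, omega):
    """Damped stationary inner iteration z <- z + omega * M^{-1} (r - A z),
    starting from z = 0, wrapped as a preconditioner for GMRES."""
    def apply(r):
        z = np.zeros_like(r)
        for _ in range(cycles):
            z = z + omega * ilu.solve(r - A @ z)
        return z
    return spla.LinearOperator(A.shape, matvec=apply, dtype=A.dtype)

def gmres_iters(A, b, M):
    """Count preconditioned GMRES iterations via the callback."""
    its = []
    spla.gmres(A, b, M=M, restart=60, maxiter=300,
               callback=lambda rk: its.append(rk))
    return len(its)

# Model problem: 2D Laplacian on an n-by-n grid.
n = 32
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(A.shape[0])

ilu = spla.spilu(A, fill_factor=1.0)   # low-fill incomplete factorization
one_cycle = gmres_iters(A, b, iterative_ilu(A, ilu, cycles=1, omega=0.9))
three_cycles = gmres_iters(A, b, iterative_ilu(A, ilu, cycles=3, omega=0.9))
print(one_cycle, three_cycles)
```

On this well-behaved model problem, the extra preconditioning cycles reduce the GMRES iteration count per outer solve, mirroring the trend reported in Tables 6.17 and 6.18.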
For the inviscid subsonic case, E1, additional iterations of BILU(3) reduce both the total number
of GMRES iterations and the ratio of inner to outer iterations. The approximate trend is that for an
increasing number of preconditioning cycles there is a reduction in the number of total iterations and
the ratio of inner to outer iterations, with diminishing returns. If a large enough number of iterations
of the preconditioner is executed, the ratio should in theory approach a value of 1. This is not evident
in practice because the iterative method in the preconditioner sometimes diverges (depending on the
stiffness of the linear problem for a given Newton iteration). This point is a topic of later discussion. In
terms of CPU time, the baseline preconditioner and 2 or 3 iterations of BILU(3) are competitive.
The results for the inviscid transonic case, E2, are similar to case E1 in terms of general trends.
The reduction in GMRES iterations and the ratio of inner to outer iterations are similar for both cases.
When comparing 3 preconditioning cycles to 1 cycle for example, there are 51% fewer GMRES iterations
for case E1 and 46% fewer iterations for case E2. Since the number of Newton iterations is coincidentally
unchanged for each case, the ratio of inner to outer iterations is also reduced by the same amount. In
terms of CPU time, the baseline preconditioner and 2 iterations of BILU(3) are competitive.
For the laminar case, L1, 2 to 5 iterations of BILU(4) preconditioning outperforms the baseline
BILU(4) preconditioner in terms of total number of GMRES iterations and the ratio of inner to outer
iterations. In terms of CPU time, the baseline preconditioner is the best choice for this case, while 5
iterations of BILU(4) yields both the fewest GMRES iterations and the lowest ratio of inner to outer
iterations.
The results for the turbulent subsonic case, T1, show that 5 iterations of BILU(4) preconditioning
reduces the number of GMRES iterations and the ratio of inner to outer iterations. However, this
decrease is small and does not offset the additional cost that is incurred to execute the additional
iterations in the preconditioning step.
Table 6.17: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid
test cases.
Case Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
E1 3 1 197 12 16.4 4.3
E1 3 2 123 12 10.2 4.6
E1 3 3 96 12 8.0 5.0
E1 3 4 80 12 6.7 5.3
E1 3 5 72 12 6.0 5.6
E2 3 1 201 16 12.6 5.0
E2 3 2 134 16 8.4 5.5
E2 3 3 109 16 6.8 6.1
E2 3 4 94 16 5.9 6.5
E2 3 5 76 16 4.8 6.6
The baseline preconditioner requires the lowest amount of CPU
time. This case is especially difficult in terms of observing an improvement compared to the baseline
BILU(4) preconditioner. It is believed that this is due to the continuation algorithm: for this case, it
uses more aggressive parameters than for the turbulent transonic case, T2. The results for case T2 are
more promising.
For the turbulent transonic case, T2, the total number of GMRES iterations and the ratio of inner to
outer iterations decrease as the number of preconditioner iterations increases, for the first 5 iterations.
The baseline BILU(4) preconditioner yields the best CPU time, while two iterations of BILU(4)
preconditioning result in a 26% reduction in both the total number of GMRES iterations and the ratio
of inner to outer iterations.
Recall that for case E1 it was noted that, in theory, if enough iterations of the preconditioner are
executed, the ratio of inner to outer iterations should approach a value of 1; in practice, however, this
is not observed. In addition, the ratio of inner to outer iterations worsens for tougher cases when a
large number of preconditioner iterations is used. Eigenvalue analysis of the iteration matrix for the
iterative BILU(p) preconditioner provides some insight. There are unstable eigenvalues (i.e. eigenvalues
of magnitude greater than 1, making the spectral radius exceed unity) in the preconditioning iteration
matrix at particular Newton iterations for many cases. It is believed that GMRES reduces the error
modes associated with these
eigenvalues. However, if too many iterations of the preconditioner are performed, then GMRES is unable
to reduce the amplified error modes associated with these eigenvalues. The introduction of the damping
parameter for this study attenuated this effect.
Table 6.18: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for laminar
and turbulent test cases.
Case Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
L1 4 1 354 13 27.2 14.5
L1 4 2 217 13 16.7 15.5
L1 4 3 161 13 12.4 16.2
L1 4 4 157 13 12.1 19.4
L1 4 5 124 13 9.5 19.2
T1 4 1 657 88 7.5 76.4
T1 4 2 498 88 5.7 85.7
T1 4 3 419 89 4.7 93.7
T1 4 4 379 89 4.3 99.3
T1 4 5 336 87 3.9 103.5
T2 4 1 522 107 4.9 83.2
T2 4 2 385 107 3.6 90.4
T2 4 3 307 106 2.9 93.9
T2 4 4 294 106 2.8 101.0
T2 4 5 283 107 2.6 108.4
Eigenvalue analyses on coarser grids for unstable iterative preconditioning cases typically reveal no more
than 20 of these unstable eigenvalues.
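This eigenvalue analysis can be mimicked on a small model problem by forming the iteration matrix G = I − ωM⁻¹A of the damped sweep explicitly and counting eigenvalues outside the unit circle. The matrix and damping values below are illustrative assumptions; on this well-behaved model problem no unstable eigenvalues appear, unlike the stiff Navier–Stokes cases discussed above.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def iteration_matrix(A, ilu, omega):
    """Dense iteration matrix G = I - omega * M^{-1} A of the damped ILU sweep,
    formed column by column (feasible only for small model problems)."""
    n = A.shape[0]
    MinvA = np.column_stack([ilu.solve(A[:, [j]].toarray().ravel())
                             for j in range(n)])
    return np.eye(n) - omega * MinvA

n = 12                                   # 12 x 12 grid -> 144 unknowns
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
ilu = spla.spilu(A, fill_factor=1.0)     # low-fill incomplete factorization

results = {}
for omega in (1.0, 0.7):
    eigs = np.linalg.eigvals(iteration_matrix(A, ilu, omega))
    rho = float(np.max(np.abs(eigs)))
    unstable = int(np.sum(np.abs(eigs) > 1.0))
    results[omega] = (rho, unstable)
    print(f"omega={omega}: spectral radius={rho:.3f}, "
          f"unstable eigenvalues={unstable}")
```

A spectral radius below one means the damped sweep converges on its own; the thesis cases are harder precisely because some Newton iterations violate this condition.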
The iterative BILU(p) preconditioner is further explored in terms of its performance with respect to
certain phases of the nonlinear algorithm. Specifically, the performance is examined when the iterative
preconditioner is only active for either the approximate Newton phase or the inexact Newton iterations.
Table 6.19 compares the performance of the iterative preconditioner for the subsonic inviscid case, E1.
The reference configuration has the iterative preconditioner active for all nonlinear iterations. For 2-cycle
preconditioning, an increase in both the total number of GMRES iterations and the ratio of inner to outer
iterations is observed. When the preconditioner is only active for the inexact Newton iterations, there are
substantially fewer GMRES iterations compared to when it is only active for the approximate Newton
iterations. However, the CPU time for the inexact Newton phase using the iterative preconditioner is
noticeably larger than the baseline BILU(p) preconditioner.
Table 6.19: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid
subsonic test case, E1.
Iterative Precon. Active Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
No (Baseline) 3 1 197 12 16.4 4.3
Always 3 2 148 12 12.3 5.2
Always 3 3 113 12 9.4 5.5
Always 3 4 97 12 8.1 6.0
Always 3 5 104 12 8.7 7.3
Approx. Newton 3 2 190 12 15.8 4.3
Approx. Newton 3 3 187 12 15.6 4.4
Approx. Newton 3 4 186 12 15.5 4.5
Approx. Newton 3 5 185 12 15.4 4.6
Inexact Newton 3 2 155 12 12.9 5.0
Inexact Newton 3 3 124 12 10.3 5.4
Inexact Newton 3 4 107 12 8.9 5.7
Inexact Newton 3 5 114 12 9.5 6.9
A similar study is presented in Table 6.20 for the turbulent transonic case, T2. For this case, it is
evident that the iterative preconditioner is effective in both the approximate and inexact Newton phases.
Specifically, when comparing two iterations of BILU(4) to the baseline BILU(4) preconditioner, there is
a 26% reduction in the total number of GMRES iterations when the iterative preconditioner is always
active. If the iterative preconditioner is only active for the approximate Newton phase, there is a 16%
reduction and if the preconditioner is only active for the inexact Newton phase, there is a comparable
11% reduction. There is a noticeable increase in CPU time when the iterative preconditioner is active
for all Newton iterations. This increase relative to the baseline BILU(p) preconditioner mostly occurs
during the inexact Newton phase.
6.2.4 BILU(p) and multigrid preconditioning
In this investigation, multigrid preconditioning is compared to both the baseline BILU(p) preconditioner
and its iterative extension. Similar to the previous section, RCM ordering is used as well as the default
continuation parameters, shown in Table 4.1. All preconditioner fill-in levels are the same for all grid
levels for each of the five test cases. Furthermore, GMRES is converged by two orders of magnitude for
all cases.
The multigrid preconditioner consists of an l-level V-cycle, where l ∈ {2, 3, 4}. The components of
the multigrid preconditioner include a BILU(p) smoothing iteration followed by a restriction of the linear
residual to the coarser mesh, a calculation of the solution error estimate on the coarser mesh using one
or more smoothing iterations, and a prolongation of this error to the finer mesh. Full-weighting and
bilinear interpolation operators are used for restriction and prolongation, respectively.
Table 6.20: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for turbulent
transonic test case, T2.
Iterative Precon. Active Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
No (Baseline) 4 1 522 107 4.9 84.0
Always 4 2 385 107 3.6 90.4
Always 4 3 307 106 2.9 93.9
Always 4 4 294 106 2.8 101.0
Always 4 5 283 107 2.6 108.4
Approx. Newton 4 2 441 107 4.1 89.6
Approx. Newton 4 3 378 106 3.6 92.2
Approx. Newton 4 4 365 105 3.5 97.5
Approx. Newton 4 5 378 106 3.6 92.2
Inexact Newton 4 2 465 107 4.3 84.0
Inexact Newton 4 3 448 107 4.2 85.5
Inexact Newton 4 4 434 107 4.1 86.0
Inexact Newton 4 5 438 108 4.1 89.2
The multigrid
preconditioner is not active for the first kstart iterations, since the linear problem is converged to its
required relative tolerance in a small number of iterations (most often less than 3). Furthermore, the
multigrid preconditioner is activated once the L2-norm of the nonlinear residual is below a predefined
tolerance. Multigrid preconditioning does not provide a significant reduction in the approximate Newton
phase since the ratio of inner to outer iterations is already quite small, thus not justifying the cost of
generating and incorporating the coarse-grid operators. Experiments show that good choices for this
tolerance for inviscid and viscous cases are 10−6 and 10−5, respectively. When multigrid preconditioning
is active, the relative tolerance is kept at the baseline value to make an objective comparison to the
other preconditioners.
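The components just described (smoothing, restriction of the linear residual, a coarse-grid error estimate, and prolongation of that error) can be sketched in a 2-level scalar analogue. Everything below is an illustrative assumption: a Laplacian model matrix, scalar ILU in place of BILU smoothing, and arbitrary grid sizes; the thesis operates on the block Navier–Stokes Jacobian.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lap2d(k):
    """Standard 5-point Laplacian on a k-by-k grid of interior nodes."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k))
    return (sp.kron(sp.eye(k), T) + sp.kron(T, sp.eye(k))).tocsc()

def restriction_1d(m, n):
    """1D full-weighting restriction; coarse node i sits over fine node 2i+1."""
    rows, cols, vals = [], [], []
    for i in range(m):
        j = 2 * i + 1
        rows += [i, i, i]
        cols += [j - 1, j, j + 1]
        vals += [0.25, 0.5, 0.25]
    return sp.csr_matrix((vals, (rows, cols)), shape=(m, n))

def gmres_iters(A, b, M):
    """Count preconditioned GMRES iterations via the callback."""
    its = []
    spla.gmres(A, b, M=M, restart=60, maxiter=300,
               callback=lambda rk: its.append(rk))
    return len(its)

m, n = 15, 31                        # coarse/fine interior nodes per direction
A_f, A_c = lap2d(n), lap2d(m)
R1 = restriction_1d(m, n)
R = sp.kron(R1, R1).tocsr()          # 2D full weighting
P = (4.0 * R.T).tocsr()              # bilinear interpolation (scaled transpose)

ilu_f = spla.spilu(A_f, fill_factor=1.0)   # fine-grid smoother
ilu_c = spla.spilu(A_c, fill_factor=1.0)   # coarse-grid smoother

def vcycle(r):
    z = ilu_f.solve(r)                     # pre-smoothing
    e_c = ilu_c.solve(R @ (r - A_f @ z))   # coarse-grid error estimate
    z = z + P @ e_c                        # prolong the correction
    return z + ilu_f.solve(r - A_f @ z)    # post-smoothing

b = np.ones(A_f.shape[0])
M_ilu = spla.LinearOperator(A_f.shape, matvec=ilu_f.solve, dtype=A_f.dtype)
M_mg = spla.LinearOperator(A_f.shape, matvec=vcycle, dtype=A_f.dtype)
it_ilu, it_mg = gmres_iters(A_f, b, M_ilu), gmres_iters(A_f, b, M_mg)
print(it_ilu, it_mg)
```

On this model problem the coarse-grid correction targets the smooth error modes that the ILU sweep handles poorly, so the 2-level cycle needs fewer GMRES iterations than a single ILU application.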
Each grid level, for both the descent and ascent phases of the l-level V-cycle preconditioner, requires
parameters specifying the number of smoothing iterations and, if necessary, a damping parameter. For
smoothing components in the descent phase of the V-cycle, the parameter ν_1^l gives the number of
smoothing iterations on grid level l; for smoothing components in the ascent phase, the parameter ν_2^l
plays the same role. The coarsest grid level has ν_1^{ngrids} iterations, where ngrids is the number of
grid levels. The introduction of a damping parameter for the relaxation, ω_l, is also found to be useful.
Table 6.21: Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning
for inviscid test cases.
Case Fill-in, p Prec. Grid Levels Prec. Coarse Grid Iters. IG IN IG / IN Time (s)
E1 3 1 - 197 12 16.4 4.3
E1 3 2 1 181 12 15.1 7.0
E1 3 2 2 178 12 14.8 7.3
E1 3 2 3 176 12 14.7 7.7
E1 3 2 10 210 12 17.5 12.3
E2 3 1 - 201 16 12.6 5.0
E2 3 2 1 193 16 12.1 8.1
E2 3 2 2 197 16 12.3 8.6
E2 3 2 3 204 16 12.8 9.2
E2 3 2 10 272 17 16.0 15.8
For the cases described in Tables 6.21 and 6.22, the value of ν_1^2 corresponds to the number of coarse
grid iterations and the damping parameter is set to unity. For the remainder of this discussion, all values
of ν_1^l, ν_2^l, and ω_l are assumed to be unity unless otherwise specified.
It is important that the continuation parameter (i.e. time step) is treated properly for the coarser
grid levels. Experiments show that the time step on each grid level should be generated from the nonlinear
residual and the state variables of that grid level.
Two-grid preconditioner performance for baseline cases
Tables 6.21 and 6.22 compare the performance of the 2-level multigrid preconditioner to the baseline
BILU(p) preconditioner for the 5 original test cases: E1, E2, L1, T1 and T2. Coarse-grid iteration counts
of 1, 2, 3 and 10 are compared for each case. An additional grid level is considered later for this suite
of cases. The quantity IG refers to the total number of GMRES iterations on the fine mesh and each
GMRES iteration corresponds to one preconditioning step using either the baseline BILU(p) or an entire
multigrid cycle as the preconditioner. The baseline preconditioner is indicated by 1 grid level.
The damping parameter for the finest grid level, ω0, is consistent with the earlier study on iterative
BILU(p) preconditioning. Specifically, for cases E1, E2, L1, T1 and T2, the damping parameter has
values of 0.7, 0.7, 0.9, 0.6 and 1.0, respectively. Preliminary investigations indicate that several values of
these fine-grid damping parameters produce near-optimal performance in the 2-level preconditioner.
However, a full optimization would amount to an enormous parametric study, further exacerbated by a
parametric investigation on additional grid levels. To simplify the parametric study of the preconditioner,
the fine-grid damping parameters are fixed. Furthermore, once the first coarser-grid damping parameter,
ω1, is optimized for each case, it is fixed for any subsequent studies involving additional grid levels.
Table 6.22: Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning
for laminar and turbulent test cases.
Case Fill-in, p Prec. Grid Levels Prec. Coarse Grid Iters. IG IN IG / IN Time (s)
L1 4 1 - 354 13 27.2 14.5
L1 4 2 1 221 13 17.0 17.5
L1 4 2 2 220 13 16.9 18.6
L1 4 2 3 215 13 16.5 19.7
L1 4 2 10 203 13 15.6 25.8
T1 4 1 - 652 88 7.4 75.3
T1 4 2 1 585 86 6.8 81.3
T1 4 2 2 593 86 6.9 83.5
T1 4 2 3 604 86 7.0 87.8
T1 4 2 10 685 86 8.0 108.6
T2 4 1 - 522 107 4.9 83.2
T2 4 2 1 498 107 4.7 89.2
T2 4 2 2 503 107 4.7 90.2
T2 4 2 3 502 107 4.7 91.6
T2 4 2 10 495 107 4.6 98.3
For the inviscid subsonic case, E1, the multigrid preconditioner with 1, 2 or 3 coarse grid iterations
yields a reduction in the total number of GMRES iterations and the ratio of inner to outer iterations. The
coarse-grid damping parameter, ω1, is 0.2. An excessive number of smoothing iterations on the coarse grid
level (e.g. 10), however, increases these performance measures. In terms of these measures,
the best performance occurs with 3 coarse-grid iterations. This particular preconditioner decreases the
number of GMRES iterations by 11%. In terms of CPU time, the baseline BILU(3) preconditioner is
the fastest.
For the inviscid transonic case, E2, a 2-level preconditioner with 1 coarse-grid iteration results in
4% fewer GMRES iterations compared to the baseline preconditioner. Additional coarse grid iterations
result in an increase in the number of GMRES iterations. A coarse-grid damping parameter of ω1 = 0.05
is used.
The laminar case, L1, has the greatest percent reduction in the number of GMRES iterations with
respect to the baseline BILU(4) preconditioner. The coarse-grid damping parameter, ω1, is set to 0.7. Of
the results presented, 10 coarse-grid iterations offers the best improvement in the iteration performance
measures of the Newton–Krylov algorithm. Specifically, the total number of GMRES iterations and the
ratio of inner to outer iterations are reduced by 43% compared to the baseline preconditioner.
For the turbulent subsonic case, T1, the number of GMRES iterations grows as the number of
coarse-grid iterations increases. One coarse grid iteration (with
ω1 = 0.1) results in the fewest GMRES iterations and the lowest ratio of inner to outer iterations.
Specifically, a 10% reduction in these quantities is observed compared to the baseline preconditioner.
The turbulent transonic case, T2, exhibits similar performance to case T1. The reduction in GMRES
iterations for the multigrid preconditioner is about 5%, with 1 and 10 coarse grid iterations yielding the
fewest GMRES iterations. The coarse-grid damping parameter, ω1, is set to 0.5.
Three-grid preconditioner performance for baseline cases
A 3-level preconditioner is also considered for the baseline cases. Since the third grid level is very coarse,
its grid metrics, which have thus far been computed by finite differences, often take on negative values.
Therefore, the metric Jacobian of the generalized curvilinear coordinate transformation is approximated
by the cell area of a node. Table 6.23 compares the performance of BILU preconditioning to 2- and
3-level multigrid preconditioning for test cases E1, E2, L1, T1 and T2. The damping parameter, ω2, is
set to 0.1, 0.02, 0.5, 0.01 and 0.3 for these test cases, respectively. The number of smoothing iterations
on each grid level is another possible parameter that can be investigated. To reduce the size of the
parametric study, one smoothing iteration on each grid level is considered.
For the inviscid subsonic case, E1, the 3-level preconditioner results in the fewest number of GMRES
iterations. Specifically, a reduction of 10% compared to the baseline BILU(3) preconditioner is observed.
For the inviscid transonic case, E2, the 3-level preconditioner results in a slight increase in the
number of GMRES iterations compared to the 2-level preconditioner.
The performance for the laminar viscous case, L1, is improved by the introduction of a 2- or 3-level
multigrid preconditioner. In comparison to the baseline BILU(4) preconditioner, the 2- and 3-level
preconditioners result in 38% and 42% fewer GMRES iterations, respectively.
Multigrid preconditioning reduces the GMRES iterations for both turbulent cases, T1 and T2.
As for case E2, the 3-level preconditioner improves on the baseline BILU(4) preconditioner but not on
the 2-level preconditioner.
A 3-level preconditioner can at least match the performance of the 2-level preconditioner for all
cases. However, for cases E2, T1 and T2 this performance is only achieved by venturing outside of
the current constraints on the damping parameters and the number of smoothing iterations on each
grid level. Recall that the damping parameter on the finest grid level, ω0, was optimized for the iterative
BILU(p) preconditioner. Its value was fixed when ω1 was optimized, and both ω0 and ω1 were fixed when
optimizing the parameter ω2.
Table 6.23: Performance of Newton–Krylov algorithm using BILU(p) and 2- or 3-level multigrid precon-
ditioning for inviscid, laminar and turbulent test cases.
Case Fill-in, p Preconditioner IG IN IG / IN Time (s)
E1 3 BILU 197 12 16.4 4.3
E1 3 2-level multigrid 181 12 15.1 7.0
E1 3 3-level multigrid 177 12 14.8 7.8
E2 3 BILU 201 16 12.6 5.0
E2 3 2-level multigrid 193 16 12.1 8.1
E2 3 3-level multigrid 199 16 12.4 9.2
L1 4 BILU 354 13 27.2 14.4
L1 4 2-level multigrid 221 13 17.0 17.5
L1 4 3-level multigrid 205 13 15.8 18.6
T1 4 BILU 658 88 7.5 75.7
T1 4 2-level multigrid 585 86 6.8 81.3
T1 4 3-level multigrid 596 86 6.9 84.0
T2 4 BILU 534 108 4.9 83.8
T2 4 2-level multigrid 498 107 4.7 89.2
T2 4 3-level multigrid 520 108 4.8 91.4
A 3-level preconditioner superior to the 2-level and baseline preconditioners could ultimately be determined
through an enormous parametric study involving all ω_l, ν_1^l and ν_2^l values simultaneously. Such a
study is not presented here. Instead, the investigation shifts to examining the performance of iterative
BILU(p) and multigrid preconditioning on finer grids, including the consideration of alternative inter-grid
operators.
Iterative and multigrid preconditioner performance on finer grids
The performance of iterative and multigrid preconditioning on finer grids is examined and concludes this
results chapter. Specifically, these preconditioners are compared to the baseline BILU(p) preconditioner
for a transonic turbulent case about the RAE 2822 airfoil, on two fine grids. Table 6.24 summarizes
these two cases. Case F0 represents a very fine grid, W0, consisting of 513 × 129 nodes, resulting in
330,885 equations and unknowns. Case F1 uses grid W1 which is derived from grid W0 by removing
every other node in both the streamwise and normal directions.
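For a structured grid stored as an array of node coordinates, deriving W1 from W0 by removing every other node is plain strided slicing. This is a sketch only; the actual grids are C-topology meshes with their own data layout.

```python
import numpy as np

# Grid W0 has 513 x 129 nodes; store (x, y) coordinates per node.
W0 = np.zeros((513, 129, 2))

# W1 keeps every other node in both the streamwise and normal directions.
W1 = W0[::2, ::2, :]
print(W1.shape)  # (257, 65, 2)
```

Grid W1 therefore has 257 × 65 nodes, one quarter as many unknowns as case F0.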
Table 6.24: Finer grid cases for Euler and Navier–Stokes calculations.
Case Finest Grid Flow Mach Number Angle of Attack Reynolds Number
F0 W0 turbulent transonic 0.729 2.31 6.5 × 10^6
F1 W1 turbulent transonic 0.729 2.31 6.5 × 10^6
Table 6.25: Performance of Newton–Krylov algorithm using BILU(p) and 2-, 3- or 4-level multigrid
preconditioning for finer-grid test cases.
Case Fill-in, p Preconditioner IG IN IG / IN Time (s)
F0 4 BILU 766 122 6.3 363.1
F0 4 2 iterations of BILU 552 120 4.6 388.3
F0 4 3 iterations of BILU 483 119 4.1 419.9
F0 4 4 iterations of BILU 413 119 3.5 440.7
F0 4 2-level multigrid 694 121 5.7 379.9
F0 4 3-level multigrid 735 122 6.0 402.4
F0 4 4-level multigrid 742 122 6.1 411.7
F1 4 BILU 627 121 5.2 84.3
F1 4 2 iterations of BILU 462 121 3.8 92.1
F1 4 3 iterations of BILU 382 121 3.2 96.3
F1 4 4 iterations of BILU 355 122 2.9 105.0
F1 4 2-level multigrid 608 121 5.0 88.9
F1 4 3-level multigrid 616 121 5.1 91.7
F1 4 4-level multigrid 619 121 5.1 93.1
Table 6.25 summarizes the results for both cases. Specifically, the performance of 1 through 4
iterations of BILU(4) and 2-, 3- and 4-level multigrid preconditioning are compared. For the multigrid
preconditioners, one smoothing iteration is performed on each grid level.
Two, three and four iterations of BILU(4) perform better than BILU(4) in terms of number of
GMRES iterations. Since the number of Newton iterations varies for each preconditioner, the inner to
outer iterations measure, IG/IN , is used to compare the various preconditioners. For the coarser case,
F1, the reductions are 27%, 38%, and 44%, for 2, 3, and 4 iterations respectively. For the finer case, F0,
the reductions are 27%, 35% and 44% for 2, 3, and 4 iterations, respectively. These relative reductions
in GMRES iterations are similar from case F1 to case F0, although the absolute number of GMRES
iterations increases from case F1 to F0; a phenomenon that GMRES exhibits for increasing system size.
The 2-level multigrid preconditioner reduces the number of GMRES iterations compared to the
baseline BILU(4) preconditioner. However, 3- and 4-level coarse-grid corrections do not improve on
the performance of the 2-level preconditioner. This behaviour is consistent for cases F0 and F1. Two
iterations of BILU(4) is a more effective preconditioner than multigrid in terms of iterations.
The multigrid preconditioning results that are presented use bilinear interpolation for both restriction
and prolongation, as shown in Figures 5.3 and 5.4. In an attempt to improve on the performance of
the multigrid preconditioner, alternative inter-grid operators were explored. Specifically, auxiliary cell
area-based restriction and prolongation operators were investigated. The approach follows the work of
Zuliani [310]. In this approach an auxiliary grid, Waux, that is finer than the finest grid is used to
compute neighbouring cell areas for a given node and ratios of these cell areas are used to compute the
inter-grid weighting factors. Multigrid preconditioning using these inter-grid operators did not improve
on the performance of the existing multigrid preconditioner.
Chapter 7
CONCLUSIONS, CONTRIBUTIONS AND RECOMMENDATIONS
The results for the convection-diffusion, Euler and Navier–Stokes calculations led to considerable insight
into how preconditioning can impact the performance of a Newton–Krylov flow solver. This chapter begins
by providing conclusions for the research into preconditioning for the convection-diffusion equation.
Conclusions for the Euler and Navier–Stokes equations are then provided. Original contributions are
then summarized and the chapter ends with recommendations for future research.
7.1 Conclusions
7.1.1 Convection-Diffusion Equation
In the first component of this research a linear problem was examined. The convection-diffusion equation
is a linear partial differential equation, whose discretization leads to a linear system of equations. Two
baseline cases were considered: diffusion-dominated and convection-dominated flow, with Peclet numbers
of 0.001 and 1000, respectively.
First, the effect of Peclet number on the number of GMRES iterations was studied. For ILU(0) and
ILU(1) preconditioning, fewer GMRES iterations were required as the Peclet number increased from
0.001 to 1000. This effect was more dramatic when ILU(1) preconditioning was used. Specifically, for
ILU(0), the diffusion-dominated case required 186 GMRES iterations, whereas the convection-dominated
case required 63 GMRES iterations. For ILU(1), the GMRES iterations were 127 and 15 for the diffusion-
and convection-dominated cases, respectively. Therefore, ILU(p) had greater potential for improvement
in the diffusion-dominated case than in the convection-dominated case.
Iterative ILU(p) preconditioning was subsequently studied, both for its own merit and in the develop-
ment of multigrid preconditioning. Iterative ILU(1) preconditioning resulted in fewer GMRES iterations.
For example, 3 iterations of ILU(1) reduced the number of GMRES iterations from 127 to 68 for the
diffusion-dominated case and from 15 to 6 for the convection-dominated case. A minimum fill-in level
of 1 is required for the convection-dominated case to improve on the baseline ILU(p) preconditioner.
Multigrid preconditioning for the diffusion-dominated case resulted in a nearly grid-independent
number of GMRES iterations. For example, a grid consisting of 17^2 nodes required 25 iterations without
multigrid preconditioning and 13 iterations with multigrid preconditioning. For a grid consisting of
257^2 nodes, GMRES required 373 iterations without multigrid preconditioning and 19 iterations with
multigrid preconditioning. Furthermore, for a grid consisting of 513^2 nodes, GMRES required only 22
iterations when preconditioned by multigrid. Multigrid preconditioning for the convection-dominated
case did not reduce the number of GMRES iterations.
Four orderings were studied: a natural ordering along each respective dimension; an ordering in the
reverse direction of the natural ordering; reverse Cuthill–McKee (RCM); and minimum discarded fill
(MDF). Results for the convection-dominated case showed that the MDF ordering required the fewest
GMRES iterations for a grid consisting of 257^2 nodes and ILU(1) preconditioning. Specifically, the
natural and reverse orderings each required 24 iterations, RCM required 15 iterations, and MDF
required 10 iterations. Multigrid preconditioning did not result in fewer GMRES iterations for any
of the orderings for the convection-dominated case.
The MDF ordering resulted in the fewest GMRES iterations for diffusion-dominated cases. This
phenomenon also occurred when multigrid preconditioning was considered. For example, with ILU(1)
preconditioning on a grid consisting of 257^2 nodes, the RCM ordering resulted in 252 GMRES iterations
and the MDF ordering resulted in 195 iterations. With multigrid preconditioning these values became
15 and 11 for RCM and MDF, respectively.
Additional studies into the MDF algorithm were conducted. A connection between MDF and up-
winding was demonstrated. Through a series of examples it was shown that MDF can lead to an ordering
that results in a system matrix analogous to one arising from an upwind discretization. This is of great
importance for incomplete factorizations because a matrix arising from an upwind discretization (i.e.
lower triangular) has, for the problems considered here, an exact ILU(0) factorization, resulting in a
preconditioner that is the inverse of the linear system matrix.
It is well known that the incomplete factorization of a lower-triangular matrix will have no discarded
fill-in. It was also discovered in this study that MDF can lead to a matrix that is not lower triangular,
yet yields a discarded fill of zero.
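A small dense sketch of ILU(0), restricted to the original sparsity pattern and tracking the magnitude of the discarded fill, makes both observations concrete: a lower-triangular matrix, as from a pure upwind discretization, factors exactly with zero discarded fill, whereas a pattern that generates fill does not. The implementation below is illustrative, not the thesis code.

```python
import numpy as np

def ilu0(A):
    """ILU(0) (IKJ variant) on a dense copy of A, restricted to the sparsity
    pattern of A. Returns the combined L and U factors in one array and the
    total magnitude of the discarded fill. Illustrative sketch only."""
    n = A.shape[0]
    pattern = A != 0.0
    F = A.astype(float).copy()
    discarded = 0.0
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            F[i, k] /= F[k, k]                 # multiplier l_ik
            for j in range(k + 1, n):
                update = F[i, k] * F[k, j]
                if pattern[i, j] or i == j:
                    F[i, j] -= update
                else:
                    discarded += abs(update)   # fill outside the pattern is dropped
    return F, discarded

# Lower-triangular matrix, as arises from a pure upwind discretization of
# convection: ILU(0) is an exact factorization, with zero discarded fill.
A_low = np.array([[2., 0., 0.],
                  [-1., 2., 0.],
                  [0., -1., 2.]])
F, dropped_low = ilu0(A_low)
L = np.tril(F, -1) + np.eye(3)
U = np.triu(F)
print(dropped_low, np.allclose(L @ U, A_low))

# A pattern with missing connections (a tiny 5-point-like stencil) does
# discard fill during the factorization.
A_fill = np.array([[4., -1., -1., 0.],
                   [-1., 4., 0., -1.],
                   [-1., 0., 4., -1.],
                   [0., -1., -1., 4.]])
_, dropped_fill = ilu0(A_fill)
print(dropped_fill)
```

The second observation above corresponds to matrices that are not lower triangular yet still report `discarded == 0` from such a procedure.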
The final investigation related to the convection-diffusion equation included the development of an
evolutionary algorithm in order to study the root-node selection and tie-breaking strategies in the MDF
algorithm. A distance-based and a novel line-distance tie-breaking strategy were compared. Results
for a 25-node grid suggest that root-node selection is not important for diffusion-dominated problems.
In contrast, for convection-dominated problems upstream nodes are excellent root-node candidates.
Furthermore, if the flow is not aligned with a particular direction, the downstream corner node is also a
good root node. The evolutionary algorithm also indicated that the line-distance tie-breaking strategy
resulted in the fewest GMRES iterations.
7.1.2 Euler and Navier–Stokes Equations
In the second component of this research, two nonlinear problems were considered: the discretized Euler
equations and the discretized, compressible Navier–Stokes equations fully coupled with the one-equation
Spalart–Allmaras turbulence model. Specifically, five baseline cases were used: an inviscid
subsonic case, E1, an inviscid transonic case, E2, a laminar subsonic case, L1, a turbulent subsonic
case, T1, and a turbulent transonic case, T2. The baseline preconditioner was BILU(3) for the inviscid
cases and BILU(4) for the viscous cases. The reverse Cuthill–McKee (RCM) ordering was the baseline
ordering. The performance measures included the total number of GMRES iterations (on the fine grid,
where relevant), the ratio of GMRES to Newton iterations (i.e. inner to outer iterations) and CPU time.
Many investigations were conducted in this research. The following studies were discussed in detail:
orderings; iterative BILU(p) preconditioning; and multigrid preconditioning. The iterative and multigrid
preconditioners were also investigated on finer grids and the inter-grid operators based on bilinear
interpolation were compared to a more advanced formulation.
Three nodal orderings were compared for the C-topology computational grid: a natural ordering
based on the lexicographical arrangement of the nodes along the streamwise and normal directions,
respectively; the reverse Cuthill–McKee (RCM) ordering; and the minimum discarded fill (MDF) ordering.
Both RCM and MDF used the downstream corner node as a root node. In contrast to the MDF ordering
used for the convection-diffusion equation, the MDF ordering for the discretized Navier–Stokes equations
(i.e. a system of PDEs) required the use of a so-called greedy reduction of the block system matrix to a
smaller matrix of scalars whose dimensions equal the total number of grid nodes.
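The reduction step can be sketched as follows. This is an illustrative reconstruction, not necessarily the thesis's exact formula: the choice of Frobenius norm for collapsing each block is an assumption. Each nb×nb block is replaced by a single scalar, producing the node-sized matrix on which a scalar MDF ordering can then operate.

```python
import numpy as np

def reduce_block_matrix(A, nb):
    """Collapse an (n*nb) x (n*nb) block matrix A into an n x n matrix of
    scalars by replacing each nb x nb block with its Frobenius norm.
    A scalar-valued MDF ordering can then be computed on the result."""
    n = A.shape[0] // nb
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = np.linalg.norm(A[i*nb:(i+1)*nb, j*nb:(j+1)*nb])
    return S
```

Any matrix norm could be substituted; the key point is that the reduced matrix preserves the block sparsity pattern, so the ordering computed on it is valid for the full block system.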
RCM required the fewest GMRES iterations for all five test cases in which the baseline
BILU(p) preconditioner was used. Hence, RCM was selected as the nodal ordering for the subsequent
studies involving iterative BILU(p) and multigrid preconditioning.
Iterative BILU(p) preconditioning was explored for the five baseline cases. Specifically, 2 through 5 it-
erations of BILU(p) were compared to the baseline BILU(p) preconditioner. The baseline preconditioner
required 197, 201, 354, 657 and 522 GMRES iterations for cases E1, E2, L1, T1 and T2, respectively.
Two iterations of BILU(p) reduced the number of GMRES iterations by 38%, 33%, 39%, 24% and 26%
for these cases, respectively. Three through five iterations of BILU(p) also reduced the number of
GMRES iterations, albeit with diminishing returns. Five iterations of BILU(p) reduced the number of
GMRES iterations by 63%, 62%, 65%, 49% and 46% for these cases, respectively. The inviscid and
laminar cases produced the most significant reductions. For each case, if enough iterations of
BILU(p) were used in the preconditioner, this decreasing trend in GMRES iterations would cease. It is
believed that this is due to the existence of unstable eigenvalues in the iteration matrix related to the
BILU(p) relaxation method. Below a given number of BILU(p) preconditioning iterations, the modes
associated with these eigenvalues do not grow because GMRES effectively reduces them.
The dramatic reduction in the number of GMRES iterations that iterative BILU(p) produces provides
motivation for its use in situations where memory is limited. Restarting the Krylov subspace reduces
the amount of memory that is required for GMRES and iterative BILU(p) enhances this reduction.
Furthermore, for cases that require a prohibitive (in terms of storage) amount of fill-in, p, iterative
BILU(p) preconditioning with a lower fill-in parameter could potentially be used.
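The idea of applying several preconditioner iterations per GMRES matrix-vector product can be sketched as follows. This is a stand-in, not the thesis's implementation: SciPy provides only a threshold-based incomplete LU (spilu), not level-of-fill BILU(p), and the iteration shown is plain Richardson without the damping, scaling and reordering considered in the thesis.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def iterative_ilu_preconditioner(A, m, fill_factor=10):
    """Return a LinearOperator applying m Richardson iterations
    preconditioned by an incomplete LU factorization of A:
        z_0 = 0,  z_{k+1} = z_k + M^{-1}(r - A z_k),  M = LU (incomplete).
    For m = 1 this reduces to the usual one-shot ILU preconditioner."""
    ilu = spla.spilu(A.tocsc(), fill_factor=fill_factor)
    def apply(r):
        z = np.zeros_like(r)
        for _ in range(m):
            z = z + ilu.solve(r - A @ z)
        return z
    return spla.LinearOperator(A.shape, matvec=apply)
```

Each additional inner iteration trades one extra matrix-vector product and triangular solve for (potentially) fewer stored Krylov vectors, which is the memory trade-off discussed above.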
The final, and most extensive, investigation of preconditioning for the Newton–Krylov algorithm
for the discretized Navier–Stokes equations concerned multigrid. Specifically, 2-, 3- and 4-level V-cycle
multigrid preconditioners were compared to the iterative BILU(p) and baseline BILU(p) preconditioners.
Experiments showed that the multigrid preconditioner would potentially be most effective beyond the
pseudo-transient continuation phase of the Newton algorithm. Within the pseudo-transient continuation
phase, where the L2-norm of the nonlinear residual is relatively large (e.g. 10⁻⁵ or 10⁻⁶) compared to
the convergence tolerance (e.g. 10⁻¹⁴), the number of GMRES iterations per Newton iteration is small,
and any reduction in these iterations that multigrid preconditioning would offer would be outweighed by
its relative cost per iteration.
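A minimal sketch of a two-level V-cycle used as a GMRES preconditioner is given below. It is illustrative only: damped Jacobi stands in for the BILU(p) smoother of the thesis, the prolongation P is assumed given (e.g. built from bilinear interpolation), and the coarse-grid operator is formed by the Galerkin product.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def two_level_preconditioner(A, P, n_smooth=1):
    """Two-level multigrid V-cycle applied as a preconditioner.
    P: prolongation (fine <- coarse); restriction is taken as P^T.
    Smoothing here is damped Jacobi (the thesis uses BILU(p))."""
    R = P.T
    Ac = (R @ A @ P).tocsc()          # Galerkin coarse-grid operator
    coarse_solve = spla.factorized(Ac)
    Dinv = 1.0 / A.diagonal()
    def vcycle(r):
        z = np.zeros_like(r)
        for _ in range(n_smooth):      # pre-smoothing
            z = z + 0.6 * Dinv * (r - A @ z)
        rc = R @ (r - A @ z)           # restrict the residual
        z = z + P @ coarse_solve(rc)   # coarse-grid correction
        for _ in range(n_smooth):      # post-smoothing
            z = z + 0.6 * Dinv * (r - A @ z)
        return z
    return spla.LinearOperator(A.shape, matvec=vcycle)
```

The relative cost per application (smoothing sweeps plus a coarse solve) versus a single BILU(p) solve is exactly the trade-off weighed in the discussion above.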
For the baseline cases E1, E2, L1, T1 and T2, 2-level multigrid preconditioning with one smoothing
iteration on the coarsest grid level resulted in 8%, 4%, 38%, 11% and 12% fewer GMRES iterations,
respectively, when compared to BILU(p). The laminar subsonic case produced the largest reduction.
Additional smoothing iterations (up to a certain limit) on the second grid level for the inviscid subsonic
and laminar subsonic cases also produced a reduction with diminishing returns. As mentioned earlier,
the presence of unstable eigenvalues in the iteration matrix of the BILU(p) smoother eventually resulted
in its instability for each case on the coarsest grid level. Three-level multigrid preconditioning did not
improve on the performance of two-level preconditioning.
In an attempt to further understand the behaviour of the iterative and multigrid preconditioners,
two additional studies were conducted. The turbulent transonic case was investigated on finer grids
and additional inter-grid operators for multigrid were implemented. Iterative BILU(p) preconditioning
produced similar relative reductions in GMRES iterations for increasing grid size, compared to BILU(p)
preconditioning. For the finest grid (W0), 2, 3 and 4 iterations of BILU(p) preconditioning reduced the
number of GMRES iterations by 27%, 35% and 44%, respectively. Two-level multigrid preconditioning
reduced the number of GMRES iterations by 9%. Three- and four-level multigrid preconditioning also
reduced the number of GMRES iterations, however, these preconditioners performed worse than the two-
level preconditioner. Additional inter-grid transfer operators were explored in an attempt to improve
on the multigrid preconditioner. Specifically, the operator developed by Zuliani [310] was implemented.
However, its performance did not improve on the baseline bilinear restriction and prolongation operators.
Based on the results obtained using the multigrid preconditioner, including the consideration of
relative cost per iteration, its use would be most effective for situations in which the transient phase
of the Newton algorithm is short. Specifically, multigrid preconditioning would be useful for studies
involving flow solves that use as an initial guess a fully converged flow solution whose parameters are
close to the current parameters (i.e. warm starts). Examples of this type of situation would include
generating lift or drag versus angle of attack plots, drag polars and the linesearching process in a
gradient-based optimization algorithm. For the latter example, function evaluations in the linesearching
algorithm correspond to flow solves.
7.2 Contributions
A broad range of preconditioners were investigated in this research. The literature review in this
dissertation offers an extensive delineation of the history of preconditioning. Specifically, topics such as
ordering algorithms and BILU(p), iterative BILU(p) and multigrid preconditioning were investigated in
tremendous detail. Below are some of the most notable contributions that were made:
* A detailed comparison of the reverse Cuthill–McKee (RCM) ordering was made with respect to
the minimum discarded fill (MDF) ordering. Various root-node selection and tie-breaking strategies
were compared for the latter. Distance and novel line-distance tie-breaking strategies were
implemented for the MDF algorithm. The MDF algorithm was also adapted to systems of PDEs
(e.g. the discretized compressible Navier–Stokes equations) by reducing the block system matrix
associated with the linearization of the discretized system to a matrix equivalent in dimension to
the number of grid nodes.
* A permutation-based evolutionary algorithm was created to determine the optimal root node for
convection- and diffusion-dominated problems. It is believed that this is the first instance of such a
study. The study clearly demonstrates that upstream boundary nodes and the downstream corner
node are effective root nodes for convection-dominated problems and any node can be a root node
for diffusion-dominated problems.
* A mathematical formulation was created for an iterative BILU(p) preconditioner including the
consideration of damping, scaling and reordering. The preconditioner was studied for both the
convection-diffusion equation and the discretized, compressible Navier–Stokes equations.
* It was found that BILU(p) as an iterative method has unstable eigenvalues in its iteration matrix.
The investigation of iterative BILU(p) preconditioning for GMRES suggests that GMRES and
BILU(p) work well together because GMRES effectively reduces the modes associated with those
unstable eigenvalues, in addition to other known reasons.
* A mathematical formulation was created for a BILU(p)-smoothed multigrid preconditioner, includ-
ing the consideration of scaling, reordering and the smoothing operator. A detailed investigation
of this preconditioner was conducted for both the convection-diffusion equation and the discretized,
compressible Navier–Stokes equations.
7.3 Recommendations
Some results in this research on their own demonstrated the effectiveness of the various preconditioners
and associated methods explored. Other results and investigations were intended to be more of a
foundation for future work. Below are some of the ideas that would potentially be of most interest to other
researchers:
* The permutation-based evolutionary algorithm used in the investigation of the minimum discarded
fill (MDF) ordering can be extended to larger problems governed by more sophisticated equations
(e.g. Navier–Stokes). Recall that the problem size is tremendous for practical grids, since it
scales as the factorial of the number of grid nodes. Additional objective functions can also be
explored. For example, instead of minimizing the discard, ||A − LU||, one can minimize other
important properties relating to the matrix or the iterative algorithm, such as the number of
GMRES iterations or a spectral quantity like ||AU^{-1}L^{-1} − I||.
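One such discard objective can be sketched as follows. SciPy's threshold-based spilu stands in for the level-of-fill ILU(p) of the thesis, so the numbers are only indicative; the SuperLU row/column permutations are reconstructed so that the comparison is against the permuted matrix itself.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def discarded_fill(A, perm, drop_tol=0.1, fill_factor=2.0):
    """Ordering-search objective: permute A symmetrically, form an
    incomplete LU, and return the discard ||A_perm - LU||_F.
    permc_spec='NATURAL' keeps SuperLU from re-ordering on its own,
    so the supplied ordering is what actually gets factored."""
    n = A.shape[0]
    Ap = A[perm, :][:, perm].tocsc()
    ilu = spla.spilu(Ap, drop_tol=drop_tol, fill_factor=fill_factor,
                     permc_spec='NATURAL', diag_pivot_thresh=0.0)
    # SuperLU factors Pr * Ap * Pc = L * U; rebuild its permutations
    Pr = sp.csc_matrix((np.ones(n), (ilu.perm_r, np.arange(n))))
    Pc = sp.csc_matrix((np.ones(n), (np.arange(n), ilu.perm_c)))
    return spla.norm(Pr.T @ (ilu.L @ ilu.U) @ Pc.T - Ap)
```

An evolutionary search over permutations would simply call this function as its fitness evaluation, favouring orderings with smaller discard.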
* A variant of the reverse Cuthill–McKee reordering algorithm can be developed using a distance
or line-distance tie-breaking in its approach. RCM is a bandwidth minimization algorithm, and
it would be interesting to investigate its response to the enforcement of geometric criteria in its
decision process.
* The investigation of the MDF algorithm was limited to C-topology meshes in this research. The
performance of the MDF algorithm can be assessed for multi-block and 3D structured grids. The
MDF-BFILU(p) preconditioner is written in the same syntax as the 3D finite-difference Navier–
Stokes flow solver, DIABLO [270]. This investigation should also include a comparison of the
tie-breaking strategies that have been developed.
* The iterative BILU(p) preconditioning algorithm can be used for more complicated simulations
(e.g. 3D turbulent flow) to exploit its memory-saving benefits (i.e. fewer required GMRES iterations
and using a lower fill-in parameter value).
* Iterative BILU(p) and multigrid preconditioning can be studied for higher-order discretizations.
Memory considerations become increasingly important for such formulations.
Appendix A
OTHER PRECONDITIONING
TECHNIQUES
In addition to the preconditioning techniques that were studied and compared in detail in this research,
the following approaches were also reviewed: domain decomposition and sparse approximate inverses.
Section A.1 describes the general aspects of domain decomposition. Section A.2 gives a brief introduction
to sparse approximate inverse preconditioning.
A.1 Domain Decomposition
The basic idea behind domain decomposition is to break a large problem into smaller problems. This is
accomplished by subdividing the problem domain into smaller domains. Domain decomposition works
well for systems that arise from model PDEs and is well-suited for parallel applications. The domain
decomposition method is essentially a reordering strategy with special solution techniques.
The theory is taken from Saad [38] and Christara [295]. Consider the discretized physical domain,
Ω. The domain is subdivided into smaller domains Ωi, where i = 1, . . . , s, with interfaces Γjk between
adjacent subdomains. Now consider a linear system, Ap = q, that arises from the discretization of
a partial differential equation on Ω. For the purposes of this analysis, a linear system is considered,
although the linear system may also arise from an iteration of a nonlinear solution method such as
Newton’s method. The system is reordered into blocks

    [ B  E ] [ x ]   [ f ]
    [ F  C ] [ y ] = [ g ]        (A.1)
where the matrix B is a block-diagonal matrix of s square block matrices, Bi, each relating to the
interior of the ith subdomain. Hence, the vector x contains the unknowns in the subdomains. The vector
y contains the interface unknowns and the square matrix C describes their interaction. The matrix E
describes the subdomain-to-interface coupling and the matrix F describes the interface-to-subdomain
coupling.
There are three types of methods that can be used to solve the system (A.1) for (x y)T . Here, the
Schur complement method is outlined. Other aspects that are important include optimal subdomain
partitioning, overlap selection, and whether to solve subproblems exactly or iteratively.
Consider again the system (A.1). Each block equation is
Bx + Ey = f    (A.2)
Fx + Cy = g    (A.3)
From (A.2), x is isolated as

x = B^{-1}(f − Ey)    (A.4)

which, when substituted into (A.3), yields

(C − FB^{-1}E)y = g − FB^{-1}f    (A.5)

Once (A.5) is solved for y, y can be substituted into (A.4) to find x.
The matrix
S ≡ C − FB^{-1}E    (A.6)
is the Schur complement of the matrix associated with the interface variable, y. Alternative names for the
Schur complement include the capacitance matrix and the Gauss transform. Since B is a block-diagonal
matrix, its inversion reduces to s inversions of its sub-blocks, Bi.

A full-matrix method solves the entire system (A.1) using intelligent preconditioners.
Schwarz methods perform successive solution passes over the blocks of B. If the interface values are
updated on-the-fly, the method is referred to as multiplicative Schwarz; if the interface values are
updated only after an entire pass over B, it is referred to as additive Schwarz. The latter is analogous
to a block Jacobi method, the former to a block Gauss–Seidel method.
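The distinction can be sketched for the non-overlapping case as follows. This is a simplified dense-matrix illustration; overlap, parallel execution and inexact subdomain solves are all omitted.

```python
import numpy as np

def schwarz_sweep(A, b, x, blocks, multiplicative=True):
    """One pass of non-overlapping block relaxation over the subdomains.
    multiplicative=True: each local correction sees the freshly updated
    values (block Gauss-Seidel); False: all corrections are computed
    from the incoming x (block Jacobi, i.e. additive Schwarz)."""
    x_new = x.copy()
    for idx in blocks:                           # idx: one subdomain's indices
        ref = x_new if multiplicative else x
        r_local = b[idx] - (A @ ref)[idx]        # local residual
        x_new[idx] = x_new[idx] + np.linalg.solve(A[np.ix_(idx, idx)], r_local)
    return x_new
```

In the additive variant the per-subdomain solves are independent within a sweep, which is what makes it attractive for parallel implementations despite its slower convergence.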
A benefit to solving (A.5) instead of the whole system is that the number of interface nodes is
typically a lot smaller than the total number of nodes and hence the system is much smaller. However,
the trade-off is that many linear solutions are required in the formation of the Schur complement. If the
explicit formation of S can be avoided, then the computational storage and time can be reduced.
The Schur complement method refers to solving the reduced system (A.5) iteratively. For example,
the preconditioned GMRES Krylov subspace method can be used. Since GMRES only requires the
product of S with some vector, v, the explicit storage of S can be avoided entirely. To compute the
Krylov subspace direction w = Sv, one first computes

v′ = Ev    (A.7)

Next,

Bz = v′    (A.8)

is solved for z. Finally, w is formed as

w = Cv − Fz (= Sv)    (A.9)

to complete the computation.
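Steps (A.7)–(A.9) translate directly into a matrix-free operator. The sketch below assumes SciPy and sparse sub-blocks; in practice the factorization of the block-diagonal B would be done subdomain by subdomain rather than as one sparse factorization.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def schur_operator(B, E, F, C):
    """Matrix-free application of the Schur complement S = C - F B^{-1} E,
    following (A.7)-(A.9): v' = Ev, solve Bz = v', w = Cv - Fz.
    S itself is never formed or stored."""
    solve_B = spla.factorized(sp.csc_matrix(B))
    def matvec(v):
        vp = E @ v             # (A.7)
        z = solve_B(vp)        # (A.8)
        return C @ v - F @ z   # (A.9)
    return spla.LinearOperator(C.shape, matvec=matvec)
```

The resulting LinearOperator can be handed straight to GMRES to solve (A.5) for the interface unknowns y.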
The challenge when using the Schur complement method is in finding a suitable preconditioner for
the Krylov subspace method that is used to solve the linear system (A.5). An obvious choice is to use
the matrix S itself as the preconditioner. This is referred to as induced preconditioning. An alternative
to the exact formation of the Schur complement as a preconditioner, is to use an incomplete factorization
of S. This is an attractive alternative since a sparse, parallel Gaussian elimination algorithm can be
exploited. Another form of preconditioning can be based on probing. Probing stems from approximating
sparse Jacobians for nonlinear equations [38]. A major drawback to probing is that there are increased
losses in accuracy due to roundoff error in the various subiterations that may significantly degrade the
performance of GMRES.
In the literature, there are many examples on the use of domain decomposition in the construction of
preconditioners. A recent example is the work by Hicken and Zingg [166]. In their paper, they presented
a parallel Newton–Krylov solver for the 3D Euler equations. Both additive Schwarz and approximate
Schur preconditioners were explored.
A.2 Sparse Approximate Inverse Preconditioning
The ILU preconditioning approach is to find a matrix M such that M^{-1} is a good approximation
to A^{-1}. Rather than attempting to form M and then apply M^{-1}, the sparse approximate inverse
preconditioning technique attempts to model A^{-1} directly.
Benzi [46] and Saad [38] describe various approaches for finding a sparse approximate inverse, P,
to the matrix A. It involves minimizing the Frobenius norm¹ of the residual matrix, I − AP. This

¹The Frobenius norm of a matrix A is given by ||A||_F = (∑_j ∑_i |a_ij|^2)^{1/2}. An alternative form is ||A||_F = (tr(AA^H))^{1/2}.
minimization problem is given by

min_P F(P)    (A.10)

where

F(P) = ||I − AP||_F^2 = ∑_{j=1}^{n} ||e_j − Ap_j||_2^2    (A.11)

and e_j and p_j are the jth columns of I and P, respectively. This is referred to as a global iteration.
Alternatively, one can minimize each individual function in the summation as

min_{p_j} ||e_j − Ap_j||_2^2  ∀ j = 1, …, n    (A.12)
This is often referred to as a column-oriented approach, which favours a parallel implementation.
In either case, a sparsity pattern constraint is used to limit the amount of
fill in the approximate preconditioner [190]. Saad [38] describes some simple methods for performing
the constrained minimization. They include a minimal residual (MR) algorithm and a steepest descent
method. Typically, the minimization problem is solved inexactly (i.e. to some prescribed tolerance of
the objective function).
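A minimal dense sketch of the column-oriented approach is given below. The pattern argument (a hypothetical per-column list of allowed nonzero rows) plays the role of the sparsity constraint; production SPAI codes solve each subproblem on small extracted submatrices rather than the full A, and typically inexactly, as noted above.

```python
import numpy as np

def spai_columns(A, pattern):
    """Column-oriented sparse approximate inverse: for each j, minimize
    ||e_j - A p_j||_2 with the nonzeros of p_j restricted to pattern[j],
    as in (A.12). Each small least-squares problem is independent of the
    others, which is what makes the approach parallelizable."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for j in range(n):
        J = np.asarray(pattern[j])          # allowed nonzero rows of p_j
        e = np.zeros(n)
        e[j] = 1.0
        coeffs, *_ = np.linalg.lstsq(A[:, J], e, rcond=None)
        P[J, j] = coeffs
    return P
```

With the full pattern this reproduces A^{-1} exactly; tightening the pattern trades accuracy of the approximate inverse for sparsity.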
A sparse approximate inverse preconditioner may become a good alternative or complement to the
widely used ILU preconditioner. Its column-oriented formulation lends itself to an efficient parallel
implementation. However, current research shows that it is still very expensive to form a sparse
approximate inverse preconditioner in comparison to ILU, although it may be more robust for some
cases. For the Newton–Krylov approach used in this thesis, the sparse approximate inverse computation
would be an expensive burden in the transient solution phase.
REFERENCES
[1] Lomax, H., Pulliam, T. H., and Zingg, D. W., Fundamentals of Computational Fluid Dynamics,Springer–Verlag, Berlin, Germany, 2001.
[2] Nemec, M., Zingg, D. W., and Pulliam, T. H., “Multipoint and multi-objective aerodynamic shapeoptimization,” AIAA Journal , Vol. 42, No. 6, 2004, pp. 1057–1065.
[3] Pulliam, T. H., Nemec, M., Holst, T., and Zingg, D. W., “Comparison of evolutionary (genetic)algorithm and adjoint methods for multi-objective viscous airfoil optimizations,” The 41st AIAAAerospace Sciences Meeting and Exhibit , No. AIAA–2003–0298, January 2003.
[4] Chisholm, T. T., A fully coupled Newton–Krylov solver with a one-equation turbulence model ,Ph.D. thesis, University of Toronto, 2007.
[5] Pueyo, A., An efficient Newton–Krylov method for the Euler and Navier–Stokes equations, Ph.D.thesis, University of Toronto, December 1997.
[6] Geuzaine, P., An implicit upwind finite volume method for compressible turbulent flows on un-structured grids, Ph.D. thesis, Universite de Liege, April 1999.
[7] Saad, Y. and Schultz, M. H., “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM Journal on Scientific and Statistical Computing, Vol. 7, 1986, pp. 856–869.
[8] Pueyo, A. and Zingg, D. W., “An efficient Newton–GMRES solver for aerodynamic computations,”AIAA Paper 97-1955, 1997.
[9] Saad, Y. and van der Vorst, H. A., “Iterative solution of linear systems in the 20th century,”Journal of Computational and Applied Mathematics, Vol. 123, 2000, pp. 1–33.
[10] Simoncini, V. and Szyld, D. B., “Recent computational developments in Krylov subspace methodsfor linear systems,” Numerical Linear Algebra with Applications, Vol. 14, 2007, pp. 1–59.
[11] van der Vorst, H., Iterative Krylov methods for large linear systems, Cambridge University Press, 1st ed., 2003.
[12] Hestenes, M. and Stiefel, E., “Methods of conjugate gradients for solving linear systems,” J. Res.Nat. Bur. Stand., Vol. 49, 1952, pp. 409–436.
[13] Meijerink, J. A. and van der Vorst, H. A., “An iterative solution method for linear systems ofwhich the coefficient matrix is a symmetric M-matrix,” Mathematics of Computation, Vol. 31, No.137, January 1977, pp. 148–162.
[14] Fletcher, R., “Conjugate gradient methods for indefinite systems,” Proceedings of the DundeeBiennal Conference on Numerical Analysis 1974 , edited by G. Watson, Springer Verlag, NewYork, 1975, pp. 73–89.
[15] Sonneveld, P., “CGS, a fast Lanczos-type solver for nonsymmetric linear systems,” SIAM J. Sci.Stat. Comput., Vol. 10, January 1989, pp. 36–52.
[16] van der Vorst, H. A., “BI-CGSTAB: a fast and smoothly converging variant of BI-CG for thesolution of nonsymmetric linear systems,” SIAM Journal on Scientific and Statistical Computing ,Vol. 13, No. 2, March 1992, pp. 631–644.
[17] de Sturler, E., “Truncation strategies for optimal Krylov subspace methods,” SIAM Journal onNumerical Analysis, Vol. 36, No. 3, 1999, pp. 864–889.
[18] Abe, K. and Sleijpen, G. L. G., “BiCR variants of the hybrid BiCG methods for solving linear sys-tems with nonsymmetric matrices,” Journal of Computational and Applied Mathematics, Vol. 234,June 2010, pp. 985–994.
[19] Lanczos, C., “An iteration method for the solution of the eigenvalue problem of linear differentialand integral operators,” J. Res. Nat. Bur. Stand., Vol. 45, 1950, pp. 255–282.
[20] Arnoldi, W., “The principle of minimized iterations in the solution of the matrix eigenvalue prob-lem,” Quarterly of Applied Mathematics, Vol. 9, 1951, pp. 17–29.
[21] Lanczos, C., “Solution of systems of linear equations by minimized iterations,” Journal of Researchof the National Bureau of Standards, Vol. 49, 1952, pp. 33–53.
[22] Paige, C. and Saunders, M., “Solution of sparse indefinite systems of linear equations,” SIAMJournal on Numerical Analysis, Vol. 12, 1975, pp. 617–629.
[23] Concus, P. and Golub, G., “A generalized conjugate gradient method for nonsymmetric systems oflinear equations,” Computer methods in Applied Sciences and Engineering, Second InternationalSymposium, edited by R. Glowinski and J. Lions, Springer Verlag, New York, December 1976, pp.56–65.
[24] Vinsome, P., “ORTHOMIN: an iterative method for solving sparse sets of simultaneous linearequations,” Proceedings of the Fourth Symposium of Reservoir Simulation, Society of PetroleumEngineers of AIME, 1976, pp. 149–159.
[25] Widlund, O., “A Lanczos method for a class of non-symmetric systems of linear equations,” SIAMJ. Numer. Anal., Vol. 15, 1978, pp. 801–802.
[26] Jea, K. and Young, D., “Generalized conjugate-gradient acceleration of nonsymmetrizable iterativemethods,” Linear Algebra and its Applications, Vol. 34, 1980, pp. 159–194.
[27] Saad, Y., “Krylov subspace methods for solving large unsymmetric linear systems,” Mathematicsof Computation, Vol. 37, 1981, pp. 105–126.
[28] Paige, C. C. and Saunders, M. A., “LSQR: an algorithm for sparse linear equations and sparseleast squares,” ACM Transactions on Mathematical Software, Vol. 8, March 1982, pp. 43–71.
[29] Eisenstat, S. C., Elman, H. C., and Schultz, M. H., “Variational iterative methods for nonsym-metric systems of linear equations,” SIAM Journal on Numerical Analysis, Vol. 20, No. 2, April1983, pp. 345–357.
[30] Freund, R. W. and Nachtigal, N. M., “QMR: a quasi-minimal residual method for non-Hermitian linear systems,” Numer. Math., Vol. 60, 1991, pp. 315–339.
[31] Gutknecht, M. H., “Variants of BICGSTAB for matrices with complex spectrum,” SIAM J. Sci.Comput., Vol. 14, September 1993, pp. 1020–1033.
[32] Sleijpen, G. L. G. and Fokkema, D. R., “BiCGStab(l) for linear equations involving unsymmetric matrices with complex spectrum,” Electronic Transactions on Numerical Analysis, Vol. 1, 1993, pp. 11–32.
[33] Freund, R. W. and Nachtigal, N. M., “An implementation of the QMR method based on coupledtwo-term recurrences,” SIAM Journal on Scientific Computing , Vol. 15, March 1994, pp. 313–337.
[34] Weiss, R., “Error-minimizing Krylov subspace methods,” SIAM Journal on Scientific Computing ,Vol. 15, May 1994, pp. 511–527.
[35] Chan, T. F., Gallopoulos, E., Simoncini, V., Szeto, T., and Tong, C. H., “A quasi-minimal residualvariant of the Bi-CGSTAB algorithm for nonsymmetric systems,” SIAM Journal on ScientificComputing , Vol. 15, March 1994, pp. 338–347.
[36] Kasenally, E. M., “GMBACK: a generalized minimum backward error algorithm for nonsymmetriclinear systems,” SIAM J. Sci. Comput., Vol. 16, May 1995, pp. 698–719.
[37] Fokkema, D. R., Sleijpen, G. L. G., and van der Vorst, H. A., “Generalized conjugate gradientsquared,” Journal of Computational and Applied Mathematics, Vol. 71, July 1996, pp. 125–146.
[38] Saad, Y., Iterative Methods for Sparse Linear Systems, PWS Publishing Company, 1996.
[39] Baker, A. H., Jessup, E. R., and Manteuffel, T., “A technique for accelerating the convergenceof restarted GMRES,” SIAM Journal on Matrix Analysis and Applications, Vol. 26, No. 4, 2005,pp. 962–984.
[40] Morgan, R. B., “Restarted block-GMRES with deflation of eigenvalues,” Applied Numerical Math-ematics, Vol. 54, July 2005, pp. 222–236.
[41] Simoncini, V. and Gallopoulos, E., “An iterative method for nonsymmetric systems with multipleright-hand sides,” SIAM Journal on Scientific Computing , Vol. 16, No. 4, 1995, pp. 917–933.
[42] Simoncini, V. and Gallopoulos, E., “A hybrid block GMRES method for nonsymmetric systems with multiple right hand sides,” Journal of Computational and Applied Mathematics, Vol. 66, 1996.
[43] Kilmer, M., Miller, E., and Rappaport, C., “QMR-based projection techniques for the solutionof non-Hermitian systems with multiple right-hand sides,” SIAM J. Sci. Comput., Vol. 23, No. 3,2001, pp. 761–780.
[44] Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H., Templates for the solution of linear systems: building blocks for iterative methods, 2nd edition, SIAM, Philadelphia, Pennsylvania, 1994.
[45] Trottenberg, U., Oosterlee, C., and Schuller, A., Multigrid , chap. An introduction to algebraicmultigrid by K. Stuben, Academic Press, 2001, pp. 413–532.
[46] Benzi, M., “Preconditioning techniques for large linear systems: a survey,” Journal of Computa-tional Physics, Vol. 182, No. 2, 2002, pp. 418–477.
[47] Briggs, W. L., Henson, V. E., and McCormick, S. F., A multigrid tutorial (2nd ed.), Society forIndustrial and Applied Mathematics, Philadelphia, Pennsylvania, 2000.
[48] Wesseling, P., “Introduction to Multigrid Methods,” ICASE Report No. 95-11, February 1995.
[49] Wagner, C., “Introduction to algebraic multigrid,” Course Notes of an Algebraic Multigrid Courseat the University of Heidelberg.
[50] Stuben, K., “A review of algebraic multigrid,” Tech. Rep. REP-AiS-1999-69, German NationalResearch Centre for Information Technology, November 1999.
[51] Stuben, K., “Algebraic multigrid (AMG): an introduction with applications,” Tech. Rep. REP-AiS-1999-70, German National Research Centre for Information Technology, November 1999.
[52] Lomax, H., Pulliam, T. H., and Zingg, D. W., Fundamentals of computational fluid dynamics,chap. Multigrid, Springer-Verlag, 2001, pp. 177–187.
[53] Yavneh, I., “Why multigrid methods are so efficient,” Computing in Science and Engineering ,Vol. 8, November 2006, pp. 12–22.
[54] Southwell, R., “Stress calculation in frameworks by the method of systematic relaxation of con-straints,” Proc. Roy. Soc. London, Vol. 151, 1935, pp. 56–95.
[55] Fedorenko, R., “Finite difference scheme for the Stefan problem,” Zhurnal Vychislitel’noi Matem-atiki i Matematicheskoi Fiziki , Vol. 15, 1961, pp. 1339–1344.
[56] Fedorenko, R., “A relaxation method for solving elliptic difference equations,” USSR Computa-tional Mathematics and Mathematical Physics, Vol. 1, No. 4, 1962, pp. 1092–1096.
[57] Fedorenko, R., “The speed of convergence of one iterative process,” USSR Computational Mathe-matics and Mathematical Physics, Vol. 4, No. 3, 1964, pp. 227–235.
[58] Bakhvalov, N., “On the convergence of a relaxation method with natural constraints on the ellipticoperator,” USSR Computational Mathematics and Mathematical Physics, Vol. 6, No. 5, 1966,pp. 101–135.
[59] Brandt, A., “A multi-level adaptive technique MLAT for fast numerical solution to boundary valueproblems,” Proc. 3rd Int’l Conf. Numerical Methods in Fluid Mechanics, edited by H. Cabannesand R. Temam, Springer, Paris, 1973, pp. 82–89, Lecture Notes in Physics 18.
[60] Brandt, A., “Algebraic multigrid theory: the symmetric case,” Preliminary Proceedings for theInternational Multigrid Conference, Copper Mountain, Colorado, April 1983.
[61] Brandt, A., McCormick, S., and Ruge, J., “Algebraic multigrid (AMG) for automated multigridsolutions with applications to geodetic computations,” Tech. rep., Inst. for Computational Studies,Fort Collins, Colorado, October 1982.
[62] Ruge, J. and Stuben, K., “Algebraic multigrid,” Frontiers in Applied Mathematics, Vol. 5, chap.Multigrid Methods, SIAM Press, Philadelphia, McCormick ed., 1987, pp. 73–130.
[63] Jameson, A., “Solution of the Euler equations for two dimensional transonic flow by a multigridmethod,” Applied Mathematics and Computation, Vol. 13, 1983, pp. 327–356.
[64] Jameson, A., “Multigrid solutions of the Euler equations using implicit schemes,” AIAA Journal ,Vol. 24, No. 11, 1986, pp. 1737–1743.
[65] Martinelli, L., Jameson, A., and Grasso, F., “A multigrid method for the Navier–Stokes equations,” 1986.
[66] Mavriplis, D., “Multigrid Strategies for Viscous Flow Solvers on Anisotropic UnstructuredMeshes,” Journal of Computational Physics, Vol. 145, No. 1, 1998, pp. 141 – 165.
[67] Mavriplis, D. J., “Multigrid approaches to non-linear diffusion problems on unstructured meshes,”Numerical Linear Algebra with Applications, Vol. 8, 2001, pp. 499–512.
[68] Mavriplis, D. J., “An assessment of linear versus nonlinear multigrid methods for unstructured solvers,” Journal of Computational Physics, Vol. 175, 2002, pp. 302–325.
[69] Moinier, P. and Giles, M. B., “Preconditioned Euler and Navier–Stokes calculations on unstruc-tured grids,” 6th Conference on Numerical Methods for Fluid Dynamics, ICFD, Oxford, UK, 1998.
[70] Zeng, S. and Wesseling, P., “Multigrid solution of the incompressible Navier–Stokes equations ingeneral coordinates,” SIAM Journal on Numerical Analysis, Vol. 31, No. 6, 1994, pp. 1764–1784.
[71] Allmaras, S. R., “Multigrid for the 2-D compressible Navier–Stokes equations,” AIAA Paper 99-3336, 1999.
[72] Weiss, J. M., Maruszewski, J. P., and Smith, W. A., “Implicit solution of preconditioned Navier–Stokes equations using algebraic multigrid,” AIAA Journal , Vol. 37, 1999, pp. 29–36.
[73] Griebel, M., Neunhoeffer, T., and Regler, H., “Algebraic multigrid methods for the solution of theNavier–Stokes equations in complicated geometries,” International Journal for Numerical Methodsin Fluids, Vol. 26, 1998, pp. 281–301.
[74] Ollivier-Gooch, C., “Multigrid acceleration of an upwind Euler solver on unstructured meshes,”AIAA Journal , Vol. 33, No. 10, October 1995, pp. 1822–1827.
[75] Morano, E., Mavriplis, D., and Venkatakrishnan, V., “Coarsening strategies for unstructured multi-grid techniques with application to anisotropic problems,” ICASE Report No. 95-34, May 1995.
[76] Thomas, J. L., Diskin, B., and Brandt, A., “Textbook multigrid efficiency for the incompressibleNavier–Stokes equations: high Reynolds number wakes and boundary layers,” Computers andFluids, Vol. 30, 2001, pp. 853–874.
[77] Bordner, J. and Saied, F., “MGLab: An interactive multigrid environment,” Seventh CopperMountain Conference on Multigrid Methods, edited by N. D. Melson, T. A. Manteuffel, S. F.McCormick, and C. C. Douglas, Vol. CP 3339, NASA, Hampton, Virginia, 1996, pp. 57–71.
[78] Lassaline, J. V., A Navier–Stokes equation solver using agglomerated multigrid featuring directionalcoarsening and line implicit smoothing , Ph.D. thesis, University of Toronto, 2003.
[79] Lassaline, J. V. and Zingg, D. W., “Development of an agglomeration multigrid algorithm withdirectional coarsening,” AIAA Paper 99-3338, 1999.
[80] Manzano, L.M., Implementation of multigrid for aerodynamic computations on multi-block grids,Master’s thesis, University of Toronto, January 1999.
REFERENCES 131
[81] Chisholm, T., Multigrid acceleration of an approximately-factored algorithm for steady aerodynamic flows, Master’s thesis, University of Toronto, January 1997.
[82] Luksch, P., “Algebraic multigrid,” 2002, www.bode.cs.tum.edu/Par/appls/apps/amg.html.
[83] Raw, M., “A coupled algebraic multigrid method for the 3D Navier–Stokes equations,” Proceedings of the 10th GAMM-Seminar, Notes on numerical fluid mechanics, Vol. 49, Vieweg-Verlag, Wiesbaden, 1995.
[84] Raw, M., “Robustness of coupled algebraic multigrid for the Navier–Stokes equations,” AIAA Paper 96-0297, Reno, Nevada, January 1996.
[85] Cleary, A. J., Falgout, R. D., Henson, V. E., Jones, J. E., Manteuffel, T. A., McCormick, S. F., Miranda, G. N., and Ruge, J. W., “Robustness and scalability of algebraic multigrid,” SIAM Journal on Scientific Computing, Vol. 21, No. 5, 2000, pp. 1886–1908.
[86] Brezina, M., Cleary, A. J., Falgout, R. D., Henson, V. E., Jones, J. E., Manteuffel, T. A., McCormick, S. F., and Ruge, J. W., “Algebraic multigrid based on element interpolation (AMGe),” SIAM Journal on Scientific Computing, Vol. 22, 2000, pp. 1570–1592.
[87] Chartier, T. P., Element-based algebraic multigrid (AMGe) and spectral AMGe, Ph.D. thesis, University of Colorado, 2001.
[88] Haase, G., Kuhn, M., and Reitzinger, S., “Parallel algebraic multigrid methods on distributed memory computers,” SIAM Journal on Scientific Computing, Vol. 24, No. 2, 2002, pp. 410–427.
[89] Axelsson, O. and Vassilevski, P. S., “A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning,” SIAM Journal on Matrix Analysis and Applications, Vol. 12, August 1991, pp. 625–644.
[90] Saad, Y., “A flexible inner-outer preconditioned GMRES algorithm,” SIAM Journal on Scientific Computing, Vol. 14, 1993, pp. 461–469.
[91] van der Vorst, H. A. and Vuik, C., “GMRESR: a family of nested GMRES methods,” Numerical Linear Algebra with Applications, Vol. 1, 1994, pp. 369–386.
[92] Szyld, D. B. and Vogel, J. A., “FQMR: a flexible quasi-minimal residual method with inexact preconditioning,” SIAM Journal on Scientific Computing, Vol. 23, February 2001, pp. 363–380.
[93] Vogel, J. A., “Flexible BiCG and flexible Bi-CGSTAB for nonsymmetric linear systems,” Applied Mathematics and Computation, Vol. 188, No. 1, 2007, pp. 226–233.
[94] Hicken, J. E. and Zingg, D. W., “A simplified and flexible variant of GCROT for solving nonsymmetric linear systems,” SIAM Journal on Scientific Computing, Vol. 32, No. 3, 2010, pp. 1672–1694.
[95] Buleev, N. I., “A numerical method for solving two-dimensional diffusion equations,” Atomic Energy, Vol. 6, 1960, pp. 222–224, doi:10.1007/BF01481461.
[96] Varga, R., “Factorization and normalized iterative methods,” Boundary problems in differential equations, edited by R. Langer, University of Wisconsin Press, Madison, 1960, pp. 121–142.
[97] Oliphant, T., “An implicit, numerical method for solving two-dimensional time-dependent diffusion problems,” Quarterly of Applied Mathematics, Vol. 19, 1961, pp. 221–229.
[98] Oliphant, T., “An extrapolation process for solving linear systems,” Quarterly of Applied Mathematics, Vol. 20, 1962, pp. 257–267.
[99] Dupont, T., Kendall, R., and Rachford, H., “An approximate factorization procedure for solving self-adjoint elliptic difference equations,” SIAM Journal on Numerical Analysis, Vol. 5, 1968, pp. 559–573.
[100] Manteuffel, T., “An incomplete factorization technique for positive definite linear systems,” Mathematics of Computation, Vol. 34, No. 150, April 1980, pp. 473–497.
[101] Eisenstat, S., “Efficient implementation of a class of preconditioned conjugate gradient methods,” SIAM Journal on Scientific and Statistical Computing, Vol. 2, 1981, pp. 1–4.
[102] Elman, H. C., “A stability analysis of incomplete LU factorizations,” Mathematics of Computation, Vol. 47, No. 175, July 1986, pp. 191–217.
[103] Bruaset, A. M., Tveito, A., and Winther, R., “On the stability of relaxed incomplete LU factorizations,” Mathematics of Computation, Vol. 54, No. 190, April 1990, pp. 701–719.
[104] Chow, E. and Saad, Y., “Experimental study of ILU preconditioners for indefinite matrices,” Journal of Computational and Applied Mathematics, Vol. 86, No. 2, 1997, pp. 387–414.
[105] Gopaul, A., Sunhaloo, M., Boojhawon, R., and Bhuruth, M., “Analysis of incomplete factorizations for a nine-point approximation to a convection-diffusion model problem,” Journal of Computational and Applied Mathematics, Vol. 224, 2009, pp. 719–733.
[106] Gustafsson, I., “A class of first-order factorization methods,” BIT, Vol. 18, 1978, pp. 142–156.
[107] Watts III, J., “A conjugate gradient truncated direct method for the iterative solution of the reservoir simulation pressure equation,” Society of Petroleum Engineers Journal, Vol. 21, 1981, pp. 345–353.
[108] Meijerink, J. A. and van der Vorst, H. A., “Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems,” Journal of Computational Physics, Vol. 44, 1981, pp. 134–155.
[109] Chapman, A., Saad, Y., and Wigton, L., “High-order ILU preconditioners for CFD problems,” International Journal for Numerical Methods in Fluids, Vol. 33, 2000, pp. 767–788.
[110] Zlatev, Z., “Use of iterative refinement in the solution of sparse linear systems,” SIAM Journal on Numerical Analysis, Vol. 19, 1982, pp. 381–399.
[111] Young, D. P., Melvin, R. G., Johnson, F. T., Bussoletti, J. E., Wigton, L. B., and Samant, S. S., “Application of sparse matrix solvers as effective preconditioners,” SIAM Journal on Scientific and Statistical Computing, Vol. 10, November 1989, pp. 1186–1199.
[112] Gallivan, K., Sameh, A., and Zlatev, Z., “A parallel hybrid sparse linear system solver,” Computing Systems in Engineering, Vol. 1, No. 2-4, 1990, pp. 183–195.
[113] D’Azevedo, E. F., Forsyth, P. A., and Tang, W.-P., “Ordering methods for preconditioned conjugate gradient methods applied to unstructured grid problems,” SIAM Journal on Matrix Analysis and Applications, Vol. 13, July 1992, pp. 944–961.
[114] D’Azevedo, E. F., Forsyth, P. A., and Tang, W.-P., “Towards a cost-effective ILU preconditioner with high-level fill,” BIT, Vol. 32, October 1992, pp. 442–463.
[115] Saad, Y., “ILUT: A dual threshold incomplete ILU factorization,” Numerical Linear Algebra with Applications, Vol. 1, 1994, pp. 387–402.
[116] Jones, M. and Plassman, P., “An improved Cholesky factorization,” ACM Transactions on Mathematical Software, Vol. 21, No. 5, 1995.
[117] van der Vorst, H. A., “Iterative solution methods for certain sparse linear systems with a non-symmetric matrix arising from PDE-problems,” Journal of Computational Physics, Vol. 44, No. 1, 1981, pp. 1–19.
[118] Axelsson, O. and Lindskog, G., “On the eigenvalue distribution of a class of preconditioning methods,” Numerische Mathematik, Vol. 48, 1986, pp. 479–498.
[119] Elman, H. C., “Relaxed and stabilized incomplete factorizations for non-self-adjoint linear systems,” BIT, Vol. 29, 1989, pp. 890–915.
[120] Wittum, G. and Liebau, F., “On truncated incomplete decompositions,” BIT, Vol. 29, 1989, pp. 719–740.
[121] van der Vorst, H. A., “The convergence behaviour of preconditioned CG and CG-S in the presence of rounding errors,” Preconditioned Conjugate Gradient Methods, edited by O. Axelsson and L. Y. Kolotilina, Nijmegen 1989, 1990, Lecture Notes in Mathematics 1457.
[122] Underwood, R., “An approximate factorization procedure based on the block Cholesky decomposition and its use with the conjugate gradient method,” Tech. Rep. NEDO-11386, General Electric Co., Nuclear Energy Div., San Jose, California, 1976.
[123] Concus, P., Golub, G., and Meurant, G., “Block preconditioning for the conjugate gradient method,” SIAM Journal on Scientific and Statistical Computing, Vol. 6, 1985, pp. 220–252.
[124] Concus, P. and Meurant, G., “On computing INV block preconditioning for the conjugate gradient method,” BIT, Vol. 26, December 1986, pp. 493–504.
[125] Axelsson, O., “A general incomplete block-matrix factorization method,” Linear Algebra and its Applications, Vol. 74, 1986, pp. 179–190.
[126] Magolu, M., “Modified-block-approximate factorization strategies,” Numerische Mathematik, Vol. 61, 1992, pp. 91–110.
[127] Yun, J. H., “Block ILU preconditioners for a nonsymmetric block-tridiagonal M-matrix,” BIT Numerical Mathematics, Vol. 40, 2000, pp. 583–605, doi:10.1023/A:1022328131952.
[128] Orkwis, P. D., “Comparison of Newton’s and quasi-Newton’s method solvers for the Navier–Stokes equations,” AIAA Journal, Vol. 31, No. 5, 1993, pp. 832–836.
[129] Duff, I. S. and Ucar, B., “Combinatorial problems in solving linear systems,” Invited presentation at Dagstuhl Seminar on Combinatorial Scientific Computing delivered by Iain S. Duff, February 2009.
[130] Markowitz, H. M., “The elimination form of the inverse and its application to linear programming,” Management Science, Vol. 3, No. 3, April 1957, pp. 255–269.
[131] Tinney, W. F. and Walker, J. W., “Direct solutions of sparse network equations by optimally ordered triangular factorization,” Proceedings of the IEEE, Vol. 55, No. 11, 1967, pp. 1801–1809.
[132] Rosen, R., “Matrix bandwidth minimization,” Proceedings of the 1968 23rd ACM national conference, ACM ’68, ACM, New York, New York, 1968, pp. 585–595.
[133] Cuthill, E. and McKee, J., “Reducing the bandwidth of sparse symmetric matrices,” 24th National Conference of the Association for Computing Machinery, No. ACM P-69, Brandon Press, New York, 1969.
[134] George, A., Computer implementation of the finite element method, Ph.D. thesis, Stanford University, 1971.
[135] George, A., “Nested dissection of a regular finite-element mesh,” SIAM Journal on Numerical Analysis, Vol. 10, 1973, pp. 345–363.
[136] Gibbs, N., Poole, Jr, W., and Stockmeyer, P., “An algorithm for reducing the bandwidth and profile of a sparse matrix,” SIAM Journal on Numerical Analysis, Vol. 13, 1976, pp. 236–250.
[137] Sloan, S., “An algorithm for profile and wavefront reduction of sparse matrices,” International Journal for Numerical Methods in Engineering, Vol. 23, 1986, pp. 239–251.
[138] Baumann, M., Fleischmann, P., and Mutzbauer, O., “Double ordering and fill-in for the LU factorization,” SIAM Journal on Matrix Analysis and Applications, Vol. 25, No. 3, 2003, pp. 630–641.
[139] Hassan, O., Morgan, K., and Peraire, J., “An implicit finite element method for high speed flows,” AIAA Paper 90-0402, Reno, Nevada, January 1990.
[140] Dutto, L. C., “The effect of ordering on preconditioned GMRES algorithm, for solving the compressible Navier–Stokes equations,” International Journal for Numerical Methods in Engineering, Vol. 36, 1993, pp. 457–497.
[141] Duff, I. and Meurant, G., “The effect of ordering on preconditioned conjugate gradients,” BIT, Vol. 29, No. 4, 1989, pp. 635–657.
[142] Hendrickson, B. and Rothberg, E., Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, Minneapolis, Minnesota, SIAM, 1997.
[143] Hénon, P., Ramet, P., and Roman, J., “On finding approximate supernodes for an efficient block-ILU(k) factorization,” Parallel Computing, Vol. 34, 2008, pp. 345–362.
[144] Clift, S. and Tang, W.-P., “Weighted graph-based ordering techniques for preconditioned conjugate gradient methods,” BIT, Vol. 35, No. 30, 1995.
[145] Persson, P.-O. and Peraire, J., “Newton–GMRES preconditioning for discontinuous Galerkin discretizations of the Navier–Stokes equations,” SIAM Journal on Scientific Computing, Vol. 30, No. 6, 2008, pp. 2709–2733.
[146] Liu, W. and Sherman, A. H., “Comparative analysis of the Cuthill–McKee and the reverse Cuthill–McKee ordering algorithms for sparse matrices,” SIAM Journal on Numerical Analysis, Vol. 13, No. 2, 1976, pp. 198–213.
[147] Benzi, M., Szyld, D. B., and Duin, A. V., “Orderings for incomplete factorization preconditioning of nonsymmetric problems,” SIAM Journal on Scientific Computing, Vol. 20, No. 5, 1999, pp. 1652–1670.
[148] Pollul, B. and Reusken, A., “Numbering techniques for preconditioners in iterative solvers for compressible flows,” International Journal for Numerical Methods in Fluids, Vol. 55, 2007, pp. 241–261.
[149] Chisholm, T. T. and Zingg, D. W., “A Jacobian-free Newton–Krylov algorithm for compressible turbulent fluid flows,” Journal of Computational Physics, Vol. 228, No. 9, 2009, pp. 3490–3507.
[150] Bondarabady, H. A. R. and Kaveh, A., “Nodal ordering using graph theory and a genetic algorithm,” Finite Elements in Analysis and Design, Vol. 40, June 2004, pp. 1271–1280.
[151] Dubois, P., Greenbaum, A., and Rodrigue, G., “Approximating the inverse of a matrix for use in iterative algorithms on vector processors,” Computing, Vol. 22, 1979, pp. 257–268, doi:10.1007/BF02243566.
[152] van der Vorst, H. A., “A vectorizable variant of some ICCG methods,” SIAM Journal on Scientific and Statistical Computing, Vol. 3, No. 3, September 1982, pp. 350–356.
[153] van der Vorst, H. A., “High performance preconditioning,” SIAM Journal on Scientific and Statistical Computing, Vol. 10, No. 6, November 1989, pp. 1174–1185.
[154] Anderson, E. C. and Saad, Y., “Solving sparse triangular systems on parallel computers,” International Journal of High Speed Computing, Vol. 1, 1989, pp. 73–96.
[155] Elman, H. C. and Golub, G. H., “Line iterative methods for cyclically reduced discrete convection-diffusion problems,” SIAM Journal on Scientific and Statistical Computing, Vol. 13, January 1992, pp. 339–363.
[156] Adams, L., LeVeque, R., and Young, D., “Analysis of the SOR iteration for the 9-point Laplacian,” SIAM Journal on Numerical Analysis, Vol. 25, 1988, pp. 1156–1180.
[157] Hysom, D. and Pothen, A., “Parallel ILU ordering and convergence relationships: numerical experiments,” Tech. Rep. CR-2000-210119, NASA, May 2000.
[158] Hysom, D. and Pothen, A., “Efficient parallel computation of ILU(k) preconditioners,” Tech. Rep. CR-2000-210120, NASA, May 2000.
[159] Schwarz, H. A., Gesammelte Mathematische Abhandlungen, Vol. 2, Springer, Berlin, 1890, pp. 133–143.
[160] Miller, K., “Numerical analogs to the Schwarz alternating procedure,” Numerische Mathematik, Vol. 7, 1965, pp. 91–103.
[161] Mandel, J., “Two-level domain decomposition preconditioning for the p-version finite element method in three dimensions,” Fourth Copper Mountain Conference on Multigrid Methods, Copper Mountain, Colorado, April 1989.
[162] Knoll, D. A., McHugh, P. R., and Keyes, D. E., “Newton–Krylov methods for low-Mach-number compressible combustion,” AIAA Journal, Vol. 34, No. 5, May 1996, pp. 961–967.
[163] Fischer, P. F., Miller, N. I., and Tufo, H. M., “An overlapping Schwarz method for spectral element simulation of three-dimensional incompressible flows,” IMA Domain Decomposition Workshop Proceedings, 1997.
[164] Saad, Y., Sosonkina, M., and Zhang, J., “Domain decomposition and multi-level type techniques for general sparse linear systems,” Tech. Rep. umsi-97-244, Minnesota Supercomputer Institute, University of Minnesota, 1997.
[165] Gropp, W. D., Keyes, D. E., McInnes, L. C., and Tidriri, M. D., “Globalized Newton–Krylov–Schwarz algorithms and software for parallel implicit CFD,” The International Journal of High Performance Computing Applications, Vol. 14, No. 2, 2000, pp. 102–136.
[166] Hicken, J. E. and Zingg, D. W., “A parallel Newton–Krylov solver for the Euler equations discretized using simultaneous approximation terms,” AIAA Journal, Vol. 46, No. 11, November 2008, pp. 2773–2786.
[167] Benson, M., Iterative solution of large scale linear systems, Master’s thesis, Lakehead University, 1973.
[168] Benson, M. and Frederickson, P., “Iterative solution of large sparse linear systems arising in certain multidimensional approximation problems,” Utilitas Mathematica, Vol. 22, 1982, pp. 127–140.
[169] Benzi, M., Meyer, C. D., and Tuma, M., “A sparse approximate inverse preconditioner for the conjugate gradient method,” SIAM Journal on Scientific Computing, Vol. 17, No. 5, 1996, pp. 1135–1149.
[170] Kolotilina, L. Y. and Yeremin, A. Y., “On a family of two-level preconditionings of the incomplete block factorization type,” Soviet Journal of Numerical Analysis and Mathematical Modeling, Vol. 1, 1993, pp. 293–320.
[171] Kolotilina, L. Y. and Yeremin, A. Y., “Factorized sparse approximate inverse preconditionings I: theory,” SIAM Journal on Matrix Analysis and Applications, Vol. 14, January 1993, pp. 45–58.
[172] Grote, M. and Simon, H., “Parallel preconditioning and approximate inverses on the connection machine,” Parallel processing for scientific computing, edited by R. Sincovec, D. E. Keyes, L. R. Petzold, and D. A. Reed, Vol. 2, SIAM, 1992, pp. 519–523.
[173] Cosgrove, J. D. F., Approximate inverses as parallel preconditionings, Ph.D. thesis, University of Oklahoma, 1992.
[174] Cosgrove, J., Diaz, J., and Griewank, A., “Approximate inverse preconditioning for sparse linear systems,” International Journal of Computer Mathematics, Vol. 44, 1992, pp. 91–110.
[175] Chow, E. and Saad, Y., “Approximate inverse preconditioners for general sparse matrices,” Tech. Rep. umsi-94-101, Minnesota Supercomputer Institute, University of Minnesota, 1994.
[176] Huckle, T. and Grote, M. J., “A new approach to parallel preconditioning with sparse approximate inverses,” Tech. Rep. SCCM-94-03, SCCM Program, Stanford University, September 1994.
[177] Grote, M. J. and Huckle, T., “Parallel preconditioning with sparse approximate inverses,” SIAM Journal on Scientific Computing, Vol. 18, May 1997, pp. 838–853.
[178] Chow, E. and Saad, Y., “Approximate inverse techniques for block-partitioned matrices,” Tech. Rep. umsi-95-13, Minnesota Supercomputer Institute, University of Minnesota, 1995.
[179] Barnard, S. T. and Grote, M. J., “A block version of the SPAI preconditioner,” 9th SIAM Conference on Parallel Processing for Scientific Computing, March 1999.
[180] Chow, E. and Saad, Y., “Approximate inverse preconditioners via sparse-sparse iterations,” SIAM Journal on Scientific Computing, Vol. 19, 1998, pp. 995–1023.
[181] Sosonkina, M., “Sparse approximate inverses in preconditioning distributed linear systems,” Tech. Rep. TR-97-11, Department of Computer Science, Virginia Polytechnic Institute and State University, 1997.
[182] Huckle, T., “Factorized sparse approximate inverses for preconditioning,” Journal of Supercomputing, Vol. 25, June 2003, pp. 109–117.
[183] Alleon, G., Benzi, M., and Giraud, L., “Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics,” Tech. Rep. TR/PA/97/05, CERFACS, 1997.
[184] Carpentieri, B., Duff, I. S., Giraud, L., and Monga Made, M. M., “Sparse symmetric preconditioners for dense linear systems in electromagnetism,” Tech. Rep. TR/PA/01/35, CERFACS, 2001.
[185] Guillaume, P., Saad, Y., and Sosonkina, M., “Rational approximation preconditioners for general sparse linear systems,” Tech. Rep. umsi-99-209, Minnesota Supercomputer Institute, University of Minnesota, 1999.
[186] Chow, E., “A priori sparsity patterns for parallel sparse approximate inverse preconditioners,” SIAM Journal on Scientific Computing, Vol. 21, No. 5, 2000, pp. 1804–1822.
[187] Tang, W. P. and Wan, W. L., “Sparse approximate inverse smoother for multigrid,” SIAM Journal on Matrix Analysis and Applications, Vol. 21, No. 4, 2000, pp. 1236–1252.
[188] Broker, O., Grote, M. J., Mayer, C., and Reusken, A., “Robust parallel smoothing for multigrid via sparse approximate inverses,” SIAM Journal on Scientific Computing, Vol. 23, No. 4, 2001, pp. 1396–1417.
[189] Bollhoefer, M. and Saad, Y., “On the relations between ILUs and factored approximate inverses,” SIAM Journal on Matrix Analysis and Applications, Vol. 24, 2002, pp. 219–237.
[190] Huckle, T., Kallischko, A., Roy, A., Sedlacek, M., and Weinzierl, T., “An efficient parallel implementation of the MSPAI preconditioner,” Parallel Computing, Vol. 36, 2010, pp. 273–284.
[191] Axelsson, O. and Vassilevski, P. S., “Algebraic multilevel preconditioning methods, I,” Numerische Mathematik, Vol. 56, 1989, pp. 157–177.
[192] Axelsson, O. and Vassilevski, P. S., “Algebraic multilevel preconditioning methods, II,” SIAM Journal on Numerical Analysis, Vol. 27, November 1990, pp. 1569–1590.
[193] van der Ploeg, A., Botta, E. F. F., and Wubs, F. W., “Nested grids ILU-decomposition (NGILU),” Journal of Computational and Applied Mathematics, Vol. 66, January 1996, pp. 515–526.
[194] Botta, E. F. F. and Wubs, F. W., “Matrix renumbering ILU: an effective algebraic multilevel ILU preconditioner for sparse matrices,” SIAM Journal on Matrix Analysis and Applications, Vol. 20, No. 4, 1999, pp. 1007–1026.
[195] Saad, Y., “ILUM: A multi-elimination ILU preconditioner for general sparse matrices,” SIAM Journal on Scientific Computing, Vol. 17, No. 4, 1996, pp. 830–847.
[196] Vassilevski, P., “A block-factorization (algebraic) formulation of multigrid and Schwarz methods,” East-West Journal of Numerical Mathematics, Vol. 6, 1998, pp. 65–79.
[197] Bank, R. and Wagner, C., “Multilevel ILU decomposition,” Numerische Mathematik, Vol. 82, 1999, pp. 543–576.
[198] Saad, Y. and Zhang, J., “BILUM: Block versions of multielimination and multilevel ILU preconditioner for general sparse linear systems,” SIAM Journal on Scientific Computing, Vol. 20, No. 6, 1999, pp. 2103–2121.
[199] Saad, Y. and Zhang, J., “BILUTM: A domain-based multilevel block ILUT preconditioner for general sparse matrices,” SIAM Journal on Scientific Computing, Vol. 21, No. 1, 1999, pp. 279–299.
[200] Saad, Y. and Zhang, J., “Enhanced multi-level block ILU preconditioning strategies for general sparse linear systems,” Journal of Computational and Applied Mathematics, Vol. 130, 2001, pp. 99–118.
[201] Saad, Y. and Suchomel, B., “ARMS: An algebraic recursive multilevel solver for general sparse linear systems,” Numerical Linear Algebra with Applications, Vol. 9, 2001, pp. 359–378.
[202] Saad, Y., Soulaimani, A., and Touihri, R., “Adapting algebraic recursive multilevel solvers (ARMS) for solving CFD problems,” Tech. Rep. umsi-2002-105, Minnesota Supercomputer Institute, University of Minnesota, 2002.
[203] Shen, C. and Zhang, J., “Parallel two level block ILU preconditioning techniques for solving large sparse linear systems,” Parallel Computing, Vol. 28, 2002, pp. 1451–1475.
[204] Shen, C., Zhang, J., and Wang, K., “Distributed block independent set algorithms and parallel multilevel ILU preconditioners,” Journal of Parallel and Distributed Computing, Vol. 65, No. 3, 2005, pp. 331–346.
[205] Gu, T.-X., Chi, X.-B., and Liu, X.-P., “AINV and BILUM preconditioning techniques,” Applied Mathematics and Mechanics (English Edition), Vol. 25, No. 9, 2004, pp. 1012–1021.
[206] Saad, Y., “Multilevel ILU with reorderings for diagonal dominance,” SIAM Journal on Scientific Computing, Vol. 27, No. 3, 2005, pp. 1032–1057.
[207] Mayer, J., “A multilevel Crout ILU preconditioner with pivoting and row permutation,” Numerical Linear Algebra with Applications, Vol. 14, 2007, pp. 771–789.
[208] Bollhoefer, M. and Saad, Y., “Multilevel preconditioners constructed from inverse-based ILUs,” SIAM Journal on Scientific Computing, Vol. 27, No. 5, 2006, pp. 1627–1650.
[209] Notay, Y., “Using approximate inverses in multilevel methods,” Numerische Mathematik, Vol. 80, 1998, pp. 397–417.
[210] Bollhoefer, M. and Mehrmann, V., “Algebraic multilevel methods and sparse approximate inverses,” SIAM Journal on Matrix Analysis and Applications, Vol. 24, No. 1, 2002, pp. 191–218.
[211] Meurant, G., “A multilevel AINV preconditioner,” Numerical Algorithms, Vol. 29, 2002, pp. 107–129.
[212] Axelsson, O. and Vassilevski, P. S., “A survey of multilevel preconditioned iterative methods,” BIT, Vol. 29, No. 4, 1989, pp. 769–793.
[213] Oosterlee, C. W. and Washio, T., “An evaluation of parallel multigrid as a solver and a preconditioner for singularly perturbed problems,” SIAM Journal on Scientific Computing, Vol. 19, No. 1, 1998, pp. 87–110.
[214] Braess, D., “Towards algebraic multigrid for elliptic problems of second order,” Computing, 1995, pp. 379–393.
[215] Hager, J. O. and Lee, K. D., “Effects of implicit preconditioners on solution acceleration schemes in CFD,” International Journal for Numerical Methods in Fluids, Vol. 22, No. 10, 1996, pp. 1023–1035.
[216] Oliveira, S. and Deng, Y., “Preconditioned Krylov subspace methods for transport equations,” Progress in Nuclear Energy, Vol. 33, No. 1/2, 1998, pp. 155–174.
[217] Oosterlee, C. W. and Washio, T., “On the use of multigrid as a preconditioner,” Ninth International Conference on Domain Decomposition Methods, No. 52, Bergen, Norway, 1998, pp. 441–448.
[218] Washio, T. and Oosterlee, C. W., “Krylov subspace acceleration for nonlinear multigrid schemes,” Electronic Transactions on Numerical Analysis, Vol. 6, December 1997, pp. 271–290.
[219] Oosterlee, C. W. and Washio, T., “Krylov subspace acceleration of nonlinear multigrid with application to recirculating flows,” SIAM Journal on Scientific Computing, Vol. 21, No. 5, 2000, pp. 1670–1690.
[220] Wienands, R., Oosterlee, C. W., and Washio, T., “Fourier analysis of GMRES(m) preconditioned by multigrid,” SIAM Journal on Scientific Computing, Vol. 22, No. 2, 2000, pp. 582–603.
[221] Tuminaro, R., Tong, C., Shadid, J., Devine, K., and Day, D., “On a multilevel preconditioning module for unstructured mesh Krylov solvers: Two-level Schwarz,” Communications in Numerical Methods in Engineering, Vol. 18, 2002, pp. 363–389.
[222] Wang, Q. and Joshi, Y., “Algebraic multigrid preconditioned Krylov subspace methods for fluid flow and heat transfer on unstructured meshes,” Numerical Heat Transfer, Part B, Vol. 49, 2006, pp. 197–221.
[223] Pennacchio, M. and Simoncini, V., “Algebraic multigrid preconditioners for the bidomain reaction–diffusion system,” Applied Numerical Mathematics, Vol. 59, December 2009, pp. 3033–3050.
[224] Wigton, L., Yu, N., and Young, D., “GMRES acceleration of computational fluid dynamics codes,” AIAA Paper 85-1494, July 1985.
[225] Venkatakrishnan, V., “Newton solution of inviscid and viscous problems,” AIAA Journal, Vol. 27, No. 7, 1989, pp. 885–891.
[226] Johan, Z., Hughes, T., and Shakib, F., “A globally convergent matrix-free algorithm for implicit time-marching schemes arising in finite element analysis in fluids,” Computer Methods in Applied Mechanics and Engineering, Vol. 87, 1991, pp. 281–304.
[227] Ajmani, K., Preconditioned conjugate gradient methods for the Navier–Stokes equations, Ph.D. thesis, Virginia Polytechnic Institute and State University, 1991.
[228] Ajmani, K., Ng, W., and Liou, M., “Preconditioned conjugate gradient methods for the Navier–Stokes equations,” Journal of Computational Physics, Vol. 110, 1994, pp. 68–81.
[229] Venkatakrishnan, V. and Mavriplis, D. J., “Implicit solvers for unstructured meshes,” Journal of Computational Physics, Vol. 105, 1993, pp. 83–91.
[230] McHugh, P. R. and Knoll, D. A., “Comparison of standard and matrix-free implementations of several Newton–Krylov solvers,” AIAA Journal, Vol. 32, No. 12, 1994, pp. 2394–2400.
[231] Barth, T. J. and Linton, S. W., “An unstructured mesh Newton solver for compressible fluid flow and its parallel implementation,” AIAA Paper 95-0221, 1995.
[232] Nielsen, E. J., Anderson, W. K., Walters, R. W., and Keyes, D. E., “Application of Newton–Krylov methodology to a three-dimensional unstructured Euler code,” AIAA Paper 95-1733, 1995.
[233] Anderson, W. K., Rausch, R. D., and Bonhaus, D. L., “Implicit/multigrid algorithms for incompressible turbulent flows on unstructured grids,” AIAA Paper 95-1740, 1995.
[234] Anderson, W. K., Rausch, R. D., and Bonhaus, D. L., “Implicit/multigrid algorithms for incompressible turbulent flows on unstructured grids,” Journal of Computational Physics, Vol. 128, 1996, pp. 391–408.
[235] Dawson, C. N., Klie, H., Wheeler, M. F., and Woodward, C. S., “A parallel, implicit, cell-centered method for two-phase flow with a preconditioned Newton–Krylov solver,” Computational Geosciences, Vol. 1, 1997, pp. 215–249.
[236] Wille, S. O., “Adaptive linearization and grid iterations with the tri-tree multigrid refinement-recoarsement algorithm for the Navier–Stokes equations,” International Journal for Numerical Methods in Fluids, Vol. 24, 1997, pp. 155–168.
[237] Blanco, M. and Zingg, D. W., “Fast Newton–Krylov method for unstructured grids,” AIAA Journal, Vol. 36, No. 4, 1998, pp. 607–612.
[238] Pueyo, A. and Zingg, D. W., “Efficient Newton–Krylov solver for aerodynamic computations,” AIAA Journal, Vol. 36, No. 11, 1998, pp. 1991–1997.
[239] Geuzaine, P., Lepot, I., Meers, F., and Essers, J.-A., “Multilevel Newton–Krylov algorithms for computing compressible flows on unstructured meshes,” AIAA 14th Computational Fluid Dynamics Conference, No. 99-3341, Norfolk, Virginia, June 1999, pp. 750–760.
[240] Geuzaine, P., “Newton–Krylov strategy for compressible turbulent flows on unstructured meshes,” AIAA Journal, Vol. 39, No. 3, 2000, pp. 528–531.
[241] Gropp, W., Keyes, D., McInnes, L. C., and Tidriri, M. D., “Globalized Newton–Krylov–Schwarz algorithms and software for parallel implicit CFD,” International Journal of High Performance Computing Applications, Vol. 14, May 2000, pp. 102–136.
[242] Chisholm, T. T. and Zingg, D. W., “A fully coupled Newton–Krylov solver for turbulent aerodynamic flows,” ICAS 2002 Congress, No. 333, 2002.
[243] Chisholm, T. T. and Zingg, D. W., “A Newton–Krylov algorithm for turbulent aerodynamic flows,” AIAA Paper 2003-0071, Reno, Nevada, January 2003.
[244] Chisholm, T. T. and Zingg, D. W., “Start-up issues in a Newton–Krylov algorithm for turbulent aerodynamic flows,” AIAA Paper 2003-3708, Orlando, Florida, June 2003.
[245] Zingg, D. W. and Chisholm, T. T., “Jacobian-free Newton–Krylov methods: issues and solutions,” Proceedings of the Fourth International Conference on Computational Fluid Dynamics, Ghent, Belgium, July 2006.
[246] Nemec, M. and Zingg, D. W., “Towards efficient aerodynamic shape optimization based on the Navier–Stokes equations,” AIAA Paper 2001-2532, June 2001.
[247] Nemec, M. and Zingg, D. W., “Newton–Krylov algorithm for aerodynamic design using the Navier–Stokes equations,” AIAA Journal, Vol. 40, No. 6, June 2002, pp. 1146–1154.
[248] Nemec, M., Zingg, D. W., and Pulliam, T. H., “Multi-point and multi-objective aerodynamic shape optimization,” AIAA Paper 2002-5548, September 2002.
[249] Nemec, M. and Zingg, D. W., “Optimization of high-lift configurations using a Newton–Krylov algorithm,” AIAA Paper 2003-3957, Orlando, Florida, June 2003.
[250] Nemec, M., Aftosmis, M. J., Murman, S. M., and Pulliam, T. H., “Adjoint formulation for an embedded-boundary Cartesian method,” 43rd AIAA Aerospace Sciences Meeting and Exhibit, No. AIAA-2005-0877, Reno, Nevada, 2005, NAS Technical Report NAS-05-008.
[251] Nemec, M. and Aftosmis, M. J., “Aerodynamic shape optimization using a Cartesian adjoint method and CAD geometry,” AIAA Paper 2006-3456, San Francisco, California, June 2006.
[252] Nemec, M., Optimal shape design of aerodynamic configurations: A Newton–Krylov approach, Ph.D. thesis, University of Toronto, 2003.
[253] Gatsis, J., A fully-coupled algorithm for aerodynamic design optimization, Master’s thesis, University of Toronto, 2001.
[254] Gatsis, J. and Zingg, D. W., “A fully-coupled Newton–Krylov algorithm for aerodynamic design optimization,” AIAA Paper 2003-3956, Orlando, Florida, June 2003.
[255] Olawsky, F., Infed, F., and Auweter-Kurtz, M., “Preconditioned Newton method for computing supersonic and hypersonic nonequilibrium flows,” AIAA Paper 2003-3072, 2003.
[256] Harrison, R. J., “Krylov subspace accelerated inexact Newton method for linear and nonlinear equations,” Journal of Computational Chemistry, Vol. 25, No. 3, February 2004, pp. 328–334.
[257] Vandekerckhove, C., Kevrekidis, I., and Roose, D., “An efficient Newton–Krylov implementation of the constrained runs scheme for initializing on a slow manifold,” Journal of Scientific Computing, Vol. 39, May 2009, pp. 167–188.
[258] Nichols, J. C., A three-dimensional multi-block Newton–Krylov flow solver for the Euler equations, Master’s thesis, University of Toronto, 2004.
[259] Nichols, J. and Zingg, D. W., “A three-dimensional multi-block Newton–Krylov flow solver for the Euler equations,” AIAA Paper 2005-5230, Toronto, Canada, June 2005.
[260] Groth, C. and Northrup, S., “Parallel implicit adaptive mesh refinement scheme for body-fitted multi-block mesh,” AIAA Paper 2005-5333, Toronto, Canada, June 2005.
[261] Bellavia, S. and Berrone, S., “Globalization strategies for Newton–Krylov methods for stabilized FEM discretization of Navier–Stokes equations,” Journal of Computational Physics, Vol. 226, October 2007, pp. 2317–2340.
[262] Nejat, A. and Ollivier-Gooch, C., “A high-order accurate unstructured GMRES algorithm for inviscid compressible flows,” AIAA Paper 2005-5341, Toronto, Canada, June 2005.
[263] Michalak, K. and Ollivier-Gooch, C., “Matrix-explicit GMRES for a higher-order accurate inviscid compressible flow solver,” AIAA Paper 2007-3943, Miami, Florida, June 2007.
[264] Nejat, A. and Ollivier-Gooch, C., “A high-order accurate unstructured finite volume Newton–Krylov algorithm for inviscid compressible flows,” Journal of Computational Physics, Vol. 227, No. 4, 2008, pp. 2582–2609.
[265] Nejat, A. and Ollivier-Gooch, C., “Effect of discretization order on preconditioning and convergence of a high-order unstructured Newton–GMRES solver for the Euler equations,” Journal of Computational Physics, Vol. 227, February 2008, pp. 2366–2386.
[266] Michalak, C. and Ollivier-Gooch, C., “Globalized matrix-explicit Newton–GMRES for the high-order accurate solution of the Euler equations,” Computers and Fluids, Vol. 39, No. 7, August 2010, pp. 1156–1167.
[267] Hicken, J. E. and Zingg, D. W., “Aerodynamic optimization algorithm with integrated geometry parameterization and mesh movement,” AIAA Journal, Vol. 48, No. 2, February 2010, pp. 400–413.
[268] Northrup, S. and Groth, C., “Parallel implicit AMR scheme for unsteady reactive flows,” 18th Annual Conference of the CFD Society of Canada, London, Canada, May 2010.
[269] Osusky, M., Hicken, J. E., and Zingg, D. W., “A parallel Newton–Krylov–Schur flow solver for the Navier–Stokes equations using the SBP–SAT approach,” AIAA Paper 2010-116, Orlando, Florida, January 2010.
[270] Osusky, M. and Zingg, D. W., “A parallel Newton–Krylov–Schur flow solver for the Reynolds-averaged Navier–Stokes equations,” AIAA Paper 2012-442, 2012.
[271] Lucas, P., van Zuijlen, A. H., and Bijl, H., “Fast unsteady flow computations with a Jacobian-free Newton–Krylov algorithm,” Journal of Computational Physics, Vol. 229, December 2010, pp. 9201–9215.
[272] Brieger, L. and Lecca, G., “Parallel multigrid preconditioning of the conjugate gradient method for systems of subsurface hydrology,” Journal of Computational Physics, Vol. 142, 1998, pp. 148–162.
[273] Piquet, J. and Vasseur, X., “Multigrid preconditioned Krylov subspace methods for three-dimensional numerical solutions of the incompressible Navier–Stokes equations,” Numerical Algorithms, Vol. 17, 1998, pp. 1–32.
[274] Rider, W. J., Knoll, D. A., and Olson, G. L., “A multigrid Newton–Krylov method for multimaterial equilibrium radiation diffusion,” Journal of Computational Physics, Vol. 152, 1999, pp. 164–191.
[275] Mousseau, V. A., Knoll, D. A., and Rider, W. J., “Physics-based preconditioning and the Newton–Krylov method for non-equilibrium radiation diffusion,” Journal of Computational Physics, Vol. 160, 2000, pp. 743–765.
[276] Knoll, D. A. and Mousseau, V. A., “On Newton–Krylov multigrid methods for the incompressible Navier–Stokes equations,” Journal of Computational Physics, Vol. 163, No. 1, 2000, pp. 262–267.
[277] Knoll, D. A. and Rider, W. J., “A multigrid preconditioned Newton–Krylov method,” SIAM Journal on Scientific Computing, Vol. 21, No. 2, 1999, pp. 691–710.
[278] Jones, J. E. and Woodward, C. S., “Newton–Krylov-multigrid solvers for large-scale, highly heterogeneous, variably saturated flow problems,” Advances in Water Resources, Vol. 24, 2001, pp. 763–774.
[279] Pernice, M. and Tocci, M. D., “A multigrid-preconditioned Newton–Krylov method for the incompressible Navier–Stokes equations,” SIAM Journal on Scientific Computing, Vol. 23, No. 2, 2001, pp. 398–418.
[280] Wu, J., Srinivasan, V., Xu, J., and Wang, C. Y., “Newton–Krylov multigrid algorithms for battery simulation,” Journal of the Electrochemical Society, Vol. 149, No. 10, 2002, pp. A1342–A1348.
[281] Syamsudhuha and Silvester, D. J., “Efficient solution of the steady-state Navier–Stokes equations using a multigrid preconditioned Newton–Krylov method,” International Journal for Numerical Methods in Fluids, Vol. 43, 2003, pp. 1407–1427.
[282] Elman, H. C., Loghin, D., and Wathen, A. J., “Preconditioning techniques for Newton’s method for the incompressible Navier–Stokes equations,” BIT Numerical Mathematics, Vol. 43, 2003, pp. 961–974.
[283] Knoll, D. A. and Keyes, D. E., “Jacobian-free Newton–Krylov methods: a survey of approaches and applications,” Journal of Computational Physics, Vol. 193, 2004, pp. 357–397.
[284] Diosady, L. T. and Darmofal, D. L., “Preconditioning methods for discontinuous Galerkin solutions of the Navier–Stokes equations,” Journal of Computational Physics, Vol. 228, June 2009, pp. 3917–3935.
[285] Spalart, P. R. and Allmaras, S. R., “A one-equation turbulence model for aerodynamic flows,” AIAA Paper 92-0439, January 1992.
[286] Spalart, P. R. and Allmaras, S. R., “A one-equation turbulence model for aerodynamic flows,” La Recherche Aérospatiale, 1994, pp. 5–21.
[287] Ashford, G. A., An unstructured grid generation and adaptive solution technique for high Reynolds number compressible flows, Ph.D. thesis, University of Michigan, 1996.
[288] Pulliam, T. H., “Efficient solution methods for the Navier–Stokes equations,” Lecture notes for the von Karman Institute for Fluid Dynamics Lecture Series: Numerical Techniques for Viscous Flow Computation in Turbomachinery Bladings, January 1986.
[289] Hirsch, C., Numerical computation of internal and external flows, Vol. 2, John Wiley & Sons, 1994.
[290] Anderson, J., Fundamentals of aerodynamics, McGraw-Hill, 2nd ed., 1991.
[291] Patankar, S. V., Numerical Heat Transfer and Fluid Flow, chap. Convection and Diffusion, McGraw-Hill, 1980.
[292] Hicken, J. E. and Zingg, D. W., “Globalization strategies for inexact-Newton solvers,” AIAA Paper 2009-4139, San Antonio, Texas, June 2009.
[293] Ilinca, F. and Pelletier, D., “Positivity preservation and adaptive solution for the k–ε model of turbulence,” AIAA Journal, Vol. 36, No. 1, 1998, pp. 44–50.
[294] Wong, P. and Zingg, D. W., “Three-dimensional aerodynamic computations on unstructured grids using a Newton–Krylov approach,” Computers and Fluids, Vol. 37, No. 2, 2008, pp. 107–120.
[295] Christara, C. C., “Matrix Computations, Numerical Linear Algebra,” course notes, 2001.
[296] Eiermann, M., Ernst, O. G., and Schneider, O., “Analysis of acceleration strategies for restarted minimal residual methods,” Journal of Computational and Applied Mathematics, Vol. 123, 2000, pp. 261–292.
[297] Brown, P. N. and Hindmarsh, A. C., “Matrix-free methods for stiff systems of ODE’s,” SIAM Journal on Numerical Analysis, Vol. 23, No. 3, June 1986, pp. 610–638.
[298] Eiermann, M. and Ernst, O. G., “Geometric aspects of the theory of Krylov subspace methods,” Acta Numerica, 2001, pp. 251–312.
[299] Catinas, E., “Inexact perturbed Newton methods and applications to a class of Krylov solvers,” Journal of Optimization Theory and Applications, Vol. 108, No. 3, 2001, pp. 543–571.
[300] Lyness, J. N. and Moler, C. B., “Numerical differentiation of analytic functions,” SIAM Journal on Numerical Analysis, Vol. 4, No. 2, 1967, pp. 202–210.
[301] Soulaimani, A., Salah, N. B., and Saad, Y., “Acceleration of GMRES convergence for some CFD problems: Preconditioning and stabilization techniques,” Tech. Rep. umsi-2000-165, Minnesota Supercomputer Institute, University of Minnesota, 2000.
[302] Soulaimani, A., Salah, N. B., and Saad, Y., “Enhanced GMRES acceleration techniques for some CFD problems,” International Journal of CFD, Vol. 16, No. 1, March 2002, pp. 1–20.
[303] Saad, Y., “Preconditioned Krylov subspace methods for CFD applications,” Tech. Rep. umsi-94-171, Minnesota Supercomputer Institute, University of Minnesota, 1994.
[304] Saad, Y., “SPARSKIT: a basic tool kit for sparse matrix computations,” Tech. rep., http://www.cs.umn.edu/Research/arpa/SPARSKIT/sparskit.html, 1994.
[305] Hicken, J. E., Efficient algorithms for future aircraft design: Contributions to aerodynamic shape optimization, Ph.D. thesis, University of Toronto, 2009.
[306] Kaveh, A., Zahedi, A., and Laknegadi, K., “A novel ordering algorithm for profile optimization by efficient solution of a differential equation,” International Journal for Computer-Aided Engineering, Vol. 24, No. 6, 2007, pp. 572–585.
[307] Richardson, L. F., “The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam,” Philosophical Transactions of the Royal Society of London, Vol. 210, 1911, pp. 307–357.
[308] Hemker, P. W., “On the order of prolongations and restrictions in multigrid procedures,” Journal of Computational and Applied Mathematics, Vol. 32, No. 3, 1990, pp. 423–429.
[309] Davis, L., “Order-based genetic algorithm and the graph coloring problem,” Handbook of Genetic Algorithms, Van Nostrand Reinhold, 1991.
[310] Zuliani, G., Aerodynamic flow calculations using finite-differences and multigrid, Ph.D. thesis, University of Toronto, 2004.