Preconditioning Techniques for a Newton–Krylov Algorithm for the Compressible Navier–Stokes Equations
by
John Gatsis
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Institute for Aerospace Studies
University of Toronto
Copyright © 2013 by John Gatsis
Abstract
PRECONDITIONING TECHNIQUES FOR A NEWTON–KRYLOV ALGORITHM
FOR THE COMPRESSIBLE NAVIER–STOKES EQUATIONS
John Gatsis
Doctor of Philosophy
Graduate Department of Institute for Aerospace Studies
University of Toronto
2013
An investigation of preconditioning techniques is presented for a Newton–Krylov algorithm that is used
for the computation of steady, compressible, high Reynolds number flows about airfoils. A second-
order centred-difference method is used to discretize the compressible Navier–Stokes (NS) equations that
govern the fluid flow. The one-equation Spalart–Allmaras turbulence model is used. The discretized
equations are solved using Newton's method, with the generalized minimal residual (GMRES) Krylov
subspace method used to solve the linear system approximately at each Newton iteration. These
preconditioning techniques are first applied to the solution of the discretized steady convection-diffusion equation.
Various orderings, iterative block incomplete LU (BILU) preconditioning, and multigrid preconditioning
are explored. The baseline preconditioner is a BILU factorization of a lower-order discretization
of the system matrix in the Newton linearization. An ordering based on the minimum discarded fill
(MDF) ordering is developed and compared to the widely popular reverse Cuthill–McKee (RCM) ordering.
An evolutionary algorithm is used to investigate and enhance this ordering. The MDF-based ordering
performs well for the convection-diffusion equation, while RCM is superior for the NS equations.
Experiments for inviscid, laminar, and turbulent cases are presented to show the effectiveness of
iterative BILU preconditioning in terms of reducing the number of GMRES iterations, and hence the
memory requirements of the Newton–Krylov algorithm. Multigrid preconditioning also reduces the
number of GMRES iterations. The framework for the iterative BILU and BILU-smoothed multigrid
preconditioning algorithms is presented in detail.
Acknowledgements
It is said that it takes a village to raise a child. This analogy serves well in the sense that without the
help and support of many people this thesis would not have been possible.
I would like to thank my parents, Peter and Angela, without whom I would not have been able to
reach this milestone.
Professor Zingg has been an incredible supervisor. A few words that best describe him include:
patient, brilliant and supportive. Through the highs and lows of this journey, he was a source of wisdom
and encouragement. I truly think of him as one of the most important mentors in my life.
Thank you to the members of the doctoral examination committee including its chair Professor
Clinton Groth and Professor Hugh Liu. Professor Groth, I truly appreciate our many discussions on the
progress of this research and your encouragement. Thank you also to Professor Christina Christara for
taking the time to meet and discuss important aspects of this research. Professor Anthony Straatman,
thank you for offering your expertise in your capacity as external reviewer.
I’d like to thank the members of both the computational aerodynamics and computational propulsion
groups that jointly share the research facility at UTIAS. From the past members, I’d especially like to
thank David Kam, Dr. Peterson Wong, Dr. Jai Sachdev and Dr. Tim Leung for their friendship and
Dr. Todd Chisholm for his help early in this research. From the current members, I’d like to thank
David Del Ray Fernandez and Michal Osusky for being sounding boards, as well as Mo Tabesh, Hugo
Gagnon, Ramy Rashad, Nasim Shahbazian, and Lana Olague for their friendship. Michal, thank you for
taking the time to integrate the reordering strategy to the 3D code and subsequently exploring it. I’d
like to thank Oleg Chernukhin for his efficient introduction on evolutionary algorithms. Thank you also
to Dr. Marc Charest and Dr. James McDonald for answering and tending to all computing questions.
Last and certainly not least, I’d like to thank Dr. Jason Hicken for his friendship and for lending me his
incredible talent to help understand some of the more advanced concepts in graph theory and discrete
mathematics. To all of you, as well as those not mentioned, I wholeheartedly wish the best of luck in
their current work and future endeavours.
Thank you to the entire UTIAS staff, especially Peter, Gail, Joan, Clara, Nora and Rosanna. Peter,
from day one you have been a great friend and mentor to me. I would also like to thank all of the
professors and instructors at UTIAS.
Thank you Anna and Joe for all of your moral support and encouragement over the years. Also,
thank you to all of my friends, loved ones, and mentors, not already mentioned. I truly appreciate all of
your help.
I would also like to acknowledge and thank the Government of Canada, the Government of Ontario,
and the University of Toronto for their financial support.
CONTENTS
1 INTRODUCTION 1
  1.1 Solution Methods for Linear Systems 3
    1.1.1 Iterative Methods 3
    1.1.2 Multigrid Acceleration 5
  1.2 Preconditioning 6
    1.2.1 Incomplete Factorizations 7
    1.2.2 Parallel Preconditioning 10
    1.2.3 Multilevel Preconditioning 12
  1.3 The Newton–Krylov Method 13
    1.3.1 Multigrid Preconditioning 15
  1.4 Motivation and Objectives 16
  1.5 Organization of this Thesis 17
2 GOVERNING EQUATIONS 19
  2.1 The Navier–Stokes Equations 19
    2.1.1 Generalized Curvilinear Coordinate Transformation 21
    2.1.2 Thin-Layer Approximation 21
  2.2 The Spalart–Allmaras Turbulence Model 23
    2.2.1 Generalized Curvilinear Coordinate Transformation 25
  2.3 Boundary Conditions 25
  2.4 The Convection-Diffusion Equation 25
    2.4.1 The Steady 1D Convection-Diffusion Equation 26
    2.4.2 The Steady 2D Convection-Diffusion Equation 26
3 SPATIAL DISCRETIZATION 29
  3.1 The Navier–Stokes Equations 30
  3.2 The Spalart–Allmaras Turbulence Model 30
  3.3 Boundary Conditions 31
    3.3.1 Airfoil Body 32
    3.3.2 Inflow and Outflow Boundaries 32
    3.3.3 Wakecut Interface 33
  3.4 The Jacobian of the Nonlinear System 33
  3.5 The Convection-Diffusion Equation 35
    3.5.1 The Grid Peclet Number 35
    3.5.2 The 2D Convection-Diffusion Equation 36
    3.5.3 The Jacobian of the Discretized Equations 38
4 SOLUTION ALGORITHM 39
  4.1 Solving the Nonlinear System: Newton's Method 40
  4.2 Newton Globalization: Pseudo-Transient Continuation 40
  4.3 Solving the Linear System: GMRES Krylov Subspace Method 42
    4.3.1 Projection Methods 42
    4.3.2 GMRES Algorithm 43
    4.3.3 Convergence of GMRES 44
    4.3.4 Practical Aspects of the Newton–GMRES Algorithm 45
5 PRECONDITIONING 47
  5.1 Scaling 49
    5.1.1 Jacobian-Vector Products in GMRES 50
  5.2 Incomplete LU (ILU) Preconditioning 51
    5.2.1 Effect of Scaling 53
  5.3 Ordering 55
    5.3.1 Graph Theory 56
    5.3.2 Minimum Degree (MD) Ordering 56
    5.3.3 Reverse Cuthill–McKee (RCM) Ordering 57
    5.3.4 Minimum Discarded Fill (MDF) Ordering 59
  5.4 Multigrid Preconditioning 64
    5.4.1 Stationary Iterative Methods 64
    5.4.2 ILU(p) as a Smoother 65
    5.4.3 Iterative ILU(p) as a Preconditioner 68
    5.4.4 ILU(p)-Smoothed Geometric Multigrid as a Preconditioner 69
    5.4.5 Reordering and Scaling 74
  5.5 Chapter Summary and Highlights of Contributions 76
6 RESULTS 79
  6.1 Convection-Diffusion Equation 80
    6.1.1 GMRES convergence and Peclet number 81
    6.1.2 Iterative ILU(p) preconditioning 84
    6.1.3 ILU(p) and multigrid preconditioning 84
    6.1.4 Orderings 85
    6.1.5 Further investigation of MDF 88
  6.2 Euler and Navier–Stokes Equations 97
    6.2.1 Test cases 97
    6.2.2 Orderings 103
    6.2.3 Iterative BILU(p) preconditioning 105
    6.2.4 BILU(p) and multigrid preconditioning 108
7 CONCLUSIONS, CONTRIBUTIONS AND RECOMMENDATIONS 117
  7.1 Conclusions 117
    7.1.1 Convection-Diffusion Equation 117
    7.1.2 Euler and Navier–Stokes Equations 119
  7.2 Contributions 121
  7.3 Recommendations 122
A OTHER PRECONDITIONING TECHNIQUES 125
  A.1 Domain Decomposition 125
  A.2 Sparse Approximate Inverse Preconditioning 127
REFERENCES 129
LIST OF FIGURES
1.1 Geometric versus algebraic multigrid. 6
2.1 Curvilinear coordinate transformation courtesy of Lomax, Pulliam, and Zingg [1]. 22
2.2 A C-topology grid about a NACA0012 airfoil (units are in chord lengths). 22
2.3 The solution to the 1D convection-diffusion equation for several Peclet numbers. 27
3.1 Normal and tangential directions at the boundaries. 32
3.2 Sparsity pattern of sample A1 and A2 Jacobians using a natural ordering. 35
3.3 Close-up view of the numerical solution to the 1D convection-diffusion equation for various grid Peclet numbers on a 101-node computational grid. 36
5.1 Contributions to ajk from pivot aii in the elimination algorithm. 54
5.2 A four-grid, multigrid V-cycle. 70
5.3 Full-weighting restriction operator. 71
5.4 Full-weighting prolongation operator. 71
6.1 Convergence of GMRES, solution, and eigenvalues of system matrix with and without ILU(1) preconditioning of a uniform grid case with a Peclet number of 0.001. 81
6.2 Convergence of GMRES, solution, and eigenvalues of system matrix with and without ILU(1) preconditioning of a uniform grid case with a Peclet number of 1000. 82
6.3 Initial system matrix for a 5×5-node grid with a Peclet number of 10^9. Upward- and downward-facing triangles represent positive and negative values, respectively. 90
6.4 System matrix after very small entries are discarded. Upward- and downward-facing triangles represent positive and negative values, respectively. 90
6.5 Resulting matrix after MDF ordering. Upward- and downward-facing triangles represent positive and negative values, respectively. 91
6.6 Resulting matrix after a random permutation. Upward- and downward-facing triangles represent positive and negative values, respectively. 91
6.7 LU-factorization of the randomly-permuted matrix. 92
6.8 Resulting matrix after MDF for the randomly-permuted matrix. Upward- and downward-facing triangles represent positive and negative values, respectively. 92
6.9 Convergence and solution for the subsonic inviscid case, E1. 100
6.10 Convergence and solution for the transonic inviscid case, E2. 100
6.11 Convergence and solution for the laminar case, L1. 101
6.12 Convergence and solution for the subsonic turbulent case, T1. 101
6.13 Convergence and solution for the transonic turbulent case, T2. 102
LIST OF TABLES
1.1 A history of popular Krylov subspace and related methods (1950–1999). 4
3.1 Errors for various Peclet numbers for various discretizations of the 1D convection-diffusion equation on a uniform 101-node computational grid. 37
4.1 Continuation parameters for Newton's method. 42
5.1 SGS (left) and ILU(0) (right) iterations on a 21×21-node grid for various initial error frequencies. 67
5.2 SGS (left) and ILU(0) (right) iterations on a 41×41-node grid for various initial error frequencies. 67
5.3 ILU(0) (left) and ILU(1) (right) iterations on a 41×41-node grid for various initial error frequencies. 67
5.4 ILU(0) (left) and ILU(1) (right) iterations on an 81×81-node grid for various initial error frequencies. 67
5.5 ILU(0) (left) and ILU(0)+MG (right) iterations on a 41×41-node grid for various initial error frequencies. 72
5.6 ILU(0) (left) and ILU(0)+MG (right) iterations on an 81×81-node grid for various initial error frequencies. 72
5.7 ILU(0) (left) and ILU(0)+MG (right) iterations on a 161×161-node grid for various initial error frequencies. 72
5.8 ILU(1) (left) and ILU(1)+MG (right) iterations on a 41×41-node grid for various initial error frequencies. 73
5.9 ILU(1) (left) and ILU(1)+MG (right) iterations on an 81×81-node grid for various initial error frequencies. 73
5.10 ILU(1) (left) and ILU(1)+MG (right) iterations on a 161×161-node grid for various initial error frequencies. 73
6.1 GMRES iterations and CPU times with (left) ILU(0) and (right) ILU(1) preconditioning on a 129×129-node grid for various Peclet numbers. 83
6.2 GMRES iterations and CPU times with (left) ILU(0) and (right) ILU(1) preconditioning on a 129×129-node grid for a Peclet number of 0.001. 83
6.3 GMRES iterations and CPU times with (left) ILU(0) and (right) ILU(1) preconditioning on a 129×129-node grid for a Peclet number of 1000. 84
6.4 GMRES iterations for various multigrid preconditioners with ILU(0) smoothing (Pe = 0.001). 85
6.5 GMRES iterations for various multigrid preconditioners (Pe = 1000). 86
6.6 GMRES iterations for various orderings using ILU(0) multigrid preconditioning (129×129 nodes and Pe = 0.001). 86
6.7 GMRES iterations for various orderings using ILU(0) preconditioning (129×129 nodes and Pe = 1000). 87
6.8 GMRES iterations for various orderings using ILU(0) multigrid preconditioning (257×257 nodes and Pe = 0.001). 87
6.9 GMRES iterations for various orderings using ILU(1) multigrid preconditioning (257×257 nodes and Pe = 0.001). 88
6.10 GMRES iterations for various orderings using ILU(1) preconditioning (257×257 nodes and Pe = 1000). 88
6.11 Locations of root nodes that correspond to minimized discarded fill using an evolutionary algorithm for convection-dominated (Pe = 10^9) and diffusion-dominated (Pe = 10^-9) cases for a 5×5-node grid with flow angle of θ = 15. 94
6.12 GMRES iterations for MDF-ILU(p) preconditioners (Pe = 1000) and a comparison to RCM. Note: The most upstream node is (1,1). 96
6.13 Computational grids for Euler and Navier–Stokes calculations. 97
6.14 Test cases for Euler and Navier–Stokes calculations. 98
6.15 Baseline Newton (IN) iterations, GMRES (IG) iterations, and CPU times for all Euler and Navier–Stokes test cases solved using BILU(p) preconditioning. 99
6.16 Performance of Newton–Krylov algorithm using BILU(p) with various orderings. 104
6.17 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid test cases. 106
6.18 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for laminar and turbulent test cases. 107
6.19 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid subsonic test case, E1. 108
6.20 Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for turbulent transonic test case, T2. 109
6.21 Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning for inviscid test cases. 110
6.22 Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning for laminar and turbulent test cases. 111
6.23 Performance of Newton–Krylov algorithm using BILU(p) and 2- or 3-level multigrid preconditioning for inviscid, laminar and turbulent test cases. 113
6.24 Finer grid cases for Euler and Navier–Stokes calculations. 114
6.25 Performance of Newton–Krylov algorithm using BILU(p) and 2-, 3- or 4-level multigrid preconditioning for finer-grid test cases. 114
NOTATION
ABBREVIATIONS
AF approximate factorization
AINV approximate inverse
AMG algebraic multigrid
ARMS algebraic recursive multilevel solver
BFILU block-filled incomplete lower-upper
BILU block-incomplete lower-upper
Bi-CG bi-conjugate gradient
Bi-CGStab bi-conjugate gradient-stable
CFD computational fluid dynamics
CG conjugate gradient
CGS conjugate gradient squared
CGW Concus–Golub–Widlund
CM Cuthill–McKee
DG discontinuous Galerkin
FAS full-approximation storage
FMG full multigrid
FOM full orthogonalization method
GA genetic algorithm
GCR generalized conjugate residual
GCROT generalized conjugate residual with inner orthogonalization and outer truncation
GMG geometric multigrid
GMRES generalized minimal residual
IC incomplete Cholesky
ILU incomplete lower-upper
ILUM incomplete lower-upper multilevel
ILUT incomplete lower-upper truncated
IOM incomplete orthogonalization method
LU lower-upper
MD minimum degree
MDF minimum discarded fill
MG multigrid
MINRES minimal residual
MR minimal residual
MRILU matrix renumbering ILU
NGILU nested grids ILU
NK Newton–Krylov
NKS Newton–Krylov–Schwarz
NP non-deterministic polynomial-hard
NS Navier–Stokes
QMR quasi minimal residual
RCM reverse Cuthill–McKee
SA Spalart–Allmaras
SGS symmetric Gauss–Seidel
SPAI sparse approximate inverse
TFQMR transpose-free quasi-minimal residual
ALPHANUMERIC
AD artificial dissipation
D destructive turbulent term
E inviscid flux (x)
E transformed inviscid flux (x)
Ev viscous flux (x)
F inviscid flux (y)
F transformed inviscid flux (y)
Fv viscous flux (y)
J metric Jacobian
M Mach number; convective turbulent term
N diffusive turbulent term
Q conservative flow variables
Q transformed conservative flow variables
P productive turbulent term
R radius, residual
Rkv ratio of residuals
S entropy, vorticity
S transformed thin-layer viscous flux
U contravariant velocity (x)
V contravariant velocity (y)
Vn normal velocity component
Vt tangential velocity component
a speed of sound; continuation parameter
a∞ free-stream speed of sound
a∞ dimensional free-stream speed of sound
b linear system right hand side; continuation parameter
c chord
cp specific heat at constant pressure
cv specific heat at constant volume
dw Spalart–Allmaras wall distance
e total energy; error
e dimensional total energy
f source term
fti Spalart–Allmaras transition functions
fvi Spalart–Allmaras functions
fw Spalart–Allmaras destructive function
hi,m Hessenberg matrix entry
k ILU fill-in parameter
kstart continuation iterations threshold
mi viscous flux vector variables
n number of nodes
p pressure
r residual vector
rm residual vector
t time
u x-component of velocity
u dimensional x-component of velocity
v y-component of velocity
v dimensional y-component of velocity
vm Krylov search direction
ym Krylov update coefficient
zm preconditioned Krylov search direction
CALLIGRAPHIC
A linear system matrix, Newton Jacobian
A1 first-order Jacobian
A2 second-order Jacobian
Ah fine grid operator
A2h coarse grid operator
B domain decomposition submatrix
D diagonal
E domain decomposition submatrix
E error matrix
F domain decomposition submatrix
G domain decomposition submatrix; iteration matrix
Hm mth upper–Hessenberg matrix
I identity matrix
I2hh restriction operator
Ih2h prolongation (interpolation) operator
K Krylov subspace
L lower factorization; Krylov left subspace
L incomplete lower factorization
M preconditioning matrix; approximate inverse
Ml left preconditioning matrix
Mr right preconditioning matrix
N splitting matrix
O order
P relaxation splitting matrix; smoother
Pr Prandtl number
Pe Peclet number
PBGS backward Gauss–Seidel smoother
PFGS forward Gauss–Seidel smoother
PGS Gauss–Seidel smoother
PILU ILU smoother
PSGS symmetric Gauss–Seidel smoother
Q relaxation splitting matrix
Re Reynolds number
S relaxation iteration matrix; Schur complement; sparsity pattern
S1 row scaling matrix
S2 column scaling matrix
Sc row scaling matrix
Sr column scaling matrix
T relaxation scaled right hand side
U upper factorization
U incomplete upper factorization
V basis of Krylov subspace
W basis of Krylov left subspace
X matrix of eigenvectors
GREEK
Γ local preconditioner
∆ change
Λ matrix of eigenvalues
Υ scalar dissipation pressure switch
α angle of attack; continuation parameter
β continuation parameter
γ ratio of specific heats
ε finite-difference perturbation constant
εj,k scalar dissipation ratio function
η curvilinear normal coordinate
θx wave angle in the x-direction
θy wave angle in the y-direction
κ condition number
κi scalar dissipation coefficients
κe condition number for the Euler equations
κt thermal conductivity
λmax maximum eigenvalue
λmin minimum eigenvalue
µ dynamic viscosity
µt turbulent dynamic eddy viscosity
ν Spalart–Allmaras working variable
νt kinematic eddy viscosity
ξ curvilinear tangential coordinate
ρ density; spectral radius
ρ dimensional density; spectral radius
ρ∞ free-stream density
ρ∞ dimensional free-stream density
σ spectral radius of the flux Jacobian
τ curvilinear time transformation
τij viscous stress tensor
φ ILU factorization pivot ratio; scalar quantity
Chapter 1
INTRODUCTION
Η γνώση είναι η αληθινή άποψη. (Knowledge is true opinion.)
– Πλάτων (Plato)
Fluid dynamics affects everyone. It governs systems as large as the currents of the Earth's oceans
and atmosphere and as small as the human cardiovascular system. It arises in both the natural and
the technological world. For the latter, examples include aircraft aerodynamics, the drag of automobiles
and ships, the flow through a pipeline, and the performance of wind and hydroelectric turbines.
Predicting the aerodynamic performance of aircraft is essential to their design. Theoretical results can
only be taken so far and usually apply to simplified models of real flows. Experimental results for
scaled prototypes, while realistic, are expensive and slow to generate. In this regard, and with increasing
computational resources, computational fluid dynamics (CFD) has come to the forefront as an essential
part of aircraft design. CFD algorithms are simply called flow solution algorithms, or flow solvers for
short.
A paramount goal in aerospace is to develop flow solvers that provide efficient numerical solutions
of the compressible Navier–Stokes equations. Having a fast and reliable flow solver is arguably the
most essential component in an optimization framework. Both gradient-based [2] and gradient-free [3]
optimization algorithms require many function evaluations, or flow solutions. Hence, improving the
efficiency of a flow solver will yield a compounded improvement in the efficiency of an optimization
algorithm.
The compressible Navier–Stokes equations are highly nonlinear. Aerodynamic flows are turbulent,
typically have high Reynolds numbers, and, especially in take-off, landing, or transonic conditions, are
extremely complicated. These conditions only add to the computational cost of the flow solver.
A popular approach to solving the discretized compressible Navier–Stokes equations is the Newton–
Krylov method. Newton–Krylov methods are discussed in detail in [4–6]. Some other fields where the
Newton–Krylov algorithm can be applied include combustion, magnetohydrodynamics, and structural
analysis. In most cases, each Newton iteration requires the solution of a large, sparse linear system.
A necessary component in an efficient Newton algorithm is a fast linear system solver. An iterative
approach is preferred in this work. Krylov methods are popular iterative methods for solving large,
sparse linear systems.
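The structure just described, an outer Newton iteration whose linear subproblem is handed to a Krylov solver, can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the algorithm of this thesis: the 1D model problem, the `newton_krylov` helper, and the fixed inner settings are hypothetical stand-ins.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

def newton_krylov(residual, jacobian, q0, tol=1e-9, max_newton=50):
    """Outer Newton loop; each step solves J dq = -R approximately with GMRES."""
    q = q0.copy()
    for _ in range(max_newton):
        r = residual(q)
        if np.linalg.norm(r) < tol:
            break
        J = jacobian(q)                               # large, sparse Jacobian of R(q)
        dq, _ = gmres(J, -r, restart=50, atol=1e-12)  # inner Krylov solve
        q = q + dq
    return q

# Hypothetical 1D model problem: u'' = u^2 - 1 with u = 0 at both ends,
# discretized by second-order centred differences on n interior nodes.
n = 50
h = 1.0 / (n + 1)
L = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2

def residual(u):
    return L @ u - (u**2 - 1.0)

def jacobian(u):
    return L - diags(2.0 * u)

u = newton_krylov(residual, jacobian, np.zeros(n))
```

In a realistic flow solver the critical ingredients are precisely the ones this sketch glosses over: globalization of the Newton iteration, the inner tolerance, and, above all, preconditioning of the Jacobian system.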
The generalized minimal residual (GMRES) method [7] is one of the most popular Krylov subspace
methods. While GMRES is fast compared to other classical techniques such as approximate
factorization [8], other challenges arise. Practical applications involve solving poorly-conditioned
nonsymmetric linear systems, for which effective preconditioning is an essential component of the
solution process. Current preconditioners contribute significantly to the overall computational cost of
the Newton–Krylov method; an effective way to improve the performance of a solution algorithm is
therefore to improve the preconditioner.
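To make the role of the preconditioner concrete, the following sketch compares GMRES iteration counts with and without an incomplete-LU preconditioner on a small nonsymmetric convection-diffusion-type matrix. This is an illustration only: SciPy's `spilu` stands in for the BILU factorizations studied in this thesis, and the test matrix and its parameters are hypothetical.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags, eye, kron
from scipy.sparse.linalg import LinearOperator, gmres, spilu

# Hypothetical test system: centred-difference discretization of a 2D
# convection-diffusion problem on an n x n interior grid. The first-derivative
# (convection) term makes the matrix nonsymmetric.
n = 64
h = 1.0 / (n + 1)
c = 25.0  # convection coefficient; larger values degrade conditioning
d2 = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2    # -d2/dx2
d1 = diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(n, n)) / (2 * h)  # d/dx
A = csc_matrix(kron(eye(n), d2) + kron(d2, eye(n)) + c * kron(eye(n), d1))
b = np.ones(n * n)

iters = {"none": 0, "ILU": 0}

def counter(key):
    def cb(_):
        iters[key] += 1  # called once per GMRES iteration
    return cb

# GMRES without preconditioning.
x0, info0 = gmres(A, b, restart=30, maxiter=500, callback=counter("none"))

# GMRES preconditioned by an incomplete LU factorization of A.
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)
x1, info1 = gmres(A, b, M=M, restart=30, maxiter=500, callback=counter("ILU"))
```

On examples of this kind the preconditioned iteration count is dramatically smaller than the unpreconditioned one, which is exactly the effect the preconditioners in this thesis aim to achieve for the Newton linear systems.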
Preconditioning of the linear problem is the focus of this thesis. It spans many areas of applied
mathematics: from classical iterative methods to advanced iterative methods; from incomplete factorizations
to sparse approximate inverses; from multigrid to general multilevel methods; from simple orderings to
parallelization; and from any of the aforementioned methods to combinations thereof.
In order to make sense of how all of these preconditioning methods interrelate, some of the history
of solution methods for linear systems is presented. From there, the discussion shifts to a delineation
of preconditioning and related methods. In particular, incomplete factorizations, parallel
preconditioners, sparse approximate inverses, and multilevel preconditioners are discussed. Preconditioning is then
connected to the Newton–Krylov method. Once the background is presented, preconditioners that are
selected for this research are identified. This chapter concludes with a list of objectives, supported by
additional motivation. Although a brief survey of parallel preconditioning is presented, the focus of this
thesis is on serial aspects of preconditioning.
1.1 Solution Methods for Linear Systems
1.1.1 Iterative Methods
Classical iterative methods can be traced back to the work of Gauss and Seidel in the 19th century.
Approaches based on matrix splittings were developed and/or analyzed in the mid 20th century by
Richardson, Frankel, Young, Peaceman, Rachford, Varga, Stone, Kendall, and Dupont. Saad [9] provides
an excellent history of early iterative methods and is the key source of this review on contributors to
classical iterative methods.
Two key milestones in the advancement of iterative methods are the introduction of projection
methods as early as the 1930s and Chebyshev acceleration in the 1950s. Key contributors to the latter
include Frankel, Gavurin [9], Young, Lanczos, Golub and Varga. These milestones opened the door to
the very powerful family of iterative methods used to this day (and in this thesis): Krylov subspace
methods.
Simoncini and Szyld [10], van der Vorst [11] and Saad [9] discuss many of the details of Krylov
subspace methods. For completeness, the most popular Krylov methods that were introduced from their
inception in the 1950s until the 1990s are listed in Table 1.1. For symmetric systems, the conjugate
gradient (CG) method [12] is a very popular choice. A milestone solver by Meijerink and van der
Vorst [13] couples the CG method with an incomplete Cholesky (IC) factorization preconditioner; they
called the method ICCG. For nonsymmetric problems, the bi-conjugate gradient (Bi-CG) [14], generalized
minimal residual (GMRES) [7], conjugate gradient squared (CGS) [15], bi-conjugate gradient-stable
(Bi-CGStab) [16], and generalized conjugate residual with inner orthogonalization and outer truncation
(GCROT) [17] methods are all popular choices. New Krylov methods (mostly based on hybridization
of existing methods) are developed regularly. A recent reference by Abe and Sleijpen [18] demonstrates
this.
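To make the family concrete, the CG method mentioned above can be written in a few lines. The following is a pedagogical Python/NumPy sketch on a small SPD model matrix; the matrix, tolerance, and sizes are illustrative choices, not those used in this work:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal unpreconditioned CG for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x

# Small SPD test: 1D Laplacian model matrix
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))  # small residual
```

In exact arithmetic CG terminates in at most n iterations; in practice the preconditioners discussed below are what make such methods competitive on ill-conditioned systems.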
Krylov subspace methods require additional measures to improve their robustness and speed, and to
satisfy the memory limitations imposed by computers. There are (at least) four measures that can be
taken to improve the performance of these methods: restarting, truncating, deflating/augmenting, and
preconditioning. The last is of most concern in this research and is discussed in detail below.
Flexible variants of Krylov subspace methods that facilitate advanced preconditioning are discussed in
Section 1.2.
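Restarting is the simplest of these measures to demonstrate. The sketch below uses SciPy's gmres, whose restart parameter bounds the size of the stored Krylov basis; the diagonally dominant test matrix and the value m = 20 are illustrative assumptions only:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Nonsymmetric convection-diffusion-like test matrix (illustrative)
n = 200
A = diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

# GMRES(m): the Krylov basis is discarded and rebuilt every m = 20
# iterations, bounding memory at the cost of slower convergence;
# maxiter bounds the number of restart cycles.
x, info = gmres(A, b, restart=20, maxiter=5000)
print(info, np.linalg.norm(A @ x - b))  # info == 0 signals convergence
```

Truncated and deflated variants such as GCROT and LGMRES keep or recycle selected directions instead of discarding the whole basis.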
Restarting limits the Krylov subspace to a maximum allowable size, as in GMRES(m) [38]. Truncating
the orthogonalization procedure inherent to these methods reduces its cost. Examples include incomplete
orthogonalization methods (IOMs) based on the full orthogonalization method (FOM) [27], the
aforementioned GCROT method, and the LGMRES [39] method, which truncates GMRES.
Deflating and augmenting the subspace with vectors from outer cycles can be used in conjunction with a
restarted algorithm to improve performance. Morgan [40] presented an example of a restarted GMRES
algorithm with deflation for block systems. In addition to these improvements, it is possible to adapt
Table 1.1: A history of popular Krylov subspace and related methods (1950–1999).
Year Creator(s) Method Reference
1950 Lanczos Lanczos [19]
1951 Arnoldi Arnoldi [20]
1952 Hestenes and Stiefel CG [12]
1952 Lanczos Lanczos (CG) [21]
1975 Paige and Saunders MINRES [22]
1975 Paige and Saunders SYMMLQ [22]
1975 Fletcher Bi-CG [14]
1976 Concus and Golub CGW [23]
1977 Vinsome ORTHOMIN [24]
1977 Meijerink and van der Vorst ICCG [13]
1978 Widlund CGW [25]
1980 Jea and Young ORTHODIR [26]
1981 Saad FOM [27]
1982 Paige and Saunders LSQR [28]
1983 Eisenstat et al. GCR [29]
1986 Saad and Schultz GMRES [7]
1989 Sonneveld CGS [15]
1991 Freund and Nachtigal QMR [30]
1992 van der Vorst Bi-CGStab [16]
1993 Gutknecht Bi-CGStab2 [31]
1993 Sleijpen and Fokkema Bi-CGStab(l) [32]
1994 Freund TFQMR [33]
1994 Weiss GMERR [34]
1994 Chan et al. QMR-BiCGStab [35]
1995 Kasenally and Ebrahim GMBACK [36]
1996 Fokkema et al. CGS2 [37]
1999 de Sturler GCROT [17]
Krylov methods to efficiently solve linear systems involving multiple right-hand sides [41–43].
Barrett et al. [44] clarified the various stopping criteria that are used in many Krylov subspace
methods. Other recent advancements in Krylov subspace methods can be found in [10]; they include,
but are not limited to, applications to general complex matrices and parametrized systems.
1.1.2 Multigrid Acceleration
In order to understand the concept of multigrid, one must understand the concept of a smoother. At each
iteration of an iterative method there remains an error in the solution, which is the difference between the
current iterate and the exact solution. Consider the error as divided into high- and low-frequency
components. A smoother is an operator that rapidly reduces the high-frequency error. For example,
the Gauss-Seidel operator is an excellent smoother for problems with symmetric positive definite (SPD)¹
matrices. Entire methods can serve as smoothers; Krylov subspace methods are an example [45].
Multigrid is used to accelerate these smoothers. The multigrid method effectively reduces the low-
frequency errors by projecting the problem onto a coarser domain. On this domain, the projected
low-frequency errors appear as higher-frequency errors due to the increase in grid spacing (or cell or
element size in other discretizations). The smoother is therefore effective when applied to the coarsened
problem. This process is usually repeated recursively over several levels. A
very attractive property of the multigrid method is that for certain problems it is an algorithmically-
scalable [46], or equivalently, an O(n) method.
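The interplay of smoothing and coarse-grid correction described above can be sketched with a two-grid cycle for the 1D Poisson model problem. This is a pedagogical Python/NumPy sketch: the weighted Jacobi smoother, full-weighting restriction, linear interpolation, and Galerkin coarse operator are standard textbook choices, not the components of the solver developed in this thesis:

```python
import numpy as np

def laplacian(n):
    """1D Poisson model matrix tridiag(-1, 2, -1) on n interior points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def weighted_jacobi(A, x, b, sweeps=3, w=2.0 / 3.0):
    """Smoother: damps the high-frequency half of the error spectrum."""
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + w * (b - A @ x) / d
    return x

def interpolation(n, nc):
    """Linear interpolation from nc coarse points to n = 2*nc + 1 fine points."""
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def two_grid_cycle(A, b, x):
    n = A.shape[0]
    nc = (n - 1) // 2
    P = interpolation(n, nc)
    R = 0.5 * P.T                      # full-weighting restriction
    Ac = R @ A @ P                     # Galerkin coarse operator
    x = weighted_jacobi(A, x, b)       # pre-smooth
    rc = R @ (b - A @ x)               # restrict the residual
    ec = np.linalg.solve(Ac, rc)       # coarse-grid (direct) solve
    x = x + P @ ec                     # prolong and correct
    return weighted_jacobi(A, x, b)    # post-smooth

n = 63                                  # so the coarse grid has 31 points
A = laplacian(n)
b = np.ones(n)
x = np.zeros(n)
for _ in range(15):
    x = two_grid_cycle(A, b, x)
print(np.linalg.norm(b - A @ x))        # drops by a roughly constant factor per cycle
```

Replacing the direct coarse solve with a recursive call to the same cycle yields the usual multilevel V-cycle.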
Multigrid methods are part of a broader classification called multilevel methods which will be discussed
in more detail in Section 1.2.3. Briefly, multilevel methods generalize the idea of accelerating the
solution of a problem on a fine domain by using one or more smaller, coarser domains. Multigrid
methods, such as the full approximation storage (FAS) scheme, can also be applied to nonlinear problems.
Some excellent references for multigrid include works by Briggs et al. [47], Wesseling [48], Trottenberg
et al. [45], Wagner [49], Stuben [45,50,51], Lomax et al. [52], and a recent paper by Yavneh [53].
Linear multigrid is of particular interest in this research. There are two popular approaches to using
linear multigrid: geometric multigrid and algebraic multigrid. Geometric multigrid (GMG) is based on
using existing grid hierarchies to obtain coarse- and inter-grid operators. In algebraic multigrid (AMG),
these operators are obtained by using the matrix entries and no known grid hierarchy is needed. GMG
is simple to understand, but its implementation must be tailored to each problem. AMG is more difficult
to set up but can then be treated as a black box for many problems. AMG cannot generally match
geometric multigrid in speed; however, through its abstraction from the physical domain, AMG can
handle a wider range of problems than geometric multigrid, making it much more robust. In this
research, GMG is considered for three key reasons: AMG has a much higher setup cost that is difficult
to amortize over the linear iterations; it requires more memory; and the problems under investigation
here have been traditionally solved with GMG. Figure 1.1 provides a concise comparison of GMG and
AMG. There are other ways to classify linear multigrid methods. In the finite element community for
instance, one can compare h-multigrid (based on a grid hierarchy) to p-multigrid (based on the degree
of an element) to hp-multigrid.
Early contributors to multigrid include Southwell (as indicated by Saad [54]), Fedorenko [55–57],
¹A symmetric positive definite matrix, A, is a matrix that, in addition to being symmetric, satisfies xᵀAx > 0 for all x ≠ 0.
[Figure 1.1: Geometric versus algebraic multigrid. Geometric multigrid uses a fixed hierarchy based on
PDEs or geometry and operates on grid equations; algebraic multigrid uses a variable hierarchy based
on the algebraic system and operates on algebraic systems.]
and Bakhvalov [58]. Brandt [59] is generally credited as the first to use multigrid for practical
applications. Algebraic multigrid was introduced in the 1980s by pioneers such as Brandt [60,61] and
Ruge and Stuben [50,51,62]. Brandt [61] also credited McCormick as an early contributor to AMG.
Extensive research on the use of multigrid (both linear and nonlinear) in CFD applications has been
performed by researchers such as Jameson et al. [63–65], Mavriplis [66–68], Moinier and Giles [69], Zeng
and Wesseling [70], Allmaras [71], Weiss et al. [72], Griebel et al. [73], Ollivier–Gooch [74], Morano et
al. [75], Thomas et al. [76], Bordner and Saied [77], Lassaline [78,79], Manzano [80], and Chisholm [81].
Luksch [82] provides a brief online introduction to AMG. Raw [83,84] investigated AMG as a solver
for the 3D Navier–Stokes equations. Cleary et al. [85] investigated the robustness and scalability of
AMG for a broad range of problems. Brezina et al. [86] and Chartier [87] used AMG to solve problems
that are discretized by using finite elements, and Haase et al. [88] investigated the parallelization of such
applications.
1.2 Preconditioning
In this section, some of the most popular preconditioning methods are presented. These methods
introduce novel ideas and also inherit aspects of the linear solution methods already discussed. They include incomplete
factorizations, parallel preconditioners, and multilevel preconditioners. The use of multigrid, a specific
multilevel method, is discussed in detail in the context of preconditioning.
Krylov subspace methods have been adapted to facilitate more advanced preconditioners. These
preconditioners can themselves be methods and can vary from one Krylov subspace iteration to the
next. Flexible Krylov methods, that is, Krylov subspace methods with flexible preconditioning, emerged
in the early 1990s out of the need to handle these more advanced preconditioners. Examples include: the
flexible conjugate gradient method by Axelsson and Vassilevski [89]; flexible GMRES by Saad [90];
another flexible GMRES variant by van der Vorst and Vuik [91]; and flexible QMR by Szyld and
Vogel [92]. More recent examples include flexible Bi-CG and Bi-CGStab methods by Vogel [93] and a
flexible GCROT method by Hicken and Zingg [94].
1.2.1 Incomplete Factorizations
Incomplete factorizations date back to the early 1960s to Buleev [95], Varga [96], Oliphant [97, 98],
and Dupont [99]. In 1977, Meijerink and van der Vorst [13] investigated the incomplete Cholesky
(IC) factorization as a preconditioner for the CG method. This is regarded as the first instance of an
incomplete LU (ILU) factorization being used as a preconditioner. Note that an ILU factorization of an
M-matrix² is equivalent to an IC factorization. Manteuffel [100] investigated a similar preconditioner for
the conjugate gradient method for symmetric positive-definite (SPD) systems using a shifted incomplete
Cholesky factorization. Eisenstat [101] is also credited as having one of the earliest implementations of
an efficient form of ILU.
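The zero-fill ILU idea can be sketched directly. The following pedagogical Python/NumPy sketch stores the factors densely for clarity (production implementations work on sparse storage); on a tridiagonal M-matrix the pattern admits no fill, so ILU(0) coincides with the exact LU factorization:

```python
import numpy as np

def ilu0(A):
    """ILU(0): LU factors confined to the sparsity pattern of A.
    On return, the strict lower triangle of F holds L (unit diagonal
    implied) and the upper triangle holds U."""
    n = A.shape[0]
    F = A.copy()
    nz = (A != 0.0)                    # original sparsity pattern
    for i in range(1, n):
        for k in range(i):
            if not nz[i, k]:
                continue
            F[i, k] /= F[k, k]         # multiplier l_ik
            for j in range(k + 1, n):
                if nz[i, j]:           # fill outside the pattern is discarded
                    F[i, j] -= F[i, k] * F[k, j]
    return F

def ilu_solve(F, r):
    """Apply M^{-1} r via forward/backward substitution with L, U in F."""
    n = F.shape[0]
    y = r.copy()
    for i in range(n):                 # solve L y = r (unit lower triangular)
        y[i] -= F[i, :i] @ y[:i]
    x = y.copy()
    for i in range(n - 1, -1, -1):     # solve U x = y
        x[i] = (x[i] - F[i, i + 1:] @ x[i + 1:]) / F[i, i]
    return x

# On a tridiagonal M-matrix, ILU(0) equals the exact LU factorization,
# so the preconditioner solve recovers the exact solution.
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = ilu0(A)
b = np.ones(n)
print(np.linalg.norm(A @ ilu_solve(F, b) - b))
```

For matrices whose exact factors do generate fill, the discarded entries are what make M only an approximation to A, and the quality of that approximation is what the variants below try to control.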
A key issue for ILU is stability. In general, it is not known a priori whether an ILU factorization will
break down for a specific matrix. Meijerink and van der Vorst [13], Elman [102], and Bruaset et al. [103]
investigated the stability of ILU and concluded that ILU is only guaranteed to be stable for M-matrices.
Benzi [46] explained some measures that can be taken to improve ILU for more general matrices. His
focus was on reducing instability due to small pivots and unstable triangular solves. He introduced
measures of accuracy and stability.
Chisholm [4] used a condition number estimate (as well as other approaches) to measure the quality of
an incomplete factorization. Chow and Saad [104] experimented with various approaches for avoiding
instability in the ILU algorithm. Aspects such as pivoting, reordering, scaling, diagonal perturbation
and symmetrical preservation were explored. Recently, Gopaul et al. [105] investigated the stability of
ILU(0) for a nine-point high-order compact discretization of the convection-diffusion equation in two
dimensions.
There are several variants of ILU. They include the introduction of fill-in, the use of a drop tolerance,
modification of the diagonal, block representation, and parallel implementation. The fill-in of a matrix
refers to where entries of the factorization are located relative to the original matrix sparsity pattern.
The original ILU approach is a zero fill-in approach. A non-zero fill-in was introduced by Gustafsson [106]
and later generalized by Watts [107]. Meijerink and van der Vorst [108] are also early contributors to
this concept. Chapman et al. [109] investigated high-accuracy ILU preconditioners which use a larger
than usual amount of fill-in. Much like fill-in can be used to control the newly-introduced entries in
the factorization based on the sparsity pattern, a drop-tolerance strategy controls what new entries
are permitted based on size. Zlatev [110] first introduced the concept of using a threshold for ILU.
Other contributors to this concept include Young et al. [111], Gallivan et al. [112], and D’Azevedo
²An M-matrix is a matrix with positive diagonal elements and non-positive off-diagonal elements. Hence, its inverse only contains non-negative elements.
et al. [113, 114]. Saad [115] implemented a dual-parameter ILU strategy where both fill-in and drop-
tolerance control the entries of the factorization. Jones and Plassman [116] implemented a black-box
threshold-ILU strategy that automates the drop-tolerance strategy.
In fill-in and drop-tolerance ILU, entries that do not satisfy the retention criteria are simply
discarded. In modified ILU (or MILU), such entries are instead added (in some sense) to the main diagonal. MILU
dates back to the late 1970s, and key contributors include Gustafsson [106], van der Vorst [117], Axelsson
and Lindskog [118], and Elman [119]. Wittum and Liebau [120] introduced a so-called truncated ILU
which is somewhat related to MILU. Benzi [46] pointed out that MILU tends to perform poorly on
nonmodel problems because it is more susceptible to rounding errors. This conclusion was drawn based
on the work of van der Vorst [121].
Block variations of ILU cater to matrices that arise from the discretization of a system of PDEs.
Examples include Underwood [122], Concus et al. [123,124], Axelsson [125], Magolu [126], and Yun [127].
Block ILU, or BILU, is different from block-fill ILU, or BFILU. In block ILU, the blocks in the matrix
are treated as matrices and are inverted during the factorization. In block-fill ILU, a fill-in level of zero
is assigned to the block pattern, but the factorization is performed on the matrix that is populated by
scalars. Examples of implementations for the latter include Pueyo [5] and Orkwis [128]. Chisholm [4]
used a block ILU preconditioner in his compressible Navier–Stokes equations solver.
The quality of an ILU factorization is very sensitive to the ordering of the unknowns. However, finding
an ordering that minimizes fill-in is an NP-complete problem [129]. The orderings developed over the
years therefore settle for improving ILU for specific applications without necessarily attaining the
fill-minimizing ordering.
Heuristic ordering methods have been around since the 1950s. Ordering methods were originally
intended to minimize storage costs for direct solvers. This is achieved through the minimization of
fill-in. Orderings can be divided into two classes: graph-based and matrix-based.
Graph-based orderings include the original orderings intended to save memory for direct solvers.
The minimum degree algorithm is considered to be the earliest example. It dates back to the 1950s
to Markowitz [130], who looked at minimizing products within the elimination algorithm. Tinney and
Walker [131] later generalized this approach using graph theory and eliminated the column in the factor-
ization that has the fewest neighbours (i.e. minimum degree). Duff and Ucar [129] discussed the many
variations of the minimum degree ordering that have been examined over the years. The minimum degree
algorithms are inherently local since the choice of the next pivot is not connected to future elimination
steps.
Global (graph-based) ordering strategies enforce bounds on fill-in by applying strategic permutations
to the matrix. Examples (in chronological order) include: Rosen [132] ordering; Cuthill-McKee (CM)
ordering [133]; reverse Cuthill-McKee (RCM) ordering, by George [134]; nested dissection ordering by
George [135]; Gibbs ordering by Gibbs et al. [136]; Sloan ordering [137]; double ordering by Baumann
et al. [138]; snake ordering by Hassan et al. [139]; and orderings by Martin (found in [140]). Additional
ordering methods are presented in [141]. Of these orderings, the two most important would arguably be
nested dissection and RCM. The nested dissection ordering removes column(s) to decouple the system
matrix into two separate parts. It is inherently recursive and parallel. Hendrickson and Rothberg [142]
implemented a hybrid ordering of nested dissection and minimum degree. RCM attempts to minimize
the bandwidth (or profile) of the matrix, thus confining the fill-in to a narrower band. More recent
graph-based orderings use broader portions of the matrix graph, called cliques and supernodes [46]. The
latter leads to parallel implementation of ILU. A recent example of a transition to parallel ILU through
the use of supernodes is the work done by Henon et al. [143].
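The effect of RCM on bandwidth is easy to demonstrate. The sketch below uses SciPy's reverse_cuthill_mckee on a 5-point Laplacian whose natural ordering has been deliberately scrambled; the grid size and random seed are arbitrary, illustrative choices:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the nonzeros of sparse matrix A."""
    coo = A.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# 2D 5-point Laplacian on an m-by-m grid
m = 20
n = m * m
I = sparse.identity(m)
T = sparse.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
A = (sparse.kron(I, T) + sparse.kron(T, I)).tocsr()

rng = np.random.default_rng(0)
p = rng.permutation(n)                 # scramble the natural ordering
A_bad = A[p][:, p].tocsr()

perm = reverse_cuthill_mckee(A_bad, symmetric_mode=True)
A_rcm = A_bad[perm][:, perm]

# RCM confines the nonzeros (and hence the fill of an ILU) near the diagonal
print(bandwidth(A_bad), bandwidth(A_rcm))
```

A smaller bandwidth means the fill generated during (incomplete) factorization stays close to the diagonal, which is precisely why RCM is such a common companion to ILU.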
Since the 1990s, matrix-based orderings have increased in popularity. These orderings incorporate the
size of the entries in the matrix in some sense as well as the graph to the decision process for subsequent
pivots. Clift and Tang [144] compared several of these orderings. Benzi [46] discussed the permutation
of large entries to the main diagonal. The minimum discarded fill (MDF) ordering [113,114] introduced
by D’Azevedo, Forsyth, and Tang is of particular interest in this research. In this ordering, a pivot is
chosen so as to minimize the discarded fill-in for that pivoting step over all remaining candidate pivots.
The ordering is relatively slow compared to RCM but improvements and simplifications to the algorithm
have made it more competitive.
Persson and Peraire [145] recently implemented a reordering algorithm that is related to MDF and
that is used to form their ILU preconditioner for the Newton–Krylov solver for the Navier–Stokes equa-
tions. Their research is of particular interest and will be discussed in Section 1.3.1.
Numerous studies on orderings have been conducted over the years. Here, some important ones are
pointed out. Liu and Sherman [146] compared CM to RCM and found that RCM is at least as good as
CM in terms of storage and computational cost. They suggested it may have to do with the fact that
RCM minimizes the distance between the next vertex and its ordered neighbours, whereas CM minimizes
the distance between the next vertex and its unordered neighbours. Duff and Meurant [141] found that
for some simple SPD problems, the ICCG method is not improved by orderings such as minimum
degree and RCM unless a higher level of fill-in is allowed. However, Dutto [140] later investigated the
effect of various orderings (including minimum degree, (reverse) Cuthill-McKee, Gibbs and snake) for
incomplete factorization preconditioning for the compressible Navier-Stokes equations and found that
reordering the system favourably can greatly improve the quality of the preconditioner. She found
that for GMRES, an RCM reordering is an excellent choice. Clift and Tang [144] modified the RCM
algorithm so that nodes that tie are sorted by their ascending degree. Pueyo [5] further validated this
point in his comparison of various ordering strategies where he found RCM best improves his Newton–
Krylov flow solver’s performance. Benzi et al. [147] investigated the use of various orderings to construct
preconditioners for nonsymmetric linear problems, most notable of which was RCM. They used these
preconditioners for GMRES, Bi-CGSTAB and transpose-free QMR. Other orderings they considered
included CM and multiple minimum degree. ILU and variations were considered as preconditioners
for the iterative methods. They also determined that ordering methods originally intended for direct
methods perform competitively depending on how symmetric and diagonally dominant the system matrix
is. Pollul and Reusken [148] investigated orderings for preconditioners for an NK algorithm for the Euler
equations. Chisholm and Zingg [149] discussed the importance of root-node selection and tie-breaking
strategy for RCM for an NK algorithm for the compressible Navier–Stokes equations.
In the past decade, novel ordering methods have been investigated. For example, Bondarabady and
Kaveh [150] used a genetic algorithm to find an ordering that optimizes various graph properties. In
their survey paper, Duff and Ucar [129] discussed another family of preconditioners based on a support
graph. In this approach, combinatorics was used to determine the best reordering of vertices to generate
a matrix splitting.
1.2.2 Parallel Preconditioning
Parallel preconditioning has emerged as an important aspect of linear solvers. Numerous packages exist
that use parallel preconditioners and have been used widely in the applied mathematics community.
Of particular interest here are the parallel implementations of ILU preconditioners. Another branch of
inherently parallel preconditioners called sparse approximate inverses (SPAIs) gained popularity in the
1980s. In this section parallel ILU and SPAI preconditioners are discussed.
Originally, the parallelism of ILU was not obvious. It took investigations into the incomplete Cholesky
factorization (with a fill-in of zero) to change that opinion [46]. An example of this was the research
conducted by Dubois et al. [151]. They created a parallel preconditioner based on an approximation
to the inverse of a matrix. Two popular approaches to parallelizing ILU are through colouring (i.e.
ordering) and domain decomposition.
Various ordering and colouring techniques have been explored to parallelize ILU. George [135] in-
troduced the nested dissection ordering which is amenable to parallel applications. In the early 1980s,
van der Vorst [152] introduced his ordering, and it proved to be very scalable for the ICCG method. He
then focused on the parallelization of the forward and backward solves using a level scheduling or wave-
front approach [153]. Anderson and Saad [154] concurrently investigated a similar approach and found
good scalability of their algorithm. Elman and Golub [155] introduced their famous red/black ordering
in the early 1990s. Since then, more intricate colouring algorithms have been explored. For example,
Adams et al. [156] investigated a four-colour ordering strategy. More recently, Hysom and Pothen have
investigated the parallel application of ILU with zero fill-in [157] and with nonzero fill-in [158]. They
concluded that the algorithm is very scalable; however, in the latter case it is difficult to deal with
fill-in between subdomains.
The concept of domain decomposition first originated in the work by Schwarz [159] from as early as
the 1870s. He used it to prove the existence of the solution of the Dirichlet problem on irregular domains.
Many decades later, Miller [160] revisited the idea and applied it to solving systems of equations.
Variations in the way the restructured system is solved yield the classification into Schwarz, full matrix,
and Schur complement methods. Further details can be found in the review given by
Saad [38]. A basic mathematical introduction into domain decomposition is provided in Appendix A.1.
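A minimal additive Schwarz preconditioner illustrates the domain decomposition idea: the residual is restricted to overlapping subdomains, each local problem is solved independently (hence the parallelism), and the corrections are summed. The 1D model matrix and two-subdomain split below are illustrative assumptions, not the configuration of any solver discussed here:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# 1D Laplacian split into two overlapping subdomains (illustrative)
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

# Overlapping index sets: [0, 60) and [40, 100)
subs = [np.arange(0, 60), np.arange(40, 100)]
# Pre-factor each local (restricted) problem
local_inv = [np.linalg.inv(A[np.ix_(s, s)]) for s in subs]

def additive_schwarz(r):
    """M^{-1} r = sum_i R_i^T (R_i A R_i^T)^{-1} R_i r: solve each
    subdomain problem independently and add the corrections."""
    z = np.zeros_like(r)
    for s, Ainv in zip(subs, local_inv):
        z[s] += Ainv @ r[s]
    return z

M = LinearOperator((n, n), matvec=additive_schwarz)
x, info = gmres(A, b, M=M, restart=30, maxiter=500)
print(info, np.linalg.norm(A @ x - b))
```

Multiplicative Schwarz applies the subdomain solves sequentially instead, and a coarse-grid correction is usually added to make the iteration count independent of the number of subdomains.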
Domain decomposition methods have been a popular component of solvers for practical applications,
especially in the past 20 years. Saad and van der Vorst [9] provide an extensive review, whereas here only
some examples are provided. Mandel [161] investigated domain decomposition preconditioning for finite
element applications. He also found some qualitative analogies to multigrid. Knoll et al. [162] compared
a domain-based Schwarz preconditioner to ILU for compressible combustion problems and found it to
be superior. Fischer et al. [163] investigated the use of a Schwarz preconditioner using overlapping
pressure subdomains for the incompressible Navier–Stokes equations. Saad et al. [164] implemented a
parallel ILU preconditioning method using the Schur complement. They found that the recursive Schur
preconditioner is difficult to parallelize. Gropp et al. [165] investigated the use of domain decomposition
preconditioners for various parallel applications. Their work is an example of a Newton–Krylov–Schwarz
(NKS) method. More recently, Hicken and Zingg [166] investigated additive Schwarz and approximate
Schur preconditioners and ILU for the 3D simulation of inviscid aerodynamic flows.
The standard ILU preconditioner approximates the inverse of the system matrix indirectly: it
approximates the matrix and then applies the forward and back substitutions of its LU decomposition.
Sparse approximate inverse (SPAI) preconditioning is attractive because it constructs an approximation
to the inverse of the system matrix directly. Appendix A.2 outlines this form
of preconditioning in more detail. Benzi [46] provides an excellent discussion on SPAI preconditioners.
SPAI preconditioning traces back to work by Benson [167] and later Benson and Frederickson [168]
in the early 1980s. In its earliest formulation, the following minimization is executed:
    min_{M ∈ S} ||AM − I||                                                    (1.1)

where A is the system matrix, M is the approximate inverse preconditioner, and S is a predefined sparsity
pattern that limits the growth of fill in M. Various norms and approximations lead to a family of
methods called non-factored SPAIs. Additional breadth in the topic is created by the choice of S. The
minimization problem decouples column by column and is therefore conceptually parallel. Factored-form
SPAI preconditioners, which are based on the incomplete conjugation of the unit basis vectors, are referred
to as AINV (approximate inverse) preconditioners. The original contribution for these preconditioners
is by Benzi et al. [169]. The approach approximates the generalized Gram-Schmidt process used in
forming A = LDU. The AINV approach is sensitive to ordering, whereas its non-factored counterpart
is not [46].
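A minimal non-factored SPAI in the sense of the minimization (1.1) can be sketched as follows: with S fixed to the sparsity pattern of A, each column of M is obtained from a small, independent least-squares problem. The test matrix, the pattern choice, and the Frobenius-norm formulation are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

# Illustrative nonsymmetric tridiagonal system
n = 150
Ad = diags([-1.0, 3.0, -1.5], [-1, 0, 1], shape=(n, n)).toarray()
b = np.ones(n)

# Non-factored SPAI with S fixed to the sparsity pattern of A:
# each column m_j solves  min || A m_j - e_j ||_2  over the entries in S,
# a small independent least-squares problem (hence the parallelism).
M = np.zeros((n, n))
for j in range(n):
    pattern = np.nonzero(Ad[:, j])[0]      # allowed entries of column j
    # rows of A touched by those allowed entries
    rows = np.nonzero(np.any(Ad[:, pattern] != 0.0, axis=1))[0]
    e = (rows == j).astype(float)          # e_j restricted to those rows
    mj, *_ = np.linalg.lstsq(Ad[np.ix_(rows, pattern)], e, rcond=None)
    M[pattern, j] = mj

x, info = gmres(Ad, b, M=LinearOperator((n, n), matvec=lambda r: M @ r),
                restart=30, maxiter=500)
print(info, np.linalg.norm(Ad @ x - b))
```

Note that applying M is a plain matrix-vector product, with no triangular solves; this is what makes SPAI attractive on parallel hardware compared with ILU.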
Kolotilina and Yeremin [170,171] studied the use of approximate inverses as parallel preconditioners
for elliptic boundary value problems using the finite element method. Additional early research was
conducted by Grote and Simon [172], Cosgrove and Fowler [173], Cosgrove et al. [174], Chow and
Saad [175], and Huckle and Grote [176, 177]. Benzi et al. [169] investigated the effectiveness of an
approximate inverse preconditioner for the conjugate gradient method. Chow and Saad [178] and later
Barnard and Grote [179] implemented a block version of SPAIs. In a later study they found their SPAI
preconditioner was slower but more robust than ILU [180]. Sosonkina [181] implemented an approximate
LU technique SPAI preconditioner. Huckle [182] studied the effect of sparsity pattern restriction on the
approximate inverse for positive definite matrices.
SPAI preconditioners have been used for a variety of applications, such as electromagnetics [183,184].
Furthermore, SPAI preconditioners have been connected to other preconditioning methods or used as
components in other methods. Guillaume et al. [185] used a rational approximation preconditioner which
parallels the idea of sparse approximate inverses. Chow [186] connected SPAIs to Schur complements.
Tang and Wan [187] and Broker et al. [188] used a SPAI smoother for multigrid. The latter found that a
SPAI is an attractive alternative to Gauss-Seidel. Bollhoefer and Saad [189] demonstrated a connection
between SPAI and ILU for certain matrices.
SPAI preconditioning and many of the other preconditioning methods discussed in this dissertation
have blended into hybrids, and the distinctions between them are increasingly fading. Recent work by
Huckle et al. [190] is a good example. They improved the classical SPAI approach by using elements
from the Schur complement method, as well as other domain decomposition methods. The resulting
preconditioner was used in image deblurring applications.
1.2.3 Multilevel Preconditioning
A multilevel method seeks to solve a problem by coarsening or partitioning it and hence solving it on a
smaller domain. Multilevel preconditioning has been applied to ILU and SPAIs. Multilevel ILU can be
thought of as an ordering. The multigrid method is a multilevel method by definition. In this section
the history of these methods is reviewed.
Multilevel ILU first traces back to work by Axelsson and Vassilevski [191, 192]. They looked at a
multilevel ILU preconditioner for finite element applications. Less than a decade later, a famous paper
by van der Ploeg, Botta, and Wubs [193] was published. Their so-called nested grids ILU (NGILU)
method is based on the repeated use of red-black ordering. Botta and Wubs [194] also implemented
another technique called matrix renumbering ILU (MRILU) which reorders the matrix based on the size
of its elements. Saad [195] demonstrates in his paper on ILUM that multilevel ILU can be viewed as an
ordering. Furthermore, Vassilevski [196] found that ILU and AMG can be thought of as approximate
Schur complement methods. Bank and Wagner [197] drew a close comparison between multilevel ILU
and the classic multigrid algorithm for simple elliptic problems. The work by Saad and Zhang [198,199] is
somewhat related to the block multilevel ILU preconditioner developed by Botta and Wubs. They tested
these preconditioners on finite element applications. In the latter reference, they considered ILUT as the
base preconditioner and used a Schur complement approach. Later, Saad et al. [200–202] developed an
algebraic recursive multilevel solver (ARMS) and tested it on various CFD applications. They considered
various ordering strategies within their solver. Shen and Zhang [203] found their block multilevel ILU
preconditioner to be superior to a parallel two-level Schur preconditioner for some convection-diffusion
and Navier–Stokes matrices. Shen et al. [204] later improved on this preconditioner by increasing its
parallelism.
Recent work continues to improve multilevel ILU preconditioning and incorporates other precon-
ditioners. Gu et al. [205] used sparse approximate inverses in their multilevel ILU preconditioner.
Saad [206] improved his ILUM preconditioner by adding diagonal dominance as a criterion in the con-
struction of the multilevel ordering. Mayer [207] recently developed a dual-pivoting strategy including
a novel dropping strategy for a multilevel ILU preconditioner. This approach was found to have compa-
rable performance to preconditioning software developed by Bollhoefer and Saad [208] and Saad [201].
Sparse approximate inverses are inherently parallel. The incorporation of multilevel approaches,
however, is a more recent phenomenon. Examples of multilevel sparse approximate inverse implementations
include [209,210] for the SPAI method, and [211] for the AINV method.
Multigrid preconditioning has been used to accelerate standalone linear solvers and solvers that exist
within a larger nonlinear solution algorithm. Multigrid preconditioning for the Newton–Krylov method
is deferred to Section 1.3.1.
Axelsson and Vassilevski [212] provided a mathematical derivation for two-level methods and applied
their method to symmetric and nonsymmetric problems. Oosterlee and Washio [213] developed a parallel
multigrid preconditioner for problems exhibiting multiple scales. Braess [214] investigated AMG as a
preconditioner for linear systems with positive definite matrices. Hager and Lee [215] showed that
GMRES preconditioned with ILU and MG is efficient in solving the Euler equations. Oliveira and
Deng [216] compared variations of MG preconditioning for GMRES and CGS to ILU(0) preconditioning
for solving transport equations. Oosterlee and Washio [213, 217–219] investigated the use of AMG as a preconditioner in parallel computing applications that are solved using domain decomposition methods. Wienands et al. [220] provided a Fourier analysis for GMRES preconditioned by multigrid
with Gauss–Seidel as its smoother. The model problems they explored were diffusive in nature including
the Poisson equation in 3D and a case with mixed derivatives. Tuminaro et al. [221] implemented a
two-level Schwarz preconditioner for Navier–Stokes calculations on unstructured meshes. More recently,
Wang and Joshi [222] developed an agglomeration-type AMG preconditioner for a finite-volume Krylov
solver for 3D incompressible viscous channel and cavity flows. Pennachio [223] used AMG preconditioning
to accelerate solutions for the reaction-diffusion equation.
1.3 The Newton–Krylov Method
The Newton–Krylov (NK) method is a popular choice for solving the discretized compressible Navier–
Stokes equations; it dates back to the mid-1980s. Pueyo [5] provides an excellent description of the early history of Newton–Krylov methods. Wigton et al. [224] are thought to have been the first to implement the NK method; they modeled inviscid 2D flows.
Later, Venkatakrishnan [225] used the method to solve inviscid and viscous aerodynamic problems.
Dutto [140] and Johan et al. [226] applied the method to inviscid and laminar viscous flows. Ajmani et
al. [227,228] used the NK method to model hypersonic flow around a cylinder and transonic flow through
a turbine. Venkatakrishnan and Mavriplis [229] used ILU preconditioning in their Newton-GMRES
algorithm to solve the discretized steady Navier–Stokes equations. Orkwis [128] compared Newton,
quasi-Newton, and inexact-Newton methods coupled with CGS to solve the Navier–Stokes equations.
McHugh and Knoll [230] compared matrix-free and matrix-present approaches in the treatment of the
system Jacobian. Barth and Linton [231] used the NK method to solve turbulent cases in 3D with the
aid of grid sequencing. Nielsen et al. [232] solved the Euler equations on 3D unstructured domains.
Cai et al. used a Newton–Krylov–Schwarz (NKS) method for inviscid calculations. Anderson [233, 234]
compared the NK method to multigrid for inviscid turbulent calculations and found multigrid to be faster.
Dawson [235] modeled multiphase flows using the NK method. Wille [236] investigated a globalization
strategy based on mesh sequencing for his ILU-preconditioned NK solver. Blanco and Zingg [237] used a
block ILU preconditioner based on a low-order approximation to the system Jacobian for their Newton-
Krylov method on unstructured grids. The preconditioner was reordered using RCM. Similarly, Pueyo
and Zingg [5, 8, 238] used a block-fill ILU preconditioner on a low-order Jacobian for their NK method
on structured grids, resulting in a very fast algorithm. They used approximate factorization to assist in
the globalization. Geuzaine [6,239,240] used the NK method on unstructured grids to solve the Navier–
Stokes equations coupled with the Spalart-Allmaras turbulence model. Gropp et al. [241] investigated
a globalization strategy for their NKS solver. Chisholm and Zingg [4, 149, 242–245] implemented an
effective NK solver and resolved many issues pertaining to the globalization of the method, especially
in terms of the Spalart–Allmaras turbulence model. Nemec and Zingg [2, 246–249, 249–252] extended
the work of Pueyo and developed an optimization framework for 2D turbulent airfoil design. Their
Krylov solver used for Newton’s method was also applied to the gradient evaluation problem. Gatsis
and Zingg [253, 254] created a novel, fully-coupled algorithm for aerodynamic shape optimization in
which the Navier–Stokes equations, adjoint equations and optimality conditions (i.e. Karush–Kuhn–
Tucker conditions) were solved using a Newton–GMRES algorithm. The flow system was solved only
once in their algorithm. Olawsky [255] applied the NK method to supersonic and hypersonic flows.
Harrison [256] used the method in modeling chemistry. Vanderchkove [257] implemented the NK method
for other physics applications. Nichols [258,259] extended the NK method to a structured solver for the
3D Euler equations. Groth and Northrup [260] implemented an NKS algorithm for 2D, steady Euler
calculations. They used a block, parallel additive Schwarz preconditioner with ILU on local blocks.
Bellavia and Berrone [261] improved the globalization for the NK method for Navier–Stokes calculations
using a finite element discretization. Nejat, Michalak, and Ollivier–Gooch [262–266] implemented an
NK method for an Euler solver that uses a high-order spatial discretization. Hicken and Zingg [267]
improved on the work of Nichols and developed a optimization framework and parallel solver for the
3D Euler equations. Northrup and Groth [268] extended their parallel, adaptive mesh refinement NK
algorithm to 3D, large-eddy simulation (LES) for reactive flows. Osusky and Zingg [269, 270] extended
Hicken’s algorithm to laminar and turbulent viscous calculations in 3D. Lucas, van Zuijlen, and Bijl
recently investigated the use of a Jacobian-free NK algorithm for unsteady flows [271]. They found their
algorithm to be superior to nonlinear multigrid, especially for more difficult cases.
1.3.1 Multigrid Preconditioning
Multigrid preconditioning is of particular interest in this research. An early record of multigrid precon-
ditioning for a Newton–Krylov method is Brieger and Lecca [272]. They used a parallel implementation
of multigrid to solve a subsurface hydrology problem. Piquet and Vasseur [273] used a simple multigrid
preconditioner to solve the 3D incompressible Navier–Stokes equations.
In 1999, Geuzaine et al. [239] compared their finite-volume nonlinear multigrid solver to a newly-
developed NK solver with ILU(0)-smoothed multigrid preconditioning. The incomplete factorization
was on a low-order discretization of the system Jacobian. They found the two solvers to be competitive
for Euler and Navier–Stokes calculations. Concurrently, Knoll et al. used a multigrid preconditioner for their NK solver for the radiation-diffusion equation and found that their NK-MG algorithm was superior to NK-ILU and nonlinear multigrid. Knoll et al. [274, 275] extended the solver to multimaterial
equilibrium radiation diffusion.
Knoll and Mousseau [276] implemented a NK solver with AMG preconditioning for a finite-volume
incompressible Navier–Stokes solver. They used a block-SGS smoother. Knoll and Rider [277] used a
geometric multigrid preconditioner for their incompressible Navier–Stokes flow solver. They also solved
Burgers' equation in 1D and 2D. Damped Jacobi and SGS were used as smoothers, and a low-order discretization of the convective terms in the smoother was found to be effective.
In the past decade, there has been continued research into multigrid-preconditioned Newton–Krylov
methods. Jones and Woodward [278] used a red-black Gauss–Seidel smoother for the NK-MG algorithm
to solve Richards' equation for variably saturated flow. Pernice and Tocci [279] solved the incompressible Navier–
Stokes equations using the pressure-correction method as a smoother. Mavriplis [68] continued his
earlier research by looking at line Jacobi and Gauss–Seidel smoothers. He solved the radiation diffusion
equation as well as the Navier–Stokes equations. Wu et al. [280] used a NK method with multigrid
preconditioning, smoothed by a tridiagonal matrix, for battery simulation problems. Syamsudhuha and
Silvester [281] solved the Navier–Stokes equations using a NK-MG method. Elman et al. [282] solved
the incompressible Navier–Stokes equations. In 2004, Knoll and Keyes [283] published a review paper
on Jacobian–free Newton–Krylov methods including the use of multigrid as a preconditioner.
The discontinuous-Galerkin (DG) finite-element formulation has gained popularity in recent years for
solving the Navier–Stokes equations. Persson and Peraire [145] and Diosady and Darmofal [284] looked
to improve their DG algorithms by using a Newton–Krylov method. Persson and Peraire compared block-Jacobi, block-Gauss–Seidel, and a multilevel preconditioner and found them not to be robust for real
flows. They implemented a multigrid preconditioner that was superior to the aforementioned ones. The
preconditioner consisted of a coarse-scale correction based on a low-order polynomial (p-multigrid) and a
block-ILU(0) post smoother. Ordering was critical to the performance of the method. Specifically, they
used MDF-like ordering. They concluded that the multigrid correction was important for diffusion and
ILU was important for convection; however, minimal theoretical explanation was provided. Diosady and
Darmofal explored additional orderings that are better-suited to the unstructured computational space.
Specifically, line orderings (in the streamwise direction) were found to be important. These orderings
were found to improve the effectiveness of their Jacobi and block-ILU(0) smoothers.
1.4 Motivation and Objectives
The scope of this thesis is to investigate preconditioning in the context of a flow solution algorithm.
In particular, the algorithm in this research uses a finite difference formulation on structured grids.
The most general equations that are solved are the discretized, compressible, thin-layer Navier–Stokes
equations with the one-equation Spalart–Allmaras turbulence model. The desired properties of this solver
are fast and reliable simulation of steady flow around wing sections and the prediction of aerodynamic
forces and moments around those shapes. This algorithm also serves as a function evaluation in a
gradient-based optimization framework, in which gradient evaluations require the solution of large linear
systems that are closely-related to the linear systems in the flow solver.
Preconditioning of the linear system is arguably the most important component in the Newton-Krylov
algorithm. For practical aerospace applications, an unpreconditioned linear system will not converge if it
is solved iteratively with GMRES. Furthermore, preconditioning encompasses orderings, which can greatly impact the solution process.
A Newton–Krylov method is used to solve the nonlinear system of equations. However, Newton's method requires a continuation approach to progress safely from the transient phase to the steady state. The baseline continuation method is approximate factorization (AF). Although AF is very robust through the
transient phase, using it is undesirable since it is essentially a second flow solution algorithm that must
be maintained. Therefore, a pseudo-transient continuation method was implemented to globalize the
Newton algorithm. The approach followed the work of Chisholm [149].
The baseline preconditioner is the inverse of an incomplete LU factorization of the matrix in the
linear system. For simplicity, it is referred to as an ILU preconditioner. ILU preconditioning has its
drawbacks: it scales poorly with increasing problem size; for practical applications it is not guaranteed
to be stable; and it is sensitive to the ordering of the system matrix. The baseline ordering that is
used is reverse Cuthill–McKee (RCM). RCM is first performed and subsequently the ILU factorization
is determined.
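As a rough illustration of this baseline recipe, reordering with RCM before computing an incomplete factorization that preconditions GMRES, the following sketch uses SciPy's generic sparse tools on a small stand-in matrix; it is not the block ILU implementation of this thesis, and the tridiagonal test matrix is purely illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spilu, gmres, LinearOperator

# Stand-in system: a 1D convection-diffusion-like matrix, not a flow Jacobian.
n = 50
A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

# Step 1: reverse Cuthill-McKee ordering of the matrix graph.
perm = reverse_cuthill_mckee(A, symmetric_mode=False)
Ap = A[perm, :][:, perm].tocsc()

# Step 2: incomplete LU of the reordered matrix as the preconditioner.
ilu = spilu(Ap, drop_tol=1e-4, fill_factor=10)
M = LinearOperator((n, n), matvec=ilu.solve)

# Step 3: preconditioned GMRES on the permuted system.
xp, info = gmres(Ap, b[perm], M=M)
x = np.empty(n)
x[perm] = xp  # map the solution back to the original ordering
```

The key point is the sequencing: the permutation is fixed first, and the incomplete factorization is then computed on the permuted matrix, since the fill discarded by ILU depends strongly on the ordering.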
The foremost objective of this research is to investigate preconditioning. First a review of precon-
ditioning methods is conducted. From this review, promising candidate preconditioners are selected
and compared to the baseline preconditioner in order to determine which is the best preconditioning
approach.
The literature review led to two promising approaches. The first is the use of a multigrid
preconditioner. The second is the use of an integrated ordering/factorization strategy, in which the
ordering and the factorization are determined in a coupled manner. The minimum discarded fill (MDF)
ordering is an example of such an approach. Thus, the goal of this thesis is to determine from all
combinations of preconditioning and ordering which is the best approach to precondition the Navier–
Stokes flow solver of interest. There are additional variations in preconditioning and investigations that
are conducted to support the decision process.
The second objective of this thesis is to understand on a more fundamental level the effect of these
preconditioners and orderings. To facilitate this, a PDE solver is developed that solves the steady
convection-diffusion equation. All preconditioning investigations are first conducted using the convection-
diffusion solver, followed by the Navier–Stokes solver.
Superior preconditioning can greatly improve the flow solution process. To summarize, the objective
of this thesis is to identify candidate preconditioners, to determine the best possible preconditioner from
a set of preconditioners (including ILU), to perform a more fundamental investigation of preconditioning
using the discretized convection–diffusion equation, and finally to conduct supporting studies that enrich
the understanding of preconditioning on a more fundamental level.
1.5 Organization of this Thesis
In the next chapter, the governing equations are described in detail. Specifically, the thin-layer, com-
pressible Navier–Stokes equations are described, including the boundary conditions and the curvilinear
coordinate transformation that are used. The Spalart–Allmaras turbulence model PDE and the steady
convection-diffusion equation are also introduced.
In Chapter 3, the spatial discretization used to transform the partial differential equations into a
nonlinear system of ordinary differential equations is presented. This includes the boundary equations
and the turbulence model. The linearization of this system is also described, since Newton’s method is
used to solve this system.
In Chapter 4, the Newton–Krylov algorithm is described. Important aspects such as continuation
of the transient behaviour of the nonlinear variables and a thorough description of the GMRES Krylov
subspace method are presented. Preconditioning is essential to the performance of GMRES. Since
preconditioning is the focus of this research, the subsequent chapter is dedicated entirely to it.
There are several aspects of preconditioning that are described in Chapter 5. These aspects can be
summarized in three categories: incomplete LU preconditioning, orderings, and multigrid precondition-
ing. Preconditioners that are developed are summarized by detailed formulas or algorithms, supported
by theoretical exploration where possible.
Chapter 6 is the results chapter. The reader is encouraged to read Chapter 5 in detail to have a
complete understanding of the results presented in this chapter. Chapter 6 is divided into two distinct
components. First, the results relating to the convection-diffusion equation are presented. Next, the
results for the Navier–Stokes equations are described. For each respective solver, test cases are introduced
and various preconditioners are tested for those cases.
In Chapter 7 conclusions are summarized for both the convection-diffusion and Navier–Stokes equa-
tions. The performance of each preconditioner is assessed. A detailed summary of contributions made
follows these conclusions. Finally, recommendations are made for future research to extend the ideas
presented in this research.
Chapter 2
GOVERNING EQUATIONS
In this chapter the governing equations that are used to model aerodynamic flows are presented. Specif-
ically, these are the compressible, thin-layer Navier–Stokes equations coupled with the Spalart–Allmaras
one-equation turbulence model. A curvilinear coordinate transformation is applied to this nonlinear sys-
tem of partial differential equations to facilitate finite-difference calculations based on curvilinear grids.
Boundary conditions are also described.
The convection-diffusion equation is a simpler, linear partial differential equation that governs similar
physical processes to the Navier–Stokes equations: convection and diffusion. To facilitate some expla-
nations and investigations that are discussed later in this dissertation, this chapter concludes with the
convection-diffusion equation, supplemented by related theory, definitions, and basic insights.
2.1 The Navier–Stokes Equations
Before presenting the governing equations, the dimensional variables, density, ρ, velocities, u and v, and
total energy, e, are scaled using the free-stream density, ρ∞, and sound speed, a∞:
ρ =ρ
ρ∞u =
u
a∞v =
v
a∞e =
e
ρ∞a2∞
(2.1)
In two dimensions, the compressible Navier–Stokes equations are
∂Q/∂t + ∂E/∂x + ∂F/∂y = (1/Re)(∂Ev/∂x + ∂Fv/∂y)    (2.2)

where

Q = [ρ, ρu, ρv, e]ᵀ    (2.3)

and Re is the Reynolds number. The convective flux vectors are

E = [ρu, ρu² + p, ρuv, u(e + p)]ᵀ  and  F = [ρv, ρvu, ρv² + p, v(e + p)]ᵀ    (2.4)
The viscous flux vectors are

Ev = [0, τxx, τxy, ϕ1]ᵀ  and  Fv = [0, τxy, τyy, ϕ2]ᵀ    (2.5)

with

τxx = (µ + µt)(4ux − 2vy)/3
τxy = (µ + µt)(uy + vx)
τyy = (µ + µt)(−2ux + 4vy)/3    (2.6)
ϕ1 = uτxx + vτxy + (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ∂x(a²)
ϕ2 = uτxy + vτyy + (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ∂y(a²)
where the variables τxx, τxy, and τyy are elements of the symmetric viscous stress tensor. A Newtonian fluid is assumed. The dynamic viscosity is µ, and its eddy-viscosity counterpart for the turbulent formulation is µt. Pr is the Prandtl number. The ratio of specific heats is γ = cp/cv, which has a value of 1.4 for air. The pressure, p, is related to the flow variables by the equation of state for a perfect gas:

p = (γ − 1)[e − ½ρ(u² + v²)]    (2.7)

The speed of sound is

a = √(γp/ρ) = √(γRT)    (2.8)
Sutherland's law is used to relate the dynamic viscosity, µ, to temperature:

µ = a³(1 + S*/T∞)/(a² + S*/T∞)    (2.9)

where T∞ denotes the freestream temperature, assumed to be 460.0 °R, and the constant S* is 198.6 °R for air. The non-dimensional, laminar Prandtl number is defined as

Pr ≡ cpµ/κt    (2.10)

where κt denotes thermal conductivity. The laminar and turbulent Prandtl numbers are 0.72 and 0.90, respectively. The Reynolds number is defined as

Re ≡ ρ∞ c a∞/µ∞    (2.11)
2.1.1 Generalized Curvilinear Coordinate Transformation
A curvilinear coordinate transformation is used to map the physical grid space onto a uniform computa-
tional domain. This is illustrated in Figure 2.1. A C-topology grid that is used in this work is shown in
Figure 2.2. The generalized transformation involves the introduction of two new directions and a time
parameter given by
τ = t
ξ = ξ(x, y, t) (2.12)
η = η(x, y, t)
The governing equations now operate on the state vector
Q = J−1Q = J−1
ρ
ρu
ρv
e
(2.13)
where the metric Jacobian of the transformation is given by
J−1 = xξyη − xηyξ (2.14)
2.1.2 Thin-Layer Approximation
For attached or mildly separated aerodynamic flows at high Reynolds numbers, the compressible Navier–
Stokes equations can be simplified by using a thin-layer approximation. This is because viscous effects
that occur in the streamwise direction along the body are much smaller when compared to those that
occur normal to the body. The compressible, thin-layer Navier–Stokes equations are
∂Q̂/∂τ + ∂Ê/∂ξ + ∂F̂/∂η = Re⁻¹ ∂Ŝ/∂η    (2.15)
Figure 2.1: Curvilinear coordinate transformation, courtesy of Lomax, Pulliam, and Zingg [1].

Figure 2.2: A C-topology grid about a NACA0012 airfoil (units are in chord lengths): (a) full-grid view; (b) close-up of grid.
where the convective flux vectors are
Ê = J⁻¹[ρU, ρUu + ξxp, ρUv + ξyp, (e + p)U − ξtp]ᵀ,  F̂ = J⁻¹[ρV, ρVu + ηxp, ρVv + ηyp, (e + p)V − ηtp]ᵀ    (2.16)
the contravariant velocities are

U = ξt + ξxu + ξyv    (2.17)

V = ηt + ηxu + ηyv    (2.18)

and the viscous flux vector is

Ŝ = J⁻¹[0, ηxm1 + ηym2, ηxm2 + ηym3, ηx(um1 + vm2 + m4) + ηy(um2 + vm3 + m5)]ᵀ    (2.19)

with

m1 = (µ + µt)(4ηxuη − 2ηyvη)/3
m2 = (µ + µt)(ηyuη + ηxvη)
m3 = (µ + µt)(−2ηxuη + 4ηyvη)/3    (2.20)
m4 = (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ηx ∂η(a²)
m5 = (µPr⁻¹ + µtPrt⁻¹)(γ − 1)⁻¹ ηy ∂η(a²)
2.2 The Spalart–Allmaras Turbulence Model
The dynamic eddy viscosity, µt, in (2.6) accounts for the effects of turbulence. The Spalart–Allmaras turbulence model [285] is used to determine the value of µt. This one-equation transport model, written in non-dimensional and non-conservative form, is given by

Dν̃/Dt = (cb1/Re)(1 − ft2) S̃ ν̃ + (1/(σRe)){(1 + cb2) ∇·[(ν + ν̃)∇ν̃] − cb2 (ν + ν̃)∇²ν̃} − (1/Re)(cw1fw − (cb1/κ²) ft2)(ν̃/dw)² + Re ft1 ΔU²    (2.21)

where ν̃ is the non-dimensional working variable. The kinematic eddy viscosity, νt = µt/ρ, is obtained from

νt = ν̃ fv1    (2.22)

where

fv1 = χ³/(χ³ + cv1³)    (2.23)

and

χ = ν̃/ν    (2.24)

The production term is given by

S̃ = S Re + (ν̃/(κ²dw²)) fv2    (2.25)
where

S = |∂v/∂x − ∂u/∂y|    (2.26)

is the magnitude of the vorticity, dw is the distance to the closest wall, and

fv2 = 1 − χ/(1 + χfv1)    (2.27)

The destruction function is given by

fw = g[(1 + cw3⁶)/(g⁶ + cw3⁶)]^(1/6)    (2.28)

where

g = r + cw2(r⁶ − r)    (2.29)

and

r = ν̃/(S̃κ²dw²)    (2.30)
The functions ft1 and ft2 control transition; for fully-turbulent flow, these functions are zero. For flow with transition, they become

ft1 = ct1 gt exp[−ct2 (ωt²/ΔU²)(d² + gt² dt²)]    (2.31)

ft2 = ct3 exp(−ct4 χ²)    (2.32)

where dt is the distance to the nearest trip point, ωt is the vorticity at the wall at the trip point, ΔU is the difference between the velocity at the trip point and at the field point under consideration, and gt = min(0.1, |ΔU|/(ωt∆x)), where ∆x is the spacing along the wall at the trip point. The remaining parameters are given by

cb1 = 0.1355,  cb2 = 0.622,  κ = 0.41,  σ = 2/3
cw1 = cb1/κ² + (1 + cb2)/σ,  cw2 = 0.3,  cw3 = 2.0,  cv1 = 7.1,  cv2 = 5.0
ct1 = 5,  ct2 = 2,  ct3 = 1.2,  ct4 = 0.5
Details about these parameters can be found in [286]. Currently, the algorithm presented is for fully-
turbulent flow.
Ashford's [287] suggested modifications to the Spalart–Allmaras turbulence model are implemented. The quantity fv2 is redefined as

fv2 = (1 + χ/cv2)⁻³    (2.33)

and a new quantity, fv3, is introduced:

fv3 = (1 + χfv1)(1 − fv2)/χ    (2.34)

The modified production term, S̃, is given by

S̃ = S Re fv3 + (ν̃/(κ²dw²)) fv2    (2.35)
2.2.1 Generalized Curvilinear Coordinate Transformation
In the transformation of (2.21), any terms containing mixed derivatives are neglected. Using (2.12), the Spalart–Allmaras turbulence model becomes

∂ν̃/∂τ + U ∂ν̃/∂ξ + V ∂ν̃/∂η = (1/Re){cb1 S̃ ν̃ − cw1fw (ν̃/dw)² + (1/σ)[(1 + cb2)T1 − cb2T2]}    (2.36)
where

T1 = ξx ∂/∂ξ[(ν + ν̃) ξx ∂ν̃/∂ξ] + ηx ∂/∂η[(ν + ν̃) ηx ∂ν̃/∂η] + ξy ∂/∂ξ[(ν + ν̃) ξy ∂ν̃/∂ξ] + ηy ∂/∂η[(ν + ν̃) ηy ∂ν̃/∂η]    (2.37)

and

T2 = (ν + ν̃)[ξx ∂/∂ξ(ξx ∂ν̃/∂ξ) + ηx ∂/∂η(ηx ∂ν̃/∂η) + ξy ∂/∂ξ(ξy ∂ν̃/∂ξ) + ηy ∂/∂η(ηy ∂ν̃/∂η)]    (2.38)
2.3 Boundary Conditions
The boundary conditions must be specified for the entire computational domain. Examples of bound-
ary conditions include: inflow, outflow, body, and boundary interface. Inflow and outflow boundary
calculations are performed using Riemann invariants and/or extrapolations.
For inviscid flow, flow tangency is enforced at a solid wall. For viscous flow, a no-slip condition is required, and the normal pressure gradient is set to zero. Under the assumption of adiabatic flow, the latter condition also enforces a zero normal density gradient.
Finally, for a C-topology mesh wake cut, the interfaces are averaged in the normal direction. Each
conservative variable is averaged except for the energy: pressure is averaged instead.
For lifting bodies a circulation correction is used to minimize the far-field boundary effects. The
details of this correction are described by Pulliam [288].
Numerical implementation of the boundary conditions is described in Section 3.3.
2.4 The Convection-Diffusion Equation
The linear convection-diffusion equation describes the evolution of a scalar quantity φ subject to the
processes of convection and diffusion. If a source term is also considered, then
∂φ/∂t + ∇·[v⃗φ − µ∇φ] = f    (2.39)
The divergence operator acts on two terms. The first term describes convection and therefore contains a
velocity vector, ~v. The second term models diffusion. In this formulation, it contains a spatially-varying
diffusion coefficient, µ > 0. Finally, f is a source term.
For fixed µ and v⃗, the Peclet number is defined as

Pe = |v⃗|L/µ    (2.40)

where L is a length scale of the problem. It is a measure of the relative strength of convection to diffusion: for convection-dominated flows, Pe → ∞, and for diffusion-dominated flows, Pe → 0.
2.4.1 The Steady 1D Convection-Diffusion Equation
Consider the steady, one-dimensional convection-diffusion equation with no source term:

u dφ/dx − µ d²φ/dx² = 0    (2.41)

Furthermore, assume that the velocity and the diffusion coefficient are constant. The solution of (2.41) on the domain x ∈ [0, 1] with Dirichlet boundary conditions φ(0) = 0 and φ(1) = 1 is

φ(x) = (e^(Pe·x) − 1)/(e^Pe − 1)    (2.42)

where the Peclet number is Pe = u/µ. Figure 2.3 shows this solution for various Peclet numbers. When the Peclet number is small, the solution is dominated by diffusion. When the Peclet number is large, there is a convection-dominated solution with a thin boundary-layer-like region of diffusion.
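The exact solution (2.42) is easy to evaluate numerically; a minimal sketch follows. Using expm1 keeps the formula accurate in the small-Pe limit, where φ(x) → x.

```python
import numpy as np

def phi_exact(x, Pe):
    # Exact solution (2.42) of the steady 1D convection-diffusion equation
    # on [0, 1] with phi(0) = 0 and phi(1) = 1.
    return np.expm1(Pe * x) / np.expm1(Pe)

print(phi_exact(0.5, 1e-8))   # ~0.5: nearly linear, diffusion dominated
print(phi_exact(0.9, 100.0))  # ~0: sharp boundary layer near x = 1
```

This reproduces the behaviour in Figure 2.3: for small Pe the profile is nearly linear, while for large Pe the solution stays near zero until a thin layer adjacent to x = 1.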
2.4.2 The Steady 2D Convection-Diffusion Equation
The steady convection-diffusion equation (2.39) in 2D Cartesian coordinates is

∂/∂x(a ∂φ/∂x) + ∂/∂y(b ∂φ/∂y) + ∂/∂x(cφ) + ∂/∂y(dφ) + eφ = f    (2.43)

where the coefficients a, b, c, d, e, and f are functions of x and y. In particular, the terms containing a and b model the process of diffusion, and the terms containing c and d model the process of convection. The velocity field, v⃗ = [c, d], is assumed to be divergence free,

∂c/∂x + ∂d/∂y = 0    (2.44)

and a = b = µ(x, y). The term eφ is included as a generalization. Dirichlet boundary conditions are specified on the upstream boundaries.
Figure 2.3: The solution to the 1D convection-diffusion equation for several Peclet numbers (Pe = 0.01, 1, 10, and 100).
Coordinate transformation
A curvilinear coordinate transformation is also used here to transform (2.43) from the physical domain
(x,y) to a uniform, computational domain (ξ,η). Using the chain rule, the first derivatives can be written
as
∂/∂x = ξx ∂/∂ξ + ηx ∂/∂η    (2.45)

∂/∂y = ξy ∂/∂ξ + ηy ∂/∂η    (2.46)
The work of Pulliam [288] is followed to obtain the metrics of the transformation:
ξx = Jyη (2.47)
ηx = −Jyξ (2.48)
ξy = −Jxη (2.49)
ηy = Jxξ (2.50)
where J is the metric Jacobian of the transformation defined in (2.14).
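The metric relations (2.47)-(2.50) can be sketched numerically. The following is a minimal illustration that approximates the grid derivatives with centred differences via np.gradient, assuming unit spacing in the computational coordinates; the sheared test grid is hypothetical.

```python
import numpy as np

def metrics(x, y):
    """Metric terms (2.47)-(2.50) from node coordinates x(j,k), y(j,k),
    with axis 0 of the arrays taken as xi and axis 1 as eta."""
    x_xi, x_eta = np.gradient(x)
    y_xi, y_eta = np.gradient(y)
    J = 1.0 / (x_xi * y_eta - x_eta * y_xi)  # reciprocal of (2.14)
    xi_x = J * y_eta      # (2.47)
    eta_x = -J * y_xi     # (2.48)
    xi_y = -J * x_eta     # (2.49)
    eta_y = J * x_xi      # (2.50)
    return xi_x, xi_y, eta_x, eta_y, J

# A hypothetical sheared grid: x = xi, y = eta + 0.5*xi.
jj, kk = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing='ij')
xi_x, xi_y, eta_x, eta_y, J = metrics(jj, kk + 0.5 * jj)
```

For this linear grid the centred differences are exact, so the metrics are constant: ξx = 1, ηx = −0.5, ξy = 0, ηy = 1, and J = 1.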
The terms in (2.43) are written in the computational domain’s coordinates using (2.47-2.50). For
simplicity, it helps to consider the diffusive terms together and the convective terms together:
∂/∂x(a ∂φ/∂x) + ∂/∂y(b ∂φ/∂y) = ∂/∂ξ(ā ∂φ/∂ξ) + ∂/∂ξ(ḡ ∂φ/∂η) + ∂/∂η(ḡ ∂φ/∂ξ) + ∂/∂η(b̄ ∂φ/∂η)    (2.51)

∂/∂x(cφ) + ∂/∂y(dφ) = ∂/∂ξ(c̄φ) + ∂/∂η(d̄φ)    (2.52)
where

ā = ξx²a + ξy²b    (2.53)
b̄ = ηx²a + ηy²b    (2.54)
ḡ = ξxηxa + ξyηyb    (2.55)

and

c̄ = ξxc + ξyd    (2.56)
d̄ = ηxc + ηyd    (2.57)

are the contravariant velocities.
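The coefficient transformations (2.53)-(2.57) are purely algebraic and can be sketched directly; this is a minimal illustration operating pointwise (scalars or NumPy arrays), with hypothetical coefficient values used only for checking.

```python
def transformed_coefficients(a, b, c, d, xi_x, xi_y, eta_x, eta_y):
    # Diffusion coefficients (2.53)-(2.55) and contravariant velocities
    # (2.56)-(2.57) of the convection-diffusion equation in computational
    # coordinates; inputs may be scalars or same-shaped NumPy arrays.
    a_bar = xi_x**2 * a + xi_y**2 * b        # (2.53)
    b_bar = eta_x**2 * a + eta_y**2 * b      # (2.54)
    g_bar = xi_x * eta_x * a + xi_y * eta_y * b  # (2.55)
    c_bar = xi_x * c + xi_y * d              # (2.56)
    d_bar = eta_x * c + eta_y * d            # (2.57)
    return a_bar, b_bar, g_bar, c_bar, d_bar
```

A quick check: for identity metrics (ξx = ηy = 1, ξy = ηx = 0), the transformed coefficients reduce to the physical ones and the cross term ḡ vanishes.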
Chapter 3
SPATIAL DISCRETIZATION
The thin-layer compressible Navier–Stokes equations are a system of nonlinear partial differential equa-
tions. When discretized in space, they yield a system of nonlinear ordinary differential equations (ODEs).
This chapter deals with the spatial discretization of these equations along with the Spalart–Allmaras
turbulence model. The spatial discretization follows the work of the NASA Ames ARC2D algorithm [288], along with several others, of which the most relevant are Nemec [252] and Chisholm [4]. Further details about the discretization of the Spalart–Allmaras turbulence model are described in [285].
Newton’s method is used to solve the nonlinear system of equations. An approximation to the
Jacobian of the nonlinear system is computed to facilitate the Newton algorithm and the preconditioner
of the subsequent linear system. The linearization of the nonlinear system used to create the Jacobian
follows the work of Nemec [252] and Chisholm [4].
For the convection-diffusion equation a spatial discretization is used for convective and diffusive
derivatives that is analogous to the inviscid and viscous derivatives in the Navier–Stokes equations. The
operators are found in Pulliam’s report [288].
3.1 The Navier–Stokes Equations
For inviscid fluxes, a second-order centered-difference operator with second- and fourth-difference scalar
artificial dissipation is used. For the ξ direction,
∂E/∂ξ ≈ (E_{j+1,k} − E_{j−1,k})/2 − ∇ξ AD    (3.1)

where

AD = d(2)_{j+1/2,k} ∆ξ Q_{j,k} − d(4)_{j+1/2,k} ∆ξ∇ξ∆ξ Q_{j,k}    (3.2)

d(2)_{j+1/2,k} = 2(ε σ J⁻¹)_{j+1/2,k}    (3.3)

d(4)_{j+1/2,k} = max[0, 2κ4 (σ J⁻¹)_{j+1/2,k} − d(2)_{j+1/2,k}]    (3.4)

σ_{j,k} = |U| + a√(ξx² + ξy²)    (3.5)

ε_{j,k} = κ2 [0.5 Υ*_{j,k} + 0.25 (Υ*_{j−1,k} + Υ*_{j+1,k})]    (3.6)

Υ*_{j,k} = max(Υ_{j+1,k}, Υ_{j,k}, Υ_{j−1,k})    (3.7)

Υ_{j,k} = |p_{j+1,k} − 2p_{j,k} + p_{j−1,k}| / |p_{j+1,k} + 2p_{j,k} + p_{j−1,k}|    (3.8)
and ∆ξ and ∇ξ are the first-order forward and backward difference operators, respectively. The artificial dissipation constants are κ2 and κ4. The coefficient κ4 is usually much smaller than κ2; for example, they can have values of 0.01 and 1.0, respectively. The spectral radius of the flux Jacobian matrix is given by σ. The pressure switch, Υ_{j,k}, is used to control the use of first-order dissipation in the presence of shock waves. Values at half nodes are averages along the direction of the required derivative. The dissipation stencil requires two points on each side of the interior grid node quantity being differenced; therefore, modifications must be made to this stencil at the first and last interior nodes. These modifications can be found in [252, 288].

The above process is repeated for the inviscid flux derivative in the η direction; however, (3.7) is not used.
The viscous terms in the thin-layer compressible Navier–Stokes equations resemble

∂η(α_{j,k} ∂η β_{j,k})    (3.9)

and are discretized using a compact, three-point stencil:

∇η(α_{j,k+1/2} ∆η β_{j,k}) = α_{j,k+1/2}(β_{j,k+1} − β_{j,k}) − α_{j,k−1/2}(β_{j,k} − β_{j,k−1})    (3.10)
3.2 The Spalart–Allmaras Turbulence Model
The Spalart–Allmaras turbulence model is presented in the steady-state form

J⁻¹[M(ν̃) − P(ν̃) + D(ν̃) − N(ν̃)] = 0    (3.11)
where J⁻¹ is the metric Jacobian in (2.14). The terms M(ν̃), P(ν̃), D(ν̃), and N(ν̃) are the convective, production, destruction, and diffusive terms, respectively. Without considering transition, the terms are as follows:

M(ν̃) = U ∂ν̃/∂ξ + V ∂ν̃/∂η    (3.12)

P(ν̃) = (cb1/Re) S̃ ν̃    (3.13)

D(ν̃) = (cw1fw/Re)(ν̃/dw)²    (3.14)

N(ν̃) = (1/(σRe))[(1 + cb2)T1 − cb2T2]    (3.15)
The production and destruction terms are source terms and therefore do not require differencing. The convective term is differenced using a first-order upwind scheme; for example, in the ξ direction,

M(ν̃)_{j,k} = ½(U_{j,k} + |U_{j,k}|)(ν̃_{j,k} − ν̃_{j−1,k}) + ½(U_{j,k} − |U_{j,k}|)(ν̃_{j+1,k} − ν̃_{j,k})    (3.16)
A similar term is formed for the η direction. The diffusive term is differenced using (3.10), since it resembles the viscous terms. Finally, the vorticity is computed approximately using centered differences:

S ≈ ½ |(v_{j+1,k} − v_{j−1,k})(ξx)_{j,k} + (v_{j,k+1} − v_{j,k−1})(ηx)_{j,k} − (u_{j+1,k} − u_{j−1,k})(ξy)_{j,k} − (u_{j,k+1} − u_{j,k−1})(ηy)_{j,k}|    (3.17)
3.3 Boundary Conditions
The normal and tangential velocities are required when computing boundary conditions. The normal velocity is perpendicular to the boundary. The normal and tangential directions increase along the respective ξ and η directions at each boundary individually. Figure 3.1 illustrates these directions.
According to this convention, the normal and tangential velocity components are expressed differently
for the various boundaries. At k = 1 and k = kmax, we have:
Vn = (ηx u + ηy v) / √(ηx² + ηy²)   (3.18)
Vt = (ηy u − ηx v) / √(ηx² + ηy²)   (3.19)
which reduce to Vn = ηx u + ηy v and Vt = ηy u − ηx v when the metric gradients are normalized to unit magnitude.
At j = 1 and j = jmax, we have:
Vn = (ξx u + ξy v) / √(ξx² + ξy²)   (3.20)
Vt = (−ξy u + ξx v) / √(ξx² + ξy²)   (3.21)
which simplify in the same way for unit-magnitude metrics.
Figure 3.1: Normal and tangential directions at the boundaries.
3.3.1 Airfoil Body
At an airfoil body, the boundary condition has two possibilities depending on whether the flow is inviscid
or viscous. For inviscid flow, flow tangency is enforced. Therefore, the normal velocity component is
zero. The tangential velocity component and pressure are extrapolated from the interior. The stagnation
enthalpy is set to the freestream value.
For viscous flow, a no-slip condition is required. Hence, u = 0 and v = 0. Subsequently, both
components of momentum at the body are zero. The normal pressure gradient component is set to zero.
Furthermore, the flow is assumed to be adiabatic and act as a perfect gas. This results in a flow that
has a zero normal density gradient component [289, 290]. For the Spalart–Allmaras turbulence model,
the turbulent state variable, ν, is set to zero. The equations are summarized as:
ρj,1 − ρj,2 = 0 (3.22)
(ρu)j,1 = 0 (3.23)
(ρv)j,1 = 0 (3.24)
pj,1 − pj,2 = 0 (3.25)
νj,1 = 0 (3.26)
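A minimal sketch of applying (3.22)–(3.26) at the wall row (hypothetical data layout with a zero-based wall index k = 0; not the thesis implementation):

```python
def apply_wall_bc(rho, rho_u, rho_v, p, nu, j):
    """Viscous (no-slip, adiabatic) wall conditions (3.22)-(3.26).
    Arrays are indexed [j][k] with k = 0 at the wall."""
    rho[j][0] = rho[j][1]    # zero normal density gradient (3.22)
    rho_u[j][0] = 0.0        # no-slip: zero momentum components (3.23)
    rho_v[j][0] = 0.0        # (3.24)
    p[j][0] = p[j][1]        # zero normal pressure gradient (3.25)
    nu[j][0] = 0.0           # turbulent working variable (3.26)
```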
3.3.2 Inflow and Outflow Boundaries
At the far-field boundary, extrapolations are performed using Riemann invariants. Depending on whether
the flow is subsonic or supersonic, certain extrapolations are performed from the interior. A complete
discussion can be found in the work of Nemec [252] or Pueyo [5].
For subsonic inflow,

(Vn − 2a/(γ − 1))_{j,kmax} − (Vn − 2a/(γ − 1))_∞ = 0   (3.27)
(Vn + 2a/(γ − 1))_{j,kmax} − (Vn + 2a/(γ − 1))_{j,kmax−1} = 0   (3.28)
(ρ^γ/p)_{j,kmax} − S_∞ = 0   (3.29)
(Vt)_{j,kmax} − (Vt)_∞ = 0   (3.30)
ν_{j,kmax} − ν_∞ = 0   (3.31)
where ν_∞ = 0.001. For subsonic outflow,

(Vn − 2a/(γ − 1))_{j,kmax} − (Vn − 2a/(γ − 1))_∞ = 0   (3.32)
(Vn + 2a/(γ − 1))_{j,kmax} − (Vn + 2a/(γ − 1))_{j,kmax−1} = 0   (3.33)
(ρ^γ/p)_{j,kmax} − (ρ^γ/p)_{j,kmax−1} = 0   (3.34)
(Vt)_{j,kmax} − (Vt)_{j,kmax−1} = 0   (3.35)
ν_{j,kmax} − ν_{j,kmax−1} = 0   (3.36)
For viscous outflow, a zeroth-order extrapolation is used
ρ1,k − ρ2,k = 0 (3.37)
(ρu)1,k − (ρu)2,k = 0 (3.38)
(ρv)1,k − (ρv)2,k = 0 (3.39)
p1,k − p2,k = 0 (3.40)
ν1,k − ν2,k = 0 (3.41)
3.3.3 Wakecut Interface
For a C-topology grid, the conservative flow variables and the turbulent working variable are averaged
across the wakecut using
Q_{j,1} − ½ (Q_{j,2} + Q_{jmax−j+1,2}) = 0   (3.42)
3.4 The Jacobian of the Nonlinear System
Newton’s method normally requires the formation of the Jacobian of the nonlinear system of equations
that arises after the spatial discretization. When coupled with a Krylov subspace iterative method
that solves the resulting linear system of equations, the Jacobian does not have to be explicitly formed.
However, an approximation to the Jacobian is needed in the continuation of the Newton algorithm
(especially for turbulent flows) and to construct a preconditioner.
Nemec [252] uses a second-order and a first-order approximation to the Jacobian. The former is used
in his implementation of a discrete-adjoint optimization algorithm. The incomplete factorization of the
latter is used in the construction of the preconditioner of the linear system. The approach follows the
novel work by Pueyo and Zingg [238].
For this research, the second-order Jacobian is used mainly as an analysis tool. For example, its eigenvalues are studied along with the eigenvalues of related iteration matrices. It has a nine-point stencil.
The first-order Jacobian is used to construct the baseline incomplete LU-factorization preconditioner of
the linear system in Newton’s method. It is obtained by collapsing the nine-point, fourth-difference dissipation stencil onto the five-point stencil that contains the second-difference dissipation. The relationship
is given by
εl2 = εr2 + σεr4 (3.43)
where σ is a parameter. The method is described in detail by Pueyo and Zingg [238].
The Jacobian has components relating to the interior nodes and the boundary nodes. Body, far-
field, outflow and wake-cut boundaries must be treated individually. Furthermore, the Spalart–Allmaras
turbulence model is also linearized.
For the remainder of this dissertation, the first- and second-order Jacobians will be
referred to as A1 and A2, respectively. Figure 3.2 shows example A1 and A2 Jacobians arising
from a very coarse grid. The sparsity patterns depend on the ordering of the grid nodes. In these plots
a natural ordering is used where the nodes are ordered along the η-direction first and then along the
ξ-direction of the computational grid. For this ordering, there are pronounced entries that correspond
to the wake cut. The bandwidth is extremely large for this particular type of ordering. Furthermore,
there is a block-pentadiagonal component for A2 compared to a block-tridiagonal component for A1 at
the main diagonal. The former is smaller due to the collapsed stencil (3.43). Ordering plays a central
part in this research and its discussion is deferred to a later chapter.
The block entries in the Jacobian for a given node contain entries relating to the mean-flow (i.e.
mass, momentum, and energy) and the turbulence model equations. Zero values on the diagonals of the
diagonal block entries of the Jacobian are possible due to the linearization of the boundary conditions for the
mean-flow equations. For precautionary reasons (relating later to the formation of the preconditioner),
the rows of the Jacobian (for a given node) are exchanged in such a manner that the diagonal entry is
nonzero. It can be shown that this is always possible for the linearization used in this research. For
further details, see Pueyo [5].
(a) A1 Jacobian (b) A2 Jacobian
Figure 3.2: Sparsity pattern of sample A1 and A2 Jacobians using a natural ordering.
3.5 The Convection-Diffusion Equation
3.5.1 The Grid Peclet Number
In the convection-diffusion equation (2.39), the spatial derivatives are first and second order. The
former correspond to the convective terms in the equation. The stability of the numerical discretization
depends greatly on how these convective terms are modeled. As the Peclet number increases, the relative
importance of convection increases.
A more effective quantity that is used to define a stability limit on the discretization is the grid Peclet
number. Consider the 1D convection-diffusion equation. The grid Peclet number [291] is defined as
Peh = |v| h / μ   (3.44)
where h is the grid spacing.
Second-order centered differences require Peh < 2 for the entire computational domain to avoid
oscillations. Both second-order centered differences with added scalar artificial dissipation and first-
order upwinding provide results with fewer oscillations for Peh > 2 than second-order centered differences
alone.
Figure 3.3 shows the solution to the 1D convection-diffusion equation using second-order centered
differences, first-order upwinding, and second-order centered differences with scalar artificial dissipation.
The second-difference artificial dissipation model is defined in the next subsection and its coefficient is set
to 0.5 for this example. Figures 3.3(a)-3.3(c) show solutions of the 1D convection-diffusion equation for
various grid Peclet numbers. Specifically, Figure 3.3(d) illustrates that second-order centered differences
with artificial dissipation can be used to extend the stability of second-order centered-differences beyond
a grid Peclet number of 2. For this particular dissipation model (and dissipation coefficient) that is
[Figure 3.3 panels: (a) Peh = 0.1 (Pe = 20); (b) Peh = 0.5 (Pe = 100); (c) Peh = 1 (Pe = 200); (d) Peh = 2.5 (Pe = 500). Each panel plots φ(x) near x = 1, comparing the exact solution with second-order centered differences, first-order upwinding, and centered differences with dissipation ε = 0.5.]
Figure 3.3: Close-up view of the numerical solution to the 1D convection-diffusion equation for various
grid Peclet numbers on a 101-node computational grid.
added to the second-order centered-differences discretization, the results are consistent with first-order
upwinding. Table 3.1 shows the solution errors for the cases presented.
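The behaviour described above can be reproduced with a short self-contained script (an illustrative sketch, not the thesis code; here Peh = uh/μ = 2.5 is obtained with u = 250, μ = 1, and h = 0.01, so the Pe values differ from the table's nondimensionalization):

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system by the Thomas algorithm."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i - 1] * cp[i - 1]
        cp[i] = sup[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i - 1] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_conv_diff(n, u, mu, scheme):
    """Solve u*phi' = mu*phi'' on [0,1] with phi(0)=0, phi(1)=1."""
    h = 1.0 / (n - 1)
    m = n - 2                            # number of interior unknowns
    if scheme == "centered":
        W = -u / (2 * h) - mu / h**2     # coefficient of phi_{i-1}
        P = 2 * mu / h**2                # coefficient of phi_i
        E = u / (2 * h) - mu / h**2      # coefficient of phi_{i+1}
    else:                                # first-order upwind (assumes u > 0)
        W = -u / h - mu / h**2
        P = u / h + 2 * mu / h**2
        E = -mu / h**2
    rhs = [0.0] * m
    rhs[-1] -= E * 1.0                   # right Dirichlet value phi(1) = 1
    interior = thomas([W] * (m - 1), [P] * m, [E] * (m - 1), rhs)
    return [0.0] + interior + [1.0]

phi_c = solve_conv_diff(101, 250.0, 1.0, "centered")   # Peh = u*h/mu = 2.5
phi_u = solve_conv_diff(101, 250.0, 1.0, "upwind")
```

For this grid Peclet number the centered solution oscillates near the boundary layer (its minimum is well below zero), while the upwind solution is monotone.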
3.5.2 The 2D Convection-Diffusion Equation
The computational domain (ξ, η) is uniform with a grid-spacing of unity (∆ξ = ∆η = 1). Finite
differences are used to model each of the derivatives. For the diffusion terms, a compact second-order
centered-difference scheme is used. For the convection terms, a second-order centered-difference scheme
with artificial dissipation is used.
The following discretization is used to approximate each of the terms in (2.43). The differencing
Table 3.1: Errors for various Peclet numbers for various discretizations of the 1D convection-diffusion
equation on a uniform 101-node computational grid.
Error (‖φ − φexact‖):

Pe     Peh    Centered Differences    Centered Differences w/ ε = 0.5
20     0.1    0.0037                  0.1059
100    0.5    0.0446                  0.1976
200    1.0    0.1366                  0.2217
500    2.5    0.4804                  0.1624
stencils that are used for the diffusive terms are:
∂/∂ξ (a ∂φ/∂ξ) ≈ a_{j+1/2,k} (φ_{j+1,k} − φ_{j,k}) − a_{j−1/2,k} (φ_{j,k} − φ_{j−1,k})   (3.45)
∂/∂η (b ∂φ/∂η) ≈ b_{j,k+1/2} (φ_{j,k+1} − φ_{j,k}) − b_{j,k−1/2} (φ_{j,k} − φ_{j,k−1})   (3.46)
∂/∂ξ (g ∂φ/∂η) ≈ (1/4) [g_{j+1,k} φ_{j+1,k+1} − g_{j+1,k} φ_{j+1,k−1} − g_{j−1,k} φ_{j−1,k+1} + g_{j−1,k} φ_{j−1,k−1}]   (3.47)
∂/∂η (g ∂φ/∂ξ) ≈ (1/4) [g_{j,k+1} φ_{j+1,k+1} − g_{j,k+1} φ_{j−1,k+1} − g_{j,k−1} φ_{j+1,k−1} + g_{j,k−1} φ_{j−1,k−1}]   (3.48)
where
a_{j+1/2,k} = (a_{j,k} + a_{j+1,k})/2   (3.49)
a_{j−1/2,k} = (a_{j,k} + a_{j−1,k})/2   (3.50)
b_{j,k+1/2} = (b_{j,k} + b_{j,k+1})/2   (3.51)
b_{j,k−1/2} = (b_{j,k} + b_{j,k−1})/2   (3.52)
For convection, a second-order central-differencing stencil with artificial dissipation is used
∂/∂ξ (cφ) ≈ [(cφ)_{j+1,k} − (cφ)_{j−1,k}]/2 − Dξ   (3.53)
∂/∂η (dφ) ≈ [(dφ)_{j,k+1} − (dφ)_{j,k−1}]/2 − Dη   (3.54)
where
Dξ = ε (c_{j,k}/|c_{j,k}|) [(cφ)_{j+1,k} − 2(cφ)_{j,k} + (cφ)_{j−1,k}]   (3.55)
Dη = ε (d_{j,k}/|d_{j,k}|) [(dφ)_{j,k+1} − 2(dφ)_{j,k} + (dφ)_{j,k−1}]   (3.56)
and ε is the artificial dissipation coefficient. For a uniform square mesh with a constant velocity and
diffusion field, if ε = 0.5, then second-order centered differences with artificial dissipation produces the
same operator as first-order upwinding.
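This equivalence can be verified directly in one dimension (an illustrative check assuming a constant positive velocity, so that c_{j,k}/|c_{j,k}| = 1 in (3.55)):

```python
def centered_with_dissipation(cphi, j, eps, sign_c=1.0):
    """Centered difference of d(c*phi)/dxi minus the dissipation D_xi of
    (3.55), on a unit-spaced grid; sign_c stands for c/|c|."""
    central = 0.5 * (cphi[j + 1] - cphi[j - 1])
    d_xi = eps * sign_c * (cphi[j + 1] - 2.0 * cphi[j] + cphi[j - 1])
    return central - d_xi

def first_order_upwind(cphi, j):
    """Backward difference of c*phi, valid when the velocity is positive."""
    return cphi[j] - cphi[j - 1]
```

With ε = 0.5 the two stencils agree to machine precision at every interior node, confirming the statement above for this model setting.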
For boundaries that are not prescribed by a Dirichlet condition, the difference stencils for the various
derivatives are adjusted accordingly. The convection derivatives are modeled by first-order upwinding,
and the diffusion derivatives are adjusted in locations where the use of a half node is not necessary.
3.5.3 The Jacobian of the Discretized Equations
The use of an incomplete factorization of the Jacobian matrix as a preconditioner is an important area
of interest in this research. The sparsity pattern of the Jacobian of the discretized convection-diffusion
equation is similar to the Jacobian shown in Figure 3.2(b) (where block entries are replaced by scalars
and without the wake-cut interface). Since the matrix is banded and not triangular, any factorization
will introduce fill-in outside of the sparsity pattern of the original matrix. Hence, the structure of the
Jacobian is ideal for the study of incomplete factorizations. In contrast, if an upwinding discretization is
used, the resulting Jacobian is triangular and its factorization has the same triangular sparsity pattern.
Incomplete factorizations of the Jacobian are discussed in great detail in Chapter 5.
Chapter 4
SOLUTION ALGORITHM
The spatial discretization results in a nonlinear system of ordinary differential equations in time. Since
steady-state calculations are of interest, it would initially make sense to discard the time derivatives
and solve the nonlinear system of algebraic equations. Newton’s method is used to solve the resulting
nonlinear system of equations. However, a pseudo-transient continuation method (e.g. implicit Euler) is
applied to ensure a robust overall algorithm. In this work, each iteration of Newton’s method is referred
to as an outer iteration.
At each Newton iteration, a large and sparse linear system of equations must be solved. The gener-
alized minimum residual (GMRES) Krylov subspace method is used to solve this system. Each iteration
of GMRES is referred to as an inner iteration. The conditioning of this system is poor. Precondition-
ing is used to improve the conditioning of the linear system and hence the performance of GMRES.
Preconditioning is discussed in detail in the next chapter.
4.1 Solving the Nonlinear System: Newton’s Method
The discretized, steady, compressible Navier–Stokes equations form a system of nonlinear equations that can be written in the form
R(Q) = 0 (4.1)
where R is a residual vector and Q defines the state, or flow, variables.
Newton’s method is used to solve (4.1) for Q. At the future iteration k + 1
R^{k+1} ≡ R(Q^{k+1}) = 0   (4.2)
is desired. Using the first-order linearization
R^{k+1} ≈ R^k + (∂R/∂Q)^k ∆Q^k = 0   (4.3)
a Newton iteration consists of solving
(∂R/∂Q)^k ∆Q^k = −R^k   (4.4)
for ∆Q^k and updating the current state variables, Q^k, with
Q^{k+1} = Q^k + ∆Q^k   (4.5)
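For a small system with a hand-coded Jacobian, the iteration (4.3)–(4.5) looks like the following sketch (an illustrative 2 × 2 example, not the flow solver):

```python
import math

def newton_2x2(residual, jacobian, q, tol=1e-12, max_iter=50):
    """Newton's method (4.4)-(4.5) for two equations; the 2x2 update
    system J * dq = -r is solved by Cramer's rule."""
    for _ in range(max_iter):
        r = residual(q)
        if math.hypot(r[0], r[1]) < tol:
            break
        (a, b), (c, d) = jacobian(q)
        det = a * d - b * c
        dq0 = (-r[0] * d + r[1] * b) / det
        dq1 = (-r[1] * a + r[0] * c) / det
        q = [q[0] + dq0, q[1] + dq1]       # update (4.5)
    return q

# R(Q) = [Q0^2 + Q1^2 - 4, Q0 - Q1] has a root at (sqrt(2), sqrt(2))
res = lambda q: [q[0] ** 2 + q[1] ** 2 - 4.0, q[0] - q[1]]
jac = lambda q: [[2.0 * q[0], 2.0 * q[1]], [1.0, -1.0]]
root = newton_2x2(res, jac, [1.0, 2.0])
```

From a reasonable initial guess the iterates converge quadratically, which is precisely the attraction of Newton's method once the continuation phase has brought the state close enough to the solution.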
4.2 Newton Globalization: Pseudo-Transient Continuation
The reference Newton–Krylov code [252] that is used for this research depends on an approximately-factored [288] continuation algorithm. This essentially means that any necessary modifications to the
formulation must be done to two distinct algorithms. Hence, an effective continuation technique for
Newton’s method eliminates the need to maintain two algorithms.
The continuation procedure follows the work of Hicken and Zingg [292]. It is divided into two phases.
For the first kstart Newton iterations, a reference time step is calculated using the power-law function
∆t^k_ref = a b^k   (4.6)
where a and b are parameters. Before introducing the second phase, the ratio of the norms of the
nonlinear residuals at Newton iterations k and 1 is defined as
R^k_v ≡ ‖R^k‖₂ / ‖R^1‖₂   (4.7)
The reference time step for the second phase is given by
∆t^k_ref = max(α (R^k_v)^{−β}, ∆t^{k−1}_ref)   (4.8)
where α = a b^{kstart} (R^{kstart}_v)^β, and β is a parameter.
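The two-phase schedule (4.6)–(4.8) can be sketched as follows (illustrative; the residual history is synthetic and the parameters are the inviscid values from Table 4.1):

```python
def reference_time_step(k, dt_prev, Rv_k, Rv_kstart, a, b, beta, kstart):
    """Two-phase reference time step: power law (4.6) for k <= kstart,
    then the residual-based form (4.8) thereafter."""
    if k <= kstart:
        return a * b ** k
    alpha = a * b ** kstart * Rv_kstart ** beta
    return max(alpha * Rv_k ** (-beta), dt_prev)

# synthetic, monotonically decreasing relative residual history
history = [0.5 ** k for k in range(12)]
kstart, a, b, beta = 5, 1.0, 1.8, 2.0
dt, dts = 0.0, []
for k in range(12):
    dt = reference_time_step(k, dt, history[k], history[kstart], a, b, beta, kstart)
    dts.append(dt)
```

Note that the definition of α makes the two phases match at k = kstart, and the max(·) in (4.8) prevents the time step from shrinking if the residual temporarily stalls.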
For the mean-flow equations, the geometric time step is given by
∆t^k_q = ∆t^k_ref / (1 + √J)   (4.9)
The geometric time step for turbulent calculations is given by
∆t^k_ν = τ ∆t^k_q   (4.10)
where τ is a constant. The time step vector at each node is given by
∆t = (∆t^k_q, ∆t^k_q, ∆t^k_q, ∆t^k_q, ∆t^k_ν)^T   (4.11)
For the remainder of this discussion, the index k denoting the Newton iteration will be omitted.
Once the time step is computed for the mean flow equations and the turbulence model, the Newton
system is modified to that of the implicit Euler method,
(∂R/∂Q + (1/∆t) I)^k ∆Q^k = −R^k   (4.12)
where I is the identity matrix. The operation (1/∆t) I is performed on an element-by-element basis using
(4.11). For the boundary equations, ∆t is infinite.
It has been shown that the enforcement of a positive turbulent working variable is important in
terms of the stability of the Spalart–Allmaras turbulence model for various solution methods. Ilinca and
Pelletier [293] and Chisholm and Zingg [244] investigate this significant aspect of the turbulence model
solution algorithm.
A Jacobian-free version of Newton’s method is used, although for the early stages of the continuation
method an approximate–Jacobian-present approach can also be employed [149,237,294]. The details are
discussed in the subsequent section. Essentially, the linear system does not require the explicit formation
of the Jacobian on the left hand side of the Newton iteration system. Since the Jacobian is not explicitly
formed, it is not possible to modify the actual Jacobian to ensure a positive update for ν. A stabilization
technique is used here that follows the method presented by Chisholm and Zingg [149].
Consider a single equation for the working turbulent variable ν in (4.12). Furthermore, approximate the row of the Jacobian, ∂R/∂Q, by its diagonal entry, J_d. Let R_d be the corresponding scalar residual on the right hand side. The resulting equation is
(1/∆t_ν + J_d) ∆ν = −R_d   (4.13)
If the time step is infinite,
∆ν = −R_d/J_d   (4.14)
Based on this limiting condition, Chisholm suggests
|∆ν| < |r| max(ν, 1)   (4.15)
Table 4.1: Continuation parameters for Newton’s method.

Parameter    Inviscid    Laminar    Turbulent subsonic    Turbulent transonic
kstart       5           5          30                    30
a            1           1          0.1                   0.1
b            1.8         1.8        1.2                   1.15
β            2.0         2.0        1.1                   1.1
to keep ν positive and therefore stable, where r is a constant. Applying (4.15) to (4.13) gives
(1/∆t_ν + J_d) |r| max(ν, 1) = −R_d   (4.16)
and, when ∆t_ν is isolated,
∆t_ν = [−R_d/(|r| max(ν, 1) sign(∆ν)) − J_d]^{−1}   (4.17)
Hence, if the proposed turbulent equation time step (4.10) violates condition (4.15), the more stable
time step of (4.17) is used. The values used for τ and r are 1 and 0.4, respectively.
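For a single unknown, the limiting logic of (4.13)–(4.17) reduces to the following sketch (a hypothetical scalar helper with r = 0.4 as in the text):

```python
def limited_turbulence_time_step(dt_nu, J_d, R_d, nu, r=0.4):
    """Return dt_nu if the scalar update implied by (4.13) satisfies the
    positivity bound (4.15); otherwise return the limited step (4.17)."""
    d_nu = -R_d / (1.0 / dt_nu + J_d)            # update from (4.13)
    limit = abs(r) * max(nu, 1.0)
    if abs(d_nu) < limit:
        return dt_nu
    sign = 1.0 if d_nu >= 0.0 else -1.0
    return 1.0 / (-R_d / (limit * sign) - J_d)   # eq. (4.17)

# a case where the proposed step is far too aggressive
dt = limited_turbulence_time_step(1.0e6, 2.0, 10.0, 1.0)
```

By construction, re-evaluating (4.13) with the limited time step produces an update whose magnitude is exactly the bound |r| max(ν, 1).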
The values for all continuation parameters for all cases are given in Table 4.1.
4.3 Solving the Linear System: GMRES Krylov Subspace Method
Equation (4.12) is a linear system that can be solved either directly, using for example LU decomposition,
or iteratively using one of several methods. Directly solving (4.12) for ∆Qk would require an enormous
computational effort for large systems. Hence, an iterative method is used to solve the system. There are
popular algorithms that can be used to solve this nonsymmetric system including CGS [15], BiCGStab
[16], GCROT [17], and GMRES [7]. The GMRES Krylov subspace method is used in this research.
The following sections briefly outline the basic theory of projection methods, Krylov subspace methods, and GMRES. Furthermore, a detailed explanation of the GMRES algorithm is presented, along
with practical aspects such as implementation, convergence acceleration (i.e. preconditioning) and a
convergence bound estimate. Further details can be found in [7, 38, 295, 296]. Other theoretical aspects
associated with Newton–Krylov methods in general can be found in [297–299].
4.3.1 Projection Methods
The objective of a projection method is to obtain an approximate solution of the linear system
Ax = b (4.18)
where A ∈ Rn×n. The system given in (4.12) is of this form. The approximate solution, x, is contained
in subspace K of dimension m ≤ n. A constraint subspace, L, also of dimension m, is defined and the
residual
r = b−Ax (4.19)
is made orthogonal to it. Choosing K = L results in an orthogonal projection method, while K ≠ L yields an oblique projection method. With an initial guess, x0, the projection method formulation is:
Find x ∈ x0 +K such that b−Ax ⊥ L. (4.20)
If bases V = [v1 . . . vm] ∈ Rn×m and W = [w1 . . . wm] ∈ Rn×m are chosen for K and L, respectively,
then the problem statement is:
Find x = x0 + Vy such that W^T (r0 − AVy) = 0.   (4.21)
The vector r0 denotes the initial residual corresponding to x0. The projection method takes r0 and projects it orthogonally with respect to the constraint subspace, L, to obtain the next residual, r. The problem statement is equivalent to finding
x = x0 + V (W^T A V)^{−1} W^T r0   (4.22)
which means that having an invertible W^T A V ∈ R^{m×m} is a necessary condition for any projection method. When the matrix A is known to be positive definite, it can be shown that it is sufficient to have W = V (i.e., L = K) to ensure that (W^T A V)^{−1} exists. However, if it is only known that the matrix A is invertible, then W = AV (i.e., L = AK). The latter approach is an example of an oblique projection
method.
4.3.2 GMRES Algorithm
A Krylov subspace method is an oblique projection method that uses the Krylov subspace for K. A
Krylov subspace of dimension m is
Km(A; b) = span{b, Ab, A²b, . . . , A^{m−1}b}   (4.23)
If qm−1(A) denotes a polynomial of degree m− 1 then the iterate xm is a polynomial approximation to
the exact solution
x = A−1b ≈ xm = x0 + qm−1(A) b (4.24)
The generalized minimal residual (GMRES) Krylov subspace method is a popular approach that is
used to solve linear systems where the system matrix A is nonsymmetric. Hence, an oblique projection
method is used, and the basis for the constraint subspace is given by Wm = AVm. The algorithm
consists of four basic components: initialization, Krylov subspace orthogonalization, solving a least-
squares problem, and updating the solution. The orthogonalization and least-squares problem solution
occurs as each new Krylov subspace direction is introduced. The following is an outline of the basic
GMRES algorithm:
1. Start: Choose x0, compute r0 = b−Ax0 and v1 = r0/β where β = ‖r0‖2.
2. Iterate: For m = 1, 2, . . .
Precondition the search direction vector, vm, and generate the next Krylov subspace search
direction, vm+1, by the Arnoldi orthogonalization process and form the next column of the
upper-Hessenberg matrix, Hm ∈ R(m+1)×m
w_m = A v_m
h_{i,m} = w_m^T v_i,   i = 1, 2, . . . , m
v_{m+1} = w_m − Σ_{i=1}^{m} h_{i,m} v_i
h_{m+1,m} = ‖v_{m+1}‖₂
v_{m+1} = v_{m+1} / h_{m+1,m}
Solve the least-squares problem
y_m = argmin_y ‖β e1 − H̄_m y‖₂
where the minimum value is ρm and e1 denotes the first column of the identity matrix
I ∈ R^{(m+1)×(m+1)}. Note that the function being minimized is the linear residual norm, ‖r_m‖₂. A QR factorization converts H̄_m to upper-triangular form, making the minimization problem inexpensive.
If ρm ≤ ηk‖r0‖2 then exit loop. Note ηk is a relative tolerance that is defined in the outer
iteration.
3. Update the solution: Update the solution xm = x0 + Vm ym.
4.3.3 Convergence of GMRES
Consider an arbitrary diagonalizable n × n matrix A = XΛX^{−1}. If λ1, . . . , λv are the eigenvalues of A with non-positive real parts and λ_{v+1}, . . . , λn are the remaining eigenvalues, bound in a circle centered at C > 0 with radius R < C, then Christara [295] states that the residual of GMRES, at iteration m, satisfies
‖r_m‖₂ / ‖r0‖₂ ≤ ‖X‖₂ ‖X^{−1}‖₂ (R/C)^{m−v} max_{j=v+1,...,n} ∏_{i=1}^{v} |λi − λj| / |λi|   (4.25)
This upper convergence bound for GMRES is related to the conditioning of A. It is precisely related to
the condition number of the matrix of eigenvectors, X . The condition number is inherently related to
the convergence of GMRES as well as other iterative methods.
4.3.4 Practical Aspects of the Newton–GMRES Algorithm
If A is invertible, the GMRES algorithm will converge fully in at most n iterations. However, this would
be a very slow and expensive process since n search directions and minimization problems would be
solved, not to mention the enormous storage cost. The storage cost for the search directions alone would equal that of a dense R^{n×n} matrix. Therefore, aspects such as preconditioning and
restarting are necessary to make the GMRES algorithm more robust, less expensive on memory, and
faster. Also, one generally does not need to converge the solution fully.
Jacobian-free GMRES
In matrix-present form, the matrix A is explicitly formed and stored. For the compressible Navier–
Stokes equations, this computation is difficult and expensive. Since GMRES only requires the product
of A with a vector, v, the formation of A (i.e. the Jacobian plus a diagonal continuation matrix) can be
avoided by using the first-order finite-difference approximation
Av = (∂R/∂Q + (1/∆t) I) v ≈ [R(Q + εv) − R(Q)]/ε + (1/∆t) v   (4.26)
where
ε = εm / ‖v‖₂²   (4.27)
and εm is machine zero. This formulation is referred to as Jacobian-free GMRES. Other variations of
(4.26) are possible [300]. Chisholm [4] experimented with values of ε producing a finite difference that
has low roundoff and truncation error.
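A sketch of the matrix-free product (4.26) (illustrative; the perturbation here is taken as √εm/‖v‖₂, one common choice among the variations noted above, rather than the exact form (4.27)):

```python
import math

def jacobian_free_matvec(R, Q, v, dt, eps_machine=2.2e-16):
    """Approximate (dR/dQ + I/dt) v by the first-order finite
    difference of the residual function R, as in (4.26)."""
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    eps = math.sqrt(eps_machine) / norm_v   # illustrative perturbation choice
    R0 = R(Q)
    R1 = R([qi + eps * vi for qi, vi in zip(Q, v)])
    return [(r1 - r0) / eps + vi / dt for r1, r0, vi in zip(R1, R0, v)]

# check against a linear residual R(Q) = A*Q, whose Jacobian is exactly A
A = [[4.0, 1.0], [2.0, 3.0]]
R = lambda q: [sum(a * x for a, x in zip(row, q)) for row in A]
Q, v = [1.0, -2.0], [0.5, 1.5]
approx = jacobian_free_matvec(R, Q, v, dt=2.0)
exact = [sum(a * x for a, x in zip(row, v)) + vi / 2.0 for row, vi in zip(A, v)]
```

For a linear residual the finite difference is exact up to roundoff, which makes this a convenient sanity check of a matrix-free implementation.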
Restarted GMRES
In order to reduce the memory requirements of GMRES, a restarted algorithm can be used and is referred
to as GMRES(m). In the restarted algorithm, after a predefined number of Krylov search directions
have been formed, the update is computed, the subspace is discarded, and the algorithm is restarted
with an updated initial guess. Restarted GMRES is not used in this thesis for efficiency reasons.
Inexact Newton method
For practical problems, GMRES is typically not converged to machine zero. In this approach, a relative
tolerance, ηk, is imposed on the L2-norm of the linear residual. This tolerance can vary from one Newton
iteration to the next. The inexact tolerance is set to ηk = 0.1, except for the first 15–20 iterations of
turbulent cases, where it is set to 10−5. Since the linear system is not converged to machine zero, the
method is referred to as an inexact Newton method.
In the early stages of Newton’s method, one can also modify the Jacobian and/or use a lower-order
approximation to it to make the algorithm more stable. This is referred to as an approximate Newton
method. In this work, an approximate Newton method is considered where the matrix is a first-order
approximation to the Jacobian, A1.
Approximate Newton method
Newton globalization is further improved by using an approximate Jacobian at the onset of the nonlinear
algorithm. Specifically, the A1–Jacobian is used. The baseline preconditioner is derived from this matrix,
meaning no additional cost is incurred for using it. Once the nonlinear residual is below a certain
tolerance, the nonlinear algorithm reverts to a Jacobian-free formulation of the linear system.
Preconditioning
A major obstacle when using GMRES to solve (4.18) for x is that the conditioning of A is typically poor
for aerodynamic problems governed by the Navier–Stokes equations, at least for the discretization considered in this research. However, with a suitable preconditioner the linear solver performs much
better. One can use left preconditioning, right preconditioning, or a combination of both. Soulaimani
et al. [301,302] and Saad [303] discuss preconditioning techniques for GMRES for CFD applications.
Preconditioning is paramount to the iterative solution process. Whether in a continuation phase or
a Newton phase, a poorly-conditioned linear system will have slow convergence or diverge. Hence, the
objective is to find a preconditioner, or set thereof, that makes the linear system Ax = b easily solvable.
It is important to have a preconditioner whose inverse, or approximation to its inverse, is known or
sparse. Techniques for constructing preconditioners range from the mathematical to the heuristic. A good measure of the quality of the preconditioned system matrix is a reduced condition number compared to that of the original system matrix.
The combined right- and left-preconditioned system is
(M_l^{−1} A M_r^{−1})(M_r x) = M_l^{−1} b   (4.28)
where Ml and Mr are the left and right preconditioners, respectively. The solution, x, of the original
system (4.18) is identical to that of the preconditioned system (4.28). Right preconditioning is considered
in this research. Specifically, Ml is replaced by the identity matrix. Hence, the right-preconditioned
system is given by
(A M_r^{−1})(M_r x) = b   (4.29)
It is essential that M_r be invertible and relatively inexpensive to compute. Furthermore, M_r^{−1} ≈ A^{−1}.
Chapter 5
PRECONDITIONING
In the previous chapter, the Newton–Krylov method was introduced to solve the nonlinear system of
equations corresponding to steady-state solutions of the discretized, compressible Navier–Stokes equa-
tions. The GMRES Krylov subspace method is used to solve the linear systems.
It is well known that for realistic CFD problems, such as the ones investigated here, GMRES requires
acceleration through preconditioning. Right preconditioning is used. It transforms the original linear
system
Ax = b (5.1)
into another linear system
(A M^{−1})(M x) = b   (5.2)
that has the same solution, but is more easily solvable. The right-preconditioned system can be thought of as solving
(A M^{−1}) y = b,   x = M^{−1} y
An advantage of right-preconditioning is that it renders the residual of the linear system
r = b−Ax (5.3)
unchanged. The right-preconditioned GMRES algorithm is presented in Algorithm 1.

Algorithm 1 Right-Preconditioned GMRES(A, M, b, x, m)

1. Initialize: Choose x0, compute r0 = b − Ax0 and v1 = r0/β where β = ‖r0‖₂.

2. Generate the Krylov polynomial and compute its coefficients:
for m = 1, 2, . . . do
    Precondition:
        z_m = M^{−1} v_m   (5.4)
    Form the next Krylov subspace vector: w_m = A z_m
    Augment the upper-Hessenberg matrix: h_{i,m} = (w_m, v_i), i = 1, 2, . . . , m
    Orthogonalize: v_{m+1} = w_m − Σ_{i=1}^{m} h_{i,m} v_i
        h_{m+1,m} = ‖v_{m+1}‖₂
        v_{m+1} = v_{m+1} / h_{m+1,m}
    Solve the least-squares problem:
        ρ_m = min_{y_m} ‖r_m‖₂ = min_{y_m} ‖β e1 − H̄_m y_m‖₂
    if ρ_m ≤ TOL then exit loop
end for

3. Update: Compute u_m = V_m y_m and update the solution x_m = x0 + u_m.
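Algorithm 1 can be condensed into a short program (a self-contained sketch using a diagonal Jacobi preconditioner and a dense normal-equations least-squares solve; a production implementation would update a QR factorization of the Hessenberg matrix instead):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def gauss_solve(M, c):
    """Small dense solve with partial pivoting (for the normal equations)."""
    n = len(c)
    M = [row[:] for row in M]
    c = c[:]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for j in range(i, n):
                M[r][j] -= f * M[i][j]
            c[r] -= f * c[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gmres_right(A, b, minv, tol=1e-10, max_m=50):
    """Right-preconditioned GMRES (Algorithm 1); minv applies M^{-1}."""
    n = len(b)
    beta = math.sqrt(dot(b, b))        # x0 = 0, so r0 = b
    V = [[bi / beta for bi in b]]
    H = []                             # columns of the upper-Hessenberg matrix
    y = []
    for m in range(max_m):
        w = matvec(A, minv(V[m]))      # precondition (5.4), then multiply by A
        h = []
        for vi in V:                   # modified Gram-Schmidt orthogonalization
            hi = dot(w, vi)
            h.append(hi)
            w = [wj - hi * vj for wj, vj in zip(w, vi)]
        hlast = math.sqrt(dot(w, w))
        h.append(hlast)
        H.append(h)
        # solve min ||beta*e1 - Hbar*y||_2 via the normal equations
        rows, cols = m + 2, m + 1
        Hbar = [[H[j][i] if i < len(H[j]) else 0.0 for j in range(cols)]
                for i in range(rows)]
        g = [beta] + [0.0] * (rows - 1)
        N = [[sum(Hbar[i][a] * Hbar[i][c] for i in range(rows))
              for c in range(cols)] for a in range(cols)]
        rhs = [sum(Hbar[i][a] * g[i] for i in range(rows)) for a in range(cols)]
        y = gauss_solve(N, rhs)
        res = [g[i] - sum(Hbar[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        if math.sqrt(dot(res, res)) <= tol * beta or hlast < 1e-14:
            break                      # converged or lucky breakdown
        V.append([wj / hlast for wj in w])
    u = [sum(y[j] * V[j][i] for j in range(len(y))) for i in range(n)]
    return minv(u)                     # x = x0 + M^{-1} V_m y_m with x0 = 0
```

A small diagonally dominant system converges in a handful of iterations with the Jacobi preconditioner `lambda v: [vi / 4.0 for vi in v]`.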
Recall that the GMRES algorithm searches for an approximate solution within the Krylov subspace
Km(A; b) = span{b, Ab, A²b, . . . , A^{m−1}b}   (5.5)
For the right-preconditioned linear system (5.2), GMRES searches for a solution within the Krylov
subspace
Km(AM^{−1}; b) = span{b, (AM^{−1}) b, (AM^{−1})² b, . . . , (AM^{−1})^{m−1} b}   (5.6)
Within GMRES, the subspace vectors are generated and orthogonalized. Before orthogonalization, the
subsequent Krylov subspace vector, vi+1, is obtained by multiplying the previous Krylov subspace vector,
vi, by AM−1. The process is outlined by the following pattern:
v1 → M^{−1} v1 → A M^{−1} v1 → (orthogonalize) → v2
v2 → M^{−1} v2 → A M^{−1} v2 → (orthogonalize) → v3
v3 → . . .
where v1 is a vector that is related to b. Every other step in this process involves the multiplication of
a vector by M−1. This is the preconditioning step. The operator M−1, or preconditioner, can be as
simple as a matrix splitting or as elaborate as an iterative method. In any case, the preconditioning step
in GMRES, (5.4), is equivalent to solving a system
Mz = v (5.7)
for a (preconditioned) vector z, where v is its unpreconditioned counterpart. To extend the preconditioning step beyond a simple matrix splitting, one can interpret (5.7) as the first iteration in a stationary
iterative method with an initial guess of zero that solves the system
Az = v (5.8)
The three preconditioners of interest in this work are: an incomplete LU factorization with fill-in,
or simply ILU(p); an ILU(p)-smoothed relaxation method; and an ILU(p)-smoothed (linear) geometric
multigrid method. Furthermore, a comparison is made between the minimum discarded fill (MDF)
ordering (with block modifications for systems of PDEs) and the popular reverse Cuthill–McKee ordering
for the aforementioned preconditioners of interest. The effects of scaling and permutation matrices (for
these orderings) are also considered for these preconditioners.
To facilitate the description of these preconditioners and orderings, this chapter is structured as follows: First, scaling is introduced. Next, ILU(p) is discussed and the effect of scaling on it
is touched on. The discussion then moves to graph theory and orderings. The minimum degree (MD)
ordering algorithm is briefly introduced using graph theory notation, followed by RCM. A thorough
discussion of MDF concludes the discussion on ordering.
The remainder of this chapter is devoted to multigrid preconditioning. First a relaxation method is
defined, followed by a demonstration of the effectiveness of ILU(p) as a smoother. Some mathematics
follows regarding the implementation of an ILU(p)-smoothed relaxation method as a preconditioner.
Finally, the geometric multigrid (GMG) preconditioner and its related inter-grid operators are described.
Attention is also paid toward scaling and reordering operators that can possibly be encountered at the
various grid levels.
5.1 Scaling
We are interested in iteratively solving a linear system
Ax = b (5.9)
corresponding to an iteration of Newton’s method, where A is the flow Jacobian plus a diagonal contin-
uation matrix. This system is scaled in order to improve the performance of the iterative method. In
general, row- and column-scaling matrices, S1 and S2 respectively, can be used, resulting in the system
S1 A S2 (S2⁻¹ x) = S1 b    (5.10)
Note that S1 and S2 are diagonal matrices. In the absence of round-off error, this system has the same
solution as the unscaled system.
Chisholm and Zingg [149] explain that row and column scaling influence the residual and state
vectors, respectively. For example, consider row scaling. If the residual vector at each node (containing
the discretized mass, momenta, energy, and turbulence model equations) is not scaled well, GMRES will
encourage the convergence of some of the equations, while allowing other equations to diverge. This
usually results in a very poor solution, x, that corresponds to a very poor update for the nonlinear
state in Newton’s method. Recall that an inexact Newton method is used and therefore GMRES is
not fully converged. While a large relative tolerance in GMRES can circumvent this problem, it means
that GMRES would take many more iterations, making the overall algorithm slow. The key disparity in
equation scaling can be due to the presence of the turbulence model, whose scaling can differ by several
orders of magnitude compared to its mean-flow equation counterparts. Chisholm [4] explains that various
scalings can be used to bring these equations to a closer relative size. Examples include scaling in terms
of the Reynolds number, the metric Jacobian, or a constant. In this research, the discretized turbulence
equation is scaled by the Reynolds number.
Another cause of disparity in scaling is due to the equations that correspond to boundary nodes.
The linearizations of the various boundary conditions (e.g. body, wakecut, farfield, and outflow) are
not of the same order as those of the interior nodes, whose Jacobian entries are of O(1). Therefore,
the boundary equations in the linear system are scaled by the diagonal of the Jacobian corresponding
to each equation at each node.
The scalar diagonal elements of the Jacobian are well scaled prior to the aforementioned row scalings
that are applied to the equations. Therefore, the column scaling is set to
S2 = S1⁻¹    (5.11)
to preserve the original scaling of the diagonal elements (as well as the spectrum).
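As a concrete check of this choice, the following sketch (toy NumPy code, not from the thesis) verifies that with S2 = S1⁻¹ the scaled matrix S1 A S1⁻¹ is a similarity transform of A, so the diagonal entries and the eigenvalue spectrum are preserved:

```python
import numpy as np

# Toy check: S2 = S1^{-1} makes the scaling a similarity transform.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)   # diagonally dominant stand-in for the Jacobian
s = rng.uniform(0.5, 2.0, size=5)                   # arbitrary row scalings
S1 = np.diag(s)
S2 = np.diag(1.0 / s)                               # S2 = S1^{-1}, eq. (5.11)

A_scaled = S1 @ A @ S2

# Diagonal entries are untouched: s_i * a_ii * (1/s_i) = a_ii.
assert np.allclose(np.diag(A_scaled), np.diag(A))

# The eigenvalue spectrum is preserved under the similarity transform.
eig = np.sort_complex(np.linalg.eigvals(A))
eig_scaled = np.sort_complex(np.linalg.eigvals(A_scaled))
assert np.allclose(eig, eig_scaled)
```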
5.1.1 Jacobian-Vector Products in GMRES
Since scaling transforms the linear system, its effect must be accounted for in two important areas of
GMRES: the Jacobian-vector product and the preconditioner. Here the former is discussed in some
detail.
The scaled system (5.10) is solved using GMRES, which only requires matrix-vector products with the
matrix A; these products are formed using a Frechet derivative (4.26), as discussed in the previous
chapter. For simplicity, A is assumed to be only the flow Jacobian, and the diagonal continuation
matrix is ignored. With right preconditioning, the Frechet derivative becomes

AM⁻¹v = ( R[Q + εM⁻¹v] − R[Q] ) / ε    (5.12)
The right-preconditioned linear system with row and column scaling is given by

[ (S1AS2)(S1MS2)⁻¹ ] [ (S1MS2)(S2⁻¹x) ] = S1b    (5.13)

If row and column scaling matrices are used, the preconditioned Jacobian-vector product becomes

(S1AS2)(S1MS2)⁻¹(S1v) = S1 ( R[Q + ε(S1MS2)⁻¹(S1v)] − R[Q] ) / ε    (5.14)
Note that v itself must also have its rows scaled by S1.
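The matrix-free product can be sketched as follows; the residual R here is a hypothetical two-equation stand-in for the discretized flow equations, used only to illustrate the finite-difference Jacobian-vector product (scaling omitted for clarity):

```python
import numpy as np

# Sketch of the matrix-free (Frechet) Jacobian-vector product used by GMRES:
# A v ~= (R[Q + eps*v] - R[Q]) / eps.
def R(Q):
    # Hypothetical nonlinear residual, NOT the thesis flow residual.
    return np.array([Q[0]**2 + Q[1], np.sin(Q[0]) + 3.0 * Q[1]])

def jacvec(R, Q, v, eps=1e-7):
    return (R(Q + eps * v) - R(Q)) / eps

Q = np.array([0.3, -0.2])
v = np.array([1.0, 2.0])

# Exact Jacobian of the toy residual at Q, for comparison.
J = np.array([[2.0 * Q[0], 1.0],
              [np.cos(Q[0]), 3.0]])

assert np.allclose(jacvec(R, Q, v), J @ v, atol=1e-5)
```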
5.2 Incomplete LU (ILU) Preconditioning
The baseline preconditioner used in this thesis is the inverse of the incomplete LU (ILU) factorization
of a first-order approximation to the Jacobian matrix, A1. Specifically,

M⁻¹ = (LU)⁻¹    (5.15)
where L and U are the incomplete lower and upper factors, respectively. The subscript on A1 is dropped
for the remainder of this chapter since the theory applies to more general matrices in linear systems.
In this approach, the level of fill-in, p, with respect to the pattern of A is controlled. This method is
referred to as ILU(p). In its simplest form, ILU(0) refers to a factorization whose sparsity pattern is the
same as that of the matrix A; that is, the sparsity pattern of L + U is identical to the sparsity pattern of A. Any
entries that are outside of this pattern are discarded during the factorization. ILU(1) allows additional
fill-in from entries in the original matrix pattern. ILU(2) allows additional fill-in from entries within the
pattern of ILU(1). In general, the fill-in for ILU(p) is kept if it is due to entries from ILU(p− 1).
There are several variants to the ILU(p) factorization. Similarly, there are several variants to incom-
plete factorizations in general. They were reviewed in detail in Chapter 1.
The IKJ variant used in the SPARSKIT [304] package traverses the matrix in a row-wise sense
starting with row i = 1 until i = n, where n is the total number of rows in the matrix. For a given
row i, contributions from previous rows k are factored into its entries. The level of fill, LFIL or p, must
be considered if a zero entry in the matrix is to be modified. Since L and U are generated in a row-wise
manner, and they only rely on previous rows’ information, this variant of the algorithm is amenable to
the sparse-row storage formats used in SPARSKIT, such as the compressed sparse row (CSR) format.
Algorithm 2 offers a more detailed description of ILU(p) using an IKJ indexing strategy.
The Crout variation of the IKJ-indexed incomplete LU factorization modifies the original matrix
entries in the sub-block whose indices follow the pivot index. This is different than Algorithm 2 used
in SPARSKIT. The explanation provided for the minimum discarded fill (MDF) algorithm uses the
Crout variation of the IKJ-ILU factorization. Refer to Algorithm 3.
Algorithm 2 SPARSKIT [304] ILU(p) factorization algorithm

Define a shifted level of fill-in: p = p + 1
for i = 1, n do
  for k = 1, i − 1 do
    if lev_ik <= p then
      φ = a_ik / a_kk
      for j = 1, n do
        lev*_ij = lev_ik + lev_kj
        if lev_ij == 0 then
          % Fill is unassigned
          if lev*_ij <= p then
            a_ij = −φ a_kj
            lev_ij = lev*_ij
          end if
        else
          % Existing fill
          a_ij ← a_ij − φ a_kj
          lev_ij ← min(lev_ij, lev*_ij)
        end if
      end for % Index j
    end if
  end for % Index k
end for % Index i
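The level-of-fill bookkeeping can be sketched on a dense matrix as follows. This is illustrative toy code, not the SPARSKIT routine, and it uses the unshifted convention (original nonzeros at level 0, fill kept when its level is at most p), which is equivalent to Algorithm 2 up to the level shift:

```python
import numpy as np

# Dense-matrix sketch of level-of-fill ILU(p) in IKJ order (toy code).
def ilu_p(A, p):
    n = A.shape[0]
    a = A.astype(float).copy()
    # Level 0 for original nonzeros, "infinity" elsewhere.
    lev = np.where(a != 0.0, 0, np.iinfo(np.int64).max // 2)
    for i in range(1, n):
        for k in range(i):                      # IKJ: eliminate with rows k < i
            if lev[i, k] <= p:
                phi = a[i, k] / a[k, k]
                a[i, k] = phi                   # store the L multiplier in place
                for j in range(k + 1, n):
                    lev_star = lev[i, k] + lev[k, j] + 1
                    # Update existing entries; create fill only if its level is allowed.
                    if lev_star <= p or lev[i, j] <= p:
                        a[i, j] -= phi * a[k, j]
                        lev[i, j] = min(lev[i, j], lev_star)
    return a, lev

# 1D Laplacian: tridiagonal, so ILU(0) keeps exactly the original pattern.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F, lev = ilu_p(A, p=0)
assert np.array_equal(F != 0.0, A != 0.0)       # ILU(0): same sparsity as A

# For a tridiagonal matrix the true LU produces no fill, so ILU(0) is exact.
L = np.tril(F, -1) + np.eye(n)
U = np.triu(F)
assert np.allclose(L @ U, A)
```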
A good measure of the quality of ILU is in terms of the preconditioned error of the factorization.
The incomplete factorization of A can be written as
A = LU + E (5.16)
where E is the error of the factorization. However, it is the preconditioned error that should be close to
zero. The preconditioned matrix can be written as
A(LU)−1 = I + E(LU)−1 (5.17)
where E(LU)−1 is the preconditioned error. When the preconditioner (5.15) is applied to the matrix
A, it should bring it closer to the identity matrix. For example, the eigenvalues of A(LU)−1 should be
closer to unity.
A major drawback of ILU is its failure to handle zero pivots. This problem can often be alleviated
by an intelligent pivoting strategy. Fortunately, the system considered here can always be ordered in
Algorithm 3 Crout ILU(p) factorization algorithm

Define a shifted level of fill-in: p = p + 1
for i = 1, n − 1 do
  for k = i + 1, n do
    φ = a_ki / a_ii
    for j = i + 1, n do
      lev*_kj = lev_ki + lev_ij
      if lev_kj == 0 then
        % Fill is unassigned
        if lev*_kj <= p then
          a_kj = −φ a_ij
          lev_kj = lev*_kj
        end if
      else
        % Existing fill
        a_kj ← a_kj − φ a_ij
        lev_kj ← min(lev_kj, lev*_kj)
      end if
    end for % Index j
  end for % Index k
end for % Index i
such a manner that avoids zero pivots. A second shortcoming of ILU is that it scales poorly with
increasing problem size.
For discretized systems of PDEs, block forms of ILU are preferred. For example, a block-fill ILU(p),
or BFILU(p), algorithm can be used. Orkwis [128] and Pueyo [238] employ this preconditioner. In
BFILU(p), all entries within a block of A (corresponding to a single grid node) are assigned a fill-in
value of zero. The factorization proceeds as ILU(p) would with scalar quantities.
A block ILU(p), or BILU(p), algorithm is used in this work, where each block corresponds to a single
grid node. In contrast to BFILU(p), this approach is a true block incomplete factorization, where divisions
and multiplications in the scalar algorithm are replaced with matrix inversions and multiplications.
Chisholm [4] and Hicken [305] used BILU(p) for their Navier-Stokes and Euler simulations, respectively.
5.2.1 Effect of Scaling
It is important to determine the sensitivity of ILU(p) with respect to scaling. This is especially true
when considering multigrid preconditioning, since different scaling operators can be used on the various
Figure 5.1: Contributions to ajk from pivot aii in the elimination algorithm.
grid levels. In the absence of round-off errors,

ILU(S1AS2) = S1 · ILU(A) · S2    (5.18)
The proof is outlined below.
The ILU (and the more general LU) factorization process uses the elimination step

a_jk ← a_jk − a_ji a_ik / a_ii    (5.19)

Figure 5.1 shows the contributions of the various elements of the matrix A to the entry a_jk.
First consider the more general row and column scaling matrices S1 and S2 respectively. Next,
consider only the entries in S1, S2, and A that relate to rows i and j and columns i and k of A,
respectively. The relevant entries are

S1 = diag(. . . , s1ii, . . . , s1jj, . . .),
A : entries a_ii, a_ik, a_ji, a_jk,
S2 = diag(. . . , s2ii, . . . , s2kk, . . .)    (5.20)

The product S1 A S2 then contains the entries

s1ii a_ii s2ii,  s1ii a_ik s2kk,
s1jj a_ji s2ii,  s1jj a_jk s2kk    (5.21)
Hence, the ILU elimination step applied to (5.21) becomes

s1jj a_jk s2kk ← s1jj a_jk s2kk − (s1jj a_ji s2ii)(s1ii a_ik s2kk) / (s1ii a_ii s2ii)    (5.22)

and simplifies (through the cancellation of s2ii and s1ii) to

s1jj a_jk s2kk ← s1jj (a_jk − a_ji a_ik / a_ii) s2kk    (5.23)
Thus the ILU elimination step is insensitive to row and column scalings.
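This cancellation can be confirmed numerically with a one-pivot toy example (not thesis code): eliminating the first pivot in S1 A S2 gives exactly the scaled version of eliminating in A.

```python
import numpy as np

# Numerical check of (5.23): one elimination step commutes with row/column scaling.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 4.0 * np.eye(3)
S1 = np.diag(rng.uniform(0.5, 2.0, 3))
S2 = np.diag(rng.uniform(0.5, 2.0, 3))

def eliminate_first_pivot(M):
    M = M.copy()
    for j in range(1, 3):
        for k in range(1, 3):
            M[j, k] -= M[j, 0] * M[0, k] / M[0, 0]   # a_jk <- a_jk - a_ji a_ik / a_ii
    return M

lhs = eliminate_first_pivot(S1 @ A @ S2)
rhs = S1 @ eliminate_first_pivot(A) @ S2
assert np.allclose(lhs, rhs)   # the scaling factors cancel, eq. (5.23)
```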
5.3 Ordering
The ordering of the equations and the unknowns in (5.9) is crucial to the solution algorithm. The incomplete
factorization preconditioner depends heavily on ordering in terms of quality and stability. For example,
if a pivoting strategy fails to counter a zero pivot, ILU will break down. Furthermore, the amount of
fill-in that is discarded is heavily dependent on the ordering.
Fortunately, the matrix A does not contain zero pivots with the current discretization. The only
locations in A that possibly contain zeros on the diagonal correspond to the boundary conditions for a
given node. This is easily rectified by reordering the mass, momenta, and energy equations.
The second type of ordering that impacts the quality of BILU is on a nodal level. Larger blocks can
also be used, but are not considered in this research. The computational domain has a default ordering.
This is referred to as a grid-based ordering. Examples of grid-based orderings include natural and double-
bandwidth (for meshes with a wakecut). Natural ordering is a lexicographical ordering that traverses
one direction before another. For a C-topology mesh, the natural ordering can be in the normal direction
first and then in the streamwise direction. Proceeding in the normal direction first is preferred since the
normal direction typically has a smaller amount of nodes compared to the streamwise direction, leading
to a tighter clustering of bands around the main diagonal. Figure 3.2 shows the sparsity pattern for A2
and A1 matrices using this specific natural ordering. The bandwidth is very poor, however, because of
the entries in the upper-right and lower-left corners of the matrix resulting from the discretization across
the wakecut. Double-bandwidth ordering is a better grid-based ordering for C-topology meshes since it
traverses across the wakecut.
Nevertheless, reordering of the nodes can improve on these initial orderings. Reordering strategies
fall into two main categories: graph-based and matrix-based. The former includes minimum degree [131]
(MD) and reverse Cuthill–McKee [134] (RCM). A matrix-based ordering that is researched in detail in
this work is minimum discarded fill (MDF) [114]. Only symmetric nodal reordering strategies are used in this work
so as to preserve the favourable (block) diagonal entries in A. The rest of this section outlines these
reordering approaches. The MD reordering is discussed since it aids in the explanation of RCM. The
reorderings are well described by using terminology from graph theory. Therefore, a brief review of the
basic aspects of graph theory is also presented.
The idea of domain decomposition is closely related to nodal reordering. The objective of domain
decomposition is to break the larger problem domain into smaller subproblem domains. The approach is
inherently parallel. Appendix A.1 discusses the general theory of domain decomposition and a detailed
literature review was provided in Chapter 1. Serial aspects of preconditioning are emphasized in this
work and therefore, domain decomposition will not be discussed any further.
5.3.1 Graph Theory
Graph theory is used to better describe the two key orderings that are compared in this thesis: reverse
Cuthill-McKee (RCM) and minimum discarded fill (MDF). Here, some of the basics of graph theory are
briefly outlined including notation. We limit our discussion to the graph associated with a given matrix
A ∈ Rn×n that has nonzero diagonal entries. A concise description of graph theory can be found in
works by Liu and Sherman [146], Dutto [140], and Kaveh et al. [306].
A graph G = 〈V,E〉 of a matrix A consists of a set of n vertices

V = {v1, v2, . . . , vn}    (5.24)

and edges

E = { {vi, vj} : i ≠ j, aij ≠ 0 and vi, vj ∈ V }    (5.25)

formed by adjacent vertices. The adjacency set of a given vertex, adjG(vi), contains all vertices that
share edges with vi. The cardinality of a set R is the number of elements contained in that set and is
written as |R|. Hence, the degree of a node vi is

degG(vi) = |adjG(vi)|    (5.26)
Finally, f denotes the numbering of the graph, i.e. the index assigned to each vertex.
An additional property of the graph G(A), relevant to many matrix reordering algorithms, is the
bandwidth of A:

b(A) = max { |i − j| : aij ≠ 0 }    (5.27)
5.3.2 Minimum Degree (MD) Ordering
The minimum degree (MD) ordering is entirely based on the graph G = 〈V,E〉 of a matrix A. First,
a root node of minimal degree is chosen. This node’s influence on the rest of the graph is erased by
deleting it from the graph and updating the edges and vertices of the graph. Next, another node of
minimal degree is chosen and the process is repeated until the graph is depleted. Algorithm 4 shows the
MD ordering. There are two ambiguities that exist in the ordering: root node selection and tie-breaking
strategy.
At the first iteration, there may be several nodes that have a minimal degree. For the classical MD
algorithm that is presented, the choice of root node is arbitrary. Furthermore, at each subsequent node
Algorithm 4 Minimum Degree: MINDEG(A)

Define the graph G = 〈V,E〉 associated with the matrix A.
while V ≠ ∅ do
  Select a node v ∈ V of minimum degree in G and order v as next.
  Let Vv be the subset of V with the vertex v removed:
    Vv = V − {v}
  Define Ev as the remaining set of edges in G that do not contain the removed node v:
    Ev = { {a, b} ∈ E : a, b ∈ Vv } ∪ { {a, b} : a ≠ b and a, b ∈ adjG(v) }
  Redefine the graph G as the graph that would remain after vertex v is removed. That is, set V = Vv,
  E = Ev, and update G = 〈V,E〉.
end while
selection, there may also be ties. The tie-breaking strategy in the classical algorithm is also arbitrary.
The MD ordering is not used in this work. However, it clearly illustrates the ambiguities that are also
relevant to the discussion of the reverse Cuthill–McKee ordering strategy.
5.3.3 Reverse Cuthill–McKee (RCM) Ordering
The reverse Cuthill–McKee (RCM) ordering is designed to minimize the bandwidth of a matrix by using
its graph. The ordering is the Cuthill–McKee (CM) ordering, but reversed. Like the MD ordering, CM
also begins with a root node that is of minimal degree. From there, adjacent nodes are selected until the
entire graph is traversed. The algorithm visually appears to be a wavefront that emanates from the root
node and advances through the graph. Algorithm 5 outlines the classical RCM ordering, as presented
by George [134].
RCM suffers from the same ambiguities as MD. In particular, the selection of the root node is arbitrary
in the classical algorithm. Furthermore, tie-breaking between nodes of equal degree is also arbitrary.
For matrices arising from the discretization of the Navier–Stokes equations, the quality of the RCM-
reordered matrix depends greatly on the root-node selection and tie-breaking strategies. Chisholm [4]
investigated some root-node selection and tie-breaking strategies. For 2D inviscid and viscous cases,
he found that a good choice for the root node is downstream and in the middle of the grid. Various
tie-breaking strategies were investigated. Specifically, x- and y-position, up- or down-wind position, and
grid indices were considered. In most cases the choice was not as crucial, but his conclusion was to break
ties by selecting the upwind node first. The root-node selection and tie-breaking strategies used in this
research are discussed in the next chapter.
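SciPy's `reverse_cuthill_mckee` can illustrate the bandwidth reduction; note that SciPy chooses its own root node and tie-breaking strategy, so the ordering differs in detail from the strategies discussed above. A 5-point Laplacian with a scrambled ordering is used here as a stand-in for a poorly ordered system matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    i, j = A.nonzero()
    return int(np.max(np.abs(i - j)))            # b(A) = max |i - j| over a_ij != 0

# 2D 5-point Laplacian on a 10x10 grid (toy stand-in for the flow Jacobian).
n = 10
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()

# Scramble the nodal ordering to destroy the band structure.
rng = np.random.default_rng(2)
p0 = rng.permutation(A.shape[0])
B = A[p0, :][:, p0]

# RCM: symmetric permutation that tightens the band.
perm = reverse_cuthill_mckee(B, symmetric_mode=True)
B_rcm = B[perm, :][:, perm]

assert bandwidth(B_rcm) < bandwidth(B)
```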
Algorithm 5 Reverse Cuthill–McKee: RCM(A)

Define the graph G = 〈V,E〉 associated with the matrix A. Let Q be the input queue containing all of
the nodes in an arbitrary order. Let R and S be working queues that are initially empty. Let TOP(P)
denote the first entry of any queue P.

Choose a root node v1 ∈ V from Q.
Set i = 1
Q ← Q − {v1}
R ← adj(v1) ∩ Q
Q ← Q − R
loop
  while R ≠ ∅ do
    i ← i + 1
    vi = TOP(R)
    R ← R − {vi}
    Place vi at the end of S
  end while
  if Q == ∅ then stop
  z = TOP(S)
  S ← S − {z}
  R ← adj(z) ∩ Q
  Q ← Q − R
end loop
Reverse the ordering of the nodes.
5.3.4 Minimum Discarded Fill (MDF) Ordering
Crout LU factorization
We first consider the LU factorization of the matrix A ∈ R^{n×n}. Using the Crout form of the
factorization, the first iteration is written as

A = A0 = [ d1, β1ᵀ ; α1, B1 ]    (5.28)

where d1 ∈ R is the first diagonal entry of A0 (i.e. the pivot), β1ᵀ ∈ R^{1×(n−1)} is the remaining
part of the first row of A0, α1 ∈ R^{(n−1)×1} is the remaining part of the first column of A0, and
B1 ∈ R^{(n−1)×(n−1)} is the submatrix of A0 after the removal of the first row and column. The initial
matrix is written as

A0 = L1U1    (5.29)

where

L1 = [ 1, 0 ; α1/d1, I_{n−1} ]    (5.30)

and

U1 = [ d1, β1ᵀ ; 0, B1 − α1β1ᵀ/d1 ]    (5.31)

The lower-right submatrix in U1 is defined as A1.

The factorization of the matrix A0 is therefore given by

A0 = L1U1 = [ 1, 0 ; α1/d1, I_{n−1} ] [ d1, β1ᵀ ; 0, A1 ]    (5.32)

Without any permutations of rows or columns, the factorization proceeds as

[ 1, 0 ; αk/dk, I_{n−k} ] [ dk, βkᵀ ; 0, Ak ]    (5.33)

where, ∀ k = 1, . . . , n − 1,

Ak = Bk − αkβkᵀ/dk    (5.34)

Defining

Ck = [c^(k)_ij] ≡ αkβkᵀ/dk    (5.35)

gives

Ak = Bk − Ck    (5.36)
If the LU factorization is performed in such a manner as not to drop any fill, then (5.36) represents the
exact factorization applied to each respective submatrix of the original matrix, A = A0.
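Equations (5.32) and (5.34) can be verified numerically on a toy matrix: the first Crout step produces the Schur complement as the trailing submatrix, and the factors recompose A exactly.

```python
import numpy as np

# Numerical check of the first Crout elimination step (toy example).
rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # safely nonzero pivots

d1 = A[0, 0]
beta1 = A[0, 1:]
alpha1 = A[1:, 0]
B1 = A[1:, 1:]

A1 = B1 - np.outer(alpha1, beta1) / d1            # eq. (5.34) with k = 1

# Assemble L1 and U1 as in (5.30)-(5.31) and recompose A per (5.32).
L1 = np.eye(n)
L1[1:, 0] = alpha1 / d1
U1 = np.zeros((n, n))
U1[0, 0] = d1
U1[0, 1:] = beta1
U1[1:, 1:] = A1

assert np.allclose(L1 @ U1, A)                    # A0 = L1 U1
```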
Discarded fill
For incomplete factorizations there is information that is lost during the factorization process. Essen-
tially, some entries are discarded to prevent the accumulation of large amounts of fill. This fill-in that is
lost at each step of the factorization can be represented in the kth iteration of the algorithm by the
matrix Fk. The iteration becomes

A_{k−1} = [ 1, 0 ; αk/dk, I_{n−k} ] ( [ dk, βkᵀ ; 0, Bk − αkβkᵀ/dk − Fk ] + [ 0, 0 ; 0, Fk ] )    (5.37)

Referring to (5.36), the submatrix Ak can be redefined as

Ak = Bk − Ck − Fk    (5.38)
where Fk is a matrix containing the discarded fill.
MDF
The minimum discarded fill [113, 114] algorithm is a reordering strategy (i.e. pivoting) coupled with
an incomplete factorization process that minimizes the amount of fill that is dropped. Since the MDF
algorithm is an incomplete factorization (like ILU or IC), a fill and/or drop-tolerance strategy can be
employed. The former is used, and LFIL is defined as the maximum allowable fill per designated entry
in the matrix Ak.
The fill-in level of the matrix entries is updated using the formula

lev^(k)_ij ≡ min( lev^(k−1)_im + lev^(k−1)_mj + 1, lev^(k−1)_ij )    (5.39)

The discarded fill that results from the choice of a pivot in the factorization is given by the matrix

Fk = [f^(k)_ij] ≡ { 0,          b^(k)_ij ≠ 0
                    −c^(k)_ij,  lev^(k)_ij > LFIL
                    0,          otherwise    (5.40)
where lev^(k)_ij is the fill-in level of a particular entry in the matrix Ak. The equivalent
representation for Ak is

Ak = [a^(k)_ij] ≡ { b^(k)_ij − c^(k)_ij,  b^(k)_ij ≠ 0
                    b^(k)_ij,             lev^(k)_ij > LFIL
                    b^(k)_ij − c^(k)_ij,  otherwise    (5.41)
In the MDF algorithm, at iteration k, the subsequent pivot node is chosen such that the Frobenius norm
of Fk is minimized. Refer to Algorithms 6 and 7. D’Azevedo et al. [114] discuss variations to their
original MDF algorithm. For example, the threshold MDF algorithm modifies (5.40) to

Fk = [f^(k)_ij] ≡ { 0,          b^(k)_ij ≠ 0
                    −c^(k)_ij,  lev^(k)_ij > LFIL or |c^(k)_ij| < ε min(Ri, Rj)
                    0,          otherwise    (5.42)
Algorithm 6 Minimum discarded fill: MDF(A)

Initialization:
Set: A0 ≡ A.
Set:
  lev^(0)_ij ≡ { 0,  aij ≠ 0
                 ∞,  otherwise
Compute the discard value for all nodes vj in the graph of A0 using Algorithm 7.
for k = 1, . . . , n − 1 do
  1. Choose the next pivot node vm such that it has a minimal discard(vm). The tie-breaking strategy
     hierarchy is: (a) minimum deficiency, (b) minimum degree, and (c) minimum lexicographical
     ordering index.
  2. Update the incomplete factorization Ak using (5.38) with the defined maximum allowable fill
     level, LFIL.
  3. Define the permutation matrix Pk to exchange vm to the first position in Ak.
  4. Update the fill level of the elements in Ak using (5.39). Specifically,
     for each neighbour vi of vm, where (vi, vm) ∈ E_{k−1} do
       for each neighbour vj of vm, where (vm, vj) ∈ E_{k−1} do
         lev^(k)_ij ≡ min( lev^(k−1)_im + lev^(k−1)_mj + 1, lev^(k−1)_ij )
       end for
     end for
  5. Update the discard values of vm's neighbours using the following iteration:
     for each neighbour vi of vm, where (vi, vm) ∈ E_{k−1} do
       Using Algorithm 7, re-compute its discard value discard(vi) = ||F_{k+1}||_F,
       where F_{k+1} is obtained from
         P_{k+1} Ak P_{k+1}ᵀ = [ d_{k+1}, β_{k+1}ᵀ ; α_{k+1}, B_{k+1} ]
       and P_{k+1} is the permutation matrix that exchanges vi to the first position in Ak.
     end for
end for
where ε is a tolerance and

Ri = max_{m=1,...,n} |aim| = ||ai*||∞    (5.43)
Another variation of MDF is the minimum update matrix (MUM) [114] algorithm, which is related to
classic work by Markowitz [130].
Algorithm 7 Compute the discard value for node vi

Initialize: discard(vi) ≡ 0
Refer to Figure 5.1. Compute the discard value, discard(vi) = ||Fk||_F, using the following iterations:
for each neighbour vj of vi in Ek do
  for each p such that a^(k)_ip ≠ 0, a^(k)_jp = 0, lev^(k+1)_jp > LFIL (see note) do
    discard(vi) ← discard(vi) + ( a^(k)_ji a^(k)_ip / a^(k)_ii )²
  end for
end for
discard(vi) ← sqrt( discard(vi) )

Note: the condition a^(k)_ip ≠ 0, a^(k)_jp = 0, lev^(k+1)_jp > LFIL simply means that if a nonzero
entry a^(k)_ip is to introduce some new fill into the matrix in entry a^(k)_jp and it exceeds the
allowable fill-in level limit, it should be treated as discarded fill.
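The discard value can be sketched for the scalar case with an ILU(0)-style limit, where every entry that would appear outside the current pattern counts as discarded fill (toy code, not the thesis implementation). The path-graph example shows why MDF prefers to pivot on end nodes first:

```python
import numpy as np

# Sketch of the Algorithm 7 discard value with all new fill counted as discarded.
def discard_value(A, i):
    n = A.shape[0]
    s = 0.0
    for j in range(n):
        if j == i or A[j, i] == 0.0:
            continue                              # vj must be a neighbour of vi
        for p in range(n):
            if p == i or p == j:
                continue
            if A[i, p] != 0.0 and A[j, p] == 0.0:
                # Fill a_jp introduced by pivoting on vi would be dropped.
                s += (A[j, i] * A[i, p] / A[i, i]) ** 2
    return np.sqrt(s)

# Path graph 1-2-3: pivoting on the middle node would create fill between its
# two neighbours; pivoting on an end node creates none.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
assert discard_value(A, 0) == 0.0
assert discard_value(A, 2) == 0.0
assert discard_value(A, 1) > 0.0   # MDF would pick an end node first
```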
Greedy MDF
The approach of Persson and Peraire [145] is followed to extend the MDF algorithm to a system of PDEs.
It is called the greedy MDF algorithm. Although they use the discontinuous Galerkin finite element
method, their approach is directly applicable to the finite-difference discretization used in this work. In the
greedy MDF algorithm, the blocks in the system matrix are approximated as scalars by taking their
Frobenius norms.
Beginning with the original system matrix, A, a block-scaled matrix
B = (AD)⁻¹A    (5.44)

is formed, where AD is the block diagonal of A. The block-diagonal entries of B are identity
matrices with dimensions equal to the block size. The reduced system matrix, C, has scalar entries equal
to the Frobenius norms of the block entries in B
Cij = ||Bij ||F (5.45)
The MDF algorithm is then performed on the reduced system matrix and the nodal reordering is
obtained. The block reduction is summarized in Algorithm 8.
Effect of scaling
The effect of scaling on the greedy MDF algorithm was investigated. Specifically, row and column
scaling were examined, and it was found that MDF is insensitive to row scaling but sensitive to
column scaling. The proof is as follows:
Consider a 2 × 2 block matrix

A = [ A11, A12 ; A21, A22 ]    (5.46)
Algorithm 8 Block reduction for greedy MDF

for i = 1, nblocks do
  Compute the inverse of the diagonal block i of A, ADi⁻¹
  Scale block-row i of A by ADi⁻¹ and store it as block-row i of B
  for j = 1, nblocks do
    cij ← ||Bij||F, where C = [cij] is the reduced matrix
  end for
end for
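The block reduction can be sketched directly (toy block sizes and values, not thesis code); since the diagonal blocks of B are identities, the diagonal of the reduced matrix equals the square root of the block size:

```python
import numpy as np

# Sketch of Algorithm 8: block-diagonal scaling followed by Frobenius norms.
rng = np.random.default_rng(4)
nb, bs = 3, 2                                  # 3x3 blocks of size 2x2 (arbitrary)
A = rng.standard_normal((nb * bs, nb * bs)) + 6.0 * np.eye(nb * bs)

C = np.zeros((nb, nb))
for i in range(nb):
    Di = A[i*bs:(i+1)*bs, i*bs:(i+1)*bs]
    Di_inv = np.linalg.inv(Di)                 # invert diagonal block i
    for j in range(nb):
        Bij = Di_inv @ A[i*bs:(i+1)*bs, j*bs:(j+1)*bs]
        C[i, j] = np.linalg.norm(Bij, 'fro')   # c_ij = ||B_ij||_F

# Diagonal blocks of B are identities, so C_ii = ||I||_F = sqrt(block size).
assert np.allclose(np.diag(C), np.sqrt(bs))
```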
For the greedy MDF algorithm, a diagonal block-row scaling given by (5.44) is applied. This results in
the matrix

B = [ I11, A11⁻¹A12 ; A22⁻¹A21, I22 ]    (5.47)

This matrix is then reduced to a scalar equivalent and the discard values are obtained.

Consider a diagonal row scaling matrix, Sr, that is partitioned into blocks with dimensions equivalent
to those of I11 and I22. Hence,

Sr = [ Sr1, 0 ; 0, Sr2 ]    (5.48)
If (5.46) is scaled by (5.48), then

Sr A = [ Sr1, 0 ; 0, Sr2 ] [ A11, A12 ; A21, A22 ]    (5.49)
     = [ Sr1A11, Sr1A12 ; Sr2A21, Sr2A22 ]    (5.50)

Applying the diagonal block-row scaling based on this matrix gives

B_row scale = [ I11, (Sr1A11)⁻¹Sr1A12 ; (Sr2A22)⁻¹Sr2A21, I22 ]    (5.51)
            = [ I11, A11⁻¹Sr1⁻¹Sr1A12 ; A22⁻¹Sr2⁻¹Sr2A21, I22 ]    (5.52)
            = [ I11, A11⁻¹I11A12 ; A22⁻¹I22A21, I22 ]    (5.53)
            = [ I11, A11⁻¹A12 ; A22⁻¹A21, I22 ]    (5.54)
            = B    (5.55)

which is the original diagonal block-row scaled matrix given in (5.47). The diagonal block-row scaling
matrices cancel out in this formulation. Hence the greedy MDF algorithm is insensitive to row scaling.
However, the algorithm is influenced by column scaling. If a diagonal column scaling matrix Sc
(partitioned into blocks with dimensions equivalent to those of I11 and I22) is considered and the same
approach as the above derivation is followed, the following diagonal block-column scaled matrix is
obtained:

B_column scale = [ I11, Sc1⁻¹A11⁻¹A12Sc2 ; Sc2⁻¹A22⁻¹A21Sc1, I22 ] ≠ B    (5.56)

Hence, the greedy MDF ordering is sensitive to column scaling.
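This asymmetry can be confirmed numerically with a toy 2 × 2 block example (arbitrary blocks and scalings, not thesis code):

```python
import numpy as np

# Toy check: the greedy-MDF reduced matrix is unchanged by block-row scaling
# but changed by block-column scaling.
rng = np.random.default_rng(5)
bs = 2
A = rng.standard_normal((2 * bs, 2 * bs)) + 5.0 * np.eye(2 * bs)

def block_row_scale(A, bs):
    # B = (A_D)^{-1} A, applied block-row by block-row, eq. (5.44).
    B = np.zeros_like(A)
    nb = A.shape[0] // bs
    for i in range(nb):
        Di_inv = np.linalg.inv(A[i*bs:(i+1)*bs, i*bs:(i+1)*bs])
        B[i*bs:(i+1)*bs, :] = Di_inv @ A[i*bs:(i+1)*bs, :]
    return B

S = np.diag(rng.uniform(0.5, 2.0, 2 * bs))   # arbitrary diagonal scaling

B = block_row_scale(A, bs)
B_row = block_row_scale(S @ A, bs)           # row scaling: cancels exactly
B_col = block_row_scale(A @ S, bs)           # column scaling: does not cancel

assert np.allclose(B_row, B)
assert not np.allclose(B_col, B)
```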
5.4 Multigrid Preconditioning
The remainder of this chapter focuses on the use of multigrid as a preconditioner for GMRES. Multigrid
consists of a smoother and a coarse-grid correction. The smoothers in this work are based on ILU(p), and
therefore belong to the family of stationary iterative methods that are based on matrix splittings. This
section begins with a brief review of matrix splittings, followed by a demonstration that ILU(p) is indeed
a good smoother. From there the discussion shifts to the iterative use of ILU(p) and ILU(p)-smoothed
multigrid as a preconditioner. Inter-grid operators, reordering, and scaling are also considered in the
discussion of multigrid preconditioning.
5.4.1 Stationary Iterative Methods
The linear system (5.9) is solved using GMRES. Here, a stationary iterative method is considered for
solving the linear system, thus introducing the concept of the smoother. In turn, that smoother will be
accelerated by multigrid, leading to a multigrid preconditioner.
Consider the splitting

A = M + N    (5.57)

where M is cheaper to invert than A. A classic relaxation method based on this matrix splitting is
used to solve (5.9) and is defined by
xm+1 = xm +M−1rm (5.58)
where
rm = b−Axm (5.59)
is the residual. A damping parameter, ω, can be introduced into (5.58) resulting in
xm+1 = xm + ωM−1rm (5.60)
If x is the exact solution, then the error at iteration m is
em = x− xm (5.61)
and
Aem = rm (5.62)
Furthermore, the iteration matrix for the damped method is

G = I − ωM⁻¹A    (5.63)
It can be easily shown that

em = G em−1    (5.64)

and

em = G^m e0    (5.65)

where e0 is the initial error. Using the properties of norms,

||em|| ≤ ||G^m|| ||e0||    (5.66)
Convergence for the relaxation method is guaranteed if
lim_{m→∞} ||G^m|| = 0    (5.67)
or equivalently, if the spectral radius of G satisfies
ρ(G) < 1 (5.68)
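A minimal sketch of the damped relaxation (5.60) with a Jacobi splitting (M = diag(A), a stand-in for the ILU(p) splitting used later) confirms that the iteration converges when ρ(G) < 1:

```python
import numpy as np

# Toy diagonally dominant system; Jacobi splitting M = diag(A).
rng = np.random.default_rng(6)
n = 8
A = rng.standard_normal((n, n)) + 2.0 * n * np.eye(n)
b = rng.standard_normal(n)
x_exact = np.linalg.solve(A, b)

M_inv = np.diag(1.0 / np.diag(A))     # inverse of the Jacobi splitting matrix
omega = 0.9

# Iteration matrix G = I - omega * M^{-1} A and its spectral radius.
G = np.eye(n) - omega * M_inv @ A
rho = np.max(np.abs(np.linalg.eigvals(G)))
assert rho < 1.0                      # convergence criterion (5.68)

# Damped relaxation (5.60): x <- x + omega * M^{-1} (b - A x).
x = np.zeros(n)
for _ in range(200):
    x = x + omega * M_inv @ (b - A @ x)
assert np.linalg.norm(x - x_exact) < 1e-8
```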
5.4.2 ILU(p) as a Smoother
A relaxation method has a corresponding splitting matrix (or set thereof). For example, the Richardson
[307] and Jacobi methods have splitting matrices of MR = I and MJ = DA, respectively. I is the
identity matrix and DA is the diagonal of A. The Gauss–Seidel method uses more information from the
system matrix, A, and its corresponding splitting matrix is either
MGS = LA (5.69)
or
MGS = UA (5.70)
where LA and UA are the lower- and upper-triangular parts of A, respectively.
In order for a relaxation method to be effectively accelerated by multigrid, it must exhibit a good
smoothing behaviour. That is, the method must efficiently damp high-frequency errors, thus making it
amenable to coarse-grid corrections. The symmetric Gauss–Seidel (SGS) method, with splitting matrix
MSGS = LAUA, is an example of a good smoother. SGS alternates the forward and backward solves
indicated by the operators LA and UA.
In addition to being a preconditioner for GMRES, ILU(p) can be a good smoother of high-frequency
errors and thus be accelerated by multigrid. ILU(p) can be represented as the matrix splitting
MILU(p) = LU (5.71)
where L and U are the incomplete factors of A.
A study was conducted to investigate the effectiveness of ILU(p) as a smoother. The study later
was extended to compare ILU(p) to SGS and to measure the importance of the coarse-grid correction.
A linear system was constructed using the convection-diffusion equation matrix operator, ACD, with a
Peclet number of 0.01 and a flow angle of θ = 45°. Specifically, the operator was generated using an
n×n-node square grid (for n = 21, 41, 81, 161) using second-order centred differences. The right-hand
side was set to zero, resulting in the linear system:
ACD φ = 0 (5.72)
with an exact solution of φ = 0. This system was solved using a stationary iterative method with ILU(0)
as a smoother. For comparison, the same problem was solved using an SGS smoother. In order to
determine the effectiveness of each smoother, the initial guess for the iterative method was set to

φ0 = e0(θx, θy) = sin(θx x) sin(θy y)    (5.73)
where larger values of θx and θy correspond to increasing frequencies in each respective direction. Since
the exact solution is zero, the convergence of the method depends solely on the specific initial frequencies
related to θx and θy.
Tables 5.1 and 5.2 compare the number of iterations for the ILU(0) and SGS methods to converge the
linear residual by ten orders of magnitude for n = 21 and n = 41, respectively. Both methods require
significantly fewer iterations for high-frequency initial errors, and this effect increases with n.
Since ILU(0) has better coupling, it generally requires fewer iterations than SGS. Furthermore, the CPU
time for ILU(0) is much lower than SGS for all cases.
Tables 5.3 and 5.4 compare the smoothing effectiveness of ILU(0) to ILU(1) for n = 41 and n = 81,
respectively. ILU(1) is also an effective smoother and requires fewer iterations overall than ILU(0).
Table 5.1: SGS and ILU(0) iterations on a 21 × 21–node grid for various initial error frequencies.

SGS:
θx \ θy    low      medium   high
low        20284    15119    10092
medium     15119    10563     4402
high       10092     4402     4951

ILU(0):
θx \ θy    low      medium   high
low          289      215      140
medium       215      150       83
high         140       83       82
Table 5.2: SGS (left) and ILU(0) (right) iterations on a 41 × 41–node grid for various initial error
frequencies.
SGS                                       ILU(0)
θx \ θy      low   medium     high        θx \ θy     low   medium   high
low       274027   175009   100319        low         986      627    336
medium    175009    89515    28499        medium      627      316    108
high      100319    28499    11329        high        336      108     43
Table 5.3: ILU(0) (left) and ILU(1) (right) iterations on a 41 × 41–node grid for various initial error
frequencies.
ILU(0)                                  ILU(1)
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low         986      627    336         low         379      235    129
medium      627      316    108         medium      242      116     49
high        336      108     43         high        155       46     20
Table 5.4: ILU(0) (left) and ILU(1) (right) iterations on an 81 × 81–node grid for various initial error
frequencies.
ILU(0)                                  ILU(1)
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        3466     1802    730         low        1326      664    210
medium     1802      370     80         medium      692      126     36
high        730       80     20         high        342       36     10
Algorithm 9 Relaxation: RELAX(A,M,z,v,ν)
for i = 1, ν do
Compute the residual: r = v − Az
Solve MΔz = r for the update Δz.
Update the solution: z ← z + ∆z
end for % Index i
5.4.3 Iterative ILU(p) as a Preconditioner
Pueyo [5] examined the use of ILU(p) to solve the nonlinear system. Here, the repeated use of ILU(p) as
a preconditioner for GMRES is of interest for two reasons: it is a generalization of ILU(p) to an iterative
method, offering more flexibility in its use and tuning, and it constitutes the smoothing component
of a more general multigrid preconditioner.
Consider the iterative solution of (5.8) using an ILU(p)-smoothed stationary method, where the
initial guess to the solution is z0 = 0. The baseline preconditioning step in GMRES (5.4) is consistent
with one iteration of this stationary method. Algorithm 9 summarizes this iterative method, where M
is the ILU(p) factorization of the system matrix, A.
Iteration r of Algorithm 9 can be written in terms of the initial guess, z0, and the unpreconditioned
Krylov subspace vector, v, as
z_r = G^r z_0 + (I − G^r) A^-1 v   (5.74)
where
G = I − M^-1 A   (5.75)
is the iteration matrix. Although the matrix A^-1 appears in (5.74), A is not inverted in the iterative
method. Using the initial guess z0 = 0, equation (5.74) simplifies to
z_r = (I − G^r) A^-1 v   (5.76)
Using the notation of Algorithm 1, the preconditioning step at iteration m of GMRES is therefore
given by
w_m = (I − G^r) A^-1 v_m   (5.77)
The smoothing operator is independent of v_m. Furthermore, for r = 1, the baseline ILU(p)
preconditioning step (5.4) is recovered:
w_m = (I − G) A^-1 v_m   (5.78)

⇒ w_m = (I − (I − M^-1 A)) A^-1 v_m   (5.79)

⇒ w_m = (M^-1 A) A^-1 v_m   (5.80)

⇒ w_m = M^-1 v_m   (5.81)
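The chain (5.78)–(5.81), and equation (5.76) more generally, can be verified numerically. The sketch below is an assumption-based illustration: it uses a small diagonally dominant random matrix with a Jacobi-style splitting M = diag(A) standing in for the ILU(p) factors, since these identities hold for any splitting matrix M:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # diagonally dominant
M = np.diag(np.diag(A))       # Jacobi splitting, standing in for ILU(p)
v = rng.standard_normal(n)
G = np.eye(n) - np.linalg.solve(M, A)                    # G = I - M^-1 A

# One sweep of Algorithm 9 from z0 = 0 returns M^-1 v, i.e. eq. (5.81):
z1 = np.zeros(n)
z1 = z1 + np.linalg.solve(M, v - A @ z1)
assert np.allclose(z1, np.linalg.solve(M, v))

# A second sweep matches (I - G^2) A^-1 v, i.e. eq. (5.76) with r = 2:
z2 = z1 + np.linalg.solve(M, v - A @ z1)
assert np.allclose(z2, (np.eye(n) - G @ G) @ np.linalg.solve(A, v))
```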
5.4.4 ILU(p)-Smoothed Geometric Multigrid as a Preconditioner
The right-preconditioned GMRES algorithm searches for a solution within the Krylov subspace (5.6).
In the previous section, it was shown that the baseline ILU(p) preconditioner is a matrix splitting of A,
and its application is equivalent to the first iteration of a stationary iterative method. If an iterative
method is used as a preconditioner, the preconditioned Krylov subspace can be represented as
K_m(A M_Iter^-1; b) = span{ b, (A M_Iter^-1) b, (A M_Iter^-1)^2 b, ..., (A M_Iter^-1)^(m-1) b }   (5.82)

where the operator M_Iter^-1 represents the iterative preconditioner.
Earlier, ILU(p) was shown to be an excellent smoother of high-frequency errors; however, it is not as
effective at damping low-frequency errors. A coarse-grid correction can be used to remove the remaining
error by projecting the relationship (5.62) onto a coarse grid. On the coarse grid, the remaining error
waveform is represented by fewer nodes, thus appearing to have a higher frequency. The smoother can
be applied to effectively reduce the coarse-grid error, and this error can then be interpolated back to the
fine grid.
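A runnable sketch of this smoother-plus-coarse-grid division of labour is given below. For brevity it substitutes a 1D Poisson operator and weighted Jacobi smoothing for the 2D operator and ILU(p); all names and parameter values are illustrative assumptions:

```python
import numpy as np

def poisson_1d(m):
    # 1D Poisson operator on m interior nodes of the unit interval.
    h = 1.0 / (m + 1)
    return (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

def jacobi(A, z, v, nu, omega=2.0 / 3.0):
    # Weighted Jacobi smoothing, standing in for the ILU(p) smoother.
    d = np.diag(A)
    for _ in range(nu):
        z = z + omega * (v - A @ z) / d
    return z

def interp_1d(mc):
    # Linear-interpolation prolongation from mc coarse to 2*mc + 1 fine nodes.
    P = np.zeros((2 * mc + 1, mc))
    for j in range(mc):
        P[2 * j + 1, j] = 1.0          # coincident node copied
        P[2 * j, j] += 0.5             # in-between nodes averaged
        P[2 * j + 2, j] += 0.5
    return P

def two_grid(Af, Ac, R, P, v, nu1=2, nu2=2):
    # One two-grid V-cycle with an exact coarse solve.
    zf = jacobi(Af, np.zeros_like(v), v, nu1)    # pre-smooth
    zc = np.linalg.solve(Ac, R @ (v - Af @ zf))  # restrict + coarse solve
    zf = zf + P @ zc                             # prolong the correction
    return jacobi(Af, zf, v, nu2)                # post-smooth

mc = 15
mf = 2 * mc + 1
Af, Ac = poisson_1d(mf), poisson_1d(mc)
P = interp_1d(mc)
R = 0.5 * P.T                                    # full weighting
xf = np.arange(1, mf + 1) / (mf + 1)
phi = np.sin(np.pi * xf)                         # smooth error, exact solution 0
for _ in range(3):                               # three V-cycles as a stationary method
    phi = phi + two_grid(Af, Ac, R, P, -(Af @ phi))
smoothed = jacobi(Af, np.sin(np.pi * xf), np.zeros(mf), 12)  # same smoothing work
# The coarse-grid correction removes the smooth error the smoother barely touches.
assert np.linalg.norm(phi) < 1e-2 * np.linalg.norm(smoothed)
```

Three V-cycles reduce the smooth error by orders of magnitude, whereas the same total number of smoothing sweeps alone barely reduces it.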
The original linear system (5.1) with multigrid preconditioning can be written as
A M_MG^-1 M_MG x = b   (5.83)

and the solution found by using GMRES lies within the Krylov subspace

K_m(A M_MG^-1; b) = span{ b, (A M_MG^-1) b, (A M_MG^-1)^2 b, ..., (A M_MG^-1)^(m-1) b }   (5.84)
A geometric multigrid (GMG) approach is used to determine the coarse-grid operators in this research.
The operators on the coarser grid levels are generated directly on each grid, in the same fashion as the
fine-grid operator. A V-cycle is used throughout. Figure 5.2 shows a four-grid V-
cycle. Algorithm 10 shows a two-grid V-cycle multigrid preconditioner. The letters f and c denote the
fine and coarse grids, respectively. The restriction operator, Icf , interpolates the fine grid residual to the
coarse grid, and the prolongation operator, Ifc , interpolates the coarse grid correction to the remaining
error to the fine grid.
The effectiveness of multigrid depends on the restriction and prolongation operators. For geometric
multigrid, the inter-grid operators must satisfy the order rule [308]
mIcf + mIfc > morder   (5.85)
where mIcf and mIfc are the orders of the restriction and prolongation operators, respectively, plus
one. For the problems considered in this work, morder = 2, since, at most, second-order operators are
considered for the discretization of the PDEs (e.g. the convection-diffusion and Navier–Stokes equations).
Bilinear interpolation (or full weighting) is used for the 2D restriction and prolongation operators
and satisfies (5.85). Although not used, restriction by injection combined with linear-interpolation
prolongation also satisfies the aforementioned order rule criterion. Figures 5.3 and 5.4 illustrate the restriction
Figure 5.2: A four-grid, multigrid V-cycle (legend: smooth, restrict, prolong).
Algorithm 10 Multigrid V-cycle: MGV2(Af ,Mf ,zf ,vf ,Ac,Mc,ν1,ν2,νc)
Initialize: zf = 0
Perform ν1 pre-smoothing iterations: RELAX(Af ,Mf ,zf ,vf ,ν1)
Compute the residual: rf = vf − Af zf
Restrict the residual: vc = Icf rf
Initialize the coarse-grid correction: zc = 0
if νc == 0 then
Solve the coarse-grid system exactly: Aczc = vc
else
Solve the coarse-grid system inexactly: RELAX(Ac,Mc,zc,vc,νc)
end if
Prolong the coarse-grid correction: zf ← zf + Ifc zc
Perform ν2 post-smoothing iterations: RELAX(Af ,Mf ,zf ,vf ,ν2)
and prolongation operators, respectively. Higher-order interpolation operators do not improve the
performance. Furthermore, other treatments at the boundaries were examined; however, they perform
worse.
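In 1D, the linear-interpolation and full-weighting stencils, and the order rule they satisfy, can be sketched as follows (a minimal illustration with assumed function names; the 2D bilinear operators are built analogously):

```python
import numpy as np

def prolong_1d(mc):
    # Linear-interpolation prolongation from mc coarse to 2*mc + 1 fine
    # interior nodes; coarse node j coincides with fine node 2j + 1.
    P = np.zeros((2 * mc + 1, mc))
    for j in range(mc):
        P[2 * j + 1, j] = 1.0       # coincident node copied
        P[2 * j, j] += 0.5          # neighbours averaged
        P[2 * j + 2, j] += 0.5
    return P

def restrict_1d(mc):
    # Full weighting: the [1/4, 1/2, 1/4] stencil, i.e. half the transpose
    # of linear interpolation (the variational pairing).
    return 0.5 * prolong_1d(mc).T

P, R = prolong_1d(7), restrict_1d(7)
xf = np.arange(1, 16) / 16.0        # fine interior coordinates
xc = xf[1::2]                       # coarse interior coordinates
# Both operators are exact on linear data wherever the stencil does not touch
# the (implicitly zero) right boundary, confirming both have order 2, so that
# mIcf + mIfc = 4 > morder = 2:
assert np.allclose(R @ xf, xc)
assert np.allclose((P @ xc)[:-1], xf[:-1])
```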
Tables 5.5–5.7 show the effectiveness of a coarse-grid correction for ILU(0) for the convection-diffusion
problem of Section 5.4.2. Tables 5.3 and 5.4 show that ILU(0) alone scales poorly with increasing problem
size, whereas ILU(0) accelerated by a coarse-grid correction scales nearly optimally with increasing grid
size. For this case, information is projected to the coarse grid using a simple injection operator and the
error correction is projected to the fine grid using a weighted prolongation operator. These are classical
results that are expected of multigrid for diffusion-dominated problems. Recall that the Peclet number
for this case is 0.01, which is characteristic of a diffusion-dominated flow. Tables 5.8–5.10 present similar
results with ILU(1) as a smoother.
Figure 5.3: Full-weighting restriction operator.
Figure 5.4: Full-weighting prolongation operator.
Table 5.5: ILU(0) (left) and ILU(0)+MG (right) iterations on a 41 × 41–node grid for various initial
error frequencies.
ILU(0)                                  ILU(0)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low         986      627    336         low          21       19     16
medium      627      316    108         medium       19       15     13
high        336      108     43         high         16       13     11
Table 5.6: ILU(0) (left) and ILU(0)+MG (right) iterations on an 81 × 81–node grid for various initial
error frequencies.
ILU(0)                                  ILU(0)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        3466     1802    730         low          21       19     16
medium     1802      370     80         medium       19       14     12
high        730       80     20         high         16       12     10
Table 5.7: ILU(0) (left) and ILU(0)+MG (right) iterations on a 161× 161–node grid for various initial
error frequencies.
ILU(0)                                   ILU(0)+MG
θx \ θy      low   medium   high         θx \ θy     low   medium   high
low        11703     4669   1091         low          22       19     16
medium      4669      256     46         medium       19       14     12
high        1091       46     10         high         16       12      9
Table 5.8: ILU(1) (left) and ILU(1)+MG (right) iterations on a 41 × 41–node grid for various initial
error frequencies.
ILU(1)                                  ILU(1)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low         379      235    129         low          17       15     13
medium      242      116     49         medium       14       10     10
high        155       46     20         high         13        9      8
Table 5.9: ILU(1) (left) and ILU(1)+MG (right) iterations on an 81 × 81–node grid for various initial
error frequencies.
ILU(1)                                  ILU(1)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        1326      664    210         low          18       15     13
medium      692      126     36         medium       15       10      9
high        342       36     10         high         13        9      7
Table 5.10: ILU(1) (left) and ILU(1)+MG (right) iterations on a 161× 161–node grid for various initial
error frequencies.
ILU(1)                                  ILU(1)+MG
θx \ θy     low   medium   high         θx \ θy     low   medium   high
low        4470     1666    364         low          18       15     13
medium     1795      119     21         medium       14        9      8
high        487       21      7         high         12        9      7
Algorithm 11 Relaxation (reordered): RELAX(A,M,P,z,v,ν)
for i = 1, ν do
Compute the residual: Pr = Pv − (PAPT)Pz
Solve (PMPT)(PΔz) = Pr for the update PΔz.
Update the solution: Pz ← Pz + P∆z
end for % Index i
5.4.5 Reordering and Scaling
Accounting for reordering
For the convection-diffusion equation, relaxation with an ILU splitting can be accelerated by nodal
reordering. Essentially, the reordering leads to a more effective splitting matrix, M. For example, the
criterion of minimum bandwidth for the system matrix A leads to a (reverse) Cuthill–McKee reordering.
An ILU factorization can be performed on the reordered matrix, leading to a popular choice of
preconditioner.
If a pivoting strategy is used during the incomplete factorization process, then the reordering and the
factorization are done simultaneously. Therefore, a permutation must be applied to the operators used in
the solution process after the factorization is performed. An example of an incomplete LU factorization
process that uses a pivoting strategy is the minimum discarded fill (MDF) ILU strategy. In MDF-ILU,
pivot choices are made in order to minimize the amount of discarded fill.
After the factorization process is completed, operators such as the system matrix, A, and the
restriction and prolongation operators for the multigrid preconditioner, Icf and Ifc, must be reordered using
the permutation matrix that was obtained during the factorization process. The reordering is achieved
using the permutation matrix P and its transpose. Note that P is an orthogonal matrix (P^-1 = PT).
In practice, the matrix P is stored as a vector.
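As a small illustration (a hypothetical 5 × 5 example, not thesis code), storing P as a vector p, so that row i of P is the unit vector e_p(i), reduces the products Pv and PAPT to array indexing:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
v = rng.standard_normal(5)
p = np.array([3, 0, 4, 1, 2])     # permutation stored as a vector
P = np.eye(5)[p]                  # explicit matrix: row i is e_{p[i]}

assert np.allclose(P @ P.T, np.eye(5))            # P is orthogonal, P^-1 = P^T
assert np.allclose(P @ v, v[p])                   # (Pv)_i = v_{p(i)}
assert np.allclose(P @ A @ P.T, A[np.ix_(p, p)])  # PAP^T reorders rows and columns
```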
In order to better understand how these permutations affect the multigrid preconditioner, Algorithms
9 and 10 are re-written using the permutation matrix, P. Algorithm 11 shows the effect of the reordering
on the smoothing process. The matrix PMPT is the only operator that has a built-in reordering since it
was subject to the pivoting process during or prior to the incomplete factorization. The system operator,
A has its rows and columns reordered according to P, leading to a new operator PAPT.
The permutations can be extended to the multigrid preconditioner. Algorithm 12 shows how the
permutations are used in the two-grid cycle preconditioner. The key operators that need to be considered
in terms of reordering are the system matrices on the fine and coarse grids, PfAfPTf and PcAcPTc, and
the restriction and prolongation operators, PcIcfPTf and PfIfcPTc.
Algorithm 12 Multigrid V-cycle (reordered): MGV2(Af ,Mf ,Pf ,zf ,vf ,Ac,Mc,Pc,ν1,ν2,νc)
Initialize: Pfzf = 0
Perform ν1 pre-smoothing iterations: RELAX(PfAfPTf, PfMfPTf, Pfzf, Pfvf, ν1)
Compute the residual: Pfrf = Pfvf − (PfAfPTf )Pfzf
Restrict the residual: Pcvc = (PcIcfPTf ) Pfrf
Initialize the coarse-grid correction: Pczc = 0
if νc == 0 then
Solve the coarse-grid system exactly: (PcAcPTc )Pczc = Pcvc
else
Solve the coarse-grid system inexactly: RELAX(PcAcPTc, PcMcPTc, Pczc, Pcvc, νc)
end if
Prolong the coarse-grid correction: Pfzf ← Pfzf + (PfIfc PTc ) Pczc
Perform ν2 post-smoothing iterations: RELAX(PfAfPTf, PfMfPTf, Pfzf, Pfvf, ν2)
Algorithm 13 Multigrid V-cycle (scaled): MGV2(Af ,Mf ,S1,f ,S2,f ,zf ,vf ,Ac,Mc,S1,c,S2,c,ν1,ν2,νc)
Initialize: S2,f^-1 zf = 0
Perform ν1 pre-smoothing iterations: RELAX(S1,f Af S2,f, S1,f Mf S2,f, S2,f^-1 zf, S1,f vf, ν1)
Compute the residual: S1,f rf = S1,f vf − (S1,f Af S2,f) S2,f^-1 zf
Restrict the residual: S1,c vc = (S1,c Icf S1,f^-1) S1,f rf
Initialize the coarse-grid correction: S2,c^-1 zc = 0
if νc == 0 then
Solve the coarse-grid system exactly: (S1,c Ac S2,c) S2,c^-1 zc = S1,c vc
else
Solve the coarse-grid system inexactly: RELAX(S1,c Ac S2,c, S1,c Mc S2,c, S2,c^-1 zc, S1,c vc, νc)
end if
Prolong the coarse-grid correction: S2,f^-1 zf ← S2,f^-1 zf + (S2,f^-1 Ifc S2,c) S2,c^-1 zc
Perform ν2 post-smoothing iterations: RELAX(S1,f Af S2,f, S1,f Mf S2,f, S2,f^-1 zf, S1,f vf, ν2)
Accounting for scaling
If a row and column scaling is applied to the general system that arises in the preconditioning step (5.8)
of the GMRES algorithm, the system becomes
(S1 Af S2)(S2^-1 zf) = S1 vf   (5.86)
Algorithm 13 shows a two-grid-level multigrid V-cycle preconditioner that incorporates row and column
scalings for both the fine and coarse grid levels.
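A minimal numerical check of (5.86), with random stand-in matrices and arbitrary positive diagonal scalings (all names are illustrative): solving the scaled system for S2^-1 zf and then multiplying by S2 recovers the solution of the unscaled system.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Af = 4.0 * np.eye(n) + rng.standard_normal((n, n))  # nonsingular stand-in
vf = rng.standard_normal(n)
S1 = np.diag(rng.uniform(0.5, 2.0, n))   # row scaling
S2 = np.diag(rng.uniform(0.5, 2.0, n))   # column scaling

# Solve (S1 Af S2)(S2^-1 zf) = S1 vf, then undo the column scaling:
y = np.linalg.solve(S1 @ Af @ S2, S1 @ vf)
zf = S2 @ y
assert np.allclose(zf, np.linalg.solve(Af, vf))
```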
Accounting for reordering and scaling
In order to avoid a bookkeeping nightmare, all reordering and scaling operators are absorbed into the
restriction and prolongation operators. This means that the development of a multigrid preconditioner
can be done first without row and column scaling, as well as reordering. Once preliminary results are
obtained, the algorithm can be extended to include first reordering and then scaling(s).
In the case where reordering is performed before scaling, the restriction and prolongation operators
are
S1,c Pc Icf PTf S1,f^-1   and   S2,f^-1 Pf Ifc PTc S2,c   (5.87)

In the case where scaling is performed before reordering, the inter-grid transfer operators are

Pc S1,c Icf S1,f^-1 PTf   and   Pf S2,f^-1 Ifc S2,c PTc   (5.88)
5.5 Chapter Summary and Highlights of Contributions
This chapter outlines the various preconditioning techniques that were explored and developed. Pre-
conditioning is the focus of this thesis; hence, a summary is presented with particular focus on the
contributions that were made during this research.
In the first section, the right-preconditioned GMRES algorithm was presented. Particular focus was
made on how scaling of the linear system affects the algorithm.
In the second section, the ILU(p) preconditioner was introduced. In particular, the Crout formulation
was discussed in detail with particular focus on the error of the incomplete factorization. A proof was
presented, demonstrating that the incomplete factorization is insensitive to row and column scaling of
the system matrix.
The focus of the subsequent section was on orderings. Essential definitions from graph theory were
presented and used to briefly outline the minimum degree, reverse Cuthill–McKee (RCM) and minimum
discarded fill (MDF) orderings. RCM ordering is a baseline ordering in this research and is based on
minimizing the bandwidth of the system matrix. In contrast, the MDF ordering minimizes the fill-in
that is discarded during an incomplete factorization process. The MDF ordering, originally developed
for the system of equations resulting from discretized linear PDEs, was extended to the system of
equations resulting from the discretized Navier–Stokes equations. An approach was developed that
reduces the blocks in the Jacobian matrix in Newton’s method, using a similar approach to that of
Persson and Peraire [145] for finite-element discretizations. The resulting ordering algorithm was proven
to be insensitive to row scaling, but sensitive to column scaling. The latter phenomenon therefore
required particular consideration of how the linear system was scaled.
The final section of this chapter outlined the process of developing a linear multigrid preconditioner
by extending the baseline ILU(p) preconditioning step to an entire method. First, stationary methods
were introduced as well as the concept of a smoother. To facilitate and justify the use of ILU(p) in
the multigrid algorithm, its effectiveness as a smoother was demonstrated for the convection-diffusion
equation. It was shown that ILU(0) and ILU(1) exhibit excellent smoothing properties for high-frequency
errors. It was demonstrated that ILU(p) has greater coupling in its incomplete factors relative to lower-
and upper-triangular matrices that are used in classical symmetric Gauss–Seidel relaxation.
The ILU(p) preconditioning step was extended to an iterative method using a clearly defined
algorithm. This algorithm was then embedded into a broader multigrid preconditioning algorithm. It is
believed that this algorithm is potentially one of the most clearly-defined approaches for a researcher
to transition from a simple preconditioning step (e.g. ILU(p)) in a linear system solver to the linear
(geometric) multigrid method as a preconditioner. The use of ordering and scaling in the multigrid
preconditioning process was also described in detail, with special attention paid to the restriction and
prolongation operators.
Chapter 6
RESULTS
This chapter is divided into two sections: results from the convection-diffusion equation and results from
the Euler and Navier–Stokes equations. Within each section, several studies are presented in a manner
that mirrors the investigations conducted throughout this research. The studies presented here are a
subset of a larger set and represent those most relevant and novel with respect to preconditioning.
In the section presenting the results from the convection-diffusion equation, the investigations include:
the impact of Peclet number on the performance of GMRES; the effect of the ILU(p) preconditioner
both with and without a multigrid correction on the performance of GMRES; the effect of iterative
ILU(p) preconditioning; the effect of ordering in the formation of the preconditioner on the performance
of GMRES; a comparison of MDF reordering for matrices arising from centered-difference discretizations
to matrices arising from upwinding; and the use of an evolutionary algorithm in identifying a root node
and a tie-breaking strategy for the MDF algorithm.
In the section presenting the results for the Euler and Navier–Stokes equations, the studies include:
the effect of ILU(p) preconditioning on GMRES; a comparison of ILU(p) preconditioning to iterative
ILU(p) preconditioning and ILU(p)-smoothed multigrid preconditioning; and a comparison of various
orderings including natural, RCM and MDF.
All cases (unless specified otherwise) are run on a desktop computer with an Intel® Dual-Core™ i3
530 processor, with a 1.60 GHz CPU per core and 4 GB of RAM.
6.1 Convection-Diffusion Equation
A sequence of studies is presented here for the convection-diffusion equation. Many parameters
are constant for some studies and variable for others. Therefore, this section begins with a description
of the parameters and constructs of the convection-diffusion solver that remain unchanged, apply to
all cases, or serve as defaults.
The 2D convection-diffusion equation has constants Pe, ~v, and µ, which represent the Peclet number,
the velocity vector, and the diffusion coefficient, respectively. For simplicity, a unit velocity vector and
length scale are assumed, with the velocity inclined at an angle θ. Therefore, Pe and θ are the physical
constants that define the flow, and µ is defined implicitly by the definition of the Peclet number (2.40).
The Peclet number changes from one study to the next, and a baseline value of θ = 22° is used.
The boundary conditions used in (2.43) for all cases include the following Dirichlet conditions on the
upstream boundaries:
φ(x, 0) = [4x(x − 1)]²   (6.1)

φ(0, y) = [4y(y − 1)]²   (6.2)
The problem is discretized on the domain [0, 1] × [0, 1], using a uniform grid with a number of
nodes in each direction that facilitates a desired number of coarser grids for multigrid. A coarse grid is
derived from a finer grid by removing its even-numbered nodes in each direction. The n-th grid level is
denoted as CUn, where n = 0, 1, ..., 7, and the number of nodes for a given grid level is (2^(9−n) + 1)². The
finest grid level (n = 0) has 513² nodes and the coarsest grid level (n = 7) has 5² nodes.
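The grid hierarchy can be checked directly from this definition (a one-line illustration, not thesis code):

```python
# Number of nodes on grid level CU_n, n = 0, ..., 7: (2^(9-n) + 1)^2
sizes = {n: (2 ** (9 - n) + 1) ** 2 for n in range(8)}
assert sizes[0] == 513 ** 2 == 263169   # finest grid
assert sizes[7] == 5 ** 2 == 25         # coarsest grid
# Removing the even-numbered nodes halves each direction: 2^(9-n)+1 -> 2^(8-n)+1
assert all((2 ** (9 - n) + 1) // 2 + 1 == 2 ** (8 - n) + 1 for n in range(7))
```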
Second-order centred differences are used to discretize both the first and second derivatives. An
artificial dissipation coefficient of ε = 0.1 is used for (3.55) and (3.56).
Based on preliminary tests, the baseline GMRES parameters include a restart value of 400 iterations,
a total of 1200 iterations, and a relative tolerance of 10^-8. In the case of multigrid preconditioning,
defaults of ν1 = 1 pre-smoothing and ν2 = 0 post-smoothing iterations are used in the general
V-cycle as defined in Algorithm 10.
MATLAB R© R2011a is used for all convection-diffusion simulations. It contains its own ILU(0)
routine, in addition to other drop-tolerance and modified routines. Although its ILU(0) routine is
optimized and fast, it is not used because it does not permit non-zero fill-in levels, which would make
for an unfair comparison with an external ILU(p) routine. In lieu of MATLAB's ILU(0) routine, an external
ILU(p) algorithm was developed using the Crout formulation presented in Algorithm 3. This formulation
is amenable to the MDF algorithm, which is an essential component of this research.
Figure 6.1: Convergence of GMRES, solution, and eigenvalues of the system matrix with and without
ILU(1) preconditioning for a uniform grid case with a Peclet number of 0.001. Panels: (a) GMRES
convergence (residual vs. GMRES iteration); (b) solution; (c) eigenvalues of the unpreconditioned
matrix, cond(A) = 1130517; (d) eigenvalues of the iteration matrix, ρ(G_ILU(1)) = 9.9863e−01.
6.1.1 GMRES convergence and Peclet number
This investigation looks at the effectiveness of ILU(p) as a preconditioner for GMRES across a broad
range of Peclet numbers. Fill-in levels of 0 and 1 are used. All cases are run using grid CU2. A natural
ordering of grid nodes is used. Figures 6.1 and 6.2 show the convergence of GMRES and solutions
for Peclet numbers of 0.001 and 1000, respectively. For the diffusion-dominated case, information is
spread out from the Dirichlet boundary condition, and for the convection-dominated case, information
is propagated to the outflow boundary along the direction of the velocity.
The eigenvalue spectrum of the unpreconditioned system matrix, A, for a Peclet number of 0.001
is shown in Figure 6.1(c). The conditioning of this matrix is quite poor, thus warranting the use of a
Figure 6.2: Convergence of GMRES, solution, and eigenvalues of the system matrix with and without
ILU(1) preconditioning for a uniform grid case with a Peclet number of 1000. Panels: (a) GMRES
convergence (residual vs. GMRES iteration); (b) solution; (c) eigenvalues of the unpreconditioned
matrix, cond(A) = 3062; (d) eigenvalues of the iteration matrix, ρ(G_ILU(1)) = 8.8273e−02.
preconditioner. Figure 6.1(d) shows the eigenvalue spectrum of the iteration matrix, G = I − M^-1 A.
The iteration matrix is used instead of the preconditioned matrix to facilitate the computation of a
spectral radius. For this particular case, the spectral radius is 0.99863 (i.e. less than 1), meaning ILU
will damp all error modes during the preconditioning step in GMRES, albeit slowly.
Similar eigenvalue spectra are shown in Figures 6.2(c) and 6.2(d) for a Peclet number of 1000. For
this case, the condition number of the unpreconditioned matrix is not as poor as for the much lower
Peclet number. Furthermore, the spectral radius is 0.088273, meaning ILU will rapidly dampen all error
modes. This is evident from the convergence information for this case. Only 15 GMRES iterations
are required for convergence, compared to the case with a Peclet number of 0.001, which required 127
iterations.
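The spectral radii quoted above can be reproduced in spirit with a stand-alone sketch. The version below applies a dense ILU(0) (rather than the ILU(1) of the figures) to a coarser 21 × 21-node discretization without the artificial-dissipation terms, so the numbers differ from the figures; the qualitative point is that ρ(G) < 1 for the diffusion-dominated operator:

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular

def conv_diff_matrix(n, peclet, theta=np.pi / 4):
    # Second-order centred differences on the interior of an n x n node grid.
    m, h = n - 2, 1.0 / (n - 1)
    mu = 1.0 / peclet
    vx, vy = np.cos(theta), np.sin(theta)
    I = sp.identity(m)
    D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / h**2
    D1 = sp.diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(m, m)) / (2 * h)
    return (-mu * (sp.kron(I, D2) + sp.kron(D2, I))
            + vx * sp.kron(I, D1) + vy * sp.kron(D1, I)).toarray()

def ilu0(A):
    # IKJ-ordered ILU(0) on a dense copy: updates only where A is nonzero.
    n = A.shape[0]
    pat = A != 0
    LU = A.astype(float).copy()
    for i in range(n):
        for k in np.flatnonzero(pat[i, :i]):      # k in increasing order
            LU[i, k] /= LU[k, k]
            js = np.flatnonzero(pat[i, k + 1:]) + k + 1
            LU[i, js] -= LU[i, k] * LU[k, js]
    return np.tril(LU, -1) + np.eye(n), np.triu(LU)

A = conv_diff_matrix(21, peclet=0.001)
L, U = ilu0(A)
Minv_A = solve_triangular(U, solve_triangular(L, A, lower=True))
rho = np.abs(np.linalg.eigvals(np.eye(A.shape[0]) - Minv_A)).max()
assert rho < 1.0   # ILU(0) damps every error mode for this M-matrix
```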
Table 6.1: GMRES iterations and CPU times with ILU(0) (top) and ILU(1) (bottom) preconditioning
on a 129 × 129-node grid for various Peclet numbers.

ILU(0):
  Pe       GMRES Iter.   Form (s)   Solve (s)   Total (s)
  10^-3        186          16.3       10.0        26.3
  10^-2        190          16.3       10.7        27.0
  10^-1        194          16.3       11.0        27.3
  1            198          16.3       11.5        27.8
  10^1         167          16.4        8.2        24.6
  10^2          68          16.4        1.8        18.2
  10^3          63          16.3        1.7        18.0

ILU(1):
  Pe       GMRES Iter.   Form (s)   Solve (s)   Total (s)
  10^-3        127          32.3       10.8        43.1
  10^-2        127          32.2        5.1        37.3
  10^-1        127          32.6        5.1        37.7
  1            126          32.7        5.2        37.9
  10^1         103          32.6        3.8        36.4
  10^2          43          32.5        0.9        33.4
  10^3          15          32.4        0.2        32.6
Table 6.2: GMRES iterations and CPU times with ILU(0) (top) and ILU(1) (bottom) preconditioning
on a 129 × 129-node grid for a Peclet number of 0.001.

ILU(0):
  ILU(0) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1            186          16.2       10.1        26.3
        2            120          16.4        4.9        21.3
        3             98          16.6        3.9        20.5
        4             85          16.3        3.2        19.5
        5             76          16.2        2.9        19.1
        6             69          16.6        2.5        19.1
        7             65          16.5        2.5        19.0
        8             61          16.4        2.3        18.7
        9             57          16.5        2.4        18.9

ILU(1):
  ILU(1) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1            127          32.8        5.3        38.1
        2             83          32.4        2.8        35.2
        3             68          32.3        2.2        34.5
        4             59          32.4        1.9        34.3
        5             53          32.5        1.7        34.3
        6             48          32.4        1.5        33.9
        7             44          32.7        1.7        34.4
        8             42          32.6        1.6        34.2
        9             39          32.6        1.5        34.1
Table 6.1 summarizes the results over a broad range of Peclet numbers. The number of iterations for
the convection-dominated cases is much lower than for the diffusion-dominated cases, with a minimum
at a Peclet number of 1000. ILU(1) is a better preconditioner than ILU(0) in terms of iterations; however,
its formation takes roughly twice as long, resulting in a slower solver overall. CPU times are not
emphasized because the implementation of ILU(p) is not efficient and accounts for the majority of the
overall CPU time.
Table 6.3: GMRES iterations and CPU times with ILU(0) (top) and ILU(1) (bottom) preconditioning
on a 129 × 129-node grid for a Peclet number of 1000.

ILU(0):
  ILU(0) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1             61          16.1        1.7        17.8
        2          *1200          16.3      127.2       143.5
        3            505          16.4       47.0        63.4

ILU(1):
  ILU(1) Iter.   GMRES Iter.   Form (s)   Solve (s)   Total (s)
        1             15          32.6        0.2        32.8
        2              9          33.1        0.2        33.3
        3              6          32.5        0.1        32.6
6.1.2 Iterative ILU(p) preconditioning
The focus of this investigation is on how iterative ILU(p) preconditioning affects the performance of
GMRES for both convection- and diffusion-dominated flows. Specifically, fill-in values of p = 0 and
p = 1 are considered. All cases are run using grid CU2 with a natural nodal ordering. Table 6.2
summarizes the results for the diffusion-dominated case with a Peclet number of 0.001. For this case,
the general trend is that as the number of ILU(p) iterations increases, the number of GMRES iterations
decreases. In terms of CPU time, it should be noted that an efficient implementation of ILU(p) is not
used here. Therefore, the formation time of the linear system and the ILU(p) preconditioner accounts
for the majority of the computational time.
Table 6.3 shows the results for the convection-dominated cases with a Peclet number of 1000. For a
fill-in level of p = 0, additional ILU iterations in the preconditioning step do not improve the performance
of GMRES. It is believed that this is caused by the presence of unstable eigenvalues in the preconditioned
matrix whose modes are effectively damped by GMRES if a single iteration of ILU(0) is used; when
multiple ILU(0) iterations are used, GMRES is unable to counteract the relative growth of these unstable
error modes. For a fill-in level of p = 1, in contrast, the number of GMRES iterations is reduced. The
reference ILU(1) preconditioner for this problem is already quite good, since GMRES requires only 15
iterations, compared with 127 for the diffusion-dominated case. Therefore, multiple ILU(1)
preconditioning iterations do not provide a substantial reduction in CPU time. For more practical
problems, there is potential for a greater reduction, since the baseline number of GMRES iterations is
larger.
6.1.3 ILU(p) and multigrid preconditioning
This investigation focuses on the effectiveness of ILU(p)-smoothed multigrid preconditioning on GMRES.
For this study, a set of uniform grids and a natural ordering of the grid nodes are used.
Table 6.4 shows the results for the diffusion-dominated case with a Peclet number of 0.001. As
expected, multigrid dramatically reduces the number of GMRES iterations as the number of grid
nodes increases. On grid CU0, for the case with one grid level, the number of iterations exceeds the
Table 6.4: GMRES iterations for various multigrid preconditioners with ILU(0) smoothing (Pe = 0.001).
Grid   Nodes          Grid Levels
                   1      2      3      4      5      6      7      8
CU5    17²        25     15     13      -      -      -      -      -
CU4    33²        49     24     16     14      -      -      -      -
CU3    65²        95     45     25     18     16      -      -      -
CU2    129²      186     88     45     27     19     18      -      -
CU1    257²      373    172     88     47     29     21     19      -
CU0    513²    *1200    347    173     91     54     50     46     22
maximum allowable number of GMRES iterations, indicated by *1200 in the table. Furthermore, the
relative increase in multigrid-preconditioned iterations (for the maximum number of grid levels) with
increasing grid size is quite small. This compares well with the theory, which estimates a complexity of
order n.
Table 6.5 shows the results for a much larger Peclet number of 1000. For the finest grid, CU2, the
number of iterations required by ILU(0)-preconditioned GMRES is approximately one third of that for
the diffusion-dominated case (61 iterations versus 186). A very important observation is that multigrid
acceleration of the ILU(0) preconditioner does not improve the performance of GMRES, and this
phenomenon is exacerbated with increasing grid size.
The baseline pre- and post-smoothing parameters in the multigrid preconditioner were used in this
study. Investigations were conducted to determine the optimal number of smoothing iterations before
and after the coarse-grid correction(s) for the diffusion-dominated cases on all uniform grids. It was
found that additional smoothing does not improve the performance of the multigrid preconditioner in
terms of CPU time. Furthermore, a comparison was made between solving the coarsest grid level problem
directly or by a smoothing iteration. For coarse grids, a direct solve was faster; however, with increasing
grid size, the performance of multigrid-preconditioned GMRES is insensitive to this choice.
6.1.4 Orderings
Two key investigations in this research are multigrid preconditioning and orderings. Multigrid
preconditioning significantly reduces the number of GMRES iterations for diffusion-dominated cases. However,
for convection-dominated cases, the results are not as promising. In this section, ordering strategies are
investigated and particular attention is paid to convection-dominated flows. Multigrid preconditioning
is also considered with these orderings for the diffusion-dominated cases.
For this study, the orderings considered include: natural; reverse; reverse Cuthill–McKee [146]
Table 6.5: GMRES iterations for various multigrid preconditioners (Pe = 1000).
Grid   Nodes        Grid Levels
                  1       2      3      4      5      6
CU5    17²       22      24     22      -      -      -
CU4    33²       25      29     26     27      -      -
CU3    65²       52     425    433    447    321      -
CU2    129²      61   *1200    419    431    437    436
Table 6.6: GMRES iterations for various orderings using ILU(0) multigrid preconditioning (129 × 129
nodes and Pe = 0.001).
Ordering         Grid Levels
               1     2     3     4     5     6
natural      186    88    45    27    19    18
reverse      188    88    47    29    21    19
RCM          186    88    45    27    19    18
MDF          186    88    45    27    19    18
(RCM); and minimum discarded fill [114] (MDF). MATLAB’s RCM routine is used in this work. In this
routine, the node of minimum degree with the lowest initial index is chosen as the root node. Ties are
broken by selecting the node with the lowest initial index. MDF was implemented in this research using
key aspects from the literature as well as novel contributions to the approach, most importantly to the
tie-breaking strategy. Various tie-breaking strategies for MDF are compared in a later section. For all
cases considered, the default parameters are used in the formation of the linear system and the GMRES
algorithm. Furthermore, the uniform CU2 and CU1 grids are used for these cases.
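For illustration, the same type of reordering can be reproduced with SciPy's RCM routine in place of the MATLAB one; the 5-point stencil graph built below is an assumed stand-in for the actual system matrices:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

n = 5                      # illustrative n x n grid, natural ordering
N = n * n
rows, cols = [], []
for j in range(n):
    for i in range(n):
        k = j * n + i
        # connect each node to its east and north neighbours (both directions)
        if i + 1 < n:
            rows += [k, k + 1]; cols += [k + 1, k]
        if j + 1 < n:
            rows += [k, k + n]; cols += [k + n, k]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(N, N))

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm, :][:, perm]    # symmetrically permuted matrix
C = B.tocoo()
bw = int(np.abs(C.row - C.col).max())   # bandwidth after reordering
```

The returned array `perm` is applied symmetrically to rows and columns, preserving the symmetry of the adjacency structure while reducing the bandwidth.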
Tables 6.6 and 6.7 summarize the GMRES iterations on grid CU2 for the diffusion-dominated case, with ILU(0) used for smoothing within the multigrid preconditioner, and for the convection-dominated case, with ILU(0) preconditioning. For the diffusion-dominated case, multigrid preconditioning is also considered. For this grid, little variation is observed in the number of GMRES iterations for the various orderings.
Tables 6.8 and 6.9 summarize the results for the diffusion-dominated case for the various orderings
with ILU(0), ILU(1) and multigrid preconditioning on the finer CU1 grid. For ILU(0), the performance of GMRES, with or without multigrid preconditioning, is virtually insensitive to the ordering that is used. However, for ILU(1), MDF is the clear winner, requiring roughly 20% fewer iterations than the
Table 6.7: GMRES iterations for various orderings using ILU(0) preconditioning (129 × 129 nodes and Pe = 1000).
Ordering   GMRES Iterations
natural          61
reverse          62
RCM              62
MDF              61
Table 6.8: GMRES iterations for various orderings using ILU(0) multigrid preconditioning (257 × 257 nodes and Pe = 0.001).
Ordering         Grid Levels
             1      2      3      4      5      6      7
natural    373    172     88     47     29     21     19
reverse    375    172     89     49     31     23     21
RCM        373    172     88     47     29     21     19
MDF        373    172     88     47     29     21     19
RCM, natural, and reverse orderings.
Table 6.10 summarizes the results for the convection-dominated case for the various orderings with ILU(1) preconditioning on the finer CU1 grid. For this case, as for grid CU2, MDF yields the fewest iterations, requiring roughly 40% of the iterations of the natural ordering and two-thirds of those of RCM.
Although MDF requires the fewest iterations, its cost of formation is much larger than that of RCM.
In the literature, there are suggestions to modify this algorithm to make it more efficient. In the main
focus of this research (nonlinear systems solved by a Newton-GMRES algorithm), the higher CPU cost
of MDF could be amortized over many linear system solves.
The relative performance of MDF with respect to the other orderings improves with a fill-in level of 1.
This suggests that the advantage of minimizing the discarded fill over a matrix bandwidth minimization
(RCM) is even more relevant as the allowable nonzero sparsity pattern of the factorization increases.
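The notion of the allowable sparsity pattern for a given fill level can be made concrete with the classical symbolic level-of-fill rule; the dense scalar sketch below is illustrative and is not the block variant used in the solver:

```python
import numpy as np

def ilu_fill_pattern(A, p):
    """Symbolic ILU(p): original nonzeros receive level 0; a fill entry
    created when eliminating column k receives level
    lev(i,k) + lev(k,j) + 1, and only entries of level <= p are kept."""
    n = A.shape[0]
    big = n * n                      # stands in for an infinite level
    lev = np.where(A != 0, 0, big)
    for k in range(n):
        for i in range(k + 1, n):
            if lev[i, k] <= p:
                for j in range(k + 1, n):
                    lev[i, j] = min(lev[i, j], lev[i, k] + lev[k, j] + 1)
    return lev <= p

# 5-point Laplacian on a 3 x 3 grid: ILU(0) keeps exactly the original
# pattern, while ILU(1) admits the first generation of fill.
n = 3
N = n * n
A = np.zeros((N, N))
for j in range(n):
    for i in range(n):
        k = j * n + i
        A[k, k] = 4.0
        if i > 0: A[k, k - 1] = -1.0
        if i < n - 1: A[k, k + 1] = -1.0
        if j > 0: A[k, k - n] = -1.0
        if j < n - 1: A[k, k + n] = -1.0
pat0 = ilu_fill_pattern(A, 0)
pat1 = ilu_fill_pattern(A, 1)
```

Increasing p enlarges the pattern on which the factorization is allowed to store entries, which is the sense in which MDF's discard minimization becomes more relevant at higher fill levels.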
Table 6.9: GMRES iterations for various orderings using ILU(1) multigrid preconditioning (257 × 257 nodes and Pe = 0.001).
Ordering         Grid Levels
             1      2      3      4      5      6      7
natural    252    114     59     32     20     16     15
reverse    254    116     61     34     22     17     17
RCM        252    114     59     32     20     16     15
MDF        195     86     44     24     15     12     11
Table 6.10: GMRES iterations for various orderings using ILU(1) preconditioning (257 × 257 nodes and Pe = 1000).
Ordering   GMRES Iterations
natural          24
reverse          24
RCM              15
MDF              10
6.1.5 Further investigation of MDF
The MDF reordering strategy showed promise in preliminary investigations compared to the other orderings, especially for convection-dominated cases on finer grids with ILU(1). Two key components of this ordering, as of other orderings such as RCM, are the selection of the root node and the tie-breaking strategy. In the previous subsection, ties were broken by choosing the node with the lowest index. A novel contribution of this research is to incorporate the physics and geometry of the problem into the reordering strategy. This arises from the observation that MDF can produce a reordering whose matrix sparsity pattern resembles one arising from an upwind discretization.
This subsection is divided into three components. First, a connection is made between MDF and the
flow direction (i.e. upwinding). Next, results from an evolutionary algorithm provide insight into root
node selection and tie-breaking strategies for a simple convection-dominated case. Finally, distance-
and line-distance-based tie-breaking strategies are compared to the baseline index-based tie-breaking
strategy.
Row and column scaling of the system matrix were also investigated. It was experimentally confirmed
that MDF is insensitive to row scaling, as proven in Chapter 5. Furthermore, experiments with column
scaling found that the MDF reordering of the original (unscaled) system matrix is superior to a matrix
whose columns are scaled by the diagonal entries or the square root of the diagonal entries of the matrix.
Connection between MDF and upwinding
The purpose of this investigation is to demonstrate a connection between MDF and the flow direction.
Furthermore, it clearly shows that depending on the initial ordering of the matrix, MDF can lead to
multiple orderings that are equally good.
Consider the matrix that arises from the discretization of the convection-diffusion equation on a 5 × 5-node grid. Specifically, the Peclet number is 10⁹, the flow angle is 45°, and a second-order centred-difference discretization with a dissipation coefficient of ε = 0.5 is used. Figure 6.3 illustrates the sparsity pattern of this matrix, along with the relative size and sign of each entry; upward- and downward-facing triangles indicate positive and negative entries, respectively. For this convection-dominated case, the upper-triangular entries are very small. If the Peclet number were infinite, they would be zero.
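This behaviour can be reproduced with a small sketch; the first-order upwind-equivalent stencil below is an assumption for illustration and is not necessarily the thesis's exact discretization (boundary conditions are ignored):

```python
import numpy as np

def conv_diff_matrix(n=5, pe=1e9, theta=np.radians(45.0)):
    """Convection-diffusion operator on an n x n node grid with a
    natural (lexicographic) ordering: first-order upwind convection
    (equivalent to centred differencing with a dissipation coefficient
    of 0.5) and centred diffusion scaled by 1/Pe."""
    h = 1.0 / (n - 1)
    a, b = np.cos(theta), np.sin(theta)   # positive velocity components
    N = n * n
    A = np.zeros((N, N))
    d = 1.0 / (pe * h * h)                # diffusive coefficient
    for j in range(n):
        for i in range(n):
            k = j * n + i
            A[k, k] = (a + b) / h + 4.0 * d
            if i > 0:
                A[k, k - 1] = -a / h - d
            if j > 0:
                A[k, k - n] = -b / h - d
            if i < n - 1:
                A[k, k + 1] = -d          # upper triangle: diffusion only
            if j < n - 1:
                A[k, k + n] = -d          # upper triangle: diffusion only
    return A

A = conv_diff_matrix()
upper = np.abs(np.triu(A, 1)).max()       # vanishes as Pe -> infinity
lower = np.abs(np.tril(A)).max()
```

At Pe = 10⁹ the upper-triangular entries are smaller than the convective entries by roughly nine orders of magnitude, and in the infinite-Peclet limit they vanish identically.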
Now consider the system matrix with the small upper triangular entries removed, as shown in Figure
6.4. The resulting matrix resembles one that is obtained after a first-order upwinding discretization in
each spatial direction. Since the matrix is lower-triangular, its ILU(0) factorization is exact.
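This property is easy to verify with a minimal dense ILU(0) sketch (Gaussian elimination restricted to the original sparsity pattern; the matrices used here are illustrative):

```python
import numpy as np

def ilu0(A):
    """ILU(0): IKJ Gaussian elimination restricted to the sparsity
    pattern of A, so no fill-in is ever stored. Returns unit-lower L
    and upper U. A dense sketch for illustration only."""
    n = A.shape[0]
    F = A.astype(float).copy()
    pattern = A != 0
    for i in range(1, n):
        for k in range(i):
            if pattern[i, k]:
                F[i, k] /= F[k, k]
                for j in range(k + 1, n):
                    if pattern[i, j]:
                        F[i, j] -= F[i, k] * F[k, j]
    return np.tril(F, -1) + np.eye(n), np.triu(F)

# For a lower-triangular matrix every elimination update stays inside
# the pattern, so the factorization is exact: ||A - LU|| = 0.
rng = np.random.default_rng(1)
A = np.tril(rng.random((6, 6))) + 6.0 * np.eye(6)
L, U = ilu0(A)
```

Here L carries the scaled subdiagonal entries and U reduces to the diagonal, so LU reproduces A with zero discard.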
Figure 6.5 shows the resulting matrix after the MDF reordering algorithm is applied. MDF again yields a lower-triangular matrix, and hence an exact ILU(0) factorization, even though the reordered matrix does not have the same sparsity pattern as the original. Furthermore, the reordering does not simply correspond to an upwind discretization with a natural ordering in the direction alternate to the original ordering.
Now consider the application of a random permutation to the matrix in Figure 6.4, shown in Figure
6.6. The LU-factorization of this matrix, shown in Figure 6.7, clearly illustrates that there is additional
fill-in. Hence, ILU(0) will have an associated error, resulting from discarded entries.
The application of MDF to the randomly-permuted matrix leads to the reordering of the system matrix shown in Figure 6.8. The MDF algorithm again produces an ordering with zero discard. Interestingly, the resulting matrix is not lower-triangular, a structure that would have guaranteed zero discard. Furthermore, the boundary nodes are ordered first.
Depending on the initial ordering, MDF has led to different orderings that, for this study, all have
zero discard. The tie-breaking strategy used for this investigation is simply choosing the node with the
lowest initial index. This also applies in the selection of the initial (or root) node. Hence, it is important
to consider both root-node selection and tie-breaking strategies in more detail. Furthermore, the result-
ing multiple optimal orderings motivate answering a much more difficult and broad question: Which
orderings minimize discard? An evolutionary algorithm was developed to help answer this question as
well as to determine statistically the best root-node location(s) over a range of Peclet numbers.
Figure 6.3: Initial system matrix for a 5 × 5-node grid with a Peclet number of 10⁹ (N = 25, Nnz = 81; 24 entries with |aij| < 10⁻⁴, 9 with 10⁻² ≤ |aij| < 1, and 48 with |aij| ≥ 1). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.4: System matrix after very small entries are discarded (N = 25, Nnz = 57; 9 entries with 10⁻² ≤ |aij| < 1 and 48 with |aij| ≥ 1). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.5: Resulting matrix after MDF ordering (N = 25, Nnz = 57). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.6: Resulting matrix after a random permutation (N = 25, Nnz = 57). Upward- and downward-facing triangles represent positive and negative values, respectively.
Figure 6.7: LU-factorization of the randomly-permuted matrix (N = 25, Nnz = 57; the A, L, and U sparsity patterns are shown).
Figure 6.8: Resulting matrix after MDF for the randomly-permuted matrix (N = 25, Nnz = 57). Upward- and downward-facing triangles represent positive and negative values, respectively.
Evolutionary algorithm and MDF
The goal of this study is to identify which orderings (with a particular focus on root nodes) correspond to the lowest amount of discarded fill for ILU(0).
Consider the problem from the previous section: 25 equations and unknowns resulting from the
discretization of the convection-diffusion equation on a 5 × 5-node grid. To fully determine which ordering leads to the least discarded fill, one would need to investigate 25!, or approximately 1.6 × 10²⁵, permutations. A deterministic approach is prohibitive in terms of CPU time and memory; therefore, a stochastic approach is used to find the optimum. Specifically, an evolutionary
algorithm is developed and used to achieve this end. From the previous subsection, it is apparent that
there can be multiple orderings that will lead to a minimized discarded fill.
The order-based evolutionary algorithm developed for this research briefly consists of the following
components: Each member of the population is an array of natural numbers from 1 to 25, representing an
ordering of 25 nodes. The fitness function is the discarded fill-in corresponding to the ILU(0) factorization
resulting from the convection-diffusion system matrix. Specifically, the norm of the difference between
the original matrix and its ILU(0) factors, ||A − LU||, is minimized. This matrix difference represents
the discarded fill.
There is an allowance of pass-through to the next generation for the most fit members. Tournament
selection, crossover and mutation govern the survival and progress of the remaining population members.
Crossover is performed using an approach presented by Davis [309]. Mutation is performed by exchanging
the position of two random entries in a population member. For this study, the PDE parameters investigated are the Peclet number and the flow angle. A population size of 50 is used over 25 generations, with a crossover probability of 90%; these parameters are based on preliminary studies. It is important to note that in the preliminary studies, for each given case, multiple
orderings resulted in the lowest amount of discarded fill. (The minimum value is small for large Peclet
numbers, and optimization tolerances are discussed in the next paragraph.) Since root nodes that
corresponded to these optima were of interest, it was important to conduct many optimizations in order
to identify all of these root nodes.
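The components above can be sketched as follows. The population size, generation count, and crossover probability follow the text, while the tournament size, elite count, mutation rate, and the toy fitness in the demonstration are assumptions for illustration:

```python
import random

def order_crossover(p1, p2, rng):
    """Davis-style order crossover: copy a random slice from p1 and
    fill the remaining positions with p2's genes in relative order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(child[a:b + 1])
    fill = iter(g for g in p2 if g not in taken)
    return [c if c is not None else next(fill) for c in child]

def swap_mutation(perm, rng):
    """Exchange the positions of two random entries."""
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]

def evolve(fitness, n, pop_size=50, generations=25, p_cross=0.9,
           n_elite=2, tourn=3, p_mut=0.2, seed=0):
    """Minimize fitness(perm) over permutations of range(n)."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [p[:] for p in pop[:n_elite]]              # elitist pass-through
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, tourn), key=fitness)  # tournament
            p2 = min(rng.sample(pop, tourn), key=fitness)
            child = (order_crossover(p1, p2, rng)
                     if rng.random() < p_cross else p1[:])
            if rng.random() < p_mut:
                swap_mutation(child, rng)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

# Toy fitness: total displacement from the identity ordering. In the
# thesis's application the fitness is the ILU(0) discard ||A - LU||.
disp = lambda p: sum(abs(g - i) for i, g in enumerate(p))
best = evolve(disp, 8)
```

The returned member is always a valid permutation, since order crossover and swap mutation both preserve the permutation property.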
Trends for 100 and 1000 converged optimizations showed enough qualitative convergence in their
pattern to suggest that 1000 converged optimizations would be sufficient for each case investigated.
Specifically, preliminary runs for Peclet numbers of 10⁻⁹, 1, 10⁴ and 10⁹ were executed for flow angles of 0°, 15°, and 45°. The angle 15° is somewhat arbitrary, since its performance was similar to that of other angles between 0° and 45°. Ultimately, the 0° angle was rejected since it corresponds to a velocity along only one direction, and the 45° angle was rejected since it would lead to possible symmetry in the discretization. Hence, the 15° angle was chosen for the test cases, together with the extremes of the Peclet numbers investigated: Pe = 10⁻⁹ and 10⁹ exhibit the most contrast in the discretization between a diffusion and a convection operator, respectively.
case, 1000 converged optimizations were used. Many optimizations were considered since each case had
Table 6.11: Locations of root nodes that correspond to minimized discarded fill using an evolutionary algorithm for the convection-dominated (Pe = 10⁹, entries marked c) and diffusion-dominated (Pe = 10⁻⁹, entries marked d) cases for a 5 × 5-node grid with a flow angle of θ = 15°.
                x index
y index     1      2      3      4      5
   1       d c    d c    d c    d c    d c
   2       d c    d      d      d      d
   3       d c    d      d      d      d
   4       d c    d      d      d      d
   5       d c    d      d      d      d c
more than one optimal ordering. The optimization tolerances were 1 and 10⁻⁸ for Peclet numbers of 10⁻⁹ and 10⁹, respectively. It was possible to achieve a smaller tolerance for the convection-dominated cases because the upper-triangular entries in the initial matrix decrease in magnitude as the Peclet number increases.
Table 6.11 summarizes the root-node locations that correspond to a minimized amount of discarded fill for both convection (Pe = 10⁹) and diffusion (Pe = 10⁻⁹). For diffusion, all nodes can be root nodes
that lead to an optimal ordering. This is consistent with results by d’Azevedo et al. [114] when studying
Laplace’s equation (i.e. a Peclet number of zero). For convection, the upstream boundary nodes and the
downstream corner node are the only root nodes that correspond to a minimal discarded fill.
The results of these studies indicate that an intelligent root-node selection strategy is important to
an MDF reordering strategy for convection-dominated flows. This is also true for RCM. Experiments also showed that nodes neighbouring the root node were subsequently chosen as the second and third nodes for MDF. This observation was incorporated into the development of more intelligent
root-node selection and tie-breaking strategies.
Tie-breaking strategy
Results from the studies involving the evolutionary algorithm indicate the importance of root-node
selection and tie-breaking strategy in the MDF algorithm, especially for convection-dominated flows, for
a 5×5–node grid. For earlier studies, ties were broken by selecting the node with the lowest index, where
the index was based on a natural ordering of the grid nodes. In this section, this strategy is compared to two novel strategies, referred to as the distance and line-distance tie-breaking strategies.
In the distance tie-breaking strategy, the most upstream node is chosen; therefore, the ordering routine is provided with the physical location of each grid node and the freestream velocity.
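This selection amounts to choosing the tied node with the smallest projection onto the freestream direction; a minimal sketch (the node indices and coordinates below are illustrative):

```python
import numpy as np

def most_upstream(tied_nodes, coords, v_inf):
    """Distance tie-breaking: among nodes tied on minimum discarded
    fill, pick the one whose position has the smallest projection
    onto the freestream velocity, i.e. the most upstream node."""
    proj = [float(np.dot(coords[w], v_inf)) for w in tied_nodes]
    return tied_nodes[int(np.argmin(proj))]

# Three tied nodes along the x-axis with the freestream in +x:
coords = {7: np.array([0.2, 0.0]), 3: np.array([0.9, 0.0]),
          5: np.array([0.5, 0.0])}
node = most_upstream([7, 3, 5], coords, np.array([1.0, 0.0]))
```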
Algorithm 14 Downstream-line tie-breaking strategy for MDF
Define W as the set of nodes that have tied for minimum discarded fill. Let v∞ be the freestream velocity vector and let Ps ∈ R³ be the physical location of node ws ∈ W.
while W ≠ ∅ do
    Choose the most upstream node wt ∈ W as the next node to be ordered.
    W ← W − wt
    while wt has downstream neighbours in W do
        Let wn be the closest downstream node to wt:
            wn = argmin over ws ∈ W of |PtPs| such that PtPs · v∞ > 0
        Order wn as the next node.
        wt ← wn
        W ← W − wt
    end while
end while
The line-distance tie-breaking strategy, shown in Algorithm 14, begins by selecting a node that is
most upstream. However, it then chooses subsequent nodes that are downstream of the previous node.
If no other downstream nodes exist, the algorithm selects the most upstream node next. If there are
many nodes that tie, the resulting ordering of those nodes will be a collection of lines that progress in
the downstream direction.
Table 6.12 compares the performance of the index, distance, and line-distance tie-breaking strategies
for the MDF algorithm. Results for the RCM reordering are also shown for comparison. For all cases
presented, MDF outperforms RCM in terms of iterations. For the coarsest grid, CU3, there is no
noticeable difference in the performance of the various tie-breaking strategies. For the finer grids, the
line-distance tie-breaking strategy outperforms the index- and distance-based approaches for a fill-in level of p = 1.
Table 6.12: GMRES iterations for MDF-ILU(p) preconditioners (Pe = 1000) and a comparison to RCM. Note: the most upstream node is (1,1).
Grid   Ordering   Root Node   Tie-Breaking Strategy   ILU(0)   ILU(1)
CU3    RCM        (1,1)       index                       30       14
CU3    MDF        (1,1)       index                       29       12
CU3    MDF        (1,1)       distance                    29       12
CU3    MDF        (1,1)       distance / line             29       10
CU2    RCM        (1,1)       index                       62       16
CU2    MDF        (1,1)       index                       61       11
CU2    MDF        (1,1)       distance                    60       11
CU2    MDF        (1,1)       distance / line             61        9
CU1    RCM        (1,1)       index                        -       15
CU1    MDF        (1,1)       index                        -       10
CU1    MDF        (1,1)       distance                     -       10
CU1    MDF        (1,1)       distance / line              -        8
Table 6.13: Computational grids for Euler and Navier–Stokes calculations.
Grid   Geometry    Nodes     JMAX    KMAX   JTAIL1   JTAIL2   Off-wall Spacing
I0     NACA 0012    10,045     245     41       33      213   1 × 10⁻³
I1     NACA 0012     2,583     123     21       17      107   2 × 10⁻³
I2     NACA 0012       682      62     11        9       54   4 × 10⁻³
V0     RAE 2822     18,785     289     65       33      257   2 × 10⁻⁶
V1     RAE 2822      4,785     145     33       17      129   4 × 10⁻⁶
V2     RAE 2822      1,241      73     17        9       65   8 × 10⁻⁶
V3     RAE 2822        333      37      9        5       33   2 × 10⁻⁵
Waux   RAE 2822    263,425   1,025    257      129      897   6 × 10⁻⁷
W0     RAE 2822     66,177     513    129       65      449   1 × 10⁻⁶
W1     RAE 2822     16,705     257     65       33      225   2 × 10⁻⁶
W2     RAE 2822      4,257     129     33       17      113   5 × 10⁻⁶
W3     RAE 2822      1,105      65     17        9       57   1 × 10⁻⁵
W4     RAE 2822        297      33      9        5       29   2 × 10⁻⁵
W5     RAE 2822         85      17      5        3       15   5 × 10⁻⁵
6.2 Euler and Navier–Stokes Equations
In the first half of this chapter, preconditioning and related topics were explored for the convection-
diffusion equation. Specifically, BILU(p) preconditioning, multigrid preconditioning, and orderings were
of particular interest. In this half of the chapter, many of the ideas explored thus far are extended to a
Newton–Krylov algorithm for the Euler and compressible Navier–Stokes equations.
6.2.1 Test cases
Details of the computational grids used for the test cases are shown in Table 6.13. The family of grids Ik is used for the inviscid cases, where the index k = 0 refers to the finest grid level and the geometry is a NACA 0012 airfoil. Similarly, for the viscous cases, flow around the RAE 2822 airfoil is simulated on the family of grids denoted Vk. The coarser grids are used for multigrid preconditioning, algorithm development, and eigenvalue computations. An additional family of finer viscous grids, denoted Wk, is used for grid studies for multigrid preconditioning.
Table 6.14 shows the test cases that are studied. Specifically, case E1 simulates inviscid, subsonic
flow, whereas E2 simulates transonic flow. There is one laminar test case L1, for which the Reynolds
Table 6.14: Test cases for Euler and Navier–Stokes calculations.
Case   Finest Grid   Flow        Mach Number   Angle of Attack   Reynolds Number
E1     I0            inviscid    0.3           0°                -
E2     I0            inviscid    0.76          0°                -
L1     V0            laminar     0.3           0°                500
T1     V0            turbulent   0.3           0°                3.0 × 10⁶
T2     V0            turbulent   0.729         2.31°             6.5 × 10⁶
number is 500. Finally, there are two turbulent test cases, T1 and T2, that simulate subsonic and
transonic flow.
The baseline parameters and features of the Newton–Krylov algorithm are as follows. A Jacobian-free Newton method is used to solve the nonlinear problem with a pseudo-transient globalization approach, outlined in detail in Chapter 4. To improve the robustness of the early stages of the Newton algorithm, an approximate Jacobian is used (i.e. approximate Newton) until the L2-norm of the nonlinear residual is below 1 × 10⁻⁵, and for a minimum of 5 nonlinear iterations. The latter condition applies to the inviscid and laminar cases, for which the L2-norm of the nonlinear residual is already below 1 × 10⁻⁵ during these early iterations. GMRES is used to converge the linear problem by two orders of magnitude (ηk = 10⁻²) outside of the transient phase. A maximum of 80 Krylov subspace directions is permitted to achieve the desired residual reduction, and no restarting of the GMRES algorithm is used. Reverse Cuthill–McKee ordering is used for the grid nodes, and the linear system is preconditioned by BILU(p). For the inviscid and viscous cases, the fill-in levels are p = 3 and p = 4, respectively.
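The structure of such a Jacobian-free Newton-GMRES iteration can be sketched as below; the finite-difference step size, the toy problem, and the omission of preconditioning and pseudo-transient continuation are simplifications relative to the thesis's algorithm:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk(F, u0, max_newton=20, fd_eps=1e-7, tol=1e-10):
    """Jacobian-free Newton-GMRES: the Jacobian-vector product is
    approximated by a forward difference of the residual, so the
    Jacobian matrix is never formed or stored."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        def jv(v, u=u, r=r):
            return (F(u + fd_eps * v) - r) / fd_eps
        J = LinearOperator((u.size, u.size), matvec=jv, dtype=float)
        du, info = gmres(J, -r)       # inexact linear solve
        u += du
    return u

# A mildly nonlinear system, u + 0.1 u^3 = b, solved from u = 0.
b = np.array([1.0, 2.0, 3.0])
F = lambda u: u + 0.1 * u**3 - b
u = jfnk(F, np.zeros(3))
```

Each nonlinear iteration solves the linearized system only approximately, which is the inexact-Newton setting that the two-orders-of-magnitude GMRES tolerance above realizes.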
Table 6.15 shows the performance of the Newton–Krylov solver for the five test cases. The convergence criterion is a reduction of the L2-norm of the nonlinear residual to 1 × 10⁻¹³. In the table, IG and IN refer to the number of GMRES and Newton iterations, respectively. The ratio of IG to IN and the CPU times are also shown; CPU times are based on the average of 5 runs for each case.
Figure 6.9 shows the convergence and Mach contours for case E1. The Mach contours show the presence of a stagnation point at the leading edge of the NACA 0012 airfoil, as expected for inviscid flow. Furthermore, the contours are symmetric since the airfoil encounters the flow at a zero angle of attack. Figure 6.9(a) shows the convergence of Newton's method as a solid blue line. The residual norm presented in this figure is based on the continuity equation; for all cases, the residuals corresponding to mass, momentum, and energy were each monitored. Superimposed on this information is the convergence of GMRES, indicated by red circles. The independent variable is inner iterations, which are used here instead of CPU time so that comparisons can be made across various computers. The initial residual of GMRES for each nonlinear iteration matches the nonlinear residual,
Table 6.15: Baseline Newton (IN) iterations, GMRES (IG) iterations, and CPU times for all Euler and Navier–Stokes test cases solved using BILU(p) preconditioning.
Case   Fill-in, p    IG    IN   IG / IN   Time (s)
E1     3            197    12      16.4        4.3
E2     3            201    16      12.6        5.0
L1     4            354    13      27.2       14.5
T1     4            652    88       7.4       75.3
T2     4            522   107       4.9       83.2
since the nonlinear residual vector is the right-hand side of the linear system and an initial guess of zero
is used for GMRES.
Figure 6.10 shows the convergence and Mach contours for case E2. This case also represents symmetric flow about a NACA 0012 airfoil at a zero angle of attack. In contrast to case E1, a transonic region, terminated by a shock, is present. The convergence for cases E1 and E2 is similar, especially during the continuation phase. The transonic case requires additional nonlinear iterations in the middle phase of the Newton algorithm, during which the transonic region develops.
During this initial nonlinear solution phase GMRES is converged enough to ensure that the residuals
of all governing equations are being reduced at each node. If this preventative measure is not executed in
a satisfactory manner, undersolving will occur, leading to instability of the nonlinear algorithm. Once the
nonlinear residual is sufficiently small, the system is no longer influenced by the continuation parameter
(i.e. ∆t) and nonlinear convergence improves dramatically.
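One common realization of this continuation is switched evolution relaxation (SER), in which the pseudo-time step grows as the nonlinear residual falls; this generic sketch may differ in detail from the Chapter 4 strategy:

```python
def ser_timestep(dt, res_prev, res_now, dt_max=1e6):
    """SER update: grow the pseudo-time step in inverse proportion to
    the change in the nonlinear residual norm, capped at dt_max. As
    the residual becomes small, dt -> dt_max and the iteration
    approaches a pure Newton step."""
    return min(dt * res_prev / res_now, dt_max)
```

A halved residual doubles the time step, while a growing residual shrinks it, which automatically removes the influence of ∆t once convergence sets in.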
Figures 6.11, 6.12 and 6.13 show the convergence and Mach contours for the viscous cases L1, T1 and T2, respectively. The laminar case, L1, exhibits different nonlinear convergence behaviour compared to case E2: the nonlinear residual decreases significantly during the initial phase of the solution process. The turbulent cases, T1 and T2, involve the most complex flows simulated in the entire suite of test cases; the latter includes phenomena such as a shock, a boundary layer, and their interaction. During the continuation phase for the turbulent cases, the nonlinear residual does not decrease significantly as the working turbulent variable and the mean-flow variables evolve. The residual of the turbulence equation is monitored separately from the residual of the mean-flow equations.
When measured in terms of inner iterations, the continuation method described in Chapter 4 contributes a large fraction of the overall solution cost: roughly one-half for the inviscid and laminar cases and two-thirds for the turbulent cases. These fractions are roughly representative of the CPU time as well.
Figure 6.9: Convergence and solution for the subsonic inviscid case, E1. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.10: Convergence and solution for the transonic inviscid case, E2. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.11: Convergence and solution for the laminar case, L1. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.12: Convergence and solution for the subsonic turbulent case, T1. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
Figure 6.13: Convergence and solution for the transonic turbulent case, T2. (a) Convergence: L2-norm of the residual versus inner iterations for Newton and GMRES; (b) Mach contour.
6.2.2 Orderings
In this section three nodal orderings are compared. First, a natural ordering is considered for the C-
topology grid based on the arrangement of the nodes in the normal direction starting from the first
wake-cut, followed by the airfoil surface, and ending with the second wake-cut (see Figure 2.2). Second,
the reverse Cuthill–McKee (RCM) ordering is considered. Finally, the minimum discarded fill (MDF)
ordering is used. The continuation parameters outlined in Table 4.1 are used for all three of these
orderings.
Before discussing the results, some additional details are noted here regarding root-node selection for each ordering and, for the RCM and MDF orderings, the tie-breaking strategy. The natural ordering has a
root node of (1,1) on the C-topology mesh, which lies at the downstream boundary along the wake cut
boundary interface. There is no tie-breaking strategy for natural ordering since the nodes are arranged
in order along the grid lines. Nodes are arranged first along the normal direction to minimize the matrix bandwidth for rows that have no information relating to the wake cut; this holds because there are typically fewer nodes in the normal direction than in the streamwise direction.
The RCM reordering comes from the reversal of the Cuthill–McKee (CM) ordering. Hence, root-node
and tie-breaking strategies for the CM ordering are described here. Similar to natural ordering, the root
node for the CM ordering is the downstream node (1,1) along the wake-cut boundary. Ties are broken
by selecting the node that has the lowest initial index based on the natural ordering. The CM ordering
produces a final node that is located at the upstream farfield boundary. For a symmetric airfoil with
a symmetric C-topology mesh and an odd number of nodes in the streamwise direction, the final node
is specifically located at the upstream boundary and lies along the line extended from the chord in the
upstream direction. For RCM, the first node is therefore at the upstream boundary and the final node
is at the downstream corner along the wake-cut boundary.
Similar to CM, MDF has a root node that is at the downstream node (1,1) along the wake cut.
The criteria for selecting a root node for MDF are, in order: discard, degree, and initial index. These
criteria are discussed in detail in Chapter 5. Discard refers to information that would be lost during
the incomplete factorization, degree refers to the number of neighbours that a node possesses, and the initial index refers to the ordering used prior to MDF (i.e. natural ordering in this work unless otherwise specified). Ties are broken for MDF based on these same three criteria. Additional tie-breaking strategies were explored for MDF, including replacing the initial-index approach with the distance and line-distance approaches; however, they did not yield an improvement. This observation is in contrast to the results presented earlier for the convection-diffusion equation.
Table 6.16 compares the performance of the natural, RCM, and MDF orderings for the five test cases.
The performance measures include the total number of GMRES iterations, IG, the average number of
GMRES iterations required per Newton iteration IG / IN , and CPU time measured in seconds. RCM
outperforms the other two orderings in terms of all three performance measures.
The MDF reordering strategy did not yield an improvement compared to RCM. Recall that for the
Table 6.16: Performance of Newton–Krylov algorithm using BILU(p) with various orderings.
Case   Fill-in, p   Ordering   Tie Break     IG     IN   IG / IN   Time (s)
E1     3            natural    -             243     16      15.2       6.3
E1     3            RCM        index         197     12      16.4       4.3
E1     3            MDF        index         911     17      53.6      25.4
E2     3            natural    -             220     23       9.6       7.4
E2     3            RCM        index         201     16      12.6       5.0
E2     3            MDF        index         995     23      43.3      30.1
L1     4            natural    -            1568     28      56.0      64.0
L1     4            RCM        index         354     13      27.2      14.5
L1     4            MDF        index           -      -         -         -
T1     4            natural    -            1878    119      15.8     172.0
T1     4            RCM        index         652     88       7.4      75.3
T1     4            MDF        index        3121    125      25.0     424.5
T2     4            natural    -            1188    153       7.8     177.0
T2     4            RCM        index         522    107       4.9      83.2
T2     4            MDF        index        2294    158      14.5     452.5
convection-diffusion equation, MDF outperformed RCM in terms of GMRES iterations. MDF exhibits
poor performance in the mid-late Newton stage of the nonlinear algorithm where the linear system
matrix is most stiff. It is believed that aspects of the C-topology mesh that were not encountered with the rectangular mesh for the convection-diffusion equation contribute to the poor performance of MDF. Since RCM performed so well relative to the natural and MDF orderings, the
tie-breaking strategy for MDF was modified to include using the index resulting from RCM instead of
the index resulting from natural ordering. However, this resulted in a decrease in the performance of
MDF. It is believed that this is because RCM and MDF have fundamentally different objectives. RCM
is a bandwidth minimization algorithm, whereas MDF is a local discard minimization algorithm. A key
difference between MDF and RCM is that RCM traverses across the wake-cut boundary and into the
interior as early as when it selects the first four nodes. In contrast, MDF selects all of the boundary
nodes first and only then works its way into the interior.
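The bandwidth-reduction objective of RCM can be illustrated on a small model problem. The sketch below is not the thesis code: it applies SciPy's reverse Cuthill–McKee routine to a deliberately scrambled 5-point Laplacian, where the grid size and the scrambling are arbitrary choices for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the nonzero entries of a sparse matrix."""
    C = A.tocoo()
    return int(np.max(np.abs(C.row - C.col)))

# 5-point Laplacian on an n-by-n grid as a stand-in for a structured mesh.
n = 20
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()

# Scramble the node numbering to mimic an unfavourable initial ordering.
rng = np.random.default_rng(0)
p = rng.permutation(n * n)
A_bad = A[p, :][:, p]

# RCM renumbers the nodes to pull the nonzeros back toward the diagonal.
perm = reverse_cuthill_mckee(A_bad, symmetric_mode=True)
A_rcm = A_bad[perm, :][:, perm]
print(bandwidth(A_bad), bandwidth(A_rcm))
```

Note that RCM's objective is bandwidth, not discarded fill, which is precisely the difference in objectives between RCM and MDF noted above.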
6.2.3 Iterative BILU(p) preconditioning
In this investigation, the performance of BILU(p) preconditioning is compared to iterative BILU(p)
preconditioning. RCM ordering is used for this study, since it yielded the best performance in the
previous section. All continuation parameters remain unchanged and GMRES is converged by two orders
of magnitude for all cases.
Tables 6.17 and 6.18 compare the performance of the iterative BILU(p) preconditioner to the baseline
BILU(p) preconditioner for the Euler and Navier–Stokes equations. Similar to the ordering study,
performance is measured in terms of the total number of GMRES iterations, IG, the average number of
GMRES iterations per Newton iteration, IG / IN (i.e. the ratio of inner to outer iterations), and the CPU
time in seconds. One preconditioning cycle is equivalent to the baseline preconditioner.
A damping parameter is introduced for the iterative BILU(p) preconditioner in this study. For a
single iteration of BILU(p), the damped and undamped preconditioners have similar performance. However,
for multiple iterations of BILU(p) there is a noticeable difference in the performance of the preconditioner.
The damping parameters for the baseline cases, E1, E2, L1, T1 and T2, are 0.7, 0.7, 0.9, 0.6 and
1.0, respectively.
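The damped iterative preconditioner can be sketched as a stationary inner iteration, z ← z + ω M⁻¹(r − Az), wrapped as a GMRES preconditioner. The following scalar (non-block) analogue uses SciPy's ILU on a 2D Laplacian model problem; the matrix, the damping value ω = 0.9 and the cycle counts are illustrative assumptions, not the thesis's BILU(p) setup.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def iterative_ilu(A, ilu, cycles, omega):
    """Damped stationary inner iteration z <- z + omega * M^{-1} (r - A z),
    starting from z = 0, wrapped as a preconditioner for GMRES."""
    def apply(r):
        z = np.zeros_like(r)
        for _ in range(cycles):
            z = z + omega * ilu.solve(r - A @ z)
        return z
    return spla.LinearOperator(A.shape, matvec=apply, dtype=A.dtype)

def gmres_iters(A, b, M):
    """Count preconditioned GMRES iterations via the callback."""
    its = []
    spla.gmres(A, b, M=M, restart=60, maxiter=300,
               callback=lambda rk: its.append(rk))
    return len(its)

# Model problem: 2D Laplacian on an n-by-n grid.
n = 32
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(A.shape[0])

ilu = spla.spilu(A, fill_factor=1.0)   # low-fill incomplete factorization
one_cycle = gmres_iters(A, b, iterative_ilu(A, ilu, cycles=1, omega=0.9))
three_cycles = gmres_iters(A, b, iterative_ilu(A, ilu, cycles=3, omega=0.9))
print(one_cycle, three_cycles)
```

On this well-behaved model problem, the extra preconditioning cycles reduce the GMRES iteration count per outer solve, mirroring the trend reported in Tables 6.17 and 6.18.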
For the inviscid subsonic case, E1, additional iterations of BILU(3) reduce both the total number
of GMRES iterations and the ratio of inner to outer iterations. The approximate trend is that for an
increasing number of preconditioning cycles there is a reduction in the number of total iterations and
the ratio of inner to outer iterations, with diminishing returns. If a large enough number of iterations
of the preconditioner is executed, the ratio should in theory approach a value of 1. This is not evident
in practice because the iterative method in the preconditioner sometimes diverges (depending on the
stiffness of the linear problem for a given Newton iteration). This point is a topic of later discussion. In
terms of CPU time, the baseline preconditioner and 2 or 3 iterations of BILU(3) are competitive.
The results for the inviscid transonic case, E2, are similar to case E1 in terms of general trends.
The reduction in GMRES iterations and the ratio of inner to outer iterations are similar for both cases.
When comparing 3 preconditioning cycles to 1 cycle for example, there are 51% fewer GMRES iterations
for case E1 and 46% fewer iterations for case E2. Since the number of Newton iterations is coincidentally
unchanged for each case, the ratio of inner to outer iterations is also reduced by the same amount. In
terms of CPU time, the baseline preconditioner and 2 iterations of BILU(3) are competitive.
For the laminar case, L1, 2 to 5 iterations of BILU(4) preconditioning outperforms the baseline
BILU(4) preconditioner in terms of total number of GMRES iterations and the ratio of inner to outer
iterations. In terms of CPU time, the baseline preconditioner is the best choice for this case, while 5
iterations of BILU(4) yields both the fewest GMRES iterations and the lowest ratio of inner to outer
iterations.
The results for the turbulent subsonic case, T1, show that 5 iterations of BILU(4) preconditioning
reduces the number of GMRES iterations and the ratio of inner to outer iterations. However, this
decrease is small and does not offset the additional cost that is incurred to execute the additional
iterations in the preconditioning step.
Table 6.17: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid
test cases.
Case Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
E1 3 1 197 12 16.4 4.3
E1 3 2 123 12 10.2 4.6
E1 3 3 96 12 8.0 5.0
E1 3 4 80 12 6.7 5.3
E1 3 5 72 12 6.0 5.6
E2 3 1 201 16 12.6 5.0
E2 3 2 134 16 8.4 5.5
E2 3 3 109 16 6.8 6.1
E2 3 4 94 16 5.9 6.5
E2 3 5 76 16 4.8 6.6
The baseline preconditioner requires the lowest amount of CPU
time. This case is especially difficult in terms of observing an improvement compared to the baseline
BILU(4) preconditioner. It is believed that this is due to the continuation algorithm: for this case, it
uses more aggressive parameters than for the turbulent transonic case, T2. The results for case T2 are
more promising.
For the turbulent transonic case, T2, the total number of GMRES iterations and the ratio of inner to
outer iterations decrease as the number of preconditioner iterations increases, for the first 5 iterations.
The baseline BILU(4) preconditioner yields the best CPU time, while two iterations of BILU(4)
preconditioning result in a 26% reduction in both the total number of GMRES iterations and the ratio
of inner to outer iterations.
Recall that for case E1 it was noted that, in theory, if enough iterations of the preconditioner are
executed, the ratio of inner to outer iterations should approach a value of 1; in practice, however, this
is not observed. In addition, the ratio of inner to outer iterations worsens for tougher cases when a
large number of preconditioner iterations is used. Eigenvalue analysis of the iteration matrix for the
iterative BILU(p) preconditioner provides some insight. There are unstable eigenvalues (i.e. eigenvalues
of magnitude greater than 1, making the spectral radius exceed unity) in the preconditioning iteration
matrix at particular Newton iterations for many cases. It is believed that GMRES reduces the error
modes associated with these
eigenvalues. However, if too many iterations of the preconditioner are performed, then GMRES is unable
to reduce the amplified error modes associated with these eigenvalues. The introduction of the damping
parameter for this study attenuated this effect.
Table 6.18: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for laminar
and turbulent test cases.
Case Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
L1 4 1 354 13 27.2 14.5
L1 4 2 217 13 16.7 15.5
L1 4 3 161 13 12.4 16.2
L1 4 4 157 13 12.1 19.4
L1 4 5 124 13 9.5 19.2
T1 4 1 657 88 7.5 76.4
T1 4 2 498 88 5.7 85.7
T1 4 3 419 89 4.7 93.7
T1 4 4 379 89 4.3 99.3
T1 4 5 336 87 3.9 103.5
T2 4 1 522 107 4.9 83.2
T2 4 2 385 107 3.6 90.4
T2 4 3 307 106 2.9 93.9
T2 4 4 294 106 2.8 101.0
T2 4 5 283 107 2.6 108.4
Eigenvalue analyses on coarser grids for unstable iterative preconditioning cases typically reveal no more
than 20 of these unstable eigenvalues.
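This eigenvalue analysis can be mimicked on a small model problem by forming the iteration matrix G = I − ωM⁻¹A of the damped sweep explicitly and counting eigenvalues outside the unit circle. The matrix and damping values below are illustrative assumptions; on this well-behaved model problem no unstable eigenvalues appear, unlike the stiff Navier–Stokes cases discussed above.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def iteration_matrix(A, ilu, omega):
    """Dense iteration matrix G = I - omega * M^{-1} A of the damped ILU sweep,
    formed column by column (feasible only for small model problems)."""
    n = A.shape[0]
    MinvA = np.column_stack([ilu.solve(A[:, [j]].toarray().ravel())
                             for j in range(n)])
    return np.eye(n) - omega * MinvA

n = 12                                   # 12 x 12 grid -> 144 unknowns
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
ilu = spla.spilu(A, fill_factor=1.0)     # low-fill incomplete factorization

results = {}
for omega in (1.0, 0.7):
    eigs = np.linalg.eigvals(iteration_matrix(A, ilu, omega))
    rho = float(np.max(np.abs(eigs)))
    unstable = int(np.sum(np.abs(eigs) > 1.0))
    results[omega] = (rho, unstable)
    print(f"omega={omega}: spectral radius={rho:.3f}, "
          f"unstable eigenvalues={unstable}")
```

A spectral radius below one means the damped sweep converges on its own; the thesis cases are harder precisely because some Newton iterations violate this condition.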
The iterative BILU(p) preconditioner is further explored in terms of its performance with respect to
certain phases of the nonlinear algorithm. Specifically, the performance is examined when the iterative
preconditioner is only active for either the approximate Newton phase or the inexact Newton iterations.
Table 6.19 compares the performance of the iterative preconditioner for the subsonic inviscid case, E1.
The reference configuration has the iterative preconditioner active for all nonlinear iterations. For 2-cycle
preconditioning, an increase in both the total number of GMRES iterations and the ratio of inner to outer
iterations is observed. When the preconditioner is only active for the inexact Newton iterations, there are
substantially fewer GMRES iterations compared to when it is only active for the approximate Newton
iterations. However, the CPU time for the inexact Newton phase using the iterative preconditioner is
noticeably larger than the baseline BILU(p) preconditioner.
Table 6.19: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for inviscid
subsonic test case, E1.
Iterative Precon. Active Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
No (Baseline) 3 1 197 12 16.4 4.3
Always 3 2 148 12 12.3 5.2
Always 3 3 113 12 9.4 5.5
Always 3 4 97 12 8.1 6.0
Always 3 5 104 12 8.7 7.3
Approx. Newton 3 2 190 12 15.8 4.3
Approx. Newton 3 3 187 12 15.6 4.4
Approx. Newton 3 4 186 12 15.5 4.5
Approx. Newton 3 5 185 12 15.4 4.6
Inexact Newton 3 2 155 12 12.9 5.0
Inexact Newton 3 3 124 12 10.3 5.4
Inexact Newton 3 4 107 12 8.9 5.7
Inexact Newton 3 5 114 12 9.5 6.9
A similar study is presented in Table 6.20 for the turbulent transonic case, T2. For this case, it is
evident that the iterative preconditioner is effective in both the approximate and inexact Newton phases.
Specifically, when comparing two iterations of BILU(4) to the baseline BILU(4) preconditioner, there is
a 26% reduction in the total number of GMRES iterations when the iterative preconditioner is always
active. If the iterative preconditioner is only active for the approximate Newton phase, there is a 16%
reduction and if the preconditioner is only active for the inexact Newton phase, there is a comparable
11% reduction. There is a noticeable increase in CPU time when the iterative preconditioner is active
for all Newton iterations. This increase relative to the baseline BILU(p) preconditioner mostly occurs
during the inexact Newton phase.
6.2.4 BILU(p) and multigrid preconditioning
In this investigation, multigrid preconditioning is compared to both the baseline BILU(p) preconditioner
and its iterative extension. Similar to the previous section, RCM ordering is used as well as the default
continuation parameters, shown in Table 4.1. All preconditioner fill-in levels are the same for all grid
levels for each of the five test cases. Furthermore, GMRES is converged by two orders of magnitude for
all cases.
The multigrid preconditioner consists of an l-level V-cycle, where l ∈ {2, 3, 4}. The components of
the multigrid preconditioner include a BILU(p) smoothing iteration followed by a restriction of the linear
residual to the coarser mesh, a calculation of the solution error estimate on the coarser mesh using one
or more smoothing iterations, and a prolongation of this error to the finer mesh. Full-weighting and
bilinear interpolation operators are used for restriction and prolongation, respectively.
Table 6.20: Performance of Newton–Krylov algorithm using multiple BILU(p) preconditioning for turbulent
transonic test case, T2.
Iterative Precon. Active Fill-in, p Prec. Cycs. IG IN IG / IN Time (s)
No (Baseline) 4 1 522 107 4.9 84.0
Always 4 2 385 107 3.6 90.4
Always 4 3 307 106 2.9 93.9
Always 4 4 294 106 2.8 101.0
Always 4 5 283 107 2.6 108.4
Approx. Newton 4 2 441 107 4.1 89.6
Approx. Newton 4 3 378 106 3.6 92.2
Approx. Newton 4 4 365 105 3.5 97.5
Approx. Newton 4 5 378 106 3.6 92.2
Inexact Newton 4 2 465 107 4.3 84.0
Inexact Newton 4 3 448 107 4.2 85.5
Inexact Newton 4 4 434 107 4.1 86.0
Inexact Newton 4 5 438 108 4.1 89.2
The multigrid
preconditioner is not active for the first kstart iterations, since the linear problem is converged to its
required relative tolerance in a small number of iterations (most often less than 3). Furthermore, the
multigrid preconditioner is activated once the L2-norm of the nonlinear residual is below a predefined
tolerance. Multigrid preconditioning does not provide a significant reduction in the approximate Newton
phase since the ratio of inner to outer iterations is already quite small, thus not justifying the cost of
generating and incorporating the coarse-grid operators. Experiments show that good choices for this
tolerance for inviscid and viscous cases are 10−6 and 10−5, respectively. When multigrid preconditioning
is active, the relative tolerance is kept at the baseline value to make an objective comparison to the
other preconditioners.
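The components just described (smoothing, restriction of the linear residual, a coarse-grid error estimate, and prolongation of that error) can be sketched in a 2-level scalar analogue. Everything below is an illustrative assumption: a Laplacian model matrix, scalar ILU in place of BILU smoothing, and arbitrary grid sizes; the thesis operates on the block Navier–Stokes Jacobian.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lap2d(k):
    """Standard 5-point Laplacian on a k-by-k grid of interior nodes."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k))
    return (sp.kron(sp.eye(k), T) + sp.kron(T, sp.eye(k))).tocsc()

def restriction_1d(m, n):
    """1D full-weighting restriction; coarse node i sits over fine node 2i+1."""
    rows, cols, vals = [], [], []
    for i in range(m):
        j = 2 * i + 1
        rows += [i, i, i]
        cols += [j - 1, j, j + 1]
        vals += [0.25, 0.5, 0.25]
    return sp.csr_matrix((vals, (rows, cols)), shape=(m, n))

def gmres_iters(A, b, M):
    """Count preconditioned GMRES iterations via the callback."""
    its = []
    spla.gmres(A, b, M=M, restart=60, maxiter=300,
               callback=lambda rk: its.append(rk))
    return len(its)

m, n = 15, 31                        # coarse/fine interior nodes per direction
A_f, A_c = lap2d(n), lap2d(m)
R1 = restriction_1d(m, n)
R = sp.kron(R1, R1).tocsr()          # 2D full weighting
P = (4.0 * R.T).tocsr()              # bilinear interpolation (scaled transpose)

ilu_f = spla.spilu(A_f, fill_factor=1.0)   # fine-grid smoother
ilu_c = spla.spilu(A_c, fill_factor=1.0)   # coarse-grid smoother

def vcycle(r):
    z = ilu_f.solve(r)                     # pre-smoothing
    e_c = ilu_c.solve(R @ (r - A_f @ z))   # coarse-grid error estimate
    z = z + P @ e_c                        # prolong the correction
    return z + ilu_f.solve(r - A_f @ z)    # post-smoothing

b = np.ones(A_f.shape[0])
M_ilu = spla.LinearOperator(A_f.shape, matvec=ilu_f.solve, dtype=A_f.dtype)
M_mg = spla.LinearOperator(A_f.shape, matvec=vcycle, dtype=A_f.dtype)
it_ilu, it_mg = gmres_iters(A_f, b, M_ilu), gmres_iters(A_f, b, M_mg)
print(it_ilu, it_mg)
```

On this model problem the coarse-grid correction targets the smooth error modes that the ILU sweep handles poorly, so the 2-level cycle needs fewer GMRES iterations than a single ILU application.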
Each grid level, for both the descent and ascent phases of the l-level V-cycle preconditioner, requires
parameters specifying the number of smoothing iterations and, if necessary, a damping parameter. For
smoothing components in the descent phase of the V-cycle, the parameter ν_1^l gives the number of
smoothing iterations on grid level l; for smoothing components in the ascent phase, the parameter ν_2^l
plays the same role. The coarsest grid level has ν_1^{ngrids} iterations, where ngrids is the number of
grid levels. The introduction of a damping parameter for the relaxation, ω_l, is also found to be useful.
Table 6.21: Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning
for inviscid test cases.
Case Fill-in, p Prec. Grid Levels Prec. Coarse Grid Iters. IG IN IG / IN Time (s)
E1 3 1 - 197 12 16.4 4.3
E1 3 2 1 181 12 15.1 7.0
E1 3 2 2 178 12 14.8 7.3
E1 3 2 3 176 12 14.7 7.7
E1 3 2 10 210 12 17.5 12.3
E2 3 1 - 201 16 12.6 5.0
E2 3 2 1 193 16 12.1 8.1
E2 3 2 2 197 16 12.3 8.6
E2 3 2 3 204 16 12.8 9.2
E2 3 2 10 272 17 16.0 15.8
For the cases described in Tables 6.21 and 6.22, the value of ν_1^2 corresponds to the number of coarse
grid iterations and the damping parameter is set to unity. For the remainder of this discussion, all values
of ν_1^l, ν_2^l, and ω_l are assumed to be unity unless otherwise specified.
It is important that the continuation parameter (i.e. time step) is treated properly for the coarser
grid levels. Experiments show that the time step on each grid level should be generated from the nonlinear
residual and the state variables of that grid level.
Two-grid preconditioner performance for baseline cases
Tables 6.21 and 6.22 compare the performance of the 2-level multigrid preconditioner to the baseline
BILU(p) preconditioner for the 5 original test cases: E1, E2, L1, T1 and T2. Coarse-grid iteration counts
of 1, 2, 3 and 10 are compared for each case. An additional grid level is considered later for this suite
of cases. The quantity IG refers to the total number of GMRES iterations on the fine mesh and each
GMRES iteration corresponds to one preconditioning step using either the baseline BILU(p) or an entire
multigrid cycle as the preconditioner. The baseline preconditioner is indicated by 1 grid level.
The damping parameter for the finest grid level, ω0, is consistent with the earlier study on iterative
BILU(p) preconditioning. Specifically, for cases E1, E2, L1, T1 and T2, the damping parameter has
values of 0.7, 0.7, 0.9, 0.6 and 1.0, respectively. Preliminary investigations indicate that several values of
these fine-grid damping parameters produce near-optimal performance in the 2-level preconditioner.
However, a full optimization would amount to an enormous parametric study, further exacerbated by a
parametric investigation on additional grid levels. To simplify the parametric study of the preconditioner,
the fine-grid damping parameters are fixed. Furthermore, once the first coarser-grid damping parameter,
ω1, is optimized for each case, it is fixed for any subsequent studies involving additional grid levels.
Table 6.22: Performance of Newton–Krylov algorithm using BILU(p) or 2-level multigrid preconditioning
for laminar and turbulent test cases.
Case Fill-in, p Prec. Grid Levels Prec. Coarse Grid Iters. IG IN IG / IN Time (s)
L1 4 1 - 354 13 27.2 14.5
L1 4 2 1 221 13 17.0 17.5
L1 4 2 2 220 13 16.9 18.6
L1 4 2 3 215 13 16.5 19.7
L1 4 2 10 203 13 15.6 25.8
T1 4 1 - 652 88 7.4 75.3
T1 4 2 1 585 86 6.8 81.3
T1 4 2 2 593 86 6.9 83.5
T1 4 2 3 604 86 7.0 87.8
T1 4 2 10 685 86 8.0 108.6
T2 4 1 - 522 107 4.9 83.2
T2 4 2 1 498 107 4.7 89.2
T2 4 2 2 503 107 4.7 90.2
T2 4 2 3 502 107 4.7 91.6
T2 4 2 10 495 107 4.6 98.3
For the inviscid subsonic case, E1, the multigrid preconditioner with 1, 2 or 3 coarse grid iterations
yields a reduction in the total number of GMRES iterations and the ratio of inner to outer iterations. The
coarse-grid damping parameter, ω1, is 0.2. An excessive number of smoothing iterations on the coarse grid
level (e.g. 10), however, increases these performance measures. In terms of these measures,
the best performance occurs with 3 coarse-grid iterations. This particular preconditioner decreases the
number of GMRES iterations by 11%. In terms of CPU time, the baseline BILU(3) preconditioner is
the fastest.
For the inviscid transonic case, E2, a 2-level preconditioner with 1 coarse-grid iteration results in
4% fewer GMRES iterations compared to the baseline preconditioner. Additional coarse grid iterations
result in an increase in the number of GMRES iterations. A coarse-grid damping parameter of ω1 = 0.05
is used.
The laminar case, L1, has the greatest percent reduction in the number of GMRES iterations with
respect to the baseline BILU(4) preconditioner. The coarse-grid damping parameter, ω1, is set to 0.7. Of
the results presented, 10 coarse-grid iterations offers the best improvement in the iteration performance
measures of the Newton–Krylov algorithm. Specifically, the total number of GMRES iterations and the
ratio of inner to outer iterations are reduced by 43% compared to the baseline preconditioner.
For the turbulent subsonic case, T1, the number of GMRES iterations grows as the number of
coarse-grid iterations increases. One coarse grid iteration (with
ω1 = 0.1) results in the fewest GMRES iterations and the lowest ratio of inner to outer iterations.
Specifically, a 10% reduction in these quantities is observed compared to the baseline preconditioner.
The turbulent transonic case, T2, exhibits similar performance to case T1. The reduction in GMRES
iterations for the multigrid preconditioner is about 5%, with 1 and 10 coarse grid iterations yielding the
fewest GMRES iterations. The coarse-grid damping parameter, ω1, is set to 0.5.
Three-grid preconditioner performance for baseline cases
A 3-level preconditioner is also considered for the baseline cases. Since the third grid level is very coarse,
its grid metrics, which have thus far been computed by finite differences, often take on negative values.
Therefore, the metric Jacobian of the generalized curvilinear coordinate transformation is approximated
by the cell area of a node. Table 6.23 compares the performance of BILU preconditioning to 2- and
3-level multigrid preconditioning for test cases E1, E2, L1, T1 and T2. The damping parameter, ω2, is
set to 0.1, 0.02, 0.5, 0.01 and 0.3 for these test cases, respectively. The number of smoothing iterations
on each grid level is another possible parameter that can be investigated. To reduce the size of the
parametric study, one smoothing iteration on each grid level is considered.
For the inviscid subsonic case, E1, the 3-level preconditioner results in the fewest number of GMRES
iterations. Specifically, a reduction of 10% compared to the baseline BILU(3) preconditioner is observed.
For the inviscid transonic case, E2, the 3-level preconditioner results in a slight increase in the
number of GMRES iterations compared to the 2-level preconditioner.
The performance for the laminar viscous case, L1, is improved by the introduction of a 2- or 3-level
multigrid preconditioner. In comparison to the baseline BILU(4) preconditioner, the 2- and 3-level
preconditioners result in 38% and 42% fewer GMRES iterations, respectively.
Multigrid preconditioning reduces the GMRES iterations for both turbulent cases, T1 and T2.
As for case E2, the 3-level preconditioner improves on the baseline BILU(4) preconditioner but not on
the 2-level preconditioner.
A 3-level preconditioner can at least match the performance of the 2-level preconditioner for all
cases. However, for cases E2, T1 and T2 this performance is only achieved by venturing outside of
the current constraints on the damping parameters and the number of smoothing iterations on each
grid level. Recall that the damping parameter on the finest grid level, ω0, was optimized for the iterative
BILU(p) preconditioner. Its value was fixed when ω1 was optimized, and both ω0 and ω1 were fixed when
optimizing the parameter ω2.
Table 6.23: Performance of Newton–Krylov algorithm using BILU(p) and 2- or 3-level multigrid precon-
ditioning for inviscid, laminar and turbulent test cases.
Case Fill-in, p Preconditioner IG IN IG / IN Time (s)
E1 3 BILU 197 12 16.4 4.3
E1 3 2-level multigrid 181 12 15.1 7.0
E1 3 3-level multigrid 177 12 14.8 7.8
E2 3 BILU 201 16 12.6 5.0
E2 3 2-level multigrid 193 16 12.1 8.1
E2 3 3-level multigrid 199 16 12.4 9.2
L1 4 BILU 354 13 27.2 14.4
L1 4 2-level multigrid 221 13 17.0 17.5
L1 4 3-level multigrid 205 13 15.8 18.6
T1 4 BILU 658 88 7.5 75.7
T1 4 2-level multigrid 585 86 6.8 81.3
T1 4 3-level multigrid 596 86 6.9 84.0
T2 4 BILU 534 108 4.9 83.8
T2 4 2-level multigrid 498 107 4.7 89.2
T2 4 3-level multigrid 520 108 4.8 91.4
A 3-level preconditioner superior to the 2-level and baseline preconditioners could ultimately be determined
through an enormous parametric study involving all ω_l, ν_1^l and ν_2^l values simultaneously. Such a
study is not presented here. Instead, the investigation shifts to examining the performance of iterative
BILU(p) and multigrid preconditioning on finer grids, including the consideration of alternative inter-grid
operators.
Iterative and multigrid preconditioner performance on finer grids
The performance of iterative and multigrid preconditioning on finer grids is examined and concludes this
results chapter. Specifically, these preconditioners are compared to the baseline BILU(p) preconditioner
for a transonic turbulent case about the RAE 2822 airfoil, on two fine grids. Table 6.24 summarizes
these two cases. Case F0 represents a very fine grid, W0, consisting of 513 × 129 nodes, resulting in
330,885 equations and unknowns. Case F1 uses grid W1 which is derived from grid W0 by removing
every other node in both the streamwise and normal directions.
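For a structured grid stored as an array of node coordinates, deriving W1 from W0 by removing every other node is plain strided slicing. This is a sketch only; the actual grids are C-topology meshes with their own data layout.

```python
import numpy as np

# Grid W0 has 513 x 129 nodes; store (x, y) coordinates per node.
W0 = np.zeros((513, 129, 2))

# W1 keeps every other node in both the streamwise and normal directions.
W1 = W0[::2, ::2, :]
print(W1.shape)  # (257, 65, 2)
```

Grid W1 therefore has 257 × 65 nodes, one quarter as many unknowns as case F0.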
Table 6.24: Finer grid cases for Euler and Navier–Stokes calculations.
Case Finest Grid Flow Mach Number Angle of Attack Reynolds Number
F0 W0 turbulent transonic 0.729 2.31 6.5 × 10^6
F1 W1 turbulent transonic 0.729 2.31 6.5 × 10^6
Table 6.25: Performance of Newton–Krylov algorithm using BILU(p) and 2-, 3- or 4-level multigrid
preconditioning for finer-grid test cases.
Case Fill-in, p Preconditioner IG IN IG / IN Time (s)
F0 4 BILU 766 122 6.3 363.1
F0 4 2 iterations of BILU 552 120 4.6 388.3
F0 4 3 iterations of BILU 483 119 4.1 419.9
F0 4 4 iterations of BILU 413 119 3.5 440.7
F0 4 2-level multigrid 694 121 5.7 379.9
F0 4 3-level multigrid 735 122 6.0 402.4
F0 4 4-level multigrid 742 122 6.1 411.7
F1 4 BILU 627 121 5.2 84.3
F1 4 2 iterations of BILU 462 121 3.8 92.1
F1 4 3 iterations of BILU 382 121 3.2 96.3
F1 4 4 iterations of BILU 355 122 2.9 105.0
F1 4 2-level multigrid 608 121 5.0 88.9
F1 4 3-level multigrid 616 121 5.1 91.7
F1 4 4-level multigrid 619 121 5.1 93.1
Table 6.25 summarizes the results for both cases. Specifically, the performance of 1 through 4
iterations of BILU(4) and 2-, 3- and 4-level multigrid preconditioning are compared. For the multigrid
preconditioners, one smoothing iteration is performed on each grid level.
Two, three and four iterations of BILU(4) perform better than BILU(4) in terms of number of
GMRES iterations. Since the number of Newton iterations varies for each preconditioner, the inner to
outer iterations measure, IG/IN , is used to compare the various preconditioners. For the coarser case,
F1, the reductions are 27%, 38%, and 44%, for 2, 3, and 4 iterations respectively. For the finer case, F0,
the reductions are 27%, 35% and 44% for 2, 3, and 4 iterations, respectively. These relative reductions
in GMRES iterations are similar from case F1 to case F0, although the absolute number of GMRES
iterations increases from case F1 to F0; a phenomenon that GMRES exhibits for increasing system size.
The 2-level multigrid preconditioner reduces the number of GMRES iterations compared to the
baseline BILU(4) preconditioner. However, 3- and 4-level coarse-grid corrections do not improve on
the performance of the 2-level preconditioner. This behaviour is consistent for cases F0 and F1. Two
iterations of BILU(4) is a more effective preconditioner than multigrid in terms of iterations.
The multigrid preconditioning results that are presented use bilinear interpolation for both restriction
and prolongation, as shown in Figures 5.3 and 5.4. In an attempt to improve on the performance of
the multigrid preconditioner, alternative inter-grid operators were explored. Specifically, auxiliary cell
area-based restriction and prolongation operators were investigated. The approach follows the work of
Zuliani [310]. In this approach an auxiliary grid, Waux, that is finer than the finest grid is used to
compute neighbouring cell areas for a given node and ratios of these cell areas are used to compute the
inter-grid weighting factors. Multigrid preconditioning using these inter-grid operators did not improve
on the performance of the existing multigrid preconditioner.
Chapter 7
CONCLUSIONS, CONTRIBUTIONS AND RECOMMENDATIONS
The results for the convection-diffusion, Euler and Navier–Stokes calculations led to considerable insight
into how preconditioning can impact the performance of a Newton–Krylov flow solver. This chapter begins
by providing conclusions for the research into preconditioning for the convection-diffusion equation.
Conclusions for the Euler and Navier–Stokes equations are then provided. Original contributions are
then summarized and the chapter ends with recommendations for future research.
7.1 Conclusions
7.1.1 Convection-Diffusion Equation
In the first component of this research a linear problem was examined. The convection-diffusion equation
is a linear partial differential equation, whose discretization leads to a linear system of equations. Two
baseline cases were considered: diffusion-dominated and convection-dominated flow, with Peclet numbers
of 0.001 and 1000, respectively.
First, the effect of Peclet number on the number of GMRES iterations was studied. For ILU(0) and
ILU(1) preconditioning, fewer GMRES iterations were required as the Peclet number increased from
0.001 to 1000. This effect was more dramatic when ILU(1) preconditioning was used. Specifically, for
ILU(0), the diffusion-dominated case required 186 GMRES iterations, whereas the convection-dominated
case required 63 GMRES iterations. For ILU(1), the GMRES iterations were 127 and 15 for the diffusion-
and convection-dominated cases, respectively. Therefore, ILU(p) had greater potential for improvement
in the diffusion-dominated case than in the convection-dominated case.
Iterative ILU(p) preconditioning was subsequently studied, both for its own merit and in the develop-
ment of multigrid preconditioning. Iterative ILU(1) preconditioning resulted in fewer GMRES iterations.
For example, 3 iterations of ILU(1) reduced the number of GMRES iterations from 127 to 68 for the
diffusion-dominated case and from 15 to 6 for the convection-dominated case. A minimum fill-in level
of 1 is required for the convection-dominated case to improve on the baseline ILU(p) preconditioner.
Multigrid preconditioning for the diffusion-dominated case resulted in a nearly grid-independent
number of GMRES iterations. For example, a grid consisting of 17^2 nodes required 25 iterations without
multigrid preconditioning and 13 iterations with multigrid preconditioning. For a grid consisting of
257^2 nodes, GMRES required 373 iterations without multigrid preconditioning and 19 iterations with
multigrid preconditioning. Furthermore, for a grid consisting of 513^2 nodes, GMRES required only 22
iterations when preconditioned by multigrid. Multigrid preconditioning for the convection-dominated
case did not reduce the number of GMRES iterations.
Four orderings were studied: a natural ordering along each respective dimension; an ordering in the
reverse direction of the natural ordering; reverse Cuthill–McKee (RCM); and minimum discarded fill
(MDF). Results for the convection-dominated case showed that the MDF ordering required the fewest
GMRES iterations for a grid consisting of 257^2 nodes and ILU(1) preconditioning. Specifically, the
natural and reverse orderings each required 24 iterations, RCM required 15 iterations, and MDF
required 10 iterations. Multigrid preconditioning did not result in fewer GMRES iterations for any
of the orderings for the convection-dominated case.
The MDF ordering resulted in the fewest GMRES iterations for diffusion-dominated cases. This
phenomenon also occurred when multigrid preconditioning was considered. For example, with ILU(1)
preconditioning on a grid consisting of 257^2 nodes, the RCM ordering resulted in 252 GMRES iterations
and the MDF ordering resulted in 195 iterations. With multigrid preconditioning these values became
15 and 11 for RCM and MDF, respectively.
Additional studies into the MDF algorithm were conducted. A connection between MDF and up-
winding was demonstrated. Through a series of examples it was shown that MDF can lead to an ordering
that results in a system matrix analogous to one arising from an upwind discretization. This is of great
importance for incomplete factorizations because a matrix arising from an upwind discretization (i.e.
lower triangular) has, for the problems considered here, an exact ILU(0) factorization, resulting in a
preconditioner that is the inverse of the linear system matrix.
It is well known that the incomplete factorization of a lower-triangular matrix will have no discarded
fill-in. It was also discovered in this study that MDF can lead to a matrix that is not lower triangular,
yet yields a discarded fill of zero.
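A small dense sketch of ILU(0), restricted to the original sparsity pattern and tracking the magnitude of the discarded fill, makes both observations concrete: a lower-triangular matrix, as from a pure upwind discretization, factors exactly with zero discarded fill, whereas a pattern that generates fill does not. The implementation below is illustrative, not the thesis code.

```python
import numpy as np

def ilu0(A):
    """ILU(0) (IKJ variant) on a dense copy of A, restricted to the sparsity
    pattern of A. Returns the combined L and U factors in one array and the
    total magnitude of the discarded fill. Illustrative sketch only."""
    n = A.shape[0]
    pattern = A != 0.0
    F = A.astype(float).copy()
    discarded = 0.0
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            F[i, k] /= F[k, k]                 # multiplier l_ik
            for j in range(k + 1, n):
                update = F[i, k] * F[k, j]
                if pattern[i, j] or i == j:
                    F[i, j] -= update
                else:
                    discarded += abs(update)   # fill outside the pattern is dropped
    return F, discarded

# Lower-triangular matrix, as arises from a pure upwind discretization of
# convection: ILU(0) is an exact factorization, with zero discarded fill.
A_low = np.array([[2., 0., 0.],
                  [-1., 2., 0.],
                  [0., -1., 2.]])
F, dropped_low = ilu0(A_low)
L = np.tril(F, -1) + np.eye(3)
U = np.triu(F)
print(dropped_low, np.allclose(L @ U, A_low))

# A pattern with missing connections (a tiny 5-point-like stencil) does
# discard fill during the factorization.
A_fill = np.array([[4., -1., -1., 0.],
                   [-1., 4., 0., -1.],
                   [-1., 0., 4., -1.],
                   [0., -1., -1., 4.]])
_, dropped_fill = ilu0(A_fill)
print(dropped_fill)
```

The second observation above corresponds to matrices that are not lower triangular yet still report `discarded == 0` from such a procedure.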
The final investigation related to the convection-diffusion equation included the development of an
evolutionary algorithm in order to study the root-node selection and tie-breaking strategies in the MDF
algorithm. A distance-based and a novel line-distance tie-breaking strategy were compared. Results
for a 25-node grid suggest that root-node selection is not important for diffusion-dominated problems.
In contrast, for convection-dominated problems upstream nodes are excellent root-node candidates.
Furthermore, if the flow is not aligned with a particular direction, the downstream corner node is also a
good root node. The evolutionary algorithm also indicated that the line-distance tie-breaking strategy
resulted in the fewest GMRES iterations.
7.1.2 Euler and Navier–Stokes Equations
In the second component of this research, two nonlinear problems were considered: the discretized Euler
equations and the discretized, compressible Navier–Stokes equations fully coupled with the one-equation
Spalart–Allmaras turbulence model. Specifically, five baseline cases were used: an inviscid
subsonic case, E1, an inviscid transonic case, E2, a laminar subsonic case, L1, a turbulent subsonic
case, T1, and a turbulent transonic case, T2. The baseline preconditioner was BILU(3) for the inviscid
cases and BILU(4) for the viscous cases. The reverse Cuthill–McKee (RCM) ordering was the baseline
ordering. The performance measures included the total number of GMRES iterations (on the fine grid,
where relevant), the ratio of GMRES to Newton iterations (i.e. inner to outer iterations) and CPU time.
Many investigations were conducted in this research. The following studies were discussed in detail:
orderings; iterative BILU(p) preconditioning; and multigrid preconditioning. The iterative and multigrid
preconditioners were also investigated on finer grids and the inter-grid operators based on bilinear
interpolation were compared to a more advanced formulation.
Three nodal orderings were compared for the C-topology computational grid: a natural ordering
based on the lexicographical arrangement of the nodes along the streamwise and normal directions,
respectively; the reverse Cuthill–McKee (RCM) ordering; and the minimum discarded fill (MDF) ordering.
Both RCM and MDF used the downstream corner node as a root node. In contrast to the MDF ordering
used for the convection-diffusion equation, the MDF ordering for the discretized Navier–Stokes equations
(i.e. a system of PDEs) required the use of a so-called greedy reduction of the block system matrix to a
smaller matrix of scalars whose dimensions equal the total number of grid nodes.
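The reduction step can be sketched as follows. This is an illustrative reconstruction, not necessarily the thesis's exact formula: the choice of Frobenius norm for collapsing each block is an assumption. Each nb×nb block is replaced by a single scalar, producing the node-sized matrix on which a scalar MDF ordering can then operate.

```python
import numpy as np

def reduce_block_matrix(A, nb):
    """Collapse an (n*nb) x (n*nb) block matrix A into an n x n matrix of
    scalars by replacing each nb x nb block with its Frobenius norm.
    A scalar-valued MDF ordering can then be computed on the result."""
    n = A.shape[0] // nb
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = np.linalg.norm(A[i*nb:(i+1)*nb, j*nb:(j+1)*nb])
    return S
```

Any matrix norm could be substituted; the key point is that the reduced matrix preserves the block sparsity pattern, so the ordering computed on it is valid for the full block system.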
RCM required the fewest GMRES iterations for all five test cases in which the baseline
BILU(p) preconditioner was used. Hence, RCM was selected as the nodal ordering for the subsequent
studies involving iterative BILU(p) and multigrid preconditioning.
Iterative BILU(p) preconditioning was explored for the five baseline cases. Specifically, 2 through 5 it-
erations of BILU(p) were compared to the baseline BILU(p) preconditioner. The baseline preconditioner
required 197, 201, 354, 657 and 522 GMRES iterations for cases E1, E2, L1, T1 and T2, respectively.
Two iterations of BILU(p) reduced the number of GMRES iterations by 38%, 33%, 39%, 24% and 26%
for these cases, respectively. Three through five iterations of BILU(p) also reduced the number of
GMRES iterations, albeit with diminishing returns. Five iterations of BILU(p) reduced the number of
GMRES iterations by 63%, 62%, 65%, 49% and 46% for these cases, respectively. The inviscid and
laminar cases produced the most significant reductions. For each case, if enough iterations of
BILU(p) were used in the preconditioner, this decreasing trend in GMRES iterations would cease. It is
believed that this is due to the existence of unstable eigenvalues in the iteration matrix related to the
BILU(p) relaxation method. Below a given number of BILU(p) preconditioning iterations, the modes
associated with these eigenvalues do not grow because GMRES effectively reduces them.
The dramatic reduction in the number of GMRES iterations that iterative BILU(p) produces provides
motivation for its use in situations where memory is limited. Restarting the Krylov subspace reduces
the amount of memory that is required for GMRES and iterative BILU(p) enhances this reduction.
Furthermore, for cases that require a prohibitive (in terms of storage) amount of fill-in, p, iterative
BILU(p) preconditioning with a lower fill-in parameter could potentially be used.
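The idea of applying several preconditioner iterations per GMRES matrix-vector product can be sketched as follows. This is a stand-in, not the thesis's implementation: SciPy provides only a threshold-based incomplete LU (spilu), not level-of-fill BILU(p), and the iteration shown is plain Richardson without the damping, scaling and reordering considered in the thesis.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def iterative_ilu_preconditioner(A, m, fill_factor=10):
    """Return a LinearOperator applying m Richardson iterations
    preconditioned by an incomplete LU factorization of A:
        z_0 = 0,  z_{k+1} = z_k + M^{-1}(r - A z_k),  M = LU (incomplete).
    For m = 1 this reduces to the usual one-shot ILU preconditioner."""
    ilu = spla.spilu(A.tocsc(), fill_factor=fill_factor)
    def apply(r):
        z = np.zeros_like(r)
        for _ in range(m):
            z = z + ilu.solve(r - A @ z)
        return z
    return spla.LinearOperator(A.shape, matvec=apply)
```

Each additional inner iteration trades one extra matrix-vector product and triangular solve for (potentially) fewer stored Krylov vectors, which is the memory trade-off discussed above.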
The final, and most extensive, investigation of preconditioning for the Newton–Krylov algorithm
for the discretized Navier–Stokes equations concerned multigrid. Specifically, 2-, 3- and 4-level V-cycle
multigrid preconditioners were compared to the iterative BILU(p) and baseline BILU(p) preconditioners.
Experiments showed that the multigrid preconditioner would potentially be most effective beyond the
pseudo-transient continuation phase of the Newton algorithm. Within the pseudo-transient continuation
phase, where the L2-norm of the nonlinear residual is relatively large (e.g. 10⁻⁵ or 10⁻⁶) compared to
the convergence tolerance (e.g. 10⁻¹⁴), the number of GMRES iterations per Newton iteration is small,
and any reduction in these iterations that multigrid preconditioning would offer would be outweighed by
its relative cost per iteration.
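A minimal sketch of a two-level V-cycle used as a GMRES preconditioner is given below. It is illustrative only: damped Jacobi stands in for the BILU(p) smoother of the thesis, the prolongation P is assumed given (e.g. built from bilinear interpolation), and the coarse-grid operator is formed by the Galerkin product.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def two_level_preconditioner(A, P, n_smooth=1):
    """Two-level multigrid V-cycle applied as a preconditioner.
    P: prolongation (fine <- coarse); restriction is taken as P^T.
    Smoothing here is damped Jacobi (the thesis uses BILU(p))."""
    R = P.T
    Ac = (R @ A @ P).tocsc()          # Galerkin coarse-grid operator
    coarse_solve = spla.factorized(Ac)
    Dinv = 1.0 / A.diagonal()
    def vcycle(r):
        z = np.zeros_like(r)
        for _ in range(n_smooth):      # pre-smoothing
            z = z + 0.6 * Dinv * (r - A @ z)
        rc = R @ (r - A @ z)           # restrict the residual
        z = z + P @ coarse_solve(rc)   # coarse-grid correction
        for _ in range(n_smooth):      # post-smoothing
            z = z + 0.6 * Dinv * (r - A @ z)
        return z
    return spla.LinearOperator(A.shape, matvec=vcycle)
```

The relative cost per application (smoothing sweeps plus a coarse solve) versus a single BILU(p) solve is exactly the trade-off weighed in the discussion above.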
For the baseline cases E1, E2, L1, T1 and T2, 2-level multigrid preconditioning with one smoothing
iteration on the coarsest grid level resulted in 8%, 4%, 38%, 11% and 12% fewer GMRES iterations,
respectively, when compared to BILU(p). The laminar subsonic case produced the largest reduction.
Additional smoothing iterations (up to a certain limit) on the second grid level for the inviscid subsonic
and laminar subsonic cases also produced a reduction with diminishing returns. As mentioned earlier,
the presence of unstable eigenvalues in the iteration matrix of the BILU(p) smoother eventually resulted
in its instability for each case on the coarsest grid level. Three-level multigrid preconditioning did not
improve on the performance of two-level preconditioning.
In an attempt to further understand the behaviour of the iterative and multigrid preconditioners,
two additional studies were conducted. The turbulent transonic case was investigated on finer grids
and additional inter-grid operators for multigrid were implemented. Iterative BILU(p) preconditioning
produced similar relative reductions in GMRES iterations for increasing grid size, compared to BILU(p)
preconditioning. For the finest grid (W0), 2, 3 and 4 iterations of BILU(p) preconditioning reduced the
number of GMRES iterations by 27%, 35% and 44%, respectively. Two-level multigrid preconditioning
reduced the number of GMRES iterations by 9%. Three- and four-level multigrid preconditioning also
reduced the number of GMRES iterations, however, these preconditioners performed worse than the two-
level preconditioner. Additional inter-grid transfer operators were explored in an attempt to improve
on the multigrid preconditioner. Specifically, the operator developed by Zuliani [310] was implemented.
However, its performance did not improve on the baseline bilinear restriction and prolongation operators.
Based on the results obtained using the multigrid preconditioner, including the consideration of
relative cost per iteration, its use would be most effective for situations in which the transient phase
of the Newton algorithm is short. Specifically, multigrid preconditioning would be useful for studies
involving flow solves that use as an initial guess a fully converged flow solution whose parameters are
close to the current parameters (i.e. warm starts). Examples of this type of situation would include
generating lift or drag versus angle of attack plots, drag polars and the linesearching process in a
gradient-based optimization algorithm. For the latter example, function evaluations in the linesearching
algorithm correspond to flow solves.
7.2 Contributions
A broad range of preconditioners were investigated in this research. The literature review in this
dissertation offers an extensive delineation of the history of preconditioning. Specifically, topics such as
ordering algorithms and BILU(p), iterative BILU(p) and multigrid preconditioning were investigated in
tremendous detail. Below are some of the most notable contributions that were made:
* A detailed comparison of the reverse Cuthill–McKee (RCM) ordering was made with respect to
the minimum discarded fill (MDF) ordering. Various root-node selection and tie-breaking strategies
were compared for the latter. Distance and novel line-distance tie-breaking strategies were
implemented for the MDF algorithm. The MDF algorithm was also adapted to systems of PDEs
(e.g. the discretized compressible Navier–Stokes equations) by reducing the block system matrix
associated with the linearization of the discretized system to a matrix equivalent in dimension to
the number of grid nodes.
* A permutation-based evolutionary algorithm was created to determine the optimal root node for
convection- and diffusion-dominated problems. It is believed that this is the first instance of such a
study. The study clearly demonstrates that upstream boundary nodes and the downstream corner
node are effective root nodes for convection-dominated problems and any node can be a root node
for diffusion-dominated problems.
* A mathematical formulation was created for an iterative BILU(p) preconditioner including the
consideration of damping, scaling and reordering. The preconditioner was studied for both the
convection-diffusion equation and the discretized, compressible Navier–Stokes equations.
* It was found that BILU(p) as an iterative method has unstable eigenvalues in its iteration matrix.
The investigation of iterative BILU(p) preconditioning for GMRES suggests that GMRES and
BILU(p) work well together because GMRES effectively reduces the modes associated with those
unstable eigenvalues, in addition to other known reasons.
* A mathematical formulation was created for a BILU(p)-smoothed multigrid preconditioner, includ-
ing the consideration of scaling, reordering and the smoothing operator. A detailed investigation
of this preconditioner was conducted for both the convection-diffusion equation and the discretized,
compressible Navier–Stokes equations.
7.3 Recommendations
Some results in this research on their own demonstrated the effectiveness of the various preconditioners
and associated methods explored. Other results and investigations were intended to be more of a
foundation for future work. Below are some of the ideas that would potentially be of most interest to other
researchers:
* The permutation-based evolutionary algorithm used in the investigation of the minimum discarded
fill (MDF) ordering can be extended to larger problems governed by more sophisticated equations
(e.g. Navier–Stokes). Recall that the problem size is tremendous for practical grids, since it
scales as the factorial of the number of grid nodes. Additional objective functions can also be
explored. For example, instead of minimizing the discard, ||A − LU||, one can minimize other
important properties relating to the matrix or the iterative algorithm, such as the number of
GMRES iterations or a spectral quantity like ||AU^{-1}L^{-1} − I||.
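One such discard objective can be sketched as follows. SciPy's threshold-based spilu stands in for the level-of-fill ILU(p) of the thesis, so the numbers are only indicative; the SuperLU row/column permutations are reconstructed so that the comparison is against the permuted matrix itself.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def discarded_fill(A, perm, drop_tol=0.1, fill_factor=2.0):
    """Ordering-search objective: permute A symmetrically, form an
    incomplete LU, and return the discard ||A_perm - LU||_F.
    permc_spec='NATURAL' keeps SuperLU from re-ordering on its own,
    so the supplied ordering is what actually gets factored."""
    n = A.shape[0]
    Ap = A[perm, :][:, perm].tocsc()
    ilu = spla.spilu(Ap, drop_tol=drop_tol, fill_factor=fill_factor,
                     permc_spec='NATURAL', diag_pivot_thresh=0.0)
    # SuperLU factors Pr * Ap * Pc = L * U; rebuild its permutations
    Pr = sp.csc_matrix((np.ones(n), (ilu.perm_r, np.arange(n))))
    Pc = sp.csc_matrix((np.ones(n), (np.arange(n), ilu.perm_c)))
    return spla.norm(Pr.T @ (ilu.L @ ilu.U) @ Pc.T - Ap)
```

An evolutionary search over permutations would simply call this function as its fitness evaluation, favouring orderings with smaller discard.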
* A variant of the reverse Cuthill–McKee reordering algorithm can be developed using a distance
or line-distance tie-breaking in its approach. RCM is a bandwidth minimization algorithm, and
it would be interesting to investigate its response to the enforcement of geometric criteria in its
decision process.
* The investigation of the MDF algorithm was limited to C-topology meshes in this research. The
performance of the MDF algorithm can be assessed for multi-block and 3D structured grids. The
MDF-BFILU(p) preconditioner is written in the same syntax as the 3D finite-difference Navier–
Stokes flow solver, DIABLO [270]. This investigation should also include a comparison of the
tie-breaking strategies that have been developed.
* The iterative BILU(p) preconditioning algorithm can be used for more complicated simulations
(e.g. 3D turbulent flow) to exploit its memory-saving benefits (i.e. fewer required GMRES iterations
and using a lower fill-in parameter value).
* Iterative BILU(p) and multigrid preconditioning can be studied for higher-order discretizations.
Memory considerations become increasingly important for such formulations.
Appendix A
OTHER PRECONDITIONING
TECHNIQUES
In addition to the preconditioning techniques that were studied and compared in detail in this research,
the following approaches were also reviewed: domain decomposition and sparse approximate inverses.
Section A.1 describes the general aspects of domain decomposition. Section A.2 gives a brief introduction
to sparse approximate inverse preconditioning.
A.1 Domain Decomposition
The basic idea behind domain decomposition is to break a large problem into smaller problems. This is
accomplished by subdividing the problem domain into smaller domains. Domain decomposition works
well for systems that arise from model PDEs and is well-suited for parallel applications. The domain
decomposition method is essentially a reordering strategy with special solution techniques.
The theory is taken from Saad [38] and Christara [295]. Consider the discretized physical domain,
Ω. The domain is subdivided into smaller domains Ωi, where i = 1, . . . , s, with interfaces Γjk between
adjacent subdomains. Now consider a linear system, Ap = q, that arises from the discretization of
a partial differential equation on Ω. For the purposes of this analysis, a linear system is considered,
although the linear system may also arise from an iteration of a nonlinear solution method such as
Newton’s method. The system is reordered into blocks

    [ B  E ] [ x ]   [ f ]
    [ F  C ] [ y ] = [ g ]        (A.1)
where the matrix B is a block-diagonal matrix of s square block matrices, Bi, each relating to the
interior of the ith subdomain. Hence, the vector x contains the unknowns in the subdomains. The vector
y contains the interface unknowns and the square matrix C describes their interaction. The matrix E
describes the subdomain-to-interface coupling and the matrix F describes the interface-to-subdomain
coupling.
There are three types of methods that can be used to solve the system (A.1) for (x y)T . Here, the
Schur complement method is outlined. Other aspects that are important include optimal subdomain
partitioning, overlap selection, and whether to solve subproblems exactly or iteratively.
Consider again the system (A.1). Each block equation is
Bx + Ey = f    (A.2)
Fx + Cy = g    (A.3)
From (A.2), x is isolated as

x = B^{-1}(f − Ey)    (A.4)

which, when substituted into (A.3), yields

(C − FB^{-1}E)y = g − FB^{-1}f    (A.5)

Once (A.5) is solved for y, y can be substituted into (A.4) to find x.
The matrix
S ≡ C − FB^{-1}E    (A.6)
is the Schur complement of the matrix associated with the interface variable, y. Alternative names for the
Schur complement include the capacitance matrix and the Gauss transform. Since B is a block-diagonal
matrix, its inversion reduces to s inversions of its sub-blocks, Bi.

A full-matrix method solves the entire system (A.1) using intelligent preconditioners.
Schwarz methods perform successive solution passes over the blocks of B. If the interface values are
updated on-the-fly, the method is referred to as multiplicative Schwarz; if the interface values are
updated only after an entire pass over B, it is referred to as additive Schwarz. The latter is analogous
to a block Jacobi method, the former to a block Gauss–Seidel method.
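The distinction can be sketched for the non-overlapping case as follows. This is a simplified dense-matrix illustration; overlap, parallel execution and inexact subdomain solves are all omitted.

```python
import numpy as np

def schwarz_sweep(A, b, x, blocks, multiplicative=True):
    """One pass of non-overlapping block relaxation over the subdomains.
    multiplicative=True: each local correction sees the freshly updated
    values (block Gauss-Seidel); False: all corrections are computed
    from the incoming x (block Jacobi, i.e. additive Schwarz)."""
    x_new = x.copy()
    for idx in blocks:                           # idx: one subdomain's indices
        ref = x_new if multiplicative else x
        r_local = b[idx] - (A @ ref)[idx]        # local residual
        x_new[idx] = x_new[idx] + np.linalg.solve(A[np.ix_(idx, idx)], r_local)
    return x_new
```

In the additive variant the per-subdomain solves are independent within a sweep, which is what makes it attractive for parallel implementations despite its slower convergence.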
A benefit to solving (A.5) instead of the whole system is that the number of interface nodes is
typically a lot smaller than the total number of nodes and hence the system is much smaller. However,
the trade-off is that many linear solutions are required in the formation of the Schur complement. If the
explicit formation of S can be avoided, then the computational storage and time can be reduced.
The Schur complement method refers to solving the reduced system (A.5) iteratively. For example,
the preconditioned GMRES Krylov subspace method can be used. Since GMRES only requires the
product of S with some vector, v, the explicit storage of S can be avoided entirely. To compute the
Krylov subspace direction w = Sv, one first computes

v′ = Ev    (A.7)

Next,

Bz = v′    (A.8)

is solved for z. Finally, w is formed as

w = Cv − Fz (= Sv)    (A.9)

to complete the computation.
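Steps (A.7)–(A.9) translate directly into a matrix-free operator. The sketch below assumes SciPy and sparse sub-blocks; in practice the factorization of the block-diagonal B would be done subdomain by subdomain rather than as one sparse factorization.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def schur_operator(B, E, F, C):
    """Matrix-free application of the Schur complement S = C - F B^{-1} E,
    following (A.7)-(A.9): v' = Ev, solve Bz = v', w = Cv - Fz.
    S itself is never formed or stored."""
    solve_B = spla.factorized(sp.csc_matrix(B))
    def matvec(v):
        vp = E @ v             # (A.7)
        z = solve_B(vp)        # (A.8)
        return C @ v - F @ z   # (A.9)
    return spla.LinearOperator(C.shape, matvec=matvec)
```

The resulting LinearOperator can be handed straight to GMRES to solve (A.5) for the interface unknowns y.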
The challenge when using the Schur complement method is in finding a suitable preconditioner for
the Krylov subspace method that is used to solve the linear system (A.5). An obvious choice is to use
the matrix S itself as the preconditioner. This is referred to as induced preconditioning. An alternative
to the exact formation of the Schur complement as a preconditioner, is to use an incomplete factorization
of S. This is an attractive alternative since a sparse, parallel Gaussian elimination algorithm can be
exploited. Another form of preconditioning can be based on probing. Probing stems from approximating
sparse Jacobians for nonlinear equations [38]. A major drawback to probing is that there are increased
losses in accuracy due to roundoff error in the various subiterations that may significantly degrade the
performance of GMRES.
In the literature, there are many examples on the use of domain decomposition in the construction of
preconditioners. A recent example is the work by Hicken and Zingg [166]. In their paper, they presented
a parallel Newton–Krylov solver for the 3D Euler equations. Both additive Schwarz and approximate
Schur preconditioners were explored.
A.2 Sparse Approximate Inverse Preconditioning
The ILU preconditioning approach is to find a matrix M such that M^{-1} is a good approximation
to A^{-1}. Rather than attempting to form M and then apply M^{-1}, the sparse approximate inverse
preconditioning technique attempts to model A^{-1} directly.
Benzi [46] and Saad [38] describe various approaches for finding a sparse approximate inverse, P,
to the matrix A. It involves minimizing the Frobenius norm¹ of the residual matrix, I − AP. This

¹The Frobenius norm of a matrix A is given by ||A||_F = (∑_j ∑_i |a_ij|^2)^{1/2}. An alternative form is ||A||_F = (tr(AA^H))^{1/2}.
minimization problem is given by

min_P F(P)    (A.10)

where

F(P) = ||I − AP||_F^2 = ∑_{j=1}^{n} ||e_j − Ap_j||_2^2    (A.11)

and e_j and p_j are the jth columns of I and P, respectively. This is referred to as a global iteration.
Alternatively, one can minimize each individual function in the summation as

min_{p_j} ||e_j − Ap_j||_2^2  ∀ j = 1, …, n    (A.12)
This is often referred to as a column-oriented approach, which favours a parallel implementation.
In either case, a sparsity pattern constraint is used to limit the amount of
fill in the approximate preconditioner [190]. Saad [38] describes some simple methods for performing
the constrained minimization. They include a minimal residual (MR) algorithm and a steepest descent
method. Typically, the minimization problem is solved inexactly (i.e. to some prescribed tolerance of
the objective function).
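A minimal dense sketch of the column-oriented approach is given below. The pattern argument (a hypothetical per-column list of allowed nonzero rows) plays the role of the sparsity constraint; production SPAI codes solve each subproblem on small extracted submatrices rather than the full A, and typically inexactly, as noted above.

```python
import numpy as np

def spai_columns(A, pattern):
    """Column-oriented sparse approximate inverse: for each j, minimize
    ||e_j - A p_j||_2 with the nonzeros of p_j restricted to pattern[j],
    as in (A.12). Each small least-squares problem is independent of the
    others, which is what makes the approach parallelizable."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for j in range(n):
        J = np.asarray(pattern[j])          # allowed nonzero rows of p_j
        e = np.zeros(n)
        e[j] = 1.0
        coeffs, *_ = np.linalg.lstsq(A[:, J], e, rcond=None)
        P[J, j] = coeffs
    return P
```

With the full pattern this reproduces A^{-1} exactly; tightening the pattern trades accuracy of the approximate inverse for sparsity.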
A sparse approximate inverse preconditioner may become a good alternative or complement to the
widely used ILU preconditioner. Its column-oriented formulation lends itself to an efficient parallel
implementation. However, current research shows that it is still very expensive to form a sparse
approximate inverse preconditioner in comparison to ILU, although it may be more robust for some
cases. For the Newton–Krylov approach used in this thesis, the sparse approximate inverse computation
would be an expensive burden in the transient solution phase.
REFERENCES
[1] Lomax, H., Pulliam, T. H., and Zingg, D. W., Fundamentals of Computational Fluid Dynamics,Springer–Verlag, Berlin, Germany, 2001.
[2] Nemec, M., Zingg, D. W., and Pulliam, T. H., “Multipoint and multi-objective aerodynamic shapeoptimization,” AIAA Journal , Vol. 42, No. 6, 2004, pp. 1057–1065.
[3] Pulliam, T. H., Nemec, M., Holst, T., and Zingg, D. W., “Comparison of evolutionary (genetic)algorithm and adjoint methods for multi-objective viscous airfoil optimizations,” The 41st AIAAAerospace Sciences Meeting and Exhibit , No. AIAA–2003–0298, January 2003.
[4] Chisholm, T. T., A fully coupled Newton–Krylov solver with a one-equation turbulence model ,Ph.D. thesis, University of Toronto, 2007.
[5] Pueyo, A., An efficient Newton–Krylov method for the Euler and Navier–Stokes equations, Ph.D.thesis, University of Toronto, December 1997.
[6] Geuzaine, P., An implicit upwind finite volume method for compressible turbulent flows on un-structured grids, Ph.D. thesis, Universite de Liege, April 1999.
[7] Saad, Y. and Schultz, M. H., “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM Journal on Scientific and Statistical Computing, Vol. 7, 1986, pp. 856–869.
[8] Pueyo, A. and Zingg, D. W., “An efficient Newton–GMRES solver for aerodynamic computations,”AIAA Paper 97-1955, 1997.
[9] Saad, Y. and van der Vorst, H. A., “Iterative solution of linear systems in the 20th century,”Journal of Computational and Applied Mathematics, Vol. 123, 2000, pp. 1–33.
[10] Simoncini, V. and Szyld, D. B., “Recent computational developments in Krylov subspace methodsfor linear systems,” Numerical Linear Algebra with Applications, Vol. 14, 2007, pp. 1–59.
[11] van der Vorst, H., Iterative Krylov methods for large linear systems, Cambridge University Press, 1st ed., 2003.
[12] Hestenes, M. and Stiefel, E., “Methods of conjugate gradients for solving linear systems,” J. Res.Nat. Bur. Stand., Vol. 49, 1952, pp. 409–436.
[13] Meijerink, J. A. and van der Vorst, H. A., “An iterative solution method for linear systems ofwhich the coefficient matrix is a symmetric M-matrix,” Mathematics of Computation, Vol. 31, No.137, January 1977, pp. 148–162.
[14] Fletcher, R., “Conjugate gradient methods for indefinite systems,” Proceedings of the DundeeBiennal Conference on Numerical Analysis 1974 , edited by G. Watson, Springer Verlag, NewYork, 1975, pp. 73–89.
[15] Sonneveld, P., “CGS, a fast Lanczos-type solver for nonsymmetric linear systems,” SIAM J. Sci.Stat. Comput., Vol. 10, January 1989, pp. 36–52.
[16] van der Vorst, H. A., “BI-CGSTAB: a fast and smoothly converging variant of BI-CG for thesolution of nonsymmetric linear systems,” SIAM Journal on Scientific and Statistical Computing ,Vol. 13, No. 2, March 1992, pp. 631–644.
[17] de Sturler, E., “Truncation strategies for optimal Krylov subspace methods,” SIAM Journal onNumerical Analysis, Vol. 36, No. 3, 1999, pp. 864–889.
[18] Abe, K. and Sleijpen, G. L. G., “BiCR variants of the hybrid BiCG methods for solving linear sys-tems with nonsymmetric matrices,” Journal of Computational and Applied Mathematics, Vol. 234,June 2010, pp. 985–994.
[19] Lanczos, C., “An iteration method for the solution of the eigenvalue problem of linear differentialand integral operators,” J. Res. Nat. Bur. Stand., Vol. 45, 1950, pp. 255–282.
[20] Arnoldi, W., “The principle of minimized iterations in the solution of the matrix eigenvalue prob-lem,” Quarterly of Applied Mathematics, Vol. 9, 1951, pp. 17–29.
[21] Lanczos, C., “Solution of systems of linear equations by minimized iterations,” Journal of Researchof the National Bureau of Standards, Vol. 49, 1952, pp. 33–53.
[22] Paige, C. and Saunders, M., “Solution of sparse indefinite systems of linear equations,” SIAMJournal on Numerical Analysis, Vol. 12, 1975, pp. 617–629.
[23] Concus, P. and Golub, G., “A generalized conjugate gradient method for nonsymmetric systems oflinear equations,” Computer methods in Applied Sciences and Engineering, Second InternationalSymposium, edited by R. Glowinski and J. Lions, Springer Verlag, New York, December 1976, pp.56–65.
[24] Vinsome, P., “ORTHOMIN: an iterative method for solving sparse sets of simultaneous linearequations,” Proceedings of the Fourth Symposium of Reservoir Simulation, Society of PetroleumEngineers of AIME, 1976, pp. 149–159.
[25] Widlund, O., “A Lanczos method for a class of non-symmetric systems of linear equations,” SIAMJ. Numer. Anal., Vol. 15, 1978, pp. 801–802.
[26] Jea, K. and Young, D., “Generalized conjugate-gradient acceleration of nonsymmetrizable iterativemethods,” Linear Algebra and its Applications, Vol. 34, 1980, pp. 159–194.
[27] Saad, Y., “Krylov subspace methods for solving large unsymmetric linear systems,” Mathematicsof Computation, Vol. 37, 1981, pp. 105–126.
[28] Paige, C. C. and Saunders, M. A., “LSQR: an algorithm for sparse linear equations and sparseleast squares,” ACM Transactions on Mathematical Software, Vol. 8, March 1982, pp. 43–71.
[29] Eisenstat, S. C., Elman, H. C., and Schultz, M. H., “Variational iterative methods for nonsym-metric systems of linear equations,” SIAM Journal on Numerical Analysis, Vol. 20, No. 2, April1983, pp. 345–357.
[30] Freund, R. W. and Nachtigal, N. M., “QMR: a quasi-minimal residual method for non-Hermitian linear systems,” Numer. Math., Vol. 60, 1991, pp. 315–339.
[31] Gutknecht, M. H., “Variants of BICGSTAB for matrices with complex spectrum,” SIAM J. Sci.Comput., Vol. 14, September 1993, pp. 1020–1033.
[32] Sleijpen, G. L. G. and Fokkema, D. R., “BiCGStab(l) for linear equations involving unsymmetric matrices with complex spectrum,” Electronic Transactions on Numerical Analysis, Vol. 1, 1993, pp. 11–32.
[33] Freund, R. W. and Nachtigal, N. M., “An implementation of the QMR method based on coupledtwo-term recurrences,” SIAM Journal on Scientific Computing , Vol. 15, March 1994, pp. 313–337.
[34] Weiss, R., “Error-minimizing Krylov subspace methods,” SIAM Journal on Scientific Computing ,Vol. 15, May 1994, pp. 511–527.
[35] Chan, T. F., Gallopoulos, E., Simoncini, V., Szeto, T., and Tong, C. H., “A quasi-minimal residualvariant of the Bi-CGSTAB algorithm for nonsymmetric systems,” SIAM Journal on ScientificComputing , Vol. 15, March 1994, pp. 338–347.
[36] Kasenally, E. M., “GMBACK: a generalized minimum backward error algorithm for nonsymmetriclinear systems,” SIAM J. Sci. Comput., Vol. 16, May 1995, pp. 698–719.
[37] Fokkema, D. R., Sleijpen, G. L. G., and van der Vorst, H. A., “Generalized conjugate gradientsquared,” Journal of Computational and Applied Mathematics, Vol. 71, July 1996, pp. 125–146.
[38] Saad, Y., Iterative Methods for Sparse Linear Systems, PWS Publishing Company, 1996.
[39] Baker, A. H., Jessup, E. R., and Manteuffel, T., “A technique for accelerating the convergenceof restarted GMRES,” SIAM Journal on Matrix Analysis and Applications, Vol. 26, No. 4, 2005,pp. 962–984.
[40] Morgan, R. B., “Restarted block-GMRES with deflation of eigenvalues,” Applied Numerical Math-ematics, Vol. 54, July 2005, pp. 222–236.
[41] Simoncini, V. and Gallopoulos, E., “An iterative method for nonsymmetric systems with multipleright-hand sides,” SIAM Journal on Scientific Computing , Vol. 16, No. 4, 1995, pp. 917–933.
[42] Simoncini, V. and Gallopoulos, E., “A hybrid block GMRES method for nonsymmetric systems with multiple right hand sides,” Journal of Computational and Applied Mathematics, Vol. 66, 1996.
[43] Kilmer, M., Miller, E., and Rappaport, C., “QMR-based projection techniques for the solutionof non-Hermitian systems with multiple right-hand sides,” SIAM J. Sci. Comput., Vol. 23, No. 3,2001, pp. 761–780.
[44] Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H., Templates for the solution of linear systems: building blocks for iterative methods, 2nd edition, SIAM, Philadelphia, Pennsylvania, 1994.
[45] Trottenberg, U., Oosterlee, C., and Schuller, A., Multigrid , chap. An introduction to algebraicmultigrid by K. Stuben, Academic Press, 2001, pp. 413–532.
[46] Benzi, M., “Preconditioning techniques for large linear systems: a survey,” Journal of Computa-tional Physics, Vol. 182, No. 2, 2002, pp. 418–477.
[47] Briggs, W. L., Henson, V. E., and McCormick, S. F., A multigrid tutorial (2nd ed.), Society forIndustrial and Applied Mathematics, Philadelphia, Pennsylvania, 2000.
[48] Wesseling, P., “Introduction to Multigrid Methods,” ICASE Report No. 95-11, February 1995.
[49] Wagner, C., “Introduction to algebraic multigrid,” Course Notes of an Algebraic Multigrid Courseat the University of Heidelberg.
[50] Stuben, K., “A review of algebraic multigrid,” Tech. Rep. REP-AiS-1999-69, German NationalResearch Centre for Information Technology, November 1999.
[51] Stuben, K., “Algebraic multigrid (AMG): an introduction with applications,” Tech. Rep. REP-AiS-1999-70, German National Research Centre for Information Technology, November 1999.
[52] Lomax, H., Pulliam, T. H., and Zingg, D. W., Fundamentals of computational fluid dynamics,chap. Multigrid, Springer-Verlag, 2001, pp. 177–187.
[53] Yavneh, I., “Why multigrid methods are so efficient,” Computing in Science and Engineering ,Vol. 8, November 2006, pp. 12–22.
[54] Southwell, R., “Stress calculation in frameworks by the method of systematic relaxation of con-straints,” Proc. Roy. Soc. London, Vol. 151, 1935, pp. 56–95.
[55] Fedorenko, R., “Finite difference scheme for the Stefan problem,” Zhurnal Vychislitel’noi Matem-atiki i Matematicheskoi Fiziki , Vol. 15, 1961, pp. 1339–1344.
[56] Fedorenko, R., “A relaxation method for solving elliptic difference equations,” USSR Computa-tional Mathematics and Mathematical Physics, Vol. 1, No. 4, 1962, pp. 1092–1096.
[57] Fedorenko, R., “The speed of convergence of one iterative process,” USSR Computational Mathe-matics and Mathematical Physics, Vol. 4, No. 3, 1964, pp. 227–235.
[58] Bakhvalov, N., “On the convergence of a relaxation method with natural constraints on the ellipticoperator,” USSR Computational Mathematics and Mathematical Physics, Vol. 6, No. 5, 1966,pp. 101–135.
[59] Brandt, A., “A multi-level adaptive technique MLAT for fast numerical solution to boundary valueproblems,” Proc. 3rd Int’l Conf. Numerical Methods in Fluid Mechanics, edited by H. Cabannesand R. Temam, Springer, Paris, 1973, pp. 82–89, Lecture Notes in Physics 18.
[60] Brandt, A., “Algebraic multigrid theory: the symmetric case,” Preliminary Proceedings for theInternational Multigrid Conference, Copper Mountain, Colorado, April 1983.
[61] Brandt, A., McCormick, S., and Ruge, J., “Algebraic multigrid (AMG) for automated multigridsolutions with applications to geodetic computations,” Tech. rep., Inst. for Computational Studies,Fort Collins, Colorado, October 1982.
[62] Ruge, J. and Stuben, K., “Algebraic multigrid,” Frontiers in Applied Mathematics, Vol. 5, chap.Multigrid Methods, SIAM Press, Philadelphia, McCormick ed., 1987, pp. 73–130.
[63] Jameson, A., “Solution of the Euler equations for two dimensional transonic flow by a multigridmethod,” Applied Mathematics and Computation, Vol. 13, 1983, pp. 327–356.
[64] Jameson, A., “Multigrid solutions of the Euler equations using implicit schemes,” AIAA Journal ,Vol. 24, No. 11, 1986, pp. 1737–1743.
[65] Martinelli, L., Jameson, A., and Grasso, F., “A multigrid method for the Navier–Stokes equations,” 1986.
[66] Mavriplis, D., “Multigrid Strategies for Viscous Flow Solvers on Anisotropic UnstructuredMeshes,” Journal of Computational Physics, Vol. 145, No. 1, 1998, pp. 141 – 165.
[67] Mavriplis, D. J., “Multigrid approaches to non-linear diffusion problems on unstructured meshes,”Numerical Linear Algebra with Applications, Vol. 8, 2001, pp. 499–512.
[68] Mavriplis, D. J., “An assessment of linear versus nonlinear multigrid methods for unstructured solvers,” Journal of Computational Physics, Vol. 175, 2002, pp. 302–325.
[69] Moinier, P. and Giles, M. B., “Preconditioned Euler and Navier–Stokes calculations on unstruc-tured grids,” 6th Conference on Numerical Methods for Fluid Dynamics, ICFD, Oxford, UK, 1998.
[70] Zeng, S. and Wesseling, P., “Multigrid solution of the incompressible Navier–Stokes equations ingeneral coordinates,” SIAM Journal on Numerical Analysis, Vol. 31, No. 6, 1994, pp. 1764–1784.
[71] Allmaras, S. R., “Multigrid for the 2-D compressible Navier–Stokes equations,” AIAA Paper 99-3336, 1999.
[72] Weiss, J. M., Maruszewski, J. P., and Smith, W. A., “Implicit solution of preconditioned Navier–Stokes equations using algebraic multigrid,” AIAA Journal , Vol. 37, 1999, pp. 29–36.
[73] Griebel, M., Neunhoeffer, T., and Regler, H., “Algebraic multigrid methods for the solution of theNavier–Stokes equations in complicated geometries,” International Journal for Numerical Methodsin Fluids, Vol. 26, 1998, pp. 281–301.
[74] Ollivier-Gooch, C., “Multigrid acceleration of an upwind Euler solver on unstructured meshes,”AIAA Journal , Vol. 33, No. 10, October 1995, pp. 1822–1827.
[75] Morano, E., Mavriplis, D., and Venkatakrishnan, V., “Coarsening strategies for unstructured multi-grid techniques with application to anisotropic problems,” ICASE Report No. 95-34, May 1995.
[76] Thomas, J. L., Diskin, B., and Brandt, A., “Textbook multigrid efficiency for the incompressibleNavier–Stokes equations: high Reynolds number wakes and boundary layers,” Computers andFluids, Vol. 30, 2001, pp. 853–874.
[77] Bordner, J. and Saied, F., “MGLab: An interactive multigrid environment,” Seventh CopperMountain Conference on Multigrid Methods, edited by N. D. Melson, T. A. Manteuffel, S. F.McCormick, and C. C. Douglas, Vol. CP 3339, NASA, Hampton, Virginia, 1996, pp. 57–71.
[78] Lassaline, J. V., A Navier–Stokes equation solver using agglomerated multigrid featuring directionalcoarsening and line implicit smoothing , Ph.D. thesis, University of Toronto, 2003.
[79] Lassaline, J. V. and Zingg, D. W., “Development of an agglomeration multigrid algorithm withdirectional coarsening,” AIAA Paper 99-3338, 1999.
[80] Manzano, L.M., Implementation of multigrid for aerodynamic computations on multi-block grids,Master’s thesis, University of Toronto, January 1999.
REFERENCES 131
[81] Chisholm, T., Multigrid acceleration of an approximately-factored algorithm for steady aerodynamic flows, Master’s thesis, University of Toronto, January 1997.
[82] Luksch, P., “Algebraic multigrid,” 2002, www.bode.cs.tum.edu/Par/appls/apps/amg.html.
[83] Raw, M., “A coupled algebraic multigrid method for the 3D Navier–Stokes equations,” Proceedings of the 10th GAMM-Seminar, Notes on numerical fluid mechanics, Vol. 49, Vieweg-Verlag, Wiesbaden, 1995.
[84] Raw, M., “Robustness of coupled algebraic multigrid for the Navier–Stokes equations,” AIAA Paper 96-0297, Reno, Nevada, January 1996.
[85] Cleary, A. J., Falgout, R. D., Henson, V. E., Jones, J. E., Manteuffel, T. A., McCormick, S. F., Miranda, G. N., and Ruge, J. W., “Robustness and scalability of algebraic multigrid,” SIAM Journal on Scientific Computing, Vol. 21, No. 5, 2000, pp. 1886–1908.
[86] Brezina, M., Cleary, A. J., Falgout, R. D., Henson, V. E., Jones, J. E., Manteuffel, T. A., McCormick, S. F., and Ruge, J. W., “Algebraic multigrid based on element interpolation (AMGe),” SIAM Journal on Scientific Computing, Vol. 22, 2000, pp. 1570–1592.
[87] Chartier, T. P., Element-based algebraic multigrid (AMGe) and spectral AMGe, Ph.D. thesis, University of Colorado, 2001.
[88] Haase, G., Kuhn, M., and Reitzinger, S., “Parallel algebraic multigrid methods on distributed memory computers,” SIAM Journal on Scientific Computing, Vol. 24, No. 2, 2002, pp. 410–427.
[89] Axelsson, O. and Vassilevski, P. S., “A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning,” SIAM Journal on Matrix Analysis and Applications, Vol. 12, August 1991, pp. 625–644.
[90] Saad, Y., “A flexible inner-outer preconditioned GMRES algorithm,” SIAM Journal on Scientific Computing, Vol. 14, 1993, pp. 461–469.
[91] van der Vorst, H. A. and Vuik, C., “GMRESR: a family of nested GMRES methods,” Numerical Linear Algebra with Applications, Vol. 1, 1994, pp. 369–386.
[92] Szyld, D. B. and Vogel, J. A., “FQMR: a flexible quasi-minimal residual method with inexact preconditioning,” SIAM Journal on Scientific Computing, Vol. 23, February 2001, pp. 363–380.
[93] Vogel, J. A., “Flexible BiCG and flexible Bi-CGSTAB for nonsymmetric linear systems,” Applied Mathematics and Computation, Vol. 188, No. 1, 2007, pp. 226–233.
[94] Hicken, J. E. and Zingg, D. W., “A simplified and flexible variant of GCROT for solving nonsymmetric linear systems,” SIAM Journal on Scientific Computing, Vol. 32, No. 3, 2010, pp. 1672–1694.
[95] Buleev, N. I., “A numerical method for solving two-dimensional diffusion equations,” Atomic Energy, Vol. 6, 1960, pp. 222–224, doi:10.1007/BF01481461.
[96] Varga, R., “Factorization and normalized iterative methods,” Boundary problems in differential equations, edited by R. Langer, University of Wisconsin Press, Madison, 1960, pp. 121–142.
[97] Oliphant, T., “An implicit, numerical method for solving two-dimensional time-dependent diffusion problems,” Quarterly of Applied Mathematics, Vol. 19, 1961, pp. 221–229.
[98] Oliphant, T., “An extrapolation process for solving linear systems,” Quarterly of Applied Mathematics, Vol. 20, 1962, pp. 257–267.
[99] Dupont, T., Kendall, R., and Rachford, H., “An approximate factorization procedure for solving self-adjoint elliptic difference equations,” SIAM Journal on Numerical Analysis, Vol. 5, 1968, pp. 559–573.
[100] Manteuffel, T., “An incomplete factorization technique for positive definite linear systems,” Mathematics of Computation, Vol. 34, No. 150, April 1980, pp. 473–497.
[101] Eisenstat, S., “Efficient implementation of a class of preconditioned conjugate gradient methods,” SIAM Journal on Scientific and Statistical Computing, Vol. 2, 1981, pp. 1–4.
[102] Elman, H. C., “A stability analysis of incomplete LU factorizations,” Mathematics of Computation, Vol. 47, No. 175, July 1986, pp. 191–217.
[103] Bruaset, A. M., Tveito, A., and Winther, R., “On the stability of relaxed incomplete LU factorizations,” Mathematics of Computation, Vol. 54, No. 190, April 1990, pp. 701–719.
[104] Chow, E. and Saad, Y., “Experimental study of ILU preconditioners for indefinite matrices,” Journal of Computational and Applied Mathematics, Vol. 86, No. 2, 1997, pp. 387–414.
[105] Gopaul, A., Sunhaloo, M., Boojhawon, R., and Bhuruth, M., “Analysis of incomplete factorizations for a nine-point approximation to a convection-diffusion model problem,” Journal of Computational and Applied Mathematics, Vol. 224, 2009, pp. 719–733.
[106] Gustafsson, I., “A class of first-order factorization methods,” BIT, Vol. 18, 1978, pp. 142–156.
[107] Watts III, J., “A conjugate gradient truncated direct method for the iterative solution of the reservoir simulation pressure equation,” Society of Petroleum Engineers Journal, Vol. 21, 1981, pp. 345–353.
[108] Meijerink, J. A. and van der Vorst, H. A., “Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems,” Journal of Computational Physics, Vol. 44, 1981, pp. 134–155.
[109] Chapman, A., Saad, Y., and Wigton, L., “High-order ILU preconditioners for CFD problems,” International Journal for Numerical Methods in Fluids, Vol. 33, 2000, pp. 767–788.
[110] Zlatev, Z., “Use of iterative refinement in the solution of sparse linear systems,” SIAM Journal on Numerical Analysis, Vol. 19, 1982, pp. 381–399.
[111] Young, D. P., Melvin, R. G., Johnson, F. T., Bussoletti, J. E., Wigton, L. B., and Samant, S. S., “Application of sparse matrix solvers as effective preconditioners,” SIAM Journal on Scientific and Statistical Computing, Vol. 10, November 1989, pp. 1186–1199.
[112] Gallivan, K., Sameh, A., and Zlatev, Z., “A parallel hybrid sparse linear system solver,” Computing Systems in Engineering, Vol. 1, No. 2-4, 1990, pp. 183–195.
[113] D’Azevedo, E. F., Forsyth, P. A., and Tang, W.-P., “Ordering methods for preconditioned conjugate gradient methods applied to unstructured grid problems,” SIAM Journal on Matrix Analysis and Applications, Vol. 13, July 1992, pp. 944–961.
[114] D’Azevedo, E. F., Forsyth, P. A., and Tang, W.-P., “Towards a cost-effective ILU preconditioner with high-level fill,” BIT, Vol. 32, October 1992, pp. 442–463.
[115] Saad, Y., “ILUT: A dual threshold incomplete ILU factorization,” Numerical Linear Algebra with Applications, Vol. 1, 1994, pp. 387–402.
[116] Jones, M. and Plassman, P., “An improved Cholesky factorization,” ACM Transactions on Mathematical Software, Vol. 21, No. 5, 1995.
[117] van der Vorst, H. A., “Iterative solution methods for certain sparse linear systems with a non-symmetric matrix arising from PDE-problems,” Journal of Computational Physics, Vol. 44, No. 1, 1981, pp. 1–19.
[118] Axelsson, O. and Lindskog, G., “On the eigenvalue distribution of a class of preconditioning methods,” Numerische Mathematik, Vol. 48, 1986, pp. 479–498.
[119] Elman, H. C., “Relaxed and stabilized incomplete factorizations for non-self-adjoint linear systems,” BIT, Vol. 29, 1989, pp. 890–915.
[120] Wittum, G. and Liebau, F., “On truncated incomplete decompositions,” BIT, Vol. 29, 1989, pp. 719–740.
[121] van der Vorst, H. A., “The convergence behaviour of preconditioned CG and CG-S in the presence of rounding errors,” Preconditioned Conjugate Gradient Methods, edited by O. Axelsson and L. Y. Kolotilina, Nijmegen 1989, 1990, Lecture Notes in Mathematics 1457.
[122] Underwood, R., “An approximate factorization procedure based on the block Cholesky decomposition and its use with the conjugate gradient method,” Tech. Rep. NEDO-11386, General Electric Co., Nuclear Energy Div., San Jose, California, 1976.
[123] Concus, P., Golub, G., and Meurant, G., “Block preconditioning for the conjugate gradient method,” SIAM Journal on Scientific and Statistical Computing, Vol. 6, 1985, pp. 220–252.
[124] Concus, P. and Meurant, G., “On computing INV block preconditioning for the conjugate gradient method,” BIT, Vol. 26, December 1986, pp. 493–504.
[125] Axelsson, O., “A general incomplete block-matrix factorization method,” Linear Algebra and its Applications, Vol. 74, 1986, pp. 179–190.
[126] Magolu, M., “Modified-block-approximate factorization strategies,” Numerische Mathematik, Vol. 61, 1992, pp. 91–110.
[127] Yun, J. H., “Block ILU preconditioners for a nonsymmetric block-tridiagonal M-matrix,” BIT Numerical Mathematics, Vol. 40, 2000, pp. 583–605, doi:10.1023/A:1022328131952.
[128] Orkwis, P. D., “Comparison of Newton’s and quasi-Newton’s method solvers for the Navier–Stokes equations,” AIAA Journal, Vol. 31, No. 5, 1993, pp. 832–836.
[129] Duff, I. S. and Ucar, B., “Combinatorial problems in solving linear systems,” Invited presentation at Dagstuhl Seminar on Combinatorial Scientific Computing delivered by Iain S. Duff, February 2009.
[130] Markowitz, H. M., “The elimination form of the inverse and its application to linear programming,” Management Science, Vol. 3, No. 3, April 1957, pp. 255–269.
[131] Tinney, W. F. and Walker, J. W., “Direct solutions of sparse network equations by optimally ordered triangular factorization,” Proceedings of the IEEE, Vol. 55, No. 11, 1967, pp. 1801–1809.
[132] Rosen, R., “Matrix bandwidth minimization,” Proceedings of the 1968 23rd ACM national conference, ACM ’68, ACM, New York, New York, 1968, pp. 585–595.
[133] Cuthill, E. and McKee, J., “Reducing the bandwidth of sparse symmetric matrices,” 24th National Conference of the Association for Computing Machinery, No. ACM P-69, Brandon Press, New York, 1969.
[134] George, A., Computer implementation of the finite element method, Ph.D. thesis, Stanford University, 1971.
[135] George, A., “Nested dissection of a regular finite-element mesh,” SIAM Journal on Numerical Analysis, Vol. 10, 1973, pp. 345–363.
[136] Gibbs, N., Poole, Jr, W., and Stockmeyer, P., “An algorithm for reducing the bandwidth and profile of a sparse matrix,” SIAM Journal on Numerical Analysis, Vol. 13, 1976, pp. 236–250.
[137] Sloan, S., “An algorithm for profile and wavefront reduction of sparse matrices,” International Journal for Numerical Methods in Engineering, Vol. 23, 1986, pp. 239–251.
[138] Baumann, M., Fleischmann, P., and Mutzbauer, O., “Double ordering and fill-in for the LU factorization,” SIAM Journal on Matrix Analysis and Applications, Vol. 25, No. 3, 2003, pp. 630–641.
[139] Hassan, O., Morgan, K., and Peraire, J., “An implicit finite element method for high speed flows,” AIAA Paper 90-0402, Reno, Nevada, January 1990.
[140] Dutto, L. C., “The effect of ordering on preconditioned GMRES algorithm, for solving the compressible Navier–Stokes equations,” International Journal for Numerical Methods in Engineering, Vol. 36, 1993, pp. 457–497.
[141] Duff, I. and Meurant, G., “The effect of ordering on preconditioned conjugate gradients,” BIT, Vol. 29, No. 4, 1989, pp. 635–657.
[142] Hendrickson, B. and Rothberg, E., Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, Minneapolis, Minnesota, SIAM, 1997.
[143] Hénon, P., Ramet, P., and Roman, J., “On finding approximate supernodes for an efficient block-ILU(k) factorization,” Parallel Computing, Vol. 34, 2008, pp. 345–362.
[144] Clift, S. and Tang, W.-P., “Weighted graph-based ordering techniques for preconditioned conjugate gradient methods,” BIT, Vol. 35, No. 30, 1995.
[145] Persson, P.-O. and Peraire, J., “Newton–GMRES preconditioning for discontinuous Galerkin discretizations of the Navier–Stokes equations,” SIAM Journal on Scientific Computing, Vol. 30, No. 6, 2008, pp. 2709–2733.
[146] Liu, W. and Sherman, A. H., “Comparative analysis of the Cuthill–McKee and the reverse Cuthill–McKee ordering algorithms for sparse matrices,” SIAM Journal on Numerical Analysis, Vol. 13, No. 2, 1976, pp. 198–213.
[147] Benzi, M., Szyld, D. B., and Duin, A. V., “Orderings for incomplete factorization preconditioning of nonsymmetric problems,” SIAM Journal on Scientific Computing, Vol. 20, No. 5, 1999, pp. 1652–1670.
[148] Pollul, B. and Reusken, A., “Numbering techniques for preconditioners in iterative solvers for compressible flows,” International Journal for Numerical Methods in Fluids, Vol. 55, 2007, pp. 241–261.
[149] Chisholm, T. T. and Zingg, D. W., “A Jacobian-free Newton–Krylov algorithm for compressible turbulent fluid flows,” Journal of Computational Physics, Vol. 228, No. 9, 2009, pp. 3490–3507.
[150] Bondarabady, H. A. R. and Kaveh, A., “Nodal ordering using graph theory and a genetic algorithm,” Finite Elements in Analysis and Design, Vol. 40, June 2004, pp. 1271–1280.
[151] Dubois, P., Greenbaum, A., and Rodrigue, G., “Approximating the inverse of a matrix for use in iterative algorithms on vector processors,” Computing, Vol. 22, 1979, pp. 257–268, doi:10.1007/BF02243566.
[152] van der Vorst, H. A., “A vectorizable variant of some ICCG methods,” SIAM Journal on Scientific and Statistical Computing, Vol. 3, No. 3, September 1982, pp. 350–356.
[153] van der Vorst, H. A., “High performance preconditioning,” SIAM Journal on Scientific and Statistical Computing, Vol. 10, No. 6, November 1989, pp. 1174–1185.
[154] Anderson, E. C. and Saad, Y., “Solving sparse triangular systems on parallel computers,” International Journal of High Speed Computing, Vol. 1, 1989, pp. 73–96.
[155] Elman, H. C. and Golub, G. H., “Line iterative methods for cyclically reduced discrete convection-diffusion problems,” SIAM Journal on Scientific and Statistical Computing, Vol. 13, January 1992, pp. 339–363.
[156] Adams, L., LeVeque, R., and Young, D., “Analysis of the SOR iteration for the 9-point Laplacian,” SIAM Journal on Numerical Analysis, Vol. 25, 1988, pp. 1156–1180.
[157] Hysom, D. and Pothen, A., “Parallel ILU ordering and convergence relationships: numerical experiments,” Tech. Rep. CR-2000-210119, NASA, May 2000.
[158] Hysom, D. and Pothen, A., “Efficient parallel computation of ILU(k) preconditioners,” Tech. Rep. CR-2000-210120, NASA, May 2000.
[159] Schwarz, H. A., Gesammelte Mathematische Abhandlungen, Vol. 2, Springer, Berlin, 1890, pp. 133–143.
[160] Miller, K., “Numerical analogs to the Schwarz alternating procedure,” Numerische Mathematik, Vol. 7, 1965, pp. 91–103.
[161] Mandel, J., “Two-level domain decomposition preconditioning for the p-version finite element method in three dimensions,” Fourth Copper Mountain Conference on Multigrid Methods, Copper Mountain, Colorado, April 1989.
[162] Knoll, D. A., McHugh, P. R., and Keyes, D. E., “Newton–Krylov methods for low-Mach-number compressible combustion,” AIAA Journal, Vol. 34, No. 5, May 1996, pp. 961–967.
[163] Fischer, P. F., Miller, N. I., and Tufo, H. M., “An overlapping Schwarz method for spectral element simulation of three-dimensional incompressible flows,” IMA Domain Decomposition Workshop Proceedings, 1997.
[164] Saad, Y., Sosonkina, M., and Zhang, J., “Domain decomposition and multi-level type techniques for general sparse linear systems,” Tech. Rep. umsi-97-244, Minnesota Supercomputer Institute, University of Minnesota, 1997.
[165] Gropp, W. D., Keyes, D. E., McInnes, L. C., and Tidriri, M. D., “Globalized Newton–Krylov–Schwarz algorithms and software for parallel implicit CFD,” The International Journal of High Performance Computing Applications, Vol. 14, No. 2, 2000, pp. 102–136.
[166] Hicken, J. E. and Zingg, D. W., “A parallel Newton–Krylov solver for the Euler equations discretized using simultaneous approximation terms,” AIAA Journal, Vol. 46, No. 11, November 2008, pp. 2773–2786.
[167] Benson, M., Iterative solution of large scale linear systems, Master’s thesis, Lakehead University, 1973.
[168] Benson, M. and Frederickson, P., “Iterative solution of large sparse linear systems arising in certain multidimensional approximation problems,” Utilitas Mathematica, Vol. 22, 1982, pp. 127–140.
[169] Benzi, M., Meyer, C. D., and Tuma, M., “A sparse approximate inverse preconditioner for the conjugate gradient method,” SIAM Journal on Scientific Computing, Vol. 17, No. 5, 1996, pp. 1135–1149.
[170] Kolotilina, L. Y. and Yeremin, A. Y., “On a family of two-level preconditionings of the incomplete block factorization type,” Soviet Journal of Numerical Analysis and Mathematical Modeling, Vol. 1, 1993, pp. 293–320.
[171] Kolotilina, L. Y. and Yeremin, A. Y., “Factorized sparse approximate inverse preconditionings I: theory,” SIAM Journal on Matrix Analysis and Applications, Vol. 14, January 1993, pp. 45–58.
[172] Grote, M. and Simon, H., “Parallel preconditioning and approximate inverses on the connection machine,” Parallel processing for scientific computing, edited by R. Sincovec, D. E. Keyes, L. R. Petzold, and D. A. Reed, Vol. 2, SIAM, 1992, pp. 519–523.
[173] Cosgrove, J. D. F., Approximate inverses as parallel preconditionings, Ph.D. thesis, University of Oklahoma, 1992.
[174] Cosgrove, J., Diaz, J., and Griewank, A., “Approximate inverse preconditioning for sparse linear systems,” International Journal of Computer Mathematics, Vol. 44, 1992, pp. 91–110.
[175] Chow, E. and Saad, Y., “Approximate inverse preconditioners for general sparse matrices,” Tech. Rep. umsi-94-101, Minnesota Supercomputer Institute, University of Minnesota, 1994.
[176] Huckle, T. and Grote, M. J., “A new approach to parallel preconditioning with sparse approximate inverses,” Tech. Rep. SCCM-94-03, SCCM Program, Stanford University, September 1994.
[177] Grote, M. J. and Huckle, T., “Parallel preconditioning with sparse approximate inverses,” SIAM Journal on Scientific Computing, Vol. 18, May 1997, pp. 838–853.
[178] Chow, E. and Saad, Y., “Approximate inverse techniques for block-partitioned matrices,” Tech. Rep. umsi-95-13, Minnesota Supercomputer Institute, University of Minnesota, 1995.
[179] Barnard, S. T. and Grote, M. J., “A block version of the SPAI preconditioner,” 9th SIAM Conference on Parallel Processing for Scientific Computing, March 1999.
[180] Chow, E. and Saad, Y., “Approximate inverse preconditioners via sparse-sparse iterations,” SIAM Journal on Scientific Computing, Vol. 19, 1998, pp. 995–1023.
[181] Sosonkina, M., “Sparse approximate inverses in preconditioning distributed linear systems,” Tech. Rep. TR-97-11, Department of Computer Science, Virginia Polytechnic Institute and State University, 1997.
[182] Huckle, T., “Factorized sparse approximate inverses for preconditioning,” Journal of Supercomputing, Vol. 25, June 2003, pp. 109–117.
[183] Alleon, G., Benzi, M., and Giraud, L., “Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics,” Tech. Rep. TR/PA/97/05, CERFACS, 1997.
[184] Carpentieri, B., Duff, I. S., Giraud, L., and Monga Made, M. M., “Sparse symmetric preconditioners for dense linear systems in electromagnetism,” Tech. Rep. TR/PA/01/35, CERFACS, 2001.
[185] Guillaume, P., Saad, Y., and Sosonkina, M., “Rational approximation preconditioners for general sparse linear systems,” Tech. Rep. umsi-99-209, Minnesota Supercomputer Institute, University of Minnesota, 1999.
[186] Chow, E., “A priori sparsity patterns for parallel sparse approximate inverse preconditioners,” SIAM Journal on Scientific Computing, Vol. 21, No. 5, 2000, pp. 1804–1822.
[187] Tang, W. P. and Wan, W. L., “Sparse approximate inverse smoother for multigrid,” SIAM Journal on Matrix Analysis and Applications, Vol. 21, No. 4, 2000, pp. 1236–1252.
[188] Broker, O., Grote, M. J., Mayer, C., and Reusken, A., “Robust parallel smoothing for multigrid via sparse approximate inverses,” SIAM Journal on Scientific Computing, Vol. 23, No. 4, 2001, pp. 1396–1417.
[189] Bollhoefer, M. and Saad, Y., “On the relations between ILUs and factored approximate inverses,” SIAM Journal on Matrix Analysis and Applications, Vol. 24, 2002, pp. 219–237.
[190] Huckle, T., Kallischko, A., Roy, A., Sedlacek, M., and Weinzierl, T., “An efficient parallel implementation of the MSPAI preconditioner,” Parallel Computing, Vol. 36, 2010, pp. 273–284.
[191] Axelsson, O. and Vassilevski, P. S., “Algebraic multilevel preconditioning methods, I,” Numerische Mathematik, Vol. 56, 1989, pp. 157–177.
[192] Axelsson, O. and Vassilevski, P. S., “Algebraic multilevel preconditioning methods, II,” SIAM Journal on Numerical Analysis, Vol. 27, November 1990, pp. 1569–1590.
[193] van der Ploeg, A., Botta, E. F. F., and Wubs, F. W., “Nested grids ILU-decomposition (NGILU),” Journal of Computational and Applied Mathematics, Vol. 66, January 1996, pp. 515–526.
[194] Botta, E. F. F. and Wubs, F. W., “Matrix renumbering ILU: an effective algebraic multilevel ILU preconditioner for sparse matrices,” SIAM Journal on Matrix Analysis and Applications, Vol. 20, No. 4, 1999, pp. 1007–1026.
[195] Saad, Y., “ILUM: A multi-elimination ILU preconditioner for general sparse matrices,” SIAM Journal on Scientific Computing, Vol. 17, No. 4, 1996, pp. 830–847.
[196] Vassilevski, P., “A block-factorization (algebraic) formulation of multigrid and Schwarz methods,” East-West Journal of Numerical Mathematics, Vol. 6, 1998, pp. 65–79.
[197] Bank, R. and Wagner, C., “Multilevel ILU decomposition,” Numerische Mathematik, Vol. 82, 1999, pp. 543–576.
[198] Saad, Y. and Zhang, J., “BILUM: Block versions of multielimination and multilevel ILU preconditioner for general sparse linear systems,” SIAM Journal on Scientific Computing, Vol. 20, No. 6, 1999, pp. 2103–2121.
[199] Saad, Y. and Zhang, J., “BILUTM: A domain-based multilevel block ILUT preconditioner for general sparse matrices,” SIAM Journal on Scientific Computing, Vol. 21, No. 1, 1999, pp. 279–299.
[200] Saad, Y. and Zhang, J., “Enhanced multi-level block ILU preconditioning strategies for general sparse linear systems,” Journal of Computational and Applied Mathematics, Vol. 130, 2001, pp. 99–118.
[201] Saad, Y. and Suchomel, B., “ARMS: An algebraic recursive multilevel solver for general sparse linear systems,” Numerical Linear Algebra with Applications, Vol. 9, 2001, pp. 359–378.
[202] Saad, Y., Soulaimani, A., and Touihri, R., “Adapting algebraic recursive multilevel solvers (ARMS) for solving CFD problems,” Tech. Rep. umsi-2002-105, Minnesota Supercomputer Institute, University of Minnesota, 2002.
[203] Shen, C. and Zhang, J., “Parallel two level block ILU preconditioning techniques for solving large sparse linear systems,” Parallel Computing, Vol. 28, 2002, pp. 1451–1475.
[204] Shen, C., Zhang, J., and Wang, K., “Distributed block independent set algorithms and parallel multilevel ILU preconditioners,” Journal of Parallel and Distributed Computing, Vol. 65, No. 3, 2005, pp. 331–346.
[205] Gu, T.-X., Chi, X.-B., and Liu, X.-P., “AINV and BILUM preconditioning techniques,” Applied Mathematics and Mechanics (English Edition), Vol. 25, No. 9, 2004, pp. 1012–1021.
[206] Saad, Y., “Multilevel ILU with reorderings for diagonal dominance,” SIAM Journal on Scientific Computing, Vol. 27, No. 3, 2005, pp. 1032–1057.
[207] Mayer, J., “A multilevel Crout ILU preconditioner with pivoting and row permutation,” Numerical Linear Algebra with Applications, Vol. 14, 2007, pp. 771–789.
[208] Bollhoefer, M. and Saad, Y., “Multilevel preconditioners constructed from inverse-based ILUs,” SIAM Journal on Scientific Computing, Vol. 27, No. 5, 2006, pp. 1627–1650.
[209] Notay, Y., “Using approximate inverses in multilevel methods,” Numerische Mathematik, Vol. 80, 1998, pp. 397–417.
[210] Bollhoefer, M. and Mehrmann, V., “Algebraic multilevel methods and sparse approximate inverses,” SIAM Journal on Matrix Analysis and Applications, Vol. 24, No. 1, 2002, pp. 191–218.
[211] Meurant, G., “A multilevel AINV preconditioner,” Numerical Algorithms, Vol. 29, 2002, pp. 107–129.
[212] Axelsson, O. and Vassilevski, P. S., “A survey of multilevel preconditioned iterative methods,” BIT, Vol. 29, No. 4, 1989, pp. 769–793.
[213] Oosterlee, C. W. and Washio, T., “An evaluation of parallel multigrid as a solver and a preconditioner for singularly perturbed problems,” SIAM Journal on Scientific Computing, Vol. 19, No. 1, 1998, pp. 87–110.
[214] Braess, D., “Towards algebraic multigrid for elliptic problems of second order,” Computing, 1995, pp. 379–393.
[215] Hager, J. O. and Lee, K. D., “Effects of implicit preconditioners on solution acceleration schemes in CFD,” International Journal for Numerical Methods in Fluids, Vol. 22, No. 10, 1996, pp. 1023–1035.
[216] Oliveira, S. and Deng, Y., “Preconditioned Krylov subspace methods for transport equations,” Progress in Nuclear Energy, Vol. 33, No. 1/2, 1998, pp. 155–174.
[217] Oosterlee, C. W. and Washio, T., “On the use of multigrid as a preconditioner,” Ninth International Conference on Domain Decomposition Methods, No. 52, Bergen, Norway, 1998, pp. 441–448.
[218] Washio, T. and Oosterlee, C. W., “Krylov subspace acceleration for nonlinear multigrid schemes,” Electronic Transactions on Numerical Analysis, Vol. 6, December 1997, pp. 271–290.
[219] Oosterlee, C. W. and Washio, T., “Krylov subspace acceleration of nonlinear multigrid with application to recirculating flows,” SIAM Journal on Scientific Computing, Vol. 21, No. 5, 2000, pp. 1670–1690.
[220] Wienands, R., Oosterlee, C. W., and Washio, T., “Fourier analysis of GMRES(m) preconditioned by multigrid,” SIAM Journal on Scientific Computing, Vol. 22, No. 2, 2000, pp. 582–603.
[221] Tuminaro, R., Tong, C., Shadid, J., Devine, K., and Day, D., “On a multilevel preconditioning module for unstructured mesh Krylov solvers: Two-level Schwarz,” Communications in Numerical Methods in Engineering, Vol. 18, 2002, pp. 363–389.
[222] Wang, Q. and Joshi, Y., “Algebraic multigrid preconditioned Krylov subspace methods for fluid flow and heat transfer on unstructured meshes,” Numerical Heat Transfer, Part B, Vol. 49, 2006, pp. 197–221.
[223] Pennacchio, M. and Simoncini, V., “Algebraic multigrid preconditioners for the bidomain reaction–diffusion system,” Applied Numerical Mathematics, Vol. 59, December 2009, pp. 3033–3050.
[224] Wigton, L., Yu, N., and Young, D., “GMRES acceleration of computational fluid dynamics codes,” AIAA Paper 85-1494, July 1985.
[225] Venkatakrishnan, V., “Newton solution of inviscid and viscous problems,” AIAA Journal, Vol. 27, No. 7, 1989, pp. 885–891.
[226] Johan, Z., Hughes, T., and Shakib, F., “A globally convergent matrix-free algorithm for implicit time-marching schemes arising in finite element analysis in fluids,” Computer Methods in Applied Mechanics and Engineering, Vol. 87, 1991, pp. 281–304.
[227] Ajmani, K., Preconditioned conjugate gradient methods for the Navier–Stokes equations, Ph.D. thesis, Virginia Polytechnic Institute and State University, 1991.
[228] Ajmani, K., Ng, W., and Liou, M., “Preconditioned conjugate gradient methods for the Navier–Stokes equations,” Journal of Computational Physics, Vol. 110, 1994, pp. 68–81.
[229] Venkatakrishnan, V. and Mavriplis, D. J., “Implicit solvers for unstructured meshes,” Journal of Computational Physics, Vol. 105, 1993, pp. 83–91.
[230] McHugh, P. R. and Knoll, D. A., “Comparison of standard and matrix-free implementations of several Newton–Krylov solvers,” AIAA Journal, Vol. 32, No. 12, 1994, pp. 2394–2400.
[231] Barth, T. J. and Linton, S. W., “An unstructured mesh Newton solver for compressible fluid flow and its parallel implementation,” AIAA Paper 95-0221, 1995.
[232] Nielsen, E. J., Anderson, W. K., Walters, R. W., and Keyes, D. E., “Application of Newton–Krylov methodology to a three-dimensional unstructured Euler code,” AIAA Paper 95-1733, 1995.
[233] Anderson, W. K., Rausch, R. D., and Bonhaus, D. L., “Implicit/multigrid algorithms for incompressible turbulent flows on unstructured grids,” AIAA Paper 95-1740, 1995.
[234] Anderson, W. K., Rausch, R. D., and Bonhaus, D. L., “Implicit/multigrid algorithms for incompressible turbulent flows on unstructured grids,” Journal of Computational Physics, Vol. 128, 1996, pp. 391–408.
[235] Dawson, C. N., Klie, H., Wheeler, M. F., and Woodward, C. S., “A parallel, implicit, cell-centered method for two-phase flow with a preconditioned Newton–Krylov solver,” Computational Geosciences, Vol. 1, 1997, pp. 215–249.
[236] Wille, S. O., “Adaptive linearization and grid iterations with the tri-tree multigrid refinement-recoarsement algorithm for the Navier–Stokes equations,” International Journal for Numerical Methods in Fluids, Vol. 24, 1997, pp. 155–168.
[237] Blanco, M. and Zingg, D. W., “Fast Newton–Krylov method for unstructured grids,” AIAA Journal, Vol. 36, No. 4, 1998, pp. 607–612.
[238] Pueyo, A. and Zingg, D. W., “Efficient Newton–Krylov solver for aerodynamic computations,” AIAA Journal, Vol. 36, No. 11, 1998, pp. 1991–1997.
[239] Geuzaine, P., Lepot, I., Meers, F., and Essers, J.-A., “Multilevel Newton–Krylov algorithms for computing compressible flows on unstructured meshes,” AIAA 14th Computational Fluid Dynamics Conference, No. 99-3341, Norfolk, Virginia, June 1999, pp. 750–760.
[240] Geuzaine, P., “Newton–Krylov strategy for compressible turbulent flows on unstructured meshes,” AIAA Journal, Vol. 39, No. 3, 2000, pp. 528–531.
[241] Gropp, W., Keyes, D., McInnes, L. C., and Tidriri, M. D., “Globalized Newton–Krylov–Schwarz algorithms and software for parallel implicit CFD,” International Journal of High Performance Computing Applications, Vol. 14, May 2000, pp. 102–136.
[242] Chisholm, T. T. and Zingg, D. W., “A fully coupled Newton–Krylov solver for turbulent aerodynamic flows,” ICAS 2002 Congress, No. 333, 2002.
[243] Chisholm, T. T. and Zingg, D. W., “A Newton–Krylov algorithm for turbulent aerodynamic flows,” AIAA Paper 2003-0071, Reno, Nevada, January 2003.
[244] Chisholm, T. T. and Zingg, D. W., “Start-up issues in a Newton–Krylov algorithm for turbulent aerodynamic flows,” AIAA Paper 2003-3708, Orlando, Florida, June 2003.
[245] Zingg, D. W. and Chisholm, T. T., “Jacobian-free Newton–Krylov methods: issues and solutions,” Proceedings of the Fourth International Conference on Computational Fluid Dynamics, Ghent, Belgium, July 2006.
[246] Nemec, M. and Zingg, D. W., “Towards efficient aerodynamic shape optimization based on the Navier–Stokes equations,” AIAA Paper 2001-2532, June 2001.
[247] Nemec, M. and Zingg, D. W., “Newton–Krylov algorithm for aerodynamic design using the Navier–Stokes equations,” AIAA Journal, Vol. 40, No. 6, June 2002, pp. 1146–1154.
[248] Nemec, M., Zingg, D. W., and Pulliam, T. H., “Multi-point and multi-objective aerodynamic shape optimization,” AIAA Paper 2002-5548, September 2002.
[249] Nemec, M. and Zingg, D. W., “Optimization of high-lift configurations using a Newton–Krylov algorithm,” AIAA Paper 2003-3957, Orlando, Florida, June 2003.
[250] Nemec, M., Aftosmis, M. J., Murman, S. M., and Pulliam, T. H., “Adjoint formulation for an embedded-boundary Cartesian method,” 43rd AIAA Aerospace Sciences Meeting and Exhibit, No. AIAA-2005-0877, Reno, Nevada, 2005, NAS Technical Report NAS-05-008.
[251] Nemec, M. and Aftosmis, M. J., “Aerodynamic shape optimization using a Cartesian adjoint method and CAD geometry,” AIAA Paper 2006-3456, San Francisco, California, June 2006.
[252] Nemec, M., Optimal shape design of aerodynamic configurations: A Newton–Krylov approach, Ph.D. thesis, University of Toronto, 2003.
[253] Gatsis, J., A fully-coupled algorithm for aerodynamic design optimization, Master’s thesis, University of Toronto, 2001.
[254] Gatsis, J. and Zingg, D. W., “A fully-coupled Newton–Krylov algorithm for aerodynamic design optimization,” AIAA Paper 2003-3956, Orlando, Florida, June 2003.
[255] Olawsky, F., Infed, F., and Auweter-Kurtz, M., “Preconditioned Newton method for computing supersonic and hypersonic nonequilibrium flows,” AIAA Paper 2003-3072, 2003.
[256] Harrison, R. J., “Krylov subspace accelerated inexact Newton method for linear and nonlinear equations,” Journal of Computational Chemistry, Vol. 25, No. 3, February 2004, pp. 328–334.
[257] Vandekerckhove, C., Kevrekidis, I., and Roose, D., “An efficient Newton–Krylov implementation of the constrained runs scheme for initializing on a slow manifold,” Journal of Scientific Computing, Vol. 39, May 2009, pp. 167–188.
[258] Nichols, J. C., A three-dimensional multi-block Newton–Krylov flow solver for the Euler equations, Master’s thesis, University of Toronto, 2004.
[259] Nichols, J. and Zingg, D. W., “A three-dimensional multi-block Newton–Krylov flow solver for the Euler equations,” AIAA Paper 2005-5230, Toronto, Canada, June 2005.
[260] Groth, C. and Northrup, S., “Parallel implicit adaptive mesh refinement scheme for body-fitted multi-block mesh,” AIAA Paper 2005-5333, Toronto, Canada, June 2005.
[261] Bellavia, S. and Berrone, S., “Globalization strategies for Newton–Krylov methods for stabilized FEM discretization of Navier–Stokes equations,” Journal of Computational Physics, Vol. 226, October 2007, pp. 2317–2340.
[262] Nejat, A. and Ollivier-Gooch, C., “A high-order accurate unstructured GMRES algorithm for inviscid compressible flows,” AIAA Paper 2005-5341, Toronto, Canada, June 2005.
[263] Michalak, K. and Ollivier-Gooch, C., “Matrix-explicit GMRES for a higher-order accurate inviscid compressible flow solver,” AIAA Paper 2007-3943, Miami, Florida, June 2007.
[264] Nejat, A. and Ollivier-Gooch, C., “A high-order accurate unstructured finite volume Newton–Krylov algorithm for inviscid compressible flows,” Journal of Computational Physics, Vol. 227, No. 4, 2008, pp. 2582–2609.
[265] Nejat, A. and Ollivier-Gooch, C., “Effect of discretization order on preconditioning and convergence of a high-order unstructured Newton–GMRES solver for the Euler equations,” Journal of Computational Physics, Vol. 227, February 2008, pp. 2366–2386.
[266] Michalak, C. and Ollivier-Gooch, C., “Globalized matrix-explicit Newton–GMRES for the high-order accurate solution of the Euler equations,” Computers and Fluids, Vol. 39, No. 7, August 2010, pp. 1156–1167.
[267] Hicken, J. E. and Zingg, D. W., “Aerodynamic optimization algorithm with integrated geometry parameterization and mesh movement,” AIAA Journal, Vol. 48, No. 2, February 2010, pp. 400–413.
[268] Northrup, S. and Groth, C., “Parallel implicit AMR scheme for unsteady reactive flows,” 18th Annual Conference of the CFD Society of Canada, London, Canada, May 2010.
[269] Osusky, M., Hicken, J. E., and Zingg, D. W., “A parallel Newton–Krylov–Schur flow solver for the Navier–Stokes equations using the SBP–SAT approach,” AIAA Paper 2010-116, Orlando, Florida, January 2010.
[270] Osusky, M. and Zingg, D. W., “A parallel Newton–Krylov–Schur flow solver for the Reynolds-averaged Navier–Stokes equations,” AIAA Paper 2012-442, 2012.
[271] Lucas, P., van Zuijlen, A. H., and Bijl, H., “Fast unsteady flow computations with a Jacobian-free Newton–Krylov algorithm,” Journal of Computational Physics, Vol. 229, December 2010, pp. 9201–9215.
[272] Brieger, L. and Lecca, G., “Parallel multigrid preconditioning of the conjugate gradient method for systems of subsurface hydrology,” Journal of Computational Physics, Vol. 142, 1998, pp. 148–162.
[273] Piquet, J. and Vasseur, X., “Multigrid preconditioned Krylov subspace methods for three-dimensional numerical solutions of the incompressible Navier–Stokes equations,” Numerical Algorithms, Vol. 17, 1998, pp. 1–32.
[274] Rider, W. J., Knoll, D. A., and Olson, G. L., “A multigrid Newton–Krylov method for multimaterial equilibrium radiation diffusion,” Journal of Computational Physics, Vol. 152, 1999, pp. 164–191.
[275] Mousseau, V. A., Knoll, D. A., and Rider, W. J., “Physics-based preconditioning and the Newton–Krylov method for non-equilibrium radiation diffusion,” Journal of Computational Physics, Vol. 160, 2000, pp. 743–765.
[276] Knoll, D. A. and Mousseau, V. A., “On Newton–Krylov multigrid methods for the incompressible Navier–Stokes equations,” Journal of Computational Physics, Vol. 163, No. 1, 2000, pp. 262–267.
[277] Knoll, D. A. and Rider, W. J., “A multigrid preconditioned Newton–Krylov method,” SIAM Journal on Scientific Computing, Vol. 21, No. 2, 1999, pp. 691–710.
[278] Jones, J. E. and Woodward, C. S., “Newton–Krylov-multigrid solvers for large-scale, highly heterogeneous, variably saturated flow problems,” Advances in Water Resources, Vol. 24, 2001, pp. 763–774.
[279] Pernice, M. and Tocci, M. D., “A multigrid-preconditioned Newton–Krylov method for the incompressible Navier–Stokes equations,” SIAM Journal on Scientific Computing, Vol. 23, No. 2, 2001, pp. 398–418.
[280] Wu, J., Srinivasan, V., Xu, J., and Wang, C. Y., “Newton–Krylov multigrid algorithms for battery simulation,” Journal of the Electrochemical Society, Vol. 149, No. 10, 2002, pp. A1342–A1348.
[281] Syamsudhuha and Silvester, D. J., “Efficient solution of the steady-state Navier–Stokes equations using a multigrid preconditioned Newton–Krylov method,” International Journal for Numerical Methods in Fluids, Vol. 43, 2003, pp. 1407–1427.
[282] Elman, H. C., Loghin, D., and Wathen, A. J., “Preconditioning techniques for Newton’s method for the incompressible Navier–Stokes equations,” BIT Numerical Mathematics, Vol. 43, 2003, pp. 961–974.
[283] Knoll, D. A. and Keyes, D. E., “Jacobian-free Newton–Krylov methods: a survey of approaches and applications,” Journal of Computational Physics, Vol. 193, 2004, pp. 357–397.
[284] Diosady, L. T. and Darmofal, D. L., “Preconditioning methods for discontinuous Galerkin solutions of the Navier–Stokes equations,” Journal of Computational Physics, Vol. 228, June 2009, pp. 3917–3935.
[285] Spalart, P. R. and Allmaras, S. R., “A one-equation turbulence model for aerodynamic flows,” AIAA Paper 92-0439, January 1992.
[286] Spalart, P. R. and Allmaras, S. R., “A one-equation turbulence model for aerodynamic flows,” La Recherche Aérospatiale, 1994, pp. 5–21.
[287] Ashford, G. A., An unstructured grid generation and adaptive solution technique for high Reynolds number compressible flows, Ph.D. thesis, University of Michigan, 1996.
[288] Pulliam, T. H., “Efficient solution methods for the Navier–Stokes equations,” Lecture notes for the von Karman Institute for Fluid Dynamics Lecture Series: Numerical Techniques for Viscous Flow Computation in Turbomachinery Bladings, January 1986.
[289] Hirsch, C., Numerical computation of internal and external flows, Vol. 2, John Wiley & Sons, 1994.
[290] Anderson, J., Fundamentals of aerodynamics, McGraw-Hill, 2nd ed., 1991.
[291] Patankar, S. V., Numerical Heat Transfer and Fluid Flow, chap. Convection and Diffusion, McGraw-Hill, 1980.
[292] Hicken, J. E. and Zingg, D. W., “Globalization strategies for inexact-Newton solvers,” AIAA Paper 2009-4139, San Antonio, Texas, June 2009.
[293] Ilinca, F. and Pelletier, D., “Positivity preservation and adaptive solution for the k–ε model of turbulence,” AIAA Journal, Vol. 36, No. 1, 1998, pp. 44–50.
[294] Wong, P. and Zingg, D. W., “Three-dimensional aerodynamic computations on unstructured grids using a Newton–Krylov approach,” Computers and Fluids, Vol. 37, No. 2, 2008, pp. 107–120.
[295] Christara, C. C., “Matrix Computations, Numerical Linear Algebra,” course notes, 2001.
[296] Eiermann, M., Ernst, O. G., and Schneider, O., “Analysis of acceleration strategies for restarted minimal residual methods,” Journal of Computational and Applied Mathematics, Vol. 123, 2000, pp. 261–292.
[297] Brown, P. N. and Hindmarsh, A. C., “Matrix-free methods for stiff systems of ODE’s,” SIAM Journal on Numerical Analysis, Vol. 23, No. 3, June 1986, pp. 610–638.
[298] Eiermann, M. and Ernst, O. G., “Geometric aspects of the theory of Krylov subspace methods,” Acta Numerica, 2001, pp. 251–312.
[299] Catinas, E., “Inexact perturbed Newton methods and applications to a class of Krylov solvers,” Journal of Optimization Theory and Applications, Vol. 108, No. 3, 2001, pp. 543–571.
[300] Lyness, J. N. and Moler, C. B., “Numerical differentiation of analytic functions,” SIAM Journal on Numerical Analysis, Vol. 4, No. 2, 1967, pp. 202–210.
[301] Soulaimani, A., Salah, N. B., and Saad, Y., “Acceleration of GMRES convergence for some CFD problems: Preconditioning and stabilization techniques,” Tech. Rep. umsi-2000-165, Minnesota Supercomputer Institute, University of Minnesota, 2000.
[302] Soulaimani, A., Salah, N. B., and Saad, Y., “Enhanced GMRES acceleration techniques for some CFD problems,” International Journal of CFD, Vol. 16, No. 1, March 2002, pp. 1–20.
[303] Saad, Y., “Preconditioned Krylov subspace methods for CFD applications,” Tech. Rep. umsi-94-171, Minnesota Supercomputer Institute, University of Minnesota, 1994.
[304] Saad, Y., “SPARSKIT: a basic tool kit for sparse matrix computations,” Tech. rep., http://www.cs.umn.edu/Research/arpa/SPARSKIT/sparskit.html, 1994.
[305] Hicken, J. E., Efficient algorithms for future aircraft design: Contributions to aerodynamic shape optimization, Ph.D. thesis, University of Toronto, 2009.
[306] Kaveh, A., Zahedi, A., and Laknegadi, K., “A novel ordering algorithm for profile optimization by efficient solution of a differential equation,” International Journal for Computer-Aided Engineering, Vol. 24, No. 6, 2007, pp. 572–585.
[307] Richardson, L. F., “The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam,” Philosophical Transactions of the Royal Society of London, Vol. 210, 1911, pp. 307–357.
[308] Hemker, P. W., “On the order of prolongations and restrictions in multigrid procedures,” Journal of Computational and Applied Mathematics, Vol. 32, No. 3, 1990, pp. 423–429.
[309] Davis, L., “Order-based genetic algorithm and the graph coloring problem,” Handbook of Genetic Algorithms, Van Nostrand Reinhold, 1991.
[310] Zuliani, G., Aerodynamic flow calculations using finite-differences and multigrid, Ph.D. thesis, University of Toronto, 2004.