
A Higher-Order Accurate Unstructured Finite Volume Newton-Krylov Algorithm for Inviscid Compressible Flows

by

AMIR NEJAT

B.Sc. (Aerospace Engineering), AmirKabir University of Technology, 1996

M.Sc. (Aerospace Engineering), AmirKabir University of Technology, 1998

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

(Department of Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

April 2007

© Amir Nejat, 2007

Abstract

A fast implicit (Newton-Krylov) finite volume algorithm is developed for higher-order unstructured (cell-centered) steady-state computation of inviscid compressible flows (Euler equations). The matrix-free Generalized Minimal Residual (GMRES) algorithm is used for solving the linear system arising from implicit discretization of the governing equations, avoiding expensive and complicated explicit computation of the higher-order Jacobian matrix. An Incomplete Lower-Upper factorization technique is employed as the preconditioning strategy and a first-order Jacobian as a preconditioning matrix. The solution process is divided into two phases: start-up and Newton iterations. In the start-up phase an approximate solution of the fluid flow is computed which includes most of the physical characteristics of the steady-state flow. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. At the end of the start-up phase (when the linearization of the flow field is accurate enough for steady-state solution) the solution is switched to the Newton phase, taking an infinite time step and recovering a semi-quadratic convergence rate (for most of the cases). A proper limiter implementation for higher-order discretization is discussed and a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced. The issue of mesh refinement in accuracy measurement for unstructured meshes is revisited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and direct solution error calculation. The accuracy, fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov solver for different speed regimes are demonstrated via several test cases for the 2nd, 3rd and 4th-order discretization. Solutions of different orders of accuracy are compared in detail through several investigations. The possibility of reducing the computational cost required for a given level of accuracy using high-order discretization is demonstrated.


Contents

1 Introduction

1.1 Motivation

1.2 Background

1.2.1 Mesh Generation and Spatial Discretization

1.2.2 Higher-Order Discretization

1.2.3 Implicit Method and Convergence Acceleration

1.3 Objective

1.4 Contributions

1.5 Outline

2 Flow Solver

2.1 Governing Equations

2.2 Implicit Time-Advance

2.3 Upwind Flux Formulation

2.3.1 The Godunov Approach

2.3.2 Roe’s Flux Difference Splitting Scheme

2.4 Boundary Flux Treatment

2.4.1 Wall Boundary Condition

2.4.2 Inlet/Outlet Boundary Conditions

2.5 Flux Integration

3 Reconstruction and Monotonicity

3.1 Introduction

3.2 K-Exact Least-Square Reconstruction

3.2.1 Conservation of the Mean

3.2.2 K-Exact Reconstruction

3.2.3 Compact Support

3.2.4 Boundary Constraints

3.2.4.1 Dirichlet Boundary Constraint

3.2.4.2 Neumann Boundary Constraint

3.2.5 Constrained Least-Square System

3.3 Accuracy Assessment for a Smooth Function

3.4 Monotonicity Enforcement

3.4.1 Flux Limiting

3.4.2 Slope Limiting

4 Flux Jacobian

4.1 What is the Jacobian?

4.2 Flux Jacobian Formulation

4.2.1 Roe’s Flux Jacobian

4.2.2 Boundary Flux Jacobians

4.2.2.1 Wall Boundary Flux Jacobian

4.2.2.2 Subsonic Inlet Flux Jacobian

4.2.2.3 Subsonic Outlet flux Jacobian

4.2.2.4 Supersonic Inlet/Outlet Flux Jacobians

4.3 Finite Difference Differentiation

4.4 Numerical Flux Jacobian

5 Linear Solver and Solution Strategy

5.1 Introduction

5.2 GMRES Linear Solver

5.2.1 The Basic GMRES Algorithm

5.2.2 Matrix-Vector Products Computation in GMRES

5.2.3 Preconditioning

5.2.4 GMRES with Right Preconditioning

5.3 Solution Strategy

5.3.1 Start-up Phase

5.3.2 Newton Phase

6 Results (I): Verification Cases

6.1 Reconstruction Test Cases

6.1.1 Square Test Case

6.1.2 Annulus Test case

6.2 Subsonic Flow Past a Semi-Circular Cylinder

6.3 Supersonic Vortex

6.3.1 Numerical Solution

6.3.2 Solution accuracy measurement

7 Results (II): Simulation Cases

7.1 Subsonic Airfoil, NACA 0012, M = 0.63, α = 2.0°

7.1.1 Solution Process

7.2 Transonic Airfoil, NACA 0012, M = 0.8, α = 1.25°

7.3 Supersonic flow, Diamond airfoil, M = 2.0, α = 0.0

8 Concluding Remarks

8.1 Summary and Contributions

8.2 Conclusions

8.3 Recommended Future Work

8.3.1 Start-up

8.3.2 Preconditioning

8.3.3 Reconstruction

8.3.4 Extension to 3D

8.3.5 Extension to Viscous Flows

Bibliography

List of Tables

1.1 Qualitative illustration of research on solver development

5.1 Ratio of non-zero elements in factorized matrix

6.1 2nd-order error norms for the square case

6.2 3rd-order error norms for the square case

6.3 4th-order error norms for the square case

6.4 2nd-order error norms for the annulus case

6.5 3rd-order error norms for the annulus case

6.6 4th-order error norms for the annulus case

6.7 Sizes and ratios of the control volumes for circular cylinder meshes

6.8 Error norms for total pressure, 2nd-order solution

6.9 Error norms for total pressure, 3rd-order solution

6.10 Error norms for total pressure, 4th-order solution

6.11 Solution error norms, 2nd-order discretization

6.12 Solution error norms, 3rd-order discretization

6.13 Solution error norms, 4th-order discretization

7.1 Mesh detail for NACA 0012 airfoil

7.2 Convergence summary for NACA 0012 airfoil, M = 0.63, α = 2°

7.3 Lift and drag coefficients for all meshes and discretization orders, NACA 0012, M = 0.63, α = 2°, far field size of 25 chords

7.4 Effect of the far field distance on lift and drag coefficients, NACA 0012, M = 0.63, α = 2°

7.5 Convergence summary for NACA 0012 airfoil, M = 0.8, α = 1.25°

7.6 Lift and drag coefficients, NACA 0012, M = 0.8, α = 1.25°

7.7 Convergence summary for diamond airfoil, M = 2.0, α = 0°

7.8 Drag coefficient, diamond airfoil, M = 2.0, α = 0°

List of Figures

1.1 Main approaches in fluid dynamics

1.2 CFD overall algorithm

1.3 Example of a structured and an unstructured mesh over a 2D airfoil

2.1 Propagation of a linear wave in positive direction

2.2 Shock-Tube problem

2.3 Rounding the characteristic slope near zero

2.4 Schematic illustration of Gauss quadrature points

3.1 A typical cell center control volume and its reconstruction stencil, including three layers of neighbors

3.2 Imposing boundary constraint at the Gauss boundary points

3.3 Typical unlimited/limited linear reconstruction

3.4 Using first neighbors for monotonicity enforcement

3.5 Typical unlimited/limited quadratic reconstruction

3.6 Defining σ as a function of φ

4.1 Schematic of Direct Neighbors

4.2 Typical cell-centered mesh numbering

4.3 Total numerical error versus perturbation magnitude

5.1 Linearization of a sample function

6.1 Unstructured meshes for a square domain

6.2 Error-Mesh plot for the square case

6.3 Unstructured meshes for a curved domain (annulus)

6.4 Error-Mesh plot for the annulus case

6.5 Circular domain over half a cylinder, Mesh 1 (1376 CVs)

6.6 Circular cylinder, Mesh 1 (1376 CVs)

6.7 Circular cylinder, Mesh 2 (5539 CVs)

6.8 Circular cylinder, Mesh 3 (22844 CVs)

6.9 Convergence history for the coarse mesh (Mesh 1)

6.10 Convergence history for the fine mesh (Mesh 3)

6.11 2nd-order pressure coefficient contours, Mesh 1

6.12 3rd-order pressure coefficient contours, Mesh 1

6.13 4th-order pressure coefficient contours, Mesh 1

6.14 4th-order pressure coefficient contours, Mesh 3

6.15 Pressure coefficient along the axis

6.16 Close up of the pressure coefficient along the axis (suction region)

6.17 Pressure coefficient along the axis, Mesh 3

6.18 Error in total pressure ratio, 2nd-order discretization, Mesh 1

6.19 Error in total pressure ratio, 3rd-order discretization, Mesh 1

6.20 Error in total pressure ratio, 4th-order discretization, Mesh 1

6.21 Error in total pressure ratio, 4th-order discretization, Mesh 3

6.22 Error-Mesh plot for the total pressure

6.23 Drag coefficient versus mesh size

6.24 Drag coefficient versus CPU time

6.25 Annulus, Mesh 1 (108 CVs)

6.26 Annulus, Mesh 2 (427 CVs)

6.27 Annulus, Mesh 3 (1703 CVs)

6.28 Annulus, Mesh 4 (6811 CVs)

6.29 Annulus, Mesh 5 (27389 CVs)

6.30 Convergence history for the coarse mesh (Mesh 1)

6.31 Convergence history for the fine mesh (Mesh 5)

6.32 2nd-order Mach contours for the coarse mesh (Mesh 1)

6.33 3rd-order Mach contours for the coarse mesh (Mesh 1)

6.34 4th-order Mach contours for the coarse mesh (Mesh 1)

6.35 2nd-order density error for the coarse mesh (Mesh 1)

6.36 3rd-order density error for the coarse mesh (Mesh 1)

6.37 4th-order density error for the coarse mesh (Mesh 1)

6.38 Density, 4th-order solution over the fine mesh (Mesh 5)

6.39 Error-Mesh plot for the solution (Density)

6.40 Error versus CPU Time

7.1 NACA 0012, Mesh 1, 1245 CVs

7.2 NACA 0012, Mesh 2, 2501 CVs

7.3 NACA 0012, Mesh 3, 4958 CVs

7.4 NACA 0012, Mesh 4, 9931 CVs

7.5 NACA 0012, Mesh 5, 19957 CVs

7.6 Cp over the upper surface after start-up, Mesh 1 (1245 CVs)

7.7 Cp over the upper surface after start-up, Mesh 5 (19957 CVs)

7.8 CPU time versus the grid size, NACA 0012, M = 0.63, α = 2°

7.9 Total work unit versus the grid size, NACA 0012, M = 0.63, α = 2°

7.10 Newton phase work unit versus the grid size, NACA 0012, M = 0.63, α = 2°

7.11 Convergence history, NACA 0012, Mesh 1, M = 0.63, α = 2°

7.12 Convergence history, Mesh 5, M = 0.63, α = 2°

7.13 Non-linear residual versus linear system residual, Mesh 3, M = 0.63, α = 2°

7.14 Linear system residual dropping order, Mesh 3, M = 0.63, α = 2°

7.15 Eigenvalue pattern for the preconditioned system, Mesh 3, M = 0.63, α = 2°

7.16 Condition No. of the preconditioned system, Mesh 3, M = 0.63, α = 2°

7.17 Condition No. of the preconditioned system, Mesh 5, M = 0.63, α = 2°

7.18 Lift coefficient convergence history, NACA 0012, Mesh 1, M = 0.63, α = 2°

7.19 Drag coefficient convergence history, NACA 0012, Mesh 1, M = 0.63, α = 2°

7.20 Lift coefficient convergence history, NACA 0012, Mesh 5, M = 0.63, α = 2°

7.21 Drag coefficient convergence history, NACA 0012, Mesh 5, M = 0.63, α = 2°

7.22 NACA 0012, Mesh 3 (4958 CVs)

7.23 2nd-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, α = 2°

7.24 3rd-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, α = 2°

7.25 4th-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, α = 2°

7.26 Mach profile, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

7.27 Mach profile close up, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

7.28 Mesh 1, close-up at the leading edge region

7.29 Mach profile, lower side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

7.30 Mach profile, upper side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

7.31 Mach profile close up, upper side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

7.32 Mach profile, lower side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

7.33 $1 - P_t/P_{t\infty}$, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

7.34 $1 - P_t/P_{t\infty}$, lower side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

7.35 $1 - P_t/P_{t\infty}$, upper side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

7.36 $1 - P_t/P_{t\infty}$, lower side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

7.37 Mach profile at the end of start-up process, M = 0.8, α = 1.25°

7.38 Convergence history for NACA 0012, M = 0.8, α = 1.25°

7.39 2nd-order Mach contours, NACA 0012, M = 0.8, α = 1.25°

7.40 3rd-order Mach contours, NACA 0012, M = 0.8, α = 1.25°

7.41 4th-order Mach contours, NACA 0012, M = 0.8, α = 1.25°

7.42 limiter φ (3rd-order), NACA 0012, M = 0.8, α = 1.25°

7.43 limiter σ (3rd-order), NACA 0012, M = 0.8, α = 1.25°

7.44 Mach profile, NACA 0012, M = 0.8, α = 1.25°

7.45 Mach profile in shock regions, NACA 0012, M = 0.8, α = 1.25°

7.46 Mesh 7771 CVs, diamond airfoil, M = 2.0, α = 0.0

7.47 Cp at the end of start-up process, diamond airfoil, M = 2.0, α = 0.0

7.48 Convergence history for diamond airfoil, M = 2.0, α = 0°

7.49 2nd-order Mach contours, diamond airfoil, M = 2.0, α = 0.0

7.50 3rd-order Mach contours, diamond airfoil, M = 2.0, α = 0.0

7.51 4th-order Mach contours, diamond airfoil, M = 2.0, α = 0.0

7.52 2nd-order Cp, diamond airfoil, M = 2.0, α = 0.0

7.53 3rd-order Cp, diamond airfoil, M = 2.0, α = 0.0

7.54 4th-order Cp, diamond airfoil, M = 2.0, α = 0.0

List of Symbols

Roman Symbols

a speed of sound

A area, Jacobian matrix

b right-hand side (Ax = b)

B fixed iteration matrix

C center

CL lift coefficient

CD drag coefficient

Cp specific heat at constant pressure, pressure coefficient

Cv specific heat at constant volume

D reconstruction solution vector (derivatives)

e specific internal energy

E total energy, error

F flux vector

G Gauss (integration) point

h specific enthalpy, mesh length scale

H Hessenberg matrix

I identity matrix


J Jacobian matrix

k subspace size

K order of accuracy, polynomial order, constant in Venkatakrishnan limiter

l length of the control volume face, norm

L norm, left eigenvector

m number of restarts

M Mach number, moments, preconditioner matrix, coefficient matrix in reconstruction

n normal vector

N total number of ...

p polynomial order

P static pressure, polynomial

r residual, distance

R gas constant, residual, radius, right eigenvector

S slope in higher-order limiter

t time, exponent of the weighting in reconstruction

T static temperature

U solution vector, conservative variables

u, v velocity components

V primitive variables, velocity, subspace

x, y Cartesian coordinates, unknown vectors

z preconditioning vector


Greek Symbols

α angle of attack

γ specific heat ratio

ε perturbation parameter

λ eigenvalue, wave speed

ρ density

σ higher-order limiter

φ slope limiter

ω relaxation factor

Superscripts

i iteration index

k, K order, iteration number, subspace number

m, n polynomial exponent

Subscripts

b boundary

c center

CD central difference

CV control volume

DB Dirichlet boundary

FD forward difference

G Gauss point

i inner, control volume index


in inlet

L left side

m magnitude

min, max minimum and maximum

n normal, normalized

N neighbor

NB Neumann boundary

o outer

out outlet

ref reference

R right side

RB reconstructed at the boundary

SubIn subsonic inlet

SubOut subsonic outlet

t total

W wall

x, y Cartesian directions

Acknowledgments

I would like to express my deep appreciation to my research supervisor, Dr. Carl Ollivier-Gooch, for all of his guidance, support and patience. His remarkable feedback throughout the course of this research was very helpful.

I would also like to thank Dr. Chen Greif from the Computer Science Department. Attending his valuable lectures on sparse matrix solvers and having the opportunity to engage in discussions with him have greatly benefited my research.

I am grateful to all my colleagues in the ANSLab research group in the Mechanical Engineering Department, especially Chris Michalak, Serge Gosselin, and Harsha Perera, for their computer assistance on many occasions.

Finally, my most sincere and deepest appreciation is for my great family, especially my mother “Homa” and my father “Mehdi”, whose constant support, encouragement, patience and love I have enjoyed all through my life.


Chapter 1

Introduction

Prediction of fluid flow quantities, such as pressure, velocity, and temperature, and the study of flow behavior are the main goals in the field of fluid dynamics. Like other science and engineering disciplines, fluid dynamics has greatly benefited from the development of computing technology (numerical algorithms and computational tools) over the last four decades, resulting in the creation of a new approach in the field known as Computational Fluid Dynamics (CFD), Fig. (1.1). CFD has shown remarkable capability for fluid flow analysis both in academia and industry. CFD has not only made possible the simulation of flows (such as reentry of space vehicles) for which complete analysis (either by theory or experiment) was impossible before, but has also provided a valuable source of feedback and information for improving theoretical and experimental fluid dynamics. Increasing computing power over the last two decades has resulted in the development of new computational techniques and algorithms, enhancing the versatility of CFD application. Nowadays, CFD is not just a research tool; it is used extensively and successfully in industry throughout the design process, from preliminary design to shape optimization.

1.1 Motivation

In the field of computational aerodynamics, the final goal is the accurate simulation of the flow field around (and/or inside) complex 3D geometries to compute aerodynamic force coefficients. In the mid-1980s, Jameson [36] was the first person to compute the three-dimensional flow over realistic aerodynamic configurations using the finite volume technique (a robust conservative numerical approach for discretization of the fluid flow equations over general meshes [38]); since then tremendous progress has been made in this area and application of CFD for aerodynamic computations has revolutionized the process of aerodynamic design [27].

To simulate the flow field around a 3D complex geometry accurately, a CFD package should

include three essential parts:

1. State of the art mesh generation capability. Mesh generation, or domain discretization, is one of the most important parts (if not the most!) in CFD simulations. Since the discretization of the fluid flow equations is carried out on the mesh, without a good domain discretization the CFD solution can be very inaccurate. Furthermore, generating an appropriate mesh, especially for complex geometries, is in practice the most time-consuming part for a CFD user and certainly is not a trivial task. According to a real aerodynamic case study in aerospace engineering, the mesh preparation time can be up to 45 times longer than the required computation time for the fluid flow simulation [47]. Therefore, it is both desirable and necessary to reduce the mesh preparation time, and there is a large potential gain in automating this process. Ideally, meshing software should be able to generate a geometrically and physically suitable mesh around or inside a complex 3D geometry with a reasonable user workload. Also, the user should be able to refine the mesh according to geometric parameters and to adapt the mesh based on flow features without excessive effort. The unstructured mesh technique, among other types of mesh generation methods, is a very good candidate to address mesh generation issues due to its automation capability in generating meshes for complex geometries and its flexibility in refinement and adaptation.

2. Accurate physical modeling of high Reynolds number turbulent flow. Most practical engineering applications such as aircraft aerodynamics involve turbulent flows. The Direct Numerical Simulation (DNS) of turbulent flows for practical purposes is not feasible, at least for the next couple of decades, due to computing technology limitations (memory and speed). Therefore, modeling the turbulent flow is the only viable approach in high Reynolds number CFD simulations and will remain a major active research area for the foreseeable future [7, 75]. Discussion of the physical and numerical criteria for choosing an appropriate turbulence model, and of related issues, is beyond the scope of this thesis, but accurate physical modeling is a key part of valid CFD simulation for practical engineering problems. It should be noted that the modeling of physical phenomena in CFD simulations is not limited to turbulence, but is also essential for combustion, multiphase flows, hydromagnetic flows, and other types of fluid flows.


[Diagram blocks: Experimental Fluid Dynamics; Theoretical Fluid Dynamics; Applied Math; Numerical Algorithms; Computing Tools; Computational Fluid Dynamics]

Figure 1.1: Main approaches in fluid dynamics

3. A robust, efficient, accurate flow solver for the generic mesh. Numerical solution of the fluid flow equations is what CFD is all about, and this solution must be stable and converge to the correct answer at a reasonable cost. A solver algorithm includes three separate components: discretization of the fluid flow equations, numerical flux computation, and updating the solution, mostly through time advance methods. Fig. (1.2) shows the overall schematic of a CFD algorithm. Current CFD flow solver algorithms still have some limitations both in terms of efficiency and accuracy, especially for simulation of physically complicated flows. Specifically, the memory and speed of current computers (even using parallel techniques) do not yet allow us to simulate physically complex flows in realistic geometries, with sufficient accuracy, in a short time and at a reasonable cost. As a result, we have to simplify the physics of the fluid flow via approximate modeling, and neglect some of the numerical/physical issues caused by insufficient mesh density (especially for complex geometries), limiting the validity of the simulation and adversely affecting its application.

Before proceeding with further details, it should be mentioned that this thesis is aimed only at the third component mentioned above: improving the efficiency and accuracy of CFD algorithms.

The numerical error in a simulation can be written in the form of $h^p$, where h is the mesh length scale and p is the discretization order of accuracy. Clearly, then, improving the accuracy of a numerical simulation (modeling issues aside) is possible by means of increasing the mesh density (using a finer grid, i.e. decreasing h), and/or increasing the discretization order. Reducing the mesh length scale can be achieved by global or local mesh refinement or adaptation.

Order          Structured   Structured   Unstructured   Unstructured
               Explicit     Implicit     Explicit       Implicit
Second-Order   ♣♣♣♣♣♣♣      ♣♣♣♣         ♣♣♣♣           ♣♣
Higher-Order   ♣♣           ♣            ♣              ?

Table 1.1: Qualitative illustration of research on solver development
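As a small illustration of the $h^p$ error model discussed above, the following Python sketch shows how much the error shrinks under mesh refinement for different orders, and how an observed order can be estimated from the errors on two meshes. The function names and the synthetic error values (generated from the model itself with an assumed constant C = 0.5) are illustrative assumptions, not data or code from this thesis.

```python
import math


def error_reduction_factor(refinement_ratio: float, order: int) -> float:
    """Factor by which an error of the form C * h**p shrinks
    when the mesh length scale h is divided by refinement_ratio."""
    return refinement_ratio ** order


def observed_order(h_coarse: float, err_coarse: float,
                   h_fine: float, err_fine: float) -> float:
    """Estimate p from errors on two meshes, assuming err = C * h**p
    holds on both with the same constant C."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)


# Effect of halving h for the orders considered in this thesis.
for p in (2, 3, 4):
    factor = error_reduction_factor(2.0, p)
    print(f"order {p}: halving h divides the error by {factor:.0f}")

# Synthetic errors built from the model (assumed C = 0.5, p = 3).
C, p_true = 0.5, 3
h1, h2 = 0.1, 0.05
p_est = observed_order(h1, C * h1 ** p_true, h2, C * h2 ** p_true)
print(f"estimated order: {p_est:.3f}")
```

Under this model, halving h divides the error by 4, 8 and 16 for p = 2, 3 and 4 respectively, which is the basic reason a higher-order scheme can reach a given accuracy on a coarser mesh.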

Nearly all modern CFD codes use second-order methods, which produce a diffusive error proportional to $h^2$ due to diffusive derivatives, besides any artificial viscosity added for stability purposes (in central difference schemes). For instance, in a second-order 2D finite volume formulation in Cartesian coordinates, this numerical error for each control volume can be written in the following form:

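$$\text{Numerical diffusive error} = \frac{\partial^2 U}{\partial x^2}\,\frac{\Delta x^2}{2} + \frac{\partial^2 U}{\partial x\,\partial y}\,\Delta x\,\Delta y + \frac{\partial^2 U}{\partial y^2}\,\frac{\Delta y^2}{2} \qquad (1.1)$$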

This leading-order error term causes two significant numerical problems. First, it smears sharp gradients in convection-dominated parts of the flow and spoils the conservation of total pressure in isentropic regions of the flow field, as it acts like a (numerical) viscosity. Second, this numerical diffusivity produces parasitic error in viscous regions by adding extra diffusion, which is very grid dependent, to the solution. Therefore, using a high-resolution numerical scheme for discretization is quite desirable. Application of discretization orders higher than second order, both for structured and unstructured grids, has been an area of ongoing research for the last two decades [35, 10], and will be the focus of this thesis.

However, convergence of high resolution schemes is not as efficient as second-order schemes

especially for unstructured grids [88, 46] due to the increased complexity of the discretiza-

tion, decreased damping (lack of diffusive damping) and adding more error modes (which

must be damped in the solution process). Consequently implementing a higher-order un-

structured discretization within an implicit framework to achieve the efficient convergence

is extremely helpful if not necessary! As shown in Fig(1.2), up to the flux integral compu-

tation, the overall CFD algorithm is the same, but the choice of the time advance technique

in updating the solution changes the level of complexity of the solution process completely.

[Figure 1.2: CFD overall algorithm. Flowchart boxes: Physics & Fluid Flow Equations; Geometry & Solution Domain; Mesh Generation Package; Discretized Domain; Boundary & Initial Conditions; Discretization of the Fluid Flow Equations over the Discretized Domain; Flux Integral; then either Explicit Time Advance (Multistage Techniques, Solution Update) or Implicit Time Advance (Flux Integral Linearization, Preconditioning, Sparse Matrix Solver, Solution Update).]

Integration of the discretized equations in time can be done either explicitly or implicitly.

In the explicit time integration the space discretization is performed at the previous time

level using the known flow quantities found at the previous time iteration. In the implicit

time integration both the space and the time discretizations are performed at the current

time level where the flow quantities are needed as unknowns. Equation (1.2) shows a typical

unsteady fluid flow PDE where the right hand side represents the spatial discretization and

the left hand side shows the time derivative. For example, employing the first-order forward

differencing time advance technique in the explicit and implicit forms leads to Eq. (1.3)

and Eq. (1.4).

$$\frac{dU}{dt} = -R(U) \tag{1.2}$$

Explicit time advance:

$$\frac{U_i^{n+1} - U_i^{n}}{\Delta t} = -R(U_i^{n}) \tag{1.3}$$

Implicit time advance:

$$\frac{U_i^{n+1} - U_i^{n}}{\Delta t} = -R(U_i^{n+1}) \tag{1.4}$$

While an explicit update just needs multistage integration of the flux integral using a Runge-

Kutta type scheme, an implicit update requires linearization of the flux integral (Chapter

2) and constructing a large linear system which requires an efficient sparse matrix solver

(discussed in Chapters 4 and 5). Efficiently solving a large linear system, especially with

an ill-conditioned matrix resulting from a higher-order discretization, demands effective

preconditioning which adds to the complexity of the process. But the error reduction of

each solution update in implicit integration is far larger than the explicit one, since implicit

methods do not suffer from the stability issues of explicit methods and large time steps can

be taken. On balance, therefore, it is preferable to bear the complexity of the algorithm

and accelerate the solution toward the steady-state in a relatively small number of implicit

iterations.
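The stability argument can be illustrated with a minimal sketch of Eqs. (1.3) and (1.4) for a scalar linear model residual $R(U) = \lambda U$. This model and the coefficient `lam` are assumptions made for illustration and are not taken from the thesis; the steady state of the model is $U = 0$.

```python
def explicit_step(u, dt, lam):
    """Eq. (1.3) for R(U) = lam * U: residual evaluated at the known level n."""
    return u - dt * lam * u


def implicit_step(u, dt, lam):
    """Eq. (1.4) for R(U) = lam * U: the unknown level n+1 appears in R,
    so a (here scalar) linear system (1 + dt * lam) * u_new = u is solved."""
    return u / (1.0 + dt * lam)


def march(step, u0, dt, lam, n_steps):
    """Apply `step` repeatedly and return the final value."""
    u = u0
    for _ in range(n_steps):
        u = step(u, dt, lam)
    return u


if __name__ == "__main__":
    lam, dt = 100.0, 0.1        # dt is five times the explicit limit 2 / lam
    print(march(explicit_step, 1.0, dt, lam, 10))   # grows in magnitude
    print(march(implicit_step, 1.0, dt, lam, 10))   # decays toward 0
```

For this model the explicit update is stable only for $\Delta t < 2/\lambda$, while the implicit update damps the solution for any positive $\Delta t$, at the price of solving a linear system at every step.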

Table 1.1 provides a qualitative summary on finite volume solver development research,

where the number of symbols represents the approximate volume of the research on the

solver development since the early 1980s (based on the author’s survey). Clearly the trend in

solver development is moving toward:

1. Unstructured meshes to address the mesh generation issues for complex geometries

2. Higher-order discretizations to increase the global solution accuracy

3. Implicit techniques to improve the efficiency of the solution process

This research is mainly intended to contribute to the development of efficient and accurate

flow solvers, filling the gap in high-order implicit methods on unstructured meshes.


1.2 Background

This section provides an overview of relevant aspects of current CFD solvers including

discretization type and order as well as implicit algorithms. This overview is complemented

by detailed discussion of previous work related to each part of the solver in the relevant

chapters of the thesis.

1.2.1 Mesh Generation and Spatial Discretization

Numerical simulation of fluid flow consists of two main parts: discretization of the flow field

around or inside the geometry by a finite number of cells (grid generation) and solution of

the fluid flow equations over the discretized domain (flow solver). Structured and unstruc-

tured meshes, Fig (1.3), are the most common types of the grids used in CFD applications.

In a structured mesh, all cells and vertices have the same topology but in an unstruc-

tured mesh, elements can have irregular and variable topologies. The task of generating

structured grids around complex configurations has proved to be a considerable challenge.

Sophisticated structured approaches such as multi-block grid generation have resolved this

issue by dividing the domain between the body and far field into simple geometrical blocks;

structured grids are generated inside each block. However, automation of the blocking

procedure is still a relatively difficult job [76]. Another structured approach is overlap or

chimera grids. In this technique, the computational domain is divided into multiple zones

and a suitable grid is generated in each zone. The chimera approach allows zones to over-

lap, and interpolation routines are used to transmit data between the overset grids in the

flow solver. However, generalizing the grid generation and adaptation in this approach is

still not an easy task and demands a high level of expertise as well as considerable effort.

Furthermore, interpolation between the blocks and overlapped meshes has its own issues

and can introduce additional error. The most powerful approach for complex geometries is

unstructured grids (typically triangular in 2D and tetrahedral in 3D). Unstructured grids

have a higher flexibility in refinement based on the geometry and adaptation based on the

solution features and gradients. Therefore, unstructured meshes are one of the most suitable

choices for complex geometries; associated solvers are becoming more common in modern

CFD applications and promise to be more capable and successful for complex aerodynamic

problems [88].

[Figure 1.3: Example of a structured and an unstructured mesh over a 2D airfoil. Panels: (a) Structured, (b) Unstructured]

The fluid flow equations (PDEs) generally are discretized in one of the following forms:

finite difference, finite element or finite volume. Finite difference is the pointwise

representation of the flow field where the flow equations are solved only for variables defined at

grid points. The finite difference scheme was the original approach to CFD problems

and it is well suited for structured grids. Therefore, its higher-order implementation can

be easily achieved by employing higher-order differencing formula. However, the finite dif-

ference discretization does not conserve mass, momentum and energy of the flow which is

an important issue for most practical applications such as shock capturing. Furthermore,

and more importantly, the finite difference discretization cannot be implemented on

unstructured grids.

The finite element method, another discretization technique, is one of the most complete and

well-established mathematical approaches for the numerical solution of PDEs. In this method,

the flow equations are multiplied by a test function and then integrated over the discretized

domain. The solution is represented by a local basis function (interpolation function) for

each element. Finite element method is very flexible both in terms of theory and appli-

cation and it can be easily used for unstructured meshes. Its high-order extension is also

fairly common by employing higher-order basis and test functions. The challenging part in

application of the finite element for CFD computation is again the conservation of the flow

equations especially for non-smooth flows. Although conserving the mass, momentum and

energy is possible in the finite element formulation, it is not an easy task, and finite

element codes require considerable fine tuning in shock capturing.

The finite volume approach is designed based on the conservation of mass, momentum


and energy. The solution is represented by control volume averages and equations are

discretized over the volume integrals. Like the finite element method, it is very flexible

for complex geometries and unstructured meshes. At the same time its robustness for

nearly all CFD applications, especially shock capturing problems, is well established. Higher-

order application of finite volume methods is possible by using a higher-order polynomial

inside each control volume, where the average of the polynomial over the control volume

equals the control volume average.

Jameson and Mavriplis [37] reported some of the earliest unstructured finite volume CFD

results, solving two-dimensional inviscid flow on regular triangular grids obtained by subdi-

viding quadrilateral grids; central differencing was used. Their approach was second-order

(linear distribution). The artificial viscosity and the second-order truncation error made

this approach relatively highly diffusive for general irregular unstructured meshes.

Second-order upwind schemes (discretization of the flow equations based on the physical

waves propagation directions) have also been used on unstructured grids either through

Green-Gauss gradient technique or least-squares linear reconstruction method. Applying

upwind schemes for unstructured grids is more complicated than for structured grids espe-

cially for higher-order approximation. For the unstructured case, over each finite volume

(triangle in 2D) a polynomial approximation to the solution is reconstructed with the help

of the neighboring control volumes, and then the Riemann (shock tube) problem is solved

approximately at the control volume interfaces. One of the most successful approaches in

applying the upwind scheme was undertaken by Barth [10]. In this approach, Barth defined

a general upwind formulation (in multi-dimensions), introducing the minimum energy least-

squares reconstruction procedure for flux calculation up to the desired order of accuracy.

However, any upwind scheme of higher than first order (a first-order scheme being monotone by nature) often

causes oscillations in the vicinity of sharp gradients and discontinuities which can produce

instability problems. For example, Agarwal and Halt [6] proposed a compact higher-order

scheme for solution of Euler equations over unstructured grids using explicit time integra-

tion. Considerable over and undershoots in their transonic airfoil case (3rd-Order) were

evident. A common solution to that is using limiters, which enforce monotonicity at the

expense of adversely affecting both accuracy and convergence. Barth and Jespersen [13]

introduced a multi-dimensional limiter to achieve the monotonic solution. Although that

approach was quite successful in suppressing oscillations, reaching full convergence was not

possible even after freezing the limiter because the limiter was not differentiable. Since

then several attempts have been made to design a differentiable limiter for unstructured

grid solvers, and Venkatakrishnan’s limiter [87] seems to be one of the most robust. That

limiter does not strictly enforce monotonicity but allows only small overshoots in the con-


verged solution. However, it preserves accuracy especially for smooth regions where there

are local extrema. Although this limiter shows better convergence behavior, it still has some

convergence issues with implicit solvers. Designing an appropriate limiter for higher-order

unstructured grid solvers is a fairly unexplored topic so far and needs to be addressed for

practical higher-order unstructured application.
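The differentiability issue discussed above can be illustrated with a minimal sketch comparing a non-differentiable limiter function of the Barth-Jespersen type, $\phi = \min(1, y)$, with the smooth function usually quoted for Venkatakrishnan's limiter, $\phi = (y^2 + 2y)/(y^2 + y + 2)$. Here $y$ is assumed to be the ratio of the allowed increment to the unlimited reconstructed increment, and the $\epsilon^2$ smoothing term of the published limiter is omitted. The function names are hypothetical, and the limiter actually used in this thesis (Chapter 3) may differ.

```python
def limiter_min(y):
    """Non-differentiable limiter function of Barth-Jespersen type."""
    return min(1.0, y)


def limiter_smooth(y):
    """Smooth limiter function commonly attributed to Venkatakrishnan
    (written without its epsilon**2 smoothing term)."""
    return (y * y + 2.0 * y) / (y * y + y + 2.0)


def slope(f, y, dy=1e-6):
    """Central-difference estimate of d(phi)/dy."""
    return (f(y + dy) - f(y - dy)) / (2.0 * dy)


if __name__ == "__main__":
    for y in (0.9, 1.1):
        # the slope of limiter_min jumps across y = 1; limiter_smooth varies gradually
        print(y, slope(limiter_min, y), slope(limiter_smooth, y))
```

The jump in slope of the first function at $y = 1$ is the kind of non-smooth behavior that can stall convergence, whereas the second function has a continuous derivative.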

Another more sophisticated approach to cure oscillations in compressible flow computation

is the essentially non-oscillatory (ENO) family of schemes. These schemes are uniformly

accurate and prevent oscillations in the non-smooth regions by detecting discontinuity and

modifying the reconstruction stencil from cell to cell and time level to time level [30]. ENO

schemes are computationally expensive and sacrifice fast convergence because of their dy-

namic stencils [34]. Weighted ENO (WENO) schemes were developed to address the prob-

lems caused by dynamic stencils. Near discontinuities, weighted ENO schemes ([3, 57, 28])

remove the effect of non-smooth data in the reconstruction stencil by giving it an asymptot-

ically small weight. However no comprehensive convergence analysis and/or computational

cost studies are presented. Furthermore, performance of ENO/WENO schemes in the con-

text of implicit time advance, which is one of the most efficient solution strategies, has not

yet been studied.

1.2.2 Higher-Order Discretization

For structured meshes, application of higher-order algorithms has progressed considerably

and it has been shown that, for practical levels of accuracy, using a higher-order accurate

method can be more efficient both in terms of solution time and memory usage. With

higher-order accurate methods, the cost of flux computation, integration, and other associ-

ated numerical calculations increase per control volume. However, as we can use a coarser

mesh, computation time and memory are saved overall and accuracy can be increased as

well. De Rango and Zingg [66] applied a globally third-order accurate algorithm for steady

turbulent flow over a 2-D airfoil using a structured grid. They showed this approach can

lead to a dramatic reduction in numerical error in drag using relatively coarse grids, and

the results provide a convincing demonstration of the benefits of higher-order methods for

practical flows. Zingg et al. [96] compared different flux discretization techniques with

higher-order accuracy for laminar and turbulent flows (including transition) both in sub-

sonic and transonic speed regimes. Extending the conclusion of the previous research, it

was shown that the higher-order discretization produces solutions of a given accuracy much

more efficiently than the second-order methods. More aspects of implementation of higher-

order methods have been discussed by De Rango and Zingg [67], and convergence behavior


of this approach has been studied in detail. Again a higher-order algorithm has been ap-

plied for calculation of the flow around the multi-element airfoil using the multi-block grid

technique by the same researchers [68]. A grid convergence study in this research showed

that the higher-order discretization produces a substantial reduction in the numerical errors

in the flow field in comparison with the second-order algorithm. This smaller error has been

achieved on a grid several times coarser than the grid which had been used for the second-

order algorithm. In summary, these studies show that achieving the desired accuracy in

practical aerodynamic flows using higher-order algorithms not only is possible but also has

some advantages.

Research in high-order unstructured solvers is motivated by the desire to combine the ac-

curacy and efficiency benefits seen in the application of high-order methods on structured

meshes with the geometric and adaptive flexibility of unstructured meshes. The application

of methods having higher than second-order accuracy for solving the compressible Euler

and Navier-Stokes equations on unstructured meshes has not been thoroughly investigated

yet and remains an active research topic.

Several researchers have achieved higher-order accuracy by the use of the finite element

method. Bey and Oden [17] have used a discontinuous Galerkin method to reach fourth-

order accuracy for smooth flows. Bassi and Rebay [15] have used a new discontinuous

element for discretization of Euler equations and have computed the compressible flow over

a simple unstructured grid.

The finite volume approach has received more attention. Barth and Fredrickson [12] derived

a general condition for a scheme to be higher-order accurate, including a reconstruction pro-

cedure satisfying the properties of conservation of mean, K-exactness and compact support

(these criteria are discussed in detail in Chapter 3). They also proposed a minimum energy

(least-squares) reconstruction to calculate the required polynomial coefficients. Delanaye and

Essers [23] proposed a quadratic reconstruction finite volume scheme for compressible flows

on unstructured adaptive grids. The overall accuracy of the scheme was second-order. The

inviscid flux was computed directly from their quadratic polynomials; however, diffusive

derivatives were obtained through a linear interpolation. For monotonicity enforcement, a

discontinuity detector was introduced and higher-order terms in reconstructed polynomials

were dropped in the vicinity of discontinuities. Ollivier-Gooch and Van Altena [60] have

analyzed a new approach for higher-order accurate finite-volume discretization for diffusive

fluxes that is based on the gradients computed during solution reconstruction; fourth-order

accurate solution for the advection-diffusion problem has been computed. Recently Nejat

and Ollivier-Gooch [48] developed an implicit higher-order unstructured solver for Poisson’s


equation. They clearly showed the possibility of reducing computational cost required for

a given level of solution accuracy using higher-order discretization over an unstructured

stencil for certain fluid problems.
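The conservation-of-mean and K-exactness properties mentioned above for higher-order reconstruction can be illustrated with a minimal 1D sketch. It is a deliberate simplification: it assumes a uniform 1D mesh and an exactly determined three-cell stencil, rather than the least-squares reconstruction on unstructured 2D stencils described in the cited work, and all function names are hypothetical.

```python
def cell_average_quadratic(c0, c1, c2, xc, h):
    """Exact average of u(x) = c0 + c1*x + c2*x**2 over [xc - h/2, xc + h/2]."""
    return c0 + c1 * xc + c2 * (xc * xc + h * h / 12.0)


def reconstruct_quadratic(u_left, u_mid, u_right, h):
    """Coefficients (a, b, c) of p(s) = a + b*s + c*s**2, with s = x - x_mid,
    whose averages over three uniform cells match the given cell averages."""
    b = (u_right - u_left) / (2.0 * h)
    c = (u_right - 2.0 * u_mid + u_left) / (2.0 * h * h)
    a = u_mid - c * h * h / 12.0      # conservation of the mean in the middle cell
    return a, b, c


if __name__ == "__main__":
    h, xm = 0.1, 0.5
    averages = [cell_average_quadratic(1.0, 2.0, 3.0, x, h) for x in (xm - h, xm, xm + h)]
    # For quadratic data the reconstruction recovers the exact local expansion
    print(reconstruct_quadratic(averages[0], averages[1], averages[2], h))
```

Note that the constant term `a` differs from the cell average `u_mid` by `c * h**2 / 12`: it is the mean of the polynomial, not its value at the cell center, that is constrained to equal the control volume average.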

1.2.3 Implicit Method and Convergence Acceleration

Flow features, especially in physically complicated flows, vary greatly in size. In particular,

time and length scales associated with physical phenomena like turbulence and combustion

can be very disparate. The use of millions of cells, which is common today in practical

simulations, results in global length scales spanning the computational domain which are

several orders of magnitude larger than the smallest scales resolved by neighboring grid

points. With such disparate length scales, and such large, numerically stiff problems,

efficient time integration and solution convergence are real challenges for the solution of the

resulting discrete systems.

Generally explicit integration methods (multi-stage Runge-Kutta family schemes), even with

the help of acceleration techniques, such as local time stepping and residual smoothing, still

show slow convergence behavior for steady-state solution of large and/or stiff CFD problems.

Implicit methods are a fairly common and efficient approach for the steady-state solution of

the fluid flow equations. Regardless of the space discretization technique, finding solutions

to fluid flow problems requires solving a large linear system resulting from the linearization

of fluid flow equations in time (temporal discretization).

In the limit of very large time steps, implicit time advance schemes approach Newton’s

method. Newton methods, which will be discussed in Chapter 5, have been used in CFD

since the late 1980s and are considered an attractive approach for solution convergence of

steady flows due to their property of quadratic convergence (when starting from a good

initial guess). In early attempts direct methods were employed for solving the linear system

arising at each Newton iteration [85, 89, 8]. While direct solvers have been developed for

stiff linear systems, the size of the systems of equations arising in CFD makes applying

direct methods impossible in practice. Therefore, using iterative linear solvers with proper

preconditioning, which is a crucial factor for complex problems, is the only reasonable choice

for solving the linear system in each Newton iteration. At the same time, the cost of each

iteration in terms of CPU time and memory usage for a pure Newton method is relatively

high. Quasi-Newton methods can have satisfactory convergence behavior, lower memory

usage and less cost per iteration at the expense of increasing total number of Newton

iterations and losing quadratic convergence rate [62]. Quasi-Newton methods are generally

categorized as Approximate Newton and Inexact Newton methods.


In Approximate Newton methods the flux Jacobian on the left hand side (arising from

linearization) either is computed through some simplifications or is evaluated based on

lower-order discretization, while the flux integral on the right hand side is evaluated up to

the desired order of accuracy. In either case the linearization is done approximately and

although the Jacobian matrix has simpler structure and is better conditioned (i.e. easier

to invert), the overall convergence rate of the non-linear problem will be degraded. This

approach is also known as the defect correction technique, and it is useful when there are

memory limitations and storing the full Jacobian is impractical, specifically in 3D. If the true

Jacobian is very stiff and solving the linear system of the true linearization is challenging,

Approximate Newton may work better than the original Newton method. This is often true

in the early stage of Newton iterations when a good starting solution is not yet available.

Therefore, Approximate Newton is a very good candidate for the start-up process, especially

if the solution process is started from a poor initial guess.
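The trade-off described above can be illustrated with a minimal scalar sketch. The residual $R(u) = u^3 - 8$ is a hypothetical model problem, and the "approximate" Jacobian is simply the true derivative frozen at the initial guess. This stands in for the simplified or lower-order Jacobian discussed in the text and is not the thesis's linearization.

```python
def residual(u):
    return u ** 3 - 8.0          # model nonlinear residual with root u = 2


def true_jacobian(u):
    return 3.0 * u ** 2


def solve(jacobian, u0, tol=1e-12, max_iter=200):
    """Iterate u <- u - R(u) / J(u); returns (u, iterations used)."""
    u = u0
    for k in range(max_iter):
        if abs(residual(u)) < tol:
            return u, k
        u -= residual(u) / jacobian(u)
    return u, max_iter


u0 = 3.0
frozen = true_jacobian(u0)       # simplified Jacobian: evaluated once and kept
u_newton, n_newton = solve(true_jacobian, u0)
u_approx, n_approx = solve(lambda u: frozen, u0)

if __name__ == "__main__":
    print("Newton:", u_newton, n_newton)
    print("Approximate Newton:", u_approx, n_approx)
```

Both iterations reach the same root because the right hand side is the full residual in each case. Only the convergence rate changes: the frozen Jacobian converges linearly and needs many more iterations than the quadratically convergent Newton iteration.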

In the second category, Inexact Newton methods [25], the complete linearization based on

the flux integral on the right hand side is employed and the true Jacobian is calculated.

However, the resultant linear system at each Newton iteration is solved approximately by

an iterative linear solver. For highly non-linear problems such as compressible flows, the

linearized system, especially at the initial iterations is not an accurate representation of the

non-linear problem. As a result, completely solving the linear system does not improve the

overall convergence rate. Instead, the linear system is solved up to some tolerance criteria

which normally is chosen as a fraction (typically between $10^{-1}$ and $10^{-2}$) of the flux integral

on the right hand side.
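A minimal sketch of an Inexact Newton iteration follows, under stated assumptions: the two-unknown residual is a hypothetical model problem with a root at (1, 2), and the inner linear solver is a simple steepest-descent iteration on the linear residual norm, used here only as a compact stand-in for GMRES. The inner solve stops as soon as the linear residual falls below `rel_tol` times the norm of the nonlinear residual.

```python
import math


def residual(u):
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]   # root at (1, 2)


def jacobian(u):
    return [[2.0 * u[0], 1.0], [1.0, 2.0 * u[1]]]


def matvec(a, x):
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]


def norm(x):
    return math.sqrt(x[0] ** 2 + x[1] ** 2)


def inexact_linear_solve(a, rhs, rel_tol, max_iter=500):
    """Steepest descent on ||rhs - a*d||^2, stopped once the linear residual
    has dropped to rel_tol * ||rhs|| (a simple stand-in for GMRES)."""
    at = [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]
    d = [0.0, 0.0]
    r = rhs[:]
    for _ in range(max_iter):
        if norm(r) <= rel_tol * norm(rhs):
            break
        g = matvec(at, r)                      # descent direction A^T r
        ag = matvec(a, g)
        alpha = (g[0] ** 2 + g[1] ** 2) / (ag[0] ** 2 + ag[1] ** 2)
        d = [d[0] + alpha * g[0], d[1] + alpha * g[1]]
        r = [r[0] - alpha * ag[0], r[1] - alpha * ag[1]]
    return d


def inexact_newton(u, rel_tol=0.1, tol=1e-10, max_iter=100):
    """Newton iteration whose linear systems are solved only to rel_tol."""
    for k in range(max_iter):
        res = residual(u)
        if norm(res) < tol:
            return u, k
        d = inexact_linear_solve(jacobian(u), [-res[0], -res[1]], rel_tol)
        u = [u[0] + d[0], u[1] + d[1]]
    return u, max_iter


if __name__ == "__main__":
    print(inexact_newton([1.5, 1.5]))
```

Loosening `rel_tol` reduces the work per Newton iteration but typically increases the number of Newton iterations, which is the trade-off the tolerance range quoted above is meant to balance.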

Among iterative linear solvers, Krylov subspace methods are the most common, and amongst

these, the Generalized Minimal Residual (GMRES) [72] algorithm (discussed in detail in

Section 5.2) has been developed mainly for non-symmetric systems such as those resulting from

unstructured meshes. In the matrix-free GMRES, the matrix vector products required by

the GMRES algorithm are computed without forming the matrix explicitly. Matrix-free

GMRES [14] is a very attractive technique for dealing with complicated Jacobian matri-

ces, because it reduces memory usage considerably and eliminates the problem of explicitly

forming the Jacobian matrix. This is especially helpful for higher-order unstructured mesh

solvers where full (analytic) Jacobian calculation is extremely costly and difficult, if not

impossible. GMRES efficiency depends strongly on the conditioning of the linear system.

This is especially important for higher-order discretization, which makes the Jacobian ma-

trix more off-diagonally dominant and quite ill-conditioned, and for the Euler equations

(compressible flow) with the non-linear flux function and possible discontinuities in the so-

lution. Applying a good preconditioner for GMRES under these circumstances becomes a


necessity [49].
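One common way to obtain the matrix-vector products without forming the Jacobian is a finite-difference directional derivative of the residual, $J v \approx [R(U + \epsilon v) - R(U)]/\epsilon$. The sketch below illustrates this idea on a hypothetical two-unknown residual. The perturbation size `eps` and its scaling by the norm of `v` are assumptions chosen for illustration, not the thesis's settings.

```python
import math


def residual(u):
    """Hypothetical nonlinear residual R(U) of a two-unknown model problem."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] + math.sin(u[1])]


def jacobian_vector_product(res, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference of the residual along v,
    so the Jacobian matrix itself is never formed or stored."""
    vnorm = math.sqrt(sum(x * x for x in v))
    if vnorm == 0.0:
        return [0.0 for _ in v]
    step = eps / vnorm                         # scale the perturbation to |v|
    r0 = res(u)
    r1 = res([ui + step * vi for ui, vi in zip(u, v)])
    return [(b - a) / step for a, b in zip(r0, r1)]


def exact_jacobian_vector_product(u, v):
    """Analytic J(u) v for the model residual, used only for comparison."""
    return [2.0 * u[0] * v[0] + v[1], v[0] + math.cos(u[1]) * v[1]]


if __name__ == "__main__":
    u, v = [1.0, 2.0], [0.5, -1.5]
    print(jacobian_vector_product(residual, u, v))
    print(exact_jacobian_vector_product(u, v))
```

Each product costs one extra residual evaluation, so the memory saving comes at the price of additional flux computations per Krylov iteration.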

As a result, there are a wide variety of Quasi-Newton methods, where the forming of the

Jacobian, solving the linear system, choosing the preconditioner and starting up the solution

process are the key factors in overall performance and robustness of the solver.

For structured meshes, Pueyo and Zingg [63, 64, 65] presented an efficient matrix-free

Newton-GMRES solver for steady-state aerodynamic flow computations. They investigated

the efficiency of different Quasi-Newton methods, Incomplete Lower-Upper (ILU) precondi-

tioning strategies (see section 5.2.3), and reordering techniques for a variety of compressible

inviscid, laminar and turbulent flows using a GMRES iterative solver. They showed that

the Approximate Newton method using matrix-free GMRES-ILU(2) with Reverse Cuthill-

McKee reordering, when the lower order Jacobian is employed as the preconditioner matrix,

provides the best efficiency in terms of CPU time for most cases. Later on Nichols and Zingg

[54] developed a 3D multi-block Newton-Krylov solver for the Euler equations using the

same approach. Through parametric study, they showed that ILU(1) provides an adequate

balance between good preconditioning and low computational time.

For unstructured meshes, Venkatakrishnan and Mavriplis [90] developed an approximate

Newton-GMRES implicit solver for computing compressible inviscid and turbulent flows

around a multi-element airfoil. They compared different preconditioning strategies and

found out that GMRES with ILU preconditioning had the best performance. In their case

the graph of the linearized Jacobian and the unstructured mesh were the same, as the

Jacobian was approximated based on the direct neighbors, and ILU(0) demonstrated

satisfactory results. Barth and Linton [14] successfully applied both full matrix and matrix-free

Newton-GMRES for computing turbulent compressible flow on unstructured meshes. They

also presented a technique for constructing matrix-vector products which is an exact calcu-

lation of the directional derivatives. Delanaye et al. [24] presented an ILU preconditioned

matrix-free Newton-GMRES solver for Euler and Navier-Stokes Equations on unstructured

adaptive grids using quadratic reconstruction. This study shows that ILU(0) preconditioning

can be insufficient for reaching full convergence in stiff problems when the Jacobian on the

left hand side is higher-order. By permitting one more fill-level in the ILU decomposition

(ILU(1)), full convergence was achieved. A totally matrix-free implicit Newton-GMRES

method was introduced by Luo et al. [44] for 3D compressible flows using LU-SGS (Lower-

Upper Symmetric Gauss-Seidel) as the preconditioning strategy. They completely elimi-

nated the storage of the preconditioning Jacobian matrix by approximating the Jacobian

with numerical fluxes. However, most probably because of stability considerations for

their preconditioning strategy (LU-SGS), full Newton iterations were not performed and


the convergence rate remained nearly linear. Manzano et al. [45] presented an efficient ILU

preconditioned matrix-free Newton-Krylov (GMRES) algorithm for 3D unstructured meshes.

They used different levels of fill (ILU(1-3)) depending on the case and the flux residual to

achieve optimum performance. Nejat and Ollivier-Gooch [50] developed an LU-SGS

preconditioned matrix-free Newton-GMRES algorithm for higher-order inviscid compressible flow

computations. Their results show that LU-SGS preconditioned GMRES works almost as efficiently for the

third-order discretization as for the second-order one. They also presented a supersonic

airfoil case in which the third-order discretization converged faster than the second-order

discretization. That again raised the possibility that using higher-order discretization could

in fact increase both accuracy and efficiency.

1.3 Objective

To develop an efficient and accurate algorithm for a compressible viscous flow (i.e. Navier-

Stokes equations), the first step is to develop the base flow solver for a compressible inviscid

flow (i.e. Euler equations). It is worth mentioning that the extension of an inviscid solver to

the viscous solver will not affect the overall algorithm in principle, although application of

anisotropic/hybrid meshes as well as computing the viscous flux function (instead of inviscid

one) are required.

The goal of this research is to develop an efficient and accurate solution algorithm for inviscid

compressible fluid flow simulations using unstructured meshes. To achieve this objective

an existing higher-order unstructured reconstruction procedure [60] is combined with an

efficient implicit time advance algorithm.

1.4 Contributions

A fast and efficient implicit (matrix-free) algorithm is designed and successfully implemented

for the 2nd, 3rd and 4th order accurate steady-state computation of inviscid compressible

flows. The robustness and accuracy of the developed solver have been verified through

several test cases. It should be noted that the 4th-order unstructured finite volume solution

for inviscid compressible flows has not been available prior to this research, and the currently

available 3rd-order results in the literature lack curved boundary implementation.


The solution process has been divided into two separate phases, a start-up phase and a

Newton phase. A defect correction procedure is proposed for the start-up phase consisting

of multiple implicit pre-iterations.

This research provides a deep insight into preconditioning, a comprehensive convergence

comparison, and a meaningful cost breakdown study for the implicit algorithm for the 2nd,

3rd and 4th-order unstructured discretization methods.

A differentiable switch is designed for the limiting of the higher-order terms in the recon-

struction polynomial in the vicinity of discontinuities.

Accuracy assessment is performed for a series of independently generated irregular

unstructured meshes, which to the author’s knowledge is unprecedented at this extent and

proves the overall accuracy of the developed solver algorithm.

Unstructured finite volume solutions of the 2nd, 3rd and 4th-order methods have been

compared in detail for subsonic, transonic and supersonic cases.

The possibility of reducing computational cost required for a given level of accuracy using

high-order unstructured discretization is demonstrated.

1.5 Outline

In Chapter 2, the main flow solver is introduced, including governing equations, implicit time

advance algorithm (linearization of the flux in time), upwind flux formulation, boundary

flux treatment, and flux integration.

In Chapter 3, the employed reconstruction procedure, monotonicity enforcement and higher-

order limiter implementation are discussed; a new formula for limiting the higher-order terms

of the reconstruction polynomial is introduced including a smooth switch.

In Chapter 4, the Jacobian matrix calculation used in preconditioning is described. A first-

order approximate analytical Jacobian is derived for the interior and boundary fluxes. Also,

a similar low-order Jacobian computation using the finite difference technique is introduced

in addition to further discussion on the accuracy of this finite difference Jacobian.

Chapter 5 lays out the solution strategy and introduces the matrix-free GMRES linear

solver (with special higher-order considerations) and the applied preconditioning in detail.


Chapters 6 and 7 are devoted to the numerical results, which include the verification cases

(Part I) and simulation cases (Part II) respectively. First the accuracy and performance of

the developed solver are verified for basic (not necessarily easy) test cases. Then fast con-

vergence and robustness of the proposed higher-order unstructured Newton-Krylov solver

have been investigated for subsonic, transonic, and supersonic flows. Furthermore solutions

of different orders of accuracy for all test cases have been compared in detail.

Finally, in Chapter 8, the thesis is brought to closure by summarizing the research,

describing the contributions, providing conclusions based on the results, and recommending some

future work.

Chapter 2

Flow Solver

2.1 Governing Equations

Conservation of mass (continuity), momentum (Newton’s second law) and energy (the first

law of thermodynamics) are the principal equations which govern the dynamics of all fluid

flows. To simulate a fluid flow field, we need to solve these equations with proper implemen-

tation of physical boundary conditions. If dissipation, viscous transport phenomena, and

thermal conductivity are neglected in a fluid flow, the flow is considered to be non-viscous

or inviscid. Leonhard Euler, the Swiss scientist for the first time derived (1775) inviscid

fluid flow equations, known today as Euler equations. Euler equations are the limit form of

the Navier-Stokes equations, where Reynolds number goes to infinity. For many practical

aerodynamic applications, Euler flow is a relatively accurate representation of the flow field

and includes both rotational and discontinuous (shock) phenomena in the flow providing an

excellent approximation for lift, induced drag and wave drag. Also, a robust Euler solver is

an essential part of any Navier-Stokes solver.

The finite volume formulation of the unsteady 2D Euler equations for an arbitrary control

volume can be written in the following form of a volume and a surface integral (2.1), where

U is the solution vector in conservative variables, and F is the flux vector (see footnote 1).

\[
\frac{d}{dt} \int_{CV} U \, dV + \oint_{CS} F \, dA = 0
\tag{2.1}
\]

Footnote 1: More formally, F should be referred to as a dyad.


where

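\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ E \end{bmatrix}, \qquad
F = \begin{bmatrix} \rho u_n \\ \rho u u_n + P n_x \\ \rho v u_n + P n_y \\ (E + P) u_n \end{bmatrix}
\tag{2.2}
\]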

In (2.2), $u_n = u n_x + v n_y$ and $[\rho \;\; \rho u \;\; \rho v \;\; E]^T$ are the densities of mass, x-momentum,
y-momentum and energy, respectively. The energy is related to the pressure by the perfect
gas equation of state (i.e. $P = \rho R T$). The following equations (or definitions) relate
thermodynamic properties such as energy and enthalpy to density, temperature, pressure
and velocity in the Euler equations:

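\[
E = \frac{P}{\gamma - 1} + \rho \, \frac{u^2 + v^2}{2}
\tag{2.3}
\]

\[
\gamma = \frac{C_p}{C_v}, \qquad C_p = \frac{\gamma R}{\gamma - 1}, \qquad C_v = \frac{R}{\gamma - 1}, \qquad R = C_p - C_v
\tag{2.4}
\]

\[
e = C_v T, \qquad e_t = e + \frac{1}{2}(u^2 + v^2), \qquad E = \rho e_t, \qquad h = e + \frac{P}{\rho}
\tag{2.5}
\]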

In (2.3), E is the total energy and γ is the ratio of specific heats. In (2.4) R is the gas

constant, Cp and Cv are the specific heats. Finally in (2.5) e is the internal energy per unit

mass (specific energy), et is the total energy per unit mass, and h is the specific enthalpy.

For a compressible flow, most of the flow properties are described as a function of Mach

number, M , which is a non-dimensional form of the flow speed, and it is defined as the ratio

of velocity magnitude to the local sound speed, a.

\[
M = \frac{\sqrt{u^2 + v^2}}{a}, \qquad a = \sqrt{\frac{\gamma P}{\rho}} = \sqrt{(\gamma - 1) h} = \sqrt{\gamma R T}
\tag{2.6}
\]

Normally for aerodynamic applications, density is of the order of 1, velocity is of the order
of $10^2$, and energy is of the order of $10^5$. As a result, the solution vector, U, and the flux
vector, F, (2.2) in their dimensional form have quantities with very different scales. This
introduces considerable potential for numerical errors, as well as unnecessary stiffness of
the resultant linearized system of equations (next section). To avoid that issue we need to
normalize our system of equations in such a way that all quantities become of order one.
This can be done by normalizing the quantities with respect to some properly chosen reference
values, such as free stream values. While normalization does not change the form of the original
equations, it transforms them to the proper non-dimensional form. However, some of the
correlations between the properties may look different. For example, the next few
lines show part of this process for the Euler equations:

Reference quantities:

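\[
\rho_{ref} = \rho_\infty, \qquad T_{ref} = T_\infty, \qquad P_{ref} = \gamma P_\infty, \qquad R_{ref} = R_\infty = \text{Const.}
\tag{2.7}
\]

Non-Dimensional quantities:

\[
\rho_n = \frac{\rho}{\rho_{ref}}, \qquad T_n = \frac{T}{T_{ref}}, \qquad P_n = \frac{P}{P_{ref}}, \qquad R_n = \frac{R}{R_{ref}} = 1
\tag{2.8}
\]

It is not difficult to show that the speed of sound in its non-dimensional form is equal to
the square root of normalized temperature, i.e. $a_n = \sqrt{T_n}$.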
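The statement about the non-dimensional speed of sound can be checked in two lines. The short derivation below is a sketch that assumes the velocity reference is the free stream speed of sound $a_\infty$ (consistent with $P_{ref} = \gamma P_\infty = \rho_\infty a_\infty^2$ and with $a_\infty = 1$ used in Section 2.4.2); this choice is inferred, not stated explicitly in (2.7).

\[
P_n = \frac{P}{\gamma P_\infty} = \frac{\rho R T}{\gamma \rho_\infty R T_\infty} = \frac{\rho_n T_n}{\gamma},
\qquad
a_n^2 = \frac{a^2}{a_\infty^2} = \frac{\gamma R T}{\gamma R T_\infty} = T_n
\quad \Rightarrow \quad a_n = \sqrt{T_n}
\]

The same result follows from (2.6) written in normalized variables, $a_n^2 = \gamma P_n / \rho_n = T_n$, which is also the relation used later for the boundary density, $\rho_b = \gamma P_b / T_b$.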

In general, the mathematical characteristics of the fluid flow equations depend on the flow
speed regime and they change with Mach number. For instance, the system of steady
Euler equations is elliptic in space for subsonic flow and hyperbolic in space for
supersonic flow. If the fluid flow equations are written in their unsteady form, they are
hyperbolic in time independent of the speed regime. As a result, it is relatively easy to
integrate the fluid flow equations in time instead of space. The unsteady Euler equations
are a set of first-order nonlinear PDEs which form a hyperbolic system in time, and therefore
properties propagate along the characteristic lines. To obtain a steady-state solution, the time
marching process is continued until all time derivatives fall below some
tolerance.

2.2 Implicit Time-Advance

Assuming the discretized physical domain does not change in time, U can be brought out

from the integral in Eq. (2.1), as the average solution vector of the control volume:


\[
\frac{dU_i}{dt} = -\frac{1}{A_{CV_i}} \oint_{CS_i} F \, dA
\tag{2.9}
\]

The left hand side of (2.9) is the first time derivative of the average solution vector in each

control volume and the right hand side represents the spatial discretization for the same

control volume. Right hand side of (2.9) is called flux integral or residual of the control

volume, which is a nonlinear function of solution vector, and can be re-written in the form

of (2.10) for control volume i :

\[
\frac{dU_i}{dt} = -R(U_i)
\tag{2.10}
\]

To form an implicit expression for (2.10), we need to evaluate both sides of the equation at

the time level “n+1”. This can be done by backward time differencing (backward Euler) of

the left hand side and residual linearization in time for the right hand side.

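\[
\frac{U_i^{n+1} - U_i^n}{\Delta t} = -R(U_i^{n+1}) = -\left[ R(U_i^n) + \frac{\partial R}{\partial U}\left(U_i^{n+1} - U_i^n\right) + O\left(\left(U_i^{n+1} - U_i^n\right)^2\right) \right]
\tag{2.11}
\]

or

\[
\left( \frac{I}{\Delta t} + \frac{\partial R}{\partial U} \right) \delta U_i = -R(U_i^n), \qquad U_i^{n+1} = U_i^n + \delta U_i
\tag{2.12}
\]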

where $\partial R / \partial U$ is the Jacobian matrix resulting from residual linearization. Equation (2.12)

is a large linear system of equations which should be solved at each time step to obtain

an update for the vector of unknowns. With this approach the accuracy in time is first

order; however, as we are only interested in steady-state solution, time accuracy is not our

concern, and advancing the solution in time continues till the residual of the non-linear

problem practically converges to zero.

In Eq. (2.12), the Jacobian matrix, $\partial R / \partial U$, is an essential element in the implicit formulation, and
setting the Jacobian equal to zero is equivalent to explicit time advance. Note that in the case

of Euler flux, Jacobian calculation is not a trivial task and it takes a considerable amount of

effort. The degree of difficulty of Jacobian evaluation depends on the type of the employed

flux formula and its complexity. The details of Jacobian evaluation will be discussed in


Chapter 4. Once the Jacobian has been computed, the other major task of an implicit solver
is solving the resulting large linear system at each time iteration; this system is often very
ill-conditioned and far from diagonally dominant. Due to the size and sparsity structure (graph)
of the matrix, subspace iterative
methods are the most appropriate choice for solving such a system (Chapter 5).

Coding an implicit formulation is much more complicated than similar explicit formulation

and efficient parallelization of implicit solvers is a relatively difficult process. However,

implicit methods have a critical advantage over explicit methods which is stability and fast

convergence. Implicit methods do not suffer from restrictive stability condition of explicit

methods. In theory they are unconditionally stable and by taking large time steps steady-

state solution is reached rapidly. If we take an infinite time step Eq. (2.12) becomes

Newton iteration (2.13). Newton’s method generally converges quadratically in the vicinity

of solution. However, near singularities the convergence rate may degrade from quadratic

rate to linear rate [14].

\[
\left( \frac{\partial R}{\partial U} \right) \delta U_i = -R(U_i^n), \qquad U_i^{n+1} = U_i^n + \delta U_i
\tag{2.13}
\]
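The relation between the linearized backward Euler update (2.12) and the Newton iteration (2.13) can be illustrated on a small model problem. The sketch below is not the thesis solver: the two-equation `residual` and its `jacobian` are hypothetical stand-ins for the flux integral and its linearization, chosen only so that the example runs. Passing `dt = np.inf` makes the `I/dt` term vanish, which reproduces (2.13).

```python
import numpy as np

def residual(U):
    # Hypothetical 2-equation residual standing in for the flux integral R(U).
    return np.array([U[0] ** 2 + U[1] ** 2 - 4.0, U[1] - U[0]])

def jacobian(U):
    # Analytical dR/dU of the toy residual above.
    return np.array([[2.0 * U[0], 2.0 * U[1]],
                     [-1.0, 1.0]])

def implicit_update(U, dt):
    # One step of Eq. (2.12): (I/dt + dR/dU) dU = -R(U^n).
    # With dt = inf the identity term is zero and this is Eq. (2.13).
    lhs = np.eye(U.size) / dt + jacobian(U)
    return U + np.linalg.solve(lhs, -residual(U))

def march(U0, dt, n_steps):
    # Repeat the implicit update and record the residual norm after each step.
    U = np.array(U0, dtype=float)
    history = []
    for _ in range(n_steps):
        U = implicit_update(U, dt)
        history.append(np.linalg.norm(residual(U)))
    return U, history
```

With a finite time step the residual of this toy problem decreases at a roughly constant rate per iteration, while the infinite time step reaches machine precision in a handful of iterations once the iterate is close to the root, which mirrors the behaviour described in the text.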

In practice, taking a time step too large is not beneficial when the solution process is started

from a poor initial guess. Because Euler flow is highly non-linear, the linearization of the

flow equations is not an accurate representative of the original non-linear problem at an early

stage of the iterations. Therefore any update based on a large time step (such as Newton

update) would be inaccurate, often causing instability or stall in convergence. Normally, a

start-up process is needed to advance the solution from an initial guess to a good solution

state for taking large time step and eventually switching to Newton iteration.

One strategy for this start-up process is defect correction, in which the flux integral on the

right hand side is computed to the desired order of accuracy while on the left hand side the

linearization is done based on a lower-order discretization (normally first-order):

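\[
\left( \frac{I}{\Delta t} + \left( \frac{\partial R}{\partial U} \right)_{low} \right) \delta U_i = -R_{high}(U_i^n), \qquad U_i^{n+1} = U_i^n + \delta U_i
\tag{2.14}
\]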

With this approach, the higher-order Jacobian computation which is very expensive and

inaccurate in early stage of iterations is avoided. As the linear system based on lower-order

discretization has simpler structure in terms of the matrix bandwidth and is better condi-

tioned, solving the resultant system is easier and more stable especially for stiff Jacobians.

The defect correction linear system is relatively inexpensive to form and it can be precon-

ditioned effectively by the same matrix. In addition, early high-frequency oscillations are


damped efficiently when a moderate time step is used which is necessary anyhow for the

start up process.
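The defect correction idea of (2.14) can be sketched on a much simpler model than the Euler equations. The example below assumes a steady 1D advection problem $u_x = \cos x$ on $[0, 1]$ with $u(0) = 0$ (exact solution $\sin x$); the left hand side uses a first-order upwind operator and the right hand side a second-order upwind residual. The grid size, the ratio `cfl` $= \Delta t / h$ and all function names are illustrative assumptions.

```python
import numpy as np

def upwind_operators(n, h):
    """Backward-difference matrices acting on u_1..u_n, with inflow value u_0 = 0."""
    shift1 = np.eye(n, k=-1)  # picks u_{i-1}
    shift2 = np.eye(n, k=-2)  # picks u_{i-2}
    a_low = (np.eye(n) - shift1) / h                                  # first order
    a_high = (3.0 * np.eye(n) - 4.0 * shift1 + shift2) / (2.0 * h)    # second order
    a_high[0] = a_low[0]  # the first cell has a single upwind neighbour
    return a_low, a_high

def defect_correction(n=20, cfl=2.0, tol=1e-11, max_iter=200):
    h = 1.0 / n
    x = h * np.arange(1, n + 1)
    source = np.cos(x)                    # exact steady solution is u = sin(x)
    a_low, a_high = upwind_operators(n, h)
    lhs = np.eye(n) / (cfl * h) + a_low   # I/dt + (dR/dU)_low with dt = cfl * h
    u = np.zeros(n)
    for iteration in range(1, max_iter + 1):
        r_high = a_high @ u - source      # higher-order residual R_high(u)
        res = np.linalg.norm(r_high, np.inf)
        if res < tol:
            break
        u += np.linalg.solve(lhs, -r_high)
    return x, u, res, iteration
```

The point of the sketch is that the converged solution satisfies the higher-order residual equation even though only the low-order operator is ever inverted; the low-order matrix affects the convergence path, not the final answer.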

Another start-up technique is using Eq. (2.12) based on the higher-order Jacobian but taking
small time steps at the early stage of the solution process. Because ∆t is small, the linear
system is diagonally dominant or at least better conditioned, and consequently easier
to solve. The drawback is that for complicated flows such as transonic flow, the higher-order
Jacobian is very ill-conditioned, and to be able to solve the linear system we need
to choose a very small ∆t, which may not accelerate the solution convergence efficiently.

In practice one may combine the above strategies with mesh sequencing, starting the solution

process on coarser meshes and then transferring it to a final fine mesh. First-order solution

also could be used as an initial guess either independently or in combination with other

strategies. In general, it is hard to say which strategy or combination will work best for the

start-up process as it depends on the problem and type of linear solver (Chapter 5).

2.3 Upwind Flux Formulation

To compute the control volume residual or flux integral, we need to evaluate the numerical

flux at Gauss points located on control volume boundaries and integrate along the boundary

edges. In other words, residual or flux integral computation includes three steps:

1. Reconstructing solution vector V (in primitive variables) over each control volume up

to the desired accuracy (next chapter).

2. Computing the numerical flux vector at the control volume interfaces based on the

reconstructed V.

3. Integrating numerical flux along the control volume interfaces using Gauss quadrature

rule.

To describe an upwind scheme, it is helpful to start from a first-order linear wave equation,

Eq.(2.15), where λ is the wave speed. For a positive value of λ, this represents the propa-

gation of a wave from left to the right (positive direction) along the x axis, as it is shown

in Fig(2.1).

\[
\frac{\partial u}{\partial t} + \lambda \frac{\partial u}{\partial x} = 0
\tag{2.15}
\]


Figure 2.1: Propagation of a linear wave in positive direction

Based on the physics and the model equation, it is evident that the information in the field

is propagating with the wave speed and in the wave direction. Therefore, the properties at

point i are influenced by the propagated properties originating from point i−1 and properties

at point i + 1 will not physically affect the properties at point i. As a result it makes sense

that in discretization of the wave equation, $\partial u / \partial x$ is replaced by $(u_i - u_{i-1}) / \Delta x$ or by any other type of

backward differencing. It is well understood that using central or forward differencing would

result in non-physical and/or oscillatory behavior and typically instability in solution. In

other words, any numerical scheme used in discretization of the fluid flow equations should

respect the physical characteristics of the fluid flow, i.e. the velocity and direction of the

propagated information throughout the flow field.
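The effect of the differencing direction can be reproduced with a few lines of code. The sketch below assumes forward Euler time stepping, a periodic domain and a CFL number $c = \lambda \Delta t / \Delta x$ with $\lambda > 0$; these choices, and the function name `advect`, are illustrative and not taken from the thesis.

```python
import numpy as np

def advect(u0, cfl, n_steps, scheme):
    """Advance the linear wave equation (2.15) with lambda > 0 on a periodic grid."""
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        if scheme == "upwind":
            du = u - np.roll(u, 1)       # backward difference: u_i - u_{i-1}
        elif scheme == "downwind":
            du = np.roll(u, -1) - u      # forward difference: u_{i+1} - u_i
        else:
            raise ValueError("scheme must be 'upwind' or 'downwind'")
        u = u - cfl * du
    return u
```

For a square pulse and `cfl = 0.5`, the upwind result stays within the bounds of the initial data (it is only smeared), while the forward-differenced result grows without bound after a few dozen steps, in line with the argument above.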

The unsteady Euler equation, which is hyperbolic in time, behaves somewhat similarly to

the wave equation, although it is much more complex, has non-linearity, and is in vector

form. In Euler (inviscid compressible) flow, information travels along characteristic lines,
and the slopes of these lines are the eigenvalues of the Jacobian matrix, $\partial F / \partial U$, which is the
derivative of the Euler flux vector with respect to the solution vector, Eq. (2.16).


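\[
\frac{\partial F}{\partial U} =
\begin{bmatrix}
0 & n_x & n_y & 0 \\
(\gamma - 1) e_k n_x - u u_n & u_n - (\gamma - 2) u n_x & u n_y - (\gamma - 1) v n_x & (\gamma - 1) n_x \\
(\gamma - 1) e_k n_y - v u_n & v n_x - (\gamma - 1) u n_y & u_n - (\gamma - 2) v n_y & (\gamma - 1) n_y \\
\left((\gamma - 1) e_k - h_t\right) u_n & h_t n_x - (\gamma - 1) u u_n & h_t n_y - (\gamma - 1) v u_n & \gamma u_n
\end{bmatrix}
\tag{2.16}
\]

\[
e_t = e + e_k, \qquad e_k = \frac{1}{2}(u^2 + v^2), \qquad h_t = h + e_k
\]

\[
\lambda_i = \begin{bmatrix} u_n & u_n & u_n + a & u_n - a \end{bmatrix}
\tag{2.17}
\]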

These eigenvalues, Eq. (2.17), describe the direction and speed of disturbance (information)

propagation in the flow field. Obviously, the upwinding direction is determined based on the

direction of these characteristic lines. Several categories of upwind methods such as Flux

Vector Splitting (FVS), Flux Difference Splitting (FDS) and Advection Upstream Splitting

Method (AUSM) have been developed during the last three decades for flux evaluation and

have been successfully employed in compressible flow computation. Hirsch [35] and Laney

[41] reviewed upwind schemes in detail.
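The link between the flux Jacobian (2.16) and the characteristic speeds (2.17) can be verified numerically. The sketch below builds the matrix entry by entry from (2.16) and compares its eigenvalues, computed with `numpy.linalg.eigvals`, against $u_n$, $u_n$, $u_n \pm a$. The function names, the perfect gas value $\gamma = 1.4$ and the requirement that $(n_x, n_y)$ be a unit vector are assumptions of this example.

```python
import numpy as np

def flux_jacobian(u, v, ht, nx, ny, gamma=1.4):
    """Normal flux Jacobian dF/dU of Eq. (2.16) for a unit normal (nx, ny)."""
    ek = 0.5 * (u * u + v * v)
    un = u * nx + v * ny
    g1 = gamma - 1.0
    return np.array([
        [0.0, nx, ny, 0.0],
        [g1 * ek * nx - u * un, un - (gamma - 2.0) * u * nx, u * ny - g1 * v * nx, g1 * nx],
        [g1 * ek * ny - v * un, v * nx - g1 * u * ny, un - (gamma - 2.0) * v * ny, g1 * ny],
        [(g1 * ek - ht) * un, ht * nx - g1 * u * un, ht * ny - g1 * v * un, gamma * un],
    ])

def characteristic_speeds(rho, u, v, P, nx, ny, gamma=1.4):
    """Eigenvalues of Eq. (2.17): u_n, u_n, u_n + a, u_n - a."""
    a = np.sqrt(gamma * P / rho)
    un = u * nx + v * ny
    return np.array([un, un, un + a, un - a])

def total_enthalpy(rho, u, v, P, gamma=1.4):
    # h_t = h + e_k with h = gamma * P / ((gamma - 1) * rho) for a perfect gas.
    return gamma * P / ((gamma - 1.0) * rho) + 0.5 * (u * u + v * v)
```

A quick check is to pick a normalized state (for example $\rho = 1$, $P = 1/\gamma$, so that $a = 1$) and confirm that the sorted eigenvalues of `flux_jacobian` agree with the sorted output of `characteristic_speeds`.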

2.3.1 The Godunov Approach

S.K. Godunov in 1959 introduced an elegant idea in numerical flux calculation [35]. He

suggested the use of the locally exact Euler solution for the boundary interfaces of domain

cells. The solution in each cell is considered to be piecewise constant, and the cell interface

separates two different solution states, i.e. UL and UR (Fig ( 2.2)). Knowing the initial

solution, evolution of the flow to the next time step can be exactly calculated through the

wave interactions originating at the boundaries between two adjacent cells. This leads to

solving the shock tube or Riemann problem. The Riemann problem has an exact solution

composed of a shock wave, a contact discontinuity and an expansion fan separating regions

with constant solution states. Propagating the aforementioned waves over a time interval

$\Delta t$ changes those regions and their states from the solution at $t = t_1$ to that at $t = t_1 + \Delta t$.

All waves propagate consistently with the upwind directions, making the resultant solution

depend only on the physical zone of dependence. As shock tube problem requires solving

a non-linear algebraic equation, which is quite time consuming, implementing an exact


Figure 2.2: Shock-tube problem ($P_L > P_R$; states $U_L$ and $U_R$ separated by a discontinuity interface at $t = t_1$; expansion waves, contact surface and shock wave at $t = t_1 + \Delta t$)

Riemann solver for upwind flux evaluation would not be efficient (see footnote 2). Therefore, different
approximate Riemann solvers have been developed to reduce the computation cost through
some averaging procedure. Roe's approximate Riemann solver, introduced in the early
1980s [69], is one of the most successful and commonly employed approximate Riemann
solvers in modern CFD, and it is used in this research.

2.3.2 Roe’s Flux Difference Splitting Scheme

Roe’s approximate Riemann solver [70] is based on a characteristic decomposition of the

flux differences and is a clever extension of the linear wave decomposition to the non-linear

shock tube problem keeping the conservation properties of the scheme. The detail of Roe’s

scheme is quite complicated [41] and here for brevity just the flux formulation is presented.

The Roe’s flux formula at the interface of a control volume is equal to the average fluxes of

left and right states minus a differencing term which splits the difference of the fluxes on

both sides of the control volume:

\[
F(U_L, U_R) = \frac{1}{2}\left[ F(U_L) + F(U_R) \right] - \frac{1}{2} \left| \tilde{A} \right| (U_R - U_L)
\tag{2.18}
\]

Footnote 2: There is some evidence suggesting that an exact Riemann solver for polytropic gases can be competitive with most approximate techniques [31].


$\tilde{A}$ is the Jacobian matrix evaluated based on Roe's average properties, and $|\tilde{A}|$ is written
in diagonalized form in practice as:

\[
\left| \tilde{A} \right| = X \left| \tilde{\Lambda} \right| X^{-1}, \qquad \left| \tilde{\Lambda} \right| = \mathrm{Diag}(|\lambda_i|)
\tag{2.19}
\]

where the columns of X are the right eigenvectors and Λ is the diagonal matrix of the eigenvalues of the Jacobian matrix [71].
Roe's average properties are given by [69]:

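\[
\tilde{\rho} = \sqrt{\rho_L \rho_R}, \qquad
\tilde{u} = \frac{\sqrt{\rho_L}\, u_L + \sqrt{\rho_R}\, u_R}{\sqrt{\rho_L} + \sqrt{\rho_R}}, \qquad
\tilde{v} = \frac{\sqrt{\rho_L}\, v_L + \sqrt{\rho_R}\, v_R}{\sqrt{\rho_L} + \sqrt{\rho_R}}, \qquad
\tilde{h}_t = \frac{\sqrt{\rho_L}\, h_{tL} + \sqrt{\rho_R}\, h_{tR}}{\sqrt{\rho_L} + \sqrt{\rho_R}}
\tag{2.20}
\]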
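A compact way to see how (2.18) and (2.20) fit together is to assemble the flux numerically. The sketch below is a minimal illustration under stated assumptions, not the implementation used in this work: it evaluates the Jacobian of (2.16) at the Roe-averaged velocity and total enthalpy, obtains the matrix absolute value from a numerical eigen-decomposition with `numpy.linalg.eig` (columns of `X` are right eigenvectors) instead of analytical eigenvectors, and omits the eigenvalue correction of (2.21). All function names and the value `GAMMA = 1.4` are hypothetical.

```python
import numpy as np

GAMMA = 1.4  # assumed perfect-gas ratio of specific heats

def conservative(rho, u, v, P):
    E = P / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v)   # Eq. (2.3)
    return np.array([rho, rho * u, rho * v, E])

def primitives(U):
    rho, u, v = U[0], U[1] / U[0], U[2] / U[0]
    P = (GAMMA - 1.0) * (U[3] - 0.5 * rho * (u * u + v * v))
    return rho, u, v, P

def normal_flux(U, nx, ny):
    rho, u, v, P = primitives(U)
    un = u * nx + v * ny
    return np.array([rho * un, rho * u * un + P * nx,
                     rho * v * un + P * ny, (U[3] + P) * un])   # Eq. (2.2)

def flux_jacobian(u, v, ht, nx, ny):
    ek, un, g1 = 0.5 * (u * u + v * v), u * nx + v * ny, GAMMA - 1.0
    return np.array([
        [0.0, nx, ny, 0.0],
        [g1 * ek * nx - u * un, un - (GAMMA - 2.0) * u * nx, u * ny - g1 * v * nx, g1 * nx],
        [g1 * ek * ny - v * un, v * nx - g1 * u * ny, un - (GAMMA - 2.0) * v * ny, g1 * ny],
        [(g1 * ek - ht) * un, ht * nx - g1 * u * un, ht * ny - g1 * v * un, GAMMA * un],
    ])   # Eq. (2.16)

def roe_flux(UL, UR, nx, ny):
    rL, uL, vL, PL = primitives(UL)
    rR, uR, vR, PR = primitives(UR)
    wL, wR = np.sqrt(rL), np.sqrt(rR)
    # Roe averages of Eq. (2.20); the averaged density is not needed by the Jacobian.
    u = (wL * uL + wR * uR) / (wL + wR)
    v = (wL * vL + wR * vR) / (wL + wR)
    ht = (wL * (UL[3] + PL) / rL + wR * (UR[3] + PR) / rR) / (wL + wR)
    lam, X = np.linalg.eig(flux_jacobian(u, v, ht, nx, ny))
    abs_A = ((X * np.abs(lam)) @ np.linalg.inv(X)).real   # X |Lambda| X^{-1}
    return 0.5 * (normal_flux(UL, nx, ny) + normal_flux(UR, nx, ny)) - 0.5 * abs_A @ (UR - UL)
```

Two properties make convenient sanity checks: identical left and right states return the physical flux, and when all characteristic speeds have the same sign the formula collapses to the pure upwind flux (the left state for a supersonic flow in the direction of the normal).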

Although Roe's scheme is one of the most accurate available flux functions, it is quite an
expensive method for flux computation: due to its matrix differencing term it requires
$O(n^2)$ operations per control volume in each iteration (n is the number of equations). If a real
gas is used, the differencing term should be derived specifically for the new equation
of state, which could introduce some difficulty in the derivation of the differencing matrix. At the
same time, applying Roe's flux formula in an implicit context requires differentiation of the
differencing matrix, which has proven to be very tricky and expensive even for a perfect gas
(Chapter 4). Finally, some empirical cure needs to be implemented in Roe's formulation
when the Mach number goes towards one (sonic speed) or towards
zero (stagnation point). In either case, we are dealing with eigenvalues equal to zero and
the method cannot distinguish the correct upwind direction, resulting in either a non-physical
solution (entropy reduction) or lack of convergence due to its non-differentiable flux. Adding
a small value of ε to the eigenvalues, or rounding the slope of characteristic lines near zero, which
is the approach taken in this thesis, Eq. (2.21), resolves that issue (Fig. 2.3).

\[
\text{if } |\lambda_i| < C_{\text{cut off}}: \qquad \lambda_i^{c} = \left( \lambda_i^2 + C_2 \right) C_1
\tag{2.21}
\]


Figure 2.3: Rounding the characteristic slope near zero (eigenvalue correction, Cut-Off = 0.1)

\[
C_1 = \frac{0.5}{C_{\text{cut off}}}, \qquad C_2 = C_{\text{cut off}}^2, \qquad C_{\text{cut off}} \approx 0.1 \ \text{(typically)}
\]
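The correction (2.21) with the constants above is small enough to write out directly. The sketch below assumes that outside the cut-off the plain absolute value $|\lambda_i|$ is used (the text only specifies the corrected value inside the cut-off), and the function name is illustrative.

```python
import numpy as np

def rounded_abs_eigenvalue(lam, cutoff=0.1):
    """Smoothed |lambda| following Eq. (2.21) with C1 = 0.5/cutoff and C2 = cutoff**2."""
    lam = np.asarray(lam, dtype=float)
    c1 = 0.5 / cutoff
    c2 = cutoff ** 2
    # Inside the cut-off a parabola replaces the kink of |lambda| at zero;
    # outside it, |lambda| is returned unchanged (assumption of this sketch).
    return np.where(np.abs(lam) < cutoff, (lam ** 2 + c2) * c1, np.abs(lam))
```

With the typical cut-off of 0.1, the corrected value is 0.05 at $\lambda = 0$ and joins $|\lambda|$ continuously at $|\lambda| = 0.1$, so the wave speed magnitude seen by the differencing term never reaches zero and the flux becomes differentiable there.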

2.4 Boundary Flux Treatment

Implementing correct boundary conditions is a necessary condition for numerical fluid flow

computation. Generally speaking, any kind of mistake in boundary condition treatment

would result in convergence problems and/or an inaccurate solution. This is especially

true if the imposed boundary condition does not match the flow physics. For an implicit

solver it is also essential to formulate the boundary treatment implicitly both for the main

solver and for the preconditioner (Chapter 4), or performance of the implicit solver will be

degraded severely, in addition to introducing a restrictive CFL limitation. Normally boundary

conditions are categorized as wall boundary and inlet/outlet boundary conditions, which

are discussed in the following subsections.

2.4.1 Wall Boundary Condition

In steady-state inviscid flow, there is a slip velocity condition: velocity normal to the wall

surface is zero (i.e. velocity at the boundary is parallel to the surface). As a result the


convective flux normal to the surface is zero (flow does not cross the wall) and the pressure

flux in the momentum equations is the only remaining flux. Both the continuity and energy

fluxes are zero at steady-state. The easiest way to implement wall boundary flux in Euler

flow is to set Un = 0 in the flux formula. There are other alternative wall treatments

like reflective boundary conditions using ghost cells, or reconstruction of flow properties

(Chapter 3) with a constraint of Un = 0. All those treatments more or less produce the

same result as in all cases the convective flux is forced to be zero. In the current research,

normal velocity is set to be zero using a proper constraint in reconstruction, and pressure

is reconstructed up to the desired order of accuracy at the wall.

2.4.2 Inlet/Outlet Boundary Conditions

As mentioned in section 2.3, the numerical modeling should be consistent with the physical

characteristics of the flow field, and with the information propagating along characteristic

lines (according to eigenvalues of the Jacobian matrix). The inflow and outflow boundary

condition implementation are completely different for subsonic and supersonic flows, there-

fore the first thing is to determine the type of boundary condition. That can be easily done

by computing the flow quantities (primitive variables) at the boundary using reconstructed

values and working out the Mach number and flow direction with respect to the bound-

ary surface. Knowing the type of the boundary, a proper boundary treatment needs to be

carried out.

For a subsonic inlet three eigenvalues are positive, meaning that three characteristic lines

are transferring inlet information from inflow toward the solution domain. The remaining

eigenvalue is negative at subsonic inlet i.e., one set of information travels from the solution

domain towards the inlet. In this situation three parameters, in our case, total pressure,

total temperature, and angle of attack are set at the subsonic inlet and static pressure is

taken from the interior. The following formulas show the detail of computing the velocity

components and density at subsonic inlet.

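\[
T_t = 1 + \frac{\gamma - 1}{2} M_\infty^2, \qquad T_\infty = 1.0
\tag{2.22}
\]

\[
P_t = \frac{1}{\gamma} \left( 1 + \frac{\gamma - 1}{2} M_\infty^2 \right)^{\frac{\gamma}{\gamma - 1}}, \qquad P_\infty = \frac{1.0}{\gamma}
\tag{2.23}
\]

\[
T_b = T_t \left( \frac{P_{RB}}{P_t} \right)^{\frac{\gamma - 1}{\gamma}}
\tag{2.24}
\]

$P_b = P_{RB}$ : Reconstructed Pressure at the Boundary

\[
V_{mb} = \sqrt{ \frac{2}{\gamma - 1} \left( \frac{T_t}{T_b} - 1 \right) }
\tag{2.25}
\]

\[
u_b = V_{mb} \cos(\alpha), \qquad v_b = V_{mb} \sin(\alpha)
\tag{2.26}
\]

$\alpha$ : Angle of Attack

\[
\rho_b = \frac{\gamma P_b}{T_b}
\tag{2.27}
\]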
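The subsonic inlet relations (2.22) to (2.27) can be collected into one small function. The sketch below is a hedged illustration: the function name and argument list are hypothetical, and one step is an interpretation. In the present normalization ($a = \sqrt{T}$) the square root in Eq. (2.25) equals the boundary Mach number, so the sketch assumes the boundary speed is that quantity multiplied by $\sqrt{T_b}$; this coincides with Eq. (2.25) as printed whenever $T_b = 1$ and should be checked against the original implementation.

```python
import math

def subsonic_inlet_state(p_rb, mach_inf, alpha, gamma=1.4):
    """Boundary state from the reconstructed pressure p_rb (requires p_rb <= P_t)."""
    Tt = 1.0 + 0.5 * (gamma - 1.0) * mach_inf ** 2            # Eq. (2.22)
    Pt = Tt ** (gamma / (gamma - 1.0)) / gamma                # Eq. (2.23)
    Tb = Tt * (p_rb / Pt) ** ((gamma - 1.0) / gamma)          # Eq. (2.24)
    mach_b = math.sqrt(2.0 / (gamma - 1.0) * (Tt / Tb - 1.0)) # square root of Eq. (2.25)
    speed = mach_b * math.sqrt(Tb)  # assumption: speed = Mach number * sound speed
    ub = speed * math.cos(alpha)                              # Eq. (2.26)
    vb = speed * math.sin(alpha)
    rho_b = gamma * p_rb / Tb                                 # Eq. (2.27)
    return rho_b, ub, vb, p_rb
```

When the reconstructed pressure equals the free stream value $1/\gamma$, the function returns the free stream state ($\rho_b = 1$, speed $= M_\infty$), and for any admissible pressure the total temperature $T_b + \frac{\gamma - 1}{2}(u_b^2 + v_b^2)$ equals $T_t$.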

At a supersonic inlet all eigenvalues are positive which means all boundary fluxes are com-

puted based on the inlet properties (a∞ = 1).

\[
P_b = P_{in} = P_\infty, \qquad \rho_b = \rho_{in} = \rho_\infty
\tag{2.28}
\]

\[
u_b = u_{in} = M_\infty \cos(\alpha), \qquad v_b = v_{in} = M_\infty \sin(\alpha)
\tag{2.29}
\]

At a subsonic outlet, three eigenvalues are negative and one is positive (normal is toward

the solution domain) and therefore three pieces of information are taken from the solution

domain (through reconstruction) and only the static pressure is fixed at the outlet as the

back pressure.

\[
\rho_{out} = \rho_{RB}, \qquad u_{out} = u_{RB}, \qquad v_{out} = v_{RB}, \qquad P_{out} = P_{BP}
\tag{2.30}
\]

$P_{BP} = P_\infty$ for external flow


Figure 2.4: Schematic illustration of Gauss quadrature points (2nd-order: 1 Gauss point per face; 3rd- and 4th-order: 2 Gauss points per face)

In the case of supersonic outflow, since all eigenvalues are negative, all the information

is coming from the solution domain towards the boundary, and no condition is set at the

supersonic outlet.

\[
\rho_{out} = \rho_{RB}, \qquad u_{out} = u_{RB}, \qquad v_{out} = v_{RB}, \qquad P_{out} = P_{RB}
\tag{2.31}
\]

In all cases where implementing the boundary condition requires properties to be taken from
the solution domain, the higher-order reconstruction of those properties has been used to
ensure the accuracy of the boundary condition treatment.

2.5 Flux Integration

To compute the residual for each control volume, numerical fluxes should be integrated over

control volume edges. The flux residual is the summation of these flux integrals. The accuracy
of flux integration should be equal to or higher than the accuracy of flow reconstruction
in order to evaluate the residuals with higher-order accuracy. This is achieved by Gauss

quadrature integration. For the 2nd-order or linear reconstruction case one quadrature


point per face is used. For the 3rd-order (quadratic reconstruction) and 4th-order (cubic

reconstruction) cases, we use two quadrature points per face with the proper weightings.

In the case of a curved boundary, the quadrature points are located on the curved arc

segment instead of the straight edge for higher-order flux integration. This is due to the

fact that mistaking a curved segment with a straight edge will introduce a 2nd-order error

(i.e. $O(\delta S^2)$, where S is the arc length) in both computing and integrating the fluxes at curved

boundaries. Fig (2.4) shows the quadrature points for the flux integration schematically.

More information regarding the locations and weights of the Gauss quadrature integration

points is provided in Ref. [80].
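The edge integration described in this section can be sketched for a straight edge as follows. The example assumes the standard Gauss-Legendre rules (one point at the edge midpoint, or two points at $\tfrac{1}{2} \mp \tfrac{1}{2\sqrt{3}}$ of the edge length with equal weights); whether Ref. [80] tabulates exactly these values has not been checked here, and the names `GAUSS_RULES` and `integrate_edge` are illustrative. Curved boundary segments are not handled.

```python
import math

# (parametric location t in [0, 1], weight) pairs; weights sum to one.
GAUSS_RULES = {
    1: [(0.5, 1.0)],
    2: [(0.5 - 0.5 / math.sqrt(3.0), 0.5),
        (0.5 + 0.5 / math.sqrt(3.0), 0.5)],
}

def integrate_edge(flux, p0, p1, n_points):
    """Integrate a scalar flux(x, y) along the straight edge from p0 to p1."""
    length = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    total = 0.0
    for t, weight in GAUSS_RULES[n_points]:
        x = p0[0] + t * (p1[0] - p0[0])
        y = p0[1] + t * (p1[1] - p0[1])
        total += weight * flux(x, y)
    return total * length
```

The one-point rule integrates linear variations along the edge exactly, while the two-point rule is exact up to cubic variations, which is why it suffices for both the quadratic and the cubic reconstruction.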

Chapter 3

Reconstruction Procedure and

Monotonicity Enforcement

3.1 Introduction

To compute a higher-order solution, numerical fluxes should be evaluated with higher-order

accuracy containing two separate steps: first, calculating all flow variables anywhere inside

a control volume up to the desired order of accuracy; second, higher-order integration of

the evaluated fluxes over boundary edges using the proper number of Gauss points. The

latter is relatively easy, but higher-order approximation of flow variables inside a control

volume is neither unique nor trivial. This task could be done by implementing higher-order

upwind differencing, and/or higher-order representation of the solution inside a control vol-

ume (i.e. solution reconstruction) using polynomials with degree one or higher [82]. In

structured discretization, this leads to the development of MUSCL (Monotone Upstream-

centered Schemes for Conservation Laws) [35] . For structured meshes, higher-order re-

construction of a solution is much easier due to mesh regularity. However, in the case of

unstructured meshes a more complicated procedure needs to be adopted to represent the

solution with higher-order accuracy within a control volume.

Barth [13] laid out a straightforward procedure for gradient estimation on unstructured
tessellations using the Green-Gauss gradient technique. Because of its simplicity and robustness,
the linear Green-Gauss reconstruction has been employed extensively in finite volume solvers
since then. Barth also developed a minimum energy (i.e. least-squares) reconstruction
procedure capable of computing the flow variables up to (K+1)th-order of accuracy using


a Kth-order polynomial known as K-exact reconstruction [10, 12, 11]. The least-square

reconstruction is relatively more difficult to implement but in general the linear least-square

reconstruction is more accurate especially in the case of distorted meshes and/or in the

presence of limiter [4]. That is expected since the least-square technique is not as sensitive

as the Green-Gauss method to noise in the solution. Delanaye [22] has compared the

accuracy of the linear and quadratic reconstruction techniques for a smooth flow (Ringleb’s

flow) on unstructured meshes. The linear least-squares reconstruction performed better

than the linear Green-Gauss reconstruction both in terms of accuracy and preserving the

total pressure (i.e. less entropy production). The quadratic Green-Gauss and least-square

reconstructions behaved very similarly, although this time the total pressure loss was a bit

higher for least-square method.

In this research a K-exact least-squares reconstruction [60, 56] is employed, which is described
in the next section. In section 3.3 the accuracy assessment methodology is studied.
Finally, section 3.4 discusses monotonicity enforcement and introduces the limiting
approach taken to address that issue.

3.2 K-Exact Least-Square Reconstruction

The goal here is to reconstruct a solution polynomial based on the control volume data

such that the truncation error of the solution in each control volume would remain in the

order of $(\Delta x)^K$, where $\Delta x$ is the local mesh length scale. Obviously, certain physical and

mathematical criteria need to be met to reconstruct such a polynomial for each control

volume. These criteria are: 1- Conservation of the Mean, 2- K-Exact Reconstruction, 3-

Compact Support which are introduced briefly in next three subsections. In the last two

subsections the implementation of boundary condition at Gauss points using boundary

constraints, and solving the least-square system will be discussed.

3.2.1 Conservation of the Mean

The finite volume approach provides us piecewise constant data in each control volume

which is equal to the integral control volume average. This integral average is updated from

one iteration to the next and is the most meaningful measure of the physical quantities in

each control volume. A Kth-order piecewise polynomial $U_R^{(K)}(x, y)$ can be reconstructed in
such a way that the integral average of $U_R^{(K)}(x, y)$ over the control volume is equal to the control
volume average, Eq. (3.1). This is a very important property called Conservation of the
Mean.

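\[
\frac{1}{A_{CV}} \int_{CV} U_R^{(K)}(x, y) \, dA = \overline{U}_{CV}
\tag{3.1}
\]

\[
U_R^{(K)}(x, y) = \sum_{m + n \le K} \alpha_{(m,n)} P_{(m,n)}(x - x_c, y - y_c)
\tag{3.2}
\]

\[
P_{(m,n)}(x - x_c, y - y_c) = (x - x_c)^m (y - y_c)^n
\tag{3.3}
\]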

(xc,yc) are the coordinates of the reference point for the control volume, which in our case

(cell center formulation) is the cell centroid, Fig (3.1).

In general, $\alpha_{(m,n)}$ are the derivatives (scaled by the corresponding Taylor series factors) of a continuous
function represented by the reconstructed solution polynomial $U_R^{(K)}(x, y)$ with respect to the
independent geometric variables (i.e. x and y):

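\[
\begin{aligned}
U_R^{(K)}(x, y) = {} & U(x_c, y_c)
+ \left. \frac{\partial U}{\partial x} \right|_C \Delta x
+ \left. \frac{\partial U}{\partial y} \right|_C \Delta y \\
& + \left. \frac{\partial^2 U}{\partial x^2} \right|_C \frac{\Delta x^2}{2}
+ \left. \frac{\partial^2 U}{\partial x \partial y} \right|_C \Delta x \Delta y
+ \left. \frac{\partial^2 U}{\partial y^2} \right|_C \frac{\Delta y^2}{2} \\
& + \left. \frac{\partial^3 U}{\partial x^3} \right|_C \frac{\Delta x^3}{6}
+ \left. \frac{\partial^3 U}{\partial x^2 \partial y} \right|_C \frac{\Delta x^2 \Delta y}{2}
+ \left. \frac{\partial^3 U}{\partial x \partial y^2} \right|_C \frac{\Delta x \Delta y^2}{2}
+ \left. \frac{\partial^3 U}{\partial y^3} \right|_C \frac{\Delta y^3}{6}
+ \dots
\end{aligned}
\tag{3.4}
\]

where,

\[
\Delta x = x - x_c, \qquad \Delta y = y - y_c
\tag{3.5}
\]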

Imposing the conservation of the mean condition, or mean constraint, is accomplished by
integrating $U_R^{(K)}(x, y)$ over the control volume i and setting its integral average equal to the control
volume average, i.e. Eq. (3.1). For simplicity all the derivative terms in the following
equations are replaced by simpler expressions such as $U_{xy} \equiv \partial^2 U / \partial x \partial y$.


Figure 3.1: A typical cell center control volume and its reconstruction stencil, including three layers of neighbors


Equation (3.6) can be rearranged by introducing control volume moments definitions:

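\[
\begin{aligned}
\overline{U}_i = {} & U_C
+ \frac{1}{A_i} \left\langle U_x|_C \, M_x + U_y|_C \, M_y \right\rangle_i
+ \frac{1}{A_i} \left\langle \frac{1}{2} U_{xx}|_C \, M_{x^2} + U_{xy}|_C \, M_{xy} + \frac{1}{2} U_{yy}|_C \, M_{y^2} \right\rangle_i \\
& + \frac{1}{A_i} \left\langle \frac{1}{6} U_{xxx}|_C \, M_{x^3} + \frac{1}{2} U_{xxy}|_C \, M_{x^2 y} + \frac{1}{2} U_{xyy}|_C \, M_{x y^2} + \frac{1}{6} U_{yyy}|_C \, M_{y^3} \right\rangle_i
+ \dots
\end{aligned}
\tag{3.6}
\]

where,

\[
M_{x^m y^n} \big|_i = \int_{CV_i} (x - x_c)^m (y - y_c)^n \, dA
\tag{3.7}
\]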

These moments can be computed up to the desired order of accuracy by Gauss’s theorem.

Notice that replacing m and n in Eq. (3.7) by zero computes the area of the control volume.

The introduced mean constraint is the first row (equation) in the least-square system.
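The moments of Eq. (3.7) can be evaluated on a polygonal control volume by converting the area integral into edge integrals. The sketch below assumes straight edges, counter-clockwise vertex ordering and a three-point Gauss-Legendre rule from `numpy.polynomial.legendre.leggauss` (exact for the edge integrands that arise when $m + n \le 3$); the function names are illustrative and curved boundaries are not treated.

```python
import numpy as np

def polygon_moment(vertices, m, n, ref=(0.0, 0.0)):
    """Integral of (x - xr)^m (y - yr)^n over a polygon with counter-clockwise vertices.

    Uses the divergence theorem: the area integral equals the boundary integral
    of x^(m+1) y^n / (m+1) times the x-component of the outward normal.
    """
    xi, wi = np.polynomial.legendre.leggauss(3)
    t = 0.5 * (xi + 1.0)   # quadrature points mapped to [0, 1]
    w = 0.5 * wi
    pts = np.asarray(vertices, dtype=float) - np.asarray(ref, dtype=float)
    total = 0.0
    for p0, p1 in zip(pts, np.roll(pts, -1, axis=0)):
        x = p0[0] + t * (p1[0] - p0[0])
        y = p0[1] + t * (p1[1] - p0[1])
        # n_x ds = dy along a counter-clockwise edge.
        total += np.sum(w * x ** (m + 1) * y ** n) * (p1[1] - p0[1]) / (m + 1)
    return total

def centroid(vertices):
    area = polygon_moment(vertices, 0, 0)
    return (polygon_moment(vertices, 1, 0) / area,
            polygon_moment(vertices, 0, 1) / area)
```

Setting `m = n = 0` returns the area of the control volume, and taking `ref` at the centroid makes the first moments $M_x$ and $M_y$ vanish, which simplifies the first bracket of Eq. (3.6).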

3.2.2 K-Exact Reconstruction

Consider a problem which is smooth and has an exact solution in the form of $U_E(x, y)$.
If $\langle U_R^{(K)}(x, y) \rangle_i$ is a Kth-order polynomial reconstructing an approximate solution for the
problem over control volume i, then the error in solution reconstruction for the control
volume i would be of the order of $O(\Delta x_i^{K+1})$, Eq. (3.8).

\[
\left\langle U_R^{(K)}(x, y) \right\rangle_i = U_E(x, y) + O(\Delta x_i^{K+1})
\tag{3.8}
\]

In other words, $\langle U_R^{(K)}(x, y) \rangle_i$ reconstructs the solution exactly if the exact solution is a Kth-order
polynomial, i.e. $U_E(x, y) = P_E^K(x, y)$. This property is called K-Exact Reconstruction.
This property can also be expressed in the integral form:

\[
\text{Error}|_{CV_i} = \frac{1}{A_i} \left[ \int_{CV_i} U_E(x, y) \, dA - \int_{CV_i} U_R^{(K)}(x, y) \, dA \right] = O(\Delta x_i^{K+1})
\tag{3.9}
\]


3.2.3 Compact Support

Notice that the reconstructed polynomial, $\langle U_R^{(K)}(x, y) \rangle_i$, is piecewise and changes from one

control volume to another. Therefore integrating the reconstructed function of a control

volume over its neighbors does not satisfy the average value of the neighboring control

volumes exactly. For the Kth-order solution reconstruction, we are reconstructing a (K-1)th

degree polynomial which satisfies the mean property for the control volume i and minimizes

the error in computing control volume averages of the neighboring control volumes. The

neighboring control volumes used in reconstruction constitute the reconstruction stencil,

Fig (3.1). The reconstruction stencil should include a proper number of neighboring control

volumes for computing the mentioned derivatives. The number of unknowns for the linear,

quadratic, and cubic reconstructions are 3, 6 and 10 leading to 2nd, 3rd and 4th-order

accuracy. Generally speaking, for a K-th order accurate reconstruction, at least K(K +1)/2

neighboring control volumes must be included in the reconstruction stencil. However, for

practical reasons one may use a few more control volumes in the reconstruction stencil. This

leads to a larger least-squares system but provides additional information for reconstruction

usually resulting in more robust solution reconstruction in the presence of non-smooth

and/or vigorously oscillatory data. In this research, the reconstruction stencils for 2nd, 3rd,

and 4th-order methods include 4, 9 and 16 control volumes respectively. Not surprisingly,

the polynomial $\langle U_R^{(K)}(x, y)\rangle_i$ for control volume i should be reconstructed based on control

volumes that are topologically and physically close to the control volume i. This property

is called Compact Support. Consequently, the 4th-order reconstruction stencil covers three

layers of the neighboring control volumes. This could adversely affect the compactness

of the reconstruction for very coarse meshes. Since for very coarse meshes the distance

spanned by the reconstruction stencil is uncomfortably close to the characteristic size of the

domain, full convergence might not be achieved as reconstruction stencils of the boundary

control volumes overlap considerably with the reconstruction stencils of interior control

volumes. Because the size of the stencil and the associated cost of the reconstruction

increase quadratically with the order of accuracy, there is a practical limit to the benefits

of the classic finite-volume higher-order reconstruction.

When reconstructing non-smooth data in the vicinity of a discontinuity such as a shock wave, even a geometrically compact stencil is not compact physically, as the data employed in reconstruction are irrelevant for control volumes across the shock wave. There are two different approaches to address this issue. One is choosing the best stencil containing the smoothest possible data and/or downplaying the role of non-smooth data in reconstruction by assigning small weights to them, leading to ENO (Essentially Non-Oscillatory) and WENO (Weighted Essentially Non-Oscillatory) schemes [33, 43, 3, 57]. The other, passive approach is to employ a limiter and enforce monotonicity by correcting (reducing) the computed gradients, suppressing overshoots and undershoots resulting from reconstruction over a non-physically compact stencil [13, 87, 24]. The latter approach has been taken in this research.

Figure 3.2: Imposing boundary constraint at the Gauss boundary points

3.2.4 Boundary Constraints

In addition to the Mean Constraint, it is also possible to include additional boundary

constraints to the reconstructed polynomial satisfying physical boundary condition at the

solution domain boundaries [80, 60]. The straightforward way of doing this is imposing the

boundary constraint at Gauss points where fluxes are actually calculated. Therefore, the

coefficients of the reconstructed polynomial are computed such that the physical boundary

condition is imposed automatically. As a result, the number of boundary constraints per control volume is equal to the number of control volume Gauss points on boundary edges, Fig (3.2).

3.2.4.1 Dirichlet Boundary Constraint

For a Dirichlet boundary condition, the value of the flow variable is known at the boundary.

Consequently, the reconstructed polynomial should give us the same boundary value at the

Gauss points (3.10):


$$
U_R^{(K)}(x_{G_1}, y_{G_1}) = U_{DB}(x_{G_1}, y_{G_1}), \qquad
U_R^{(K)}(x_{G_2}, y_{G_2}) = U_{DB}(x_{G_2}, y_{G_2}) \tag{3.10}
$$

3.2.4.2 Neumann Boundary Constraint

For a Neumann boundary condition, the normal derivative of the flow variable is known at

the boundary edge (3.11). Therefore, we need to take the normal derivative from the

reconstruction polynomial at the boundary Gauss point and equate that to the known

normal derivative provided by Neumann boundary condition (3.12 and 3.13):

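$$
\frac{\partial U_R^{(K)}}{\partial n} = U_{NB} \tag{3.11}
$$

$$
\frac{\partial U_R^{(K)}(x_{G_1}, y_{G_1})}{\partial x}(n_x)_{G_1} + \frac{\partial U_R^{(K)}(x_{G_1}, y_{G_1})}{\partial y}(n_y)_{G_1} = U_{NB}(x_{G_1}, y_{G_1}) \tag{3.12}
$$

$$
\frac{\partial U_R^{(K)}(x_{G_2}, y_{G_2})}{\partial x}(n_x)_{G_2} + \frac{\partial U_R^{(K)}(x_{G_2}, y_{G_2})}{\partial y}(n_y)_{G_2} = U_{NB}(x_{G_2}, y_{G_2}) \tag{3.13}
$$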

3.2.5 Constrained Least-Square System

Knowing the constraints and the neighbors involved in the polynomial reconstruction for

control volume i , we can set up the least-square system in the form of Eq. (3.14):

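$$
\begin{bmatrix}
M_i C_{1,1} & M_i C_{1,2} & M_i C_{1,3} & M_i C_{1,4} & \cdots & M_i C_{1,q} \\
B_i C_{2,1} & B_i C_{2,2} & B_i C_{2,3} & B_i C_{2,4} & \cdots & B_i C_{2,q} \\
w_{i,1} N_{1,1} & w_{i,1} N_{1,2} & w_{i,1} N_{1,3} & w_{i,1} N_{1,4} & \cdots & w_{i,1} N_{1,q} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
w_{i,n} N_{n,1} & w_{i,n} N_{n,2} & w_{i,n} N_{n,3} & w_{i,n} N_{n,4} & \cdots & w_{i,n} N_{n,q}
\end{bmatrix}
\begin{bmatrix}
U_C \\
\left.\frac{\partial U}{\partial x}\right|_C \\
\left.\frac{\partial U}{\partial y}\right|_C \\
\left.\frac{\partial^2 U}{\partial x^2}\right|_C \\
\left.\frac{\partial^2 U}{\partial x \partial y}\right|_C \\
\vdots \\
\left.\frac{\partial^K U}{\partial y^K}\right|_C
\end{bmatrix}
=
\begin{bmatrix}
\bar{U}_i \\
BC_i \\
w_{i,1} \bar{U}_{N_1} \\
w_{i,2} \bar{U}_{N_2} \\
w_{i,3} \bar{U}_{N_3} \\
\vdots \\
w_{i,n} \bar{U}_{N_n}
\end{bmatrix}
\tag{3.14}
$$

$$
w_{i,j} = \frac{1}{\left|\vec{X}_{C_i} - \vec{X}_{C_j}\right|^{t}} \tag{3.15}
$$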


In Eq. (3.14) the first two rows constitute the mean and boundary constraints which have

to be enforced exactly. The remaining rows are additional equations arising from integration of the polynomial of control volume i over its neighbors N1, N2, N3, ..., Nn, which need to

be satisfied in the least-squares sense, minimizing the L2 norm of the system residual. Of

course, for interior control volumes there is no boundary constraint and the second row does

not appear in the least-square system. A weighting strategy is also considered in forming

the least-square system. Weights can be a function of geometric parameters and/or the

solution data [10, 57]. The main purpose of these weights is to control the influence of the

neighboring solution data on reconstruction. In this research only geometric weighting in

the form of Eq. (3.15) with t = 1 is used, weakening the effect of the solution data far from

the reconstructed control volume by the geometric distance ratio.

To satisfy the constraints exactly, Gaussian elimination with column pivoting is performed on (3.14) for as many rows as the number of constraints in the system. The remaining least-square system is reduced to an upper-triangular system using Householder transformations [92] with the proper scaling, and finally the system is solved through backward substitution.
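The structure of the constrained system can be illustrated with a deliberately simplified 1-D linear reconstruction, where the algebra reduces to a closed form. In this sketch the mean constraint is eliminated exactly first (for a linear polynomial expanded about the centroid it gives U_C equal to the cell average), and the remaining neighbor rows are solved in the weighted least-squares sense with the weights of Eq. (3.15). The function name, the 1-D setting, and the normal-equation solve (instead of the Householder reduction used in the thesis) are assumptions made only for illustration.

```python
def reconstruct_linear_1d(x_c, u_bar, i, neighbors, t=1.0):
    """Return (U_C, dU/dx) for cell i from cell centroids x_c and cell averages u_bar."""
    # Constraint row, enforced exactly: the first moment about the centroid
    # vanishes, so the mean constraint reduces to U_C = u_bar[i].
    u_c = u_bar[i]
    num = 0.0
    den = 0.0
    for j in neighbors:
        d = x_c[j] - x_c[i]
        w = 1.0 / abs(d) ** t  # geometric weight, Eq. (3.15)
        # Weighted neighbor row after elimination: (w * d) * slope = w * (u_bar[j] - u_c).
        num += (w * d) * (w * (u_bar[j] - u_c))
        den += (w * d) ** 2
    return u_c, num / den


# Non-uniform 1-D mesh with averages of the linear function U = 3 + 2x.
centroids = [0.0, 1.0, 2.5, 4.5]
averages = [3.0 + 2.0 * x for x in centroids]
u_c, slope = reconstruct_linear_1d(centroids, averages, i=1, neighbors=[0, 2, 3])
```

Because the data are exactly linear, the sketch recovers the slope exactly, which is the 1-D analogue of the K-exactness property of Section 3.2.2.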

3.3 Accuracy Assessment for a Smooth Function

Error norms are the most common measure for accuracy assessment of a numerical solution.

They provide useful information about the overall accuracy of the numerical solution as well as the local source of the (maximum) error. For a 2D finite volume formulation, the general

form of the error norm can be computed by Eq. (3.16), where p is the norm index, Ncv is

the total number of control volumes in the domain, Ai is the area of the control volume i,

and Ei is the average error of the control volume i.

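$$
L_p = \left( \frac{\sum_{i=1}^{N_{cv}} A_i \left| E_i^p \right|}{\sum_{i=1}^{N_{cv}} A_i} \right)^{\frac{1}{p}} \tag{3.16}
$$

$$
E_i = \frac{1}{A_i} \int_{CV_i} \text{Error}(x, y)\,dA \tag{3.17}
$$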

L1 and L2 are global norms and are good indicators of the overall accuracy. However, L∞ is the largest magnitude of the error in the solution domain, and it is a local error indicator:

$$
L_\infty = \max_i \left| E_i \right|
$$
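A minimal sketch of these area-weighted norms is given below. The function name `error_norms` and the plain-list inputs are assumptions for this example; the control-volume average errors E_i of Eq. (3.17) are taken as already computed.

```python
def error_norms(areas, errors):
    """Area-weighted L1, L2 (Eq. 3.16 with p = 1, 2) and L-infinity error norms."""
    total_area = sum(areas)
    l1 = sum(a * abs(e) for a, e in zip(areas, errors)) / total_area
    l2 = (sum(a * e * e for a, e in zip(areas, errors)) / total_area) ** 0.5
    linf = max(abs(e) for e in errors)
    return l1, l2, linf


l1, l2, linf = error_norms([1.0, 1.0, 2.0], [0.1, -0.2, 0.0])
```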


Having a proper tool for error measurement, now we proceed with the accuracy assessment

discussion. As previously mentioned (3.2.2), a K-exact reconstruction reconstructs the

solution to the (K+1)th order of accuracy. To verify the accuracy of our reconstruction

technique, we need to examine it for a smooth function. Here a simple but very common

procedure is introduced to assess the accuracy of the numerical solution over discretized

domains.

Assume a 1-D finite volume problem over a uniformly discretized domain of M1 with the

length scale of ∆x. The average numerical solution for an arbitrary control volume i of M1

can be written in the following form:

$$
\bar{U}_i\big|_{M_1} = \frac{1}{\Delta x} \int_{CV_i} U_E(x)\,dx + O(\Delta x^K) \tag{3.18}
$$

where, Ui is the control volume average, UE(x) is the exact solution of the problem and

K is the discretization order (i.e. the polynomial of (K − 1) degree is used in the solution

reconstruction). Then the average solution error for each control volume can be expressed

as:

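$$
\text{Error}\big|_{CV_i} = \frac{1}{\Delta x} \int_{CV_i} U_E(x)\,dx - \bar{U}_i\big|_{M_1} \equiv O(\Delta x^K)
$$

Therefore a similar relation holds for any norm of the error ($\|E\|_p$, $p = 1, 2, \infty$) over the same domain:

$$
\left\| E|_{M_1} \right\| \equiv O(\Delta x^K) \tag{3.19}
$$

Now, if $M_2$ represents the discretization of the same problem over the same domain, where the discretization length scale is halved ($\Delta x / 2$), the error norm on $M_2$ is:

$$
\left\| E|_{M_2} \right\| \equiv O\!\left(\left(\frac{\Delta x}{2}\right)^{K}\right) \tag{3.20}
$$

and the ratio of these two error norms would be only a function of the discretization order:

$$
\frac{\left\| E|_{M_1} \right\|}{\left\| E|_{M_2} \right\|} \equiv \frac{O(\Delta x^K)}{O\!\left((\Delta x / 2)^K\right)} = 2^K \tag{3.21}
$$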

To examine the order of discretization accuracy, employing (3.21), the error norms must be plotted on a logarithmic scale versus mesh size, where the mesh length scale is uniformly


reduced each time by half. The asymptotic slope of the Error-Mesh plot will show the

numerical order of accuracy. This analogy can be extended to multiple dimensions as long

as the mesh length scale is decreased each time uniformly in each dimension all through the

domain. For unstructured meshes and in general case, it is nearly impossible to decrease

the mesh length scale uniformly. Even if we try to refine the whole mesh globally in a

self similar way (all angles and ratios remain the same in the refinement process) it would

not be feasible for all triangles (except for structured triangulation of regular geometries).

Therefore, in general, the best that we can do is to use a series of semi-uniform unstructured meshes where the mesh density is increased each time by a factor of four. With this approach we hope that the overall mesh length scale is reduced by a factor of two.

Based on the Eq. (3.21) the ratios of 4, 8, and 16 for error norms are expected for 2nd,

3rd, and 4th-orders of accuracy should we globally reduce the mesh length scale by half.

But of course in practice, we are neither dealing with uniform unstructured meshes nor can

we refine them uniformly everywhere, and as a consequence getting the nominal order of

accuracy for all norms is not always exactly possible.
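The ratio test of Eq. (3.21) can be turned into an observed order of accuracy by taking a logarithm. The sketch below shows this for an exact halving of the length scale and, as an assumed extension for 2-D unstructured meshes, for a length scale inferred from the number of control volumes (h proportional to N^(-1/2)). Function names and sample numbers are hypothetical.

```python
import math


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Observed order K from two error norms, inverting Eq. (3.21)."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


def observed_order_from_cell_counts(err_coarse, err_fine, n_coarse, n_fine):
    """Same estimate for 2-D meshes, assuming the length scale behaves as N^(-1/2)."""
    return observed_order(err_coarse, err_fine, math.sqrt(n_fine / n_coarse))


order_a = observed_order(1.6e-3, 1.0e-4)                                # error ratio 16
order_b = observed_order_from_cell_counts(8.0e-3, 1.0e-3, 1000, 4000)   # error ratio 8
```

An error ratio of 16 under halving corresponds to 4th order, and a ratio of 8 with four times as many control volumes corresponds to 3rd order, matching the nominal ratios quoted above.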

3.4 Monotonicity Enforcement

First-order upwind methods often exhibit poor resolution and do not resolve physical phenomena in the fluid flow accurately, since they introduce a considerable amount of numerical diffusion in the solution process. This is due to the nature of these techniques, in which the fluxes are computed based on cell average data. Despite this weakness, they have two great advantages: first, they are monotone and produce non-oscillatory results for the whole domain even in the presence of discontinuities, and second, they are stable and converge very well.

High resolution upwind methods (higher than first-order) cure the accuracy problem, but they require precautionary measures such as limiting to overcome oscillatory behavior and stability issues, especially for non-smooth flows.

3.4.1 Flux Limiting

In central difference schemes, additional terms in the form of artificial viscosity are added

to the flux computation such that the accuracy of the solution in the smooth region is not

affected, but these terms provide enough diffusivity in the shock region to make it oscillation


free and stabilize the convergence. One of the most successful efforts on this topic was made by Antony Jameson [38], who introduced a second-order differencing term to address the

stability problem and a fourth-order differencing term to enhance the convergence of the

numerical solution. With this approach the solution in the vicinity of a discontinuity is first

order (locally) while the global accuracy is preserved for smooth regions.

In principle, high resolution upwind techniques use a similar approach, adopting a low-order

resolution near the shock region to take advantage of the monotonicity of the first-order

upwind scheme. This idea can be better explained by Eq. (3.22), where Φ(U : j) is the

flux limiter function and j is the control volume stencil from which U is calculated. In

smooth regions, Φ(U : j) should be very close to one and therefore the flux is computed

with higher-order resolution. In contrast, in discontinuous regions Φ(U : j) must be nearly

zero to eliminate the higher-order flux, recovering the monotonic low-order scheme.

$$
F_H(U : j) = F_L(U : j) + \Phi(U : j)\left[F_H(U : j) - F_L(U : j)\right] \tag{3.22}
$$

In other words, the goal is to provide just enough diffusion to prevent overshoots. One family

of schemes that apply this procedure is the Total Variation Diminishing (TVD) schemes

[35, 42]. In a TVD scheme, fluxes (F(U)) are limited such that the total variation (TV) of the solution over the domain does not increase with time in the solution process compared to the total variation of the initial data:

$$
TV = \int_{\Omega} \left| \frac{\partial U}{\partial x} \right| dx \tag{3.23}
$$

$$
TV(U) = \sum_i \left| U_{i+1} - U_i \right| \tag{3.24}
$$

$$
TV(U^{n+1}) \le TV(U^n) \le TV(U^0) \tag{3.25}
$$

All TVD schemes result in monotonic (non-oscillatory) solutions and stable convergence.
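The discrete total variation of Eq. (3.24) is easy to monitor in practice. As a small, hedged illustration, the sketch below evaluates it before and after one first-order upwind step for the 1-D linear advection equation on a periodic mesh; the helper names and the sample data are assumptions for this example only.

```python
def total_variation(u):
    """Discrete total variation, Eq. (3.24)."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))


def upwind_step(u, cfl):
    """One first-order upwind step for u_t + a u_x = 0 (a > 0) on a periodic mesh."""
    # u[i - 1] wraps around to the last cell for i = 0 (periodic boundary).
    return [u[i] - cfl * (u[i] - u[i - 1]) for i in range(len(u))]


u0 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
u1 = upwind_step(u0, cfl=0.5)
tv0 = total_variation(u0)
tv1 = total_variation(u1)
```

For this square pulse the step smears the profile but does not increase the total variation, consistent with Eq. (3.25).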

3.4.2 Slope Limiting

In a structured discretization, where there is a clear and straightforward connection between the mesh data structure and the discretization scheme, employing flux limiters to achieve a non-oscillatory solution is quite common and successful. However, in general, this clear connection between mesh and discretization method no longer exists for unstructured meshes, and implementing flux limiters is not that straightforward. At the same time, if the flux function is computed based on Godunov's approach (even for structured discretization), determining a proper flux limiter is again not an easy job. There is another approach for imposing the TVD condition in the solution process, known as slope or gradient limiting, which is more appropriate for generic mesh discretization schemes and Godunov's approach. In this approach, the computed solution gradient from reconstruction is corrected (reduced) to meet the monotonicity condition defined by Eq. (3.28):

$$
U_{max} = \max(\bar{U}_i, U_{FN_j}) \tag{3.26}
$$

$$
U_{min} = \min(\bar{U}_i, U_{FN_j}) \tag{3.27}
$$

$$
U_{min} \le U_i(x_G, y_G) \le U_{max} \tag{3.28}
$$

where $\bar{U}_i$ is the control volume average and $U_{FN_j}$ are the control volume averages of the first neighbors. This limiting procedure can be better explained by visualizing a linear reconstruction for a simple 1-D case. Fig (3.3-a) illustrates a typical unlimited linear reconstruction for 1-D control volume averages. Using a slope limiter corrects the overshoots and undershoots in the solution reconstruction by reducing the slope of the reconstruction, as shown in Fig (3.3-b).

In theory, Eq. (3.28) should be valid for all points inside the control volume i, but in practice

this condition will be checked and satisfied (if necessary) only for Gauss points where the

actual fluxes are computed. Assuming a linear reconstruction (2nd-order method), for the

control volume i (Fig(3.4)), the unlimited reconstructed value at the Gauss point G is

written in the form of Eq. (3.29):

$$
U_G = U(x_c, y_c) + \nabla U|_C \cdot \vec{r}_G \tag{3.29}
$$

U(xc, yc) is the value of the flow variable at the cell center, which in this case (2nd-order) is

also the average value of the control volume. The ∇U |C computed from the reconstruction

procedure needs to be adjusted according to the monotonicity condition (3.28) by a scalar value φ called the slope limiter:

$$
U_G = U(x_c, y_c) + \phi_i \, \nabla U|_C \cdot \vec{r}_G \tag{3.30}
$$

Figure 3.3: Typical unlimited/limited linear reconstruction: (a) an unlimited linear reconstruction; (b) a limited linear reconstruction

Figure 3.4: Using first neighbors for monotonicity enforcement

The goal is to determine the largest acceptable value for $\phi_i$ in such a way that the computed reconstructed values of U at all Gauss points are bounded by the maximum and minimum of the neighboring control volume averages (including the control volume i average):

$$
\phi_{G_j} =
\begin{cases}
\min\left(1, \dfrac{U_{max} - \bar{U}_i}{U_{G_j} - \bar{U}_i}\right), & \text{if } (U_{G_j} - \bar{U}_i) > 0 \\[2ex]
\min\left(1, \dfrac{U_{min} - \bar{U}_i}{U_{G_j} - \bar{U}_i}\right), & \text{if } (U_{G_j} - \bar{U}_i) < 0 \\[2ex]
1, & \text{if } (U_{G_j} - \bar{U}_i) = 0
\end{cases}
\tag{3.31}
$$

$$
\phi_i = \min(\phi_{G_1}, \phi_{G_2}, \dots) \tag{3.32}
$$

This limiting procedure was introduced by Barth [13] and such an implementation guaran-

tees the monotonicity principle for all Gauss points. The Barth limiter produces a strictly

monotonic solution and removes all oscillations; however it has some convergence and accu-

racy issues. First, Barth’s limiter formulation is clearly not differentiable and this severely

hampers the convergence process as limiter values oscillate across the shock. Second, in

smooth regions (including the stagnation region), we expect to have some local extrema

which will cause the limiter to fire. This reduces the accuracy of the solution in those

circumstances.
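Equations (3.26) to (3.32) translate almost line by line into code. The following is a minimal sketch for a single scalar variable in one control volume; the function name and argument layout are assumptions for this example, and the unlimited Gauss-point values are taken as given.

```python
def barth_limiter(u_bar, neighbor_averages, gauss_values):
    """Barth slope limiter phi_i for one control volume, Eqs. (3.26)-(3.32)."""
    u_max = max([u_bar] + list(neighbor_averages))  # Eq. (3.26)
    u_min = min([u_bar] + list(neighbor_averages))  # Eq. (3.27)
    phi = 1.0
    for u_g in gauss_values:
        delta = u_g - u_bar
        if delta > 0.0:
            phi_g = min(1.0, (u_max - u_bar) / delta)
        elif delta < 0.0:
            phi_g = min(1.0, (u_min - u_bar) / delta)
        else:
            phi_g = 1.0
        phi = min(phi, phi_g)  # Eq. (3.32)
    return phi


# One Gauss point overshoots the neighbor maximum (1.4 > 1.2).
phi = barth_limiter(1.0, [0.5, 1.2, 1.1], [1.4, 0.9, 1.0])
limited_overshoot = 1.0 + phi * (1.4 - 1.0)
```

In this example the limiter scales the gradient so that the overshooting Gauss-point value is pulled back exactly to the neighbor maximum. The `min` and branch operations are the source of the non-differentiability discussed above.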

In general, an ideal limiter is differentiable, and acts firmly in the shock region not allowing

oscillatory behavior around discontinuities. Such a limiter also should be inactive in smooth

regions despite existence of non-monotone solutions resulting from smooth local extrema.


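The Venkatakrishnan limiter [86, 87, 4], which is semi-differentiable, nicely addresses most of the aforementioned issues, and it has been employed in this research:

$$
\Delta_{1,max} = U_{max} - \bar{U}_i, \qquad \Delta_{1,min} = U_{min} - \bar{U}_i, \qquad \Delta_2 = U_G - \bar{U}_i \tag{3.33}
$$

$$
\phi_G =
\begin{cases}
\dfrac{1}{\Delta_2}\left[\dfrac{(\Delta_{1,max}^2 + \epsilon^2)\Delta_2 + 2\Delta_2^2\,\Delta_{1,max}}{\Delta_{1,max}^2 + 2\Delta_2^2 + \Delta_{1,max}\Delta_2 + \epsilon^2}\right], & \text{if } \Delta_2 > 0 \\[3ex]
\dfrac{1}{\Delta_2}\left[\dfrac{(\Delta_{1,min}^2 + \epsilon^2)\Delta_2 + 2\Delta_2^2\,\Delta_{1,min}}{\Delta_{1,min}^2 + 2\Delta_2^2 + \Delta_{1,min}\Delta_2 + \epsilon^2}\right], & \text{if } \Delta_2 < 0
\end{cases}
\tag{3.34}
$$

$$
\text{sign}(\Delta_2) = \frac{\Delta_2}{|\Delta_2|}, \qquad \Delta_2 = \text{sign}(\Delta_2)\,(|\Delta_2| + \omega) \tag{3.35}
$$

$$
\epsilon^2 = (K \Delta x)^3 \tag{3.36}
$$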

To avoid division by zero or by a very small value in Eq. (3.34), as a practical measure $\Delta_2$ is replaced by Eq. (3.35), where $\omega$ is chosen to be $10^{-12}$ for 64-bit arithmetic computations (according to Ref. [87]). $\Delta x$ in Eq. (3.36) is the mesh length scale, and it can be picked as an average mesh length scale or a local mesh length scale. A local mesh length scale has been used for this research and it is defined as the diameter of the largest circle that may be inscribed into a local control volume. This length scale is proportional to the square root of the control volume area, and as a simplification, we assume the control volumes are equilateral triangles:

$$
\Delta x = 2\sqrt{\frac{\text{Area}}{3\sqrt{3}}} \tag{3.37}
$$

In smooth regions, $\Delta_{1,max}$, $\Delta_{1,min}$, and $\Delta_2$ are all on the order of $(\Delta x)^2$, either in constant regions or at extrema. Since $\epsilon^2$ is made proportional to $(\Delta x)^3$, $\epsilon^2$ dominates the $\Delta_{1,max}$, $\Delta_{1,min}$, and $\Delta_2$ terms and we recover the scheme without limiting [86, 87]. For instance, near an extremum (say a maximum), $\Delta_{1,max} = 0$ and $\Delta_2 \le 0$, therefore $\phi \to \epsilon^2 / \epsilon^2 \equiv 1$. For flat solutions, again the limiter will not fire. However, in general smooth cases the limiter value will remain close to 1.0 but not exactly 1.0.
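A minimal sketch of Eqs. (3.33) to (3.37) for a single Gauss point is shown below. The function name, the argument layout, and the choice to return 1 when the Gauss-point value equals the cell average (mirroring the third branch of the Barth limiter) are assumptions for this illustration; the default K = 1 is just one of the typical values, not a recommendation.

```python
import math


def venkat_phi(u_bar, u_max, u_min, u_g, area, k=1.0, omega=1e-12):
    """Venkatakrishnan limiter value at one Gauss point, Eqs. (3.33)-(3.37)."""
    dx = 2.0 * math.sqrt(area / (3.0 * math.sqrt(3.0)))  # Eq. (3.37), local length scale
    eps2 = (k * dx) ** 3                                  # Eq. (3.36)
    d2 = u_g - u_bar
    if d2 == 0.0:
        return 1.0  # assumption: no limiting when the reconstruction is flat
    d1 = (u_max - u_bar) if d2 > 0.0 else (u_min - u_bar)  # Eq. (3.33)
    d2 = math.copysign(abs(d2) + omega, d2)                # Eq. (3.35)
    num = (d1 * d1 + eps2) * d2 + 2.0 * d2 * d2 * d1
    den = d1 * d1 + 2.0 * d2 * d2 + d1 * d2 + eps2
    return num / (d2 * den)                                # Eq. (3.34)


# Strong overshoot on a fine mesh: eps2 is negligible and the limiter fires.
phi_shock = venkat_phi(1.0, 1.2, 0.5, 1.4, area=1.0e-6)
# Tiny variations on a coarse mesh: eps2 dominates and phi stays close to one.
phi_smooth = venkat_phi(1.0, 1.0 + 1e-8, 1.0 - 1e-8, 1.0 + 2e-8, area=1.0)
```

For the overshooting case this sketch gives a value slightly below the Barth limiter value of 0.5 for the same data, while for the smooth case the limiter is effectively switched off by the ε² term.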

In the shock region, $\Delta_{1,max}$, $\Delta_{1,min}$, and $\Delta_2$ are $O(U) \approx 1$; they dominate the $\epsilon^2$ term and the limiter fires as originally intended. The key point here is the value of the

constant K which is the coefficient of ∆x in Eq. (3.36). K determines the extent of the


monotonicity enforcement by setting a threshold below which oscillations in the solution

are not limited. A very large value of K essentially means no limiting at all, and it could

make the solution process unstable. Normally increasing K up to some value enhances

convergence characteristics as long as divergence does not occur. In contrast, a small value

for that constant slows or stops the convergence, although it produces a more monotonic solution. Typical values of K used in the literature are 0.3, 1, and 10, and in general the optimal

value both for convergence and accuracy purposes is determined based on experience.

Having computed the limiter value, we can directly multiply all the derivatives in the higher-

order reconstructed polynomial (linear and non-linear) by the limiter value as was described

previously for linear reconstruction:

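$$
\begin{aligned}
U_G^{(K)}(x_G, y_G) = U(x_c, y_c)
&+ \phi_i \left[ \left.\frac{\partial U}{\partial x}\right|_C \Delta x + \left.\frac{\partial U}{\partial y}\right|_C \Delta y \right] \\
&+ \phi_i \left[ \left.\frac{\partial^2 U}{\partial x^2}\right|_C \frac{\Delta x^2}{2} + \left.\frac{\partial^2 U}{\partial x \partial y}\right|_C \Delta x \Delta y + \left.\frac{\partial^2 U}{\partial y^2}\right|_C \frac{\Delta y^2}{2} \right] \\
&+ \phi_i \left[ \left.\frac{\partial^3 U}{\partial x^3}\right|_C \frac{\Delta x^3}{6} + \left.\frac{\partial^3 U}{\partial x^2 \partial y}\right|_C \frac{\Delta x^2 \Delta y}{2} + \left.\frac{\partial^3 U}{\partial x \partial y^2}\right|_C \frac{\Delta x \Delta y^2}{2} + \left.\frac{\partial^3 U}{\partial y^3}\right|_C \frac{\Delta y^3}{6} \right] \\
&+ \dots
\end{aligned}
\tag{3.38}
$$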

However, our experience as well as other research [22, 24] shows that applying the limiter to all gradients yields a more diffusive solution. The approach of [24, 23] is followed, where

the limiter is applied only to the linear part of the reconstruction, and the non-linear part

of the reconstruction is dropped. This also helps the limiting process around the shock as

higher derivatives tend to show oscillatory behavior in the vicinity of discontinuities. The

limited reconstruction polynomial is expressed as:

$$
U_G^{(K)}(x_G, y_G) = \bar{U}_i + \left[(1 - \sigma)\phi_i + \sigma\right] \text{Linear Part} + \sigma \, \text{Higher-Order Part} \tag{3.39}
$$

where σ is a limiter for higher-order terms. In smooth regions, the full higher-order recon-

struction is applied by choosing σ = 1. Near discontinuities we switch from the higher-order

to the limited linear polynomial to prevent possible oscillatory behavior of second and third

derivatives. This is done by a discontinuity detector employing the value of the limiter; if

the limiter fires aggressively we assign zero to σ. This idea is also shown for a simplified

1-D case in Fig(3.5) .

Figure 3.5: Typical unlimited/limited quadratic reconstruction: (a) an unlimited quadratic reconstruction; (b) a limited quadratic reconstruction

Figure 3.6: Defining σ as a function of φ (both axes range from 0 to 1)

Using a switch to set σ to either zero or one stalls the convergence. To overcome this problem, σ is defined as a differentiable function of φ, such that σ is nearly one in regions where φ ≥ φ0 and quickly goes toward zero for other values of φ; making σ differentiable greatly enhances convergence. As a differentiable switch, a smooth semi-step function is employed:

$$
\sigma = \frac{1 - \tanh\left(S(\phi_0 - \phi)\right)}{2} \tag{3.40}
$$

S in (3.40) determines the sharpness of the step function, and adjusts how fast the switching

transition between zero and one is; φ0 defines the limiter value that activates the switch.

It appears that φ0 = 0.8 and S = 20 provide a reasonable switch function whose good

behavior is relatively case independent in this research, Fig(3.6).
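The switch of Eq. (3.40) and the blended reconstruction of Eq. (3.39) can be sketched in a few lines. The function names are hypothetical, and the linear and higher-order parts of the reconstruction are assumed to have been evaluated at the Gauss point beforehand.

```python
import math


def sigma_switch(phi, phi0=0.8, s=20.0):
    """Smooth semi-step switch sigma(phi), Eq. (3.40), with the values quoted in the text."""
    return 0.5 * (1.0 - math.tanh(s * (phi0 - phi)))


def limited_value(u_bar, linear_part, higher_order_part, phi):
    """Limited reconstruction at a Gauss point, Eq. (3.39)."""
    sigma = sigma_switch(phi)
    return u_bar + ((1.0 - sigma) * phi + sigma) * linear_part + sigma * higher_order_part


smooth = limited_value(1.0, 0.10, 0.02, phi=1.0)   # close to the full higher-order value
shocked = limited_value(1.0, 0.10, 0.02, phi=0.2)  # close to the limited linear value
```

With φ near one the full higher-order polynomial is essentially recovered, while an aggressively firing limiter (φ = 0.2 here) leaves only the limited linear part.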

Chapter 4

Flux Jacobian

4.1 What is the Jacobian?

Any implicit formulation for solving a PDE includes some sort of Jacobian calculation. We

seek a solution vector X = (X1, X2, X3, ..., Xn) for a coupled non-linear system of algebraic

equations defined by:

F (X) = 0 (4.1)

where,

F = (F1, F2, F3, ..., Fn) (4.2)

The solution can be found via an iterative process (Newton’s method):

$$
\left.\frac{\partial F}{\partial X}\right|_{i} \delta X^{i+1} = -F(X^{i}), \qquad X^{i+1} = X^{i} + \delta X^{i+1} \tag{4.3}
$$

Since F is a vector operator, the derivatives of F with respect to the vector X are expressed as a matrix called the Jacobian matrix, $\partial F / \partial X$, such that the $(\partial F / \partial X)_{j,k}$ entry is the derivative of the j-th component of F with respect to the k-th component of X:

$$
\frac{\partial F}{\partial X} =
\begin{bmatrix}
J_{1,1} & J_{1,2} & \cdots & J_{1,n} \\
J_{2,1} & J_{2,2} & \cdots & J_{2,n} \\
\vdots & \vdots & & \vdots \\
J_{n,1} & J_{n,2} & \cdots & J_{n,n}
\end{bmatrix},
\qquad J_{j,k} = \frac{\partial F_j}{\partial X_k} \tag{4.4}
$$
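As an illustration of Eq. (4.3) and Eq. (4.4), the sketch below applies Newton's method to a small, hypothetical two-equation system with an analytical Jacobian. The system, the helper names, and the direct 2 × 2 solve are assumptions chosen only to keep the example self-contained; they are unrelated to the flow solver itself.

```python
def residual(x):
    """Hypothetical non-linear system F(X) = 0 with two unknowns."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def jacobian(x):
    """Analytical Jacobian, J[j][k] = dF_j / dX_k, as in Eq. (4.4)."""
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]


def solve_2x2(a, b):
    """Solve a 2 x 2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def newton(x, tol=1e-12, max_iter=20):
    """Newton iteration of Eq. (4.3); returns the solution and the residual history."""
    history = []
    for _ in range(max_iter):
        f = residual(x)
        norm = max(abs(c) for c in f)
        history.append(norm)
        if norm < tol:
            break
        dx = solve_2x2(jacobian(x), [-c for c in f])  # J dX = -F
        x = [x[0] + dx[0], x[1] + dx[1]]
    return x, history


solution, history = newton([2.0, 0.5])
```

Printing `history` shows the residual dropping by roughly the square of its previous value at each step once the iterate is close to the root, which is the quadratic convergence that requires the true Jacobian.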

Generally speaking, in an implicit formulation such as (4.3) the Jacobian matrix constitutes

the coefficient matrix and represents the linearization of a non-linear problem which must

be solved through an iterative process. In CFD the non-linear function F , determined by

the physics of the problem, is the residual of fluxes for the discretized domain. In other

words, we look for a solution vector U (conserved flow variables) which satisfies the flux

balance for all control volumes in a meshed domain:

Res(U) = 0 (4.5)

Equation (4.3) is similar to the implicit time advance formula, Eq. (2.12), except for the

diagonal term in (2.12) normalized by a time step (mainly added for stabilization purpose).

Equation (2.12) can be regarded as Newton’s formula with pseudo time stepping. In either

case, the Jacobian matrix is needed not only for forming the linear system but also for

building the preconditioner matrix, and it is the most expensive part of the implicit solver.

The convergence rate of the implicit solver depends on the accuracy and correctness of the

Jacobian matrix, and to achieve the quadratic convergence, we need to employ the true

flux Jacobian on the left hand side of Eq. (4.3). Since the Jacobian matrix consists of

the derivatives of the discretized governing equations with respect to the flow variables

used in the local discretization, the Jacobian entries in each row are zero except for those

few control volumes that contribute to that flux integral (i.e. union of the reconstruction

stencils of the neighbors). The level of difficulty in flux Jacobian computation depends

on the flux function (physics), the numerical flux formulation (i.e. how the flux is being

computed, such as Roe’s flux formula), type of the mesh, number of dimensions, and order

of accuracy. For instance, taking the derivative of the first order flux function for the 1-D linear wave equation, Eq. (2.15), in finite-volume formulation is fairly easy ($\partial F_i / \partial U_i = \lambda$ and $\partial F_i / \partial U_{i-1} = -\lambda$), since $(Flux)_i = \lambda(u_i - u_{i-1})$. But in general, for non-linear flux functions

and/or formulations, especially in multi dimensions and with unstructured meshes, the flux

Jacobian computation involves a large number of control volumes and much more work

per control volume. Consequently, increasing the order of accuracy not only adds to the


complexity of flux Jacobian computation but also makes it considerably costly both in terms

of computation time and memory usage.

The Jacobian matrix can be built by either analytical or numerical (finite difference) dif-

ferentiation; both approaches have been employed quite successfully [84]. Analytical differ-

entiation can be performed either manually, symbolically [94] or automatically [78]. The

analytical approach computes the exact Jacobian (ignoring round off error in numerical

evaluation) but it is relatively difficult to apply this approach for complicated flux functions

and/or higher-order discretizations by manual or even symbolic differentiation. Automatic

differentiation (AD), a very powerful tool for computing complex Jacobians, augments the

computer program to compute the derivatives by applying the chain rule repeatedly to the

elementary arithmetic operations performed by the computer program. The derivatives are

computed relatively cheaply by this method and are accurate to machine accuracy without

any truncation error. However, AD requires very careful programming and a considerable

amount of memory [20], and has not been considered in this research. Numerical differ-

entiation is fairly straightforward even for higher-order and complicated flux functions. It

also works reasonably well where the flux function is not differentiable, but like any other

numerical approach, the numerical differentiation technique suffers from truncation error as

well as round off error.
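A minimal sketch of numerical (finite difference) differentiation is given below: each Jacobian column is obtained by perturbing one unknown and differencing the residual. The function names, the forward-difference formula, the fixed perturbation size, and the small test system are assumptions for this illustration.

```python
def fd_jacobian(func, x, eps=1e-7):
    """Forward-difference Jacobian: column k is (F(x + eps * e_k) - F(x)) / eps."""
    f0 = func(x)
    n = len(x)
    jac = [[0.0] * n for _ in range(len(f0))]
    for k in range(n):
        xp = list(x)
        xp[k] += eps  # perturb only the k-th unknown
        fk = func(xp)
        for j in range(len(f0)):
            jac[j][k] = (fk[j] - f0[j]) / eps
    return jac


def sample_residual(x):
    """Hypothetical residual whose exact Jacobian is [[2x, 2y], [y, x]]."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


jac = fd_jacobian(sample_residual, [2.0, 0.5])
```

The result agrees with the exact Jacobian only to about the size of the perturbation; a smaller `eps` reduces the truncation error but amplifies round off error, which is the trade-off mentioned above.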

For Krylov linear solvers such as GMRES, only products of the Jacobian matrix

by some vectors are needed, and explicit computation of the Jacobian matrix can be avoided.

These products are computed using directional finite differencing. Therefore the higher-

order Jacobian of complex CFD problems can be employed on the left hand side of the

implicit formula to match the true order of accuracy of the residual on the right hand

side, Eq. (2.12), without excessive computational effort and memory usage. However, in

most cases, an approximate Jacobian matrix is required explicitly for preconditioning of

the linear system solver. It is possible to use a simplified or first-order Jacobian (based on

direct neighbors and mostly first-order) for preconditioning of the linear system resulting

from higher-order discretization [51, 50].
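The directional finite difference behind the matrix-free approach can be sketched as follows: the product of the Jacobian with a vector v is approximated from two residual evaluations, without ever forming the matrix. The function names and the particular scaling of the perturbation are assumptions (a simple heuristic chosen for this example), not the formula used in the thesis.

```python
import math


def jacobian_vector_product(func, x, v, eps_rel=1e-7):
    """Approximate J(x) v by (F(x + eps * v) - F(x)) / eps."""
    f0 = func(x)
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_v == 0.0:
        return [0.0] * len(f0)
    norm_x = math.sqrt(sum(c * c for c in x))
    eps = eps_rel * (1.0 + norm_x) / norm_v  # assumed heuristic for the step size
    f1 = func([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(f1, f0)]


def sample_residual(x):
    """Hypothetical residual whose exact Jacobian is [[2x, 2y], [y, x]]."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


jv = jacobian_vector_product(sample_residual, [2.0, 0.5], [1.0, -2.0])
```

Each Krylov iteration then costs one extra residual evaluation, and the residual can be the full higher-order one even though only a first-order Jacobian is stored for preconditioning.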

In this chapter, first, the lower-order Euler Jacobian computation procedure is presented

for an unstructured mesh stencil based on Roe’s flux formula. This flux Jacobian is used

for preconditioning of the linear system solver. The analytical and numerical approaches

for flux differentiation are discussed step by step.


Figure 4.1: Schematic of Direct Neighbors

4.2 Flux Jacobian Formulation

In the case of the 2D Euler equations discretized over a cell-centered unstructured mesh, each

control volume has 3 direct (first) neighbors, and the first-order flux integral of the control

volume i only depends on these three neighbors and the control volume itself (Fig(4.1)):

$$
R_i = \sum_{m=1,2,3} (F\,n\,ds)_m = F(U_i, U_{N_1})\,n_1 l_1 + F(U_i, U_{N_2})\,n_2 l_2 + F(U_i, U_{N_3})\,n_3 l_3 \tag{4.6}
$$

where, nm and lm are the outward unit normal and the length of the face m of the control

volume i respectively. To compute the flux Jacobian of the control volume i, we need to

take the derivatives of the flux integral or residual function, Eq. (4.6), with respect to

all control volume averages involved in the residual function evaluation as shown in the

following formulas:

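$$
J(i, N_1) = \frac{\partial R_i}{\partial U_{N_1}} = \frac{\partial F(U_i, U_{N_1})}{\partial U_{N_1}}\, n_1 l_1 \tag{4.7}
$$

$$
J(i, N_2) = \frac{\partial R_i}{\partial U_{N_2}} = \frac{\partial F(U_i, U_{N_2})}{\partial U_{N_2}}\, n_2 l_2 \tag{4.8}
$$

$$
J(i, N_3) = \frac{\partial R_i}{\partial U_{N_3}} = \frac{\partial F(U_i, U_{N_3})}{\partial U_{N_3}}\, n_3 l_3 \tag{4.9}
$$

$$
J(i, i) = \frac{\partial R_i}{\partial U_i} = \frac{\partial F(U_i, U_{N_1})}{\partial U_i}\, n_1 l_1 + \frac{\partial F(U_i, U_{N_2})}{\partial U_i}\, n_2 l_2 + \frac{\partial F(U_i, U_{N_3})}{\partial U_i}\, n_3 l_3 \tag{4.10}
$$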

Since both the flux F and solution U are 4-component vectors, each entry in the Jacobian

matrix J is a 4 × 4 matrix and the total size of the block matrix is n × n where n is the

total number of control volumes as shown in Eq. (4.11). However, most of these blocks are

just zeros as in general there are no more than three neighboring control volumes in the

first-order cell-center formulation and the Jacobian matrix is very sparse, with at most four

non-zero blocks per row.

$$
J =
\begin{bmatrix}
[x]_{1,1} & \cdots & [x]_{1,k} & \cdots & [x]_{1,n} \\
[x]_{2,1} & \cdots & [x]_{2,k} & \cdots & [x]_{2,n} \\
\vdots & & \vdots & & \vdots \\
[x]_{n,1} & \cdots & [x]_{n,k} & \cdots & [x]_{n,n}
\end{bmatrix},
\qquad
[x]_{j,k} =
\begin{bmatrix}
x & x & x & x \\
x & x & x & x \\
x & x & x & x \\
x & x & x & x
\end{bmatrix}_{j,k}
\tag{4.11}
$$

Figure (4.2) displays typical numbering and the resulting connectivity between triangles in a

sample unstructured mesh. For this sample mesh, the only non-zero blocks in the Jacobian

matrix corresponding to control volume 42 (row of 42) are (42,38), (42,42), (42,49), and

(42,57); for this row, two non-zero blocks are above the diagonal block. For the row 54, the

non-zero blocks are (54,43), (54,45), (54,54), and (54,65); this time two non-zero blocks are

below the diagonal block.
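The block sparsity pattern follows directly from the mesh connectivity. The short sketch below builds the list of non-zero block columns per row from a neighbor table; the dictionary layout and function name are assumptions, and only the two rows quoted in the text are included.

```python
def nonzero_block_columns(neighbors):
    """Map each control volume (block row) to the sorted columns of its non-zero 4 x 4 blocks."""
    # The diagonal block (cv, cv) is always present; each direct neighbor adds one block.
    return {cv: sorted([cv] + list(nbrs)) for cv, nbrs in neighbors.items()}


# Direct neighbors of control volumes 42 and 54, as quoted for the sample mesh.
connectivity = {42: [38, 49, 57], 54: [43, 45, 65]}
pattern = nonzero_block_columns(connectivity)
upper_42 = [c for c in pattern[42] if c > 42]  # blocks in the upper triangle of row 42
lower_54 = [c for c in pattern[54] if c < 54]  # blocks in the lower triangle of row 54
```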

In the next two subsections, the analytic derivation of the first-order flux Jacobian entries for both interior and boundary fluxes is described in detail.

Figure 4.2: Typical cell-centered mesh numbering

4.2.1 Roe’s Flux Jacobian

After introducing the structure of the flux Jacobian, we need to compute the derivatives of

the fluxes, i.e. Roe’s flux function. The flux at the face between cell i and N1 is:

$$
F(U_i, U_{N_1}) = \frac{1}{2}\left[\left(F(U_i) + F(U_{N_1})\right) - \left|\tilde{A}\right| (U_{N_1} - U_i)\right] \tag{4.12}
$$

The Euler flux formula, $F(U)$, and its derivative with respect to the conserved variables, $\partial F / \partial U$, were described by Eq. (2.2) and Eq. (2.16). The matrix $\tilde{A}$, the Jacobian of the Euler flux evaluated at Roe's averaged properties, was also introduced by Eq. (2.19). Roe's average formulas (tilde form) for density, velocity components, and total enthalpy are the same as those introduced in section (2.3.2), but this time they have been rearranged to reduce the evaluation cost.

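\[
\left|\tilde{A}\right| = L\left|\tilde{\lambda}\right|R \tag{4.13}
\]

\[
L =
\begin{bmatrix}
1 & 0 & \dfrac{1}{\tilde{C}^2} & \dfrac{1}{\tilde{C}^2} \\
\tilde{u} & -n_y & \dfrac{\tilde{u}}{\tilde{C}^2} + \dfrac{n_x}{\tilde{C}} & \dfrac{\tilde{u}}{\tilde{C}^2} - \dfrac{n_x}{\tilde{C}} \\
\tilde{v} & n_x & \dfrac{\tilde{v}}{\tilde{C}^2} + \dfrac{n_y}{\tilde{C}} & \dfrac{\tilde{v}}{\tilde{C}^2} - \dfrac{n_y}{\tilde{C}} \\
\tilde{Q} & \tilde{u}_t & \dfrac{1}{\beta} + \dfrac{\tilde{u}_n}{\tilde{C}} + \dfrac{\tilde{Q}}{\tilde{C}^2} & \dfrac{1}{\beta} - \dfrac{\tilde{u}_n}{\tilde{C}} + \dfrac{\tilde{Q}}{\tilde{C}^2}
\end{bmatrix} \tag{4.14}
\]

\[
\left|\tilde{\lambda}\right| =
\begin{bmatrix}
|\tilde{u}_n| & 0 & 0 & 0 \\
0 & |\tilde{u}_n| & 0 & 0 \\
0 & 0 & |\tilde{u}_n + \tilde{C}| & 0 \\
0 & 0 & 0 & |\tilde{u}_n - \tilde{C}|
\end{bmatrix} \tag{4.15}
\]

\[
R =
\begin{bmatrix}
1 - \dfrac{\beta\tilde{Q}}{\tilde{C}^2} & \dfrac{\beta\tilde{u}}{\tilde{C}^2} & \dfrac{\beta\tilde{v}}{\tilde{C}^2} & -\dfrac{\beta}{\tilde{C}^2} \\
-\tilde{u}_t & -n_y & n_x & 0 \\
\frac{1}{2}(-\tilde{C}\tilde{u}_n + \beta\tilde{Q}) & \frac{1}{2}(n_x\tilde{C} - \beta\tilde{u}) & \frac{1}{2}(n_y\tilde{C} - \beta\tilde{v}) & \frac{1}{2}\beta \\
\frac{1}{2}(\tilde{C}\tilde{u}_n + \beta\tilde{Q}) & -\frac{1}{2}(n_x\tilde{C} + \beta\tilde{u}) & -\frac{1}{2}(n_y\tilde{C} + \beta\tilde{v}) & \frac{1}{2}\beta
\end{bmatrix} \tag{4.16}
\]

\[
\beta = \gamma - 1, \qquad \gamma = \frac{C_p}{C_v} \tag{4.17}
\]

\[
\tilde{\rho} = \sqrt{\rho_L \rho_R} \tag{4.18}
\]

\[
\tilde{u} = \frac{\rho_L u_L + \tilde{\rho}\, u_R}{\rho_L + \tilde{\rho}} \tag{4.19}
\]

\[
\tilde{v} = \frac{\rho_L v_L + \tilde{\rho}\, v_R}{\rho_L + \tilde{\rho}} \tag{4.20}
\]

\[
\tilde{h}_t = \frac{\rho_L h_{tL} + \tilde{\rho}\, h_{tR}}{\rho_L + \tilde{\rho}} \tag{4.21}
\]

\[
\tilde{Q} = \frac{1}{2}\left(\tilde{u}^2 + \tilde{v}^2\right) \tag{4.22}
\]

\[
\tilde{C} = \sqrt{\beta\left(\tilde{h}_t - \tilde{Q}\right)} \tag{4.23}
\]

\[
\tilde{u}_t = -n_y\tilde{u} + n_x\tilde{v} \tag{4.24}
\]

\[
\tilde{u}_n = n_x\tilde{u} + n_y\tilde{v} \tag{4.25}
\]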
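As a check on Eqs. (4.13) through (4.25), the following minimal sketch assembles the Roe-averaged state and the matrices $L$, $|\tilde{\lambda}|$ and $R$, and verifies numerically that $LR$ is the identity for a unit normal. The function names, the sample left and right states, and the value $\gamma = 1.4$ are assumptions made for illustration only.

```python
import math

def roe_average(rho_l, u_l, v_l, ht_l, rho_r, u_r, v_r, ht_r, gamma=1.4):
    """Roe-averaged state with the rearranged weights of Eqs. (4.18)-(4.23)."""
    beta = gamma - 1.0
    rho = math.sqrt(rho_l * rho_r)       # Eq. (4.18)
    w = rho_l + rho                      # shared denominator of Eqs. (4.19)-(4.21)
    u = (rho_l * u_l + rho * u_r) / w
    v = (rho_l * v_l + rho * v_r) / w
    ht = (rho_l * ht_l + rho * ht_r) / w
    q = 0.5 * (u * u + v * v)            # Eq. (4.22)
    c = math.sqrt(beta * (ht - q))       # Eq. (4.23)
    return u, v, q, c

def eigen_system(u, v, q, c, nx, ny, gamma=1.4):
    """Return L, the diagonal of |lambda|, and R from Eqs. (4.14)-(4.16)."""
    beta = gamma - 1.0
    ut = -ny * u + nx * v
    un = nx * u + ny * v
    c2 = c * c
    L = [[1.0, 0.0, 1.0 / c2, 1.0 / c2],
         [u, -ny, u / c2 + nx / c, u / c2 - nx / c],
         [v, nx, v / c2 + ny / c, v / c2 - ny / c],
         [q, ut, 1.0 / beta + un / c + q / c2, 1.0 / beta - un / c + q / c2]]
    lam = [abs(un), abs(un), abs(un + c), abs(un - c)]
    R = [[1.0 - beta * q / c2, beta * u / c2, beta * v / c2, -beta / c2],
         [-ut, -ny, nx, 0.0],
         [0.5 * (-c * un + beta * q), 0.5 * (nx * c - beta * u),
          0.5 * (ny * c - beta * v), 0.5 * beta],
         [0.5 * (c * un + beta * q), -0.5 * (nx * c + beta * u),
          -0.5 * (ny * c + beta * v), 0.5 * beta]]
    return L, lam, R

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def abs_roe_matrix(L, lam, R):
    """|A| = L |lambda| R, Eq. (4.13)."""
    scaled = [[lam[i] * R[i][j] for j in range(4)] for i in range(4)]
    return matmul(L, scaled)

# Hypothetical states (density, u, v, total enthalpy) and a unit face normal.
left = (1.0, 0.5, 0.1, 3.0)
right = (0.8, 0.6, 0.0, 3.1)
nx, ny = 0.6, 0.8

u, v, q, c = roe_average(*left, *right)
L, lam, R = eigen_system(u, v, q, c, nx, ny)
LR = matmul(L, R)
identity_error = max(abs(LR[i][j] - (1.0 if i == j else 0.0))
                     for i in range(4) for j in range(4))
abs_A = abs_roe_matrix(L, lam, R)
print("max |LR - I| =", identity_error)
```

The rearranged averages need a single square root, $\sqrt{\rho_L\rho_R}$, instead of the two square roots of the classical weighting, which is the cost saving the text refers to.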


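The flux differencing term in Eq. (4.12), $|\tilde{A}|(U_{N_1} - U_i)$, can be recast in the form of $|\tilde{A}|\Delta U$, and the full derivative of this term with respect to the solution vector (in general form) is:

\[
\frac{\partial\left(|\tilde{A}|\Delta U\right)}{\partial U} =
\overbrace{\frac{\partial|\tilde{A}|}{\partial U}\Delta U}^{1} +
\overbrace{|\tilde{A}|\frac{\partial(\Delta U)}{\partial U}}^{2} \tag{4.26}
\]

Now, we take a look at the first term in Eq. (4.26):

\[
\frac{\partial|\tilde{A}|}{\partial U}\Delta U =
\frac{\partial\left(L|\tilde{\lambda}|R\right)}{\partial U}\Delta U =
\left[\frac{\partial L}{\partial U}|\tilde{\lambda}|R + L\frac{\partial|\tilde{\lambda}|}{\partial U}R + L|\tilde{\lambda}|\frac{\partial R}{\partial U}\right]\Delta U \tag{4.27}
\]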

Differentiation of $\partial|\tilde{A}|/\partial U$ produces third-rank tensors, which are not only difficult to derive but also quite expensive to compute. Barth [9] has derived the full derivative $\partial(|\tilde{A}|\Delta U)/\partial U$ with some clever modifications to eliminate the tensor computations, reducing the complexity of the Jacobian computation to some degree. Through spectral radius analysis for 1-D flow he showed that for a smooth flow the approximate Jacobian can be accurate up to CFL=1000 or even above. However, for the shock tube problem the difference between the true Jacobian and the approximate Jacobian grows after CFL=10 and becomes noticeable after CFL=100, showing that the approximate Jacobian will not be accurate enough for larger CFL numbers. Looking back at Eq. (4.26), a similar physical insight can be drawn. For a smooth flow, the magnitude of $\frac{\partial|\tilde{A}|}{\partial U}\Delta U$ is very small relative to the magnitude of $|\tilde{A}|\frac{\partial(\Delta U)}{\partial U}$, and the resulting approximate Jacobian will be acceptable. In the case of a flow with a discontinuity, this approximation is no longer accurate because the entries of both $\partial|\tilde{A}|/\partial U$ and $\Delta U$ are large, at least in some regions of the flow field; only where $\Delta U$ is very small (i.e. in smooth regions of the flow field) does the approximation retain its accuracy. Therefore applying a large CFL number for solution acceleration is not necessarily useful when the overall flow linearization is relatively inaccurate.

Ignoring changes in $\tilde{A}$ (treating it as a constant) greatly simplifies the differentiation of Roe's flux and reduces the overall Jacobian computation cost; such a differentiation of Eq. (4.12) with respect to cell $i$ and $N_1$ is called the approximate Roe's flux Jacobian and can be written in the following form:

\[
\frac{\partial F(U_i, U_{N_1})}{\partial U_{N_1}} = \frac{1}{2}\left[\frac{\partial F(U_{N_1})}{\partial U_{N_1}} - \left|\tilde{A}\right|\right] \tag{4.28}
\]

\[
\frac{\partial F(U_i, U_{N_1})}{\partial U_i} = \frac{1}{2}\left[\frac{\partial F(U_i)}{\partial U_i} + \left|\tilde{A}\right|\right] \tag{4.29}
\]

The other Jacobian terms in Eq. (4.8) through Eq. (4.10) can be derived easily by repeating

similar differentiation.

Because of the difficulties in Jacobian calculation, especially for complicated flux functions such as Roe's, several researchers [83, 93] have successfully employed simpler flux functions like Steger-Warming [77] for linearization purposes (building the Jacobian on the left hand side), while the right hand side (the residual) is still evaluated based on Roe's flux. A similar approach can be taken for preconditioning of the left hand side where the explicit Jacobian matrix is not required (matrix-free methods), but a preconditioner matrix is still needed to enhance the performance of the linear solver [22, 61]. In either case, due to the inconsistency between the left and right hand sides (i.e. linearization and flux evaluation), the resultant convergence rate per iteration will not be as fast as if a consistent Jacobian were employed.

4.2.2 Boundary Flux Jacobians

To have a true implicit time-advance and/or Newton formulation, the linearization must be

extended to the boundary fluxes. Both the Jacobian matrix and the preconditioner matrix

(in our case the first-order Jacobian) need to include derivatives of the boundary fluxes as

well. Otherwise, the implicit formulation does not converge as fast as it should and/or some

stability issues may well occur if large CFL numbers are attempted [79, 32]. The goal here

is to compute the Jacobian matrix of the boundary Euler flux with respect to conservative

variables. Since on some occasions boundary conditions (section 2.4) are implemented based

on the primitive variables (V ) rather than conservative variables (U), it is more convenient

to compute the Jacobian of the Euler analytic flux with respect to primitive variables and

then convert that Jacobian to the conservative format using the chain rule:

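\[
\frac{\partial F}{\partial U} = \frac{\partial F}{\partial V}\frac{\partial V}{\partial U} \tag{4.30}
\]

\[
\frac{\partial F}{\partial V} =
\begin{bmatrix}
u_n & \rho n_x & \rho n_y & 0 \\
u u_n & \rho(u_n + u n_x) & \rho u n_y & n_x \\
v u_n & \rho v n_x & \rho(u_n + v n_y) & n_y \\
h_t u_n - \dfrac{\gamma P u_n}{\rho(\gamma - 1)} & \rho(u u_n + h_t n_x) & \rho(v u_n + h_t n_y) & \dfrac{\gamma}{\gamma - 1}u_n
\end{bmatrix} \tag{4.31}
\]

\[
\frac{\partial V}{\partial U} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
-\dfrac{u}{\rho} & \dfrac{1}{\rho} & 0 & 0 \\
-\dfrac{v}{\rho} & 0 & \dfrac{1}{\rho} & 0 \\
\frac{1}{2}(u^2 + v^2)(\gamma - 1) & -u(\gamma - 1) & -v(\gamma - 1) & (\gamma - 1)
\end{bmatrix} \tag{4.32}
\]

\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ E \end{bmatrix}, \qquad
V = \begin{bmatrix} \rho \\ u \\ v \\ P \end{bmatrix}, \qquad
F = \begin{bmatrix} \rho u_n \\ \rho u u_n + P n_x \\ \rho v u_n + P n_y \\ \rho h_t u_n \end{bmatrix} \tag{4.33}
\]

\[
e_t = e + e_k, \quad e = C_v T = \frac{P}{\rho(\gamma - 1)}, \quad e_k = \frac{1}{2}(u^2 + v^2), \quad E = \rho e_t, \quad h = e + \frac{P}{\rho}, \quad h_t = h + e_k \tag{4.34}
\]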
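The chain rule of Eq. (4.30) can be exercised with a short sketch that multiplies the two matrices of Eqs. (4.31) and (4.32). As a sanity check it uses the known property that the ideal-gas Euler flux is homogeneous of degree one in $U$, so that $(\partial F/\partial U)\,U = F$. The function names, the sample state, and $\gamma = 1.4$ are illustrative assumptions.

```python
def euler_flux(V, nx, ny, gamma=1.4):
    """Analytic Euler flux of Eq. (4.33) from primitive variables."""
    rho, u, v, p = V
    un = nx * u + ny * v
    ht = gamma * p / (rho * (gamma - 1.0)) + 0.5 * (u * u + v * v)
    return [rho * un, rho * u * un + p * nx, rho * v * un + p * ny, rho * ht * un]

def dF_dV(V, nx, ny, gamma=1.4):
    """Eq. (4.31)."""
    rho, u, v, p = V
    g1 = gamma - 1.0
    un = nx * u + ny * v
    ht = gamma * p / (rho * g1) + 0.5 * (u * u + v * v)
    return [[un, rho * nx, rho * ny, 0.0],
            [u * un, rho * (un + u * nx), rho * u * ny, nx],
            [v * un, rho * v * nx, rho * (un + v * ny), ny],
            [ht * un - gamma * p * un / (rho * g1),
             rho * (u * un + ht * nx), rho * (v * un + ht * ny), gamma * un / g1]]

def dV_dU(V, gamma=1.4):
    """Eq. (4.32)."""
    rho, u, v, _ = V
    g1 = gamma - 1.0
    return [[1.0, 0.0, 0.0, 0.0],
            [-u / rho, 1.0 / rho, 0.0, 0.0],
            [-v / rho, 0.0, 1.0 / rho, 0.0],
            [0.5 * (u * u + v * v) * g1, -u * g1, -v * g1, g1]]

def dF_dU(V, nx, ny, gamma=1.4):
    """Eq. (4.30): product of the two Jacobians above."""
    a, b = dF_dV(V, nx, ny, gamma), dV_dU(V, gamma)
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical state and unit normal.
gamma = 1.4
V = (1.2, 0.4, -0.3, 0.9)
rho, u, v, p = V
U = [rho, rho * u, rho * v, p / (gamma - 1.0) + 0.5 * rho * (u * u + v * v)]
nx, ny = 0.6, 0.8

J = dF_dU(V, nx, ny, gamma)
F = euler_flux(V, nx, ny, gamma)
JU = [sum(J[i][j] * U[j] for j in range(4)) for i in range(4)]
homogeneity_error = max(abs(JU[i] - F[i]) for i in range(4))
print("max |J U - F| =", homogeneity_error)
```

The same two-step construction applies to each boundary type below: only $\partial F/\partial V$ changes, while $\partial V/\partial U$ is reused.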

4.2.2.1 Wall Boundary Flux Jacobian

The wall boundary condition for the first-order Euler flux can be satisfied by setting all

normal fluxes equal to zero (un = 0), section 2.4.1, and the wall flux Jacobian is derived by

taking the derivative of this simplified wall flux:

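\[
F_W = \begin{bmatrix} 0 \\ P n_x \\ P n_y \\ 0 \end{bmatrix} \tag{4.35}
\]

\[
\left.\frac{\partial F}{\partial V}\right|_W =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & n_x \\
0 & 0 & 0 & n_y \\
0 & 0 & 0 & 0
\end{bmatrix}, \qquad
\left.\frac{\partial F}{\partial U}\right|_W = \left.\frac{\partial F}{\partial V}\right|_W \frac{\partial V}{\partial U} \tag{4.36}
\]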

4.2.2.2 Subsonic Inlet Flux Jacobian

As explained in section 2.4.2, total pressure, total temperature, and angle of attack are set at a subsonic inlet as constant inlet conditions and the static pressure is taken from the interior as a parameter. This time it is easier to rewrite the Euler flux in terms of total pressure, total temperature, static pressure, and angle of attack and then proceed with taking derivatives. The velocity magnitude, $V_m$, can be defined as a function of total temperature, and the normal velocity component is described in terms of the velocity magnitude and angle of attack:

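\[
V_m = \left[\left(\frac{2}{\gamma - 1}\right)\left(\frac{T_t}{T} - 1\right)\right]^{\frac{1}{2}} \tag{4.37}
\]

\[
u = V_m\cos\alpha = \left[\left(\frac{2}{\gamma - 1}\right)\left(\frac{T_t}{T} - 1\right)\right]^{\frac{1}{2}}\cos\alpha \tag{4.38}
\]

\[
v = V_m\sin\alpha = \left[\left(\frac{2}{\gamma - 1}\right)\left(\frac{T_t}{T} - 1\right)\right]^{\frac{1}{2}}\sin\alpha \tag{4.39}
\]

\[
u_n = \left[\left(\frac{2}{\gamma - 1}\right)\left(\frac{T_t}{T} - 1\right)\right]^{\frac{1}{2}}\left(\cos\alpha\, n_x + \sin\alpha\, n_y\right) \tag{4.40}
\]

Now, we write density in terms of total temperature, total pressure and static pressure:

\[
\rho = \frac{\gamma P}{T}, \qquad T = T_t\left(\frac{P}{P_t}\right)^{\frac{\gamma - 1}{\gamma}} \tag{4.41}
\]

\[
\rho = \frac{\gamma P}{T_t\left(\dfrac{P}{P_t}\right)^{\frac{\gamma - 1}{\gamma}}} \tag{4.42}
\]

And finally $h_t$ is rewritten in terms of total pressure and static pressure:

\[
h_t = \frac{\gamma P}{(\gamma - 1)\,\dfrac{\gamma P}{T_t\left(\frac{P}{P_t}\right)^{\frac{\gamma - 1}{\gamma}}}} + \frac{1}{2}\left(\left(\frac{2}{\gamma - 1}\right)\left(\left(\frac{P_t}{P}\right)^{\frac{\gamma - 1}{\gamma}} - 1\right)\right)
\]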

The Euler flux vector is recast in its new format after substitution of ρ, u, v, un, ht with

their new definitions:

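\[
F = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \end{bmatrix} \tag{4.43}
\]

\[
F_1 = \frac{\gamma C_2}{T_t P_t^{-C_1}}\left[\frac{2}{\beta}\left(P_t^{C_1}P^{(2 - 3C_1)} - P^{(2 - 2C_1)}\right)\right]^{\frac{1}{2}} \tag{4.44}
\]

\[
F_2 = \frac{\gamma C_2\cos\alpha}{T_t P_t^{-C_1}}\left[\frac{2}{\beta}\left(P_t^{C_1}P^{(1 - 2C_1)} - P^{(1 - C_1)}\right)\right] + P n_x \tag{4.45}
\]

\[
F_3 = \frac{\gamma C_2\sin\alpha}{T_t P_t^{-C_1}}\left[\frac{2}{\beta}\left(P_t^{C_1}P^{(1 - 2C_1)} - P^{(1 - C_1)}\right)\right] + P n_y \tag{4.46}
\]

\[
F_4 = \frac{\gamma C_2}{T_t P_t^{-C_1}}\left[\frac{T_t P_t^{-C_1}P}{\beta} + \frac{1}{\beta}\left(P_t^{C_1}P^{(1 - 2C_1)} - P^{(1 - C_1)}\right)\right]\left[\frac{2}{\beta}\left(P_t^{C_1}P^{-C_1} - 1\right)\right]^{\frac{1}{2}} \tag{4.47}
\]

\[
\beta = \gamma - 1, \qquad C_1 = \frac{\beta}{\gamma}, \qquad C_2 = \cos\alpha\, n_x + \sin\alpha\, n_y \tag{4.48}
\]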

Total pressure, total temperature and angle of attack for a steady inlet are constant, and $\partial\eta/\partial V$, where $\eta$ is the inlet variable vector, is greatly simplified, Eq. (4.49). Applying the chain rule (Eq. (4.50)), and since the only apparent primitive variable in the subsonic inlet flux vector is the static pressure, $P$, the first three columns of the subsonic inlet flux Jacobian will be zero and only the 4th column needs to be evaluated.

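\[
\frac{\partial\eta}{\partial V} =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}, \qquad
\eta = \begin{bmatrix} P_t \\ T_t \\ \alpha \\ P \end{bmatrix} \tag{4.49}
\]

\[
\frac{\partial F}{\partial V} = \frac{\partial F}{\partial\eta}\frac{\partial\eta}{\partial V} \tag{4.50}
\]

\[
\left.\frac{\partial F}{\partial V}\right|_{SubIn} =
\begin{bmatrix}
0 & 0 & 0 & \dfrac{\partial F_1}{\partial P} \\
0 & 0 & 0 & \dfrac{\partial F_2}{\partial P} \\
0 & 0 & 0 & \dfrac{\partial F_3}{\partial P} \\
0 & 0 & 0 & \dfrac{\partial F_4}{\partial P}
\end{bmatrix} \tag{4.51}
\]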

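\[
\frac{\partial F_1}{\partial P} = \frac{1}{2}\left(\frac{\gamma C_2}{T_t P_t^{-C_1}}\right)\left(\frac{2}{\beta}\left[P_t^{C_1}P^{(2 - 3C_1)} - P^{(2 - 2C_1)}\right]\right)^{-\frac{1}{2}} \times \left(\frac{2}{\beta}\left[(2 - 3C_1)P_t^{C_1}P^{(1 - 3C_1)} - 2(1 - C_1)P^{(1 - 2C_1)}\right]\right) \tag{4.52}
\]

\[
\frac{\partial F_2}{\partial P} = \frac{\gamma C_2\cos\alpha}{T_t P_t^{-C_1}}\left[\frac{2}{\beta}\left((1 - 2C_1)P_t^{C_1}P^{-2C_1} - (1 - C_1)P^{-C_1}\right)\right] + n_x \tag{4.53}
\]

\[
\frac{\partial F_3}{\partial P} = \frac{\gamma C_2\sin\alpha}{T_t P_t^{-C_1}}\left[\frac{2}{\beta}\left((1 - 2C_1)P_t^{C_1}P^{-2C_1} - (1 - C_1)P^{-C_1}\right)\right] + n_y \tag{4.54}
\]

\[
\frac{\partial F_4}{\partial P} = \frac{\sqrt{2}\,C_2\gamma\sqrt{P_t^{C_1}P^{-C_1} - 1}}{\beta\sqrt{\beta}\,T_t P_t^{-C_1}}\left[T_t P_t^{-C_1} + (1 - 2C_1)P_t^{C_1}P^{-2C_1} - (1 - C_1)P^{-C_1}\right] - \frac{\sqrt{2}\,C_2 C_1\gamma P_t^{C_1}P^{-C_1}}{2\beta\sqrt{\beta}\,T_t P_t^{-C_1}\sqrt{P_t^{C_1}P^{-C_1} - 1}}\left[T_t P_t^{-C_1} + P_t^{C_1}P^{-2C_1} - P^{-C_1}\right] \tag{4.55}
\]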
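Because the expressions in Eqs. (4.52) through (4.55) are lengthy, it is worth checking them against a finite difference of the fluxes in Eqs. (4.44) through (4.47). The sketch below does this for $F_1$, $F_2$ and $F_4$ with central differencing; the function names, the sample inlet values, and $\gamma = 1.4$ are assumptions chosen only for the check.

```python
import math

GAMMA = 1.4
BETA = GAMMA - 1.0
C1 = BETA / GAMMA

def inlet_fluxes(P, Pt, Tt, alpha, nx, ny):
    """F1, F2 and F4 of Eqs. (4.44), (4.45) and (4.47)."""
    C2 = math.cos(alpha) * nx + math.sin(alpha) * ny
    K = GAMMA * C2 / (Tt * Pt ** (-C1))
    F1 = K * math.sqrt(2.0 / BETA * (Pt ** C1 * P ** (2 - 3 * C1) - P ** (2 - 2 * C1)))
    F2 = (K * math.cos(alpha) * (2.0 / BETA)
          * (Pt ** C1 * P ** (1 - 2 * C1) - P ** (1 - C1)) + P * nx)
    F4 = (K * (Tt * Pt ** (-C1) * P / BETA
               + (Pt ** C1 * P ** (1 - 2 * C1) - P ** (1 - C1)) / BETA)
          * math.sqrt(2.0 / BETA * (Pt ** C1 * P ** (-C1) - 1.0)))
    return F1, F2, F4

def inlet_flux_derivatives(P, Pt, Tt, alpha, nx, ny):
    """dF1/dP, dF2/dP and dF4/dP of Eqs. (4.52), (4.53) and (4.55)."""
    C2 = math.cos(alpha) * nx + math.sin(alpha) * ny
    a = Tt * Pt ** (-C1)
    K = GAMMA * C2 / a
    s = Pt ** C1 * P ** (-C1) - 1.0
    inner = 2.0 / BETA * (Pt ** C1 * P ** (2 - 3 * C1) - P ** (2 - 2 * C1))
    dF1 = (0.5 * K * inner ** -0.5 * (2.0 / BETA)
           * ((2 - 3 * C1) * Pt ** C1 * P ** (1 - 3 * C1)
              - 2 * (1 - C1) * P ** (1 - 2 * C1)))
    dF2 = (K * math.cos(alpha) * (2.0 / BETA)
           * ((1 - 2 * C1) * Pt ** C1 * P ** (-2 * C1) - (1 - C1) * P ** (-C1)) + nx)
    denom = BETA * math.sqrt(BETA) * a
    dF4 = (math.sqrt(2.0) * C2 * GAMMA * math.sqrt(s) / denom
           * (a + (1 - 2 * C1) * Pt ** C1 * P ** (-2 * C1) - (1 - C1) * P ** (-C1))
           - math.sqrt(2.0) * C2 * C1 * GAMMA * Pt ** C1 * P ** (-C1)
           / (2.0 * denom * math.sqrt(s))
           * (a + Pt ** C1 * P ** (-2 * C1) - P ** (-C1)))
    return dF1, dF2, dF4

# Hypothetical non-dimensional inlet state with P below Pt.
args = dict(Pt=1.0, Tt=1.0, alpha=0.1, nx=1.0, ny=0.0)
P0, h = 0.7, 1.0e-6
analytic = inlet_flux_derivatives(P0, **args)
plus, minus = inlet_fluxes(P0 + h, **args), inlet_fluxes(P0 - h, **args)
numeric = tuple((fp - fm) / (2.0 * h) for fp, fm in zip(plus, minus))
for name, a_val, n_val in zip(("dF1/dP", "dF2/dP", "dF4/dP"), analytic, numeric):
    print(name, a_val, n_val)
```

The derivative of $F_3$ follows from that of $F_2$ by exchanging $\cos\alpha$ with $\sin\alpha$ and $n_x$ with $n_y$, so it is omitted from the check.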

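Having $\left.\partial F/\partial V\right|_{SubIn}$ evaluated, the subsonic inlet flux Jacobian with respect to conservative variables can be easily computed:

\[
\left.\frac{\partial F}{\partial U}\right|_{SubIn} = \left.\frac{\partial F}{\partial V}\right|_{SubIn}\frac{\partial V}{\partial U} \tag{4.56}
\]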

4.2.2.3 Subsonic Outlet flux Jacobian

For subsonic outlet, section 2.4.2, three flow variables (ρ, u, v) are taken from interior, and

only the static pressure is fixed at the outlet:

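\[
F_{SubOut} = \begin{bmatrix} \rho u_n \\ \rho u u_n + P_{out} n_x \\ \rho v u_n + P_{out} n_y \\ \rho h_{t-out} u_n \end{bmatrix} \tag{4.57}
\]

\[
h_{t-out} = \frac{\gamma P_{out}}{\rho(\gamma - 1)} + \frac{1}{2}(u^2 + v^2) \tag{4.58}
\]

Since the static pressure at the outlet is treated as a constant, the last column of the outlet flux Jacobian is filled with zeros:

\[
\left.\frac{\partial F}{\partial V}\right|_{SubOut} =
\begin{bmatrix}
u_n & \rho n_x & \rho n_y & 0 \\
u u_n & \rho(u_n + u n_x) & \rho u n_y & 0 \\
v u_n & \rho v n_x & \rho(u_n + v n_y) & 0 \\
h_{t-out} u_n - \dfrac{\gamma P_{out} u_n}{\rho(\gamma - 1)} & \rho(u u_n + h_{t-out} n_x) & \rho(v u_n + h_{t-out} n_y) & 0
\end{bmatrix} \tag{4.59}
\]

and the rest is trivial:

\[
\left.\frac{\partial F}{\partial U}\right|_{SubOut} = \left.\frac{\partial F}{\partial V}\right|_{SubOut}\frac{\partial V}{\partial U} \tag{4.60}
\]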

4.2.2.4 Supersonic Inlet/Outlet Flux Jacobians

As discussed in section 2.4.2, we impose all flow variables directly at the supersonic inlet, and their values do not depend on the variables of the interior solution domain. Therefore the supersonic inlet flux Jacobian is just a block of zeros. In contrast, at the outlet all variables are taken from the interior domain and the outlet flux Jacobian with respect to primitive variables is the same as Eq. (4.31).

4.3 Finite Difference Differentiation

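The flux Jacobian can be computed via finite differencing, removing the issue of analytic differentiation of the flux function. Finite difference differentiation is a well known technique to compute derivatives of a complex function using perturbations of the independent variables. Equations (4.61) and (4.62) show typical forward and central differencing formulas:

\[
\left.\frac{\partial F}{\partial U}\right|_{FD} \equiv \frac{F(U + \varepsilon) - F(U)}{\varepsilon} - \frac{\partial^2 F}{\partial U^2}\left(\frac{\varepsilon}{2}\right) \qquad \text{Forward Differencing} \tag{4.61}
\]

\[
\left.\frac{\partial F}{\partial U}\right|_{CD} \equiv \frac{F(U + \varepsilon) - F(U - \varepsilon)}{2\varepsilon} - \frac{\partial^3 F}{\partial U^3}\left(\frac{\varepsilon^2}{6}\right) \qquad \text{Central Differencing} \tag{4.62}
\]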

Central differencing has a smaller truncation error, but it comes with the cost penalty of one more function evaluation, making it twice as expensive as forward differencing. If the function evaluation is expensive, which indeed is the case for residual evaluation in CFD problems, central differencing may not be the best choice.


The accuracy of the differentiation is clearly a function of ε, and it appears that a smaller ε results in a smaller error. This is true for the truncation error, but choosing a very small value for ε introduces more noise into the numerical differentiation and amplifies the existing error in the numerical function evaluation (caused by round-off error), since that error is divided by a very small number. This can be shown by a simple analysis, where the actual function evaluated by the computer, $\hat{F}$, is replaced by the exact value of $F$ and the associated round-off error $E_r$:

\[
\hat{F}(U) = F(U) + E_r(U), \qquad \hat{F}(U \pm \varepsilon) = F(U \pm \varepsilon) + E_r(U \pm \varepsilon) \tag{4.63}
\]

\[
\left.\frac{\partial\hat{F}}{\partial U}\right|_{FD} = \left.\frac{\partial F}{\partial U}\right|_{FD} + \frac{E_r(U + \varepsilon) - E_r(U)}{\varepsilon} \tag{4.64}
\]

\[
\left.\frac{\partial\hat{F}}{\partial U}\right|_{CD} = \left.\frac{\partial F}{\partial U}\right|_{CD} + \frac{E_r(U + \varepsilon) - E_r(U - \varepsilon)}{2\varepsilon} \tag{4.65}
\]

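If $(E_r)_{max}$ is an error bound for the round-off error, the total error in numerical differentiation for forward differencing (FD) and central differencing (CD) can be written as a combination of the truncation error and the error caused by round-off, also known as cancellation or condition error [29]:

\[
E_{Total-FD} = \left\|\frac{\partial^2 F}{\partial U^2}\right\|\left(\frac{\varepsilon}{2}\right) + \frac{2(E_r)_{max}}{\varepsilon} \tag{4.66}
\]

\[
E_{Total-CD} = \left\|\frac{\partial^3 F}{\partial U^3}\right\|\left(\frac{\varepsilon^2}{6}\right) + \frac{(E_r)_{max}}{\varepsilon} \tag{4.67}
\]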

By examining Eq. (4.66) and Eq. (4.67) it is clear that if the magnitude of the perturbation, ε, is large, the dominant error in numerical differentiation is the truncation error. The truncation error decreases linearly for forward differencing and quadratically for central differencing as the perturbation magnitude is reduced. However, the condition error grows in inverse proportion to ε as the perturbation is reduced. Therefore there is an optimum value for the perturbation magnitude which minimizes the total error in numerical differentiation.

Fig (4.3) displays the total-relative numerical error in finite difference differentiation versus perturbation magnitude for a sample cubic function, $F = X^3 + X^2 + X + 0.1$, at $X_0 = 0.05$. The relative error is computed by:

\[
\mathrm{Error}_{\text{total-relative}} = \frac{\left|F'(X_0) - \left.\dfrac{\partial F(X_0)}{\partial X}\right|_{\text{FiniteDiff.}}\right|}{F'(X_0)} \tag{4.68}
\]

Figure 4.3: Total numerical error versus perturbation magnitude (relative error versus ε on logarithmic axes, for forward difference and central difference)

Clearly, there is an optimal value for ε such that the numerical differentiation error is minimum, in this case $\varepsilon \approx 10^{-9}$ for forward differencing and $\varepsilon \approx 10^{-6}$ for central differencing. A very small perturbation (i.e. $\varepsilon < 10^{-9}$) causes the condition error to dominate in such a way that both forward and central differencing essentially produce the same total error.
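The behaviour described above can be reproduced with a minimal sketch that evaluates the relative error of Eq. (4.68) for the sample cubic over a range of perturbations. The function names and the particular set of ε values are illustrative choices; the exact error values depend on the floating-point arithmetic of the machine.

```python
def f(x):
    """Sample cubic from the text."""
    return x**3 + x**2 + x + 0.1

def f_prime(x):
    """Exact derivative of the sample cubic."""
    return 3.0 * x**2 + 2.0 * x + 1.0

def forward_difference(func, x, eps):
    return (func(x + eps) - func(x)) / eps

def central_difference(func, x, eps):
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

def relative_error(approx, exact):
    """Eq. (4.68)."""
    return abs(exact - approx) / exact

x0 = 0.05
exact = f_prime(x0)
for exponent in range(1, 17, 2):
    eps = 10.0 ** (-exponent)
    err_fd = relative_error(forward_difference(f, x0, eps), exact)
    err_cd = relative_error(central_difference(f, x0, eps), exact)
    print(f"eps=1e-{exponent:02d}  forward={err_fd:.2e}  central={err_cd:.2e}")
```

Running it shows the error first falling with ε (truncation dominated) and then rising again (condition-error dominated), which is the trend plotted in Figure 4.3.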

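Since the condition error is caused by round-off error, the optimum ε should be determined based on the machine precision, $\varepsilon_M$. The error bound for the round-off error, $(E_r)_{max}$, can be estimated by $\varepsilon_M\|F\|$. To find the optimum ε, the derivatives of Eq. (4.66) and Eq. (4.67) are taken with respect to ε and set equal to zero:

\[
\frac{\partial E_{Total-FD}}{\partial\varepsilon} = \frac{1}{2}\left\|\frac{\partial^2 F}{\partial U^2}\right\| - \frac{2\varepsilon_M\|F\|}{\varepsilon^2} = 0 \tag{4.69}
\]

\[
\frac{\partial E_{Total-CD}}{\partial\varepsilon} = \frac{\varepsilon}{3}\left\|\frac{\partial^3 F}{\partial U^3}\right\| - \frac{\varepsilon_M\|F\|}{\varepsilon^2} = 0 \tag{4.70}
\]

and by solving the above equations for ε the optimum perturbations for forward and central differencing are estimated:

\[
\varepsilon_{opt_{FD}} = \sqrt{\frac{4\varepsilon_M\|F\|}{\left\|\dfrac{\partial^2 F}{\partial U^2}\right\|}}
\]

\[
\varepsilon_{opt_{CD}} = \sqrt[3]{\frac{3\varepsilon_M\|F\|}{\left\|\dfrac{\partial^3 F}{\partial U^3}\right\|}}
\]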

Therefore, in addition to machine accuracy, the optimum perturbation value depends on the norm (magnitude) of the function $F$ and its higher derivatives, and on the type of numerical differentiation. There is another consideration for choosing the size of the perturbation for numerical differentiation of a higher-order function, which is the grid size or discretization length scale; this will be discussed in chapter 5. Not surprisingly, there are other practical factors in scientific computing for ε selection which are beyond the scope of this research. Several researchers [40, 26, 29, 61] have studied the effect of ε in numerical differentiation and they provide some practical formulations and guidelines for choosing the best value for ε.
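As a worked instance of the two optimum-perturbation estimates above, the sketch below applies them to the sample cubic of Fig (4.3) at $X_0 = 0.05$. It assumes IEEE double precision, taking $\varepsilon_M$ from `sys.float_info.epsilon`; the function names are illustrative.

```python
import sys

def optimal_eps_forward(f_norm, d2_norm, eps_machine=sys.float_info.epsilon):
    """Optimum forward-difference perturbation, from Eq. (4.69)."""
    return (4.0 * eps_machine * f_norm / d2_norm) ** 0.5

def optimal_eps_central(f_norm, d3_norm, eps_machine=sys.float_info.epsilon):
    """Optimum central-difference perturbation, from Eq. (4.70)."""
    return (3.0 * eps_machine * f_norm / d3_norm) ** (1.0 / 3.0)

x0 = 0.05
f_norm = abs(x0**3 + x0**2 + x0 + 0.1)   # |F(X0)|
d2_norm = abs(6.0 * x0 + 2.0)            # |F''(X0)|
d3_norm = 6.0                            # |F'''(X0)|

eps_fd = optimal_eps_forward(f_norm, d2_norm)
eps_cd = optimal_eps_central(f_norm, d3_norm)
print(f"forward: {eps_fd:.2e}, central: {eps_cd:.2e}")
```

Under these assumptions the estimates come out slightly below $10^{-8}$ for forward differencing and between $10^{-6}$ and $10^{-5}$ for central differencing, within roughly an order of magnitude of the minima read from the figure.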

4.4 Numerical Flux Jacobian

In numerical Jacobian calculation using the finite difference method, the principle has not

changed; we still need to take the derivatives of the residual function for a control volume,

Eq. (4.6), and compute the flux Jacobian terms through Eq. (4.7) to Eq. (4.10). However,

this time instead of analytic differentiation of the flux function at each face, Eq. (4.12), the

finite difference technique is applied.

The Euler flux function includes 4 equations and is evaluated from 4 primitive variables. However, since the flux derivatives with respect to the conservative variables are needed, it is the conservative variables that must be perturbed. The following procedure demonstrates such a flux Jacobian computation for interior faces using forward differencing:


Flux = FluxFunction(V)

for j = 1 to 4

    for i = 1 to 4

        δU[i] = ε E[j][i]

        UP[i] = U[i] + δU[i]

    VP = ConservativeToPrimitive(UP)

    FluxP = FluxFunction(VP)

    for i = 1 to 4

        DFDU[i][j] = (FluxP[i] - Flux[i]) / ε

where Flux is the flux vector, V is the primitive variable vector, U is the conservative variable vector, the subscript P denotes the perturbed vectors, and E(j, i) is the identity matrix:

\[
E(j, i) =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \tag{4.71}
\]
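The procedure above can be sketched in runnable form as follows. For illustration the analytic Euler flux of Eq. (4.33) stands in for `FluxFunction`; in the solver an interior face would use Roe's flux, which depends on two states. The function names, the fixed face normal, the sample state, and the value of ε are all assumptions.

```python
def conservative_to_primitive(U, gamma=1.4):
    rho, ru, rv, E = U
    u, v = ru / rho, rv / rho
    p = (gamma - 1.0) * (E - 0.5 * rho * (u * u + v * v))
    return [rho, u, v, p]

def flux_function(V, nx=0.6, ny=0.8, gamma=1.4):
    """Stand-in flux: analytic Euler flux of Eq. (4.33) for a fixed unit normal."""
    rho, u, v, p = V
    un = nx * u + ny * v
    ht = gamma * p / (rho * (gamma - 1.0)) + 0.5 * (u * u + v * v)
    return [rho * un, rho * u * un + p * nx, rho * v * un + p * ny, rho * ht * un]

def numerical_flux_jacobian(U, eps=1.0e-7):
    """Forward-difference dF/dU, one perturbed flux evaluation per column."""
    flux = flux_function(conservative_to_primitive(U))
    dfdu = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        # U + eps * (row j of the identity matrix E)
        UP = [U[i] + (eps if i == j else 0.0) for i in range(4)]
        flux_p = flux_function(conservative_to_primitive(UP))
        for i in range(4):
            dfdu[i][j] = (flux_p[i] - flux[i]) / eps
    return dfdu

# Hypothetical conservative state (rho, rho*u, rho*v, E).
U = [1.2, 0.48, -0.36, 2.4]
J = numerical_flux_jacobian(U)
F = flux_function(conservative_to_primitive(U))
JU = [sum(J[i][j] * U[j] for j in range(4)) for i in range(4)]
print(max(abs(JU[i] - F[i]) for i in range(4)))
```

The final lines check the result against the homogeneity property of the ideal-gas Euler flux, $(\partial F/\partial U)\,U = F$, which the finite-difference Jacobian satisfies only up to truncation and round-off error.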

For boundary faces, exactly the same procedure is implemented but the boundary condition

implementation is also considered both for the perturbed variables and the flux functions:

FluxBC = BoundaryFluxFunction(VB)

for j = 1 to 4

    for i = 1 to 4

        δU[i] = ε E[j][i]

        UP[i] = U[i] + δU[i]

    VP = ConservativeToPrimitive(UP)

    VPB = BoundaryConditionProcedure(VP)

    FluxBCP = BoundaryFluxFunction(VPB)

    for i = 1 to 4

        (DFDU)BC[i][j] = (FluxBCP[i] - FluxBC[i]) / ε

Chapter 5

Linear Solver and Solution Strategy

5.1 Introduction

Up to this point, two major elements used in the implicit solver (i.e. residual evaluation and Jacobian computation) have been explained. This chapter describes the iterative process to reach the steady-state solution, including the linear system solver, preconditioning, start-up process, and overall solution strategy.

Linearization of the fluid flow equations over a meshed domain leads to a large sparse linear

system, which needs to be solved in multiple iterations. In principle, there are two separate

issues here, forming the linear system and solving the linear system. As it was described in

section 2.2, it may be useful if the solution process starts from a low CFL number (generally

resulting in small solution update) and even low-order linearization, Eq. (2.14), especially if

the initial condition that the solution starts from is far away from the steady-state solution

of the problem. This is particularly true for non-linear problems such as compressible flow,

where the large solution update based on high CFL number at the early stage in the solution

process does not necessarily help the convergence, and it may slow the convergence or even

cause divergence. This fact can be better explained by Fig (5.1) where a sample scalar

function is linearized. The linearization of a function F (U) at point A, where we are far

away from the solution point C, does not provide us a good slope to estimate the solution

point. Also it does not make a meaningful difference if the linearization at this point is exact

or approximate, as both of them result in inaccurate solution estimation. Now if the slope of function F(U) at point A is used for the solution update, only a small fraction of ∆U can be applied reasonably. At the same time for small ∆U, an approximate linearization, in our case low-order linearization, still produces a reasonable solution update. On the other hand, when we are close enough to the solution point C, the linearization is a fair representative of the non-linear problem, and the more accurately this linearization is performed, the better the expected solution update.

Figure 5.1: Linearization of a sample function (axes U and F(U); labeled points A, B and C; exact and approximate linearization; F(U) = 0 at the solution)

Direct methods (Gaussian Elimination and LU factorization) solve the linear system exactly (to within round-off error), and they have been applied to CFD problems in early CFD implicit solvers [8, 89]. However, the computing cost (CPU time) and the storage requirement (memory usage) of direct methods for the large linear systems arising from CFD problems limit their application seriously. Furthermore, quite often the linearization is not exact and we are solving a non-linear problem through multiple linear iterations, so it is not beneficial from the performance point of view to solve the linear system in each iteration exactly.

Iterative methods, on the contrary, require far less memory, and can provide a good approximation for the solution at a reasonable cost. It should be mentioned that the effectiveness of iterative methods as compared with direct methods for solving linear systems is related to the sparsity of the coefficient matrix. In general, iterative methods are more efficient for large sparse systems. Development of iterative methods for large sparse linear systems has been a very active research area in the field of numerical linear algebra, and the Krylov-subspace family of methods has emerged as the class of modern iterative techniques [74]. In these techniques, a very large sparse linear system is reduced to a much smaller system through creation of a subspace (known as the Krylov subspace), and then the solution of the original large system is approximated using the constructed subspace by satisfying some optimality criterion for the original system.

A wide variety of Krylov iterative solvers [81] are in use for various applications; the Generalized Minimal Residual (GMRES) algorithm [72], which was developed mainly for non-symmetric matrices, is employed in this research for the following reasons:

1. The linear system arising from unstructured discretization is asymmetric both in terms

of the matrix structure and the values of entries.

2. The GMRES algorithm only needs matrix-vector multiplication removing the issue of

high-order Jacobian matrix computation.

3. GMRES minimizes the residual of the linearized system implying that if we are close

enough to the solution and linearization of the non-linear system is accurate then

GMRES computes the best update for solution at each iteration, making it a very

suitable choice for Newton iteration.

It should be noted that the GMRES algorithm has been applied quite successfully and

extensively in implicit CFD solvers since the early 90’s, and it is a well known linear solver for

CFD applications providing us a great deal of experience both in solver and preconditioning

implementation.

Preconditioning is another important issue in linear system solving since no iterative method

for general large linear systems demonstrates suitable performance without proper precon-

ditioning. In principle, preconditioning transforms the original linear system to a modified

linear system making it easier to solve by an iterative technique.

5.2 GMRES Linear Solver

Like other modern iterative techniques, GMRES uses a projection process to extract an approximation to the linear system solution from a subspace. Assume $A$ is a real $n \times n$ matrix and $x$ is a solution vector in $\mathbb{R}^n$ for the linear system:

\[
Ax = b \tag{5.1}
\]

An approximate solution of the above system is found within a subspace $K$ of $\mathbb{R}^n$. The size of the subspace is $m \ll n$, and generally $m$ constraints in the form of orthogonality conditions (minimization problem) have to be imposed to make the best approximate solution. A logical candidate for imposing the orthogonality condition is the residual vector:

\[
r = b - Ax \tag{5.2}
\]

The residual vector is required to be orthogonal to another subspace $L$ of dimension $m$. In other words, we are looking for an approximate solution $x \in K$ with the constraint $r \perp L$. It can be shown that if the subspace $L$ is chosen in the form of $L = AK$ (oblique projection), imposing such an orthogonality condition minimizes the $l_2$ norm of the residual vector over the affine space $x_0 + K$, where $x_0$ is the initial guess and the starting vector [74].

5.2.1 The Basic GMRES Algorithm

For a linear system $Ax = b$, GMRES [72] seeks an approximate solution $x_k$ in the form $x_k = x_0 + z_k$, where $x_0$ is the initial solution vector and $z_k$ is the member of the Krylov subspace (search directions), $K_k \equiv \mathrm{span}\{r_0, Ar_0, A^2r_0, \ldots, A^{k-1}r_0\}$, which minimizes the $l_2$ norm of the residual vector. Here, $r_0 = b - Ax_0$ is the initial residual.

A basis $V_k \equiv \{v_1, v_2, \ldots, v_k\}$ is constructed using Arnoldi's algorithm (applying the Gram-Schmidt method) for computing the $l_2$-orthonormal basis of the Krylov subspace, where $v_1 = r_0/\|r_0\|$. The residual vector at the $k$th iteration, $r_k$, should be $l_2$-orthogonal to $AK_k$, consistent with the choice $L = AK$ in section 5.2:

GMRES:

1. Use initial guess $x_0$ and compute $r_0 = b - Ax_0$; normalize the initial residual, $v_1 = r_0/\|r_0\|$.

2. Build the orthonormal basis (Arnoldi's algorithm):

    for $j = 1, 2, \ldots, k$ until satisfied ($\|b - Ax_k\| \le$ tolerance) do:

        $h_{i,j} = (Av_j, v_i)$, $i = 1, 2, \ldots, j$,

        $\hat{v}_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j}v_i$,

        $h_{j+1,j} = \|\hat{v}_{j+1}\|$; if $h_{j+1,j} = 0$ stop

        $v_{j+1} = \hat{v}_{j+1}/h_{j+1,j}$,

3. Form the approximate solution:

    $x_k = x_0 + V_ky_k$ such that $x_k$ minimizes $\|b - Ax_k\|$.


After $k$ iterations of Arnoldi's process, the $l_2$-orthonormal basis $V_{k+1}$ and a $(k + 1) \times k$ matrix $\bar{H}_k$ are formed. $\bar{H}_k$ is the upper Hessenberg matrix $H_k$, whose only nonzero entries are the $h_{i,j}$ described above, extended by an additional row whose only nonzero element is $h_{k+1,k}$. The following important relation will hold:

\[
AV_k = V_{k+1}\bar{H}_k \tag{5.3}
\]

For minimization, the update $x_k = x_0 + z$ should satisfy the least-squares problem:

\[
\min_z \|b - A(x_0 + z)\| = \min_z \|r_0 - Az\|, \qquad z \in K_k \tag{5.4}
\]

By setting $z = V_ky$, $r_0 = \beta v_1$, and $\beta = \|r_0\|$, we have:

\[
\|r_0 - Az\| = \|\beta v_1 - AV_ky\| = \left\|V_{k+1}\left(\beta e_1 - \bar{H}_ky\right)\right\| \tag{5.5}
\]

where $e_1$ is the first column of the $(k + 1) \times (k + 1)$ identity matrix. Since $V_{k+1}$ is an orthonormal set, the least-squares problem is reduced to:

\[
\min_y \left\|\beta e_1 - \bar{H}_ky\right\|, \qquad y \in \mathbb{R}^k \tag{5.6}
\]

Considering that this is a minimization problem, Eq. (5.4), the residual at each iteration should be smaller than or at least equal to the residual at the previous iteration, i.e. $\|r_{k+1}\| \le \|r_k\|$. Therefore the residual decreases in each iteration or at worst stalls. In the absence of round-off error the algorithm terminates in at most $n$ iteration steps, meaning that the exact solution is computed if the size of the subspace is chosen equal to the size of the system.

The iterations or steps taken in building the subspace or search directions in part 2 of the GMRES algorithm are called inner iterations. Since by increasing the search directions or subspace size the cost of the algorithm rises linearly for memory usage and quadratically for computing time, the Krylov subspace size for large matrices is limited to $m \ll n$, as was mentioned earlier. In practice, we hope that by picking a small subspace size, a sufficiently accurate solution is computed. If that does not happen, we can restart the algorithm after $m$ steps to limit the cost of the algorithm. However, there could be some occasions when the convergence criterion for the linear system is not satisfied without an excessive number of restarts. In that case, the user normally limits the number of inner iterations and/or the number of restarts, and moves on to the next non-linear iteration with whatever solution update has been achieved in the linear solver.

The restarted version of the algorithm, GMRES(m), is explained in the following lines:


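GMRES(m):

1. Use initial guess $x_0$ and compute $r_0 = b - Ax_0$; normalize the initial residual, $v_1 = r_0/\|r_0\|$.

2. Build the orthonormal basis (Arnoldi's algorithm):

    for $j = 1, 2, \ldots, m$ do:

        $h_{i,j} = (Av_j, v_i)$, $i = 1, 2, \ldots, j$,

        $\hat{v}_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j}v_i$,

        $h_{j+1,j} = \|\hat{v}_{j+1}\|$; if $h_{j+1,j} = 0$ stop

        $v_{j+1} = \hat{v}_{j+1}/h_{j+1,j}$,

3. Form the approximate solution:

    $x_m = x_0 + V_my_m$ where $y_m$ minimizes $f(y) = \left\|\beta e_1 - \bar{H}_my\right\|$, $y \in \mathbb{R}^m$.

4. Restart:

    Compute $r_m = b - Ax_m$; if satisfied ($\|r_m\| \le$ tolerance) then stop

    else compute $x_0 := x_m$, $v_1 := r_m/\|r_m\|$ and go to 2.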

The only possibility for breaking down of the GMRES algorithm is in the Arnoldi’s part

where the orthogonalization process is performed. If hj+1,j becomes zero the algorithm stops

and the solution at the last iteration is returned. This happens only when the residual is

zero, which means reaching the exact solution.

The least-squares solution in step 3 can be carried out via classical QR factorization of $\bar{H}_k$ using plane rotations. As suggested in the original article [72], it is preferable to update the factorization of $\bar{H}_k$ progressively as each column appears at every step of the orthogonalization process. This enables us to compute the residual norm of the approximate solution without computing $x_k$ and to terminate the algorithm whenever the convergence criterion for the linear system is satisfied without further operations. This also means the residual $r_m = b - Ax_m$ does not need to be computed explicitly, saving us the cost of one matrix-vector multiplication, since the magnitudes of $\left\|\beta e_1 - \bar{H}_ky\right\|$ and $\|r_m\|$ are the same. Depending on the convergence criteria, the GMRES algorithm may be completed with a number of inner iterations smaller than, equal to, or larger than the maximum subspace size ($m$). For instance, if one restart is allowed in the algorithm and a maximum subspace size of 30 is used, then, assuming that the convergence criterion in part 4 is not met within the inner iterations, 60 inner iterations are performed before completion of the algorithm.
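The steps of GMRES(m) can be sketched compactly in plain Python. This is an illustrative, unpreconditioned implementation under stated assumptions and not the solver used in this work: it uses the classical Gram-Schmidt form of the listing, progressive plane (Givens) rotations for the least-squares problem, and assumes a nonsingular matrix. The names `gmres_restarted` and `matvec`, and the small test matrix, are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gmres_restarted(matvec, b, x0, m=30, tol=1e-10, max_restarts=10):
    """Restarted GMRES(m); returns (x, true residual norm)."""
    x = list(x0)
    for _ in range(max_restarts + 1):
        r = [bi - ai for bi, ai in zip(b, matvec(x))]
        beta = math.sqrt(dot(r, r))
        if beta <= tol:
            break
        V = [[ri / beta for ri in r]]       # v1 = r0 / ||r0||
        cols, cs, sn = [], [], []           # rotated Hessenberg columns, rotations
        g = [beta]                          # rotated right-hand side beta * e1
        for j in range(m):
            w = matvec(V[j])
            h = [dot(w, vi) for vi in V]    # h_ij = (A v_j, v_i)
            for hij, vi in zip(h, V):
                w = [wk - hij * vk for wk, vk in zip(w, vi)]
            h_next = math.sqrt(dot(w, w))   # h_{j+1,j}
            for i in range(j):              # apply earlier rotations to new column
                h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                                  -sn[i] * h[i] + cs[i] * h[i + 1])
            d = math.hypot(h[j], h_next)    # assumes d > 0 (A nonsingular)
            c, s = h[j] / d, h_next / d
            cs.append(c)
            sn.append(s)
            h[j] = d
            g.append(-s * g[j])             # |g[j+1]| is the current residual norm
            g[j] = c * g[j]
            cols.append(h)
            if abs(g[j + 1]) <= tol or h_next == 0.0:
                break
            V.append([wk / h_next for wk in w])
        k = len(cols)
        y = [0.0] * k
        for i in range(k - 1, -1, -1):      # back substitution, triangular factor
            y[i] = (g[i] - sum(cols[p][i] * y[p] for p in range(i + 1, k))) / cols[i][i]
        x = [xi + sum(V[p][i] * y[p] for p in range(k)) for i, xi in enumerate(x)]
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    return x, math.sqrt(dot(r, r))

# Hypothetical small non-symmetric test system.
A = [[4.0, 1.0, 0.0, 0.0],
     [-1.0, 4.0, 1.0, 0.0],
     [0.0, -1.0, 4.0, 1.0],
     [0.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]

def matvec(v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]

x, res = gmres_restarted(matvec, b, [0.0] * 4, m=2, tol=1e-10, max_restarts=50)
print(x, res)
```

Because the solver only ever calls `matvec`, the matrix never has to be stored explicitly, which is the property exploited in the next subsection.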


5.2.2 Matrix-Vector Products Computation in GMRES

Considering the complexity of the higher-order discretization method used in this research,

Chapter 3, computing the higher-order Jacobian matrix is a highly expensive task in terms

of memory usage and computation cost. Even if memory were not an issue, computing

a higher-order Jacobian would be quite expensive in terms of CPU time no matter what

approach in Jacobian calculation is taken, Chapter 4.

Therefore, a linear solver in which the explicit Jacobian computation is not required, such as GMRES, is an attractive candidate for a higher-order implicit solver. Looking back at section 5.2.1, it is clear that in Arnoldi's algorithm only matrix-vector products are needed, and these products can be easily approximated through the (forward) finite difference technique:

\[
F(U + \delta U) = F(U) + \frac{\partial F}{\partial U}\delta U + O(\delta U)^2 \tag{5.7}
\]

By choosing $\delta U = \varepsilon v$ (ε is a small number), Eq. (5.7) can be rewritten:

\[
F(U + \varepsilon v) = F(U) + \overbrace{\frac{\partial F}{\partial U}}^{A}\varepsilon v + O(\varepsilon v)^2 \tag{5.8}
\]

and since ε is a scalar ($v$ is a vector), the matrix-vector product, $Av$, is evaluated by:

\[
Av = \frac{\partial F}{\partial U}v \approx \frac{F(U + \varepsilon v) - F(U)}{\varepsilon} \tag{5.9}
\]

Eq. (5.9) is a first-order approximation, and a more accurate approximation can be achieved

employing central differencing approach with the extra cost of one more flux (residual)

evaluation per matrix-vector product (section 4.3). If forward differencing is employed for

evaluation of matrix-vector product, for any inner iteration in the GMRES algorithm one

flux evaluation is needed.
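Eq. (5.9) can be sketched as follows for a small nonlinear function standing in for the residual. The two-equation test function, its hand-derived Jacobian, and the chosen ε are assumptions for illustration; the optional argument shows how $F(U)$ can be reused so that each product costs one new function evaluation.

```python
def residual(U):
    """Hypothetical nonlinear function standing in for the flow residual."""
    x, y = U
    return [x * x + y - 3.0, x + y ** 3 - 5.0]

def jacobian_vector_product(F, U, v, eps=1.0e-7, F_U=None):
    """Forward-difference approximation of (dF/dU) v, Eq. (5.9)."""
    if F_U is None:
        F_U = F(U)                      # reusable across inner iterations
    U_pert = [u + eps * vi for u, vi in zip(U, v)]
    F_pert = F(U_pert)
    return [(fp - f0) / eps for fp, f0 in zip(F_pert, F_U)]

U = [1.0, 2.0]
v = [0.5, -1.0]
F_U = residual(U)
approx = jacobian_vector_product(residual, U, v, F_U=F_U)

# Exact product from the hand-derived Jacobian [[2x, 1], [1, 3y^2]].
exact = [2.0 * U[0] * v[0] + v[1], v[0] + 3.0 * U[1] ** 2 * v[1]]
print(approx, exact)
```

A function of this form can be passed directly as the matrix-vector operator of a Krylov solver, so the Jacobian itself is never assembled.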

So far no consideration has been given to the interaction between the discretization order and ε, but the choice of ε can dramatically affect the correctness of the higher-order Jacobian matrix-vector multiplications, even beyond the considerations of section 4.3. The issue here is how much the higher-order terms in the reconstruction polynomial are affected by a small perturbation in the flow field. In other words, do higher-order terms remain arithmetically significant or measurable after a perturbation of size δU on a discretized domain with the length scale of ∆x? To address this question we need to go back to Chapter 3.

The reconstruction procedure led to solving a least-squares problem, Eq. (3.14), that can be simplified in the form of Eq. (5.10).

\[
MD = U \tag{5.10}
\]

where M is the coefficient matrix containing the moment information of the reconstruction

stencil, D is the solution vector including the derivatives of the reconstruction polynomial,

and U is the control volume average vector for the reconstruction. In Eq. (5.10), matrix M

is purely geometric and it is not solution dependent, therefore by changing U to U + δU , M

will not be affected. Now we would like to know how much the derivatives are perturbed

by a change in control volume averages:

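\[
M(D + \delta D) = U + \delta U = MD + \delta U \tag{5.11}
\]

and

\[
M\delta D = \delta U \quad \text{or} \quad \delta D = M^{-1}\delta U \tag{5.12}
\]

By taking the norms of Eq. (5.10) and Eq. (5.12), these two norm inequalities are found:

\[
\|U\| \le \|M\|\,\|D\|, \qquad \|\delta D\| \le \left\|M^{-1}\right\|\,\|\delta U\| \tag{5.13}
\]

and finally after multiplication of these two norm inequalities, the relative change in the derivatives can be expressed in terms of the relative perturbation in U as:

\[
\frac{\|\delta D\|}{\|D\|} \le \underbrace{\left\|M^{-1}\right\|\,\|M\|}_{\kappa(M)}\,\frac{\|\delta U\|}{\|U\|} \tag{5.14}
\]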

In Eq. (5.14), κ(M) is called the condition number of the matrix M and it is evident that the

change in the derivatives is a direct function of condition number of the coefficient matrix.

The matrix M is relatively small and its entries are only functions of mesh geometry. In

our case, the quality of the employed isotropic mesh is guaranteed by the mesh software

(GRUMMP[59], [19]), so the condition number of the matrix M is expected to be fairly

small (O(10) for typical employed meshes). Therefore changes in the derivatives are of the

same order (or at most one order larger) as the perturbations in the solution.

Consider a simplified 1-D cubic reconstruction polynomial which leads to the 4th-order

discretization for a mesh with the uniform length scale of ∆x, Eq. (5.15). U is perturbed

by δU and the perturbed reconstruction polynomial can be expressed by Eq. (5.16) where


the superscript p shows the perturbed coefficients:

\[ F(U) = a_0 + a_1(\Delta x) + a_2(\Delta x)^2 + a_3(\Delta x)^3 \tag{5.15} \]

\[ F(U + \delta U) = a_0^p + a_1^p(\Delta x) + a_2^p(\Delta x)^2 + a_3^p(\Delta x)^3 \tag{5.16} \]

Now we study the difference between the original reconstruction and the perturbed one:

\[ F(U + \delta U) - F(U) = (a_0^p - a_0) + (a_1^p - a_1)(\Delta x) + (a_2^p - a_2)(\Delta x)^2 + (a_3^p - a_3)(\Delta x)^3 \tag{5.17} \]

In order to have a correct higher-order Jacobian in the matrix-vector product computed by

Eq. (5.9), all terms in Eq. (5.17) are required to be arithmetically significant; otherwise, the

overall accuracy of the linearization would be reduced and the convergence rate is adversely

affected. Of course the highest-order term, in our case $(a_3^p - a_3)(\Delta x)^3$, is the main issue

and consequently:

\[ \|a_3^p - a_3\|\,(\Delta x)^3 \ge \varepsilon_M , \qquad \varepsilon_M : \text{machine precision} \tag{5.18} \]

Using Eq. (5.14), and assuming δU = εv , the bound of the highest derivative for the

perturbation is estimated, Eq.(5.19). Combining this bound with Eq. (5.18) determines the

minimum magnitude of ε for keeping the higher-order terms significant in the finite

difference matrix-vector multiplication, Eq. (5.20).

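\[ \frac{\|a_3^p - a_3\|}{\|a_3\|} \le \kappa(M)\,\varepsilon\,\frac{\|v\|}{\|U\|} \tag{5.19} \]

\[ \varepsilon \ge \frac{\varepsilon_M \, \|U\|}{\|a_3\|\,(\Delta x)^3\,\kappa(M)\,\|v\|} \tag{5.20} \]

Further simplification can be made if $\|U\| / (\|a_3\|\,\kappa(M))$ in Eq. (5.20) is assumed to be O(1).

\[ \varepsilon \ge \frac{\varepsilon_M}{(\Delta x)^3\,\|v\|} \tag{5.21} \]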

Although this result has been obtained with several simplifying assumptions, it gives us a

very useful insight about the reasonable magnitude of ε for proper finite difference matrix-

vector product computation. In the case of the standard GMRES algorithm, ‖v‖ = 1 due

to the normalization process, and ε can be found based on the geometric length scale. For


a non-uniform mesh, ∆x is chosen based on the smallest length scale; as an

example, if ∆x ≡ $O(10^{-4})$ and εM ≡ $O(10^{-16})$, then choosing ε ≈ $O(10^{-4})$ or larger would

keep higher-order terms significant. It should be noted that in the presence of the limiter

using a large ε or perturbation often causes limiter firing in residual evaluation leading to

convergence problems. Therefore using a large perturbation for flows with an active limiter

is not suggested.
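The estimate of Eq. (5.21) is simple enough to evaluate directly. The following is a minimal sketch, not part of the solver described here; the function name `min_perturbation` and its parameters are illustrative assumptions, and double-precision machine epsilon is assumed for εM.

```python
def min_perturbation(dx, v_norm=1.0, eps_machine=2.0 ** -52, power=3):
    """Lower bound on epsilon from Eq. (5.21): eps >= eps_M / (dx**power * ||v||).

    power = 3 corresponds to the cubic term of the 4th-order reconstruction;
    v_norm = 1 corresponds to the normalized Krylov vectors of standard GMRES.
    """
    return eps_machine / (dx ** power * v_norm)


if __name__ == "__main__":
    # Hypothetical mesh length scales, for illustration only.
    for dx in (1e-2, 1e-3, 1e-4):
        print(f"dx = {dx:.0e}  ->  eps_min ~ {min_perturbation(dx):.1e}")
```

For a length scale of order 10^-4 the bound comes out at order 10^-4, in line with the example in the text.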

5.2.3 Preconditioning

Convergence of iterative techniques, including Krylov subspace methods, is highly depen-

dent on the conditioning of the linear system, i.e. the Jacobian matrix. Using a higher-order

discretization introduces more off-diagonal entries and increases the bandwidth of the Ja-

cobian matrix considerably. In addition, in the case of Euler equations (compressible flow),

with a non-linear flux function and possible discontinuities in the solution, the Jacobian

matrix is off-diagonally dominant. All these factors lead to poor convergence of the linear

solver, and consequently slowing or stalling the solution process of the non-linear problem.

To remove this obstacle and enhance the performance of the linear solver, the linear system

needs to be modified through a process called preconditioning, such that the preconditioned

system has better spectral properties and clustered eigenvalues. Although in the case of

the GMRES method, eigenvalues may not describe the convergence of nonsymmetric matrix

iterations as well as they do for symmetric iterative solvers such as the Conjugate

Gradient (CG) method, a clustered spectrum (not too close to zero) normally improves the

convergence characteristics of the linear solver [16].

Consider a preconditioner M as a nonsingular matrix which approximates the Jacobian

matrix A. Equation (5.1) can then be modified by multiplying both sides by $M^{-1}$:

\[ M^{-1} A x = M^{-1} b \tag{5.22} \]

If M−1 is a good approximation to A−1 , M−1A becomes close to the identity matrix,

increasing the performance of the linear solver through eigenvalue clustering around unity.

Equation (5.22) is called left preconditioning, which affects not only the matrix operator

but also vector b on the right-hand side. It is also possible to introduce the preconditioner

operator on the right side of the matrix A, and leave the right-hand side intact:

\[ A M^{-1} \overbrace{(Mx)}^{z} = b , \qquad x = M^{-1} z \tag{5.23} \]


This is called right preconditioning. If the GMRES algorithm is used with left preconditioning,

the norm of $M^{-1}(b - Ax_k)$ is minimized instead of the norm of $b - Ax_k$, resulting in a different stopping

criterion (residual norm) for the algorithm. However, right preconditioning still minimizes

the residual norm of the original system [73, 81].

Of course the apparent best choice for M is A, since then $AM^{-1} = I$, but if we could solve the

system with the matrix A easily we would not need preconditioning in the first place. The

optimal preconditioner matrix is not unique, since it is highly problem dependent and

also depends on how the preconditioner is applied. Also there are circumstances where the

matrix M need not be computed explicitly and a preconditioner operator replaces M , like

polynomial preconditioners [81]. But in general three factors are considered in choosing a

preconditioner:

1. M is a reasonably good approximation to the coefficient matrix A.

2. M is better conditioned, more narrow banded, and less expensive to build compared

to matrix A.

3. The system Mx = z should be much easier to solve than Ax = b.

The bottom line is that constructing and applying the preconditioner should be

cheap relative to the cost of solving the original linear system.

For effective preconditioning, in addition to applying a good preconditioner matrix, we need

to employ a good preconditioning technique. A preconditioning technique is a sparse linear

solver itself, and it can be a direct or iterative method. Iterative stationary methods such

as successive over-relaxation (SOR) rely on the fixed iteration Eq. (5.24), where B is the

fixed iteration matrix and c is a fixed vector:

\[ x_{i+1} = B x_i + c \tag{5.24} \]

Stationary methods are simple and easy to implement, often have parallelization advan-

tages, are low cost (memory and CPU time), and finally they are effective in damping

high frequency errors. However, they often have a restrictive stability condition, reducing

the benefits of Newton's method. This is especially true if the preconditioner matrix

is off-diagonally dominant, which is the case for compressible flow. At the same time, for relatively

large systems slow damping of the low frequency errors becomes a noticeable issue for these

techniques in the absence of a multigrid augmentation strategy.
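As an illustration of the fixed iteration in Eq. (5.24), the sketch below uses the Jacobi splitting (a simpler relative of SOR, chosen here only for brevity) on small dense systems. The function name and the test matrices are assumptions made for this example. It also shows the stability restriction mentioned above: the iteration converges for a diagonally dominant matrix and diverges for an off-diagonally dominant one.

```python
def jacobi(A, b, iterations=50):
    """Stationary iteration x_{i+1} = B x_i + c with the Jacobi splitting:
    B = -D^{-1}(A - D) and c = D^{-1} b, where D is the diagonal of A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Build the whole new iterate from the previous one (Jacobi, not Gauss-Seidel).
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x


if __name__ == "__main__":
    # Diagonally dominant: the iteration converges to the solution (1, 1, 1).
    A_good = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    print(jacobi(A_good, [3.0, 2.0, 3.0]))

    # Off-diagonally dominant: same kind of iteration, but the iterates grow without bound.
    A_bad = [[1.0, 3.0], [3.0, 1.0]]
    print(jacobi(A_bad, [4.0, 4.0], iterations=20))
```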


Preconditioning    Ratio of non-zero elements
                   (Factored Mat. / Original Mat.)
ILU-0              1
ILU-1              1.25
ILU-2              1.5
ILU-3              1.8
ILU-4              2.1
LU                 7.2

Table 5.1: Ratio of non-zero elements in factorized matrix

Another preconditioning technique is the Incomplete Lower-Upper Factorization (ILU)

method. Factoring the matrix M into two triangular matrices, M = LU , where L is a

lower triangular matrix, and U is an upper triangular matrix normally results in factored

matrices that are far less sparse than the original matrix M . As a consequence, a con-

siderable amount of memory needs to be assigned for factorization, and the factorization

process is quite expensive. One solution to this problem is incomplete factorization in which

additional nondiagonal fill-in entries are allowed only for a predefined set of locations in the

LU factorizations. In other words, we choose a non-zero pattern in advance for the elements

of the factored matrices [74]. This is called ILU-P where P is the fill-level in the factorized

matrix. P equal to zero means no fill is permitted during ILU decomposition. In ILU-0 the

factorized matrix and the original (non-factored) matrix have the same graph or non-zero

element locations. Choosing P larger than zero allows some additional fill-in in the factored

matrix improving the accuracy of factorization and the preconditioning quality. However,

increasing the fill-level comes at the expense of memory usage and extra computing cost,

imposing a restriction in increasing fill-level in practice. Table 5.1 shows the number of

non-zero elements in the factored matrix where the original preconditioning matrix is a

first-order Jacobian [52]. For instance ILU-2 needs 50% and ILU-4 requires a little bit more

than 100% extra memory for storing the decomposed matrix in comparison with the orig-

inal matrix graph. Also LU will be infeasible in practical applications because of its huge

memory requirements.
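The zero fill-level case can be sketched in a few lines. The code below is an illustrative dense implementation of ILU-0 (the row-wise variant found in standard texts), not the PETSc routine used in this work; the names `ilu0` and `multiply_factors` are assumptions for the example. It keeps only the entries that lie in the non-zero pattern of the original matrix, so the product of the factors reproduces the matrix on that pattern and differs from it only where fill-in was dropped.

```python
def ilu0(A):
    """ILU-0 of a dense list-of-lists matrix.

    Updates are applied only where A itself is non-zero, so no fill-in is created.
    Returns one matrix holding L (strictly lower part, unit diagonal implied)
    and U (diagonal and upper part).
    """
    n = len(A)
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            LU[i][k] /= LU[k][k]                 # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i][j]:                # skip positions outside the pattern
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def multiply_factors(LU):
    """Form the product L*U from the packed factors, for checking purposes."""
    n = len(LU)
    prod = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else LU[i][k]
                prod[i][j] += l_ik * LU[k][j]
    return prod


if __name__ == "__main__":
    # 5-point Laplacian on a 2x2 grid: exact LU would create fill at (1,2) and (2,1).
    A = [[4.0, -1.0, -1.0, 0.0],
         [-1.0, 4.0, 0.0, -1.0],
         [-1.0, 0.0, 4.0, -1.0],
         [0.0, -1.0, -1.0, 4.0]]
    for row in multiply_factors(ilu0(A)):
        print(row)
```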

Both the preconditioner matrix and preconditioning technique may be developed based on

the specific problem approach, but first one should have a detailed knowledge about the

numerical aspects of the problem which is not always possible; second, generally speaking,

such a preconditioning procedure is sensitive to the type of the problem. Application of

general preconditioning techniques such as ILU is more common for compressible CFD

solvers. Another modification in the ILU family is ILU(P,τ), where in addition to the

static non-zero pattern, a tolerance τ is added as a dropping criterion for entries of the


factored matrices [16]. In other words, all fill-in entries within the level P are changed to

zero if they do not satisfy some set of tolerance condition. Since the proper fill-level and

tolerance criterion in ILU(P,τ) for efficient preconditioning of the compressible flows are

highly dependent on the test case, and they are not determined uniquely, ILU(P,τ) is not

considered in this research.

Several researchers have implemented ILU methods for preconditioning of the GMRES

linear solver for compressible fluid flows [91, 14, 24, 18, 45]. Their results show that ILU

with some additional fill-in (i.e. ILU-1&2) is a reliable and robust preconditioning strategy

for a variety of test cases, while ILU-0 fails to provide fast convergence in some cases.

Reordering is another important factor in ILU factorization. Reordering is designed to

reduce the bandwidth of a matrix. A smaller bandwidth keeps the non-zero entries close to the diagonal, and

consequently during factorization (Gaussian elimination process) of the matrix fewer fill-in

entries will appear. Knowing that increasing the fill-level in ILU preconditioning has its

own disadvantages, reordering the original preconditioner matrix becomes essential to keep

the accuracy of preconditioning for a low fill-level factorization. Experience shows that

quite often a non-reordered preconditioner matrix with low fill-level works poorly, while the

same fill-level factorization performs quite well, when the original matrix is reordered. The

most common available reordering technique is Reverse Cuthill-McKee (RCM) [74] which

has been successfully used for ILU-P preconditioning.
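The effect of RCM on bandwidth can be seen on a very small example. The sketch below is a plain-Python version of the Reverse Cuthill-McKee idea (breadth-first traversal from a low-degree node, neighbours visited in order of increasing degree, final order reversed); it is an illustration under these simplifying assumptions, not the library routine used in this work, and the function names and the test graph are invented for the example.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest index distance between two connected nodes under a given ordering."""
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """adj maps each node to the set of its neighbours (symmetric matrix graph)."""
    visited, order = set(), []
    for start in sorted(adj, key=lambda n: len(adj[n])):      # lowest degree first
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adj[node] - visited, key=lambda n: len(adj[n])):
                visited.add(nb)
                queue.append(nb)
    return order[::-1]                                        # the "reverse" in RCM


if __name__ == "__main__":
    # A chain of 8 unknowns numbered badly: 0-4-1-5-2-6-3-7.
    chain = [0, 4, 1, 5, 2, 6, 3, 7]
    adj = {n: set() for n in range(8)}
    for a, b in zip(chain, chain[1:]):
        adj[a].add(b)
        adj[b].add(a)
    print("bandwidth, original numbering:", bandwidth(adj, list(range(8))))
    print("bandwidth, RCM numbering:     ", bandwidth(adj, reverse_cuthill_mckee(adj)))
```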

In this thesis research, the GMRES linear solver, including the ILU-P factorization, is taken

from the PETSc library developed by Argonne National Laboratory [1].

5.2.4 GMRES with Right Preconditioning

In right preconditioning, $AM^{-1}z = b$ is solved, where $z = Mx$. As in standard GMRES,

the objective in right-preconditioned GMRES is to minimize the residual $r = b - Ax$.

Although the initial residual is $r_0 = b - AM^{-1}z_0$, $z_0$ need not be computed explicitly

and all the elements of the Krylov subspace are created without referring to z.

Right Preconditioned GMRES(m):

1. Start:

   choose x0, compute r0 = b − A x0, β = ‖r0‖, v1 = r0/β

2. Iterate:

   for j = 1, 2, ..., m do:

      compute wj = A M^{-1} vj

      hi,j = (wj, vi), i = 1, 2, ..., j

      wj = wj − Σ_{i=1..j} hi,j vi

      hj+1,j = ‖wj‖; if hj+1,j = 0 stop

      vj+1 = wj / hj+1,j

3. Form the approximate solution:

   xm = x0 + M^{-1} Vm ym, where ym minimizes f(y) = ‖β e1 − H̄m y‖, y ∈ R^m

4. Restart:

   compute rm = b − A xm; if satisfied (‖rm‖ ≤ tolerance) then stop,

   else compute x0 := xm, v1 := rm/‖rm‖ and go to 2.
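The steps above can be sketched as a short, self-contained routine. This is an illustrative implementation in plain Python, not the PETSc solver used in this work: the operator and the preconditioner are passed as hypothetical callables `apply_A` and `apply_Minv`, the orthogonalization uses modified Gram-Schmidt (a numerically safer variant of the projection written above), and the small least-squares problem is solved with Givens rotations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gmres_right(apply_A, apply_Minv, b, x0, m=20, tol=1e-10, max_restarts=10):
    """Restarted GMRES(m) for A M^{-1} z = b with x = M^{-1} z."""
    x = list(x0)
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, apply_A(x))]        # true residual b - A x
        beta = math.sqrt(dot(r, r))
        if beta <= tol:
            break
        V = [[ri / beta for ri in r]]                          # Krylov basis vectors
        H, cs, sn, g, k = [], [], [], [beta], 0
        for j in range(m):
            w = apply_A(apply_Minv(V[j]))                      # w_j = A M^{-1} v_j
            h = []
            for i in range(j + 1):                             # modified Gram-Schmidt
                hij = dot(w, V[i])
                h.append(hij)
                w = [wi - hij * vi for wi, vi in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            h.append(h_next)
            for i in range(j):                                 # apply earlier rotations
                t = cs[i] * h[i] + sn[i] * h[i + 1]
                h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
                h[i] = t
            d = math.hypot(h[j], h[j + 1])                     # new rotation zeroes h[j+1]
            cs.append(h[j] / d)
            sn.append(h[j + 1] / d)
            h[j], h[j + 1] = d, 0.0
            g.append(-sn[j] * g[j])
            g[j] = cs[j] * g[j]
            H.append(h)
            k = j + 1
            if abs(g[k]) <= tol or h_next == 0.0:              # |g[k]| is the residual norm
                break
            V.append([wi / h_next for wi in w])
        y = [0.0] * k                                          # back substitution R y = g
        for i in range(k - 1, -1, -1):
            s = g[i] - sum(H[c][i] * y[c] for c in range(i + 1, k))
            y[i] = s / H[i][i]
        z = [sum(y[c] * V[c][i] for c in range(k)) for i in range(len(b))]
        x = [xi + di for xi, di in zip(x, apply_Minv(z))]      # x = x0 + M^{-1} V_m y_m
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
    b = [6.0, 10.0, 10.0]                                      # exact solution (1, 2, 3)
    matvec = lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]
    jacobi_prec = lambda v: [vi / A[i][i] for i, vi in enumerate(v)]
    print(gmres_right(matvec, jacobi_prec, b, [0.0, 0.0, 0.0]))
```

In the matrix-free setting of this chapter, `apply_A` would be replaced by the finite-difference product of Eq. (5.9) and `apply_Minv` by the ILU solve with the first-order Jacobian.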

With the preconditioner applied, the orthogonal basis spans the right-preconditioned Krylov subspace:

\[ K = \mathrm{span}\left\{ r_0,\; AM^{-1}r_0,\; \ldots,\; (AM^{-1})^{m-1} r_0 \right\} \tag{5.25} \]

This time the matrix-vector operator is $AM^{-1}v = Az$, where $z = M^{-1}v$ is the preconditioned

vector. Nielsen et al. [55] suggested scaling ε in Eq. (5.9) by the RMS of vector z

to improve the convergence, since using a constant ε could result in a convergence stall after a

couple of orders of (non-linear) residual reduction:

\[ Az = \frac{F(U + \varepsilon z) - F(U)}{\varepsilon} , \qquad \varepsilon = \frac{\varepsilon_0}{\mathrm{RMS}(z)} \tag{5.26} \]

where ε0 is a small scalar, usually of the order of $\sqrt{\varepsilon_M}$ (εM : machine accuracy). However,

as described in section 5.2.2, the small scalar value may be taken some orders of

magnitude larger than $\sqrt{\varepsilon_M}$ due to mesh length scale issues.
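A minimal sketch of the scaled finite-difference product of Eq. (5.26) follows. The two-equation residual is a toy stand-in for the flow residual, and the function names and the default value of `eps0` are assumptions made for the example.

```python
import math


def rms(z):
    return math.sqrt(sum(zi * zi for zi in z) / len(z))


def jacobian_vector_product(residual, U, z, eps0=1e-7):
    """Approximate (dR/dU) z by a forward difference with eps = eps0 / RMS(z).

    Assumes z is not the zero vector. No Jacobian matrix is formed or stored.
    """
    eps = eps0 / rms(z)
    R0 = residual(U)
    Rp = residual([u + eps * zi for u, zi in zip(U, z)])
    return [(rp - r0) / eps for rp, r0 in zip(Rp, R0)]


def toy_residual(U):
    """Hypothetical non-linear residual; its Jacobian is [[2u, 1], [1, 2v]]."""
    u, v = U
    return [u * u + v - 3.0, u + v * v - 5.0]


if __name__ == "__main__":
    # At U = (1, 2) the exact product with z = (1, -1) is (1, -3).
    print(jacobian_vector_product(toy_residual, [1.0, 2.0], [1.0, -1.0]))
```

Each call costs one extra residual evaluation when R(U) is cached, which is why the number of Krylov vectors matters for higher-order residuals.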

5.3 Solution Strategy

The goal of all CFD solver developers is reducing the cost of computation as well as im-

proving the accuracy of the solution. The latter could be achieved by applying higher-order

discretization methods. However, in a practical sense that is possible if and only if the cost

of the computation is comparable to the cost of the common second-order methods. That

is why implicit techniques (Newton family of methods) play a crucial role in achieving that

goal since the best explicit techniques are far less efficient with respect to computation cost

making higher-order computation quite uncompetitive. In addition to the computation cost,


i.e. CPU time, memory usage is also an issue for solver developers. But in most cases, CPU

time is the main concern since if storing a set of data (i.e. Jacobian matrix) is expensive,

then computing that data would be expensive too except in the case that the expensive

computation is performed once and the resultant work is used in multiple iterations.

Knowing the overall approach, which is the implicit formulation, and the main objective

which is the steady-state solution, we would like to lay out an efficient strategy to reach the

steady state solution as fast as possible.

The overall computing cost for a steady-state CFD problem by an implicit approach, can

be analyzed by dividing the solution process into three major parts:

1. Forming the linear system including linearization and the non-linear residual compu-

tation.

2. Solving the linear system including preconditioning.

3. Number of implicit iterations needed for steady-state convergence.

It is obvious that we would like to minimize the cost of the first and second part of the

solution process and reach the steady-state solution with minimum number of iterations.

The third part has a cumulative effect on the overall solution cost since in each (non-linear)

iteration the first two parts are covered once already.

Most CFD problems, including compressible flows, are highly non-linear, and that makes

them not too easy to solve via linearization. In addition to non-linearity in the mathematical

sense, the behavior of a compressible flow (the physics of the flow) can dramatically change

in some flow conditions for a fixed geometry. Therefore if the solution process is started from

an arbitrary initial condition which is not reasonably close to the steady-state solution, large

updates to the solution based on linearization are not likely to be helpful. Also, Newton’s

method is well known to converge to the solution quadratically if started from the vicinity

of the solution; otherwise it will diverge or stall quickly. Since finding a good approximate

solution in general (that is except the cases for which some close solution is available through

analytical or experimental data) is impossible without actually starting to solve the problem,

the solution process should consist of two phases:

1. Start-up phase

2. Newton phase


In the start-up phase, multiple linearizations with small or moderate solution updates are

required to advance the solution to the point that the linearization is accurate enough for

finding the steady-state solution, i.e. a good approximate solution as initial starting point

for Newton phase. Having a good initial guess, the solution process is switched to the

Newton iteration where superlinear or quadratic convergence is possible by taking a very

large (or infinite) time step.

5.3.1 Start-up Phase

Finding a good initial guess or reasonable approximate solution is about knowing the physics

of the problem and finding a proper way to get to that good initial solution faster by

employing various techniques such as mesh sequencing, multigrid, mixed explicit/implicit

iterations, exact solution of a simplified problem, potential flow solution, and so on. Hence,

like preconditioning, start-up is problem dependent and is more an art rather than an exact

science, implying that the proper start-up process is not unique either. In this research

a proper start-up process for compressible flows is suggested which is based on the defect

correction procedure [46].

In the start-up process, considering the fact that multiple iterations are required for advanc-

ing the solution toward a good initial guess, an implicit iterative process is used where the

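linearization is performed based on the inexpensive first-order discretization, section 4.2,

and the flux calculation remains higher-order:

\[ \left( \frac{I}{\Delta t} + \left.\frac{\partial R}{\partial U}\right|^{n}_{1st} \right) \Delta U^{n+1} = -R_{High}(U^{n}) , \qquad U^{n+1} = U^{n} + \omega\,\Delta U^{n+1} \tag{5.27} \]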

A relaxation factor, ω, is applied for the solution update; ω = 1 results in standard defect

correction, while using different values for ω (typically ω = 0.9-1.3) can over-relax or under-

relax the update [52]. Over-relaxation often is helpful in subsonic flows where it accelerates

the solution while under-relaxation can prevent divergence resulting from an inaccurate

solution update, for instance in transonic cases.
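The structure of Eq. (5.27) can be illustrated on a toy problem. In the sketch below, a two-equation non-linear system stands in for the higher-order residual, and the diagonal of its Jacobian stands in for the cheaper first-order Jacobian; both are assumptions made purely for illustration and have nothing to do with the Euler equations. The point is that the converged solution satisfies the full residual on the right-hand side even though only an approximate operator appears on the left.

```python
def residual(U):
    """Toy 'higher-order' residual with a root at (u, v) = (1, 2)."""
    u, v = U
    return [u * u + v - 3.0, u + v * v - 5.0]


def approximate_jacobian_diagonal(U):
    """Cheap stand-in for the first-order Jacobian: only the diagonal of dR/dU."""
    u, v = U
    return [2.0 * u, 2.0 * v]


def defect_correction(U0, dt=10.0, omega=1.0, iterations=60):
    """Iterate (I/dt + J_approx) dU = -R(U), U <- U + omega * dU, as in Eq. (5.27)."""
    U = list(U0)
    for _ in range(iterations):
        R = residual(U)
        D = approximate_jacobian_diagonal(U)
        # The left-hand side is diagonal here, so the linear solve is a simple division.
        dU = [-r / (1.0 / dt + d) for r, d in zip(R, D)]
        U = [u + omega * du for u, du in zip(U, dU)]
    return U


if __name__ == "__main__":
    U = defect_correction([1.5, 1.5])
    print(U, residual(U))
```

Because the left-hand side is only approximate, convergence is linear rather than quadratic, which is consistent with using this scheme for start-up only.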

The Jacobian matrix on the left hand side is the first-order Jacobian and its computation

cost is approximately the same as the cost of the second-order residual calculation. The

cost of one Jacobian evaluation is 0.6-0.7 of the cost of a second-order residual evaluation

(limiting cost excluded) for the approximate analytical Jacobian and it is 1.3-1.5 of the

computation cost of the same residual evaluation if the finite-difference Jacobian is employed

(for moderate mesh sizes).


To reduce the cost of forming the linear system, the Jacobian matrix may be computed

and stored for several iterations [39]. With this approach the Jacobian is updated every j

iteration(s) (typically j = 1-4) [52].

The time step ∆t , which is actually scaled by the volume (area) of each control volume, is

based on the CFL number times maximum allowable explicit time step:

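\[ \Delta t_i = \frac{\mathrm{CFL}\;\Delta t_{i-max}}{A_i} \tag{5.28} \]

\[ \Delta t_{i-max} = \frac{A_i}{\oint_{CV_i} \lambda_{max}\, ds} , \qquad \lambda_{max} : \text{maximum wave speed} \tag{5.29} \]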
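Equations (5.28) and (5.29) translate directly into a few lines of code. In this sketch the contour integral is approximated by a sum over the edges of the control volume, with one maximum wave speed per edge; that discretization, the function name and the sample numbers are assumptions made for the example.

```python
def local_time_step(area, edges, cfl):
    """Return (dt_max, dt_scaled) for one control volume.

    area  : area A_i of the control volume
    edges : list of (edge_length, max_wave_speed) pairs approximating the
            contour integral of lambda_max over the control volume boundary
    cfl   : CFL number
    """
    wave_integral = sum(length * wave_speed for length, wave_speed in edges)
    dt_max = area / wave_integral            # Eq. (5.29)
    dt_scaled = cfl * dt_max / area          # Eq. (5.28): time step scaled by the area
    return dt_max, dt_scaled


if __name__ == "__main__":
    # Hypothetical triangle: area 0.5, three edges with their wave speeds.
    print(local_time_step(0.5, [(1.0, 2.0), (1.0, 2.0), (2.0, 1.0)], cfl=3.0))
```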

As was noted before and also has been graphically illustrated for a sample non-linear func-

tion, Fig (5.1), using the slope of the residual function for estimating the behavior of the

residual function over a large span of the independent variables is not accurate in the early

stage of the solution process. Therefore it makes sense to start from a relatively small ∆t

or CFL, and to gradually increase CFL to some modest number as we are getting closer to

the solution of the residual function, i.e. F (U) = 0. This justifies applying the first-order

linearization in the start-up process instead of the correct order Jacobian calculation; the

linearization does not allow large steps towards the steady-state solution, so why should we

spend too much work to compute it, especially if implementation of Jacobian in the linear

solver requires several matrix-vector multiplications? At the same time, solving a linear

system based on the first-order Jacobian is much easier than doing so for higher-order Jaco-

bian (even for the second-order one), because of the structure of the higher-order Jacobian

matrix. Therefore we are better off to form and solve a less expensive system, since we must

inevitably perform several iterations at the start-up phase.

The next step is solving the linear system. Since we are using an approximate linearization,

it does not make sense to solve the approximate linear system exactly, therefore the linear

system in each defect correction iteration, referred to as pre-iteration from now on, is solved

approximately. The goal of each pre-iteration is to reduce the non-linear higher-order resid-

ual by cheap linearization. Consequently it is logical to solve the linear system up to some

fraction of the non-linear residual, i.e. tolerance = C × Res(U), C < 1. Employing this

approach saves us from spending too much effort on finding a precise solution to a linear

system that we know will not give us the correct solution to the non-linear problem. The

right-preconditioned GMRES(m) algorithm is used with a limited subspace size (K=15-20)

and a relatively loose tolerance (0.1-0.05 Res(U)) is set for solving the linear system. No

restart is allowed, and if the tolerance is not reached within the maximum search directions

(subspace size) GMRES is terminated anyway and the computed solution update up to that


point is taken to move to another non-linear outer iteration. The preconditioner matrix is

the same as the linear system matrix which is the perfect choice for preconditioning because

the first-order Jacobian has been built already for forming the linear system, and it is the

most accurate preconditioner matrix for that case. To reduce the preconditioning cost, a

low fill-level incomplete factorization, ILU(1), is applied, which is accurate and efficient

enough for this first-order linear system.

Now the question is how close we should be to the solution before switching to Newton

iterations. Normally the decrease of some norm of the non-linear residual at the current

iteration, $\|Res(U)\|_n$, compared to the initial residual, $\|Res(U)\|_0$, is chosen for that purpose. That is, if

the solution reaches a specific relative norm of the non-linear residual, then the linearization

is accurate enough to take a very large time step at each iteration. It should be noted that

the value for this criterion varies from problem to problem. However, it is possible to find

reasonable values for different categories of the compressible flows (i.e. subsonic, supersonic

and transonic). More detail is provided in that regard in the results chapter.

5.3.2 Newton Phase

By this stage most of the transient behaviors of the non-linear CFD problem have been

removed from the flow field solution, and major steady features of the fluid flow have

appeared in the solution domain. Consequently the linearization of the non-linear residual

is fairly accurate for seeking the steady-state solution, and taking infinite or very large time

steps accelerates the solution toward quadratic or superlinear convergence, i.e. switching to

the Newton iteration formula:

\[ \left.\frac{\partial R}{\partial U}\right|^{n}_{High} \Delta U^{n+1} = -R_{High}(U^{n}) , \qquad U^{n+1} = U^{n} + \Delta U^{n+1} \tag{5.30} \]

To have an accurate linearization, the higher-order Jacobian must be applied during the

Newton iterations, Eq. (5.30). This has been implemented through matrix-free GMRES,

using finite difference directional matrix-vector multiplication technique. The linear system

is right preconditioned by the first-order Jacobian, which indeed is essential due to bad

conditioning of the left hand side. ILU(P=2-4) is used for preconditioning and normally

for higher-order computation (especially the fourth-order transonic flow), increasing the fill-

level is considerably beneficial [52]. A fixed number of search directions is employed (K=30)

and it is important to keep them limited since the matrix-free GMRES performs multiple

perturbations of the non-linear residual which is very expensive for higher-order residual

computation. The linear system is again solved approximately but this time with a tighter


tolerance ($10^{-2}$ Res(U)) as an accurate update is required. No restart is allowed, and if

the tolerance is not reached, the next outer iteration starts with the best update from the

last inner GMRES iteration. Approximately solving the linear system in this way is called

the Inexact Newton method, and although it can increase the non-linear outer iteration

number, it reduces the overall CPU time as considerable computation time is saved by not

computing the exact solution of the linear system [64, 18, 45, 51]. Notice that the left hand

side is not quite exact because of the truncation error in the matrix-vector multiplications

and possibly is polluted by round off error, especially for higher-order terms. Therefore it

is preferable to give up the quadratic convergence instead of increasing the solution CPU

time.

Chapter 6

Results(I): Verification Cases

The first step in accuracy verification of an unstructured CFD solver is to verify the accu-

racy of the reconstruction procedure. For this purpose two complete sets of the reconstruc-

tion tests including straight and curved boundaries are presented here. Then to study the

correctness, basic performance and solution accuracy of the proposed higher-order unstruc-

tured Newton-Krylov solver, smooth subsonic and supersonic cases with known solutions

have been investigated. This chapter is devoted to numerical study of these test cases.

6.1 Reconstruction Test Cases

The objective here is to reconstruct a test function over a domain with different mesh

resolutions, and to measure the error between the reconstruction of the test function and

the exact value of the test function for discretized meshes. Two different domains have been

chosen for examining the overall accuracy of the reconstruction procedure. The first test

case is a unit square (a case with straight edges) and the second test case is an annulus

which has two curved edges. The test function is a smooth analytical function described by

Eq. (6.1).

f(x, y) = 1.0 + 0.1 sin(πx) sin(πy) (6.1)

The first step in reconstruction is computing the mean value for a control volume, implying

that we need to be able to integrate a given function over any control volume (including

boundary triangles with curved edges) at least as accurately as the nominal discretization

order. The integration of a given function over the area of a control volume is carried

out using a Gauss quadrature rule, Eq. (6.2), where Wj is the Gauss weight. While this

procedure looks straightforward, determining Gauss point locations and their associated

weights robustly for higher-order integration over an area with curved edges has proved to

be a nontrivial task [95].

\[ \int_{CV_i} f(x, y)\, dA = \sum_{j=1}^{n} W_j\, f(x_j, y_j) \tag{6.2} \]

2nd-order            L1            Ratio   L2            Ratio   L∞           Ratio
Mesh 1 / 460 CVs     0.0001667     -       0.0002948     -       0.0017002    -
Mesh 2 / 1948 CVs    4.551149e-5   3.66    8.73185e-5    3.37    0.0004369    3.89
Mesh 3 / 7856 CVs    1.15364e-5    3.94    2.27180e-5    3.84    0.0001099    3.97
Mesh 4 / 31668 CVs   2.92207e-6    3.95    5.81718e-6    3.9     2.75408e-5   3.99

Table 6.1: 2nd-order error norms for the square case

In this research, for the square case a 6th-order integration routine using 7 Gauss points

and for the annulus case a 4th-order integration routine using 10 Gauss points (developed

by Dr. Carl Ollivier-Gooch in ANSLib frame work [58]) have been employed. All the grids

are generated using GRUMMP Version 0.3.2 [59] which generates high quality triangular

meshes for domains with curved boundaries [19].

6.1.1 Square Test Case

Four unstructured meshes shown in Fig (6.1) were generated for a unit square domain. The

density of control volumes is increased 4 times at each mesh with respect to the previous

mesh level. The mesh length scale is not uniform, with control volumes at the corners clearly

having larger length scales than the control volumes in the interior of the square domain.

The error norms and the ratio of error norms are tabulated in Table 6.1 through Table 6.3.

Despite non-uniform mesh distribution, all the error norm ratios for the Mesh 4 confirm

the nominal accuracy of the reconstruction. Fig (6.2) displays the Error-Mesh plot for the

square case. Notice that the error norm is reduced considerably by increasing the order

of accuracy, and the slope of the graph proves the reconstruction accuracy. Asymptotic

convergence of the error norms is clearly observed by examining the tabulated data and

Fig (6.2) for this series of meshes.
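The error ratios in the tables can be converted into an observed order of accuracy. The sketch below assumes that the mesh length scale in two dimensions varies as the inverse square root of the number of control volumes; the function name is an assumption, and the numbers are the L1 norms for Mesh 3 and Mesh 4 taken from Tables 6.1 to 6.3.

```python
import math


def observed_order(err_coarse, err_fine, n_coarse, n_fine, dim=2):
    """Observed order p from err ~ h**p with h ~ N**(-1/dim)."""
    h_ratio = (n_fine / n_coarse) ** (1.0 / dim)      # coarse h divided by fine h
    return math.log(err_coarse / err_fine) / math.log(h_ratio)


if __name__ == "__main__":
    n3, n4 = 7856, 31668                               # control volumes, Mesh 3 and Mesh 4
    l1_errors = {
        "2nd-order": (1.15364e-5, 2.92207e-6),
        "3rd-order": (6.92254e-7, 8.96936e-8),
        "4th-order": (2.11865e-8, 1.35160e-9),
    }
    for name, (e3, e4) in l1_errors.items():
        print(name, round(observed_order(e3, e4, n3, n4), 2))
```

The resulting values lie close to 2, 3 and 4, consistent with the slopes marked in Fig (6.2).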


[Figure panels: (a) 460 Control Volumes, (b) 1948 Control Volumes, (c) 7876 Control Volumes, (d) 31668 Control Volumes; axes X and Y, each from 0 to 1]

Figure 6.1: Unstructured meshes for a square domain

3rd-order            L1            Ratio   L2            Ratio   L∞           Ratio
Mesh 1 / 460 CVs     3.54269e-5    -       4.82035e-5    -       0.0002061    -
Mesh 2 / 1948 CVs    5.18616e-6    6.83    7.59731e-6    6.34    3.47214e-5   5.9
Mesh 3 / 7856 CVs    6.92254e-7    7.50    1.01755e-6    7.46    4.71759e-6   7.36
Mesh 4 / 31668 CVs   8.96936e-8    7.70    1.31888e-7    7.70    6.09472e-7   7.74

Table 6.2: 3rd-order error norms for the square case


4th-order            L1            Ratio   L2            Ratio   L∞           Ratio
Mesh 1 / 460 CVs     3.98124e-6    -       9.87666e-6    -       0.0001084    -
Mesh 2 / 1948 CVs    3.30234e-7    12.06   8.20473e-7    12.03   8.39794e-6   12.90
Mesh 3 / 7856 CVs    2.11865e-8    15.60   5.32239e-8    15.40   5.49060e-7   15.30
Mesh 4 / 31668 CVs   1.35160e-9    15.70   3.40196e-9    15.65   3.4699e-8    15.82

Table 6.3: 4th-order error norms for the square case

[Plot: L1 norm of the error (10^-9 to 10^-4) versus number of control volumes (10^3 to 10^5) for the 2nd-order, 3rd-order and 4th-order reconstructions; slopes marked on the plot: 1.98, 2.94, 3.97]

Figure 6.2: Error-Mesh plot for the square case


2nd-order            L1            Ratio   L2            Ratio   L∞           Ratio
Mesh 1 / 427 CVs     0.00087885    -       0.00143183    -       0.0091068    -
Mesh 2 / 1703 CVs    0.0001974     4.45    0.0003387     4.23    0.0035145    2.59
Mesh 3 / 6811 CVs    4.61658e-5    4.28    8.01761e-5    4.22    0.0008615    4.08
Mesh 4 / 27389 CVs   1.09032e-5    4.23    1.85523e-5    4.32    0.0002311    3.72

Table 6.4: 2nd-order error norms for the annulus case

3rd-order            L1            Ratio   L2            Ratio   L∞           Ratio
Mesh 1 / 427 CVs     0.00037211    -       0.00051522    -       0.0020959    -
Mesh 2 / 1703 CVs    4.86257e-5    7.65    7.05354e-5    7.3     0.0003919    5.34
Mesh 3 / 6811 CVs    6.15972e-6    7.89    8.83369e-6    7.98    6.93723e-5   5.65
Mesh 4 / 27389 CVs   7.53171e-7    8.17    1.05894e-6    8.34    9.78171e-6   7.09

Table 6.5: 3rd-order error norms for the annulus case

6.1.2 Annulus Test Case

This is the case that includes curved boundaries. The domain inside an annulus is discretized

with four mesh densities. The geometry and meshes are shown in Fig (6.3). Again

refinement is performed using global parameters and except for the center portion of the

annulus, refinement has been carried out relatively uniformly. The boundary region con-

trol volumes which normally introduce a major part of the reconstruction error still remain

larger than the interior control volumes. The error norms and their ratio for all orders of

reconstruction are presented in Table 6.4 to Table 6.6. The l1 and l2 norms have converged

after three levels of refinement. The l∞ norm has not converged as fast as the other two norms. Since

l∞ is the local error indicator and represents the maximum error in the reconstruction, the

slower convergence rate for l∞ is often expected. The nominal order of accuracy for l1 and

l2 is achieved through the four mesh levels employed. The ratios for l∞ are also fairly close

to the nominal order of accuracy. Fig (6.4) demonstrates the reduction in the l1 norm of the

error versus control volume density. The accuracy of the reconstruction also can be verified

by measuring the slope of this Error-Mesh plot. In addition, such a plot can give us a

good estimate of the average solution error for a given mesh density. In other words,
we can determine how many control volumes are needed to reach a certain level of error
for the chosen discretization order. It is interesting to note that the l1 norm of the
reconstruction error on the finest mesh (for this test function) is about 15 times smaller for the

3rd-order discretization and more than 500 times smaller for the 4th-order discretization

than the error for the 2nd-order discretization.
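The ratios in the tables can be converted into an observed order of accuracy. The following is a minimal sketch, assuming that the mesh length scale is proportional to the inverse square root of the number of control volumes; the function name `observed_order` is hypothetical, and the error values are the L1 norms for Mesh 3 and Mesh 4 from Tables 6.4 to 6.6.

```python
import math


def observed_order(n_coarse, err_coarse, n_fine, err_fine):
    """Observed order of accuracy between two 2-D meshes.

    Assumption: the length scale h is proportional to 1/sqrt(number of CVs).
    """
    h_ratio = math.sqrt(n_fine / n_coarse)  # h_coarse / h_fine
    return math.log(err_coarse / err_fine) / math.log(h_ratio)


# L1 norms on Mesh 3 (6811 CVs) and Mesh 4 (27389 CVs), Tables 6.4 to 6.6
l1_errors = {
    "2nd-order": (4.61658e-5, 1.09032e-5),
    "3rd-order": (6.15972e-6, 7.53171e-7),
    "4th-order": (3.64156e-7, 2.09949e-8),
}

orders = {
    name: observed_order(6811, e_coarse, 27389, e_fine)
    for name, (e_coarse, e_fine) in l1_errors.items()
}

for name, p in orders.items():
    print(f"{name}: observed order = {p:.2f}")
```

Because the number of control volumes grows by slightly more than a factor of four between these two meshes, the observed orders differ slightly from the base-2 logarithm of the tabulated ratios.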


Figure 6.3: Unstructured meshes for a curved domain (annulus): (a) 427 control volumes, (b) 1703 control volumes, (c) 6811 control volumes, (d) 27389 control volumes

4th-order            L1           Ratio  L2           Ratio  L∞           Ratio
Mesh 1/427 CVs       8.03432e-5   —      0.00013598   —      0.0016860    —
Mesh 2/1703 CVs      5.87669e-6   13.67  1.11776e-5   12.16  0.0001678    10.04
Mesh 3/6811 CVs      3.64156e-7   16.14  7.14849e-7   15.63  1.42043e-5   11.82
Mesh 4/27389 CVs     2.09949e-8   17.3   3.88537e-8   18.39  1.15369e-6   12.31

Table 6.6: 4th-order error norms for the annulus case


Figure 6.4: Error-Mesh plot for the annulus case (L1 norm of the error versus number of control volumes; slopes 2.08, 3.03, and 4.11 for the 2nd, 3rd, and 4th-order reconstructions)


Figure 6.5: Circular domain over half a cylinder, Mesh 1 (1376 CVs)

6.2 Subsonic Flow Past a Semi-Circular Cylinder

The smooth inviscid flow over a semi-circular (half) cylinder with R = 1, at M∞ = 0.3 is

computed. The flow direction is from left to right and the angle of attack is zero. The

far-field boundary is located at R = 300 to make sure that all perturbations are damped

effectively (Fig (6.5)). Three different meshes, Mesh 1 (1376 CVs), Mesh 2 (5539 CVs), and

Mesh 3 (22844 CVs) are generated for this purpose; close-ups are shown in Fig (6.6) to Fig

(6.8). A special effort has been made in the grid refinement process in order to achieve a

relatively self-similar refinement through the whole solution domain. Each mesh is nearly

four times denser than its immediate coarser level mesh implying that the mesh length scale

in all parts of the domain has been reduced approximately by a factor of two (section (3.3)).


Figure 6.6: Circular cylinder, Mesh 1 (1376 CVs)

Mesh  CVs    Max A_CVi  Min A_CVi    √(Amax/Amin)  √((Amax)coarse/(Amax)fine)  √((Amin)coarse/(Amin)fine)
1     1376   1475.98    0.00360815   639.6         —                           —
2     5539   516.15     0.000853626  777.6         1.69                        2.06
3     22844  125.98     0.000240918  723.2         2.02                        1.88

Table 6.7: Sizes and ratios of the control volumes for circular cylinder meshes

Table 6.7 summarizes the sizes of the control volumes at each mesh level. The length scale
of each control volume can be assumed to be proportional to the square root of the control
volume area. The tabulated data show that each mesh level contains a wide range of length
scales, which vary considerably across the solution domain. The table also shows the
approximate refinement factor from each coarse level to its immediate finer level. Even if
the refinement were uniform (which is not the case in reality), the length scale of the mesh
is not reduced by exactly a factor of two in each refinement. Since we do not start with a
uniform mesh, and cannot refine the mesh uniformly and reduce the length scale of the
control volumes in an orderly fashion throughout the solution domain (especially in boundary
regions), it is not too surprising that the measured order of accuracy deviates somewhat
from the nominal order, section (3.3).
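The ratio columns of Table 6.7 follow directly from the tabulated areas. The sketch below recomputes them, taking the square root of an area ratio as the length-scale ratio; the variable names are hypothetical and the input areas are the rounded values from the table, so the last digit of a recomputed ratio may differ from the tabulated one.

```python
import math

# (number of CVs, largest CV area, smallest CV area) for Mesh 1 to Mesh 3, Table 6.7
meshes = [
    (1376, 1475.98, 0.00360815),
    (5539, 516.15, 0.000853626),
    (22844, 125.98, 0.000240918),
]

# Spread of length scales within each mesh: sqrt(A_max / A_min)
spread = [math.sqrt(a_max / a_min) for _, a_max, a_min in meshes]

# Refinement factor between consecutive levels, measured on the largest
# and on the smallest control volumes: sqrt(A_coarse / A_fine)
factor_max = [math.sqrt(c[1] / f[1]) for c, f in zip(meshes, meshes[1:])]
factor_min = [math.sqrt(c[2] / f[2]) for c, f in zip(meshes, meshes[1:])]

for level, (fmax, fmin) in enumerate(zip(factor_max, factor_min), start=1):
    print(f"Mesh {level} -> Mesh {level + 1}: {fmax:.2f} (largest CVs), {fmin:.2f} (smallest CVs)")
```

A uniform, self-similar refinement would give a factor of 2 in both columns for every level.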


Figure 6.7: Circular cylinder, Mesh 2 (5539 CVs)

Figure 6.8: Circular cylinder, Mesh 3 (22844 CVs)


The potential (incompressible) flow solution is used as the initial solution for all meshes
and orders of accuracy. Since the solution is started from a good initial guess, no start-up
procedure is needed. Newton iterations are performed using matrix-free GMRES with a
maximum Krylov subspace size of 30. However, for the sake of robustness, we keep the I/∆t
term, and the time advance formula is applied with a large CFL number, Eq. (2.12). To
reduce the cost of the inner iterations, where multiple residual perturbations are required,
a subspace of 20 is employed for early outer iterations. When the l2 norm of the non-linear
residual drops below 5 × 10^{-9}, the subspace of 30 is used. This is especially helpful for
the 4th-order case, where the residual computation is very expensive. The tolerance of the
linear system solver is chosen to be 1 × 10^{-2} of the l2 norm of the non-linear residual, which
often is not reached within the allowable inner iterations. The convergence criterion for the
steady-state solution is 1 × 10^{-12} for the l2 norm of the non-linear residual. For Mesh 1
and Mesh 2, the solution is started from the initial guess with CFL=5000 for the 2nd and
3rd-order discretizations and CFL=1000 for the 4th-order discretization, and the CFL number
is increased to 10,000 when the non-linear residual has been reduced by one order of
magnitude. For Mesh 3, the starting CFL is 100 for all orders of accuracy; it is increased
gradually to CFL=5000 and, after a one-order reduction in the residual, to CFL=10,000. The
preconditioner matrix for all cases is the approximate analytic Jacobian and the
preconditioning strategy is ILU(4).
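The switching rules described above for Mesh 1 and Mesh 2 can be summarized in a few lines. This is only an illustrative sketch: the function names are hypothetical and the logic is a direct transcription of the stated thresholds, not the solver's actual control code.

```python
def krylov_subspace_size(residual_l2, switch_level=5e-9, small=20, large=30):
    """Subspace of 20 for early outer iterations, 30 once the residual is small."""
    return large if residual_l2 < switch_level else small


def cfl_number(order, residual_l2, initial_residual_l2):
    """CFL for Mesh 1 and Mesh 2: start value depends on the discretization order,
    then 10,000 after a one-order reduction of the non-linear residual."""
    start = 1000.0 if order == 4 else 5000.0
    reduced_by_one_order = residual_l2 <= 0.1 * initial_residual_l2
    return 10000.0 if reduced_by_one_order else start


def linear_tolerance(residual_l2, relative=1e-2):
    """Target residual for the inner (GMRES) solve, relative to the outer residual."""
    return relative * residual_l2


def is_converged(residual_l2, tol=1e-12):
    """Steady-state convergence criterion on the l2 norm of the non-linear residual."""
    return residual_l2 < tol
```

The gradual CFL ramp used for Mesh 3 is not reproduced here because the text does not specify its form.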

The solution convergence histories for the coarse mesh (Mesh 1) and the fine mesh (Mesh 3)
are shown in Fig (6.9) and Fig (6.10). As is evident, full convergence is achieved, but
the CPU time for the 4th-order case is much larger than for the other two orders of accuracy.

This was partially expected since the 4th-order discretization requires more operations,

and the cost of the reconstruction rises quadratically by increasing the discretization order.

However, a considerable part of the computing cost is due to noticeable increase in outer

iteration numbers. Furthermore, the linear system arising from the 4th-order discretization

is ill-conditioned and difficult to solve which results in relatively ineffective linearization of

the non-linear problem (considering the fixed subspace size and Inexact Newton approach)

demanding more outer iterations. On the contrary, both the 2nd and 3rd-order cases quickly

converge displaying the effectiveness of the linearization.

Pressure coefficient contours (banded) for all orders of accuracy (Mesh 1) are shown in
Fig (6.11) to Fig (6.13). The quality of the contours, which reflects the smoothness of
the reconstructed solution, is visibly improved by increasing the order of accuracy of the
solution¹. The difference between the contours’ quality is clearer in regions where control

¹ Visualization is done by Tecplot 360 using a 3-node triangular finite element format; consequently, although the solution is higher order, its visual representation is only second order.


Figure 6.9: Convergence history for the coarse mesh (Mesh 1) (L2 versus CPU time in seconds; 2nd, 3rd, and 4th-order)

Figure 6.10: Convergence history for the fine mesh (Mesh 3) (L2 versus CPU time in seconds; 2nd, 3rd, and 4th-order)

Figure 6.11: 2nd-order pressure coefficient contours, Mesh 1

Figure 6.12: 3rd-order pressure coefficient contours, Mesh 1

Figure 6.13: 4th-order pressure coefficient contours, Mesh 1

volumes are quite coarse. The 4th-order pressure coefficient contours over Mesh 3 are also

shown in Fig (6.14) as a reference for the quality comparison with previously mentioned

cases. Since the flow is inviscid and the geometry is symmetric, the pressure contours are
symmetric with respect to the normal axis passing through the center of the cylinder. The front and
rear stagnation regions, where the maximum pressure occurs (Cp = 1.0), and the suction peak

on the top of the cylinder are displayed in the contours.

The pressure coefficient distribution along the surface of the cylinder (Gauss point data over
the circle) is shown in Fig (6.15) to Fig (6.17). All discretization orders predict nearly
the same pressure coefficient over the cylinder. For a smooth flow this is expected, since
the results should differ only by terms of the order of the mesh size. In the 2nd-order case,
when the mesh is coarse, the suction peak is not recovered smoothly, nor is it located
exactly at the top of the cylinder; however, these issues are considerably improved for the
3rd and 4th-order solutions over the same mesh. The 4th-order case over the coarse mesh
shows a slight over-prediction both at the front stagnation point and at the suction peak
compared to the same solution over the fine mesh. Finally, the solution over the fine mesh
for all orders of accuracy is displayed in Fig (6.17), showing that all of them have
essentially converged to the same pressure distribution, such that the difference between
them is not visible to plotting accuracy.


Figure 6.14: 4th-order pressure coefficient contours, Mesh 3

Figure 6.15: Pressure coefficient along the axis (Cp versus X; 2nd, 3rd, and 4th-order on Mesh 1 and 4th-order on Mesh 3)

Figure 6.16: Close up of the pressure coefficient along the axis (suction region)

Figure 6.17: Pressure coefficient along the axis, Mesh 3


2nd-order            L1          Ratio  L∞         Ratio
Mesh 1/1376 CVs      0.000553    —      0.013409   —
Mesh 2/5539 CVs      9.4548e-5   5.85   0.003713   3.61
Mesh 3/22844 CVs     2.0492e-5   4.61   0.00129    2.88

Table 6.8: Error norms for total pressure, 2nd-order solution

3rd-order            L1          Ratio  L∞         Ratio
Mesh 1/1376 CVs      0.000426    —      0.010250   —
Mesh 2/5539 CVs      3.5353e-5   12.05  0.002030   5.05
Mesh 3/22844 CVs     6.0965e-6   5.8    0.000543   3.74

Table 6.9: Error norms for total pressure, 3rd-order solution

As the flow is inviscid and subsonic (shock free), the total pressure of the flow should be
preserved everywhere and equal to the total pressure of the free stream at the far field.
Therefore the deviation of the total pressure ratio from one can be interpreted as an error
indicator for this type of flow (E = P_t/P_{t∞} − 1). The discrete norms of the errors in the
total pressure ratio of the control volume averages are computed for all orders of accuracy
and meshes and are tabulated in Table 6.8 to Table 6.10. The l1 norm, a global measure of
the error, is roughly half an order less than the nominal order of the discretization for the
3rd-order solution, and it is about right for the 4th-order discretization. The l∞ norm, a
local measure of the maximum error, follows the same trend and is about one order less than
the nominal order of the discretization for the 3rd-order solution.
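As an illustration of this error indicator, the sketch below evaluates the total pressure from the static pressure and Mach number using the standard isentropic relation for a perfect gas, and then forms E and its discrete norms. The value γ = 1.4, the function names, and the unweighted form of the l1 norm are assumptions made for the example; the norms used in the thesis may be weighted differently.

```python
GAMMA = 1.4  # assumed ratio of specific heats (air)


def total_pressure(p, mach, gamma=GAMMA):
    """Isentropic total pressure of a perfect gas."""
    return p * (1.0 + 0.5 * (gamma - 1.0) * mach**2) ** (gamma / (gamma - 1.0))


def total_pressure_error(p, mach, p_inf, mach_inf, gamma=GAMMA):
    """E = Pt / Pt_inf - 1; zero for an exact inviscid, shock-free solution."""
    return total_pressure(p, mach, gamma) / total_pressure(p_inf, mach_inf, gamma) - 1.0


def l1_norm(errors):
    """Unweighted average of absolute errors (assumed form of the l1 norm)."""
    return sum(abs(e) for e in errors) / len(errors)


def linf_norm(errors):
    """Maximum absolute error."""
    return max(abs(e) for e in errors)


# Hypothetical control volume data: (static pressure, Mach number)
p_inf, mach_inf = 1.0, 0.3
cells = [(1.0, 0.3), (1.02, 0.25), (0.95, 0.41)]
errors = [total_pressure_error(p, m, p_inf, mach_inf) for p, m in cells]
print(l1_norm(errors), linf_norm(errors))
```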

The total pressure error also is shown in Fig (6.18) to Fig (6.21) over the coarse mesh for

all orders of accuracy and over the fine mesh for the 4th-order discretization (as a reference

for comparison). These figures identify the surface boundary as the main source

of the error and show how different discretization orders under-predict and/or over-predict

the total pressure in stagnation regions and at the maximum acceleration area (top of the

cylinder). For instance the 2nd-order and 3rd-order discretizations slightly under-predict

the total pressure over nearly all the boundary surface, and this under-prediction reaches

its peak at the rear stagnation region. The 4th-order method slightly over-predicts the

total pressure nearly everywhere along the boundary surface, which is consistent with the

4th-order pressure coefficient along the boundary. By refining the mesh and/or increasing

the accuracy order the error in total pressure improves dramatically (Fig (6.21)).

The Error-Mesh plot for l1, Fig (6.22), demonstrates the total pressure error reduction

versus mesh size. It should be noted that despite the deviation of the measured accuracy

order from its nominal value, the higher-order discretization still produces noticeably smaller

error. For instance in the case of the 3rd-order solution over the finest mesh, where the


Figure 6.18: Error in total pressure ratio, 2nd-order discretization, Mesh 1

Figure 6.19: Error in total pressure ratio, 3rd-order discretization, Mesh 1

Figure 6.20: Error in total pressure ratio, 4th-order discretization, Mesh 1

Figure 6.21: Error in total pressure ratio, 4th-order discretization, Mesh 3


4th-order            L1          Ratio  L∞          Ratio
Mesh 1/1376 CVs      0.000535    —      0.0148117   —
Mesh 2/5539 CVs      2.7473e-5   19.47  0.001097    13.5
Mesh 3/22844 CVs     2.0954e-6   13.11  8.1810e-5   13.4

Table 6.10: Error norms for total pressure, 4th-order solution

Figure 6.22: Error-Mesh plot for the total pressure (L1 norm of the error versus number of CVs; slopes 2.2, 2.54, and 3.71 for the 2nd, 3rd, and 4th-order solutions)

Figure 6.23: Drag coefficient versus mesh size (CD versus number of CVs; slope labels 3.41, 3.34, and 4.83)

largest deviation from the nominal accuracy occurs, the l1 norm is 3.4 times smaller than

the l1 norm of the 2nd-order solution on the same mesh.

The drag coefficient can also be used for error measurement, since the exact drag coefficient
for inviscid subsonic flow over a circular cylinder is zero. Figure (6.23) displays the drag

reduction versus mesh size. This time the measured accuracy order is somewhat better

than the nominal order showing some error cancellation due to symmetry of the solution.

As expected higher-order discretization results in much smaller drag, which is closer to the

exact solution.

To show the benefits of higher-order discretization, it is useful to assess the performance

of a higher-order solution versus its accuracy. This can be perfectly done for the circular

cylinder case. The CPU time of the solution versus the drag coefficient is shown in Fig

(6.24). Assume that the picture shown is the solver performance characteristic graph for

all meshes, and by changing the mesh size (within the range of the interest) the behavior

of the performance graph does not change. For a desired accuracy, for instance CD = 10^{-4},

the 3rd-order solution has the minimum CPU time. For a fixed solution time of 100 seconds

again the 3rd-order case results in the minimum drag. If a large solution time is bearable,

the 4th-order case provides us the best solution accuracy, since its CD/CPU time slope is


Figure 6.24: Drag coefficient versus CPU time

sharper than the other two orders of accuracy. This implies that if a very accurate solution is

needed it is more efficient to increase the discretization order rather than using a finer mesh.

This comparison is performed for a simple problem, as opposed to a practical engineering

case, and at the same time engineering accuracy requirements are typically limited. However,

this case study still provides some interesting insight into the prospects for higher-order

methods.


Figure 6.25: Annulus, Mesh 1 (108 CVs)

6.3 Supersonic Vortex

Supersonic flow inside an annulus geometry is studied. This flow is isentropic (shock free)

and it is a standard test case for accuracy measurement. The inner and outer radii are 2.0

and 3.0 respectively, and the inlet Mach number at the inner radius is equal to 2.0.

Five different meshes (Mesh 1 to Mesh 5) are employed in this test case (Fig (6.25) to Fig

(6.29)). Each mesh level has 4 times more control volumes than the immediate coarser level

and uniform refinement has been applied in the grid generation. Notice that meshes are

still irregular unstructured meshes and each finer level is generated completely independently
of the previous coarser level.

The exact solution for the supersonic vortex can be found in Aftosmis and Berger [5]. Having
the exact solution gives us a direct means of measuring the accuracy of the numerical
solution. The exact solution for the supersonic vortex is described in Eq. (6.3).




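\[
\begin{aligned}
R_i &= 2.0, \quad M_i = 2.0, \quad \rho_i = 1.0 \\
R &= \sqrt{x^2 + y^2} \\
\rho &= \rho_i \left[ 1 + \frac{\gamma - 1}{2} \left( 1 - \frac{R_i^2}{R^2} \right) M_i^2 \right]^{\frac{1}{\gamma - 1}} \\
U_{R_i} &= R_i M_i \, \rho_i^{\frac{\gamma - 1}{2}} \\
u &= \frac{y \, U_{R_i}}{R^2}, \quad v = \frac{-x \, U_{R_i}}{R^2}, \quad P = \frac{\rho^{\gamma}}{\gamma}
\end{aligned}
\tag{6.3}
\]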
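Eq. (6.3) can be evaluated pointwise, for example to initialize the solution or to compute error norms. The sketch below is a direct transcription under two assumptions: γ = 1.4 (not stated in this section), and U_{R_i} read as the single constant R_i M_i ρ_i^{(γ−1)/2}. The function names are hypothetical.

```python
import math

GAMMA = 1.4                      # assumed ratio of specific heats
R_I, M_I, RHO_I = 2.0, 2.0, 1.0  # inner radius, inlet Mach number, inlet density


def supersonic_vortex(x, y, gamma=GAMMA):
    """Exact density, velocity components, and pressure at the point (x, y)."""
    r2 = x * x + y * y
    rho = RHO_I * (
        1.0 + 0.5 * (gamma - 1.0) * (1.0 - R_I**2 / r2) * M_I**2
    ) ** (1.0 / (gamma - 1.0))
    u_ri = R_I * M_I * RHO_I ** (0.5 * (gamma - 1.0))
    u = y * u_ri / r2
    v = -x * u_ri / r2
    p = rho**gamma / gamma
    return rho, u, v, p


def mach_number(x, y, gamma=GAMMA):
    """Local Mach number from the exact solution."""
    rho, u, v, p = supersonic_vortex(x, y, gamma)
    return math.hypot(u, v) / math.sqrt(gamma * p / rho)


for radius in (2.0, 2.5, 3.0):
    rho, u, v, p = supersonic_vortex(radius, 0.0)
    print(f"R = {radius}: rho = {rho:.4f}, Mach = {mach_number(radius, 0.0):.4f}")
```

With these formulas the Mach number equals M_i at the inner wall and decreases toward the outer wall, while the density increases with radius.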

6.3.1 Numerical Solution

The exact solution is used as an initial solution, which does not exactly satisfy the discretized

equations due to truncation error. Starting a numerical solution from the exact solution is
unrealistic in general, since if the exact solution were known there would be no need for a
numerical solution. However, this is a special case for studying the efficiency of the developed

Newton-Krylov solver, which excludes the start-up issue completely. This way we can show

how the efficiency of our matrix-free approach is affected purely by discretization order

when the best starting solution is used. Newton iteration (infinite time step) is performed

for all cases. An approximate analytic Jacobian with ILU(4) is used for preconditioning.

The convergence criterion, as in section (6.2), is L2(Res(U)) = 1 × 10^{-12} and the same

subspace size has been used. Fig (6.30) and Fig (6.31) show the convergence history for the

Mesh 1 and Mesh 5 in terms of CPU time. Since the solution is started from a good initial

guess superlinear convergence is achieved from the very first iteration both for the 2nd and

3rd-order discretizations. The 4th-order case (as expected) is much slower than the other

two and it requires more outer iterations (2 more outer iterations for the fine mesh) to reach

full convergence. This shows that even when the solution is started from the best possible

initial guess (i.e. exact solution), the 4th-order discretization results in a complicated linear

system which requires considerable effort to solve.

The solution Mach contours over the coarse mesh for all orders of accuracy are displayed

in Fig (6.32) to Fig (6.34). The flow Mach number is supposed to be constant along any

given radius and to decrease with increasing radius. In other words, flow at a smaller radius has
larger acceleration, with the maximum occurring along the inner wall boundary. By inspecting

the contour lines, it is clearly visible that the quality of the solution using the higher-

order discretization has been improved noticeably. This improvement is most visible close

to the inner wall portion. This shows that, especially in regions where the solution has


Figure 6.30: Convergence history for the coarse mesh (Mesh 1) (L2 versus CPU time in seconds; 2nd, 3rd, and 4th-order)

Figure 6.31: Convergence history for the fine mesh (Mesh 5) (L2 versus CPU time in seconds; 2nd, 3rd, and 4th-order)

Figure 6.32: 2nd-order Mach contours for the coarse mesh (Mesh 1)

large inadequately resolved derivatives (but is continuous), linear reconstruction of the flow

quantities is not the optimal choice.

Fig (6.35) to Fig (6.37) display the density error in the solution over the coarse grid. The

scale of the error map is the same for all discretization orders providing a reasonable sense

of how the solution error decreases when a higher-order discretization is employed. As

expected the boundaries are the original source of the error and the maximum error occurs

at the inner wall. The 4th-order density solution over the fine grid is shown in Fig (6.38) as

a reference. The behavior of the density contours is opposite to the behavior of the Mach

contours, and density reaches its maximum value at the outer wall.

6.3.2 Solution accuracy measurement

The density of the solution is used for the accuracy measurement. Error norms are tabulated

in Table 6.11 to Table 6.13 for all grids and discretization orders. The maximum error

norm (l∞) is about one order less than the nominal order of accuracy due to the boundary

error. The global norm, l1, however, converges to the nominal order of accuracy for all cases.

Hence the overall accuracy of the computed solution is consistent with the discretization


Figure 6.33: 3rd-order Mach contours for the coarse mesh (Mesh 1)

Figure 6.34: 4th-order Mach contours for the coarse mesh (Mesh 1)

Figure 6.35: 2nd-order density error for the coarse mesh (Mesh 1)

Figure 6.36: 3rd-order density error for the coarse mesh (Mesh 1)

Figure 6.37: 4th-order density error for the coarse mesh (Mesh 1)

Figure 6.38: Density, 4th-order solution over the fine mesh (Mesh 5)


Mesh  CVs    L1          Ratio  L∞         Ratio
1     108    0.014872    —      0.097239   —
2     427    0.003847    3.86   0.036135   2.69
3     1703   0.001073    3.59   0.012825   2.82
4     6811   0.000258    4.15   0.005168   2.48
5     27389  6.6334e-5   3.89   0.002019   2.56

Table 6.11: Solution error norms, 2nd-order discretization

Mesh  CVs    L1          Ratio  L∞         Ratio
1     108    0.044813    —      0.028904   —
2     427    0.000976    4.59   0.011439   2.53
3     1703   0.000139    7.0    0.002225   5.14
4     6811   2.4415e-5   5.69   0.000634   3.51
5     27389  3.3202e-6   7.35   0.000136   4.66

Table 6.12: Solution error norms, 3rd-order discretization

order. The error convergence plot for the l1 norm is shown in Fig (6.39) for graphical

solution accuracy verification.
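The slope of such an error convergence plot can be estimated with a least-squares fit in log-log space. The sketch below does this for the 2nd-order L1 norms of Table 6.11, assuming that the length scale is proportional to the inverse square root of the number of control volumes; the function name `fitted_order` is hypothetical, and the fit is not necessarily the one used to produce the slope labels in the figure.

```python
import math


def fitted_order(n_cvs, errors):
    """Least-squares slope of log(error) versus log(h), with h = 1/sqrt(N)."""
    xs = [math.log(1.0 / math.sqrt(n)) for n in n_cvs]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Mesh sizes and 2nd-order L1 density error norms from Table 6.11
n_cvs = [108, 427, 1703, 6811, 27389]
l1_2nd = [0.014872, 0.003847, 0.001073, 0.000258, 6.6334e-5]

order_2nd = fitted_order(n_cvs, l1_2nd)
print(f"fitted order of accuracy (2nd-order scheme): {order_2nd:.2f}")
```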

Again here, some accuracy-efficiency analysis can be carried out. The solution CPU time

versus the error norm is plotted in Fig (6.40). For a fixed CPU time, for example 10 seconds,
the 3rd-order performance characteristic line gives the minimum error.
For a fixed error level, l1 = 1 × 10^{-5}, the 4th-order performance characteristic line has
the minimum computation time, although at that error level the 3rd and 4th-order lines
are very close to each other. In all cases the 2nd-order characteristic line is outperformed
by its higher-order counterparts. It should be mentioned that a similar performance

characteristic graph with the same trend, but different CPU time, can be achieved when

the solution starts from a constant Mach number as an initial guess with a start-up process.

The supersonic vortex is an internal flow; therefore, the accuracy of the solution for the whole

domain was measured and studied. If one modifies the error quantification, or limits the


Mesh  CVs    L1          Ratio  L∞         Ratio
1     108    0.002199    —      0.01632    —
2     427    0.000410    5.36   0.005102   3.20
3     1703   2.3761e-5   17.25  0.000546   9.34
4     6811   2.0200e-6   11.76  8.3718e-5  6.52
5     27389  1.2514e-7   16.14  1.0024e-5  8.35

Table 6.13: Solution error norms, 4th-order discretization


Figure 6.39: Error-Mesh plot for the solution (Density) (L1 norm of the error versus number of CVs; slope labels 1.95, 2.88, and 4.01)

importance of the solution to a specific part of the domain, the performance characteristic

graph may change considerably. Therefore it is crucial to know what is important in a

flow field, and how the error is defined and quantified. All of these are application oriented

and vary from case to case. However, the studied test case clearly shows the potential

performance advantages of higher-order discretizations.


Figure 6.40: Error versus CPU Time (L1 error versus CPU time in seconds; 2nd, 3rd, and 4th-order)

Chapter 7

Results(II): Simulation Cases

In this chapter, the start-up procedure, fast convergence, robustness and overall performance

of the proposed higher-order unstructured Newton-Krylov solver have been studied in detail

for subsonic, transonic, and supersonic flows.

7.1 Subsonic Airfoil, NACA 0012, M = 0.63, α = 2.0°

To study the convergence and robustness of the proposed Newton-Krylov solver with a

higher-order unstructured discretization, here a subsonic case is presented which includes

most of the features of the solver’s performance. The convergence characteristics are inves-

tigated for a series of meshes. Five different meshes with an O-Domain from a coarse to a

relatively fine mesh have been used (Table 7.1 and Fig (7.1) to Fig (7.5)). All meshes have

proper refinement at the leading and trailing edges. The far field is located at 25 chords

and characteristic boundary conditions are implemented implicitly.

Mesh  No. of CVs along the chord (upper side/lower side)  Total No. of CVs
1     61/58                                               1245
2     101/99                                              2501
3     127/127                                             4958
4     198/192                                             9931
5     260/253                                             19957

Table 7.1: Mesh detail for NACA 0012 airfoil


Figure 7.1: NACA 0012, Mesh 1, 1245 CVs


7.1.1 Solution Process

The tolerance in solving the linear system is 5 × 10^{-2} for the start-up phase and
1 × 10^{-2} for the Newton phase. For all test cases a subspace of 30 has been set and no
restart is allowed. The preconditioning for the start-up pre-iterations is performed by
employing the approximate analytical Jacobian matrix with ILU-1 factorization, and for the
Newton iteration the first-order finite difference Jacobian matrix with ILU-4 factorization
is used. The Newton iteration is matrix-free and ε = ε0/‖z‖ with ε0 = 1 × 10^{-6} is used for
directional differencing. The initial condition is the free stream flow.
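The directional differencing used in the matrix-free Newton iteration can be sketched as follows. The example approximates the Jacobian-vector product by a forward difference of the residual, assuming the perturbation is scaled as ε = ε0/‖z‖ with the Euclidean norm; the function names and the two-equation toy residual are hypothetical and stand in for the flow residual.

```python
import math


def jacobian_vector_product(residual, u, z, eps0=1e-6):
    """Approximate (dR/dU) z by a forward difference along z.

    Assumption: eps = eps0 / ||z||_2, so the perturbation eps * z has norm eps0.
    """
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    if norm_z == 0.0:
        return [0.0] * len(u)
    eps = eps0 / norm_z
    r0 = residual(u)
    r1 = residual([ui + eps * zi for ui, zi in zip(u, z)])
    return [(a - b) / eps for a, b in zip(r1, r0)]


def toy_residual(u):
    """Hypothetical non-linear residual with Jacobian [[2*u0, 0], [u1, u0]]."""
    return [u[0] ** 2, u[0] * u[1]]


# At u = (1, 2) the exact product with z = (1, 1) is (2, 3).
jz = jacobian_vector_product(toy_residual, [1.0, 2.0], [1.0, 1.0])
print(jz)
```

Each product costs one extra residual evaluation, which is why the residual cost of the higher-order schemes dominates the inner GMRES iterations.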

No attempt for optimizing the start-up process is made. The solution starts with 30 pre-

iterations in the start-up process to reach a good initial solution before switching to Newton

iterations. The CFL starts at 2.0 and is increased gradually to CFL=100 for the first 15

pre-iterations which are performed with first-order accurate flux evaluation. The remaining

15 pre-iterations are performed in the form of the defect correction with constant CFL of

100, where the first-order Jacobian is used both for constructing the left hand side and for

preconditioning the linear system. The right hand side (Eq. (5.27)) is evaluated to the

correct order of accuracy. The cost of each pre-iteration includes one first order Jacobian


Figure 7.6: Cp over the upper surface after start-up, Mesh 1 (1245 CVs) (Cp versus X/C; 2nd, 3rd, and 4th-order after start-up on Mesh 1, compared with the converged 4th-order solution on Mesh 5)

evaluation and its incomplete factorization, one flux evaluation, and one system solve using

GMRES, which is not matrix-free since the Jacobian matrix is available explicitly.
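The start-up schedule described above (15 first-order pre-iterations with a growing CFL number, followed by 15 defect correction pre-iterations at CFL=100) can be written down as a simple list. The text only says that the CFL number is increased gradually, so the geometric ramp used here is an assumption, as are the function and label names.

```python
def startup_schedule(n_first_order=15, n_defect=15, cfl_start=2.0, cfl_end=100.0):
    """Return a list of (phase, CFL) pairs for the start-up pre-iterations.

    Assumption: the CFL number grows geometrically from cfl_start to cfl_end
    over the first-order pre-iterations.
    """
    growth = (cfl_end / cfl_start) ** (1.0 / (n_first_order - 1))
    schedule = []
    for k in range(n_first_order):
        # First-order flux evaluation with increasing CFL
        schedule.append(("first_order", cfl_start * growth**k))
    for _ in range(n_defect):
        # Defect correction: first-order Jacobian on the left hand side,
        # higher-order residual on the right hand side, constant CFL
        schedule.append(("defect_correction", cfl_end))
    return schedule


schedule = startup_schedule()
for iteration, (phase, cfl) in enumerate(schedule, start=1):
    print(f"pre-iteration {iteration:2d}: {phase:17s} CFL = {cfl:6.1f}")
```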

Experiments for subsonic flow have shown that a reasonable starting point for Newton

iteration can be easily achieved by a relatively small number of pre-iterations, and there is

no need to decrease the residual by a significant factor. Only a rough physical solution over

the airfoil is good enough for starting Newton iterations. It is also interesting to look at the

solution after finishing the start-up and right before switching to Newton iteration. Fig (7.6)

displays the pressure coefficient over the upper surface of the airfoil at the end of the start-

up process for Mesh 1, which is the coarsest mesh. The approximate physical solution has

been reached for all discretization orders and the suction peak is nearly resolved. Although

the suction peak is recovered better in the start-up process for higher-order discretizations,

the difference is small and for the start-up where only an approximate solution is needed,

the 2nd-order defect correction generates a reasonably good solution to start the Newton

iterations for all discretization orders. Considering the difference in cost of the 2nd-order

residual evaluation compared to the 3rd and 4th-order residual evaluations, it makes sense

to use the 2nd-order defect correction scheme for start-up.

A similar graph is shown for the finest mesh (Mesh 5) after start-up process in Fig (7.7).


Figure 7.7: Cp over the upper surface after start-up, Mesh 5 (19957 CVs) (Cp versus X/C; 2nd, 3rd, and 4th-order after start-up on Mesh 5, compared with the converged 4th-order solution on Mesh 5)

While the general trend of the pressure coefficient over the airfoil is recovered, the suction

peak has not been properly resolved yet. That is not surprising since due to the mesh

volume, the diversity of mesh length scales, and the local time stepping approach, solution

time evolution is far from completion and more pre-iterations are needed to resolve the

suction peak. The noticeable point is the violent oscillations of the 4th-order discretization

around the suction region in the course of the solution evolution. This oscillation around

the extrema is normal, and it is the case both for the 2nd and 3rd-order discretizations.

But for the 4th-order case due to low dissipation in the fourth-order scheme, oscillations

are quite vigorous. These oscillations could be a source of instability in the start-up process

and indeed the 4th-order case is sensitive in the early period of solution process. Since the

cost of the higher-order start-up is more than the 2nd-order one, and it could generate noisy

solution state (before switching to Newton iteration), it is more robust to perform all the

defect correction part with the 2nd-order residual evaluation. A similar approach is taken

for all meshes of the current test case, where the defect correction iterations are performed

with 2nd-order accuracy.

After start-up, the solution process is switched to Newton iteration and an infinite CFL

is employed. To be able to compare the computing cost for different meshes, a work unit


has been used, which is simply equivalent to the cost of one residual evaluation for the

corresponding order of accuracy. It is obvious then that two different meshes have two

different work units in terms of CPU time. Convergence is reached (solution process is

stopped) when the L2 norm of the non-linear density residual falls below 1 × 10^{-12}.

Table 7.2 shows the convergence summary for the 2nd, 3rd and 4th-order discretizations in

terms of total number of residual evaluations, total CPU time, total work units, number of

Newton iterations, and the cost of Newton phase in terms of work units. For all meshes,

after 30 pre-iterations, the solution has converged after a few Newton iterations. The total

number of work units increases as finer meshes are used. The linear system arising from
a denser grid is more difficult to solve than a similar system arising from a coarser grid.
Using a constant subspace size without restart becomes less and less effective in solving
the linear system as the size and complexity of the system increase, which reduces
the effectiveness of the linearization. Consequently, more outer (Newton) iterations are needed to
reduce the non-linear residual as the mesh size or discretization order increases. Also,
the solution state after start-up on the fine mesh is not as close to the steady-state
solution as it is on the coarse mesh, and this is another factor that slows
the Newton convergence on the fine mesh. Notice that the total work unit has increased
roughly by a factor of 2.5 while the mesh size has increased by a factor of 16.
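The growth factor quoted above can be checked directly from the total work units of Table 7.2. The following Python sketch is only an arithmetic check; the dictionary layout and variable names are illustrative.

```python
# Total work units from Table 7.2: (Mesh 1, Mesh 5) for each order.
total_work_units = {
    "2nd": (240.0, 562.0),
    "3rd": (197.1, 456.3),
    "4th": (258.0, 639.8),
}

# Ratio of fine-mesh to coarse-mesh cost for each discretization order.
growth = {order: fine / coarse for order, (coarse, fine) in total_work_units.items()}

for order, factor in growth.items():
    print(order, round(factor, 2))
```

All three ratios lie between about 2.3 and 2.5, consistent with the rough factor of 2.5 stated in the text.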

For a similar subsonic case, the total computation cost in terms of work units for one of the
fastest and most efficient available Newton-Krylov algorithms (2nd-order/artificial dissipation)
for unstructured meshes [45] is just under 300. This indicates that the performance
of the Newton-Krylov solver developed here is comparable with that of world-class
algorithms. However, as will be noted later, a combination of mesh sequencing and multi-grid
techniques is needed to maintain such performance as the size of the mesh
increases.

Figure (7.8) displays the solution CPU time versus grid size on a logarithmic scale. The
computing cost in terms of CPU time follows a power law relation with the mesh size, $t = m (N_{CV})^k$,
and all discretization orders show more or less the same trend ($k_{2nd} = 1.131$,
$k_{3rd} = 1.021$ and $k_{4th} = 1.163$), showing the scalability of the developed solver CPU
time with the mesh size. This graph emphasizes the need for parallel computing and multi-grid
methods in the case of large meshes. Figure (7.9) and Fig (7.10) show the total work
units and the Newton phase cost in terms of the work unit. The 2nd and the 3rd-order
solutions have nearly the same Newton phase cost in terms of the work unit. The difference
between these two is mainly due to the start-up phase cost, which for small meshes could
be a considerable part of the total solution cost. For the 4th-order case, the cost of the start-


Test Case   No. of Res.   Time (Sec)   Work Unit   No. of Newton Itr.   Newton Phase Work Unit

Mesh 1
  2nd       100           5.76         240.0       3                    96.7
  3rd       132           8.87         197.1       4                    122.4
  4th       244           27.09        258.0       7                    226.8

Mesh 2
  2nd       121           12.88        280.0       3                    118.3
  3rd       136           17.71        213.4       4                    128.2
  4th       283           58.15        312.6       8                    274.8

Mesh 3
  2nd       126           26.88        349.1       3                    136.1
  3rd       147           36.03        248.5       4                    141.2
  4th       247           90.54        289.3       7                    239.2

Mesh 4
  2nd       158           60.39        399.9       4                    182.8
  3rd       158           73.98        276.0       4                    159.0
  4th       318           217.60       371.4       9                    317.8

Mesh 5
  2nd       254           164.10       562.0       7                    325.3
  3rd       286           225.87       456.3       8                    321.8
  4th       542           682.0        639.8       16                   577.1

Table 7.2: Convergence summary for NACA 0012 airfoil, M = 0.63, α = 2°


[Plot: CPU time (Sec) versus No. of CVs, logarithmic axes]

Figure 7.8: CPU time versus the grid size, NACA 0012, M = 0.63, α = 2°

up is a minor part of the total solution cost since, after normalization by the cost of one
residual evaluation, the scaled start-up cost is lower for the 4th-order case than for the 3rd-order
case (and lower for the 3rd-order case than for the 2nd). It should be mentioned that
the start-up technique is neither unique nor optimized for this series of test cases, and its
cost should be studied separately. If a reasonably good pre-computed initial solution is
available (e.g. a full or incompressible potential solution, the exact solution of some approximate
problem, or an empirical solution), then the cost of the start-up phase can be ignored and
in practice the total solution cost would be equal to the Newton phase cost. The start-up
methodology in this research is reasonably efficient for small and moderate sized meshes;
however, its cost would grow if more pre-iterations are needed for large meshes. In general,
for large meshes it is not efficient to perform the start-up procedure from scratch, and it
would be much more efficient to start the computation from a coarse mesh.
Therefore, for very large meshes, augmenting the pre-iteration start-up with the mesh sequencing
technique could be quite helpful. At the same time, the rise in total solution cost
due to increasing the mesh size, in both the start-up and Newton phases, can be dramatically
reduced if a multi-grid procedure is employed.
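The power law relation $t = m (N_{CV})^k$ quoted for Fig (7.8) can be estimated by a linear least-squares fit in logarithmic space. The Python sketch below shows one way to do this. The mesh sizes and timings are synthetic (generated from a known law with k = 1.1), since the control volume counts of all five meshes are not listed in this section; `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(m) + k * log(N); returns (m, k)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    k = sxy / sxx
    m = math.exp(y_mean - k * x_mean)
    return m, k


# Synthetic data only: five hypothetical mesh sizes and timings that follow
# t = 2.0e-3 * N**1.1 exactly, so the fit should recover m and k.
sizes = [1250, 2500, 5000, 10000, 20000]
times = [2.0e-3 * n ** 1.1 for n in sizes]

m, k = fit_power_law(sizes, times)
print(round(m, 6), round(k, 6))
```

An exponent k close to one, as reported for all three discretization orders, means that the CPU time grows almost linearly with the number of control volumes.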

The convergence histories for the coarse grid (Mesh 1) and the fine grid (Mesh 5) are shown
in Fig (7.11) and Fig (7.12). Although the start-up process is the same for all orders of


[Plot: Total Work Unit versus No. of CVs]

Figure 7.9: Total work unit versus the grid size, NACA 0012, M = 0.63, α = 2°

[Plot: Newton Iteration Work Unit versus No. of CVs]

Figure 7.10: Newton phase work unit versus the grid size, NACA 0012, M = 0.63, α = 2°

[Plot: L2 residual norm versus CPU time; legend: 2nd-Order/Newton Itr./Mesh 1, 3rd-Order/Newton Itr./Mesh 1, 4th-Order/Newton Itr./Mesh 1]

Figure 7.11: Convergence history, NACA 0012, Mesh 1, M = 0.63, α = 2°

discretizations, the starting residual is different for two reasons. First, the initial residual
($r_0$) which initializes the implicit start-up is based on the residual evaluation of the
corresponding order. Second, the mesh moments are pre-computed and employed according
to the order of accuracy. However, as the start-up reaches its end, the solution
state is almost the same for all discretization orders. After start-up, full convergence is
achieved within a few Newton iterations in the 2nd and 3rd-order cases. For the 4th-order
case, convergence is about two times slower (in terms of outer iterations). The superlinear
convergence for the 2nd and 3rd-order cases is evident.
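One simple way to quantify superlinear convergence is to estimate the observed order from three successive residual norms, assuming the residuals behave like $r_{n+1} \approx C r_n^p$. The Python sketch below applies this estimate to two hypothetical residual histories; these are not the residuals of Fig (7.11) or Fig (7.12), and `observed_orders` is an illustrative name.

```python
import math


def observed_orders(residuals):
    """Estimate p in r_{n+1} ~ C * r_n**p from each triple of residuals."""
    orders = []
    for r_prev, r_curr, r_next in zip(residuals, residuals[1:], residuals[2:]):
        orders.append(math.log(r_next / r_curr) / math.log(r_curr / r_prev))
    return orders


# Hypothetical histories: the first drops 2, 3, then 4 orders of magnitude
# per iteration (superlinear); the second drops one order each time (linear).
superlinear_history = [1.0e-3, 1.0e-5, 1.0e-8, 1.0e-12]
linear_history = [1.0e-3, 1.0e-4, 1.0e-5, 1.0e-6]

print([round(p, 3) for p in observed_orders(superlinear_history)])
print([round(p, 3) for p in observed_orders(linear_history)])
```

An estimated order above one indicates superlinear convergence, while a value near two would correspond to the quadratic rate of an exact Newton method.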

Fig (7.13) examines the correlation between the non-linear residual and the residual of

the linear system solution. This graph shows the non-linear residual after applying the

update produced by solving the linear system versus the residual of that linear system. The

correlation between these two is nearly linear. Fig (7.14) shows the quality of the linear

system solution for each Newton iteration. The linear residual reduction within the GMRES

linear solver is plotted versus the final residual of that system for all outer iterations. In other

words this shows how much the linear system residual is reduced through inner iterations.

For the 2nd-order case the linear system is solved quite effectively and more than 8 orders

of residual reduction are achieved given the subspace size of 30 with no restart. For the 3rd-


[Plot: L2 residual norm versus CPU time; legend: 2nd-Order/Newton Itr./Mesh 5, 3rd-Order/Newton Itr./Mesh 5, 4th-Order/Newton Itr./Mesh 5]

Figure 7.12: Convergence history, Mesh 5, M = 0.63, α = 2°

[Plot: L2 non-linear residual versus L2 linear system residual; legend: 2nd-Order/Newton Itr./Mesh 3, 3rd-Order/Newton Itr./Mesh 3, 4th-Order/Newton Itr./Mesh 3]

Figure 7.13: Non-linear residual versus linear system residual, Mesh 3, M = 0.63, α = 2°

[Plot: Log (linear system residual reduction) versus L2 linear system residual; legend: 2nd-Order/Newton Itr./Mesh 3, 3rd-Order/Newton Itr./Mesh 3, 4th-Order/Newton Itr./Mesh 3]

Figure 7.14: Linear system residual dropping order, Mesh 3, M = 0.63, α = 2°

order and 4th-order discretizations this residual reduction is about 5 and 2 orders, respectively,
for the same subspace size. This reduced quality of the linear system solution eventually
increases the number of Newton iterations for higher-order discretizations. However, it has
been shown that by using multiple restarts and solving the linear system to machine
accuracy at each Newton iteration, semi-quadratic convergence is attainable even for the
4th-order discretization, at a dramatic penalty in computation cost [53].

The perfect preconditioning would cluster all eigenvalues at one. For the proposed precon-

ditioning strategy, it is impossible to exactly cluster all eigenvalues at one for two reasons.

First, the first-order linearization matrix (Jacobian matrix) is employed which is an approx-

imation for higher-order linearization and not equal to the original linearization. Secondly,

an incomplete factorization is used as the preconditioning technique instead of full factoriza-

tion. Therefore, the best that can be expected is that all eigenvalues of the preconditioned

operator would be scattered around unity while some of them could be very close to one.

As a result, the distance of the eigenvalues from unity can be used as one of the precondi-

tioning quality indicators. Also, it is desired to have eigenvalues located far from the origin

(zero) to avoid ill-conditioning and singularity issues [53]. To evaluate and compare the

quality of the preconditioning for different discretization orders the approximate eigenvalue


[Plot: eigenvalues in the complex plane, Im versus Re]

Figure 7.15: Eigenvalue pattern for the preconditioned system, Mesh 3, M = 0.63, α = 2°

spectrum of the preconditioned linear system (estimated via the GMRES algorithm) is plotted
at the last iteration (i.e. converged solution) for the Mesh 3 test case, Fig (7.15). The
eigenvalues associated with higher-order discretizations are scattered at larger distances
from one than the 2nd-order eigenvalues. This indicates a reduction in
the quality of the linear system solution with increasing discretization order, which was not
hard to predict in the first place. At the same time, the higher the discretization order, the closer to
the origin the eigenvalues are located, shifting the matrix toward singularity. This is especially
the case for the 4th-order discretization, where one of the eigenvalues is very close to the origin.
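The effect of an approximate preconditioning matrix on the spectrum can be illustrated with a small model problem. In the NumPy sketch below, a tridiagonal matrix `m` stands in for the first-order Jacobian and `m + scale * e` stands in for a higher-order operator; both matrices, the random perturbation `e` and the function name are assumptions made for illustration only. The preconditioner is applied exactly, so the additional effect of the incomplete factorization is not modelled.

```python
import numpy as np


def preconditioned_spectrum(a, m):
    """Eigenvalues of the left-preconditioned operator M^{-1} A."""
    return np.linalg.eigvals(np.linalg.solve(m, a))


n = 40
# Stand-in for the first-order Jacobian: diagonally dominant tridiagonal matrix.
m = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Stand-in for the part of the operator that the preconditioner does not capture.
rng = np.random.default_rng(0)
e = rng.standard_normal((n, n)) / n

spread = {}
for scale in (0.0, 0.1, 0.5):
    lam = preconditioned_spectrum(m + scale * e, m)
    # Largest distance of any eigenvalue from one (the ideal cluster point).
    spread[scale] = float(np.max(np.abs(lam - 1.0)))
    print(scale, spread[scale], float(np.min(np.abs(lam))))
```

With `scale = 0` the preconditioner equals the operator and all eigenvalues sit at one; as the mismatch grows, the eigenvalues scatter further from unity, which mirrors the qualitative trend described for the higher-order discretizations.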

The estimated condition numbers of the linear system for the Newton iterations are shown in
Fig (7.16) and Fig (7.17) for the Mesh 3 and Mesh 5 cases. The condition number is shown
as a function of the drop in residual. Res0, the reference residual, is the initial non-linear
residual of the corresponding mesh computed from the far field flow condition. Therefore,
the ratio of the non-linear residual at the end of each Newton iteration to Res0 reflects
the relative convergence after each Newton iteration. The first iteration condition number
shows the conditioning of the linear system formed from the solution linearization at
the end of the start-up phase. The remaining condition numbers are associated
with the linear systems formed from the solution update at the end of successive Newton
iterations. While the 2nd-order discretization graph initially has a larger condition number,


[Plot: Condition No. versus Log(L2(Res0)/L2(Res)); legend: 2nd-Order/Mesh 3, 3rd-Order/Mesh 3, 4th-Order/Mesh 3]

Figure 7.16: Condition No. of the preconditioned system, Mesh 3, M = 0.63, α = 2°

[Plot: Condition No. versus Log(L2(Res0)/L2(Res))]

Figure 7.17: Condition No. of the preconditioned system, Mesh 5, M = 0.63, α = 2°

[Plot: CL versus Log(L2(Res0)/L2(Res)); legend: 2nd-Order/Newton Itr./Mesh 1, 3rd-Order/Newton Itr./Mesh 1, 4th-Order/Newton Itr./Mesh 1]

Figure 7.18: Lift coefficient convergence history, NACA 0012, Mesh 1, M = 0.63, α = 2°

the condition number decreases gradually with convergence. In the higher-order cases, the
condition number is initially smaller than in the 2nd-order case, but it grows with the
Newton iterations. In conclusion, the overall performance of the flow solver depends not
only on the conditioning of the linear system but also on how well the non-linear
problem is represented by the linearization at each iteration. For higher-order discretizations
the quality of the linearization is not as good as it is for the 2nd-order discretization, and
more outer (Newton) iterations are needed for full convergence. It should be mentioned
that the reported condition numbers are relatively small (since the meshes are
not very large). Generally speaking, the resultant preconditioned linear systems are well
conditioned, and the condition numbers are reported just for monitoring the solution process.

Lift and drag coefficient convergence histories as a function of the drop in residual are shown
in Fig (7.18) to Fig (7.21) for the coarse and fine meshes.

In the case of coarse mesh, almost 6 orders drop in residual seems to be necessary for

lift and drag convergence. The lift and drag convergence are nearly achieved after 4-5

orders drop in residual for higher-order discretizations. In the case of the fine mesh all the

coefficients are completely converged after 4 orders drop in residual. As a practical matter,

this is an important point since the computation cost can be noticeably reduced by making



[Plot: CD versus Log(L2(Res0)/L2(Res)); legend: 2nd-Order/Newton Itr./Mesh 1, 3rd-Order/Newton Itr./Mesh 1, 4th-Order/Newton Itr./Mesh 1]

Figure 7.19: Drag coefficient convergence history, NACA 0012, Mesh 1, M = 0.63, α = 2°

the convergence criteria less strict and terminating the solution process after fewer Newton
iterations for the fine mesh, where the computing cost increases considerably. With
this procedure, the higher-order (especially the 4th-order) discretization would benefit
greatly, as the unnecessary computing cost of reducing the residual beyond a certain value is
avoided.

Lift and drag coefficients for all meshes and discretization orders are tabulated in Table 7.3.
In the finest mesh case, the lift coefficients have converged to the same value up to the third
decimal place and the drag coefficients up to the fourth decimal place. However, the drag coefficient
is still far from zero. This is mainly due to the lack of pressure recovery at the trailing edge,
which is a singular point in potential flow. The 3rd-order drag coefficient is consistently
larger than the 2nd-order drag coefficient, which is surprising. However, the 4th-order drag
is consistently the smallest, as expected. Without drawing a general conclusion about
the 3rd-order drag, it is fair to say that the behavior of the discretization at the trailing
edge of the airfoil is a determining factor, and apparently in such a region discretization
methods with even orders of accuracy perform better than a discretization method with
an odd order of accuracy, possibly due to a cancellation effect.

All of the test cases so far are performed with a far field size of 25 chords. To study the



[Plot: CL versus Log(L2(Res0)/L2(Res))]

Figure 7.20: Lift coefficient convergence history, NACA 0012, Mesh 5, M = 0.63, α = 2°

[Plot: CD versus Log(L2(Res0)/L2(Res))]

Figure 7.21: Drag coefficient convergence history, NACA 0012, Mesh 5, M = 0.63, α = 2°


Test Case   Lift coefficient   Drag coefficient

Mesh 1
  2nd       0.322318           0.00097664
  3rd       0.317393           0.00184889
  4th       0.322223           0.00072576

Mesh 2
  2nd       0.322302           0.00057225
  3rd       0.321846           0.00103438
  4th       0.322448           0.00040369

Mesh 3
  2nd       0.324905           0.00040197
  3rd       0.325214           0.00049820
  4th       0.325588           0.00034757

Mesh 4
  2nd       0.325159           0.000336973
  3rd       0.324740           0.00037595
  4th       0.325323           0.00032525

Mesh 5
  2nd       0.324801           0.00032136
  3rd       0.324568           0.00032711
  4th       0.324474           0.00030828

Table 7.3: Lift and drag coefficients for all meshes and discretization orders, NACA 0012, M = 0.63, α = 2°, far field size of 25 chords


Test Case/Mesh Size/Far field distance   Lift coefficient   Drag coefficient

Mesh 4-1/10604 CVs/128 Chords
  2nd                                    0.332506           0.00010622
  3rd                                    0.332882           0.00012667
  4th                                    0.332699           7.09651e-5

(label missing)
  2nd                                    0.333575           4.88588e-5
  3rd                                    0.333530           5.98949e-5
  4th                                    0.334306           4.04234e-5

(label missing)
  2nd                                    0.332908           1.82357e-5
  3rd                                    0.333715           2.35558e-5
  4th                                    0.333374           5.66076e-6

Table 7.4: Effect of the far field distance on lift and drag coefficients, NACA 0012, M = 0.63, α = 2°

effect of the far field size on lift and drag, and to show that computing a smaller drag is
possible with the same mesh resolution, the far field size is increased considerably while the
mesh resolution, the number of points on the airfoil and the refining factors remain the same.
In other words, the new meshes are equivalent to the meshes of the previous test cases up
to 25 chords. The results are tabulated in Table 7.4. In all cases the drag coefficient has
been reduced dramatically by increasing the far field distance. For instance, the 4th-order
drag computed over Mesh 5-2 is more than 50 times smaller than the
4th-order drag computed over Mesh 5. The lift coefficient is also affected by the far field
distance (i.e. in most cases the lift coefficient increased slightly when the
far field size was extended). It should be mentioned that in this research no far field correction is used,
and higher-order far field correction was beyond the scope of the thesis research.

Reference [21] reports CL of 0.3289 and CD of 0.0004 for a 2nd-order computation of a

similar flow case over an adaptively refined Cartesian mesh with the total of 10694 CVs

(494 body nodes) and 128 chords far field distance, which in terms of the mesh and far field

size is closely comparable to the Mesh 4-1 test case.

Mach contours for the Mesh 3 (4958 CVs) are displayed in Fig (7.23) to Fig (7.25), where the

corresponding mesh is shown in Fig (7.22). Flow field features such as the stagnation region

just below the leading edge, flow acceleration near the leading edge (where the maximum

Mach number occurs), and flow deceleration near the trailing edge are visualized. Once


[Mesh plot: Y versus X]

Figure 7.22: NACA 0012, Mesh 3 (4958 CVs)

again, where the mesh is coarse, the higher-order contours are visibly smoother than the 2nd-order
contours, showing the improved quality of the flow computation over these regions.

The Mach profiles (computed at Gauss points) along the upper side of the airfoil for Mesh 1
(coarsest mesh, Fig (7.28)) are shown in Fig (7.26) and Fig (7.27). Since there is a sudden
change in the area of the control volumes over the surface of the airfoil in the acceleration
region, which is often the case for coarse meshes, a jump is observed in the Mach profile at
those locations. However, the higher-order solutions reduce the amplitude of this jump, so
that in the 4th-order case the jump has disappeared. Also, the Mach profile over the lower
surface of the airfoil is shown in Fig (7.29), where the 2nd-order Mach profile is slightly
lower than its higher-order counterparts.

Fig (7.30) to Fig (7.32) display the Mach profiles on the upper and lower sides for all discretization
orders over the finest mesh, Mesh 5. As expected, all orders of accuracy result in essentially
the same Mach distribution. The close-up over the acceleration region demonstrates that
in locations where the flow experiences large changes, linear reconstruction (2nd-order
discretization) does not result in as smooth a solution as the higher-order discretizations.

The error distribution in total pressure, 1−Pt/Pt∞ , along the chord for Mesh 1 and Mesh 5


[Contour plot: Mach number, Y versus X; contour levels from 0.06 to 0.92]

Figure 7.23: 2nd-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, α = 2°

[Contour plot: Mach number, Y versus X; contour levels from 0.06 to 0.92]

Figure 7.24: 3rd-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, α = 2°

[Contour plot: Mach number, Y versus X; contour levels from 0.06 to 0.92]

Figure 7.25: 4th-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, α = 2°

[Plot: Mach versus X/C; legend: 2nd-Order/Mesh 1, 3rd-Order/Mesh 1, 4th-Order/Mesh 1, 4th-Order/Mesh 5]

Figure 7.26: Mach profile, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

[Plot: Mach versus X/C, close-up]

Figure 7.27: Mach profile close up, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

[Mesh plot: Y versus X]

Figure 7.28: Mesh 1, close-up at the leading edge region


[Plot: Mach versus X/C]

Figure 7.29: Mach profile, lower side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

[Plot: Mach versus X/C]

Figure 7.30: Mach profile, upper side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

[Plot: Mach versus X/C, close-up; legend: 2nd-Order/Mesh 5, 3rd-Order/Mesh 5, 4th-Order/Mesh 5]

Figure 7.31: Mach profile close up, upper side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

[Plot: Mach versus X/C; legend: 2nd-Order/Mesh 5, 3rd-Order/Mesh 5, 4th-Order/Mesh 5]

Figure 7.32: Mach profile, lower side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°


[Plot: 1 − Pt/Pt∞ versus X/C; legend: 2nd-Order/Mesh 1, 3rd-Order/Mesh 1, 4th-Order/Mesh 1]

Figure 7.33: 1 − Pt/Pt∞, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

[Plot: 1 − Pt/Pt∞ versus X/C]

Figure 7.34: 1 − Pt/Pt∞, lower side, NACA 0012 airfoil, Mesh 1, M = 0.63, α = 2°

[Plot: 1 − Pt/Pt∞ versus X/C; legend: 2nd-Order/Mesh 5, 3rd-Order/Mesh 5, 4th-Order/Mesh 5]

Figure 7.35: 1 − Pt/Pt∞, upper side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°

[Plot: 1 − Pt/Pt∞ versus X/C; legend: 2nd-Order/Mesh 5, 3rd-Order/Mesh 5, 4th-Order/Mesh 5]

Figure 7.36: 1 − Pt/Pt∞, lower side, NACA 0012 airfoil, Mesh 5, M = 0.63, α = 2°


are shown in Fig (7.33) to Fig (7.36). In all cases there are sudden spikes at the trailing edge,
and although the error amplitude over the fine mesh is noticeably decreased, the trailing
edge remains one of the main sources of total pressure error. Another source of error is the
leading edge area, which covers both the stagnation and acceleration regions. The 4th-order
solution has in all cases produced less entropy over the whole airfoil than the 2nd
and 3rd-order discretizations, as its stagnation pressure error is nearly zero everywhere,
including the leading edge region (except for the above-mentioned spikes at the trailing edge).
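The total pressure error measure used in Fig (7.33) to Fig (7.36) can be evaluated from the local static pressure and Mach number with the standard isentropic relation for a perfect gas. The Python sketch below is a minimal illustration; the function names are hypothetical, the ratio of specific heats γ = 1.4 is assumed, and the sample states are made up rather than taken from the computed solutions.

```python
def total_pressure(p, mach, gamma=1.4):
    """Isentropic total pressure Pt = p * (1 + (gamma-1)/2 * M^2)^(gamma/(gamma-1))."""
    return p * (1.0 + 0.5 * (gamma - 1.0) * mach ** 2) ** (gamma / (gamma - 1.0))


def total_pressure_error(p, mach, p_inf, mach_inf, gamma=1.4):
    """Return 1 - Pt/Pt_inf, which is zero for an isentropic (loss-free) state."""
    return 1.0 - total_pressure(p, mach, gamma) / total_pressure(p_inf, mach_inf, gamma)


# Free-stream state (static pressure normalised to 1, M = 0.63).
p_inf, mach_inf = 1.0, 0.63
pt_inf = total_pressure(p_inf, mach_inf)

# A made-up local state at M = 0.9 that is exactly isentropic ...
mach_local = 0.9
p_isentropic = pt_inf / total_pressure(1.0, mach_local)
# ... and the same Mach number with 1% lower static pressure (a total pressure loss).
p_lossy = 0.99 * p_isentropic

print(total_pressure_error(p_isentropic, mach_local, p_inf, mach_inf))
print(total_pressure_error(p_lossy, mach_local, p_inf, mach_inf))
```

For inviscid subsonic flow without shocks the exact solution has zero total pressure error everywhere, so any nonzero value is a direct measure of numerical entropy production.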


7.2 Transonic Airfoil, NACA 0012, M = 0.8, α = 1.25°

For a transonic flow, in general, it is more difficult to get fast convergence. This is because

of the mixed subsonic/supersonic nature of the flow and the existence of discontinuities

(shocks) in the solution. The methodology for handling of discontinuity can increase the

complexity of the problem. This is true especially for implicit schemes, where there could

be a large change in the solution update in each iteration and limiter values could have large

oscillations. In the case of the matrix-free approach in which matrix-vector multiplication

is computed through flux perturbation, any oscillatory behavior in limiter could severely

degrade the solution convergence. All these facts amount to increasing the complexity and

difficulty in solving transonic flows.

The transonic flow around NACA 0012 at M = 0.8, α = 1.25° is studied. Mesh 3
with 4958 CVs, Fig (7.3), of section 7.1 is employed for this test case. The flow is solved for
all orders of accuracy using the Venkatakrishnan limiter with the proper higher-order modification,
section 3.4.2. For the 2nd and 3rd-order cases K = 10 is used in the limiter, and for
the 4th-order discretization K = 1 is employed. The limiter values are allowed to change
through all iterations and no freezing is considered. As in the previous test cases, the tolerance
for solving the linear system is $5 \times 10^{-2}$ for the start-up phase and $1 \times 10^{-2}$
for the Newton phase. For all test cases a subspace of 30 has been set and no restart is allowed.
The preconditioning for the start-up pre-iterations is performed using the approximate
analytical Jacobian matrix with ILU-1 factorization, and for the Newton iterations a finite
difference Jacobian matrix with ILU-4 factorization is applied. The Newton iteration is
matrix-free and ε = ε0‖z‖ with ε0 = $1 \times 10^{-7}$ is used for directional differencing. The initial
condition is free stream flow. Convergence is reached when the $L_2$ norm of the non-linear
density residual falls below $1 \times 10^{-12}$.
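The matrix-free matrix-vector product mentioned above replaces the explicit Jacobian by a directional (forward) difference of the residual. The NumPy sketch below illustrates the idea on a two-equation model problem. The residual function, its analytical Jacobian and the function name `jacobian_vector_product` are illustrative assumptions and are unrelated to the flow solver; the direction z is normalised to unit length so that the perturbation size reduces to ε0.

```python
import numpy as np


def jacobian_vector_product(residual, u, z, eps):
    """Approximate (dR/dU) z by the forward difference [R(u + eps z) - R(u)] / eps."""
    return (residual(u + eps * z) - residual(u)) / eps


def residual(u):
    """Hypothetical non-linear residual used only to exercise the product."""
    return np.array([u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 3 - 5.0])


def jacobian(u):
    """Analytical Jacobian of the model residual, for comparison only."""
    return np.array([[2.0 * u[0], 1.0], [1.0, 3.0 * u[1] ** 2]])


u = np.array([1.0, 2.0])
z = np.array([3.0, 4.0]) / 5.0      # unit-length Krylov direction
eps0 = 1.0e-7                        # value quoted in the text

approx = jacobian_vector_product(residual, u, z, eps=eps0)
exact = jacobian(u) @ z
print(approx, exact)
```

Each product costs one additional residual evaluation, which is why the residual evaluation is the natural work unit for a matrix-free solver. Because the residual is differenced, any non-smooth behavior in it (for example an oscillating limiter) directly contaminates the product.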

For transonic flow, before switching to Newton iteration, the shock locations in the flow field
and their strengths need to be captured relatively accurately; otherwise Newton iterations
will not decrease the residual of the non-linear problem effectively. This is normally achieved
by reducing the non-linear residual by some orders of magnitude, typically 1.5-2, with respect to the initial
residual in the course of the start-up process.

Multiple implicit pre-iterations are performed in the form of defect correction before switching
to Newton iteration. For the 2nd and 3rd-order start-up phases, these pre-iterations
continue until the residual of the non-linear problem drops 1.5
orders below the residual of the initial condition. In the defect correction phase the starting
CFL number is 2, and it is increased gradually to 200 after 50 iterations. The CFL is not


[Plot: Mach versus X/C; legend: 2nd-Order, 3rd-Order, 4th-Order, AGARD]

Figure 7.37: Mach profile at the end of start-up process, M = 0.8, α = 1.25°

increased above 200, as increasing the CFL does not help much when the
linearization is inaccurate, as it is in the start-up phase. To reduce the residual by 1.5 orders,
69 and 81 pre-iterations were required in the start-up phase for the 2nd and 3rd-order
discretizations, respectively. In the Newton phase, the 2nd and 3rd-order cases proceed
with an infinite time step. The start-up phase for the 4th-order discretization includes 200
pre-iterations with a similar CFL trend. Although the target residual reduction of 1.5 orders
was not achieved through the pre-iterations, the solution after 200 iterations was good
enough for starting the Newton phase. Fig (7.37) displays the Mach profile at the end of
the start-up phase for all orders of accuracy. There are some oscillations, especially along
the upper surface Mach profile close to the shock location; however, the shock locations and
their strengths are captured reasonably accurately before switching to Newton iteration.
For the 4th-order transonic case, using an infinite time step does not accelerate convergence;
it causes inaccurate linearization and limiter oscillation. Since this leads to
slow convergence, CFL = 10,000 has been set for the Newton phase.
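The CFL evolution of the start-up phase described above (starting at 2, reaching 200 after 50 iterations, and held there) can be sketched as a simple schedule. The text only states that the CFL is increased gradually, so the geometric ramp used in the Python sketch below is an assumption, as is the function name `cfl_schedule`.

```python
def cfl_schedule(iteration, cfl_start=2.0, cfl_max=200.0, ramp_iterations=50):
    """CFL number for a given pre-iteration: geometric ramp (assumed), then capped."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    if iteration >= ramp_iterations:
        return cfl_max
    # Constant growth factor chosen so that cfl_max is reached at ramp_iterations.
    growth = (cfl_max / cfl_start) ** (1.0 / ramp_iterations)
    return cfl_start * growth ** iteration


for it in (0, 10, 25, 50, 69, 200):
    print(it, round(cfl_schedule(it), 2))
```

With these parameters the CFL passes through 20 at iteration 25 and stays at 200 for all later pre-iterations, such as the 69th or 200th mentioned in the text.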

The convergence summary is shown in Table 7.5. Similar to the subsonic case, the
number of Newton iterations for the 4th-order discretization is twice as large as for the
2nd and 3rd-order discretizations. The major cost of the solution in all cases is the start-up



Test Case   No. of Res.   Time (Sec)   Work Unit   No. of Newton Itr.   Newton Phase Work Unit

  2nd       197           65.6         279         4                    91
  3rd       241           106.7        281         5                    119
  4th       450           311.4        590         10                   221

Table 7.5: Convergence summary for NACA 0012 airfoil, M = 0.8, α = 1.25°

[Plot: L2 residual norm versus CPU time (Sec); legend: 2nd-Order/Newton Itr., 3rd-Order/Newton Itr., 4th-Order/Newton Itr.]

Figure 7.38: Convergence history for NACA 0012, M = 0.8, α = 1.25°

cost. Therefore, there is considerable potential to reduce the total solution work
by employing an optimized start-up technique that is able to capture shocks and establish
the major flow features efficiently. Figure (7.38) shows the convergence history.
This time, the reduction in convergence slope for the 4th-order case is due not only to
poor convergence of the linear system solution in each Newton iteration, but also to
the limiter firing, which changes the linearization in each iteration.

Again, for a similar transonic case, the total computation cost in terms of work units
for a very fast Newton-Krylov algorithm (2nd-order/artificial dissipation) for unstructured
meshes [45] is just under 400, showing that the performance of the Newton-Krylov
solver developed in this research is very competitive.

Table 7.6 summarizes the lift and drag coefficients for all discretization orders which are


Test case                              CL         CD

2nd                                    0.337593   0.0220572
3rd                                    0.339392   0.0222634
4th                                    0.345111   0.0224720
AGARD-211 [2]/Structured (192×39)      0.3474     0.0221

Table 7.6: Lift and drag coefficients, NACA 0012, M = 0.8, α = 1.25°

[Contour plot: Mach number, Y versus X; contour levels from 0.1 to 1.5]

Figure 7.39: 2nd-order Mach contours, NACA 0012, M = 0.8, α = 1.25°

[Contour plot: Mach number, Y versus X; contour levels from 0.1 to 1.4]

Figure 7.40: 3rd-order Mach contours, NACA 0012, M = 0.8, α = 1.25°

[Contour plot: Mach number, Y versus X; contour levels from 0.1 to 1.4]

Figure 7.41: 4th-order Mach contours, NACA 0012, M = 0.8, α = 1.25°

[Contour plot: limiter φ, Y versus X; contour levels from 0.05 to 0.95]

Figure 7.42: limiter φ (3rd-order), NACA 0012, M = 0.8, α = 1.25°

in good agreement with the reference data [2]. Both CL and CD in transonic flow are
strongly influenced by the shock location and its strength, and to obtain a good prediction
of these coefficients, it is crucial to capture the shocks accurately.

Figure (7.39) to Fig (7.41) display banded Mach contours for all orders of discretization.
The weak shock at the lower surface and the strong shock at the upper surface of the airfoil are clearly
visible. The presence of the discontinuity and the firing of the limiter in shock regions result in
non-smooth contour lines in the shock vicinity (mostly for coarser control volumes). This
non-smooth region widens with the discretization order due to the increase in the reconstruction
stencil size.

In addition to hampering convergence, using the limiter has the disadvantage of reducing

the accuracy of the solution due to altering the gradients and higher-order terms. The

behavior of the limiter values φ and σ is shown in Fig (7.42) and Fig (7.43). Some isolated

limiter firing is observed in the leading and trailing edge regions, but those are limited to

a small number of control volumes, and generally the value is close to one. The limiter is

mainly active in the strong upper shock region, while it remains inactive at the weak lower

shock. Note that wherever the gradient limiter, φ, is fired aggressively the higher-order

terms limiter, σ, is nearly zero.
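For reference, the gradient limiter φ is built on Venkatakrishnan's smooth limiter function. The Python sketch below evaluates the standard (2nd-order) form of that function for a single face value, with the usual threshold ε² = (KΔx)³. It does not reproduce the higher-order modification of section 3.4.2 or the σ limiter, and the function and argument names are illustrative assumptions.

```python
def venkatakrishnan_limiter(u_cell, u_face, u_min, u_max, k, dx):
    """Standard Venkatakrishnan limiter value for one reconstructed face value.

    u_cell       : control volume average
    u_face       : unlimited reconstructed value at a face (Gauss point)
    u_min, u_max : extrema over the cell and its neighbours (u_min <= u_cell <= u_max)
    k, dx        : limiter constant K and local mesh length scale
    """
    delta_minus = u_face - u_cell
    if delta_minus == 0.0:
        return 1.0
    # Allowed variation in the direction of the reconstruction.
    delta_plus = (u_max - u_cell) if delta_minus > 0.0 else (u_min - u_cell)
    eps2 = (k * dx) ** 3
    numerator = delta_plus ** 2 + eps2 + 2.0 * delta_minus * delta_plus
    denominator = (delta_plus ** 2 + 2.0 * delta_minus ** 2
                   + delta_minus * delta_plus + eps2)
    return numerator / denominator


# Hypothetical data: the reconstruction overshoots the neighbour maximum (3 > 2).
for k in (0.0, 1.0, 10.0):
    phi = venkatakrishnan_limiter(u_cell=1.0, u_face=3.0, u_min=0.0, u_max=2.0,
                                  k=k, dx=0.1)
    print(k, round(phi, 4))
```

A larger K increases ε² and therefore moves φ toward one (less limiting), which is consistent with the use of a smaller K for the more oscillation-prone 4th-order discretization. In practice the value applied to a control volume is the minimum of φ over all its faces or Gauss points.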


[Contour plot: limiter σ, Y versus X; contour levels from 0.05 to 0.95]

Figure 7.43: limiter σ (3rd-order), NACA 0012, M = 0.8, α = 1.25°

[Plot: Mach versus X/C; legend: 2nd, 3rd, 4th, AGARD]

Figure 7.44: Mach profile, NACA 0012, M = 0.8, α = 1.25°

[Plot: Mach versus X/C, close-up of the shock regions; legend: 2nd, 3rd, 4th, AGARD]

Figure 7.45: Mach profile in shock regions, NACA 0012, M = 0.8, α = 1.25°

The Mach profile along the surface is shown in Fig (7.44) and Fig (7.45). Both the location

and strength of the shocks are in good agreement with the AGARD data [2]. The 2nd-order

discretization has some overshoot right before the upper shock, which is also visible in the

related Mach contours. It appears that the 3rd-order discretization produces less noise

in this shock capturing case, and this is probably because of the quadratic reconstruction

characteristic. However, the cubic reconstruction associated with the 4th-order discretization

shows some oscillations inside the shock region; this behavior is typical in approximating a

(sharp) discontinuity by a higher-order polynomial.


7.3 Supersonic flow, Diamond airfoil, M = 2.0, α = 0.0

The supersonic flow field around a diamond airfoil has been computed using all discretization orders at a far field Mach number of 2.0 and zero angle of attack. The diamond airfoil has a thickness of 15% and is located in a 7×14 rectangular domain. The mesh contains 7771 control volumes in total, with 80 control volumes along the chord. The mesh has been refined at the leading and trailing edges (shock locations) and at the apexes of the airfoil (expansion locations) to improve the quality of the solution in these high gradient regions, Fig (7.46).

The interesting parts of the flow field are the discontinuities (shocks) and very sharp gradi-

ents (expansion fans). Because these features would result in extensive limiter firing which

would adversely affect the global accuracy, no limiter has been employed in the test case.

Therefore, the resultant solution will not be monotone and there will be considerable oscillations around shocks and expansion fans. The tolerance in solving the linear system is 5 × 10⁻² for the start-up phase and 1 × 10⁻² for the Newton phase. For all test cases a subspace size of 30 has been set and no restart is allowed. The preconditioning for the start-up pre-iterations is performed using the approximate analytical Jacobian matrix with ILU-1 factorization, and for the Newton iterations a finite difference Jacobian matrix with ILU-4 factorization is applied. The Newton iteration is matrix-free and ε = ε₀‖z‖ with ε₀ = 1 × 10⁻⁷ is used for directional differencing. The initial condition is free stream flow. Convergence is reached when the L2 norm of the non-linear density residual falls below 1 × 10⁻¹².
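The directional differencing used in a matrix-free Newton iteration can be illustrated with a small sketch. It is not the thesis code: the toy residual `residual`, the helper `jacobian_vector_product`, and the fixed perturbation size passed as `eps` are assumptions made only for this example. The idea is that the product of the Jacobian with a vector z is approximated by a forward difference of the residual along z, so the Jacobian itself is never formed.

```python
import numpy as np

def residual(u):
    # Toy nonlinear residual (hypothetical stand-in for the flow residual R(U)).
    return np.array([u[0]**2 + u[1] - 3.0,
                     u[0] + u[1]**3 - 9.0])

def jacobian_vector_product(res, u, z, eps):
    # Forward difference along z: J(u) z ~ (R(u + eps*z) - R(u)) / eps.
    return (res(u + eps * z) - res(u)) / eps

u = np.array([1.0, 2.0])
z = np.array([0.6, 0.8])  # unit-length direction
approx = jacobian_vector_product(residual, u, z, eps=1e-7)

# Analytical Jacobian of the toy residual, used only to check the approximation.
J = np.array([[2.0 * u[0], 1.0],
              [1.0, 3.0 * u[1]**2]])
exact = J @ z
```

Each such product costs one extra residual evaluation, which is why the number of residual evaluations is a natural work measure for a matrix-free GMRES solver.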

In the current test case, the major physical flow characteristics, including both shocks and expansion fans, need to be captured approximately before switching to Newton iterations. Otherwise, due to the highly non-linear and discontinuous regions in the supersonic flow field, convergence stalls shortly after the starting point. Therefore, 30 pre-iterations in the form of defect correction have been performed before switching to the Newton phase. The starting CFL is 5 and it is increased to 50 after 10 iterations. For the start-up of the 4th-order discretization case, the 2nd-order defect correction is used to reduce the start-up cost. This also improves the robustness of the start-up, since the 4th-order defect correction start-up for such a flow (without a limiter) could be unstable. The pressure coefficient along the chord at the end of the start-up phase is shown in Fig (7.47), demonstrating the effectiveness of the defect correction start-up for supersonic flow and the fact that the shocks and expansion fans are fairly accurately established at the end of the start-up process.

Figure 7.46: Mesh 7771 CVs, diamond airfoil, M = 2.0, α = 0.0 (panels: (a) Diamond airfoil, (b) Nose section, (c) Mid section)

Figure 7.47: Cp at the end of start-up process, diamond airfoil, M = 2.0, α = 0.0 (Cp versus X/C; curves: 2nd-Order Start up (used for 2nd/4th-Order), 3rd-Order Start up, Exact Solution)

After starting the Newton phase, full convergence is achieved rapidly for all orders of discretization. Figure (7.48) shows the convergence history for the diamond airfoil, displaying the fast convergence. Notice that the residual is still large right before the first Newton iteration, but by that point the main structure of the flow field has been formed and the residual drops quickly once the Newton iterations begin. In fact, the high Mach number helps to damp all errors, including low frequency errors, quite effectively as soon as the shocks and expansion fans are established over the airfoil: first, the wave propagation speed is quite fast, and second, all waves travel toward the outlet without reflecting from boundaries. Table (7.7) provides the convergence summary for all discretization orders. For the 2nd-order case the major part of the total work is spent in the start-up phase, while for the 4th-order case, as expected from previous test cases, most of the time is spent on the Newton iterations. This is especially the case here since the 2nd-order start-up is employed and the 4th-order start-up is avoided (as in the subsonic airfoil test case), reducing the relative cost of the start-up phase for the 4th-order solution. For the 3rd-order case, the costs of the start-up and Newton phases are split evenly.

The Mach contours for all discretization orders are shown in Fig (7.49) to Fig (7.51). All

the steady-state flow features, including symmetric shocks right at the sharp leading edge,

expansion fans at the lower and upper apexes, and symmetric fish tail shocks at the sharp

trailing edge of the diamond airfoil are clearly captured.


Figure 7.48: Convergence history for diamond airfoil, M = 2.0, α = 0° (L2 norm of the residual versus CPU time in seconds; curves: 2nd-Order, 3rd-Order and 4th-Order, with Newton iterations marked)

2nd     61     27.85    236.0    2     50.0
3rd    126     48.62    243.1    3    120.8
4th    254    133.57    301.5    7    250.0

Table 7.7: Convergence summary for diamond airfoil, M = 2.0, α = 0°

Test case    CD
2nd          0.0524258
3rd          0.0524661
4th          0.0524671
Exact        0.0524663

Table 7.8: Drag coefficient, diamond airfoil, M = 2.0, α = 0°


Figure 7.49: 2nd-order Mach contours, diamond airfoil, M = 2.0, α = 0.0 (contour plot in X, Y; Mach levels from 1.55 to 2.45)

Figure 7.50: 3rd-order Mach contours, diamond airfoil, M = 2.0, α = 0.0 (contour plot in X, Y; Mach levels from 1.55 to 2.45)

Figure 7.51: 4th-order Mach contours, diamond airfoil, M = 2.0, α = 0.0 (contour plot in X, Y; Mach levels from 1.55 to 2.45)

Figure 7.52: 2nd-order Cp, diamond airfoil, M = 2.0, α = 0.0 (Cp versus X/C; curves: 2nd-Order, Exact)

Figure 7.53: 3rd-order Cp, diamond airfoil, M = 2.0, α = 0.0 (Cp versus X/C; curves: 3rd-Order, Exact)

Figure 7.54: 4th-order Cp, diamond airfoil, M = 2.0, α = 0.0 (Cp versus X/C; curves: 4th-Order, Exact)


Figure (7.52) to Fig (7.54) display the pressure coefficient along the chord of the diamond

airfoil for all orders of discretization. In all cases there are over and/or undershoots at

the leading and trailing edges of the airfoil (where the shocks are located) and at the mid-

dle section of the diamond (where the expansion fans are formed). Naturally, for higher

discretization order, the over and undershoots are worse. The 4th-order Cp shows some

oscillatory behavior in the front region, located between two large gradient points of the

diamond airfoil. This is due to the continuing effect of the over and undershoots which are

not fully damped in that region due to the very large reconstruction stencil and the characteristic of the cubic polynomial. It is worth mentioning that the higher-order reconstruction

procedure used in this research has some compactness issues for such a supersonic flow

since the reconstruction routine uses data from regions in the flow field which are physically

independent. That is, downstream data is included in the reconstruction. Despite all this,

there is an overall good agreement between the exact solution and the numerical solution

over the supersonic diamond airfoil.

The computed drag coefficient for all orders of discretization is tabulated in Table 7.8. Again

there is a very good match between the numerical and exact solution here. For the inviscid

supersonic diamond airfoil case, due to pre-determined shock and expansion fan locations,

the drag coefficient is solely a function of the shock strength and the acceleration of the flow

at the expansion apexes. This is why the computed drag coefficient is still fairly accurate

despite all these over and undershoots, as their mean values remain very close to zero.
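The level of agreement can be quantified directly from the tabulated values. The short sketch below computes the relative error of each computed drag coefficient with respect to the exact value; the numbers are copied from Table 7.8, while the variable names are chosen only for this illustration.

```python
# Drag coefficients copied from Table 7.8.
cd_exact = 0.0524663
cd_computed = {"2nd": 0.0524258, "3rd": 0.0524661, "4th": 0.0524671}

# Relative error of each computed value with respect to the exact solution.
rel_error = {order: abs(cd - cd_exact) / cd_exact
             for order, cd in cd_computed.items()}

for order, err in rel_error.items():
    print(f"{order}-order: relative error = {err:.2e}")
```

With these values the 2nd-order drag differs from the exact one by roughly 0.08%, while the 3rd and 4th-order values differ by less than 0.002%.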

Chapter 8

Concluding Remarks

8.1 Summary and Contributions

A fast implicit finite volume flow solver for higher-order unstructured (cell-centered) steady-

state computation of inviscid compressible flows (the Euler equations) has been developed.

The matrix-free GMRES algorithm is used as a solver for the linear system arising from the

discretized equations, avoiding explicit computation of the higher-order Jacobian matrix.

The solution process has been divided into two separate phases, i.e. a start-up phase and a

Newton phase.

A defect correction procedure is proposed for the start-up phase consisting of multiple

implicit pre-iterations. The approximate first-order analytical Jacobian is used both for

constructing and preconditioning the linear system on the left hand side, reducing the

complexity and cost of the pre-iterations. A low fill-level incomplete factorization, ILU(1),

is used for the right preconditioning of the low order linear system associated with the defect

correction iterations. The GMRES algorithm in its standard form is used for solving the

resultant linear system. Knowing that the linearization is not very accurate at this stage,

the CFL number is kept low (on the order of 10²) and a moderate tolerance is set for

solving the linear system. The defect correction procedure was very effective in reaching an

approximate solution which includes most of the physical characteristics of the steady-state

flow.
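The structure of a defect correction pre-iteration can be sketched on a toy problem. This is only an illustration under stated assumptions, not the thesis implementation: the matrices `A_high` and `A_low` are hypothetical stand-ins for the higher-order residual operator and the first-order Jacobian, a direct solve replaces the preconditioned GMRES solve, and a fixed pseudo time step `dt` replaces the CFL-based local time step. Each pre-iteration evaluates the accurate residual on the right hand side but builds the left hand side from the cheaper low-order operator.

```python
import numpy as np

# Toy "higher-order" operator and a cheaper "first-order" approximation of it
# (both hypothetical, chosen only so the example is small and checkable).
A_high = np.array([[4.0, 1.0],
                   [1.0, 3.0]])
A_low = np.diag([4.0, 3.0])
b = np.array([1.0, 2.0])

def residual_high(u):
    # Residual of the accurate discretization, R(u) = A_high u - b.
    return A_high @ u - b

def defect_correction(u, dt, n_iter):
    # Each pre-iteration solves (I/dt + A_low) du = -R_high(u), then updates u.
    lhs = np.eye(len(u)) / dt + A_low
    for _ in range(n_iter):
        u = u + np.linalg.solve(lhs, -residual_high(u))
    return u

u0 = np.zeros(2)
u = defect_correction(u0, dt=50.0, n_iter=30)
res_norm = np.linalg.norm(residual_high(u))
```

Because the left hand side is only an approximation of the true linearization, convergence is linear rather than quadratic, which is consistent with using this procedure only to reach a good starting point for the Newton phase.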

Having an approximate solution at the end of the start-up phase, where the linearization of the flow field is accurate enough for steady-state solution, the solution process is switched to the Newton phase, taking an infinite time step. In this phase, a first-order finite difference Jacobian is used for preconditioning of the higher-order linear system in the matrix-free context. A high fill-level incomplete factorization, ILU(4), is applied for the right preconditioning of the Newton-GMRES iterations to take advantage of the best possible preconditioning of the employed approximate Jacobian¹. Semi-quadratic or superlinear convergence has been achieved within a few Newton iterations for most of the cases.

A modified version of the Venkatakrishnan limiter [87] is used to enforce loose monotonicity for transonic flows. The higher-order terms in the reconstruction polynomial are dropped wherever the limiter fires firmly in the flow field. This is done through a differentiable switch, σ, which is a function of the limiter value φ itself. Also, a large value of the constant K in the Venkatakrishnan limiter (typically K=10) is chosen for the limiting, which somewhat sacrifices monotonicity in favor of rapid convergence. The overall proposed limiting procedure ensures fast convergence while reducing the accuracy disadvantages of limiter application.
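The role of the constant K can be seen in a small sketch of the standard Venkatakrishnan limiter function. This is the commonly published form, not the modified version used in this thesis, and the switch σ is not reproduced here; the function name `venkat_phi` and the sample values are assumptions for illustration only. The tolerance ε² = (KΔx)³ enters both numerator and denominator, so a larger K pushes φ toward one and lets small overshoots pass unlimited.

```python
def venkat_phi(d_minus, d_plus, k, dx):
    # Standard Venkatakrishnan limiter function (not the modified thesis version).
    # d_minus: unlimited reconstructed change, u_face - u_i.
    # d_plus:  allowed change, u_max - u_i if d_minus > 0, otherwise u_min - u_i.
    # eps2 = (k * dx)**3 sets how much non-monotonicity is tolerated.
    if d_minus == 0.0:
        return 1.0
    eps2 = (k * dx) ** 3
    num = (d_plus**2 + eps2) * d_minus + 2.0 * d_minus**2 * d_plus
    den = d_minus * (d_plus**2 + 2.0 * d_minus**2 + d_plus * d_minus + eps2)
    return num / den

# Hypothetical overshoot: the reconstruction wants a change of 1.0,
# but the neighbouring data only allow 0.2.
phi_small_k = venkat_phi(1.0, 0.2, k=0.3, dx=0.1)
phi_large_k = venkat_phi(1.0, 0.2, k=10.0, dx=0.1)
```

For this overshoot the limiter value is noticeably closer to one with K=10 than with K=0.3, which is the trade-off between monotonicity and convergence described above.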

The effect of the choice of ε on the accuracy of the matrix-vector products computed using the directional derivative technique for the matrix-free GMRES was analyzed thoroughly. It was

shown that for a fine mesh it is necessary to choose a relatively large ε to properly account

for the higher-order terms.

The issue of mesh refinement in accuracy measurement for unstructured meshes is revis-

ited. A straightforward methodology is applied for accuracy assessment of the higher-order

unstructured approach based on total pressure loss, drag measurement, and solution error

calculation.

The effects of the discretization order, mesh size and far field distance were studied thoroughly and a careful measurement of the solver performance was provided. The accuracy,

fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov

solver for different speed regimes were shown via several test cases.

Solutions of different orders of accuracy are compared in detail through several investiga-

tions. The possibility of reducing computational cost required for a given level of accuracy

using high-order discretization is demonstrated.

¹ For such a Newton-GMRES solver, it is shown that ILU(4) performs as well as full factorization, LU, in terms of outer iterations, while the ILU(4) factorization cost, both in terms of time and memory usage, remains fairly small compared to LU factorization [52].


8.2 Conclusions

While, in general, computing cost remains one of the main concerns for the higher-order computation of fluid flow problems, the current research shows that reaching fast convergence, at a reasonable computing cost, for higher-order unstructured solution is indeed possible.

Results for the implicit flow solution algorithm described in this thesis show that the 2nd

and 3rd-order solutions both display semi-quadratic or superlinear convergence for all test

cases if started from a good initial guess or approximate solution. The 4th-order solution

still converges quickly although it is slower than the other discretization orders requiring

nearly two times the number of outer iterations as the 2nd-order solution started from a

similar approximate flow solution.

The employed start-up procedure is very effective in providing a good approximate solution

as an initial condition for the Newton phase with a reasonable cost. For subsonic or supersonic

cases, a limited number of pre-iterations is sufficient to reach a good initial solution. For the

transonic case, however, the number of pre-iterations increases since some residual reduction

target must be met before switching to Newton iterations.

The proposed start-up technique takes a considerable part of the solution time for the 2nd

and 3rd-order solutions. As a result, reduction of the start-up cost would significantly

improve the overall performance of the flow solver. Finding a reasonably good initial

solution quickly is something of an art rather than an exact science. If a pre-computed

solution is available then the cost of the start-up phase can be ignored and in practice the

total solution cost would be equal to the Newton phase cost. The first-order solution, full

potential solution, exact solution of the approximated problem or an empirical solution can

be picked as an initial solution for the Newton phase depending on the availability and the

problem case.

Preconditioning and the accuracy of the linear system solution are vital factors in the

Newton phase of the Newton-GMRES solver. Evidence suggests that the number of outer

Newton iterations in the 4th-order case can be reduced to the level of the 2nd and 3rd-order

methods provided that the linear system resulting from the 4th-order discretization is solved

completely [53]. Given the current preconditioning strategy, this requires a considerable

increase (by a factor of 4-5) in the 4th-order residual evaluations in the matrix-free GMRES

algorithm. Therefore it is preferable to keep the subspace size limited to a moderate number

(i.e. 30) and bear the cost of increasing the outer iteration numbers by a factor of two.

The eigenvalue spectrum of the preconditioned system shows that the higher-order system is shifted toward singularity, increasing the condition number of the linear system. Therefore, more effort is needed to solve the linear system resulting from the higher-order discretization, especially in the vicinity of the steady-state solution.

For transonic flow, like smooth flows and despite using a limiter and the existence of normal

shocks, superlinear convergence for the 2nd and 3rd-order discretizations and fast conver-

gence for the 4th-order discretization have been obtained in the Newton phase. The applied

limiter, which was active mainly in the strong shock, did not have a strong effect on deteri-

oration of the convergence rate, and the employed continuous switch performed as designed

to remove higher-order terms near shocks.

Using an efficient start-up technique, a good preconditioner matrix, and an effective pre-

conditioning strategy are the key issues for the robustness and fast convergence of the

Newton-GMRES solver. The performance of the current flow solver in terms of CPU time

scales linearly with the mesh size for all orders of discretization. As an overall performance

assessment (including the start-up phase), the 3rd-order solution is about 1.3 to 1.5 times,

and the 4th-order solution is about 3.5-5 times, more expensive than the 2nd-order solution

with the developed solver technology. It should be mentioned that the current 2nd-order

solver algorithm, even without multigrid augmentation, is very competitive with the most

efficient reported results for unstructured meshes in terms of the normalized total solu-

tion time (work units). The 3rd and 4th-order schemes are comparable in efficiency to the

2nd-order scheme as measured by residual evaluations.

Mesh refinement is the critical issue in accuracy verification, and reducing the length scale

uniformly for irregular unstructured grids is not an easy task. Consequently in the accu-

racy assessment process the generated mesh levels must be examined closely especially at

boundaries. Boundaries in general and the solid wall boundary in particular are the main

source of the solution error.

Qualitative/quantitative comparisons of the fluid flow solutions for different discretization

orders over several meshes show that the quality of the solution for smooth flows is dramat-

ically improved using high-order discretization technique, especially in the case of coarse

grids. This suggests that depending on the error measurement criteria, significant effi-

ciency/accuracy gains for high-order discretization are attainable.

Since the developed implicit algorithm is matrix-free, memory usage will not be an issue. In the matrix-free case, memory usage is governed by the preconditioning technique; for the ILU(4) factorization used in this thesis it is a bit more than twice the memory required for storing the first-order Jacobian (Table 5.1).


8.3 Recommended Future Work

The proposed solver algorithm works quite efficiently and accurately for a variety of inviscid

compressible flows. However, there are a number of areas where this research could be either

improved or extended.

8.3.1 Start-up

In the case of very large meshes, it is not efficient to perform the start-up procedure from

scratch; it is more efficient if the computation procedure is started from a coarser level

mesh. Therefore for very large meshes augmenting the pre-iterations with a mesh sequencing

technique could be quite helpful. At the same time, the total solution cost due to increasing mesh size in the start-up can be dramatically reduced if a multi-grid procedure is employed.

8.3.2 Preconditioning

Applying a more accurate preconditioning matrix (that is, a better approximation to the higher-order Jacobian) instead of the first-order Jacobian would improve the quality of the linear system solution, reducing the number of outer iterations, especially in the case of the 4th-order method. Having said that, the preconditioning cost should be kept reasonably low, since the factorization cost of the preconditioner matrix should be much smaller than the cost of solving the original linear system. Adding a multi-grid scheme to the preconditioning strategy could also be helpful in the case of large meshes.

8.3.3 Reconstruction

In general, limiters adversely affect the solution accuracy via manipulation of the reconstruction polynomial. This issue becomes more important for higher-order reconstruction if the limiter fires unnecessarily and extensively in the flow field. As a result, WENO schemes, in which limiters are avoided through a carefully weighted non-oscillatory reconstruction procedure, may be an appropriate solution for higher-order reconstruction of non-smooth flows. Furthermore, the benefit of applying higher-order reconstruction for high Mach number/non-smooth flows, considering the natural oscillatory behavior of the reconstruction polynomial in such flows, is still an open area of research.


8.3.4 Extension to 3D

Extension of the current higher-order 2D solver algorithm to a 3D version, considering the availability of the 3D reconstruction procedure, is reasonably straightforward if an appropriate higher-order representation of a 3D mesh is available. Since the 2D algorithm is developed for unstructured meshes, there is no limitation in such an extension, and the convergence performance of the flow solver should improve with the introduction of the third dimension (3D relieving effect). However, the matrix storage requirement in 3D needs to be carefully studied, since storing even the first-order Jacobian for preconditioning could be challenging (due to the large number of control volumes in 3D meshes). The incomplete factorization fill-level in 3D is another issue, and it may not be possible to use high fill-level ILU(P) for 3D cases.

8.3.5 Extension to Viscous Flows

The introduced implicit procedure could be implemented for viscous flow computation provided that the viscous residual function evaluation and a proper anisotropic unstructured/hybrid mesh are available. However, there would be some issues regarding the conditioning of the linear system due to the wide variety of geometrical and physical length scales, which need to be addressed by proper preconditioning. In the case of turbulence modeling, the degree of coupling of the turbulence model in the implicit procedure would be another issue.

Bibliography

[1] PETSc, The Portable Extensible Toolkit for Scientific Computation. Argonne National

Lab, http://www-unix.mcs.anl.gov/petsc/petsc-as/index.html.

[2] Test Cases for Inviscid Flow Fields. Technical Report AR-211, Advisory Group for

Aerospace Research and Development (AGARD), NATO., 1985.

[3] R. Abgrall. On Essentially Non-oscillatory Schemes on Unstructured Meshes: Analysis

and Implementation. Journal of Computational Physics, 114:45–58, 1994.

[4] M. Aftosmis, D. Gaitonde, and S. T. Tavares. Behavior of Linear Reconstruction

Techniques on Unstructured Meshes. AIAA Journal, 33(11):2038–2049, 1995.

[5] M. J. Aftosmis and M. J. Berger. Multilevel Error Estimation and Adaptive h-

Refinement for Cartesian Meshes With Embedded Boundaries. AIAA Paper 2002-0863,

40th AIAA Aerospace Sciences Meeting and Exhibit, 2002.

[6] R. K. Agarwal. A Compact High-Order Unstructured Grids Method for the Solution

of Euler Equations. International Journal for Numerical Methods in Fluids, 31:121–147,

1999.

[7] F.R. Bailey. High-End Computing Challenges in Aerospace Design and Engineering. In

Third International Conference on Computational Fluid Dynamics (Invited Lecture),

2004.

[8] H. E. Bailey and R. M. Beam. Newton’s Method Applied to Finite Difference Approx-

imations for Steady State Compressible Navier-Stokes Equations. Journal of Compu-

tational Physics, 93:108–127, 1991.

[9] T. J. Barth. Analysis of Implicit Local Linearization Techniques for Upwind and TVD

Algorithms. AIAA Paper 87-0595, 25th Aerospace Sciences Meeting and Exhibit, 1987.

[10] T. J. Barth. Aspects of Unstructured Grids and Finite-Volume Solvers for the Euler

and Navier-Stokes Equations, in Unstructured Grid Methods for Advection-Dominated

Flows. Technical Report AGARD-R787, AGARD, 1992.

[11] T. J. Barth. Recent Development in High Order K-Exact Reconstruction on Unstructured Meshes. AIAA Paper 93-0668, 31st Aerospace Sciences Meeting and Exhibit,

1993.

[12] T. J. Barth and P. O. Frederickson. Higher-Order Solution of the Euler Equations

on Unstructured Grids Using Quadratic Reconstruction. AIAA Paper 90-0013, 28th

Aerospace Sciences Meeting and Exhibit, 1990.

[13] T. J. Barth and D. C. Jespersen. The Design and Application of Upwind Schemes

on Unstructured Meshes. AIAA Paper 89-0366, 27th Aerospace Sciences Meeting and

Exhibit, 1989.

[14] T. J. Barth and S. Linton. An Unstructured Mesh Newton Solver for Compressible

Fluid Flow and its Parallel Implementation. AIAA Paper 95-0221, 33rd Aerospace

Sciences Meeting and Exhibit, 1995.

[15] F. Bassi and S. Rebay. High-Order Accurate Discontinuous Finite Element Solution of

the 2D Euler Equations. Journal of Computational Physics, 138:251–285, 1997.

[16] M. Benzi. Preconditioning Techniques for Large Linear Systems: A Survey. Journal

of Computational Physics, 182:418–477, 2002.

[17] K. S. Bey and J. T. Oden. A Runge-Kutta Local Projection P1-Discontinuous Galerkin

Finite Element Method for High Speed Flows. AIAA Paper 91-1575, 1991.

[18] M. Blanco and D. Zingg. A Fast Solver for the Euler Equations on Unstructured Grids

Using a Newton-GMRES Method. Number AIAA Paper 97-0331, 1997.

[19] C. Boivin and C. Ollivier-Gooch. Guaranteed-Quality Triangular Mesh Generation for

Domains with Curved Boundaries. International Journal for Numerical Methods in

Engineering, 55(10):1185–1213, 2002.

[20] G. Corliss, Ch. Faure, A. Griewank, L. Hascoet, and U. Naumann. Automatic Differ-

entiation of Algorithms From Simulation to Optimization. Springer, 2002.

[21] Darren L. De Zeeuw. A Quadtree-Based Adaptively-Refined Cartesian-Grid Algorithm

for Solution of the Euler Equations. PhD thesis, Aerospace Engineering and Scientific

Computing in the University of Michigan, 1993.

[22] M. Delanaye. Polynomial Reconstruction Finite Volume Schemes for the Compressible

Euler and Navier-Stokes Equations on Unstructured Adaptative Grids. PhD thesis,

Universite de Liege, Faculte des Sciences Appliquees, 1998.

[23] M. Delanaye and J. A. Essers. Quadratic-Reconstruction Finite-Volume Scheme for

Compressible Flows on Unstructured Adaptive Grids. AIAA Journal, 35(4):631–639,

1997.

[24] M. Delanaye, Ph. Geuzaine, and J. A. Essers. Compressible Flows on Unstructured

Adaptive Grids. AIAA Paper 97-2120, 13th AIAA Computational Fluid Dynamics

Conference, 1997.

[25] R. S. Dembo, S. C. Eisenstat, and T. Steihaug. Inexact Newton Methods. SIAM

Journal on Numerical Analysis, 19(2):400–408, 1982.

[26] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, 1983.

[27] F. T. Johnson, E. N. Tinoco, and N. J. Yu. Thirty years of development and application of

CFD at Boeing Commercial Airplanes. AIAA Paper 2003-3439, 2003.

[28] O. Friedrich. Weighted Essentially Non-Oscillatory Schemes for the Interpolation of

Mean Values on Unstructured Grids. Journal of Computational Physics, 144:194–212,

1998.

[29] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Elsevier Academic

Press, 1986.

[30] A. G. Godfrey, C. R. Mitchell, and R. W. Walters. Practical Aspects of Spatially

High Accurate Methods. AIAA Paper 92-0054, 30th Aerospace Sciences Meeting and

Exhibit, 1992.

[31] J. J. Gottlieb and C. P. T. Groth. Assessment of Riemann Solvers for Unsteady

One-Dimensional Inviscid Flows of Perfect Gases. Journal of Computational Physics,

78(2):437–458, 1988.

[32] W. D. Gropp, D. E. Keyes, L. C. McInnes, and M. D. Tidriri. Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD. Int. J. High Performance Computing Applications, 14:102–136, 2000.

[33] A. Harten and S. Osher. Uniformly high-order accurate nonoscillatory schemes. SIAM Journal on Numerical Analysis, 24, 1987.

[34] A. Haselbacher. A WENO Reconstruction Algorithm for Unstructured Grids Based on

Explicit Stencil Construction. Number AIAA Paper 2005-879 in 43rd AIAA Aerospace

Sciences Meeting and Exhibit, 2005.

[35] C. Hirsch. Numerical Computation of Internal and External Flows, Computational

Methods for Inviscid and Viscous Flows, volume 2. John Wiley & Sons, 1990.

[36] A. Jameson. Success and Challenges in Computational Aerodynamics. AIAA Paper

87-1184, 8th Computational Fluid Dynamics Conference, 1987.

[37] A. Jameson and D. Mavriplis. Finite-Volume Solution of the Two-Dimensional Euler

Equations on a Regular Triangular Mesh. AIAA Journal, 24:611–618.

[38] A. J. Jameson, W. Schmidt, and E. Turkel. Numerical Solutions of the Euler Equations

by a Finite-Volume Method Using Runge-Kutta Time-Stepping Schemes. AIAA Paper

81-1259, 1981.

[39] D. B. Kim and P. D. Orkwis. Jacobian Update Strategies for Quadratic and Near-

Quadratic Convergence of Newton and Newton-Like Implicit Schemes. AIAA Paper

93-0878, 1993.

[40] D. A. Knoll and D. E. Keyes. Jacobian-Free Newton-Krylov Methods: A Survey of Approaches and Applications. Journal of Computational Physics, 193:357–397, 2004.

[41] C. B. Laney. Computational Gasdynamics. Cambridge University Press, 1998.

[42] R. J. LeVeque. Numerical Methods for Conservation Laws. Birkhäuser Verlag, 1990.

[43] X-D. Liu, S. Osher, and T. Chan. Weighted Essentially Non-Oscillatory Schemes.

Journal of Computational Physics, 115, 1994.

[44] H. Luo, J. Baum, and R. Lohner. A Fast, Matrix-free Implicit Method for Compressible Flows on Unstructured Grids. Journal of Computational Physics, 146:664–690, 1998.

[45] L. M. Manzano, J. V. Lassaline, P. Wong, and D. W. Zingg. A Newton-Krylov Algorithm for the Euler Equations Using Unstructured Grids. AIAA Paper 2003-0274, 41st AIAA Aerospace Sciences Meeting and Exhibit, 2003.

[46] D. J. Mavriplis. On Convergence Acceleration Techniques for Unstructured Meshes. Technical Report ICASE No. 98-44, Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton VA 23681-2199, 1998.

[47] F. Mavriplis. CFD in aerospace in the new millenium. Canadian Aeronautics and

Space Journal, 46(4):167–176, 2000.

[48] A. Nejat and C. Ollivier-Gooch. A High-Order Accurate Unstructured GMRES Solver

for Poisson’s Equation. 11th Annual Conference of the Computational Fluid Dynamics

Society of Canada, pages 344–349, 2003.

[49] A. Nejat and C. Ollivier-Gooch. A High-Order Accurate Unstructured GMRES Solver

for the Compressible Euler Equations. Third International Conference on Computa-

tional Fluid Dynamics, 2004.

[50] A. Nejat and C. Ollivier-Gooch. A High-Order Accurate Unstructured GMRES Algo-

rithm for Inviscid Compressible Flows. AIAA Paper 2005-5341, 17th AIAA Computa-

tional Fluid Dynamics Conference, 2005.

[51] A. Nejat and C. Ollivier-Gooch. A High-Order Accurate Unstructured Newton-Krylov

Solver for Inviscid Compressible Flows. AIAA Paper 2006-3711, 36th AIAA Fluid

Dynamics Conference, 2006.

[52] A. Nejat and C. Ollivier-Gooch. On Preconditioning of Newton-GMRES algorithm for

a Higher-Order Accurate Unstructured Solver. 14th Annual Conference of the CFD

Society of Canada, 2006.

[53] A. Nejat and C. Ollivier-Gooch. Effect of Discretization Order on Preconditioning

and Convergence of a Higher-Order Unstructured Newton-Krylov Solver for Inviscid

Compressible Flows. AIAA Paper 2007-719, 45th AIAA Aerospace Sciences Meeting

and Exhibit, 2007.

[54] J. Nichols and D. W. Zingg. A Three-Dimensional Multi-Block Newton-Krylov Flow

Solver for the Euler Equations. AIAA Paper 2005-5230, 17th AIAA Computational

Fluid Dynamics Conference, 2005.

[55] E. J. Nielsen, W. K. Anderson, R. W. Walters, and D. E. Keyes. Application of Newton-

Krylov Methodology to a Three-Dimensional Unstructured Euler Code. AIAA Paper

95-1733, 12th AIAA Computational Fluid Dynamics Conference, 1995.

[56] C. Ollivier-Gooch. High-Order ENO Schemes for Unstructured Meshes Based on Least-Squares

Reconstruction. AIAA Paper 97-0540, 35th AIAA Aerospace Sciences Meeting

and Exhibit, 1997.

[57] C. Ollivier-Gooch. Quasi-ENO Schemes for Unstructured Meshes Based on Unlimited

Data-Dependent Least-Squares Reconstruction. Journal of Computational Physics,

133:6–17, 1997.

[58] C. Ollivier-Gooch. Programmers Reference Guide for the ANSLib Generic Finite-Volume

Solver Software Library. Advanced Numerical Simulation Laboratory

(ANSLab), Mechanical Engineering Department, The University of British Columbia,

2002.

[59] C. Ollivier-Gooch. GRUMMP (Generation and Refinement of Unstructured, Mixed-Element

Meshes in Parallel) Version 0.3.2 User’s Guide. Advanced Numerical Simulation

Laboratory (ANSLab), Mechanical Engineering Department, The University of

British Columbia, 2005.

[60] C. Ollivier-Gooch and M. Van Altena. A Higher-Order Accurate Unstructured Mesh

Finite-Volume Scheme for the Advection-Diffusion Equation. Journal of Computational

Physics, 181:729–752, 2002.

[61] O. Onur and S. Eyi. Effects of the Jacobian Evaluation on Newton’s Solution of the

Euler Equations. International Journal for Numerical Methods in Fluids, 49:211–231,

2005.

[62] P. D. Orkwis. Comparison of Newton’s and Quasi-Newton’s Method Solvers for Navier-Stokes

Equations. AIAA Journal, 31:832–836, 1993.

[63] A. Pueyo and D. W. Zingg. An Efficient Newton-GMRES Solver for Aerodynamic

Computations. AIAA Paper 97-1955, 1997.

[64] A. Pueyo and D. W. Zingg. Progress in Newton-Krylov Methods for Aerodynamic

Calculations. AIAA Paper 97-0877, 35th Aerospace Sciences Meeting and Exhibit,

1997.

[65] A. Pueyo and D. W. Zingg. Improvement to a Newton-Krylov Solver for Aerodynamic

Flows. AIAA Paper 98-0619, 36th Aerospace Sciences Meeting and Exhibit, 1998.

[66] S. De Rango and D. W. Zingg. Aerodynamic Computations Using a Higher-Order

Algorithm. AIAA Paper 99-0167, 37th AIAA Aerospace Sciences Meeting and Exhibit,

1999.

[67] S. De Rango and D. W. Zingg. Further Investigation of Higher-Order Algorithm for

Aerodynamic Computations. AIAA Paper 2000-0823, 38th Aerospace Sciences Meeting

and Exhibit, 2000.

[68] S. De Rango and D. W. Zingg. Higher-Order Aerodynamic Computations on Multi-Block

Grids. AIAA Paper 2001-0263, 39th AIAA Aerospace Sciences Meeting and

Exhibit, 2001.

[69] P. Roe. Approximate Riemann Solvers, Parameter Vectors, and Difference Schemes.

Journal of Computational Physics, 43:357–372, 1981.

[70] P. Roe. Characteristic-Based Schemes for the Euler Equations. Annual Review of Fluid

Mechanics, 18:337–365, 1986.

[71] A. Rohde. Eigenvalues and Eigenvectors of the Euler Equations in General Geometries.

AIAA Paper 2001-2609, 39th Aerospace Sciences Meeting and Exhibit, 2001.

[72] Y. Saad and M. H. Schultz. A Generalized Minimal Residual Algorithm for Solving

Non-Symmetric Linear Systems. SIAM Journal on Scientific and Statistical Computing, 7:856–869, 1986.

[73] Y. Saad. A Flexible Inner-Outer Preconditioned GMRES Algorithm. SIAM Journal

on Scientific Computing, 14(2):461–469, 1993.

[74] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, second edition, 2003.

[75] P.R. Spalart. Topics in Detached-Eddy Simulation. In Third International Conference

on Computational Fluid Dynamics (Invited Lecture), 2004.

[76] R. M. Spitaleri and V. Regolo. Multiblock Multigrid Grid Generation Algorithms:

Overcoming Multigrid Anisotropy. Applied Mathematics and Computation,

84:247–267, 1997.

[77] J. L. Steger and R. F. Warming. Flux Vector Splitting of the Inviscid Gasdynamics

Equations with Application to Finite Difference Methods. Journal of Computational

Physics, 40:263–293, 1981.

[78] M. Tadjouddine, S. A. Forth, and N. Qin. Elimination AD Applied to Jacobian Assembly

for an Implicit Compressible CFD Solver. International Journal for Numerical

Methods in Fluids, 47:1315–1321, 2005.

[79] M. D. Tidriri. Preconditioning Techniques for the Newton-Krylov Solution of Compressible

Flows. Journal of Computational Physics, 132:51–61, 1997.

[80] M. Van Altena. High-Order Finite-Volume Discretisations for Solving a Modified

Advection-Diffusion Problem on Unstructured Triangular Meshes. Master’s thesis,

The University of British Columbia, Mechanical Engineering Department, 1999.

[81] H. A. Van Der Vorst. Iterative Krylov Methods for Large Linear Systems. Cambridge

University Press, 2003.

[82] B. Van Leer. Towards the Ultimate Conservative Difference Scheme. V. A Second-Order

Sequel to Godunov’s Method. Journal of Computational Physics, 32:101–136,

1979.

[83] B. Van Leer, J. L. Thomas, P. L. Roe, and R. W. Newsome. A Comparison of Numerical

Flux Formulas for the Euler and Navier-Stokes Equations. AIAA Paper 87-1104,

1987.

[84] K. J. Vanden and P. D. Orkwis. Comparison of Numerical and Analytical Jacobians.

AIAA Journal, 34(6):1125–1129, 1996.

[85] V. Venkatakrishnan. Newton Solution of Inviscid and Viscous Problems. AIAA Paper

88-0143, 26th Aerospace Sciences Meeting and Exhibit, 1988.

[86] V. Venkatakrishnan. On the Accuracy of Limiters and Convergence to Steady State

Solutions. AIAA Paper 93-0880, 31st Aerospace Sciences Meeting and Exhibit, 1993.

[87] V. Venkatakrishnan. Convergence to Steady State Solutions of the Euler Equations on

Unstructured Grids With Limiters. Journal of Computational Physics, 118:120–130,

1995.

[88] V. Venkatakrishnan. A Perspective on Unstructured Grid Flow Solvers. Technical

Report ICASE 95-3, NASA, 1995.

[89] V. Venkatakrishnan and T. J. Barth. Application of Direct Solvers to Unstructured

Meshes for the Euler and Navier-Stokes Equations Using Upwind Schemes. AIAA

Paper 89-0364, 27th Aerospace Sciences Meeting and Exhibit, 1989.

[90] V. Venkatakrishnan and D. Mavriplis. Implicit Solvers for Unstructured Meshes. Journal

of Computational Physics, 105:83–91, 1993.

[91] V. Venkatakrishnan and D. J. Mavriplis. Implicit Solvers for Unstructured Meshes.

AIAA Paper 91-1537, 1991.

[92] D. S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, 1991.

[93] D. L. Whitaker. Three Dimensional Unstructured Grid Euler Computations Using a

Fully-Implicit Upwind Method. AIAA Paper 93-3337, 1993.

[94] L. Wigton. Application of MACSYMA and Sparse Matrix Technology to Multielement

Airfoil Calculations. AIAA Paper 87-1142, 1987.

[95] D. J. Willis, J. Peraire, and J. K. White. A Quadratic Basis Function, Quadratic

Geometry, High Order Panel Method. AIAA Paper 2005-0854, 43rd AIAA Aerospace

Sciences Meeting and Exhibit, 2005.

[96] D. W. Zingg, S. De Rango, M. Nemec, and T. H. Pulliam. Comparison of Several

Spatial Discretizations for the Navier-Stokes Equations. Journal of Computational

Physics, 160:683–704, 2000.
