52
FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05–06, 2007 M. Knepley (ANL) FAS AMS ’07 1 / 27

FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Solver Performance

Matthew Knepley

Mathematics and Computer Science DivisionArgonne National Laboratory

Fall AMS Central Section MeetingChicago, IL

Oct 05–06, 2007

M. Knepley (ANL) FAS AMS ’07 1 / 27

Page 2: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Payoff

Why should I care?

1 Optimal multilevel solvers are necessary2 Processor flops are increasing much faster than bandwidth3 Nonlinear algorithms can be efficient than linear algorithms4 Presents an opportunity for numerical algebraic geometry

M. Knepley (ANL) FAS AMS ’07 2 / 27

Page 3: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Payoff

Why should I care?

1 Optimal multilevel solvers are necessary2 Processor flops are increasing much faster than bandwidth3 Nonlinear algorithms can be efficient than linear algorithms4 Presents an opportunity for numerical algebraic geometry

M. Knepley (ANL) FAS AMS ’07 2 / 27

Page 4: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Payoff

Why should I care?

1 Optimal multilevel solvers are necessary2 Processor flops are increasing much faster than bandwidth3 Nonlinear algorithms can be efficient than linear algorithms4 Presents an opportunity for numerical algebraic geometry

M. Knepley (ANL) FAS AMS ’07 2 / 27

Page 5: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Payoff

Why should I care?

1 Optimal multilevel solvers are necessary2 Processor flops are increasing much faster than bandwidth3 Nonlinear algorithms can be efficient than linear algorithms4 Presents an opportunity for numerical algebraic geometry

M. Knepley (ANL) FAS AMS ’07 2 / 27

Page 6: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Simulation Basics

Outline

1 Simulation Basics

2 Newton-Multigrid

3 Machine Performance

4 FAS and Multigrid-Newton

5 Possible Extensions

M. Knepley (ANL) FAS AMS ’07 3 / 27

Page 7: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Simulation Basics

Necessity Of SimulationExperiment are ...

Expensive

Impossible

Difficult

Dangerous

M. Knepley (ANL) FAS AMS ’07 4 / 27

Page 8: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Simulation Basics

Why Optimal Algorithms?

The more powerful the computer,the greater the importance of optimalityExample:

Suppose Alg1 solves a problem in time CN2, N is the input sizeSuppose Alg2 solves the same problem in time CNSuppose Alg1 and Alg2 are able to use 10,000 processors

In constant time compared to serial,Alg1 can run a problem 100X largerAlg2 can run a problem 10,000X larger

Alternatively, filling the machine’s memory,Alg1 requires 100X timeAlg2 runs in constant time

M. Knepley (ANL) FAS AMS ’07 5 / 27

Page 9: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Outline

1 Simulation Basics

2 Newton-Multigrid

3 Machine Performance

4 FAS and Multigrid-Newton

5 Possible Extensions

M. Knepley (ANL) FAS AMS ’07 6 / 27

Page 10: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

What Is Optimal?

I will define optimal as an O(N) solution algorithm

These are generally hierarchical, so we needhierarchy generationassembly on subdomainsrestriction and prolongation

M. Knepley (ANL) FAS AMS ’07 7 / 27

Page 11: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

The Bratu Problem

∆u + λeu = f in Ω (1)u = g on ∂Ω (2)

Also called the Solid-Fuel Ignition equationCan be treated as a nonlinear eigenvalue problemHas two solution branches until λ ∼= 6.28

M. Knepley (ANL) FAS AMS ’07 8 / 27

Page 12: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Newton’s Method

0 = F (u + δu) ∼= F (u) + J(u)δu (3)

so thatu + δu = u − J(u)−1F (u) (4)

Quadratic convergenceJ can be solved approximately (Dembo-Eisensat-Steihaug)

M. Knepley (ANL) FAS AMS ’07 9 / 27

Page 13: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid

Smoothing (typically Gauss-Seidel)

xnew = S(xold ,b) (5)

Coarse-grid Correction

Jcδxc = R(b − Jxold ) (6)xnew = xold + RT δxc (7)

M. Knepley (ANL) FAS AMS ’07 10 / 27

Page 14: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Convergence

Convergence to ||r || < 10−9||b|| using GMRES(30)/ILU

Elements Iterations128 10256 17512 24

1024 342048 674096 1168192 167

16384 32932768 55865536 920

131072 1730

M. Knepley (ANL) FAS AMS ’07 11 / 27

Page 15: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Convergence

Convergence to ||r || < 10−9||b|| using GMRES(30)/MG

Elements Iterations128 5256 7512 6

1024 72048 64096 78192 6

16384 732768 665536 7

131072 6

M. Knepley (ANL) FAS AMS ’07 11 / 27

Page 16: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 17: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:43 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_1.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 18: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 19: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:43 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_1.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 20: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 21: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:45 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_2.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 22: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 23: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:46 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_3.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 24: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 25: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:47 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_4.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 26: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 27: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:47 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_5.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 28: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 29: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:48 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_6.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 30: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:39 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_0.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 31: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Newton-Multigrid

Linear Multigrid Memory Access 7/23/15, 8:49 AM

Page 1 of 2file:///PETSc3/presentations-git/figures/MG/MGMemoryAccess_7.svg

Processor

Memoryx b J

J

Jx

x

b

b

R2

R1

r

rS 2

S1

S 0

M. Knepley (ANL) FAS AMS ’07 12 / 27

Page 32: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Machine Performance

Outline

1 Simulation Basics

2 Newton-Multigrid

3 Machine Performance

4 FAS and Multigrid-Newton

5 Possible Extensions

M. Knepley (ANL) FAS AMS ’07 13 / 27

Page 33: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Machine Performance

STREAM Benchmark

Simple benchmark program measuring sustainable memory bandwidth

Protoypical operation is Triad (WAXPY): w = y + αxMeasures the memory bandwidth bottleneck (much below peak)Datasets outstrip cache

Machine Peak (MF/s) Triad (MB/s) MF/MW Eq. MF/sMatt’s Laptop 1700 1122.4 12.1 93.5 (5.5%)Intel Core2 Quad 38400 5312.0 57.8 442.7 (1.2%)Tesla 1060C 984000 102000.0* 77.2 8500.0 (0.8%)

Table: Bandwidth limited machine performance

http://www.cs.virginia.edu/stream/

M. Knepley (ANL) FAS AMS ’07 14 / 27

Page 34: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Machine Performance

Analysis of Sparse Matvec (SpMV)

AssumptionsNo cache missesNo waits on memory references

Notationm Number of matrix rowsnz Number of nonzero matrix elementsV Number of vectors to multiply

We can look at bandwidth needed for peak performance(8 +

2V

)mnz

+6V

byte/flop (8)

or achieveable performance given a bandwith BWVnz

(8V + 2)m + 6nzBW Mflop/s (9)

Towards Realistic Performance Bounds for Implicit CFD Codes, Gropp,Kaushik, Keyes, and Smith.

M. Knepley (ANL) FAS AMS ’07 15 / 27

Page 35: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Machine Performance

Improving Serial PerformanceFor a single matvec with 3D FD Poisson, Matt’s laptop can achieve atmost

1(8 + 2) 1

7 + 6bytes/flop(1122.4 MB/s) = 151 MFlops/s, (10)

which is a dismal 8.8% of peak.

Can improve performance byBlockingMultiple vectors

but operation issue limitations take over.

M. Knepley (ANL) FAS AMS ’07 16 / 27

Page 36: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Machine Performance

Improving Serial PerformanceFor a single matvec with 3D FD Poisson, Matt’s laptop can achieve atmost

1(8 + 2) 1

7 + 6bytes/flop(1122.4 MB/s) = 151 MFlops/s, (10)

which is a dismal 8.8% of peak.

Better approaches:Unassembled operator application (Spectral elements, FMM)

N data, N2 computationNonlinear evaluation (Picard, FAS, Exact Polynomial Solvers)

N data, Nk computation

M. Knepley (ANL) FAS AMS ’07 16 / 27

Page 37: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Outline

1 Simulation Basics

2 Newton-Multigrid

3 Machine Performance

4 FAS and Multigrid-Newton

5 Possible Extensions

M. Knepley (ANL) FAS AMS ’07 17 / 27

Page 38: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Matrix-Free Smoothing7/23/15, 8:50 AM

Page 1 of 1file:///PETSc3/presentations-git/figures/MG/linearJacobi.svg

=

We can use point Jacobi

xnewi = xold

i + J−1ii (bi − JT

i xold ) (11)

In the nonlinear case,

JTi xold = eT

i ∇F (x)xold (12)

which might be calculated automatically using AD.M. Knepley (ANL) FAS AMS ’07 18 / 27

Page 39: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear Gauss-Seidel

If we have an initial guess u, b = 0− F (u),

xnewi = xold

i + J−1ii (bi − JT

i xold ) (13)

xnewi = xold

i + J−1ii (−Fi(u)−∇Fi(u)T xold ) (14)

xnewi = xold

i − J−1ii (Fi(u) +∇Fi(u)T xold ) (15)

xnewi = xold

i − J−1ii Fi(u + xold ) (16)

unewi = uold

i − J−1ii Fi(uold ) (17)

This is just Newton’s method on a single equation at a time . . .

which is Nonlinear Gauss-Seidel.

M. Knepley (ANL) FAS AMS ’07 19 / 27

Page 40: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear Gauss-Seidel

If we have an initial guess u, b = 0− F (u),

xnewi = xold

i + J−1ii (bi − JT

i xold ) (13)

xnewi = xold

i + J−1ii (−Fi(u)−∇Fi(u)T xold ) (14)

xnewi = xold

i − J−1ii (Fi(u) +∇Fi(u)T xold ) (15)

xnewi = xold

i − J−1ii Fi(u + xold ) (16)

unewi = uold

i − J−1ii Fi(uold ) (17)

This is just Newton’s method on a single equation at a time . . .

which is Nonlinear Gauss-Seidel.

M. Knepley (ANL) FAS AMS ’07 19 / 27

Page 41: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear Gauss-Seidel

If we have an initial guess u, b = 0− F (u),

xnewi = xold

i + J−1ii (bi − JT

i xold ) (13)

xnewi = xold

i + J−1ii (−Fi(u)−∇Fi(u)T xold ) (14)

xnewi = xold

i − J−1ii (Fi(u) +∇Fi(u)T xold ) (15)

xnewi = xold

i − J−1ii Fi(u + xold ) (16)

unewi = uold

i − J−1ii Fi(uold ) (17)

This is just Newton’s method on a single equation at a time . . .

which is Nonlinear Gauss-Seidel.

M. Knepley (ANL) FAS AMS ’07 19 / 27

Page 42: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear MultigridMost authors just offer an ansatz with nonlinear smoothing

xnew = S(xold ,b) (18)

and coarse-grid correction

Fc(xc) = Fc(xc) + γR(b − F (xold )) (19)

xnew = xold +1γ

RT (xc − xc) (20)

where x is an approximate solution.

If F is a linear operator L, the correction reduces to

Lc(xc) = Lc(xc) + γR(b − L(xold )) (21)Lc(xc − xc) = γR(b − L(xold )) (22)

Lcδxc = γRr (23)M. Knepley (ANL) FAS AMS ’07 20 / 27

Page 43: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear MultigridMost authors just offer an ansatz with nonlinear smoothing

xnew = S(xold ,b) (18)

and coarse-grid correction

Fc(xc) = Fc(xc) + γR(b − F (xold )) (19)

xnew = xold +1γ

RT (xc − xc) (20)

where x is an approximate solution.

and the update becomes

xnew = xold +1γ

RT δxc (21)

xnew = xold + RT L−1c Rr (22)

M. Knepley (ANL) FAS AMS ’07 20 / 27

Page 44: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear MultigridIt is instructive to look at the alternate derivation of Barry Smith

Begin with the nonlinear generalization F (u) = 0, for a correction

Jcxc = R(b − Jxold ) (23)Jcxc = −R(F (u) + Jxold ) (24)

and then using Taylor series

F (uold ) = F (u) + J(uold − u) + . . . (25)Fc(uold

c + xc) = Fc(uoldc ) + Jcxc + . . . (26)

we have the correction

Fc(uoldc + xc)− Fc(uold

c ) = −RF (uold ) (27)Fc(uold

c + xc) = Fc(uoldc )− RF (uold ) (28)

M. Knepley (ANL) FAS AMS ’07 21 / 27

Page 45: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear MultigridIt is instructive to look at the alternate derivation of Barry Smith

Begin with the nonlinear generalization F (u) = 0, for a correction

Jcxc = R(b − Jxold ) (23)Jcxc = −R(F (u) + Jxold ) (24)

and then using Taylor series

F (uold ) = F (u) + J(uold − u) + . . . (25)Fc(uold

c + xc) = Fc(uoldc ) + Jcxc + . . . (26)

and the same update

xnew = xold + RT xc (27)

M. Knepley (ANL) FAS AMS ’07 21 / 27

Page 46: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Spectrum of Methods

7/23/15, 8:51 AM

Page 1 of 1file:///PETSc3/presentations-git/figures/MG/methodSpectrumI.svg

Newton-Multigrid FAS

When does linearization happen?Which Jacobian entries are updated?

M. Knepley (ANL) FAS AMS ’07 22 / 27

Page 47: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear Convergence

Convergence to ||r || < 10−9||r0|| using Newton/GMRES(30)/ILU

Elements Iterations32 464 4

128 4256 4512 4

1024 42048 44096 48192 4

16384 432768 465536 4

131072 4

M. Knepley (ANL) FAS AMS ’07 23 / 27

Page 48: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

FAS and Multigrid-Newton

Nonlinear Convergence

M. Knepley (ANL) FAS AMS ’07 23 / 27

Page 49: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Possible Extensions

Outline

1 Simulation Basics

2 Newton-Multigrid

3 Machine Performance

4 FAS and Multigrid-Newton

5 Possible Extensions

M. Knepley (ANL) FAS AMS ’07 24 / 27

Page 50: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Possible Extensions

Polynomial Solvers

A great opportunity exists for polynomial solvers

Better performanceBandwidth considerations only intensify on multicore chipsPetascale systems will need these improvements

More robustMost practical engineering calculations are quadratic

New algorithmsCan multiple solutions speed up convergence?

M. Knepley (ANL) FAS AMS ’07 25 / 27

Page 51: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Possible Extensions

Conclusions

Newton-Multigrid providesGood nonlinear solvesSimple interface for software librariesLow computational efficiency

Multigrid-FAS providesGood nonlinear solvesLower memory bandwidth and storagePotentially high computational efficiencyRequires formation on small systems “on the fly”

M. Knepley (ANL) FAS AMS ’07 26 / 27

Page 52: FAS and Solver Performance - Rice U - Computational and ...mk51/presentations/PresAMS2007.pdfold) (11) In the nonlinear case, JT i x old = eT i rF(x)xold (12) which might be calculated

Possible Extensions

PETSc Resources

http://www.mcs.anl.gov/petscCan download tarballs or clone a Mercurial repositoryHyperlinked documentation

ManualManual pages for evey methodHTML of all example code (linked to manual pages)

FAQFull support at [email protected]

M. Knepley (ANL) FAS AMS ’07 27 / 27