Multilevel Preconditioning Package version 3.1. Trilinos Users Group Meeting, November ’04. Marzio Sala. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.


Page 1: Multilevel Preconditioning Package version 3.1

Multilevel Preconditioning Package version 3.1

Trilinos Users Group Meeting, November ’04

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

Marzio Sala

Page 2: Multilevel Preconditioning Package version 3.1

Outline

1. Basic concepts of multigrid/multilevel methods
2. Configuring and building ML
3. Using ML
4. Interoperability with Trilinos packages
5. Example of usage
6. Numerical results
7. Continuing research
8. Documentation, mailing lists, and getting help

Page 3: Multilevel Preconditioning Package version 3.1

ML Package

§ Provides parallel multilevel preconditioning for linear solver methods

§ Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala, Michael Gee

§ Main methods
– Geometric:
  • Grid refinement hierarchy
  • 2-level FE basis function domain decomposition
– Algebraic (smoothed aggregation):
  • Classical smoothed aggregation
  • Extensions for non-symmetric systems
  • Edge-element for Maxwell’s equations

Page 4: Multilevel Preconditioning Package version 3.1

Multilevel Preconditioners

§ We are interested in solving

$A x = b$, $A \in \mathbb{R}^{n \times n}$, $x, b \in \mathbb{R}^n$

arising from FE discretizations on 2D/3D unstructured grids

§ The linear system is solved using Krylov accelerators (e.g., CG or GMRES)

§ A preconditioner is required:
– Applicable in massively parallel environments
– For large scale computations

§ Multilevel approaches furnish a very effective way to define scalable preconditioners

Page 5: Multilevel Preconditioning Package version 3.1

Multilevel Preconditioners (2)

Main idea:
1. Define a set of approximate solutions on coarser “grids”
2. Need a prolongation (P) and a restriction (R) to move from one “grid” to the next finer/coarser “grid”
3. On each “grid”, approximate solvers (smoothers) damp the high frequencies of the error
4. On the coarsest “grid”, a direct solver is used

Two families of multilevel methods: geometric and algebraic methods

Page 6: Multilevel Preconditioning Package version 3.1

Geometric Multigrid

§ Need to define a hierarchy of grids (and the corresponding restriction/prolongation operators)
– 1D example: [figure: a fine grid and the corresponding coarse grid]

§ However, for unstructured grids on complex geometries, the definition of the coarse grids can be problematic:
– Computation of the restriction/prolongator may be computationally expensive for completely uncorrelated grids
– Difficult to define boundary conditions on coarser grids

Page 7: Multilevel Preconditioning Package version 3.1

Algebraic Multigrid

§ Build multilevel operators ($A_k$, $P_k$, $R_k$, $S_k$) automatically to define a hierarchy: $A_k x_k = b_k$, $k = 1, \dots, L$

§ Once the $P_k$ are defined, the rest follows “easily”:
– $R_k = P_k^T$ (usually)
– $A_{k-1} = R_k A_k P_k$ (triple matrix product)
– Smoother (iterative method) $S_k$: Gauss-Seidel, polynomial, conjugate gradient, etc.

§ Smoothed aggregation (SA) algebraic multigrid
– Interpolation operators $P_k$ are built automatically by aggregation of fine grid nodes
– Also the main difficulty

Same goal: solve A x = b
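As a small worked illustration of the triple matrix product (an assumed example, not taken from the slides), take the 1D Laplacian stencil on four nodes and an unsmoothed, piecewise-constant prolongator P that groups the nodes into two aggregates of two neighbors each, with R = P^T:

```latex
A = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix},
\qquad
P = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},
\qquad
P^T A P = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.
```

For such a 0/1 prolongator, each coarse entry is the sum of the fine entries that couple two aggregates, for instance (P^T A P)_{11} = 2 - 1 - 1 + 2 = 2. Smoothed aggregation would additionally apply a smoothing step to P before forming the product.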


Page 8: Multilevel Preconditioning Package version 3.1

Smoothed Aggregation

§ Builds P by creating aggregates (sets of contiguous nodes)

§ Works with structured and unstructured grids

§ No need for nodal coordinates; based on matrix entries (graph) only
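The idea of grouping contiguous nodes from the matrix graph alone can be sketched as follows. This is a simplified serial greedy scheme written for illustration, not ML's actual aggregation code; the function name `greedy_aggregate` and the adjacency-list input are assumptions.

```cpp
#include <cassert>
#include <vector>

// adjacency[i] lists the nodes j != i with a(i,j) != 0 (the matrix graph).
// Returns the aggregate id of every node; num_aggregates receives the count.
std::vector<int> greedy_aggregate(const std::vector<std::vector<int>>& adjacency,
                                  int& num_aggregates) {
  const int n = static_cast<int>(adjacency.size());
  std::vector<int> aggregate(n, -1);  // -1 marks "not yet aggregated"
  num_aggregates = 0;

  // Phase 1: a node whose whole neighborhood is still free becomes a root
  // and forms a new aggregate together with all of its neighbors.
  for (int i = 0; i < n; ++i) {
    if (aggregate[i] != -1) continue;
    bool neighborhood_free = true;
    for (int j : adjacency[i])
      if (aggregate[j] != -1) { neighborhood_free = false; break; }
    if (!neighborhood_free) continue;
    aggregate[i] = num_aggregates;
    for (int j : adjacency[i]) aggregate[j] = num_aggregates;
    ++num_aggregates;
  }

  // Phase 2: every leftover node has at least one aggregated neighbor
  // (otherwise it would have become a root), so attach it to that aggregate.
  for (int i = 0; i < n; ++i) {
    if (aggregate[i] != -1) continue;
    for (int j : adjacency[i])
      if (aggregate[j] != -1) { aggregate[i] = aggregate[j]; break; }
    assert(aggregate[i] != -1);
  }
  return aggregate;
}
```

Column I of a piecewise-constant tentative prolongator then has a 1 in every row whose node belongs to aggregate I. No coordinates are used anywhere, only the connectivity.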

Page 9: Multilevel Preconditioning Package version 3.1

Multilevel (V) Cycle

function x_k = MLV(A_k, r_k)
  if (k == 1)
    x_k = A_k^{-1} r_k                  ⇦ Direct (parallel) solver
  else
    x_k ← S_pre,k(A_k, r_k, 0)          ⇦ presmoothing
    q_k ← r_k – A_k x_k
    r_{k-1} ← R_k q_k                   ⇦ Usually, R_k = P_k^T
    x_{k-1} ← MLV(A_{k-1}, r_{k-1})     ⇦ A_{k-1} = R_k A_k P_k
    x_k ← x_k + P_k x_{k-1}
    x_k ← S_post,k(A_k, r_k, x_k)       ⇦ postsmoothing
  end
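The recursion above can be sketched in self-contained C++. The sketch makes several simplifying assumptions that are not part of ML: dense matrix storage, aggregates formed by pairing neighboring unknowns with an unsmoothed 0/1 prolongator, damped Jacobi (ω = 2/3) as both pre- and postsmoother, and unpivoted Gaussian elimination as the coarsest-level direct solver. All names are hypothetical, and index 0 denotes the coarsest level (k = 1 on the slide).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense storage, chosen only to keep the sketch short

// y = A x; also works for the rectangular prolongator.
Vec matvec(const Mat& A, const Vec& x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

// Damped Jacobi smoother S(A, b, x): x <- x + omega * D^{-1} (b - A x).
void smooth(const Mat& A, const Vec& b, Vec& x, double omega = 2.0 / 3.0) {
  const Vec Ax = matvec(A, x);
  for (std::size_t i = 0; i < x.size(); ++i) x[i] += omega * (b[i] - Ax[i]) / A[i][i];
}

// Coarsest-level direct solve: Gaussian elimination without pivoting,
// adequate for the symmetric positive definite matrices assumed here.
Vec direct_solve(Mat A, Vec b) {
  const std::size_t n = b.size();
  for (std::size_t c = 0; c < n; ++c)
    for (std::size_t r = c + 1; r < n; ++r) {
      const double f = A[r][c] / A[c][c];
      for (std::size_t j = c; j < n; ++j) A[r][j] -= f * A[c][j];
      b[r] -= f * b[c];
    }
  Vec x(n, 0.0);
  for (std::size_t i = n; i-- > 0;) {
    double s = b[i];
    for (std::size_t j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
    x[i] = s / A[i][i];
  }
  return x;
}

struct Level {
  Mat A;  // operator on this level
  Mat P;  // prolongator from the next coarser level (empty on the coarsest)
};

// levels[0] is the coarsest level, levels.back() the finest.
std::vector<Level> build_hierarchy(Mat A, std::size_t coarsest_size) {
  assert(coarsest_size >= 1);
  std::vector<Level> levels;
  while (A.size() > coarsest_size) {
    const std::size_t n = A.size(), nc = (n + 1) / 2;
    Mat P(n, Vec(nc, 0.0));
    for (std::size_t i = 0; i < n; ++i) P[i][i / 2] = 1.0;  // aggregate = pair of neighbors
    Mat Ac(nc, Vec(nc, 0.0));  // Galerkin product P^T A P for this 0/1 prolongator
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) Ac[i / 2][j / 2] += A[i][j];
    levels.insert(levels.begin(), Level{A, P});
    A = Ac;
  }
  levels.insert(levels.begin(), Level{A, Mat()});
  return levels;
}

// One V(1,1) cycle: returns an approximate solution of A_k x = r.
Vec vcycle(const std::vector<Level>& levels, std::size_t k, const Vec& r) {
  const Mat& A = levels[k].A;
  if (k == 0) return direct_solve(A, r);
  const Mat& P = levels[k].P;
  Vec x(r.size(), 0.0);
  smooth(A, r, x);  // presmoothing from a zero initial guess
  const Vec Ax = matvec(A, x);
  Vec rc(P[0].size(), 0.0);  // restricted residual, using R = P^T
  for (std::size_t i = 0; i < r.size(); ++i)
    for (std::size_t j = 0; j < rc.size(); ++j) rc[j] += P[i][j] * (r[i] - Ax[i]);
  const Vec correction = matvec(P, vcycle(levels, k - 1, rc));
  for (std::size_t i = 0; i < x.size(); ++i) x[i] += correction[i];
  smooth(A, r, x);  // postsmoothing
  return x;
}

// Repeats x <- x + MLV(b - A x) and returns the final residual 2-norm.
double iterate(const std::vector<Level>& levels, const Vec& b, Vec& x, int cycles) {
  const Mat& A = levels.back().A;
  double norm = 0.0;
  for (int c = 0; c <= cycles; ++c) {
    const Vec Ax = matvec(A, x);
    Vec r(b.size());
    norm = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) { r[i] = b[i] - Ax[i]; norm += r[i] * r[i]; }
    if (c == cycles) break;
    const Vec dx = vcycle(levels, levels.size() - 1, r);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += dx[i];
  }
  return std::sqrt(norm);
}
```

Calling `build_hierarchy` on an 8 x 8 tridiag(-1, 2, -1) matrix with `coarsest_size = 2` gives three levels of size 2, 4 and 8, and `iterate` then applies the cycle as a stationary iteration. In practice the cycle would be used as a preconditioner inside CG or GMRES, and an unsmoothed prolongator like this one typically converges more slowly than smoothed aggregation.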


Page 10: Multilevel Preconditioning Package version 3.1

Overview of ML capabilities

§ Multilevel cycling
– V, W, full V, full W
– Additive and hybrid domain decomposition methods

§ Coarser level construction
– Uncoupled, MIS, METIS, ParMETIS, ZOLTAN

§ Smoothers
– Jacobi, (symmetric) Gauss-Seidel, any AztecOO and IFPACK preconditioner (like ILU), SPAI, polynomial

§ Coarse solver
– SuperLU, SuperLU_DIST, any Amesos solver

§ Kernels
– Sparse matrix-matrix

§ Other tools
– Visualization and analysis

Page 11: Multilevel Preconditioning Package version 3.1

Common Decisions for ML Users

§ How to configure and build ML
– Choose among supported packages

§ Matrix and vector formats
– Aztec/ML/Epetra matrices

§ Coarsening strategy
– How many levels?
– Aggressive coarsening?

§ What smoother to use
– # of smoothing steps
– Smoothers can be different on each level
– Finest level may require light-weight smoothers

§ Coarse solver
– Serial, distributed
– How many processors to use for the coarse problem

§ Cycling strategy
– V, W, domain decomposition…

§ Analyzing ML results
– Some visualization capabilities

Page 12: Multilevel Preconditioning Package version 3.1

Configuring and Building ML

§ Builds by default when Trilinos is configured/built

§ Configure help:

    $ Trilinos/packages/ml/configure --help

§ ML interfaces with a variety of packages
– within Trilinos
– outside Trilinos

§ Check out the supported packages you may need!

Page 13: Multilevel Preconditioning Package version 3.1

Configuring and Building ML (2)

Configure option            Used for
--enable-epetra (def)       Epetra_RowMatrix interface
--enable-teuchos (def)      Teuchos::ParameterList
--enable-amesos             Coarse solver
--enable-anasazi            Eigenanalysis
--enable-ifpack (def)       Smoothers
--enable-aztecoo (def)      Examples
--enable-triutils (def)     Examples
--with-ml_metis             Local aggregation scheme
--with-ml_parmetis3x        Global aggregation scheme
--with-ml_zoltan            RCB partitioning
--with-ml_superlu           Coarse solver (serial)
--with-ml_superludist       Coarse solver (distributed)
--with-ml_parasails         SPAI smoother

Page 14: Multilevel Preconditioning Package version 3.1

Matrix Format

§ ML is not tied to any matrix format; you just need
– matvec()
– getrow(), which returns the nonzero indices and values for any local row

§ Wrappers exist for
– Aztec matrices (DMSR, DVBR)
– Epetra matrices
  • Any ML object can be (cheaply) wrapped as an Epetra object and vice versa
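To illustrate why a getrow/matvec pair is enough, here is a minimal sketch of a format-independent operator interface. The names `RowOperator` and `Laplace1D` are hypothetical and this is not ML's API; it only shows that a matrix-free operator can satisfy both requirements without any stored format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: everything a multilevel code needs from a matrix.
struct RowOperator {
  virtual ~RowOperator() = default;
  virtual std::size_t num_rows() const = 0;
  // Fills the column indices and values of the nonzeros in `row`.
  virtual void getrow(std::size_t row, std::vector<std::size_t>& cols,
                      std::vector<double>& vals) const = 0;
  // Default matvec built on getrow(); a concrete format may override it
  // with a faster kernel.
  virtual std::vector<double> matvec(const std::vector<double>& x) const {
    assert(x.size() == num_rows());
    std::vector<double> y(num_rows(), 0.0);
    std::vector<std::size_t> cols;
    std::vector<double> vals;
    for (std::size_t i = 0; i < num_rows(); ++i) {
      getrow(i, cols, vals);
      for (std::size_t k = 0; k < cols.size(); ++k) y[i] += vals[k] * x[cols[k]];
    }
    return y;
  }
};

// Matrix-free tridiag(-1, 2, -1): rows are generated on demand, nothing is stored.
struct Laplace1D : RowOperator {
  std::size_t n;
  explicit Laplace1D(std::size_t size) : n(size) {}
  std::size_t num_rows() const override { return n; }
  void getrow(std::size_t row, std::vector<std::size_t>& cols,
              std::vector<double>& vals) const override {
    cols.clear();
    vals.clear();
    if (row > 0) { cols.push_back(row - 1); vals.push_back(-1.0); }
    cols.push_back(row);
    vals.push_back(2.0);
    if (row + 1 < n) { cols.push_back(row + 1); vals.push_back(-1.0); }
  }
};
```

Row access is what aggregation and the triple matrix product need, while smoothers and Krylov solvers mostly need the matrix-vector product.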


Page 15: Multilevel Preconditioning Package version 3.1

ML Aggregation Choices

§ MIS
– Expensive, usually best with many processors

§ Uncoupled (UC)
– Cheap, usually works well

§ Hybrid = UC + MIS
– Uncoupled on fine grids, MIS on coarser grids

§ Local graph partitioning
– At least one aggregate per processor
– Requires METIS

§ Global graph partitioning
– Aggregates can span processors
– Requires ParMETIS

[Side labels on the slide: fixed aggregate size; variable aggregate size]

Page 16: Multilevel Preconditioning Package version 3.1

ML Aggregation Choices (2)

[Figure: large aggregates (METIS/ParMETIS) and small aggregates (Uncoupled/MIS)]

Page 17: Multilevel Preconditioning Package version 3.1

ML Aggregation Choices: METIS

[Figure: fine level mesh of 32 x 32 nodes with 16 aggregates; 4 aggregates on second level]

Page 18: Multilevel Preconditioning Package version 3.1

ML Aggregation Choices: Uncoupled/MIS

§ Uncoupled/MIS aggregation (greedy algorithm)
– Parallel can be complicated

Page 19: Multilevel Preconditioning Package version 3.1

ML Smoother Choices

§ A smoother is an approximate solver that effectively damps the high frequencies of the error

[Figure: error before smoothing and after smoothing]

Page 20: Multilevel Preconditioning Package version 3.1

ML Smoother Choices (2)

§ Jacobi
– Simplest, cheapest, usually least effective
– Damping parameter (ω) needed: $x^{(i+1)} = x^{(i)} + \omega D^{-1} (b - A x^{(i)})$

§ Point Gauss-Seidel
– Equation satisfied one unknown at a time
– Can be problematic in parallel (may need damping)
  • processor-based (stale off-proc values)

§ Block Gauss-Seidel
– Satisfy several equations simultaneously by modifying several DOFs (inverting subblock associated with DOFs)
– Blocks can correspond to aggregates
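The effect of damped Jacobi on different error frequencies can be checked with a short experiment. The sketch below assumes the model matrix tridiag(-1, 2, -1), whose eigenvectors are sine modes, and measures how much one sweep with damping ω shrinks a given mode; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One damped Jacobi sweep x <- x + omega * D^{-1} (b - A x) for A = tridiag(-1, 2, -1).
std::vector<double> jacobi_sweep(const std::vector<double>& x,
                                 const std::vector<double>& b, double omega) {
  assert(x.size() == b.size());
  const std::size_t n = x.size();
  std::vector<double> next(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double left = (i > 0) ? x[i - 1] : 0.0;
    const double right = (i + 1 < n) ? x[i + 1] : 0.0;
    const double residual = b[i] - (2.0 * x[i] - left - right);
    next[i] = x[i] + omega * residual / 2.0;  // the diagonal entry is 2
  }
  return next;
}

// Max-norm ratio |e_new| / |e_old| after one sweep applied to the error mode
// e_j = sin(k * pi * j / (n + 1)), j = 1..n. With b = 0 the exact solution is
// zero, so the iterate itself is the error.
double damping_factor(std::size_t n, std::size_t k, double omega) {
  const double pi = std::acos(-1.0);
  std::vector<double> e(n);
  for (std::size_t j = 0; j < n; ++j) e[j] = std::sin(k * pi * (j + 1) / (n + 1));
  const std::vector<double> next = jacobi_sweep(e, std::vector<double>(n, 0.0), omega);
  double before = 0.0, after = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    before = std::max(before, std::fabs(e[j]));
    after = std::max(after, std::fabs(next[j]));
  }
  return after / before;
}
```

For this matrix the factor for mode k is |1 - ω (1 - cos(kπ/(n+1)))|. With ω = 2/3 and n = 15, the most oscillatory mode (k = 15) shrinks to roughly a third of its size in one sweep, while the smoothest mode (k = 1) is almost untouched, which is why the smooth part of the error is left to the coarser levels.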


Page 21: Multilevel Preconditioning Package version 3.1

ML Smoother Choices (3)

§ AztecOO
– Any Aztec preconditioner can be used as smoother
– Most interesting are ILU & ILUT: may be more robust than Gauss-Seidel

§ IFPACK
– Block Jacobi and (symmetric) Gauss-Seidel
– Incomplete factorizations
– Exact LU solvers (through Amesos)

§ Hiptmair
– Specialized 2-stage smoother for Maxwell’s equations

§ MLS
– Approximation to inverse based on Chebyshev polynomials of smoothed operator
– In serial, competitive with true Gauss-Seidel
– Doesn’t degrade with # processors (unlike processor-based GS)

Page 22: Multilevel Preconditioning Package version 3.1

Coarsest-level Solvers

§ Any smoother can be used at the coarse level…

§ … but a direct method should be preferred:
– Incomplete factorizations (via Aztec)
– Serial or distributed SuperLU
– Amesos interface:
  • LAPACK, KLU, UMFPACK, SuperLU
  • SuperLU_DIST, MUMPS
  • Using Amesos with SuperLU_DIST or MUMPS, one can specify the number of processors

Page 23: Multilevel Preconditioning Package version 3.1

ML Cycling Choices

§ “Multigrid” options:
– V is default (usually works best)
– W more expensive, may be more robust
– Full MG (V cycle) more expensive (less conventional within preconditioners)

§ “Domain decomposition” options (for 2-level methods):
– Additive methods (require no matvec in the preconditioning step and only one application of the smoother)
– Various hybrid methods (generally more effective, but more expensive)

[Figures: W cycle; FMV cycle]

Page 24: Multilevel Preconditioning Package version 3.1

Multilevel Methods: Issues to be aware of

§ Severely stretched grids or anisotropies
§ Loss of diagonal dominance
§ Atypical stencils
§ Jumps in material properties
§ Non-symmetric matrices
§ Boundary conditions
§ Systems of PDEs
§ Non-trivial null space (Maxwell’s equations, elasticity)

Page 25: Multilevel Preconditioning Package version 3.1

What can go wrong?

§ Small aggregates ⇒ high complexity
§ Large aggregates ⇒ poor convergence
§ Different size aggregates ⇒ both
– Try different aggregation methods, drop tolerances

§ Stretched grids ⇒ poor convergence
§ Variable regions ⇒ poor convergence
– Try different smoothers, drop tolerances

§ Ineffective smoothing ⇒ poor convergence (perhaps due to non-diagonal dominance or non-symmetry in operator)
– Try different smoothers

Page 26: Multilevel Preconditioning Package version 3.1

Things to Try if Multigrid Isn’t Working

§ Smoothers
– Vary number of smoothing steps
– More ‘robust’ smoothers: block Gauss-Seidel, ILU, MLS polynomial (especially if degradation in parallel)
– Vary damping parameters (smaller is more conservative)

§ Try different aggregation schemes

§ Try fewer levels

§ Try drop tolerances, if …
– High complexity (printed out by ML)
– Severely stretched grids
– Anisotropic problems
– Variable regions

§ Reduce prolongator damping parameter
– If concerned about operator properties (e.g. highly non-symmetric)

Page 27: Multilevel Preconditioning Package version 3.1

Analyzing ML matrices

    *** Analysis of ML_Operator `A matrix level 0' ***

    Number of global rows = 256
    Number of equations = 1
    Number of stored elements = 1216
    Number of nonzero elements = 1216
    Mininum number of nonzero elements/row = 3
    Maximum number of nonzero elements/row = 5
    Average number of nonzero elements/rows = 4.750000
    Nonzero elements in strict lower part = 480
    Nonzero elements in strict upper part = 480
    Max |i-j|, a(i,j) != 0 = 16
    Number of diagonally dominant rows = 86 (= 33.59%)
    Number of weakly diagonally dominant rows = 67 (= 26.17%)
    Number of Dirichlet rows = 0 (= 0.00%)
    ||A||_F = 244.066240
    Min_{i,j} ( a(i,j) ) = -14.950987
    Max_{i,j} ( a(i,j) ) = 15.208792
    Min_{i,j} ( abs(a(i,j)) ) = 0.002890
    Max_{i,j} ( abs(a(i,j)) ) = 15.208792
    Min_i ( abs(a(i,i)) ) = 2.004640
    Max_i ( abs(a(i,i)) ) = 15.208792
    Min_i ( \sum_{j!=i} abs(a(i,j)) ) = 2.004640
    Max_i ( \sum_{j!=i} abs(a(i,j)) ) = 15.205902
    max eig(A) (using power method) = 27.645954
    max eig(D^{-1}A) (using power method) = 1.878674

    Total time for analysis = 3.147979e-03 (s)

Page 28: Multilevel Preconditioning Package version 3.1

Analyzing ML Preconditioner

    *** ************************************************ ***
    *** Analysis of the spectral properties using LAPACK ***
    *** ************************************************ ***
    *** Operator = A
    *** Computing eigenvalues of finest-level matrix
    *** using LAPACK. This may take some time...
    *** results are on file `eig_A.m'.

    min |lambda_i(A)| = 0.155462
    max |lambda_i(A)| = 26.3802
    spectral condition number = 169.689

    *** ************************************************ ***
    *** Analysis of the spectral properties using LAPACK ***
    *** ************************************************ ***
    *** Operator = P^{-1}A
    *** Computing eigenvalues of ML^{-1}A
    *** using LAPACK. This may take some time...
    *** results are on file `eig_PA.m'.

    min |lambda_i(ML^{-1}A)| = 0.776591
    max |lambda_i(ML^{-1}A)| = 1.09315
    spectral condition number = 1.40762
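The reported spectral condition numbers can be reproduced from the printed extreme eigenvalue moduli, assuming the condition number is taken as the ratio of largest to smallest modulus:

```latex
\kappa(A) = \frac{\max_i |\lambda_i(A)|}{\min_i |\lambda_i(A)|}
          = \frac{26.3802}{0.155462} \approx 169.69,
\qquad
\kappa(ML^{-1} A) = \frac{1.09315}{0.776591} \approx 1.4076 .
```

The preconditioned operator has its spectrum clustered near 1, which is what makes the outer Krylov iteration converge in few steps.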

Page 29: Multilevel Preconditioning Package version 3.1

ML Interoperability with Trilinos Packages

[Diagram: ML at the center, connected to the following packages]

§ Epetra
– Accepts user data as Epetra objects
– Can be wrapped as Epetra_Operator

§ TSF
– TSF interface exists

§ Other matvecs, other solvers
– Accepts other solvers and MatVecs

§ Amesos, IFPACK, Anasazi
– Direct solvers, smoothers, eigensolver

§ AztecOO, Meros
– Via Epetra & TSF

Page 30: Multilevel Preconditioning Package version 3.1

ML with Epetra

Two classes wrap ML preconditioners as Epetra preconditioners:

1) ML_Epetra::MultiLevelPreconditioner
– A very easy, black-box interface to ML
– Can be used as AztecOO preconditioner
– All parameters specified using ParameterList
– Still not ready for Maxwell users

2) ML_Epetra::MultiLevelOperator
– Users need to build the preconditioner step-by-step
– Ready for Maxwell users

Page 31: Multilevel Preconditioning Package version 3.1

ML with Epetra: Example of use

    Trilinos_Util_CrsMatrixGallery Gallery("laplace_3d", Comm);
    Gallery.Set("problem_size", 100*100*100);
    // linear system matrix & linear problem
    Epetra_RowMatrix * A = Gallery.GetMatrix();
    Epetra_LinearProblem * Problem = Gallery.GetLinearProblem();

    // Construct outer solver object
    AztecOO solver(*Problem);
    solver.SetAztecOption(AZ_solver, AZ_cg);

    // Set up multilevel precond. with smoothed aggr. defaults
    ParameterList MLList; // parameter list for ML options
    ML_Epetra::SetDefaults("SA", MLList);
    MLList.set("aggregation: type", "METIS");
    // create preconditioner
    ML_Epetra::MultiLevelPreconditioner * MLPrec =
        new ML_Epetra::MultiLevelPreconditioner(*A, MLList, true);

    solver.SetPrecOperator(MLPrec); // set preconditioner
    solver.Iterate(500, 1e-12); // iterate at most 500 times

    delete MLPrec;

Source file: Trilinos/packages/ml/examples/ml_example_MultiLevelPreconditioner.cpp

[Side labels on the slide mark the package used by each block of the code: Triutils, AztecOO, ML, AztecOO]

Further options shown in a callout on the slide:

    List.set("max levels", 4);
    List.set("smoother (level 0)", "Jacobi");
    List.set("damping factor", 0.5);
    List.set("smoother (level 1)", "IFPACK");
    List.set("coarse: type", "Amesos-MUMPS");
    List.set("coarse: max procs", 4);
    List.set("output level", 10);

Page 32: Multilevel Preconditioning Package version 3.1

Numerical Experiments: MPSalsa

• 3D thermal convection in a cube
• Incompressible Navier-Stokes + energy
• Smoother: processor-based ILU
• Coarse solver: SuperLU
• Sandia ASCI Red machine

Procs   geom   aggr
4       135    120
32      625    480
256     3645   2560

                      Gauss-Seidel, geom   Gauss-Seidel, aggr   ILU, geom          ILU, aggr
proc   fine grid      its     time (s)     its     time (s)     its    time (s)    its    time (s)
       unknowns
4      24,565         28      106          20      88           33     78          30     71
32     179,685        32      163          34      140          44     94          50     109
256    1,373,125      35      273          38      187          47     219         58     152

(its = average iterations per Newton step)

Page 33: Multilevel Preconditioning Package version 3.1

Numerical Experiment: Zpinch simulation

procs   # elements     its   complexity
16      155,904        41    1.14
50      492,912        42    1.15
128     1,259,976      47    1.16
240     ~2,400,000     52    1.18

• CG preconditioned with V(1,1) AMG cycle
• Convergence criterion: $\|r\|_2 / \|r_0\|_2 < 10^{-8}$
• 2-stage Hiptmair smoother (4th order polynomials in each stage)
• Edge element interpolation
• Platform: Sandia’s Cplant

Page 34: Multilevel Preconditioning Package version 3.1

On-going Research

§ Smoothed aggregation for non-symmetric systems
§ Aggregation schemes for stretched elements
§ New smoothers (via IFPACK)
§ Compatible relaxation for smoothed aggregation
§ Adaptive smoothed aggregation
– Starts with existing AMG method
– Iteratively detects slow-to-converge modes (e.g., rigid body modes)
– Goal: improve prolongator during simulation
§ Variable-block aggregation
§ Nonlinear preconditioning