August 4, 2007, Deflation Methods in Fermion Inverters, 1 Methods for Fermion Inverters Walter Wilcox Baylor University Joint with Ron Morgan (Mathematics

August 4, 2007, Deflation Methods in Fermion Inverters, 1

Methods for Fermion Inverters

Walter Wilcox Baylor University

Joint with Ron Morgan (Mathematics Dept.) and Abdou Abdel-Rehim (Baylor Postdoctoral Fellow)


Outline

• Deflation basics

• Morgan/Wilcox algorithm (non-Hermitian) (math-ph/0405053, arXiv: 0707.0505, 0707.0502)
  • GMRES-DR (Morgan)
  • GMRES-Proj (multiple rhs’s of Ax=b)
  • shifting (multi-mass)
  • new results (rhs’s with D-BiCGStab)

• M. Lüscher algorithm (arXiv: 0706.2298v4)

• A. Stathopoulos/K. Orginos algorithm (Hermitian) (arXiv: 0707.0131v1; talk)


Deflation or related in lattice QCD problems (not comprehensive!)

• de Forcrand, Nucl. Phys. B (P.S.) 47, p.228,1996. (experiments with multiple rhs’s.)

• Edwards, Heller, Narayanan, Nucl. Phys. B 540, 457, 1999; Dong, Lee, Liu, Zhang, Phys. Rev. Lett. 85, 5051, 2000. (projecting out low overlap $H^2$ eigenmodes)

• Neff, Eicker, Lippert, Negele, Schilling, Phys. Rev. D64:114509, 2001. (deflation for all-to-all propagators on $\gamma_5 M$)

• Giusti, Hoelbling, Lüscher and Wittig, Comput. Phys. Commun. 153, 31, 2003. (quark propagator low-mode preconditioning)

• DeGrand and Schaefer, Comput. Phys. Commun. 185-191, 2004. (low mode averaging)


Related talks at this conference

• J. Bloch, “An Iterative Method to Compute the Overlap Dirac Operator at Nonzero Chemical Potential”. (Bloch, Frommer, Lang, Wettig, arXiv: 0704.3486)

• M. Clark, “Adaptive Multi-Grid for QCD” (Brannick et al, arXiv: 0707.4018.)

• K. Orginos, “A Solver for Multiple Right-Hand Sides” (0707.0131v1)


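Deflation basics

Krylov subspace: $\mathrm{Span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{m-1} r_0\}$

Starting, residual vectors: $r_0 = \sum_i \beta_i z_i$, $r = r_0 - A\hat{x}$

$r = q(A)\,r_0 = \sum_i \beta_i\, q(\lambda_i)\, z_i$

q is a polynomial of degree m or less that has value 1 at 0; the $z_i$ are eigenvectors of A with eigenvalues $\lambda_i$.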
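A minimal sketch of the residual-polynomial picture, assuming NumPy, a 200 by 200 version of the bidiagonal test matrix used in the plots that follow (the slides use 2000), and an all-ones starting residual. The helper names are illustrative and are not the authors' code. It minimizes the residual over the Krylov subspace by dense least squares, which is what GMRES does implicitly.

```python
import numpy as np

def bidiagonal(n):
    """Upper bidiagonal test matrix: diagonal 0.1, 1, 2, ..., n-1; superdiagonal all 1's."""
    d = np.arange(n, dtype=float)
    d[0] = 0.1
    return np.diag(d) + np.diag(np.ones(n - 1), k=1)

def krylov_basis(A, r0, m):
    """Orthonormal basis of Span{r0, A r0, ..., A^(m-1) r0}."""
    Q = np.zeros((len(r0), m))
    v = r0 / np.linalg.norm(r0)
    for j in range(m):
        Q[:, j] = v
        v = A @ v
        for _ in range(2):  # two Gram-Schmidt passes for numerical stability
            v = v - Q[:, :j + 1] @ (Q[:, :j + 1].T @ v)
        v = v / np.linalg.norm(v)
    return Q

def min_residual_norm(A, r0, m):
    """Norm of r = r0 - A x_hat, with x_hat in the Krylov subspace chosen to minimize ||r||."""
    Q = krylov_basis(A, r0, m)
    c, *_ = np.linalg.lstsq(A @ Q, r0, rcond=None)
    return np.linalg.norm(r0 - A @ (Q @ c))

A = bidiagonal(200)   # assumption: smaller than the matrix on the slides
r0 = np.ones(200)     # assumption: arbitrary starting residual
norms = [min_residual_norm(A, r0, m) for m in (5, 10, 20, 40)]
print(norms)
```

Because the subspaces are nested, the minimized residual norm cannot grow as m increases; how quickly it falls depends on how well a degree-m polynomial with q(0)=1 can be made small on the spectrum.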


Matrix: bidiagonal, diagonal is 0.1, 1, 2, 3, …1999, superdiagonal is all 1’s

[Figure: GMRES polynomial of degree 10, plotted over 0 to 1000.]


[Figure: GMRES polynomial of degree 100 (close-up view), plotted over 0 to 5.]


[Figure: GMRES polynomial of degree 150 (close-up view), plotted over 0 to 5.]


Residual norm curve

[Figure: Solution of the Linear Equations. Residual norm (log scale, $10^{-3}$ to $10^{2}$) vs. matrix-vector products (0 to 150).]


Small eigenvalues and Krylov methods

• SPD matrix: convergence is approximately related to $\sqrt{\lambda_1/\lambda_n}$.

Example: Eigenvalues: 0.1, 1, 2, 3, . . . n

If one removes 4 eigenvalues, convergence is improved by a factor of about 6 (remove 10 for an improvement factor of 10).

Non-restarted methods like CG and BiCGStab naturally remove some eigenvalues as the iteration proceeds, which leads to superlinear convergence.

Problem: Restarted GMRES often cannot “remove” eigenvalues.
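The quoted improvement factors can be checked with a few lines of arithmetic. This sketch assumes the convergence measure is the square root of the smallest-to-largest eigenvalue ratio (an assumption that reproduces the factors of about 6 and 10), and that "removing" eigenvalues means dropping the smallest ones; n = 2000 is an arbitrary choice, since n cancels in the ratio.

```python
import math

def improvement_factor(eigs, k):
    """Ratio of sqrt(lambda_min/lambda_max) after vs. before removing the k smallest eigenvalues."""
    eigs = sorted(eigs)
    before = math.sqrt(eigs[0] / eigs[-1])
    after = math.sqrt(eigs[k] / eigs[-1])
    return after / before

n = 2000                                  # assumption: any n > 10 gives the same factors
eigs = [0.1] + list(range(1, n + 1))      # 0.1, 1, 2, 3, ..., n
f4 = improvement_factor(eigs, 4)          # sqrt(4 / 0.1), about 6.3
f10 = improvement_factor(eigs, 10)        # sqrt(10 / 0.1) = 10
print(f4, f10)
```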


Solution: Add approximate eigenvectors to the subspace.

• Morgan, 1995 (GMRES-IR), 2002 (GMRES-DR)

Subspace: $\mathrm{Span}\{y_1, y_2, \ldots, y_k, r_0, Ar_0, A^2 r_0, \ldots, A^{m-k-1} r_0\}$

Eigenvector portion + Krylov portion

The $y_i$’s are chosen to be harmonic Ritz vectors.
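As an illustration of what harmonic Ritz vectors are, the sketch below computes them from a plain Arnoldi factorization using one standard formulation: the Petrov-Galerkin condition $(AV_m)^H (Ay - \theta y) = 0$ with $y = V_m g$, which reduces to the small generalized eigenproblem $\bar{H}^H \bar{H} g = \theta H_m^H g$. The matrix size, subspace size, and function names are assumptions for illustration; this is not necessarily how GMRES-DR forms them internally.

```python
import numpy as np
from scipy.linalg import eig

def arnoldi(A, r0, m):
    """Arnoldi with modified Gram-Schmidt: A V[:, :m] = V Hbar, V is n x (m+1), Hbar is (m+1) x m."""
    V = np.zeros((len(r0), m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def harmonic_ritz(V, Hbar, k):
    """Return the k harmonic Ritz pairs (theta, y) of smallest magnitude."""
    m = Hbar.shape[1]
    # Hbar^T Hbar g = theta * H_m^T g, where H_m is the top m x m block of Hbar
    theta, G = eig(Hbar.T @ Hbar, Hbar[:m, :].T)
    idx = np.argsort(np.abs(theta))[:k]
    return theta[idx], V[:, :m] @ G[:, idx]

n = 200                                   # assumption: reduced-size bidiagonal test matrix
d = np.arange(n, dtype=float)
d[0] = 0.1
A = np.diag(d) + np.diag(np.ones(n - 1), k=1)

V, Hbar = arnoldi(A, np.ones(n), 30)
theta, Y = harmonic_ritz(V, Hbar, 5)
print(theta)                              # approximations to the smallest eigenvalues
```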


GMRES-DR(m,k):

• Solves linear equations and computes eigenvalues simultaneously.

• Add approximate eigenvectors to the Krylov subspace for the linear equations which essentially removes the corresponding eigenvalues and can thus improve convergence.


GMRES-DR vs. other methods

Matrix: bidiagonal

[Figure: Residual norm (log scale, $10^{-5}$ to $10^{2}$) vs. matrix-vector products (0 to 400). Curves: GMRES(25), BiCGStab, G-DR(25,10), Full-GMRES.]


Aspects of GMRES-DR

• GMRES-DR creates an Arnoldi-like recurrence,

$A V_k = V_{k+1} \bar{H}_k$

where $V_k$ is the n by k matrix whose columns span the approximate eigenvectors, and $\bar{H}_k$ is small, k+1 by k.

• Have both approximate eigenvectors and their products with A in compact storage.


For multiple right-hand sides

• Solve first right-hand side with GMRES-DR

• Use the computed eigenvectors for the other right-hand sides. Method is GMRES-Proj: Alternate:

1) projection over eigenvectors
2) cycles of regular GMRES
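The projection step in 1) can be sketched as a minimum-residual projection: pick the combination of approximate eigenvectors that most reduces the current residual. The code below is a hedged illustration in NumPy, not the authors' implementation. It uses perturbed exact eigenvectors as a stand-in for the output of a first GMRES-DR solve, and forms the product A V explicitly, whereas GMRES-DR would already hold it compactly through its Arnoldi-like recurrence.

```python
import numpy as np

def minres_projection(A, V, b, x):
    """Correct x by the combination of columns of V that minimizes ||b - A (x + V d)||."""
    r = b - A @ x
    AV = A @ V                                   # assumption: formed explicitly here
    d, *_ = np.linalg.lstsq(AV, r, rcond=None)
    return x + V @ d

rng = np.random.default_rng(0)
n, k = 200, 10                                   # illustrative sizes
diag = np.arange(n, dtype=float)
diag[0] = 0.1
A = np.diag(diag) + np.diag(np.ones(n - 1), k=1)

# Stand-in for approximate eigenvectors from the first right-hand side:
# exact eigenvectors of the k smallest eigenvalues plus a small perturbation.
w, Z = np.linalg.eig(A)
low = np.argsort(w.real)[:k]
V = Z[:, low].real + 1e-6 * rng.standard_normal((n, k))
V, _ = np.linalg.qr(V)                           # orthonormal basis of the same span

b2 = rng.standard_normal(n)                      # second right-hand side
x0 = np.zeros(n)
x1 = minres_projection(A, V, b2, x0)
res_before = np.linalg.norm(b2 - A @ x0)
res_after = np.linalg.norm(b2 - A @ x1)
print(res_before, res_after)
```

In GMRES-Proj this projection would be alternated with cycles of ordinary restarted GMRES started from the corrected iterate.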


GMRES-Proj for the 2nd rhs (following GMRES-DR for 1st rhs)

Matrix: bidiagonal

[Figure: Residual norm (log scale, $10^{-5}$ to $10^{2}$), horizontal axis 0 to 400. Curves: GMRES(25), BiCGStab, G-DR(25,10), Full-GMRES, G(15)-Proj(10).]


Wilson $20^3 \times 32$ at $\kappa_{cr}$


Twisted Mass $20^3 \times 32$ at $\kappa_{cr}$


Multi-masses or multiple shifts

$(A - \sigma_i I)\,x_i = b$

• Krylov subspaces are shift invariant in that $A - \sigma I$ generates the same Krylov subspace no matter what the shift.

• So the goal is to solve all shifted systems with ONE Krylov subspace.

• For non-restarted methods this has been done, for example with QMR (Freund) and BiCGStab (Frommer).
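The shift invariance follows from the binomial expansion, written out here as a short worked derivation. The symbols $\sigma$ for the shift, $\mathcal{K}_m$ for the Krylov subspace of dimension m, and $\bar{I}_m$ for the identity with an appended zero row are notational assumptions. The expansion shows one inclusion; the reverse inclusion follows by applying the same argument to $A = (A - \sigma I) + \sigma I$.

```latex
(A - \sigma I)^j r_0
  = \sum_{i=0}^{j} \binom{j}{i} (-\sigma)^{j-i} A^i r_0
  \in \mathrm{Span}\{r_0, A r_0, \ldots, A^j r_0\}
\quad\Longrightarrow\quad
\mathcal{K}_m(A - \sigma I, r_0) = \mathcal{K}_m(A, r_0).

% Consequence for an Arnoldi-type recurrence A V_m = V_{m+1} \bar{H}_m:
(A - \sigma I) V_m = V_{m+1} \left( \bar{H}_m - \sigma \bar{I}_m \right),
\qquad
\bar{I}_m = \begin{pmatrix} I_m \\ 0 \end{pmatrix}.
```

The second line is why one set of basis vectors can serve every shift: only the small matrix changes from system to system.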


Restarted methods with multiple shifts

Restarting makes it more difficult because the shifted residuals are not parallel to one another, generating different Krylov subspaces.

Frommer and Glassner (restarted GMRES):
• Force residuals to all be parallel after a restart.
• Can continue using one Krylov subspace for all shifted systems.
• Minimal residual property is maintained only for the base shift system.


GMRES-DR for multiple shifts

• Subspaces generated by GMRES-DR are a combination of an approximate-eigenvector portion and a Krylov subspace portion, but remarkably, when put together, they are Krylov subspaces themselves (with a different starting vector).

• So, GMRES-DR can be restarted like GMRES for multiple shifts.


Multiple right-hand sides with multiple shifts and deflation

• Deflating eigenvalues is difficult for multiple shifts because one cannot keep residual vectors parallel unless one has exact eigenvectors.

• Solution: force the error to be in the direction of one vector, namely $v_{k+1}$ from

$A V_k = V_{k+1} \bar{H}_k$

• Then can correct the error at the end.

• Need solution of one extra right-hand side.


Solution of ten right hand sides. Matrix: bidiagonal

Blue: base system (sigma=0)

Red: shifted system (sigma=-2)

Green: base (uncorrected)


=> Deflated BiCGStab (D-BiCGStab)

What if you don’t like restarting, but still want to solve multiple right-hand sides?

Problem: Projection over eigenvectors is not good enough to last for the entire run of BiCGStab.

Solution: Use a projection over both right and left eigenvectors.

Deflated BiCGStab for the second and subsequent right-hand sides:
1) Project over right and left eigenvectors
2) Run BiCGStab
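One natural reading of step 1) is a Petrov-Galerkin projection: choose the starting guess in the span of the right eigenvectors so that the starting residual is orthogonal to the left eigenvectors. The sketch below assumes that reading, uses exact eigenvectors from a dense eigensolver on a small bidiagonal matrix as a stand-in for eigenvectors saved from the first right-hand side, and uses illustrative names throughout.

```python
import numpy as np
from scipy.linalg import eig

def left_right_projection(A, b, k):
    """Initial guess x0 = Z (U^H A Z)^{-1} U^H b over the k smallest right (Z) and
    left (U) eigenvectors, so that U^H (b - A x0) = 0."""
    w, U, Z = eig(A, left=True, right=True)
    idx = np.argsort(np.abs(w))[:k]
    U, Z = U[:, idx], Z[:, idx]
    coeffs = np.linalg.solve(U.conj().T @ A @ Z, U.conj().T @ b)
    return np.real_if_close(Z @ coeffs), U

n, k = 200, 10                                   # illustrative sizes
d = np.arange(n, dtype=float)
d[0] = 0.1
A = np.diag(d) + np.diag(np.ones(n - 1), k=1)
b = np.random.default_rng(5).standard_normal(n)

x0, U = left_right_projection(A, b, k)
r_start = b - A @ x0
print(np.linalg.norm(U.conj().T @ r_start))      # close to zero
```

Step 2) would then pass `x0` to a BiCGStab routine as its starting guess, for example through the `x0` argument of `scipy.sparse.linalg.bicgstab`.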


Deflated BiCGStab for the 2nd rhs with Left-Right Projection

Matrix: bidiagonal

[Figure: Residual norm (log scale, $10^{-5}$ to $10^{2}$), horizontal axis 0 to 400. Curves: GMRES(25), BiCGStab, D-BiCGStab, G-DR(25,10), Full-GMRES, G(15)-Proj(10).]


$20^3 \times 32$ Wilson at $\kappa_{cr}$


$20^3 \times 32$ Wilson at $\kappa_{cr}$: number of eigenvectors

• Speedup at $\kappa_{cr}$ on $20^3 \times 32$: BiCG/D-BiCG ~ 5
• Speedup at $\kappa_{cr}$ on $16^4$: ~ 2.7



Some questions about Wilson matrix computations

• How does the optimal number of eigenvalues depend on the size of the problem? Test: all at $\kappa_{cr}$.

Answer: it increases, but not nearly proportional to n (fortunately).

| n | Lattice | k (Proj/D-BiCG) |
|---|---|---|
| 24,576 | $8^4$ | ~10 |
| 49,152 | $8^3 \times 16$ | ~15 |
| 98,304 | $8^3 \times 32$ | ~20 |
| 393,216 | $16^4$ | ~20 |
| 1,536,000 | $20^3 \times 32$ | ~40 |


Some questions about Wilson matrix computations

• After eigenvalues are deflated, how does the number of iterations vary with the size of the problem? (At $\kappa_{cr}$.)

Answer: it increases, but not nearly proportional to n (fortunately).

Iterations (Proj, D-BiCG):

n=24,576 ($8^4$): iters ~ 100
n=49,152 ($8^3 \times 16$): iters ~ 125
n=98,304 ($8^3 \times 32$): iters ~ 140
n=393,216 ($16^4$): iters ~ 300 (Proj), 300 (D-BiCG)
n=1,536,000 ($20^3 \times 32$): iters ~ 600 (Proj), 500 (D-BiCG)

~ 45% increase in iters with volume change of 3.


Hermitian Systems

• Lüscher’s domain decomposition deflation algorithm.

• Stathopoulos/Orginos multiple right-hand side deflation algorithm.

• Both are tested for dynamical Wilson/clover fermions, using $M^H M$.


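Lüscher algorithm

• Breaks problem up into a deflation subspace, $S$, defined on $4^4$ blocks and the orthogonal complement, $S^\perp$. Uses projectors $P_L$ and/or $P_R$. Basic deflated system:

$P_L D \chi(x) = P_L \eta(x), \quad (1 - P_R)\chi(x) = 0$

$\psi(x) = \chi(x) + \sum_{k,l=1}^{N} \phi_k(x)\,(A^{-1})_{kl}\,(\phi_l, \eta)$

where

$P_L \psi(x) = \psi(x) - \sum_{k,l=1}^{N} D\phi_k(x)\,(A^{-1})_{kl}\,(\phi_l, \psi)$

$P_R \psi(x) = \psi(x) - \sum_{k,l=1}^{N} \phi_k(x)\,(A^{-1})_{kl}\,(\phi_l, D\psi)$

• After SAP preconditioning:

$P_L D M_{SAP}\, \phi(x) = P_L \eta(x), \quad \chi(x) = P_R M_{SAP}\, \phi(x)$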
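The algebra of the two projectors can be checked numerically on a small dense example. This sketch assumes the little operator is $A_{kl} = (\phi_k, D\phi_l)$ and uses a random well-conditioned matrix and random orthonormal vectors as stand-ins for the Dirac operator and the deflation subspace; it illustrates only the projector identities, not the block structure or the SAP preconditioner.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 60, 6                                              # illustrative sizes
D = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # stand-in operator
Phi = np.linalg.qr(rng.standard_normal((n, N)))[0]        # columns phi_k spanning S

A_little = Phi.conj().T @ D @ Phi                         # assumed A_kl = (phi_k, D phi_l)
A_inv = np.linalg.inv(A_little)
P_L = np.eye(n) - D @ Phi @ A_inv @ Phi.conj().T
P_R = np.eye(n) - Phi @ A_inv @ Phi.conj().T @ D

eta = rng.standard_normal(n)
# Deflated system: P_L D chi = P_L eta with (1 - P_R) chi = 0.
# P_L D is singular on S, so take a least-squares solution and apply P_R.
y = np.linalg.lstsq(P_L @ D, P_L @ eta, rcond=None)[0]
chi = P_R @ y
# Add back the component of the solution that lies in S.
psi = chi + Phi @ A_inv @ (Phi.conj().T @ eta)

print(np.linalg.norm(D @ psi - eta))                      # close to zero
```

The identity $P_L D = D P_R$ is what allows the deflated solve and the small solve to be combined into the full solution.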


More algorithm details…

• Outer part - Uses SAP (Schwarz Alternating Procedure) as a right preconditioner on $4^3 \times 8$ blocks and the GCR (Generalized Conjugate Residual - consistent with SAP) algorithm for the Krylov inverter on the global (orthogonal) space. (See Lüscher’s PoS LAT2005:002, 2006 talk based on earlier work.) Preconditioning reduces the iteration count of GCR and deflation overhead.

• Inner part (“little Dirac”) - This system is fairly large. It is even-odd preconditioned and “global mode” deflated. Also uses GCR here.

• See

Frank, Vuik, SIAM J. Sci. Comput. 23, 442, 2001
Nabben, Vuik, SIAM J. Sci. Comput. 27, 1742, 2006

for mathematical background of domain decomposition and preconditioning.


Even more algorithm details…

• Initial eigenvalue computation: Prepares deflated space for later use. Applies inverse iteration (SAP) to $N_s$ (=20) random global vectors to get low modes, which are then projected onto domains. “Little Dirac” subspace (=$N_s N_d$): 20x2592 or 20x8192 (small vectors). Total overhead: 150s for $24^3 \times 48$ lattice and 184s for $32^3 \times 64$ lattice (tuned to $m_{cr}$).

• Solves the $M^H M$ matrix on the full system, knocking out the deflated eigenvectors with the $P_L$ projector while again using the GCR algorithm. Done for one mass at a time.

• Deflation on domains works because of “local coherence”. Only a small number of global vectors projected onto the blocks are needed to project out low modes.
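The step of projecting global vectors onto domains can be sketched on a toy one-dimensional lattice: each global vector is cut into blocks, and every block piece becomes its own deflation vector. The sizes and names below are illustrative assumptions. The last line reproduces the block counts quoted above, assuming $4^4$ blocks.

```python
import numpy as np

def block_basis(global_vecs, block_size):
    """Restrict each of the Ns global vectors to each of the Nd blocks, giving
    Ns*Nd deflation vectors that each vanish outside a single block."""
    n, Ns = global_vecs.shape
    Nd = n // block_size                  # assumes n is a multiple of block_size
    basis = np.zeros((n, Ns * Nd))
    for d in range(Nd):
        rows = slice(d * block_size, (d + 1) * block_size)
        basis[rows, d * Ns:(d + 1) * Ns] = global_vecs[rows, :]
    return basis

rng = np.random.default_rng(2)
n, Ns, block_size = 64, 3, 8              # toy 1-D "lattice"
vecs = rng.standard_normal((n, Ns))
B = block_basis(vecs, block_size)
print(B.shape)                            # (n, Ns * Nd)

# Number of 4^4 blocks on the two lattices quoted on the slide
blocks_small = 24**3 * 48 // 4**4
blocks_large = 32**3 * 64 // 4**4
print(blocks_small, blocks_large)
```

The blocked subspace contains the original global vectors but is much larger, which is how a small number of global vectors can cover many low modes.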


Analogy

• Domain decomposition deflation achieved by using low modes which are smooth but far from being approximate eigenmodes => “local coherence”.

• Tested on 2 flavor Wilson/clover configs (50).


$32^3 \times 64$ lattice solver times

• Peak speedup (BiCG/DFL): 366/32=11.4
• Integrated speedup (BiCG/DFL): 966/314=3.1 (5 masses)
• ~ 13% outer (?% inner) increase in iters with a volume change of 3.


Stathopoulos/Orginos algorithm

• eig-CG(Nev,m), like GMRES-DR, solves linear equations and does simultaneous improvements of the deflated eigenvectors. The eigenvector part is restarted, which, however, does not affect the solution of the CG linear equations.

• incremental eig-CG(s) calls eig-CG, and adds Nev new eigenvectors to a separate subspace after each rhs, and does orthogonalization. It is used for the first $s \le s_1$ rhs’s.

• init-CG uses the final information generated by incremental eig-CG. Accuracy is the key!


More technical page

• eig-CG(Nev,m) has a restarted subspace of maximum dimension m. (Made up of Nev previous eigenvectors, Nev current eigenvectors and (m-2*Nev) Krylov vectors.) Uses Rayleigh-Ritz to compute eigenvectors and appends portions of the CG search space (Krylov part) to the eigenvectors. Typically, however, the linear equations converge faster than the eigenvalue part.

• Incremental eig-CG(s) (s = 2,…) accepts (s-1)*Nev eigenvectors, calls eig-CG for $s \le s_1$ rhs’s, and accumulates another Nev approximate Ritz vectors from each new right-hand side. Needs significant storage.

• init-CG does a standard Galerkin projection on the initial solution vector. A single restart is done.

• Tested on “several” anisotropic, 2 flavor Wilson fermion gauge fields. Uses single precision, except on dot products. m=100, Nev=10 for 48 total rhs’s ($s_1$=24).
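The Galerkin projection used by init-CG can be sketched as follows: the initial guess is chosen in the span of the stored eigenvectors so that the starting residual is orthogonal to that span. The matrix here is a random symmetric positive definite stand-in with a prescribed spectrum, and exact low eigenvectors stand in for the accumulated eig-CG vectors; both are assumptions for illustration, as is the use of SciPy's `cg` with its default tolerance.

```python
import numpy as np
from scipy.sparse.linalg import cg

def galerkin_initial_guess(A, V, b):
    """x0 = V (V^H A V)^{-1} V^H b, so the initial residual is orthogonal to span(V)."""
    return V @ np.linalg.solve(V.conj().T @ A @ V, V.conj().T @ b)

rng = np.random.default_rng(3)
n, nev = 200, 10                          # illustrative sizes
lam = np.arange(n, dtype=float)
lam[0] = 0.1                              # spectrum 0.1, 1, 2, ..., n-1
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = (Q * lam) @ Q.T                       # SPD stand-in for the Hermitian system
A = 0.5 * (A + A.T)                       # remove rounding asymmetry
V = Q[:, :nev]                            # stand-in for accumulated low eigenvectors
b = rng.standard_normal(n)

x0 = galerkin_initial_guess(A, V, b)
x, info = cg(A, b, x0=x0)                 # CG started from the deflated guess
print(np.linalg.norm(V.T @ (b - A @ x0)), info)
```

With inexact eigenvectors the deflated components creep back in as CG proceeds, which is consistent with the remarks that accuracy matters and that a restart with a fresh projection is used.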


Convergence of deflated eigenvalues

• Point: converges as fast as if they weren’t restarting.


Incremental RHS solver history

• Spike on last 24 caused by a restart necessary because of eigenvector accuracy.


Solver performance vs. quark mass

• Last 24 right-hand sides only compared to non-deflated CG. Peak speedup ~10 on smaller lattice near $m_{cr}$. Integrated speedup ~6 (all rhs’s).

• Peak speedup ~ 6.9 on larger lattice near $m_{cr}$.

• ~ 190% increase in iters with a volume change of 3.


Summaries…

• For Hermitian systems ($M^H M$), the Stathopoulos/Orginos algorithm is effective for a sufficiently large number of rhs’s. Uses many eigenvectors, but no spectral preconditioners. Uses eigenvectors on starts (and a single init-CG restart). Krylov/Rayleigh-Ritz based. Needs accurate eigenvectors, which improve over additional rhs’s. Like GMRES-DR, solves linear equations at the same time as computing eigenvectors. “Large” $V^2$ problem.


• Lüscher’s algorithm, built within his SAP+GCR inverter, applied to $M^H M$, works well for QCD and also defeats critical slowing down. There is an overhead in compute time for subspace generation, but it gets amortized over many rhs’s or masses. Uses many inexact eigenvectors and makes extensive use of spectral preconditioners. DD+Krylov+preconditioning. Uses eigenvectors at every iteration, but a very small number of iterations. Deflation on domains is a new idea. “Small” $V^2$ problem.


• Deflated GMRES is also Krylov/Rayleigh-Ritz based. Useful for multiple rhs’s as well as shifting. D-BiCGStab can be used for multiple rhs’s also. We would do Wilson/clover without the $M^H M$ step plus shifting for various masses. We don’t need spectral preconditioners for GMRES-Proj or D-BiCGStab; we use a modest number of fairly accurate eigenvectors, which are used at restarts or a single time for D-BiCGStab (better eigenvector accuracy is needed for D-BiCGStab than for Proj). “Mild” $V^2$ problem. (Caveat: Our lattices are 8X smaller than Lüscher’s.)


is a breakthrough method for lattice QCD!


Serial Multi-Mass

• Because of the twisted mass, $\mu$, it is not possible to apply multi-mass solvers to twisted mass problems simultaneously with even-odd preconditioning.

• We can accelerate the convergence of twisted-mass problems with multiple masses and even-odd preconditioning.

• The method is based on solving the systems serially but using an improved initial guess by making a minimal residual projection over available solutions of the previous systems. Improves as the number of solved systems increases.
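The improved initial guess can be sketched as a least-squares problem over the span of the earlier solutions. The code below is a toy illustration under stated assumptions: a random well-conditioned matrix replaces the even-odd preconditioned operator, the mass enters as a simple real diagonal shift (the actual twisted-mass term has a different structure), and a direct solve stands in for the iterative solver that would be started from the guess.

```python
import numpy as np

def projected_guess(A_new, X_prev, b):
    """Initial guess x0 = X c for A_new x = b, with c minimizing ||b - A_new X c||
    over the span of previously computed solutions (columns of X_prev)."""
    Q, _ = np.linalg.qr(X_prev)           # earlier solutions are nearly parallel
    c, *_ = np.linalg.lstsq(A_new @ Q, b, rcond=None)
    return Q @ c

rng = np.random.default_rng(4)
n = 100
M = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # toy operator
b = rng.standard_normal(n)
masses = [0.005, 0.007, 0.009, 0.011]     # illustrative mass values

solutions, start_norms = [], []
for mu in masses:
    A = M + mu * np.eye(n)                # assumption: real diagonal shift
    if solutions:
        x0 = projected_guess(A, np.column_stack(solutions), b)
    else:
        x0 = np.zeros(n)                  # first system starts from zero
    start_norms.append(np.linalg.norm(b - A @ x0))
    solutions.append(np.linalg.solve(A, b))   # stand-in for the iterative solve

print(start_norms)
```

Each new system starts from a residual no larger than the zero-guess residual, and the starting point tends to improve as more solutions become available.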


Using Serial multi-mass with Twisted Mass Fermions

| Mass number | kappa | mu | X0=0 | With projection, High -> low | With projection, Low -> high |
|---|---|---|---|---|---|
| 1 | 0.157290 | 0.005 | 1270 | 820 | 1270 |
| 2 | 0.157250 | 0.007 | 1210 | 640 | 1030 |
| 3 | 0.157210 | 0.009 | 1150 | 580 | 940 |
| 4 | 0.157170 | 0.011 | 1090 | 520 | 760 |
| 5 | 0.157130 | 0.013 | 1030 | 460 | 610 |
| 6 | 0.157090 | 0.015 | 940 | 400 | 490 |
| 7 | 0.157050 | 0.017 | 880 | 370 | 370 |
| 8 | 0.157010 | 0.019 | 850 | 430 | 310 |
| 9 | 0.156970 | 0.021 | 790 | 520 | 280 |
| 10 | 0.156930 | 0.023 | 760 | 610 | 340 |
| 11 | 0.156890 | 0.025 | 730 | 730 | 190 |
| Total MVP | | | 10,700 | 6,080 | 6,590 |
