Algebraic Sub-structuring (Domain Decomposition) for Large-scale Electromagnetic Application
Weiguo Gao, Chao Yang, Sherry Li (Lawrence Berkeley National Laboratory)
Zhaojun Bai (University of California, Davis)
Joint work with Rich Lee, Kwok Ko (SLAC)
01/13/05 DD16 2
Outline
Algebraic Multi-Level Sub-structuring (AMLS)
Why does it work?
Implementation issues
Numerical results and performance
  Focus on electromagnetic application
  Compare with shift-and-invert Lanczos (SIL)
    Time, memory, accuracy
DOE SciDAC Projects
  Terascale Optimal PDE Simulations (David Keyes)
  Advanced Computing for 21st Century Accelerator Science and Technology (Kwok Ko, Rob Ryne)
Background
Generalized sparse eigenvalue problem $K x = \lambda M x$, with $K$ symmetric and $M$ symmetric positive definite (SPD)
  Need a large number of small nonzero eigenvalues
Sub-structuring dates back to the 1960s (component mode synthesis, CMS)
  Plenty of engineering literature
AMLS has recently been used successfully in structural engineering (Bennighof, Kaplan, Lehoucq)
  Compute vibration modes
  Perform frequency response analysis
Open questions remain about AMLS as a general-purpose solver
  Accuracy
  Performance
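As a small illustration of the problem statement, the sketch below solves $K x = \lambda M x$ densely by reducing it to a standard symmetric problem through a Cholesky factor of $M$. The test matrices (a 1D Laplacian for $K$ and a tridiagonal mass-like matrix for $M$) and the helper name `smallest_eigenpairs` are assumptions made for illustration only; they are not the accelerator matrices, and a dense solve is only feasible for tiny $n$.

```python
import numpy as np

def smallest_eigenpairs(K, M, nev):
    """Return the nev smallest eigenpairs of K x = lambda M x (dense, small n only)."""
    C = np.linalg.cholesky(M)                        # M = C C^T
    A = np.linalg.solve(C, np.linalg.solve(C, K).T)  # A = C^{-1} K C^{-T}
    lam, Y = np.linalg.eigh((A + A.T) / 2)           # standard symmetric problem
    X = np.linalg.solve(C.T, Y)                      # back-transform: x = C^{-T} y
    return lam[:nev], X[:, :nev]

# Hypothetical test pencil: K symmetric, M SPD
n = 8
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 6

lam, X = smallest_eigenpairs(K, M, nev=3)
print(lam)
```

The eigenvectors returned this way are $M$-orthonormal, which is the normalization used implicitly by the projection steps later in the talk.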
EM Application
Can we extend the success story from structural engineering to electromagnetic applications (accelerator cavity design)?
  Important for the design of the next-generation linear accelerator (SLAC)
Curl-curl formulation of Maxwell's equations
$$\nabla \times (\nabla \times E) - \lambda E = 0 \ \text{in } \Omega, \qquad n \times E = 0 \ \text{on } \Gamma_E, \qquad n \times (\nabla \times E) = 0 \ \text{on } \Gamma_B$$
Challenges of Eigenproblems in Accelerator Design
Large matrix size for realistic structures
  Tens of millions to hundreds of millions
Small, tightly clustered eigenvalues are wanted from a spectrum dominated by large eigenvalues
Many small nonzero eigenvalues desired
Large null space in the stiffness matrix
  Up to a quarter of the dimension
High accuracy required for the eigenpairs
To Answer the Question…
Look at the description of the algorithm to see if it is applicable
Analyze the approximation properties of the algorithm (error estimate)
Examine the complexity of the implementation
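Single Level Sub-structuring
1. Partitioning and reordering of $(K, M)$
[Figure: block structure of the reordered $K$ and $M$, with diagonal blocks $K_{11}, K_{22}, K_{33}$ and $M_{11}, M_{22}, M_{33}$]
2. Block Gaussian elimination (congruence transformation)
$$\hat{K} = L^{-1} K L^{-T}, \qquad \hat{M} = L^{-1} M L^{-T}$$
[Figure: block structure of $\hat{K}$ and $\hat{M}$ after elimination]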
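Steps 1 and 2 can be sketched on a toy problem. The example below assumes a chain of 7 nodes split into two sub-structures (blocks 1 and 2) and a single separator node (block 3), and builds the block elimination matrix $L$ so that $\hat{K} = L^{-1} K L^{-T}$ becomes block diagonal. The matrices, the permutation, and the helper name `congruence` are illustrative assumptions, not the data from the talk.

```python
import numpy as np

n1, n2, n3 = 3, 3, 1
n = n1 + n2 + n3
# Hypothetical pencil on a chain of 7 nodes
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B = (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 6
# Step 1: reorder as left sub-structure, right sub-structure, separator
perm = [0, 1, 2, 4, 5, 6, 3]
K = T[np.ix_(perm, perm)]
M = B[np.ix_(perm, perm)]
s1, s2, s3 = slice(0, n1), slice(n1, n1 + n2), slice(n1 + n2, n)

# Step 2: block Gaussian elimination of the coupling blocks of K
L = np.eye(n)
L[s3, s1] = np.linalg.solve(K[s1, s1], K[s1, s3]).T   # K31 K11^{-1}
L[s3, s2] = np.linalg.solve(K[s2, s2], K[s2, s3]).T   # K32 K22^{-1}

def congruence(L, A):
    """Return L^{-1} A L^{-T} for symmetric A."""
    return np.linalg.solve(L, np.linalg.solve(L, A).T)

K_hat = congruence(L, K)
M_hat = congruence(L, M)

# Schur complement that should appear as the (3,3) block of K_hat
schur = (K[s3, s3]
         - K[s3, s1] @ np.linalg.solve(K[s1, s1], K[s1, s3])
         - K[s3, s2] @ np.linalg.solve(K[s2, s2], K[s2, s3]))
print(np.round(K_hat, 12))
```

In this sketch $\hat{K}$ is block diagonal with the Schur complement in its last block, while $\hat{M}$ keeps $M_{11}$ and $M_{22}$ unchanged but generally retains nonzero coupling to the separator block.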
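Single Level (cont.)
3. Sub-structure calculation for a subset of the modes (mode selection)
$$K_{ii} v^{(i)} = \mu^{(i)} M_{ii} v^{(i)}, \quad i = 1, 2; \qquad \hat{K}_{33} v^{(3)} = \mu^{(3)} \hat{M}_{33} v^{(3)}$$
4. Subspace assembling
$$S = \begin{pmatrix} S_1 & & \\ & S_2 & \\ & & S_3 \end{pmatrix}, \qquad S_i = \left( v_1^{(i)} \; v_2^{(i)} \; \cdots \; v_{k_i}^{(i)} \right), \quad i = 1, 2, 3$$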
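Single Level (cont.)
5. Projection (Rayleigh-Ritz)
$$(S^T \hat{K} S)\, q = \theta\, (S^T \hat{M} S)\, q$$
6. Unravel
$$D = \operatorname{diag}(\theta_1, \theta_2, \ldots, \theta_m), \qquad Z = L^{-T} S Q_m$$
$$\left( Q_m = (q_1, q_2, \ldots, q_m), \quad \hat{K} = L^{-1} K L^{-T} \right)$$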
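The six single-level steps can be put together in a short dense sketch. It assumes the matrices are already reordered into two sub-structures followed by one separator block, and it uses a toy chain problem rather than an accelerator model; `gen_eigh` and `single_level` are hypothetical helper names introduced here. Because the Ritz values come from a Rayleigh-Ritz projection, each $\theta_j$ is an upper bound on the corresponding exact eigenvalue $\lambda_j$, which the example lets you check.

```python
import numpy as np

def gen_eigh(A, B):
    """Eigenpairs of A v = mu B v (B SPD), ascending, with V^T B V = I."""
    C = np.linalg.cholesky(B)                        # B = C C^T
    W = np.linalg.solve(C, np.linalg.solve(C, A).T)  # C^{-1} A C^{-T}
    mu, Y = np.linalg.eigh((W + W.T) / 2)
    return mu, np.linalg.solve(C.T, Y)               # v = C^{-T} y

def single_level(K, M, sizes, nmodes):
    """Single-level sub-structuring; returns Ritz values theta and vectors Z."""
    n1, n2, n3 = sizes
    n = n1 + n2 + n3
    blocks = [slice(0, n1), slice(n1, n1 + n2), slice(n1 + n2, n)]
    s1, s2, s3 = blocks
    # Step 2: block elimination and congruence transformation
    L = np.eye(n)
    L[s3, s1] = np.linalg.solve(K[s1, s1], K[s1, s3]).T
    L[s3, s2] = np.linalg.solve(K[s2, s2], K[s2, s3]).T
    Linv = np.linalg.inv(L)                          # fine for a toy problem
    K_hat, M_hat = Linv @ K @ Linv.T, Linv @ M @ Linv.T
    # Steps 3-4: keep the lowest modes of each diagonal block, assemble S
    S = np.zeros((n, sum(nmodes)))
    col = 0
    for blk, k in zip(blocks, nmodes):
        _, V = gen_eigh(K_hat[blk, blk], M_hat[blk, blk])
        S[blk, col:col + k] = V[:, :k]
        col += k
    # Step 5: Rayleigh-Ritz projection
    theta, Q = gen_eigh(S.T @ K_hat @ S, S.T @ M_hat @ S)
    # Step 6: unravel, Z = L^{-T} S Q
    return theta, Linv.T @ S @ Q

# Hypothetical chain of 21 nodes: two sub-structures of 10 nodes, 1 separator
n = 21
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B = (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 6
perm = list(range(10)) + list(range(11, 21)) + [10]
K, M = T[np.ix_(perm, perm)], B[np.ix_(perm, perm)]

lam, _ = gen_eigh(K, M)                              # exact eigenvalues
theta, Z = single_level(K, M, (10, 10, 1), (4, 4, 1))
print(np.column_stack([lam[:len(theta)], theta]))
```

Keeping all modes of every block reproduces the exact spectrum, so the accuracy of the method is governed entirely by which modes are truncated.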
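Algebraic Analysis
Canonical form
$$\hat{x} = \begin{pmatrix} V_1 & & \\ & V_2 & \\ & & V_3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$
In the basis $V = \operatorname{diag}(V_1, V_2, V_3)$ the stiffness matrix becomes $\operatorname{diag}(\Lambda_1, \Lambda_2, \Lambda_3)$, and the mass matrix has identity diagonal blocks $I$ with coupling blocks $M$ off the diagonal.
If the components of $y$ outside the selected modes are small (of size $\epsilon$), then $\hat{x} = V y \approx S u$, where $S_1, S_2, S_3$ hold the selected columns of $V_1, V_2, V_3$ and $u$ holds the corresponding components of $y$.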
Example 1 (BCS09)
[Figure: one plot]
Example 2 (Accelerator Model)
[Figure: two plots]
Error Bound
$$\theta_1 - \lambda_1 \le (\lambda_n - \lambda_1)\, \|h\|^2, \qquad \sin \angle_{\hat{M}}(\hat{x}_1, u_h) \le \sqrt{\frac{\lambda_n - \lambda_1}{\lambda_2 - \lambda_1}}\; \|h\|$$
$h$ measures the size of the “truncated” components of $y$
Related work
  Bourquin et al. (CMS analysis)
  Bekas & Saad (spectral Schur complement)
  Elssel & Voss (minmax theory for rational eigenvalue problems)
Multilevel Algorithm (AMLS)
1. Matrix partitioning and reordering using nested dissection
2. Block elimination and congruence transformation
3. Mode selection for sub-structures and separators
4. Subspace assembling
5. Projection calculation
6. Eigenvalues of the projected problem
[Figure: nested dissection partition, with a separator and a sub-structure labeled]
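To make step 1 concrete, here is a minimal sketch of nested dissection on the simplest possible graph, a chain of vertices, where the middle vertex is the separator at every level. The function name `nested_dissection_chain` and the chain graph are assumptions for illustration; practical codes rely on a general graph partitioner for this step.

```python
def nested_dissection_chain(lo, hi, levels):
    """Order vertices lo..hi-1 of a path graph: left part, right part, separator last."""
    if levels == 0 or hi - lo < 3:
        return list(range(lo, hi))          # leaf sub-structure, natural order
    mid = (lo + hi) // 2                    # single-vertex separator
    left = nested_dissection_chain(lo, mid, levels - 1)
    right = nested_dissection_chain(mid + 1, hi, levels - 1)
    return left + right + [mid]             # separators are numbered last

order = nested_dissection_chain(0, 7, levels=2)
print(order)  # [0, 2, 1, 4, 6, 5, 3]
```

With this ordering the two halves of the chain share no edge, so the reordered matrices have zero coupling blocks between sub-structures and only couple through the separator.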
Implementation
Major operations:
  Transformations and projection
    Steps 2-5 can be interleaved
  Eigenpairs of the projected problem
Cost:
  Flops: more than a single sparse Cholesky factorization
  Storage: block Cholesky factor + projected matrix + other auxiliary storage
  No triangular solves, no orthogonalization
Task Dependency (Greedy Algorithm)
[Figure: task dependency tree; visible node labels are 1, 2, 3, 7, 14, 15]
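The visible node labels are consistent with a postorder numbering of a complete binary separator tree with 15 nodes (leaves 1 and 2 under node 3, nodes 7 and 14 under the root 15). Assuming that reading, the sketch below generates such a numbering; `postorder_tree` is a hypothetical helper, and the code only illustrates the dependency structure, not the scheduling strategy used in the actual implementation.

```python
def postorder_tree(depth):
    """Number a complete binary tree in postorder.

    Returns (root, children), where children[node] is a (left, right)
    tuple for separators and None for leaf sub-structures.
    """
    children = {}
    counter = 0

    def visit(d):
        nonlocal counter
        kids = None if d == 0 else (visit(d - 1), visit(d - 1))
        counter += 1                 # a node is numbered after its subtrees
        children[counter] = kids
        return counter

    root = visit(depth)
    return root, children

root, children = postorder_tree(3)
for node in sorted(children):
    print(node, children[node])
```

Every node has a larger number than both of its children, so processing the nodes in increasing order is one valid schedule: a separator is handled only after the sub-structures beneath it are finished.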
Bottom Level
Eliminator: $L_{i1}$
Modes: $K_{11} V_1 = M_{11} V_1 \Lambda_1$
Eliminate $K$: one-sided update
Congruence transformation on $M$: two-sided update
Store the half-projected $\hat{M}_{i,1}$ (dashed box)
Higher Level
Additional updates of previously “half-projected” columns (dashed box)
Completion of some blocks (red box)
Example: Accelerator Model
A 6-cell DDS structure
  N = 65K, NNZ = 1455772, nev = 100
All the coupling modes are selected
SIL took 407 sec (ARPACK + sparse LDLT)
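For readers unfamiliar with the SIL baseline, the following sketch runs a shift-and-invert Lanczos solve through SciPy's `eigsh`, which wraps ARPACK. The tridiagonal test pencil and the shift `sigma = 0.0` are assumptions for illustration; SciPy factorizes $K - \sigma M$ internally with its own sparse direct solver, which is not necessarily the sparse $LDL^T$ factorization used for the timing quoted above.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Hypothetical sparse pencil (not the DDS model): K SPD, M SPD
n = 200
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
M = sp.diags([1 / 6, 4 / 6, 1 / 6], [-1, 0, 1], shape=(n, n), format="csc")

# Shift-and-invert mode: eigenvalues of K x = lambda M x closest to sigma
sigma = 0.0
theta, X = eigsh(K, k=5, M=M, sigma=sigma, which="LM")

# Sort the computed pairs in ascending order
order = np.argsort(theta)
theta, X = theta[order], X[:, order]
print(theta)
```

Each shift requires one factorization of $K - \sigma M$, and every Lanczos step requires triangular solves with the factors; when the wanted eigenvalues span a wide interval, several shifts (and factorizations) are needed.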
When Is AMLS Faster?
Many eigenvalues are wanted (up to ~8% in this case)
SIL requires multiple shifts (factorizations)
Memory Profile
Save up to 50% memory with 13% re-compute time
SIL needs ~308 Mbytes of memory
Accuracy Compared with SIL
Levels=5, increasing nmodes of sub-structures
Concluding Remarks
EM in accelerator simulation is a truly challenging engineering problem
Better understanding of accuracy
AMLS software to be released
  General-purpose, memory efficient
  Application-tuned: null space handling
Performance advantage shows up when:
  The problem is large enough
  A large number of eigenpairs are needed
Many tuning parameters
  Number of levels, number of modes, tolerances