A High-Performance Preconditioned SVD Solver for Accurately Computing Large-Scale Singular Value Problems in PRIMME
Lingfei Wu, Andreas Stathopoulos (Advisor), Eloy Romero (Advisor), College of William and Mary, Williamsburg, Virginia, USA

Introduction
• Experimental setup:
  – Compare PRIMME_SVDS against TRLBD, Krylov-Schur (KS), Generalized Davidson (GD), and Jacobi-Davidson (JD) in SLEPc
  – Seek a small number of extreme singular triplets, with and without preconditioning
  – Matrices and vectors distributed over a total of 1024 processes on Edison (NERSC) or 96 processes on SciClone (W&M)
• Our approach: a preconditioned, hybrid, two-stage SVD method built on top of the state-of-the-art eigensolver PRIMME
Experiments and Results
PRIMME_SVDS: A High-Performance SVD Solver
State-of-the-Art SVD Solvers
• The problem: find k extreme singular triplets of A ∈ ℝ^(m×n):
  A vᵢ = σᵢ uᵢ, i = 1, 2, …, k, with k ≪ n
• Iterative methods for computing the SVD:
  – Hermitian eigenvalue problem on C = AᵀA or C = AAᵀ
  – Hermitian eigenvalue problem on B = [0 Aᵀ; A 0]
  – Lanczos bidiagonalization method (LBD): A = P B_d Qᵀ with B_d = XΣYᵀ, so that U = PX and V = QY
• Comparison of different approaches
Fig. 3: Comparing different methods on relat9 and delaunay_n24
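The first two eigenvalue formulations above are easy to sketch with dense linear algebra. A minimal NumPy illustration (not PRIMME code; the random test matrix is purely for demonstration) recovers the singular values of A both from C = AᵀA and from the augmented matrix B:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 40
A = rng.standard_normal((m, n))

# Approach 1: Hermitian eigenproblem on C = A^T A (eigenvalues are sigma_i^2).
C = A.T @ A
w = np.linalg.eigvalsh(C)                      # ascending eigenvalues
sigma_C = np.sqrt(np.maximum(w, 0.0))[::-1]    # singular values, descending

# Approach 2: Hermitian eigenproblem on B = [0 A^T; A 0]
# (eigenvalues are +/- sigma_i plus m - n zeros).
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
wB = np.linalg.eigvalsh(B)
sigma_B = wB[::-1][:n]                         # the n largest are +sigma_i

sigma_ref = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sigma_C, sigma_ref), np.allclose(sigma_B, sigma_ref))
```

For a well-conditioned matrix both formulations agree with the reference SVD; the trade-offs listed in Fig. 1 only appear for the smallest singular values of ill-conditioned matrices.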
• Our goal: solve large-scale, sparse SVD problems with unprecedented efficiency, robustness, and accuracy
• SVD software for large-scale problems is scarce
• There is a clear need for high-quality SVD solver software that can:
  – Implement state-of-the-art methods with massive parallelism
  – Compute the smallest singular triplets effectively and accurately
  – Leverage preconditioning to deal with the growing size and difficulty of real-world problems
  – Offer a flexible interface suitable for both black-box and advanced usage
Software      Methods              Lang    MPI/SMP  Precond.  Extreme SVs  Fast Full Accuracy
PRIMME_SVDS   Multimethod          C       Both     Y         Y            Y
IRRHLB        N/A                  Matlab  N        N         Y            Y
IRLBA         N/A                  Matlab  N        N         Y            Y
JDSVD         N/A                  Matlab  N        Y         Y            Y
SVDIFP        N/A                  Matlab  N        Y         Y            N
SLEPc         Many                 C       MPI      Y         Y            N
PROPACK       LBD (classic)       F77     SMP      N         Y            N
SVDPACK       Lanczos (classic)   F77     N        N         Y            N
Application      DNA                   Least squares  Graph clustering  QCD
Matrix           relat9                cage15         delaunay_n24      Laplacian
Dimension        12,360,060 × 549,336  5,154,859      16,777,216        8,000 per proc
NNZ              38,955,420            99,199,551     100,663,202       55,760 per proc
Target           Smallest              Smallest       Largest           Largest
Preconditioning  N/A                   BoomerAMG      N/A               N/A
Eigenmethod on C = AᵀA:
• Fast for largest SVs; slow for smallest SVs
• Achieves accuracy of O(κ(A)‖A‖ε_mach)

Eigenmethod on B = [0 Aᵀ; A 0]:
• Slow for largest SVs; extremely slow for smallest SVs (an interior eigenvalue problem)
• Achieves accuracy of O(‖A‖ε_mach)

LBD on A:
• Fast for largest SVs; similar to C but exhibits irregular convergence for smallest SVs
• Achieves accuracy of O(‖A‖ε_mach)

Two-stage combination: Stage I runs GD+k on C with a dynamic tolerance; Stage II runs JDQMR on B with initial guesses from C and a refined projection.
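The accuracy gap between the C and B formulations is easy to reproduce. A small NumPy experiment (illustrative only; this synthetic ill-conditioned matrix is an assumption, not the poster's test case) compares the smallest singular value computed from eigenvalues of C versus B:

```python
import numpy as np

# Build a test matrix with known singular values spanning kappa(A) = 1e9.
rng = np.random.default_rng(0)
n = 12
svals = np.logspace(-9, 0, n)          # sigma_min = 1e-9, sigma_max = 1
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(svals) @ V.T

# Via C = A^T A: error is O(kappa(A) * ||A|| * eps), because
# sigma_min^2 = 1e-18 drowns in roundoff of order ||C|| * eps ~ 1e-16.
lam_min = np.linalg.eigvalsh(A.T @ A)[0]
sigma_from_C = np.sqrt(max(lam_min, 0.0))

# Via B = [0 A^T; A 0]: error is only O(||A|| * eps).
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((n, n))]])
sigma_from_B = np.abs(np.linalg.eigvalsh(B)).min()

err_C = abs(sigma_from_C - svals[0]) / svals[0]
err_B = abs(sigma_from_B - svals[0]) / svals[0]
print(err_C, err_B)    # err_C is typically O(1); err_B near machine precision
```

This is exactly why Stage II moves to B: Stage I on C converges fast but stalls at O(κ(A)‖A‖ε_mach), while B can deliver the remaining digits.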
PRIMME_SVDS: A Preconditioned Two-Stage SVD Method
• Results I: methods comparison (low accuracy)
[Plot: residual ‖Aᵀu − σ̃v‖ vs. number of matvecs for A = diag([1:10 1000:100:1e6]) with preconditioner A + rand(1,10000)*1e4; curves: GD+1 on B; GD+1 on C to its limit; GD+1 on C to ε‖A‖₂, then switch to GD+1 on B.]
Fig. 2: Hybrid, two-stage SVD method and an example
This work is supported by NSF under grants No. CCF-1218349 and ACI SI2-SSE-1440700, and by DOE under grant No. DE-FC02-12ER41890.
• Key components:
  – Stage I: fast convergence on matrix C to compute approximate eigenvalues and eigenvectors
  – Stage II: improve accuracy by exploiting the power of PRIMME and a specialized refined projection on matrix B
• Parallel implementation of PRIMME_SVDS:
  – User-defined data distribution among processes
  – User-defined parallel matrix-vector and preconditioning functions
  – User-provided global summation for dot products
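PRIMME_SVDS is matrix-free: the user supplies the matvec, preconditioner, and global sum. A serial SciPy analogue of that interface style (using `LinearOperator`, not PRIMME's actual C API) sketches how a solver sees only user-supplied products:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(1)
m, n = 500, 100
A = rng.standard_normal((m, n))

# Matrix-free operator for C = A^T A: the eigensolver only ever calls matvec,
# mirroring PRIMME_SVDS's user-defined matrix-vector product.
C = LinearOperator((n, n), matvec=lambda x: A.T @ (A @ x), dtype=A.dtype)

# Stage-I-style solve: the largest eigenpair of C gives sigma^2 and v.
lam, V = eigsh(C, k=1, which='LA')
sigma = np.sqrt(lam[0])
v = V[:, 0]
u = (A @ v) / sigma          # left singular vector from A v = sigma u

residual = np.linalg.norm(A.T @ u - sigma * v)
print(residual)              # small relative to sigma
```

In PRIMME_SVDS the same separation lets the library run unchanged on any distribution of A: the application owns the data layout and supplies the parallel matvec and an MPI reduction for dot products.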
Table 3: Dimension of the matrices and parameters used in the experiments
Fig. 1: Advantages and disadvantages of different approaches
Table 1: Functionalities of different SVD solvers
• Results II: scalability performance (high accuracy)
Fig. 4: Strong scaling: seek smallest on relat9 and cage15
Fig. 5: Strong and Weak scaling: seek largest on delaunay_n24 and Laplacian
• Highlights of PRIMME_SVDS:
  – Among the fastest and most robust production-level software for computing a small number of singular triplets
  – Exploits preconditioning for large-scale problems
  – Computes the smallest SVs efficiently to full accuracy
  – Demonstrates good scalability under both strong and weak scaling, even for highly sparse matrices
  – Free software, available at: https://github.com/primme/primme
[Plots for Figs. 4–5: run time (seconds) and parallel speedup vs. number of MPI processes (16–1024) when seeking the smallest SVs without and with preconditioning and the largest SVs without preconditioning; a weak-scaling panel (64–1000 processes, seeking largest without preconditioning) reports parallel efficiency between 0 and 1.]
Operations                                    Kernels or libs                                      Cost per iteration            Scalability
Dense algebra: MV, MM, inner products, scale  BLAS (e.g., MKL, ESSL, OpenBLAS, ACML)               O((m+n)·numSVs)               Good
Sparse algebra: SpMV, SpMM, preconditioner    User defined (e.g., PETSc, Trilinos, HYPRE, librsb)  O(1) calls                    Application dependent
Reduction                                     User defined (e.g., MPI_Allreduce)                   O(1) calls of size O(numSVs)  Machine dependent
Table 2: Parallel characteristics of PRIMME operations
[Plots for Fig. 3: speedup over TRLBD vs. number of MPI processes (20–100); left panel seeks the smallest SVs (PRIMME_SVDS, JD on C, KS on C, TRLBD), right panel the largest (PRIMME_SVDS, GD on C, KS on C, TRLBD), both without preconditioning.]