Efficient computation of sparse matrix functions for large scale electronic structure calculations: The CheSS library
Stephan Mohr, William Dawson, Michael Wagner, Damien Caliste, Takahito Nakajima, Luigi Genovese
Barcelona Supercomputing Center
ELSI Connector Meeting 2017
location, 16th August 2017
Motivation Applicability Theory Sparsity Accuracy Scaling with matrix properties Parallel scaling Comparison
Outline
History and motivation for CheSS
Applicability of CheSS
Short overview of the theory behind CheSS
Various performance data
Motivation for CheSS
CheSS is a “spin-off” of the linear scaling version of BigDFT.
localized basis set leads to sparse matrices
we have to exploit this sparsity to reach linear scaling
we did not find a package that fit our needs
⇒ we created our own sparse matrix routines within BigDFT
The same situation arises in all DFT codes with a localized basis set ⇒ we created a standalone library: CheSS
CheSS can be obtained for free from https://launchpad.net/chess
A paper is in review: https://arxiv.org/abs/1704.00512
Applicability of CheSS
CheSS performs best for matrices exhibiting a small spectral width.
Can this be obtained in practice?
                          S                                H
system          #atoms    sparsity  εmin  εmax  κ      sparsity  εmin    εmax   λ      ∆HL
DNA             15613     99.57%    0.72  1.65  2.29   98.46%    -29.58  19.67  49.25  2.76
bulk pentacene  6876      98.96%    0.78  1.77  2.26   97.11%    -21.83  20.47  42.30  1.03
perovskite      768       90.34%    0.70  1.50  2.15   76.47%    -20.41  26.85  47.25  2.19
Si nanowire     706       93.24%    0.72  1.54  2.16   81.61%    -16.03  25.50  41.54  2.29
water           1800      96.71%    0.83  1.30  1.57   90.06%    -26.55  11.71  38.26  9.95
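The derived columns of the table can be checked with a few lines of arithmetic. The sketch below assumes that κ is the ratio εmax/εmin of S (its condition number) and that λ is the difference εmax − εmin of H (its spectral width); the slide does not state these definitions explicitly, but the tabulated values agree with them up to rounding of the inputs.

```python
# (eps_min, eps_max) of S and of H, copied from the table above.
spectra = {
    "DNA":            ((0.72, 1.65), (-29.58, 19.67)),
    "bulk pentacene": ((0.78, 1.77), (-21.83, 20.47)),
    "perovskite":     ((0.70, 1.50), (-20.41, 26.85)),
    "Si nanowire":    ((0.72, 1.54), (-16.03, 25.50)),
    "water":          ((0.83, 1.30), (-26.55, 11.71)),
}

derived = {}
for system, ((s_min, s_max), (h_min, h_max)) in spectra.items():
    kappa = s_max / s_min   # assumed definition: condition number of S
    lam = h_max - h_min     # assumed definition: spectral width of H
    derived[system] = (kappa, lam)
    print(f"{system:15s} kappa = {kappa:5.2f}   lambda = {lam:6.2f}")
```

In all five systems κ stays close to 2 and λ stays below 50, which is the regime the slide describes as favorable.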
Basic idea
In CheSS we approximate matrix functions by Chebyshev polynomials:
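\[
p(M) = \frac{c_0}{2} I + \sum_{i=1}^{n_{pl}} c_i T_i(\tilde{M}) ,
\]
with
\[
\tilde{M} = \sigma (M - \tau I) ; \quad \sigma = \frac{2}{\varepsilon_{max} - \varepsilon_{min}} ; \quad \tau = \frac{\varepsilon_{min} + \varepsilon_{max}}{2}
\]
and
\[
c_j = \frac{2}{n_{pl}} \sum_{k=0}^{n_{pl}-1} f\left[ \frac{1}{\sigma} \cos\left( \frac{\pi (k + \frac{1}{2})}{n_{pl}} \right) + \tau \right] \cos\left( \frac{\pi j (k + \frac{1}{2})}{n_{pl}} \right) .
\]
Recursion relation for the Chebyshev polynomials:
\[
T_0(\tilde{M}) = I , \quad T_1(\tilde{M}) = \tilde{M} , \quad T_{j+1}(\tilde{M}) = 2 \tilde{M} T_j(\tilde{M}) - T_{j-1}(\tilde{M}) .
\]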
Each column is independent ⇒ easily parallelizable.
Strict sparsity pattern ⇒ linear scaling
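The expansion and the recursion on this slide can be sketched in a few lines of dense NumPy. This is only an illustration of the formulas, not the CheSS implementation: the function names, the test matrix and the eigenvalue bounds are made up for the example, and no sparsity is exploited.

```python
import numpy as np

def chebyshev_coefficients(f, eps_min, eps_max, npl):
    """Coefficients c_0 .. c_npl of the Chebyshev fit of f on [eps_min, eps_max]."""
    sigma = 2.0 / (eps_max - eps_min)
    tau = 0.5 * (eps_min + eps_max)
    theta = np.pi * (np.arange(npl) + 0.5) / npl
    x = np.cos(theta) / sigma + tau          # Chebyshev nodes mapped back to the spectrum
    j = np.arange(npl + 1)
    return (2.0 / npl) * np.cos(np.outer(j, theta)) @ f(x)

def chebyshev_matrix_function(f, M, eps_min, eps_max, npl):
    """p(M) = c_0/2 I + sum_{i=1}^{npl} c_i T_i(M~), built with the three-term recursion."""
    sigma = 2.0 / (eps_max - eps_min)
    tau = 0.5 * (eps_min + eps_max)
    c = chebyshev_coefficients(f, eps_min, eps_max, npl)
    I = np.eye(M.shape[0])
    Mt = sigma * (M - tau * I)               # scaled matrix with spectrum inside [-1, 1]
    T_prev, T_curr = I, Mt
    result = 0.5 * c[0] * I + c[1] * Mt
    for j in range(2, npl + 1):
        T_prev, T_curr = T_curr, 2.0 * Mt @ T_curr - T_prev
        result += c[j] * T_curr
    return result

# Hypothetical test matrix: symmetric, eigenvalues between 0.72 and 1.65.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
S = Q @ np.diag(np.linspace(0.72, 1.65, 50)) @ Q.T

# Inverse as the matrix function f(x) = 1/x, with slightly loose eigenvalue bounds.
S_inv = chebyshev_matrix_function(lambda x: 1.0 / x, S, 0.7, 1.7, npl=40)
print(np.max(np.abs(S_inv @ S - np.eye(50))))
```

For such a narrow spectrum a polynomial degree of 40 is already far more than needed; the later slides show how the required degree grows with the spectral width.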
Available functions
CheSS can calculate those matrix functions needed for DFT:
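density matrix: \( f(x) = \frac{1}{1 + e^{\beta (x - \mu)}} \) (or \( f(x) = \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \beta (\varepsilon - x) \right) \right] \))
energy density matrix: \( f(x) = \frac{x}{1 + e^{\beta (x - \mu)}} \) (or \( f(x) = \frac{x}{2} \left[ 1 - \mathrm{erf}\left( \beta (\varepsilon - x) \right) \right] \))
matrix powers: \( f(x) = x^a \) (a can be non-integer!)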
We can calculate arbitrary functions by changing only the coefficients cj !
Only requirement: the function f must be well representable by Chebyshev polynomials over the entire eigenvalue spectrum.
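That only the coefficients depend on f can be illustrated with a small dense sketch. It uses NumPy's `chebinterpolate` to fit the coefficients (whose constant term is already the full c_0/2 of the slide's convention) and then applies the same matrix recursion for two different functions. The matrices, β, μ and the degrees are hypothetical values chosen for the example, and the code is not the CheSS interface.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def matrix_function(f, M, eps_min, eps_max, degree):
    """Apply f to the symmetric matrix M via a Chebyshev expansion (degree >= 1)."""
    sigma = 2.0 / (eps_max - eps_min)
    tau = 0.5 * (eps_min + eps_max)
    # The only place where f enters: fit on the scaled interval [-1, 1].
    coeffs = C.chebinterpolate(lambda t: f(t / sigma + tau), degree)
    I = np.eye(M.shape[0])
    Mt = sigma * (M - tau * I)
    T_prev, T_curr = I, Mt
    result = coeffs[0] * I + coeffs[1] * Mt
    for c in coeffs[2:]:
        T_prev, T_curr = T_curr, 2.0 * Mt @ T_curr - T_prev
        result += c * T_curr
    return result

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
H = Q @ np.diag(np.linspace(-1.0, 1.0, 40)) @ Q.T    # toy "Hamiltonian"
S = Q @ np.diag(np.linspace(0.72, 1.65, 40)) @ Q.T   # toy "overlap matrix"

beta, mu = 10.0, 0.0                                 # hypothetical smearing and chemical potential
fermi = lambda x: 1.0 / (1.0 + np.exp(beta * (x - mu)))

K = matrix_function(fermi, H, -1.0, 1.0, degree=100)                       # density matrix
S_inv_sqrt = matrix_function(lambda x: x ** -0.5, S, 0.7, 1.7, degree=40)  # non-integer power

# Reference density matrix from a full diagonalization.
w, V = np.linalg.eigh(H)
K_ref = V @ np.diag(fermi(w)) @ V.T
print(np.max(np.abs(K - K_ref)))
```

The Fermi function needs a noticeably higher degree than the matrix power here because it varies sharply around μ, which is the situation addressed on the later slides about the HOMO-LUMO gap.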
Sparsity and truncation
CheSS works with predefined sparsity patterns.
In general there are three:
pattern for the original matrix M
pattern for the matrix function f(M)
auxiliary pattern to perform the matrix-vector multiplications
At the moment all of them must be defined by the user. Typically they are based on the distances between atoms / basis functions.
[Figure, four panels:]
1 original matrix M
2 exact calculation of M^-1 without sparsity constraints
3 sparse calculation of M^-1 using CheSS within the sparsity pattern
4 difference between Fig. 2 and Fig. 3
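A toy version of such a distance-based pattern, and of a Chebyshev recursion that is cropped to a fixed pattern after every multiplication, might look as follows. Everything here is an assumption made for illustration: a 1D chain of centers, the two cutoffs, the model matrix, and the use of dense arrays with boolean masks in place of real sparse storage. How CheSS stores and applies its patterns is not described on the slide.

```python
import numpy as np

n = 60
positions = np.arange(n, dtype=float)            # hypothetical 1D chain of basis-function centers
dist = np.abs(positions[:, None] - positions[None, :])

pattern_M = dist <= 2.0                          # pattern of the original matrix
pattern_f = dist <= 12.0                         # larger pattern allowed for f(M)

# Model matrix: unit diagonal, exponentially decaying couplings inside pattern_M.
M = np.where(pattern_M, 0.2 * np.exp(-dist), 0.0)
np.fill_diagonal(M, 1.0)

# Gershgorin circles give safe eigenvalue bounds for this matrix.
radius = np.max(np.sum(np.abs(M), axis=1) - np.diag(M))
eps_min, eps_max = 1.0 - radius, 1.0 + radius

def truncated_inverse(M, pattern, eps_min, eps_max, npl):
    """Chebyshev expansion of 1/x where every new T_j is cropped to `pattern`."""
    sigma = 2.0 / (eps_max - eps_min)
    tau = 0.5 * (eps_min + eps_max)
    theta = np.pi * (np.arange(npl) + 0.5) / npl
    nodes = np.cos(theta) / sigma + tau
    c = (2.0 / npl) * np.cos(np.outer(np.arange(npl + 1), theta)) @ (1.0 / nodes)
    I = np.eye(M.shape[0])
    Mt = sigma * (M - tau * I)
    T_prev, T_curr = I, Mt
    result = 0.5 * c[0] * I + c[1] * Mt
    for j in range(2, npl + 1):
        T_next = np.where(pattern, 2.0 * Mt @ T_curr - T_prev, 0.0)  # enforce the pattern
        T_prev, T_curr = T_curr, T_next
        result += c[j] * T_curr
    return result

approx = truncated_inverse(M, pattern_f, eps_min, eps_max, npl=30)
exact_cropped = np.where(pattern_f, np.linalg.inv(M), 0.0)
max_err = np.max(np.abs(approx - exact_cropped))
print(max_err)
```

The comparison at the end corresponds to panel 4 of the figure: the difference between the exact inverse and the result obtained within the sparsity pattern.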
Accuracy – error definition
There are two possible factors affecting the accuracy of CheSS:
error introduced due to the enforced sparsity (truncating the matrix-vector multiplications)
error introduced by the Chebyshev fit
This also affects the definition of the “exact solution”. Two possibilities:
1 calculate the solution exactly and without sparsity constraints, and then crop it to the sparsity pattern. Shortcoming: this in general violates the identity \( f^{-1}(f(M)) = M \)
2 calculate the solution within the sparsity pattern, and define as exact the one that fulfills \( \hat{f}^{-1}(\hat{f}(M)) = M \)
Corresponding error definitions:
1 \[ w_{\hat{f}_{sparse}} = \frac{1}{|\hat{f}(M)|} \sqrt{ \sum_{(\alpha\beta) \in \hat{f}(M)} \left( \hat{f}(M)_{\alpha\beta} - f(M)_{\alpha\beta} \right)^2 } \]
2 \[ w_{\hat{f}^{-1}(\hat{f})} = \frac{1}{|\hat{f}(M)|} \sqrt{ \sum_{(\alpha\beta) \in \hat{f}(M)} \left( \hat{f}^{-1}(\hat{f}(M))_{\alpha\beta} - M_{\alpha\beta} \right)^2 } \]
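Both error measures have the same shape: a root of summed squared differences over the entries of the sparsity pattern, divided by |f̂(M)|. The sketch below assumes that |f̂(M)| denotes the number of entries in the pattern of f̂(M), which the slide does not spell out; the helper name and the 2x2 example are made up.

```python
import numpy as np

def pattern_error(A, B, pattern):
    """sqrt(sum over the pattern of (A - B)^2) / (number of pattern entries, assumed)."""
    diff = (A - B)[pattern]
    return np.sqrt(np.sum(diff ** 2)) / np.count_nonzero(pattern)

# w_sparse:  pattern_error(f_hat_M, f_M_exact, pattern)
# w_inverse: pattern_error(f_hat_inv_of_f_hat_M, M, pattern)

# Tiny worked example with a diagonal pattern.
pattern = np.eye(2, dtype=bool)
f_hat = np.array([[1.0, 0.0], [0.0, 1.0]])
f_exact = np.array([[1.0, 0.5], [0.5, 3.0]])
w_sparse = pattern_error(f_hat, f_exact, pattern)
print(w_sparse)
```

Only the two diagonal entries enter: the differences are 0 and -2, so the root is 2 and the result is 2 / 2 = 1.0. The off-diagonal values of 0.5 lie outside the pattern and are ignored.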
Accuracy
Inverse:
\( w_{\hat{f}^{-1}(\hat{f})} \): error due to the Chebyshev fit, basically zero
\( w_{\hat{f}_{sparse}} \): error due to the sparsity pattern, very small
[Figure: mean relative error (log scale, 10^-12 to 10^-6) versus condition number (10 to 1000) for \( w_{\hat{f}^{-1}(\hat{f})} \) and \( w_{\hat{f}_{sparse}} \)]
Density matrix:
energy (i.e. Tr(KH)): relative error of only 0.01%
slightly larger error for small spectral widths: the eigenvalues are denser, so the finite-temperature smearing has a larger effect
[Figure: relative error in % (0.000 to 0.025) versus gap (eV, 0.001 to 1) for spectral widths of 50.0 eV, 100.0 eV and 150.0 eV]
Scaling with matrix size and sparsity
Series of matrices with the same “degree of sparsity” (DFT calculations of water droplets of various sizes).
Example: calculation of the inverse
Runtime only depends on the number of non-zero elements of M
no dependence on the total matrix size
[Figure: runtime (seconds, 0 to 120) versus number of non-zero elements in the original matrix (5x10^6 to 2x10^7) for matrix sizes 6000, 12000, 18000, 24000, 30000 and 36000]
Scaling with spectral properties
CheSS is extremely sensitive to the eigenvalue spectrum:
required polynomial degree strongly increases with the spectral width
as a consequence the runtime strongly increases as well
a good input guess for the eigenvalue bounds helps a lot
Example: calculation of the inverse
[Figure: runtime (seconds) and polynomial degree npl versus condition number (10 to 1000), each for "bounds default" and "bounds adjusted"]
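The growth of the polynomial degree with the condition number can be reproduced on the scalar level. The sketch below searches for the smallest degree at which a Chebyshev interpolant of 1/x on [1, κ] reaches a given accuracy. The tolerance, the test grid and the choice εmin = 1 are arbitrary assumptions, and the stopping criterion is not the one used by CheSS.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def required_degree(kappa, tol=1e-6, max_degree=2000):
    """Smallest degree for which the fit of 1/x on [1, kappa] has max error below tol."""
    eps_min, eps_max = 1.0, float(kappa)
    sigma = 2.0 / (eps_max - eps_min)
    tau = 0.5 * (eps_min + eps_max)
    f_scaled = lambda t: 1.0 / (t / sigma + tau)   # 1/x expressed on [-1, 1]
    grid = np.linspace(-1.0, 1.0, 2001)
    exact = f_scaled(grid)
    for degree in range(1, max_degree + 1):
        coeffs = C.chebinterpolate(f_scaled, degree)
        if np.max(np.abs(C.chebval(grid, coeffs) - exact)) < tol:
            return degree
    return None                                     # tolerance not reached

degrees = {kappa: required_degree(kappa) for kappa in (10, 100, 1000)}
for kappa, degree in degrees.items():
    print(f"condition number {kappa:5d}: degree {degree}")
```

The same function also shows why good eigenvalue bounds matter: calling it with bounds that are wider than the true spectrum is equivalent to a larger κ and therefore a higher degree.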
Scaling with spectral properties
For the density matrix the performance depends on two parameters:
spectral width (the smaller the better)
HOMO-LUMO gap (the larger the better)
For a large spectral width or a small gap the polynomial degree can increase considerably ⇒ CheSS becomes less efficient.
[Figure: runtime (seconds) and polynomial degree npl versus HOMO-LUMO gap (eV, 0.001 to 1) for spectral widths εmax−εmin of 50.0 eV, 100.0 eV and 150.0 eV]
Parallel scaling
The most compute-intensive part of CheSS is the matrix-vector multiplications.
Easily parallelizable
identical for all operations
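Why this part parallelizes easily can be seen from a column-wise formulation: each column of p(M) follows from matrix-vector products applied to one unit vector, with no communication between columns. The sketch below distributes the columns over a thread pool. It is a dense toy example with made-up names and a made-up test matrix; the actual MPI/OpenMP distribution in CheSS is not described on the slide.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def inverse_coefficients(eps_min, eps_max, npl):
    """Chebyshev coefficients of f(x) = 1/x on [eps_min, eps_max]."""
    sigma = 2.0 / (eps_max - eps_min)
    tau = 0.5 * (eps_min + eps_max)
    theta = np.pi * (np.arange(npl) + 0.5) / npl
    nodes = np.cos(theta) / sigma + tau
    return (2.0 / npl) * np.cos(np.outer(np.arange(npl + 1), theta)) @ (1.0 / nodes)

def apply_to_column(c, Mt, col):
    """Column `col` of p(M), using only matrix-vector products."""
    t_prev = np.zeros(Mt.shape[0])
    t_prev[col] = 1.0                      # T_0 applied to the unit vector
    t_curr = Mt @ t_prev                   # T_1 applied to the unit vector
    out = 0.5 * c[0] * t_prev + c[1] * t_curr
    for cj in c[2:]:
        t_prev, t_curr = t_curr, 2.0 * (Mt @ t_curr) - t_prev
        out += cj * t_curr
    return out

rng = np.random.default_rng(2)
n = 48
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q @ np.diag(np.linspace(0.72, 1.65, n)) @ Q.T

eps_min, eps_max = 0.7, 1.7
c = inverse_coefficients(eps_min, eps_max, npl=40)
Mt = 2.0 / (eps_max - eps_min) * (M - 0.5 * (eps_min + eps_max) * np.eye(n))

# Every column is an independent task.
with ThreadPoolExecutor(max_workers=4) as pool:
    columns = list(pool.map(lambda col: apply_to_column(c, Mt, col), range(n)))
M_inv = np.column_stack(columns)
print(np.max(np.abs(M_inv @ M - np.eye(n))))
```

Because the loop body is the same for any function f, the same distribution of work applies to all operations, and only the coefficients change.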
Example: Calculation of M−1 (runs performed with 16 OpenMP threads)
[Figure, two panels: speedup versus number of cores (500 to 2500) and runtime (seconds) versus number of cores (128 to 2048), for matrix sizes 12000, 24000 and 36000, each compared with ideal scaling]
Extreme scaling
We have also performed extreme-scaling tests from 1536 to 16384 cores.
Example: Calculation of M−1 (runs performed with 8 OpenMP threads)
[Figure: speedup versus number of cores (2000 to 16000) for matrix size 96000, compared with ideal scaling]
Given the small matrix size (96,000), the obtained scaling is acceptable.
We will try to further improve the scaling.
Comparison with other methods: Inverse
Comparison of the matrix inversion between:
CheSS
SelInv
ScaLAPACK
LAPACK
[Figure, five panels: runtime (seconds, log scale) versus condition number κ (2 to 128) for CheSS, SelInv, ScaLAPACK and LAPACK]
matrix size 6000: sparsity S 97.95 %, sparsity S^-1 88.45 %
matrix size 12000: sparsity S 98.93 %, sparsity S^-1 93.73 %
matrix size 18000: sparsity S 99.27 %, sparsity S^-1 95.65 %
matrix size 24000: sparsity S 99.45 %, sparsity S^-1 96.66 %
matrix size 30000: sparsity S 99.56 %, sparsity S^-1 97.28 %
Comparison with other methods: Density matrix
Comparison of the density matrix calculation between:
CheSS hybrid MPI/OpenMP (160 MPI x 12 OMP)
PEXSI hybrid MPI/OpenMP (160 MPI x 12 OMP)
CheSS MPI-only (1920 MPI x 1 OMP)
PEXSI MPI-only (1920 MPI x 1 OMP)
[Figure, five panels: runtime (seconds, log scale) versus spectral width (eV, 50 to 200)]
matrix size 6000: sparsity S 97.95 %, sparsity H 92.97 %, sparsity K 88.45 %
matrix size 12000: sparsity S 98.93 %, sparsity H 96.25 %, sparsity K 93.73 %
matrix size 18000: sparsity S 99.27 %, sparsity H 97.42 %, sparsity K 95.65 %
matrix size 24000: sparsity S 99.27 %, sparsity H 97.42 %, sparsity K 95.65 %
matrix size 30000: sparsity S 99.27 %, sparsity H 97.42 %, sparsity K 95.65 %
Conclusions
CheSS is a flexible tool to calculate matrix functions for DFT
can easily be extended to further functions
exploits sparsity of the matrices, linear scaling possible
works best for small spectral widths of the matrices
very good parallel scaling (both MPI and OpenMP)
Most important: CheSS is about to be interfaced by ELSI!