UNEDF 2011 ANNUAL/FINAL MEETING
Progress report on the BIGSTICK configuration-interaction code
Calvin Johnson1
Erich Ormand2
Plamen Krastev1,2,3
1San Diego State University, 2Lawrence Livermore Lab, 3Harvard University
Supported by DOE Grants DE-FG02-96ER40985, DE-FC02-09ER41587, and DE-AC52-07NA27344
We have good news and bad news...
....both the same thing....
....the postdoc (Plamen Krastev) got a permanent staff position in scientific computing at Harvard.
BIGSTICK:
General purpose M-scheme configuration interaction (CI) code
On-the-fly calculation of the many-body Hamiltonian
Fortran 90, MPI and OpenMP
35,000+ lines in 30+ files and 200+ subroutines
Faster set-up
Faster Hamiltonian application
Rewritten for “easy” parallelization
New parallelization scheme
REDSTICK → BIGSTICK
BIGSTICK:
Flexible truncation scheme: handles ‘no core’ ab initio Nhw truncation, valence-shell (sd & pf shell) orbital truncation; np-nh truncations; and more.
Applied to ab initio calculations, valence shell calculations (in particular level densities, random interaction studies, and benchmarking projected HF), cold atoms, and electronic structure of atoms (benchmarking RPA and HF for atoms).
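As a small illustration of what an M-scheme basis is, the sketch below counts Slater determinants of identical particles with a fixed total M by brute-force enumeration. It is not how BIGSTICK builds its basis; the function names and the choice to store 2j and 2m as integers are assumptions made for this example only.

```python
from itertools import combinations


def single_particle_states(two_j_list):
    """List 2*m for every single-particle state of the given orbits (each orbit given as 2*j)."""
    states = []
    for two_j in two_j_list:
        # m runs from -j to +j in integer steps, so 2*m runs in steps of 2
        states.extend(range(-two_j, two_j + 1, 2))
    return states


def m_scheme_dimension(two_j_list, n_particles, two_m_total=0):
    """Count antisymmetric occupations of n identical particles whose 2*m values sum to two_m_total."""
    states = single_particle_states(two_j_list)
    count = 0
    # Each combination of distinct state indices is one Slater determinant
    for occupied in combinations(range(len(states)), n_particles):
        if sum(states[i] for i in occupied) == two_m_total:
            count += 1
    return count
```

For example, `m_scheme_dimension([5, 1, 3], 2)` counts the M = 0 states of two identical particles in the sd shell (orbits d5/2, s1/2, d3/2). Brute-force enumeration like this is only practical for very small spaces.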
Version 6.5 is available at NERSC: unedf/lcci/BIGSTICK/v650/
BIGSTICK uses a factorization algorithm that reduces storage of Hamiltonian arrays
Nuclide  Space    Basis dim  Matrix store  Factorization
56Fe     pf       501 M      290 Gb        0.72 Gb
7Li      Nmax=12  252 M      3600 Gb       96 Gb
7Li      Nmax=14  1200 M     23 Tb         624 Gb
12C      Nmax=6   32 M       196 Gb        3.3 Gb
12C      Nmax=8   590 M      5000 Gb       65 Gb
12C      Nmax=10  7800 M     111 Tb        1.4 Tb
16O      Nmax=6   26 M       142 Gb        3.0 Gb
16O      Nmax=8   990 M      9700 Gb       130 Gb
Comparison of nonzero matrix storage with factorization
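The storage savings in the table can be motivated with a toy model. The sketch below assumes a basis that is a simple product of proton and neutron states and a Hamiltonian of the form Hp ⊗ 1 + 1 ⊗ Hn, and applies it without ever building the full matrix. This only illustrates the spectator idea behind factorization; it omits the proton-neutron interaction and the quantum-number bookkeeping a real code needs, and all names are hypothetical.

```python
def apply_factorized(hp, hn, v):
    """Apply (Hp x 1 + 1 x Hn) to v, storing only the small proton and neutron matrices."""
    n_p, n_n = len(hp), len(hn)
    w = [0.0] * (n_p * n_n)
    # Proton part: the neutron index b is a spectator and is simply looped over
    for a in range(n_p):
        for a2 in range(n_p):
            h = hp[a][a2]
            if h == 0.0:
                continue
            for b in range(n_n):
                w[a * n_n + b] += h * v[a2 * n_n + b]
    # Neutron part: the proton index a is a spectator
    for b in range(n_n):
        for b2 in range(n_n):
            h = hn[b][b2]
            if h == 0.0:
                continue
            for a in range(n_p):
                w[a * n_n + b] += h * v[a * n_n + b2]
    return w


def build_full(hp, hn):
    """Build the explicit full matrix for comparison (this is what factorization avoids storing)."""
    n_p, n_n = len(hp), len(hn)
    dim = n_p * n_n
    full = [[0.0] * dim for _ in range(dim)]
    for a in range(n_p):
        for b in range(n_n):
            i = a * n_n + b
            for a2 in range(n_p):
                full[i][a2 * n_n + b] += hp[a][a2]
            for b2 in range(n_n):
                full[i][a * n_n + b2] += hn[b][b2]
    return full


def matvec(full, v):
    """Plain dense matrix-vector product."""
    return [sum(h * x for h, x in zip(row, v)) for row in full]
```

In this toy model the factorized form stores n_p² + n_n² numbers while the explicit matrix has (n_p · n_n)² entries, so the gap widens quickly as the proton and neutron spaces grow.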
TRIUMF – Feb 2011
BIGSTICK:
Micah Schuster, Physics MS project
Joshua Staker, Physics MS project
Major accomplishment as of last year: excellent scaling of mat-vec multiply
This demonstrates that our factorization algorithm, as predicted, facilitates efficient distribution of mat-vec ops
Major accomplishments after last UNEDF meeting:
Rebalanced workload with additional constraint for dimension of local Lanczos vectors (Krastev)
Fully distributed Lanczos vectors with hermiticity on (Krastev)
Major steps towards distributing Lanczos vectors with suppressed hermiticity (Krastev)
OpenMP implementations in matrix-vector multiply (Ormand & Johnson)
Significant progress in 3-body implementation (Johnson & Ormand)
Added restart option (Johnson)
Implemented in-lined 1-body density matrices (Johnson)
Highlighting accomplishments for 2010-2011:
Add OpenMP
Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
Speed up reorthogonalization
-- I/O is bottleneck
Add OpenMP
-- Crude 1st generation by Johnson (about 70-80% efficiency)
-- 2nd generation by Ormand (nearly 100% efficiency)
Hybrid OpenMP+MPI implemented; full testing delayed due to reorthogonalization issues
Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
We break up the Lanczos vectors so that only part of each vector resides on each node
Future: separate forward/backward multiplication
Lanczos vectors distribution: Hermiticity on
[Figure: Vin and Vout, each split into sectors 1-4, with proton-sector and neutron-sector blocks]
Forward and backward application of H
Each compute node needs at a minimum TWO sectors from the initial and TWO sectors from the final Lanczos vector
Lanczos vectors distribution: Hermiticity off
[Figure: Vin and Vout, each split into sectors 1-2, with proton-sector and neutron-sector blocks]
Forward application of H on one node and backward application of H on another node
Each compute node needs ONE sector from the initial and ONE sector from the final Lanczos vector
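The difference between the two schemes can be sketched for a real symmetric matrix. With hermiticity on, only one triangle is stored and each stored element is applied forward and backward, so a job working on block (I, J) touches both sectors of Vin and of Vout. With hermiticity off, each job applies its block in one direction only. The code below is an illustrative sketch with hypothetical names, not BIGSTICK's distribution logic.

```python
def matvec_hermiticity_on(upper_entries, v):
    """Apply H using only stored entries with i <= j; each one is used forward and backward."""
    w = [0.0] * len(v)
    for i, j, h in upper_entries:
        w[i] += h * v[j]          # forward application
        if i != j:
            w[j] += h * v[i]      # backward application (H is real symmetric here)
    return w


def matvec_hermiticity_off(all_entries, v):
    """Apply H with every nonzero listed explicitly; each entry is used forward only."""
    w = [0.0] * len(v)
    for i, j, h in all_entries:
        w[i] += h * v[j]
    return w


def sectors_needed(block, hermiticity):
    """Sectors of Vin and Vout that a job on off-diagonal block (I, J) must hold."""
    row_sector, col_sector = block
    if hermiticity:
        # forward reads Vin[J], writes Vout[I]; backward reads Vin[I], writes Vout[J]
        return {"vin": {row_sector, col_sector}, "vout": {row_sector, col_sector}}
    return {"vin": {col_sector}, "vout": {row_sector}}
```

For an off-diagonal block, `sectors_needed((0, 1), True)` lists two sectors of each vector, while `sectors_needed((0, 1), False)` lists one of each, which matches the TWO-versus-ONE statement on the slides.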
Comparison of memory requirements for distributing Lanczos vectors:
Nuclide  Space      Basis dim  Store   Hermiticity ON  Hermiticity OFF
12C      Nmax = 10  7800M      117GB   8.44GB          4.39GB
60Zn     pf         2300M      34GB    8.65GB          4.45GB
Memory required to store 2 Lanczos vectors (double precision) on a node
The distribution scheme with suppressed hermiticity is the most memory efficient. This is our scheme of choice.
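The "Store" column can be roughly checked from the basis dimension alone. The sketch below assumes 8 bytes per double-precision element and that the table's GB means 2^30 bytes; both are assumptions, and since the basis dimensions in the table are rounded, small differences from the quoted values are expected. The per-node columns depend on how many sectors each vector is split into, which the slides do not state, so they are not computed here.

```python
def lanczos_store_gib(basis_dim, n_vectors=2, bytes_per_element=8):
    """Memory in units of 2**30 bytes for n_vectors undistributed Lanczos vectors."""
    return n_vectors * basis_dim * bytes_per_element / 2**30


# Rounded basis dimensions taken from the table above
for label, dim in [("12C Nmax=10", 7800e6), ("60Zn pf", 2300e6)]:
    print(f"{label}: about {lanczos_store_gib(dim):.0f} GB for 2 vectors")
```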
Speed up reorthogonalization
-- I/O is bottleneck
We (i.e. PK) spent time trying to make MPI-IO efficient for our needs via striping, etc.
Analysis by Rebecca Hartman-Baker (ORNL) suggests our I/O is still running sequentially rather than in parallel.
Now we will store all Lanczos vectors in memory à la MFDn (makes restarting an interrupted run difficult)
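To show why keeping the vectors in memory helps, here is a minimal serial sketch of Lanczos with full reorthogonalization. Every previous vector is needed at every iteration, so if they sit on disk each step requires reading them all back. This is a textbook-style illustration with hypothetical names, not BIGSTICK's or MFDn's implementation.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos(apply_h, v0, n_iter):
    """Return Lanczos coefficients (alphas, betas) and the stored orthonormal vectors."""
    norm = math.sqrt(dot(v0, v0))
    vectors = [[x / norm for x in v0]]   # all Lanczos vectors kept in memory
    alphas, betas = [], []
    for k in range(n_iter):
        w = apply_h(vectors[k])
        alphas.append(dot(vectors[k], w))
        # Full reorthogonalization: project out every stored vector.
        # This also removes the alpha and beta terms of the three-term recurrence.
        for q in vectors:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12 or k == n_iter - 1:
            break                        # invariant subspace found or iteration limit reached
        betas.append(beta)
        vectors.append([wi / beta for wi in w])
    return alphas, betas, vectors
```

The eigenvalues of the tridiagonal matrix built from `alphas` and `betas` approximate the extreme eigenvalues of H. The trade-off noted on the slide still applies: vectors held only in memory are lost if the run is interrupted.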
Next steps for remainder of project period:
•Store Lanczos vectors in RAM (end of summer)
•Write paper on factorization algorithm (drafted, finish by 9/2011)
•Fully implement MPI/OpenMP hybrid code (11/2011)
•Write up paper for publication of code (early 2012)
UNEDF Deliverables for BIGSTICK
•Final UNEDF versions of LCCI codes, scripts, and test cases will be completed and released. Current version (6.5) at NERSC; expect final version by end of year; plans to publish in CPC or similar venue.
•Improve the scalability of BIGSTICK CI code up to 50,000 cores. Main barrier was reorthogonalization; now putting Lanczos vectors in memory to minimize I/O.
•Use BIGSTICK code to investigate isospin breaking in pf shell. Delayed due to problem with I/O hardware on Sierra.
SciDAC-3 possible deliverables for BIGSTICK
(End of SciDAC-2: 3-body forces on 100,000 cores)
•Run with 3-body up to 1,000,000 cores on Sequoia, Nmax = 10/12 for 12,14C
•Add in 4-body forces; investigate alpha-clustering with effective 4-body forces (via SRG or Lee-Suzuki)
•Currently interfaces with Navratil’s TRDENS to generate densities, spectroscopic factors, etc., needed for RGM reaction calculations; will improve this: develop fast post-processing with factorization
•Investigate general unitary-transform effective interactions, adding constraint to observables
Sample application: cold atomic gases at unitarity in a harmonic trap
Using only 1 generator (d/dr) (very much like UCOM)
Fit to A = 3: 1-, 0+
Fit to A = 4: 0+, 1+, 2+
UNEDF -- MSU June 2010
starting rms = 2.32
final rms = 0.58
Cross-fertilization of LCCI project:
[Diagram linking BIGSTICK, MFDn, and NuShellX, with the labels:]
On-the-fly construction of basis states and matrix elements
Reorthogonalization and Lanczos vector management
J-projected basis