Understanding the building blocks of matter by solving Quantum
Chromodynamics
SciDAC-3 PI Meeting, Rockville, Maryland, July 26th, 2013
David Richards, Jefferson Lab
• Who we are
• Why we're here
• Exploiting new architectures for LQCD
• Algorithms for NP calculations
• A sampling of science
Teams
Computing Properties of Hadrons, Nuclei and Nuclear Matter from QCD
SciDAC-3 Scientific Computation Application Partnership project
Project Director: Frithjof Karsch, Brookhaven National Laboratory
Project Co-director for Science: David Richards, JLab
Project Co-director for Computation: Richard Brower, Boston University
• Frithjof Karsch - Brookhaven National Laboratory
• Richard Brower - Boston University
• Robert Edwards - Thomas Jefferson National Accelerator Facility
• Martin Savage - University of Washington
• John Negele - Massachusetts Institute of Technology
• Rob Fowler - University of North Carolina (SUPER)
• Andreas Stathopoulos - College of William and Mary
Lattice QCD for NP: Science
Lattice QCD for NP: Computation
Leadership-class Clusters + GPUs + MIC
Software + Algorithms
+ FASTMath
Multigrid with HYPRE for Lattice QCD, Andrew Pochinsky
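As a sketch of the multigrid idea behind this effort (a toy two-grid cycle for the 1D Poisson operator, not the HYPRE interface and not the lattice Dirac operator; all sizes illustrative):

```python
import numpy as np

def poisson(n):
    # 1D Poisson matrix: simple stand-in for the real lattice operator
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi(A, x, b, sweeps=3, w=2.0 / 3.0):
    # weighted-Jacobi smoother: damps high-frequency error components
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + w * (b - A @ x) / d
    return x

def two_grid(A, x, b):
    n = len(b)
    nc = (n - 1) // 2
    # linear interpolation P; full-weighting restriction R = 0.5 P^T
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    R = 0.5 * P.T
    x = jacobi(A, x, b)                      # pre-smooth
    r = b - A @ x
    Ac = R @ A @ P                           # Galerkin coarse operator
    x = x + P @ np.linalg.solve(Ac, R @ r)   # coarse-grid correction
    return jacobi(A, x, b)                   # post-smooth

n = 63
A = poisson(n)
b = np.ones(n)
x = np.zeros(n)
for _ in range(10):
    x = two_grid(A, x, b)
assert np.linalg.norm(A @ x - b) < 1e-8 * np.linalg.norm(b)
```

The coarse-grid correction removes exactly the smooth error modes on which Jacobi stalls, which is why the cycle converges in a handful of iterations independent of n.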
QDP++ and Chroma on GPUs
• QDP-JIT/C is production-ready.
• QDP-JIT/PTX is fully featured; requires some small integration with QUDA.
• Have performed full 2+1-flavor gauge generation on Blue Waters with both variants.
• JIT/PTX has used Chroma's internal solvers; JIT/C has used QUDA solvers.
• Paper submitted to SC'13.
• Prognosis: excellent; tasks will be complete by end of year.
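The just-in-time idea can be caricatured in a few lines: build kernel source for a whole expression at run time, compile it once, and run the fused kernel. This pure-Python stand-in only illustrates the concept; QDP-JIT itself emits C or PTX for the GPU:

```python
import numpy as np

def jit_kernel(expr, args):
    """Compile `expr` over named arrays into one fused element-wise kernel."""
    src = (
        "def kernel({params}, out):\n"
        "    for i in range(out.size):\n"
        "        out[i] = {expr}\n"
    ).format(params=", ".join(args), expr=expr)
    namespace = {}
    exec(compile(src, "<jit>", "exec"), namespace)
    return namespace["kernel"]

# The whole expression becomes a single kernel, instead of one temporary
# array per operation as naive operator overloading would produce.
a = np.arange(4.0)
b = np.ones(4)
out = np.empty(4)
kernel = jit_kernel("2.0 * a[i] + b[i]", ["a", "b"])
kernel(a, b, out)
assert np.allclose(out, [1.0, 3.0, 5.0, 7.0])
```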
Porting Lattice QCD Calculations to Novel Architectures: Balint Joo, Frank Winter
Aim: put all of the application code on the GPU. QDP-JIT: just-in-time compilation.
[Figure: Strong scaling of QUDA+Chroma+QDP-JIT(PTX) on Titan: sustained TFLOPS (0-450) vs. Titan nodes (GPUs, up to 4608), for BiCGStab and DD+GCR solvers on 72³x256 and 96³x256 lattices.]
B. Joo, F. Winter (JLab), M. Clark (NVIDIA)
GPUs and Heterogeneous Architectures
• 2010: QUDA parallelized (SC '10 paper)
• 2011: 256 GPUs on Edge cluster (SC '11 paper)
• 2012: 768 GPUs on TitanDev (ACSS '12 contribution, invited APS contribution)
• 2013: On Blue Waters (NVIDIA GTC contribution)
Solver performance
DD: domain-decomposed solver - architecture-aware
[Figure: Time for trajectory 226 (seconds, 0-16000) vs. XK7 nodes (up to 1792), comparing CPU, QDP-JIT (PTX), and QDP-JIT (PTX) + QUDA. Anisotropic clover HMC, V=48x48x48x512, physical pion mass attempt, NCSA Blue Waters.]
Gauge Generation: HMC - I
Aim: port the whole of the HMC code to the GPU.
Strong scaling for anisotropic clover HMC on GPUs, using the QUDA solver: key for spectroscopy on 48³x512 lattices.
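A minimal sketch of the HMC algorithm itself, for a single Gaussian degree of freedom rather than gauge links (the production code's expensive step, the fermion force requiring a QUDA solve at every molecular-dynamics step, has no analogue in this toy):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy HMC with action S(q) = q^2 / 2, so the force is simply dS/dq = q.
def hmc_step(q, n_md=10, dt=0.2):
    p = rng.normal()                    # refresh the conjugate momentum
    q0 = q
    h0 = 0.5 * p * p + 0.5 * q * q      # Hamiltonian before the trajectory
    p -= 0.5 * dt * q                   # leapfrog: initial half step
    for _ in range(n_md - 1):
        q += dt * p
        p -= dt * q
    q += dt * p
    p -= 0.5 * dt * q                   # final half step
    h1 = 0.5 * p * p + 0.5 * q * q
    # Metropolis accept/reject corrects the integration error exactly
    return q if rng.random() < np.exp(h0 - h1) else q0

q = 0.0
samples = []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)

# The chain samples exp(-q^2/2), which has unit variance.
assert abs(np.var(samples) - 1.0) < 0.1
```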
Gauge Generation: HMC - II
Current production running
Preconditioned Wilson CG in single precision, GFLOPS (uncompressed / compressed gauge field):

  Volume          Xeon E5-2680 (SNB-EP)   Xeon Phi (KNC) 5110P   Xeon Phi (KNC) B1PRQ-7110P
  24x24x24x128         94 / 95                 172 / 195               188 / 215
  32x32x32x128         96 / 96                 187 / 211               204 / 235
  40x40x40x96          93 / 93                 164 / 197               186 / 219
  48x48x24x64          95 / 95                 172 / 205               192 / 229
  32x40x24x96          97 / 97                 185 / 215               202 / 237

From B. Joo, D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. Lee, P. Dubey, W. Watson III, ISC 2013, LNCS 7905, pp. 40-54, 2013.
KNC 5110P: 60 cores @ 1.053 GHz, MPSS 2.1.4346-16 (Gold).
KNC B1PRQ-7110P: 61 cores @ 1.1 GHz, B1 stepping, MPSS 2.1.3552-1, 7936 MB memory.
Intel Xeon-Phi
Collaboration with Intel Parallel Labs
• Performance competitive with NVIDIA
• Exploit, e.g., Stampede
Typical LQCD Workflow
• Generate the configurations: leadership level. Few big jobs, few big files.
• Analyze: typically mid-range level. Many small jobs, many big files, I/O movement.
• Extract: extract information from measured observables.
IMPORTANT FOR NP
Correlation functions: Distillation
• Use the new "distillation" method (M. Peardon et al., PRD 80, 054506 (2009)).
• Truncate the sum at sufficient i to capture the relevant physics modes; we use 64, and set the "weights" f to be unity.
• Decompose the meson correlation function using the "distillation" operator, built from eigenvectors of the Laplacian, into "perambulators".
This is a capacity-computing task, a solver with many right-hand sides: well suited to GPUs.
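A toy illustration of the distillation construction: here the Laplacian is free (no gauge field) on a tiny lattice, and only a handful of eigenvectors are kept rather than the 64 used in production; the smearing operator is the rank-N projector built from the lowest modes:

```python
import numpy as np

# Toy version of distillation (Peardon et al., PRD 80, 054506): the smearing
# operator is the projector onto the lowest eigenvectors of the 3D Laplacian.
L, N = 4, 8                 # L^3 spatial sites; N retained eigenvectors
V = L ** 3

def idx(x, y, z):
    # periodic lattice site -> linear index
    return (x % L) * L * L + (y % L) * L + (z % L)

# Free (gauge-field-free) 3D lattice Laplacian as a V x V matrix
lap = np.zeros((V, V))
for x in range(L):
    for y in range(L):
        for z in range(L):
            i = idx(x, y, z)
            lap[i, i] = 6.0
            for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0),
                               (0,-1,0), (0,0,1), (0,0,-1)):
                lap[i, idx(x + dx, y + dy, z + dz)] -= 1.0

# Lowest N modes define the distillation operator, sum_i f_i v_i v_i^dagger,
# with unit weights f_i as described above.
evals, evecs = np.linalg.eigh(lap)
v = evecs[:, :N]
box = v @ v.T

assert np.allclose(box @ box, box)      # a projector...
assert np.isclose(np.trace(box), N)     # ...of rank N
```

Because the operator is rank-N, all correlation functions reduce to contractions of small N x N "perambulator" matrices, at the price of many Dirac solves (one per eigenvector, spin, and time source).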
All-Mode Averaging - I
Chulwoo Jung, BNL
T. Blum, T. Izubuchi, E. Shintani, arXiv:1208.4349
All-Mode Averaging - II
Electromagnetic form factor of the proton
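The AMA estimator of Blum, Izubuchi, and Shintani averages many cheap approximate measurements and corrects the result with a single exact one; a toy numerical model of the idea (all numbers synthetic, not physics data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic model: an "exact" observable per source location, and a cheap
# "sloppy" approximation (e.g. a relaxed solver) that is highly correlated
# with it but carries a systematic offset of 0.1.
n_src = 200
exact = rng.normal(1.0, 0.5, n_src)
sloppy = exact + 0.1 + rng.normal(0.0, 0.02, n_src)

# AMA estimator: average the cheap measurements over all sources, then add a
# single exact-minus-sloppy correction, which removes the bias on average.
ama = sloppy.mean() + (exact[0] - sloppy[0])

assert abs((sloppy.mean() - exact.mean()) - 0.1) < 0.05   # the sloppy bias
assert abs(ama - exact.mean()) < 0.1                      # bias cancelled
```

The gain is that statistical precision comes from the many sloppy solves while the single exact solve keeps the estimator unbiased.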
Andreas Stathopoulos - AM/CS at William and Mary - development of methods for multiple right-hand sides
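One standard way to attack such many-right-hand-side workloads is with block Krylov solvers; a minimal block-CG sketch on a stand-in SPD matrix (sizes and the dense matrix are illustrative only, not the specific methods under development):

```python
import numpy as np

rng = np.random.default_rng(1)

# Small symmetric positive-definite system standing in for the (vastly larger,
# sparse) lattice operator.
n, n_rhs = 64, 4
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)          # SPD by construction
B = rng.normal(size=(n, n_rhs))      # several right-hand sides at once

# Block conjugate gradient (O'Leary, 1980): each matrix application serves the
# whole block, and the Krylov space grows by n_rhs directions per iteration.
X = np.zeros_like(B)
R = B - A @ X
P = R.copy()
for _ in range(n):
    AP = A @ P
    alpha = np.linalg.solve(P.T @ AP, R.T @ R)
    X = X + P @ alpha
    R_new = R - AP @ alpha
    if np.linalg.norm(R_new) < 1e-8 * np.linalg.norm(B):
        R = R_new
        break
    beta = np.linalg.solve(R.T @ R, R_new.T @ R_new)
    P = R_new + P @ beta
    R = R_new

assert np.linalg.norm(A @ X - B) < 1e-6 * np.linalg.norm(B)
```

Sharing each matrix application across the block also improves arithmetic intensity, which is exactly what memory-bandwidth-bound lattice solvers need.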
Wick Contraction Methods
K. Orginos, W. Detmold
Nuclear Physics calculations involve many contractions...
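The cost driver is easy to see by counting: a naive evaluation involves N_u! x N_d! x N_s! Wick contractions for an interpolating operator with N_u up, N_d down, and N_s strange quarks. A quick illustration:

```python
from math import factorial

# Naive Wick-contraction count for a system with the given quark content;
# the factorials count permutations of identical quark fields between
# source and sink.
def naive_contractions(n_u, n_d, n_s=0):
    return factorial(n_u) * factorial(n_d) * factorial(n_s)

assert naive_contractions(2, 1) == 2         # proton (uud)
assert naive_contractions(3, 3) == 36        # deuteron (p + n)
assert naive_contractions(6, 6) == 518400    # 4He: already half a million terms
```

This factorial growth is what the improved contraction methods of Detmold and Orginos are designed to tame.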
Momentum-dependent Phase Shifts
[Figure: ππ elastic-scattering phase shift (degrees, 0-180) vs. centre-of-mass energy (800-1050 MeV).]
Extension to I = 1 ππ elastic scattering
I = 2 ππ elastic scattering
[Figure: Excited-meson spectrum: isoscalar and isovector states of negative and positive parity, exotics, and the YM glueball.]
Hadron Structure
Momentum fraction of quarks in proton
How is spin apportioned in a proton?
[Figure: Binding energies -B (MeV, down to -200) of two-, three-, and four-body systems: d, nn, nΣ, H-dibaryon, nΞ, 3He, 3ΛH, 3ΛHe, 3ΣHe, 4He, 4ΛHe, 4ΛΛHe, grouped by strangeness s = 0, -1, -2, with J^P assignments.]
Three- and Four-body systems
Binding energies at physical strange-quark mass
Chiral Transition Temperature
OUTLOOK
• Effective use of JIT to exploit accelerated architectures is the basis of work with SUPER to develop a domain-specific compiler for LQCD; apply the lessons to other domain-specific frameworks.
• Effective exploitation of Xeon Phi.
• Collaboration with FASTMath on multigrid methods.
• Lattice QCD for nuclear physics is characterised by heavy capacity requirements as well as capability requirements:
  - methods for multiple RHS
  - Wick-contraction methods
• Exciting Physics