Quarks, GPUs and Exotic Matter Bálint Joó, Jefferson Lab Ron Babich, NVIDIA (presenter) NVIDIA Theater SC’12 Salt Lake City, Utah Nov 2012

Quarks, GPUs and Exotic Matter - NVIDIA · 2013. 7. 5. · – nucleon = quarks + gluons • Almost all of our mass comes from quarks & gluons • Quantum Chromodynamics (QCD) is


Quarks, GPUs and Exotic Matter

Bálint Joó, Jefferson Lab
Ron Babich, NVIDIA (presenter)

NVIDIA Theater SC’12
Salt Lake City, Utah

Nov 2012


Acknowledgements
• Science Results: Hadron Spectrum Collaboration
• Software:
  – The QUDA Community & NVIDIA
  – Frank Winter for his work on the JIT version of QDP++
• Machines:
  – USQCD National Facility for access to clusters at JLab, JLab SciComp Team
  – LLNL for access to Edge Cluster
  – NERSC for access to Dirac Cluster
  – Oak Ridge Leadership Computing Facility, for access to TitanDev, and for Director's Discretionary Allocation
  – NSF NICS for access to Keeneland Cluster
  – NCSA for access to BlueWaters
• Funding: US DOE
  – Contract DE-AC05-06OR23177, under which Jefferson Science Associates, LLC, manages and operates Jefferson Laboratory
  – Grant No. DE-FC02-06ER41440 (USQCD SciDAC II project)
• Funding: NSF
  – Grants: PHY-0835713 and OCI-0946441
• Special thanks from Balint to Ron for stepping in to present this talk.


Nuclear Physics and QCD
• Ordinary matter is made up of atoms
  – atom = nucleus + “orbiting” electrons
  – nucleus = protons + neutrons (nucleons)
  – nucleon = quarks + gluons
• Almost all of our mass comes from quarks & gluons
• Quantum Chromodynamics (QCD) is the theory of quarks and gluons.
  – quarks carry color charge (r,g,b)
  – gluons carry the color interactions e.g. (-r,+b)
• We can only see things with net 0 color charge
  – never see individual quarks, gluons, only combinations
  – color charges must cancel between quarks and gluons
  – QCD allows “exotics”: quark-gluon excitations, glueballs


QCD in Nuclear Physics

Hägler, Musch, Negele, Schäfer, EPL 88 61001

• Can QCD predict the spectrum of hadrons?
  – what is the role of the gluons?
  – what about exotics?
  – GlueX experiment of Jefferson Lab 12 GeV, Hall D
• How do quarks and gluons make nucleons?
  – what are distributions of quarks, gluons, spin, etc?
  – GPD experiments e.g. Jefferson Lab, Halls A & B
• QCD must explain nuclear interactions
  – ab initio calculations for simple systems
  – bridges to higher level effective theories
• QCD phase structure, equation of state
  – experiments at RHIC
  – input to higher level effective theories
  – astrophysics (physics of the Early Universe)


Lattice QCD
• Lattice QCD is the only known model independent, non-perturbative technique for carrying out QCD calculations.
  – Replace continuum space-time with lattice
  – Gluons live on links as SU(3) matrices
  – Quarks live on sites as vectors/spinors.
  – Change QCD to a system similar to a crystal

Evaluate Path Integral Using Markov Chain Monte Carlo Method
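As a rough illustration of evaluating a lattice path integral by Markov Chain Monte Carlo, the sketch below runs a Metropolis update on a free scalar field on a small periodic 1D lattice. This is a stand-in chosen for brevity, not QCD and not the algorithm used in production gauge generation; the action, lattice size, step size and all function names are assumptions made for this example. The free theory has a known answer, so the Monte Carlo estimate of <phi^2> can be checked against the exact momentum-space sum.

```python
import math
import random

def local_action(x, left, right, mass_sq):
    """Terms of S = sum_i [ (phi[i+1]-phi[i])^2/2 + mass_sq*phi[i]^2/2 ] that contain site value x."""
    return (1.0 + 0.5 * mass_sq) * x * x - x * (left + right)

def metropolis_phi2(n_sites=8, mass_sq=1.0, sweeps=20000, therm=1000,
                    step=1.5, seed=1234):
    """Monte Carlo estimate of <phi^2> on a periodic 1D lattice."""
    rng = random.Random(seed)
    phi = [0.0] * n_sites
    total, count = 0.0, 0
    for sweep in range(therm + sweeps):
        for i in range(n_sites):
            left = phi[(i - 1) % n_sites]
            right = phi[(i + 1) % n_sites]
            old = phi[i]
            new = old + rng.uniform(-step, step)
            delta_s = (local_action(new, left, right, mass_sq)
                       - local_action(old, left, right, mass_sq))
            # Metropolis accept/reject with probability min(1, exp(-delta_s)).
            if delta_s <= 0.0 or rng.random() < math.exp(-delta_s):
                phi[i] = new
        if sweep >= therm:  # discard thermalization sweeps
            total += sum(x * x for x in phi) / n_sites
            count += 1
    return total / count

def exact_phi2(n_sites=8, mass_sq=1.0):
    """Exact <phi^2> for the same free lattice action."""
    return sum(1.0 / (mass_sq + 4.0 * math.sin(math.pi * k / n_sites) ** 2)
               for k in range(n_sites)) / n_sites
```

Each configuration produced by a sweep plays the role of one "snapshot"; observables are averages over the chain.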


Large Scale LQCD Simulations

• Stage 1: Generate Configurations
  – snapshots of QCD vacuum
  – configurations generated in sequence
  – capability computing needed for large lattices and light quarks
• Stage 2a: Compute quark propagators
  – task parallelizable (per configuration)
  – capacity workload (but can also use capability h/w)
• Stage 2b: Contract propagators into Correlation Functions
  – determines the physics you’ll see
  – complicated multi-index tensor contractions
• Stage 3: Extract Physics
  – on workstations, small cluster partitions

Titan Image Courtesy of Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge National Laboratory


The Lattice Dirac Equation
• Describes how quarks interact with the gluons
• Must be solved in gauge generation (Stage 1)
  – O(1M) times, in sequence
  – ~60%-80% of workload spent in solvers
• Must be solved to generate quark propagators (Stage 2)
  – O(10M) times, but task parallel
  – solver is >90% of workload
• Operator has dimension ~100M, but very sparse
  – Efficient Matrix-Vector operations are crucial
  – Need optimized solvers

$$\begin{pmatrix} A_{ee} & -D_{eo} \\ -D_{oe} & A_{oo} \end{pmatrix} \phi = \chi$$
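The block form of the system on this slide suggests an even-odd (red-black) ordering of lattice sites. As a worked step, assuming $\phi = (\phi_e, \phi_o)$ and $\chi = (\chi_e, \chi_o)$ are split the same way (a labeling inferred from the e/o subscripts, not spelled out on the slide), eliminating $\phi_e$ gives a smaller system on the odd sites only:

$$A_{ee}\,\phi_e - D_{eo}\,\phi_o = \chi_e, \qquad -D_{oe}\,\phi_e + A_{oo}\,\phi_o = \chi_o$$

$$\phi_e = A_{ee}^{-1}\left(\chi_e + D_{eo}\,\phi_o\right)$$

$$\left(A_{oo} - D_{oe}\,A_{ee}^{-1}\,D_{eo}\right)\phi_o = \chi_o + D_{oe}\,A_{ee}^{-1}\,\chi_e$$

The iterative solver then works on a vector of half the length, and $\phi_e$ is recovered afterwards from the second line. Whether a given code solves the even or the odd half is a convention.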


Software: Chroma + QUDA
• Chroma is a large lattice QCD framework
  – algorithms for gauge generation, quark propagators etc
  – abstractions for components (solvers)
  – open source: http://usqcd.jlab.org/usqcd-docs/chroma/
  – developed/maintained through US DOE SciDAC funding
  – integrates QUDA library as a solver component
    • R. G. Edwards, B. Joo, Nucl. Phys. Proc. Suppl. 140 (2005) 832
• QUDA is a highly optimized library for lattice QCD on GPUs
  – Linear Solvers, Force Terms, interfaces to code-bases
  – open source: http://lattice.github.com/quda
  – developed/maintained by NVIDIA & QUDA Community
    • M. Clark, R. Babich, K. Barros, R. C. Brower, C. Rebbi, Comp. Phys. Commun. 181:1517-1528, 2010


QUDA Performance Optimization
• LQCD is typically memory bound
  – Dslash: Nearest neighbour stencil in 4D
    • Wilson Formulation: 0.92 FLOP/B (SP)
    • Staggered Formulation: ~0.66 FLOP/B (SP)
• Key Optimizations focus on being memory friendly
• Layout data for coalesced memory access
• Use symmetries to compress SU(3) matrices
  – 2 row storage or 8 parameter storage
  – reconstruct 3rd row with “free” FLOPs
  – trade bandwidth for compute
• Use reduced precision if possible (e.g. 16bit)
  – mixed precision solver
  – iterative refinement + reliable updates
• Fuse BLAS like kernels - increase reuse
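The 2 row storage idea can be sketched in a few lines: for an SU(3) matrix the third row is the complex conjugate of the cross product of the first two, so only 12 of the 18 real numbers need to be read from memory. The code below is a plain Python illustration with made-up helper names; it is not QUDA's implementation, which does this inside GPU kernels.

```python
import cmath
import math

def cross_conj(a, b):
    """Third row of an SU(3) matrix from its first two rows: conj(a x b)."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]

def compress(u):
    """Keep only rows 0 and 1 (12 real numbers instead of 18)."""
    return [list(u[0]), list(u[1])]

def reconstruct(two_rows):
    """Rebuild the full 3x3 matrix, spending FLOPs instead of bandwidth."""
    a, b = two_rows
    return [a, b, cross_conj(a, b)]

def matmul3(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def example_su3():
    """A hand-built SU(3) matrix: unit-determinant phases times two rotations."""
    c1, s1 = math.cos(0.3), math.sin(0.3)
    c2, s2 = math.cos(1.1), math.sin(1.1)
    phases = [[cmath.exp(0.4j), 0, 0],
              [0, cmath.exp(-1.3j), 0],
              [0, 0, cmath.exp(0.9j)]]  # 0.4 - 1.3 + 0.9 = 0, so det = 1
    rot12 = [[c1, s1, 0], [-s1, c1, 0], [0, 0, 1]]
    rot23 = [[1, 0, 0], [0, c2, s2], [0, -s2, c2]]
    return matmul3(matmul3(phases, rot12), rot23)
```

Calling `reconstruct(compress(example_su3()))` returns a matrix whose third row agrees with the original up to rounding.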

[Figure: memory layout diagram. Labels: (V-1 sites) x 12 floats, 12 floats; (V-1 sites) x 4 floats, 4 floats, Pad; 1 block]
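The Wilson FLOP/B figure quoted on this slide can be reproduced with a back-of-envelope count. The numbers used here are assumptions, not taken from the slides: 1320 floating point operations per site, 8 gauge links of 18 real numbers each, and 8 neighbour spinors plus 1 output spinor of 24 real numbers each. The sketch also shows how the ratio moves with compressed link storage; it ignores the extra FLOPs spent on reconstruction and any cache reuse.

```python
def wilson_dslash_intensity(bytes_per_real=4, reals_per_link=18):
    """Arithmetic intensity (FLOP per byte) of a Wilson Dslash, per lattice site."""
    flops_per_site = 1320        # assumed operation count per site
    links = 8                    # one gauge link per direction, forward and back
    spinors = 8 + 1              # 8 neighbour reads + 1 output write
    reals_per_spinor = 24        # 4 spins x 3 colors x (re, im)
    reals_moved = links * reals_per_link + spinors * reals_per_spinor
    return flops_per_site / (reals_moved * bytes_per_real)

if __name__ == "__main__":
    for label, reals in (("full 18", 18), ("2 row (12)", 12), ("8 parameter", 8)):
        print(label, round(wilson_dslash_intensity(4, reals), 3))
```

With these assumed counts, single precision and uncompressed links give about 0.92 FLOP/B. Compressed links or 16 bit storage raise the ratio, which is why they help a memory bound kernel.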


Using GPUs in Capacity Mode
• USQCD National Facility (FNAL, JLab, BNL)
  – distributed computational facility for LQCD
  – JLab and Fermilab operate GPU clusters
  – JLab GPU cluster used for generating quark propagators.

[Figure: bar chart; axis labels and values lost in extraction. Orange bars: from NERSC Dirac Cluster. Other data from JLab 9G & 10G Clusters.]

JLab 9G GPU Cluster

JLab: 127 quad nodes: Mix of Tesla C2050, M2050, and GTX 285/480/580 GPUs

FNAL: 72 dual nodes: M2050 GPUs


Science From GPU Clusters
J. J. Dudek, R. G. Edwards, “Hybrid Baryons in QCD”, Phys. Rev. D85, 054016

[Figure: plot with axis ticks 0, 500, 1000, 1500, 2000; labels lost in extraction]

• Hybrid Excitations in mesons, and baryons at a common scale of ~1200 MeV
• Pattern suggests chromo-magnetic excitation
  – common in mesons, baryons.
  – “Effective degree of freedom”?
  – first principle calculation can agree with or disfavor effective models


Point to take home here:
• These analysis computations are extremely demanding.
• Need (apart from the gauge configurations):
  – innovation in the method of the computation
    • so called “distillation” technique
    • variational method, with large operator basis
  – optimized formulation of the lattice theory
    • Anisotropic lattices: cleaner determination of excited states
  – availability of cheap capacity FLOPs
    • GPUs highly cost effective.
    • recall: O(10 M) solves of the Dirac Equation
    • lots of partitions of 4-16 GPUs (today)
    • => 32-64 GPUs tomorrow for larger lattices


Gauge Generation on GPUs
• Gauge Generation is not task parallel
  – proceeds sequentially
  – O(1M) solves of the Dirac Equation
  – needs the concentrated power of capability computing facilities
• Need to scale to 100s-1000s of GPUs
• Two main obstacles:
  – Host/Accelerator Model & Amdahl’s law
    • Code not running on GPU limits speedup.
  – Hardware Bottleneck
    • Ratio of peak device memory/PCIe2 bandwidths ~ 170/16 (for Fermi)
    • PCIe3, GPU Direct, etc should help here.

$$S_{\mathrm{app}} = \frac{1}{(1 - P) + \dfrac{P}{S}}$$
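To make the Amdahl's law obstacle concrete, the short sketch below evaluates the formula on this slide. It assumes the usual reading of the symbols, which the slide does not define: P is the fraction of runtime that is accelerated and S is the speedup achieved on that fraction. The sample values of P are loosely based on the solver fractions quoted earlier in the talk (60%-80% of gauge generation); the value S = 10 is purely illustrative.

```python
def amdahl_speedup(p, s):
    """Whole-application speedup when a fraction p of the runtime is sped up by s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)

if __name__ == "__main__":
    for p in (0.6, 0.8, 0.9, 1.0):
        print(f"P={p:.1f}  S=10  ->  S_app={amdahl_speedup(p, 10.0):.2f}")
```

Even an infinitely fast solver cannot push the application speedup past 1/(1 - P), which is 5 for P = 0.8. This is the motivation for moving the remaining code to the GPU.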


[Figure: Strong Scaling: 48^3x512 Lattice (Weak Field), Chroma + QUDA. x-axis: Interlagos Sockets (16 core/socket), 16 to 8192; y-axis: Tflops Sustained, 0.0625 to 128, ticks at powers of two; annotation: 100 Tflops. Series:
  – Titan, XK6 nodes, CPU only: Single Precision Reliable-IBiCGStab solver
  – Rosa, XE6 nodes, CPU only: Single Precision Reliable IBiCGStab solver
  – Titan, XK6 nodes, GPU only: Single Precision (single/single) Reliable BiCGStab solver
  – Titan, XK6 nodes, GPU only: Mixed Precision (half/single) Reliable BiCGStab solver
  – Titan, XK6 nodes, GPU only: Mixed Precision (half/single) GCR solver with Domain Decomposed preconditioner]

Architecture Aware Algorithms
• A domain decomposed preconditioner combined with a GCR solver
  – reduced communication needs in the Linear Solver
  – strong scaled to 768 nodes on the TitanDev Cray XK6 system (Fermi Tesla GPUs) at the OLCF

Our work with strong scaling targets the newly installed Cray XK7 Titan System at Oak Ridge Leadership Computing Facility (OLCF) (pictured above) and other large scale GPU based systems such as NCSA BlueWaters, Keeneland and others.

R. Babich, M. Clark, B. Joo, G. Shi, R. Brower, S. Gottlieb, SC’11, Seattle

Image Courtesy of Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge National Laboratory
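The combination of a GCR outer solver with a domain decomposed preconditioner can be sketched on a toy problem. Everything below is an assumption made for illustration: the operator is a 1D nearest-neighbour stencil rather than the Dirac operator, each "domain" is a contiguous block of 8 sites, and the block solve is exact (Thomas algorithm) instead of a few cheap reduced-precision iterations. The point it shows is structural: the preconditioner touches only data inside a block, so it needs no communication between domains.

```python
import math

DIAG = 4.0  # assumed diagonal of the toy operator tridiag(-1, DIAG, -1)

def apply_a(x):
    """Global operator: couples every site to its neighbours (needs halo data)."""
    n = len(x)
    return [DIAG * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def solve_block(rhs):
    """Exact solve restricted to one domain, ignoring links to other domains."""
    m = len(rhs)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = -1.0 / DIAG, rhs[0] / DIAG
    for i in range(1, m):
        denom = DIAG + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def precondition(r, block):
    """Block-Jacobi (non-overlapping domain decomposition) preconditioner."""
    z = []
    for start in range(0, len(r), block):
        z.extend(solve_block(r[start:start + block]))
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gcr(b, tol=1e-10, max_iter=50, block=8):
    """Preconditioned GCR; returns the solution and the iteration count."""
    x, r = [0.0] * len(b), list(b)
    ps, qs, iters = [], [], 0
    while iters < max_iter and math.sqrt(dot(r, r)) >= tol:
        p = precondition(r, block)
        q = apply_a(p)
        for pj, qj in zip(ps, qs):  # orthogonalize A*p against earlier directions
            beta = dot(qj, q) / dot(qj, qj)
            q = [qi - beta * v for qi, v in zip(q, qj)]
            p = [pi - beta * v for pi, v in zip(p, pj)]
        alpha = dot(q, r) / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        ps.append(p)
        qs.append(q)
        iters += 1
    return x, iters
```

GCR is used here because it tolerates a preconditioner that changes from one iteration to the next, which an inexact inner solve would be.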


Moving more code to GPUs
• Work with Frank Winter, University of Edinburgh
• Re-wrote QDP++ layer on which Chroma is based, to run on GPUs
• Innovative Just-In-Time (JIT) compilation of C++ expression templates to GPU kernels
• Works on Cray XK too in ‘Just-Before-Time’ mode
  – pre-generate kernels on regular Linux

[Figure (Preliminary): 2 Flavor Wilson HMC (Gauge + 2 Flavor + Hasenbusch monomials), 32^3x96 lattice. x-axis: number of XK6 nodes, 16 to 256; y-axis: Time for trajectory (sec), 128 to 4096. Series: Chroma (CPU only); Chroma(CPU) + QUDA Solvers; Chroma(QDP-JIT) + QUDA. Annotations: “Sublattice/GPU is too small”; “Significant gain from QDP-JIT”.]

• QDP-JIT version is fastest
• Still suffers strong scaling effects eventually
  - but problem size is small (fits on 1 GPU)
  - expect better on current larger lattice sizes

- Data from TitanDev at OLCF, B. Joo & F. Winter
- F. Winter, "Accelerating QDP++ using GPUs", arXiv:1105.2279 [hep-lat]
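The "sublattice per GPU is too small" effect can be estimated with simple arithmetic: as a fixed lattice is split over more GPUs, each local sublattice shrinks and a larger share of its sites sit on faces that must be exchanged. The sketch below uses the 32^3x96 lattice from the figure; the processor grids are hypothetical examples, not the decompositions used for the measurements.

```python
def local_dims(global_dims, grid):
    """Sublattice extent per GPU for a given processor grid."""
    if any(g % p for g, p in zip(global_dims, grid)):
        raise ValueError("grid must divide the lattice evenly")
    return [g // p for g, p in zip(global_dims, grid)]

def halo_to_volume(global_dims, grid):
    """Face sites exchanged per local site, summed over partitioned directions."""
    local = local_dims(global_dims, grid)
    # Each partitioned direction has two faces of size V_local / L_d.
    return sum(2.0 / extent for extent, p in zip(local, grid) if p > 1)

if __name__ == "__main__":
    lattice = [32, 32, 32, 96]
    for grid in ([1, 1, 2, 8], [2, 4, 4, 8]):  # 16 and 256 GPUs (hypothetical grids)
        print(grid, local_dims(lattice, grid), round(halo_to_volume(lattice, grid), 3))
```

With these example grids the exchanged-face fraction grows from about 0.29 on 16 GPUs to about 0.79 on 256, while the work per GPU drops by a factor of 16.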


Conclusions
• GPUs have brought a disruptive leap in the cost effectiveness of lattice QCD calculations at the capacity level
  – enabled new analysis methods (e.g. distillation)
  – are producing discovery level science of great interest to nuclear physics experiments (e.g. at Jefferson Lab)
• By using architecture aware solvers, we have been able to strong scale LQCD to over 100 TFlops sustained performance on TitanDev
  – expect even more performance from Kepler GPUs
• The QDP-JIT effort will allow us to move Chroma completely to the accelerators:
  – reduce Amdahl’s law losses, maximize speedup
• We look forward to more exciting science from GPUs