
Understanding the building blocks of matter by solving Quantum Chromodynamics

SciDAC-3 PI Meeting, Rockville, Maryland, July 26th, 2013
David Richards, Jefferson Lab

• Who we are
• Why we are
• Exploiting new architectures for LQCD
• Algorithms for NP calculations
• A sampling of Science

Computing Properties of Hadrons, Nuclei and Nuclear Matter from QCD

SciDAC-3 Scientific Computation Application Partnership project
• Project Director: Frithjof Karsch, Brookhaven National Laboratory
• Project Co-director for Science: David Richards, JLab
• Project Co-director for Computation: Richard Brower, Boston University

Teams:
• Frithjof Karsch - Brookhaven National Laboratory
• Richard Brower - Boston University
• Robert Edwards - Thomas Jefferson National Accelerator Facility
• Martin Savage - University of Washington
• John Negele - Massachusetts Institute of Technology
• Rob Fowler - University of North Carolina (SUPER)
• Andreas Stathopoulos - College of William and Mary

Lattice QCD for NP: Science

Lattice QCD for NP: Computation

• Leadership-class Clusters + GPUs + MIC
• Software + Algorithms + FASTMath
• Multigrid with HYPRE for Lattice QCD, Andrew Pochinsky

QDP++ and Chroma on GPUs

Porting Lattice QCD Calculations to Novel Architectures: Balint Joo, Frank Winter
AIM: put all of the application code on the GPU via QDP-JIT (just-in-time compilation).

• QDP-JIT/C is production ready.
• QDP-JIT/PTX is fully featured; requires some small integration with QUDA.
• Have performed full 2+1-flavor gauge generation on Blue Waters with both variants.
• JIT/PTX has used Chroma's internal solvers; JIT/C has used QUDA solvers.
• Paper submitted to SC'13.
• Prognosis: excellent - tasks will be complete by end of year.
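As a rough sketch of what "just-in-time" means here - not the actual QDP-JIT implementation, which generates PTX or C for the GPU from QDP++ expression templates - the following Python toy (names like jit_compile and _kernel_cache are invented for illustration) shows the generate-compile-cache pattern:

import numpy as np

_kernel_cache = {}

def jit_compile(expr_key, src):
    """Compile generated kernel source the first time expr_key is seen; cache it."""
    if expr_key not in _kernel_cache:
        namespace = {"np": np}
        exec(compile(src, f"<jit:{expr_key}>", "exec"), namespace)
        _kernel_cache[expr_key] = namespace["kernel"]
    return _kernel_cache[expr_key]

# "Expression" y = a*x + y over every lattice site, emitted as source once:
src = (
    "def kernel(a, x, y):\n"
    "    # one fused loop over all sites instead of a temporary per operator\n"
    "    y += a * x\n"
)

kernel = jit_compile("axpy", src)
x = np.ones(16)
y = np.zeros(16)
kernel(0.5, x, y)   # first call: compiled and cached
kernel(0.5, x, y)   # second call: cache hit
print(y[:4])        # -> [1. 1. 1. 1.]

The payoff in the real code is the same as in the toy: each lattice-wide expression is compiled once into a single fused kernel and then reused, instead of being re-interpreted operator by operator.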

GPUs and Heterogeneous Architectures

• 2010: QUDA parallelized (SC '10 paper)
• 2011: 256 GPUs on Edge cluster (SC '11 paper)
• 2012: 768 GPUs on TitanDev (ACSS '12 contribution, invited APS contribution)
• 2013: On Blue Waters (NVIDIA GTC contribution)

Solver performance - DD: domain-decomposed solver, architecture-aware.

[Figure: Strong scaling, QUDA+Chroma+QDP-JIT(PTX) - sustained TFLOPS (axis 0-450) vs. Titan nodes/GPUs (0-4608) for BiCGStab and DD+GCR solvers at volumes 72³×256 and 96³×256. B. Joo, F. Winter (JLab), M. Clark (NVIDIA)]
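The slides name the solver only as "DD+GCR". As a hedged illustration of why domain decomposition suits GPU clusters - each domain is solved independently, with no communication inside the preconditioner - here is a numpy toy: a block-Jacobi (non-overlapping additive Schwarz) preconditioner inside a flexible GCR iteration on a 1D Laplacian. The test problem and all names are invented; this is not the QUDA algorithm.

import numpy as np

n, nblocks = 64, 4
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian toy
b = np.random.default_rng(0).standard_normal(n)

# Pre-factorize each local (per-domain) diagonal block of A.
bs = n // nblocks
blocks = [np.linalg.inv(A[i*bs:(i+1)*bs, i*bs:(i+1)*bs]) for i in range(nblocks)]

def precond(r):
    """Solve each domain independently, ignoring inter-domain couplings."""
    z = np.empty_like(r)
    for i in range(nblocks):
        z[i*bs:(i+1)*bs] = blocks[i] @ r[i*bs:(i+1)*bs]
    return z

def gcr(A, b, tol=1e-8, maxit=200):
    x, r = np.zeros_like(b), b.copy()
    zs, qs = [], []
    for _ in range(maxit):
        z = precond(r)
        q = A @ z
        for zj, qj in zip(zs, qs):      # orthogonalize against earlier directions
            beta = qj @ q
            q -= beta * qj
            z -= beta * zj
        qn = np.linalg.norm(q)
        q, z = q / qn, z / qn           # keep q = A z after normalization
        alpha = q @ r
        x += alpha * z
        r -= alpha * q
        zs.append(z); qs.append(q)
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

x = gcr(A, b)
print("residual:", np.linalg.norm(b - A @ x))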

Gauge Generation: HMC - I

Aim: port the whole of the HMC code to the GPU.
• Strong scaling for anisotropic clover HMC - key for spectroscopy - at 48³×512
• QUDA solver

[Figure: time for trajectory 226 (seconds, axis 0-16000) vs. XK7 nodes (0-1792) for CPU, QDP-JIT (PTX), and QDP-JIT (PTX) + QUDA. Anisotropic clover HMC, V=48×48×48×512, physical-pion-mass attempt, NCSA Blue Waters.]
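For reference, a minimal sketch of the HMC algorithm that gauge generation is built on, shown for a single harmonic degree of freedom rather than a gauge field. The structure - momentum refresh, leapfrog integration, Metropolis accept/reject - is the same in production, where the force term is what requires the QUDA solver; the action and step sizes here are toy choices.

import numpy as np

rng = np.random.default_rng(1)

def action(phi):
    return 0.5 * phi**2          # stand-in for the gauge + fermion action

def force(phi):
    return -phi                  # -dS/dphi

def hmc_trajectory(phi, n_steps=20, dt=0.05):
    p = rng.standard_normal()                    # refresh conjugate momentum
    h_old = 0.5 * p**2 + action(phi)
    phi_new = phi
    # leapfrog: half step in p, alternating full steps, closing half step in p
    p += 0.5 * dt * force(phi_new)
    for _ in range(n_steps - 1):
        phi_new += dt * p
        p += dt * force(phi_new)
    phi_new += dt * p
    p += 0.5 * dt * force(phi_new)
    h_new = 0.5 * p**2 + action(phi_new)
    # Metropolis step corrects the integrator error exactly
    if rng.random() < np.exp(h_old - h_new):
        return phi_new, True
    return phi, False

phi, accepted = 0.0, 0
for _ in range(1000):
    phi, acc = hmc_trajectory(phi)
    accepted += acc
print("acceptance rate:", accepted / 1000)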

Gauge Generation: HMC - II

Current production running.

Intel Xeon Phi

Collaboration with Intel Parallel Labs
• Performance competitive with NVIDIA
• Exploit e.g. Stampede

Preconditioned Wilson CG on Intel Xeon and Intel Xeon Phi processors, single precision (GFLOPS, uncompressed / compressed gauge fields):

Volume       | Xeon E5-2680 (SNB-EP) | Xeon Phi (KNC) 5110P | Xeon Phi (KNC) B1PRQ-7110P
24³×128      | 94 / 95               | 172 / 195            | 188 / 215
32³×128      | 96 / 96               | 187 / 211            | 204 / 235
40³×96       | 93 / 93               | 164 / 197            | 186 / 219
48×48×24×64  | 95 / 95               | 172 / 205            | 192 / 229
32×40×24×96  | 97 / 97               | 185 / 215            | 202 / 237

From B. Joo, D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. Lee, P. Dubey, W. Watson III, ISC 2013, LNCS 7905, pp. 40-54, 2013.
KNC 5110P: 60 cores @ 1.053 GHz, MPSS 2.1.4346-16 (Gold). KNC B1PRQ-7110P: 61 cores @ 1.1 GHz, B1 stepping, MPSS 2.1.3552-1, 7936 MB GDDR5 @ 5.5 GHz.
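The "compressed" columns trade memory bandwidth for flops by storing gauge links in reduced form. Assuming the standard two-row SU(3) compression used in lattice kernels of this era (the slide does not spell out the scheme), this numpy sketch shows the idea: keep two rows of each link, rebuild the third in registers.

import numpy as np

def compress(U):
    """Keep rows 0 and 1 of an SU(3) matrix (12 real numbers instead of 18)."""
    return U[:2, :].copy()

def reconstruct(U2):
    """By unitarity and det = 1, the third row is (row0 x row1)*."""
    third = np.conj(np.cross(U2[0], U2[1]))
    return np.vstack([U2, third])

# Check on a random SU(3) matrix built via QR decomposition:
rng = np.random.default_rng(0)
q, r = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
q = q @ np.diag(np.exp(-1j * np.angle(np.diag(r))))  # fix column phases
q /= np.linalg.det(q) ** (1 / 3)                     # enforce det = 1
print(np.allclose(reconstruct(compress(q)), q))      # -> True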

Typical LQCD Workflow

• Generate the configurations - leadership level: few big jobs, few big files.
• Analyze - typically mid-range level: many small jobs, many big files, I/O movement.
• Extract - extract information from measured observables.

IMPORTANT FOR NP

Correlation functions: Distillation

M. Peardon et al., PRD 80, 054506 (2009)

• Use the new "distillation" method.
• Observe that the smearing operator can be decomposed in terms of eigenvectors of the Laplacian.
• Truncate the sum at sufficient i to capture the relevant physics modes - we use 64; set the "weights" f to be unity.
• Decompose the meson correlation function using the "distillation" operator, yielding perambulators.

This is a capacity-computing task: a solver with many right-hand sides - GPUs.
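The slide's equations are lost in extraction; for reference, the standard construction from the cited paper (Peardon et al., PRD 80, 054506) is

    \Box_{xy}(t) = \sum_{i=1}^{N} v^{(i)}_{x}(t)\, v^{(i)\dagger}_{y}(t),
    \qquad -\nabla^{2}(t)\, v^{(i)}(t) = \lambda_{i}(t)\, v^{(i)}(t)

    C(t',t) = \mathrm{Tr}\big[\Phi(t')\,\tau(t',t)\,\Phi(t)\,\tau(t,t')\big],
    \qquad \tau_{ij}(t',t) = v^{(i)\dagger}(t')\, M^{-1}\, v^{(j)}(t)

with \Box the distillation (smearing) operator, v^{(i)} the lowest N eigenvectors of the gauge-covariant 3D Laplacian (N = 64 here, with unit weights f), \tau the perambulators, and \Phi the operator structure at source and sink. The perambulators are why this is a many-right-hand-side solver task: M^{-1} is applied to every eigenvector on every timeslice.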

All-Mode Averaging - I

Chulwoo Jung, BNL
T. Blum, T. Izubuchi, E. Shintani, arXiv:1208.4349
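The estimator from the cited paper (arXiv:1208.4349): one exact solve is corrected by cheap approximate ("sloppy") solves averaged over a set G of lattice symmetries,

    \mathcal{O}^{\mathrm{AMA}} = \mathcal{O}^{\mathrm{(exact)}} - \mathcal{O}^{\mathrm{(approx)}}
      + \frac{1}{N_G} \sum_{g \in G} \mathcal{O}^{\mathrm{(approx)},\, g}

which is unbiased provided the approximation transforms covariantly under g, while the statistical error is driven down by the many cheap, averaged solves.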

All-Mode Averaging - II

Electromagnetic form factor of the proton.
Andreas Stathopoulos (AM/CS at William and Mary): development of methods for multiple right-hand sides.
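The slides do not spell out the algorithm; as a sketch of one common multiple-right-hand-side strategy (deflating every solve with low eigenmodes computed once), here is a numpy toy with an invented cg helper and a diagonal stand-in for the Dirac operator:

import numpy as np

rng = np.random.default_rng(0)
n, k, nrhs = 200, 10, 5
A = np.diag(np.linspace(0.01, 2.0, n))      # toy SPD "Dirac" operator

# One-time cost: lowest-k eigenpairs (exact here; Lanczos-type in practice).
evals, evecs = np.linalg.eigh(A)
V, lam = evecs[:, :k], evals[:k]

def cg(A, b, x0, tol=1e-10, maxit=1000):
    x = x0.copy()
    r = b - A @ x
    p, rs, it = r.copy(), r @ r, 0
    while np.sqrt(rs) > tol * np.linalg.norm(b) and it < maxit:
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        it += 1
    return x, it

for _ in range(nrhs):
    b = rng.standard_normal(n)
    x0 = V @ ((V.T @ b) / lam)      # deflated guess: exact in the low-mode space
    _, it_defl = cg(A, b, x0)
    _, it_zero = cg(A, b, np.zeros(n))
    print(f"iterations: deflated {it_defl}, undeflated {it_zero}")

The setup cost of the eigenvectors is amortized over all right-hand sides, which is exactly the regime of the analysis (capacity) stage.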

Wick Contraction Methods

K. Orginos, W. Detmold

Nuclear Physics calculations involve many contractions...
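To make "many" concrete: a naive Wick expansion pairs each source quark with a same-flavor sink quark, giving n_u! · n_d! · n_s! terms, which is why contractions, not solves, dominate nuclear correlators. A small Python illustration (function names invented):

from itertools import permutations
from math import factorial

def naive_contractions(n_u, n_d, n_s=0):
    """Naive Wick-contraction count for a baryon/nuclear interpolator."""
    return factorial(n_u) * factorial(n_d) * factorial(n_s)

def enumerate_pairings(n):
    """All ways to contract n source quarks with n sink quarks (one flavor)."""
    return list(permutations(range(n)))

print(naive_contractions(2, 1))     # proton (uud): 2 contractions
print(naive_contractions(6, 6))     # 4He (12 quarks): 518400 contractions
print(enumerate_pairings(3)[:3])    # first few pairings for 3 quarks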

Momentum-dependent Phase Shifts

• I = 2 ππ elastic scattering
• Extension to I = 1 ππ elastic scattering

[Figure: ππ elastic-scattering phase shift (0-180 degrees) vs. energy (800-1050 MeV).]

[Figure: excited-meson spectrum - isoscalar, isovector and exotic states plus YM glueball masses, negative and positive parity; mass axis 0.5-2.5.]

Hadron Structure

• Momentum fraction of quarks in the proton
• How is spin apportioned in a proton?

Three- and Four-body systems

Binding energies at the physical strange-quark mass.

[Figure: binding energies -B (MeV, 0 to -200) of two-, three- and four-body systems, grouped by strangeness s = 0, -1, -2: d, nn, nΣ, H-dibaryon, nΞ, ³He, ³ΛH, ³ΛHe, ³ΣHe, ⁴He, ⁴ΛHe, ⁴ΛΛHe, with spin-parity labels (0⁺, 1⁺, 1/2⁺, 3/2⁺).]

Chiral Transition Temperature

OUTLOOK

• Effective use of JIT to exploit accelerated architectures is the basis of work with SUPER to develop a domain-specific compiler for LQCD; apply the lessons to other domain-specific frameworks.
• Effective exploitation of Xeon Phi.
• Collaboration with FASTMath for multigrid methods.
• Lattice QCD for nuclear physics is characterised by heavy capacity requirements as well as capability requirements:
  - methods for multiple RHS
  - Wick-contraction methods
• Exciting Physics