
Page 1: UKQCD software for lattice QCD

Dr Chris Maynard, Application Consultant, EPCC

c.maynard@ed.ac.uk, +44 131 650 5077

UKQCD software for lattice QCD

P.A. Boyle, R.D. Kenway and C.M. Maynard

UKQCD collaboration

Page 2: UKQCD software for lattice QCD

AHM2008

Contents

• Motivation
• Brief introduction to QCD
  – What is the science
  – What we actually calculate
  – BYOC (build your own computer)
• UKQCD software
  – Why use more than one code base
  – UKQCD contributions to code bases

• Conclusions

Page 3

What is stuff?

Experiments such as the Large Hadron Collider (LHC) probe the structure of matter

LHC switch-on: 10 September 2008

We need a theory to interpret and understand phenomena, and to predict new ones!

Page 4

The Structure of matter

• Quarks have
  – Mass: feel gravity
  – Charge: feel electromagnetism
  – Flavour: feel the weak interaction
  – Colour: feel the strong interaction

• The strong interaction binds quarks into hadrons
  – Protons and neutrons
  – Glued together by gluons

Page 5

What are Gluons?

Gluons carry or mediate the strong interaction

Quarks feel each other’s presence by exchanging momentum via gluons (virtual spring?)

Similar to the photon in electromagnetism

Unlike the photon, gluons carry the charge of the strong interaction (colour), so they couple to themselves

Gluons are sticky!

Page 6

Introducing QCD!

• 1973: D.J. Gross, F. Wilczek and H.D. Politzer
  – Construct a quantum field theory of quarks and gluons based on a symmetry group for colour: Quantum Chromodynamics (QCD)
  – Prove QCD is asymptotically free
  – QFT for the strong interaction

• 2004: awarded the Nobel Prize
  – Are we done?

… um, not quite

Page 7

Asymptotic freedom

• At short distance/high momentum the strength of the interaction is small

• The converse is infrared slavery

• At low momentum the coupling is strong
  – Quarks are confined in hadrons
  – Proton mass is ~1 GeV

• Our analytic tool (perturbation theory) only works for small interactions

[Figure: a Feynman diagram]
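For orientation, the standard one-loop form of the running strong coupling (textbook background rather than something shown on the slide) makes the two regimes explicit; here $Q$ is the momentum scale, $n_f$ the number of active quark flavours and $\Lambda_{\mathrm{QCD}}$ the QCD scale:

$$\alpha_s(Q^2) \simeq \frac{4\pi}{\beta_0 \ln\left(Q^2/\Lambda_{\mathrm{QCD}}^2\right)}, \qquad \beta_0 = 11 - \frac{2}{3} n_f$$

For $n_f \le 16$ we have $\beta_0 > 0$, so $\alpha_s$ falls logarithmically at large $Q$ (asymptotic freedom) and grows as $Q$ approaches $\Lambda_{\mathrm{QCD}}$, where this perturbative formula itself stops being trustworthy.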

Page 8

Quarks and gluons on a lattice

• Replace 4d space-time with a grid
  – Lattice spacing a

• Quark fields ψ(x) on sites
  – 4 component spinor (complex vector) on each site

• Gluon fields U(x) on links
  – 3x3 complex matrix on each link

• Equations of motion are partial differential equations
  – Replace with finite differences
  – Large (size ∝ volume), sparse matrix (the fermion matrix)
  – Contains the quark-gluon coupling
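A minimal sketch of how such a grid can be addressed in code, assuming a lexicographic site ordering with periodic boundaries. The struct and its member names are hypothetical, not taken from CPS or qdp++:

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: a periodic 4D lattice with lexicographic site ordering.
struct Lattice4D {
    std::array<int, 4> L;  // extent in each direction (x, y, z, t)

    int volume() const { return L[0] * L[1] * L[2] * L[3]; }

    // Coordinates -> site index, with x[0] running fastest.
    int index(const std::array<int, 4>& x) const {
        return x[0] + L[0] * (x[1] + L[1] * (x[2] + L[2] * x[3]));
    }

    // Site index -> coordinates (inverse of index()).
    std::array<int, 4> coords(int site) const {
        std::array<int, 4> x{};
        for (int mu = 0; mu < 4; ++mu) {
            x[mu] = site % L[mu];
            site /= L[mu];
        }
        return x;
    }

    // Neighbour one step forward (sign = +1) or backward (sign = -1) in
    // direction mu, wrapping around periodically.
    int neighbour(int site, int mu, int sign) const {
        assert(sign == 1 || sign == -1);
        std::array<int, 4> x = coords(site);
        x[mu] = (x[mu] + sign + L[mu]) % L[mu];
        return index(x);
    }
};
```

With this layout a quark field needs 12 complex numbers per site (4 spin × 3 colour components) and a gluon field one 3x3 complex matrix per link, i.e. four per site.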

Page 9

Numerical computation

• Infinite-dimensional path integral → high-dimensional sum
  – Hybrid Monte Carlo (HMC) and variants
  – Update quark and gluon fields
  – Invert the fermion matrix at each update
  – Krylov subspace methods: conjugate gradient

• Generate many paths: many gluon field configurations

• Compute (or measure) quantities of interest on each configuration
  – Invert the fermion matrix

• Average over all configurations
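The Krylov solver step can be sketched as a textbook conjugate gradient. This is a minimal, matrix-free version that assumes the operator is symmetric positive definite (in practice one solves with $M^\dagger M$ rather than $M$); the names are illustrative and this is not the CPS or Chroma solver:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// Matrix-free operator y = A x; the matrix itself is never stored.
using Op = std::function<void(const Vec&, Vec&)>;

// Dot product; on a parallel machine this becomes a global sum.
double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient for symmetric positive-definite A.
// x holds the initial guess on entry and the solution on exit.
// Stops when |r| <= tol * |b|; returns the number of iterations.
int conjugate_gradient(const Op& A, const Vec& b, Vec& x, double tol, int max_iter) {
    assert(b.size() == x.size());
    const std::size_t n = b.size();
    Vec r(n), p(n), Ap(n);
    A(x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    p = r;
    double rr = dot(r, r);
    const double target = tol * tol * dot(b, b);
    int k = 0;
    while (rr > target && k < max_iter) {
        A(p, Ap);                                // one matrix-vector product
        const double alpha = rr / dot(p, Ap);    // global sum
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);         // global sum
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++k;
    }
    return k;
}
```

Only operator applications and dot products appear, which is why the solver cost is dominated by the matrix-vector product and the global sums.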

Page 10

Why lattice QCD is hard

• Fermion matrix is badly conditioned:

$$C_N = \frac{\lambda_{\max}}{\lambda_{\min}} \sim \frac{1}{m_q}$$

• Up and down quarks are nearly massless

• Statistical uncertainty
  – Require at least $O(10^5)$ MC updates
  – $N \sim O(10^2)$ configurations give a 1–5% statistical error for basic quantities

• Systematic uncertainty
  – Several quark masses (chiral limit)
  – A bigger box is required for lighter masses
  – 2 or more volumes ($10^5$ to $10^7$ sites) and lattice spacings

• Cost scales badly with problem size
  – As $a^{-6}$ or $a^{-7}$, and at least as $1/m_q$
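To see why small masses hurt, here is a toy (an assumption, not the fermion matrix): the free 1D lattice operator $-\Delta + m^2$ with periodic boundaries, whose eigenvalues are $m^2 + 4\sin^2(\pi k/L)$. For even $L$ its condition number is $1 + 4/m^2$, growing like $1/m^2$, which is the scaling expected for the normal operator $M^\dagger M$ when $C_N(M) \sim 1/m_q$:

```cpp
#include <cassert>
#include <cmath>

// Toy model (an assumption, not the QCD fermion matrix): A = -Laplacian + m^2
// on an L-site periodic 1D lattice, with eigenvalues m^2 + 4 sin^2(pi k / L).
// Returns the condition number lambda_max / lambda_min.
double toy_condition_number(int L, double m) {
    assert(L > 0 && m > 0.0);
    const double pi = std::acos(-1.0);
    double lmin = m * m + 4.0, lmax = 0.0;  // all eigenvalues lie in [m^2, m^2 + 4]
    for (int k = 0; k < L; ++k) {
        const double s = std::sin(pi * k / L);
        const double lambda = m * m + 4.0 * s * s;
        if (lambda < lmin) lmin = lambda;
        if (lambda > lmax) lmax = lambda;
    }
    return lmax / lmin;
}
```

Since the number of CG iterations grows roughly like $\sqrt{C_N}$, halving the mass in this toy roughly doubles the iteration count.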

Page 11

The bottom line

• Need to invert matrices which are
  – Very large: $\sim O(10^7)$
  – Really badly conditioned: $C_N \sim O(10^4)$ or more

• Many, many times: $\sim O(10^6)$

Page 12

Quarks and gluons on a computer

• Interaction is local
  – Nearest-neighbour interactions

• Parallel decomposition
  – Sub-volume of the lattice on each processor
  – Simple communication pattern (halo swap)
  – Regular communication pattern
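The halo swap can be simulated serially to show the idea: a periodic 1D field is split into sub-volumes, each stored with one ghost site on either side, and after the swap every nearest-neighbour stencil can be evaluated from local data alone. This is a hypothetical sketch; on a real machine the copies between "ranks" would be messages (e.g. MPI) or, on QCDOC, DMA transfers over the serial links:

```cpp
#include <cassert>
#include <vector>

using Local = std::vector<double>;  // [left halo | n interior sites | right halo]

// Split a periodic global 1D field over P "ranks" (serial stand-in for a machine).
std::vector<Local> decompose(const std::vector<double>& global, int P) {
    assert(P > 0 && global.size() % P == 0);
    const int n = static_cast<int>(global.size()) / P;
    std::vector<Local> local(P, Local(n + 2, 0.0));
    for (int p = 0; p < P; ++p)
        for (int i = 0; i < n; ++i) local[p][i + 1] = global[p * n + i];
    return local;
}

// Fill every rank's halos from its neighbours' interior boundary sites.
// Only interior sites are read and only halo sites are written.
void halo_swap(std::vector<Local>& local) {
    const int P = static_cast<int>(local.size());
    const int n = static_cast<int>(local[0].size()) - 2;
    for (int p = 0; p < P; ++p) {
        const int left = (p - 1 + P) % P;
        const int right = (p + 1) % P;
        local[p][0] = local[left][n];       // last interior site of left neighbour
        local[p][n + 1] = local[right][1];  // first interior site of right neighbour
    }
}

// Nearest-neighbour stencil (1D lattice Laplacian) using only local data.
std::vector<double> laplacian_local(const Local& buf) {
    const int n = static_cast<int>(buf.size()) - 2;
    std::vector<double> out(n);
    for (int i = 1; i <= n; ++i) out[i - 1] = buf[i - 1] - 2.0 * buf[i] + buf[i + 1];
    return out;
}
```

Only the two boundary sites move per rank, however large the sub-volume, which is what makes the communication pattern simple and regular.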

Page 13

Anatomy of a calculation

• Gluon and quark fields are distributed
  – The fermion matrix is not (it is never stored explicitly)
  – Exploit local interactions (sparsity) when evaluating matrix-vector operations
  – Matrix-vector is M(x,y;U)·ψ(y)
  – Colour matrix U(x) is small and dense, not split across PEs

• Dominated by matrix-vector products and global sums in the iterative solver
  – Double-precision floating point

• Computation versus communication
  – Smaller local volume
  – Greater proportion of data “near” the processor
  – More communication

• Machine limiting factors
  – Memory bandwidth
  – Comms latency and bandwidth

• QCD is ideally suited to MPP machines
  – Build yer own?
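A sketch of a matrix-free application of M(x,y;U)ψ(y), assuming a toy 1D nearest-neighbour hopping operator with spin omitted, $(M\psi)(x) = \psi(x) - \kappa[U(x)\psi(x+1) + U^\dagger(x-1)\psi(x-1)]$. Only the fields are stored, and each dense 3x3 colour matrix stays whole on one processor. The operator and function names are hypothetical, not the CPS or qdp++ kernels:

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;
using ColourVector = std::array<cplx, 3>;
using ColourMatrix = std::array<std::array<cplx, 3>, 3>;

// Small dense 3x3 complex matrix times colour vector: U v.
ColourVector mul(const ColourMatrix& U, const ColourVector& v) {
    ColourVector r{};
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b) r[a] += U[a][b] * v[b];
    return r;
}

// Conjugate transpose times colour vector: U^dagger v.
ColourVector mul_adj(const ColourMatrix& U, const ColourVector& v) {
    ColourVector r{};
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b) r[a] += std::conj(U[b][a]) * v[b];
    return r;
}

// Toy periodic 1D hopping operator (illustrative assumption, spin omitted):
// (M psi)(x) = psi(x) - kappa [ U(x) psi(x+1) + U^dagger(x-1) psi(x-1) ].
std::vector<ColourVector> apply_M(const std::vector<ColourMatrix>& U,
                                  const std::vector<ColourVector>& psi, double kappa) {
    assert(U.size() == psi.size());
    const int L = static_cast<int>(psi.size());
    std::vector<ColourVector> out(L);
    for (int x = 0; x < L; ++x) {
        const int xp = (x + 1) % L, xm = (x - 1 + L) % L;  // nearest neighbours only
        const ColourVector fwd = mul(U[x], psi[xp]);
        const ColourVector bwd = mul_adj(U[xm], psi[xm]);
        for (int a = 0; a < 3; ++a) out[x][a] = psi[x][a] - kappa * (fwd[a] + bwd[a]);
    }
    return out;
}
```

Per site this is a handful of small dense products with little data reuse, which is why memory bandwidth shows up among the machine limiting factors.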

Page 14

QCD-on-a-chip (14K ASIC)

• ASIC from IBM technology library
• PowerPC 440 embedded CPU core
• 64-bit FPU: one FMA per cycle
• 4 MB fast embedded DRAM
• On-chip memory and Ethernet controller
• Custom design
• High-speed serial links (DMA)
• Prefetching EDRAM controller
• Bootable Ethernet JTAG interface
• 400 MHz, so peak is 0.8 Gflop/s
• Network is a 6d torus of nearest neighbours
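The peak figure follows from counting each fused multiply-add as two floating-point operations:

$$400 \times 10^{6}\ \text{cycles/s} \times 1\ \text{FMA/cycle} \times 2\ \text{flop/FMA} = 0.8\ \text{Gflop/s}$$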

Page 15

QCDOC performance

Saturates single-link bandwidth even for small packet sizes

Low latency

Good for small local volumes

Global volume $16^4$

$2^2 \times 4^2$ local volume on 1K PEs

Super-linear scaling as data goes “on-chip”

Linear thereafter
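As a check on the quoted numbers, spreading the $16^4$ global volume over 1024 processing elements leaves 64 sites each, i.e. a $2^2 \times 4^2$ local volume. One processor grid consistent with this (an assumption; the slide does not give it) is $8 \times 8 \times 4 \times 4$:

$$\frac{16^4}{1024} = \frac{65536}{1024} = 64 = 2^2 \times 4^2, \qquad \frac{16}{8}\cdot\frac{16}{8}\cdot\frac{16}{4}\cdot\frac{16}{4} = 2 \cdot 2 \cdot 4 \cdot 4$$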

Page 16

UKQCD collaboration

[Figure: map of UKQCD member sites: Edinburgh, Glasgow, Liverpool, Cambridge, Oxford, Southampton, Swansea, Plymouth]

• 8 UK universities
  – Plymouth joined in 2007

• Prior to the QCDOC era (up to 2002)
  – Consensus on the form of calculation
  – Collaboration-owned FORTRAN code
  – Assembler kernel for performance on Cray T3D/T3E

• QCDOC era
  – Several (3) different calculations
  – Each sub-group collaborates internationally
  – Two open-source C++ codes: CPS and Chroma
  – Assembler kernels for performance

Page 17

SciDAC

• US DoE program
  – Funds all US groups
  – Hardware and software
  – USQCD

• Code development
  – Common code environment

• UKQCD actively collaborates with USQCD
  – Sub-project by sub-project

• USQCD and UKQCD organisations are funding-based
  – Lateral collaboration based on science!
  – Collaborate on software module development

Page 18

CPS before QCDOC

• Developed by Columbia University (CU) for the QCDSP machine
  – QCDSP is the ancestor of QCDOC
  – Originally not ANSI C++ code
  – Many QCDSP-specific features
  – Not readily portable
  – Belongs to CU developers

• UKQCD chose this code base
  – Building your own machine is a big risk
  – The CPS code base was the most likely to run on QCDOC from day 1
  – Does have the required functionality

• EPCC project to ANSI-fy the code
  – The code then ran correctly (if slowly) everywhere else

Page 19

UKQCD contribution to CPS

• Assembler version of the key kernel
  – P.A. Boyle, via the BAGEL assembler generator (see later)

• UKQCD develops the new Rational Hybrid Monte Carlo (RHMC) algorithm
  – Implemented and tested in CPS (Clark and Kennedy)
  – The new algorithm has many parameters
  – Tuning and testing is a non-trivial task

• CU + BNL + RBRC (RBC) + UKQCD new physics project
  – 2+1 flavour domain wall fermions (DWF): degenerate up and down quarks plus a strange quark

• UKQCD contributes to the AsqTad 2+1 flavour project
  – Other contributors in the USA (MILC)

Page 20

RHMC

• The HMC algorithm evolves 2 degenerate flavours (M is the fermion matrix)
  – Quark fields are anti-commuting Grassmann variables

• Take the square root to do one flavour

• Approximate the square root by a rational function

• Roots and poles: the shifted systems are solved with a multi-shift solver

• Terms with the largest contribution to the fermion force
  – Change the MC update the most
  – Cost the least to compute

• Change the CG tolerance
  – Loosen CG for terms which contribute least
  – Can reduce the CG count by 2x

• Keep the algorithm exact with a Metropolis accept/reject step
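One flavour needs $\det(M^\dagger M)^{1/2}$, i.e. a pseudofermion action $\phi^\dagger (M^\dagger M)^{-1/2}\phi$, and the inverse square root is replaced by a rational function in partial-fraction form, $r(x) = \sum_k \alpha_k/(x + \beta_k)$. The sketch below only illustrates that structure. It builds the poles from a trapezoidal rule for $x^{-1/2} = \frac{2}{\pi}\int_{-\infty}^{\infty} e^{s}/(e^{2s}+x)\,ds$, a swapped-in technique and not the optimal (Remez/Zolotarev) approximation used in production RHMC, and it applies $r(A)$ to a diagonal matrix so that each shifted solve is trivial:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// r(x) = sum_k alpha[k] / (x + beta[k]), approximating x^(-1/2) for x > 0.
struct PartialFraction {
    std::vector<double> alpha;  // residues
    std::vector<double> beta;   // shifts: poles sit at x = -beta[k]
};

// Trapezoidal rule with step h on s in [-K h, K h] applied to
// x^(-1/2) = (2/pi) * integral e^s / (e^{2s} + x) ds  (illustrative, not optimal).
PartialFraction inv_sqrt_approx(double h, int K) {
    assert(h > 0.0 && K > 0);
    const double pi = std::acos(-1.0);
    PartialFraction r;
    for (int k = -K; k <= K; ++k) {
        const double s = k * h;
        r.alpha.push_back(2.0 / pi * h * std::exp(s));
        r.beta.push_back(std::exp(2.0 * s));
    }
    return r;
}

// Apply r(A) to v for a diagonal A with eigenvalues d.
// Each term is one shifted solve (A + beta_k)^{-1} v.
std::vector<double> apply_rational(const PartialFraction& r, const std::vector<double>& d,
                                   const std::vector<double>& v) {
    assert(d.size() == v.size());
    std::vector<double> out(v.size(), 0.0);
    for (std::size_t k = 0; k < r.alpha.size(); ++k)
        for (std::size_t i = 0; i < v.size(); ++i)
            out[i] += r.alpha[k] * v[i] / (d[i] + r.beta[k]);
    return out;
}
```

For a real fermion matrix each term requires solving $(M^\dagger M + \beta_k)\chi_k = \phi$. Because all the shifted systems share one Krylov space, a multi-shift solver obtains every $\chi_k$ at roughly the cost of a single solve with the smallest shift.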

Page 21

Implementing multi-timescale RHMC

• The RHMC nth root can be used to implement algorithmic tricks

• Multiple pseudofermions are better

• Mass preconditioning

• Multiple timescales
  – Gluon, triple strange, light
  – Allows a larger overall step size with good acceptance

• Higher-order integration schemes

• RHMC algorithm 5–10 times faster
  – Binaries frozen since March 2006
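Multiple timescales can be illustrated with a nested (Sexton–Weingarten style) leapfrog integrator on a toy one-dimensional Hamiltonian, an assumption for illustration only. The cheap, stiff force (standing in for the gluon force) is integrated with a fine step inside each coarse step of the expensive, soft force (standing in for the light-quark force). The integrator is symmetric, hence reversible, which is what keeps HMC exact once the Metropolis step is added:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Toy system (assumption): H = p^2/2 + V_fast(q) + V_slow(q), force = -dV/dq.
using Force = std::function<double(double)>;

struct State {
    double q, p;
};

// One trajectory of length tau: n_outer coarse steps for f_slow, each
// containing n_inner fine leapfrog steps for f_fast.
State nested_leapfrog(State s, double tau, int n_outer, int n_inner,
                      const Force& f_fast, const Force& f_slow) {
    assert(n_outer > 0 && n_inner > 0);
    const double dt = tau / n_outer;  // coarse step: expensive force evaluated rarely
    const double h = dt / n_inner;    // fine step: cheap force evaluated often
    for (int i = 0; i < n_outer; ++i) {
        s.p += 0.5 * dt * f_slow(s.q);       // half kick with the slow force
        for (int j = 0; j < n_inner; ++j) {  // inner leapfrog with the fast force
            s.p += 0.5 * h * f_fast(s.q);
            s.q += h * s.p;
            s.p += 0.5 * h * f_fast(s.q);
        }
        s.p += 0.5 * dt * f_slow(s.q);       // closing half kick with the slow force
    }
    return s;
}
```

In this toy the slow force is evaluated 20 times per trajectory of 10 coarse steps against 400 evaluations of the fast one, yet the energy error stays small because the stiff part is resolved on the fine scale.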

Page 22

CPS: good and bad

• CPS is written around the target (QCDOC) hardware

• The code base runs (correctly) on the target hardware
  – Helps reduce risk when building your own machine
  – Includes much of the requisite functionality

• Adoption of CPS allowed UKQCD to focus on its strength
  – Very successful algorithmic development
  – Based on direct collaboration with RBC

• Still need to do measurements
  – Invert the fermion matrix (quark propagators) on gluon configurations
  – Do measurements on different architectures

Page 23

Chroma/qdp++

• Open-source C++ code base
  – Used by many different groups worldwide

• Multi-platform by design

• Highly modular, layered design
  – QMP: wrapper around a message-passing library, e.g. MPI
  – QDP++: builds lattice-valued physics data objects and manipulation methods
    – Hides the message-passing layer
    – Allows “under-the-hood” optimisations by expert developers
    – Includes IO
  – Chroma: the physics library
    – Rich physics functionality

• UKQCD has historical links with the main developers

Page 24

qdp++ :: plaquette example

    multi1d<LatticeColorMatrix> u(Nd);   // gauge links, one field per direction
    Double w_plaq;
    w_plaq = zero;

    for(int mu=1; mu < Nd; ++mu){
      for(int nu=0; nu < mu; ++nu){
        // U_nu(x+mu) * U_mu^dagger(x+nu)
        LatticeColorMatrix tmp_0 = shift(u[nu],FORWARD,mu) * adj(shift(u[mu],FORWARD,nu));
        // ... * U_nu^dagger(x)
        LatticeColorMatrix tmp_1 = tmp_0 * adj(u[nu]);
        // sum over all sites of Re Tr[U_mu(x) * tmp_1(x)]
        Double tmp = sum(real(trace(u[mu]*tmp_1)));
        w_plaq += tmp;
      }
    }

Slide callouts: lattice-valued data objects; manipulation methods

Page 25

qdp++ :: Abstraction

• Data objects are lattice-valued
  – No site index
  – No explicit sum over the index

• Linear algebra is encoded
  – The code knows how to multiply 3x3 matrices together

• This has two consequences:

• An expert HPC developer can modify the implementation
  – Optimisation, parallelism, architecture features
  – The interface remains the same

• The application developer (physicist) writes code which looks like maths!
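The abstraction can be mimicked in a few lines of plain C++. This is a hypothetical toy, not qdp++: a U(1) field on a periodic 2D lattice stands in for LatticeColorMatrix, and the names Field, shift, adj and sum merely echo the qdp++ ones. The point is that the plaquette function contains no site index; the loops live in the operators, which an expert could rewrite (threads, halo swaps, assembler) without touching it:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Toy lattice-valued field: one U(1) phase per site of a periodic Lx x Ly lattice.
struct Field {
    int Lx, Ly;
    std::vector<cplx> v;
    Field(int lx, int ly, cplx init = 1.0) : Lx(lx), Ly(ly), v(lx * ly, init) {}
    cplx& at(int x, int y) { return v[x + Lx * y]; }
    const cplx& at(int x, int y) const { return v[x + Lx * y]; }
};

// Site-wise product: the linear algebra lives here, not in the physics code.
Field operator*(const Field& a, const Field& b) {
    Field r(a.Lx, a.Ly);
    for (std::size_t i = 0; i < r.v.size(); ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Site-wise adjoint (complex conjugate for U(1)).
Field adj(const Field& a) {
    Field r(a.Lx, a.Ly);
    for (std::size_t i = 0; i < r.v.size(); ++i) r.v[i] = std::conj(a.v[i]);
    return r;
}

// shift(a, mu)(x) = a(x + mu_hat), periodic; a parallel version would halo-swap here.
Field shift(const Field& a, int mu) {
    Field r(a.Lx, a.Ly);
    for (int y = 0; y < a.Ly; ++y)
        for (int x = 0; x < a.Lx; ++x)
            r.at(x, y) = (mu == 0) ? a.at((x + 1) % a.Lx, y) : a.at(x, (y + 1) % a.Ly);
    return r;
}

// Sum over all sites (a global reduction on a parallel machine).
cplx sum(const Field& a) {
    cplx s = 0.0;
    for (const cplx& z : a.v) s += z;
    return s;
}

// Re sum_x U_0(x) U_1(x+0) U_0^*(x+1) U_1^*(x): no site index in sight.
double plaquette(const Field& u0, const Field& u1) {
    Field tmp = u0 * shift(u1, 0) * adj(shift(u0, 1)) * adj(u1);
    return sum(tmp).real();
}
```

A unit gauge field, or any pure gauge transform of it, gives one unit of plaquette per site, which makes a convenient check that the shifts point the right way.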

Page 26

qdp :: Code like maths

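For comparison, the plaquette code on page 24 accumulates into w_plaq the following sum over all sites $x$ and planes $\nu < \mu$, with shift(u[nu],FORWARD,mu) playing the role of $U_\nu(x+\hat\mu)$, adj the role of $\dagger$, and sum the sum over $x$:

$$W_{\text{plaq}} = \sum_{x} \sum_{\mu=1}^{N_d-1} \sum_{\nu=0}^{\mu-1} \operatorname{Re}\operatorname{Tr}\left[ U_\mu(x)\, U_\nu(x+\hat\mu)\, U_\mu^\dagger(x+\hat\nu)\, U_\nu^\dagger(x) \right]$$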


Page 29

Chroma: potential downside

• At the time of QCDOC development, Chroma didn't have RHMC functionality

• Heavy use of C++ templates can defeat some compilers
  – Stuck with GNU compilers
  – Code is very advanced C++, not easy for beginners

• Main program driven by XML input files
  – All objects created on the fly
  – Requires a lot of functions to be registered
  – QCDOC has small memory (especially .text)

• Chroma fails to compile on QCDOC
  – Runs out of .text segment
  – Physics library compiles OK

Page 30

UKhadron

• Old-style main program
  – Calls the qdp++ and Chroma libraries
  – Harnesses the power of qdp++
  – Focused on UKQCD physics requirements
  – Most of the measurement code for the DWF project
  – Iterative solvers

• Pros
  – Runs on QCDOC and everywhere else
  – Control over the code: small group of developers
  – Can build integrated analysis code on top of qdp++

• Cons
  – Compiling can be a headache!
  – UKhadron requires specific versions of qdp++/Chroma
  – Which require specific versions of GNU compilers and libxml2

Page 31

BAGEL

• Assembler generator written by Peter Boyle
  – http://www.ph.ed.ac.uk/~paboyle/bagel/Bagel.html

• Composed of two parts
  – A library with which one can program a generic RISC assembler kernel
  – A set of programs that use the library to produce key QCD and linear algebra operations
  – The generator is retargetable; key targets are ppc440, bgl and powerIII

• Allows kernels to run at up to 50% of peak on the target architecture

Page 32

Modular Build

[Figure: module stack: QMP, libxml2 and the Bagel library; Bagel apps (bagel qdp, bagel wilson dslash); qdp++; Chroma; UKhadron]

• Both a blessing and a curse
  – Allows for modular, independent code development
  – Plug in performance code
  – Highly portable performance
  – Module version and compiler version dependence can be a problem

Can plug in other kernels, e.g. an SSE wilsonDslash

Page 33

Future

• The fastest machines are now BlueGene/P
  – Multicore

• Cray XT4/BlackWidow
  – Multicore/vector machine

• Multi-threading in qdp++: mixed-mode code
  – Shared memory intra-node
  – Message passing inter-node

• PGAS languages?
  – UPC/Co-Array Fortran/Chapel/Fortress/X10?

• Hardware designed for QCD (BlueGene/Q)
  – Performance kernel libraries

Page 34

Physics

• CPS on QCDOC: gluon configurations

• UKhadron on QCDOC + BlueGene/L + HECToR: correlation functions

• Implemented “twisted BC” (twisted boundary conditions) in UKhadron: new calculation

• World’s best calculation of the charge radius of the pion

• Can determine CKM matrix elements
  – Tests the Standard Model at the LHC
  – P.A. Boyle et al., JHEP 07 (2008) 112

[Figure: lattice QCD data (twisted BC) compared with experimental data]

Page 35

Conclusions

• QCD is a very complex problem

• Software is performance-critical
  – Very complicated

• UKQCD operates in a complex and changing
  – Collaborative environment, internally and externally
  – Hardware regime

• Complex and evolving strategy
  – Allows maximum flexibility
  – Gets the science done!