HPC 101 HEAnet National Conference 2016 Paddy Doyle Senior Sysadmin – Research IT / TCHPC (IT Services) Date 2016-11-03

Title — Calibri Bold 26pt - HEAnet › 2016 › files › 240 › LT10 HPC101.pdf · 2016-11-21 · Trinity College Dublin, The University of Dublin Big Picture Large industry –Circa



HPC 101

HEAnet National Conference 2016

Paddy Doyle

Senior Sysadmin – Research IT / TCHPC (IT Services)

Date 2016-11-03


Trinity College Dublin, The University of Dublin


Brief Overview

Big picture

Motivation for HPC

What that means for software

What that means for hardware

Typical day as a HPC sysadmin


Big Picture

Large industry

– Circa $10 billion annual spend

Major vendors

– HP, IBM, Dell, SGI, Fujitsu, Intel

Largest HPC systems:

– 10,000,000+ CPU cores

– Many 10,000s of nodes

– 100s of cabinets

– 15MW of power!

High Performance Computing in numbers


Measuring Performance: Top500.org

High Performance LINPACK benchmark

– Dense linear algebra

FLOPS: FLoating-point Operations Per Second

List of most powerful machines

Machine                  Performance   FLOPS
Typical PC               100 GFLOPS    100,000,000,000
Sunway TaihuLight (#1)   93 PFLOPS     93,000,000,000,000,000
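
To make the gap between the two rows concrete, here is a minimal sketch of the arithmetic in Python. The two figures come from the table; the prefix table and the helper name `to_flops` are introduced here for illustration only.

```python
# SI prefixes used for FLOPS figures, as powers of ten.
PREFIXES = {"G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def to_flops(value, prefix):
    """Convert a figure such as (93, "P") into a plain FLOPS count."""
    return value * PREFIXES[prefix]


typical_pc = to_flops(100, "G")   # 100 GFLOPS, from the table
taihulight = to_flops(93, "P")    # 93 PFLOPS, from the table

print(f"Typical PC:        {typical_pc:,} FLOPS")
print(f"Sunway TaihuLight: {taihulight:,} FLOPS")
print(f"Ratio:             {taihulight // typical_pc:,}x")
```

On these numbers the #1 machine delivers roughly 930,000 times the LINPACK performance of a typical PC.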


Top 500 Performance Development

Currently Peta-scale; when will we reach Exa-scale?


Motivation for HPC

Bigger:

– memory-bound problems

Faster:

– CPU-bound problems

“HPC is the art of getting bigger things done faster” – D. Frost


What that means for software

Parallel languages and libraries

– MPI, OpenMP, CUDA, OpenCL, PGAS

– BLAS, MKL, ATLAS, FFTW, Boost, PLASMA, PETSc

System administration

– Resource manager, queuing system

– Uniform environments

– Parallel filesystem (100s or 1000s of client nodes)

Software must communicate between cores and compute nodes
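
The pattern behind "software must communicate" can be sketched in a few lines: split the data, let each worker compute a partial result, then combine the partials. The sketch below uses threads from the Python standard library as a stand-in for MPI ranks or OpenMP threads; it is not MPI, and Python threads will not speed up CPU-bound work. The function names and the worker count are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done independently by one worker on its share of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    # "Scatter": give each worker one strided slice of the data.
    chunks = [data[i::workers] for i in range(workers)]
    # Each worker computes its partial result independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # "Reduce": combine the partial results into a single answer.
    return sum(partials)


result = parallel_sum_of_squares(list(range(1000)))
print(result)
```

On a real cluster the scatter and reduce steps cross the network between compute nodes, which is why the interconnect matters as much as the cores.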


What that means for hardware

Specialised hardware vs commodity servers

– Cray, IBM BlueGene

CPU: many-core, larger caches

Accelerator cards:

– GPGPU, Intel Xeon Phi

High-speed, low-latency networks

– InfiniBand (40, 56, 96 Gb/s; <1 µs)

– Topologies: fat-tree, torus

Parallel filesystem

– Fast spinning disk, flash drives, hierarchies

Many cores, fast networking
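
A rough way to see why both latency and bandwidth matter is the simple model time = latency + size / bandwidth. The sketch below applies it to the InfiniBand rates listed above. It assumes a latency of 1 µs (the slide only gives this as an upper bound) and ignores protocol and encoding overhead; the function name is hypothetical.

```python
def transfer_time_us(message_bytes, bandwidth_gbps, latency_us=1.0):
    """Estimate the time in microseconds to send one message.

    Simple model: fixed latency plus size divided by bandwidth.
    """
    bits = message_bytes * 8
    # 1 Gb/s is 1000 bits per microsecond.
    return latency_us + bits / (bandwidth_gbps * 1000)


# Small, medium and large messages at the three link rates.
for size in (8, 64 * 1024, 16 * 1024 * 1024):
    for rate in (40, 56, 96):
        t = transfer_time_us(size, rate)
        print(f"{size:>10} bytes at {rate} Gb/s: {t:10.2f} us")
```

For small messages the latency term dominates, so a faster link barely helps; for large messages the bandwidth term dominates.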


Typical day as a HPC sysadmin

[Occasionally] design, rack, install, provision new systems

What software do researchers need?

– ‘yum install’ or ‘./configure; make’

– Build gcc-6.2.0, then openmpi-2.0.1 using gcc, then boost-1.62 using both, THEN try to compile their software

– Compile scientific software (sometimes without Makefiles)

– Complex software stack!

Node / queue / network: health checks and auto-remediation

Tweak provisioning config (Salt, Ansible, Puppet etc)

“Why did my job fail?”
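
The build chain mentioned above (gcc, then openmpi using gcc, then boost using both, then the researcher's software) is a dependency graph, and the build order is a topological sort of it. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) follows; the package versions are those on the slide, while `research-app` is a hypothetical placeholder for the researcher's code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each package maps to the set of packages that must be built before it.
dependencies = {
    "gcc-6.2.0": set(),
    "openmpi-2.0.1": {"gcc-6.2.0"},
    "boost-1.62": {"gcc-6.2.0", "openmpi-2.0.1"},
    "research-app": {"gcc-6.2.0", "openmpi-2.0.1", "boost-1.62"},  # hypothetical
}

# static_order() yields every package after all of its prerequisites.
build_order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(build_order))
```

With a real software stack the graph is much wider, and `TopologicalSorter` raises `CycleError` if two packages end up depending on each other.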


Thank You

