Introduction to High-Performance Computing

Dr. Axel Kohlmeyer
Scientific Computing Expert
Information and Telecommunication Section
The Abdus Salam International Centre for Theoretical Physics
http://sites.google.com/site/akohlmey/
[email protected]

Introduction to High-Performance Computing · 2012-11-05 · HPC Introduction 5 An HPC Cluster is... A cluster needs: Several computers, nodes, often in special cases for easy mounting



Why use Computers in Science?

● Use complex theories without a closed solution: solve equations or problems that can only be solved numerically, i.e. by inserting numbers into expressions and analyzing the results

● Do “impossible” experiments: study (virtual) experiments where the boundary conditions are inaccessible or not controllable

● Benchmark correctness of models and theories: the better a model/theory reproduces known experimental results, the better its predictions

What is High-Performance Computing (HPC)?

● Definition depends on the individual person
  > HPC is when I care how fast I get an answer

● Thus HPC can happen on:
  ● A workstation, desktop, laptop, smartphone!
  ● A supercomputer
  ● A Linux/MacOS/Windows/... cluster
  ● A grid or a cloud
  ● Cyberinfrastructure = any combination of the above

● HPC also means High-Productivity Computing

Parallel Workstation

● Most computers today are parallel workstations
  => multi-core processors

● Running a Linux OS (or MacOS X) allows programming as on a traditional Unix workstation

● All processors have access to all memory
  ● Uniform memory access (UMA): 1 memory pool for all, same speed for all
  ● Non-uniform memory access (NUMA): multiple pools, speed depends on “distance”

An HPC Cluster is...

● A cluster needs:
  ● Several computers, nodes, often in special cases for easy mounting in a rack
  ● One or more networks (interconnects) to hook the nodes together
  ● Software that allows the nodes to communicate with each other (e.g. MPI)
  ● Software that reserves resources for individual users

● A cluster is: all of those components working together to form one big computer

What is Grid Computing?

● Loosely coupled network of compute resources
● Needs a “middleware” for transparent access to inhomogeneous resources, find matching ones
● Modeled after power grid
  => share resources not needed right now
● Run a global authentication framework
  => Globus, Unicore, Condor, Boinc
● Run an application specific client
  => SETI@home, Folding@home

What is Cloud Computing?

● Simplified: “Grid computing made easy”

● Grid: use “job description” to match calculation request to a suitable available host, use “distinguished name” to uniquely identify users, opportunistic resource management

● Cloud: provide virtual server instance on shared resource as needed with custom OS image, commercialization (cloud service providers, dedicated or spare server resources), physical location flexible, web frontend

What is Supercomputing (SC)?

● The most visible manifestation of HPC
● Programs run on the fastest and largest computers in the world (=> Top500 List)
● Desktop vs. Supercomputer in 2012 (peak):
  ● Desktop processor (1 core): ~10 GigaFLOP/s
  ● Tesla C2050 GPU (448 cores): >500 GigaFLOP/s
  ● “K” supercomputer: >10 PetaFLOP/s

● Sustained vs. peak: “K” 93%, “Jaguar” 75%, “Nebulae” 43%, “Roadrunner” 76%, BG/P 82%


Why would HPC matter to you?

● Scientific computing is becoming more important in many research disciplines

● Problems become more complex, need teams of researchers with diverse expertise

● Scientific (HPC) application development is often limited by lack of training

● More knowledge about HPC leads to more effective use of HPC resources and better interactions with (computational) colleagues

Research Disciplines in HPC

● Molecular Biosciences: 31%
● Chemistry: 17%
● Physics: 17%
● Astronomical Sciences: 12%
● Materials Research: 6%
● Chemical, Thermal Systems: 5%
● All 19 Others: 4%
● Earth Sciences: 3%
● Atmospheric Sciences: 3%
● Advanced Scientific Computing: 2%

Some Examples...

● Simulation of physical phenomena:
  ● Climate modeling
  ● Galaxy formation

● Data mining
  ● Gene sequencing
  ● Detecting potential tornados

● Visualization
  ● Reducing large data sets into pictures a scientist understands

[Figure: tornadic storm, Moore, OK]


Why Would I Need HPC?

● My problem is big

● My problem is complex

● My computer is too small and too slow

● My software is not efficient and/or not parallel

HPC vs. Computer Science

● Most people in HPC are not computer scientists
● Software has to be correct first and (then) efficient; packages can be over 30 years “old”
● Technology is a mix of “high-end” & “stone age” (extreme hardware, MPI, Fortran, C/C++)
● So what skills do I need for HPC?
  ● Common sense, cross-discipline perspective
  ● Good understanding of calculus and (some) physics
  ● Patience and creativity, ability to deal with “jargon”

HPC is a Pragmatic Discipline

● Raw performance is not always what matters: how long does it take me to get an answer?

● HPC is more like a craft than a science:
  => practical experience is most important
  => leveraging existing solutions is preferred over inventing new ones requiring rewrites
  => a good solution today is worth more than a better solution tomorrow
  => a readable and maintainable solution is better than a complicated one

How to Get My Answers Faster?

● Work harder
  => get faster hardware (get more funding)

● Work smarter
  => use optimized algorithms (libraries!)
  => write faster code (adapt to match hardware)
  => trade convenience for performance (e.g. compiled program vs. script program)

● Delegate parts of the work
  => parallelize code (grid/batch computing)
  => use accelerators (GPU/MIC, CUDA/OpenCL)

What Determines Performance?

● How fast is my CPU?
● How fast can I move data around?
● How well can I split work into pieces?

Very application specific:
  => never assume that a good solution for one problem is as good a solution for another
  => always run benchmarks to understand requirements of your applications and properties of your hardware
  => respect Amdahl's law

A Simple Calculator

1) Enter number on keyboard => register 1

2) Turn handle: forward = add, backward = subtract

3) Multiply = add register 1 with shifts until register 2 is 0

4) Register 3 = result

[Figure: mechanical calculator; labeled parts: Register 1, Register 2, Register 3, Controls, Arithmetic Unit]
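Step 3 of the calculator can be read as shift-and-add multiplication. The sketch below is one possible interpretation, not a description of a specific machine: it assumes register 1 holds the multiplicand, register 2 holds the multiplier and is consumed digit by digit, and register 3 accumulates the result. The function name `shift_add_multiply` is made up for illustration.

```python
def shift_add_multiply(register1: int, register2: int) -> int:
    """Multiply two non-negative integers by repeated addition and decimal shifts."""
    if register1 < 0 or register2 < 0:
        raise ValueError("this sketch only handles non-negative integers")

    register3 = 0          # accumulator, holds the result at the end
    shifted = register1    # multiplicand at the current decimal position

    while register2 != 0:
        digit = register2 % 10       # lowest remaining digit of the multiplier
        for _ in range(digit):       # one forward turn of the handle per unit
            register3 += shifted
        shifted *= 10                # shift by one decimal place
        register2 //= 10             # the processed digit is used up

    return register3


if __name__ == "__main__":
    print(shift_add_multiply(123, 45))
```

The point of the sketch is that multiplication reduces to the two primitive actions the machine offers: adding and shifting.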

A Simple CPU

● The basic CPU design is not much different from the mechanical calculator.

● Data still needs to be fetched into registers for the CPU to be able to operate on it.

[Figure: CPU block diagram; labels: Arithmetic/Logic Unit, Control Unit, Registers, Fetch Next Instruction, Add, Sub, Mult, Div, And, Or, Not, …, Integer, Floating Point, Fetch Data, Store Data, Increment Instruction Ptr, Execute Instruction]

A Typical Computer

[Figure: block diagram; labels: CPU, Memory Controller, Bus Controller, RAM (four modules), Network, USB, Graphics Processor, SATA, Mass Storage, Peripherals, Display]

Running Faster v1: Cache Memory

● Registers are very fast, but very expensive
● Loading data from memory is slow, but RAM is cheap and there can be a lot of it

● Cache memory = small buffer of fast memory between regular memory and CPU; buffers blocks of data

● Cache can come in multiple “levels”, L#:
  L1: fastest/smallest <-> L3: slowest/largest
  can be within CPU, or external

Running Faster v2: Pipelining

● Multiple steps in one CPU “operation”: fetch, decode, execute, memory, write back
  => multiple functional units

● Using a pipeline can improve their utilization, allows for faster clock

● Dependencies and branches can stall the pipeline
  => branch prediction
  => no “if” in inner loop
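The benefit of pipelining, and the cost of stalls, can be estimated with a simple cycle count. The sketch below is an idealized textbook model and rests on assumptions that are not in the slides: every stage takes exactly one clock cycle, and stalls are passed in as a single total. The function names are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Ideal pipeline: fill it once, then complete one instruction per cycle.

    stall_cycles is the total number of cycles lost to dependencies or
    mispredicted branches (an input to this model, not something it predicts).
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


if __name__ == "__main__":
    n, stages = 100, 5   # five stages: fetch, decode, execute, memory, write back
    print(unpipelined_cycles(n, stages))
    print(pipelined_cycles(n, stages))
    print(pipelined_cycles(n, stages, stall_cycles=20))
```

In this model every stall cycle adds directly to the runtime, which is why branches inside inner loops are worth avoiding.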

Running Faster v3: Superscalar

● Superscalar CPU => instruction level parallelism
● Some redundant functional units in single CPU
  => multiple instructions executed at same time
● Often combined with pipelined CPU design
● No data dependencies, no branches
● Not SIMD/SSE/MMX
● Optimization:
  => loop unrolling

Running Faster v4: Multi-core

● Maximum CPU clock rate limited by physics

● Implement multiple complete, pipelined, and superscalar CPUs into one processor

● Need parallel software to take advantage

● Memory speed limiting

How Do We Measure Performance?

● For numerical operations: FLOP/s = Floating-Point Operations per second

● Theoretical maximum (peak) performance: clock rate x number of double precision additions and/or multiplications completed per clock
  => 2.5 GHz x 4 FLOP/clock = 10 GigaFLOP/s
  => can never be reached (data load/store)

● Real (sustained) performance:
  => very application dependent
  => Top500 uses Linpack (linear algebra)
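The peak-performance arithmetic above can be written as a small helper. This is a minimal sketch: the function names and the `cores` parameter are assumptions added for illustration, and the sustained value in the usage example is hypothetical, not a measurement.

```python
def peak_gflops(clock_ghz: float, flop_per_clock: float, cores: int = 1) -> float:
    """Theoretical peak in GigaFLOP/s: clock rate x FLOP per clock x cores."""
    return clock_ghz * flop_per_clock * cores


def sustained_fraction(sustained: float, peak: float) -> float:
    """Fraction of the peak that an application actually reaches."""
    return sustained / peak


if __name__ == "__main__":
    peak = peak_gflops(2.5, 4)            # the example from the slide
    print(peak)
    # Hypothetical measurement: an application sustaining 7.5 GigaFLOP/s.
    print(sustained_fraction(7.5, peak))
```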

Fast and Slow Operations

● Fast (6): add, subtract, multiply
● Medium (40): divide, modulus, sqrt()
● Slow (300): most transcendental functions
● Very slow (1000): power (x^y for real x and y)

Often only the fastest operations are pipelined, so code will be the fastest when using only add and multiply => linear algebra => BLAS (= Basic Linear Algebra Subroutines) plus LAPACK (Linear Algebra Package)
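To see why restricting code to add and multiply pays off, the sketch below compares two ways of evaluating a polynomial. It assumes the numbers in parentheses above are rough relative costs (for example clock cycles); the slides do not state the unit. The cost functions are a crude model that charges every `x ** k` as a general power call, and all names are made up for illustration.

```python
# Relative costs taken from the list above (unit assumed, see lead-in).
COST = {"add": 6, "multiply": 6, "power": 1000}


def naive(coefficients, x):
    """Evaluate a polynomial term by term, using a power for every term.

    coefficients are ordered from the highest degree down to the constant.
    """
    degree = len(coefficients) - 1
    return sum(c * x ** (degree - i) for i, c in enumerate(coefficients))


def horner(coefficients, x):
    """Evaluate the same polynomial with Horner's scheme (add and multiply only)."""
    if not coefficients:
        raise ValueError("need at least one coefficient")
    result = coefficients[0]
    for c in coefficients[1:]:
        result = result * x + c
    return result


def naive_cost(degree: int) -> int:
    """Model: one power, one multiply and one add per non-constant term."""
    return degree * (COST["power"] + COST["multiply"] + COST["add"])


def horner_cost(degree: int) -> int:
    """Model: one multiply and one add per non-constant term."""
    return degree * (COST["multiply"] + COST["add"])


if __name__ == "__main__":
    coeffs = [3, 2, 1]                    # 3x^2 + 2x + 1
    print(naive(coeffs, 2), horner(coeffs, 2))
    print(naive_cost(4), horner_cost(4))
```

A real compiler may replace small integer powers with multiplications, so the gap in practice can be smaller than this model suggests.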

Software Optimization

● Writing maximally efficient code is hard:
  => most of the time it will not be executed exactly as programmed, not even for assembly

● Maximally efficient code is not very portable:
  => cache sizes, pipeline depth, registers, instruction set will be different between CPUs

● Compilers are smart (but not too smart!) and can do the dirty work for us, but can get fooled

=> modular programming: generic code for most of the work plus well optimized kernels

Tips For Efficient Software

● Write “compiler-friendly” code:
  ● Use algorithms with mostly “fast” operations
  ● Break down long statements into smaller ones
    -> the compiler will have to do it as well, but you know much better what you want
    -> small statements have fewer dependencies => better for superscalar/pipelined CPUs
  ● Use loops, but
    – Avoid “if” statements, complex loop bodies, function calls

● Try to access data in forward order, not random

● Use kernels in optimized (performance) libraries


A High-Performance Problem

Two Types of Parallelism

● Functional parallelism: different people are performing different tasks at the same time

● Data parallelism: different people are performing the same task, but on different equivalent and independent objects
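The two kinds of parallelism differ in how the work is handed out, which a small sketch can show. This is an illustration of structure only: it uses Python threads from the standard library, which in standard CPython do not speed up CPU-bound work, and the task functions are placeholders invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """The single task applied to every object in the data-parallel case."""
    return x * x


def total(values):
    """One of two different tasks in the functional-parallel case."""
    return sum(values)


def largest(values):
    """The other task in the functional-parallel case."""
    return max(values)


def data_parallel(values, workers=4):
    """Same task, different independent objects: each worker squares some items."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


def functional_parallel(values):
    """Different tasks at the same time: one worker sums, another finds the maximum."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_total = pool.submit(total, values)
        future_largest = pool.submit(largest, values)
        return future_total.result(), future_largest.result()


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(data_parallel(data))
    print(functional_parallel(data))
```

Data parallelism scales with the number of objects, while functional parallelism is capped by the number of distinct tasks.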

Amdahl's Law vs. Real Life

● The speedup of a parallel program is limited by the sequential fraction of the program.

● This assumes perfect scaling and no overhead

[Figure: “1 Vesicle CG-System, 2 MPI / 6 OpenMP (SP)”; x-axis: # of Nodes (32 to 4096); y-axis: Percentage of Time (0% to 100%); categories: Other, I/O, Comm, Neighbor, Kspace, Bond, Pair]
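Amdahl's law, as stated above, can be put into numbers with the usual formula speedup = 1 / (s + (1 - s) / N), where s is the sequential fraction and N the number of processors. The sketch below is a minimal version under exactly the idealization the slide mentions (perfect scaling, no overhead); the function names are chosen for illustration.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Ideal speedup on n_processors when serial_fraction of the runtime is sequential."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on the speedup for an unlimited number of processors."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, amdahl_speedup(0.1, n))
    print(amdahl_limit(0.1))
```

With a sequential fraction of 10%, no processor count can push the speedup past 10, and real programs fall below this bound because communication and other overhead grow with the number of nodes.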

Performance of SC Applications

● Strong scaling: fixed data/problem set; measure speedup with more processors

● Weak scaling: data/problem set increases with more processors; measure if speed is same

● Linpack benchmark: weak scaling test, more efficient with more memory => 50-90% peak

● Climate modeling (WRF): strong scaling test, work distribution limited, load balancing, serial overhead => < 5% peak (similar for MD)
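Both scaling tests reduce to a ratio of measured run times. The sketch below uses common definitions of parallel efficiency, which the slides do not spell out, so treat them as assumptions; the function names are illustrative and the timings in the usage example are hypothetical.

```python
def strong_scaling_efficiency(t_base: float, n_base: int, t: float, n: int) -> float:
    """Fixed problem size: achieved speedup divided by the ideal speedup n / n_base."""
    speedup = t_base / t
    ideal = n / n_base
    return speedup / ideal


def weak_scaling_efficiency(t_base: float, t: float) -> float:
    """Problem size grows with the processor count: ideally the time stays constant."""
    return t_base / t


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(strong_scaling_efficiency(t_base=100.0, n_base=1, t=30.0, n=4))
    print(weak_scaling_efficiency(t_base=10.0, t=12.5))
```

An efficiency of 1.0 means perfect scaling in either test; values below 1.0 show how much is lost to serial work, load imbalance, or communication.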

Strong Scaling Graph

[Figure: “8 Vesicles CG-System / 30,902,832 CG-Beads”; x-axis: # Nodes (220, 470, 1006, 2150, 4596); y-axis: Time per MD Step (sec) (0.1 to 0.61); series: 12 MPI / 1 OpenMP, 6 MPI / 2 OpenMP, 4 MPI / 3 OpenMP, 2 MPI / 6 OpenMP, 2 MPI / 6 OpenMP (SP)]

Weak Scaling Graph

[Figure: “Weak Scaling: 7,544 CG-Beads/Node”; x-axis: # Nodes (512, 1024, 2048, 4096); y-axis: Time per MD-Step (sec) (0.05 to 0.2); series: 12 MPI / 1 OpenMP, 6 MPI / 2 OpenMP, 4 MPI / 3 OpenMP, 2 MPI / 6 OpenMP, 2 MPI / 6 OpenMP (SP)]

Performance within an Application

[Figure: “Rhodopsin Benchmark, 860k Atoms, 64 Nodes, Cray XT5”; x-axis: # PE (128 to 768); y-axis: Time in seconds (0 to 25); categories: Other, Neighbor, Comm, Kspace, Bond, Pair]

Multi-core MPI Performance vs. MPI+OpenMP

[Figure: “Rhodopsin Benchmark, 860k Atoms, 512 Nodes, Cray XT5”; x-axis: # PE (1024 to 6144); y-axis: Time in seconds (0 to 25); categories: Other, Neighbor, Comm, Kspace, Bond, Pair]


Parallel Efficiency vs. Physics

A Real Life HPC Problem

● C code to study relations in social networks
● Two steps:
  1) construct a large matrix with yes/no information (1 or 0)
  2) process matrix by pruning lines and inserting corresponding entries into a second matrix

● Input parameters for block sizes (relation depth)
● 80% of time in one (small) subroutine
● Program too slow and needs too much RAM

What To Do

● Profiling to confirm performance info (true, except for very large blocks, then a different step becomes dominant)

● Since only 1/0 information is stored, replace “unsigned long” (64-bit) with “char” (8-bit)

● Add OpenMP multi-threading, since critical subroutine has loops that are suitable

● Test on different hardware to determine sensitivity to CPU vs. memory performance
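The memory saving from the narrower element type is easy to quantify. The sketch below is a Python stand-in for the C change, not the original program: it uses the standard `array` module, where typecode `"B"` is an unsigned char (1 byte) and `"Q"` is an unsigned long long (at least 8 bytes), used here in place of the 64-bit `unsigned long`. The matrix size and helper names are hypothetical.

```python
from array import array


def flag_matrix(rows: int, cols: int, typecode: str) -> array:
    """Flat, row-major matrix of zeros with the given element type."""
    return array(typecode, [0]) * (rows * cols)


def storage_bytes(matrix: array) -> int:
    """Bytes needed for the matrix elements themselves."""
    return len(matrix) * matrix.itemsize


if __name__ == "__main__":
    rows, cols = 1000, 1000                   # hypothetical matrix size
    wide = flag_matrix(rows, cols, "Q")       # 64-bit entries
    narrow = flag_matrix(rows, cols, "B")     # 8-bit entries
    narrow[3 * cols + 7] = 1                  # a yes/no flag still fits in one byte
    print(storage_bytes(wide), storage_bytes(narrow))
    print(storage_bytes(wide) // storage_bytes(narrow))
```

Besides needing less RAM, the smaller matrix moves fewer bytes through the caches for the same number of entries, which can matter when memory speed is the limit.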
