Introduction to MPI
Nicholas Pritchard
Semester 2, 2019
Remainder of the Course
1. Why bother with HPC (today)
2. What is MPI (today)
3. Point to point communication
4. User-defined datatypes / Writing parallel code / How to use a super-computer
5. Collective communication
6. Communicators
7. Process topologies
8. File/IO and Parallel profiling
9. Hadoop/Spark
10. More Hadoop / Spark
11. Alternatives to MPI
12. The largest computations in history / general interest / exam prep
Introduction
Introduction
• What is meant by high-performance computing?
  • Performing computations not possible on a single machine
  • Complex applications at high speeds
  • Mainly scientific and engineering applications
• What is meant by distributed computing?
  • Computations done on multiple machines
High-performance computing can refer to a single machine but in general refers to distributed parallelism
Why is this different to threading?
• Threading → Shared memory space
• Distributed parallelism → Isolated memory space
Levels of Parallelism
• SISD – Single instruction, single data
  • Your standard CPU core
• SIMD – Single instruction, multiple data
  • GPUs, vector units in CPUs
• MISD – Multiple instruction, single data
  • Pipelining applications, generally very niche
• MIMD – Multiple instruction, multiple data
  • Generalised parallel computing
• Our focus
Where is the future heading?
• Traditionally HPC was reserved for large computing labs
  • Local machines were vastly inferior
• Many people came from a ‘mainframe’ mentality
• Desktops got much, much faster, cheaper and more available
  • GPUs
• Multi-core CPUs
• HPC reserved for very specialized uses
• New cloud-era computing is bringing HPC to the fore again
  • Enormous data sizes
  • On-demand HPC
  • Much faster networking capabilities
Why should you care?
• Electronic computers were invented to crunch numbers – We still need to crunch numbers
• Learning to write scalable code is something difficult to teach yourself
• Writing fast, reliable, useful code will never go out of style
  • Someone has to build these tools
• Large, numerical computations are your friends
• Most of the tools used in data science, AI etc. (Keras, R, NumPy etc.) are built on high-performance code
• High-performance computing is about using all the compute you have
Resources
• MPI – The Complete Reference (Snir et al. 1996)
  • Mainly MPI 1.0 → Later versions focus on specific improvements, not the basics
• Introduction to Parallel Computing (Grama et al., 2nd edn)
There are many excellent resources online.
I am not the first, nor the last
Finally, I will try to add extra links/slides for interest where possible.
Aside: Practical tips
• You will be writing some complicated code this semester
• Make your life easy; use version control
  • GitHub
• Bitbucket
• Gitlab
• Etc.
• Almost all clusters use Linux
  • Basic bash scripting
• Use any editor you want – as long as it’s Vim ;)
What is MPI?
What is MPI
• Message Passing Interface
• All machines run the same code
• Messages are sent between them to guide computation
The end
What is MPI? Why is MPI?
• What happens when you run out of compute power?
  • Too much data
  • Too many steps
• Too many steps
a) Wait for a large enough computer?
b) Wait for a fast enough computer?
c) Staple a bunch of computers together?
a) Wait for a large enough computer?
• RAM is expensive
• We can probably find a problem with more data requirements than you have memory
b) Wait for a fast enough computer?
• Fast machines are exceptionally expensive
• Moore’s law is running out
• I can probably find a problem that needs answers faster than you can generate them
c) Staple a bunch of computers together?
• Also called ‘building a super-computer’
• Just like solving any problem: break it down, pass it around
  • Split the data across many machines → Most common
  • Split the tasks across many machines → Useful but quite tricky
  • Split tasks and data across many machines → Rare but obscenely good
What is MPI? Why is MPI?
• So you have your super-computer
• Who built it?
  • Processor type
  • Cache sizes
  • Data representation
• How are the machines connected?
  • Ethernet
  • InfiniBand
• What if we upgrade some of the computers?
  • Not all machines are the same
What is MPI
• Message Passing Interface
• All machines run the same code
• Messages are sent between them to guide computation
• MPI is a standard not a library itself
• MPI is portable
• MPI can work with heterogeneous clusters
• Your MPI code can work with various configurations of machines
What MPI includes
• Defines an application programming interface
• Allows efficient communication
  • Avoids memory-to-memory copying
• Implementations in heterogeneous environments
• C and Fortran bindings
• Reliable communication
• Can be implemented on many platforms
• Thread-safety
What MPI does not include
• Shared-memory operations (OpenMP)
• Features requiring OS support not available at the time
• Programming tools
• Debugging tools
• Thread support
• Task management
• File I/O operations → Handled by MPI-IO (we’ll get to this later)
N.B. Some implementations of MPI offer extra functionality
What are the implementations?
• MPICH
  • Argonne National Laboratory + Mississippi State University
  • Generally sticks to the standard as closely as possible
• Open MPI
  • Built by various super-computing facilities
  • Often found on actual hardware (will probably have extra functionality)
• Commercial offerings
  • HP
  • Intel
  • Microsoft
  • Etc.
Hello MPI
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);
    printf("Hello world\n");
    MPI_Finalize();
    return 0;
}
All MPI programs need:
• MPI_Init()
• MPI_Finalize()
Compiling MPI Programs
• If in doubt, refer to your compiler’s documentation
• mpicc → calls the underlying compiler
• Then standard flags as usual
  • -o
  • -Wall
  • -O1 -O2 -O3 (optimization levels)
• $ mpicc -o helloWorld.exe helloworld.c
Running MPI Programs
• Not set by the standard
• mpiexec <args> is part of MPI-2
  • A recommendation, not a requirement
• Often called mpirun
• -n → Number of processes
$ mpiexec -n 4 helloWorld.exe
• Can be called locally and will run multiple processes on one machine
  • An excellent debugging method
  • Lets you check your parallel logic before moving to the cluster
Hello MPI Better
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]){
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("I am process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Some important ideas here:
• Communicators
• Rank
• Size
• Procedure specification
Communicators
• Processes exist as part of a communicator
  • It's a group of processes
• All processes are part of the MPI_COMM_WORLD communicator
  • It's all the processes
• Rank – The ‘id’ of this process in that communicator
• Size – The number of processes in that communicator
Often important for processes to work out what job they should do
Rank 0 is often the ‘root’ process.
Procedure Specification
• Look at the function call
int MPI_Comm_rank(MPI_Comm comm, int *rank)
• In MPI, arguments take one of three types
  • IN – Used but not updated (e.g. comm)
• OUT – May be updated (e.g. rank)
• INOUT - Both used and updated (less common but very important)
Procedure Specification
• Look at the function call
int MPI_Comm_rank(MPI_Comm comm, int *rank)
• It is common for arguments passed IN on one process to be passed as OUT on another process → We're not in Kansas anymore
• OUT or INOUT arguments cannot be aliased with any others
Procedure specification
• OUT or INOUT arguments cannot be aliased with any others
E.g.

void copyIntBuffer(int *pin, int *pout, int len){
    int i;
    for(i = 0; i < len; ++i){
        pout[i] = pin[i];
    }
}

Called as:

int a[10];
copyIntBuffer(a, a+3, 7);
Perfectly legal in C
Perfectly illegal in MPI
May (probably will) break your program
"Just because you can, doesn't mean you should."
Error Handling
• Look at the function call
int MPI_Comm_rank(MPI_Comm comm, int *rank)
• Look, it's an error code. We should probably listen to it
  • By default, errors are fatal (MPI_ERRORS_ARE_FATAL)
• A successful call will return MPI_SUCCESS
  • Other codes are procedure dependent
• Tip – Write your own error handlers
Error handling
#include "mpi.h"
#include <stdio.h>

void error_handle(int status){
    switch(status){
        case MPI_SUCCESS:
            break;
        case 1:
            //printf("help\n");
            break;
        default:
            //printf("call ghost-busters\n");
            break;
    }
}

int main(int argc, char *argv[]){
    int rank, size, status;  /* MPI return codes are plain ints */
    MPI_Init(&argc, &argv);
    status = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    error_handle(status);
    status = MPI_Comm_size(MPI_COMM_WORLD, &size);
    error_handle(status);
    printf("I am process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Summary
• High-performance computing is important for the hardest problems we have
  • Also teaches generally good code practice
• MPI is a standard for orchestrating distributed parallelism
• Processes exist in communicators
• All processes are part of MPI_COMM_WORLD by default
• MPI functions have their own style and rules independent of language
• Error handling is important for debugging
Next Time
• Point-to-point communication → Let’s pass some messages
• Generic MPI call types
• MPI_Datatypes
• How to use a super-computer *Examples included*
• Recommended things to do
  • Install MPI locally
• Get familiar with the documentation (https://www.mpi-forum.org/docs/)
• Have a go at running a ‘hello MPI’ example
The Project
Quick Reminder
• Tasked to write some numerical code with sparse matrices
• Key components
  • Sparse matrix representation
  • Numerical operations
  • Threading
  • 'House-keeping'
  • Report writing
Important: You are not marked on your code's performance but on your assessment of performance
Adjustment to brief
• Multi-character command line options need two dashes
  • E.g. --mm
  • Allows you to use standard CLA parsing functions (getopt etc.)
• In scalar multiplication the scalar will always be a float
• Output files must include timing information at the end of the file
  • Time to load and convert matrix files (s)
  • Time to execute requested operations (s)
  • Formatting of time values (number of decimal places etc.) is not important
• An adjusted brief is available online
• Larger example matrices will be available after the lecture
Sparse Matrices
• Brief only gives a rough overview
• You are expected to read around yourself
• Three methods are presented
  • Coordinate form
  • Compressed row-form
  • Compressed column-form
• You do not need to implement them all, but you are expected to mention them in your report (i.e. why you decided to use one form over another)
  • It would be fine to use one representation for particular operations, for instance
Sparse Matrices – Elaborating on the Report
• Consider when a sparse representation saves memory over a dense matrix
• Consider the rough time-complexity of various operations
• There are other sparse matrix representations; you do not need to investigate those
Coordinate form
• The most intuitive sparse representation
• Specifies all information about all non-zero values (also known as ‘ijv’ or ‘triplet’ form)
• Particularly nice when constructing matrices
  • Often used when converting between forms
• Each non-zero has three values (i, j, v)
• Can be stored as a list of triplets
• Can be stored as three separate arrays
Compressed Row Form
• Takes access patterns into account
  • "Hey, we normally look at all elements in each row"
• Stores value in order of “Left to right, top to bottom”
Specified by three arrays
• Value[nnz] – Stores the data value for each non-zero
• Cols[nnz] – Gives the column index for the i-th non-zero
• Rows[rows] – The 'strange' one
  • Rows[0] = 0
  • Rows[i] = Rows[i-1] + |nnz| in row i-1
Compressed Row Form
• Rows[rows] – The 'strange' one
  • Rows[0] = 0
  • Rows[i] = Rows[i-1] + |nnz| in row i-1
• Lets you extract how many elements in the values array are in each row
• Vals[Rows[i]] to Vals[Rows[i+1] – 1]
  • Slices the values array into per-row sections
Compressed Column Form
• Takes access patterns into account
  • "Hey, we normally look at all elements in each column"
• Stores value in order of “Top to bottom, left to right”
Specified by three arrays
• Value[nnz] – Stores the data value for each non-zero
• Rows[nnz] – Gives the row index for the i-th non-zero
• Cols[cols] – The 'strange' one
  • Cols[0] = 0
  • Cols[i] = Cols[i-1] + |nnz| in column i-1
Compressed Column Form
• Cols[cols] – The 'strange' one
  • Cols[0] = 0
  • Cols[i] = Cols[i-1] + |nnz| in column i-1
• Lets you extract how many elements in the values array are in each column
• Vals[Cols[i]] to Vals[Cols[i+1] – 1]
  • Slices the values array into per-column sections
Numerical Operations
• Marks have been allocated relative to difficulty
• A missing operation implementation will lose marks in both code and report (but not massively)
  • Want to encourage a complete, correct submission
House-Keeping
• We specify inputs and outputs quite exactly for a number of reasons
  • Makes starting the project simpler
  • Makes marking the project faster
  • Makes assessing the project fairer (the high-level 'design' is the same for everyone)
• Feel free to ask for clarifications from the tutors or myself
• Some small hints
  • We are expecting UNIX newlines (Windows uses \r\n)
  • To test the numerical performance of your code, the number of calls to write a file should be minimized (save it to the end)
Closing hints
• Start early
• Use good coding practice
  • Multiple files where appropriate
  • Version control is your friend
  • Build scripts (makefiles, bash scripts etc.)
• Start simple but correct, then improve
• Start the report early
• Ask for help if you’re stuck
Helpful tips
Hit the ground running
• Writing and running code for a remote cluster can be tricky
  • Not your computer (ew.)
  • Maybe a different OS (Linux)
  • Not as many bells and whistles (no VS Code, Eclipse, CLion, Visual Studio etc.)
• Some things may be new (or rusty), some things perhaps not
We want you to learn high-performance computing, not rage against the machine
Linux
What is Linux
• Open-source operating system
• Powers:
  • 94% of the world's supercomputers
  • Majority of internet servers
  • Majority of financial trades
  • All Android devices
• Focused on making software development easier and available
• Comes in various 'distros' – most either Debian or Arch based
  • Debian flavours are typically more 'friendly'
  • Arch flavours are more cutting-edge
What is Linux
• Strongly encourage you to use Linux for this course
  • It's what our cluster uses
• Growing in popularity among professionals
• Check out these distros (Ubuntu is a good first-timer choice)
  • Ubuntu
  • Debian
  • Red Hat
  • Mint
  • CentOS
How to Linux
• Many guides available. Here are a couple of good ones
• Linux Foundation
• Digital Ocean
SSH
What is SSH
• Stands for ‘Secure SHell’
• Is a channel allowing secure access to another terminal
  • Can be on your local machine
  • Or on a remote machine
• Lets you send commands to another machine and have the results piped back
  • Perfect for accessing a super-computer or other networked machine from your own
How to SSH
• Windows
  • Use PuTTY
  • Try Microsoft's OpenSSH client (beta)
• Mac
  • Open a terminal
  • $ ssh <address> (may need to $ brew install ssh)
• Linux
  • Open a terminal
  • $ ssh <address>
How to SSH
• Many useful guides online. Here are a couple:
• ssh.com
• How to Geek
Version Control
What is Version Control
• Too much to go into detail here
• Built to keep track of changes to code files
• E.g. Subversion/SVN, Git, Mercurial
• Git is the canonical choice (GitHub, Bitbucket, GitLab)
• Many, many good guides
  • git themselves
  • Guide using GitHub
  • A clean guide
Vim
What is Vim / Text-editor
• Before the days of desktops and windowed environments the terminal was all you had
• Of course, people needed to edit files (code) and tools were built to do this
• Vim is one of the most ubiquitous (and is available on our cluster)
• Others include:
  • Nano
  • Emacs
• Useful to edit files on a cluster remotely. Not necessarily the ‘best’ way to code from scratch
How to <editor>
• Vim
  • Linux.com
  • Tutorials Point
  • Another good introduction
• Emacs
  • GNU Foundation
  • A good introduction
• Nano
  • Gentoo Linux
Takeaway: Choose the editor you are comfortable with
A Note on Fortran
What is Fortran
• Formula Translator
• The first ‘high-level’ programming language
• Strong focus on number crunching
• Very strict (but simple) form for programs
• Compilers are therefore extremely aggressive
Fortran programs are often the fastest
Most programs run on super-computers are written in Fortran
So why not use it?
• System operations (file reading, networking, memory management etc.) can be painful
• You are probably more familiar with C, and it is more application focused
• Nowadays, Fortran programs are mainly used for numerical algorithms
• E.g. NumPy (via BLAS)
  • BLAS
  • LAPACK
  • NETLIB
Fortran looks like it will stick around for a long time (latest language revision: 2008)

"I don't know what the language of the year 2000 will look like, but I know it will be called Fortran" – Tony Hoare, winner of the 1980 Turing Award
Installing MPI
Linux (Assuming Ubuntu)
• Open a terminal
• MPICH
  • $ sudo apt install mpich
  • Compile with mpicc
  • Run with mpiexec
• Open MPI
  • $ sudo apt install openmpi-bin libopenmpi-dev
  • Compile with mpicc
  • Run with mpirun
• Any other distribution – use your own package manager where possible
Mac
• Open a terminal
• MPICH
  • $ brew install mpich
  • Compile with mpicc
  • Run with mpiexec
• Open MPI
  • $ brew install open-mpi
  • Compile with mpicc
  • Run with mpirun
Windows
• Microsoft – MPI (MS-MPI) – based on MPICH
• Strong recommendation to try using Linux
  • Virtual machine (VirtualBox)
  • Windows Subsystem for Linux (might be okay)
  • Install Linux alongside (a fun thing to learn)
• If not, I recommend using Visual Studio
  • Go for the 'full-fat' Windows experience
• MS-MPI available from their download center
  • Comes with a wizard
• Check out a basic example