
Page 1: Introduction to Parallel Computing · 2. Interfaces for multi-core parallel computing – algorithmic support and implementation: task-parallel models, lock- and wait-free data structures

©Jesper Larsson Träff WS15/16

Introduction to Parallel Computing

Jesper Larsson Träff, Angelos Papatriantafyllou {traff,papatriantafyllou}@par.tuwien.ac.at

Parallel Computing, 184-5

Favoritenstrasse 16, 3. Stock

Sprechstunde: Per email-appointment

Page 2

Parallel Computing

Parallel computers are around and everywhere

(it was not always like that)

How to use them? •Efficiently? •In practice?

How do they look?

Algorithms, Languages, Interfaces, (Applications)

Architecture, Models

What are they good for? What is a parallel computer? Why is that?

Page 3

Parallel Computing: Prerequisites

Some understanding of: •Programming, programming languages (we will use C…) •Algorithms and data structures, asymptotic analysis of algorithms O(f(n)) •Computer architecture •Operating systems

Page 4

•Introduction: aims, motivation, basics, history (Amdahl's Law, Moore's "Law", …) •Shared memory parallel computing •Concrete interfaces: OpenMP, pthreads, Cilk •Distributed memory parallel computing •Concrete interface: MPI (Message-Passing Interface) •New architectures, new languages (GPU, CUDA, OpenCL) •Other languages, paradigms

This VU: Introduction to parallel computing

Theory and PRACTICE: Learning by doing the project

Page 5

Focus on Principles: parallel algorithms, (architectures), languages and interfaces

Introduction to parallel computing

Lots of approaches, languages, and interfaces will not be treated – but they can be followed up later: bachelor thesis, project, master thesis, master lectures, seminars. See us!

Standard, paradigmatic, actual, much-used languages and interfaces (MPI, OpenMP, pthreads/C threads, Cilk)

Page 6

Prerequisites (≥3rd Semester, STEOP)

•C/C++, Fortran (Java) programming skills •Operating systems •Algorithms & data structures •Computer architecture •…

Interest in solving problems faster and better in theory and practice…

Page 7

Lectures, exercises, projects

•Monday, 10:00-12:00 MANDATORY •Occasionally: Thursday, 10:00-12:00 (also MANDATORY)

MONDAY: Freihaus Hörsaal 7 (FH7) THURSDAY: EI 5 Hochenegg (Gusshausstrasse 25)

Project work: ON YOUR OWN – there will be Q&A sessions (Thursday slots) Can start early, complete before end of lecture, discussion/examination at end of semester (late January, early February)

Page 8

Lectures, exercises, projects

Capacity?

Lecture was originally planned for 40+ students…

We only have two parallel systems…

SIGN UP in TISS; SIGN OFF in TISS if you decide not to follow the lecture – it makes administration easier

Page 9

Requirements, credit (4 hours/week, 6 ECTS)

•Lecture attendance MANDATORY •Active participation during lectures •Presentation of project work (exam) MANDATORY •Hand-in of project work MANDATORY:

1. Short write-up 2. Program code 3. Results

Practical project work: should be done in groups of 2

NOTE: See us (“Sprechstunde”) in case of problems with schedule (unable to finish project in time)

Page 10


GRADE: Based on project presentation and hand-in


Page 11


NOTE: Solutions to project exercises can possibly be found somewhere. Don’t cheat yourself!!


Page 12


NOTE: Solutions to project exercises can possibly be found somewhere. Don’t cheat us: be open about what you took from others, plagiarism will automatically result in grade 6 (fail)!


Page 13

Grading: DON’T OPTIMIZE

Active participation in lectures; written project solution; quality of programs (correctness, performance, readability); oral explanation; knowledge of course material

Each project will consist of 3 parts; deliberately, not everything is stated explicitly (but enough should be)

Very roughly: •1-2: All parts solved, performance/speed-up achieved, everything can be explained •2-3: 2 out of 3 parts… •Fail: Less than 1 out of three

Page 14

Grading: DON’T OPTIMIZE

Groups of two: stand or fall as a group; ideally, both get the same grade

This means: both group members should have contributed to, and feel responsible for, all parts of the solutions

Page 15

ECTS breakdown

•Planning, intro ("Vorbesprechung"): 1h
•Lectures: 15 x 2h = 30h
•Preparation: 45h
•OpenMP: 20h
•Cilk: 20h
•MPI: 20h
•Write-up: 10h
•Presentation, including preparation: 4h

Total: 150h = 6 ECTS

Page 16

Project exercises

•Programming exercises using the three main languages/interfaces covered in the lecture (OpenMP/pthreads, Cilk, MPI). Each exercise will explore the same problem in all three paradigms •Tentatively: select 1 (or 2) out of 4 •Focus on achieving and documenting improved performance (good benchmarking) •Correctness first! •(Some) room for creativity

Page 17

Report: •IN ENGLISH (as far as possible; German is also permitted) •State the problem and hypothesis; explain (briefly) the solution, implementation details and issues, and the state of the solution (correct? what works, what does not); describe the testing and benchmarking approach; document performance (plots or tables) •Compare/comment on the paradigms •8-15 pages per exercise, including performance plots

Project exercises, credit

Document solution with code and short report

Code: readable, compilable, correct

Project exercises in groups of two

Page 18

Schedule –TENTATIVE

5.10: Planning, overview ("Vorbesprechung")
12.10: Motivation, concepts
19.10: Example problems: merging, prefix-sums
26.10: LECTURE FREE
2.11: LECTURE FREE
9.11: OpenMP
16.11: Presumably NO LECTURE (do project work)
23.11: OpenMP
30.11: OpenMP, Cilk
7.12: Cilk
14.12: Distributed memory architectures & programming, MPI
11.1: MPI
18.1: Other architectures and interfaces
25.1: Project Q&A

Project work

1.2: Project hand-in

22.10: Projects presentation (IMPORTANT! in EI 5)

Some Thursdays

Page 19

Schedule –TENTATIVE

Idea: All basics, and all 3 interfaces (OpenMP, Cilk, MPI) covered before Christmas (so: some Thursdays may be necessary) January: Other architectures and interfaces; project work

Project hand-in: 1.2.2016 Exams: 8.-12.2.2016

IF problems with any of these dates, contact us in advance! Later hand-in of projects NOT possible (a later or earlier examination may be, but with good reason)

Per group sign-up in TISS

Page 20

Literature, course material

Slides in English – will be made available at

www.par.tuwien.ac.at/teaching/2015w/184.710.psp

Look here and in TISS for information (cancelled lectures, changes of plan, …). We will try to keep things up to date and timely (but lectures will not be ready much in advance…)

No lecture notes ("Skript"); the slides should be enough for doing the project work, and additional material can be found easily

Page 21

Organizational

TUWEL for •Forming the groups •Getting accounts •Your discussions? •Uploading code/reports

Register in groups of 2 now (until 30.10.15)! First exercise: Apply for account (ssh key) via TUWEL until 2.11.15

Page 22

Literature: general

•Thomas Rauber, Gudula Rünger: Parallel Programming for Multicore and Cluster Systems. Springer, 2nd edition, 2013 •Grama, Gupta, Karypis, Kumar: Introduction to Parallel Computing. 2nd edition. Pearson, 2003 •Michael J. Quinn: Parallel Programming in C with MPI and OpenMP. McGraw-Hill, 2004 •Calvin Lin, Lawrence Snyder: Principles of Parallel Programming. Addison-Wesley, 2008 •Peter Pacheco: An Introduction to Parallel Programming. Morgan Kaufmann, 2011

Randal E. Bryant, David R. O'Hallaron: Computer Systems. Prentice-Hall, 2011

Page 23

Literature: general

•(almost) NEW: Encyclopedia of Parallel Computing. David Padua (editor). Springer, 2011.

Handbook of Parallel Computing. Rajasekaran/Reif (editors). Chapman&Hall, 2008

Page 24

Literature: OpenMP, MPI, CUDA

Chandra, Dagum et al.: Parallel Programming in OpenMP. Morgan Kaufmann, 2001 Barbara Chapman, Gabriele Jost, Ruud van der Pas: Using OpenMP. MIT, 2008

MPI: A message-passing interface standard. Version 3.1 Message Passing Interface Forum, June 4th, 2015. www.mpi-forum.org/docs/docs.html William Gropp, Ewing Lusk, Anthony Skjellum: Using MPI. MIT, 1999

David B. Kirk, Wen-mei Hwu: Programming massively parallel processors. Morgan Kaufmann, 2010

Page 25

Page 26

Systems, hardware

OpenMP, Cilk: 48-core AMD-based shared-memory system, "Saturn"

MPI: 36-node InfiniBand AMD-based cluster, 2x8 cores per node = 576 processor cores, "Jupiter"

Access via ssh (instructions to follow), program at home/TU. No actual lab

saturn.par.tuwien.ac.at

jupiter.par.tuwien.ac.at

Page 27

Jupiter: small InfiniBand cluster, AMD processors

Saturn: AMD-based, shared-memory system

Page 28

Other systems at TU Wien parallel computing

•Pluto: 16-core Ivy Bridge system + 2x NVIDIA K20X GPUs + 2x Intel Xeon Phi 60-core accelerators •Mars: 80-core Intel Westmere system, 1TB shared memory •Ceres: 64-core Oracle/Fujitsu shared-memory system, SPARC-based with HW support for 512 threads, 1TB shared memory

Research systems for bachelor, master and PhD work…

Page 29

The Austrian Top500 HPC system: www.vsc.ac.at

Page 30

Access to TU Wien systems saturn and jupiter

Login via ssh

Get an account by sending email to Markus Levonyak; see the TUWEL exercise. Some information on how to log in and use the systems is in TUWEL

Page 31

Using the systems

•Saturn: for the shared-memory project parts (Cilk and OpenMP) •Jupiter: for the distributed-memory project part (MPI)

Free access and interactive use until Christmas. January: (probably) use only via the batch system (SLURM)

START EARLY on the projects!

Page 32

Research Group Parallel computing

Some information at www.par.tuwien.ac.at

Favoritenstrasse 16, 3rd floor

Next to U1, Taubstummengasse, exit Floragasse

Page 33

Research Group Parallel computing

Exam (early February) in Favoritenstrasse 16, 3rd floor, HK 03 20

Contact: use the lecture first, TUWEL second; contact us per email (for questions, appointments): [email protected] [email protected]

Page 34

TU Wien Research Group Parallel Computing

Our themes

1. HPC languages, interfaces – design, algorithmic support and implementation (MPI, PGAS)

2. Interfaces for multi-core parallel computing – algorithmic support and implementation: task-parallel models, lock- and wait-free data structures

3. Parallel algorithms

4. Scheduling in theory and practice

5. Communication networks (routing), memory hierarchy

6. Experimental parallel computing – benchmarking, validation, reproducibility

7. (Heterogeneous parallel computing: interfaces, autotuning, scheduling)

Page 35

TU Wien Research Group Parallel Computing

(Diagram: parallel computing at the intersection of algorithms, programming languages, architectures, and applications)

Page 36

TU Wien parallel computing

(Diagram: parallel computing at the intersection of algorithms, programming interfaces, architectures, and applications; emphasis on algorithms)

Page 37

VU Parallel Computing

VU Parallel Algorithms •PRAM •Network algorithms

VU High Performance Computing

SE Topics in Parallel Programming Models, Algorithms, Architectures

VU Advanced Multiprocessor Programming •Programming models, lock-free algorithms and data structures

Bachelor: bachelor thesis

Master: master's thesis, project