
CS2403 Programming Languages

Concurrency

Chung-Ta King
Department of Computer Science
National Tsing Hua University

(Slides are adapted from Concepts of Programming Languages, R.W. Sebesta)

2

Outline

Parallel architecture and programming
Language support for concurrency
  Controlling concurrent tasks
  Sharing data
  Synchronizing tasks

3

Sequential Computing

The von Neumann architecture, with its Program Counter (PC), dictates sequential execution

Traditional programming thus follows a single thread of control: the sequence of program points reached as control flows through the program

Figure: program counter
(Introduction to Parallel Computing, Blaise Barney)

4

Sequential Programming Dominates

Sequential programming has dominated throughout computing history

Why?
Why is there no need to change programming style?

5

2 Factors Help to Maintain Perf.

IC technology: ever-shrinking feature size
  Moore's law, faster switching, more functionality

Architectural innovations to remove bottlenecks in the von Neumann architecture
  Memory hierarchy for reducing memory latency: registers, caches, scratchpad memory
  Hiding or tolerating memory latency: multithreading, prefetching, predication, speculation
  Executing multiple instructions in parallel: pipelining, multiple issue (in-/out-of-order, VLIW), SIMD multimedia extensions (instruction-level parallelism, ILP)

(Prof. Mary Hall, Univ. of Utah)

6

End of Sequential Programming?

Continuing to improve the performance of uniprocessors is infeasible: power, clocking, ...

Multicore architectures prevail (homogeneous or heterogeneous)
  Achieve performance gains with simpler processors

Sequential programming is still alive! Why?
  Throughput versus execution time

Can we live with sequential programming forever?

7

Parallel Programming

A programming style that specifies concurrency (control structure) and interaction (communication structure) between concurrent subtasks
  Still in the imperative language style

Concurrency can be expressed at various levels of granularity: machine instruction level, high-level language statement level, unit level, program level

Different models assume different architectural support
  Look at parallel architectures first

(Ananth Grama, Purdue Univ.)

8

An Abstract Parallel Architecture

How is parallelism managed? Where is the memory physically located? What is the connectivity of the network?

(Prof. Mary Hall, Univ. of Utah)

9

Flynn’s Taxonomy of Parallel Arch.

Distinguishes parallel architecture by instruction and data streams
  SISD: classical uniprocessor architecture

SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data

(Introduction to Parallel Computing, Blaise Barney)

10

Parallel Control Mechanisms

(Prof. Mary Hall, Univ. of Utah)

11

2 Classes of Parallel Architecture

Shared memory multiprocessor architectures
  Multiple processors can operate independently but share the same memory system
  Share a global address space where each processor can access every memory location
  Changes in a memory location effected by one processor are visible to all other processors, like a bulletin board

(Introduction to Parallel Computing, Blaise Barney; Prof. Mary Hall, Univ. of Utah)

12

2 Classes of Parallel Architecture

Distributed memory architectures
  Processing elements (PEs) connected by an interconnect
  Each PE has its own distinct address space without a global address space, and PEs explicitly communicate to exchange data
  Ex.: PC clusters connected by commodity Ethernet

(Introduction to Parallel Computing, Blaise Barney; Prof. Mary Hall, Univ. of Utah)

13

Shared Memory Programming

Often organized as a collection of threads of control
  Each thread has private data, e.g., a local stack, and a set of shared variables, e.g., the global heap
  Threads communicate implicitly by writing and reading shared variables (a sketch follows below)
  Threads coordinate through locks and barriers implemented using shared variables

(Prof. Mary Hall, Univ. of Utah)
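As a concrete illustration of this model, here is a minimal sketch (written for this handout, not taken from the original slides): two POSIX threads communicate implicitly through a shared global counter protected by a mutex, while each thread keeps a private counter on its own stack.

/* Sketch only: two threads, a shared global counter, and per-thread
   private data on each thread's own stack. */
#include <pthread.h>
#include <stdio.h>

int shared_count = 0;                     /* shared variable (global)        */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    int private_steps = 0;                /* private: on this thread's stack */
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);        /* coordinate through a lock ...   */
        shared_count++;                   /* ... to update the shared state  */
        pthread_mutex_unlock(&lock);
        private_steps++;
    }
    printf("worker performed %d steps\n", private_steps);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_count = %d\n", shared_count);   /* prints 2000 */
    return 0;
}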

14

Distributed Memory Programming

Organized as named processes
  A process is a thread of control plus a local address space -- NO shared data
  A process cannot see the memory contents of other processes, nor can it address and access them
  Logically shared data is partitioned over processes
  Processes communicate by explicit send/receive, i.e., asking the destination process to access its local data on behalf of the requesting process
  Coordination is implicit in communication events: blocking/non-blocking send and receive

(Prof. Mary Hall, Univ. of Utah)

15

Distributed Memory Programming

Private memory looks like a mailbox

(Prof. Mary Hall, Univ. of Utah)

16

Specifying Concurrency

What language support is needed for parallel programming?

Specifying (parallel) control flows
  How to create, start, suspend, resume, and stop processes/threads?
  How to let one process/thread explicitly wait for events or for another process/thread?

Specifying data flows among parallel flows
  How to pass data generated by one process/thread to another process/thread?
  How to let multiple processes/threads access common resources, e.g., a counter, without conflicts?

17

Specifying Concurrency

Many parallel programming systems provide libraries, and perhaps compiler pre-processors, to extend a traditional imperative language, such as C, for parallel programming
  Examples: Pthreads, OpenMP, MPI, ...

Some languages have parallel constructs built directly into the language, e.g., Java, C#

So far, the library approach works fine

18

Shared Memory Prog. with Threads

Several thread libraries:

Pthreads: the POSIX threading interface
  POSIX: Portable Operating System Interface for UNIX
  Interface to OS utilities
  System calls to create and synchronize threads

OpenMP is a newer standard
  Allows a programmer to separate a program into serial regions and parallel regions
  Provides synchronization constructs
  The compiler generates the thread program & synchronization
  Extensions to Fortran, C, C++, mainly by directives

(Prof. Mary Hall, Univ. of Utah)

19

Thread Basics

A thread is a program unit that can be in concurrent execution with other program units

Threads differ from ordinary subprograms:
  When a program unit starts the execution of a thread, it is not necessarily suspended
  When a thread's execution is completed, control may not return to the caller
  All threads run in the same address space but have their own runtime stacks

20

Message Passing Prog. with MPI

MPI defines a standard library for message passing that can be used to develop portable message-passing programs using C or Fortran
  Based on Single Program, Multiple Data (SPMD)
  All communication and synchronization require subroutine calls -- no shared variables
  The program runs on a single processor just like any uniprocessor program, except for calls to the message passing library

It is possible to write fully functional message-passing programs by using only six routines (listed below)

(Prof. Mary Hall, Univ. of Utah; Prof. Ananth Grama, Purdue Univ. )
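For reference, the six routines the slide alludes to are usually taken to be the following; the prototypes below are a simplified sketch (argument names are illustrative), not a substitute for the MPI standard:

int MPI_Init(int *argc, char ***argv);          /* start the MPI environment    */
int MPI_Finalize(void);                         /* shut it down                 */
int MPI_Comm_size(MPI_Comm comm, int *size);    /* how many processes in total? */
int MPI_Comm_rank(MPI_Comm comm, int *rank);    /* which process am I?          */
int MPI_Send(void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm);                 /* send a message    */
int MPI_Recv(void *buf, int count, MPI_Datatype type, int src,
             int tag, MPI_Comm comm, MPI_Status *status);       /* receive a message */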

21

Message Passing Basics

The computing system consists of p processes, each with its own exclusive address space
  Each data element must belong to one of the partitions of the space; hence, data must be explicitly partitioned and placed

All interactions (read-only or read/write) require cooperation of two processes - the process that has the data and one that wants to access the data

All processes execute asynchronously unless they interact through send/receive synchronizations

(Prof. Ananth Grama, Purdue Univ. )

22

Controlling Concurrent Tasks

Pthreads:
  The program starts with a single master thread, from which other threads are created:

    errcode = pthread_create(&thread_id, &thread_attribute,
                             &thread_fun, &fun_arg);

  Each thread executes a specific function, thread_fun(), representing the thread's computation
  All threads execute in parallel
  The function pthread_join() suspends execution of the calling thread until the target thread terminates

(Prof. Mary Hall, Univ. of Utah)

23

Pthreads “Hello World!”

#include <pthread.h>
#include <stdio.h>

void *thread(void *vargp);

int main() {
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);
    pthread_join(tid, NULL);
    pthread_exit((void *)NULL);
}

void *thread(void *vargp) {
    printf("Hello World from thread!\n");
    pthread_exit((void *)NULL);
}

(http://www.cs.binghamton.edu/~guydosh/cs350/hello.c)

24

Controlling Concurrent Tasks (cont.)

OpenMP:
  Begins execution as a single process and forks multiple threads to work on parallel blocks of code -- single program, multiple data
  Parallel constructs are specified using pragmas

(Prof. Mary Hall, Univ. of Utah)

25

OpenMP Pragma

All pragmas begin with #pragma
The compiler calculates loop bounds for each thread and manages data partitioning
Synchronization is also automatic (barrier); see the sketch below

(Prof. Mary Hall, Univ. of Utah)
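To make this concrete, here is a minimal sketch (written for this handout, not from the original slides): the compiler splits the loop iterations among the threads, and the implicit barrier at the end of the for construct holds all threads until the loop is fully done.

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    #pragma omp parallel for      /* loop bounds computed per thread          */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];
                                  /* implicit barrier: all threads have finished */
    printf("a[0] = %f\n", a[0]);
    return 0;
}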

26

OpenMP “Hello World!”

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World: %d\n", th_id);
        #pragma omp barrier
        if (th_id == 0) {
            nthreads = omp_get_num_threads();
            printf("%d threads\n", nthreads);
        }
    }
    return EXIT_SUCCESS;
}

(http://en.wikipedia.org/wiki/OpenMP#Hello_World)

27

Controlling Concurrent Tasks (cont.)

Java:
  The concurrent units in Java are methods named run
  A run method's code can be in concurrent execution with other such methods
  The process in which a run method executes is called a thread

    class MyThread extends Thread {
        public void run() { ... }
    }
    ...
    Thread myTh = new MyThread();
    myTh.start();

28

Controlling Concurrent Tasks (cont.)

The Java Thread class has several methods to control the execution of threads
  The yield method is a request from the running thread to voluntarily surrender the processor
  The sleep method can be used by the caller of the method to block the thread
  The join method is used to force a method to delay its execution until the run method of another thread has completed its execution

29

Controlling Concurrent Tasks (cont.)

Java thread priority:
  A thread's default priority is the same as that of the thread that created it
    If main creates a thread, its default priority is NORM_PRIORITY
  The Thread class defines two other priority constants, MAX_PRIORITY and MIN_PRIORITY
  The priority of a thread can be changed with the setPriority method

30

Controlling Concurrent Tasks (cont.)

MPI:
  The programmer writes the code for a single process, and the compiler includes the necessary libraries:

    mpicc -g -Wall -o mpi_hello mpi_hello.c

  The execution environment starts the parallel processes:

    mpiexec -n 4 ./mpi_hello

(Prof. Mary Hall, Univ. of Utah)

31

MPI “Hello World!”

#include "mpi.h"int main(int argc, char *argv[]) { int rank, size; MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); printf(”Hello World from process %d of

%d\n", rank, size); MPI_Finalize(); return 0;}

(Prof. Mary Hall, Univ. of Utah)

32

Sharing Data

Pthreads:
  Variables declared outside of main are shared
  Objects allocated on the heap may be shared (if a pointer is passed; see the sketch below)
  Variables on the stack are private: passing pointers to these around to other threads can cause problems
  Shared variables can be read and written directly by all threads -- synchronization is needed to prevent races
  Synchronization primitives, e.g., semaphores, locks, mutexes, barriers, are used to sequence the executions of the threads and thus indirectly sequence the data passed through shared variables

(Prof. Mary Hall, Univ. of Utah)
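To make the heap-sharing point concrete, here is a minimal sketch (the names, such as shared_buf, are illustrative, not from the slides): a heap-allocated buffer becomes shared by passing its pointer as the pthread_create argument, while loop counters stay private on each thread's stack.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *filler(void *vargp) {
    int *shared_buf = (int *)vargp;       /* heap memory, shared via the pointer */
    for (int i = 0; i < 10; i++)          /* i lives on this thread's stack      */
        shared_buf[i] = i * i;
    return NULL;
}

int main(void) {
    int *shared_buf = malloc(10 * sizeof(int));   /* allocated on the heap */
    pthread_t tid;

    pthread_create(&tid, NULL, filler, shared_buf);
    pthread_join(tid, NULL);              /* synchronize before reading the data */

    printf("shared_buf[9] = %d\n", shared_buf[9]);
    free(shared_buf);
    return 0;
}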

33

Sharing Data (cont.)

OpenMP:
  shared variables are shared; the default is shared
  private variables are private
  The loop index is private

    int bigdata[1024];

    void* foo(void* bar) {
        int tid;
        #pragma omp parallel \
                shared(bigdata) private(tid)
        {
            /* Calc. here */
        }
    }

(Prof. Mary Hall, Univ. of Utah)

34

Sharing Data (cont.)

MPI:

  int main(int argc, char *argv[]) {
      int rank, buf;
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {          /* process 0 sends ...    */
          buf = 123456;
          MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {   /* ... process 1 receives */
          MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      }
      MPI_Finalize();
  }

(Prof. Mary Hall, Univ. of Utah)

35

Synchronizing Tasks

A mechanism that controls the order in which tasks execute

Two kinds of synchronization
  Cooperation: one task waits for another, e.g., for passing data

    task 1              task 2
    a = ...             ... = ... a ...

  Competition: tasks compete for exclusive use of a resource without a specific order

    task 1              task 2
    sum += local_sum    sum += local_sum

36

Synchronizing Tasks (cont.)

Pthreads:
  Provide various synchronization primitives, e.g., mutex, semaphore, barrier
  Mutex: protects critical sections -- segments of code that must be executed by only one thread at any time
    Protect code to indirectly protect shared data
  Semaphore: synchronizes between two threads using sem_post() and sem_wait() (see the sketch below)
  Barrier: synchronizes threads so they reach the same point in code before going any further
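As a minimal sketch of the semaphore case (written for this handout, not from the slides): one thread produces a value and posts the semaphore, and the other thread waits on the semaphore before consuming the value, giving cooperation synchronization.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

int data;
sem_t ready;                        /* counts how many items are ready     */

void *producer(void *arg) {
    data = 42;                      /* produce the value                   */
    sem_post(&ready);               /* signal: data is now available       */
    return NULL;
}

void *consumer(void *arg) {
    sem_wait(&ready);               /* block until the producer has posted */
    printf("consumed %d\n", data);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&ready, 0, 0);         /* initial count 0: nothing ready yet  */
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&ready);
    return 0;
}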

37

Pthreads Mutex Example

pthread_mutex_t sum_lock;
int sum;

main() {
    ...
    pthread_mutex_init(&sum_lock, NULL);
    ...
}

void *find_min(void *list_ptr) {
    int my_sum;
    ...
    pthread_mutex_lock(&sum_lock);     /* enter the critical section */
    sum += my_sum;
    pthread_mutex_unlock(&sum_lock);   /* leave the critical section */
    ...
}

38

Synchronizing Tasks (cont.)

OpenMP:
  OpenMP has a reduction operation:

    sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 100; i++) {
        sum += array[i];
    }

  OpenMP also has a critical directive; the enclosed code is executed by all threads, but restricted to only one thread at a time:

    #pragma omp critical [(name)] new-line
        sum = sum + 1;

(Prof. Mary Hall, Univ. of Utah)

39

Synchronizing Tasks (cont.)

Java:
  A method that includes the synchronized modifier disallows any other synchronized method from running on the object while it is in execution

    public synchronized void deposit(int i) {...}
    public synchronized int fetch() {...}

  The above two methods are synchronized, which prevents them from interfering with each other

40

Synchronizing Tasks (cont.)

Java:
  Cooperation synchronization is achieved via the wait, notify, and notifyAll methods
  All three methods are defined in Object, which is the root class in Java, so all objects inherit them
  The wait method must be called in a loop
  The notify method is called to tell one waiting thread that the event it was waiting for has happened

The notifyAll method awakens all of the threads on the object’s wait list

41

Synchronizing Tasks (cont.)

MPI:
  Use send/receive to accomplish task synchronization, but the semantics of send/receive have to be specialized
  Non-blocking send/receive: send() and receive() calls return no matter whether the data has arrived
  Blocking send/receive:
    Unbuffered blocking send() does not return until a matching receive() is encountered at the receiving process
    Buffered blocking send() returns after the sender has copied the data into the designated buffer
    Blocking receive() forces the receiving process to wait

(Prof. Ananth Grama, Purdue Univ. )
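For contrast with the blocking MPI_Send/MPI_Recv used earlier, here is a minimal sketch (written for this handout, not from the slides) of the non-blocking variants: MPI_Isend and MPI_Irecv return immediately, and the later MPI_Wait calls are where each process actually synchronizes on completion.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, buf = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        buf = 123456;
        MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... could do other work while the send is in flight ... */
        MPI_Wait(&req, &status);        /* safe to reuse buf after this    */
    } else if (rank == 1) {
        MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... other work ... */
        MPI_Wait(&req, &status);        /* data guaranteed to be in buf    */
        printf("process 1 received %d\n", buf);
    }

    MPI_Finalize();
    return 0;
}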

42

Unbuffered Blocking

(Prof. Ananth Grama, Purdue Univ. )

43

Buffered Blocking

(Prof. Ananth Grama, Purdue Univ. )

44

Summary

Concurrent execution can be at the instruction, statement, subprogram, or program level

Two fundamental programming styles: shared variables and message passing

Programming languages must provide support for specifying control and data flows
