FIT5174 Parallel & Distributed Systems Dr. Ronald Pose Lecture 4 - 2013 1
FIT5174 Distributed & Parallel Systems
Lecture 4
Shared Memory Parallel Programming
Acknowledgement
These slides are based on material by Jeremy Johnson with material from Pthreads Programming by Nichols, Buttlar, and Farrell and POSIX Threads Programming Tutorial (computing.llnl.gov/tutorials/pthreads) by Blaise Barney
Useful tutorial links
• Many of you have been having difficulty with the C programming language, especially with its declaration and use of memory variables and pointers, and with the typing and typecasting of variables and pointers. The following web pages provide useful guides to these topics. They are available on the FIT5174 web pages.
• http://pw1.netcom.com/~tjensen/ptr/pointers.htm
• http://denniskubes.com/2012/08/16/the-5-minute-guide-to-c-pointers/
• http://www.tutorialspoint.com/cprogramming/c_pointers.htm
• http://www.openismus.com/documents/cplusplus/cpointers.shtml
• http://stackoverflow.com/questions/2733960/pointer-address-type-casting
• http://bytebeats.com/2011/07/29/pointer-type-casting/
Introduction
• Objective: To learn how to write parallel programs using threads (using the Pthreads library) and to understand the execution model of threads vs. processes.
• Topics
– Concurrent programming with UNIX processes
– Introduction to shared memory parallel programming with Pthreads
  • Threads
  • fork/join
  • Race conditions
  • Synchronization
  • Performance issues: synchronization overhead, contention and granularity, load balance, cache coherency and false sharing
– Introduction to parallel program design paradigms
  • Data parallelism (static scheduling)
  • Task parallelism with workers
  • Divide and conquer parallelism (fork/join)
Processes
• Processes contain information about program resources and program execution state
– Process ID, process group ID, user ID, and group ID
– Environment
– Working directory
– Program instructions
– Registers
– Stack
– Heap
– File descriptors
– Signal actions
– Shared libraries
– Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory)
UNIX Process
Threads
• An independent stream of instructions that can be scheduled to run
– Stack pointer
– Registers (program counter)
– Scheduling properties (such as policy or priority)
– Set of pending and blocked signals
– Thread specific data
• “lightweight process”
– Cost of creating and managing threads much less than processes
– Threads live within a process and share process resources such as address space
• Pthreads – standard thread API (IEEE Std 1003.1)
Threads within a UNIX Process
Shared Memory Model
• All threads have access to the same global, shared memory
• All threads within a process share the same address space
• Threads also have their own private data
• Programmers are responsible for synchronizing access (protecting) globally shared data.
Simple Example
void do_one_thing(int *);
void do_another_thing(int *);
void do_wrap_up(int, int);
int r1 = 0, r2 = 0;
extern int
main(void)
{
do_one_thing(&r1);
do_another_thing(&r2);
do_wrap_up(r1, r2);
return 0;
}
[Figure: the simple example as a single UNIX process. Virtual address space: stack (frames for main() and do_another_thing() with locals i, j, k), text (main(), do_one_thing(), do_another_thing()), data (r1, r2), heap. Registers: SP, PC, GP0, GP1, … Identity: PID, UID, GID. Resources: open files, locks, sockets, …]
Simple Example (Processes)
int shared_mem_id, *shared_mem_ptr;
int *r1p, *r2p;
extern int main(void)
{
pid_t child1_pid, child2_pid;
int status;
/* initialize shared memory segment */
if ((shared_mem_id = shmget(IPC_PRIVATE, 2*sizeof(int), 0660)) == -1)
perror("shmget"), exit(1);
if ((shared_mem_ptr = (int *)shmat(shared_mem_id, (void *)0, 0)) == (void *)-1)
perror("shmat failed"), exit(1);
r1p = shared_mem_ptr;
r2p = (shared_mem_ptr + 1);
*r1p = 0;
*r2p = 0;
if ((child1_pid = fork()) == 0) {
/* first child */
do_one_thing(r1p);
return 0;
} else if (child1_pid == -1) {
perror("fork"), exit(1);
}
/* parent */
if ((child2_pid = fork()) == 0) {
/* second child */
do_another_thing(r2p);
return 0;
} else if (child2_pid == -1) {
perror("fork"), exit(1);
}
/* parent */
if ((waitpid(child1_pid, &status, 0) == -1))
perror("waitpid"), exit(1);
if ((waitpid(child2_pid, &status, 0) == -1))
perror("waitpid"), exit(1);
do_wrap_up(*r1p, *r2p);
return 0;
}
[Figure: the simple example as two child processes. Each process has its own virtual address space (stack, text, data, heap), registers (SP, PC, GP0, GP1, …), identity (PID, UID, GID) and resources (open files, locks, sockets, …). One child's stack holds do_one_thing() above main(), the other's holds do_another_thing() above main(). The two processes communicate through a shared memory segment.]
Simple Example (PThreads)
int r1 = 0, r2 = 0;
extern int
main(void)
{
pthread_t thread1, thread2;
if (pthread_create(&thread1,
NULL,
do_one_thing,
(void *) &r1) != 0)
perror("pthread_create"), exit(1);
if (pthread_create(&thread2,
NULL,
do_another_thing,
(void *) &r2) != 0)
perror("pthread_create"), exit(1);
if (pthread_join(thread1, NULL) != 0)
perror("pthread_join"),exit(1);
if (pthread_join(thread2, NULL) != 0)
perror("pthread_join"),exit(1);
do_wrap_up(r1, r2);
return 0;
}
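A self-contained sketch of the Pthreads version follows. The slides do not show the bodies of do_one_thing, do_another_thing or do_wrap_up, so the bodies here are placeholders (assumptions), do_wrap_up is assumed to return the total, and run_simple_example is a hypothetical wrapper name. The start routines use the void *(*)(void *) signature that pthread_create expects. Pthreads calls return an error number instead of setting errno, so the sketch reports failures with strerror rather than perror.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static int r1 = 0, r2 = 0;

/* Placeholder work (assumed): each routine updates the int it was handed. */
static void *do_one_thing(void *arg)
{
    int *result = arg;
    for (int i = 0; i < 4; i++)
        (*result)++;
    return NULL;
}

static void *do_another_thing(void *arg)
{
    int *result = arg;
    for (int i = 0; i < 6; i++)
        (*result)++;
    return NULL;
}

static int do_wrap_up(int a, int b)
{
    printf("wrap up: %d + %d = %d\n", a, b, a + b);
    return a + b;
}

/* Returns the combined total, or -1 if a Pthreads call fails. */
int run_simple_example(void)
{
    pthread_t thread1, thread2;
    int rc;

    r1 = r2 = 0;
    if ((rc = pthread_create(&thread1, NULL, do_one_thing, &r1)) != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return -1;
    }
    if ((rc = pthread_create(&thread2, NULL, do_another_thing, &r2)) != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        pthread_join(thread1, NULL);
        return -1;
    }
    /* Each thread writes only its own variable, so no mutex is needed here. */
    if ((rc = pthread_join(thread1, NULL)) != 0 ||
        (rc = pthread_join(thread2, NULL)) != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(rc));
        return -1;
    }
    return do_wrap_up(r1, r2);
}
```

With these placeholder bodies, run_simple_example() returns 10. Compile with the -pthread option.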
[Figure: the simple example as two threads in one process. The process has a single virtual address space (text, data holding r1 and r2, heap), one identity (PID, UID, GID) and one set of resources (open files, locks, sockets, …). Thread 1 and Thread 2 each have their own stack and their own registers (SP, PC, GP0, GP1, …).]
Concurrency and Parallelism
[Figure: timelines, with time on the horizontal axis, showing different orderings of do_one_thing(), do_another_thing() and do_wrap_up().]
Unix Fork
• The fork() call
– Creates a child process that is identical to the parent process
– The child has its own PID
– The fork() call provides different return values to the parent [child’s PID] and the child [0]
[Figure: fork. Before the call there is one process (PID = 7274) executing fork(). After the call there are two: the parent (PID = 7274) and the child (PID = 7275), both continuing from the fork() call.]
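The two return values of fork() can be seen in a minimal sketch. The function name fork_and_wait and its child_code parameter are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once. The child exits immediately with child_code; the parent
   waits for it and returns the exit status it collected (-1 on error). */
int fork_and_wait(int child_code)
{
    pid_t pid = fork();
    int status;

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0. */
        _exit(child_code);
    }
    /* Parent: fork() returned the child's PID. */
    printf("parent %ld created child %ld\n", (long)getpid(), (long)pid);
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit so that it does not flush stdio buffers inherited from the parent.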
Thread Creation
• pthread_create creates a new thread and makes it executable
– pthread_create (thread, attr, start_routine, arg)
  • thread – unique identifier for the new thread
  • attr – attribute object (NULL for the defaults)
  • start_routine – the routine the newly created thread will execute
  • arg – a single argument passed to start_routine
Thread Creation
• Once created, threads are peers, and may create other threads
Thread Join
• "Joining" is one way to accomplish synchronization between threads.
• The pthread_join() function blocks the calling thread until the specified threadid thread terminates.
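Because start_routine receives only a single void * argument, several values are usually packed into a struct. The sketch below passes such a struct to a thread and collects the thread's return value through the second argument of pthread_join. The names sum_job, sum_job_run and sum_in_thread are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

struct sum_job {
    const int *data;  /* input array */
    int n;            /* number of elements */
    long result;      /* filled in by the thread */
};

/* Start routine: the single void * argument points at a sum_job. */
static void *sum_job_run(void *arg)
{
    struct sum_job *job = arg;
    job->result = 0;
    for (int i = 0; i < job->n; i++)
        job->result += job->data[i];
    return job;   /* handed back to whoever calls pthread_join */
}

/* Sums data[0..n-1] in a separate thread; returns -1 on failure. */
long sum_in_thread(const int *data, int n)
{
    struct sum_job job = { data, n, 0 };
    pthread_t tid;
    void *ret;

    if (pthread_create(&tid, NULL, sum_job_run, &job) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)   /* blocks until the thread ends */
        return -1;
    return ((struct sum_job *)ret)->result;
}
```

The job struct lives on the caller's stack, which is safe here only because the caller joins the thread before returning.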
Fork/Join Overhead
• Compare the overhead of procedure call, process fork/join, thread create/join
– Procedure call (no args)
  • 1.2 × 10^-8 sec (12 ns)
– Process
  • 0.0012 sec (1.2 ms)
– Thread
  • 0.000042 sec (42 µs)
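Race Conditions
• When two or more threads access the same resource at the same time and at least one of them modifies it, the result depends on the order in which the accesses happen

Time runs downward:

Thread 1               Thread 2
Withdraw $50           Withdraw $50
Read Balance: $125     Read Balance: $125
Set Balance: $75       Set Balance: $75

• Both threads read $125 before either writes, so both set the balance to $75 and one of the two withdrawals is lost.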
Bad Count
int sum= 0;
void count(int *arg)
{
int i;
for (i=0;i<*arg;i++) {
sum++;
}
}
int main(int argc, char **argv)
{
int error,i;
int numcounters = NUMCOUNTERS;
int limit = LIMIT;
pthread_t tid[NUMCOUNTERS];
pthread_setconcurrency(numcounters);
for (i=0;i<numcounters;i++)
{
error = pthread_create(&tid[i],NULL,(void *(*)(void *))count,&limit);
}
for (i=0;i<numcounters;i++)
{
error = pthread_join(tid[i],NULL);
}
printf("Counters finished with count = %d\n",sum);
printf("Count should be %d X %d = %d\n",numcounters,limit,numcounters*limit);
return 0;
}
Mutex
• Mutex variables are for protecting shared data when multiple writes occur.
• A mutex variable acts like a "lock" protecting access to a shared data resource. Only one thread can own (lock) a mutex at any given time.
Mutex Operations
• pthread_mutex_lock (mutex)
– The pthread_mutex_lock() routine is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call will block the calling thread until the mutex is unlocked.
• pthread_mutex_unlock (mutex)
– Will unlock a mutex if called by the owning thread. Calling this routine is required after a thread has completed its use of protected data if other threads are to acquire the mutex for their work with the protected data.
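As a sketch of these two calls, the bank balance race from the Race Conditions slide can be closed by holding a mutex across the read and the write. The names withdraw, withdraw_50 and run_two_withdrawals are hypothetical, and the mutex is initialized statically with PTHREAD_MUTEX_INITIALIZER.

```c
#include <pthread.h>

static int balance = 125;
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

/* Withdraws amount if funds allow; returns 1 on success, 0 otherwise.
   The read and the write happen under one lock, so they cannot interleave. */
static int withdraw(int amount)
{
    int ok = 0;
    pthread_mutex_lock(&balance_lock);
    if (balance >= amount) {
        balance -= amount;
        ok = 1;
    }
    pthread_mutex_unlock(&balance_lock);
    return ok;
}

static void *withdraw_50(void *arg)
{
    (void)arg;
    withdraw(50);
    return NULL;
}

/* Runs two concurrent $50 withdrawals from $125.
   Returns the final balance, or -1 if a thread could not be created. */
int run_two_withdrawals(void)
{
    pthread_t t1, t2;

    balance = 125;
    if (pthread_create(&t1, NULL, withdraw_50, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, withdraw_50, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;
}
```

Whatever order the threads run in, both withdrawals are applied and the final balance is $25.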
Good Count
int sum= 0;
pthread_mutex_t lock;
void count(int *arg)
{
int i;
for (i=0;i<*arg;i++)
{
pthread_mutex_lock(&lock);
sum++;
pthread_mutex_unlock(&lock);
}
}
int main(int argc, char **argv)
{
int error,i;
int numcounters = NUMCOUNTERS;
int limit = LIMIT;
pthread_t tid[NUMCOUNTERS];
pthread_setconcurrency(numcounters);
pthread_mutex_init(&lock,NULL);
/* tid[] has NUMCOUNTERS elements, so index from 0 as in Bad Count */
for (i=0;i<numcounters;i++)
{
error = pthread_create(&tid[i],NULL,(void *(*)(void *))count, &limit);
}
for (i=0;i<numcounters;i++)
{
error = pthread_join(tid[i],NULL);
}
}
printf("Counters finished with count = %d\n",sum);
printf("Count should be %d X %d = %d\n",numcounters,limit,numcounters*limit);
return 0;
}
Better Count
int sum= 0;
pthread_mutex_t lock;
void count(int *arg)
{
int i;
int localsum = 0;
for (i=0;i<*arg;i++)
{
localsum++;
}
pthread_mutex_lock(&lock);
sum = sum + localsum;
pthread_mutex_unlock(&lock);
}
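A complete sketch of Better Count with a driver follows. The driver name run_counters and the bound MAX_COUNTERS are assumptions, and count is written with the void *(*)(void *) signature so that no function pointer cast is needed.

```c
#include <pthread.h>

#define MAX_COUNTERS 64   /* assumed upper bound on the number of threads */

static int sum = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts privately, then takes the lock once to publish the result. */
static void *count(void *arg)
{
    int limit = *(int *)arg;
    int localsum = 0;

    for (int i = 0; i < limit; i++)
        localsum++;
    pthread_mutex_lock(&lock);
    sum += localsum;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts numcounters threads that each count to limit.
   Returns the shared total, or -1 on bad input or thread creation failure. */
int run_counters(int numcounters, int limit)
{
    pthread_t tid[MAX_COUNTERS];
    int started = 0;

    if (numcounters < 0 || numcounters > MAX_COUNTERS)
        return -1;
    sum = 0;
    for (int i = 0; i < numcounters; i++) {
        if (pthread_create(&tid[i], NULL, count, &limit) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return (started == numcounters) ? sum : -1;
}
```

Each thread acquires the mutex once instead of limit times, which is where the saving over Good Count comes from.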
Linked List
typedef struct llist_node {
int index;
void *datap;
struct llist_node *nextp;
} llist_node_t;
typedef llist_node_t *llist_t;
int llist_insert_data (int index, void *datap, llist_t *llistp)
{
llist_node_t *cur, *prev, *new;
int found = FALSE;
for (cur=prev=*llistp; cur != NULL; prev=cur, cur=cur->nextp) {
if (cur->index == index) {
free(cur->datap);
cur->datap = datap;
found=TRUE;
break;
}
else if (cur->index > index){
break;
}
}
if (!found) {
new = (llist_node_t *)malloc(sizeof(llist_node_t));
new->index = index;
new->datap = datap;
new->nextp = cur;
if (cur==*llistp)
*llistp = new;
else
prev->nextp = new;
}
return 0;
}
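Race Conditions for Linked Lists
• When two or more threads insert at the same time, things can go awry
• If both threads find the same prev and cur before either links its node, each sets prev->nextp to its own new node; the second assignment overwrites the first and one insertion is lost
[Figure: two new nodes (new 1, new 2) both being linked between the same prev and cur nodes.]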
Threadsafe Code
• Refers to an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating "race" conditions.
Threadsafe Linked List
typedef struct llist {
llist_node_t *first;
pthread_mutex_t mutex;
} llist_t;
int llist_init (llist_t *llistp)
{
int rtn;
llistp->first = NULL;
if ((rtn = pthread_mutex_init(&(llistp->mutex), NULL)) !=0)
fprintf(stderr, "pthread_mutex_init error %d",rtn), exit(1);
return 0;
}
int llist_insert_data (int index, void *datap, llist_t *llistp)
{
llist_node_t *cur, *prev, *new;
int found = FALSE;
pthread_mutex_lock(&(llistp->mutex));
for (cur=prev=llistp->first; cur != NULL; prev=cur, cur=cur->nextp) {
…
pthread_mutex_unlock(&(llistp->mutex));
return 0;
}
Access Patterns and Granularity
• Lock entire list (coarse grain) or lock individual nodes (fine grain)?
• Individual nodes allows more concurrency but incurs more overhead and is more difficult to program.
• Use readers/writers lock (allow multiple readers but exclusive writing)
Condition Variables
• While mutexes implement synchronization by controlling thread access to data, condition variables allow threads to synchronize based upon the actual value of data.
• Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met.
• A condition variable is a way to achieve the same goal without polling
• Always used with a mutex
Using Condition Variables
Thread A
• Do work up to the point where a certain condition must occur (such as "count" must reach a specified value)
• Lock associated mutex and check value of a global variable
• Call pthread_cond_wait() to perform a blocking wait for signal from Thread-B. Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread-B.
• When signalled, wake up. Mutex is automatically and atomically locked.
• Explicitly unlock mutex
• Continue
Thread B
• Do work
• Lock associated mutex
• Change the value of the global variable that Thread-A is waiting upon.
• Check value of the global Thread-A wait variable. If it fulfills the desired condition, signal Thread-A.
• Unlock mutex.
• Continue
Condition Variable Example
void *watch_count(void *idp)
{
int i=0, save_state, save_type;
int *my_id = idp;
pthread_mutex_lock(&count_lock);
while (count < COUNT_THRES) {
pthread_cond_wait(&count_hit_threshold, &count_lock);
}
pthread_mutex_unlock(&count_lock);
return(NULL);
}
void *inc_count(void *idp)
{
int i=0, save_state, save_type;
int *my_id = idp;
for (i=0; i<TCOUNT; i++) {
pthread_mutex_lock(&count_lock);
count++;
if (count == COUNT_THRES) {
pthread_cond_signal(&count_hit_threshold);
}
pthread_mutex_unlock(&count_lock);
}
return(NULL);
}
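The two routines above rely on declarations that the slide does not show. A self-contained sketch with those declarations and a driver follows; the values of TCOUNT and COUNT_THRES, the variable seen_by_watcher and the driver name run_watch_demo are assumptions.

```c
#include <pthread.h>

#define TCOUNT 10        /* increments per incrementing thread (assumed) */
#define COUNT_THRES 12   /* value the watcher waits for (assumed) */

static int count = 0;
static int seen_by_watcher = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t count_hit_threshold = PTHREAD_COND_INITIALIZER;

static void *watch_count(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&count_lock);
    /* Re-test the condition in a loop: the wait may return spuriously,
       and the threshold may already have been passed before we got here. */
    while (count < COUNT_THRES)
        pthread_cond_wait(&count_hit_threshold, &count_lock);
    seen_by_watcher = count;   /* read while still holding the mutex */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

static void *inc_count(void *arg)
{
    (void)arg;
    for (int i = 0; i < TCOUNT; i++) {
        pthread_mutex_lock(&count_lock);
        count++;
        if (count == COUNT_THRES)
            pthread_cond_signal(&count_hit_threshold);
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Starts two incrementers and one watcher. Returns the count the
   watcher observed when it proceeded, or -1 on thread creation failure. */
int run_watch_demo(void)
{
    pthread_t inc1, inc2, watcher;

    count = 0;
    seen_by_watcher = 0;
    if (pthread_create(&inc1, NULL, inc_count, NULL) != 0)
        return -1;
    if (pthread_create(&inc2, NULL, inc_count, NULL) != 0) {
        pthread_join(inc1, NULL);
        return -1;
    }
    if (pthread_create(&watcher, NULL, watch_count, NULL) != 0) {
        pthread_join(inc1, NULL);
        pthread_join(inc2, NULL);
        return -1;
    }
    pthread_join(inc1, NULL);
    pthread_join(inc2, NULL);
    pthread_join(watcher, NULL);
    return seen_by_watcher;
}
```

The value returned lies between COUNT_THRES and 2 * TCOUNT, because the incrementers may keep running between the signal and the moment the watcher reacquires the mutex.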
Reader/Writer Lock
typedef struct rdwr_var {
int readers_reading;
int writer_writing;
pthread_mutex_t mutex;
pthread_cond_t lock_free;
} pthread_rdwr_t;
typedef void * pthread_rdwrattr_t;
#define pthread_rdwrattr_default NULL
int pthread_rdwr_init_np(pthread_rdwr_t *rdwrp, pthread_rdwrattr_t *attrp);
int pthread_rdwr_rlock_np(pthread_rdwr_t *rdwrp);
int pthread_rdwr_runlock_np(pthread_rdwr_t *rdwrp);
int pthread_rdwr_wlock_np(pthread_rdwr_t *rdwrp);
int pthread_rdwr_wunlock_np(pthread_rdwr_t *rdwrp);
Reader/Writer Lock
int llist_insert_data (int index, void *datap, llist_t *llistp)
{
…
pthread_rdwr_wlock_np(&(llistp->rwlock));
…
pthread_rdwr_wunlock_np(&(llistp->rwlock));
return 0;
}
int llist_find_data(int index, void **datapp, llist_t *llistp)
{
…
pthread_rdwr_rlock_np(&(llistp->rwlock));
…
pthread_rdwr_runlock_np(&(llistp->rwlock));
return 0;
}
Reader/Writer Lock Init
int pthread_rdwr_init_np(pthread_rdwr_t *rdwrp, pthread_rdwrattr_t *attrp)
{
rdwrp->readers_reading = 0;
rdwrp->writer_writing = 0;
pthread_mutex_init(&(rdwrp->mutex), NULL);
pthread_cond_init(&(rdwrp->lock_free), NULL);
return 0;
}
Read Lock
int pthread_rdwr_rlock_np(pthread_rdwr_t *rdwrp){
pthread_mutex_lock(&(rdwrp->mutex));
while(rdwrp->writer_writing) {
pthread_cond_wait(&(rdwrp->lock_free), &(rdwrp->mutex));
}
rdwrp->readers_reading++;
pthread_mutex_unlock(&(rdwrp->mutex));
return 0;
}
Read Unlock
int pthread_rdwr_runlock_np(pthread_rdwr_t *rdwrp)
{
pthread_mutex_lock(&(rdwrp->mutex));
if (rdwrp->readers_reading == 0) {
pthread_mutex_unlock(&(rdwrp->mutex));
return -1;
}
else {
rdwrp->readers_reading--;
if (rdwrp->readers_reading == 0) {
pthread_cond_signal(&(rdwrp->lock_free));
}
pthread_mutex_unlock(&(rdwrp->mutex));
return 0;
}
}
Write Lock
int pthread_rdwr_wlock_np(pthread_rdwr_t *rdwrp)
{
pthread_mutex_lock(&(rdwrp->mutex));
while(rdwrp->writer_writing || rdwrp->readers_reading) {
pthread_cond_wait(&(rdwrp->lock_free), &(rdwrp->mutex));
}
rdwrp->writer_writing++;
pthread_mutex_unlock(&(rdwrp->mutex));
return 0;
}
Write Unlock
int pthread_rdwr_wunlock_np(pthread_rdwr_t *rdwrp)
{
pthread_mutex_lock(&(rdwrp->mutex));
if (rdwrp->writer_writing == 0) {
pthread_mutex_unlock(&(rdwrp->mutex));
return -1;
}
else {
rdwrp->writer_writing = 0;
pthread_cond_broadcast(&(rdwrp->lock_free));
pthread_mutex_unlock(&(rdwrp->mutex));
return 0;
}
}
Parallel Programming
• Task parallelism vs. data parallelism
• Fork/join parallelism (divide & conquer)
• Static scheduling
• Dynamic scheduling with workers
Sequential Summing
int X[MAXSIZE];
int icount(int l,int u)
{
int i;
int y = 0;
for (i=l; i<=u;i++)
y = y + X[i];
return y;
}
int rcount(int l,int u)
{
int m;
int y1,y2;
if ( (u-l) == 0)
return X[l];
else
{
m = (l+u)/2;
y1 = rcount(l,m);
y2 = rcount(m+1,u);
return (y1 + y2);
}
}
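The recursive rcount maps naturally onto fork/join (divide and conquer) parallelism: one half is handed to a new thread, the other half is computed by the current thread, and the two are joined before adding. The sketch below is one possible arrangement, not taken from the slides. The names range, pcount and parallel_sum, the depth cutoff and the value of MAXSIZE are all assumptions.

```c
#include <pthread.h>

#define MAXSIZE 1024   /* assumed array capacity */
int X[MAXSIZE];

struct range {
    int l, u;     /* inclusive bounds, as in rcount(l, u) */
    int depth;    /* remaining levels at which a new thread is forked */
    int sum;      /* result for this range */
};

static int icount(int l, int u)
{
    int y = 0;
    for (int i = l; i <= u; i++)
        y += X[i];
    return y;
}

static void *pcount(void *arg)
{
    struct range *r = arg;

    /* Below the cutoff, or for a range of at most one element, sum sequentially. */
    if (r->depth == 0 || r->u - r->l < 1) {
        r->sum = icount(r->l, r->u);
        return NULL;
    }
    int m = (r->l + r->u) / 2;
    struct range left = { r->l, m, r->depth - 1, 0 };
    struct range right = { m + 1, r->u, r->depth - 1, 0 };
    pthread_t tid;

    /* Fork: left half in a new thread; fall back to this thread on failure. */
    int forked = (pthread_create(&tid, NULL, pcount, &left) == 0);
    if (!forked)
        pcount(&left);
    pcount(&right);
    if (forked)
        pthread_join(tid, NULL);   /* join before reading left.sum */
    r->sum = left.sum + right.sum;
    return NULL;
}

/* Sums X[0..n-1], forking threads for the top `depth` levels of recursion. */
int parallel_sum(int n, int depth)
{
    struct range all = { 0, n - 1, depth, 0 };

    if (n <= 0 || n > MAXSIZE)
        return 0;
    pcount(&all);
    return all.sum;
}
```

The depth cutoff keeps the number of threads small (at most 2^depth - 1 are created), since thread create/join costs far more than a procedure call.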
Summing with a Parallel Loop
int sum= 0;
int numcounters;
int size;
pthread_mutex_t lock;
void count(int *id)
{
int i,lsum;
lsum = 0;
for (i=*id;i<size;i+=numcounters)
{
lsum = lsum + X[i];
}
pthread_mutex_lock(&lock);
sum = sum + lsum;
pthread_mutex_unlock(&lock);
}
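A driver for the parallel loop might look like the following sketch. The driver name parallel_loop_sum, the ids array and the bounds MAXSIZE and MAXCOUNTERS are assumptions. Each thread is given a pointer to its own id; passing the address of the loop variable instead would let the value change before the thread reads it.

```c
#include <pthread.h>

#define MAXSIZE 1024      /* assumed array capacity */
#define MAXCOUNTERS 16    /* assumed upper bound on the number of threads */

int X[MAXSIZE];
static int sum, numcounters, size;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread id handles elements id, id + numcounters, id + 2*numcounters, ... */
static void *count(void *arg)
{
    int id = *(int *)arg;
    int lsum = 0;

    for (int i = id; i < size; i += numcounters)
        lsum += X[i];
    pthread_mutex_lock(&lock);
    sum += lsum;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Sums X[0..n-1] with nthreads statically scheduled threads.
   Returns the sum, or -1 on bad input or thread creation failure. */
int parallel_loop_sum(int n, int nthreads)
{
    pthread_t tid[MAXCOUNTERS];
    int ids[MAXCOUNTERS];   /* one stable id per thread */
    int started = 0;

    if (n < 0 || n > MAXSIZE || nthreads < 1 || nthreads > MAXCOUNTERS)
        return -1;
    sum = 0;
    size = n;
    numcounters = nthreads;
    for (int i = 0; i < nthreads; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, count, &ids[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return (started == nthreads) ? sum : -1;
}
```

The interleaved (cyclic) assignment balances the element count across threads, but neighbouring threads read adjacent elements; a block assignment is the usual alternative when cache behaviour matters.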
Summing with Workers
void get_task(int *start, int *stop)
{
pthread_mutex_lock(&task_lock);
*start = task_index;
if (*start + task_chunk > n)
*stop = n;
else
*stop = *start + task_chunk;
task_index = *stop;
pthread_mutex_unlock(&task_lock);
}
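void worker()
{
int start,stop,i;
int y = 0;
/* keep fetching chunks until get_task returns an empty range */
get_task(&start,&stop);
while (start < stop) {
for (i=start; i<stop;i++)
y = y + X[i];
get_task(&start,&stop);
}
pthread_mutex_lock(&sum_lock);
sum = sum + y;
pthread_mutex_unlock(&sum_lock);
}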
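A self-contained sketch of dynamic scheduling with workers follows. The slide does not show the shared declarations or the code that starts the workers, so the globals, the driver name worker_sum and the constants MAXSIZE and NWORKERS are assumptions. In this sketch each worker keeps requesting chunks until get_task hands back an empty range.

```c
#include <pthread.h>

#define MAXSIZE 1024   /* assumed array capacity */
#define NWORKERS 4     /* assumed number of worker threads */

int X[MAXSIZE];
static int n, task_index, task_chunk, sum;
static pthread_mutex_t task_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hands out the next chunk [*start, *stop); empty once the array is used up. */
static void get_task(int *start, int *stop)
{
    pthread_mutex_lock(&task_lock);
    *start = task_index;
    *stop = (*start + task_chunk > n) ? n : *start + task_chunk;
    task_index = *stop;
    pthread_mutex_unlock(&task_lock);
}

static void *worker(void *arg)
{
    int start, stop, y = 0;

    (void)arg;
    for (get_task(&start, &stop); start < stop; get_task(&start, &stop))
        for (int i = start; i < stop; i++)
            y += X[i];
    pthread_mutex_lock(&sum_lock);
    sum += y;
    pthread_mutex_unlock(&sum_lock);
    return NULL;
}

/* Sums X[0..count-1] in chunks of `chunk` elements.
   Returns the sum, or -1 on bad input or if no worker could be started. */
int worker_sum(int count, int chunk)
{
    pthread_t tid[NWORKERS];
    int started = 0;

    if (count < 0 || count > MAXSIZE || chunk < 1)
        return -1;
    n = count;
    task_chunk = chunk;
    task_index = 0;
    sum = 0;
    for (int i = 0; i < NWORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return (started > 0) ? sum : -1;
}
```

The chunk size sets the granularity: small chunks balance load better but take task_lock more often, while large chunks reduce contention at the risk of leaving some workers idle near the end.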