43
Multi-core Programming Programming with Windows Threads

Multi-core Programming Programming with Windows Threads

Embed Size (px)

Citation preview

Page 1: Multi-core Programming Programming with Windows Threads

Multi-core Programming

Programming with Windows Threads

Page 2: Multi-core Programming Programming with Windows Threads

Basics of VTune™ Performance Analyzer

2

Topics

• Explore Win32 Threading API functions– Create threads – Wait for threads to terminate– Synchronize shared access between threads

Page 3: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 3

Windows* HANDLE type

• Each Windows object is referenced by HANDLE type variables– Pointer to kernel objects– Thread, process, file, event, mutex, semaphore,

etc.• Object creation functions return HANDLE• Object controlled through its handle

– Don’t manipulate objects directly

Page 4: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 4

Windows* Thread Creation

HANDLE CreateThread(

LPSECURITY_ATTRIBUTES ThreadAttributes,

DWORD StackSize,

LPTHREAD_START_ROUTINE StartAddress,

LPVOID Parameter,

DWORD CreationFlags,

LPDWORD ThreadId ); // Out

Page 5: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 5

LPTHREAD_START_ROUTINE

• CreateThread() expects pointer to global function– Returns DWORD– Calling convention WINAPI– Single LPVOID (void *) parameter

• Thread begins execution of function

DWORD WINAPI MyThreadStart(LPVOID p);

Page 6: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 6

Using Explicit Threads

• Identify portions of code to thread• Encapsulate code into function

– If code is already a function, a driver function may need to be written to coordinate work of multiple threads

• Add CreateThread call to assign thread(s) to execute function

Page 7: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 7

Destroying Threads

• Frees OS resources– Clean-up if done with thread before program

completes

Process exit does this for you

• BOOL CloseHandle(HANDLE hObject);

Page 8: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 8

Example: Thread Creation#include <stdio.h>#include <windows.h>

DWORD WINAPI helloFunc(LPVOID arg ) { printf(“Hello Thread\n”); return 0;

}

main() {HANDLE hThread =

CreateThread(NULL, 0, helloFunc, NULL, 0, NULL );

}

What Happens?

Page 9: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 9

Example Explained

• Main thread is process• When process goes, all threads go• Need some method of waiting for a thread to

finish

Page 10: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 10

#include <stdio.h>#include <windows.h>BOOL threadDone = FALSE ;

DWORD WINAPI helloFunc(LPVOID arg ) { printf(“Hello Thread\n”); threadDone = TRUE ;return 0;

}main() {

HANDLE hThread = CreateThread(NULL, 0, helloFunc, NULL, 0, NULL ); while (!threadDone); }

Waiting for Windows* Thread

Not a good idea!

// wasted cycles!

Page 11: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 11

• Wait for one object (thread)DWORD WaitForSingleObject(

HANDLE hHandle, DWORD dwMilliseconds );

• Calling thread waits (blocks) until – Time expires

• Return code used to indicate this

– Thread exits (handle is signaled)• Use INFINITE to wait until thread termination

• Does not use CPU cycles

Waiting for a Thread

Page 12: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 12

• Wait for up to 64 objects (threads)DWORD WaitForMultipleObjects(

DWORD nCount, CONST HANDLE *lpHandles, // arrayBOOL fWaitAll, // wait for one or allDWORD dwMilliseconds)

• Wait for all: fWaitAll==TRUE• Wait for any: fWaitAll==FALSE

– Return value is first array index found

Waiting for Many Threads

Page 13: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 13

Notes on WaitFor* Functions

• Handle as parameter• Used for different types of objects• Kernel objects have two states

– Signaled– Non-signaled

• Behavior is defined by object referred to by handle– Thread: signaled means terminated

Page 14: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 14

Example: Multiple Threads#include <stdio.h>#include <windows.h>const int numThreads = 4;

DWORD WINAPI helloFunc(LPVOID arg ) { printf(“Hello Thread\n”); return 0; }

main() { HANDLE hThread[numThreads]; for (int i = 0; i < numThreads; i++) hThread[i] = CreateThread(NULL, 0, helloFunc, NULL, 0, NULL );

WaitForMultipleObjects(numThreads, hThread,TRUE, INFINITE);

}

Page 15: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 15

• Modify the previous example code to print out– Appropriate “Hello Thread” message– Unique thread number

• Use for-loop variable of CreateThread loop

• Sample output:Hello from Thread #0Hello from Thread #1Hello from Thread #2Hello from Thread #3

Activity 1 - “HelloThreads”

Page 16: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 16

What’s wrong?

• What is printed for myNum?DWORD WINAPI threadFunc(LPVOID pArg) { int* p = (int*)pArg; int myNum = *p; printf( “Thread number %d\n”, myNum);}. . .// from main():for (int i = 0; i < numThreads; i++) { hThread[i] = CreateThread(NULL, 0, threadFunc, &i, 0, NULL);}

Page 17: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 17

Hello Threads TimelineTime main Thread 0 Thread 1

T0 i = 0 --- ----

T1 create(&i) --- ---

T2 i++ (i == 1) launch ---

T3 create(&i) p = pArg ---

T4 i++ (i == 2) myNum = *p

myNum = 2

launch

T5 wait print(2) p = pArg

T6 wait exit myNum = *p

myNum = 2

Page 18: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 18

Race Conditions

• Concurrent access of same variable by multiple threads– Read/Write conflict– Write/Write conflict

• Most common error in concurrent programs• May not be apparent at all times

Page 19: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 19

How to Avoid Data Races

• Scope variables to be local to threads– Variables declared within threaded functions– Allocate on thread’s stack– TLS (Thread Local Storage)

• Control shared access with critical regions– Mutual exclusion and synchronization– Lock, semaphore, event, critical section, mutex…

Page 20: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 20

Solution – “Local” StorageDWORD WINAPI threadFunc(LPVOID pArg) { int myNum = *((int*)pArg); printf( “Thread number %d\n”, myNum);}. . .

// from main():for (int i = 0; i < numThreads; i++) { tNum[i] = i; hThread[i] = CreateThread(NULL, 0, threadFunc, &tNum[i], 0, NULL);}

Page 21: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 21

Windows* Mutexes

• Kernel object reference by handle• “Signaled” when available• Operations:

– CreateMutex(…) // create new– WaitForSingleObject // wait & lock– ReleaseMutex(…) // unlock

• Available between processes

Page 22: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 22

Windows* Critical Section

• Lightweight, intra-process only mutex

• Most useful and most used

• New type CRITICAL_SECTION cs;

Create and destroy operations

InitializeCriticalSection(&cs)

DeleteCriticalSection(&cs);

Page 23: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 23

Windows* Critical Section

CRITICAL_SECTION cs ;• Attempt to enter protected code

EnterCriticalSection(&cs)

– Blocks if another thread is in critical section– Returns when no thread is in critical section

• Upon exit of critical sectionLeaveCriticalSection(&cs)

– Must be from obtaining thread

Page 24: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 24

Example: Critical Section#define NUMTHREADS 4CRITICAL_SECTION g_cs; // why does this have to be global?int g_sum = 0;

DWORD WINAPI threadFunc(LPVOID arg ) { int mySum = bigComputation(); EnterCriticalSection(&g_cs); g_sum += mySum; // threads access one at a

time LeaveCriticalSection(&g_cs); return 0;}main() { HANDLE hThread[NUMTHREADS]; InitializeCriticalSection(&g_cs); for (int i = 0; i < NUMTHREADS; i++) hThread[i] = CreateThread(NULL,0,threadFunc,NULL,0,NULL); WaitForMultipleObjects(NUMTHREADS, hThread, TRUE, INFINITE); DeleteCriticalSection(&g_cs);}

Page 25: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 25

Numerical Integration Example4.0

2.0

1.00.0

4.0

(1+x2)f(x) =

4.0

(1+x2) dx = 0

1

X

static long num_steps=100000; double step, pi;

void main(){ int i; double x, sum = 0.0;

step = 1.0/(double) num_steps; for (i=0; i< num_steps; i++){ x = (i+0.5)*step; sum = sum + 4.0/(1.0 + x*x); } pi = step * sum; printf(“Pi = %f\n”,pi);}

Page 26: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 26

Activity 2 - Computing Pi

• Parallelize the numerical integration code using Windows* Threads

• How can the loop iterations be divided among the threads?

• What variables can be local?• What variables need to be

visible to all threads?

static long num_steps=100000; double step, pi;

void main(){ int i; double x, sum = 0.0;

step = 1.0/(double) num_steps; for (i=0; i< num_steps; i++){ x = (i+0.5)*step; sum = sum + 4.0/(1.0 + x*x); } pi = step * sum; printf(“Pi = %f\n”,pi);}

Page 27: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 27

Windows* Events

• Used to signal other threads that some event has occurred– Data is available, message is ready

• Threads wait for signal with WaitFor* function

• Two kinds of events– Auto-reset – Manual-reset

Page 28: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 28

Types of Events•

• Event stays signaled until any one thread waits and is released– If no threads waiting, state stays

signaled– Once thread is released, state

reset to non-signaled

• Event remains signaled until reset by API call– Multiple threads can wait and be

released– Threads not originally waiting

may start wait and be released

Caution: Be careful when using WaitForMultipleObjects to wait for ALL events

Manual-resetAuto-reset

Page 29: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 29

Windows* Event CreationHANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpEventAttributes, BOOL bManualReset, // TRUE => manual reset BOOL bInitialState, // TRUE => begin signaled LPCSTR lpName); // text name for object

• Set bManualReset to TRUE for manual-reset event; FALSE for auto-reset event

• Set bInitialState to TRUE for event to begin in signaled state; FALSE to begin unsignaled

Page 30: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 30

Event Set and Reset

Set an event to signaled stateBOOL SetEvent( HANDLE event );

Reset manual-reset eventBOOL ResetEvent( HANDLE event );

Pulse eventBOOL PulseEvent( HANDLE event );

Page 31: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 31

Example: Thread Search

• Created thread searches for item– Signals if item is found

• Main thread waits for signal and thread termination– Print message if item found– Print message upon thread termination

• Illustrates – Using generic HANDLE type– Not waiting for every object to signal

Page 32: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 32

Example: Events

DWORD WINAPI threadFunc(LPVOID arg) { BOOL bFound = bigFind() ;

if (bFound) { SetEvent(hObj[0]); // signal data was found bigFound() ; }

moreBigStuff() ; return 0;}

HANDLE hObj[2]; // 0 is event, 1 is thread

Page 33: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 33

Example: Main function. . . hObj[0] = CreateEvent(NULL, FALSE, FALSE, NULL); hObj[1] = CreateThread(NULL,0,threadFunc,NULL,0,NULL);/* Do some other work while thread executes search */ DWORD waitRet =

WaitForMultipleObjects(2, hObj, FALSE, INFINITE); switch(waitRet) { case WAIT_OBJECT_0: // event signaled printf("found it!\n"); WaitForSingleObject(hObj[1], INFINITE) ;

// fall thru case WAIT_OBJECT_0+1: // thread signaled printf("thread done\n"); break ; default: printf("wait error: ret %u\n", waitRet); break ; }

. . .

Page 34: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 34

Example: Main function. . .

hObj[0] = CreateEvent(NULL, FALSE, FALSE, NULL);hObj[1] = CreateThread(NULL,0,threadFunc,NULL,0,NULL);

/* Do some other work while thread executes search */

DWORD waitRet = WaitForMultipleObjects(2, hObj, FALSE, INFINITE);

switch(waitRet) { case WAIT_OBJECT_0: // event signaled printf("found it!\n"); WaitForSingleObject(hObj[1], INFINITE) ; // fall thru case WAIT_OBJECT_0+1: // thread signaled printf("thread done\n"); break ; default: printf("wait error: ret %u\n", waitRet); break ; }. . .

Page 35: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 35

Windows* Semaphores

• Synchronization object that keeps a count– Represents the number of available resources– Formalized by Edsger Dijkstra (1968)

• Two operations on semaphores– Wait [P(s)]: Thread waits until s > 0, then s = s-1– Post [V(s)]: s = s + 1

• Semaphore is in signaled state if count > 0

Page 36: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 36

Win32* Semaphore CreationHANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpEventAttributes, LONG lSemInitial, // Initial count value LONG lSemMax, // Maximum value for

count LPCSTR lpSemName); // text name for object

• Value of lSemMax must be 1 or greater• Value of lSemInitial must be

– greater than or equal to zero, – less than or equal to lSemMax, and – cannot go outside of range

Page 37: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 37

Wait and Post Operations• Use WaitForSingleObject to wait on semaphore

– If count is == 0, thread waits– Decrement count by 1 when count > 0

• Increment semaphore (Post operation)BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

– Increase semaphore count by cReleaseCount– Returns the previous count through lpPreviousCount

Page 38: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 38

Semaphore Uses

• Control access to data structures of limited size– Queues, stacks, deques– Use count to enumerate available elements

• Control access to finite number of resourse– File descriptors, tape drives…

• Throttle number of active threads within a region

• Binary semaphore [0,1] can act as mutex

Page 39: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 39

Semaphore Cautions

• No ownership of semaphore• Any thread can release a semaphore, not just

the last thread that waits– Use good programming practice to avoid this

• No concept of abandoned semaphore– If thread terminates before post, semaphore

increment may be lost– Deadlock

Page 40: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 40

Example: Semaphore as Mutex

• Main thread opens input file, waits for thread termination

• Threads will– Read line from input file– Count all five-letter words in line

Page 41: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 41

Example: Main

main(){ HANDLE hThread[NUMTHREADS];

hSem1 = CreateSemaphore(NULL, 1, 1, NULL); // Binary semaphore hSem2 = CreateSemaphore(NULL, 1, 1, NULL); // Binary semaphore

fd = fopen(“InFile”, “r”); // Open file for read

for (int i = 0; i < NUMTHREADS; i++) hThread[i] = CreateThread(NULL,0,CountFives,NULL,0,NULL);

WaitForMultipleObjects(NUMTHREADS, hThread, TRUE, INFINITE); fclose(fd);

printf(“Number of five letter words is %d\n”, fiveLetterCount);}

HANDLE hSem1, hSem2;FILE *fd; int fiveLetterCount = 0;

Page 42: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 42

Example: SemaphoresDWORD WINAPI CountFives(LPVOID arg) { BOOL bDone = FALSE ; char inLine[132]; int lCount = 0;

while (!bDone) { WaitForSingleObject(hSem1, INFINITE); // access to input bDone = (GetNextLine(fd, inLine) == EOF); ReleaseSemaphore(hSem1, 1, NULL); if (!bDone) if (lCount = GetFiveLetterWordCount(inLine)) { WaitForSingleObject(hSem2, INFINITE); // update

global fiveLetterCount += lCount; ReleaseSemaphore(hsem2, 1, NULL); } }}

Page 43: Multi-core Programming Programming with Windows Threads

Programming with Windows Threads 43

Programming with Windows ThreadsWhat’s Been Covered

• Create threads to execute work encapsulated within functions

• Typical to wait for threads to terminate• Coordinate shared access between threads to

avoid race conditions– Local storage to avoid conflicts– Synchronization objects to organize use