View
282
Download
3
Category
Preview:
Citation preview
“Going Parallel with C++11” SUPERCOMPUTING 2012
Joe Hummel, PhD
UC-Irvinehummelj@ics.uci.edu
http://www.joehummel.net/downloads.html
2
New standard of C++ has been ratified◦ “C++0x” ==> “C++11”
Lots of new features Workshop will focus on concurrency
features
C++ 11
Going Parallel with C++11
3
Why are we here?
Async programming: Better responsiveness…
GUIs (desktop, web, mobile)
Cloud
Windows 8
Parallel programming:
Better performance…
Financials
Scientific
Big data
C C
C CC C
C C
Going Parallel with C++11
4
Hello World --- of threading
#include <thread>#include <iostream>
void func(){ std::cout << "**Inside thread " << std::this_thread::get_id() << "!" << std::endl;}
int main(){ std::thread t; t = std::thread( func );
t.join(); return 0;}
A simple function for thread to do…
Create and schedule thread…
Wait for thread to finish…
Going Parallel with C++11
5
Hello world…
Demo #1
Going Parallel with C++11
6
Avoiding early program termination…
#include <thread>#include <iostream>
void func(){ std::cout << "**Hello world...\n";}
int main(){ std::thread t; t = std::thread( func );
t.join(); return 0;}
(1) Thread function must do exception handling; unhandled exceptions ==> termination…
void func(){ try { // computation: } catch(...) { // do something: }}
(2) Must join, otherwise termination… (avoid use of detach( ), difficult to use safely)
Going Parallel with C++11
7
Old school:◦ distinct thread functions (what we just saw)
New school:◦ lambda expressions (aka anonymous functions)
Pick your style…
Going Parallel with C++11
8
A thread that loops until we tell it to stop…
Demo #2
When user presses ENTER, we’ll tell thread to stop…
Going Parallel with C++11
9
(1) via thread function
#include <thread>#include <iostream>#include <string>
using namespace std;
. . .
int main(){ bool stop(false); thread t(loopUntil, &stop);
getchar(); // wait for user to press enter:
stop = true; // stop thread: t.join(); return 0;}
void loopUntil(bool *stop){ auto duration = chrono::seconds(2);
while (!(*stop)) { cout << "**Inside thread...\n"; this_thread::sleep_for(duration); }}
Going Parallel with C++11
10
(2) via lambda expressionint main(){
bool stop(false); thread t = thread( [&]() { auto duration = chrono::seconds(2);
while (!stop) { cout << "**Inside thread...\n"; this_thread::sleep_for(duration); } } );
getchar(); // wait for user to press enter:
stop = true; // stop thread: t.join(); return 0;}
thread t( [&] () { auto duration = chrono::seconds(2);
while (!stop) { cout << "**Inside thread...\n"; this_thread::sleep_for(duration); } } );
lambda expression
Closure semantics: [ ]: none, [&]: by ref, [=]: by val, …
lambda arguments
11
Lambdas:◦ Easier and more readable -- code remains inline◦ Potentially more dangerous ([&] captures everything by
ref)
Functions:◦ More efficient -- lambdas involve class, function objects◦ Potentially safer -- requires explicit variable scoping◦ More cumbersome and illegible
Trade-offs
Going Parallel with C++11
12
Multiple threads looping…
Demo #3
When user presses ENTER, all threads
stop…
Going Parallel with C++11
13
Solution#include <thread>#include <iostream>#include <string>#include <vector>#include <algorithm>
using namespace std;
int main(){ cout << "** Main Starting **\n\n"; bool stop = false; . . .
getchar();
cout << "** Main Done **\n\n"; . . . return 0;}
vector<thread> workers;
for (int i = 1; i <= 3; i++){ workers.push_back( thread([i, &stop]() { while (!stop) { cout << "**Inside thread " << i << "...\n"; this_thread::sleep_for(chrono::seconds(i)); } }) );}
stop = true; // stop threads:
// wait for threads to complete:for ( thread& t : workers ) t.join();
Going Parallel with C++11
14
Matrix multiply…
A Real Example
Going Parallel with C++11
15
Multi-threaded solutionint rows = N / numthreads;int extra = N % numthreads;int start = 0; // each thread does [start..end)int end = rows;
vector<thread> workers;
for (int t = 1; t <= numthreads; t++){
if (t == numthreads) // last thread does extra rows:end += extra;
workers.push_back( thread([start, end, N, &C, &A, &B](){
for (int i = start; i < end; i++)for (int j = 0; j < N; j++){
C[i][j] = 0.0;for (int k = 0; k < N; k++)
C[i][j] += (A[i][k] * B[k][j]);}
}));
start = end;end = start + rows;
}
for (thread& t : workers) t.join();
// 1 thread per core:numthreads = thread::hardware_concurrency();
16
Parallelism alone is not enough…
High-Performance Computing
HPC == Parallelism + Memory Hierarchy ─ Contention
Expose parallelism
Maximize data locality:• network• disk• RAM• cache• core
Minimize interaction:• false sharing• locking• synchronization
Going Parallel with C++11
17
Cache-friendly matrix multiply
XGoing Parallel with C++11
18
Loop interchange is first step…
Cache-friendly solution
workers.push_back( thread([start, end, N, &C, &A, &B](){
for (int i = start; i < end; i++)for (int j = 0; j < N; j++)
C[i][j] = 0.0;
for (int i = start; i < end; i++)
for (int k = 0; k < N; k++)for (int j = 0; j < N; j++)
C[i][j] += (A[i][k] * B[k][j]);
}));
Next step is to block multiply…
Going Parallel with C++11
19
C++11 Features and Status
Going Parallel with C++11
20
No compiler as yet fully implements C++11
Visual C++ 2012 has best concurrency support
◦ Part of Visual Studio 2012
gcc 4.7 has best overall support◦ http://gcc.gnu.org/projects/cxx0x.html
clang 3.1 appears very good as well◦ I did not test
◦ http://clang.llvm.org/cxx_status.html
Compilers…
Going Parallel with C++11
21
Compiling with gcc
# makefile
# threading library: one of these should work# tlib=threadtlib=pthread
# gcc 4.6:ver=c++0x# gcc 4.7:# ver=c++11
build:g++ -std=$(ver) -Wall main.cpp -l$(tlib)
Going Parallel with C++11
22
Executive SummaryConcept Header Summary
Threads <thread> Standard, low-level, type-safe; good basis for building HL systems (futures, tasks, …)
Futures <future> Via async function; hides threading, better harvesting of return value & exception handling
Locking <mutex> Standard, low-level locking primitives
Condition Vars <condition_variable> Low-level synchronization primitives
Atomics <atomic> Predictable, concurrent access without data race
Memory Model “Catch Fire” semantics; if program contains a data race, behavior of memory is undefined
Thread Local Thread-local variables [ problematic => avoid ]
Going Parallel with C++11
23
Use mutex to protect against concurrent access…
Locking
thread t1([&]() { m.lock(); sum += compute();
m.unlock(); });
#include <mutex>
mutex m;int sum;
thread t2([&]() { m.lock(); sum += compute();
m.unlock(); });
Going Parallel with C++11
24
“Resource Acquisition Is Initialization”◦ Advocated by B. Stroustrup for resource management
◦ Uses constructor & destructor to properly manage resources (files, threads, locks, …) in presence of exceptions, etc.
RAII
thread t([&](){ m.lock();
sum += compute();
m.unlock();
});
thread t([&]() { lock_guard<mutex> lg(m);
sum += compute();
}
);
should be written as…
Locks m in constructor
Unlocks m in destructor
Going Parallel with C++11
25
Use atomic to protect shared variables…◦ Lighter-weight than locking, but much more limited in applicability
Atomics
thread t1([&]() { count++; });
#include <atomic>
atomic<int> count;count = 0;
thread t2([&]() { count++; });
thread t3([&]() { count = count + 1; });Xnot safe…Going Parallel with C+
+11
26
Atomics enable safe, lock-free programming◦ “Safe” is a relative word…
Lock-free programming
thread t1([&]() { x = 42; done = true;
});
thread t2([&]() { while (!done) ; assert(x==42);
});
int x;atomic<bool> done;done = false;
doneflag
thread t1([&]() { if (!initd) { lock_guard<mutex> _(m); x = 42; initd = true; } << consume x, … >> });
int x;atomic<bool> initd;initd = false;
thread t2([&]() { if (!initd) { lock_guard<mutex> _(m); x = 42; initd = true; } << consume x, … >> });
lazyinit
27
Demo Prime numbers…
Going Parallel with C++11
28
Futures provide a higher-level of abstraction◦ Starts an asynchronous operation on some thread, await result…
Futures
#include <future> . .
future<int> fut = async( []() -> int { int result = PerformLongRunningOperation(); return result; });..
try{ int x = fut.get(); // join, harvest result: cout << x << endl;}catch(exception &e){ cout << "**Exception: " << e.what() << endl;}
return type…
Going Parallel with C++11
29
May run on the current thread May run on a new thread Often better to let system decide…
Execution of futures
// run on current thread when someone asks for value (“lazy”):future<T> fut1 = async( launch::sync, []() -> ... );future<T> fut2 = async( launch::deferred, []() -> ... );
// run on a new thread:future<T> fut3 = async( launch::async, []() -> ... );
// let system decide:future<T> fut4 = async( launch::any, []() -> ... );future<T> fut5 = async( []() ... );
Going Parallel with C++11
30
Demo
NetflixMovieReview
s(.txt)
Netflix Data
Mining App
Computes average review for a movie…
Netflix data-mining…
Going Parallel with C++11
31
C++ committee thought long and hard on memory model semantics…
◦ “You Don’t Know Jack About Shared Variables or Memory Models”, Boehm and Adve, CACM, Feb 2012
Conclusion:◦ No suitable definition in presence of race conditions
Solution:◦ Predictable memory model *only* in data-race-free
codes◦ Computer may “catch fire” in presence of data races
Memory Model
Going Parallel with C++11
32
Example…
thread t1([&]() { x = 1;
r1 = y; });
t1.join();
int x, y, r1, r2;
x = y = r1 = r2 = 0;
thread t2([&]() { y = 1;
r2 = x; });
t2.join();
What can we say about r1
and r2?Going Parallel with C++11
33
Dekker’s example…
If we think in terms of all possible thread interleavings
(aka “sequential consistency”),
then we know r1 = 1, r2 = 1, or both
In C++ 11? Not only are the values of x, y, r1 and r2
undefined, but the program may crash!
Going Parallel with C++11
34
A program is data-race-free (DRF) if no sequentially-consistent execution results in a data race. Avoid anything else.
C++ 11 Memory Model
Def: two memory accesses conflict if they 1. access the same scalar object or contiguous sequence of bit fields, and2. at least one access is a store.
Def: two memory accesses participate in a data race if they 1. conflict, and2. can occur simultaneously.
via independent threads, locks, atomics, …
Going Parallel with C++11
35
Beyond Threads
Going Parallel with C++11
36
Tasks are a higher-level abstraction
◦ Idea: developers identify work run-time system deals with execution details
Tasks vs. Threads
Task: a unit of work; an object denoting an ongoing operation or computation.
Going Parallel with C++11
37
Microsoft PPL: Parallel Patterns Library
Example
#include <ppl.h>
for (int i = 0; i < N; i++)for (int j = 0; j < N; j++)
C[i][j] = 0.0;
// for (int i = 0; i < N; i++)Concurrency::parallel_for(0, N, [&](int i){for (int k = 0; k < N; k++)for (int j = 0; j < N; j++)C[i][j] += (A[i][k] * B[k][j]);
});Matrix Multiply
Going Parallel with C++11
38
Execution Model
C C
C CC C
C C
Windows Process
Thread Pool
workerthread
workerthread
workerthread
workerthread
parallel_for( ... );tasktasktasktask
global work queue
Parallel Patterns Library
Resource Manager
Task Scheduler
Windows
39
That’s it!
Going Parallel with C++11
40
Presenter: Joe Hummel◦ Email: hummelj@ics.uci.edu◦ Materials: http://www.joehummel.net/downloads.html
References:◦ Book: “C++ Concurrency in Action”, by Anthony Williams
◦ Talks: Bjarne and friends at MSFT’s “Going Native 2012” http://channel9.msdn.com/Events/GoingNative/GoingNative-2012
◦ Tutorials: really nice series by Bartosz Milewski http://bartoszmilewski.com/2011/08/29/c11-concurrency-tutorial/
◦ FAQ: Bjarne Stroustrup’s extensive FAQ http://www.stroustrup.com/C++11FAQ.html
Thank you for attending
Going Parallel with C++11
Recommended