Multithread programming 20151206_Seo Jintaek

Multithreaded Programming: 01. Threading Basics

October 8, 2014
Seo Jintaek, jintaeks@gmail.com

Contents
– Statements and expressions
– rvalue reference
– Move semantics
– Threads

Statements and expressions


A part of a statement that has a value is called an expression.

If a statement as a whole has a value, that statement is itself an expression.

An expression becomes a statement when it is terminated with a semicolon (an expression statement).
– i = i + 4;
– The expressions in the statement above are:
– i
– i + 4
– i = i + 4

In the source below, two statements are expressions that have a value.

    #include <cstdio>
    #include <iostream>

    int GetSum( int iLeft_, int iRight_ )
    {
        return iLeft_ + iRight_;
    }//GetSum()

    void Test( const int& iData_ )
    {
        std::cout << __FUNCTION__ << " const reference" << std::endl;
    }//Test()

    int main()
    {
        int i = 3;
        int j = 0;
        j = GetSum( 4, 5 ); // the statement has the value 9, so it is an expression
        if( j == 9 )
        {
            printf( "%i\r\n", j ); // printf() returns 3 (characters written), so the statement is an expression
        }//if
        Test( i );
    }

lvalue/rvalue of an expression

    int main()
    {
        int i = 3;
        int j = 0;
        i = i + 4; //(1)
        Test( i );
    }

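Assume that the address of i is [5000]. In a 32-bit environment, the address of j would then be [4996].

Note that the same variable i is interpreted as [5000] when it appears on the left of the assignment operator =, and as the value stored at [5000], that is 3, when it appears on the right. In other words, the value that the variable i evaluates to inside an expression depends on the context in which it is used.

An expression has a value. The value it must have when it appears on the left of = is called its lvalue, and the value it must have when it appears on the right of = is called its rvalue.

Keep this in mind: lvalue/rvalue does not apply to statements or to variables. It is a property of the value of an expression.

In statement (1), the expression i on the left of = has an lvalue, and that lvalue is [5000]. The expression i+4 on the right of = has an rvalue, and that rvalue is 7.

In the statement i=i, the expression i on the left of = has an lvalue, which is [5000]. The i on the right has an rvalue, which is 3.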
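The distinction above can be checked at compile time. The following is a minimal sketch that is not part of the original slides; g_i and the k... constants are hypothetical names. It assumes the standard rule that decltype((expr)) yields a reference type for an lvalue expression and a non-reference type for a value-only (prvalue) expression. Note that in C++ the whole assignment expression i = i + 4 is itself an lvalue that refers to i.

```cpp
#include <type_traits>

int g_i = 3; // hypothetical variable standing in for `i` in the slides

// The name g_i designates storage, so the expression is an lvalue.
constexpr bool kNameIsLvalue =
    std::is_lvalue_reference<decltype((g_i))>::value;

// g_i + 4 produces an unnamed temporary value, so it is not an lvalue.
constexpr bool kSumIsLvalue =
    std::is_lvalue_reference<decltype((g_i + 4))>::value;

// decltype does not evaluate its operand, so g_i is not modified here.
// In C++ the built-in assignment expression yields an lvalue referring to g_i.
constexpr bool kAssignIsLvalue =
    std::is_lvalue_reference<decltype((g_i = g_i + 4))>::value;

static_assert(kNameIsLvalue, "a variable name is an lvalue expression");
static_assert(!kSumIsLvalue, "an arithmetic result is not an lvalue");
static_assert(kAssignIsLvalue, "an assignment expression is an lvalue");
```

Because everything is evaluated by the compiler, a successful build is already the check; no code runs at run time.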

Passing an rvalue as a function argument

    void Test2( int i, int j )
    {
        printf( "%p, %p\r\n", &i, &j );
    }//Test2()

    int main()
    {
        int i = 3;
        int j = 0;
        Test2( 5, i );
    }//main()

When the rvalue of an expression is passed to a function that takes its parameter by value, the rvalue property is not preserved inside the called function. main() passes the rvalue 5 when it calls Test2(), but inside Test2() that value can be referred to by the name i.

lvalue reference

    void Test2( int i, int& j )
    {
        printf( "%p, %p\r\n", &i, &j );
    }//Test2()

    int main()
    {
        int i = 3;
        int j = 0;
        Test2( i, i );
    }//main()

When Test2() takes its second parameter by lvalue reference, the call in main() treats the same expression i in two ways: the first i passes the rvalue 3, and the second i passes the lvalue [5000]. This is because Test2( …, int& j ) requires an lvalue reference for its second parameter. A call such as Test2( i, 5 ); is a compile-time error, because no lvalue can be obtained for the expression 5. Expressions that have no name, such as 5+3 or i+5, have only an rvalue.

    Test2( i, 5 ); // error

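An rvalue is also stored somewhere

    void Test2( int i, const int& j )
    {
        printf( "%p, %p\r\n", &i, &j );
    }//Test2()

    int main()
    {
        int i = 3;
        int j = 0;
        Test2( i, 5 );
    }//main()

However, when the function takes a const lvalue reference, an expression that has only an rvalue can be passed as the argument. This works because the code generated by the compiler stores the value of the expression in a memory location that the programmer cannot access directly. The programmer has no normal way of finding out [4992]. Inside Test2(), j == 5 and &j == [4984], but *j is an error.

If the compiler is already passing the address of an rvalue, could the programmer deliberately keep passing that address along? That is what an rvalue reference is for.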

rvalue reference, std::move

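rvalue reference

A function parameter can be declared so that it receives an rvalue reference.

Put && between the type name and the variable name.
– int&& iData_

The compiler then generates code that passes the address of the rvalue whenever the argument expression has an rvalue.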

rvalue reference

    void Test( int& iData_ )
    {
        std::cout << __FUNCTION__ << " reference" << std::endl;
    }//Test()

    void Test( const int& iData_ )
    {
        std::cout << __FUNCTION__ << " const reference" << std::endl;
    }//Test()

    void Test( int&& iData_ )
    {
        std::cout << __FUNCTION__ << " rvalue reference" << std::endl;
    }//Test()

    int main()
    {
        int i = 3;
        int j = 0;
        Test( 5 );
        Test( j == 5 );
        Test( j );
    }

    /** output:
        Test rvalue reference
        Test rvalue reference
        Test reference
    */

Passing an rvalue parameter on as an rvalue argument

    void Hello( int& iData_ )
    {
        std::cout << "Hello reference" << std::endl;
    }//Hello()

    void Hello( int&& iData_ )
    {
        std::cout << "Hello rvalue reference" << std::endl;
    }//Hello()

    void Test( int&& iData_ )
    {
        std::cout << __FUNCTION__ << " rvalue reference" << std::endl;
        Hello( iData_ );
    }//Test()

Test() received the rvalue reference iData_, but inside Test() iData_ can now be referred to by name (it lives on the stack and its address is &iData_), so the expression iData_ is no longer an rvalue. Therefore the call Hello( iData_ ); inside Test() calls Hello(int& iData_), the overload that takes an lvalue reference.

    #include <type_traits>

    template<typename T>
    typename std::remove_reference<T>::type&& MyMove( T&& t_ )
    {
        // typename is required because remove_reference<T>::type is a dependent name
        return static_cast<typename std::remove_reference<T>::type&&>(t_);
    }//MyMove()

    void Test( int&& iData_ )
    {
        std::cout << __FUNCTION__ << " rvalue reference" << std::endl;
        Hello( MyMove( iData_ ) );
    }//Test()

For a function that received an rvalue reference to pass that parameter on, still as an rvalue reference, to another function it calls, the compiler must be told to cast it explicitly. MyMove() returns an explicit rvalue reference to the argument it receives. This makes it possible to call Hello(int&&) out of the two overloaded Hello() functions.

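std::move

    #include <utility> // std::move

    void Hello( int& iData_ )
    {
        std::cout << "Hello reference" << std::endl;
    }//Hello()

    // Overload from the previous example; std::move() selects this one.
    void Hello( int&& iData_ )
    {
        std::cout << "Hello rvalue reference" << std::endl;
    }//Hello()

    void Test( int&& iData_ )
    {
        std::cout << __FUNCTION__ << " rvalue reference" << std::endl;
        Hello( std::move( iData_ ) );
    }//Test()

std::move() is the standard implementation of this cast: it keeps a parameter received as an rvalue reference usable as an rvalue. Remember it this way:

To pass a parameter received as an rvalue reference on to another function as an rvalue reference, you must use std::move().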

Move Semantics

A class that requires deep copy must provide a copy constructor and a copy assignment operator.

When a container keeps objects that require deep copy as its nodes, inserting a node invokes the copy constructor and the copy assignment operator on temporary objects.

A temporary object has no name, so it is an rvalue. If the class is made to behave specially for such rvalues, the copy work done for temporary objects can be made cheaper.

This is called move semantics. Move semantics is implemented through a move constructor and a move assignment operator.

class MemoryBlock

    class MemoryBlock
    {
    public:
        // Simple constructor that initializes the resource.
        explicit MemoryBlock(size_t length);

        // Destructor.
        ~MemoryBlock();

        // Copy constructor.
        MemoryBlock(const MemoryBlock& other);

        // Copy assignment operator.
        MemoryBlock& operator=(const MemoryBlock& other);

        // Retrieves the length of the data resource.
        size_t Length() const;

    private:
        size_t _length; // The length of the resource.
        int* _data;     // The resource.
    };

MemoryBlock. Constructor and destructor

    // Simple constructor that initializes the resource.
    explicit MemoryBlock(size_t length)
        : _length(length)
        , _data(new int[length])
    {
        std::cout << "In MemoryBlock(size_t). length = "
                  << _length << "." << std::endl;
    }

    // Destructor.
    ~MemoryBlock()
    {
        std::cout << "In ~MemoryBlock(). length = "
                  << _length << ".";
        if (_data != NULL)
        {
            std::cout << " Deleting resource.";
            // Delete the resource.
            delete[] _data;
        }
        std::cout << std::endl;
    }

explicit is needed so that the constructor is not invoked implicitly when an integer is assigned to a MemoryBlock.

MemoryBlock. Copy constructor and copy assignment operator

    // Copy constructor.
    MemoryBlock(const MemoryBlock& other)
        : _length(other._length)
        , _data(new int[other._length])
    {
        std::cout << "In MemoryBlock(const MemoryBlock&). length = "
                  << other._length << ". Copying resource." << std::endl;

        std::copy(other._data, other._data + _length, _data);
    }

    // Copy assignment operator.
    MemoryBlock& operator=(const MemoryBlock& other)
    {
        std::cout << "In operator=(const MemoryBlock&). length = "
                  << other._length << ". Copying resource." << std::endl;

        if (this != &other)
        {
            // Free the existing resource.
            delete[] _data;

            _length = other._length;
            _data = new int[_length];
            std::copy(other._data, other._data + _length, _data);
        }
        return *this;
    }

MemoryBlock. Move constructor

    // Move constructor.
    MemoryBlock(MemoryBlock&& other)
        : _data(NULL)
        , _length(0)
    {
        std::cout << "In MemoryBlock(MemoryBlock&&). length = "
                  << other._length << ". Moving resource." << std::endl;

        //*this = std::move( other );
        // Copy the data pointer and its length from the
        // source object.
        _data = other._data;
        _length = other._length;

        // Release the data pointer from the source object so that
        // the destructor does not free the memory multiple times.
        other._data = NULL;
        other._length = 0;
    }

MemoryBlock. Move assignment operator

    // Move assignment operator.
    MemoryBlock& operator=(MemoryBlock&& other)
    {
        std::cout << "In operator=(MemoryBlock&&). length = "
                  << other._length << "." << std::endl;

        if (this != &other)
        {
            // Free the existing resource.
            if( _data != nullptr )
                delete[] _data;

            // Copy the data pointer and its length from the
            // source object.
            _data = other._data;
            _length = other._length;

            // Release the data pointer from the source object so that
            // the destructor does not free the memory multiple times.
            other._data = NULL;
            other._length = 0;
        }
        return *this;
    }

MemoryBlock. Move constructor (cont.)

    // Move constructor implemented with the move assignment operator.
    MemoryBlock(MemoryBlock&& other)
        : _data(NULL)
        , _length(0)
    {
        std::cout << "In MemoryBlock(MemoryBlock&&). length = "
                  << other._length << ". Moving resource." << std::endl;

        *this = std::move( other );
    }

MemoryBlock. main()

    int main()
    {
        // Create a vector object and add a few elements to it.
        std::vector<MemoryBlock> v;
        v.push_back(MemoryBlock(25));
        v.push_back(MemoryBlock(75));

        // Insert a new element into the second position of the vector.
        v.insert(v.begin() + 1, MemoryBlock(50));
    }

In Vs2010 or above

    In MemoryBlock(size_t). length = 25.
    In MemoryBlock(MemoryBlock&&). length = 25. Moving resource.
    In ~MemoryBlock(). length = 0.
    In MemoryBlock(size_t). length = 75.
    In MemoryBlock(MemoryBlock&&). length = 25. Moving resource.
    In ~MemoryBlock(). length = 0.
    In MemoryBlock(MemoryBlock&&). length = 75. Moving resource.
    In ~MemoryBlock(). length = 0.
    In MemoryBlock(size_t). length = 50.
    In MemoryBlock(MemoryBlock&&). length = 50. Moving resource.
    In MemoryBlock(MemoryBlock&&). length = 50. Moving resource.
    In operator=(MemoryBlock&&). length = 75.
    In operator=(MemoryBlock&&). length = 50.
    In ~MemoryBlock(). length = 0.
    In ~MemoryBlock(). length = 0.
    In ~MemoryBlock(). length = 25. Deleting resource.
    In ~MemoryBlock(). length = 50. Deleting resource.
    In ~MemoryBlock(). length = 75. Deleting resource.

Before Vs2010

    In MemoryBlock(size_t). length = 25.
    In MemoryBlock(const MemoryBlock&). length = 25. Copying resource.
    In ~MemoryBlock(). length = 25. Deleting resource.
    In MemoryBlock(size_t). length = 75.
    In MemoryBlock(const MemoryBlock&). length = 25. Copying resource.
    In ~MemoryBlock(). length = 25. Deleting resource.
    In MemoryBlock(const MemoryBlock&). length = 75. Copying resource.
    In ~MemoryBlock(). length = 75. Deleting resource.
    In MemoryBlock(size_t). length = 50.
    In MemoryBlock(const MemoryBlock&). length = 50. Copying resource.
    In MemoryBlock(const MemoryBlock&). length = 50. Copying resource.
    In operator=(const MemoryBlock&). length = 75. Copying resource.
    In operator=(const MemoryBlock&). length = 50. Copying resource.
    In ~MemoryBlock(). length = 50. Deleting resource.
    In ~MemoryBlock(). length = 50. Deleting resource.
    In ~MemoryBlock(). length = 25. Deleting resource.
    In ~MemoryBlock(). length = 50. Deleting resource.
    In ~MemoryBlock(). length = 75. Deleting resource.

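Thread

A thread is a unit of execution flow that runs inside a process.
For most online games running on Windows, a single thread starts running from WinMain().

Additional threads can be created and run as needed.

A process that runs two or more threads is a multithreaded program.

Related concepts
– Critical Section
– Mutex
– Semaphore
– TLS (Thread Local Storage)

Thread implementations
– The Win32 implementation is CreateThread().
– The Microsoft C runtime implementation is _beginthreadex().
– The standard library implementation is std::thread.
– The boost implementation is boost::thread.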

Critical Section

Two threads can execute the same routine. If that routine is a block of code that must not be executed concurrently, it is called a critical section. The object that controls entry to and exit from a critical section is called a mutex.

Mutex
– Mutex is short for Mutual Exclusion.
– It is generally a synchronization object provided by the operating system.
– It provides lock() and unlock(); the code block between lock() and unlock() is the critical section.
– The thread that called lock() must be the one that calls unlock().

Mutex implementations
– The Win32 implementation of a mutex is CRITICAL_SECTION.
  – EnterCriticalSection()
  – LeaveCriticalSection()
  – InitializeCriticalSection()
  – DeleteCriticalSection()
– The standard implementation of a mutex is std::mutex.
  – The RAII helper for std::mutex is std::lock_guard.
– The boost implementation of a mutex is boost::mutex.
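As a minimal sketch of the lock()/unlock() idea using the standard types named above (not code from the original slides; g_counterMutex, g_counter, add_many and run_counter_demo are hypothetical names), std::lock_guard locks the mutex in its constructor and unlocks it in its destructor, so the single increment is the critical section:

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex g_counterMutex; // hypothetical: protects g_counter
long g_counter = 0;        // hypothetical shared data

// Increments the shared counter `times` times, one critical section per increment.
void add_many(int times)
{
    for (int n = 0; n < times; ++n)
    {
        std::lock_guard<std::mutex> guard(g_counterMutex); // lock()
        ++g_counter;                                       // critical section
    }                                                      // unlock() in ~lock_guard
}

// Runs `threadCount` threads and returns the final counter value.
long run_counter_demo(int threadCount, int times)
{
    g_counter = 0;
    std::vector<std::thread> workers;
    for (int n = 0; n < threadCount; ++n)
        workers.emplace_back(add_many, times);
    for (auto& w : workers)
        w.join();
    return g_counter;
}
```

Calling run_counter_demo(4, 10000) should return 40000; without the lock_guard the increments from different threads could be lost.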

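Semaphore

There are situations in which thread B must wait for thread A to finish its work. An object that provides this kind of synchronization between threads is called a semaphore. It generally provides Signal() and Wait() interfaces.

Semaphore implementations
– The Win32 implementation used for this kind of signaling is the Event.
  – CreateEvent()
  – CloseHandle()
  – SetEvent()
  – ResetEvent()
  – WaitForSingleObject()
– The standard library implementation is std::condition_variable.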
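The Signal()/Wait() interface described above can be sketched on top of std::condition_variable. This is an illustrative sketch only, not the author's code: SimpleEvent and wait_for_result are hypothetical names, and the class behaves like a manual-reset event that stays signaled once Signal() has been called.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical event-like object with the Signal()/Wait() interface.
class SimpleEvent
{
public:
    void Signal()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_signaled = true;
        }
        m_cv.notify_all(); // wake every waiting thread
    }

    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        // The predicate guards against spurious wakeups and against
        // Signal() having been called before Wait().
        m_cv.wait(lock, [this] { return m_signaled; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_signaled = false;
};

// The calling thread (B) waits until thread A has finished its work.
int wait_for_result()
{
    SimpleEvent done;
    int result = 0;
    std::thread a([&] { result = 42; done.Signal(); });
    done.Wait();       // blocks until A signals
    int seen = result; // A's write is visible after Wait() returns
    a.join();
    return seen;
}
```

wait_for_result() returns 42 because the write to result happens before Signal(), and Wait() only returns after observing the signaled flag under the same mutex.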

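Deadlock
– A situation in which no thread can enter the critical section.
– Unlike a crash, the program does not terminate.
– But it cannot do anything either.

Race Condition
– A situation in which threads compete with each other to enter a critical section.

Starvation
– A situation in which, during a race, one particular thread is never able to enter the critical section.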

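Thread models

Each thread performs its own dedicated job.
– Most game engines use this model.
– Threads communicate through message queues.

An arbitrary number n of threads each perform a different task.
– This is called task parallel.
– This model is scalable.
– That is, when the number of cores increases, the number of threads can simply be increased.

An arbitrary number n of threads perform the same task on different regions of data.
– This is called data parallel.
– This model is also scalable.

Recent thread libraries support both data parallelism and task parallelism.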
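To make the data parallel model concrete, here is a small sketch, assuming nothing beyond the standard library (parallel_sum is a hypothetical helper, not from the slides). Every thread runs the same task, a sum, over its own slice of the data and writes to its own slot, so no lock is needed:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` by giving each of `threadCount` threads its own slice.
long long parallel_sum(const std::vector<int>& data, unsigned threadCount)
{
    if (threadCount == 0)
        threadCount = 1;

    std::vector<long long> partial(threadCount, 0); // one result slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + threadCount - 1) / threadCount;

    for (unsigned t = 0; t < threadCount; ++t)
    {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Same task on every thread, different data region.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers)
        w.join();

    // Combine the per-thread results on the calling thread.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Scaling to more cores only means passing a larger threadCount; the per-thread code does not change.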

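TLS, Thread Local Storage

A variable can be declared so that a function appears to access the same variable, while each thread actually accesses its own unique copy.

It is declared like a global variable, but each thread has a distinct instance. Such a variable is called TLS.
– The Win32 implementation of TLS is the TlsAlloc() family of functions.
– The Microsoft implementation of TLS is __declspec( thread ).
– The boost implementation of TLS is boost::thread_specific_ptr<>.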
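Besides the implementations listed above, C++11 also has the thread_local keyword. The sketch below is an addition that assumes a C++11 compiler with thread_local support; t_callCount, count_calls and count_on_new_thread are hypothetical names. The function looks as if it touches one global, yet every thread sees its own copy starting from 0:

```cpp
#include <thread>

// Hypothetical per-thread counter: declared like a global,
// but every thread gets its own instance initialized to 0.
thread_local int t_callCount = 0;

// Increments the calling thread's own counter and returns it.
int count_calls(int times)
{
    for (int n = 0; n < times; ++n)
        ++t_callCount;
    return t_callCount;
}

// Runs count_calls() on a fresh thread and returns what that thread saw.
int count_on_new_thread(int times)
{
    int seen = 0;
    std::thread worker([&] { seen = count_calls(times); });
    worker.join();
    return seen;
}
```

Each call to count_on_new_thread(n) returns n, because the new thread's counter is independent of every other thread's counter.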

References

Threading
– http://en.wikibooks.org/wiki/C%2B%2B_Programming/Threading

Thread Support Library
– http://en.cppreference.com/w/cpp/thread

Difference between Mutex and Semaphore
– http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex

std::condition_variable
– http://en.cppreference.com/w/cpp/thread/condition_variable

Multithreaded Programming: 02. Thread Library

October 20, 2014
jintaeks@gmail.com

Contents
– std::thread
– std::mutex
– std::unique_lock
– std::condition_variable
– TLS
– atomic operations
– memory barriers

std::thread

    #include <thread>
    #include <iostream>

    void my_thread_func()
    {
        std::cout<<"hello"<<std::endl;
    }

    int main()
    {
        std::thread t(my_thread_func);
        t.join();
    }

std::thread runs a thread callback only in RAII style: the thread starts as soon as the std::thread object is constructed. join() waits for thread t to finish.

    class bar
    {
    public:
        void foo()
        {
            std::cout << "hello from member function" << std::endl;
        }
    };

    int main()
    {
        bar b;
        std::thread t(&bar::foo, &b);
        t.join();
    }

A member function of an object can be passed as the thread callback.

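thread. Using a function object

    #include <thread>
    #include <iostream>

    class SayHello
    {
    public:
        void operator()() const
        {
            std::cout<<"hello"<<std::endl;
        }
    };

    int main()
    {
        // The extra parentheses are required: std::thread t(SayHello());
        // would be parsed as the declaration of a function named t.
        std::thread t((SayHello()));
        t.join();
    }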

thread. Creating a function object with std::bind

    #include <thread>
    #include <iostream>
    #include <string>
    #include <functional>

    void greeting(std::string const& message)
    {
        std::cout<<message<<std::endl;
    }

    int main()
    {
        std::thread t(std::bind(greeting,"hi!"));
        t.join();
    }

std::bind returns a function object, which is then passed to the thread.

thread. Passing arguments to the thread function

    #include <thread>
    #include <iostream>

    void write_sum(int x,int y)
    {
        std::cout<<x<<" + "<<y<<" = "<<(x+y)<<std::endl;
    }

    int main()
    {
        std::thread t(write_sum,123,456);
        t.join();
    }

The constructor of std::thread is designed to take a variable number of arguments. It treats the first parameter as the thread function and passes the remaining values to that function as its arguments.

    #include <thread>
    #include <iostream>
    #include <string>

    class SayHello
    {
    public:
        void greeting(std::string const& message) const
        {
            std::cout<<message<<std::endl;
        }
    };

    int main()
    {
        SayHello x;
        std::thread t(&SayHello::greeting,&x,"goodbye");
        t.join();
    }

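thread. Using shared_ptr

    #include <memory>
    #include <thread>

    // SayHello is the class from the previous example.
    int main()
    {
        std::shared_ptr<SayHello> p(new SayHello);
        std::thread t(&SayHello::greeting,p,"goodbye");
        t.join();
    }

A smart pointer can be passed as well. The thread object t holds its own copy of p, so the object that p points to stays alive for as long as the thread needs it.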

thread. Passing a reference

    #include <thread>
    #include <iostream>
    #include <functional> // for std::ref

    class PrintThis
    {
    public:
        void operator()() const
        {
            std::cout<<"this="<<this<<std::endl;
        }
    };

    int main()
    {
        PrintThis x;
        x();
        std::thread t(std::ref(x)); // runs on x itself
        t.join();
        std::thread t2(x);          // runs on a copy of x
        t2.join();
    }

Output:

    this=0x7fffb08bf7ef
    this=0x7fffb08bf7ef
    this=0x42674098

variadic template

    void func() {} // termination version

    template<typename Arg1, typename... Args>
    void func(const Arg1& arg1, const Args&... args)
    {
        process( arg1 );
        func(args...); // note: arg1 does not appear here!
    }

variadic template. specialization

    template<typename T>
    class Template
    {
    public:
        void SampleFunction(T param){}
    };

    template<>
    class Template<int>
    {
    public:
        void SampleFunction(int param){}
    };

    template<typename... Arguments>
    class VariadicTemplate
    {
    public:
        void SampleFunction(Arguments... params){}
    };

    template<>
    class VariadicTemplate<double, int, long>
    {
    public:
        void SampleFunction(double param1, int param2, long param3){}
    };

thread. constructor

    thread();                                          (1) (since C++11)
    thread( thread&& other );                          (2) (since C++11)
    template< class Function, class... Args >
    explicit thread( Function&& f, Args&&... args );   (3) (since C++11)
    thread(const thread&) = delete;                    (4) (since C++11)

Constructs new thread object.

The standard constructor is declared as a variadic template, but the actual implementation in Vs2012 uses variadic macros similar to those of BOOST_PP.

std::mutex

thread. mutex

    std::mutex m;
    std::string s;

    void append_with_lock_guard(std::string const& extra)
    {
        std::lock_guard<std::mutex> lk(m);
        s+=extra;
    }

    void append_with_manual_lock(std::string const& extra)
    {
        m.lock();
        try
        {
            s+=extra;
            m.unlock();
        }
        catch(...)
        {
            m.unlock();
            throw;
        }
    }

std::lock_guard gives RAII-style locking, so the mutex is released safely even when an exception is thrown.

std::unique_lock

std::unique_lock

    #include <iostream>
    #include <mutex>
    #include <thread>

    std::mutex mtx; // mutex for critical section

    void print_block (int n, char c)
    {
        // critical section (exclusive access to std::cout signaled by lifetime of lck):
        std::unique_lock<std::mutex> lck (mtx);
        for (int i=0; i<n; ++i) { std::cout << c; }
        std::cout << '\n';
    }

    int main ()
    {
        std::thread th1 (print_block,50,'*');
        std::thread th2 (print_block,50,'$');

        th1.join();
        th2.join();

        return 0;
    }

unique_lock can be used for the same purpose as lock_guard. Output:

    **************************************************
    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    std::unique_lock<std::mutex> acquire_lock()
    {
        static std::mutex m;
        return std::unique_lock<std::mutex>(m);
    }

    //std::mutex mtx; // mutex for critical section

    void print_block (int n, char c)
    {
        // critical section (exclusive access to std::cout signaled by lifetime of lck):
        //std::lock_guard<std::mutex> lck (mtx);
        std::unique_lock<std::mutex> lck = acquire_lock();
        for (int i=0; i<n; ++i)
        {
            std::cout << c;
            std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
        }
        std::cout << '\n';
    }

unique_lock works safely with move semantics. If acquire_lock() used lock_guard instead of unique_lock, the result would be a compile-time error, because lock_guard can be neither copied nor moved (this holds for C++11/14; the guaranteed copy elision of C++17 relaxes it).

Preventing creation of rvalue objects

    class KPreventRValueObject
    {
    private:
        KPreventRValueObject( KPreventRValueObject&& rvalueref_ );
    public:
        KPreventRValueObject(){}

    private:
        std::string m_strData;
    };//class KPreventRValueObject

    KPreventRValueObject TestRValueObject()
    {
        return KPreventRValueObject(); // compile time error (C++11/14)
    }//TestRValueObject()

Implementing move semantics

    class data_to_protect
    {
    public:
        void some_operation(){}
        void other_operation(){}
    };//class data_to_protect

    class data_handle
    {
    private:
        data_to_protect* ptr;
        std::unique_lock<std::mutex> lk;

        friend data_handle lock_data();

        data_handle(data_to_protect* ptr_, std::unique_lock<std::mutex>&& lk_)
            : ptr(ptr_)
            , lk( std::move( lk_ ) )
        {}

    public:
        data_handle(data_handle&& other)
            : ptr(nullptr)
        {
            *this = std::move( other );
        }
        data_handle& operator=(data_handle&& other)
        {
            if( &other != this )
            {
                ptr = other.ptr;
                lk = std::move(other.lk);
                other.ptr = 0;
            }//if
            return *this;
        }
        void do_op()
        {
            ptr->some_operation();
        }
        void do_other_op()
        {
            ptr->other_operation();
        }
    };//class data_handle

    data_handle lock_data()
    {
        static std::mutex m;
        static data_to_protect the_data;
        std::unique_lock<std::mutex> lk(m);
        return data_handle(&the_data, std::move(lk) );
    }//lock_data()

    int main()
    {
        data_handle dh = lock_data();       // lock acquired
        dh.do_op();                         // lock still held
        dh.do_other_op();                   // lock still held
        data_handle dh2 = std::move(dh);    // transfer lock to other handle
        dh2.do_op();                        // lock still held
        return 0;
    }//main()

std::unique_lock: unlocking temporarily

    std::mutex m;
    std::vector<std::string> strings_to_process;

    void update_strings()
    {
        std::unique_lock<std::mutex> lk(m);
        if(strings_to_process.empty())
        {
            lk.unlock(); // do not hold the lock while loading
            std::vector<std::string> local_strings=load_strings();
            lk.lock();
            strings_to_process.insert(strings_to_process.end(),
                local_strings.begin(),local_strings.end());
        }
    }

unique_lock follows the RAII pattern and still allows lock/unlock whenever you want.

Example of a deadlock situation

    class account
    {
        std::mutex m;
        currency_value balance;
    public:
        friend void transfer(account& from,account& to, currency_value amount)
        {
            std::lock_guard<std::mutex> lock_from(from.m);
            std::lock_guard<std::mutex> lock_to(to.m);
            from.balance -= amount;
            to.balance += amount;
        }
    };

If two threads call transfer( A, B, … ) and transfer( B, A, … ) at the same time, a deadlock can occur.

    struct Box
    {
        explicit Box(int num) : num_things{num} {}
        int num_things;
        std::mutex m;
    };

    void transfer(Box &from, Box &to, int num)
    {
        // don't actually take the locks yet
        std::unique_lock<std::mutex> lock1(from.m, std::defer_lock);
        std::unique_lock<std::mutex> lock2(to.m, std::defer_lock);
        // lock both unique_locks without deadlock
        std::lock(lock1, lock2);
        from.num_things -= num;
        to.num_things += num;
        // 'from.m' and 'to.m' mutexes unlocked in 'unique_lock' dtors
    }

By combining unique_lock's ability to control when the lock is taken with std::lock(), you can write code that prevents deadlock.
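The same idea can also be written the other way around: take both mutexes first with std::lock() and then hand them to lock_guard with std::adopt_lock. The following is a sketch under that assumption, not the author's code; Account, transfer and run_transfers are hypothetical names used only for illustration.

```cpp
#include <mutex>
#include <thread>

struct Account // hypothetical type for illustration
{
    explicit Account(int initial) : balance(initial) {}
    int balance;
    std::mutex m;
};

void transfer(Account& from, Account& to, int amount)
{
    if (&from == &to)
        return; // locking the same std::mutex twice would be undefined behavior

    std::lock(from.m, to.m); // acquires both mutexes without deadlock
    // adopt_lock: the guards take over mutexes that are already locked
    std::lock_guard<std::mutex> lockFrom(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, the pattern that deadlocks
// when the mutexes are simply locked one after the other.
int run_transfers(int rounds)
{
    Account a(1000), b(1000);
    std::thread t1([&] { for (int n = 0; n < rounds; ++n) transfer(a, b, 1); });
    std::thread t2([&] { for (int n = 0; n < rounds; ++n) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

run_transfers() finishes and returns 2000 regardless of how the two threads interleave, since the total balance is conserved and std::lock() avoids the lock-order deadlock.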

std::condition_variable

    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    void print_id (int id)
    {
        std::unique_lock<std::mutex> lck(mtx);
        while (!ready) cv.wait(lck);
        // ...
        std::cout << "thread " << id << '\n';
    }

    void go()
    {
        std::unique_lock<std::mutex> lck(mtx);
        ready = true;
        cv.notify_all();
    }

    int main ()
    {
        std::thread threads[10];
        // spawn 10 threads:
        for (int i=0; i<10; ++i)
            threads[i] = std::thread(print_id,i);

        std::cout << "10 threads ready to race...\n";
        go(); // go!

        for (auto& th : threads) th.join();

        return 0;
    }

    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    void print_id (int id)
    {
        std::unique_lock<std::mutex> lck(mtx);
        //while (!ready) cv.wait(lck);
        cv.wait( lck, []{ return ready;} ); // predicate form of the loop above
        // ...
        std::cout << "thread " << id << '\n';
    }

    void go()
    {
        std::unique_lock<std::mutex> lck(mtx);
        ready = true;
        cv.notify_all();
    }

    int main ()
    {
        std::thread threads[10];
        // spawn 10 threads:
        for (int i=0; i<10; ++i)
            threads[i] = std::thread(print_id,i);

        std::cout << "10 threads ready to race...\n";
        go(); // go!

        for (auto& th : threads) th.join();

        return 0;
    }

condition_variable. example

    std::mutex g_mutex;
    std::condition_variable g_conditionVariable;

    bool g_bReady = false;                    // guarded by g_mutex
    std::atomic<bool> g_bIsRunThread( true ); // read by both threads without the lock

    void consume (int n)
    {
        int iCounter = 0;
        while( g_bIsRunThread == true )
        {
            {
                std::unique_lock<std::mutex> ulock( g_mutex );
                g_conditionVariable.wait( ulock, []{ return g_bReady;} );
            }//block

            std::cout << iCounter << std::endl;
            std::this_thread::sleep_for( std::chrono::milliseconds(500) );

            iCounter += 1;
        }//while
    }//consume()

    int _tmain()
    {
        int ich = 0;
        std::thread consumerThread(consume,10);

        std::cout << "Thread started" << std::endl;

        while( g_bIsRunThread == true )
        {
            ich = _getch();
            if( ich == 'p' ) // pause
            {
                std::unique_lock<std::mutex> ulock( g_mutex );
                g_bReady = false;
            }
            else if( ich == 'c' ) // continue
            {
                std::unique_lock<std::mutex> ulock( g_mutex );
                g_bReady = true;
                g_conditionVariable.notify_one();
            }
            else if( ich == 'e' ) // exit
            {
                g_bIsRunThread = false;

                std::unique_lock<std::mutex> ulock( g_mutex );

                g_bReady = true;
                g_conditionVariable.notify_one();
            }//if.. else if..
        }//while

        consumerThread.join();

        return 0;
    }//main()

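With TLS, a single code routine that is called from different threads can be made to access different memory for what looks like the same global.

Compared with manually splitting a global variable into one copy per thread and having each thread access its own copy, the differences are as follows.
– If you manage the per-thread globals yourself, thread-dependent code has to be added to the thread routine.
– With TLS, the same code is used regardless of the thread.

Boost provides boost::thread_specific_ptr<>, which hides the TLS implementation of each platform.
– For memory that each thread needs, declare the variable with the type boost::thread_specific_ptr<>.
– The problem in the code below was that FindNearestSplinePoint(), which had been used from a single thread only, started to be used from multiple threads, and the reads/writes of a static variable declared for internal use by this function caused a crash.
– So the static variable used by this function had to be made thread safe.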

    /// Modified to use TLS (Thread Local Storage) so that each thread uses its own local memory.
    /// The functions in the KpSplineUtil namespace must be thread safe.
    /// The type is not a native type, so boost::thread_specific_ptr is used.
    /// - jintaeks on 2013-03-20, 13:35
    static boost::thread_specific_ptr<KpIntervalHeap<KInfo>> s_kIntervalHeap;

    bool KpSplineUtil::FindNearestSplinePoint( IN OUT KpSplinePosition& kInOutArgument_
        , const KpSpline& kInSpline_
        , const KpVector3& vInPoint_
        , KpReal rInError_ )
    {
        unsigned uNumSegs = kInSpline_.GetNumSegments();
        ASSERT( uNumSegs > 0 );
        if( uNumSegs == 0 )
        {
            kInOutArgument_.Invalidate();
            return false;
        }//if

        KpIntervalInfo<KInfo> kInfo;
        KpVector3 vMin, vMax;

        /// When called for the first time on each thread, this value is NULL.
        /// At that point, allocate the memory that the thread will use.
        /// The memory must be deleted when the program exits.
        /// - jintaeks on 2013-03-20, 13:37
        if( s_kIntervalHeap.get() == NULL )
        {
            s_kIntervalHeap.reset( new KpIntervalHeap<KInfo>() );
        }//if

        s_kIntervalHeap->MakeEmpty();

        if( kInOutArgument_.IsValid() )
        {
            ASSERT( kInOutArgument_.m_iIndex < ( int ) uNumSegs );
            ...

For native types, it is enough to declare the variable with the extension keyword __declspec( thread ), supported since Visual Studio 2005.

In the code it looks as if every thread accesses the same variable, but each thread actually accesses a different memory location.

    /// Modified to use TLS (Thread Local Storage) so that each thread uses its own local memory.
    /// The functions in the KpSplineUtil namespace must be thread safe.
    /// For native types, specifying the Visual Studio extension __declspec( thread ) is all that is needed.
    /// - jintaeks on 2013-03-20, 13:35

    __declspec( thread ) static KpReal s_rBITolerance2;
    __declspec( thread ) static KpReal s_rBIWidth;
    __declspec( thread ) static KpReal s_rBIHeight;
    __declspec( thread ) static KpReal s_rBIntersectionT;
    __declspec( thread ) static KpReal s_rBMinDist2;

    static bool _FindBezierRectangleIntersection( const KpBezierControl& kInBezier_
        , KpReal rInTBegin_
        , KpReal rInTEnd_ )
    {
        {
            KpVector3 vMin, vMax;
            _CalcBezierControlAABB( vMin, vMax, kInBezier_ );
            if( vMin.X() > s_rBIWidth || vMax.X() < KpReal( 0.0 )
            ...

References

thread tutorial
– http://www.justsoftwaresolutions.co.uk/threading/multithreading-in-c++0x-part-1-starting-threads.html

std::unique_lock
– http://www.cplusplus.com/reference/mutex/unique_lock/?kw=unique_lock
– http://en.cppreference.com/w/cpp/thread/unique_lock
– http://stackoverflow.com/questions/13099660/c11-why-does-stdcondition-variable-use-stdunique-lock

std::condition_variable
– http://www.cplusplus.com/reference/condition_variable/condition_variable/?kw=condition_variable
– http://stackoverflow.com/questions/13099660/c11-why-does-stdcondition-variable-use-stdunique-lock

Multithreaded Programming: 03. Thread Library 2

November 3, 2014
jintaeks@gmail.com

Contents

Part 1
– Statements and expressions
– rvalue reference
– Move semantics
– Threads

Part 2
– std::thread
– std::mutex
– std::unique_lock
– std::condition_variable
– TLS

Part 3
– atomic
– lock-free
– memory ordering
– std::atomic<>

Atomic

An operation is atomic if its intermediate state can never be observed as a result.

Observing an intermediate state is the consequence of a data race, and a data race leads to torn reads and torn writes.

Operations are non-atomic because they are split into several CPU instructions (in some cases even a single CPU instruction is not atomic).

Aligned reads and writes of simple types are atomic.

Win32's _InterlockedIncrement() and C++11's std::atomic<int>::fetch_add() are examples of atomic RMW (read-modify-write) operations.

std::atomic<> does not guarantee that it is lock-free. This must be checked with std::atomic<>::is_lock_free().

The most common example of RMW is Compare-And-Swap (CAS).

_InterlockedCompareExchange() is the intrinsic function that implements CAS on Win32.
– An intrinsic function is a function provided by the compiler rather than by a library.
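The standard-library counterparts of these operations can be sketched as follows. This is an illustrative sketch rather than code from the slides; the three function names are hypothetical, while fetch_add(), compare_exchange_strong() and is_lock_free() are the std::atomic members referred to above.

```cpp
#include <atomic>

// fetch_add() is an atomic RMW: it adds and returns the previous value.
int increment_three_times()
{
    std::atomic<int> counter(0);
    counter.fetch_add(1);
    counter.fetch_add(1);
    int previous = counter.fetch_add(1); // counter becomes 3
    return previous;                     // value before the last add
}

// CAS: store `desired` only if the current value still equals `expected`.
bool cas_demo()
{
    std::atomic<int> value(5);
    int expected = 5;
    bool swapped = value.compare_exchange_strong(expected, 9);

    int stale = 5; // no longer the current value
    bool swappedAgain = value.compare_exchange_strong(stale, 1);
    // On failure, `stale` is updated to the value actually found (9).
    return swapped && !swappedAgain && stale == 9 && value.load() == 9;
}

// Whether std::atomic<int> is lock-free depends on the platform.
bool int_atomic_is_lock_free()
{
    std::atomic<int> a(0);
    return a.is_lock_free();
}
```

The result of int_atomic_is_lock_free() is platform dependent, which is exactly why it has to be queried instead of assumed.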

Lock-free Programming

Lock-free programming is commonly understood as a programming technique that does not use mutexes.

More precisely, regardless of whether mutexes are used, code is lock-free if no thread can block the other threads forever.

Example: push for a lock-free queue

    void LockFreeQueue::push(Node* newHead)
    {
        for (;;)
        {
            // Copy a shared variable (m_Head) to a local.
            Node* oldHead = m_Head;

            // Do some speculative work, not yet visible to other threads.
            newHead->next = oldHead;

            // Next, attempt to publish our changes to the shared variable.
            // If the shared variable hasn't changed, the CAS succeeds and we return.
            // Otherwise, repeat.
            if (_InterlockedCompareExchange(&m_Head, newHead, oldHead) == oldHead)
                return;
        }
    }

In the short interval between reading the shared value and comparing the return value of _InterlockedCompareExchange(), another thread may have changed the value from A to B and then back to A.

Code that uses CAS must be written carefully so that this ABA problem does not occur.
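The same push loop can be written portably with std::atomic. The sketch below is an assumption-laden illustration, not the author's implementation: LockFreeStack, count() and push_from_threads are hypothetical names, and only push is provided. A push-only structure like this does not run into the ABA problem, which arises once nodes can be popped and reused.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct Node
{
    int value;
    Node* next;
};

// Hypothetical push-only container using compare_exchange_weak as the CAS.
class LockFreeStack
{
public:
    void push(Node* newHead)
    {
        // Copy the shared variable to a local.
        Node* oldHead = m_head.load(std::memory_order_relaxed);
        do
        {
            // Speculative work, not yet visible to other threads.
            newHead->next = oldHead;
            // On failure, compare_exchange_weak reloads oldHead and we retry.
        } while (!m_head.compare_exchange_weak(oldHead, newHead,
                                               std::memory_order_release,
                                               std::memory_order_relaxed));
    }

    // Counts the nodes; only meaningful when no thread is pushing anymore.
    int count() const
    {
        int n = 0;
        for (Node* p = m_head.load(std::memory_order_acquire); p; p = p->next)
            ++n;
        return n;
    }

private:
    std::atomic<Node*> m_head{nullptr};
};

// Pushes perThread nodes from each of threadCount threads.
int push_from_threads(int threadCount, int perThread)
{
    LockFreeStack stack;
    std::vector<Node> nodes(threadCount * perThread); // storage owned here
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&stack, &nodes, t, perThread] {
            for (int n = 0; n < perThread; ++n)
                stack.push(&nodes[t * perThread + n]);
        });
    }
    for (auto& w : workers)
        w.join();
    return stack.count();
}
```

No push is lost even under contention, so push_from_threads(4, 1000) counts 4000 nodes.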

Memory Ordering

Even if each operation in a sequence is atomic, a multi-threaded program running on multiple cores is not guaranteed to be free of data races.

This is because the order of operations can be changed by the compiler and, at run time, by the CPU.

A processor can change the order of memory operations for a variety of reasons.

    volatile bool Ready = false;
    int Value = 0;

    // Thread A
    while(!Ready) {}
    printf("%d", Value);

    // Thread B
    Value = 1;
    Ready = true;

The expected result for Value is 1. But what if Ready = true; is executed before Value = 1;?

std::atomic<> has six memory ordering options.

However, each of them expresses one of three memory ordering models.
– sequentially-consistent ordering (memory_order_seq_cst)
– acquire-release ordering (memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_acq_rel)
– relaxed ordering (memory_order_relaxed)

Memory Barrier

A memory barrier is a set of instructions that forces pending memory operations to complete.

There are three kinds: acquire, release, and fence.

Acquire semantics

    operation 1
    operation 2
    <-operation 3-Acquire-> 3 is visible before 4-5
    operation 4
    operation 5

When an atomic operation is used to gain access to data, other processors must be able to see the lock before the operations on the data that is about to change are executed.

This is called acquire semantics, because the operation is acquiring permission to access the data.

Release semantics

    operation 1
    operation 2
    <-operation 3-Release-> 1-2 are visible before 3
    operation 4
    operation 5

When an atomic operation is about to release recently modified values, the new values must be visible to other processors before the release.

This is called release semantics.

Fence semantics

    operation 1
    operation 2
    <-operation 3-Fence-> 1-2 are visible before 3, 3 is visible before 4-5
    operation 4
    operation 5

A fence is also called a full memory barrier.

    class SpinLock
    {
        volatile tInt LockSem;
    public:
        FORCEINLINE SpinLock()
            : LockSem(0) {}
        FORCEINLINE void Lock()
        {
            while(1)
            {
                // Atomically swap the lock variable with 1 if it's currently equal to 0
                if(!InterlockedCompareExchange(&LockSem, 1, 0))
                {
                    // We successfully acquired the lock
                    ImportBarrier();
                    return;
                }
            }
        }
        FORCEINLINE void Unlock()
        {
            ExportBarrier();
            LockSem = 0;
        }
    };

With Microsoft's compiler (the /volatile:ms behavior, the default on x86/x64), writing to a variable declared volatile has release semantics.

Likewise, reading from a variable declared volatile has acquire semantics. ISO C++ itself does not give volatile these guarantees.
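For comparison, a portable spin lock can be sketched with std::atomic_flag, where the acquire and release semantics are spelled out explicitly instead of relying on volatile. This is a sketch and not the original class; AtomicSpinLock and count_with_spinlock are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spin lock built on std::atomic_flag.
class AtomicSpinLock
{
public:
    void Lock()
    {
        // test_and_set returns the previous state: spin while it was already set.
        // acquire: operations after Lock() cannot move before it.
        while (m_flag.test_and_set(std::memory_order_acquire))
        {
        }
    }

    void Unlock()
    {
        // release: operations before Unlock() are visible to the next owner.
        m_flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

// Increments a plain int from several threads under the spin lock.
int count_with_spinlock(int threadCount, int times)
{
    AtomicSpinLock lock;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&lock, &counter, times] {
            for (int n = 0; n < times; ++n)
            {
                lock.Lock();
                ++counter; // protected by the spin lock
                lock.Unlock();
            }
        });
    }
    for (auto& w : workers)
        w.join();
    return counter;
}
```

Lock() plays the role of the acquire barrier and Unlock() the role of the release barrier from the previous sections.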

Example: a real problem case

    HANDLE beginSema1;
    HANDLE beginSema2;
    HANDLE endSema;

    int X, Y;
    int r1, r2;

    DWORD WINAPI thread1Func(LPVOID param)
    {
        MersenneTwister random(1);
        for (;;)
        {
            WaitForSingleObject(beginSema1, INFINITE);  // Wait for signal
            while (random.integer() % 8 != 0) {}        // Random delay

            // ----- THE TRANSACTION! -----
            X = 1;
    #if USE_CPU_FENCE
            MemoryBarrier();        // Prevent CPU reordering
    #else
            _ReadWriteBarrier();    // Prevent compiler reordering only
    #endif
            r1 = Y;

            ReleaseSemaphore(endSema, 1, NULL);         // Notify transaction complete
        }
        return 0;  // Never returns
    }

    DWORD WINAPI thread2Func(LPVOID param)
    {
        MersenneTwister random(2);
        for (;;)
        {
            WaitForSingleObject(beginSema2, INFINITE);  // Wait for signal
            while (random.integer() % 8 != 0) {}        // Random delay

            // ----- THE TRANSACTION! -----
            Y = 1;
    #if USE_CPU_FENCE
            MemoryBarrier();        // Prevent CPU reordering
    #else
            _ReadWriteBarrier();    // Prevent compiler reordering only
    #endif
            r2 = X;

            ReleaseSemaphore(endSema, 1, NULL);         // Notify transaction complete
        }
        return 0;  // Never returns
    }

    #if USE_SINGLE_HW_THREAD
        // Force thread affinities to the same cpu core.
        SetThreadAffinityMask(thread1, 1);
        SetThreadAffinityMask(thread2, 1);
    #endif

        // Repeat the experiment ad infinitum
        int detected = 0;
        for (int iterations = 1; ; iterations++)
        {
            // Reset X and Y
            X = 0;
            Y = 0;
            // Signal both threads
            ReleaseSemaphore(beginSema1, 1, NULL);
            ReleaseSemaphore(beginSema2, 1, NULL);
            // Wait for both threads
            WaitForSingleObject(endSema, INFINITE);
            WaitForSingleObject(endSema, INFINITE);
            // Check if there was a simultaneous reorder
            if (r1 == 0 && r2 == 0)
            {
                detected++;
                printf("%d reorders detected after %d iterations\n", detected, iterations);
            }
        }

How do we fix it?

    std::atomic<int> X(0), Y(0);
    int r1, r2;

    void thread1()
    {
        X.store(1);
        r1 = Y.load();
    }

    void thread2()
    {
        Y.store(1);
        r2 = X.load();
    }

std::atomic<> is the C++11 standard library facility that supports atomic operations and memory barriers.

By default, .store() and .load() use memory_order_seq_cst, so they behave as if a fence had been put in place.
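The experiment can be repeated with the std::atomic version. The following is a minimal sketch under the assumption that starting two fresh threads per iteration is acceptable for a small test; count_reorders is a hypothetical helper, not part of the original material. With the default sequentially consistent ordering, the outcome r1 == 0 && r2 == 0 is not allowed, so the function is expected to return 0.

```cpp
#include <atomic>
#include <thread>

// Runs the store/load transaction `iterations` times and returns how often
// both threads read 0, which would indicate a store-load reorder.
int count_reorders(int iterations)
{
    int detected = 0;
    for (int n = 0; n < iterations; ++n)
    {
        std::atomic<int> X(0), Y(0);
        int r1 = -1, r2 = -1;

        // store() and load() default to std::memory_order_seq_cst.
        std::thread t1([&] { X.store(1); r1 = Y.load(); });
        std::thread t2([&] { Y.store(1); r2 = X.load(); });
        t1.join();
        t2.join();

        // Under sequential consistency at least one thread sees the other's store.
        if (r1 == 0 && r2 == 0)
            ++detected;
    }
    return detected;
}
```

Replacing the defaults with std::memory_order_relaxed would make the both-zero outcome possible again on hardware that reorders stores and loads.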

std::atomic<>

Relaxed ordering

    std::atomic<int> x;
    std::atomic<int> y;

    // Thread 1:
    r1 = y.load(std::memory_order_relaxed);  // A
    x.store(r1, std::memory_order_relaxed);  // B

    // Thread 2:
    r2 = x.load(std::memory_order_relaxed);  // C
    y.store(42, std::memory_order_relaxed);  // D

This code is allowed to produce r1 == r2 == 42 because, although A is sequenced-before B and C is sequenced-before D, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x.

Example: counter

#include <vector>
#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> cnt = {0};

void f()
{
    for (int n = 0; n < 1000; ++n) {
        cnt.fetch_add(1, std::memory_order_relaxed);
    }
}

int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f);
    }
    for (auto& t : v) {
        t.join();
    }
    std::cout << "Final counter value is " << cnt << '\n';
}

Release-Acquire ordering

If an atomic store in thread A is tagged std::memory_order_release and an atomic load in thread B from the same variable is tagged std::memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

In other words, the instructions that follow the atomic load in the waiting thread B can see every value that thread A wrote before its atomic store.

#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42);     // never fires
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
}

Sequentially-consistent ordering

Atomic operations tagged std::memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged.

The result is the same as if the instructions were executed in order on a single core.
– This generates a fence (full memory barrier).

#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};

void write_x()
{
    x.store(true, std::memory_order_seq_cst);
}

void write_y()
{
    y.store(true, std::memory_order_seq_cst);
}

void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst)) {
        ++z;
    }
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_seq_cst))
        ;
    if (x.load(std::memory_order_seq_cst)) {
        ++z;
    }
}

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0); // will never happen
}

References

http://preshing.com/20130618/atomic-vs-non-atomic-operations/ (blog series)

http://en.cppreference.com/w/cpp/atomic/memory_order

http://www.developerfusion.com/article/138018/memory-ordering-for-atomic-operations-in-c0x/

Multithreaded Programming: 04. Parallel Pattern Library

November 9, 2014, jintaeks@gmail.com

Index

Session 3
– atomic
– lock-free
– memory ordering
– std::atomic<>

Session 4: PPL, Parallel Pattern Library
– Task Parallelism (Concurrency Runtime)
– Parallel Algorithms
– Parallel Containers and Objects
– Cancellation in the PPL
– Debugging a Parallel Program

Session 5: C++ AMP

Session 6: Thread usage in each team
– Threads currently used by each team
– Areas that could be improved by applying the PPL

Concurrency Runtime

The Concurrency Runtime maintains its own thread pool. It balances the load of each thread with a work-stealing algorithm.

The Concurrency Runtime provides cooperative blocking primitives for synchronizing access to resources. Its components:
– Parallel Pattern Library
– Asynchronous Agents Library
– Task Scheduler
– Resource Manager

PPL, Parallel Pattern Library

The PPL provides general-purpose parallel containers and algorithms.

The PPL provides data parallelism through parallel algorithms.

The PPL provides task parallelism through tasks.

Asynchronous Agents Library

Provides actor-based programming. Provides a message-passing interface. See the link below.

– http://msdn.microsoft.com/en-us/library/dd492627.aspx

Task Scheduler

The Task Scheduler schedules and coordinates tasks at run time.

It uses a work-stealing algorithm to make maximum use of processing resources.
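The work-stealing idea can be sketched in standard C++. This is not the Concurrency Runtime's implementation. It is a simplified illustration that assumes mutex-protected queues and tasks that never spawn new tasks. WorkQueue and run_with_stealing are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// One queue per worker. The owner takes work from the back; an idle worker
// steals from the front of another worker's queue.
struct WorkQueue {
    std::mutex m;
    std::deque<std::function<void()>> tasks;

    bool pop_back(std::function<void()>& out) {
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        out = std::move(tasks.back());
        tasks.pop_back();
        return true;
    }
    bool steal_front(std::function<void()>& out) {
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        out = std::move(tasks.front());
        tasks.pop_front();
        return true;
    }
};

// Puts `task_count` tasks on worker 0's queue only, then lets `worker_count`
// threads drain all queues. Returns the number of tasks that were executed.
int run_with_stealing(int worker_count, int task_count)
{
    std::vector<WorkQueue> queues(worker_count);
    std::atomic<int> executed(0);

    for (int i = 0; i < task_count; ++i)
        queues[0].tasks.push_back([&executed] { executed.fetch_add(1); });

    auto worker = [&](int self) {
        std::function<void()> task;
        for (;;) {
            bool found = queues[self].pop_back(task);
            for (int k = 1; !found && k < worker_count; ++k)
                found = queues[(self + k) % worker_count].steal_front(task);
            if (!found) return;  // all queues empty, and no task adds new work
            task();
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < worker_count; ++i) threads.emplace_back(worker, i);
    for (auto& t : threads) t.join();
    return executed.load();
}
```

All the work starts on one queue, yet the other workers stay busy by stealing. That is the load-balancing effect the slide describes.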

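Resource Manager

The Resource Manager manages computing resources, that is, processors and memory.

It allocates resources so that usage is as close to optimal as possible. It interacts with the Task Scheduler and provides an abstraction layer over resources.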

Lambda

In mathematical logic and computer science, lambda is used to introduce anonymous functions expressed with the concepts of lambda calculus.

Lambda: syntax

a. lambda-introducer (capture clause)

b. lambda declarator (parameter list)

c. mutable (mutable specification)

d. exception-specification (exception specification)

e. trailing-return-type (return type)

f. compound-statement (lambda body)
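The six parts listed above can be seen together in one lambda. This is a small illustrative sketch. The function name lambda_parts_demo and the variable names are made up for this example.

```cpp
#include <cassert>

// Builds a lambda that uses every part (a to f) and returns a value that
// shows how a by-value capture behaves with `mutable`.
int lambda_parts_demo()
{
    int base = 10;

    // a: [base]     capture clause (copies base into the lambda)
    // b: (int x)    parameter list
    // c: mutable    lets the body modify the captured copy
    // d: noexcept   exception specification
    // e: -> int     trailing return type
    // f: { ... }    lambda body
    auto add = [base](int x) mutable noexcept -> int {
        base += x;
        return base;
    };

    int first = add(1);   // 11: the lambda's own copy of base becomes 11
    int second = add(1);  // 12: the copy keeps its state between calls
    return first + second + base;  // the outer base is still 10
}
```

lambda_parts_demo() returns 33 (11 + 12 + 10), because the lambda changes only its own copy of base.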

Lambda: example

#include <functional>
#include <iostream>

int main()
{
    using namespace std;

    // Assign the lambda expression that adds two numbers to an auto variable.
    auto f1 = [](int x, int y) { return x + y; };

    cout << f1(2, 3) << endl;

    // Assign the same lambda expression to a function object.
    function<int(int, int)> f2 = [](int x, int y) { return x + y; };

    cout << f2(3, 4) << endl;
}

Task Parallelism in the PPL (Parallel Pattern Library)

PPL example: computing Fibonacci numbers

// Calls the provided work function and returns the number of milliseconds
// that it takes to call that function.
template <class Function>
__int64 time_call( Function&& f )
{
    __int64 begin = GetTickCount();
    f();
    return GetTickCount() - begin;
}

// Computes the nth Fibonacci number.
int fibonacci( int n )
{
    if( n < 2 )
        return n;
    return fibonacci( n - 1 ) + fibonacci( n - 2 );
}

Serial processing

    __int64 elapsed;

    // An array of Fibonacci numbers to compute.
    std::array<int, 4> a = { 24, 26, 41, 42 };

    // The results of the serial computation.
    std::vector<std::tuple<int, int>> results1;

    // Use the for_each algorithm to compute the results serially.
    elapsed = time_call( [&]
    {
        std::for_each( std::begin(a), std::end(a), [&]( int n )
        {
            results1.push_back( std::make_tuple( n, fibonacci( n ) ) );
        });
    });
    std::wcout << L"serial time: " << elapsed << L" ms" << std::endl;

Parallel processing

    // The results of the parallel computation.
    concurrency::concurrent_vector<std::tuple<int, int>> results2;

    // Use the parallel_for_each algorithm to perform the same task.
    elapsed = time_call( [&]
    {
        concurrency::parallel_for_each( std::begin(a), std::end(a), [&]( int n )
        {
            results2.push_back( std::make_tuple( n, fibonacci( n ) ) );
        });

        // Because parallel_for_each acts concurrently, the results do not
        // have a pre-determined order. Sort the concurrent_vector object
        // so that the results match the serial version.
        std::sort( std::begin( results2 ), std::end( results2 ) );
    });
    std::wcout << L"parallel time: " << elapsed << L" ms" << std::endl << std::endl;

/** Output
serial time: 9250 ms
parallel time: 5726 ms

fib(24): 46368
fib(26): 121393
fib(41): 165580141
fib(42): 267914296
*/

concurrency::task<>

#include <ppltasks.h>
#include <iostream>

//using namespace concurrency;
//using namespace std;

int wmain()
{
    // Create a task.
    concurrency::task<int> t( []()
    {
        return 42;
    });

    // In this example, you don't necessarily need to call wait() because
    // the call to get() also waits for the result.
    t.wait();

    // Print the result.
    std::wcout << t.get() << std::endl;
}

/* Output:
    42
*/

Use concurrency::task<> to define a task. The template argument of task is the task's return type.

wait() blocks until the task has finished.

get() returns the task's return value once the task has finished.
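For readers without the PPL headers, the same wait()/get() pattern can be tried with the standard library. This sketch uses std::async and std::future as a stand-in for concurrency::task<int>. It is an analogy, not the PPL API, and run_task_and_get is a hypothetical name.

```cpp
#include <cassert>
#include <future>

// Starts a piece of work on another thread and returns its result.
int run_task_and_get()
{
    // The template argument of std::future is the return type of the work,
    // just as with concurrency::task<int>.
    std::future<int> t = std::async(std::launch::async, [] { return 42; });

    t.wait();        // optional: get() also blocks until the result is ready
    return t.get();  // std::future::get() may be called only once
}
```

run_task_and_get() returns 42.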

concurrency::create_task(), task::then()

concurrency::task<std::wstring> write_to_string()
{
    // Create a shared pointer to a string that is assigned to and read by multiple tasks.
    // By using a shared pointer, the string outlives the tasks, which can run in the
    // background after this function exits.
    auto s = std::make_shared<std::wstring>(L"Value 1");

    return concurrency::create_task([s]
    {
        // Print the current value.
        std::wcout << L"Current value: " << *s << std::endl;
        // Assign to a new value.
        *s = L"Value 2";

    }).then([s]
    {
        // Print the current value.
        std::wcout << L"Current value: " << *s << std::endl;
        // Assign to a new value and return the string.
        *s = L"Value 3";
        return *s;
    });
}

Use create_task() to create a task. Use then() of task<> to chain a continuation task.
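The continuation idea can be approximated in standard C++. The helper `then` below is hypothetical (C++11 std::future has no such member). It also blocks a thread while waiting for the previous result, which the PPL avoids, so treat it only as a sketch of the data flow.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>

// Hypothetical helper, not part of the standard library or the PPL:
// runs `next` on the result of `prev` once that result is available.
template <class T, class F>
auto then(std::future<T> prev, F next)
    -> std::future<decltype(next(std::declval<T>()))>
{
    return std::async(std::launch::async,
        [](std::future<T> p, F n) { return n(p.get()); },
        std::move(prev), std::move(next));
}

// Chains three steps: produce 20, add 1, double and convert to text.
std::string chain_demo()
{
    std::future<int> first = std::async(std::launch::async, [] { return 20; });
    auto second = then(std::move(first), [](int n) { return n + 1; });
    auto third = then(std::move(second), [](int n) { return std::to_string(n * 2); });
    return third.get();
}
```

chain_demo() returns "42". Each step receives the previous step's return value as its argument, which is the same shape as a task<>::then() chain.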

Lambdas must be thread-safe

// lambda-task-lifetime.cpp
// compile with: /EHsc
#include <ppltasks.h>
#include <iostream>
#include <string>

…

int wmain()
{
    // Create a chain of tasks that work with a string.
    auto t = write_to_string();

    // Wait for the tasks to finish and print the result.
    std::wcout << L"Final value: " << t.get() << std::endl;
}

/* Output:
    Current value: Value 1
    Current value: Value 2
    Final value: Value 3
*/

A task's work must be thread-safe. The lambda function must therefore capture variables in a way that guarantees thread-safe behavior.

In the example, the string must remain valid even after write_to_string() returns, so it is managed with std::shared_ptr<>.
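The lifetime point can be reproduced without the PPL. In this sketch std::async stands in for create_task(), and start_background_edit is a hypothetical name.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>

// Returns a future whose work may still be running after this function
// has returned and its local variables have been destroyed.
std::future<std::string> start_background_edit()
{
    auto s = std::make_shared<std::string>("Value 1");

    // Capturing `s` by value copies the shared_ptr, so the string stays
    // alive until the lambda is destroyed. Capturing a local variable by
    // reference here would leave the lambda with a dangling reference.
    return std::async(std::launch::async, [s] {
        *s = "Value 2";
        return *s;
    });
}
```

start_background_edit().get() returns "Value 2", and the string is freed once the last shared_ptr copy is destroyed.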

then()

concurrency::task<std::array<std::array<int, 10>, 10>> create_identity_matrix([]
{
    std::array<std::array<int, 10>, 10> matrix;
    int row = 0;
    std::for_each( std::begin(matrix), std::end(matrix), [&row](std::array<int, 10>& matrixRow)
    {
        std::fill( std::begin(matrixRow), std::end(matrixRow), 0);
        matrixRow[row] = 1;
        row++;
    });
    return matrix;
});

auto print_matrix = create_identity_matrix.then([](std::array<std::array<int, 10>, 10> matrix)
{
    std::for_each( std::begin(matrix), std::end(matrix), [](std::array<int, 10>& matrixRow)
    {
        std::wstring comma;
        std::for_each( std::begin(matrixRow), std::end(matrixRow), [&comma](int n)
        {
            std::wcout << comma << n;
            comma = L", ";
        });
        std::wcout << std::endl;
    });
});

int wmain()
{
    …
    print_matrix.wait();
}

/* Output:
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0
    0, 0, 0, 0, 1, 0, 0, 0, 0, 0
    0, 0, 0, 0, 0, 1, 0, 0, 0, 0
    0, 0, 0, 0, 0, 0, 1, 0, 0, 0
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0
    0, 0, 0, 0, 0, 0, 0, 0, 1, 0
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1
*/

auto create_identity_matrix = concurrency::create_task([]
{
    std::array<std::array<int, 10>, 10> matrix;
    int row = 0;
    std::for_each( std::begin(matrix), std::end(matrix), [&row](std::array<int, 10>& matrixRow)
    {
        std::fill( std::begin(matrixRow), std::end(matrixRow), 0);
        matrixRow[row] = 1;
        row++;
    });
    return matrix;
});

Rather than spelling out the concurrency::task type explicitly, declare the variable with auto and define the task with create_task().

task continuation

int wmain()
{
    auto t = concurrency::create_task([]() -> int
    {
        return 0;
    });

    // Create a lambda that increments its input value.
    auto increment = [](int n) { return n + 1; };

    // Run a chain of continuations and print the result.
    int result = t.then(increment).then(increment).then(increment).get();
    std::wcout << result << std::endl;
}

/* Output:
    3
*/

task in task

int wmain()
{
    auto t = concurrency::create_task([]()
    {
        std::wcout << L"Task A" << std::endl;

        // Create an inner task that runs before any continuation
        // of the outer task.
        return concurrency::create_task([]()
        {
            std::wcout << L"Task B" << std::endl;
        });
    });

    // Run and wait for a continuation of the outer task.
    t.then([]()
    {
        std::wcout << L"Task C" << std::endl;
    }).wait();
}

/* Output:
    Task A
    Task B
    Task C
*/

when_all()

int wmain()
{
    // Start multiple tasks.
    std::array<concurrency::task<void>, 3> tasks =
    {
        concurrency::create_task([] { std::wcout << L"Hello from taskA." << std::endl; }),
        concurrency::create_task([] { std::wcout << L"Hello from taskB." << std::endl; }),
        concurrency::create_task([] { std::wcout << L"Hello from taskC." << std::endl; })
    };

    auto joinTask = concurrency::when_all( std::begin(tasks), std::end(tasks) );

    // Print a message from the joining thread.
    std::wcout << L"Hello from the joining thread." << std::endl;

    // Wait for the tasks to finish.
    joinTask.wait();
}

/* Sample output:
    Hello from the joining thread.
    Hello from taskA.
    Hello from taskC.
    Hello from taskB.
*/

when_all() returns a task<std::vector<T>> when the input tasks are task<T>. For task<void> inputs, as in this example, it returns task<void>.
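The shape of when_all() (many results in, one vector out) can be sketched with std::future. when_all_sketch is a hypothetical helper written for this illustration. It is not the PPL function and it blocks one extra thread while collecting.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: waits for every future and collects the results in
// input order, mirroring the task<std::vector<T>> shape of when_all().
template <class T>
std::future<std::vector<T>> when_all_sketch(std::vector<std::future<T>> tasks)
{
    return std::async(std::launch::async,
        [](std::vector<std::future<T>> ts) {
            std::vector<T> results;
            results.reserve(ts.size());
            for (auto& t : ts) results.push_back(t.get());
            return results;
        },
        std::move(tasks));
}

// Starts three tasks and sums their results once all have finished.
int sum_of_three_tasks()
{
    std::vector<std::future<int>> tasks;
    tasks.push_back(std::async(std::launch::async, [] { return 88; }));
    tasks.push_back(std::async(std::launch::async, [] { return 42; }));
    tasks.push_back(std::async(std::launch::async, [] { return 99; }));

    std::vector<int> results = when_all_sketch(std::move(tasks)).get();
    return std::accumulate(results.begin(), results.end(), 0);
}
```

sum_of_three_tasks() returns 229 (88 + 42 + 99).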

when_all(): getting the return values

int wmain()
{
    // Start multiple tasks.
    std::array<concurrency::task<int>, 3> tasks =
    {
        concurrency::create_task([]() -> int { return 88; }),
        concurrency::create_task([]() -> int { return 42; }),
        concurrency::create_task([]() -> int { return 99; })
    };

    auto joinTask = concurrency::when_all( std::begin(tasks), std::end(tasks) ).then([]( std::vector<int> results )
    {
        std::wcout << L"The sum is "
                   << std::accumulate( std::begin(results), std::end(results), 0 )
                   << L'.' << std::endl;
    });

    // Print a message from the joining thread.
    std::wcout << L"Hello from the joining thread." << std::endl;

    // Wait for the tasks to finish.
    joinTask.wait();
}

/* Output:
    Hello from the joining thread.
    The sum is 229.
*/

when_any()

int wmain()
{
    // Start multiple tasks.
    std::array<concurrency::task<int>, 3> tasks =
    {
        concurrency::create_task([]() -> int { return 88; }),
        concurrency::create_task([]() -> int { return 42; }),
        concurrency::create_task([]() -> int { return 99; })
    };

    // Select the first to finish.
    concurrency::when_any( std::begin(tasks), std::end(tasks) ).then([]( std::pair<int, size_t> result )
    {
        std::wcout << L"First task to finish returns "
                   << result.first
                   << L" and has index "
                   << result.second
                   << L'.' << std::endl;
    }).wait();
}

/* Sample output:
    First task to finish returns 42 and has index 1.
*/

when_any() returns a task whose result is a std::pair<return value, index> for the first task to complete.
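A "first one wins" result can be sketched with std::promise and an atomic flag. first_to_finish is a hypothetical helper, not the PPL's when_any(). In particular, this sketch waits for the losing jobs before returning.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: runs every job on its own thread and returns
// {result, index} of whichever job finishes first.
// Precondition: `jobs` must not be empty.
std::pair<int, std::size_t> first_to_finish(const std::vector<std::function<int()>>& jobs)
{
    std::promise<std::pair<int, std::size_t>> winner;
    std::future<std::pair<int, std::size_t>> result = winner.get_future();
    std::atomic<bool> decided(false);

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        threads.emplace_back([&, i] {
            int value = jobs[i]();
            // exchange() returns the previous value, so only the first
            // finisher sees false and gets to fulfil the promise.
            if (!decided.exchange(true))
                winner.set_value(std::make_pair(value, i));
        });
    }

    std::pair<int, std::size_t> first = result.get();
    for (auto& t : threads) t.join();  // this sketch waits for the losers too
    return first;
}
```

Which index wins depends on scheduling. The returned value always belongs to the job at the returned index.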

task group

A task group manages a collection of tasks.

– A task group pushes its tasks onto a work-stealing queue.
– Each task in the group is accessed through a concurrency::task_handle.

structured task group
– All operations on the group must occur on the same thread.
  • cancel() and is_canceling() are the only exceptions.
– Do not add tasks after calling wait().
– It does not synchronize between threads, so it has less overhead than task_group.

unstructured task group

concurrency::parallel_invoke() uses structured_task_group.

Task groups support cancellation.

structured_task_group

int wmain()
{
    // Use the make_task function to define several tasks.
    auto task1 = concurrency::make_task([] { /*TODO: Define the task body.*/ });
    auto task2 = concurrency::make_task([] { /*TODO: Define the task body.*/ });
    auto task3 = concurrency::make_task([] { /*TODO: Define the task body.*/ });

    // Create a structured task group and run the tasks concurrently.
    concurrency::structured_task_group tasks;

    tasks.run( task1 );
    tasks.run( task2 );
    tasks.run_and_wait( task3 );
}

make_task only defines a task (it creates a task_handle); it does not run it.

run_and_wait() of a structured task group waits for every task in the group to finish.

parallel_invoke

template <typename T>
T twice( const T& t )
{
    return t + t;
}

int wmain()
{
    // Define several values.
    int n = 54;
    double d = 5.6;
    std::wstring s = L"Hello";

    // Call the twice function on each value concurrently.
    // parallel_invoke uses structured_task_group internally. jintaeks on 20141107
    concurrency::parallel_invoke(
        [&n] { n = twice(n); },
        [&d] { d = twice(d); },
        [&s] { s = twice(s); }
    );

    // Print the values to the console.
    std::wcout << n << L' ' << d << L' ' << s << std::endl;
}

/** Output
    108 11.2 HelloHello
*/

parallel_invoke() uses structured_task_group internally.

unstructured task_group

int wmain()
{
    // A task_group object that can be used from multiple threads.
    concurrency::task_group tasks;

    // Concurrently add several tasks to the task_group object.
    concurrency::parallel_invoke(
        [&] {
            // Add a few tasks to the task_group object.
            tasks.run([] { print_message(L"Hello"); });
            tasks.run([] { print_message(42); });
        },
        [&] {
            // Add one additional task to the task_group object.
            tasks.run([] { print_message(3.14); });
        }
    );

    // Wait for all tasks to finish.
    tasks.wait();
}

/** Output:
    Message from task: Hello
    Message from task: 3.14
    Message from task: 42
*/

With an unstructured task group (task_group), different threads can access the same group to manage its tasks.
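The defining property of the unstructured group is that run() may be called from any thread. It can be sketched in standard C++. SimpleTaskGroup is a hypothetical class for illustration. It starts one thread per task, whereas the PPL queues tasks on its thread pool.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an unstructured task group: run() may be called
// from any thread, and wait() joins everything that has been started.
class SimpleTaskGroup {
public:
    void run(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(m_);  // makes run() thread-safe
        threads_.emplace_back(std::move(work));
    }
    void wait() {
        std::vector<std::thread> pending;
        {
            std::lock_guard<std::mutex> lock(m_);
            pending.swap(threads_);
        }
        for (auto& t : pending) t.join();
    }
    ~SimpleTaskGroup() { wait(); }
private:
    std::mutex m_;
    std::vector<std::thread> threads_;
};

// Two different threads add tasks to the same group; returns how many ran.
int add_tasks_from_two_threads()
{
    std::atomic<int> done(0);
    SimpleTaskGroup tasks;

    std::thread a([&] { tasks.run([&] { ++done; }); tasks.run([&] { ++done; }); });
    std::thread b([&] { tasks.run([&] { ++done; }); });
    a.join();
    b.join();

    tasks.wait();
    return done.load();
}
```

add_tasks_from_two_threads() returns 3. The mutex inside run() is the cross-thread synchronization that a structured task group leaves out to reduce overhead.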

Cancellation

    concurrency::cancellation_token_source cts;
    auto token = cts.get_token();

    std::wcout << L"Creating task..." << std::endl;

    // Create a task that performs work until it is canceled.
    auto t = concurrency::create_task( []
    {
        bool moreToDo = true;
        while( moreToDo )
        {
            // Check for cancellation.
            if( concurrency::is_task_cancellation_requested() )
            {
                // TODO: Perform any necessary cleanup here...

                // Cancel the current task.
                concurrency::cancel_current_task();
            }
            else
            {
                // Perform work.
                moreToDo = do_work();
            }
        }
    }, token );

    // Wait for one second and then cancel the task.
    concurrency::wait( 1000 );

    std::wcout << L"Canceling task..." << std::endl;
    cts.cancel();

    // Wait for the task to cancel.
    std::wcout << L"Waiting for task to complete..." << std::endl;
    t.wait();

    std::wcout << L"Done." << std::endl;

/* Sample output:
    Creating task...
    Performing work...
    Performing work...
    Performing work...
    Performing work...
    Canceling task...
    Waiting for task to complete...
    Done.
*/
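The cancellation shown above is cooperative: the task itself checks for a request and stops. The same pattern can be sketched in standard C++ with an atomic flag. CancelFlag and run_until_cancelled are hypothetical names, and the flag is only a rough stand-in for cancellation_token_source and cancellation_token.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for cancellation_token_source / cancellation_token.
struct CancelFlag {
    std::atomic<bool> requested{false};
    void cancel() { requested.store(true); }
    bool is_requested() const { return requested.load(); }
};

// Starts a worker that loops until cancelled, cancels it after a short
// delay, and reports whether the worker left through the cancellation path.
bool run_until_cancelled()
{
    CancelFlag flag;
    bool stopped_by_cancel = false;

    std::thread worker([&] {
        for (;;) {
            if (flag.is_requested()) {     // check for cancellation
                stopped_by_cancel = true;  // cleanup would go here
                return;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // "work"
        }
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    flag.cancel();
    worker.join();
    return stopped_by_cancel;
}
```

run_until_cancelled() returns true. The worker is never stopped from outside. It stops only because it polls the flag.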

Cancellation callback

    concurrency::cancellation_token_source cts;
    auto token = cts.get_token();

    // An event that is set in the cancellation callback.
    concurrency::event e;

    concurrency::cancellation_token_registration cookie;
    cookie = token.register_callback( [&e, token, &cookie]()
    {
        std::wcout << L"In cancellation callback..." << std::endl;
        e.set();

        // Although not required, demonstrate how to unregister
        // the callback.
        token.deregister_callback(cookie);
    } );

    std::wcout << L"Creating task..." << std::endl;

    // Create a task that waits to be canceled.
    auto t = concurrency::create_task([&e]
    {
        e.wait();
    }, token );

    // Cancel the task.
    std::wcout << L"Canceling task..." << std::endl;
    cts.cancel();

    // Wait for the task to cancel.
    t.wait();

    std::wcout << L"Done." << std::endl;

/* Sample output:
    Creating task...
    Canceling task...
    In cancellation callback...
    Done.
*/

Task tree

    // Create a task group that serves as the root of the tree.
    concurrency::structured_task_group tg1;

    // Create a task that contains a nested task group.
    auto t1 = concurrency::make_task([&]
    {
        concurrency::structured_task_group tg2;

        std::wcout << L"t1 task" << std::endl;

        // Create a child task.
        auto t4 = concurrency::make_task([&]
        {
            std::wcout << L"t4 task" << std::endl;
        });

        // Run the child tasks and wait for them to finish.
        tg2.run(t4);
        tg2.run(t5);
        tg2.wait();
    });

    // Run the child tasks and wait for them to finish.
    tg1.run(t1);
    tg1.run(t2);
    tg1.run(t3);
    tg1.wait();

PPL applications

Bitonic Merge Sort
http://msdn.microsoft.com/en-us/library/vstudio/dd728066(v=vs.110).aspx

Applying the PPL to current KOG games: FrameMove

The OnFrameMove() of an Elsword NPC is about 2,700 lines long.

Debugging Parallel Tasks

Example: create_task_tree()

    // Create a task group that serves as the root of the tree.
    concurrency::structured_task_group tg1;

    // Create a task that contains a nested task group.
    auto t1 = concurrency::make_task([&]
    {
        concurrency::structured_task_group tg2;

        std::wcout << L"t1 task" << std::endl;

        // Run the child tasks and wait for them to finish.
        tg2.run(t4);
        tg2.run(t5);
        tg2.wait();
    });

    // Run the child tasks and wait for them to finish.
    tg1.run(t1);
    tg1.run(t2);
    tg1.run(t3);
    tg1.wait();

Visual Studio 2012 Parallel debugger

Parallel Task Window
A breakpoint in t1 of tg1 has been hit.

Parallel Task Window: task_group
Two of the three tasks in tg1 are in the "Active" state.

Parallel Callstack Window
The two "Active" tasks are running on two threads.

Parallel Callstack Window: Task
The call stack can be inspected per task.

Parallel Watch
Shows the value of the same variable on each thread.

Demo

References

http://msdn.microsoft.com/en-us/library/dd492418.aspx

http://www.danielmoth.com/Blog/Parallel-Tasks-New-Visual-Studio-2010-Debugger-Window.aspx

http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Async-made-simple-with-Cpp-PPL

http://en.wikipedia.org/wiki/Bitonic_sorter

http://msdn.microsoft.com/en-us/library/dd554943.aspx
