Page 1: 04 방응준

Parallel Programming with Intel® Parallel Studio

www.intel.com/go/parallel

방응준 ([email protected])

㈜이에스컴소프트

Page 2: 04 방응준

Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners

Trend

Single Core

Dual Core

Quad Core

Multi-Core Processors Change the Rules

Page 3: 04 방응준

Paradigm Shift: More Cores, Not a Faster Clock

• Power and thermal issues limit clock frequency

• Performance increases now come from parallelism

[Figure: performance versus time, spanning the GHz era and the multi-core era]

Multi-core Needs Parallel Applications

Page 4: 04 방응준

Non-parallel (serial) is old and slow.

Unnecessary bottlenecks.

Page 5: 04 방응준

Parallelism is the key to performance.

Unnecessary bottlenecks.

Parallel programming lets work proceed when ready.

Page 6: 04 방응준

Parallelism is the key to performance.

Dead end. Not the future.

The future is here. We are ready.

Page 7: 04 방응준

Targeting the "Mass Market" of Parallelism

MAINSTREAM DEVELOPERS

Page 8: 04 방응준

Targeting the "Mass Market" of Parallelism

MAINSTREAM DEVELOPERS

Page 9: 04 방응준

the Mainstream Developers

Page 10: 04 방응준

the Mainstream Developers

Page 11: 04 방응준

Intel® Parallel Studio

Microsoft Visual Studio plug-in* for Parallelism

The Perfect Combination for Fast & Reliable Code

DESIGN

CODE & DEBUG

VERIFY

TUNE

+

Page 12: 04 방응준

• Advisor

• Composer

• Inspector

• Amplifier

New software tools drive adoption of multi-core

For Microsoft Visual Studio* C++ architects, developers, and software innovators creating parallel Windows* applications.

Page 13: 04 방응준

Parallel Programming development lifecycle

DESIGN: Gain insight on where parallelism will most benefit existing source code

CODE & DEBUG: Develop effective applications with a C/C++ compiler and comprehensive threaded libraries

VERIFY: Ensure application reliability with proactive parallel memory and threading error checking

TUNE: Enhance applications with an easy-to-use performance analyzer and tuner

Page 14: 04 방응준

Intel® Parallel Advisor (Available in Q3/2010)

DESIGN PHASE

• First and only threading advisor

• See where parallelism will most benefit Windows* apps

• Step-by-step threading guidance

• Make better design decisions

• Shorter learning curve for parallelism

Gain insight on where parallelism will most benefit existing source code

Page 15: 04 방응준

Intel® Parallel Advisor Workflow

Page 16: 04 방응준

Mark Insert Annotate

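// The site marks a loop whose iterations are candidates for parallel execution.
ANNOTATE_SITE_BEGIN(site1);
for (i=0; i<N; i++) {
    // Each iteration is modeled as one task.
    ANNOTATE_TASK_BEGIN(task1);
    func1(i);
    // The lock pair marks the update of shared state.
    ANNOTATE_LOCK_ACQUIRE(0);
    glob_variable++;
    ANNOTATE_LOCK_RELEASE(0);
    func2(i);
    ANNOTATE_TASK_END(task1);
}
ANNOTATE_SITE_END(site1);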
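
One way to read the annotations on this slide: the site marks a loop whose iterations could run concurrently, each task marks the work of one iteration, and the lock pair marks the shared update that would need protection. The sketch below shows roughly what such a loop might look like once it is actually threaded. It uses only standard C++ (`std::thread`, `std::mutex`), not any Intel library, and `func1`, `func2`, and `run_site` are hypothetical stand-ins rather than Advisor APIs.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the per-iteration work on the slide.
inline void func1(int) {}
inline void func2(int) {}

// Runs n iterations as independent tasks and returns the final value of the
// shared counter. One thread per iteration keeps the sketch short; a real
// program would use a thread pool or a tasking library instead.
inline int run_site(int n) {
    int glob_variable = 0;      // shared state, protected by glob_mutex
    std::mutex glob_mutex;
    std::vector<std::thread> tasks;
    for (int i = 0; i < n; ++i) {
        tasks.emplace_back([&, i] {
            func1(i);
            {
                // Corresponds to the annotated lock acquire/release pair.
                std::lock_guard<std::mutex> guard(glob_mutex);
                ++glob_variable;
            }
            func2(i);
        });
    }
    for (auto& t : tasks) t.join();
    return glob_variable;
}
```

Calling `run_site(8)` starts eight tasks; because every increment happens under the mutex, the returned counter equals the number of iterations.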

Page 17: 04 방응준

Views of Intel® Parallel Advisor

Page 18: 04 방응준

Intel® Parallel Composer

CODE & DEBUG PHASE

• Easier, faster parallelism for Windows* apps

• C/C++ compiler and advanced threaded libraries

• Built-in parallel debugger

• Supports OpenMP*

• Save time and increase productivity

• Code Coverage

Develop effective applications with a C/C++ compiler and comprehensive threaded libraries

Page 19: 04 방응준

Intel® Threading Building Blocks - today

• Extends C++ for parallelism
– Solves C++ challenges in multiple areas
– Portable to any C++ compiler, processor, and O.S.; already ported to a wide variety!
– Coordinated with Visual Studio® 2010’s PPL and Concurrency Runtime

• Open source project started by Intel: http://threadingbuildingblocks.org

• Most used abstraction for parallelism

• Flattered

Page 20: 04 방응준

Concurrent Containers
• concurrent_hash_map
• concurrent_queue
• concurrent_bounded_queue
• concurrent_vector

Miscellaneous
• tick_count

Generic Parallel Algorithms
• parallel_for(range)
• parallel_reduce
• parallel_for_each(begin, end)
• parallel_do
• parallel_invoke
• pipeline
• parallel_sort
• parallel_scan

Task scheduler
• task_group
• structured_task_group
• task_scheduler_init
• task_scheduler_observer

Synchronization Primitives
• atomic
• mutex; recursive_mutex
• spin_mutex; spin_rw_mutex
• queuing_mutex; queuing_rw_mutex
• null_mutex; null_rw_mutex

Memory Allocation
• tbb_allocator; cache_aligned_allocator; scalable_allocator; zero_allocator

Threads
• tbb_thread

Thread Local Storage
• enumerable_thread_specific
• combinable

All these make up TBB

Page 21: 04 방응준

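Classical parallel algorithm usage example

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
using namespace tbb;

const int N = 1000;  // hypothetical array size; the slide leaves N undefined
void Foo(int& x);    // hypothetical per-element operation, defined elsewhere

// Loop body for parallel_for: processes one sub-range of the array.
class ChangeArray {
    int* array;
public:
    ChangeArray(int* a) : array(a) {}

    void operator()(const blocked_range<int>& r) const {
        for (int i = r.begin(); i != r.end(); i++) {
            Foo(array[i]);
        }
    }
};

void ChangeArrayParallel(int* a, int n) {
    parallel_for(blocked_range<int>(0, n), ChangeArray(a), auto_partitioner());
}

int main() {
    task_scheduler_init init;
    int A[N];

    // initialize array here…
    ChangeArrayParallel(A, N);

    return 0;
}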

The ChangeArray class defines a for-loop body for parallel_for.

blocked_range: a TBB template representing a 1D iteration space.

As usual with C++ function objects, the main work is done inside operator().

A call to the template function parallel_for<Range, Body> with arguments Range = blocked_range and Body = ChangeArray.
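
To make the Range/Body contract described on this slide concrete, here is a minimal sketch that applies a body object to disjoint sub-ranges using only standard C++. `IndexRange`, `chunked_for`, and `DoubleArray` are hypothetical names invented for this illustration. The fixed-size chunking with one `std::thread` per chunk is an assumption made for brevity and is not how TBB schedules work.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical half-open index range [begin, end), loosely modeled on the
// role blocked_range plays in the slide's example.
struct IndexRange {
    std::size_t begin;
    std::size_t end;
};

// Applies body to disjoint sub-ranges of [0, n), one std::thread per chunk.
// Assumption: fixed-size chunks; this does not imitate TBB's scheduler.
template <typename Body>
void chunked_for(std::size_t n, std::size_t chunks, const Body& body) {
    if (n == 0) return;
    chunks = std::max<std::size_t>(1, std::min(chunks, n));
    const std::size_t step = (n + chunks - 1) / chunks;  // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t lo = 0; lo < n; lo += step) {
        const IndexRange r{lo, std::min(lo + step, n)};
        workers.emplace_back([&body, r] { body(r); });
    }
    for (auto& w : workers) w.join();
}

// Body in the same shape as ChangeArray: operator() handles one sub-range.
class DoubleArray {
    int* array;
public:
    explicit DoubleArray(int* a) : array(a) {}
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.begin; i != r.end; ++i) array[i] *= 2;
    }
};
```

For example, `chunked_for(10, 3, DoubleArray(a))` doubles every element of a ten-element array `a`, with each worker touching only its own sub-range, so no synchronization is needed inside the body.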

Page 22: 04 방응준

C++0x lambda functions support

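#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
using namespace tbb;

const int N = 1000;  // hypothetical array size; the slide leaves N undefined
void Foo(int& x);    // hypothetical per-element operation, defined elsewhere

void ChangeArrayParallel(int* a, int n) {
    // The lambda replaces the separate body class.
    parallel_for(0, n, 1,
        [=](int i) {
            Foo(a[i]);
        } /*, auto_partitioner*/);
}

int main() {
    //task_scheduler_init init;
    int A[N];

    // initialize array here…
    ChangeArrayParallel(A, N);

    return 0;
}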

Capture variables by value from the surrounding scope to completely mimic the non-lambda implementation. Note that [&] could be used to capture variables by reference.

Using lambda functions, implement MyBody::operator() right inside the call to parallel_for().

With lambdas, the parallel_for example transforms into the code on this slide.

auto_partitioner is used by default.

Explicit task_scheduler_init creation is now optional.

parallel_for has an overload that takes start, stop, and step arguments and constructs blocked_range internally.
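
The difference between the `[=]` and `[&]` capture modes mentioned on this slide can be seen without any threading at all. The following is a small standard C++ sketch; `CaptureResult` and `capture_demo` are hypothetical names used only for this illustration.

```cpp
// Holds what each lambda observed after the captured variable was modified.
struct CaptureResult {
    int by_value;
    int by_reference;
};

inline CaptureResult capture_demo() {
    int x = 1;
    auto by_value = [=] { return x; };      // copies x at this point
    auto by_reference = [&] { return x; };  // refers to the caller's x
    x = 42;                                 // change made after both lambdas exist
    return CaptureResult{by_value(), by_reference()};
}
```

Here the by-value lambda still sees 1, while the by-reference lambda sees 42. In a parallel loop, capturing by reference also means the referenced variables must outlive the loop and any shared writes need synchronization.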

Page 23: 04 방응준

Intel® Threading Building Blocks (TBB) and Microsoft* Visual Studio* 2010 Parallel Patterns Library

Identical semantics shared for a core set of concurrent containers and algorithm classes:

• parallel_for(first, last, step, f)
• parallel_for_each
• parallel_invoke
• task_handle
• task_group_status
• task_group
• structured_task_group
• is_current_task_group_canceling
• missing_wait
• concurrent_vector*
• concurrent_queue*

*These are based on Intel’s implementation used by Threading Building Blocks

Page 24: 04 방응준

Built-in Parallel Debugger Extension

Page 25: 04 방응준

Intel® Parallel Inspector

VERIFY PHASE

• Find threading errors faster

• Parallel memory and threading error checking

• Rapid analysis of code

• Help ensure Windows* application reliability

• Ship apps that run error-free

Ensure application reliability with proactive parallel memory and threading error checking

Page 26: 04 방응준

Memory Errors Analysis

• Memory Leaks

• Invalid Memory Accesses

• Invalid Partial Memory Accesses

• Mismatched Memory Allocation / Deallocation

• Missing Allocations

• Uninitialized Memory Accesses

• Uninitialized Partial Memory Access
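
As a rough illustration of where several of these categories come from, the sketch below shows a correctly managed heap array, with comments noting which small change would introduce each kind of defect. The mapping from comments to the categories above is an informal reading, not Inspector's own definition, and `sum_manual` and `sum_raii` are hypothetical helper names.

```cpp
#include <numeric>
#include <vector>

// Manual memory management, written correctly. The comments note the edits
// that would turn each line into one of the listed defects.
inline int sum_manual(int n) {
    int* raw = new int[n]();  // "()" zero-initializes; without it, reading
                              // before writing is an uninitialized access
    for (int i = 0; i < n; ++i) raw[i] = i + 1;  // "i <= n" would write out of bounds
    int total = 0;
    for (int i = 0; i < n; ++i) total += raw[i];
    delete[] raw;  // plain "delete" or free() would be a mismatched
                   // deallocation; omitting the line would leak the array
    return total;
}

// The same computation with a standard container, which removes the manual
// allocate/free pairing entirely.
inline int sum_raii(int n) {
    std::vector<int> values(n);
    std::iota(values.begin(), values.end(), 1);
    return std::accumulate(values.begin(), values.end(), 0);
}
```

Both functions return the sum 1 + 2 + … + n; the second avoids most of the listed error classes by construction.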

Page 27: 04 방응준

Intel® Parallel Inspector: Memory Errors

Page 28: 04 방응준

Intel® Parallel Inspector: Memory Errors

Page 29: 04 방응준

Threading Error Analysis

• Potential Threading Errors Detected

• Data Races

• Deadlock

• Potential Privacy Infringement
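
A deadlock of the kind listed above typically arises when two threads acquire the same pair of locks in opposite order. The sketch below shows one common way to avoid it in standard C++, using `std::lock` to acquire both mutexes together. `Account`, `transfer`, and `run_transfers` are hypothetical names for this illustration and have nothing to do with Inspector itself.

```cpp
#include <mutex>
#include <thread>

struct Account {
    std::mutex m;
    int balance;
    explicit Account(int b) : balance(b) {}
};

// Moves amount between two distinct accounts. std::lock acquires both
// mutexes using a deadlock-avoidance algorithm, so the order in which
// callers pass the accounts does not matter.
inline void transfer(Account& from, Account& to, int amount) {
    std::unique_lock<std::mutex> lock_from(from.m, std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.m, std::defer_lock);
    std::lock(lock_from, lock_to);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions. With naive nested locking
// (lock from.m, then to.m) this pattern could deadlock.
inline int run_transfers() {
    Account a(100), b(100);
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

The combined balance stays at 200 because every update happens while both locks are held, and the program always terminates because neither thread can hold one lock while waiting indefinitely for the other.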

Page 30: 04 방응준

Intel® Parallel Inspector: Threading Errors

Data Racing

Page 31: 04 방응준

Race Conditions

• Threads “race” against each other for resources

• Execution order is assumed but cannot be guaranteed

• Storage conflict is most common

• Concurrent access of the same memory location by multiple threads
– At least one thread is writing
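
The storage conflict described above is easiest to see with a shared counter: if several threads execute `++counter` on a plain `long`, updates can be lost because the read-modify-write is not atomic. The sketch below shows one standard C++ remedy, `std::atomic`; a mutex would work as well. `count_with_atomic` is a hypothetical helper name for this illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each of `threads` workers increments the shared counter `iterations` times.
// With a plain long this would be a data race; std::atomic makes every
// increment an indivisible read-modify-write.
inline long count_with_atomic(int threads, int iterations) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

With the atomic counter the result is always `threads * iterations`, independent of how the threads interleave.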

Page 32: 04 방응준

Intel® Parallel Amplifier

TUNE PHASE

• Quickly find bottlenecks

• Tune Windows* apps faster

• Optimize app performance

• Scale apps for multi-core

• Designed for parallel apps

• Performance Analysis

• Performance Scalability Analysis

• Locks & Waits Analysis

Quickly find bottlenecks and tune parallel applications for scalable multi-core performance

Page 33: 04 방응준

Where to parallelize…

Page 34: 04 방응준

Thank you!

Time for questions now.

Naver Cafe: cafe.naver.com/intelsw

Twitter: IntelSDP