SEC(R) 2008

Intel® Concurrent Collections for C++ - a model for parallel programming

Nikolay Kurtov, email: nikolay.kurtov@intel.com

Software and Services Group

October 23, 2008



Agenda

•Existing parallel programming models

•Key concepts, Blackscholes example

•Performance results


Parallel programming is important

•Number of multi-core machines is growing

•Developers want to fully exploit architecture capabilities

But parallel programming is hard:

•Users must reason about parallelism

•Thread synchronization

•Embedded in serial languages

−Data Overwriting

−Arbitrary Serialization

•Tuning Performance

•Depends on a platform


Parallel Programming Models

•Improve productivity of programming

•Hide low-level details

•Provide high-level abstractions

The following models are very popular:

•OpenMP

•Cilk

•Intel® Threading Building Blocks


OpenMP

•Perfect for Data-parallel algorithms

•The basics are easy to apply:

#pragma omp parallel for

for (int i = 0; i < N; i++) doSomething(i);

•Advanced usage is complicated and error-prone

•Requires compiler support


Cilk

•The programmer identifies elements that can safely be executed in parallel

int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);
    int y = cilk_spawn fib(n-2);
    cilk_sync;
    return (x+y);
}

•Explicit spawning of tasks and synchronization with barriers


Intel® Threading Building Blocks

•Implemented as a C++ library

•Requires an excellent knowledge of C++

•Provides excellent high-level abstractions

•Provides basic parallel algorithms:

−parallel_for

−parallel_sort

−parallel_while

−parallel_reduce

−parallel_do

−parallel_scan


Existing models - summary

•The programmer explicitly expresses parallelism

•Provide an imperative algorithm description

•Many low-level questions must be solved by the programmer

•Good control over performance


Agenda

•Existing parallel programming models

•Key concepts, Blackscholes example

•Performance results


Ideal Parallel programming model

The application problem:

• Serial code

• Semantic correctness

Intel® Concurrent Collections:

• Architecture

• Actual parallelism

• Load balancing

• Distribution among processors

Domain Expert (person)

• Only domain knowledge

• No tuning knowledge

Tuning Expert (person, runtime, static analysis)

• No domain knowledge

• Only tuning knowledge


How people think about their application

What are high level operations?

What are the chunks of data?

What are the producer/consumer relationships?

What are the inputs and outputs?

Diagram: Parameters -> Solve -> Result

Blackscholes

• A data-parallel application

• Solves an equation independently for each parameter set


Intel® Concurrent Collections Key Concepts

•Step – a single high-level operation

•Item – a single data element

•Tag – an identifier of a step or an item

•Inputs/Outputs – items or tags produced or consumed by the environment


Textual Graph Representation

// Declarations
<SolveTags: int n>;
[OptionData* Parameters: int n];
[float Result: int n];

// Step prescription
<SolveTags> :: (Solve);

// Step execution
[Parameters] -> (Solve) -> [Result];

// Input from the environment:
// initialize all tags and data
env -> <SolveTags>, [Parameters];

// Output to the environment
[Result] -> env;


Graph definition Translator

•Translates a graph definition into a declaration of a class

•The generated class contains properly named item collections, tag collections and step collections

•Generates a coding hints file – a template for step definitions

•Checks correctness of a graph

class Blackscholes_graph_t : public Graph_t {
public:
    ItemCollection_t<OptionData*> Parameters;
    ItemCollection_t<float> Result;
    TagCollection_t SolveTags;
    StepCollection_t SolveStepCollection;
    ...
};


Tags

Item identifiers

• Items are stored in a graph in an item collection

• Put stores an item and associates it with a tag

• Get retrieves an item by its tag

• Items are immutable

Step identifiers

• Steps are prescribed by tags

• Put stores a tag and instantiates the prescribed steps

• The same tag is passed to each instantiated step
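The Put/Get behaviour of an item collection can be illustrated with a small stand-alone sketch. This is not the Intel® Concurrent Collections API: `ToyItemCollection` and `TryGet` are hypothetical names, tags are plain `int` values, and the sketch assumes that "items are immutable" means a tag can be assigned an item only once.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical stand-in for an item collection (not the real library class):
// it maps an integer tag to exactly one item.
template <typename Item>
class ToyItemCollection {
public:
    // Put stores an item and associates it with a tag. Assumption: because
    // items are immutable, a second Put for the same tag is rejected.
    void Put(int tag, const Item& item) {
        bool inserted = items_.emplace(tag, item).second;
        if (!inserted) {
            throw std::logic_error("an item already exists for this tag");
        }
    }

    // TryGet copies the item for a tag into `out` and returns true,
    // or returns false if no item has been Put for that tag yet.
    bool TryGet(int tag, Item& out) const {
        auto it = items_.find(tag);
        if (it == items_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<int, Item> items_;
};
```

Because a stored item never changes, any number of steps could read the same tag without coordinating with each other, which is presumably what makes the model race-free.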


Specifying Computation

StepReturnValue_t Solve(Blackscholes_graph_t& graph,
                        const Tag_t& step_tag)
{
    // Get the input item associated with this step's tag
    OptionData* data = graph.Parameters.Get(step_tag);

    // Plain serial domain code
    float result = solveEquation(data);

    // Put the result under the same tag
    graph.Result.Put(step_tag, result);

    return CNC_Success;
}


Using the graph in your C++ application

Blackscholes_graph_t my_graph;

for (int i = 0; i < N; i++) {
    my_graph.SolveTags.Put(Tag_t(i));
    my_graph.Parameters.Put(Tag_t(i), data[i]);
}

my_graph.run();

for (int i = 0; i < N; i++) {
    float result = my_graph.Result.Get(Tag_t(i));
    std::cout << result << std::endl;
}
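The flow of the Blackscholes example (the environment puts tags and parameters, the graph runs one step instance per tag, the environment reads the results) can be mimicked in plain standard C++. The sketch below is a serial toy model and makes no claim about how the real runtime is implemented; `ToyGraph`, `toySolveStep`, `toySolveEquation` and `runToyBlackscholes` are hypothetical names, and the "equation" is a placeholder rather than the Black-Scholes formula.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

struct ToyGraph;
using ToyStep = std::function<void(ToyGraph&, int)>;

// Hypothetical, serial stand-in for the generated graph class.
struct ToyGraph {
    std::map<int, float> Parameters;  // item collection: tag -> input
    std::map<int, float> Result;      // item collection: tag -> output
    std::vector<int> SolveTags;       // tag collection
    ToyStep Solve;                    // the step prescribed by SolveTags

    // Runs the prescribed step once per tag. The instances share no mutable
    // state, so a real runtime would be free to run them in parallel.
    void run() {
        assert(static_cast<bool>(Solve));
        for (int tag : SolveTags) Solve(*this, tag);
    }
};

// Placeholder computation; NOT the Black-Scholes formula.
inline float toySolveEquation(float x) { return 2.0f * x + 1.0f; }

// The step: Get the input by tag, compute, Put the result under the same tag.
inline void toySolveStep(ToyGraph& g, int tag) {
    float data = g.Parameters.at(tag);
    g.Result.emplace(tag, toySolveEquation(data));
}

// The environment: feeds tags and parameters, runs the graph, reads results.
inline std::vector<float> runToyBlackscholes(const std::vector<float>& data) {
    ToyGraph g;
    g.Solve = toySolveStep;
    for (int i = 0; i < static_cast<int>(data.size()); ++i) {
        g.SolveTags.push_back(i);
        g.Parameters.emplace(i, data[i]);
    }
    g.run();
    std::vector<float> out;
    for (int i = 0; i < static_cast<int>(data.size()); ++i) {
        out.push_back(g.Result.at(i));
    }
    return out;
}
```

Note that the step body itself contains only serial code; all decisions about ordering and distribution sit in `run()`.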


Steps Rescheduling

A step may begin execution before its input items are available

It will be rescheduled and started again from the beginning when the corresponding item is added to the collection

Diagram (example graph): item collections Image: k, Block: i, j and Result: i, j; tag collections ImageTag: k and BlockTag: i, j; steps Split: k and Process: i, j
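Rescheduling can be sketched with a toy single-threaded scheduler. Everything here is an assumption made for illustration (the names `ToyRuntime`, `MissingItem`, `chainStep`, `runChainDemo`, and the use of an exception to abort a step); the slides do not say how the real runtime detects a missing item. A step whose Get fails is parked on the missing tag and started again from the beginning once that item is Put.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Hypothetical signal raised by Get when an item has not been Put yet.
struct MissingItem { int tag; };

struct ToyRuntime {
    std::map<int, int> items;                // item collection: tag -> value
    std::map<int, std::vector<int>> parked;  // item tag -> steps waiting for it
    std::deque<int> ready;                   // step tags that can be (re)started
    int restarts = 0;                        // number of aborted step attempts

    int Get(int tag) const {
        auto it = items.find(tag);
        if (it == items.end()) throw MissingItem{tag};
        return it->second;
    }

    void Put(int tag, int value) {
        items.emplace(tag, value);
        auto it = parked.find(tag);
        if (it == parked.end()) return;
        // The awaited item arrived: reschedule every step parked on it.
        for (int step : it->second) ready.push_back(step);
        parked.erase(it);
    }

    template <typename Step>
    void run(Step step) {
        while (!ready.empty()) {
            int tag = ready.front();
            ready.pop_front();
            try {
                step(*this, tag);  // every attempt starts from the beginning
            } catch (const MissingItem& m) {
                parked[m.tag].push_back(tag);
                ++restarts;
            }
        }
    }
};

// Step k > 0 reads item k-1 and writes item k; step 0 seeds the chain.
inline void chainStep(ToyRuntime& rt, int k) {
    int value = (k == 0) ? 10 : rt.Get(k - 1) + 1;  // Gets come first
    rt.Put(k, value);                               // Puts come last
}

inline ToyRuntime runChainDemo() {
    ToyRuntime rt;
    rt.ready = {2, 1, 0};  // deliberately schedule consumers before producers
    rt.run(chainStep);
    return rt;
}
```

In `runChainDemo`, steps 2 and 1 each abort once because their inputs are not there yet, then complete after step 0 produces the first item.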


Constraints required by the application

1. Steps have no side effects

2. Steps call Gets before any Puts

3. Steps call Gets before allocating any memory
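One plausible reading of these constraints is that they follow from rescheduling: a step that is restarted from the beginning repeats everything it did before the failed Get. The sketch below illustrates this under that assumption; `ToyStore`, `NotReady`, `badStep`, `goodStep` and `putCallsWithOneRestart` are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <map>

// Hypothetical signal raised when a Get finds no item for the tag.
struct NotReady {};

struct ToyStore {
    std::map<int, int> items;
    int putCalls = 0;  // counts every Put, including repeated ones

    int Get(int tag) const {
        auto it = items.find(tag);
        if (it == items.end()) throw NotReady{};
        return it->second;
    }
    void Put(int tag, int value) {
        ++putCalls;
        items[tag] = value;
    }
};

// Violates constraint 2: a Put happens before the Get that may abort the step.
inline void badStep(ToyStore& s) {
    s.Put(100, 1);
    int x = s.Get(0);
    s.Put(101, x);
}

// Follows constraint 2: all Gets first, then all Puts.
inline void goodStep(ToyStore& s) {
    int x = s.Get(0);
    s.Put(100, 1);
    s.Put(101, x);
}

// Runs a step once with its input missing, then supplies item 0 and
// restarts the step from the beginning. Returns the total number of Puts.
template <typename Step>
int putCallsWithOneRestart(Step step) {
    ToyStore s;
    try {
        step(s);
    } catch (const NotReady&) {
        s.items[0] = 7;  // the awaited item arrives
        step(s);         // restart from the beginning
    }
    return s.putCalls;
}
```

Here `badStep` ends up issuing three Puts (one of them twice for the same tag), while `goodStep` issues exactly two. The same argument would apply to memory allocated before a failed Get.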


Benefits from using Intel® Concurrent Collections

•Improves programming productivity

−Only serial code

−No knowledge of parallel technologies required

−Determinism

−Race-free

•Portability

•Scalability

•Expert-tuning system


Summary: How to write an application using Intel® Concurrent Collections?

1. Draw the algorithm on a chalkboard

2. Define data structures

3. Represent the algorithm in the textual notation

4. Implement high-level operations in C++

5. Instantiate a Graph and run it


Agenda

•Existing parallel programming models

•Key concepts, Blackscholes example

•Performance results


Blackscholes benchmark

• Calculations for a single set of parameters take fewer than 500 CPU instructions

• Steps should be grouped to reduce the overhead and improve cache locality

• Automatic grain selection is an area for future research

Figure: relative performance (axis 0 to 8) versus number of threads (1 to 8). Series: System Threads, Block size 200, Block size 10, No blocks.
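Grouping steps into blocks can be sketched as follows: instead of one tag per option, one tag identifies a block of consecutive options, and a single step instance processes the whole block. The names `numBlocks`, `blockRange`, `solveBlockStep`, `solveAllBlocked` and `toySolve` are hypothetical, the computation is a placeholder, and writing into a shared `results` vector stands in for Puts into an item collection.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Number of block tags needed to cover n options (ceiling division).
inline int numBlocks(int n, int blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Half-open index range [first, second) handled by one block tag.
inline std::pair<int, int> blockRange(int blockTag, int blockSize, int n) {
    int begin = blockTag * blockSize;
    int end = std::min(n, begin + blockSize);
    return std::make_pair(begin, end);
}

// Placeholder for the per-option computation; NOT the Black-Scholes formula.
inline float toySolve(float x) { return x * 0.5f; }

// One coarse-grained step: solves every option of its block in a tight loop,
// so the per-step overhead is paid once per block rather than once per option.
inline void solveBlockStep(int blockTag, int blockSize,
                           const std::vector<float>& params,
                           std::vector<float>& results) {
    std::pair<int, int> r =
        blockRange(blockTag, blockSize, static_cast<int>(params.size()));
    for (int i = r.first; i < r.second; ++i) {
        results[i] = toySolve(params[i]);
    }
}

// Serial driver standing in for the runtime: one step instance per block tag.
inline std::vector<float> solveAllBlocked(const std::vector<float>& params,
                                          int blockSize) {
    std::vector<float> results(params.size(), 0.0f);
    int blocks = numBlocks(static_cast<int>(params.size()), blockSize);
    for (int b = 0; b < blocks; ++b) {
        solveBlockStep(b, blockSize, params, results);
    }
    return results;
}
```

With a block size of 200, for example, 1000 options need only 5 step instances instead of 1000.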


Dedup benchmark

•Algorithm is a pipeline

•The last pipeline stage is serial

•The “Steps Priorities” feature makes Dedup run 1.4 times faster

Figure: relative performance (axis 0 to 2) versus number of threads (1 to 8). Series: Intel® Concurrent Collections, Pthreads.


Possible model improvements

• Memory management

• Garbage collection

• Automatic grain selection

• Streaming data input


Getting More Information

Intel® Concurrent Collections for C/C++ on WhatIf.intel.com:

http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc


Questions & Answers


Thank you!
