SEC(R)2008
Intel® Concurrent Collections for C++: a model for parallel programming
Nikolay Kurtov
email: [email protected]
Software and Services Group
October 23, 2008
Agenda
•Existing parallel programming models
•Key concepts, Blackscholes example
•Performance results
Parallel programming is important
•Number of multi-core machines is growing
•Developers want to fully exploit architecture capabilities
But parallel programming is hard:
•Users must reason about parallelism
•Thread synchronization
•Embedded in serial languages
−Data Overwriting
−Arbitrary Serialization
•Tuning Performance
•Depends on a platform
Parallel Programming Models
•Improve productivity of programming
•Hide low-level details
•Provide high-level abstractions
The following models are very popular:
•OpenMP
•Cilk
•Intel® Threading Building Blocks
OpenMP
•Perfect for data-parallel algorithms
•The basics are easy to apply:
    #pragma omp parallel for
    for (int i = 0; i < N; i++) doSomething(i);
•Advanced usage is complicated and error-prone
•Requires compiler support
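The error-prone side of OpenMP tends to appear as soon as loop iterations share state. The following is a minimal sketch, assuming a hypothetical helper named sum_of_squares that is not part of the original slides: accumulating into a shared variable inside the parallel loop would be a data race, and the standard reduction clause is one way to avoid it. If the compiler has no OpenMP support enabled, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration: sums v[i] * v[i] over the vector.
// Without the reduction clause, concurrent updates of `sum` would race.
long long sum_of_squares(const std::vector<int>& v) {
    long long sum = 0;
    const int n = static_cast<int>(v.size());
    // Each thread accumulates a private copy of `sum`; the copies are
    // combined with + when the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        sum += static_cast<long long>(v[i]) * v[i];
    }
    return sum;
}
```

Forgetting the reduction clause still compiles and often appears to work on small inputs, which is what makes this kind of mistake hard to spot.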
Cilk
•The programmer identifies elements that can safely be executed in parallel
int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);
    int y = cilk_spawn fib(n-2);
    cilk_sync;
    return (x+y);
}
•Explicit spawning of tasks and synchronization with barriers
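For readers without a Cilk compiler, the same spawn/sync shape can be approximated in standard C++. This is only a sketch under stated assumptions: std::async stands in for cilk_spawn, future::get stands in for cilk_sync, and the depth parameter is a hypothetical cutoff added here so that only a handful of threads are created. It does not reproduce Cilk's work-stealing scheduler.

```cpp
#include <cassert>
#include <future>

// Fork-join Fibonacci using std::async as a stand-in for cilk_spawn.
// `depth` limits how many levels fork new tasks, so fewer than
// 2^depth tasks are spawned in total.
long fib(int n, int depth = 3) {
    if (n < 2) return n;
    if (depth <= 0) {
        // Below the cutoff: plain serial recursion.
        return fib(n - 1, 0) + fib(n - 2, 0);
    }
    // "Spawn": the first branch runs on another thread.
    std::future<long> x =
        std::async(std::launch::async, fib, n - 1, depth - 1);
    // The parent continues with the second branch itself.
    long y = fib(n - 2, depth - 1);
    // "Sync": wait for the spawned branch before combining.
    return x.get() + y;
}
```

Calling fib(20) forks only near the top of the recursion tree and computes the rest serially, which keeps the task overhead small relative to the work.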
Intel® Threading Building Blocks
•Implemented as a C++ library
•Requires an excellent knowledge of C++
•Provides excellent high-level abstractions
•Provides basic parallel algorithms:
−parallel_for
−parallel_sort
−parallel_while
−parallel_reduce
−parallel_do
−parallel_scan
Existing models - summary
•The programmer explicitly expresses parallelism
•Provide an imperative algorithm description
•Many low-level issues must be solved by the programmer
•Good control over performance
Agenda
•Existing parallel programming models
•Key concepts, Blackscholes example
•Performance results
Ideal Parallel programming model
The application problem:
• Serial code
• Semantic correctness
Domain Expert (person)
• Only domain knowledge
• No tuning knowledge
Intel® Concurrent Collections:
• Architecture
• Actual parallelism
• Load balancing
• Distribution among processors
Tuning Expert (person, runtime, static analysis)
• No domain knowledge
• Only tuning knowledge
How people think about their application
What are high level operations?
What are the chunks of data?
What are the producer/consumer relationships?
What are the inputs and outputs?
Blackscholes
A data-parallel application
Solves an equation independently for each parameter set
Diagram: Parameters -> Solve -> Result
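The slides do not show the equation that is solved for each parameter set. As an illustration only, the sketch below uses the standard closed-form Black-Scholes price of a European call option. The field layout of OptionData and the names normal_cdf and solve_call are assumptions made for this example; the slides only mention the type name OptionData and a solveEquation function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical layout of one parameter set; the slides only name the type.
struct OptionData {
    double spot;        // current price of the underlying, S
    double strike;      // strike price, K
    double rate;        // risk-free interest rate per year, r
    double volatility;  // annualized volatility, sigma
    double time;        // time to expiry in years, T
};

// Standard normal cumulative distribution function.
double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black-Scholes price of a European call option.
// The result depends only on its own parameter set, so calls for
// different sets are independent of each other.
double solve_call(const OptionData& o) {
    const double sig_sqrt_t = o.volatility * std::sqrt(o.time);
    const double d1 = (std::log(o.spot / o.strike)
                       + (o.rate + 0.5 * o.volatility * o.volatility) * o.time)
                      / sig_sqrt_t;
    const double d2 = d1 - sig_sqrt_t;
    return o.spot * normal_cdf(d1)
         - o.strike * std::exp(-o.rate * o.time) * normal_cdf(d2);
}
```

Because no call reads or writes anything outside its own OptionData, every parameter set can be priced in any order or in parallel, which is what makes the benchmark data-parallel.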
Intel® Concurrent Collections Key Concepts
•Step – a single high-level operation
•Item – a single data element
•Tag – an identifier of a step or an item
•Inputs/Outputs – items or tags produced or consumed by the environment
Textual Graph Representation
// Declarations
<SolveTags: int n>;
[OptionData* Parameters: int n];
[float Result: int n];
// Step prescription
<SolveTags> :: (Solve);
// Step execution
[Parameters] -> (Solve) -> [Result];
// Input from the environment: initialize all tags and data
env -> <SolveTags>, [Parameters];
// Output to the environment
[Result] -> env;
Graph definition Translator
•Translates a graph definition into a class declaration
•The generated class contains properly named item collections, tag collections and step collections
•Generates a coding hints file: a template for the step definitions
•Checks the correctness of the graph
class blackscholes_graph_t : public Graph_t {
public:
    ItemCollection_t<OptionData*> Parameters;
    ItemCollection_t<float> Result;
    TagCollection_t SolveTags;
    StepCollection_t SolveStepCollection;
    ...
};
Tags
Item identifiers
• Items are stored in an item collection in the graph
• Put stores an item and associates it with a tag
• Get accesses an item by its tag
• Items are immutable
Step identifiers
• Steps are prescribed by tags
• Put stores a tag and instantiates the prescribed steps
• The same tag is passed to each instantiated step
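The Put/Get rules for items and tags can be sketched with a toy, single-threaded model. The classes ToyItemCollection and ToyTagCollection below are hypothetical names invented for this illustration; they are not the library's classes, they use plain int tags, and they run a prescribed step immediately on Put, whereas a real runtime would presumably schedule the step instance instead.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy item collection: each tag maps to at most one immutable item.
template <typename Item>
class ToyItemCollection {
public:
    // Put stores an item under a tag; a second Put for the same tag
    // is rejected, so a stored item never changes.
    void Put(int tag, const Item& item) {
        if (!items_.emplace(tag, item).second)
            throw std::logic_error("item already exists for this tag");
    }
    // Get returns a read-only pointer to the item, or nullptr if no
    // item has been stored under this tag yet.
    const Item* Get(int tag) const {
        auto it = items_.find(tag);
        return it == items_.end() ? nullptr : &it->second;
    }
private:
    std::map<int, Item> items_;
};

// Toy tag collection: putting a tag runs every prescribed step with it.
class ToyTagCollection {
public:
    void Prescribe(std::function<void(int)> step) {
        steps_.push_back(std::move(step));
    }
    void Put(int tag) {
        for (auto& step : steps_) step(tag);  // same tag for each step
    }
private:
    std::vector<std::function<void(int)>> steps_;
};
```

In this model the tag plays both roles from the slide: it identifies which step instance to run and which item that instance should read or write.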
Specifying Computation
StepReturnValue_t Solve(
    blackscholes_graph_t& graph,
    const Tag_t& step_tag)
{
    // Fetch the parameter set stored under this step's tag
    OptionData* data = graph.Parameters.Get(step_tag);
    // Serial domain code
    float result = solveEquation(data);
    // Publish the result under the same tag
    graph.Result.Put(step_tag, result);
    return CNC_Success;
}
Using the graph in your C++ application
blackscholes_graph_t my_graph;
for (int i = 0; i < N; i++) {
    my_graph.SolveTags.Put(Tag_t(i));
    my_graph.Parameters.Put(Tag_t(i), data[i]);
}
my_graph.run();
for (int i = 0; i < N; i++) {
    float result = my_graph.Result.Get(Tag_t(i));
    std::cout << result << std::endl;
}
Steps Rescheduling
A step may begin execution before its input items are available
It will be rescheduled and started again from the beginning when the corresponding item is added to the collection
Diagram (example graph): tags ImageTag : k and BlockTag : i, j; items Image : k, Block : i, j and Result : i, j; steps Split : k and Process : i, j
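The rescheduling behaviour can be illustrated with a toy, single-threaded scheduler. Everything here is an assumption made for the sketch: ItemMissing, Items and run_all are hypothetical names, a failed Get is signalled with an exception, and the scheduler simply retries parked steps in queue order rather than waking them when the matching Put happens.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Thrown by Get when the requested item has not been Put yet.
struct ItemMissing {};

// Toy item store keyed by an integer tag.
struct Items {
    std::map<int, int> data;
    int Get(int tag) const {
        auto it = data.find(tag);
        if (it == data.end()) throw ItemMissing{};
        return it->second;
    }
    void Put(int tag, int value) { data.emplace(tag, value); }
};

// Runs steps until all have finished or no step can make progress.
// A step that hits a missing item goes back into the queue and is
// later restarted from its first line.
// Returns the total number of step starts, including restarts.
int run_all(std::deque<std::function<void()>> ready) {
    int starts = 0;
    std::size_t stalled = 0;  // consecutive failed attempts
    while (!ready.empty() && stalled < ready.size()) {
        std::function<void()> step = std::move(ready.front());
        ready.pop_front();
        ++starts;
        try {
            step();
            stalled = 0;                       // progress was made
        } catch (const ItemMissing&) {
            ready.push_back(std::move(step));  // reschedule for later
            ++stalled;
        }
    }
    return starts;
}
```

A restarted step repeats everything it did before the failed Get, so in this model it is only safe if the step performs its Gets before any Put or other visible work.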
Constraints on application code
1. Steps have no side effects
2. Steps call Gets before any Puts
3. Steps call Gets before allocating any memory
Benefits from using Intel® Concurrent Collections
•Improves programming productivity
−Only serial code
−No knowledge of parallel technologies required
−Determinism
−Race-free
•Portability
•Scalability
•Expert-tuning system
Summary: How to write an application using Intel® Concurrent Collections?
1. Draw the algorithm on a chalkboard
2. Define data structures
3. Represent the algorithm in the textual notation
4. Implement high-level operations in C++
5. Instantiate a Graph and run it
Agenda
•Existing parallel programming models
•Key concepts, Blackscholes example
•Performance results
Blackscholes benchmark
• The calculation for a single set of parameters takes fewer than 500 CPU instructions
• Steps should be grouped to reduce the overhead and improve cache locality
• Automatic grain selection is an area for future research
Chart: Relative Performance (0 to 8) vs. Threads (1 to 8). Series: System Threads, Block size 200, Block size 10, No blocks.
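The grouping of steps into blocks can be sketched in plain serial C++. This is an assumption-laden illustration, not the benchmark's code: solve_one is a hypothetical stand-in for the per-option calculation, and solve_in_blocks models each block as one step instance whose tag is the block index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-option computation.
float solve_one(float parameter) { return parameter * 2.0f; }

// One "step" per block: block b covers the index range
// [b * block, min(n, (b + 1) * block)).
// Returns the number of step instances that were needed.
std::size_t solve_in_blocks(const std::vector<float>& params,
                            std::vector<float>& results,
                            std::size_t block) {
    assert(block > 0);
    const std::size_t n = params.size();
    results.resize(n);
    const std::size_t num_blocks = (n + block - 1) / block;  // ceil(n / block)
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block;
        const std::size_t end = std::min(n, begin + block);
        // The inner loop walks contiguous memory, which helps locality.
        for (std::size_t i = begin; i < end; ++i)
            results[i] = solve_one(params[i]);
    }
    return num_blocks;
}
```

With 1000 options, a block size of 200 needs 5 step instances while a block size of 1 needs 1000, so any fixed per-step overhead is paid far less often with larger blocks.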
Dedup benchmark
•Algorithm is a pipeline
•The last pipeline stage is serial
•The “Steps Priorities” feature makes Dedup run 1.4 times faster
Chart: Relative Performance (0 to 2) vs. Threads (1 to 8). Series: Intel® Concurrent Collections, Pthreads.
Possible model improvements
• Memory management
• Garbage collection
• Automatic grain selection
• Streaming data input
Getting More Information
Intel® Concurrent Collections for C/C++ on WhatIf.intel.com:
http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc
Questions & Answers
Thank you!