57
Chapter 3 Parallel Algorithm Design Parallel Algorithm Design

Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Embed Size (px)

Citation preview

Page 1: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Chapter 3

Parallel Algorithm DesignParallel Algorithm Design

Page 2: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Outline

Task/channel modelTask/channel model Algorithm design methodologyAlgorithm design methodology Case studiesCase studies

Page 3: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Task/Channel Model

Parallel computation = set of tasksParallel computation = set of tasks TaskTask

ProgramProgram Local memoryLocal memory Collection of I/O portsCollection of I/O ports

Tasks interact by sending messages through Tasks interact by sending messages through channelschannels

Page 4: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Task/Channel Model

TaskTaskChannelChannel

Page 5: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Foster’s Design Methodology

PartitioningPartitioning CommunicationCommunication AgglomerationAgglomeration MappingMapping

Page 6: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Foster’s Methodology

ProblemPartitioning

Communication

AgglomerationMapping

Page 7: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Partitioning Dividing computation and data into piecesDividing computation and data into pieces Domain decompositionDomain decomposition

Divide data into piecesDivide data into pieces e.g., An array into sub-arrays (reduction); A loop into e.g., An array into sub-arrays (reduction); A loop into

sub-loops (matrix multiplication), A search space into sub-loops (matrix multiplication), A search space into sub-spaces (chess)sub-spaces (chess)

Determine how to associate computations with Determine how to associate computations with the datathe data

Functional decompositionFunctional decomposition Divide computation into piecesDivide computation into pieces

e.g., pipelines (floating point multiplication), workflows e.g., pipelines (floating point multiplication), workflows (pay roll processing)(pay roll processing)

Determine how to associate data with the Determine how to associate data with the computationscomputations

Page 8: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Example Domain Decompositions

Page 9: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Example Functional Decomposition

Page 10: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Partitioning Checklist

Large Grained TasksLarge Grained Tasks e.g, at least 10x more primitive tasks than e.g, at least 10x more primitive tasks than

processors in target computerprocessors in target computer Balance LoadBalance Load

Primitive tasks roughly the same sizePrimitive tasks roughly the same size ScalableScalable

Number of tasks an increasing function Number of tasks an increasing function of problem sizeof problem size

Page 11: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Communication

Determine values passed among tasksDetermine values passed among tasks Local communicationLocal communication

Task needs values from a small number of other Task needs values from a small number of other taskstasks

Create channels illustrating data flowCreate channels illustrating data flow Global communicationGlobal communication

Significant number of tasks contribute data to Significant number of tasks contribute data to perform a computationperform a computation

Don’t create channels for them early in designDon’t create channels for them early in design

Page 12: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Communication Checklist

BalancedBalanced Communication operations balanced among tasksCommunication operations balanced among tasks

Small degree:Small degree: Each task communicates with only small group of Each task communicates with only small group of

neighborsneighbors ConcurrencyConcurrency

Tasks can perform communications concurrentlyTasks can perform communications concurrently Task can perform computations concurrentlyTask can perform computations concurrently

Page 13: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Agglomeration

Grouping tasks into larger tasksGrouping tasks into larger tasks GoalsGoals

Improve performanceImprove performance Maintain scalability of programMaintain scalability of program Simplify programmingSimplify programming

In MPI programming, goal often to create In MPI programming, goal often to create one agglomerated task per processorone agglomerated task per processor

Page 14: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Agglomeration Can Improve Performance Eliminate communication between Eliminate communication between

primitive tasks agglomerated into primitive tasks agglomerated into consolidated taskconsolidated task

Combine groups of sending and receiving Combine groups of sending and receiving taskstasks

Page 15: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Agglomeration Checklist

Locality of parallel algorithm has increasedLocality of parallel algorithm has increased Tradeoff between agglomeration and code Tradeoff between agglomeration and code

modifications costs is reasonablesmodifications costs is reasonables Agglomerated tasks have similar computational Agglomerated tasks have similar computational

and communications costsand communications costs Number of tasks increases with problem sizeNumber of tasks increases with problem size Number of tasks suitable for likely target systemsNumber of tasks suitable for likely target systems

Page 16: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Mapping

Process of assigning tasks to processorsProcess of assigning tasks to processors Centralized multiprocessor: mapping done Centralized multiprocessor: mapping done

by operating systemby operating system Distributed memory system: mapping done Distributed memory system: mapping done

by userby user Conflicting goals of mappingConflicting goals of mapping

Maximize processor utilizationMaximize processor utilization Minimize interprocessor communicationMinimize interprocessor communication

Page 17: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Mapping Example

Page 18: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Optimal Mapping

Finding optimal mapping is NP-hardFinding optimal mapping is NP-hard Must rely on heuristicsMust rely on heuristics

Page 19: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Mapping Decision Tree

Static number of tasksStatic number of tasks Structured communicationStructured communication

Constant computation time per taskConstant computation time per task• Agglomerate tasks to minimize commAgglomerate tasks to minimize comm• Create one task per processorCreate one task per processor

Variable computation time per taskVariable computation time per task• Cyclically map tasks to processorsCyclically map tasks to processors

Unstructured communicationUnstructured communication• Use a static load balancing algorithmUse a static load balancing algorithm

Dynamic number of tasksDynamic number of tasks

Page 20: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Mapping Strategy

Static number of tasksStatic number of tasks Dynamic number of tasksDynamic number of tasks

Use a run-time task-scheduling algorithmUse a run-time task-scheduling algorithm• e.g., a master slave strategye.g., a master slave strategy

Use a dynamic load balancing algorithmUse a dynamic load balancing algorithm• e.g., share load among neighboring e.g., share load among neighboring

processors; remapping periodically processors; remapping periodically

Page 21: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Mapping Checklist

Considered designs based on one task per Considered designs based on one task per processor and multiple tasks per processorprocessor and multiple tasks per processor If multiple task per processor chosen, If multiple task per processor chosen,

ratio of tasks to processors is at least 10:1ratio of tasks to processors is at least 10:1 Evaluated static and dynamic task Evaluated static and dynamic task

allocationallocation If dynamic task allocation chosen, task If dynamic task allocation chosen, task

allocator is not a bottleneck to performanceallocator is not a bottleneck to performance

Page 22: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Case Studies

Boundary value problemBoundary value problem Finding the maximumFinding the maximum The n-body problemThe n-body problem Adding data inputAdding data input

Page 23: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Boundary Value Problem

Ice water Rod Insulation

Page 24: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Rod Cools as Time Progresses

Page 25: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finite Difference Approximation

Page 26: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Partitioning

One data item per grid pointOne data item per grid point Associate one primitive task with each grid Associate one primitive task with each grid

pointpoint Two-dimensional domain decompositionTwo-dimensional domain decomposition

Page 27: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Communication

Identify communication pattern between Identify communication pattern between primitive tasksprimitive tasks

Each interior primitive task has three Each interior primitive task has three incoming and three outgoing channelsincoming and three outgoing channels

Page 28: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Agglomeration and Mapping

Agglomeration

Page 29: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Sequential execution time

– – time to update elementtime to update element nn – number of elements – number of elements mm – number of iterations – number of iterations Sequential execution timeSequential execution time: mn: mn

Page 30: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Parallel Execution Time

p p – number of processors– number of processors – – message latencymessage latency Parallel execution time Parallel execution time mm((nn//pp+2+2))

Page 31: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finding the Maximum ErrorComputed 0.15 0.16 0.16 0.19Correct 0.15 0.16 0.17 0.18Error (%) 0.00% 0.00% 6.25% 5.26%

6.25%

Page 32: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Reduction

Given associative operator Given associative operator aa00 aa11 aa22 … … aan-1n-1

ExamplesExamples AddAdd MultiplyMultiply And, OrAnd, Or Maximum, MinimumMaximum, Minimum

Page 33: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Parallel Reduction Evolution

Page 34: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Parallel Reduction Evolution

Page 35: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Parallel Reduction Evolution

Page 36: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Binomial Trees

Subgraph of hypercube

Page 37: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finding Global Sum

4 2 0 7

-3 5 -6 -3

8 1 2 3

-4 4 6 -1

Page 38: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finding Global Sum

1 7 -6 4

4 5 8 2

Page 39: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finding Global Sum

8 -2

9 10

Page 40: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finding Global Sum

17 8

Page 41: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Finding Global Sum

25

Binomial Tree

Page 42: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Agglomeration

Page 43: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Agglomeration

sum

sum sum

sum

Page 44: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

The n-body Problem

Page 45: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

The n-body Problem

Page 46: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Partitioning

Domain partitioningDomain partitioning Assume one task per particleAssume one task per particle Task has particle’s position, velocity vectorTask has particle’s position, velocity vector IterationIteration

Get positions of all other particlesGet positions of all other particles Compute new position, velocityCompute new position, velocity

Page 47: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Gather

Page 48: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

All-gather

Page 49: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Complete Graph for All-gather

Page 50: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Hypercube for All-gather

Page 51: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Communication Time

p

pnp

p

np

i β

β )1(

log2

log

1

1-i −+=⎟⎟

⎞⎜⎜⎝

⎛+∑

=

Hypercube

Complete graph

p

pnp

pnp

β

β )1(

)1()/

)(1(−

+−=+−

Page 52: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Adding Data Input

Page 53: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Scatter

Page 54: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Scatter in log p Steps

12345678 56781234 56 12

7834

Page 55: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Summary: Task/channel Model

Parallel computationParallel computation Set of tasksSet of tasks Interactions through channelsInteractions through channels

Good designsGood designs Maximize local computationsMaximize local computations Minimize communicationsMinimize communications Scale upScale up

Page 56: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Summary: Design Steps

Partition computationPartition computation Agglomerate tasksAgglomerate tasks Map tasks to processorsMap tasks to processors GoalsGoals

Maximize processor utilizationMaximize processor utilization Minimize inter-processor communicationMinimize inter-processor communication

Page 57: Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies

Summary: Fundamental Algorithms

ReductionReduction Gather and scatterGather and scatter All-gatherAll-gather