Principles of Parallel Algorithm Design Prof. Dr. Cevdet Aykanat Bilkent Üniversitesi Bilgisayar Mühendisliği Bölümü




Page 1

Principles of Parallel Algorithm Design

Prof. Dr. Cevdet Aykanat

Bilkent Üniversitesi

Bilgisayar Mühendisliği Bölümü

Page 2

• Identifying concurrent tasks

• Mapping tasks onto multiple processes

• Distributing input, output, intermediate data

• Managing access to shared data

• Synchronizing processors

Principles of Parallel Algorithm Design

Page 3

• several choices for each step

• relatively few combinations lead to a good parallel algorithm

• different choices yield best performance on

– different parallel architectures
– different parallel programming paradigms

Principles of Parallel Algorithm Design


Page 4

Decomposition, Tasks

• decomposition:
– dividing a computation into smaller parts
– some or all parts can be executed concurrently

• atomic task
– user defined
– indivisible units of computation
– same size or different size

Page 5 (figure)

Page 6

Task Dependence Graphs (TDG)

• directed acyclic graph
• nodes: atomic tasks
• directed edges: dependencies

– some tasks use data produced by other tasks

• TDG can be weighted:
– node wgt: amount of computation
– edge wgt: amount of data

• multiple ways of expressing certain computations
– different ways of arranging computations
– lead to different TDGs

Pages 7-9 (figures)

Page 10

Granularity, Concurrency

• granularity: number (#) and size of tasks
– fine grain: large # of small tasks
– coarse grain: small # of large tasks

• degree of concurrency (DoC):
– # of tasks that can be executed simultaneously

• max DoC: maximum DoC at any given time
– tree TDGs: max DoC = # of leaves (usually)

• avg DoC: average DoC over the entire duration of execution

Page 11 (figure)

Page 12

Degree of Concurrency

• depends on granularity

– finer task granularity: larger DoC
– there is a bound on how fine the granularity of a decomposition can be

• depends on shape of TDG
– shallow and wide TDG: larger DoC
– deep and thin TDG: smaller DoC
– critical path:

• longest directed path between a start node and a finish node

– critical path length = sum of node wgts along the path
– avg DoC = total work / critical path length
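As a concrete illustration of the last two bullets, here is a short Python sketch that computes the critical-path length and average DoC for a node-weighted TDG. The graph and weights below are a hypothetical example, not one taken from the slides:

```python
# deps: task -> list of tasks that depend on it (a small DAG)
deps = {1: [4], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: []}
wgt = {1: 10, 2: 10, 3: 10, 4: 6, 5: 6, 6: 8}   # node weights (work per task)

def critical_path_length(deps, wgt):
    memo = {}
    def longest_from(v):
        # longest weighted directed path starting at node v
        if v not in memo:
            memo[v] = wgt[v] + max((longest_from(u) for u in deps[v]), default=0)
        return memo[v]
    return max(longest_from(v) for v in deps)

total_work = sum(wgt.values())                   # 50
cpl = critical_path_length(deps, wgt)            # 24 (e.g., 1 -> 4 -> 6)
avg_doc = total_work / cpl                       # avg DoC = total work / critical path length
```

With these weights the average DoC is 50/24, i.e. roughly two tasks can run concurrently on average.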

Page 13 (figure)

Page 14

Task Interaction Graph (TIG)

• tasks share input, output or intermediate data

• interactions among independent tasks of a TDG

• TIG: pattern of interactions among tasks
– node: task
– edge: connects tasks that interact with each other

• TIG can be weighted:
– node wgt: amount of computation
– edge wgt: amount of interaction

Page 15 (figure)

Page 16

Processes and Mapping

• process vs processor:

– process: a logical computing agent that performs tasks

• mapping: assigning tasks to processes
• conflicting goals in a good mapping:

– maximize concurrency
• map independent tasks to different processes

– minimize idle time / interaction overhead
• map tasks along critical path to same process
• map tasks with high interaction to same process
• e.g., mapping all tasks to the same process removes interaction but also all concurrency

Page 17 (figure)

Page 18

Decomposition Techniques

• recursive decomposition

• data decomposition

• exploratory decomposition

• speculative decomposition

Page 19

Recursive Decomposition

• divide-and-conquer (DAC) strategy → natural concurrency
• divide: problem into a set of independent subproblems
• conquer: recursively solve each subproblem
• combine: solns to subproblems into a soln of the problem
• if sequential algorithm is not based on DAC:

– restructure computation as a DAC algorithm

– recursive decomposition to extract concurrency

– e.g., finding minimum of an array A of n numbers
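The minimum-of-an-array example can be sketched as the following divide-and-conquer recursion. Each call splits the problem into two independent subproblems that a parallel implementation could solve concurrently; this sequential Python sketch only exposes that structure (`rec_min` is an illustrative name):

```python
def rec_min(a, lo, hi):
    # minimum of a[lo:hi] by divide-and-conquer
    if hi - lo == 1:
        return a[lo]                 # base case: single element
    mid = (lo + hi) // 2
    left = rec_min(a, lo, mid)       # independent subproblem 1
    right = rec_min(a, mid, hi)      # independent subproblem 2
    return min(left, right)          # combine step

print(rec_min([4, 9, 1, 7, 8, 11, 2, 12], 0, 8))  # → 1
```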

Pages 20-22 (figures)

Page 23

Data Decomposition

• partition/decompose computational data domain
• use this partition to induce task decomposition

– tasks: similar operations on different data parts

• partitioning output data
– each output can be computed independently as a fn of input

– example: block matrix multiplication

– data decomposition may not lead to a unique task decomposition

– another example: computing itemset frequencies
• input: transactions & output: itemset frequencies
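As a rough sketch of output partitioning, the block matrix multiplication example can be written so that each output block is an independent task computed only from rows of A and columns of B (toy Python with plain lists; the task list and names are illustrative):

```python
def matmul_output_partitioned(A, B, n, b):
    # n: matrix dimension, b: block size (assumed to divide n)
    C = [[0] * n for _ in range(n)]
    # one task per output block C[bi:bi+b][bj:bj+b]
    tasks = [(bi, bj) for bi in range(0, n, b) for bj in range(0, n, b)]
    for bi, bj in tasks:             # each task is independent of the others
        for i in range(bi, bi + b):
            for j in range(bj, bj + b):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C
```

Because every output entry depends only on (read-only) input data, the tasks share no written data and could all run concurrently.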

Pages 24-27 (figures)

Page 28

Data Decomposition

• partitioning input data

– may not be possible or desirable to partition output data
• e.g., finding min or sum of a set of numbers, sorting

– a task created for each part of the input data

– task: all computations that can be done using local data

– a combine step may be needed to combine results of tasks

– example: finding the sum of an array A of n numbers

– example: computing itemset frequencies

• partitioning both output and input data
– output data partitioning is feasible

– partitioning of input data offers additional concurrency

– example: computing itemset frequencies
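The array-sum example above can be sketched as input partitioning with a combine step. Each task sums only its local part of the input; a final step combines the partial results (`parallel_sum` and `parts` are illustrative names, and the per-part sums run sequentially in this sketch):

```python
def parallel_sum(a, p):
    n = len(a)
    # split the input into p parts: one task per part, local data only
    parts = [a[k * n // p:(k + 1) * n // p] for k in range(p)]
    partial_sums = [sum(part) for part in parts]   # independent tasks
    return sum(partial_sums)                       # combine step

print(parallel_sum(list(range(100)), 4))  # → 4950
```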

Page 29

Data Decomposition

• partitioning intermediate data
– multistage computations

• partitioning input or output data of an intermediate stage

– may lead to higher concurrency

– some restructuring of the algorithm may be needed

– example: block matrix multiplication

• owner-computes rule
– each part performs all computations involving data it owns

– input: perform all computations that can be done using local data

– output: compute all data in the partition

Pages 30-32 (figures)

Page 33

Other Decomposition Techniques

• exploratory decomposition
– search of a configuration space for a solution
– partition the search space into smaller parts
– search each part concurrently
– total parallel work may be <, =, or > total serial work
– example: 15-puzzle problem

• speculative decomposition
– speculatively execute tasks that may be needed, before their need is known

• hybrid decompositions
– computation structured into multiple stages
– may apply different decompositions in different stages
– examples: finding min of an array and quicksort
– data decomposition then recursive decomposition

Pages 34-38 (figures)

Page 39

Characteristics of Tasks

• task generation: static vs dynamic
– static: all tasks are known a priori to execution of algorithm

• data decomposition: matrix multiplication
• recursive decomposition: finding min of an array

– dynamic: actual tasks and TDG/TIG not available a priori
• rules, guidelines governing task generation may be known
• recursive decomposition: quicksort
• another example: ray tracing

• task sizes: uniform vs non-uniform
– complexity of mapping depends on this
– tasks in matrix multiplication: uniform
– tasks in quicksort: non-uniform

Page 40

Characteristics of Tasks

• knowledge of task sizes
– can be used in mapping
– known: tasks in decompositions for matrix multiplication
– unknown: tasks in 15-puzzle problem

• do not know a priori how many moves will lead to a soln.

• size of data associated with tasks
– associated data must be available to the process
– size and location of the associated data matter
– consider data migration overhead in the mapping

Page 41

Characteristics of Inter-Task Interactions

• static vs dynamic

– static: pattern and timing of interactions known a priori
– static interaction: decompositions for matrix multiplication
– message-passing paradigm (MPP):

• active involvement of both interacting tasks
• static interactions easy to program
• dynamic interactions harder to program
• tasks assigned additional synchronization and polling responsibilities

– shared-address-space paradigm (SASP): can handle both equally easily

• regular vs irregular (spatial structure)

– regular: structure that can be exploited for efficient implementation
• structured/curvilinear grids (implicit connectivity)
• image dithering (example)

– irregular: no such regular pattern exists
• unstructured grids (connectivity maintained explicitly)
• SpMxV (sparse matrix-vector multiplication)

– irregular and dynamic interactions harder to handle in MPP

Page 42 (figure)

Page 43

Characteristics of Inter-Task Interactions

• read-only vs read-write

– read-only: tasks require read-only access to shared data
– example: decompositions for matrix multiplication
– read-write: tasks need to read and write shared data
– example: heuristic search for 15-puzzle problem

• one-way vs two-way
– 2-way: data/work needed by a task explicitly supplied by another
– usually involves predefined producer and consumer
– 1-way: only one of a pair of communicating tasks initiates & completes the interaction
– read-only → 1-way & read-write → either 1-way or 2-way
– SASP can handle both interactions equally easily
– MPP cannot handle 1-way interactions directly

• source of data should explicitly send it to the recipient
• static 1-way: easily converted to 2-way via program restructuring
• dynamic 1-way: nontrivial program restructuring for converting to 2-way

– polling: task checks for pending requests from others at regular intervals

Page 44

Mapping Techniques

• minimize overheads of parallel task execution
– overhead: inter-process interaction
– overhead: process idle time (uneven load distribution)

• load balancing
– balanced aggregate load: necessary but not sufficient
– computations & interactions should be well balanced at each stage
– example: 12-task decomposition (tasks 9-12 depend on tasks 1-8)

Page 45 (figure)

Page 46

Static vs Dynamic Mapping

• static: distribute tasks prior to execution

– static task generation: either static or dynamic mapping
– good mapping: needs knowledge of task sizes, data sizes, TIG
– non-trivial problem (usually NP-hard)
– task sizes known but non-uniform:

• even if no TDG/TIG → number partitioning problem

• dynamic: distribute workload during execution
– dynamic task generation: dynamic mapping
– task sizes unknown: dynamic mapping more effective
– large data size: dynamic mapping costly (in MPP)

Page 47

Static-Mapping Schemes

• mapping based on data partitioning

– data partitioning induces a decomposition
– partitioning selected with final mapping in mind

• i.e., p-way data decomposition

– dense arrays
– sparse data structures, graphs (FE meshes)

• mapping based on task partitioning
– task dependence graphs, task interaction graphs

• hierarchical partitioning
– hybrid decomposition and mapping techniques

Page 48

Array Distribution Schemes

• block distributions: exploit spatial locality of interaction
– each process receives a contiguous block of entries
– 1D: each part contains a block of consecutive rows

• i.e., kth part contains rows kn/p ... (k+1)n/p-1

– 2D: checkerboard partitioning
– higher-dimensional distributions:

• higher degree of concurrency

• less inter-process interaction

• example: matrix multiplication
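The 1D block formula above (part k owns rows kn/p ... (k+1)n/p-1) can be checked with a few lines of Python; integer division also handles the case where p does not divide n (`block_range` is an illustrative name):

```python
def block_range(k, n, p):
    # inclusive [first, last] row range owned by part k in a 1D block
    # distribution of n rows over p parts: rows k*n/p .. (k+1)*n/p - 1
    return (k * n // p, (k + 1) * n // p - 1)

# 10 rows over 4 parts:
print([block_range(k, 10, 4) for k in range(4)])
# → [(0, 1), (2, 4), (5, 6), (7, 9)]
```

Note the parts cover all rows exactly once and differ in size by at most one block of rows, which keeps the load roughly balanced when work per row is uniform.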

Pages 49-55 (figures)

Page 56

Array Distribution Schemes

• cyclic distribution

– amount of work differs for different matrix entries
• examples: ray casting, dense LU factorization
• block distribution leads to load imbalance

– all processes have tasks from all parts of the matrix
– good load balance, but complete loss of locality

• block-cyclic distribution
– partition array into more than p blocks
– map blocks to processes in a round-robin (scattered) manner

• randomized block distribution
– useful when the distribution of work has some special pattern

• adaptive 2D array partitionings
– rectilinear, jagged, orthogonal bisection
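A minimal sketch of the block-cyclic idea, assuming a 1D layout: the owner of a global index is its block index taken round-robin over the p processes, so every process receives blocks from all regions of the array (`block_cyclic_owner` is an illustrative name):

```python
def block_cyclic_owner(i, block_size, p):
    # process that owns global index i under a 1D block-cyclic layout:
    # blocks of block_size consecutive indices are dealt round-robin
    return (i // block_size) % p

# 16 indices, block size 2, p = 4 processes:
print([block_cyclic_owner(i, 2, 4) for i in range(16)])
# → [0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 2, 2, 3, 3]
```

With block_size = n/p this degenerates to a plain block distribution; with block_size = 1 it is the fully cyclic distribution.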

Pages 57-67 (figures)

Page 68

Dynamic Mapping Schemes

• centralized schemes
– all tasks maintained in a common pool or by a master process
– idle processes take task(s) from central pool or master process
– easier to implement
– limited scalability: central pool/process becomes a bottleneck
– chunk scheduling: idle processes get a group of tasks at a time

• danger of load imbalance due to large chunk sizes
• decrease chunk size as program progresses

– e.g., sorting entries in each row of a matrix
• non-uniform tasks & unknown task sizes

– e.g., image-space parallel ray casting
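The shrinking-chunk idea above can be sketched as a single-threaded simulation in the style of guided self-scheduling: each grab takes a fraction of the remaining tasks, so chunks start large and shrink as the pool drains (`chunked_schedule` is an illustrative name; a real implementation would hand chunks to idle worker processes under a lock):

```python
def chunked_schedule(n_tasks, p):
    # simulate the order in which a central pool hands out chunks
    remaining = n_tasks
    next_task = 0
    chunks = []                          # list of (first_task, count) handed out
    while remaining > 0:
        chunk = max(1, remaining // p)   # chunk size decreases as work drains
        chunks.append((next_task, chunk))
        next_task += chunk
        remaining -= chunk
    return chunks

print(chunked_schedule(20, 4))
# first chunk covers 5 tasks, later chunks shrink toward size 1
```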

Page 69

Dynamic Mapping Schemes

• distributed schemes
– tasks are distributed among processes
– more scalable (no central bottleneck)
– critical parameters of distributed load balancing:

• how are sending and receiving processes paired?
• who initiates the work transfer: sender or receiver?
• how much work is transferred in each exchange?
• when is the work transfer performed?

• suitability to parallel architectures
– both can be implemented in both SAS and MP paradigms
– dynamic schemes require movement of tasks
– computational granularity of tasks should be high in MP systems

Page 70

Methods for Interaction Overheads

• factors:

– volume and frequency of interaction
– spatial and temporal pattern of interactions

• maximizing data locality
– minimize volume of data exchange

• minimize overall volume of shared data
• similar to maximizing temporal data locality

– minimize frequency of interaction
• high startup cost associated with each interaction
• restructure algorithm: shared data accessed in large pieces
• similar to increasing spatial locality of data access

• minimizing contention and hot spots
– multiple tasks try to access same resource concurrently

• multiple simultaneous accesses to same memory block/bank
• multiple processes sending messages to same process at the same time

Page 71

Methods for Interaction Overheads

• minimizing contention and hot spots

– multiple tasks try to access same resource concurrently
• multiple simultaneous accesses to same memory block/bank

• multiple processes sending messages to same process simult.

– e.g., matrix multiplication based on 2D partitioning

• overlapping computations with interactions
– early initiation of an interaction
– support from programming paradigm, OS, hardware
– MP: non-blocking message-passing primitives

Page 72

Methods for Interaction Overheads

• replicating data or computation
– replicating frequently accessed read-only shared data
– MP paradigm benefits more from data replication
– replicated computation for shared intermediate results

• using optimized collective interaction operations
– usually use available implementations (e.g., by MPI)
– sometimes, it may be better to write your own procedure

• overlapping interactions with other interactions
– example: one-to-all broadcast

Page 73 (figure)

Page 74

Parallel Algorithm Models

• data-parallel model

– data parallelism: identical operations applied concurrently on different data items

• task graph model
– task parallelism: independent tasks in a TDG
– quicksort, sparse matrix factorization

• work-pool or task-pool model
– dynamic mapping of tasks onto processes
– mapping may be centralized or distributed

Page 75

Parallel Algorithm Models

• master-slave or manager-worker model

– master process generates work & allocates to worker processes

• pipeline or producer-consumer model
– stream parallelism: execution of diff. programs on a data stream

– each process in the pipeline is:
• a consumer of the sequence of data items produced by the preceding process

• producer of data for the process following in the pipeline

– pipeline may not be a linear chain (it can be a DAG)

• hybrid models
– multiple models applied hierarchically

– multiple models applied sequentially to different stages