290 Scalability 12.1 Isoefficiency Metric of Scalability
12 Parallel Algorithm Scalability¹
12.1 Isoefficiency Metric of Scalability
• For a given problem size, as we increase the number of processing elements, the overall efficiency of the parallel system goes down for a given algorithm.
• For some systems, the efficiency of a parallel system increases if the problem size is increased while keeping the number of processing elements constant.
¹Based on [1]
Figure caption: Variation of efficiency (a) as the number of processing elements is increased for a given problem size; and (b) as the problem size is increased for a given number of processing elements. The phenomenon illustrated in graph (b) is not common to all parallel systems.
• What is the rate at which the problem size must increase with respect to the number of processing elements to keep the efficiency fixed?
• This rate determines the scalability of the system. The slower this rate, the better.
• Before we formalize this rate, we define the problem size W as the asymptotic number of operations associated with the best serial algorithm to solve the problem.
• We can write the parallel runtime as:

TP = (W + To(W, p)) / p
• The resulting expression for speedup is:

S = W / TP = W p / (W + To(W, p))
• Finally, we write the expression for efficiency as:

E = S / p = W / (W + To(W, p)) = 1 / (1 + To(W, p)/W)
• For scalable parallel systems, efficiency can be maintained at a fixed value (between 0 and 1) if the ratio To/W is maintained at a constant value.
• For a desired value E of efficiency,

E = 1 / (1 + To(W, p)/W)

To(W, p) / W = (1 − E) / E

W = (E / (1 − E)) To(W, p)
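This closed form is easy to confirm numerically; a minimal sketch (the helper name `efficiency` and the overhead value are ours, chosen only for illustration):

```python
# Check the closed form W = (E / (1 - E)) * To: plugging it back into
# E = 1 / (1 + To/W) must recover the target efficiency.
def efficiency(W, To):
    return 1 / (1 + To / W)

To = 1000.0                            # arbitrary overhead value
for E_target in (0.5, 0.8, 0.9):
    W = E_target / (1 - E_target) * To
    assert abs(efficiency(W, To) - E_target) < 1e-12
print("closed form verified")
```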
• If K = E/(1 − E) is a constant depending on the efficiency to be maintained, then, since To is a function of W and p, we have
W = KTo(W, p)
• The problem size W can usually be obtained as a function of p by algebraic manipulations to keep efficiency constant.
• This function is called the isoefficiency function.
• This function determines the ease with which a parallel system can maintain a constant efficiency and hence achieve speedups increasing in proportion to the number of processing elements.
Example: Adding n numbers on p processors
• The overhead function for the problem of adding n numbers on p processing elements is approximately 2p log p:
  – The first stage of the algorithm (summing n/p numbers locally) takes n/p time.
  – The second stage involves log p steps, each with one communication and one addition. Assuming a single communication takes unit time, this stage takes 2 log p time.
TP = n/p + 2 log p

S = n / (n/p + 2 log p)

E = 1 / (1 + (2 p log p) / n)
• Doing as before, but substituting To by 2p log p, we get

W = 2 K p log p
• Thus, the asymptotic isoefficiency function for this parallel system is Θ(p log p).
• If the number of processing elements is increased from p to p′, the problem size (in this case, n) must be increased by a factor of (p′ log p′)/(p log p) to get the same efficiency as on p processing elements.
Efficiencies for different n and p:
n p = 1 p = 4 p = 8 p = 16 p = 32
64 1.0 0.80 0.57 0.33 0.17
192 1.0 0.92 0.80 0.60 0.38
320 1.0 0.95 0.87 0.71 0.50
512 1.0 0.97 0.91 0.80 0.62
Example 2: Let To = p^(3/2) + p^(3/4) W^(3/4)
• Using only the first term of To in W = K To(W, p), we get

W = K p^(3/2)

• Using only the second term yields the following relation between W and p:

W = K p^(3/4) W^(3/4)
W^(1/4) = K p^(3/4)
W = K^4 p^3
• The larger of these two asymptotic rates determines the isoefficiency. This is given by Θ(p^3).
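As a quick numeric sanity check (a sketch; the constant K = 2 is an arbitrary choice), one can verify that W = K^4 p^3 asymptotically balances W = K To(W, p):

```python
# With W = K**4 * p**3 and To = p**1.5 + p**0.75 * W**0.75, the ratio
# K * To / W tends to 1 as p grows, so W = Theta(p**3) balances W = K * To.
K = 2.0
for p in (10, 1_000, 100_000):
    W = K**4 * p**3
    To = p**1.5 + p**0.75 * W**0.75
    print(p, K * To / W)               # ratio approaches 1 from above
```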
12.2 Cost-Optimality and the Isoefficiency Function
• A parallel system is cost-optimal if and only if

p TP = Θ(W)

• From this, we have:

W + To(W, p) = Θ(W)
To(W, p) = O(W)
W = Ω(To(W, p))
=> A parallel system is cost-optimal if and only if its overhead function does not asymptotically exceed the problem size.
• If we have an isoefficiency function f(p), then it follows that the relation W = Ω(f(p)) must be satisfied to ensure the cost-optimality of a parallel system as it is scaled up.
Lower Bound on the Isoefficiency Function
• For a problem consisting of W units of work, no more than W processing elements can be used cost-optimally.
• The problem size must increase at least as fast as Θ(p) to maintain fixed efficiency; hence, Ω(p) is the asymptotic lower bound on the isoefficiency function.
Degree of Concurrency and the Isoefficiency Function
• The maximum number of tasks that can be executed simultaneously at any time in a parallel algorithm is called its degree of concurrency.
• If C(W) is the degree of concurrency of a parallel algorithm, then for a problem of size W, no more than C(W) processing elements can be employed effectively.
Degree of Concurrency and the Isoefficiency Function: Example
• Consider solving a system of n equations in n variables by using Gaussian elimination (W = Θ(n^3)).
• The n variables must be eliminated one after the other, and eliminating each variable requires Θ(n^2) computations.
• At most Θ(n^2) processing elements can be kept busy at any time.
• Since W = Θ(n^3) for this problem, the degree of concurrency C(W) is Θ(W^(2/3)).
• Given p processing elements, the problem size should be at least Ω(p^(3/2)) to use them all.
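In code form (the function name is ours, and constants are ignored), the constraint p ≤ C(W) = W^(2/3) inverts to a minimum problem size of p^(3/2):

```python
import math

# p <= C(W) = W**(2/3) (ignoring constants) inverts to W >= p**1.5,
# the smallest problem size that keeps all p processing elements busy.
def min_problem_size(p):
    return math.ceil(p ** 1.5)

for p in (4, 16, 64, 256):
    print(p, min_problem_size(p))
```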
Scalability of Parallel Graph Algorithms
12.1 Definitions and Representation
12.2 Minimum Spanning Tree: Prim’s Algorithm
12.3 Single-Source Shortest Paths: Dijkstra’s Algorithm
12.4 All-Pairs Shortest Paths
12.5 Connected Components
12.6 Algorithms for Sparse Graphs
12.1 Definitions and Representation
• An undirected graph G is a pair (V, E), where V is a finite set of points called vertices and E is a finite set of edges.
• An edge e ∈ E is an unordered pair (u, v), where u, v ∈ V.
• In a directed graph, the edge e is an ordered pair (u, v). An edge (u, v) is incident from vertex u and is incident to vertex v.
• A path from a vertex v to a vertex u is a sequence ⟨v0, v1, v2, ..., vk⟩ of vertices where v0 = v, vk = u, and (vi, vi+1) ∈ E for i = 0, 1, ..., k − 1.
• The length of a path is defined as the number of edges in the path.
• An undirected graph is connected if every pair of vertices is connected by a path.
• A forest is an acyclic graph, and a tree is a connected acyclic graph.
• A graph that has weights associated with each edge is called a weighted graph.
• Graphs can be represented by their adjacency matrix or an edge (or vertex) list.
• Adjacency matrices have a value a_ij = 1 if nodes i and j share an edge, and 0 otherwise. In the case of a weighted graph, a_ij = w_ij, the weight of the edge.
• The adjacency list representation of a graph G = (V, E) consists of an array Adj[1..|V|] of lists. Each list Adj[v] is a list of all vertices adjacent to v.
• For a graph with n nodes, the adjacency matrix takes Θ(n^2) space and the adjacency list takes Θ(|E|) space.
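As a concrete sketch of both representations (the 4-vertex weighted graph is invented for illustration):

```python
# A small weighted undirected graph: vertices 0..3, (u, v, weight) edges
edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (2, 3, 3.0)]
n = 4

# Adjacency matrix: Theta(n^2) space; a[i][j] = weight (0.0 = no edge here)
a = [[0.0] * n for _ in range(n)]
for u, v, w in edges:
    a[u][v] = a[v][u] = w

# Adjacency list: Theta(|E|) space; adj[v] = list of (neighbour, weight)
adj = [[] for _ in range(n)]
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))

print(a[0][1], adj[2])
```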
• A spanning tree of an undirected graph G is a subgraph of G that is a tree containing all the vertices of G.
• In a weighted graph, the weight of a subgraph is the sum of the weights of the edges in the subgraph.
• A minimum spanning tree (MST) for a weighted undirected graph is a spanning tree with minimum weight.
12.2 Minimum Spanning Tree: Prim’s Algorithm
• Prim’s algorithm for finding an MST is a greedy algorithm.
• Start by selecting an arbitrary vertex and include it in the current MST.
• Grow the current MST by inserting into it the vertex closest to one of the vertices already in the current MST.
procedure PRIM_MST(V, E, w, r)
begin
    VT := {r};
    d[r] := 0;
    for all v ∈ (V − VT) do
        if edge (r, v) exists then set d[v] := w(r, v);
        else set d[v] := ∞;
        endif
    while VT ≠ V do
    begin
        find a vertex u such that d[u] := min{d[v] | v ∈ (V − VT)};
        VT := VT ∪ {u};
        for all v ∈ (V − VT) do
            d[v] := min{d[v], w(u, v)};
    endwhile
end PRIM_MST
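The pseudocode translates almost line for line into a runnable sequential version; a Python sketch (the 4-vertex test graph is made up for illustration):

```python
import math

def prim_mst(w, r=0):
    """Prim's MST on an adjacency matrix w (math.inf = no edge), rooted
    at r. Mirrors the pseudocode above; returns the total MST weight.
    O(n^2) time for n vertices."""
    n = len(w)
    in_tree = [False] * n
    in_tree[r] = True
    d = [w[r][v] for v in range(n)]    # d[v] = distance of v to the tree
    d[r] = 0.0
    total = 0.0
    for _ in range(n - 1):
        # find the vertex u outside V_T with minimum d[u]
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: d[v])
        in_tree[u] = True
        total += d[u]
        for v in range(n):             # update d for vertices still outside
            if not in_tree[v]:
                d[v] = min(d[v], w[u][v])
    return total

INF = math.inf
w = [[INF, 1.0, 3.0, INF],
     [1.0, INF, 1.0, 6.0],
     [3.0, 1.0, INF, 2.0],
     [INF, 6.0, 2.0, INF]]
print(prim_mst(w))  # -> 4.0 (edges 0-1, 1-2, 2-3)
```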
Parallel formulation
• The algorithm works in n outer iterations; it is hard to execute these iterations concurrently.
• The inner loop is relatively easy to parallelize. Let p be the number of processes, and let n be the number of vertices.
• The adjacency matrix is partitioned in a 1-D block fashion, with the distance vector d partitioned accordingly.
• In each step, a processor selects the locally closest node, followed by a global reduction to select the globally closest node.
• This node is inserted into the MST, and the choice is broadcast to all processors.
• Each processor updates its part of the d vector locally.
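The selection step of one iteration can be sketched without real message passing by simulating p processes that each own one block of d (the function and variable names here are ours, purely for illustration):

```python
import math

# Simulate the per-iteration selection: p "processes" each own a
# contiguous block of the distance vector d (no real MPI involved).
def select_global_min(d, in_tree, p):
    n = len(d)
    block = math.ceil(n / p)
    local_mins = []
    for rank in range(p):              # each process scans only its block
        lo, hi = rank * block, min((rank + 1) * block, n)
        cand = [(d[v], v) for v in range(lo, hi) if not in_tree[v]]
        if cand:
            local_mins.append(min(cand))
    # global reduction with min-loc semantics: the globally closest vertex
    return min(local_mins)

d = [0.0, 4.0, 1.0, 7.0, 2.0, 9.0]
in_tree = [True, False, False, False, False, False]
print(select_global_min(d, in_tree, p=3))  # -> (1.0, 2)
```

In a real implementation the reduction would be a single collective operation (e.g. an all-reduce with min-loc semantics) costing O(log p) time.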
• The cost to select the minimum entry is O(n/p+ log p).
• The cost of a broadcast is O(log p).
• The cost of the local update of the d vector is O(n/p).
• The parallel time per iteration is O(n/p+ log p).
• The total parallel time is O(n^2/p + n log p):

Tp = Θ(n^2/p) [computation] + Θ(n log p) [communication]

• Since the sequential run time is W = Θ(n^2):

S = Θ(n^2) / (Θ(n^2/p) + Θ(n log p))

E = 1 / (1 + Θ((p log p)/n))
• => For a cost-optimal parallel formulation, (p log p)/n = O(1).
• => Prim’s algorithm can use only p = O(n / log n) processes in this formulation.
• The corresponding isoefficiency is O(p^2 log^2 p).
12.3 Single-Source Shortest Paths: Dijkstra’s Algorithm
• For a weighted graph G = (V, E, w), the single-source shortest paths problem is to find the shortest paths from a vertex v ∈ V to all other vertices in V.
• Dijkstra’s algorithm is similar to Prim’s algorithm. It maintains a set of nodes for which the shortest paths are known.
• It grows this set by adding the node closest to the source, reached via one of the nodes already in the current shortest-path set.
procedure DIJKSTRA_SINGLE_SOURCE_SP(V, E, w, s)
begin
    VT := {s};
    l[s] := 0;
    for all v ∈ (V − VT) do
        if edge (s, v) exists then set l[v] := w(s, v);
        else set l[v] := ∞;
        endif
    while VT ≠ V do
    begin
        find a vertex u such that l[u] := min{l[v] | v ∈ (V − VT)};
        VT := VT ∪ {u};
        for all v ∈ (V − VT) do
            l[v] := min{l[v], l[u] + w(u, v)};
    endwhile
end DIJKSTRA_SINGLE_SOURCE_SP

The run time of Dijkstra’s algorithm is Θ(n^2).
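As with Prim’s algorithm, the pseudocode maps directly to a sequential Θ(n^2) implementation; a Python sketch (the test graph is invented for illustration):

```python
import math

def dijkstra(w, s=0):
    """Single-source shortest paths on an adjacency matrix w (math.inf =
    no edge), following the Theta(n^2) pseudocode above; returns the
    l-vector of shortest distances from s."""
    n = len(w)
    done = [False] * n
    done[s] = True
    l = [w[s][v] for v in range(n)]    # l[v] = current best distance to v
    l[s] = 0.0
    for _ in range(n - 1):
        # pick the unfinished vertex with the smallest tentative distance
        u = min((v for v in range(n) if not done[v]), key=lambda v: l[v])
        done[u] = True
        for v in range(n):             # relax edges out of u
            if not done[v]:
                l[v] = min(l[v], l[u] + w[u][v])
    return l

INF = math.inf
w = [[INF, 2.0, 5.0, INF],
     [2.0, INF, 1.0, 4.0],
     [5.0, 1.0, INF, 1.0],
     [INF, 4.0, 1.0, INF]]
print(dijkstra(w))  # -> [0.0, 2.0, 3.0, 4.0]
```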
Parallel formulation
• Very similar to the parallel formulation of Prim’s algorithm for minimum spanning trees.
• The weighted adjacency matrix is partitioned using the 1-D block mapping.
• Each process selects, locally, the node closest to the source, followed by a global reduction to select the next node.
• The node is broadcast to all processes and the l-vector is updated.
• The parallel performance of Dijkstra’s algorithm is identical to that of Prim’s algorithm.
12.4 All-Pairs Shortest Paths
• Given a weighted graph G = (V, E, w), the all-pairs shortest paths problem is to find the shortest paths between all pairs of vertices vi, vj ∈ V.
• A number of algorithms are known for solving this problem.