
12 Parallel Algorithm Scalability¹

12.1 Isoefficiency Metric of Scalability

• For a given problem size, as we increase the number of processing elements, the overall efficiency of the parallel system goes down for a given algorithm.

• For some systems, the efficiency of a parallel system increases if the problem size is increased while keeping the number of processing elements constant.

¹ Based on [1]


Figure: Variation of efficiency (a) as the number of processing elements is increased for a given problem size; and (b) as the problem size is increased for a given number of processing elements. The phenomenon illustrated in graph (b) is not common to all parallel systems.

• What is the rate at which the problem size must increase with respect to the number of processing elements to keep the efficiency fixed?


• This rate determines the scalability of the system. The slower this rate, the better.

• Before we formalize this rate, we define the problem size W as the asymptotic number of operations associated with the best serial algorithm to solve the problem.

• We can write the parallel runtime as:

  T_P = (W + T_o(W, p)) / p

• The resulting expression for speedup is:

  S = W / T_P = W p / (W + T_o(W, p))


• Finally, we write the expression for efficiency as:

  E = S / p = W / (W + T_o(W, p)) = 1 / (1 + T_o(W, p) / W)

• For scalable parallel systems, efficiency can be maintained at a fixed value (between 0 and 1) if the ratio T_o/W is maintained at a constant value.

• For a desired value E of efficiency,

  E = 1 / (1 + T_o(W, p) / W)

  T_o(W, p) / W = (1 − E) / E

  W = (E / (1 − E)) · T_o(W, p)
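These two relations are straightforward to evaluate numerically. A minimal sketch in Python (the function names are ours, and the overhead T_o is treated as an already-computed number rather than a function of W and p):

    def efficiency(W, To):
        # E = 1 / (1 + To / W)
        return 1.0 / (1.0 + To / W)

    def required_problem_size(E, To):
        # W = (E / (1 - E)) * To: problem size needed to hold
        # efficiency at E against a fixed overhead To
        return E / (1.0 - E) * To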


• If K = E/(1 − E) is a constant depending on the efficiency to be maintained, then, since T_o is a function of W and p, we have

  W = K · T_o(W, p)

• The problem size W can usually be obtained as a function of p by algebraic manipulations to keep efficiency constant.

• This function is called the isoefficiency function.

• This function determines the ease with which a parallel system can maintain a constant efficiency and hence achieve speedups increasing in proportion to the number of processing elements.


Example: Adding n numbers on p processors

• The overhead function for the problem of adding n numbers on p processing elements is approximately 2p log p:

  – The first stage of the algorithm (summing n/p numbers locally) takes n/p time.

  – The second phase involves log p steps, each with a communication and an addition. Assuming a single communication takes unit time, this phase takes 2 log p time.

  T_P = n/p + 2 log p

  S = n / (n/p + 2 log p)

  E = 1 / (1 + (2p log p) / n)


• Proceeding as before, but substituting T_o = 2p log p, we get

  W = 2K p log p

• Thus, the asymptotic isoefficiency function for this parallel system is Θ(p log p).

• If the number of processing elements is increased from p to p′, the problem size (in this case, n) must be increased by a factor of (p′ log p′)/(p log p) to get the same efficiency as on p processing elements.

Efficiencies for different n and p:

      n    p = 1   p = 4   p = 8   p = 16   p = 32
     64     1.0    0.80    0.57    0.33     0.17
    192     1.0    0.92    0.80    0.60     0.38
    320     1.0    0.95    0.87    0.71     0.50
    512     1.0    0.97    0.91    0.80     0.62
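The table follows directly from the efficiency expression above; a quick sketch that reproduces it (log is base 2, matching the log p steps of the binary-tree sum):

    import math

    def efficiency(n, p):
        # E = 1 / (1 + 2 p log p / n)
        if p == 1:
            return 1.0
        return 1.0 / (1.0 + 2.0 * p * math.log2(p) / n)

    for n in (64, 192, 320, 512):
        print(n, [round(efficiency(n, p), 2) for p in (1, 4, 8, 16, 32)])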


Example 2: Let T_o = p^{3/2} + p^{3/4} W^{3/4}

• Using only the first term of T_o in W = K · T_o(W, p), we get

  W = K p^{3/2}

• Using only the second term yields the following relation between W and p:

  W = K p^{3/4} W^{3/4}

  W^{1/4} = K p^{3/4}

  W = K^4 p^3

• The larger of these two asymptotic rates determines the isoefficiency. This is given by Θ(p^3).
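A quick numeric spot-check of the second balance (the constants here are arbitrary, chosen only for illustration): with W = K^4 p^3, the right-hand side K p^{3/4} W^{3/4} should reproduce W.

    K, p = 2.0, 64.0
    W = K**4 * p**3
    rhs = K * p**0.75 * W**0.75
    print(W, rhs)  # both approximately 4194304.0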


12.2 Cost-Optimality and the Isoefficiency Function

• A parallel system is cost-optimal if and only if

  p T_P = Θ(W)

• From this, we have:

  W + T_o(W, p) = Θ(W)

  T_o(W, p) = O(W)

  W = Ω(T_o(W, p))

• => A parallel system is cost-optimal if and only if its overhead function does not asymptotically exceed the problem size.

• If we have an isoefficiency function f(p), then it follows that the relation W = Ω(f(p)) must be satisfied to ensure the cost-optimality of a parallel system as it is scaled up.


Lower Bound on the Isoefficiency Function

• For a problem consisting of W units of work, no more than W processing elements can be used cost-optimally.

• The problem size must increase at least as fast as Θ(p) to maintain fixed efficiency; hence, Ω(p) is the asymptotic lower bound on the isoefficiency function.

Degree of Concurrency and the Isoefficiency Function

• The maximum number of tasks that can be executed simultaneously at any time in a parallel algorithm is called its degree of concurrency.

• If C(W) is the degree of concurrency of a parallel algorithm, then for a problem of size W, no more than C(W) processing elements can be employed effectively.


Degree of Concurrency and the Isoefficiency Function: Example

• Consider solving a system of n equations in n variables by using Gaussian elimination (W = Θ(n^3)).

• The n variables must be eliminated one after the other, and eliminating each variable requires Θ(n^2) computations.

• At most Θ(n^2) processing elements can be kept busy at any time.

• Since W = Θ(n^3) for this problem, the degree of concurrency C(W) is Θ(W^{2/3}).

• Given p processing elements, the problem size should be at least Ω(p^{3/2}) to use them all.


Scalability of Parallel Graph Algorithms

12.1 Definitions and Representation

12.2 Minimum Spanning Tree: Prim’s Algorithm

12.3 Single-Source Shortest Paths: Dijkstra’s Algorithm

12.4 All-Pairs Shortest Paths

12.5 Connected Components

12.6 Algorithms for Sparse Graphs


12.1 Definitions and Representation

• An undirected graph G is a pair (V, E), where V is a finite set of points called vertices and E is a finite set of edges.

• An edge e ∈ E is an unordered pair (u, v), where u, v ∈ V.

• In a directed graph, the edge e is an ordered pair (u, v). An edge (u, v) is incident from vertex u and is incident to vertex v.

• A path from a vertex v to a vertex u is a sequence ⟨v_0, v_1, v_2, ..., v_k⟩ of vertices where v_0 = v, v_k = u, and (v_i, v_{i+1}) ∈ E for i = 0, 1, ..., k − 1.

• The length of a path is defined as the number of edges in the path.

• An undirected graph is connected if every pair of vertices is connected by a path.

• A forest is an acyclic graph, and a tree is a connected acyclic graph.

• A graph that has weights associated with each edge is called a weighted graph.


• Graphs can be represented by their adjacency matrix or an edge (or vertex) list

• Adjacency matrices have a value a_{i,j} = 1 if nodes i and j share an edge, and 0 otherwise. In case of a weighted graph, a_{i,j} = w_{i,j}, the weight of the edge.

• The adjacency list representation of a graph G = (V, E) consists of an array Adj[1..|V|] of lists. Each list Adj[v] is a list of all vertices adjacent to v.

• For a graph with n nodes, the adjacency matrix takes Θ(n^2) space and the adjacency list takes Θ(|E|) space.
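A small Python sketch contrasting the two representations for a toy weighted, undirected graph (the graph itself is made up for illustration):

    INF = float('inf')
    n = 4
    edges = [(0, 1, 2.0), (0, 2, 1.5), (1, 3, 3.0)]  # (u, v, weight)

    # Adjacency matrix: Theta(n^2) space regardless of |E|
    A = [[INF] * n for _ in range(n)]
    # Adjacency list: space proportional to |E| (each edge stored twice)
    Adj = [[] for _ in range(n)]

    for u, v, w in edges:
        A[u][v] = A[v][u] = w
        Adj[u].append((v, w))
        Adj[v].append((u, w))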


• A spanning tree of an undirected graph G is a subgraph of G that is a tree containing all the vertices of G.

• In a weighted graph, the weight of a subgraph is the sum of the weights of the edges in the subgraph.

• A minimum spanning tree (MST) for a weighted undirected graph is a spanning tree with minimum weight.


12.2 Minimum Spanning Tree: Prim’s Algorithm

• Prim’s algorithm for finding an MST is a greedy algorithm.

• Start by selecting an arbitrary vertex and including it in the current MST.

• Grow the current MST by inserting into it the vertex closest to one of the vertices already in the current MST.


procedure PRIM_MST(V, E, w, r)
begin
    V_T := {r};
    d[r] := 0;
    for all v ∈ (V − V_T) do
        if edge (r, v) exists then
            set d[v] := w(r, v);
        else
            set d[v] := ∞;
        endif
    while V_T ≠ V do
    begin
        find a vertex u such that d[u] := min{d[v] | v ∈ (V − V_T)};
        V_T := V_T ∪ {u};
        for all v ∈ (V − V_T) do
            d[v] := min{d[v], w(u, v)};
    endwhile
end PRIM_MST
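For concreteness, here is a direct, unoptimized Python translation of the pseudocode (a sketch, assuming the graph is given as an n × n weight matrix with math.inf marking absent edges):

    import math

    def prim_mst(W, r=0):
        # W[u][v]: edge weight, or math.inf if (u, v) is not an edge.
        # Returns d, where d[v] is the weight of the MST edge attaching v.
        n = len(W)
        in_tree = [False] * n
        in_tree[r] = True
        d = [W[r][v] for v in range(n)]
        d[r] = 0.0
        for _ in range(n - 1):
            # Select the closest vertex outside the tree...
            u = min((v for v in range(n) if not in_tree[v]), key=lambda v: d[v])
            in_tree[u] = True
            # ...and update the distances of the remaining vertices to the tree.
            for v in range(n):
                if not in_tree[v]:
                    d[v] = min(d[v], W[u][v])
        return d

The two nested loops make the Θ(n^2) sequential run time explicit: n − 1 iterations, each doing Θ(n) work for the selection and the update.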


Parallel formulation

• The algorithm works in n outer iterations; it is hard to execute these iterations concurrently.

• The inner loop is relatively easy to parallelize. Let p be the number of processes, and let n be the number of vertices.

• The adjacency matrix is partitioned in a 1-D block fashion, with the distance vector d partitioned accordingly.

• In each step, a processor selects the locally closest node, followed by a global reduction to select the globally closest node.

• This node is inserted into the MST, and the choice is broadcast to all processors.

• Each processor updates its part of the d vector locally (see the sketch after this list).
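One iteration of this scheme, sketched with mpi4py. The names are ours, not from the lecture: each rank is assumed to hold its n/p block of d in local_d, a per-vertex flag list in_tree, the matching rows of the weight matrix in W_local, and the global indices of its vertices in local_to_global; an allgather stands in for the reduction-plus-broadcast pair to keep the sketch short.

    import math
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    def select_and_update(local_d, local_to_global, W_local, in_tree):
        # Locally closest vertex not yet in the MST: O(n/p) work.
        local_min = min(
            ((local_d[i], local_to_global[i])
             for i in range(len(local_d)) if not in_tree[i]),
            default=(math.inf, -1))
        # Global selection of the overall closest vertex; every rank
        # learns the winner, combining the reduction and the broadcast.
        u_dist, u = min(comm.allgather(local_min))
        # Local update of this rank's part of the d vector against u.
        for i in range(len(local_d)):
            if not in_tree[i]:
                local_d[i] = min(local_d[i], W_local[i][u])
        return u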


• The cost to select the minimum entry is O(n/p + log p).

• The cost of a broadcast is O(log p).

• The cost of the local update of the d vector is O(n/p).

• The parallel time per iteration is O(n/p + log p).

• The total parallel time is O(n^2/p + n log p):

  T_P = Θ(n^2/p) + Θ(n log p)
        (computation) (communication)

• Since the sequential run time is W = Θ(n^2):

  S = Θ(n^2) / (Θ(n^2/p) + Θ(n log p))

  E = 1 / (1 + Θ((p log p)/n))


• => For a cost-optimal parallel formulation, (p log p)/n = O(1).

• => Prim’s algorithm can use only p = O(n/log n) processes in this formulation.

• The corresponding isoefficiency is O(p^2 log^2 p): holding efficiency constant requires n = Θ(p log p), and hence W = n^2 = Θ(p^2 log^2 p).


12.3 Single-Source Shortest Paths: Dijkstra’s Algorithm

• For a weighted graph G = (V, E, w), the single-source shortest paths problem is to find the shortest paths from a vertex v ∈ V to all other vertices in V.

• Dijkstra’s algorithm is similar to Prim’s algorithm. It maintains a set of nodes for which the shortest paths are known.

• It grows this set by adding the node closest to the source, reached through one of the nodes in the current shortest-path set.


procedure DIJKSTRA_SINGLE_SOURCE_SP(V, E, w, s)
begin
    V_T := {s};
    for all v ∈ (V − V_T) do
        if edge (s, v) exists then
            set l[v] := w(s, v);
        else
            set l[v] := ∞;
        endif
    while V_T ≠ V do
    begin
        find a vertex u such that l[u] := min{l[v] | v ∈ (V − V_T)};
        V_T := V_T ∪ {u};
        for all v ∈ (V − V_T) do
            l[v] := min{l[v], l[u] + w(u, v)};
    endwhile
end DIJKSTRA_SINGLE_SOURCE_SP

The run time of Dijkstra’s algorithm is Θ(n^2).
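In the same style as the Prim sketch above, a direct Python translation (again assuming an n × n weight matrix with math.inf for absent edges):

    import math

    def dijkstra_sssp(W, s=0):
        # Returns l, where l[v] is the shortest-path distance from s to v.
        n = len(W)
        done = [False] * n
        done[s] = True
        l = [W[s][v] for v in range(n)]
        l[s] = 0.0
        for _ in range(n - 1):
            u = min((v for v in range(n) if not done[v]), key=lambda v: l[v])
            done[u] = True
            for v in range(n):
                if not done[v]:
                    l[v] = min(l[v], l[u] + W[u][v])
        return l

The only change from Prim’s algorithm is the update rule: l[u] + W[u][v] (distance through u) instead of W[u][v] (direct edge weight).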


Parallel formulation

• Very similar to the parallel formulation of Prim’s algorithm for minimum spanning trees.

• The weighted adjacency matrix is partitioned using the 1-D block mapping.

• Each process selects, locally, the node closest to the source, followed by a global reduction to select the next node.

• The node is broadcast to all processors, and the l-vector is updated.

• The parallel performance of Dijkstra’s algorithm is identical to that of Prim’s algorithm.

12.4 All-Pairs Shortest Paths

• Given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to find the shortest paths between all pairs of vertices v_i, v_j ∈ V.

• A number of algorithms are known for solving this problem.