
RICE UNIVERSITY

MODULE ASSIGNMENT IN DISTRIBUTED SYSTEMS

by

MI LU

A THESIS SUBMITTED

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

APPROVED, THESIS COMMITTEE:

James B. Sinclair, Assistant Professor of Computer Science in the Department of Electrical Engineering, Chairman

Peter Varman, Assistant Professor of Computer Science in the Department of Electrical Engineering

J. Robert Jump, Professor of Computer Science in the Department of Electrical Engineering

HOUSTON, TEXAS

APRIL 1984


ABSTRACT

The problem of finding an optimal assignment of a modular program for n processors in a distributed system is studied. We characterize the distributed programs by Stone's graph model, and attempt to find an assignment of modules to processors which minimizes the sum of module execution costs and intermodule communication costs. The problem is NP-complete for more than three processors. We first show how to identify all modules which must be assigned to a particular processor under any optimal assignment. This usually results in a significant reduction in the complexity of the optimal assignment problem. We also present a heuristic algorithm for finding assignments and experimentally verify that it almost always finds an optimal assignment.


ACKNOWLEDGEMENT

I sincerely thank Dr. J. B. Sinclair, my advisor, for the chance he gave me to study under his guidance, and for his continual encouragement, support, enlightenment and advice in my research work and all aspects of graduate study.

I am grateful to Dr. J. R. Jump and Dr. P. Varman for serving on my thesis committee.

I would like to extend a special thanks to Rick Covington for his friendship and help.

I want to express my appreciation to all the professors and faculty of the Electrical Engineering Department for making my study and stay at Rice University a very valuable and pleasurable experience.

This research is supported principally by the National Science Foundation under Grant MCS80-04107.


TABLE OF CONTENTS

I.   INTRODUCTION
II.  DISTRIBUTED SYSTEMS MODEL
III. A SURVEY OF THE RECENT RESEARCH
IV.  ASSIGNMENT GRAPHS
     4.1  Assignment Graphs for Two-Processor Systems
     4.2  Assignment Graphs for Systems with More than Two Processors
V.   SEARCH SPACE REDUCTION
     5.1  The Reduction Theorem
     5.2  Assignment Problem Simplification by P-reductions
VI.  HEURISTIC ALGORITHM FOR MULTIPROCESSOR ASSIGNMENT
     6.1  G-H Tree
     6.2  Heuristic Algorithm to Find Near-Optimal n-Processor Assignment
     6.3  Experimental Results
VII. SUMMARY AND CONCLUSIONS
     7.1  Summary of Results
     7.2  Suggestions for Future Research
REFERENCES
APPENDIX


CHAPTER I

INTRODUCTION

As the field of computer networking matures and becomes more sophisticated, distributed processing has received considerable attention. By "distributed" we mean that a computation can simultaneously execute on different processors in the system. The modularity, flexibility, and reliability of distributed processing make it attractive to many types of users, and several distributed processing systems have been designed and implemented in recent years. Distributed processing applications range from large data base installations, where processing load is distributed for organizational efficiency, to high-speed signal processing systems, where extremely fast processing must be performed in a real-time environment.

A distributed system has several processors and a communication subnetwork. Processors exchange data and control information through the communication subnetwork. A program may be partitioned into several parts called modules. Modules can be executed on the same processor or may be assigned to different processors. A problem of increasing interest is to determine how to assign the modules of a program to the various processors in order to optimize the performance of the system, according to some appropriate performance measure.

Executing a module of a program has associated with it a cost of execution for each processor in the system. It may be advantageous to execute a module on a particular processor because of the availability of a faster arithmetic component, a particular data base, higher speed memory, or other resources.

Modules may exchange information during program execution. If two modules which communicate with one another are assigned to different processors, the information that they exchange will be transmitted over the communication subnetwork, and the transmission will incur a communication cost. An optimal assignment minimizes the sum of the module execution costs and intermodule communication costs.

When the system consists of only two processors, we can efficiently find an optimal assignment by constructing an assignment graph and applying a max-flow min-cut algorithm[1]. Extension of this approach to three or more processors does not appear to be feasible. Previous research shows that no efficient solution for finding an absolute optimal assignment of program modules to n (n>3) processors exists[2]. Since many real systems of interest have more than two processors, and the number of possible assignments grows exponentially with the number of modules, it is important to be able to efficiently select an assignment that has a cost that is nearly minimal, at least most of the time. In this thesis, we examine the problem of reducing the search space for optimal assignments and of efficiently finding near-optimal (near minimum cost) assignments. We present an algorithm by which the complexity of the assignment problem is greatly reduced, and another algorithm which finds near-optimal solutions. Experimental results show that the error of the heuristic solution compared to the true optimal solution is quite small.

The remainder of the thesis consists of six parts. Chapter II explains the model of a distributed system that we will use, including the relevant aspects of the processors, communication subnetwork, and programs. We also give a precise definition of the optimal assignment problem.

Chapter III presents a brief survey of previous research on the general problem of optimal assignments in distributed systems. We describe a method for solving the assignment problem for two processors which provides a basis for the approach we use for the n>2 processor case, and we outline previous work dealing with systems containing more than two processors. This work can be categorized into a few approaches: one is based on assignment graph models and another on shortest path approaches.

Chapter IV describes many of the assumptions and definitions used in the remainder of this thesis. It also describes the construction of the assignment graphs for both two-processor systems and systems with more than two processors.

In Chapter V we present a reduction theorem which permits simplification of the optimal assignment problem in many cases. We give examples of the application of this result. These examples illustrate the effectiveness of this approach in reducing the complexity of finding solutions to the optimal assignment problem for n>2-processor systems.

Even with the aid of the reduction theorem, we cannot guarantee that the number of possible assignments can be made small enough to allow an exhaustive search for the optimal assignment. In Chapter VI we present a heuristic algorithm for finding an assignment when the number of processors is more than 2. Based on the idea of a Gomory-Hu tree[3], the algorithm efficiently finds assignments which are experimentally shown to be almost always optimal, and nearly optimal in the remaining cases.

Chapter VII presents a summary of our main results. We also discuss some potentially rewarding extensions to the methods we developed.


CHAPTER II

DISTRIBUTED SYSTEMS MODEL

In this chapter we describe a model for a distributed system. We give more precise definitions of the program modules, the assignment problem, and the cost function in the distributed system.

In our model a distributed system contains several programmable processors and a single program that can be distributed over two or more processors for execution. The site of program activity shifts dynamically from one processor to another as the program is being executed.

A single program consists of a set of modules. The modules should be viewed as program segments which either contain executable instructions plus local data or contain global data accessible to other segments. The modules are free to be assigned to any one of the processors, taking advantage of specific efficiencies of some processors in executing some program modules. The modules executed on different processors may communicate with one another during an execution of the program.


Some modules have a fixed assignment because these modules depend on capabilities or resources that are available at only one processor. These facilities might be a high-speed arithmetic unit, access to a particular data base, the need for a large high-speed memory, access to a specific peripheral device, or any other facility that is associated with only one processor.

Each processor is an independent computer with full memory, control, and arithmetic capability. The processors may be multiprogrammed. The processors are connected through a communication subnetwork to create a distributed processing system.

Each assignment of modules to processors has an associated cost. In our model, the total assignment cost has two components. The first component is the cost associated with module executions. It is assumed, in general, to depend on the processor to which that module is assigned and the amount of computation performed by that module. The second component is the cost of intermodule communication between pairs of modules when they are resident on different processors and transmit information through the communication subnetwork. We assume that the cost of an intermodule transfer is zero when the transfer is made between two modules assigned to the same computer.

The cost of an assignment of modules to processors is the sum of the module execution costs for all modules plus the sum of intermodule communication costs for all pairs of modules that are not coresident. An optimal assignment for the program is an assignment with minimum cost. The assignment problem is to find an assignment of modules to processors that minimizes total cost. Modules tend to be assigned to processors on which their execution costs are lower, while at the same time pairs of modules which communicate heavily tend to be assigned to the same processor.

The cost function can be either the program's elapsed run time or a dollar cost. If the cost function is elapsed run time, we will insist that program execution be serial; that is, even though there are several computers in the system, two processors may not concurrently execute modules of the same program. Program activity can shift back and forth between different computers, but at any given time only one module of a given program is in execution.

We can define the cost function in precise terms. Let an assignment A: M -> P be a function that maps the set of program modules M into the set of processors P on which the modules may be executed. If (i, h) is in A, then module i will be executed on processor h. Let e_{i,h} be the cost of executing module i on processor h. If module i communicates with module j and A(i) ≠ A(j), they incur a communication cost t_{i,j}. Suppose there are m modules in the program. The total execution cost E is given by

E = \sum_{i=1}^{m} e_{i,A(i)}

The total communication cost T is given by

T = \sum_{i=1}^{m} \; \sum_{\substack{j=i+1 \\ A(i) \neq A(j)}}^{m} t_{i,j}

Thus the total assignment cost C is defined by

C = E + T

The assignment problem is to find an assignment A_{opt} from all k possible assignments such that

C(A_{opt}) = \min( C(A_1), C(A_2), \ldots, C(A_k) )
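As a concrete illustration of this definition, the following short Python sketch evaluates the cost of a given assignment under the model above; the cost matrices and the assignment in the example are invented for illustration and do not come from this thesis.

def assignment_cost(exec_cost, comm_cost, assign):
    # exec_cost[i][h]: cost e_{i,h} of executing module i on processor h
    # comm_cost[i][j]: cost t_{i,j} if modules i and j are not coresident
    # assign[i]:       processor A(i) to which module i is assigned
    m = len(assign)
    E = sum(exec_cost[i][assign[i]] for i in range(m))
    T = sum(comm_cost[i][j]
            for i in range(m) for j in range(i + 1, m)
            if assign[i] != assign[j])
    return E + T

# Hypothetical 3-module, 2-processor example:
exec_cost = [[5, 10], [4, 3], [6, 2]]
comm_cost = [[0, 12, 0], [12, 0, 3], [0, 3, 0]]
print(assignment_cost(exec_cost, comm_cost, [0, 0, 1]))   # 5 + 4 + 2 + 3 = 14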

In the next chapter, we describe previous research on solving the assignment problem. These results deal with finding an optimal assignment in a two-processor system as well as in an n>2-processor system.


CHAPTER III

A SURVEY OF THE RECENT RESEARCH

In this chapter we present a survey of previous research on the problem of finding optimal assignments in distributed systems. We first describe the method of solving the assignment problem for two processors, as well as certain extensions based on this method. Then we discuss the published results dealing with finding optimal or near-optimal assignments in systems with n>2 processors.

The first results in this area dealt with two-processor systems and were based on a graph model due to Stone[1]. This model was later extended in several respects. Stone[4], Sinclair[5], and Gusfield[6] studied the problem of varying system parameters that influence assignment costs. These parameters include the load on one or both processors and the channel traffic in a broadcast communication subnetwork. Rao, Stone and Hu[7], Gonsalves[8] and Sinclair[9] used the same graph model in attempting to solve the optimal assignment problem with the constraints of limited memory in one processor of a two-processor distributed system or limited bandwidth in a broadcast network. Sinclair[10] and Stone and Bokhari[2] solved the dynamic assignment problem for two processors under two somewhat different cost criteria. They also considered the case of more than two processors, and used shortest path algorithms to find optimal assignments.

Stone made use of a max-flow min-cut algorithm in finding the optimal program module-to-processor assignments to minimize the cost of executing programs in a distributed fashion[1]. He showed that the modules of a program may be assigned to the processors of a distributed computer system so as to minimize the overall cost, including module execution cost and intermodule communication cost. The former is the cost of running the individual modules on their assigned processors and the latter is the cost of communicating between modules assigned to different processors. He constructed a graph model for the two-processor problem which represents processors and modules as nodes and the module execution costs and intermodule communication costs as weights on the undirected edges connecting the nodes of the graph. He showed that a minimal weight cutset of edges which disconnects the graph into two subgraphs corresponds to an optimal assignment of program modules, where the weight of the cutset is the sum of the weights carried by all the edges in the cutset. Treating the two-processor assignment problem as a commodity flow problem with suitable modifications, he found an optimal assignment by using a max-flow min-cut algorithm. He also described the construction of a processor flow graph for the n-processor problem but was unable to efficiently solve for an n-processor optimal assignment when n>2. Stone's algorithm is presented in detail in the next chapter.


Stone also examined the sequence of static optimal assignments found as the load on one processor is held fixed and the load on the other is varied, and proved the existence of a critical load factor for each program module[4]. He assumed that in a two-processor system the modules of a distributed program are free to move from processor to processor during the course of a computation. A module can be specified prior to the start of execution as being located on one or the other of the processors, and can be reassigned without any penalty after program execution has begun. He found that for every program module M there exists a critical load factor f_M such that when the load on the processor with variable load is below f_M, M is assigned to that processor by an optimal assignment, and when the load on that processor is above f_M, the optimal assignment places M on the other processor. Thus, for systems in which a single processor load factor is the only parameter that varies, one can dynamically assign modules in an optimal fashion by calculating all critical load factors ahead of time and comparing the computed load factors against the actual load factors experienced at run time. This opens the possibility of doing optimal assignments in real time.
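A minimal Python sketch of how such precomputed critical load factors could be consulted at run time; the factors, module names, and processor labels below are purely hypothetical illustrations of the mechanism, not values from the work cited.

# f[M] is the critical load factor of module M from the offline analysis.
f = {'A': 0.4, 'B': 0.7, 'C': 0.2}

def dynamic_assign(current_load):
    # Keep M on the variable-load processor only while its load is below f[M].
    return {M: 'variable-load processor' if current_load < f[M]
               else 'other processor'
            for M in f}

print(dynamic_assign(0.5))   # only B stays on the variable-load processor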

Sinclair analyzed the problem of processors connected to a broadcast communication channel[5]. Communication costs are assumed to be a function of the amount of information to be transmitted and the loading on the network, which causes access delays. These delays are caused by the total communication traffic in the system. Sinclair was able to find a minimal sequence of optimal assignments for a given program as the average access delay increases from 0. The values of average delay at which optimal assignments must change are called critical delays. The algorithm requires O(q) optimal assignment computations, where q is the number of critical delays. There are no restrictions on the number of processors, but the difficulty of finding an optimal assignment for more than two processors limits the algorithm's utility, except in those cases referred to below for which efficient solutions to the optimal assignment problem for more than two processors are known.

Gusfield recently applied a general method called parametric combinatorial computing to the problem of finding optimal assignments in two-processor systems when the loads on the processors are independent and both may vary[6]. This method is applied to efficiently find optimal assignments for all possible combinations of processor loads, represented by two time-varying parameters X_1 and X_2. The solutions are represented by the faces of a convex polygon over the bounded (X_1, X_2) plane, and the polygon can be constructed in time polynomial in the number of modules n. He also showed that for the two-processor problem with the load on one processor fixed and the load on the other varying as a function of the parameter X, the same method can be used to find a minimal set of optimal assignments covering the entire bounded X line, and the costs of these assignments form a piecewise linear concave function of X, with each linear segment corresponding to an optimal assignment.


Rao, Stone and Hu considered the problem of minimum cost assignment of program modules to two processors when one processor has limited memory[7]. They first constructed a Gomory-Hu tree based on network flow theory. A Gomory-Hu tree generated from an n-node network has n nodes and n-1 edges. The value on each edge is found by solving a flow problem in a network equal to or smaller in size than the original, and the Gomory-Hu tree has the property that it is flow-equivalent to the original graph. (Two networks with the same node set N but different edges and/or edge weights are said to be flow equivalent if for any pair of nodes x and y in N, the maximum flows between x and y are the same in both networks.) The maximum flow between two nodes x and y in an undirected tree is just the weight of the minimum weighted edge in the (unique) path between x and y. Furthermore, the Gomory-Hu tree has the property that if P_x and P_y are the sets of nodes reachable from x and y, respectively, in the tree when the edge with minimum weight in the path from x to y is removed, then P_x and P_y are the nodes reachable from x and y in the original graph if we remove all of the edges in some minimal weight cutset separating x and y. This means that we can find an optimal two-processor assignment in the Gomory-Hu tree. The Gomory-Hu tree can be used to obtain some information about modules which should be assigned as a group in a memory constrained system, as well as restrictions on the assignment options.
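The flow-equivalence property is easy to see on a small example. The Python sketch below uses the gomory_hu_tree routine of the networkx library on an invented five-node network (not a graph from this thesis); the maximum flow between any two nodes of the original network can be read off as the smallest edge weight on the unique tree path between them.

import networkx as nx

G = nx.Graph()
G.add_edge('s', 'a', capacity=3)
G.add_edge('a', 'b', capacity=5)
G.add_edge('b', 't', capacity=4)
G.add_edge('s', 'b', capacity=2)
G.add_edge('a', 't', capacity=1)

T = nx.gomory_hu_tree(G)            # 5 nodes, 4 edges, flow-equivalent to G

# Max flow (= min cut) between 's' and 't', read off the tree path:
path = nx.shortest_path(T, 's', 't')
print(min(T[u][v]['weight'] for u, v in zip(path, path[1:])))   # 5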

Rao, et al., also constructed an inclusive cuts graph which specifies a partial order for searching for a minimum feasible cutset. A feasible cutset is one which corresponds to an assignment satisfying the memory constraint, and a minimum feasible cutset is a feasible cutset with minimum weight. Use of the Gomory-Hu tree or the inclusive cuts graph, which corresponds to a minimum cost assignment satisfying the memory constraint, can lead to substantial reductions in the size of the problem.

Gonsalves studied the same problem in his thesis using heuristic approaches[8]. He presented two polynomial-time algorithms for two-processor scheduling with limited memory, and demonstrated experimentally that they can be successfully applied to this problem. He also showed the reduction in complexity possible using the inclusive cuts graph for constant degree graphs.

Sinclair considered the problem of finding an optimal (minimum cost) assignment for a distributed task in a computer network with a broadcast communication subnetwork[9]. For broadcast networks in which the channel is a critical resource, the optimization of a distributed task assignment should consider not only the total task cost but the cost in terms of channel utilization as well. He described a method for determining an optimal assignment to two processors with minimum channel utilization. The desired assignment with minimum communication cost can be found efficiently for the two-processor case by performing a reduction of the processor flow graph, relabeling the edges of the reduced graph, and applying a max-flow min-cut algorithm to the reduced processor flow graph. He also considered the case when more than two processors are involved and the module intercommunication graph is a tree. He described an algorithm for constructing an assignment graph and reducing it to a graph in which each spanning tree corresponds to a minimum cost assignment. After reassigning weights to the nodes of this graph, one can apply a dynamic programming algorithm to find a minimum weight spanning tree that corresponds to a minimum cost assignment with minimum transmission cost.

Sinclair also considered the possibility of dynamic reassignment of modules during program execution in distributed processing systems, both for two processors and for systems with more than two processors[10]. Modules may migrate from one processor to another during program execution but incur a reassignment cost in doing so. An optimal dynamic assignment minimizes the sum of module execution, interprocessor communication, and module reassignment costs for the program execution. Using an extension to Stone's original work, he was able to efficiently solve the two-processor problem. He gave an algorithm to find an optimal solution for the general problem with time and space complexities exponential in the number of program modules, and pointed out that pruning the dynamic assignment tree as it is being created results in an average space complexity much less than worst case. He also described a modification of this algorithm which emphasizes its equivalence to a shortest path algorithm and which can greatly reduce the number of computations by taking advantage of "localities" in the sequence of module executions. Bokhari also considered reassignment costs as well as residence costs in a somewhat different model and was able to solve this problem for the case of two processors.

Stone and Bokhari summarized a number of theoretical results that point the way toward the control of distributed computer systems[2]. They presented an extension of Stone's two-processor solution to the three-processor case. They described the construction of a processor flow graph for the three-processor problem, and discussed an algorithm for finding the minimum tricutset in a three-processor assignment graph, which corresponds to a minimum cost three-processor assignment.

Another approach to solving the module assignment problem which does not involve the use of processor assignment graphs is based on a shortest path algorithm. Bokhari demonstrated that the n-processor assignment problem has an efficient solution when the calls graph of the program is tree-structured[11]. He constructed an assignment graph such that each assignment of modules to processors corresponds to a subset of nodes and edges of the graph constituting a tree (called an assignment tree). The weight of each assignment tree equals the cost of the corresponding assignment. The shortest tree algorithm finds a minimum cost assignment by finding the minimum weight assignment tree in the assignment graph, where the weight of an assignment tree is the sum of the node weights in the tree. This allows us to minimize the sum of execution and communication costs for distributed systems with arbitrary numbers of processors. For m modules and n processors, the time complexity of the algorithm is O(mn^2).
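The idea can be illustrated with a small dynamic program over a tree-structured calls graph. The Python sketch below is only an illustrative rendering of the general approach, not Bokhari's own formulation, and the module tree and costs are invented.

from functools import lru_cache

def tree_assign(tree, root, exec_cost, comm_cost):
    # tree[u]:          children of module u in the calls graph
    # exec_cost[u][p]:  cost of executing module u on processor p
    # comm_cost[(u,v)]: communication cost if u and v are not coresident
    n_proc = len(next(iter(exec_cost.values())))

    @lru_cache(maxsize=None)
    def best(u, p):                  # cheapest cost of u's subtree with u on p
        cost = exec_cost[u][p]
        for v in tree.get(u, []):
            cost += min(best(v, q) + (0 if q == p else comm_cost[(u, v)])
                        for q in range(n_proc))
        return cost

    return min(best(root, p) for p in range(n_proc))

# Hypothetical three-module chain A -> B -> C on two processors:
tree = {'A': ['B'], 'B': ['C']}
exec_cost = {'A': [1, 9], 'B': [6, 2], 'C': [5, 5]}
comm_cost = {('A', 'B'): 4, ('B', 'C'): 1}
print(tree_assign(tree, 'A', exec_cost, comm_cost))   # 12

With memoization, each (module, processor) pair is evaluated once, which is consistent with the O(mn^2) behaviour mentioned above.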

Sinclair used Bokhari's method in finding the critical delays in programs with tree-structured calls graphs[5] and also in finding optimal assignments with minimum communication cost for broadcast channels[9].

Bryant and Agre used queueing theory to compute the cost of an assignment when congestion delays are included in the total cost[12]. Other researchers used different formulations and methods for solving the module assignment problem. Chu, et al.[13], described a model with additional real-time constraints and several means of solving these constrained optimization problems. They used a heuristic assignment algorithm to find a "good" assignment. Chou and Abraham used Markov decision theory to solve the assignment problem under a different set of constraints.

Bryant and Agre discussed in their paper an alternative approach to modeling the execution of a set of distributed programs, using a closed, multiclass queueing network[12]. This model allows representation of congestion factors such as mean queue size or average response time in the cost criterion function. The cost function used in their model is the sum of the response ratios (response time divided by requested processor time) across all distributed programs on the system. Solving the model allows one to obtain estimates of program execution time that include queueing and communication delays. This approach has the advantage that the execution cost can be expressed in terms of performance measures of the system such as response time. Additionally, they introduced an interchange heuristic search as a method of finding a good module allocation. The complexity of the resulting algorithm is O(MK(K + N)C), where M is the number of modules, K is the number of sites in the network, N is the number of communications processors, and C is the number of distributed program types.

Chu, et al.[13], discussed Stone's model and its deficiency of performing no load balancing. They described a 0-1 integer programming approach which can be used with real-time and/or memory constraints. They pointed out that this approach also fails to accurately account for real-time constraints when module precedence relations are considered. Consequently, they generated a heuristic algorithm based on work by Gylys[14] which can be used for module allocation subject to the real-time and memory constraints, and they proposed replacing the real-time constraint with a load-balancing constraint. The approach involves "fusing" together modules to get an initial assignment and then checking to see if the assignment satisfies the load-balancing requirements and memory constraints. If not, a heuristic phase moves some modules from one processor to another to improve load balancing while meeting memory constraints. They also discussed the estimation of intermodule intercommunication costs. They treated the modules as tasks waiting in the queue to be allocated to the processors. The maximum system performance is measured by throughput. Maximum throughput is achieved by load balancing, which tries to distribute modules as much as possible, but overhead due to interprocessor communication drives the allocation strategy to cluster modules onto as few processors as possible.

Chou and Abraham presented an algorithm that is more general and applicable to n-processor systems for making an optimal module-to-processor assignment for a given performance criterion[15]. Their model includes a set of tasks and an operational precedence relationship among the tasks. It allows the description of probabilistic branching and concurrent execution in a distributed program. The algorithm is based on the analysis of a semi-Markov process with rewards.

We are interested in finding optimal or near-optimal assignments in systems with more than two processors. Previous work in this area, as described above, either considered only programs with specific properties (i.e., Bokhari's results for programs with tree-structured calls graphs) or relied on methods such as interchange heuristics to obtain "good" if not optimal assignments. This thesis is concerned with using the extension of Stone's graph model to n>2 processors as a basis for developing methods for dealing more effectively and efficiently with the general n-processor assignment problem. In Chapter 5 we show that the complexity of the n-processor assignment problem can often be significantly reduced, and in Chapter 6 we offer a heuristic algorithm for finding near-optimal assignments based on a structure called an affinity tree, which is related to a Gomory-Hu tree. To provide the necessary background for both of these results, we present in Chapter 4 Stone's graph model and its solution in detail.


CHAPTER IV

ASSIGNMENT GRAPHS

In this chapter, we introduce assignment graphs and describe their use in finding optimal assignments. We describe assignment graphs for the two-processor model, and then we extend them to more than two processors. We also present several definitions and assumptions that are relevant to the remainder of this thesis.

4.1. Assignment Graphs for Two-Processor Systems

The assignment graph is a graphical representation of the program model originally developed by Stone[1]. An assignment graph is a connected, undirected graph consisting of a set of nodes and a set of weighted, undirected edges. Every program module is represented by a node, as is each processor. The edges are assigned edge weights, indicated by the numeric labels on the edges in the graph.

The edges between module nodes represent intermodule communication patterns, and the weight of an edge between two module nodes represents the cost of communicating between the associated modules when the modules are not coresident on the same computer. Recall that intermodule communication costs between a specified pair of modules are assumed to be zero when the pair of modules are coresident. Each module node is labeled with the name of its associated program module.

Let the two processors be called P_1 and P_2, and let the nodes in the assignment graph associated with the processors also be labeled P_1 and P_2. The assignment graph has edges from each module node M to both nodes P_1 and P_2. The weight of the edge between module node M and node P_1 is the cost of executing module M on processor P_2; similarly, the weight of the edge between node M and node P_2 is the cost of executing M on P_1.

Fig. 1 is an example of an assignment graph for two processors. The nodes P_1 and P_2 represent the two processors. Nodes A through F represent modules. We assume that the costs for executing each module on either processor are known. For each pair of modules, the cost of communication between them, should they not be coresident, is also assumed to be known. The communication cost between coresident modules is ignored. The communication costs are normally given in units of time or dollars, as are the execution costs. We see from the graph that some modules run faster on processor 1 (for example, A) and some run faster on processor 2 (for example, D). The symbol ∞ indicates an infinite cost. An edge (M, P_i) labeled with ∞ indicates that the module M must be executed on processor P_i.


Figure 1. An assignment graph, and a cutset that determines a module assignment.


A cutset in a graph is defined to be a collection of edges such that 1) when removed from the graph, node P_1 is disconnected from node P_2, and 2) no proper subset of a cutset is also a cutset. The edges crossed by the bold line in Fig. 1 form a cutset.

A cutset partitions the nodes of the graph into two disjoint subsets such that the nodes in one subset are connected to P_1 and the nodes in the other subset are connected to P_2. Stone showed that each cutset in the assignment graph corresponds to a module assignment, and every assignment is represented by a cutset[1].

The weight of a cutset is the sum of the weights of the edges in the cutset. It is equal to the cost of the corresponding module assignment, since the weight of a cutset is the sum of the module execution and intermodule communication costs for that assignment. Note in Fig. 1 that if a module M is assigned to P_1, then the edge (M, P_2) is cut, but the weight of this edge is the cost of executing M on P_1. Clearly, for each module M, either (M, P_1) or (M, P_2) must be in a cutset, since otherwise P_1 would not be disconnected from P_2 by the cutset's removal. It is easy to show that both (M, P_1) and (M, P_2) cannot be in the cutset, and so each cutset will uniquely define an assignment, with the weight of the cutset including the sum of the module execution costs for that assignment. If two module nodes M and N are assigned to different processors by the cutset, then any edge between M and N must be in the cutset and its weight included in the weight of this cutset. But this weight is just the cost of communicating between M and N, assuming they are assigned to different processors. If M and N are assigned to the same processor, (M, N) cannot be in the cutset, in agreement with the assumption that coresident modules have zero communication cost.

An optimal assignment corresponds to a minimum weight cutset of the assignment graph. It follows that an optimal assignment may be obtained by finding a minimum weight cutset in the graph. This may be done using network flow algorithms.

A flow network consists of a set of nodes and a set of edges connecting these nodes. We assume that the number of nodes and edges is finite. For convenience we rule out the possibility of having an edge forming a self-loop. We use A_{ij} to indicate the edge leading from node i to node j. A network is connected if, for every partitioning of the nodes of the network into subsets X and X̄ (the complement of X), there is either an edge A_{ij} or an edge A_{ji} with i ∈ X and j ∈ X̄. Every edge A_{ij} has an associated positive integer b_{ij} called the capacity of the edge.

A flow network has two special nodes. One is called the source, denoted by s, and one is called the sink, denoted by t. An appropriate analogy is a water pipeline system. The edges represent pipelines, the source is the inlet for the water, the sink is the outlet of the water, and all other nodes are junctions between pipelines. The capacity of each edge is the maximum volume per unit time of the pipeline. With such a pipeline system, we are interested in the maximum flow that can be put through it from the source to the sink.

For a given flow network, a set of nonnegative integers x_{ij} is called a flow in the network if they satisfy the following constraints:

\sum_{i} x_{ij} - \sum_{k} x_{jk} = \begin{cases} -v & \text{if } j = s, \\ 0 & \text{if } j \neq s, t, \\ v & \text{if } j = t, \end{cases} \qquad (4.1)

0 \leq x_{ij} \leq b_{ij} \quad \text{for all nodes } i, j. \qquad (4.2)

The v which appears in (4.1) is a nonnegative number called the value of the flow. Note that (4.1) expresses the fact that flow is conserved at every node except the source and the sink. Constraint (4.2) means that the edge flow x_{ij} is always bounded by the capacity of the edge b_{ij}.
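As a small illustration of these two constraints, the following Python sketch checks whether a candidate flow, given as a dictionary keyed by edges, satisfies (4.1) and (4.2); the function name and data layout are invented for the example.

def is_valid_flow(nodes, cap, flow, s, t, v):
    # (4.2): every edge flow is nonnegative and bounded by its capacity.
    if any(not (0 <= flow.get(e, 0) <= cap[e]) for e in cap):
        return False
    # (4.1): net inflow is -v at the source, v at the sink, 0 elsewhere.
    for j in nodes:
        net_in = (sum(flow.get((i, j), 0) for i in nodes)
                  - sum(flow.get((j, k), 0) for k in nodes))
        expected = -v if j == s else (v if j == t else 0)
        if net_in != expected:
            return False
    return True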

If the network is a simple path from s to t, then the maximum amount of flow that can be put through the network is obviously limited by the edge with the minimum capacity of all the edges in the path. An edge with minimum capacity is a bottleneck of the network. We shall now define the general notion of a bottleneck in an arbitrary network. A cutset is denoted by (X, X̄), where X is a subset of the nodes of the network and X̄ is its complement. The capacity or value of a cutset (X, X̄), denoted by c(X, X̄), is the sum of b_{ij} over all i ∈ X and j ∈ X̄.

Clearly, due to the constraints (4.1) and (4.2), the maximum flow value v is less than or equal to the capacity of any cutset separating s and t. Less obvious is that the maximum flow value is always equal to the minimum capacity over all cutsets separating s and t. A cutset separating s and t with minimum capacity is called a minimum cutset. This result is the Max-Flow Min-Cut Theorem, due to Ford and Fulkerson[16].

Max-Flow Min-Cut Theorem

For any flow network, the maximal flow value from the source to the sink is equal to the capacity of a minimum cutset separating the source and the sink.

The proof of the Max-Flow Min-Cut Theorem involves showing that if a minimum cutset is not saturated (capacity = flow), then there must exist a path from the source to the sink through which the flow between s and t could be augmented. Ford and Fulkerson used this flow augmentation approach to develop the first max-flow min-cut algorithm. Edmonds and Karp improved the algorithm by always searching for a flow augmenting path with a breadth-first search and were able to show a time complexity of O(n^5), where n is the number of nodes. Later algorithms by Dinic, Karzanov, Even, Malhotra et al., and others were successful in reducing the time complexity to O(n^3).

We can view the assignment graph as a flow network with source P_1 and sink P_2. The edge weights are interpreted as flow capacities, and we apply a max-flow min-cut algorithm to compute a maximum flow in the network. The value of this flow is the cost of a minimum cost assignment of the program. By establishing a maximum flow in the network, we can also find a minimum cutset of the assignment graph by identifying all nodes reachable from P_1 by paths consisting only of unsaturated edges. Since every minimum cutset of the assignment graph defines an optimal assignment, we can solve the optimal assignment problem for the two-processor case in O(m^3) time, where m is the number of program modules.
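The whole two-processor procedure fits in a few lines of Python with the networkx library. The sketch below builds an assignment graph for an invented four-module example (not the program of Fig. 1), interprets the edge weights as capacities, and reads an optimal assignment off a minimum cut between the two processor nodes.

import networkx as nx

# Hypothetical costs: exec_cost[M] = (cost on P1, cost on P2).
exec_cost = {'A': (2, 8), 'B': (5, 4), 'C': (7, 3), 'D': (6, 6)}
comm_cost = {('A', 'B'): 3, ('B', 'C'): 6, ('C', 'D'): 2}

G = nx.Graph()
for mod, (e1, e2) in exec_cost.items():
    # Stone's convention: the edge (M, P1) carries M's execution cost on P2,
    # and (M, P2) carries M's execution cost on P1, so that the cut edge
    # charges the cost that is actually incurred.
    G.add_edge(mod, 'P1', capacity=e2)
    G.add_edge(mod, 'P2', capacity=e1)
for (m1, m2), c in comm_cost.items():
    G.add_edge(m1, m2, capacity=c)

# networkx flow routines work on digraphs, so mirror every undirected edge.
cost, (side1, side2) = nx.minimum_cut(G.to_directed(), 'P1', 'P2')
print('optimal cost:', cost)
print('on P1:', sorted(side1 - {'P1'}), '  on P2:', sorted(side2 - {'P2'}))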

4.2. Assignment Graphs for Systems with More than Two Processors

This section describes how to construct a graph model for the n>2 processor problem. We will show suitable generalizations of the notion of an assignment graph and of the procedure for constructing such a graph. This generalization is described in [1].

An n-processor assignment graph for a program of m modules consists of a set of n+m nodes and a set of undirected, weighted edges. Each module M has an associated module node labeled M, and each processor P_i has an associated processor node labeled P_i. As before, the edge between two module nodes M and N represents information transfer between modules M and N, and the weight of the edge is the cost of communication between M and N, assuming they are not coresident on the same processor.


As in the two-processor case, for the graph model first described by Stone and Bokhari[2] the edges between two module nodes represent intermodule communication patterns, and the numbers on the branches represent the cost of intermodule communication between modules which are not coresident on the same computer. The intermodule communication cost between two coresident modules is assumed to be zero.

Similarly, every module node M is connected to each processor node P_i, 1 ≤ i ≤ n, by an edge (M, P_i) with a weight which is derived from M's execution costs. Suppose that the cost of executing module M on processor P_i, i = 1, 2, ..., n, is T_i. The edge between node M and node P_1 has weight (T_2 + T_3 + ... + T_n - (n - 2)T_1) / (n - 1), and likewise the branch to node P_2 carries the weight (T_1 + T_3 + ... + T_n - (n - 2)T_2) / (n - 1). In general, (M, P_i) has weight

\frac{\left( \sum_{j \neq i} T_j \right) - (n - 2)\, T_i}{n - 1}.
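For reference, these edge weights can be computed directly from the execution costs; the small Python function below, with an invented three-processor example, also checks that cutting the module away from all processors but one charges exactly its execution cost on that one processor.

def processor_edge_weights(T):
    # T[i] is the cost of executing the module on processor P_{i+1};
    # returns the weights of the edges (M, P_{i+1}) in the n-processor graph.
    n = len(T)
    total = sum(T)
    return [(total - T[i] - (n - 2) * T[i]) / (n - 1) for i in range(n)]

# Hypothetical execution costs 4, 6, 10 on P1, P2, P3:
w = processor_edge_weights([4, 6, 10])
# Assigning the module to P1 cuts the edges to P2 and P3, whose weights
# sum to the execution cost on P1:
print(w[1] + w[2])   # 4.0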

An n-cutset of the assignment graph is a set of edges such that 1) if removed, it separates the assignment graph into n subgraphs, each containing exactly one processor node, and 2) no proper subset is also an n-cutset. As in the two-processor case, each n-cutset corresponds to an assignment of the modules of the program. A module M is assigned to a processor P_i by an n-cutset if module node M is in the subgraph containing processor node P_i. The weight of an n-cutset is the sum of the weights of the edges in the n-cutset. The edge weights are defined such that the weight of an n-cutset is the cost of the associated assignment.


The problem of finding an optimal assignment is equivalent to that of finding a minimum weight n-cutset of the assignment graph.

Fig. 2 shows the assignment graph for a three-processor system, with P_1, P_2, and P_3 representing processors and A, B, C, D, and E representing program modules. If D is assigned to processor P_1, the edges to P_2 and P_3 are cut, and their weights total T_1, the cost of executing D on P_1. In general, the execution cost of a module M on a processor P_j contributes to the weights of all edges (M, P_i), 1 ≤ i ≤ n. For each processor P_i, i ≠ j, the contribution is T_j / (n - 1). If M is assigned to P_j, then the total contribution due to M's execution cost on P_j is (n - 1)T_j / (n - 1) = T_j, as it should be. The total contribution due to M's execution cost on P_i, i ≠ j, is 0, since n - 2 edges include T_i / (n - 1), and one edge, (M, P_i), contributes -(n - 2)T_i / (n - 1). An edge (M, P_i) with an infinite cost indicates that M must be assigned to P_i, since the cost of running M on any other processor would contribute an infinite amount to the program's total cost.

Figure 2. An assignment graph for a three-processor system.

A minimum n-cutset in an assignment graph for n processors can be found by exhaustive enumeration, but the computational complexity of such an approach makes it impractical for all but small problems. An m-module program in an n-processor system has O(n^m) possible distinct assignments. When n>2, finding an optimal assignment is difficult. For n>3, the problem of finding an optimal assignment is known to be NP-complete; i.e., it is in a class of "hard" problems for which efficient solutions almost certainly do not exist.
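For very small problems the O(n^m) enumeration is still workable. A brute-force Python sketch, reusing the hypothetical assignment_cost function from the Chapter II example, makes the growth of the search space obvious:

from itertools import product

def brute_force_optimal(exec_cost, comm_cost, n_proc):
    # Enumerates all n_proc ** m assignments; practical only for tiny m.
    m = len(exec_cost)
    return min((assignment_cost(exec_cost, comm_cost, list(a)), list(a))
               for a in product(range(n_proc), repeat=m))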

A two-processor flow can give information about the minimal n-cutset in an n-processor graph. Previous research[1] shows that a module node associated with a processor node by a two-processor flow is also associated with that node by a minimum n-cutset. Unfortunately, it is easy to construct examples in which a node that belongs with a particular distinguished node by a minimum n-cutset fails to be associated with that node by a two-processor flow. In the next chapter we will discuss the implications of these results and describe a new result which can substantially decrease the complexity of finding a minimum n-cutset.


CHAPTER V

SEARCH SPACE REDUCTION

In using an exhaustive search procedure to find an optimal assignment, we assume that each module might be assigned to any of the n possible processors by an optimal assignment. However, in many cases this is not true. We show that a simple modification of the n-processor assignment graph can be used to identify modules which must be assigned to a particular processor under any optimal assignment of the program. This can greatly reduce the size of the search space for optimal solutions. We present a number of experimental results indicating the magnitude of the reductions that we might expect in applying this result.

5.1 The Reduction Theorem

A max-flow min-cut algorithm requires that the flow network have exactly one source node and one sink node. When we use a max-flow min-cut algorithm to find an optimal assignment in a two-processor flow graph, we choose one processor node to be the source node and the other to be the sink node. Attempting to extend this approach to an n-processor system is difficult because we can no longer identify a unique source and sink.

However, we can modify the n-processor assignment graph so that we can apply a max-flow min-cut algorithm to obtain some information about module assignments for the n-processor case. In the n-processor assignment graph, we choose one of the processor nodes, say P_1, as the source node. We add a new node P' to the graph, and edges connecting it to each processor node except P_1. The weights of all the new edges are infinite. We call this graph the P_1-reduction graph. Then we let P_1 be the source node and P' be the sink node. By running a max-flow min-cut algorithm on the P_1-reduction graph, we find a minimum cutset which separates the nodes of the graph into two partitions, one containing P_1 and the other containing P'. Sometimes more than one cutset has minimum weight. In a two-processor graph or P_1-reduction graph, we refer to the minimum weight cutset which associates the fewest number of nodes with P_1 as the P_1-reduction cutset. We say that a module A is assigned to P_1 by a P_1-reduction cutset if module node A belongs to the same partition as P_1.

The following theorem establishes a relationship between the P_1-reduction cutset and the assignments of particular modules in any n-processor optimal assignment. Let P_1 be an arbitrary processor.


Reduction Theorem

If a module A is assigned to a processor P_1 by a P_1-reduction cutset, then A must be assigned to P_1 under any n-processor optimal assignment.

Proof:

First note that if A is assigned to P_1 by every minimum capacity cutset (X_j, X̄_j) of the P_1-reduction graph, then A is assigned to P_1 by the cutset (X, X̄), where P_1 ∈ X_j for all j and X = ∩_j X_j.

We prove the theorem by contradiction. Let A be assigned to P_1 by a P_1-reduction cutset (X, X̄), which is represented by cutset I in Fig. 3. Assume A is assigned to a different processor P_2 under an n-processor minimum cost partition II. Without loss of generality, assume I and II cross each other in the original graph, as shown in Fig. 3. Denote the four regions defined by these cutsets as P_1, P_2, A, and X according to the nodes appearing in these regions. Region X may be empty. The cutset weight c(U, V) is the sum of the weights of all branches between two regions U and V.

Figure 3. Two cutsets in a graph.

Since I is a P_1-reduction cutset, the weight of I is less than the weight of I-A, where I-A represents the cutset that partitions the graph into the two regions P_1 and P_2 ∪ X ∪ A, i.e., the cutset that does not include A with P_1. Thus

c(P_1, X) + c(P_1, P_2) + c(A, X) + c(A, P_2) < c(P_1, X) + c(P_1, A) + c(P_1, P_2).

We can find

c(A, X) + c(A, P_2) < c(P_1, A).    (5.1)

If the hypothesis is true, then since II is a minimum cost partition, the weight of II is less than or equal to the weight of II-A, where II-A is the cutset that partitions the graph into the two regions P_1 ∪ X ∪ A and P_2. We have

c(A, X) + c(A, P_1) ≤ c(P_2, A).    (5.2)

Combining (5.1) and (5.2),

c(A, X) < -c(A, X).

Since all costs are nonnegative, this is a contradiction. Hence A cannot be assigned to any processor P_2 ≠ P_1 by an optimal n-processor assignment.

This theorem is similar to, but stronger than, the one described by Stone[1]. Stone showed that A is associated with P_1 in some minimum cost partition of the n-processor graph if A is associated with P_1 by a two-processor flow algorithm. We have explored the condition under which A is associated with P_1 in any minimum cost partition of the n-processor graph.


The above theorem has a direct application in reducing the size of

the search space in attempting to find an optimal assignment. In the

algorithm that follows, let c(a,b) be the weight of the edge between

nodes a and b.

Procedure PREDUCE

1.  n <- number of processor nodes;
2.  for i = 1 step 1 until i = n do
    begin /* construct the Pi-reduction graph */
3.      source node <- Pi;
4.      add a new node Pn+1 to the assignment graph as the sink node;
5.      for j = 1 step 1 until j = n do
6.          if j ≠ i
            begin
7.              add edge (Pj, Pn+1) to the assignment graph;
8.              c(j, n+1) = ∞;
            end
9.      establish a maximum flow between Pi and Pn+1 using a max-flow min-cut algorithm;
10.     X <- {all module nodes reachable from Pi by a path of unsaturated arcs};
11.     X̄ <- {all module nodes not in X};
        /* (X, X̄) is the Pi-reduction cutset */
12.     condense the nodes in X into one node Pi';
        for all v ∈ X̄ do
13.         c(Pi', v) = Σ c(u, v) over all u ∈ X;
14.     remove Pn+1 and all edges incident on it;
    end

In this algorithm we first choose a processor node Pi as a source node. Then we add a new node Pn+1 as the sink node to the graph and connect each processor node Pj, j ≠ i, to Pn+1 by an edge of infinite weight. After running the max-flow algorithm on this graph we find a Pi-reduction cutset. Then we condense the nodes in the partition X with Pi into one node Pi'. If any node a in X has an edge to a node j in the other partition of the graph, there is an edge from Pi' to j in the modified graph. The weight of (Pi',j) is equal to the sum of the weights of the edges from all nodes in X to node j. After removing Pn+1 and its incident edges, we choose the next processor node as the source node and repeat the procedure above. We call this process of combining module nodes with a processor node Pi a Pi-reduction.
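Continuing the earlier sketch (again ours, under the same assumptions of a networkx graph with a "capacity" edge attribute, and with the function name p_reduce chosen for illustration), the whole of PREDUCE can be written as a loop over the processor nodes that calls reduction_cutset and then carries out the condensation of steps 11-13 by hand, summing the weights of parallel edges:

def p_reduce(G, processors):
    """Fix module assignments by a Pi-reduction for every processor node."""
    H = G.copy()
    fixed = {}                                     # module -> processor it is fixed to
    for p in processors:
        for m in reduction_cutset(H, processors, p):
            fixed[m] = p
            # fold module m into p, summing parallel edge weights (step 13)
            for nbr, data in list(H[m].items()):
                if nbr == p:
                    continue
                if H.has_edge(p, nbr):
                    H[p][nbr]["capacity"] += data["capacity"]
                else:
                    H.add_edge(p, nbr, capacity=data["capacity"])
            H.remove_node(m)
    return H, fixed

The dictionary fixed records the assignments that the Reduction Theorem allows us to fix, and H is the reduced assignment graph on which the remaining search is carried out.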

By the Reduction Theorem, a module that is associated with a processor Pi by a Pi-reduction cutset must be assigned to Pi by any optimal n-processor assignment. By finding the P-reduction cutsets for all the processors Pi we hope to find a number of modules whose assignment in the optimal n-processor partition can be fixed. Then when we attempt to find the optimal n-processor assignment, we only need to consider the remaining modules.


In this way we reduce the work of searching for an optimal assignment. In the next section we show through a number of examples that this algorithm results in a significant reduction in the size of the search space for most assignment problems, at least for programs which conform to the class of assignment graphs that we generated as test cases.

The success in applying the Reduction Theorem to the assignment problem leads us to ask if we can obtain any additional information from the set of Pi-reduction graphs about module assignments. In particular, what if a module A is associated with a processor node Pj by a minimum capacity cutset of the Pj-reduction graph but not by the Pj-reduction cutset, and A is not associated with any Pi, i ≠ j, by any minimum capacity cutset of the Pi-reduction graph? We might hope that A could then be shown to be assigned to Pj by any n-processor optimal assignment. Unfortunately, the following lemma shows that this is not true.

Lemma

If A is assigned to Pi by a minimum cutset of the Pi-reduction graph but not by the Pi-reduction cutset, and A is not assigned to any processor Pj (j ≠ i) by any minimum cutset of the Pj-reduction graph, then A is not necessarily assigned to Pi by an optimal n-processor partition.

We will prove this by exhibiting a counterexample. Fig. 4(a) is an assignment graph of a 3-processor system. P1, P2, and P3 represent processors. A, B, and X are module nodes. Edge weights c(A,P2) and c(A,P3) are zero.


Figure 4(a). An assignment graph of a three-processor system and its minimum cost partition.

Figure 4(b). A P1-reduction graph and the minimum cutsets.


Figure 4(c). A P2-reduction graph and the minimum cutsets.

Figure 4(d). A P3-reduction graph and the minimum cutset.


Processor assigned to module
  A   B   X     Cost of assignment

  1   1   1            6
  1   1   2            5  *
  1   1   3            6
  1   2   1            6
  1   2   2            5  *
  1   2   3            6
  1   3   1            7
  1   3   2            6
  1   3   3            7
  2   1   1           12
  2   1   2            7
  2   1   3           10
  2   2   1           10
  2   2   2            5  *
  2   2   3            8
  2   3   1           12
  2   3   2            7
  2   3   3           10
  3   1   1           12
  3   1   2            9
  3   1   3            8
  3   2   1           11
  3   2   2            8
  3   2   3            7
  3   3   1           11
  3   3   2            8
  3   3   3            7

Table 1. All the possible assignments and the weights of their cutsets for the graph in Fig. 4.

* optimal assignment


From the definitions of (processor node, module node)-edge weights,

(T2 + T3 - T1) / 2 = 3,
(T1 + T3 - T2) / 2 = 0,
(T1 + T2 - T3) / 2 = 0.

This implies that T1 = 0 and T2 = T3 = 3. Figures 4(b), 4(c), and 4(d) are the P1-, P2-, and P3-reduction graphs. The bold lines show the minimum cutsets in each reduction graph. Notice that A is assigned to P1 by a minimum cutset other than the P1-reduction cutset in the P1-reduction graph, and A is not assigned to P2 or P3 by a minimum cutset of the P2- or P3-reduction graph. Table 1 lists all the possible 3-processor assignments and the weight of the cutset for each. The three assignments marked with * are optimal. In the third assignment A is assigned to P2 instead of P1. Hence A is not necessarily assigned to P1 by an optimal 3-processor partition even if A is assigned to P1 by a minimum cutset in the P1-reduction graph, and not assigned to P2 or P3 by any minimum cutset in the P2- or P3-reduction graph.

5.2 Assignment Problem Simplification by P-reductions

The Reduction Theorem can be used to simplify the n-processor

optimal assignment problem. In this section we will show the efficiency

of this approach by giving some experimental results. These results are

for a particular class of assignment problems determined by the method

used to generate the assignment graph, and we will describe this method


first.

The graphs we used in the experiments were generated randomly.

First we generated the set of edges in the graph. We use a graph model

in which the probability of an edge existing between any pair of nodes

is a constant, 1/3.

We inserted edges between all pairs of nodes with probability 1/3

for each pair. The average degree of a node (number of incident edges)

is equal to (n - 1) / 3 for an n-node graph. After all edges were generated, we verified that the resulting graph was connected; that is, we

required that there exist at least one path between every pair of nodes.

If we found a set of nodes that is not connected to the remaining nodes

by any paths, we added an edge which links a node in this set to one of

the other nodes of the graph. In this way we generated graphs with a

relatively dense interconnection structure, which in turn tends to make

the assignment problem more difficult.

We then assigned weights to the edges. We assumed that communica¬

tion costs are uniformly distributed in the range 1 to 20 in our exam¬

ples. For each edge between two module nodes, we generated a random

number R between 0 and 1, calculated R * 20, and used the ceiling of this value as the weight of the edge. In a similar way we determined the

weight for an edge between a processor node and a module node. Since

there do not exist any detailed analyses of programs written for distributed processors, the choice of parameters is arbitrary.
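A generator in the spirit of this description might look as follows. This is an illustration only: the function name random_assignment_graph, the use of networkx, and the "capacity" edge attribute are our assumptions, and the sketch follows the text literally in sampling an edge between every pair of nodes with probability 1/3 and weighting it by the ceiling of R * 20.

import math
import random
import networkx as nx

def random_assignment_graph(n_processors, n_modules, p_edge=1.0/3.0, max_weight=20):
    """Generate a random, connected assignment graph as described in the text."""
    nodes = ["P%d" % i for i in range(1, n_processors + 1)] + \
            ["m%d" % j for j in range(1, n_modules + 1)]
    G = nx.Graph()
    G.add_nodes_from(nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if random.random() < p_edge:
                # weight = ceiling of R * 20, uniformly distributed over 1..20
                G.add_edge(u, v, capacity=math.ceil(random.random() * max_weight))
    # if the graph is not connected, link each extra component to the first one
    components = list(nx.connected_components(G))
    for comp in components[1:]:
        G.add_edge(next(iter(comp)), next(iter(components[0])),
                   capacity=math.ceil(random.random() * max_weight))
    return G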

Regarding the cost of executing module A on each of the n processors, we have the following equations from Stone's formula, where Ti is the cost of executing module A on processor Pi:

c(P1,A) = (T2 + T3 + ... + Tn - (n - 2)T1) / (n - 1),
c(P2,A) = (T1 + T3 + ... + Tn - (n - 2)T2) / (n - 1),
...
c(Pn,A) = (T1 + T2 + ... + Tn-1 - (n - 2)Tn) / (n - 1).

In generating the model, we selected the edge capacities c(Pi,A) rather than execution costs for A. Using these equations, we can calculate the values Ti, 1 ≤ i ≤ n, from the chosen edge capacities.
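These equations can, in fact, be inverted in closed form: summing all n of them shows that the sum of the capacities equals (T1 + T2 + ... + Tn) / (n - 1), so that

Ti = [c(P1,A) + c(P2,A) + ... + c(Pn,A)] - c(Pi,A),    1 ≤ i ≤ n.

For the example of Fig. 4, the capacities c(P1,A) = 3 and c(P2,A) = c(P3,A) = 0 give T1 = 0 and T2 = T3 = 3, in agreement with the values found there.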

If we find that an execution cost turns out to be zero, this is not really a problem. We can bias all of the Ti's by replacing each with Ti + σ for some positive σ. This would not influence the solution of the n-processor optimal assignment, since it causes the assignment cost to increase by the same constant for every possible assignment, and the optimal assignment(s) remain unchanged. Consequently, had we started with values Ti + σ ≥ 0 for all i, 1 ≤ i ≤ n, we could have subtracted σ from each term again without changing the identity of the optimal assignments, even though some of the values of Ti might now be zero.

Fig. 5 is a histogram of the graph size reduction for problems with

3 processor nodes and 6 module nodes. Fig. 6 is the case of 3 processor

nodes and 9 module nodes, and Fig. 7 describes the results for 6

processor nodes and 9 module nodes.

Figure 5. Histogram of the graph size reduction for the graphs with 3 processor nodes and 6 module nodes.

Figure 6. Histogram of the graph size reduction for the graphs with 3 processor nodes and 9 module nodes.

Figure 7. Histogram of the graph size reduction for the graphs with 6 processor nodes and 9 module nodes.

Figure 8. Average percentage reduction vs. the number of module nodes.

In each of the cases described

above we generated 100 examples.

The axis marked "graph size reduction" gives the percentage of

module nodes that are assigned to one processor in any optimal assign¬

ment. In other words, this is the fraction of nodes whose assignment in

the n-processor case is fixed by a P-reduction. The histograms show the

relative frequency of occurrence of each percentage reduction from the

100 test cases.

Fig. 8 shows the average percentage reduction as a function of the number of module nodes, for several different numbers of processor nodes. We see that for those cases with relatively few nodes, the reduction algorithm is able to fix the assignments of a large fraction of the modules. Also, the effectiveness of the reduction process decreases as the number of processors increases, for a fixed number of modules. Other histograms of the graph size reduction for various numbers of processor nodes and module nodes are found in the Appendix.

A P-reduction is based on the application of a max-flow min-cut algorithm, the complexity of which is O(r^3) for an assignment graph with r nodes. For an n-processor system we have to run the max-flow algorithm on n P-reduction graphs, each with m+2 nodes. The time complexity is O(nm^3). Suppose that for n processor nodes and m module nodes, we can expect an average total assignment graph reduction of x, as a fraction of the total number of module nodes. The time required to find an optimal assignment of the remaining m(1-x) modules by exhaustive enumeration is O(n^(m(1-x))). Thus, if we first do P-reduction and then find the optimal n-processor assignment for the reduced problem, the time complexity is O(n^(m(1-x)) + nm^3). If x is large enough, this represents a significant improvement over the time required for exhaustive enumeration of m module nodes, which is O(n^m).
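As a rough illustration with made-up numbers: for n = 3 processors, m = 9 modules, and an average reduction of x = 2/3, exhaustive enumeration must examine n^m = 3^9 = 19,683 assignments, while after the P-reductions only n^(m(1-x)) = 3^3 = 27 assignments remain to be enumerated, plus the O(nm^3) cost of the n max-flow computations.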

We noted above that as the number of module nodes and processor

nodes increases, the fraction of modules whose assignments can be fixed

decreases. Consequently, we need some means other than exhaustive

enumeration for finding optimal or near-optimal assignments even when we

use P-reductions to reduce the size of the search space. In the next

chapter, we introduce an efficient heuristic assignment algorithm to

allow us to find "good" assignments even when the number of processors

or modules is large, and we show that for a large number of test cases,

it in fact produces an optimal assignment.


CHAPTER VI

HEURISTIC ALGORITHM TO FIND NEAR OPTIMAL MULTIPROCESSOR ASSIGNMENT

In this chapter we introduce a tree structure called a G-H tree

which is flow-equivalent to a network flow graph. Then we describe a related structure called an affinity tree which can be used as a basis for

a heuristic algorithm to find optimal and near-optimal n-processor

assignments. Finally we present some experimental results which show

the effectiveness of this algorithm.

6.1 G-H Tree

Gomory and Hu showed that for a flow network with n nodes, maximum

flows between all the n(n - 1) / 2 pairs of nodes can be obtained by the

application of only n-1 max-flow min-cut algorithms[3]. They described

a procedure for constructing a tree, called a Gomory-Hu cutset tree,

from a xlow network with values on the edges of the tree and with the

following properties:

1) The maximum flow value between any pair of nodes Na and Nb in the flow network is equal to

min( v(Na,Nk1), v(Nk1,Nk2), ..., v(Nkr,Nb) ),

where Na, Nk1, ..., Nkr, Nb is the path between Na and Nb in the tree and v(Ni,Nj) is the value associated with the edge between nodes Ni and Nj in the Gomory-Hu cutset tree (abbreviated as G-H tree).

2) Any edge (Ni,Nj) of the G-H tree corresponds to a minimum cutset (X, X̄) separating Ni and Nj in the original network. The set X is the set of nodes connected to Ni in the G-H tree when the edge (Ni,Nj) is removed.

Fig. 9(b) is the G-H tree for the network in Fig. 9(a). For example, the maximum flow between nodes 1 and 5 is 13, while that between 3

and 4 is 14. The G-H tree is said to be flow-equivalent to the original

graph because of property 1 listed above.
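Both properties are easy to exercise with the Gomory-Hu tree construction available in the networkx library (a sketch of ours, not part of the thesis); the library stores the cut values in the "weight" attribute of the tree edges, and the helper name max_flow_from_gh_tree is an assumption of the sketch.

import networkx as nx

def max_flow_from_gh_tree(T, u, v):
    """Property 1: the maximum flow between u and v equals the smallest
    edge value on the unique path joining them in the G-H tree T."""
    path = nx.shortest_path(T, u, v)
    return min(T[a][b]["weight"] for a, b in zip(path, path[1:]))

# G is an undirected flow network with a "capacity" edge attribute.
# T = nx.gomory_hu_tree(G, capacity="capacity")
# max_flow_from_gh_tree(T, 1, 5) then agrees with
# nx.maximum_flow_value(G, 1, 5, capacity="capacity").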

To construct a G-H tree the procedure is briefly as follows. We choose two nodes as source and sink, and do a maximum flow computation to find a minimum cutset. The nodes in the network are separated by the minimum cutset into two parts A and Ā. We represent this by two nodes connected by an edge bearing the cutset value v1 (Fig. 10). In one node are listed the nodes of A, in the other those of Ā. Next we choose two nodes in A (or two in Ā), and solve the flow problem in the condensed network in which the nodes listed in Ā (or A) are combined into a single node. The resulting cutset has value v2 and is represented by an edge of weight v2 connecting the two parts into which A (or Ā) is divided by the cutset, say A1 and A2. The node Ā (or A) is attached to A1 if it is

Figure 9(a)

Figure 9(b)

Figure 10

in the same partition as A1, and to A2 if it is in the same partition as

A2 (Fig. 11).

We continue to select pairs of nodes, both of which are contained

in a single node in the tree being constructed, and compute the maximum

flow between them. Each time we do this, we create a new node and edge

which are added in such a way that the resulting structure is still a

tree. This process is terminated when each node of the tree is labeled

with exactly one node. Each time we solve a flow problem in a network

equal to or smaller in size than the original. When the algorithm terminates, the resulting tree is a G-H tree for the original flow network.

For a more detailed description of this algorithm, see [17].

We illustrate this process by the example of Fig. 9. We arbi¬

trarily choose nodes 2 and 6, and compute a maximum flow between them.

We find the minimum cutset to be ((1, 2), (3, 4, 5, 6)) with value 17 as

indicated in Fig. 12(a). The first step in constructing the G-H tree is

shown in Fig. 12(b).

Next we choose nodes 1 and 2, since they belong to the same node in Fig. 12(b). In obtaining the maximum flow between nodes 1 and 2, we combine 3, 4, 5, and 6 into a single node. Fig. 13(a) shows the resulting graph and the minimum cutset separating 1 and 2, which is ((1), (2, 3, 4, 5, 6)) with value 18. The tree shown in Fig. 13(b) includes an edge between nodes 1 and 2 with weight 18, with node 2 connected to the

Figure 12(a)

Figure 12(b)

Figure 13(a)

Figure 14(a)

node labeled (3, 4, 5, 6).

Now we choose 3 and 6; 1 and 2 are condensed into a single node (Fig. 14(a)), while nodes 3, 4, 5, and 6 are no longer combined. The minimum cutset separating 3 and 6 is ((1, 2, 6), (3, 4, 5)), with value 13. This is reflected in the tree in Fig. 14(b).

To find the maximum flow between nodes 4 and 5, we can combine 1, 2, and 6 into a single node (Fig. 15(a)). The minimum cutset in this graph is ((4), (1, 2, 3, 5, 6)) with value 14. Since 4 is alone in its partition, we add a node labeled 4 to the tree in such a way that 4 is separated from all other nodes in the tree by an edge of weight 14 (Fig. 15(b)).

Finally we consider the maximum flow between nodes 3 and 5, taking 1, 2, and 6 as one node. The minimum cutset separating 3 and 5 is ((3), (1, 2, 4, 5, 6)) with capacity 15, and the completed G-H tree is shown in Fig. 9(b).

Since the G-H tree of r nodes requires r-1 maximum flow computations, the time complexity for constructing the G-H tree is bounded by O(r^4). However, since each time we compute a new maximum flow, we usually combine sets of nodes in the original flow network into single nodes in the flow network for which we compute the maximum flow, the expected time complexity should be much less than this bound.

Figure 14(b)

Figure 15(a)

Figure 15(b)

The G-H tree is flow-equivalent to the original flow network. However, there has been some loss of information in making this transformation. We cannot be sure of finding an optimal n-processor assignment by

searching for a minimum value partition of a G-H tree derived from the

n-processor assignment graph. Nevertheless, we will make use of a simi¬

lar but simpler transformation procedure as a basis for a heuristic

algorithm for finding optimal or near-optimal assignments. This

transformation and the heuristic algorithm that employs it are the sub¬

jects of the next section.

6.2 Heuristic Procedure for Determining Multiprocessor Assignment

Since in an n>3-processor system the problem of finding an optimal

assignment is NP-complete, it is not likely that we will be able to

develop an efficient algorithm for finding an optimal assignment.

Instead, we are led to the question of whether there is any efficient

approximation algorithm for finding near-optimal assignments, that is, assignments with costs that are very close to that of an optimal solution, at

least most of the time. Such an algorithm would be very attractive in

terms of the feasibility of its application to practical problems.

The G-H tree for a system consisting of n processors and m modules

has n+m nodes and is constructed in time bounded by O((n+m)^4). In this

section we first define a structure called an affinity tree which is

constructed in a similar fashion but which has fewer nodes. We then use


it as a basis for a heuristic algorithm for determining n-processor

assignments. We also show some results of the application of this

heuristic to a number of artificially generated test problems.

An affinity tree is a tree consisting of n nodes generated from an

n-processor assignment graph. It is related to a G-H tree, but each

node is labeled with exactly one processor node and some (perhaps none)

module nodes of the original module assignment graph. A node in an

affinity tree may represent more than one node in the original graph.

For an n-processor system we can get an affinity tree with n nodes.

The following algorithm generates an affinity tree (a sketch of one possible implementation is given after the listing).

(1) Create a node N, and list all of the nodes of the assignment graph in N.

    Let V = {N}    /* V is the set of nodes in the (partially constructed) affinity tree */

    Let E = ∅      /* E is the set of edges in the affinity tree */

(2) Construct an assignment graph G in which each node ni listed in N is a separate node in G. For any other node N' ∈ V, all of the nodes listed in N' are condensed into a single node labeled N' in G. (Edge (N',ni) exists in G if an edge (nj,ni) exists for some nj in N' in the original assignment graph. The weight c(N',ni) is the sum of the edge capacities c(nj,ni) for all nj such that nj is a node listed in N' and (nj,ni) is an edge in the original assignment graph.)

(3) Arbitrarily choose two processor nodes Px and Py listed in N, and find the maximum flow f and minimum cutset (Vx,Vy) between Px and Py such that Px ∈ Vx, Py ∈ Vy.

(4) Create two new nodes Nx and Ny. In Nx, list all the nodes in Vx that were listed in N, and in Ny list all the nodes in Vy that were listed in N. Remove N from V.

(5) Add an edge (Nx,Ny) to E, with weight f.

(6) For every N' ∈ V such that (N',N) ∈ E with weight f', if N' ∈ Vx, replace (N',N) by (N',Nx) with weight f', and if N' ∈ Vy, replace (N',N) by (N',Ny) with weight f'.

(7) Replace N in V by Nx and Ny.

(8) Choose a node N' in V with two or more processor nodes listed in it. If no such node exists, stop; else let N = N', and go to (2).

Fig. 16(b) is an affinity tree of the module assignment graph shown in Fig. 16(a). The nodes labeled P are processor nodes and the nodes labeled m are module nodes. Some assignment graphs have two or more distinct affinity trees.

Figure 16(a)

Figure 16(b)

Our heuristic algorithm to assign modules in an n-processor distri¬

buted system uses the affinity tree as a basis. Basically our reasoning

is as follows. Each edge in the affinity tree represents a minimum

value cutset between two processor nodes in the n-processor assignment

graph. The set of cutsets associated with the edges of an affinity tree

partition the nodes of the n-processor assignment graph into n subsets, each containing exactly one processor node. By making the interpretation that a module node A belonging to the same subset as processor node Pi means that module A is assigned to processor Pi, we see that an affinity

tree uniquely defines an assignment. The partition has been constructed

by combining only minimum capacity cutsets between processor nodes into

a single n-cutset of the assignment graph.

If each of the minimum cutsets defined by the affinity tree were

disjoint (contained no edges belonging to any other cutset), the result¬

ing n-cutset should define an optimal assignment. Unfortunately, these

cutsets will be disjoint only in a few degenerate cases, and commonly

will share many edges with other cutsets defined in the affinity tree.

However, we will approximate a minimum value n-cutset by the set of

minimum cutsets from the affinity tree under the supposition that it is

usually better to separate two processor nodes in the n-processor

assignment graph by a minimum cutset than by a cutset which does not

have minimum value. In some sense, the modules that belong to the same

node of the affinity tree as a processor P have an "affinity" for that


processor; hence the term affinity tree.

The algorithm for determining an assignment can be stated as two simple steps.

Step 1

Generate an affinity tree from an n-processor assignment graph.

Step 2

Assign each module to the processor with which it is coresident in the same node of the affinity tree.

This algorithm greatly reduces the computation required for assignment on a distributed system. When it is used to determine module assignments, the results may be sufficiently close to the optimal results to be useful in a pragmatic sense. This algorithm uses a maximum flow algorithm as a subroutine, so the complexity of the module assignment is dependent upon the implementation of the maximum flow algorithm used. Fortunately, the maximum flow algorithm is among the class of algorithms with relatively low computational complexity. There are various modifications of the algorithm available that take advantage of special characteristics of the network to obtain increased efficiency [18, 19, 20, 21]. The least complexity of the algorithm is O(r * e), where r is the number of nodes and e is the number of edges in a graph. Since there are r(r - 1) / 2 edges in a fully connected graph, the value of e is at most r(r - 1) / 2 and the algorithm has complexity O(r^3). If there are n processors and m modules in a system, only n-1 two-processor flows are run. Thus the complexity of our algorithm is O(n(m+n)^3).

6.3 Experimental Results

To determine the effectiveness of the heuristic algorithm described

in the previous section, we tested it on 100 randomly generated exam¬

ples. In these examples we allowed both the number of processor nodes

and the number of module nodes to vary. For each example we generated

all possible module assignments in an exhaustive search for a minimum

cost assignment. As we examined each possible assignment we calculated

its cost, and we recorded the maximum and minimum cost as well as the

mean cost over all possible assignments for a given test case. Table 3

shown in the Appendix lists these values for each example, comparing the

cost of the assignment generated by our heuristic algorithm with that of

the optimal assignment and the mean cost over all possible assignments.

In examples 11, 63, 65, and 77, we found several different affinity trees for each; they indicate different heuristic assignments. In three examples, 11, 63, and 77, the assignments represented by the different affinity trees were all optimal. In example 65, we have four affinity trees, but only one of them shows the assignment with minimum cost, and we use it to calculate the deviations listed in the table involving the "near optimal" assignments. The percent deviations described in Tables 2 and 3 are computed by taking the difference between the two costs, dividing by the smaller of the two costs, and then multiplying by 100.
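For instance, with illustrative numbers, a heuristic assignment of cost 57 and an optimal assignment of cost 50 give a deviation of (57 - 50) / 50 * 100 = 14%.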

Table 2. Average percent deviations of the heuristic assignment cost from the optimal assignment cost and from the mean assignment cost, grouped by the number of processor nodes and module nodes.

We find from Table 3 that for 73% of the examples the heuristic algorithm gives an optimal module assignment. For the remaining examples the deviation of the cost of the "near optimal" assignment (the heuristic solution) from the cost of an optimal assignment is around 10%, with the maximum deviation being less than 14%. Even in the worst

case the cost of the heuristically determined assignment is much less

than the mean value of the cost for all the possible assignments.

In Table 2 we display the deviations for different classes based on

the number of processors and number of modules. We consider two cases:

one in which we include all the examples in the class and one in which

we only include those examples for which the heuristic solution was not

optimal. The average deviation of the approximate solution cost from

the optimal solution cost for all 100 examples is 1.67%. Although as

the number of modules nodes and the number of processor nodes in the

assignment graph increases the average deviation tends to increase, it

is clear that considering the increased number of all possible assign¬

ments. the heuristic algorithm is still an efficient means for finding

"good" assignments, i.e.. assignments which have costs only slightly

higher than an optimal assignment.


CHAPTER VII

SUMMARY AND CONCLUSIONS

7.1 Summary of Results

We studied the program module assignment problem in distributed

systems. The graph model developed by Stone is useful in the two-

processor case because the optimal assignment can be efficiently found

by the application of a max-flow min-cut algorithm. Although the graph

model can easily be extended to the n>2 processor case, the n>3-

processor assignment problem is known to be NP-complete, and no effi¬

cient algorithm for n>3 has been discovered.

We can often reduce the complexity of the n-processor assignment

problem by identifying modules that must be assigned to specific proces¬

sors under any optimal assignment. The Reduction Theorem which proves

this shows how to determine which modules, if any, will be assigned to a particular processor Pi under any optimal assignment. We construct a Pi-reduction graph and apply a max-flow min-cut algorithm to it to find a Pi-reduction cutset which partitions the nodes of the reduction graph into two subsets. Any module belonging to the same subset as Pi is assigned to Pi by any optimal n-processor assignment. This often allows


substantial reductions in the number of modules whose assignments in an optimal n-processor assignment remain to be determined, although the effectiveness of this procedure decreases as the number of modules and/or processors increases. This was demonstrated experimentally using a number of randomly generated assignment graphs for several combinations of numbers of modules and processors.

Even after applying these P-reductions, we are still left with the

problem of finding an optimal assignment based on the reduced assignment

graph. The heuristic algorithm that we used to find "good" solutions is based on a structure called the affinity tree, which is related to the G-H tree, a flow-equivalent transformation of a flow network. We constructed an affinity tree and used it to identify a partition of the n-processor assignment graph that represents (hopefully) a near-optimal module assignment. The heuristic algorithm yielded very good results in a variety of test cases based on randomly generated graphs. In a pragmatic sense the resulting assignments are almost always optimal, and when the resulting assignment is not optimal, its cost is usually very close to that of the optimal assignment, especially when compared to the average cost of all possible assignments. The time complexity of this algorithm is bounded by O(n(m+n)^3), where n is the number of processors and m is the number of modules in the system or in the reduced system.


7.2 Suggestions for Future Research

Since the heuristic algorithm is not guaranteed to yield an optimal result, we can use the solution it produces as a starting point for an iterative search method such as the one described by Bryant and Agre[12]. Suppose we have n processors and m modules in a system. In the search algorithm they employed, they pick an order in which the module nodes are to be considered. They select an initial assignment, compute its cost, and then perturb it to see if this cost can be reduced. They consider assigning the first module (call it M1) to all processors other than the one on which it is currently located, while holding all other module assignments fixed. If any of these alternative assignments of M1 leads to a lower cost assignment, we move M1 to the processor which minimizes the assignment cost. Then we repeat the process using the second module M2. After finishing with the mth module, we continue again with M1. When we reach a point such that moving the module under consideration to a processor other than the one to which it is assigned at the start of the iteration does not lead to a lower total cost, we use the current assignment as our best guess for an optimal assignment. Note that for each module we must examine n-1 different assignments, and the time required for a complete pass over the m modules is therefore O(mn). By making an informed choice of initial assignment using our heuristic assignment algorithm, we might expect fewer iterations to be required, and we might also hope that by starting "closer" to a globally optimum solution, there is less chance the algorithm will stop on finding a locally optimum solution (relative to the assignment of the module under consideration when the algorithm terminated).
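One possible coding of this search is sketched below; it is our own illustration, not Bryant and Agre's program. The name local_improvement is ours, cost is assumed to be a function that evaluates the total cost of a complete assignment, and the starting assignment would typically be the one produced by the affinity-tree heuristic of Chapter VI.

def local_improvement(cost, assignment, processors):
    """Iteratively move single modules to cheaper processors until no move helps."""
    current = dict(assignment)                  # module -> processor
    improved = True
    while improved:
        improved = False
        for module in current:                  # consider M1, M2, ..., Mm in turn
            best_p = current[module]
            best_c = cost(current)
            for p in processors:                # the n-1 alternative placements
                if p == current[module]:
                    continue
                trial = dict(current)
                trial[module] = p
                c = cost(trial)
                if c < best_c:
                    best_p, best_c = p, c
            if best_p != current[module]:
                current[module] = best_p
                improved = True
    return current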

This thesis has considered the problem of determining an optimal

assignment for a program in which each module is constrained to be

assigned to a single processor for the entire program execution. Module reassignment after the beginning of program execution is not allowed. The cost functions used to construct an assignment graph are only the aggregate module execution costs and intermodule communication costs. If we allow modules to be reassigned during program execution, the cost of reassignment will influence the cost of the optimal dynamic assignment. Dynamic module assignment for a program in a distributed system (especially an n>2-processor distributed system) is an open problem for future research. The model we used for the static case can be extended to the dynamic case. The cost function which should be considered can include the module reassignment cost and the module residence cost. The latter is incurred even when the module is inactive; it is a function of the processor on which the previous instance of that module was executed or the next instance is to be executed. The former is incurred when a module, possibly an inactive one, is reassigned during program execution.

If we have a G-H tree generated from a module assignment graph, let

some nodes in the G-H tree represent processors and some represent

modules. Because of the locality property of the tree, we can get


some heuristic knowledge about assigning module nodes to processor

nodes.

First, if a module node is attached only to a processor node, then the module should be assigned to that processor. For example, in Fig. 17(a), we can assign module nodes C and D to processor node P1.

In addition, if one module node is attached only to another module node, we can condense the two nodes into one node. For example, we can condense node F into node E and assign F to whatever E is assigned to. In this way the assignment problem can be simplified. The pruned tree is shown in Fig. 17(b).

For module nodes attached to several processor nodes, there are several choices for the assignment. Each is formed by means of breaking a group of edges in the G-H tree. We use the "+" operation from Boolean algebra to express choosing this pattern "or" choosing that pattern. The combination of corresponding edges broken to form patterns is based on the following principle: if a group of module nodes is attached to n processor nodes, in other words, if there are n parallel branches in the subgraph, then n-1 branches have to be broken in one pattern. We use the "*" operation from Boolean algebra to express breaking this edge "and" breaking that edge. There may be several serial edges in one branch; if any one of them is broken, the branch will be broken.

Figure 17(a)

Figure 17(b)

Thus for the subtree in Fig. 17(b), to assign module nodes A and B we can choose links to break like this:

a * (c + d) + b * (c + d) + a * b

which expands to

a * c + a * d + b * c + b * d + a * b

where a * c corresponds to assigning module A to one processor and module B to another, and a * d corresponds to assigning both module A and module B to the same processor.

In this way we might expect to greatly reduce the amount of searching required for finding the desired assignment. This reduction is due to the fact that the principle used to form the assignment patterns has eliminated many of the original n^m patterns from consideration. We have a limited number of patterns to choose from instead of n^m. Recall that for a system with n processors and m modules there may exist n^m assignments.

As the use of computer networks matures, these and similar problems dealing with the distribution of a program's execution to take advantage of the various resources available in the network will become increasingly important. The results contained in this thesis will help provide a basis for further research in this area as interest in it continues to grow.


REFERENCES

[1] H.S. Stone, "Multiprocessor Scheduling with the Aid of Network Flow Algorithms," IEEE Trans. Software Engineering SE-3(1), pp. 85-93 (January 1977).

[2] H.S. Stone and S.H. Bokhari, "Control of Distributed Processes," Computer 11(7), pp. 97-106 (July 1978).

[3] R.E. Gomory and T.C. Hu, "Multi-Terminal Network Flows," Journal of the Society for Industrial and Applied Mathematics 9(4), pp. 551-570 (1961).

[4] H.S. Stone, "Critical Load Factors in Two-Processor Distributed Systems," IEEE Trans. Software Engineering SE-4(3), pp. 254-258 (May 1978).

[5] J.B. Sinclair, "Critical Delays for Optimal Assignments in Broadcast Networks," TR No. 8210, Department of Electrical Engineering, Rice University, Houston, TX (August 1982).

[6] D. Gusfield, "Parametric Combinatorial Computing and a Problem of Program Module Distribution," Journal of the ACM 30(3), pp. 551-563 (July 1983).

[7] G.S. Rao, H.S. Stone, and T.C. Hu, "Assignment of Tasks in a Distributed Processor System with Limited Memory," IEEE Trans. Computers C-28(4), pp. 291-299 (April 1979).

[8] T.A. Gonsalves, "Heuristic Algorithms for Distributed Processor Scheduling with Limited Memory," Master's Thesis, Department of Electrical Engineering, Rice University, Houston, Texas (June 1978).

[9] J.B. Sinclair, "Optimal Assignments in Broadcast Networks with Transmission Delays," TR No. 8205, Department of Electrical Engineering, Rice University, Houston, TX (July 1982).

[10] J.B. Sinclair, "Dynamic Assignment in Distributed Processing Systems," PhD Thesis, Department of Electrical Engineering, Rice University, Houston, Texas (August 1978).

[11] S.H. Bokhari, "A Shortest Tree Algorithm for Optimal Assignments across Space and Time in a Distributed Processor System," IEEE Trans. Software Engineering SE-7(6), pp. 583-589 (November 1981).

[12] R.M. Bryant and J.R. Agre, "A Queueing Network Approach to the Module Allocation Problem in Distributed Systems," Performance Evaluation Review 10(3), pp. 181-204 (Fall 1981).

[13] W.W. Chu, L.J. Holloway, M.T. Lan, and K. Efe, "Task Allocation in Distributed Data Processing," Computer 13(11), pp. 57-69 (November 1980).

[14] V.B. Gylys and J.A. Edwards, "Optimal Partitioning of Workload for Distributed Systems," Digest of Papers, COMPCON Fall 76, pp. 353-357 (September 1976).

[15] T.C.K. Chou and J.A. Abraham, "Load Balancing in Distributed Systems," IEEE Trans. Software Engineering SE-8(4), pp. 401-412 (July 1982).

[16] L.R. Ford, Jr. and D.R. Fulkerson, "Maximal Flow through a Network," Canadian Journal of Mathematics 8(3), pp. 399-404 (1956).

[17] T.C. Hu, Integer Programming and Network Flows, Addison-Wesley, Reading, Mass. (1970).

[18] L.R. Ford, Jr. and D.R. Fulkerson, Flows in Networks, Princeton University Press (1962).

[19] J. Edmonds and R.M. Karp, "Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems," JACM 19(2), pp. 248-264 (1972).

[20] S. Even, "The Max Flow Algorithm of Dinic and Karzanov: An Exposition," Report No. MIT/LCS/TM-80, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass. (December 1976).

[21] V.M. Malhotra, M.P. Kumar, and S.N. Maheshwari, "An O(|V|^3) Algorithm for Finding Maximum Flows in Networks," Information Processing Letters 7(6), pp. 277-278 (October 1978).


APPENDIX

Page 85: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

f re

qu

eu

cy

31

I

Si n n VOt^COCOOOOOiHOi-l CO ri

m CO

SÊ 8? o en h o <n h o co vo o co vo O CO NO O CO VP o co VO O co vo O CO VO O CO VO O CO VO D CO VO

&&&&&&& O CO h O CO Is o O CO vo O CO VO o O CO vo O CO vo O O ro vo o co vo O O CO VO o CO VO o O to VO O CO vo o

44 p« os u co a>

44 <4-» • « n o o *o

«M O d

d o a> •^4 *H ■M d o *o d o •d 6 o u cs

r-l <D N *d

d (0 03 44 w CU <0 o3 *0 u o «o d 4) U

44 O <M <0 0) <H 4> O o

O

. H

isto

gra

m

wit

h

3 pr

a> d N O

00 rH

•H 4> «0 4-» U

o d 44 P 00

Pi T3 fltf 4) P P oo

o oo vo *n co H N co

o oo vo m co H O m in vo h oo o\ o

Page 86: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

000000%: >>>>>>>>>>

> rrrn-rn-rrr)

>>'>))

:r/7

■/ ■/

) LIA

82

o S 4>

OcnOOOOOOOOOOOOOtHO H Cï

o d N O

fOvoo>(S^oOH^,voa\N»nooH^,r^O o •nOinHvoHhNhMoowoo't^ *3* © *d d WhO^hH^OOH^Wciin^N'OO P« *0 cj^f,o\H^'o«Htnm«ocnohO d d ÛOVO^NHOShVi^MOÛOt'^WHO U U OOh\Oin^«HOO\OOh»0^ffJMHO 00

(0 •d Q< «3 P 00

<D ,d 4-» •

CO n 0 0 *o <w O

d d 0 0 •T4 4-» d O ’O d 0 •d a <D U r-

T*i 0> N *0 •*4 d « d

•d CA a d d "O *4 c 00 d

4> M A 0 +■» CO

CA (M 4> O O

O a U d p< (4 00 rr> O •M .d (A •M •H •*4 en *

• o> rH

O M P 00 •H fci

OmHhmOymHhNM^OVONOO^O HHcs«w^t«n»n\ohhOOCoo\o

Page 87: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

000000%

: 7

5

* > n

imi

) i r

rrrr

r / n

11m

) -J

21

rn

rrn

r/v

v v i

545455%

: 9

2Z

ZZ

3

83

Q 3 Q>

A

s s

OOOOOOOOOOOOOOOOOOOr-l«n

V tf N O

o\^oO(r>r*NvoH^ov)OMfooror^M'€H<no o OVOHhC<OOcOO>^OinO'OHhC<OOtoO\^0 ,3 3 ONcnoodr-rHvoo*no^to\<nooc^r*^HVoo«no 0**0 O^HhMCO(nO\^D«ftO'OHh«eOcnONtO It 0> 0\ffJ»c<>H'00^0^0\WOOMhHNOOlOO H H O'OHhM^ïO^^OlOOVOrthMOOWONVO U>

O 0\ W w (S hHVOO^O^^rOOONhHVCOViO

Fig

ure

20.

His

tog

ram

of

the

gra

ph

size

red

ucti

on fo

r th

e

gra

phs

wit

h

3 pro

cess

or

nodes

and

22

module

nodes.

Page 88: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

00

00

00

%:

g g

rrrr

rrr n

ni

/ trm

n ) u

n / n

i) n

// u

n 7:

2:11

84

>> O S Q>

D \± ^ C40000000000000000000000000

r-4 O N

'^CQ<S»n0\fnt^rH»n00<SV0O^t*00cS«O0>C0r^rHV^00cS'OO «nO'oHvo«shwwwoN^O‘no'OH'ô^htnoo(o^,i,o r=< Ht0,tVCh0\OM««nV0C0OHW^VOh0\O<S(0V»'O»O P« sOCSOO^OVOfOONtnHt^CO D\ÛMM^O'Orr>0\>nHMnO «* ^O^ffi»(nhM'OH^O«nO^O\MOOcOhNVOH\OOIOO V* OOVOiOfO(SOO\h^^fOHOûOVÛinW(SOO\h'Û^MHO «0

Otohrl^^WVOO^OO^OOWhHlftChcnVOO^OONVûO HHHMM(nco«n^^io«n«ovo\ovorHr%oooocoo\^o

red

ucti

on

Fig

ure

21

. H

isto

gra

m

of

the

grap

h

size

red

uct

ion

for

the

gra

ph

s w

ith

3 p

roce

ssor

no

des

and

26

mod

ule

no

des

.

Page 89: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

ZZ

ZZ

Z

85

o a

«n vo vo o\ v© oo H H H H (S

O P C4 O

# ï£ ^ ^ # M«M O O O O O O O OOOOOO *P P O O O O O O Ck *o oooooo id a> OOOOOO WiM OOOOOO «o

10 •P o« «s 60

a> WP

t4 to O 4>

<w *P O

a q O

a> 44 »-H O p P «o

•O O P a 14

tn <u N *C

•H p to «

TCJ P P< <o P *o H O 60 p

O H «P O +4 V)

to <H O O o

O B U P & Wi 60 ^ O 'M ,p (O 44

•H «H w *

cs CS

O w P 00

•H E*

OOOOOO cl S’ vo 00 o

Page 90: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

fre

qu

en

cy

86

M HH TH

© d N O

* * *H *rt © -M

OOOOOOOOO O ooooooooo «dd ooooooooo pu *a ooooooooo © © OOOOOOOOO U U omomoooioo *o

rd CU © H 60 © •d -M

• ©

o © o

d d o

•H © 4-» »—» © d d

o © 6 m

60 © N •Ü

•rt d W ©

*d M cu © © *a u o 60 d © u

wa © -M ©

© <W © O o

o 8 © Q* oo o

td W 4-»

•f* •*4 a *

• cn

© d 60 to

ON^hO^inhO H<sco«n\ot^ooo

Page 91: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

frequency

87

•ft rt H Q> A N O

• • •• •• •• •• •• •• •• •• •• •• •« *H *H w -v*

oo\»h\o»ft«ft^(nNHO o OOHN«^«ft'0h»0\O »d P O0\00hVôio^wNHOO 04*0 OOrlNW^Mft'O [> OO 0> O «S 4> O0\»hV3«ft^«NHOO U U OOHN(ft^<ft'0h°°0\ O OO

OftWhVOlft'tWP^HOO HCSfft^'ft'Of-OOONO

(0 ,P P< « u 00

o

4-» • 0)

m <o o *o <W o

P P o © •H «M d O P O •o a <D Vi iH

rH <D N *p •H d (A oa

•P (A & O o *o Vi o 00 d

© VI o

«M (A (A ©

O O o

E Vi P OI U 00 <** o *M *P «0 4-> •H •ft u *

cs

o

p 00 •rt

IX

Page 92: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

freq

uen

cy

88

B XL VOÛOmOOOOOOOOOOOOOfH

£ B* Eê

O O O O O O O O O O O O O O O O O O O O O «n O «n O OM«OhO

* # 8? ^ O O O O O O O O O O O O O O O O «n © m © N irt t' O

ï£ B* & & O O O O O O O O O O O O D O O O «n © tn © M »n ©

* B* o © © O © © o © © © © © © © © © «n © m © n «o ©

U U 00

V) Pk « U

©

W O

£ O

O Pt o n © N

rfl eu © M 00 ©

rP

6 od M

a

» <n

© £ O

•rt *rt © <0 -M M

o P

fCt P* aO •rt

eu *o « 4)

(*

O'ONOOVJHt^tnO'OdOOlOHh-fOO ^Hr-f<SfOfo^«n*nvo'or^oooooN©

wit

h

4 p

roce

sso

r n

od

es

and

16

mod

ule

nod

es.

Page 93: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

000000%

:

)>?'

) T ri

im

nann

rrm

i

89

>1 o rt 4>

H - - mOOOOOOOOOOOOOOOOOOO *H

<o C N O

»AO^,3S,t^tn00w00(ShNhH'O.H\0O«nO o OHHHfqMWW't^^inVOVOhhOOOO^ONO P ^OOhVOV)^(ONrtOOyW^VO^Tf<nc<HOO 04*0 HW«OhO\H(n'nhO'ON^'O00ON^\O00O «s o vor^wtohm^^HOO^ovoNOsmHhwo Vf MftC<0««nwO»\0(nHO\>O^HO\h^MO 60

O^0\^0Nf0WWC0«hMhH'OH'OO‘OO«no rlHf<NWW,t^»n|ft'OVO^hOOOOO>OsO

Fig

ure

26.

His

togra

m

of

the

gra

ph

size

reducti

on fo

r th

e

gra

phs

wit

h

4 p

rocess

or

no

des

and

21

mo

du

le

nodes.

Page 94: Module assignment in distributed systems - CESGcesg.tamu.edu/wp-content/uploads/2012/02/76.-NODDLE...4.1 Assignment Graphs for Two-Processors Systems 21 4.2 Assignment Graphs for Systems

000000%

: <

n\U

T'r

TT

‘> ) )}

rrrrrr

) i )

) i}

} ) )

) ) ) )

) ) 11}

) i n

) ) )

i

90

a o (3 N O

OOOOOOOOOOOOOOOOOOOOOOOOO O OOOOOOOOOOOOOOOOOOOOOOOOO ,3 P OOOOOOOOOOOOOOOOOOOOOOOOO 0**0 OOOOOOOOOOOOOOOOOOOOOOOOO c«4> OOOOOOOOOOOOOOOOOOOOOOOOO * H OOOOOOOOOOOOOOOOOOOOOOOOO t*

O^CO^'OOl'OO^VOO'tWlSveO^OOMVOO'tM^VOO HrHcsc^c^cncn^^f^rin^nvovovpr^r^oooooooNO^o

Fig

ure

27

, H

isto

gra

m

of

the

grap

h

size

red

uct

ion

for

the

gra

ph

s w

ith

4 p

roce

ssor

nod

es

and

25

mod

ule

nod

es.

Figure 32. Histogram of the graph size reduction for the graphs with 5 processor nodes and 20 module nodes.

Figure 33. Histogram of the graph size reduction for the graphs with 5 processor nodes and 24 module nodes.
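
The histograms above record, for each set of test graphs, how often each amount of graph size reduction occurred. The short Python sketch below is only an illustration of how such a frequency tally could be produced; it assumes that "graph size reduction" is measured as the percentage decrease in node count from the original assignment graph to the reduced graph, and the sample figures are hypothetical rather than taken from the experiments.

    # Illustrative sketch only: tally a frequency histogram of graph size
    # reductions, where each reduction is taken to be the percentage decrease
    # in node count from the original assignment graph to the reduced graph.
    from collections import Counter

    def reduction_percent(original_nodes, reduced_nodes):
        # Assumed definition of "graph size reduction" for this sketch.
        return 100.0 * (original_nodes - reduced_nodes) / original_nodes

    def frequency_histogram(reductions, bin_width=5.0):
        # Count how many test graphs fall into each reduction-percentage bin.
        counts = Counter(int(r // bin_width) * bin_width for r in reductions)
        return sorted(counts.items())

    # Hypothetical (original node count, reduced node count) pairs.
    samples = [(20, 12), (20, 20), (25, 9), (29, 16), (29, 29)]
    reductions = [reduction_percent(a, b) for a, b in samples]
    for low_edge, count in frequency_histogram(reductions):
        print(f"{low_edge:5.1f}% - {low_edge + 5.0:5.1f}%   frequency {count}")

Each figure in this series would then correspond to one such tally over the set of test graphs having a fixed number of processor nodes and module nodes.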

[Table 3(a): experimental results for the test graphs; each row gives the test number, the number of processor nodes, the number of module nodes, and cost and percentage figures.]

[Table 3(b): continuation of Table 3(a); legible column headings include "No.", "number of processors", "number of modules", "max", and "near optimal".]

[Table 3(c): continuation of Table 3(a).]

[Table 3(d): continuation of Table 3(a).]
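
The legible headings in Table 3 ("max", "near optimal", and the processor and module counts) suggest that the tables compare the cost of the heuristic assignment with other assignment costs for each test case. As an assumed illustration only (not the thesis's own tabulation), the following Python sketch computes one such comparison: the percentage by which a heuristic ("near optimal") assignment cost exceeds the optimal cost. The sample values are hypothetical.

    # Illustrative sketch only: percentage deviation of a heuristic
    # ("near optimal") assignment cost from the optimal assignment cost.
    def percent_above_optimal(heuristic_cost, optimal_cost):
        # Relative excess of the heuristic cost over the optimum, in percent.
        return 100.0 * (heuristic_cost - optimal_cost) / optimal_cost

    # Hypothetical (processors, modules, optimal cost, heuristic cost) cases.
    cases = [(3, 14, 980.0, 980.0), (4, 16, 1210.0, 1253.5), (5, 20, 1675.0, 1675.0)]
    for n_proc, n_mod, optimal, heuristic in cases:
        deviation = percent_above_optimal(heuristic, optimal)
        print(f"{n_proc} processors, {n_mod:2d} modules: {deviation:.2f}% above optimal")

A deviation of 0% for a test case would indicate that the heuristic found an assignment of minimum cost for that case.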