INT. J. SYSTEMS SCI., 1993, VOL. 24, NO. 7, 1221-1231

A job scheduling approach based on a learning automaton for a distributed computing system

ZEN-KWEI HUANG† and SHENG-DE WANG†

A stochastic learning automaton model based on relative reward strength is proposed for solving the job scheduling problem in distributed computing systems. The scheduling approach belongs to the category of distributed algorithms. An automaton scheduler is used at each local host in the computer network to decide whether to accept an incoming job or transfer it to another server. The proposed learning scheme makes use of the most recent reward to each action provided by the environment. This feature means that the automaton has the capability to handle a class of uncertainties such as workload variation or incomplete system state information. Simulation results demonstrate that the performance of the proposed scheduling approach is not degraded by a change in workload, and is better than the Fixed Scheduling Discipline and Joining the Shortest Queue approaches under incomplete system information.

Received 3 May 1991.
† Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10764, Republic of China.

0020-7721/93 $10.00 © 1993 Taylor & Francis Ltd.

1. Introduction

Early computer systems were centralized because of high hardware and management costs. As hardware costs dropped and network technologies improved, it became possible to connect several computers installed at different locations. Since the demands for computing power are continually increasing, the computers in a network can work together to solve a problem or share resources with each other. A task entering a heavily loaded computer can be scheduled to run on another computer through the network. In such a way, distributed computing systems can provide a cost-effective solution to meet computing power requirements. However, the question of how to schedule tasks in a distributed computing system still needs further study. The scheduling algorithm can be centralized or distributed. A distributed scheduling algorithm is attractive when the factor of fault tolerance is considered.

The main objective of task scheduling in distributed computing systems is both to balance the loads of computers at different sites and to shorten the task completion time. The goals of load balancing and completion time minimization can be achieved by static or dynamic scheduling. The static task scheduling problem for distributed computing systems generally assumes that information such as the running time and intertask communication of the task to be assigned is known in advance. Given these assumptions, many approaches have been proposed for the static task assignment problem, such as directed graphs (Ammar 1986, Katz and Shmueli 1987), fixed scheduling discipline (Glorioso and Colon-Osorio 1980), stochastic approximation methods (Bonomi 1990, Bonomi and Kumar 1990), and heuristic methods (Dessouky et al. 1987, Thames and Young 1986). The approach proposed in this paper belongs to the realm of dynamic scheduling.

In this paper, we develop a distributed scheduling algorithm using stochastic learning automata (SLA) as schedulers in each computer host. The concept of stochastic automata operating in random environments was first introduced by Tsetlin (1961) and has been studied intensively since that time. An SLA makes decisions through interaction with the environment and takes actions based on action probabilities. Since the characteristics of the environment are assumed unknown to the automaton, the automaton should asymptotically learn an optimal action with high probability on the basis of responses from the environment. Recently, research on SLA has mainly concerned the environment model, the convergence speed and realistic applications. In the environment model, most research assumes the environment is stationary (i.e. the reward characteristics of the environment are constant), while some assumes it is non-stationary, i.e. Markovian switching, slowly varying or fast time-changing (Mason 1973, Narendra and Thathachar 1974, Tsetlin 1961), etc. In the convergence-speed research field, there are Oommen's discretized automata (Oommen and Christensen 1988), Thathachar's estimator automata (Thathachar and Sastry 1985) and Simha's stochastic approximation automata (Simha and Kurose 1989). Most of these assume the environment is stationary, but some can still be applied to non-stationary environments. Applications include game theory (Chandrasekaran and Shen 1969), telephone traffic routing (Narendra and Mars 1983), packet switching routing (Narendra et al. 1977) and computer network job scheduling (Glorioso and Colon-Osorio 1980, Mirchandaney and Stankovic 1986), etc. Thathachar and Sastry (1985) have shown that automata with an estimation learning scheme perform better in convergence speed than those without estimation, such as the linear reward-penalty ($L_{R-P}$) and linear reward-inaction ($L_{R-I}$) learning automata. In other words, estimation learning schemes are more suitable for a changing environment. The stochastic learning automata model presented here differs from the estimator automata (Thathachar and Sastry 1985) in the estimation method: the estimator automata take the time average of the response as the environment characteristic, whereas our method takes the instantaneous response as the environment characteristic. The simulation results show that our model is more suitable for tracking an uncertain environment than the other scheduling policies.

Mirchandaney and Stankovic (1986) found that if the system state information is available with a period of 2 s, then the $L_{R-P}$ automaton scheduler performs close to an analytically optimal scheme. In our learning scheme, we try to reduce the state information requirement and apply it to the scheduling problem (Glorioso and Colon-Osorio 1980). The state information is updated only when the scheduler dispatches a job to a server, while Bonomi (1990) and Mirchandaney and Stankovic (1986) make the state information available at any time or periodically. With this assumption, the state information requirement is independent of the number of hosts. We call this condition the incomplete system state information condition. Several comparisons are made for different scheduling policies, all based on incomplete system state information. The experimental results show that the proposed automaton scheduler is superior to the Joining the Shortest Queue (JSQ) scheduling policy and the $SL_{R-P}$ (S-model linear reward-penalty) automaton scheduler under a surge of jobs and incomplete system state information. The JSQ scheduling discipline has been proved to be an optimal policy under complete system state information and homogeneous server speeds; under incomplete system state information, however, it is no longer guaranteed to be optimal.

This paper is organized as follows. In §2, we first review some convergence definitions for SLA; the ergodic properties are also discussed in this section. Then we propose a learning automaton and prove its convergence properties. In §3, we formulate the job scheduling problem of a computer queuing network (Glorioso and Colon-Osorio 1980). For this problem, we first discuss several benefits of the automaton scheduler and the Fixed Scheduling Discipline (FSD) scheduler. The FSD is an optimal policy if a priori system state information is available. In this section, we also propose a credit assignment criterion for this scheduling problem. In §4, we conduct several experimental comparisons of our automaton scheduler, the $SL_{R-P}$ automaton scheduler and the JSQ scheduler under incomplete system state information and homogeneous server speeds. Finally, we give conclusions and remarks in §5.

2. Stochastic learning automata

2.1. Background

Figure 1. Environment/automata interaction block diagram: the automaton {A, P(t), B, F} emits an action α(t) and the environment returns a response β(t).

The interaction between the environment and the automata is shown in Fig. 1. In this figure, the environment is assumed to be an S-model with response $\beta_i(t)$ ($0 \leq \beta_i(t) \leq 1.0$) to the action $i$ at time $t$. If the expectation of $\beta_i(t)$, $E[\beta_i(t)]$, is constant for all $t$, then the environment is said to be stationary; otherwise it is said to be non-stationary. Thus an S-model environment is characterized by a vector $B = (b_1, b_2, \ldots, b_r)$, where $b_j = E[\beta_j(t)]$ and $r$ is the number of permissible actions of the automata. As shown in Fig. 1, an estimation automata model is composed of the four-tuple $\{A, P(t), B, F\}$, where $A$ is the set of permissible actions, $A = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$; $P(t)$ is the vector of action probabilities, $P(t) = (p_1(t), p_2(t), \ldots, p_r(t))$, lying within the simplex $[q_{\min}, 1 - (r-1)q_{\min}]^r$ at time $t$, where $q_{\min}$ will be defined later; $B$ is the set of environment responses, $B = \{\beta_j(t),\ j = 1, 2, \ldots, r\}$; and $F$ is the learning scheme.

In a stationary environment model, $\beta_j(t)$ is a random variable with constant mean $b_j$. Without loss of generality, we assume that $\beta_j(t)$ is a random variable distributed within $[0, 1]$ with a uniform, normal or exponential density. From the discussion above, we see that an environment has two important characteristics, the mean and the distribution type of $\beta_j(t)$, while the characteristics of an automaton are its action probability vector $P(t)$ and the reinforcement scheme $F$.

In the following, we review various convergence properties of SLA, namely the optimal scheme, the ε-optimal scheme and the expedient scheme. The performance of an SLA can be evaluated using these definitions. Throughout this paper, we assume that the response from the environment is a type of reward strength, and that there exists an optimal action $\alpha_n$ that gains the largest reward from the environment.

We define a quantity $M(t)$, which represents the average reward received by an automaton with action probability vector $(p_1(t), p_2(t), \ldots, p_r(t))$:

$$M(t) = E[\beta(t) \mid P(t)] = \sum_{j=1}^{r} b_j\, p_j(t)$$

For an automaton with equal action probabilities (i.e. a pure-chance automaton), $M(t)$ becomes a constant value, denoted $M_0 = (1/r)\sum_{j=1}^{r} b_j$. The terms $M(t)$ and $M_0$ play an important role in the following definitions.
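As a concrete illustration (the numbers are ours, not from the paper): for $r = 2$ actions with reward means $b = (0.8, 0.2)$, a pure-chance automaton earns

$$M_0 = \tfrac{1}{2}(0.8 + 0.2) = 0.5$$

so any scheme whose limiting average reward exceeds $0.5$ does better than pure chance; this is the expedience property of Definition 4 below.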

Definition 1

Suppose the $n$th action is the best strategy among all actions (i.e. $b_n > b_j$, $j = 1, 2, \ldots, r$, $j \neq n$). Then a learning scheme $F$ is said to be optimal if

$$\lim_{t \to \infty} E[p_n(t)] = 1 \quad (2.1)$$

where $p_n(t)$ is the action probability of action $n$ at time $t$.

Definition 2

If, for an arbitrarily small $\varepsilon > 0$, the learning scheme $F$ achieves

$$\lim_{t \to \infty} E[M(t)] = b_n - \varepsilon \quad (2.2)$$

then we say that the learning scheme $F$ is ε-optimal.

Definition 3

If there exists a finite integer $N > 0$ such that the learning scheme $F$ has

$$E[p_n(t) \mid P(t-1)] - p_n(t-1) > 0, \quad \forall t > N \quad (2.3)$$

then we say that the learning scheme $F$ converges absolutely expediently.

Definition 4

The automata are said to be expedient if

$$\lim_{t \to \infty} E[M(t)] > M_0 \quad (2.4)$$

From these definitions, we can see that the optimal and absolutely expedient schemes are strong convergence schemes which emphasize convergence in probability, while an expedient scheme is merely better than random decision-making.

2.2. Proposed automata model

In this section, we present a stochastic learning scheme which implicitly estimates the reward strength given by the environment as a response to the decision of the automaton. An automaton using an estimating learning scheme was introduced by Thathachar and Sastry (1985), where the convergence property was shown to be ε-optimal; their estimation method takes the time average of $\beta_i(t)$ as the estimated characteristic (i.e. $\hat{b}_i(t) = (1/T)\sum_{t=1}^{T} \beta_i(t)$). The proposed stochastic learning algorithm instead simply maintains the most recent reward strength given by the environment. The maintained reward strength is then used to update the action probabilities. Compared with the previous learning scheme, the new algorithm has the advantage of tracking environment changes as time proceeds. We now present our learning algorithm and show the convergence properties of the proposed learning scheme within the feasibility set.

For our purpose we define a feasibility set and an optimal set to constrain the action probability $P(t)$.

(a) Feasibility set $G_f$:

$$G_f = \Big\{ P(t) \,\Big|\, \sum_{j=1}^{r} p_j(t) = 1.0,\ p_j(t) \geq q_{\min},\ 1 \leq j \leq r,\ \forall t,\ 0 < q_{\min} < 1 \Big\} \quad (2.5)$$

Here $q_{\min}$ is a lower bound selected appropriately for the reinforcement algorithm. A typical value of $q_{\min}$ can be 1-10% of the uniform probability $1/r$.

(b) Optimal set $G_o$ (note that $G_o$ is a subset of $G_f$):

$$G_o = \{ P(t) \mid \text{for a given finite number } N > 0,\ \exists m,\ 1 \leq m \leq r,\ p_m(t) = 1.0 - (r-1)q_{\min},\ p_j(t) = q_{\min},\ \forall j \neq m,\ \forall t > N \} \quad (2.6)$$

Let $\hat{B}$ denote the estimated reward vector of $B$, i.e.

$$\hat{B} = [\hat{b}_1, \hat{b}_2, \ldots, \hat{b}_r] \quad \text{with} \quad \max_j [\hat{b}_j] = \hat{b}_m \quad (2.7a)$$

$$B = [b_1, b_2, \ldots, b_r] \quad \text{with} \quad \max_j [b_j] = b_n \quad (2.7b)$$

(Note that $m$ is an integer random variable, whereas $n$ is a constant.)

Assume that at time $t$ the automaton chooses action $i$ (i.e. $\alpha(t) = \alpha_i$). The stochastic learning algorithm is given as follows.

(i) Update the number of times action $i$ was chosen:

$$t_i = t_i + 1 \quad (2.8)$$

(ii) Estimate the reward characteristics: update $\hat{b}_i(t)$ as $\beta_i(t)$ and keep $\hat{b}_j(t)$ ($j \neq i$) unchanged:

$$\hat{b}_i(t) = \beta_i(t) \quad (2.9a)$$

$$\hat{b}_j(t) = \hat{b}_j(t-1), \quad j \neq i,\ 1 \leq j \leq r \quad (2.9b)$$

(iii) If $\max_j [\hat{b}_j(t)] = \hat{b}_m$ (note that in (2.10) $m$ indexes the maximum of the $\hat{b}_j(t)$ and is not always equal to $n$, the optimal action in (2.6)), then (2.10)

$$p_i(t+1) = p_i(t) - \lambda(t)(\hat{b}_i - \hat{b}_m)^2, \quad \text{for } p_i(t+1) \in G_f,\ i \neq m$$

$$p_m(t+1) = p_m(t) + \lambda(t) \sum_{j=1}^{r} (\hat{b}_j - \hat{b}_m)^2, \quad \text{for } p_j(t+1) \in G_f \quad (2.11)$$

$$p_j(t+1) = p_j(t), \quad \forall j, \quad \text{if } p_j(t+1) \notin G_f \text{ or } P(t) \in G_o$$

where $0 < \lambda(t) \leq 1$ is a time-varying parameter satisfying:

(a) $\sum_{t=0}^{\infty} \lambda(t) = \infty$ and $\sum_{t=0}^{\infty} \lambda^2(t) < \infty$ for a stationary environment (typically, $\lambda(t)$ can be a function of the $1/t$ type);

(b) $\lambda(t)$ remains a small constant if the environment is non-stationary.
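To make steps (2.8)-(2.11) concrete, the following is a minimal Python sketch of the update, assuming a constant λ as in condition (b); the class name, the NumPy representation and the acceptance test for the feasibility set are our choices, not the authors' implementation.

```python
import numpy as np

class RelativeRewardAutomaton:
    """Sketch of the relative-reward learning scheme (2.8)-(2.11)."""

    def __init__(self, r, q_min=0.05, lam=0.1, rng=None):
        self.p = np.full(r, 1.0 / r)       # action probabilities P(t)
        self.b_hat = np.zeros(r)           # most recent reward per action
        self.t_i = np.zeros(r, dtype=int)  # times each action was chosen
        self.q_min = q_min                 # lower bound defining G_f in (2.5)
        self.lam = lam                     # constant lambda (non-stationary case)
        self.rng = rng or np.random.default_rng()

    def choose(self):
        # sample an action according to the current action probabilities
        return int(self.rng.choice(len(self.p), p=self.p))

    def update(self, i, beta):
        self.t_i[i] += 1                   # (2.8) count the chosen action
        self.b_hat[i] = beta               # (2.9a) keep only the latest reward;
                                           # (2.9b) other entries stay unchanged
        m = int(np.argmax(self.b_hat))     # (2.10) current best-looking action
        d2 = (self.b_hat - self.b_hat[m]) ** 2
        p_new = self.p - self.lam * d2              # losers move down
        p_new[m] = self.p[m] + self.lam * d2.sum()  # (2.11) winner takes the mass
        if np.all(p_new >= self.q_min):    # accept only if P(t+1) stays in G_f
            self.p = p_new                 # total probability is preserved
```

Because each losing action gives up $\lambda(\hat{b}_j - \hat{b}_m)^2$ and the winner gains the sum of those terms, the probabilities still sum to one after every accepted step; rejecting steps that would drop any component below $q_{\min}$ is what keeps the automaton ergodic, as discussed below.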

The main feature of this learning algorithm is the indirect learning from the environment response. First, it uses $\beta_i(t)$ to update $\hat{b}_i$. Secondly, it searches for the maximal value, denoted $\hat{b}_m$, among the $\hat{b}_j$, and then the action probabilities are updated by the square of the difference between $\hat{b}_m$ and $\hat{b}_j$. From the feasibility set of (2.5), we see that as $q_{\min}$ tends to zero, this learning scheme will produce several absorbing barriers in the state space, thus preventing the automaton from being ergodic. In this situation, if the environment is non-stationary then the strong law of large numbers will not help in finding the maximal value of $b_j$, because the automaton is trapped inside the barrier. That is, once $q_{\min}$ vanishes to zero, the absorbing barriers created will prevent the automaton from tracking the changing environment when the environment is non-stationary. To address this problem, we have the following lemma.

Lemma 1 (Sato et al. 1988)

In a stationary environment, if any learning scheme operates inside the feasibility set $G_f$ such that

$$\lim_{t \to \infty} t_j = \infty \quad \text{almost surely} \quad (2.12)$$

then it implies that

$$\lim_{t \to \infty} E[\hat{b}_j(t)] = b_j, \quad j = 1, 2, \ldots, r, \quad \text{almost surely} \quad (2.13)$$

(note that $t_j$ is the number of times action $j$ was chosen, and $t$ is the system time index). This lemma states that only when no action probability vanishes to zero (i.e. $q_{\min} \neq 0$) can the strong law of large numbers help in finding the environment's reward characteristics.

Lemma 2

For the learning scheme (2.8)-(2.11), if $P(t) \in G_f$, then there exists a finite integer $N > 0$ such that, $\forall t > N$,

$$\operatorname{Prob}\Big[\max_j(\hat{b}_j) = \hat{b}_n\Big] > \operatorname{Prob}\Big[\max_j(\hat{b}_j) = \hat{b}_k,\ k \neq n\Big]$$

(i.e. $\operatorname{Prob}[m = n] > \operatorname{Prob}[m \neq n]$).

Proof

Let

$$\max_j(\hat{b}_j) = \begin{cases} \hat{b}_1, & \text{with probability } q_{21} \\ \hat{b}_2, & \text{with probability } q_{22} \\ \;\vdots \\ \hat{b}_n, & \text{with probability } q_1 \\ \;\vdots \\ \hat{b}_r, & \text{with probability } q_{2r} \end{cases}$$

From Lemma 1 we have $\lim_{t\to\infty} E[\hat{b}_j(t)] = b_j$, $1 \leq j \leq r$, and from the assumption that the $n$th action is optimal, $b_n > b_j$ for all $j \neq n$. Hence $\lim_{t\to\infty} E[\hat{b}_n(t)] > \lim_{t\to\infty} E[\hat{b}_j(t)]$ for all $j \neq n$. Thus, by choosing $\lambda(t)$ small enough (e.g. $\lim_{t\to\infty} \lambda(t) = 0$), there exists a finite $N > 0$ such that, $\forall t > N$, $E[\hat{b}_n(t)] > E[\hat{b}_j(t)]$ for all $j \neq n$ with probability 1, i.e. $q_1 > q_{2k}$ for all $k \neq n$ with probability 1, and the result of the lemma follows. □

Theorem 3

For the learning scheme (2.8)-(2.11), if the action probability vector is inside the feasibility set $G_f$, then

$$E[p_n(t+1) \mid P(t)] - p_n(t) > 0, \quad \text{almost surely} \quad (2.14)$$

Proof

We concentrate our attention on the optimal action probability $p_n(t)$. From the learning scheme (2.11) and Lemma 2, we have

$$p_n(t+1) = \begin{cases} p_n(t) - \lambda(t)(\hat{b}_n(t) - \hat{b}_1(t))^2, & \text{with probability } q_{21} \\ p_n(t) - \lambda(t)(\hat{b}_n(t) - \hat{b}_2(t))^2, & \text{with probability } q_{22} \\ \;\vdots \\ p_n(t) - \lambda(t)(\hat{b}_n(t) - \hat{b}_r(t))^2, & \text{with probability } q_{2r} \\ p_n(t) + \lambda(t)\displaystyle\sum_{j=1}^{r}(\hat{b}_n(t) - \hat{b}_j(t))^2, & \text{with probability } q_1 \\ p_n(t), & \text{with probability } 1 - q_1 - q_{21} - q_{22} - \cdots - q_{2r} \end{cases}$$

After manipulation, the conditional expected optimal action probability becomes

$$E[p_n(t+1) \mid P(t)] = p_n(t) + \lambda(t)\sum_{j=1}^{r}(q_1 - q_{2j})(\hat{b}_n(t) - \hat{b}_j(t))^2, \quad \text{for all } t > N$$

i.e.

$$E[p_n(t+1) \mid P(t)] - p_n(t) = \lambda(t)\sum_{j=1}^{r}(q_1 - q_{2j})(\hat{b}_n(t) - \hat{b}_j(t))^2 > 0, \quad \text{for all } t > N$$

The inequality follows from Lemma 2 and the fact that $\lambda(t) > 0$ and $(\hat{b}_n(t) - \hat{b}_j(t))^2 > 0$. Thus (2.14) follows directly from the above equation. □

This theorem states that, if the environment is stationary and there exists an optimal action $n$ which maximizes the expected reward, then the learning scheme (2.8)-(2.11) converges towards the optimal set provided the action probability vector is inside the feasibility set $G_f$. In fact, for a well-behaved distribution of the random variable $\beta_j(t)$ (such as normal, uniform or exponential), we could simply let $\lambda(t)$ be a function of the form $1/t$ (i.e. we can make $\lambda(t)$ decrease towards zero as time proceeds). Then the learning scheme (2.8)-(2.11) will converge absolutely expediently. However, if the distribution of $\beta_j(t)$ is not well-behaved (such as a U-shaped distribution), or if the environment is non-stationary, then we must keep $\lambda(t)$ a small constant to adapt to a general distribution of $\beta_j(t)$.
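As a quick check of condition (a) with this choice (our arithmetic, using standard series results): the schedule $\lambda(t) = 1/t$ satisfies

$$\sum_{t=1}^{\infty} \frac{1}{t} = \infty, \qquad \sum_{t=1}^{\infty} \frac{1}{t^2} = \frac{\pi^2}{6} < \infty$$

whereas a constant $\lambda$ violates the second condition; this is precisely the trade-off noted above, since a constant step size forfeits convergence but keeps the automaton responsive to a non-stationary environment.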

3. Job scheduling problem in a computer network (Glorioso and Colon-Osorio 1980)

In this section, we formulate the job scheduling problem in a computer network. This scheduling problem can also be found in some manufacturing processes and other queuing networks. The job scheduling problem in a distributed computing system is important for network performance and system utilization; goals such as load balancing are not easy to achieve by theoretical analysis. Glorioso and Colon-Osorio (1980) found that if the server speeds $\mu_j$ and the job arrival rates $a_j$ are known a priori, then the FSD policy is better than the $L_{R-P}$ automata scheduler when the measurement criterion is the mean task turnaround time. However, if the system is subject to a load-change environment (due to a surge of jobs or incomplete system state information), then the $L_{R-P}$ automata scheduler is better than the FSD scheduler. That is, the FSD scheduler is suitable for well-informed scheduling problems with no load variation and is not suitable for dynamic scheduling problems. Thus, an automaton scheduler is a better candidate for solving the dynamic scheduling problem, especially when only incomplete system state information is available. The $L_{R-P}$ automaton is an ergodic automaton except in the degenerate case (i.e. when the reward probability is 1.0 for a certain action), so it is suitable for tracking the changing characteristics of the environment. However, the learning speed of the $L_{R-P}$ automaton is very slow. Owing to this drawback, we feel that there must be a better solution to this problem (Glorioso and Colon-Osorio 1980) than the $L_{R-P}$ approach. In the following, we apply the proposed stochastic automaton, which exhibits a more rapid tracking speed for the load-change characteristic, to this problem and compare its average response time with that of the $SL_{R-P}$ automaton scheduler. We also compare these results with the JSQ policy under both incomplete and complete system state information.

Figure 2. Job scheduling problem of a computer queuing network.

Consider a queuing network composed of $r$ servers with serving speeds $\mu_j$, $j = 1, 2, \ldots, r$. In front of each server there is a queue and an arrival node (see Fig. 2). Arrival node $j$ accepts incoming jobs arriving at a rate $a_j$. Assume there is a direct path between node $j$ and queue $j$, and a generic network between nodes $j$ and $k$, $k \neq j$. The cost of transferring jobs on the direct path is assumed to be zero, and transfers through the generic network are assumed to have a mean transfer rate $1/\tau$. The job schedulers are located at each arrival node. Moreover, we assume $a_j$, $\mu_j$ and $1/\tau$ are all exponential, and the server speeds are homogeneous (i.e. $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_r$). If the queue length of each server is known at any time, then the JSQ policy has been shown to be the optimal policy (Bonomi 1990) minimizing the mean task turnaround time. Mirchandaney and Stankovic (1986) proposed an SLA scheduler for a distributed processing system with the $L_{R-P}$ learning scheme. They showed by simulation that the performance of the $L_{R-P}$ scheduling policy with a scheduling period of about 2 s is close to an analytical lower bound if the network communication delay is not taken into account (i.e. $1/\tau = 0$). Bonomi (1990) has presented some assignment policies for a centralized scheduling system with processor-sharing queues, showing the equivalence of the Total Completion Time (TCT) optimal policy and the Stochastic Ordering (SO) optimal policy. In addition, he also proposed some scheduling policies that are better than the JSQ scheduling policy in the sense of total completion time. Bonomi and Kumar (1990) have posed some related load balancing problems, namely balancing server idle times in the root-mean-square sense, and obtained the optimal solution by a stochastic approximation algorithm. All the above works assume that each scheduler knows the loading of all servers at every instant or periodically. In general, however, it is not efficient to acquire this information periodically, since the schedulers must pay the cost of the communication delay needed to probe the other nodes. It is more reasonable to acquire the system state information when a dispatching action occurs; that is, state probing is more suitably event-driven. In this paper, the system state is defined to be the queue length of each computer in the network. If a scheduler operates on periodically acquired system state information, we say it operates under complete system state information. If the state probing is event-driven, we say it operates under incomplete system state information.

The incomplete system state information assumption is not a trivial one: under this condition, the state information exchange load hardly depends on the number of hosts, while under the other condition the state information exchange load will jam the network as the number of hosts increases. Thus, for large networks, event-driven state probing is more beneficial than time-driven probing. The approach of using an automaton scheduler is well suited to the incomplete information case, since an automaton is assumed to operate in an unknown stochastic environment. The only information an automaton can obtain is the reward provided by the environment in response to the action taken by the automaton. Each automaton scheduler has a set of $r$ actions corresponding to the $r$ hosts in the network; the $i$th action of each scheduler is to transfer a job to host $i$. After receiving the transferred job, the $i$th server in the network responds with the following message as the reward strength to the sender:

$$\exp(-\mathrm{queue}_i) - \hat{\imath}_i$$

where $\mathrm{queue}_i$ is the current queue length of server $i$, and $\hat{\imath}_i$ is a term taking into account the network transfer delay and the number of times the $i$th action was taken, $t_i$; it is assumed to be of the form

$$\hat{\imath}_i = \frac{t_i - 1}{t_i}\,\hat{\imath}_i + \frac{\imath}{t_i}$$

where $\imath$ denotes the transfer delay observed for the current job. The reward strength is required to be normalized to within the interval $[0, 1]$.
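The following sketch shows one way the server-side reward message could be computed; the function name, argument list, and the reading of $\imath$ as the currently observed transfer delay are our assumptions rather than details given in the paper.

```python
import math

def reward_message(queue_len, i_hat, delay, t_i):
    """One reading of the reward strength sent back by server i.

    queue_len -- current queue length of server i
    i_hat     -- running average of past transfer delays for action i
    delay     -- transfer delay observed for the current job
    t_i       -- number of times action i has been taken (including this job)
    """
    # running-average delay term: i_hat = ((t_i - 1)/t_i) * i_hat + delay / t_i
    i_hat = ((t_i - 1) / t_i) * i_hat + delay / t_i
    # reward favours short queues and low transfer delay
    beta = math.exp(-queue_len) - i_hat
    # normalize the reward strength to the interval [0, 1]
    return min(max(beta, 0.0), 1.0), i_hat
```

The sender would feed the returned reward into the automaton update of §2.2 (e.g. the update(i, beta) sketch above) and keep the returned delay average for the next dispatch to that host.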

4. Simulation results

For the purpose of comparing the different scheduling policies, we conducted experiments on a two-server, two-scheduler problem as stated in §3. The program was run on a CRAY X-MP supercomputer. For high confidence levels, the simulation results are based on an average of 200 simulation runs with different random seeds.

Experiment 1

Suppose the server speeds are equal, with means $\mu_1 = \mu_2 = 1.0$, and the mean arrival rates are $a_1 = 0.1$ and $a_2 = 1.5$, respectively. The initial action probabilities for both schedulers are assumed to be 0.5; that is, initially the two actions of each scheduler are equally likely to be taken. Three cases, corresponding to different values of the parameter λ in the reinforcement scheme, are used in our experiments, as shown in the Table.

Scheduler                            Mean task turnaround time
$SL_{R-P}$ automata                  4.31 (λ = 0.4)   3.95 (λ = 0.5)   4.12 (λ = 0.6)
Proposed automata                    7.52 (λ = 0.8)   7.49 (λ = 0.9)   6.91 (λ = 1.0)
JSQ, incomplete system information   55.42
JSQ, complete system information     2.11
FSD scheduler                        2.32

Table. Mean task turnaround time for different update parameters λ.

The FSD policy is assumed to have the following fixed job transferring probabilities:

$$p_{11} = 1.0, \quad p_{12} = 0.0, \quad p_{21} = 7/15, \quad p_{22} = 8/15$$

where $p_{ij}$ denotes the job transferring probability from server $i$ to server $j$. The simulations were stopped after the completion of 10000 jobs, and the mean task turnaround times are listed in the Table. From the Table, we can see that the mean task turnaround time is minimized at λ = 1.0 for the proposed automaton and at λ = 0.5 for the $SL_{R-P}$ automaton. As expected, the automaton schedulers are not better than the FSD scheduler, but their performance is much better than that of the JSQ policy under incomplete system state information. We therefore adopted λ = 1.0 for the proposed automaton and λ = 0.5 for the $SL_{R-P}$ automaton scheduler in the following experiment.
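A quick check (our arithmetic) shows why these FSD probabilities are the natural choice here: with them, server 1 receives an expected load of $a_1 + a_2 p_{21} = 0.1 + 1.5 \times 7/15 = 0.8$ while server 2 keeps $a_2 p_{22} = 1.5 \times 8/15 = 0.8$, so the two unit-speed servers are exactly load-balanced. This is why FSD is hard to beat when the rates are known a priori and do not vary.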

Experiment 2

The experimental conditions are the same as in Experiment 1, but the arrivals at node 1 and node 2 are exchanged at the completion of 4000 jobs to simulate a load-change environment (i.e. $a_1 = 0.1$, $a_2 = 1.5$ for the first 4000 tasks and $a_1 = 1.5$, $a_2 = 0.1$ for the remaining tasks). The simulation results are plotted in Fig. 3.

Figure 3. Comparison of scheduling disciplines (average of 200 runs; response time versus task number, in thousands).

In this experiment, we assume the mean subnet delay is 0.1 and the total system loading is 0.8. From this figure, the JSQ policy under complete system state information is the best policy, while the $SL_{R-P}$ is not capable of tracking the load change, so that after 4000 tasks its response time increases rapidly; the proposed automaton scheduler is not degraded by the load change. The JSQ policy under incomplete system state information is also unaffected by the load change; however, its response time is much higher than that of the other scheduling approaches.

5. Conclusion

We have proposed a learning scheme for stochastic learning automata and applied it in a distributed scheduling algorithm. The salient feature of the proposed learning scheme is the use of the most recent reward strength provided by the environment, which gives it the advantage of tracking changes in the environment. Simulation results show the superiority of our approach over other approaches, such as the JSQ and FSD policies, under incomplete state information. Although the JSQ policy is optimal in the case of complete state information, it does not perform well in the case of incomplete state information. Furthermore, our simulation results show that the existing automaton model $SL_{R-P}$ performs somewhat better than the proposed automaton in a stationary environment. However, the $SL_{R-P}$ automaton cannot catch up with load changes, whereas the proposed automaton performs very well in this case.

REFERENCES

AMMAR, R., 1986, Performance model of parallel and distributed processing systems. Proc. 1986 ACM 14th Annual Computer Science Conf., p. 424.
BONOMI, F., 1990, I.E.E.E. Trans. Comput., 39, 858.
BONOMI, F., and KUMAR, A., 1990, I.E.E.E. Trans. Comput., 39, 1234.
CHANDRASEKARAN, B., and SHEN, D. W. C., 1969, I.E.E.E. Trans. Systems Sci. Cybern., 5, 145.
DESSOUKY, M. I., DESSOUKY, Y. M., and DESSOUKY, M. M., 1987, J. Manuf. Systems, 6, 23.
GLORIOSO, R. M., and COLON-OSORIO, F., 1980, Engineering Intelligent Systems (Burlington, Mass.: Digital Press), p. 209.
KATZ, S., and SHMUELI, O., 1987, I.E.E.E. Trans. Software Eng., 13, 540.
MASON, L. G., 1973, I.E.E.E. Trans. autom. Control, 18, 493.
MIRCHANDANEY, R., and STANKOVIC, J. A., 1986, J. Parallel Distrib. Comput., 527.
NARENDRA, K. S., and MARS, P., 1983, Automatica, 19, 495.
NARENDRA, K. S., and THATHACHAR, M. A. L., 1974, I.E.E.E. Trans. Systems, Man Cybern., 4, 323.
NARENDRA, K. S., WRIGHT, E. A., and MASON, L. G., 1977, I.E.E.E. Trans. Systems, Man Cybern., 7, 785.
OOMMEN, B. J., and CHRISTENSEN, P. R., 1988, I.E.E.E. Trans. Systems, Man Cybern., 18, 451.
SATO, M., ABE, K., and TAKEDA, R., 1988, I.E.E.E. Trans. Systems, Man Cybern., 18, 677.
SIMHA, R., and KUROSE, J. F., 1989, I.E.E.E. Trans. Systems, Man Cybern., 19, 388.
THAMES, J. M., and YOUNG, P. R., 1986, Automated scheduling and assignment: an application of synthetic intelligence. Proc. 1986 Summer Computer Simulation Conf., p. 481.
THATHACHAR, M. A. L., and SASTRY, P. S., 1985, I.E.E.E. Trans. Systems, Man Cybern., 15, 168.
TSETLIN, M. L., 1961, Avtom. Telemekh., 22, 1345.
