Robust supervisory control policies for manufacturing ...pdfs.semanticscholar.org/0a5a/b585e8a766c1bc8d283c596b8c50854e365c.pdfmajority of automated manufacturing systems, the failure

346 IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL. 18, NO. 3, JUNE 2002

Robust Supervisory Control Policiesfor Manufacturing Systems With

Unreliable ResourcesMark A. Lawley, Member, IEEE,and Widodo Sulistyono

Abstract—Manufacturing practitioners and researchers haverecognized the need to develop effective supervisory controllers forautomated manufacturing systems. One active area of research hasbeen developing deadlock avoidance methods for these systems. Al-most all work to date has assumed that allocated resources do notfail. In this paper, we consider deadlock and blocking problems insystems with unreliable resources. Our objective is to develop su-pervisory control policies that allocate resources so that the failureof any given resource does not propagate through blocking to ef-fectively stall other portions of the system. In other words, when aresource fails, we want the system to automatically continue pro-ducing all part types that do not require the failed resource. To ac-complish this, the supervisory controller must ensure safety for theoriginal system while avoiding states that do not serve as feasibleinitial states for the reduced system when an unreliable resourcefails. The policy must then ensure safety for the reduced systemwhile avoiding states that do not serve as feasible initial states forthe original system so that transition to normal operation is smoothwhen the failed resource is restored. This paper illustrates this classof problems through several examples, identifies properties thatcontrollers must satisfy to deal effectively with these problems, anddevelops two polynomial control policies that satisfy these proper-ties.

Index Terms—Deadlock avoidance, fault-tolerant, flexible man-ufacturing systems, robust control.

I. INTRODUCTION

OVER THE PAST decade, manufacturing practitioners andresearchers have recognized the need to develop effective

supervisory controllers for automated manufacturing systems.One important and active area of research has been in devel-oping control logic that guarantees deadlock-free operation forthese systems. Tremendous progress has been made in under-standing both the computational and structural aspects of thedeadlock problem for a variety of manufacturing system classi-fications. Researchers have addressed deadlocking problems insystems with no routing flexibility and flexible routing throughmachine and sequence flexibility, systems with and without re-source redundancy and those with conjunctive resource alloca-tion needs.

Manuscript received March 26, 2001; revised January 22, 2002. This paperwas recommended for publication by Associate Editor M. Jeng and EditorN. Viswanadham upon evaluation of the reviewers’ comments. This work wassupported in part by the National Science Foundation under Grant NSF GOALI0085047.

The authors are with the School of Industrial Engineering, Purdue Univer-sity, West Lafayette, IN 47907-1287 USA (e-mail: [email protected];[email protected]).

Publisher Item Identifier S 1042-296X(02)06335-8.

One important area where little manufacturing controlwork has been done is in developing systems of control logicthat robustly handle failure of components. Currently, in themajority of automated manufacturing systems, the failure ofa single device such as a machine tool or even an individualsensor can cause the whole system to shutdown. This is amajor problem that the manufacturing research communitymust begin addressing, especially now that we have built asolid foundation of deadlock avoidance research upon whichto build.

In this paper, we investigate resource allocation in manufac-turing systems with unreliable resources. Resource failure couldbe caused by damaged tooling, quality out-of-control signals,missing or defective part programs or tools, failure of actu-ating and sensing systems, and so on. Our objective is to de-velop supervisory control policies that allocate resources so thatthe failure of any given resource does not propagate throughblocking to effectively stall other portions of the system. In otherwords, when a resource fails, we want the system to automati-cally continue producing all part types that do not require thatfailed resource. Furthermore, when the failed resource is re-paired, we want to resume producing the full range of part typessupported by the system. This should all occur with a minimumamount of disruption, i.e., with a minimal amount of part shuf-fling and transport

The paper is organized as follows. Section II provides the lit-erature review and broadly discusses the unique contributionsof this paper. Section III formally defines the properties that wewant our controllers to exhibit and provides a taxonomy thatclassifies manufacturing system control problems with respectto robustness and redundancy. Section IV develops a robust su-pervisor using a suboptimal deadlock avoidance policy (DAP).Section V develops a robust supervisor using the optimal DAPand discusses the use of a central buffer. Section VI then con-cludes by discussing future research directions.

II. L ITERATURE REVIEW

In this section, we review relevant research on deadlockavoidance and tolerance to resource failure in automatedmanufacturing systems.

System deadlocks were first addressed by computer scientistsresponsible for developing resource allocation logic in computeroperating systems (prominent among them are [1]–[5]). Theseresearchers developed the idea of a resource allocation system(RAS) as being composed of a finite set of resources, either

1042-296X/02$17.00 © 2002 IEEE

LAWLEY AND SULISTYONO: ROBUST SUPERVISORY CONTROL POLICIES FOR MANUFACTURING SYSTEMS WITH UNRELIABLE RESOURCES 347

reusable or consumable, a set of “processes” that request anduse these resources and a “banker” responsible for allocatingthe resources to the processes so that every process is eventu-ally satisfied. The contributions of this early research include:1) defining the concepts of “safety” (is deadlock avoidable?)and “safe sequence” (how can resources be allocated to avoiddeadlock?) and developing algorithms that search for safe se-quences [1], [2]; 2) using digraph-based formalisms to representresource allocation, identifying necessary and sufficient condi-tions for deadlock and developing correct detection algorithmsfor certain RAS classes [3]; and 3) establishing the computa-tional complexity of safety for certain RAS classes [4], [5]. Inthis literature, the safety problem is in general NP-complete,while the deadlock detection problem is of polynomial com-plexity. Thus, while an operating system could reasonably beexpected to detect deadlock, computing safety in real time isnot generally feasible.

In the late 1980s, manufacturing researchers began torecognize that deadlock and safety were important problemsin automated manufacturing (among the first were [6]–[9]).Manufacturing researchers possess two advantages that thecomputer scientists did not have. First, a part’s future resourceneeds (machines remaining to be visited, tools required, etc.)are well documented in the part’s process plan, and second,resource requirements in manufacturing are highly sequential.(The safety problem remains NP-complete, while deadlockdetection is of polynomial complexity, [10].) By combiningthese two advantages with the methodologies developed in thecomputing research, manufacturing researchers were able tomake many additional contributions. First, many researchersdeveloped suboptimal deadlock avoidance policies, that is,deadlock avoidance policies that reject some safe resourceallocations in order to achieve tractable computation [11]–[19].Secondly, these researchers began to explicitly analyze andreason about resource allocation states that can be classified asneither safe nor deadlock, i.e., states that contain no deadlock,but from which deadlock cannot be avoided (prominent amongthis work is [20]–[23]). The analysis of these states allowed theidentification of many special structures that render the safetyproblem polynomial (for a review of these special structures,see [10]). Other work investigates the relationship betweenflexible routing capabilities and the deadlock avoidanceproblem in manufacturing systems [24]–[26].

Although the study of systems with unreliable componentshas existed in computing for many years (see, for example,[27]–[31]), relatively little work exists with regard to manufac-turing systems. Results from computing literature are typicallynot directly applicable to manufacturing systems because redun-dancy is not easy or inexpensive to achieve in a manufacturingsystem and the “jobs” that manufacturing systems process aretangible entities that occupy space, represent wealth, and requiretime consuming physical transport. Thus, we think the controlimplications of resource failure in manufacturing are best inves-tigated by building on the deadlock avoidance work discussedin the previous paragraphs and that is what we do in this paper.

Traditionally, manufacturing researchers have consideredonly the performance implications of resource failure [32]–[35],while only a few papers address the control implications. [36]

addresses the issue of part blocking in the presence of contin-gencies due to resource breakdown or introduction of expedientjobs. The author combines flexible routing with the deadlockavoidance policies developed in [17]–[19] to accommodateoperational contingencies. This approach assumes that eachpart follows an assigned route until a failure occurs. At thatpoint, a route reassignment is computed and some set of partsis removed from the system in order to continue operation. Thework in [37] addresses fault-tolerant supervisory control byderiving necessary and sufficient conditions for the existenceof a fault-tolerant supervisor. These conditions are stated interms of language theoretic properties and are computationallyintensive to compute. Finally, [38] develops a fault-tolerantcontroller for assembly processes using the Petri net formalism.This approach uses the concept of “minimal resource require-ment” to determine acceptable control actions. That is, thefiring of a transition is allowed only if the resultant markinghas a reachable marking that covers the minimal resourcerequirements of the processes in the net. The author modelsresource failure as decrements in the net marking and developsbounds on marking changes for which the Petri net systemmodel remains live.

Our work differs from [36] in that we do not assume “outside”capacity to be sufficient to remove any number of parts from thesystem. In other words, our controllers take capacity outside thesystem (central buffer) into account in making allocation deci-sions. We differ from [37] in that we focus on developing robustpolynomial control policies and not on establishing conditionsfor their existence. Finally, this work differs from [38] in that weaccept that the failure of a resource will prevent our system fromachieving its full range of production. Our objective is to con-trol the system so that if a resource fails the system can continueto produce part types not requiring that resource. This does notimply that a Petri net model of the system would be live underfailure.

Section III formally defines the problems that we address inthis paper. Section IV then develops a robust supervisor basedon a suboptimal deadlock avoidance policy, while Section V de-velops a robust supervisor based on an optimal deadlock avoid-ance policy. Section VI concludes and discusses future work.

III. ROBUST BUFFERSPACE ALLOCATION

In order to clearly and formally define the property that wewant our controllers to guarantee, we first present a discreteevent model of relevant system components and their logicaloperations. For more information on these models, the reader isreferred to [39].

The system is defined as a tuple. is the set of resource types in the system, with

, , where is the set of resourcetypes not subject to “failure” and is the set of resource typessubject to failure. Let (natural numbers) be a func-tion that returns the capacity of a resource, that is, is thenumber of identical units of .

is the set of part types produced with eachrepresented as an ordered set of processing stages,

where represents the th


Fig. 1. Example resource structure.

processing stage of . We will use to represent anactual instance of . Let : such thatreturns the resource required by . Thus, the route of is

. Let .We will suppose our resource types to be workstations with

buffer space for staging and storing parts and a processor orserver for operating on parts. We will use the standard assump-tion from queuing theory that the server is not idle so long asthere are unfinished parts in a workstation’s buffer space. Theresource units that we are concerned with allocating are in-stances of the workstation’s buffer space. The controllers thatwe design are not intended to allocate the server among partswaiting at the workstation. We assume this to be done by somelocal queue discipline.

Workstation “failure” will imply the failure of the worksta-tion’s server, not any of its buffer space. We will assume that,when the server of a workstation fails, we can continue to allo-cate its buffer space up to capacity, but that none of the waitingparts can be processed and thus cannot proceed along their re-spective routes until the server is repaired. We further assumethat server failure does not prevent finished parts occupying theworkstation’s buffer space from being moved away from theworkstation and proceeding along their respective routes. Fi-nally, we assume that server failure does not damage or destroythe part being processed and that failure can only occur whenthe server is working.

For example, consider resources and in Fig. 1, eachhaving three units of buffer capacity and a single server. Supposethat part is finished at and next needs to visit , whilepart is finished at and next needs to visit . Further,suppose that part is waiting to be processed at and theserver is busy with . If the server fails in this state, then isstill allowed to advance to and is allowed to advance to

, although it cannot be processed until the server is repaired.

We are now in a position to define states and events.Let represent the set of system states, wheresv ,

sv being the status of the server of an unreliable workstation(0 if failed, 1 if operational), being the number of unfinishedunits of (parts waiting and in-process) located in the bufferspace of and being the number of finished units of

located in the buffer space of . is the set of initialstates with being the state in which no resources areallocated and all servers are operational. The dimension ofis

.

Let , whereis the set of controllable events with repre-

senting the allocation of to an instance of , that is,is the event that an instance of a part of typeadvances

into the buffer space of a workstation that will perform itsthoperation. represents a finished part of typeleavingthe system. We assume that the system controls the occurrenceof these events through resource allocation decisions.

is the set of uncontrollable events whererepresents the

completion of service for part stage .represents the failure () and repair ( ) of the server of

unreliable resource . Service completions, failures and repairsare assumed to be beyond the controllers influence.

Let be a function that, for a given state, returnsthe set of enabled events. This function is defined for a state,,as follows:

1) For and , if

then

Events that release new parts into the system are enabledwhen space is available on the first required workstationin the route.

2) For and , if

and sv then

If unfinished parts are in the buffer space of a workstationand the server is working, then the corresponding comple-tion events are enabled.

3) For and , if

and sv then

If unfinished parts are in the buffer space of an unreli-able machine and the server is working, then the corre-sponding failure event is enabled.

4) For , if

sv then and

If the server of a machine is failed, the repair event is en-abled and completion events corresponding to that ma-chine are not.

5) For and , for , if

and

then

When a part finishes its current operation and space be-comes available on its next required machine, the eventthat advances that part is enabled.

6) If , then . If a part has com-pletely finished all of its operations, the event that unloadsit from the system is enabled.


(a)

(b)

Fig. 2. Failure propagation in a simple system. (a) Blocking propagation. (b) Infeasible initial state.

Let be the state transition function defined asfollows:

advancing a part

service completion

failure of server

repair of server

where , , and are the standard unit vectors with, and sv , respectively. Note that

, the null vector.When resource fails, the uncontrollable event

occurs, the service completion event is disabled for all. This effectively bounds the occurrence of events

until theuncontrollable repair event, , occurs. Our objective is todesign resource allocation logic so that when failsthe set of bounded events is exactly. For example, the smallsystem in Fig. 2 produces part types, , and with routes

, , and , respectively. All fourresources have one unit of buffer space and one server and

. Events are enumerated as follows:

Now suppose that the failure event occursin Fig. 2(a) as the server of is processing .Then the failure bounded events of will be

,that is, these events can occur at most a finite number of timesbefore the repair event, , occurs. Note, however, that for thisstate other events are also affected. None of the events in theset can occur because is blockedby a unit which is waiting for , while is blocked bya unit of waiting for the failed resource . In fact, thefailure of has effectively stalled the system through thepropagation of blocking. The unit of cannot be removed

since this particular system has no central buffering capacity.(In this work, we will develop resource allocation methods forboth systems with and without central buffers. In addressingcentral buffer systems, we will assume that, like workstations,they accommodate a finite number of parts. This will bediscussed more thoroughly later in the paper.)

Fig. 2(b) illustrates an additional requirement for our con-troller. Suppose the system is in the given state when the serverof fails. In this case, the system can continue to produce parttype by first advancing to . Once this is accomplished,all bounded events are in , that is, the system can continue toproduce ’s while the server of is being repaired. However,when the repair event, , occurs, parts are dead-locked and production of these part types cannot resume. Thus,we add the requirement that in controlling the system betweenfailure and repair events our controller must not allow states thatstall the original system after the repair event occurs. With thesemotivating examples, we are prepared to formally develop thedesired properties of our resource allocation policies.

Let represent the uncontrolledlanguage of the system and for and eventlet be the score (number of occurrences) of eventin .We extend the state transition functionto strings of eventsin the usual way, that is, for and enabled ,

.We define a supervisory controller,, as follows: for

and state , let .Then , where means is admis-sible and means is inadmissible. Thus, de-termines which of the enabled controllable events (resource al-locations) is admissible and which is not. The admissible eventsmay be executed whereas the inadmissible events must not beexecuted.

Let represent the uncontrolledlanguage of given that resource failures do not occur.

represent the controlled languageof under given that resource failures do not occur. Let

for some (weassume that the system initially starts in ).


The first required property of the supervisory controlleristhat it keeps the system deadlock-free in the absence of resourcefailure, i.e., that it keeps the system safe. In terms of strings andevents, we express this as follows: assuming no resource failure,

must guarantee that and (naturalnumbers), such that is a prefix ofand , . This basically states that, in theabsence of resource failures, the system can continue to produceall of its part types indefinitely.

Now suppose that the system has executed event sequence, that the system is in state ,

and that the server of is busy in this state. If we appenda failure event, , onto , we get and state

. The event set that can generate is now reconfigured;in fact, we will say that we have a modified events generatorthat bounds the occurrence of certain events, at least those in.Furthermore, must start in initial state , and in factthe set of initial states for can be defined as

and enables .Let be the controlled language of assuming

no further event failure or repair and let

for some there exists

such that

must now keep safe and running while the failed resourceis being repaired. That is, assuming no further resource failureor repair, must guarantee that and

, such that is a prefix of and, . This basically states that in

the absence of additional resource failures or repairs, the systemcan continue to produce all part types not requiring .This further implies that, while supervising, must constrain

to feasible initial states for , that is, initial states of forwhich deadlock-free operation is possible.

Now suppose that the system has executed event se-quence , where ,

, and and the uncontrollablerepair event, , occurs. Then, we have event se-quence and state

. The eventset that can generate has now been restored andmustonce again supervise, this time starting in initial state .This implies that, in supervising , must constrain tofeasible initial states for , that is, initial states of from whichdeadlock-free operation is possible. The following definitionis now possible.

Definition 1: Supervisory controller is operationally ro-bust to the failure of resource, , in system if:

1) and ,such that is a prefix of and , .

2) For every that enables , the stateserves as a feasible initial state for.

3) and ,such that is a prefix of and ,

.

TABLE IROBUSTNESSLEVEL

4) For every , the state serves as a feasibleinitial state for .

To summarize: 1) guarantees that the system remains dead-lock-free if there are no resource failures; 2) guarantees thatthe system visits only states for which continuing operationis possible in the event of the failure of ; 3) guaranteesdeadlock-free operation for the system while is failed; and4) guarantees that while is failed the system will visit onlystates for which deadlock-free operation is possible in the eventof repair.

The objective of this work is to develop supervisory controlpolicies that satisfy these properties and thus maintain systemoperation in the face of resource failures without manual inter-vention to clear and reset the system.

Many degrees of operational robustness are possible. A spe-cific supervisory controller may be robust to the failure of oneresource, but be unable to guarantee continuing operation iftwo resources fail simultaneously. Further, the level of resourceand processing redundancy in the system will affect operationalproperties. For example, if every operation can be performed bytwo or more resources, then higher degrees of operational ro-bustness are possible.

We organize these possibilities into a taxonomy that will beused to establish new classes of relevant systems. We begin withthe simplest approach; to assume that there is some subset ofresources susceptible to failure and that at most one of theseresources can be down at a time. This leads to the results shownin Table I.

Next, we need some similar taxonomy for resource redun-dancy. As with robustness, there are many possibilities that needto be explored. One simple example is shown in Table II.

The next step is to cross these two taxonomies and identifythose systems that are most relevant to manufacturing as shownin Table III.

The deadlock avoidance research reviewed earlier addressessystems falling in the first row, that is, RBRD for rep-resents reliable systems with no flexible routing so that safe op-eration is equivalent to avoiding deadlock. For reliable systemswith , the control objective is to avoid deadlock whileexploiting flexible routing alternatives made available throughresource flexibility.

The second and third rows exhibit several interesting systems.For example, with the combination RBRD , the controllermust allocate resources so that if a specified unreliable resourcefails the resulting system can continue to produce part types not


TABLE IIRESOURCEREDUNDANCY

TABLE IIICROSSPRODUCT

requiring that resource, since there are no flexible routing oppor-tunities. For RB RD , the controller must allocate resourcesso that if a specified unreliable resource fails the system cancontinue producing all part types, since every part type stage re-quiring the failed resource has an optional resource type that itcan use.

Since the use of a central buffer can play an important rolein handling resource failures, we append a third category to ourtaxonomy, RB RD , where represents the capacity ofthe central buffer.

Section IV develops a supervisory controller forRB RD , while Section V develops a controller forRB RD , .

IV. ROBUST SUPERVISORYCONTROL WITH A SUBOPTIMAL

DEADLOCK AVOIDANCE POLICY

This section develops a supervisory controller for the systemRB RD . This controller combines a suboptimal DAP witha set of neighborhood constraints. We will establish that thispolicy satisfies properties 1)–4) of Definition 1.

A suboptimal DAP is a policy that sacrifices some safeallocation states in order to achieve computational tractability.Banker’s Algorithm (BA), first proposed by Habermann [1] forcomputer operating systems, is perhaps the most widely knownpolicy in this class. It assumes that as each process enters thesystem, it declares the maximum number of each resource thatit might ever require at one time. It further assumes that, if aprocess is simultaneously allocated its stated maximum of eachresource, then the process will terminate without additionalrequests and will release all reusable resources allocated toit. These resources then become available for allocation toother processes. The policy avoids deadlock by allowing anallocation only if the processes can be ordered so that themaximal resource needs of theth process, , can be met bypooling available resources with those allocated to processes

. The order defines a sequence in which all

processes in the system can be terminated successfully. Thework in [3] shows BA to be , where is thenumber of resource types andis the number of requests.

The work in [18] develops a variant of BA by modifying itsdata structure and introducing the concepts of ordered and par-tially ordered allocation states that allow incremental increasesin the number of admissible states (states accepted by BA are“ordered” while those rejected are “unordered”). Experimentalresults suggest that the modified algorithm performs well interms of the ratio (size of admissible region/size of safe region)as compared to several other suboptimal DAPs (such as [7], [17],and [19]).

Most DAPs, including BA, are not robust to resource failure.For example, BA and most other DAPs would allow the statesof Fig. 2. Thus, in the following, we develop two policies, onebased on a set of neighborhood constraints and the other a modi-fied version of BA. Although neither of these two policies is cor-rect by itself, we will prove that their conjunction satisfies thedesired properties. We begin with the neighborhood constraints.

Let , that is, is the sole unreliable resource andrecall that is the set of part type stages supported by. De-fine a resource, , to be “failure-dependent” if, for every

, , where . That is, is failure-depen-dent if every part that enters its buffer space requires future pro-cessing by the unreliable server. Let be the set of failure-dependent resources (clearly, ). For each resourcein , we will develop a neighborhood constraint. The neigh-borhood of is defined as the set of part type stagesthat require later in their routes and have no intervening re-source in . More formally, NH

with and for . Forexample, in Fig. 3, , , NH

, NH , NH .A resource allocation state is said to capacitate a neighbor-

hood when the number of part type instances currently in thesystem equals the capacity of neighborhood’s resource. For ex-ample, in Fig. 3, NH is capacitated if(recall that is the number of unfinished units of locatedin the buffer space of and is the number of finishedunits of located in the buffer space of ).

We will construct neighborhood constraints (NHC) to assurethat no neighborhood is over-capacitated, thus, for the exampleof Fig. 3, we have

(1)

(2)

(3)

Equation (1) assures that NHis not over-capacitated, while (2)and (3) assure the same for NHand NH , respectively.

We need one additional constraint. To see this, notice that inFig. 3 both NH and NH are capacitated and the system isstalled. The constraints for NHwill not let a unit of ad-vance to . However, any safe sequence for this state requiresthat one of the ’s be advanced to , which violates ’sconstraint. Thus, the neighborhood constraints, as they are now


Fig. 3. Example system.

Fig. 4. Example of deadlock state admitted by NHC.

stated, induce deadlock on the system. To alleviate this, we in-troduce the constraint that at most one neighborhood can be ca-pacitated at a time. That is,

(4)

(5)

(6)

Equation (4) assures that NHand NH are not simultaneouslyfilled, (5) assures that NHand NH are not simultaneouslyfilled, and (6) assures that NHand NH are not simultaneouslyfilled. The neighborhood constraints (NHC) can be generallystated as

For each

For each pair

We note that these constraints alone will not enforce safety ofthe system, that is, they will permit unsafe states, as illustratedin Fig. 4. Thus, NHC by itself is not a correct supervisor. We willshow later that its conjunction with a modified BA will yield asupervisor that is robust with respect to the unreliable resource.We now establish some simple but important properties of NHC.

Property 1: Movements of parts not requiring the unreliableresource are not inhibited by NHC.

Proof: If part type does not require the unreliableresource, then none of its associated state variables appear inNHC. Thus, NHC does not inhibit these parts.

Property 2: Neighborhoods do not overlap, that is, NHNH , such that .

Proof: Suppose NH NH , for . By defi-nition, the unreliable resource appears in the route of andthus, for some , . Also, every part that enters

the buffer space of both and requires future processingby the unreliable server. Without loss of generality, suppose

and with .Loosely speaking, an instance of needs first to visit ,then and then . NH implies that

, that is, none of the re-sources required by are failure-de-pendent. Similarly, NH implies that

, that is, none of the resources requiredby are failure-depen-dent. This is a contradiction, since .

Property 3: Movements of parts within a neighborhood donot violate NHC.

Proof: Suppose , NH . Further, supposestate satisfies NHC and . Then, the sum of statevariables ( ) appears in the constraintof NH . Since , advancing aninstance of to its next step decrements and incre-ments . This does not change the sum (

) and thus does not cause NHC to be violated.Property 4: Advancing a part out of a capacitated neighbor-

hood will not violate NHC.Proof: Suppose NH and NH .

Further, suppose statesatisfies NHC, , and capac-itates NH . Since , advancinga unit of will decrement the contents of NHand pos-sibly increment the contents of another neighborhood. However,NHC guarantees that only one neighborhood is capacitated instate , NH . Thus, advancing a part out of NHcannot over-capacitate a neighborhood or cause two neighborhoods to be si-multaneously capacitated.

Having developed NHC, we will now modify BA. Our ob-jective is to develop a version of BA that finds an ordering ofparts in the system so that those not requiring the unreliable re-source are advanced out of the system (thus making their cur-rently allocated resource available) and those requiring the un-reliable resource are advanced to the failure-dependent resourcethat corresponds to their current neighborhood (and thus, in caseof failure, assuring the supervisor’s ability to clear the systemof these parts). In the following, we assumeis the unreliableresource.

The required notation and the algorithm, Algorithm 1, aregiven in the Appendix. Furthermore, Fig. 5 provides a detailed


Fig. 5. Applying Algorithm 1.

example. Algorithm 1 operates on the part type stages repre-sented in the system and not on the actual parts (if there are sev-eral parts of the same type and stage in the system, Algorithm 1handles them simultaneously). Note that it uses three data struc-tures: AVAILABLE, ALLOCATION, and NEED.

AVAILABLE is a vector that records the number of avail-able resource instances of each resource (the amount of avail-able buffer space for each workstation), while ALLOCATIONis a two-dimensional (2-D) array that records the allocation of

resources to each part type stage represented in the system.NEED is a 2-D array that records the future resource needs foreach part type stage. For stages not requiring the unreliable re-source, NEED records the resources required to finish and exitthe system. For stages requiring the unreliable resource, NEEDrecords the resources required for the parts of that stage to makeit to the first failure-dependent resource in the remaining route(if parts of a stage are already on a failure-dependent resource,the NEED is set to 0).


The first loop for Algorithm 1 eliminates from considerationthose part type stages corresponding to parts on failure-depen-dent resources. Recall that, for parts requiring the unreliable re-source, the objective is push them into the failure-dependent re-source of their current neighborhood. If they are already there,the algorithm does not have to move them, so the correspondingstages can be eliminated from consideration.

The next loop identifies part type stages whose futureresource NEED is completely available. If the stage requiresthe unreliable resource, then this means that all resourcesrequired to advance the corresponding parts (one at a time) intothe failure-dependent resource of the current neighborhood areavailable. In this case, the resource currently held is deallocated(its place in the AVAILABLE vector is incremented by thenumber of parts of that stage present) and the stage’s NEED fornonfailure-dependent resources is set to zero. Then the AVAIL-ABLE units of the corresponding failure-dependent resourceis decremented by the number of parts of that stage presentand the ALLOCATION of the failure-dependent resource tothat stage is incremented by the number of parts of that stagepresent. This changes the resource allocation state from thecurrent to the one in which all of these parts have finishedadvancing.

If the stage whose NEED is completely available does not re-quire a failure-dependent resource, then the resource currentlyheld is de-allocated (its place in the AVAILABLE vector is in-cremented by the number of parts of that stage present), itsNEED is set to zero and its ALLOCATION is set to zero. Thischanges the resource allocation state from the current to one inwhich all of these parts have been completed.

Algorithm 1 accepts when all parts corresponding to stagesnot requiring the unreliable resource have been advanced out ofthe system and all parts corresponding to stages requiring theunreliable resource have been advanced into the failure-depen-dent resource of their current neighborhood. If this is not pos-sible, then Algorithm 1 rejects.

We note that, if there are no unreliable resources, this versionof BA reduces to the modified version given in [18] and it guar-antees deadlock-free operation. If an unreliable resource exists,then the policy is not correct by itself, for it will allow deadlocksamong failure dependant resources, as we see in Fig. 5. We nowshow that a conjunctive policy, which combines BA with NHC(a resource allocation will be permitted only if it satisfies bothBA and NHC) is robust to the failure of the unreliable resource.We refer to this policy as .

Lemma 1: enforces safety for .Proof: First, we note that BA admits a state only if there

exists an ordering such that all parts not requiring future pro-cessing on the failed resource,, can be finished (call theseparts ) and all parts requiring future processing oncan beadvanced into failure-dependent resources (call these parts).Note that we will assume parts at to belong to . Further,NHC admits a state only if at most one neighborhood is capac-itated. therefore admits a state only if it satisfies each ofthese requirements. To prove the lemma, we must establish twothings. First, we must show that execution of the ordering com-puted by BA never requires a violation of NHC. We must thenshow that the resulting state (all parts in having been com-

pleted and all parts in having been advanced into failure-de-pendent resources) is safe and all remaining parts can be ad-vanced to completion under .

We begin by assuming that induces deadlock in , thatis, there exists a state acceptable to, but for any possible or-dering that BA computes, execution of that ordering eventuallyrequires violation of NHC. Thus, in executing the ordering,would induce deadlock at the point where NHC has to be vio-lated to continue. Call this induced deadlock state .satisfies , that is, satisfies both BA and NHC. But, if thenext part in the order of BA is advanced, NHC will be violated.

Let be the next part advanced in the order computedby BA. Advancing must violate NHC. Thus, by Property1, requires the unreliable resource. Further, by Property 4,

must be advancing out of an uncapacitated neighborhoodand thus, occupies space in a failure-dependent resource.But, if this is true, cannot be next in the order since it al-ready occupies space at a failure-dependent resource and theorder computed by BA does not advance parts out of their cur-rent neighborhoods. Thus, we have a contradiction. Therefore,the ordering computed by BA for a given state can be executedwithout violating NHC.

We must now show that all parts remaining in the state re-sulting from executing the order can be finished under. First,note that after executing the order, only failure-dependent re-sources hold parts. Further, by NHC, there can be at most onecapacitated neighborhood. Thus, at most one failure-dependentresource can be allocated to capacity.

If a failure-dependent resource is allocated to capacity, ad-vance any part at that resource one step in its route, otherwise,select any part to advance one step. Then applyto the re-sulting state. By Property 4, this state does not violate NHC.BA will either move this part out of the system or into an-other failure-dependent resource. To see this, note that if the partdoes not require another failure-dependent resource, then all re-sources in its remaining route are available. If the part does re-quire another failure-dependent resource, then it is in the neigh-borhood and therefore free to advance into the failure-dependentresource’s buffer space.

It is clearly possible to repeat this procedure until all parts arecompleted and thus the proof is complete.

We will now show that all states admitted by serve asfeasible initial states for .

Lemma 2: If and enables , thenis a feasible initial state for .

Proof: Suppose satisfies . Then starting from ,admits a sequence of events that finishes all parts in

and advances all parts in to a failure-dependent resource.We claim that this same sequence can be executed starting instate .

Consider the set of events in the sequence. It consists of theoperation completions (’s) and the part advancements (’s) re-quired to finish parts in . It also consists of the operation com-pletions ( ’s) and part advancements (’s) required to moveparts in to the failure-dependent resource of their currentneighborhood. It will have no part advancement () that cor-responds to advancing a part away from a failure-dependent re-source and thus it does not require and will not have any com-


pletion event ( ) for a failure-dependent resource. Thus, thesequence in no way depends on the occurrence of completionevents at the unreliable resource (’s such that ),that is, the state of ’s server is irrelevant to the execution ofthe sequence.

Therefore, the sequence that BA computes foralso holdsfor . Once this sequence is executed starting from, all re-sources not failure-dependent are free of parts and the systemcan continue producing those parts not requiring.

Lemma 3: ensures safety for .Proof: By Lemma 2, assures feasible initial states for

. Further, it easily follows from the proof of Lemma 1 that ifis started in a feasible initial state, then enforces safety.Lemma 4: If , then is a feasible

initial state for .Proof: Suppose satisfies . Then BA computes an or-

dering of parts such that those in can be emptied from thesystem and those in can be advanced to the failure-dependentresource of their current neighborhood. By the proof of Lemma2, this sequence in no way depends on the status of the serverof the unreliable resource, . Thus, the occurrence of the re-pair event, , in no way affects the validity of the order of BA.After the order is executed, and all parts in will bein the buffer space of failure-dependent resources. By the proofof Lemma 1, these parts can then be completed under. Thus,

is a feasible initial state for .Theorem 1: is robust to the failure of .

Proof: Follows from Definition 1 and Lemmas 1–4.These results establish that is robust to the failure of

one specified unreliable resource. Thus, a system can bestarted under the supervision of and, anytime the unreliableresource fails, production can continue smoothly and withoutmajor disruption. We want to clarify that it is rarely necessaryto follow the sequence computed by BA, even when failureoccurs. It is only necessary to assure that each state allowed hassuch an associated sequence. This, together with NHC, gives usthe guarantee that safe operation is possible from that state. Infact, will allow new parts requiring the unreliable resourceto be loaded into the system even when that resource’s server isdown, just so long as the resulting state satisfies both BA andNHC.

The following section develops another robust supervisorypolicy for RB RD , which makes use of the optimality re-sults cataloged in [10].

V. ROBUST SUPERVISORYCONTROL WITH OPTIMAL

DEADLOCK AVOIDANCE

As mentioned in the literature review, one branch of deadlockavoidance research seeks to discover special system structuresthat render the safety question polynomial. If such a structureis present in the system design, then establishing state safety isequivalent to detecting deadlock. That is, if a state exhibits nodeadlock, then it is safe. For these systems, single step look-ahead for deadlock (SSLA), allowing a part movement so longas the resulting state exhibits no deadlock, keeps the system op-erational as long as no resource fails (it is easy to show that

SSLA is not robust to resource failure). SSLA applies a dead-lock detection algorithm to the state that results if some selectedpart is allowed to advance one step in its route. Deadlock detec-tion is a straightforward polynomial computation, and detectionalgorithms can be found in [22] and [24].

In [10], several special cases are cataloged for which statesafety and deadlock detection are equivalent. Two of the mostapplicable are: 1) when every resource has at least two units ofbuffer space and 2) there is an optional central buffer to whichany part may return after any operation, provided the centralbuffer has unallocated space available. For these types of sys-tems, we now establish that a policy combining NHC and SSLA(a state is allowed only if it has no deadlock and does not violateneighborhood constraints) is robust to the failure of one speci-fied resource. We refer to this policy as .

Lemma 5: ensures safety for .Proof: accepts a state only if it is safe and does not

violate NHC. Being safe implies the existence of a sequenceof resource allocations that finishes all parts in the system andsatisfying NHC implies no over-capacitated neighborhood andat most one capacitated neighborhood. To prove the lemma, weneed to establish that execution of the safe sequence does notrequire violation of NHC.

Assume that, in executing the safe sequence,inducesdeadlock, that is, we reach a state acceptable tobut any safepart movement violates NHC. Let be the set of parts inthe system that can safely advance one step, let be thosethat can advance, but not safely (these advance into deadlock),and let be those that cannot advance because theyare blocked. For every , the advancement ofmust violate NHC. By Property 1, requires the unreliableresource and, by Property 4, must be advancing out of anuncapacitated neighborhood. This implies that occupiesbuffer space at a failure-dependent resource,and that has some available buffer space (otherwise, itsneighborhood would be capacitated). Thus, parts in areblocked either by other parts in or by parts in ,but not by parts in .

Suppose that . Then, every part in thesystem can advance safely. Thus, if there is a capacitated neigh-borhood, advancing any part from that neighborhood does notviolate NHC. If there is no capacitated neighborhood, advancingany part from one neighborhood to another cannot violate NHC.Thus, we can have no induced deadlock.

Suppose that and . Then every partin is blocked by another part in and we have adeadlock (and a contradiction since such a state would never beaccepted by SSLA). Thus, we must have .

Now, if we advance , we get a deadlock.does not advance into the failure-dependent resource

occupied by any . To see this, recall thatcan safely advance and thus must have availablebuffer space. Because advancing deadlocks ,advancing to causes every part at tobecome blocked. However, if there exists such that

, then advancing tocannot capacitate . That is, advancing to

cannot cause to become blocked


and thus advancing to cannot causedeadlock. Thus, we have .

It follows that no resource in is in-volved in the deadlock created when we advance ,that is, parts in are irrelevant to the unsafety of parts in

. This implies that there is no safe sequence for parts inand thus we have a deadlock-free unsafe state one step

from deadlock involving at least the parts in and pos-sibly some in . This, however, contradicts the assump-tion that we have a special structure in place that eliminates thepossibility of deadlock-free unsafe states. This establishes thelemma.

Thus, if we have any special structure in place in the systemthat renders deadlock-free unsafe states impossible, then thepolicy SSLA combined with NHC keeps the system operating.We next establish that any state reachable underserves as afeasible initial state for .

Lemma 6: If and enables , thenis a feasible initial state for .

Proof: Suppose satisfies and enables . Thenthere exists a sequence of admissible state transitions thatempties the system; call this sequence. Note thatcontains events that advance parts into failure-dependentresources, , events that finish parts atfailure-dependent resources, and eventsthat advance parts away from failure-dependent resources,

. In fact, for every (recallthat is the set of parts in the system that require future pro-cessing on the unreliable resource,), has a subsequenceof the form , where advances into itsclosest failure-dependent resource, and advances the partout of this failure-dependent resource, through the rest of itsroute and out of the system. The subsequence may beinterspersed with many other events, but it must appear in.

Now suppose the failure event occurs in state . Thus,the new state is and we need to show that itis a feasible initial state for . It will be feasible initial if all

can be advanced to failure-dependent resources andheld there, without deadlocking other parts in the system. Toachieve this, modify as follows: for each , deleteits corresponding from , that is, delete the events thatadvance the part out of its closest failure-dependent resource,through the rest of its route and out of the system. Call the re-sulting sequence . is a sequence that advances all parts re-quiring the unreliable resource into the their closest failure-de-pendent resource and all parts not requiring the unreliable re-source out of the system. To see thatdoes this, recall that once

occupies a failure-dependent resource it no longer in-terferes with the advancement and completion of parts .Thus, for , the deleted subsequence , is irrelevant tothe safety of parts . Thus, is a feasibleinitial state for .

Lemma 7: Given that retains the special structure that ren-ders the safety polynomial, ensures safety for .

Proof: By Lemma 6, assures feasible initial states forand, in fact, demonstrates that all parts incan be advanced

through their remaining routes and out of the system and allparts in can be advanced into failure-dependent resources.

If the failure of does not interfere with the special structurethat eliminates deadlock-free unsafe states, then it is clear that

can have no deadlock-free unsafe states and thuskeepsoperating.To establish that is robust to the failure of , we now

have only to show that every stateof reachable under ,the state serves as a feasible initial state for.

Lemma 8: If , then is a feasibleinitial state for .

Proof: For every state, , assures a safe se-quence for parts in and a sequence that advances parts ininto failure-dependent resources. Clearly, this sequence is alsovalid for . Execute this sequence. The resulting state,say , has and every part of occupying a failure-de-pendent resource. Further, does not violate NHC. We haveonly to show that every part present in can be advanced tocompletion under .

If there exists a capacitated neighborhood, advance any partfrom that neighborhood, otherwise advance any part. The partis either leaving (the unreliable resource) for the last time(in which case it is entering no neighborhood) or entering anuncapacitated neighborhood. If the first, then it can be advancedout of the system. If the second, then it can be advanced to thefailure-dependent resource corresponding to that neighborhood.If parts remain, repeat the procedure.

Clearly, after this procedure is executed a finite number oftimes, all parts will be finished and the system empty. The emptystate with all resources operational, , is a feasible initialstate for .

Theorem 2: is robust to the failure of .Proof: This proof follows from Definition 1 and Lemmas

5–8.These results establish that is robust to the failure of one

specified unreliable resource. Thus, a system exhibiting the re-quired special structure can be started under the supervision of

and, anytime the unreliable resource fails, production cancontinue smoothly and without major disruption. As with ,we want to clarify that it is rarely necessary to explicitly followthe safe sequences guaranteed by, even when failure occurs.It is only necessary to assure that each state allowed has such asequence. This, together with NHC, gives us the guarantee thatsafe operation is possible from that state. As with, willallow new parts requiring the unreliable resource to be loadedinto the system even when that resource’s server is down, justso long as the resulting state satisfies both SSLA and NHC.

We will now discuss the use of a central buffer for helpinghandle resource failures. We have establishedto be robustto the failure of for systems with an optional central buffer,assuming that, after any operation, any part may return to thecentral buffer if it has available space. For this result to hold, thecentral buffer need hold only one part, that is, the result does notdepend on the central buffer being able to accommodate everypart requiring the failed resource (for a proof of this, the readeris referred to [24]).

Now, suppose we have a central buffer withspaces. We needat least one unit of the central buffer’s space to assure that SSLAavoids deadlock. The other spaces can be distributed among theneighborhoods of the failure-dependent resources. For example,


Fig. 6. Using a central buffer for SSLA and to augment neighborhoods.

in Fig. 6, two units of the central buffer are allocated for useby failure-dependent resource and two units are allocated tounreliable resource . The other two units are used by SSLA toassure that deadlock-free unsafe states do not arise. In this case,

will be robust to the failure of and the right-hand sidesof NHC will be augmented, making the policy more flexible.

A more sophisticated approach is to dynamically allocateunits of the central buffer to neighborhoods that request thoseunits when the neighborhoods fill up. Thus, the right-handsides of NHC could be dynamically adjusted based on real-timeallocation of central buffer space among failure-dependentresources and SSLA. This is beyond the scope of this paper andis one direction for future work.

VI. CONCLUSION

In this paper, we developed the notion of robust supervisorycontrol for automated manufacturing systems with unreliableresources. Our objective is to allocate system buffer space sothat when an unreliable resource fails the system can continueto produce all part types not requiring the failed resource. Weestablished four properties that such a controller must satisfy,namely, that it ensure safety for the system given no resourcefailure; that it constrain the system to feasible initial states incase of resource failure; that it ensure safety for the system whilethe unreliable resource is failed; and that during resource repairit constrain the system to states that will be feasible initial stateswhen the repair is completed. We then developed two polyno-mial control polices that satisfy these conditions and discussedmethods for using a central buffer to increase the flexibility ofthese policies. We also developed a taxonomy that combines ro-bustness and redundancy to help guide future work in this area.Future research will address dynamic allocation of central bufferspace to neighborhoods, other relevant categories of the tax-onomy and performance related issues through discrete eventsimulation.

APPENDIX

Let RT be the remaining route of(note that the current resource is included). For , let

RT andRT . is the set of part type stages

represented in whose part instances are either allocated bufferspace at or need to visit before finishing. is the setof part type stages represented inwhose part instances do nothold or require buffer space at . Let .

We want to modify BA so that it accepts a state if and onlyif there exists an ordering of such that every part of type

can be be allocated sufficient resources to be drivenfrom the system and every part of type can be allocatedsufficient resources to be driven into some .

Partition as follows: , whereand

. is the set of part type stages inwhose part instances are allocated buffer space at a failure-de-pendent resource. is the set of part type stages inwhosepart instances are not allocated buffer space at a failure-depen-dent resource.

Let be an indexing function on sothat is the th member of . Define the followingdata structures.

For all and ,ALLOCATION NEEDAVAILABLE (recall is thecapacity of ).For ,

ALLOCATION .For RT ,

NEED .For , ,

NEED .For

where is smallest integer with,

NEED .AVAILABLE ALLOCATION

The algorithm is now defined as follows.


Algorithm 1Begin

Remove parts on failure-dependent re-sources .LoopFind such that NEED ,

If no such exists, break from loop,ElseEnd LoopFind parts with resource needs that can beaccommodated .LoopIf , Return AdmitElse find such that ,NEED ALLOCATION AVAILABLE

If no such exists, Return Reject.ElseFor such thatAVAILABLE AVAILABLE ALLOCATIONNEEDEnd ForIf NEED for someFind such that ALLOCATIONAVAILABLE AVAILABLE ALLOCATIONALLOCATION ALLOCATIONNEEDEnd IfFor such thatALLOCATIONEnd For

End ElseEnd LoopEnd

REFERENCES

[1] A. Habermann, “Prevention of system deadlocks,”Commun. ACM, vol.12, pp. 373–377, 1969.

[2] A. Shoshani and E. Coffman, “Sequencing tasks in multiprocess systemsto avoid deadlocks,” inProc. 11th Annu. Symp. on Switching AutomataTheory, 1970, pp. 225–233.

[3] R. Holt, “Some deadlock properties of computer systems,”ACM Com-puting Survey, vol. 4, pp. 179–196, 1972.

[4] T. Araki, Y. Sugiyama, and T. Kasami, “Complexity of the deadlockavoidance problem,” inProc. 2nd IBM Symp. the Mathematical Foun-dations of Computer Science, Tokyo, Japan, 1977, pp. 229–257.

[5] E. Gold, “Deadlock prediction: Easy and difficult cases,”SIAM J. Com-puting, vol. 7, pp. 320–336, 1978.

[6] Z. Banaszak and E. Roszkowska, “Deadlock avoidance in pipeline con-current processes,”Podstawy Sterowania (Foundations of Control), vol.18, pp. 3–17, 1988.

[7] Z. Banaszak and B. Krogh, “Deadlock avoidance in flexible manu-facturing systems with concurrently competing process flows,”IEEETrans. Robot. Automat., vol. 6, pp. 724–734, 1990.

[8] N. Viswanadham, Y. Narahari, and T. Johnson, “Deadlock preventionand deadlock avoidance in flexible manufacturing systems using petrinet models,”IEEE Trans. Robot. Automat., vol. 6, pp. 713–723, 1990.

[9] R. Wysk, N. Yang, and S. Joshi, “Detection of deadlocks in flexiblemanufacturing cells,”IEEE Trans. Robot. Automat., vol. 7, pp. 853–859,1991.

[10] M. Lawley and S. Reveliotis, “Deadlock avoidance for sequential re-source allocation systems: Hard and easy cases,”Int. J. Flexible Manuf.Syst., vol. 13, no. 4, 2001.

[11] Y. Leung and G. Sheen, “Resolving deadlocks in flexible manufacturingcells,” J. Manuf. Syst., vol. 12, no. 4, pp. 291–304, 1993.

[12] T. Kumaran, W. Chang, H. Cho, and R. Wysk, “A structured approach todeadlock detection, avoidance and resolution in flexible manufacturingsystems,”Int. J. Prod. Res., vol. 32, no. 10, pp. 2361–2379, 1994.

[13] F. Hsieh and S. Chang, “Dispatching-driven deadlock avoidancecontroller synthesis for flexible manufacturing systems,”IEEE Trans.Robot. Automat., vol. 10, pp. 196–209, 1994.

[14] J. Ezpeleta, J. Colom, and J. Martinez, “A Petri-net based deadlock pre-vention policy for flexible manufacturing systems,”IEEE Trans. Robot.Automat., vol. 11, pp. 173–185, Apr. 1995.

[15] M. Lawley, S. Reveliotis, and P. Ferreira, “Design guidelines for dead-lock handling strategies in flexible manufacturing systems,”Int. J. Flex-ible Manuf. Syst., vol. 9, pp. 5–30, 1997.

[16] M. Fanti, B. Maione, S. Mascolo, and B. Turchiano, “Event-based feed-back control for deadlock avoidance in flexible production systems,”IEEE Trans. Robot. Automat., vol. 13, pp. 347–734, 1997.

[17] M. Lawley, S. Reveliotis, and P. Ferreira, “FMS structural control andthe neighborhood policy: Parts 1 and 2,”IIE Trans., vol. 29, no. 10, pp.877–899, 1997.

[18] , “Application and evaluation of Banker’s algorithm for dead-lock-free buffer space allocation in flexible manufacturing systems,”Int. J. Flexible Manuf. Syst., vol. 10, pp. 73–100, 1998.

[19] , “A correct and scalable deadlock avoidance policy for flexiblemanufacturing systems,”IEEE Trans. Robot. Automat., vol. 14, pp.796–809, 1998.

[20] E. Roszkowska and J. Jentink, “Minimal restrictive deadlock avoidancein FMS’s,” in Proc. Eur. Control Conf. ECC ’93, vol. 2, 1993, pp.530–534.

[21] K. Xing, B. Hu, and H. Chen, “Deadlock avoidance policy for Petrinet modeling of flexible manufacturing systems with shared resources,”IEEE Trans. Automat. Contr., vol. 41, pp. 289–296, 1996.

[22] S. Reveliotis, M. Lawley, and P. Ferreira, “Polynomial complexitydeadlock avoidance policies for sequential resource allocation systems,”IEEE Trans. Automat. Contr., vol. 42, pp. 1344–1357, Oct. 1997.

[23] M. Fanti, B. Maione, and B. Turchiano, “Event control for deadlockavoidance in production systems with multiple capacity resources,”Studies in Informat. Contr., vol. 7, no. 4, pp. 343–364, Dec. 1998.

[24] M. Lawley, “Deadlock avoidance for production systems with flexiblerouting,” IEEE Trans. Robot. Automat., vol. 15, pp. 1–13, June 1999.

[25] , “Integrating flexible routing and algebraic deadlock avoidancepolicies in automated manufacturing systems,”Int. J. Prod. Res., vol.38, no. 13, pp. 2931–2950, 2000.

[26] W. Sulistyono and M. Lawley, “Deadlock avoidance for manufacturingsystems with flexible sequencing,” inProc. IEEE Int. Conf. Roboticsand Automation, San Francisco, CA, Apr. 2000.

[27] B. Johnson,Design and Analysis of Fault-Tolerant Digital Sys-tems. Reading, MA: Addison-Wesley, 1989.

[28] M. Lyu, Software Fault Tolerance, M. Lyu, Ed. Chichester, N.Y.: JohnWiley, 1995.

[29] J. Gray and D. Siewiorek, “High-availability computer systems,”IEEEComputer, vol. 24, pp. 39–48, Sept. 1991.

[30] D. Pradhan,Fault Tolerant Computer System Design. Upper SaddleRiver, NJ: Prentice-Hall, 1996.

[31] D. Siewiorek and R. Swarz,Reliable Computer Systems: Design andEvaluation. Natick, MA: A.K. Peters, 1998.

[32] J. Buzacott, “Optimal operating rules for automated manufacturing sys-tems,”IEEE Trans. Automat. Contr., vol. AC-27, pp. 80–86, 1982.

[33] O. Maimon and S. Gershwin, “Dynamic scheduling and routing for flex-ible manufacturing systems that have unreliable machines,”Oper. Res.,vol. 36, pp. 279–292, 1986.

[34] A. Kalir and Y. Arzi, “Optimal design of flexible production lineswith unreliable machines and infinite buffers,”IIE Trans., vol. 30, pp.391–399, 1998.

[35] M. Olumolade and D. Norrie, “Reactive scheduling system for cellularmanufacturing with failure-prone machines,”Int. J. Comput. Integr.Manuf., vol. 9, no. 2, pp. 131–144, 1996.

[36] S. Reveliotis, “Accommodating FMS operational contingencies throughrouting flexibility,” IEEE Trans. Robot. Automat., vol. 15, pp. 3–19, Feb.1999.

[37] S. Park and J. Lim, “Fault-tolerant robust supervisor for discrete eventsystems with model uncertainity and its application to a workcell,”IEEETrans. Robot. Automat., vol. 15, pp. 386–391, Apr. 1999.

[38] F. Hsieh, “Reconfigurable fault tolerant deadlock avoidance controllersynthesis for assembly production processes,” inProc. IEEE Conf. Man,Systems and Cybernetics, Nashville, TN, 2000, pp. 3045–3050.

[39] P. Ramadge and W. Wonham, “Supervisory control of a class of discreteevent processes,”SIAM J. Control Optim., vol. 25, no. 1, pp. 206–230,1987.


Mark A. Lawley (M’01) received the B.S. degreein industrial engineering from Tennessee Techno-logical University, Cookeville, the M.S. degree inmanufacturing systems engineering from AuburnUniversity, Auburn, AL, and the Ph.D. degree inmechanical engineering from the University ofIllinois at Urbana-Champaign.

He is currently Assistant Professor of IndustrialEngineering at Purdue University, West Lafayette,IN. He has held industrial positions with Westing-house Electric Corporation and Emerson Electric

Company. His research interests are in manufacturing systems modeling,analysis and control.

Dr. Lawley is a Registered Professional Engineer in the State of Alabama.

Widodo Sulistyonoreceived the B.S. and M.S. degrees in electrical engineeringfrom Texas A&M University, College Station. He is currently working towardthe Ph.D. degree in industrial engineering at Purdue University, West Lafayette,IN.

He served as an Applications Engineer for the Agency for the Assessment andApplication of Technology—Indonesia, specializing in industrial control au-tomation. He also worked as a Graduate Assistant in the Industrial EngineeringDepartment at Purdue University.

Documents

Robust supervisory control policies for manufacturing ...pdfs.semanticscholar.org/0a5a/b585e8a766c1bc8d283c596b8c50854e365c.pdfmajority of automated manufacturing systems, the failure