

2013 International Conference on Control, Decision and Information Technologies (CoDIT), Hammamet, Tunisia, 6–8 May 2013

Heuristics and reinforcement learning in manufacturing control: optimization by phases

AISSANI Nassima LIO University of Oran

Oran, Algeria [email protected]

BELDJILALI Bouziane LIO University of Oran

Oran, Algeria [email protected]

MEZIANE Amine LIO University of Oran

Oran, Algeria [email protected]

Abstract—Manufacturing control has attracted the attention of researchers for three decades. Their main concern is to find a compromise between optimization (of cost, time, etc.) and the responsiveness needed to cope with ever-growing competition. In this paper we develop a two-phase manufacturing control approach. The first phase is assignment, which allocates a resource to each task; here we use a heuristic based on iterative optimization. The second phase is sequencing, which defines a beginning and end date for each task; here we use holon negotiation and reactive learning. We apply the developed approach to the flexible job shop (FJS).

Keywords—Manufacturing control; Holonic system; heterarchical structure; heuristics; Reinforcement learning

I. INTRODUCTION

Manufacturing control has attracted the attention of researchers for three decades. Their main concern is to find a compromise between optimization (of cost, time, etc.) and the responsiveness needed to cope with ever-growing competition.

First, hierarchical static architectures form a pyramid in which the superior levels make decisions based on their global view of the system, while the operational levels remain attentive to control directives. The top-down command flow ensures that lower levels' actions are aligned with the system's objectives. Nevertheless, the high number of decision loops with non-negligible associated lags, together with major issues concerning aggregation and de-aggregation of data, makes hierarchical structures incapable of reacting appropriately and efficiently to internal disturbances and continuously changing market rules [1]. As a consequence, hierarchical structures do not offer the flexibility, adaptability and robustness necessary to cope with current market challenges. Distributed architectures have therefore been developed: heterarchical and semi-heterarchical, in which intelligent entities agree to work together without following a specific plan, avoiding the master/slave relationships present in hierarchical architectures. Their cooperation strategies are based on full local autonomy, self-organization, minimum global information and enhanced communication capabilities [2].

However, adaptability requires other techniques, including learning. The learning technique should be chosen to allow the system to remain responsive while improving its performance. Reinforcement learning (RL) is an appropriate way to reach this objective in multi-agent systems. Thus, reactive learning techniques such as reinforcement learning are integrated into the system's agents to improve the quality of their decision-making, so that the system can offer adaptive scheduling. Reinforcement learning is learning by trial and error. This type of learning will help us manage the heterarchical system's "myopic" behaviour [3] by constantly improving agent performance.

In this paper we present a distributed dynamic control system for the FJS based on heuristics and reinforcement learning.

This paper is organized as follows. First, the problem is introduced by analyzing the literature on distributed dynamic control. Then, the motivation of our research is explained. Next, we introduce a holonic model with reactive learning abilities and a heuristic. After the proposed model is described, the results of its experimental implementation using data from a real case, together with our analysis of these results, are presented. Finally, conclusions summarize our contribution and introduce our prospects for future research.

II. DISTRIBUTED DYNAMIC CONTROL: HETERARCHY AND MYOPIA

A. Heterarchy and multi-agents

One point that dynamic control and distributed scheduling have in common is that they both adopt a heterarchical model for agent-based manufacturing systems [4]. These autonomous, intelligent agents are perfectly suitable for production planning and scheduling. Heterarchical architectures replace the hierarchical organization with links between same-level entities, and thus are more responsive and help to reduce costs [5,9]. [2] analyzed the possibilities of self-organized heterarchical control systems within a dynamic production environment, highlighting the advantages of such an approach. With heterarchical architectures, it is possible to use a Multi-Agent System (MAS) model because the agents are independent and are able to receive information from their environment, act on that

CoDIT'13

978-1-4673-5549-0/13/$31.00 ©2013 IEEE 798


information and generally behave in a rational manner. [7] have also experimented with this heterarchical approach to control for scheduling in job shops, and for scheduling production and maintenance tasks in the petroleum industry [8]. A major problem in these heterarchical system models is "myopic" behavior [9], which makes it difficult to provide efficient results and optimization mechanisms. Models could potentially be based on a holonic approach, since such approaches have been successfully used to solve distributed scheduling problems. For example, in the ADACOR project, [11] used simple scheduling algorithms embedded in holons. These authors integrated dynamic mechanisms to increase system performance and used industrial scenarios that required fast scheduling solutions. Although their approach does not include any learning capabilities for adaptive behavior (as we aim to provide), the experiments in [11] showed that their approach could improve performance, especially in terms of agility.

In the next section, the adaptation of reinforcement learning (RL) to the multi-agent decisional context is presented.

B. Agent learning capabilities and reinforcement learning techniques

The integration of learning capabilities has already been studied in the context of scheduling, but not specifically as related to multi-site production. For example, [12] examined a scheduling problem for a very expensive electric motor production system. These authors considered the production units as insect colonies that were able to organize themselves to carry out a task, thus allowing production risk problems to be solved more easily. [13] proved the ability of multi-agent systems to control job shops and the usefulness of integrating learning capabilities into the agents in order to make them adaptive and able to react effectively to disturbances. More recently, [8] used intelligent agents with multi-objective learning capabilities based on reinforcement learning. And in [10], reinforcement learning is used to control a multi-site production system.

Reinforcement learning is learning by trial and error. In other words, agents perform actions and wait for an evaluation of the quality of the chosen action to determine whether or not the action should be added to their repertoire. Reinforcement learning is thus a reactive learning technique and can be appropriate for generating online solutions and improving them over time. This technique has often been used in robotics to teach robots proper behavior with respect to goals and obstacles [14].

Problem distribution must be done intelligently on an appropriate model that can control different entities that often have different goals and different convergences. When these entities are also endowed with a reactive learning technique, such as reinforcement learning, learning must be controlled so that entities can learn a policy that allows them to accomplish their objectives, while simultaneously responding to a global goal. To do this, modeling the learning function must involve entities learning local parameters as well as the overall system parameters.

However, this technique can be used in different ways to control a manufacturing system; in particular, two kinds of resolution can be identified for planning and scheduling systems: integrated resolution or resolution by phases.

C. Integrated or phase-based resolution for dynamic control

The flexible job shop scheduling problem can be decomposed into two sub-problems: assignment of operations to machines, and sequencing of operations on each machine.

Integrated: In this approach, the two sub-problems are treated as a whole, that is to say, without dividing the original problem. A lot of work has been done on the integrated resolution of scheduling problems [15].

By phases: Unlike the integrated approach, the hierarchical approach treats the problem in two phases, dividing the original problem into two simpler ones in order to reduce complexity: assignment (or allocation), which chooses the resource that will execute each task, and sequencing, which decides in which order. Table 1 lists the work that used the two-phase approach and inspired us.

Table 1. References using two-phase resolution

Ref    Assignment                              Sequencing
[16]   Local search (neighbourhood function)   Neighbourhood function
[17]   Localization method                     Genetic algorithm
[18]   Branch & bound                          Genetic algorithm
[19]   Tabu search                             Genetic algorithm

Based on this analysis, the proposed approaches are limited, especially when facing compact scheduling. A compact plan or schedule is characterized by a minimum number of unused time slots (dead time) on each resource. Indeed, the more compact the schedule, the higher the resource utilization, which directly affects its quality. In this context, we focus on the compactness of the schedule as a means of optimization in the assignment phase.
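To make the compactness notion concrete, the dead time of a schedule can be measured as the total unused time between consecutive operations on each resource. The sketch below is illustrative only; the schedule encoding (a map from resource name to a list of (start, end) intervals) is our assumption, not a data structure from the paper.

```python
# Sketch: schedule compactness measured as total dead time across resources.
# Schedule encoding (resource -> list of (start, end) intervals) is assumed
# here for illustration.

def dead_time(schedule):
    """Sum of unused time slots between consecutive operations on each resource."""
    total = 0
    for intervals in schedule.values():
        ordered = sorted(intervals)
        for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
            total += max(0, s2 - e1)  # gap between consecutive operations
    return total

# Example: resource R1 has a gap of 2 time units between its two operations.
schedule = {"R1": [(0, 6), (8, 12)], "R2": [(0, 10)]}
print(dead_time(schedule))  # -> 2
```

A smaller dead-time value means a more compact schedule, which is exactly what the assignment phase tries to favour.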

The following section presents the general holon model and shows how it works for assignment and sequencing, including the RL mechanisms used to control the FJS.

III. HOLONIC FRAMEWORK FOR MANUFACTURING CONTROL

Conventional production management and control activities can be classified as strategic, tactical and operational, depending on the long-, medium- or short-term nature of their task [20]. Manufacturing control refers to the management and control decisions at the operational level. Resource allocation, routing evaluation, tool and inventory management, maintenance scheduling, as well as online control and monitoring of shop-floor activities, are regular tasks necessary for manufacturing control. Although the scope of this paper covers only the allocation problem (aiming at proving the concept), this framework tries to set some generic structural and organizational concepts, which could be replicated to the other manufacturing activities.


A. Holon's generic structure

A reformulation of Christensen's holon model [21] is proposed due to certain limitations and missing functionalities desired for myopic behavior control. First of all, the physical processing layer of this model is too restrictive. Nowadays, manufacturing systems may contain smart entities which take local decisions and therefore have a certain degree of autonomy. Our model makes a clear division between high-level control and operational-level control (Figure 1) to mark these differences. The high-level control represents the intelligence in the holon, i.e., the decision-making, information treatment and well-structured interaction capabilities (e.g., decision-making based on a market-like approach resides on this layer). The operational layer, on the contrary, though it has a certain intelligence, is ruled by the decisions taken at the level above (e.g., routing decisions to go from resource A to resource B, resource B being the destination decided by the high-level control).

Figure 1. Proposed holon structure

Two additional characteristics are proposed: the division between "decision-making" and "data processing", and the inclusion of specialized interfaces at each level. The first is inspired by the different nature of the two processes: while data calculation may be required on a high-frequency basis, decision-making processes are event-oriented. The decision-making entities are served by the data processing modules, while the data processing modules keep variables and parameters updated based on those decisions and current system conditions.

Finally, the three-layer model counts on specialized interfaces at each level, giving a clear understanding of the necessary protocols and technological requirements depending on the type of information they transmit. Inner-holon communication is also possible and is handled by the same interface units.

The five components of the proposed holon model are described in detail:

The Control Decision Entity (CDE) hosts the algorithms and strategies for achieving the holon’s objectives. It uses the information stored in the CEM and the holon’s state (stored in the ODE) to make the appropriate decisions.

The Control Execution Module (CEM) manages all the information related to the high-level control. It does not take any decisions, but it structures, modifies and stores information. Attached to the CEM, an interface drives the information exchange with external holons at the same level; this means the exchanged information relates to the holon's objectives or its decision-making process. Internally, the CEM provides the CDE with meaningful information coming from the lower layers, also necessary for decision-making. The CEM is considered an information processing module.

The Operational Decision Entity (ODE) takes all the operational decisions related to the low-level control of the system. Once a decision has been reached at the upper level, a sequence of operational decisions must be taken so that the tasks can be done. The choices made by this entity are transformed into actions at the physical level (e.g., a shuttle moving, a resource processing a product, etc.). Additionally, the aggregation/de-aggregation decisions are managed from this level. The number of sub-holons needed and their task assignment are some of the responsibilities of the ODE.

The Operational Execution Module (OEM) manages all the information related to the low-level control. It does not take any decisions, but it structures, modifies and stores the local information needed for task execution. This module provides information to the upper layer and the ODE, coming from other holons, sub-holons or the physical part of the holon. An image of this module allows external users to determine the state of the holon at a time t.

The Physical Entity (PE) represents the physical elements found in a manufacturing system (products, tools, machines, pallets, conveyors, routing swivels, etc). This entity may or may not be implemented by some types of holons, depending on their nature.

B. Holonic system organization

Manufacturing control entities are instantiated from the previously presented holon structure. Two types of holons are present in the system: product-related and resource-related holons. Product-related holons request services from resource-related holons. For this framework we structure the manufacturing organization based on the recursiveness characteristic of holonic systems.

The Order Holon (OH) represents the jobs in a manufacturing system. Its control layer is in charge of managing the strategy to accomplish its objectives with the best possible performance. It uses its operational layer to create other sub-holons and to assign to them parts of the problem, so that they can work on a solution in a distributed way. The order holon does not need to implement a physical part since its sub-holons will have one. Order holons are issued when a production order is made available by the upper management system.

Product Holons (PH) are the order holon's sub-holons. Each product holon is assigned a specific task sequence according to its type and a particular objective, which it stores in its CEM. Based on this information, the CDE implements a strategy to achieve the individual objective, while assessing the overall order's performance. In the operational


layer the product holon has a controller for its physical part as well as the required functionalities to create and manage other sub-holons. The product holon’s physical part is composed of the actual product and a resource holon (the shuttle) for mobility purposes.

Resource Holons are service providers for product holons. On their control layer they can host their own objectives and the strategies to accomplish them (such as work balance, adaptive capacity, tool management, etc.). Their CEM hosts the information related to supported tasks and to the reservations posted by PHs. The operational layer is composed of the control elements, physical and informational, for executing its services. At the physical level, sensors and actuators follow the commands received from the operational layer to achieve a manufacturing process.

This proposed holonic organization targets the implementation of two mechanisms to control myopic behavior. The first mechanism concerns assignment, based on heuristics. The second concerns sequencing, based on holon negotiation and a reactive learning technique, namely RL.

IV. RESOLUTION

A. Problem presentation

The problem is to organize the execution of N jobs on R resources. Each job Jj consists of a number aj of ordered operations (precedence constraint). Each operation must be run on a resource. The execution of operation Oi,j (the ith operation, or task, of Jj) on resource Rk makes Rk unavailable for other operations during a period Pi,j,k (resource constraint).

For each task, a table of treatment times can be calculated and associated with it; for example, a product on resource R1 waiting for the axis task has the execution times given in Table 2.

Figure 2. Manufacturing layout and task distribution

Table 2. Operation times

Task   R2   R3
B2     22   42

In the first phase, assignment, we aim to place tasks on resources; the second phase, sequencing, calculates the beginning date ti,j and the end date tfi,j of each task Oi,j. The objective is to minimize the two criteria described as follows:

1. The makespan: Cr1 = Cmax = maxj tf(aj, j), the completion time of the last operation of each job.
2. The maximum workload: Cr2 = Wmax = maxk Wk, where Wk is the sum of the operation times of the operations assigned to resource Rk: Wk = Σi,j Pi,j,k · Xi,j,k, for 1 ≤ k ≤ R, with Xi,j,k = 1 if Oi,j is assigned to Rk, and Xi,j,k = 0 otherwise.

The objective is: f(s)Ass = 1 / [Cr1(s) + Cr2(s)]
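As a concrete sketch, the two criteria and the assignment fitness can be computed from a schedule as follows. The schedule encoding (operation identifier → (resource, start, end)) is an assumption made for illustration, not the paper's representation.

```python
# Sketch of the two criteria Cr1 (makespan) and Cr2 (maximum workload) and the
# assignment fitness f(s)_Ass = 1 / (Cr1 + Cr2). The schedule encoding is an
# illustrative assumption.

def criteria(schedule):
    """Return (Cmax, Wmax) for a schedule {op: (resource, start, end)}."""
    cr1 = max(end for (_, _, end) in schedule.values())   # makespan Cmax
    workload = {}
    for (resource, start, end) in schedule.values():
        workload[resource] = workload.get(resource, 0) + (end - start)
    cr2 = max(workload.values())                          # maximum workload Wmax
    return cr1, cr2

def fitness(schedule):
    cr1, cr2 = criteria(schedule)
    return 1.0 / (cr1 + cr2)

# Tiny example: two jobs, two resources.
s = {("O1", "J1"): ("R1", 0, 6),
     ("O2", "J1"): ("R2", 6, 10),
     ("O1", "J2"): ("R1", 6, 14)}
print(criteria(s))  # -> (14, 14): Cmax = 14, Wmax = workload of R1 = 6 + 8
```

Since the fitness is the reciprocal of the criteria sum, the iterative procedure below can simply maximize it.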

B. Dynamic behaviour

The order holon (OH) sends the production sequence to each product holon (PH); the PHs send execution requests to the concerned resource holons (RH). According to their state, the RHs send their propositions for execution to the OH. The OH executes the assignment procedure to allocate an RH to each task. This is the assignment phase.

In the sequencing phase, each RH receives its predicted sequence. For each task and according to its Q-table, the RH chooses to delay it or to leave the dates as they are, and sends its decision to the PH in order to verify the precedence constraints. When all task dates are defined, each RH sends its sequence to the OH, which calculates performances and sends rewards to the RHs so that they update their Q-tables.

Figure 3. UML sequence diagram

C. Assignment procedure

l = 0 (iteration 0),
1. Generate an initial solution S0.
2. Generate the solution space Ωln = {S1, S2, S3, …, Sn} from S0 (task permutation: diversification).
3. l = l + 1 (next iteration).
4. Apply the optimization function Fop(x) to Ωln (task movement): Ωln' = {Fop(S1) = y1, Fop(S2) = y2, Fop(S3) = y3, …, Fop(Sn) = yn'}.
5. Select the m best solutions among the set {y1, y2, y3, …, yn'} and add them in descending order (depending on the fitness of each solution) to the list Llm = {δ1, δ2, …, δm}.
6. Generate the solution space Ωln = {S1, S2, S3, …, Sn} from δ1 and go to step 3 until Llm = Ll-1m (i.e., until the list of best solutions no longer changes).

S0 is generated by combining the RH responses to the PH requests, ensuring that the process begins with feasible solutions.

1) Task permutation (diversification)

We denote a solution by S = {O1^1, O2^1, O3^1, …, On1^1, O1^2, O2^2, O3^2, …, On2^2, …, O1^m, O2^m, O3^m, …, Onm^m}, where Oi^k is the ith task assigned to resource holon RHk:
- O1^1, O2^1, O3^1, …, On1^1 are the tasks assigned to RH1;
- O1^2, O2^2, O3^2, …, On2^2 are the tasks assigned to RH2;
- O1^m, O2^m, O3^m, …, Onm^m are the tasks assigned to RHm.

Permute Oi^k with Oj^k', where 1 ≤ k ≤ m, k+1 ≤ k' ≤ m-1, 1 ≤ i ≤ nk and 1 ≤ j ≤ nk'.

If the new solution obtained via the permutation is acceptable (meets all constraints), it is accepted; otherwise it is rejected.
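The iterative assignment procedure (steps 1 to 6 above) can be sketched as the following loop. The `permute` (diversification), `optimize` (Fop, task movement) and `fitness` functions are placeholders for the paper's operators; the toy instance at the bottom is invented purely to exercise the loop.

```python
# Sketch of the iterative assignment procedure. The operators passed in are
# placeholders; only the loop structure follows the steps in the text.
import random

def iterative_assignment(s0, permute, optimize, fitness, n=10, m=3, max_iter=50):
    """Diversify around s0, optimize each candidate, keep the m best, repeat
    until the list of best solutions no longer changes."""
    space = [permute(s0) for _ in range(n)]                # step 2: diversification
    best_list = []
    for _ in range(max_iter):                              # step 3: next iteration
        optimized = [optimize(s) for s in space]           # step 4: task movement
        new_best = sorted(optimized, key=fitness, reverse=True)[:m]  # step 5
        if new_best == best_list:                          # stop: list unchanged
            break
        best_list = new_best
        space = [permute(best_list[0]) for _ in range(n)]  # step 6: from delta_1
    return best_list[0]

# Toy instance: a "solution" is a tuple of 0/1 resource choices; the fitness
# rewards a balanced assignment (two tasks on each resource).
random.seed(0)

def permute(s):
    t = list(s)
    i = random.randrange(len(t))
    t[i] = 1 - t[i]                    # flip one task to the other resource
    return tuple(t)

best = iterative_assignment((0, 0, 0, 0), permute,
                            lambda s: s,                     # identity "Fop"
                            lambda s: -abs(sum(s) - 2))      # balance fitness
```

Note that, as in the text, each new solution space is regenerated from the single best solution δ1, so diversification and intensification alternate.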

2) Task movement (compaction – optimization)

Task Oi,j may be moved to the unused period Q = [α, β] if:
- job Jj is critical;
- tfi,j > α (Oi,j currently finishes after Q begins);
- α + Pi,j ≤ β (resource constraint);
- α ≥ tfi-1,j (precedence constraint).

In the following example (Figure 4), the operation O1,4 is the last operation. Therefore, we favor the movement of the operations of Job 1: the only operation of Job 1 that can be moved to period 1 is O1,2.

Movement of O1,2 to unused period 1 = [α, β]:
- J1 is critical;
- 20 > 8: O1,2 finishes after period 1 begins;
- 8 + 2 ≤ 10: no conflict between O1,2 and O3,2 (resource constraint);
- 8 ≥ 6: O1,2 begins after O1,1 finishes (precedence constraint).

Each solution is optimized through several such movements. The process is repeated as long as there is a movable operation.
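The movement conditions can be checked directly. A minimal sketch, with illustrative variable names, applied to the worked example (O1,2 of duration 2 currently finishing at 20, moved into the unused period [8, 10], with O1,1 finishing at 6):

```python
# Sketch of the feasibility test for moving operation O_{i,j} into an unused
# period Q = [alpha, beta]. Variable names are illustrative assumptions.

def can_move(job_is_critical, tf_ij, p_ij, tf_prev, alpha, beta):
    """True if O_{i,j} may be moved into the unused period [alpha, beta]."""
    return (job_is_critical
            and tf_ij > alpha          # O_{i,j} currently finishes after Q begins
            and alpha + p_ij <= beta   # the task fits in Q (resource constraint)
            and alpha >= tf_prev)      # it starts after O_{i-1,j} (precedence)

# Worked example from the text: all four conditions hold, so the move is allowed.
print(can_move(True, tf_ij=20, p_ij=2, tf_prev=6, alpha=8, beta=10))  # -> True
```

A task of duration 3 would fail the resource constraint (8 + 3 > 10), so the move would be rejected.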

D. Sequencing procedure

The second phase is based on RL. RL methods involve learning to act by trial and error. Holons perceive their individual states and perform actions for which numerical rewards are given. The goal of the agents is thus to maximize the total reward received over time (Figure 5). These methods are often used in robotics, in order to teach a robot the proper behaviour to achieve its goals and to overcome obstacles. The most popular method is Q-learning [22].

Figure 5. Q-learning algorithm

This algorithm works with the following data:

• State: the parameters are the current time t ∈ {0, …, T}; the inventory of tasks assigned to the resource, Oj1, …, Ojn; the list of resource states SRH1, …, SRHj, …, SRHk (e.g., working, stopped); the ordered sequence Rq; and the production duration of each operation (Pijr).
• Action: for each task and according to its Q-table, the RH chooses to keep the task in its place (with the predicted dates) or to move it after the tasks already placed.

• Reward: reward functions assign no reward to most states, positive rewards to the goal state and negative rewards to undesirable states. For more precision and to obtain proper convergence, the reward function combines the state engendered by an action with Cmax, Cmaxi (Cmax on resource i) and Wmax:

R(s) = 1 / (Cmax + Cmaxi + Wmax)

More details are given in [7] and [9].
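A minimal tabular Q-learning sketch of the sequencing decision follows. Only the update rule and the reward shape R = 1/(Cmax + Cmaxi + Wmax) follow the text; the two-action toy environment, the hyper-parameter values and all names are our own illustrative assumptions.

```python
# Minimal tabular Q-learning sketch for the RH's sequencing decision
# ("keep" the predicted dates vs. "delay" the task). The environment and
# hyper-parameters are invented for illustration.
import random

ACTIONS = ("keep", "delay")
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration

def reward(cmax, cmax_i, wmax):
    """Reward shape from the text: R(s) = 1 / (Cmax + Cmaxi + Wmax)."""
    return 1.0 / (cmax + cmax_i + wmax)

def choose(q, state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def update(q, state, action, r, next_state):
    """One Q-learning step: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (r + GAMMA * best_next - old)

# Toy episodes: keeping the task yields a tighter schedule (smaller Cmax and
# Cmaxi), hence a larger reward, so the RH should learn to prefer "keep".
random.seed(1)
q = {}
for _ in range(200):
    a = choose(q, "s0")
    r = reward(20, 12, 14) if a == "keep" else reward(26, 18, 14)
    update(q, "s0", a, r, "end")
best_action = max(ACTIONS, key=lambda a: q.get(("s0", a), 0.0))
```

Because the reward is the reciprocal of the criteria sum, maximizing the accumulated reward pushes the holons toward compact, low-workload sequences.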

Figure 4. Re-assignment of O1,2 (RH1 to RH3): Gantt chart showing jobs J1, J2, J3 and the unused periods on resources RH1–RH3 over time.



V. EXPERIMENTATION AND ANALYSIS

The physical cell used for experimentation is shown in Figure 2. In this cell, four different letter-shaped products (B, E, L and T) can be manufactured with a combination of five basic components (axis, I, L, r and screw) assembled on a plate (Figure 6). Each product is assembled according to its fabrication sequence, imposing precedence constraints on the allocation problem.

Figure 6. Components, finished products and fabrication sequences

The holonic framework is simulated using the JADE MAS development platform (see http://jade.tilab.com/) in a JDK 5 environment. Two mixed orders were issued separately: orders of 4 and 8 products were launched into the JADE-based simulator. The results are presented in Table 3, compared to a MILP (Mixed Integer Linear Program) developed for this workshop [3].

Table 3. Results from the proposed holonic system and the MILP (* optimal solution)

        Holonic System                        MILP
Order   Wmax   Cmax (s)   Exec. Time (s)      Cmax (s)   Exec. Time (s)
4       67     369        10.051              349*       165.63
8       90     642        60.231              549        3600

These first results show, first, the feasibility of the developed approach and, second, that its results are competitive with those of an exact resolution method (the MILP), while requiring far less execution time.

VI. CONCLUSION AND FUTURE WORK

The main objective of this paper was to set the grounds for studying heterarchical manufacturing control integrating a heuristic and RL. We also showed how we have combined the two in a holonic system. The experiments performed on the flexible workshop are encouraging. The objective now is to push the experiments further to assess the reliability of the approach on more complex cases. We also propose to improve the RL algorithm, and other heuristics could be considered as well.

REFERENCES

[1] Nagalingam, S., & Lin, G. (2008). CIM—Still the solution for manufacturing industry. Robotics and Computer-Integrated Manufacturing, 24, pp. 332–344.

[2] Bousbia, S., & Trentesaux, D. (2002). Self-organization in distributed manufacturing control: State-of-the-art and future trends. IEEE International Conference on Systems, Man & Cybernetics, 5, 6

[3] Zambrano, G., Aissani, N., Pach, C., Berger, T., & Trentesaux, D. (2011). An approach for temporal myopia reduction in heterarchical control architectures. In Proceedings of 20th IEEE inter symposium on industrial electronics, 27–30 June 2011.

[4] Silva, N., Sousa, P., & Ramos, C. (1998). A holonic manufacturing system implementation. Advanced Summer Institute(ASI’98). Bremen, Germany; 14–17 June 1998.

[5] Trentesaux, D., Pesin, P., & Tahon, C. (2000). Distributed artificial intelligence for FMS scheduling, control and design support. Journal of Intelligent Manufacturing, 11, 573–589.

[6] Trentesaux, D. (2009). Distributed control of production systems. Engineering Applications of Artificial Intelligence, 22(7), 971–978

[7] Aissani, N., Trentesaux, D., & Beldjilali, B. (2008). Use of machine learning for continuous improvement of the real time manufacturing control system performances. International Journal of Industrial System Engineering, 3(4), 474–497.

[8] Aissani, N., Beldjilali, B., & Trentesaux, D. (2008b). Efficient and effective reactive scheduling of manufacturing system using SARSA-multi-objective-agents. In Proceedings of the 7th international conference MOSIM, Paris, pp. 698–707.

[9] Aissani, N., Trentesaux, D., & Beldjilali, B. (2009). Dynamic scheduling of maintenance tasks in the petroleum industry: A reinforcement approach. EAAI: Engineering Applications of Artificial Intelligence, 22, 1089–1103.

[10] Aissani, N., Bekrar, A, Trentesaux, D., & Beldjilali, B. (2012). Dynamic scheduling for multi-site companies: a decisional approach based on reinforcement multi-agent learning. J Intell Manuf DOI 10.1007/s10845-011-0580-y,Journal of Intelligent Manufacturing, Vol. 23, Issue 6, pp 2513-2529

[11] Leitao, P., & Restivo, F. (2008). A holonic approach to dynamic manufacturing scheduling. Robotics and Computer-Integrated Manufacturing, 24, 625–634.

[12] Katalinic, B., & Kordic, V. (2004). Bionic assembly system: Concept, structure and function. In Proceedings of the 5th IDMME, Bath, UK.

[13] Monostori, L., Csáji, B. Cs., & Kádár, B. (2004). Adaptation and learning in distributed production control. CIRP Annals-Manufacturing Technology, 53(1), 349–352.

[14] Dongbing, G., & Yang, E. (2007). Fuzzy policy reinforcement learning in cooperative multi-robot systems. Journal of Intelligent and Robotic Systems, 48(1), 7–22.

[15] Jones, A., & Louis, C. (1998). Survey of Shop Scheduling Techniques. NISTIR, National Institute of Standards and Technology, Gaithersburg, MD.

[16] Mastrolilli, M. and Gambardella, L. M., Effective neighborhood functions for the flexible job shop problem. Journal of Scheduling, 3, 3-20, 2000.

[17] Kacem, I., Hammadi, S. and Borne, P. Approach by localization and multi-objective evolutionary optimization for flexible job-shop scheduling problems. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 31, 1-13, 2002

[18] Zribi, N., Kacem, I., El Kamel, A., & Borne, P. (2007). Assignment and scheduling in flexible job-shops by hierarchical optimization. Vol. 37, no. 4, pp. 652–661.

[19] N. Zribi, A. El Kamel. MPM Job-shop under Availability Constraints . Int. J. of Computers, Communications & Control, 2009.

[20] Caramia, M. Dell'Olmo, P., 2006. Effective resource management in manufacturing systems: optimization algorithms for production planning. Springer Series in Advanced Manufacturing, Springer-Verlag London Limited.

[21] Christensen, J., 1994. Holonic manufacturing systems : Initial architecture and standards directions. Proceeding of the First European Workshop on Holonic Manufacturing Systems, HMS Consortium, Hannover, Germany

[22] Watkins, C. J. C. H. (1989). Learning from delayed rewards, PhD thesis, Cambridge University, Cambridge, England
