14
IEEE TRANSACTIONS ON CYBERNETICS 1 Automated Coordination Strategy Design using Genetic Programming for Dynamic Multi-Point Dynamic Aggregation Guanqiang Gao, Yi Mei, Senior Member, IEEE, Bin Xin, Member, IEEE, Ya-hui Jia, Member, IEEE, and Will N. Browne, Member, IEEE Abstract—The multi-point dynamic aggregation (MPDA) prob- lem of the multi-robot system is of great significance for its real- world applications like bush fire elimination. The problem is to design the optimal plan for a set of heterogeneous robots to complete some geographically distributed tasks collaboratively. In this paper, we consider the dynamic version of the problem where new tasks keep appearing after the robots are dispatched from the depot. The dynamic MPDA problem is a complicated optimization problem due to several characteristics, such as the collaboration of robots, the accumulative task demand, the relationships among robots and tasks, and the unpredictable task arrivals. In this paper, a new model of the problem considering these characteristics is proposed. To solve the problem, we develop a new genetic programming hyper-heuristic (GPHH) method to evolve reactive coordination strategies (RCS) which can guide the robots to make decisions in real-time. The proposed GPHH method contains a newly designed effective RCS heuristic template to generate the execution plan for the robots according to a GP tree. A new terminal set of features related to both robots and tasks, and a cluster filter which assigns the robots to urgent tasks are designed. The experimental results show that the proposed GPHH significantly outperformed the state-of-the-art methods. Through further analysis, useful insights such as how to distribute and coordinate robots to execute different types of tasks are discovered. Index Terms—Genetic programming, hyper-heuristic, multi- robot system, multi-point dynamic aggregation, real-time decision-making I. I NTRODUCTION The Multi-Point Dynamic Aggregation (MPDA) problem is an important optimization problem that has many real- world applications [1]–[3]. The objective of MPDA is to design a routing plan for a set of robots to complete some distributed tasks in a collaborative way so that the makespan (the completion time of the last task) is minimised. A typical This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB1308000; in part by the National Natural Science Foundation of China under Grant 61673058 and Grant 62088101; in part by the National Outstanding Youth Talents Support Program under Grant 61822304; in part by the Consulting Research Project of the Chinese Academy of Engineering under Grant 2019-XZ-7; in part by the Projects of Major International (Regional) Joint Research Program of NSFC under Grant 61720106011; in part by the Beijing Advanced Innovation Center for Intelligent Robots and Systems; and in part by the Peng Cheng Laboratory. (Corresponding author: Bin Xin.) Guanqiang Gao and Bin Xin are with the School of Automation, Beijing Institute of Technology, Beijing 100081, China; Yi Mei, Ya-hui Jia, and Will N. Browne are with School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand. example is the robot fire-fighting planning problem. In the past years, much attention has been given to the fire-fighting problem (e.g. the bush fire 2019 in Australia and the bush fire 2020 in California) [4], [5] since fire causes enormous damage to the economy and natural environments. Because of good mobility and safety [6], multi-robot systems (e.g. unmanned aerial/ground vehicles) have been considered to execute fire- fighting missions [1], [7]–[10]. In addition to the fire-fighting application, MPDA also has other real-world applications, such as intelligent security [11], search and rescue [12], and environment monitoring [13]. So far, most studies for MPDA have been focused on static environments, in which all the information regarding the robots and tasks is known in advance [2], [3], [14]. However, in the real world, the environment is dynamic and the information about the new tasks is unknown until the tasks are detected. For example, Fig. 1 shows a dynamic scenario of a multi-robot fire-fighting mission. Three robots are assigned to execute tasks 1, 2 and 3. Tasks 4 and 5 are yet not known to the robots, as they have not been detected/arrived at yet. In this paper, we will focus on solving the Dynamic MPDA (DMPDA) problem with dynamic task arrivals. Fig. 1. An example of the dynamic task planning of the multi-robot system for the fire-fighting mission. The DMPDA problem is a very challenging problem. First, the traditional optimization methods like mathematical algorithms and meta-heuristic algorithms that generate de- terministic solutions cannot efficiently handle the dynamic environment. When a new task is detected, the pre-planned solution needs to be adjusted to complete the new task as well as the remaining tasks. However, these methods cannot effectively re-optimize the solution in real-time due to their high computational cost and the NP-hard characteristic of the problem [1], [15]. Second, the demand of a task increases over time. A little delay on one task can lead to extremely large ac-

IEEE TRANSACTIONS ON CYBERNETICS 1 Automated …

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

IEEE TRANSACTIONS ON CYBERNETICS 1

Automated Coordination Strategy Design usingGenetic Programming for Dynamic Multi-Point

Dynamic AggregationGuanqiang Gao, Yi Mei, Senior Member, IEEE, Bin Xin, Member, IEEE, Ya-hui Jia, Member, IEEE, and

Will N. Browne, Member, IEEE

Abstract—The multi-point dynamic aggregation (MPDA) prob-lem of the multi-robot system is of great significance for its real-world applications like bush fire elimination. The problem is todesign the optimal plan for a set of heterogeneous robots tocomplete some geographically distributed tasks collaboratively.In this paper, we consider the dynamic version of the problemwhere new tasks keep appearing after the robots are dispatchedfrom the depot. The dynamic MPDA problem is a complicatedoptimization problem due to several characteristics, such asthe collaboration of robots, the accumulative task demand, therelationships among robots and tasks, and the unpredictable taskarrivals. In this paper, a new model of the problem consideringthese characteristics is proposed. To solve the problem, wedevelop a new genetic programming hyper-heuristic (GPHH)method to evolve reactive coordination strategies (RCS) whichcan guide the robots to make decisions in real-time. The proposedGPHH method contains a newly designed effective RCS heuristictemplate to generate the execution plan for the robots accordingto a GP tree. A new terminal set of features related to bothrobots and tasks, and a cluster filter which assigns the robots tourgent tasks are designed. The experimental results show that theproposed GPHH significantly outperformed the state-of-the-artmethods. Through further analysis, useful insights such as howto distribute and coordinate robots to execute different types oftasks are discovered.

Index Terms—Genetic programming, hyper-heuristic, multi-robot system, multi-point dynamic aggregation, real-timedecision-making

I. INTRODUCTION

The Multi-Point Dynamic Aggregation (MPDA) problemis an important optimization problem that has many real-world applications [1]–[3]. The objective of MPDA is todesign a routing plan for a set of robots to complete somedistributed tasks in a collaborative way so that the makespan(the completion time of the last task) is minimised. A typical

This work was supported in part by the National Key Research andDevelopment Program of China under Grant 2018YFB1308000; in part bythe National Natural Science Foundation of China under Grant 61673058 andGrant 62088101; in part by the National Outstanding Youth Talents SupportProgram under Grant 61822304; in part by the Consulting Research Project ofthe Chinese Academy of Engineering under Grant 2019-XZ-7; in part by theProjects of Major International (Regional) Joint Research Program of NSFCunder Grant 61720106011; in part by the Beijing Advanced Innovation Centerfor Intelligent Robots and Systems; and in part by the Peng Cheng Laboratory.(Corresponding author: Bin Xin.)

Guanqiang Gao and Bin Xin are with the School of Automation, BeijingInstitute of Technology, Beijing 100081, China; Yi Mei, Ya-hui Jia, andWill N. Browne are with School of Engineering and Computer Science,Victoria University of Wellington, PO Box 600, Wellington 6140, NewZealand.

example is the robot fire-fighting planning problem. In thepast years, much attention has been given to the fire-fightingproblem (e.g. the bush fire 2019 in Australia and the bush fire2020 in California) [4], [5] since fire causes enormous damageto the economy and natural environments. Because of goodmobility and safety [6], multi-robot systems (e.g. unmannedaerial/ground vehicles) have been considered to execute fire-fighting missions [1], [7]–[10]. In addition to the fire-fightingapplication, MPDA also has other real-world applications,such as intelligent security [11], search and rescue [12], andenvironment monitoring [13].

So far, most studies for MPDA have been focused on staticenvironments, in which all the information regarding the robotsand tasks is known in advance [2], [3], [14]. However, in thereal world, the environment is dynamic and the informationabout the new tasks is unknown until the tasks are detected.For example, Fig. 1 shows a dynamic scenario of a multi-robotfire-fighting mission. Three robots are assigned to executetasks 1, 2 and 3. Tasks 4 and 5 are yet not known to the robots,as they have not been detected/arrived at yet. In this paper, wewill focus on solving the Dynamic MPDA (DMPDA) problemwith dynamic task arrivals.

Fig. 1. An example of the dynamic task planning of the multi-robot systemfor the fire-fighting mission.

The DMPDA problem is a very challenging problem.First, the traditional optimization methods like mathematicalalgorithms and meta-heuristic algorithms that generate de-terministic solutions cannot efficiently handle the dynamicenvironment. When a new task is detected, the pre-plannedsolution needs to be adjusted to complete the new task aswell as the remaining tasks. However, these methods cannoteffectively re-optimize the solution in real-time due to theirhigh computational cost and the NP-hard characteristic of theproblem [1], [15]. Second, the demand of a task increases overtime. A little delay on one task can lead to extremely large ac-

IEEE TRANSACTIONS ON CYBERNETICS 2

cumulated demands of the remaining tasks. This characteristicof the problem makes the search space very rugged. A slightchange of the solution could lead to a very different makespan.Third, allowing collaboration between robots (each task canbe visited by multiple robots) generates a much larger searchspace than that of the non-collaborative path-finding problemssuch as vehicle routing problems [16], [17].

Reactive coordination strategies (RCSs) have been shown asa promising heuristic for the multi-robot system [6], [18], [19].RCS is used in the execution process of the mission, to guidethe robots’ activities (e.g., decide the next task for a robotto execute when it completes the current task) based on theanalysis and synthesis of the environment and task information[20], [21]. Since RCS can decide the next task for the robot toexecute in a very short time, they are very suitable for solvingthe DMPDA problem. There have been a variety of manuallydesigned RCSs for MPDA [1], [22], [23], which have achievedpromising results.

Designing RCSs for the DMPDA problem needs to considervarious factors related to both robots and tasks. Not only arehigh expertise and domain knowledge required, but differentRCSs should be designed for different scenarios. Thus, man-ually designing RCSs is very difficult.

Genetic programming Hyper-heuristic (GPHH) methodshave been used to automatically generate rules for manydynamic problems [24]–[27], such as dynamic job shopscheduling and dynamic vehicle routing. With its ability toevolve sophisticated rules and flexible representations [28],[29], GPHH is a potential method for automatically generatingbetter RCSs than the manually designed RCSs. To the bestof our knowledge, no GPHH method has been designed forevolving RCS for the DMPDA problem.

Before investigating the DMPDA problem, there are someother difficulties. First, there is no proper model that cancharacterise the problem so far. Second, there is no existingbenchmark test set that can be used to evaluate the perfor-mance of an algorithm. Third, the existing execution processfor robots to complete the tasks is not effective enough todeal with the incremental demands of the tasks. Especially, alittle delay on one task leads to a much larger demand of theother tasks. This issue needs to be addressed in the executionprocess. Finally, an effective RCS needs to be designed andembedded in the execution process. To address these issuesand also solve the DMPDA problem effectively, we establisha DMPDA model, design a DMPDA benchmark set, andpropose a novel GPHH method. Specifically, the paper hasthe following contributions.

1) A DMPDA problem model is proposed and investigated:• To describe the DMPDA problem comprehensively, a

novel model is proposed in this paper. The proposedmodel considers the time-varying task demands, the col-laboration between tasks, and the dynamic environmentwhere the tasks cannot be known until they are detected.

• To test different algorithms, a new benchmark set isdesigned. The benchmark set contains a wide rangeof instances with different characteristics, such as thenumber of robots and tasks, robot abilities, and detectiontime of the dynamic tasks.

2) An effective GPHH algorithm is designed:• We design a new heuristic template of robots to effec-

tively handle the complex relationship among the robotsand tasks, and the dynamic task arrivals. The heuristictemplate can be considered as the simulation of theDMPDA problem.

• A new filter is designed for the candidate task pool toadaptively switch between distributing and grouping therobots during the RCS heuristic template processing. Thenew filter can prevent a task demand from growing toolarge to be completed in time.

• To evolve RCS, we design a terminal set includingfeatures of the robots and tasks for the GPHH approach.

• We analyse the structure of the rules and solutionsobtained by the proposed GPHH algorithm to gain in-sights from the useful patterns in the promising DMPDAsolutions.

The rest of this paper is organised as follows. Section IIpresents the mathematical model for the DMPDA problem andreviews the related work. Afterwards, the proposed GPHH isdescribed in Section III. Sections IV-VI present the experimen-tal studies, including empirical comparisons with manuallydesigned heuristics, the demonstration of the efficacy of theproposed strategies, and the insights of the evolved RCSs.Finally, conclusions are drawn in Section VII.

II. BACKGROUND

This section first briefly describes the problem definition ofthe DMPDA problem, then introduces the related work aboutMPDA and GPHH.

A. Problem Definition

An undirected graph G(V, E) is used to define the DMPDAproblem. In the undirected graph, V = v0, v1, . . . , vNrepresents the set of vertices and E is the set of edges. v0indicates the depot. Each vertex vi ∈ V is a task and isassociated with three characteristics: the detection/arrival timet0,i, the inherent increment rate γi, and the initial demand q0,iwhen it is detected. The accumulated demand qi(t) of vi attime t is shown as follows.

qi(t) = q0,i + γi × (t− t0,i). (1)

Each edge eij = (vi, vj) ∈ E is associated with a travel timetij from vi to vj .

There is a set of robots R = rob1, . . . , robM to executethese tasks. The robots are initially located at the depot v0.Each robot robk has an ability uk, indicating the amount ofdemand that the robot can reduce per time unit.

Fig. 2 gives an example to show how the robots reduce theaccumulated demand of a task over time. From Fig. 2, the taskis detected at time 2, its inherent increment rate γ is 1, andits initial demand is 2. Its accumulated demand increases withits inherent increment rate from time 2 to 4, and reaches 4 attime 4. rob1 with the ability of 0.5 arrives at the task at time4 and starts to execute it, i.e., reduce its demand. The demandincrement rate becomes 1-0.5=0.5. At time 7, the accumulateddemand reaches 5.5, and rob2 with the ability of 1.5 arrives

IEEE TRANSACTIONS ON CYBERNETICS 3

Fig. 2. An example of the accumulated demand of an executed task overtime.

at the task to execute it. The demand increment rate becomes0.5-1.5=-1, i.e., the demand starts to decrease by 1 per timeunit. Finally, the demand is decreased to 0 at time 12.5, andthe task is completed by these two robots.

The problem is to design the execution plan of robots tocomplete all the tasks subject to the following constraints.• A task is unseen by the planning system until it is

detected.• The number of incoming paths equals the number of

outgoing paths for each robot at each task.N∑j=0

xkij =

N∑j=0

xkji, ∀i = 1, . . . , N, ∀k = 1, . . . ,M (2)

xkij ∈ 0, 1, i 6= j, ∀i, j = 1, . . . , N, ∀k = 1, . . . ,M (3)

where the binary decision variable xkij takes 1 if there isa path for robk to go from vi to vj , and 0 otherwise.

• Each task is executed by each robot at most once.N∑i=0

xkij ≤ 1, ∀j = 1, . . . , N, ∀k = 1, · · · ,M (4)

• A task cannot be executed before it is detected.

stjk ≥ t0,j , ∀j = 1, . . . , N, ∀k = 1, · · · ,M (5)

where stjk represents the start time of robk at vj .• The demand of each task must decrease to zero to be

completed.

stjk =N∑i=0

(cti+tij)xkij ,∀j = 1, . . . , N, ∀k = 1, · · · ,M (6)

q0,j+γj(ctj−t0,j) =

N∑i=0

M∑k=1

xkijµk(ctj−stjk),∀j = 1, . . . , N

(7)

γj <

M∑k=1

N∑i=0

xkijµk, ∀j = 1, . . . , N (8)

where ctj represents the completion time of vj . (6)implies that a robot immediately goes to its next taskafter it completed its current task. (7) implies that theaccumulated demand at ctj (when the task is completed)equals the total demand fulfilled by the robots executingthe task till ctj . (8) implies that for every task, the totalability of the robots executing the task must be greaterthan the inherent increment rate of the task so that thetask can be completed.

TABLE ITHE NOTATIONS USED IN THE PROBLEM FORMULATION

Notation Type Descriptioni, j index the indexes of tasksk index the index of robotsN value the number of tasksM value the number of robotsqi(t) value the demand of vi accumulated at time tt0,i value the detection time viγi value the inherent increment rate of viq0,i value the initial demandtij value travel time from vi to vjµk value the ability of robkvi indicator the task with index irobk indicator the robot with index kxkij decision variable 1 if robk goes from vi to vj , and 0 otherwisestjk auxiliary variable the start time of robk at vjctj auxiliary variable the completion time of vj

• The start time and completion time of all the robots atthe depot are 0.

st0k = ct0 = 0, ∀k = 1, · · · ,M (9)

Given an instance ξ and its solution Xξ (all the xkij values),the makespan (completion time of the last task) is:

C(Xξ, ξ) = maxi∈1,··· ,N

cti (10)

where cti is calculated from (7)-(9) based on ξ and Xξ.The objective of the DMPDA problem is to minimize the

average performance on a set of instances, which is shown asfollows:

minE(C(Xξ, ξ)|ξ ∈ Ξ), (11)

where Ξ represents the set of instances.To better understand the DMPDA problem, an example of a

solution on an instance with two robots is shown in Fig. 3. Attime t0, there are 4 tasks known to the robots, and the robotsare planned with two execution paths: (Depot → A → C →D → Depot) and (Depot → B → C → D → Depot). In thisexecution plan, the two robots collaborate to complete tasksC and D. At time t1, two new tasks E and F are detected.Then, the two robots re-plan their paths to complete all thetasks: (Depot → A → E → G → F → Depot) and (Depot→ B → C → D → G → F → Depot). At the final time te,the tasks are all completed, and the two robots return to thedepot. It should be noticed that although the return of robotsis shown in the example, the returning time is not consideredin the model since in practice, compared with the time usedto execute the tasks, the returning time is not important.

B. Related work

1) Multi-Point Dynamic Aggregation: MPDA originatesfrom the multi-robot system, and it determines how to allocaterobots to tasks to optimize the performance of the multi-robotsystem [1], [11]–[13]. Due to the complex relationships amongrobots and tasks caused by the time-varying demand and coor-dinated execution, MPDA is a very challenging and interestingproblem. Over the past few years, researchers proposed severalmethods to address the problem [2], [22], [30]. These methods

IEEE TRANSACTIONS ON CYBERNETICS 4

Fig. 3. An example of a solution of a DMPDA problem with two robots andsix tasks.

can be categorised into two kinds: manually designed reactivecoordination strategy and meta-heuristic.

Manually designed reactive coordination strategy: Tenget al. [30] proposed an efficient market-based coordinationstrategy to assign multiple robots to extinguish a bushfirewith multiple fire fronts. Xin et al. [1] proposed a distributedcoordination strategy to overcome the path conflict duringthe MPDA mission. By using the designed coordinationstrategy [1], multiple robots coordinate to execute a tasksimultaneously, and they can avoid the obstacles and conflicts.A recruitment coordination strategy to address the MPDAproblem was proposed in [22]. In the designed strategy, ifthe increment rate of a task is larger than the ability of therobot (meaning the robot cannot complete the task alone),more robots will be recruited to complete the task together.Chen et al. [23] proposed an algorithm that can provide atheoretically optimal task plan for small static MPDA instanceswhen the travel cost is ignored, and proposed a task assignmentapproach to balance between the completion time of all thetasks.

The above manually designed coordination strategies cangenerate a plan quickly. However, they do not consider thedynamic task arrivals during the execution process, and maybecome ineffective in the DMPDA problem. Furthermore, thequality of the generated plans by the above manually RCSsare not satisfactory enough.

Meta-heuristic: Poggenpohl et al. [31] designed the taskplanning approach hybridised with a greedy method anda simulated annealing algorithm. The experimental resultsshowed that the hybrid approach can lead to a better makespanand a better balance among the workloads of the robots.Another hybrid method combining differential evolution andestimation of distribution algorithm was proposed to addressthe MPDA planning problem by [32]. Experiments showedthat this method [32] can obtain a good solution with stableconvergence speed. A multi-model estimation of distributionalgorithm (EDA) was designed in [15]. Compared with agenetic algorithm and a random search method, EDA [15]achieved better performance. Recently, we also proposed anadaptive coordination ant colony optimisation with a indi-vidual learning strategies to address the MPDA problem [3].Experiments showed that the proposed ant colony optimisationalgorithm significantly outperformed the previous algorithmsin terms of both effectiveness and efficiency.

The above meta-heuristic approaches for the MPDA prob-lem can obtain good performance by the evolution process.However, similar to the limitations of the manually designedcoordination strategy, the aforementioned meta-heuristic meth-ods did not consider the dynamic arrivals. It is difficult to applythem to the DMPDA problem to adjust the solution online dueto their high computational cost.

2) Genetic programming hyper-heuristic: The field of au-tomatic heuristic design using hyper-heuristics has attractedmuch attention recently [33]–[35]. The hyper-heuristic ap-proaches search in the heuristic space rather than the solutionspace [36], [37]. The obtained heuristics can then be appliedto unseen problem instances or environments.

GP is a type of evolutionary algorithm popularised by Kozain the 1990s [38]. With a flexible representation, GP is ableto generate a good performance heuristic by some blocks ofdomain knowledge given by the experts. In recent years, GPhas been the dominating approach in the scope of generativehyper-heuristics [27], [39]–[41], which is known as GPHH.Every individual in the GP population represents a candidateheuristic to the problem, which is usually defined as a priorityfunction with the symbolic representation. The evolutionaryprocess to generate these heuristics also follows three steps:evaluation, selection, and genetic operations (including re-production, crossover, and mutation) like genetic algorithms.However, since the symbolic representation is different fromthe regular real value representations, the crossover and muta-tion operations are elaborately designed which can be found in[42]. Whenever a decision is made, the priorities of candidatesare calculated by the priority function, and the candidatewith the highest priority is chosen. For example, in dynamicjob shop scheduling [43]–[45], the evolved heuristic is usedto calculate the priorities of jobs to machines. In uncertainrouting problems [24], [46], [47], it is used to calculate thepriorities of tasks to vehicles. The other works have alsodemonstrated the effectiveness of GPHH [48], [49].

When designing GPHH to evolve RCSs for the DMPDAproblem, there are two main components: 1) an effectiveheuristic template that can generate a feasible solution (robotexecution plan) based on the given DMPDA instance andRCS, and 2) an efficient GP algorithm (e.g. terminal andfunction sets, crossover mechanism, and selection mechanism)to evolve the RCS. In this paper, we develop a new GPHHalgorithm containing these two components.

III. GENETIC PROGRAMMING HYPER-HEURISTIC FOR THEDMDPA PROBLEM

This section describes the newly proposed GPHH algorithmwhich evolves RCSs for DMDPA. In this section, the repre-sentation of a RCS is presented first, followed by the designedheuristic template that applies a RCS to generate an executionplan for a given DMDPA instance. Then, the evolutionaryprocess of the proposed GPHH method is described, includingthe framework, fitness evaluation, and GP operators.

A. RepresentationIn the proposed GPHH method, each RCS (individual) is

represented as a tree-based priority function. Whenever a task

IEEE TRANSACTIONS ON CYBERNETICS 5

Fig. 4. An example of a reactive coordination strategy.

is completed, the robots that executed the completed taskbecome active. The priority function is then used to calculatethe priority of each candidate task for each active robot. Figure4 shows a manually designed RCS as an example, where TCrepresents the travel cost from the current position of the activerobot to the candidate task, TDT represents the detection timeof the candidate task, and CD represents the current demand ofthe candidate task. In this RCS, a task with a larger detectiontime, a smaller travel cost, and a smaller current demand tendsbe executed earlier.

B. Designed Heuristic Template

Given a DMPDA instance and a RCS, the heuristic templategenerates the corresponding execution plan by applying theRCS to the instance. The execution plan is represented as avariable-length sequence of events, i.e.

Plan = [event1, event2, . . . , eventp] (12)

Every event contains four elements, i.e.

evente = (robe, ve, te, Ce). (13)

where robe is the robot involved in the event, ve is the taskof the event, te is the occurrence time of the event, and Cestands for the type of the event. Specifically, Ce = 1 meansthat robe arrives at task ve at time te, while Ce = 0 means thatrobe completes and departs from ve at time te. The heuristictemplate proposed in this paper is shown in Algorithms 1 and2. It contains three stages: initialization stage (lines 1-2 inAlgorithm 1), decision making stage (lines 4-16 in Algorithm1), and execution stage (Algorithm 2).

1) Initialization stage: All the robots are at the depot andactive, and their predicted events are set as (robk, 0, 0, 1). Thepredicted event of a robot implies the task that the robot isgoing to execute. All tasks are not yet completed, and theexecution plan is an empty set. In addition, the simulationtime t is set as 0, and the archive A used to distinguish theurgent tasks is empty.

2) Decision making stage: Here we consider the executionperiod of a task for making better decisions. The (estimation)execution period ep of a task equals to the (estimation)completion time of the task minus the start time of first robotexecuting the task.

epi = cti − mink∈1,...,M

stik,∀k = 1, . . . , N, (14)

At the beginning of the decision making stage, a set ofcandidate tasks Ω are selected from all the uncompleted tasks

Algorithm 1 Designed Heuristic TemplateINPUT: A DMPDA instance ξ, a RCS h(·), parameters ω, φ.OUTPUT: The generated execution plan Plan.1: All the robots are at the depot, predicted events of all robots are

(robk, 0, 0, 1), set the active robots AR←R.2: Set simulation time t ← 0, set the execution plan Plan empty, and set

archive A as an empty queue.3: while not all the tasks are completed do4: Detect new tasks, and get all uncompleted tasks T (t) at time t.

//Decision making stage5: Sort AR in the increasing order of the robot ability.6: for robk ∈ AR do7: Ω ← Filter(robk , T (t), A, ω, φ)8: for taski ∈ Ω do9: Get the status features of robk and taski at time t.

10: Calculate the priority value pi of taski according to h(·).11: end for12: Assign v∗k = arg max

|Ω|i=1 pi to robk .

13: end for14: for robk in AR do15: eventk = (robk, v

∗k, t+ tposk,v∗k

, 1).16: end for

//Execution stage17: Plan,AR,A = Execution(Plan,R, A, φ)18: end while19: return Plan.

Algorithm 2 Execution(Plan,R, A, φ)INPUT: The execution plan Plan, robot set R, archive A, parameter φ.OUTPUT: The execution plan Plan, active robots AR, archive A.1: while True do2: Fine the next event event∗e = arg minrobk∈R te.3: (robk, ve, te, Ce) = event∗e , and t = te.4: if t ==∞ then5: return Error.6: end if7: if Ce == 1 then8: Add event∗e to Plan.9: if ve has been completed then

10: Set the predicted event of robk as [robk, ve, te, 0].11: else12: Obtain all the robots R(t) that are executing ve at time t.13: if

∑robk∈R(t) µk ≤ γe then

14: Set the predicted event of robk as [robk, ve,∞, 0].15: else16: Calculate cte according to Eq. (7).17: for all robk ∈ R(t) do18: eventk = (robk, ve, cte, 0).19: end for20: end if21: end if22: else23: Obtain all the robots R(t) that are executing ve at time t.24: Add all the predicted events of R(t) to Plan.25: Let the set of active robots AR = R(st).26: Calculate execution period epe of ve according to Eq. (14).27: if |A| == φ then28: A pop the first element, and push epe into A.29: else30: Push epe into A31: end if32: return Plan, AR, A.33: end if34: end while

by a filter scheme, which is shown in Algorithm 3. There aretwo situations considered in the filter scheme:

• Urgent tasks are assigned to active robots mandatorily(lines 2-11 in Algorithm 3): Due to the accumulateddemands of tasks in the DMPDA problem, if some taskscannot be completed in time, it will lead to a long

IEEE TRANSACTIONS ON CYBERNETICS 6

Algorithm 3 The filter algorithmINPUT: robk , uncompleted tasks T (t), archive A for recording executionperiods, and parameters ω, φ.OUTPUT: A set of candidate tasks Ω.1: Ω ← ∅.2: if |A| == φ then3: Calculate maximal execution period ep∗ from A.4: Get uncompleted tasks T (t)

e that have been executed from T (t).5: for vi ∈ T

(t)e do

6: Calculate the estimation execution period epi of vi.7: if epi > ω × ep∗ then8: return vi9: end if

10: end for11: end if12: for vj ∈ T (st) do13: if ct(vj) > ct(robk) + tij then14: Ω ← Ω ∪ vj .15: end if16: end for17: return Ω.

makespan. Thus, it is important to design a mecha-nism that allocates more robots to the urgent tasks. Wemaintain an archive A of historical information aboutprevious robot executions to identify urgent tasks. Thearchive is initially empty, and is updated by adding eachrobot execution period ep during its execution process.The identification of the urgent task is started after thesize of the archive reaches φ, when we consider enoughinformation has been gathered for the identification.Specifically, for each uncompleted task in T (t)

e , if itsestimation execution period is larger than ω times themaximal execution period ep∗ in the archive, this task isconsidered as an urgent task, seen Section IV for sensitiveanalysis of parameters. Once an urgent task is detected,the filter scheme returns only the detected urgent taskto force all the robots to execute the urgent task. Theestimation execution period is calculated according toEqs. (7)-(9) and (14).

• If there is no urgent task, the filter scheme removes vjfrom the candidate tasks of robk if vj will be completedby the current robots before robk arrives at vj (lines 12-16 in Algorithm 3).

Then, during the decision making process, every active robotcalculates the priorities of the candidate tasks in the set Ωaccording to the give a RCS h(·), and selects the task withthe highest priority to execute next.

It should be noticed that, in some cases, multiple robots willbe active at the same time. The generated execution plan canbe affected by the order of the decisions made for the activerobots. In our proposed algorithm, we make the decision forthe robot with the smallest ability first.

3) Execution stage: In this stage, the robots will performthe executions according to the decisions made in the previousdecision making stage. The states of the robots including theircurrent positions and predicted events, and the states of thetasks including their current demands, current changing rates,and predicted completion time will be updated (shown inAlgorithm 2).

In the execution stage, the predicted events of all the active

robots are updated first. Then, the designed RCS heuristictemplate runs until one task has been completed. The earliestpredicted event event∗e of all robots will be selected as thenext event, and the simulation time t goes to the time of thenext event. Then, two situations are distinguished.

The first situation is that the earliest event event∗e is anarrival event (i.e. Ce = 1), which represents rob∗k beginning toexecute the task ve. In this situation, the predicted event type ofthe corresponding robot robk is updated as the departure event(Ce = 0) representing that the robot is going to complete anddepart from the corresponding task. To estimate the occurrencetime of the predicted event, there are three rules to follow.

1) rob∗k departs from its current executing task ve immedi-ately when ve has been completed by other robots.

2) If the sum of abilities of robots executing the ve is notgreater than the inherent increment rate of ve, represent-ing that ve cannot be completed, the occurrence time ofthe updated predicted event is ∞.

3) Otherwise, if the sum of abilities of robots executing theve is greater than the inherent increment rate of ve, vecan be completed in a finite time according to the currentexecution plan. The time of the updated predicted eventof all the robots executing ve is updated according to (7).

The second situation is that the earliest event event∗e isa departure event (i.e. Ce = 1), which represents that rob∗khas completed its current task ve and is preparing to visitits next task. In some cases of this situation, multiple robotswill simultaneously complete one task. These robots will beselected as a set of active robots together to participate in thenext round’s decision making stage. Meanwhile, the executionperiod epe of ve is calculated and pushed into the archive Afor recording the execution periods of completed tasks.

To better understand the heuristic template, an example isshown in Fig. 5. First, at time t = 0, all the robots are active,and they calculate priorities of candidate tasks accordingto the given RCS. The highest priority task of each activerobot is added to the execution sequence Π. The executionsequence Π of all robots is [rob1, v1], [rob2, v2], [rob3, v2],and the predicted events of the robots are (rob1, v1, 3, 1),(rob2, v2, 2, 1), and (rob3, v2, 2, 1). With the current executionsequence Π, rob2 and rob3 will execute v2 collaboratively. Theearliest predicted event (rob2, v2, 2, 1) of rob2 is selected asthe next event. The designed RCS heuristic template advancesto the time t = 2, and status of v2 and rob2 are updatedaccording to the departure output part in Algorithms 1 and 2.Then, (rob3, v2, 2, 1) and (rob1, v1, 3, 1) are selected as thenext event in order, and the status of corresponding tasks androbots are updated. When the earliest event is the departureevent of rob2, the designed RCS heuristic template advancesto the time t = 6. At the time t = 6, new tasks v5and v6 are detected, and rob2 and rob3 are selected as theactive robots to make decisions. The whole heuristic templaterepeats the aforementioned process until all the tasks havebeen completed.

C. Evolutionary Process1) Framework: The overall framework of GPHH for the

DMPDA problem is shown in Fig. 6. It has two processes, a

IEEE TRANSACTIONS ON CYBERNETICS 7

Fig. 5. A typical reactive coordination strategy heuristic template.

Start

Randomly initializea population

Populationevaluation based on

training instances

Selection

EvolutionReproduction

MutationCrossover

No

YesStop criteria are met?

Generate sampledtraining instances

Output the bestperformance

individual

Individual evaluationbased on tested

instances

Training

Testing

End

Evaluation

Fitness

Evaluation Fitness

State Initialisation

Decision Making Stage

Execution Stage

Execution processsimulator

Fig. 6. Flowchart of the proposed GPHH.

training process and a testing process. The GPHH algorithmgenerates/trains the RCS using a set of training instances, andthen tests the best RCS on a set of unseen test instances.First, a population containing a set of individuals (each rep-resenting a RCS [50], [51]) is randomly initialized by theramp-half-and-half method [38]. In each generation of thetraining process, the training instances are sampled and theindividuals are evaluated according to the training instancesand the simulator. After evaluating all the individuals, a newpopulation is generated by the selection, crossover, mutation,and reproduction operators. For generating new individuals,one or two individuals are first selected to be the parentsby the tournament selection. Then, the tree-based crossover,mutation or reproduction operator is applied to the parentsto generate the offspring. The evolutionary process continuesuntil the stopping criteria are met, and the best individual ofthe final population is returned as the final RCS h(·)∗.

2) Mini-batch training process: A mini-batch training pro-cess is adopted in this paper to improve the generalization[24], [51]. The training instances Ξtrain are split into 20 mini-batches Ξgentrain. After the splitting process, the mini-batchesinstances Ξgentrain are used in the GPHH algorithm in differentgenerations (e.g. mini-batch 1 in generation 1, 21, 41.).

Algorithm 4 Fitness Evaluation Evaluate(Ξ∗, h(·))INPUT: A set of instances Ξ∗ and a RCS h(·).OUTPUT: The fitness value.1: OV← ∅.2: for ξ ∈ Ξ∗ do3: Planξ ← heuristic template(ξ, h(·))4: Get makespanξ of Planξ , and OV← OV ∪makespanξ .5: end for6: return 1

|OV|∑ov∈OV OV

3) Fitness Evaluation: The pseudo-code of the fitness eval-uation of RCS h(·) on the subset of training instance Ξgentrain

is described in the Algorithm 4. The evaluated RCS will beapplied to each instance of Ξgentrain in Algorithm 1, and theobjective value are recorded in a set of objective values OV .Then the average value of the makespans in OV is calculatedas the fitness of h(·).

4) Genetic Operators: The proposed GPHH uses the sub-tree crossover [38], in which the subtrees of two parents arerandomly sampled and swapped to generate new children forthe next generation. The traditional subtree-based mutationoperator [38] is also employed in the proposed GPHH. Inthe mutation operator, a randomly sampled subtree of themutated individual is replaced by a new randomly generatedsubtree. The reproduction operator selects an individual fromthe population by tournament selection, and directly copies itinto the next generation [38].

5) Testing RCS performance: After a final RCS h(·)∗ isobtained, its performance will be evaluated on a set of unseentest instances Ξtest. The testing fitness of a RCS h(·) is definedas fTE(h(·)) = Evaluate(Ξtest, h(·)) (seen in Algorithm 4).

IV. DESIGN OF EXPERIMENTS

A. Data Sets

Previous studies on the MPDA only focus on the staticscenario in which all the tasks are known in advance. There isno existing benchmark set that contains dynamic scenarios inthe DMPDA problem. To facilitate the study of the DMPDAproblem, we design a new DMPDA benchmark set with awide range of instances. The new benchmark set containsinstances that are different in the number of robots andtasks, the abilities of robots, the inherent increment ratesof tasks, and the detection time of tasks. The location ofthe depot is set as the centre of the workspace. The tasklocations are generated randomly in a 100*100 2D plane.The number of robots is chosen from 10, 15, 20. Thenumber of tasks is chosen from 200, 300, 500. Followingthe conventional settings in [2], [3], the abilities of robotsare sampled from the two normal distributions. The firstdistribution N(0.035, 0.035/2) has a large variance and thesecond distribution N(0.035, 0.035/10) has a small variance.Similarly for the mean inherent increment rates γi of tasks,they are generated uniformly within an interval, which ischosen from [0.01, 0.02], [0.02, 0.05], [0.05, 0.08]. Then, theinherent increment rates of the tasks are sampled according tothe normal distribution (i.e. γi ∼ N(γi, γi/10)) following theconventional settings in [25], [51]. For all the tasks, their initial

IEEE TRANSACTIONS ON CYBERNETICS 8

demands are set to a relatively large value 2.30 to make theproblem complicated enough to evaluate the approaches.

The detection time is also a key characteristic for theDMPDA problem. If the inter-detection time (the gap betweenthe detection times of adjacent tasks) is very small, the abilitiesof robots are not enough to effectively handle all the tasks. Ifthe inter-detection time is too large, the robots will alwaysbe idle so that the scenario is not challenging. To design anappropriate inter-detection time, we consider the travel cost,abilities of robots, the number of robots, and the incrementrate of tasks. In the new benchmark set, there are ten initialtasks with detection time of 0. The inter-detection time of theother tasks follows the exponential distribution, and its rateparameter λ is calculated as

λ = (q0 + e× γ∑Mk=1 µk − γ

+ e)× ρ (15)

where e and γ represent the average value of travel cost amongthe tasks and inherent increment rates of tasks respectively, andρ means that the number of 1/ρ tasks will be detected duringthe average period of all robots completing a new detectedtask together.

Considering these factors, 45 representative scenarios aredesigned in the benchmark set. The designed scenariosare named based on their number of robots, number oftasks, distribution of the abilities of robots, and distributionof the inherent incremental rates of tasks. For example,r10t200 L S represents that there are 10 robots whose abil-ities are sampled from the distribution with a large variance(N(0.035, 0.035/2)) and 300 tasks whose inherent incremen-tal rates are generated from a small uniform distribution([0.01, 0.02]). The details of these scenarios are shown in TableSI of the supplemental document. Then, for each DMPDAscenario, 150 sampled instances are generated by sampling therandom variables independently using different random seeds.After that, the 150 sampled instances are divided into 100training instances and 50 test instances. For each DMPDAscenario, RCSs are trained on the 100 training instances, andthen the final obtained RCS is tested on the 50 test instances.The 100 training instances are further split into 20 mini-batches, each containing 5 instances.

B. Parameter Settings

The GPHH algorithm for learning RCSs is implementedbased on a C++ GP research system named GPC++ [52].Specifically, the terminal set is given in Table II. The parametersettings of the GPHH algorithm used in the rest of thispaper follow the conventional settings [24], [28], [46]. Thepopulation size is set to 1000. 50 generations are given asthe training stopping criterion.The crossover rate is 0.8, themutation rate is 0.15, and the reproduction rate is 0.05. Themaximal depth of a GP tree is 8. The size of the tournamentselection is set to 7. For each scenario, the GPHH algorithmwas run 20 times independently and their results comparedusing the Wilcoxon rank-sum test with a significance levelof 0.05. All experiments are conducted on computers withCore(TM) i7-6700 3.40GHz CPU.

TABLE IITERMINAL AND FUNCTION SETS FOR THE GPHH ALGORITHM.

Symbol Description

TCtravel cost from the current position

to the candidate taskCD current demand of the candidate task

DAM demand of the candidate taskat the arrival moment

CR current changing rate of the candidate taskFRT fraction of the remaining(uncompleted) tasksFUT fraction of the unassigned tasksAB ability of the robotICR inherent increment rate of the tasksNRT number of the robots assigned to the candidate taskTAB total ability of robots assigned to the candidate taskITAB predicted changing rate of the candidate taskTDT detection time of the candidate taskLCT last start time of robots executing the candidate taskFEP first start time of robots executing the candidate task

RCMPthe reduction of the completion time of

the candidate task with(out) a robot executing itFunction Set +, -, ×, %, max

The “%” operator is protected, and returns 1 if divided by zero.

TABLE IIIMEAN AND STANDARD DEVIATION OF THE 20 RUNS OF THE GPHH WITH

DIFFERENT VALUES OF THE PARAMETER φ AND ω.

(φ, ω) r10t300 L S r15t500 L M r20t300 S L(5, 10) 1.06E+4(9.0E+2) * 7.93E+4(3.0E+4)(5, 20) 1.07E+4(1.0E+3) 4.36E+4(1.2E+4) 7.58E+4(2.3E+4)

(5, 100) 1.06E+4(9.0E+2) 4.21E+4(9.3E+3) 3.11E+7(1.4E+8)(5, 1000) 1.05E+4(6.7E+2) 4.22E+4(1.6E+4) 8.12E+4(3.4E+4)(10, 10) 1.05E+4(6.4E+2) 1.23E+5(3.3E+5) 6.27E+4(1.0E+4)(10, 20) 1.07E+4(6.9E+2) 4.71E+4(1.4E+4) 6.89E+4(1.3E+4)(10, 100) 1.10E+4(1.1E+3) 4.90E+4(1.8E+4) 6.85E+4(1.7E+4)

(10, 1000) 1.09E+4(1.2E+3) 4.96E+4(2.0E+4) 2.22E+12(9.7E+12)(20, 10) 1.08E+4(1.4E+3) 4.34E+4(1.1E+4) 1.61E+12(7.0E+12)(20, 20) 1.05E+4(7.2E+2) 4.38E+4(1.3E+4) 6.54E+4(1.0E+4)(20, 100) 1.07E+4(7.4E+2) 4.22E+4(1.3E+4) 7.55E+4(2.4E+4)

(20, 1000) 1.08E+4(1.2E+3) 4.66E+4(1.7E+4) 8.13E+4(3.3E+4)(100, 10) 1.03E+4(6.2E+2) 4.73E+4(1.7E+4) 7.64E+4(2.2E+4)(100, 20) 1.11E+4(1.3E+3) 4.38E+4(1.5E+4) 8.23E+4(2.7E+4)

(100, 100) 1.04E+4(6.7E+2) 4.24E+4(9.8E+3) *(100, 1000) 1.03E+4(5.4E+2) 4.03E+4(9.0E+3) *

* represents that the obtained RCS by the GPHH method cannot generatefeasible solutions in all of the tests for all 20 runs.

The size φ of the archive for recording the historicalexecution periods and the urgent threshold ω in the heuristictemplate are two new parameters for the proposed GPHHalgorithms. Theoretically, if they are set to very small val-ues, the algorithm will focus on gathering robots togetherto execute each task one by one. If their values are large,the algorithm has a poor ability to deal with urgent tasks.Hence, to investigate the influences of these parameters onthe algorithm performance, the following experiment about theparameter sensitivity analysis is carried out. Here, three sce-narios with different scales and characteristics, r10t300 L S,r15t500 L M, and r20t300 S L are used to find how theseparameters affect the performance. Four candidate values5, 10, 20, 100 are set for φ, and four candidate values10, 20, 100, 1000 are set for ω. The other settings are keptunchanged. Each parameter setting on each instance is run20 times independently. The mean and standard deviation ofthe objective values with the 20 independent runs are shown

IEEE TRANSACTIONS ON CYBERNETICS 9

in Table III. The calculation method of the objective value isbased on Eq. (11).

From the table, it can be found that:• The performance of GPHH tends to be worse with the

increase of φ. In the designed urgent task handlingmechanism, the opportunity of triggering the handlingmechanism decreases with the increase of φ. The reasonis that a large size of the archive φ leads to a late timefor triggering the handling mechanism and creates a largethreshold, which is used to distinguish the urgent tasksand normal tasks. Thus, a large φ is not suitable for theproposed GPHH.

• The performance of GPHH improves with the increaseof ω from 10 to 20. However, when ω increases to 100and 1000, the performance becomes worse. When ω isa very small value, the threshold used for distinguishingthe urgent and normal tasks is so small that majority oftasks are classified into urgent tasks. Robots are forcedto gather together to execute the majority of tasks. Whenω is very large, the threshold is so large that urgent tasksare difficult to identify. The accumulated demands of theurgent tasks lead to the poor performance of the wholemulti-robot system.

Besides, when φ and ω are too small or too large, someRCSs evolved by the GPHH method cannot generate feasiblesolutions for all tested instances in the 20 runs. Hence, it wasfound that when the increment rates of tasks are small, theperformance of GPHH is not sensitive to the parameter of φand ω. In summary, φ = 20 and ω = 20, which maintain agood performance for different scenarios, are adopted.

C. Comparison Design

Due to the dynamic process of discovering tasks in theDMPDA problem, existing meta-heuristic methods cannot beapplied directly. Thus, we take the following seven manuallydesigned RCSs as benchmark comparisons in this paper.First, four RCSs are manually designed to cater to the maincharacteristics of the DMPDA problem including travel cost,inherent increment rate, demand, and robot abilities.• NNT: select the task with the nearest distance.• MaxR: select the task with the maximum inherent incre-

ment rate.• MinD: select the task with the minimum current demand.• aveAbi: select the task with the maximal ICR-TAB.

Then, three state-of-the-art manually designed RCSs for thestatic MPDA problem are utilized: recruitment [22], DynamicDeployment (DD) [23], and auction [23] are transferred to theDMPDA problem for benchmark comparisons.

V. RESULTS AND DISCUSSIONS

A. Test Performance of the evolved RCS

The computational results of the seven methods are shownin Table IV. For every compared method, (+)/(≈)/(-) representsthat it is significantly better than, statistically comparablewith, and significantly worse than the proposed GPHH methodbased on the Wilcoxon rank-sum test with a significance level

of 0.05 and Bonferroni correction. Meanwhile, the shadow gridin GPHH indicates that GPHH significantly outperformed allthe other compared algorithms. An additional row at the bot-tom of Table IV represents the number of instances on whichthe corresponding compared algorithm achieves significantlybetter, statistically comparable, and significantly worse resultsthan the proposed GPHH method.

First, from Table IV, we can conclude that the proposedGPHH method apparently outperformed all the comparedalgorithms in all the test scenarios. There are various factorsrelated to both robots and tasks in the complex DMPDAproblem that affect the performance of the RCS. Hence,it is challenging to manually design effective RCS for agiven complex DMPDA problem scenario. In contrast, sinceGPHH has a good ability to evolve sophisticated and flexiblerules, the proposed GPHH algorithm can effectively learn theuseful knowledge of the DMPDA problem that is hard to beidentified by human from the historical data. Thus, GPHHcan automatically design RCSs that are suitable for differentscenarios with different characteristics.

Second, the statistical results are analysed in terms of thedifferent scenarios. When the number of tasks N and robotsM are the same, the performance of GPHH in the scenariosin which the abilities of the robots are sampled from the smallvariance distribution always obtained smaller objective valuesthan the situations with large variance distribution. The perfor-mance gap between GPHH and the other manually designedalgorithms in the scenarios with small inherent increment ratesis smaller than the performance gap on the scenarios withlarge inherent increment rates. The reason is that the taskswith small increase rates usually will not reach very large anduncontrolled demands. However, the tasks with large inherentincrement rates easily reach very large demands if they arenot executed in time. Several tasks with very large demandslead to a very large makespan. Thus, the impact of executionplans in scenarios with small increase rates is smaller than theimpact of execution plans in scenarios with medium and largeincrease rates.

Thirdly, the statistical results are analysed in terms of thedifferent algorithms. For the auction and DD algorithms [23],they obtain good performances on the scenarios with smallincrement rates (e.g. r10t200 L S). However, they cannotgenerate feasible solutions in the tested process for all 20 runson scenarios with large increment rates (e.g. r10t200 S L).The reason is that the auction and DD algorithms try to ensureeach task can be completed sequentially. However, in somescenarios, the total task increment rate is much greater thantotal robot ability so that a task cannot be completed at onedecision making stage. In contrast, the recruitment algorithm[22] ensures that at least one task can be completed at eachdecision making stage. Thus, the recruitment algorithm canalways generate a feasible solution. From the table, it can alsobe found that the manually designed RCSs that gather robotstogether to execute tasks are suitable for the scenarios withlarge increment rates. On the contrary, the manually designedRCSs that scatter robots are suitable for the scenarios withsmall increment rates.

Overall, it can be concluded that GPHH is an effective

IEEE TRANSACTIONS ON CYBERNETICS 10

TABLE IVCOMPUTATIONAL RESULTS OBTAINED BY DIFFERENT ALGORITHMS FOR THE BENCHMARK SCENARIOS.

No Recruitment Auction DD MaxR NNT aveAbi MinD GPHHr10t200 L S 7.61E+5(-) 1.19E+5(-) 7.07E+3(-) 6.74E+5(-) 2.80E+5(-) 3.49E+4(-) 1.19E+6(-) 5.72E+3(3.6E+2)r10t200 S S 6.76E+5(-) 6.65E+4(-) 5.57E+3(-) 6.05E+5(-) 2.41E+5(-) 1.72E+4(-) 1.06E+6(-) 5.17E+3(2.1E+2)r10t200 L M 5.66E+8(-) 9.62E+28(-) 5.62E+15(-) 5.47E+8(-) 1.97E+10(-) 3.98E+30(-) 6.41E+7(-) 7.53E+4(1.6E+4)r10t200 S M 3.36E+7(-) * * 4.19E+8(-) 1.13E+10(-) * 5.34E+7(-) 6.35E+4(2.6E+4)r10t200 L L 2.32E+27(-) * * 5.23E+15(-) 6.83E+18(-) * 1.60E+10(-) 8.00E+7(2.4E+7)r10t200 S L * * * 9.35E+14(-) 3.76E+18(-) * 1.16E+10(-) 5.97E+7(1.7E+7)r10t300 L S 1.59E+7(-) 5.60E+7(-) 6.93E+7(-) 3.69E+7(-) 1.57E+7(-) 2.12E+6(-) 2.64E+7(-) 1.05E+4(7.2E+2)r10t300 S S 1.31E+7(-) 3.29E+7(-) 6.13E+7(-) 3.17E+7(-) 1.29E+7(-) 7.39E+5(-) 2.36E+7(-) 9.11E+3(8.1E+2)r10t300 L M 5.08E+11(-) 5.37E+44(-) 9.40E+21(-) 1.42E+12(-) 6.85E+14(-) 4.81E+48(-) 5.54E+9(-) 3.26E+5(1.0E+5)r10t300 S M 2.66E+9(-) * * 5.65E+11(-) 3.10E+14(-) * 3.84E+9(-) 2.60E+5(9.8E+4)r10t300 L L 2.40E+66(-) * * 9.05E+24(-) 2.16E+28(-) * 1.60E+14(-) 5.55E+10(1.8E+10)r10t300 S L * * * 3.88E+24(-) 4.12E+27(-) * 2.36E+13(-) 3.19E+10(7.9E+10)r10t500 L S 3.03E+9(-) 1.27E+12(-) 2.14E+11(-) 7.81E+9(-) 5.18E+10(-) 2.86E+8(-) 6.15E+9(-) 2.00E+4(1.9E+3)r10t500 S S 2.32E+9(-) 6.19E+11(-) 1.53E+10(-) 5.78E+9(-) 3.41E+10(-) 3.89E+7(-) 4.91E+9(-) 1.72E+4(2.5E+3)r10t500 L M 4.37E+17(-) 3.73E+79(-) 6.12E+36(-) 7.24E+18(-) 5.40E+23(-) 1.40E+81(-) 4.61E+13(-) 6.80E+6(5.7E+6)r10t500 S M 1.80E+13(-) * * 9.63E+17(-) 1.63E+23(-) * 4.17E+13(-) 3.04E+6(1.7E+6)r10t500 L L * * * 2.41E+40(-) 6.08E+45(-) * 1.52E+20(-) 1.31E+15(9.1E+14)r10t500 S L * * * 6.12E+38(-) 6.10E+44(-) * 5.12E+19(-) 3.90E+14(2.3E+14)r15t200 L S 2.52E+5(-) 5.22E+3(-) 3.77E+3(-) 2.05E+5(-) 3.61E+4(-) 7.34E+3(-) 3.51E+5(-) 3.10E+3(6.3E+1)r15t200 S S 2.37E+5(-) 4.46E+3(-) 3.25E+3(-) 1.97E+5(-) 3.48E+4(-) 5.68E+3(-) 3.35E+5(-) 3.00E+3(3.7E+1)r15t200 L M 1.21E+7(-) 5.88E+12(-) 2.32E+9(-) 1.13E+7(-) 1.57E+7(-) 5.65E+14(-) 7.39E+6(-) 8.13E+3(1.0E+3)r15t200 S M 4.12E+6(-) * 1.40E+39(-) 1.05E+7(-) 1.45E+7(-) * 6.66E+6(-) 7.73E+3(6.4E+2)r15t200 L L 1.10E+15(-) * * 1.48E+10(-) 1.79E+12(-) * 3.44E+8(-) 2.90E+5(2.2E+5)r15t200 S L * * * 7.89E+9(-) 1.55E+12(-) * 3.04E+8(-) 2.31E+5(9.4E+4)r15t300 L S 2.75E+6(-) 9.37E+4(-) 2.29E+5(-) 3.08E+6(-) 4.27E+5(-) 5.02E+4(-) 3.98E+6(-) 4.96E+3(3.0E+2)r15t300 S S 2.47E+6(-) 9.72E+4(-) 5.14E+5(-) 2.84E+6(-) 3.97E+5(-) 3.52E+4(-) 3.77E+6(-) 4.60E+3(1.1E+2)r15t300 L M 1.24E+9(-) 1.33E+23(-) 2.70E+15(-) 1.66E+9(-) 8.97E+9(-) 9.91E+24(-) 2.92E+8(-) 1.66E+4(4.0E+3)r15t300 S M 1.50E+8(-) * 5.94E+19(-) 1.04E+9(-) 6.45E+9(-) * 2.26E+8(-) 1.31E+4(2.3E+3)r15t300 L L 4.42E+24(-) * * 8.69E+15(-) 1.38E+18(-) * 2.18E+11(-) 2.97E+6(5.6E+5)r15t300 S L * * * 2.46E+15(-) 1.00E+18(-) * 9.62E+10(-) 2.41E+6(1.4E+6)r15t500 L S 1.93E+8(-) 2.37E+7(-) 5.05E+6(-) 2.21E+8(-) 6.06E+7(-) 5.27E+5(-) 3.46E+8(-) 3.00E+4(1.0E+5)r15t500 S S 1.58E+8(-) 3.18E+7(-) 4.84E+6(-) 1.93E+8(-) 5.36E+7(-) 2.76E+5(-) 3.01E+8(-) 8.47E+3(1.3E+3)r15t500 L M 1.07E+13(-) 2.82E+36(-) 1.63E+24(-) 2.14E+13(-) 4.05E+15(-) 3.57E+42(-) 4.73E+11(-) 4.38E+4(1.3E+4)r15t500 S M 2.59E+11(-) * * 1.24E+13(-) 2.30E+15(-) * 4.43E+11(-) 3.48E+4(1.0E+4)r15t500 L L 4.97E+38(-) * * 3.55E+23(-) 1.98E+29(-) * 6.58E+15(-) 1.15E+10(5.3E+10)r15t500 S L * * * 8.29E+22(-) 5.41E+28(-) * 5.82E+15(-) 1.33E+8(8.4E+7)r20t200 S S 1.30E+5(-) 4.45E+3(-) 4.02E+3(-) 1.08E+5(-) 1.71E+4(-) 5.84E+3(-) 1.69E+5(-) 2.15E+3(5.2E+1)r20t200 S M 3.32E+6(-) * 2.09E+23(-) 4.12E+6(-) 1.05E+6(-) * 4.58E+6(-) 5.54E+3(3.5E+2)r20t200 S L * * * 1.36E+8(-) 2.11E+9(-) * 3.89E+7(-) 2.15E+4(6.1E+3)r20t300 S S 9.65E+5(-) 3.55E+4(-) 9.88E+4(-) 8.70E+5(-) 9.61E+4(-) 2.85E+4(-) 1.30E+6(-) 3.59E+3(1.0E+2)r20t300 S M 1.20E+8(-) * * 2.33E+8(-) 8.83E+7(-) * 1.82E+8(-) 1.04E+4(1.2E+3)r20t300 S L * * * 9.82E+11(-) 2.48E+13(-) * 5.20E+9(-) 6.54E+4(1.0E+4)r20t500 S S 3.40E+7(-) 3.45E+6(-) 4.63E+6(-) 3.17E+7(-) 3.61E+6(-) 2.64E+5(-) 5.52E+7(-) 7.06E+3(2.9E+2)r20t500 S M 1.99E+11(-) * * 6.80E+11(-) 1.38E+12(-) * 3.30E+11(-) 2.70E+4(5.5E+3)r20t500 S L * * * 1.14E+17(-) 3.02E+21(-) * 5.79E+13(-) 4.70E+5(1.9E+5)

(+)/(≈)/(-) 0/0/45 0/0/45 0/0/45 0/0/45 0/0/45 0/0/45 0/0/45 –

algorithm to address the DMPDA problem.

B. Training Curve

Fig. 7 shows the convergence curves of the objective valuesof GPHH in the training process on four representative sce-narios. The x-axis represents the generation. Due to the hugegap of different objective values among different solutions, they-axis stands for the natural logarithm of the average objectivevalue of the best solution fitness over 20 runs in each gener-ation. From Fig. 7, it can be observed that objective valuesobtained by the proposed GPHH method gradually decreaseduring the training process. In addition, the convergence curvesin the figure fluctuates due to the rotation of the mini-batchesinstances in different generations. However, if we compare theperformance of two generations using the same mini-batches

instance (e.g. generations 10 and 30), the performance in thelatter generation is improved.

C. Training Time

Fig. 8 shows the curves of the training time of the proposedGPHH method across different numbers of tasks accordingto the results of 18 different representative scenarios withdifferent characteristics. The x-axis represents the number oftasks and the y-axis stands for the average training time of 20runs for each scenario measured by seconds. From Fig. 8, thefollowing observations can be obtained.• Comparing Fig. 8 (a) and Fig. 8 (b), we can find that the

training time of the proposed GPHH is not sensitive tothe number of robots and the variance of robot abilities.

• The training time of the proposed GPHH increases as theinherent increment rates of tasks increase. The reason is

IEEE TRANSACTIONS ON CYBERNETICS 11

(a) (b)

(c) (d)

Fig. 7. Evolutionary progress of GPHH methods for the selected scenarios.(a) r10t200 S M (b) r15t300 L S (c) r15t500 S L (d) r20t500 S M

(a) (b)

Fig. 8. Training time across different numbers of tasks. (a) Number of robotsM = 10 and robots abilities with small variance, (b) Number of robotsM = 15 and robots abilities with large variance.

that the collaboration behaviours in the scenarios withlarge inherent increment rates are more intense thanthe scenarios with small inherent increment rates. Thus,there are more events in the entire execution plan in thescenarios with large inherent increment rates, leading toa larger training time.

• The training time of the proposed GPHH increases bythe square of the number of tasks. The reason is that thegreater the number of tasks leads to the greater timesof calculating priorities in the RCS execution simulationprocess.

VI. FURTHER ANALYSIS

The comparison results show that the RCSs generated by theproposed GPHH algorithm can achieve good performances. Inthis section, the RCSs generated by GPHH will be further in-vestigated to determine how they affect the DMPDA executionprocess.

A. Effectiveness of the designed filter

To verify the effectiveness of the designed cluster filter,we compare it with the GP without filter (GPHH-WF) andwith a baseline filter in which only the priorities of the topten tasks with minimal traveling cost will be calculated by agiven GP tree [45]. We use the r10t200 S M, r15t300 L S,

(a) (b)

(c) (d)

Fig. 9. Box plot of GPHH with(out) the designed filter on four scenarios. (a)r10t200 S M (b) r15t300 L S (c) r15t500 S L (d) r20t500 S M

r15t500 S L, and r20t500 S M with different robot abilitiesand scales as the test scenarios. Fig. 9 shows the resultsobtained by GPHH, GPHH-WF, and GPHH-DF. From Fig. 9,we can see that GPHH, GPHH-WF, and GPHH-DF obtainedobviously different distributions of the makespan. Especially,on the three scenarios (r10t200 S M, r15t500 S L, andr20t500 S M) where the inherent increment rates of the tasksare medium or large, the final results obtained by GPHH-WF and GPHH-DF have apparent outliers. GPHH has clearsuperiority against GPHH-WF and GPHH-DF in terms of bothconsistency and effectiveness.

Some tasks with medium or large inherent increment ratesare easy to grow to become urgent tasks. If there are severalurgent tasks, the time to complete these tasks in turn willincrease exponentially. In the designed filter, these urgent taskscan be identified during the designed RCS heuristic template,and robots are allocated to execute these tasks when theirdemands are not very large. Therefore, the designed filter isvery useful.

B. Effectiveness of the decision order

Decision order is a key issue in the multi-robot systemsince different decision orders can cause different decisionqualities. Thus, computational experiments are performed tovalidate the influences of decision orders on the proposedGPHH algorithm. In the experiments, a GPHH algorithmin which robots with large abilities make decisions first, aGPHH algorithm in which robots make decisions in a randomorder, and a GPHH algorithm in which robots make decisionsaccording to a pre-determined order are designed as thecontrol group, denoted as GPHH-LF, GPHH-R, and GPHH-Prespectively. The computational results of the GPHH methodswith different decision orders are shown in Table V, and(+)/(≈)/(-) represents the method is better/equal/worse thanGPHH. An additional row at the bottom of this table representsthe number of instances on which the compared algorithm hasachieved a better result than the GPHH methods under theWilcoxon rank-sum test with Bonferroni correction.

IEEE TRANSACTIONS ON CYBERNETICS 12

First, the Friedman test is carried out to detect differencesbetween the GPHH and compared algorithms across sevenscenarios. The p-value of the Friedman test show that there arenot significant differences on r10t300 S S, r15t200 S M, andr20t500 S S. From Table V, we can also see that the resultsare insensitive to the decision order on the three scenarios.The reason may be that the automated designed RCSs alwaysgather robots together to execute tasks one by one. Comparedwith GPHH-LF, GPHH obtains better performance except forr15t200 S M.

Compared with GPHH-R and GPHH-P, GPHH is signifi-cantly better on r10t500 L M and r15t300 L L in which theabilities of robots have large variance. We can also find that thedecision order in scenarios with large robot ability variance ismore sensitive than the decision order in scenarios with smallrobot ability variance. In summary, the designed decision orderin which robots with small abilities make decisions first ismore effective than the other strategies.

C. Feature Analysis

Fig. 10 shows the number of times of the features appearingin the best evolved RCSs in the 20 runs in four differentscenarios, respectively. The x-axis represents the featuresand the y-axis indicates the number of appearances of thecorresponding features in the final best performance RCSs.From Fig. 10, the following observations can be obtained.• TC and DAM are the most frequently used features

among the 14 features, representing that the travellingcost and the predicted demand at the arrival moment areimportant for the DMPDA problem.

• TDT and FEP are the most infrequently used featuresamong the 14 features, representing that the detectiontime of the candidate task and the first start time of robotsexecuting the candidate task may not be the key featuresfor the DMPDA problem.

• RCMP is more frequently used for the scenarios witha small variance of robot abilities and does not make asignificant difference in the scenarios with small inherentincrement rates.

• NRT is more frequently used in scenarios with a largenumber of robots, indicating that the number of the robotsassigned to the candidate task is important for the taskplanning of a large number of robots.

• ITAB is frequently used for the scenarios with smallinherent increment rates. This indicates that balancing theabilities of robots is important for the scenarios with smallinherent increment rates.

D. RCS semantic analysis

To gain a deeper understanding about the GP-evolved RCSs,we take the best RCS achieved by the proposed GPHHalgorithm on r15t500 S L as an example since r15t500 S Lis the most difficult to be optimised than other three examplescenarios. After the semantic simplification, e.g. a − a =0, a/a = 1, this RCS is shown in (16).

h(·) = TAB− 2DAM + 2RCMP + CD(TC− 1)− TC (16)

(a) (b)

(c) (d)

Fig. 10. Appearing times of the features for the selected scenarios. (a)r10t200 S M (b) r15t300 L S (c) r15t500 S L (d) r20t500 S M

The example RCS is quite small and easy to understand afterthe simplification. From (16), the following observations canbe obtained.• The RCS prefers the selection of a candidate task if a

large total ability of robots have been assigned to it (i.e.TAB), and the selection can lead to a large reductionof the completion time (i.e. RCMP). It suggests that therobots should complete some tasks as soon as possible,so that the future demand increment can be reduced.

• Since the travel cost (i.e. TC) is always larger than one,the candidate task with a large current demand (i.e. CD)is preferred to be executed first. The reason is that a taskwith a large CD can be regarded as a urgent task, whichneeds to be executed as soon as possible.

• To reduce the travel cost of the robots, the candidatetask with a small travel cost (i.e. TC) is preferred to beexecuted first.

• TAB and TC are relatively small compared with RCMP,DAM, and CD as the execution process goes. Thus,DAM, RCMP, and CD play a major role in the decisionmaking stage.

VII. CONCLUSIONS

In summary, this is the first paper to investigate the DMPDAproblem. Considering the three characteristics of the problem,i.e. time-varying demands, dynamic arrival/detection time,and collaborative behaviour among robots, a new model isproposed in this paper. To address the DMPDA problem, anew GPHH method is developed in this paper. The new GPHHalgorithm contains a new problem-specific heuristic templatefor fitness evaluation. Also, a new terminal set is designedconsidering the information that can be obtained during themission execution. The heuristic template also features a novelfilter scheme to handle the urgent tasks. The experimentalresults showed that the proposed GPHH method significantlyoutperforms the state-of-the-art manually designed rules insolving the DMPDA problem. Also, the proposed GPHHmethod is very flexible, so it can be easily transferred to other

IEEE TRANSACTIONS ON CYBERNETICS 13

TABLE VCOMPUTATIONAL RESULTS OBTAINED BY GPHH WITH DIFFERENT DECISION ORDERS OF THE BENCHMARK SCENARIOS.

Instance GPHH-LF GPHH-R GPHH-P GPHHMean Std Mean Std Mean Std Mean Std

r10t200 L S 5.88E+3(-) 3.1E+2 5.81E+3(≈) 3.6E+2 5.70E+3(≈) 2.6E+2 5.72E+3 3.6E+2r10t300 S S 9.38E+3(-) 6.0E+2 9.03E+3(≈) 4.7E+2 9.18E+3(≈) 8.4E+2 9.11E+3 8.1E+2r10t500 L M 1.14E+8(-) 5.0E+8 9.37E+6(-) 6.3E+6 1.10E+7(-) 6.8E+6 6.80E+6 5.7E+6r15t200 S M 8.05E+3(≈) 1.1E+3 7.65E+3(≈) 4.8E+2 7.85E+3(≈) 1.1E+3 7.73E+3 6.4E+2r15t300 L L 7.06E+6(-) 4.6E+6 4.61E+6(-) 2.3E+6 6.23E+6(-) 7.3E+6 2.97E+6 5.6E+5r15t500 S L 2.17E+8(-) 1.8E+8 1.28E+8(≈) 4.8E+7 1.54E+8(≈) 1.5E+8 1.33E+8 8.4E+7r20t500 S S 7.43E+3(-) 6.5E+2 7.20E+3(≈) 6.2E+2 7.32E+3(-) 3.3E+2 7.06E+3 2.9E+2(+)/(≈)/(-) 0/1/6 0/5/2 0/4/3 –

real-world applications (e.g. search and rescue). The furtheranalysis and discussions showed when and how to distributeand group robots to execute different types of tasks efficientlyin the DMPDA problem.

For future studies, there are still some problems that need tobe further investigated. This paper has shown that the GPHH-evolved RCS can provide favourable decisions during thedecision making stage. However, the decision making stagein this paper is based on the full-connected and no-delaynetwork environment, which is hard to implement in practice.It would be very useful to investigate the performances of thedecision making in different network environments, such ascommunication delay networks. Besides, the DMPDA problemis a naturally multi-objective problem which needs to balancethe number of robots and the efficiency of executing allthe tasks. It would be important to design a multi-objectivemethod to complete the tasks as soon as possible balancedwith a minimised number of robots. Finally, even though theproposed DMPDA model has considered the dynamic taskarrivals, the uncertainty characteristics (i.e. travel cost) whichexists in real applications are not considered in this paper. Itwould be worthwhile paying attention to extend the DMPDAmodel with these uncertainty characteristics.

REFERENCES

[1] B. Xin, Y.-G. Zhu, Y.-L. Ding, and G.-Q. Gao, “Coordinated motionplanning of multiple robots in multi-point dynamic aggregation task,”in Proceedings of the 12th IEEE International Conference on Controland Automation. IEEE, 2016, pp. 933–938.

[2] G. Guanqiang, Y. Mei, X. Bin, J. Ya-Hui, and B. Will, “A memeticalgorithm for the task allocation problem on multi-robot multi-pointdynamic aggregation missions,” in Proceedings of the 2020 IEEE WorldCongress on Computational Intelligence (WCCI). IEEE, 2020, pp. 1–8.

[3] G. Gao, Y. Mei, Y. hui Jia, B. Will, and X. Bin, “Adaptive coordinationant colony optimisation for multi-point dynamic aggregation,” IEEETrans. Cybern., 2020. Doi: 10.1109/TCYB.2020.3042511.

[4] S. Jazebi, F. De Leon, and A. Nelson, “Review of wildfire managementtechniques part i causes, prevention, detection, suppression, and dataanalytics,” IEEE Trans. Power Deliv., vol. 35, no. 1, pp. 430–439, 2019.

[5] California, “Cal fire,” Website, 2020, https://www.fire.ca.gov/.[6] Y. Rizk, M. Awad, and E. W. Tunstel, “Cooperative heterogeneous multi-

robot systems: a survey,” ACM Comput. Surv., vol. 52, no. 2, pp. 1–31,2019.

[7] K. Harikumar, J. Senthilnath, and S. Sundaram, “Multi-uav oxyrrhismarina-inspired search and dynamic formation control for forest fire-fighting,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 2, pp. 863–873,2018.

[8] P. Wu, F. Chu, A. Che, and M. Zhou, “Bi-objective scheduling of fireengines for fighting forest fires: New optimization approaches,” IEEEtrans. Intell. Transp. Syst., vol. 19, no. 4, pp. 1140–1151, 2017.

[9] G. Tian, Y. Ren, and M. Zhou, “Dual-objective scheduling of rescuevehicles to distinguish forest fires via differential evolution and particleswarm optimization combined algorithm,” IEEE trans. Intell. Transp.Syst., vol. 17, no. 11, pp. 3009–3021, 2016.

[10] Y. Ding, B. Xin, and J. Chen, “A review of recent advances incoordination between unmanned aerial and ground vehicles,” UnmannedSyst., 2020 DOI: 10.1142/S2301385021500084.

[11] Y.-L. Liao and K.-L. Su, “Multi-robot-based intelligent security system,”Artif. Life Robot, vol. 16, no. 2, p. 137, 2011.

[12] C. Robin and S. Lacroix, “Multi-robot target detection and tracking:taxonomy and survey,” Auton. Robots, vol. 40, no. 4, pp. 729–760, 2016.

[13] G. GuanQiang and X. Bin, “A-stc: auction-based spanning tree coveragealgorithm formotion planning of cooperative robots,” Front. Inf. Technol.Electron. Eng., vol. 20, no. 1, pp. 18–31, 2019.

[14] M. U. Arif, “A generic evolutionary algorithm for efficient multi-robot task allocations,” in Proceedings of the Canadian Conference onArtificial Intelligence. Springer, 2019, pp. 486–491.

[15] B. Xin, S. Liu, Z. Peng, and G. Gao, “An estimation of distributionalgorithm for multi-robot multi-point dynamic aggregation problem,” inProceedings of the 2018 IEEE International Conference on Systems,Man, and Cybernetics. IEEE, 2018, pp. 775–780.

[16] B. L. Golden, S. Raghavan, and E. A. Wasil, The vehicle routingproblem: latest advances and new challenges. Springer Science &Business Media, 2008, vol. 43.

[17] P. Toth and D. Vigo, Vehicle routing: problems, methods, and applica-tions. SIAM, 2014.

[18] M. Otte, M. J. Kuhlman, and D. Sofge, “Auctions for multi-robottask allocation in communication limited environments,” Auton. Robots,vol. 44, no. 3, pp. 547–584, 2020.

[19] C. Yu, Y. Dong, Y. Li, and Y. Chen, “Distributed multi-agent deepreinforcement learning for cooperative multi-robot pursuit,” J. Eng., vol.2020, no. 13, pp. 499–504, 2020.

[20] Z. Yan, N. Jouandeau, and A. A. Cherif, “A survey and analysis ofmulti-robot coordination,” Int. J. Adv. Robot. Syst., vol. 10, no. 12, p.399, 2013.

[21] J. Chen, Y. Ding, B. Xin, Q. Yang, and H. Fang, “A unify-ing framework for human-agent collaborative systems–part one: El-ement and relation analysis,” IEEE Trans. Cybern., 2020. DOI:10.1109/TCYB.2020.2977602.

[22] X. Du, J. Zhang, B. Xin, Y. Ding, Z. Peng, and L. Dou, “Market-based task assignment for multi-point dynamic aggregation tasks,” inProceedings of the International Conference on Computers and Indus-trial Engineering, CIE, Auckland, New zealand, 2018.

[23] J. Chen, Y. Guo, Z. Qiu, and B. Xin, “Multiagent dynamic taskassignment based on forest fire point model,” IEEE Trans. Autom. Sci.Eng., 2020, Doi: 10.1109/TASE.2021.3061757.

[24] Y. Liu, Y. Mei, M. Zhang, and Z. Zhang, “A predictive-reactive approachwith genetic programming and cooperative coevolution for the uncertaincapacitated arc routing problem,” Evol. Comput., vol. 28, no. 2, pp. 289–316, 2020.

[25] B. Xu, Y. Mei, Y. Wang, Z. Ji, and M. Zhang, “Genetic programmingwith delayed routing for multi-objective dynamic flexible job shopscheduling,” Evol. Comput., vol. 29, no. 1, pp. 75–105, 2021.

[26] F. T. Hanshar and B. M. Ombuki-Berman, “Dynamic vehicle routingusing genetic algorithms,” Appl. Intell., vol. 27, no. 1, pp. 89–99, 2007.

[27] J. Branke, S. Nguyen, C. W. Pickardt, and M. Zhang, “Automateddesign of production scheduling heuristics: A review,” IEEE Trans. Evol.Comput., vol. 20, no. 1, pp. 110–124, 2015.

[28] S. Nguyen, M. Zhang, M. Johnston, and K. C. Tan, “A computationalstudy of representations in genetic programming to evolve dispatching

IEEE TRANSACTIONS ON CYBERNETICS 14

rules for the job shop scheduling problem,” IEEE Trans. Evol. Comput.,vol. 17, no. 5, pp. 621–639, 2012.

[29] J. Lin, L. Zhu, and K. Gao, “A genetic programming hyper-heuristicapproach for the multi-skill resource constrained project schedulingproblem,” Expert Syst. Appl., vol. 140, p. 112915, 2020.

[30] K. Teng and J. Katupitiya, “Market-based task assignment strategiesfor multi-agent systems deployed for bushfire fighting,” in Proceedingsof the 11th IEEE International Conference on Control & Automation(ICCA). IEEE, 2014, pp. 25–31.

[31] F.-G. Poggenpohl and D. Guttinger, “Optimizing task allocation onfire fighting,” in 2012 Fourth International Conference on IntelligentNetworking and Collaborative Systems. IEEE, 2012, pp. 497–502.

[32] R. Hao, J. Zhang, B. Xin, C. Chen, and L. Dou, “A hybrid differentialevolution and estimation of distribution algorithm for the multi-pointdynamic aggregation problem,” in Proceedings of the Genetic andEvolutionary Computation Conference Companion. ACM, 2018, pp.251–252.

[33] J. H. Drake, A. Kheiri, E. Ozcan, and E. K. Burke, “Recent advancesin selection hyper-heuristics,” Eur. J. Oper. Res., vol. 285, no. 2, pp.405–428, 2020.

[34] X. Hao, R. Qu, and J. Liu, “A unified framework of graph-basedevolutionary multitasking hyper-heuristic,” IEEE Trans. Evol. Comput.,2020.

[35] N. Pillay and R. Qu, “Assessing hyper-heuristic performance,” J. Oper.Res., 2020. Doi:10.1080/01605682.2020.1796538.

[36] E. K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Ozcan,and R. Qu, “Hyper-heuristics: A survey of the state of the art,” J. Oper.Res., vol. 64, no. 12, pp. 1695–1724, 2013.

[37] E. K. Burke, M. R. Hyde, G. Kendall, G. Ochoa, E. Ozcan, and J. R.Woodward, “A classification of hyper-heuristic approaches: revisited,”in Handbook of Metaheuristics. Springer, 2019, pp. 453–477.

[38] J. R. Koza, Genetic programming: On the programming of computersby means of natural selection. MIT press, 1992, vol. 1.

[39] S. Nguyen, Y. Mei, B. Xue, and M. Zhang, “A hybrid genetic pro-gramming algorithm for automated design of dispatching rules,” Evol.Comput., vol. 27, no. 3, pp. 467–496, 2019.

[40] E. Kieffer, G. Danoy, M. R. Brust, P. Bouvry, and A. Nagih, “Tacklinglarge-scale and combinatorial bi-level problems with a genetic program-ming hyper-heuristic,” IEEE Trans. Evol. Comput., vol. 24, no. 1, pp.44–56, 2019.

[41] E. K. Burke, M. R. Hyde, G. Kendall, G. Ochoa, E. Ozcan, andJ. R. Woodward, “Exploring hyper-heuristic methodologies with geneticprogramming,” in Comput. Intell. Springer, 2009, pp. 177–201.

[42] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone, Geneticprogramming: an introduction. Morgan Kaufmann Publishers SanFrancisco, 1998, vol. 1.

[43] T. Hildebrandt and J. Branke, “On using surrogates with genetic pro-gramming,” Evol. Comput., vol. 23, no. 3, pp. 343–367, 2015.

[44] F. Zhang, Y. Mei, S. Nguyen, and M. Zhang, “Collaborative multi-fidelity based surrogate models for genetic programming in dynamicflexible job shop scheduling,” IEEE Transa. on Cybern., 2021. Doi:10.1109/TCYB.2021.3050141.

[45] S. Nguyen, M. Zhang, M. Johnston, and K. C. Tan, “Genetic program-ming for evolving due-date assignment models in job shop environ-ments,” Evol. Comput., vol. 22, no. 1, pp. 105–138, 2014.

[46] J. MacLachlan, Y. Mei, J. Branke, and M. Zhang, “Genetic programminghyper-heuristics with vehicle collaboration for uncertain capacitated arcrouting problems,” Evol. Comput., pp. 1–31, 2019.

[47] Y. Liu, Y. Mei, M. Zhang, and Z. Zhang, “Automated heuristic designusing genetic programming hyper-heuristic for uncertain capacitatedarc routing problem,” in Proceedings of the Genetic and EvolutionaryComputation Conference, 2017, pp. 290–297.

[48] B. Tan, H. Ma, and Y. Mei, “A hybrid genetic programming hyper-heuristic approach for online two-level resource allocation in container-based clouds,” in Proceedings of the 2019 IEEE Congress on Evolution-ary Computation (CEC). IEEE, 2019, pp. 2681–2688.

[49] B. Tan, H. Ma, Y. Mei, and M. Zhang, “A cooperative coevolutiongenetic programming hyper-heuristic approach for on-line resourceallocation in container-based clouds,” IEEE Trans. on Cloud Comput.,2020.

[50] Y. Mei and M. Zhang, “Genetic programming hyper-heuristic for multi-vehicle uncertain capacitated arc routing problem,” in Proceedings of theGenetic and Evolutionary Computation Conference Companion, 2018,pp. 141–142.

[51] F. Zhang, Y. Mei, S. Nguyen, and M. Zhang, “Evolving schedul-ing heuristics via genetic programming with feature selection in dy-

namic flexible job shop scheduling,” IEEE Trans. Cybern., 2020. Doi:10.1109/TCYB.2020.3024849.

[52] F. Adam and W. Thomas, “Gpc++ - genetic programming c++ classlibrary,” Website, 2020, http://www0.cs.ucl.ac.uk/staff/W.Langdon/ftp/weinbenner/gp.html.

Guanqiang Gao received the B.S. degree in au-tomation from the Beijing Institute of Technology,Beijing, China, in 2015. He is currently pursuingthe Ph.D. degree with the School of Automation,Beijing Institute of Technology, Beijing. His currentresearch interests include evolutionary computationand multi-robot system.

Yi Mei (M’09-SM’18) received the BSc and PhDdegrees from the University of Science and Tech-nology of China, Hefei, China, in 2005 and 2010,respectively. He is currently a Senior Lecturer atthe School of Engineering and Computer Science,Victoria University of Wellington, Wellington, NewZealand. His research interests include evolutionaryscheduling and combinatorial optimisation, machinelearning, genetic programming, and hyper-heuristics.He has over 100 fully referred publications, includ-ing the top journals in EC and Operations Research

such as IEEE TEVC, IEEE TCYB, Evolutionary Computation Journal, Eu-ropean Journal of Operational Research, ACM Transactions on MathematicalSoftware. He serves as a Vice-Chair of the IEEE CIS Emergent TechnologiesTechnical Committee, and a member of Intelligent Systems ApplicationsTechnical Committee.

Bin Xin (Member, IEEE) received the B.S. degreein information engineering and the Ph.D. degreein control science and engineering, both from theBeijing Institute of Technology, Beijing, China, in2004 and 2012, respectively. He was an academicvisitor at the Decision and Cognitive Sciences Re-search Centre, the University of Manchester, from2011 to 2012. He serves as an associate editorof the journal Unmanned Systems and many otherinternational journals. He is currently a Professorwith the School of Automation, Beijing Institute of

Technology. His current research interests include search and optimization,evolutionary computation, unmanned systems, and multiagent systems.

Ya-Hui Jia (M’20) received the B.Eng. and Eng.D.degrees from Sun Yat-sen University, China, in 2013and 2019, respectively. He is currently a postdoctoralresearch fellow at the School of Engineering andComputer Science, Victoria University of Welling-ton, Wellington, New Zealand. His research interestsinclude evolutionary computation, combinatorial op-timization, software engineering, cloud computing,and intelligent transportation.

Will Browne (Member, IEEE) Will N. Browne’sresearch interests are in developing Artificial Cog-nitive Systems. His background is in MechanicalEngineering (B.Eng. degree (Hons.)) from the Uni-versity of Bath, U.K., 1993, and both the M.Sc.degree in Energy and the EngD degree (EngineeringDoctorate Scheme) from the University of Wales,Cardiff, U.K., in 1994 and 1999, respectively. Af-ter lecturing for eight years at the Department ofCybernetics, University of Reading, U.K., he isnow an Associate Professor with the Evolutionary

Computation Research Group, School of Engineering and Computer Science,Victoria University of Wellington, New Zealand. He has published over 100academic papers in books, refereed international journals, and conferences.