
Autonomously Controlling Flexible Timelines: From Domain-independent Planning to Robust Execution

Tiago Nogueira
Center for Telematics
D-97074 Würzburg, Germany
[email protected]

Simone Fratini
European Space Agency
D-64293 Darmstadt, Germany
[email protected]

Klaus Schilling
University Würzburg
D-97074 Würzburg, Germany
[email protected]

Abstract—The introduction of more onboard autonomy in future single and multi-satellite missions is both a question of limited onboard resources and of how far we can actually trust the autonomous functionalities deployed on board. In-flight experience with NASA's Deep Space 1 and Earth Observing 1 has shown how difficult it is to design, build and test reliable software for autonomy. The degree to which system-level onboard autonomy will be deployed in the single and multi-satellite systems of tomorrow will depend, among other things, on the progress made in two key software technologies: autonomous onboard planning and robust execution. In parallel to the developments in these two areas, the actual integration of planning and execution engines is still today a crucial issue in practical applications.

This paper presents an onboard autonomous model-based executive for the execution of time-flexible plans. It describes its interface with an APSI-based timeline planner, its control approaches, its architecture and its modelling language as an extension of APSI's DDL. In addition, it introduces a modified version of the classical blocks world toy planning problem which has been extended in scope and with a runtime environment for the evaluation of integrated planning and execution engines.

TABLE OF CONTENTS

1. INTRODUCTION
2. NETSAT
3. PLANNING FOR TIME FLEXIBILITY
4. EXECUTING TIME FLEXIBLE PLANS
5. TEX EXECUTIVE
6. WAREHOUSE DOMAIN
7. ONGOING AND FUTURE WORK
8. SUMMARY
ACKNOWLEDGMENTS
REFERENCES
BIOGRAPHY

1. INTRODUCTION

Satellite system-level onboard autonomy is a measure of the capability of a satellite to meet mission objectives without ground support for a given period of time. Autonomy is always a question of degree and applicable time period. The push for more onboard autonomy is typically driven by the need to reduce operations costs, to improve reactivity to sensed data so as to increase overall mission return, or to guarantee robust and safe operations in uncertain environments with limited human supervision. In recent years, largely due to the availability of very small and inexpensive satellite platforms, more and more missions involving multiple cooperating satellites are being proposed. Onboard autonomy will play a decisive role in operating such new complex missions.

Jonsson [1] classifies software technologies for autonomy into four categories:

(1) Intelligent sensing addresses the capabilities required to derive, usually in real time, the state of the system and its environment from sensed data.

(2) Planning encompasses the capabilities required to expand high-level mission goals into a sequence of actions (a plan) while respecting system and environment constraints.

(3) Execution is responsible for putting a plan into practice. An execution engine is responsible for the overall plan execution and monitoring while guaranteeing real-time response.

(4) Fault diagnosis and protection, or FDIR, builds on intelligent sensing and is concerned with failure detection, identification and recovery. These are typically model-based systems.

When it comes to satellite applications, the flagship missions for such technologies are still today the Remote Agent Experiment (RAX) on Deep Space 1 (DS-1) [2] and the Autonomous Sciencecraft Experiment (ASE) on Earth Observing 1 (EO-1) [3]. These two missions pioneered and proved the value of these technologies for space applications. More recently, NASA launched the IPEX CubeSat to validate new technologies for onboard image processing and autonomous operations [4]. System-level onboard autonomy at the level of what has been done for DS-1 and EO-1 is still unmatched to this day for single-satellite systems, and has never been demonstrated for a multi-satellite mission. The TechSat-21 [5] and the Three Corner Satellite (3CS) [6] missions were attempts at flying such technologies but never got the chance for actual in-flight demonstration.

The complexity of the underlying models and the large number of possible inputs and outputs make it very difficult to test software for autonomy in all possible mission scenarios. First-hand accounts of the challenges faced when testing the RAX system are given, for example, in [2], [7], [8] and [9]. Despite the extensive testing, in-flight anomalies did surface in the end, highlighting key challenges in the areas of modelling and testing. In particular, the need to:

(1) Move towards formal methods and model checking.

(2) Harmonize modelling languages and tools.

(3) Split modelling from reasoning.

Given the difficulty of reaching acceptable coverage using empirical scenario-based testing, the introduction of formal methods and model checking promises to simplify the overall verification and validation effort [8]. One of the in-flight anomalies experienced with RAX had its root cause in a subtle and undocumented difference in semantics between the models used by the planning, execution and diagnosis layers [8]. The use of different modelling languages and tools for planning, execution and FDIR leads to an increase in testing effort and an increased probability of inconsistencies. There is a need to harmonize planning and execution modelling and, in particular, to bridge the discrepancies between the more descriptive models used for planning and the more operational models used for execution. The RAX experience also suggests a clean split between modelling and reasoning, allowing more effort to be placed on model building while reusing already validated generic reasoning software, thus reducing the risks associated with deploying autonomy software. Onboard planning software, like that deployed on RAX or ASE, still relies heavily on domain-specific heuristics. The advent of more powerful onboard computers is starting to make it possible to really explore domain-independent planning for onboard applications. The same can be said for the execution and FDIR engines. The goal of domain-independent planning and execution engines is to develop general approaches (algorithms and models of action) that can be integrated with domain-specific planning techniques and tools. They rely on abstract models of time, values, resources, constraints and actions. In-flight demonstration of domain-independent planning and execution systems is, by maximising and promoting reuse across domains, a critical step towards gaining the required confidence to push more autonomy onboard future missions.

The remainder of this paper is organized as follows. We start with a short summary of the motivation for this work and introduce the NETSAT mission, its main objectives and the requirements on the onboard autonomy system. We then take a look at timeline-based planning, and we briefly review the APSI AI planning and scheduling framework and its modelling language DDL. We look at how a flexible temporal plan is synthesised in the form of a time-flexible timeline and the advantages it can bring to execution. We then introduce our approach to executing time-flexible timelines and describe the proposed controllers. We introduce the executive architecture, the execution environment and the underlying runtime data structures. Finally, we present a new toy problem for integrated planning and execution evaluation based on the classical blocks world toy planning problem.

2. NETSAT

The NETSAT mission [10], under development at the Center for Telematics (ZFT), has as its primary objective the in-orbit demonstration of autonomous formation flying of four picosatellites. In the process it will develop and test key technologies in the areas of inter-satellite communications, miniaturised orbit and attitude control, formation control based on miniaturized electrical propulsion systems [11] and onboard autonomy for distributed satellite systems. NETSAT follows in the footsteps of the precursor UWE picosatellite missions developed and operated by the University of Würzburg. The underlying modular satellite bus has already been tested in flight in the scope of the UWE-3 mission [12]. In-flight demonstration of such a four-satellite formation at the picosatellite scale opens the door to exploring new types of applications in the areas of geomagnetic gradiometry [13] or remote sensing based on photogrammetric observations [14], for example.

Developing and operating a picosatellite mission is all about resources and their management. As opposed to larger platforms, where sub-systems tend to be oversized to cope with wider flight envelopes, picosatellite design and operations are a constant struggle for resources, be it mass, volume, communications bandwidth, electrical or processing power. The other side of tight resources is poor performance. Low downlink bandwidth means that less mission data can be retrieved. Low power and small volume and mass impose severe constraints on the sensors and actuators for attitude determination and control. This in turn inevitably leads to poor pointing performance. Similarly, low volume imposes severe constraints on all sorts of optical and RF payloads, where focal length and/or aperture play decisive roles. Onboard autonomy will, we think, play a definitive role in properly managing the scarce onboard resources and boosting overall system performance. This in turn will increase the return of single and multi-picosatellite missions and, hopefully, help turn these systems into operationally useful platforms.

With NETSAT we aim at moving towards goal-based operations. Ideally the operator should be able to task and monitor the formation as if it were a single system. The operator defines the goals at the mission level, and leaves the details of how to break down the goals into tasks and how to distribute the tasks among the satellites to the onboard autonomy system. This places the operator in a new role, that of defining goals and of verifying whether or not they have been achieved. Goal-based operations is not a new concept. The Autonomous Exploration for Gathering Increased Science (AEGIS) software on the Mars Exploration Rover Opportunity [15] and on the Mars Science Laboratory Curiosity [16], and the ASE on EO-1 [3], are examples of actual operational systems. These systems managed to increase the mission science return by carrying out observations in an opportunistic fashion while reducing the need for humans in the loop. For NETSAT the main driver is the need to reduce the burden on the operations team and on the ground infrastructure, and to cope with the limited communication windows. We hope at the same time to increase overall mission reliability by autonomously using the inherent redundancy provided by having a formation. Additionally, this being a research project, we are interested in studying and understanding how to make the jump from single to multi-satellite autonomy.

On NETSAT we make a clear split between planning and execution. Contrary to other solutions that make use of local continuous plan repair approaches, like ASE's CASPER [17] or VAMOS on BIROS [18], we need to use a batch planner if we really want to move towards fully goal-based autonomous operations. Our planner is based on ESA's APSI framework [19] and runs on an ARM-based processor. The executive, on the other hand, must be able to run on a microcontroller (TI MSP430 or similar) on the primary onboard computer. To save power, the ARM-based processor will be switched off when not required. Additionally, this split is also driven by the requirement that it must be possible to incrementally introduce autonomy on board. To allow for gradual deployment and testing of the autonomy functions on board, we foresee three levels of autonomy, summarised in Table 1. Starting with fully manual operations and no formation trajectory control (A1), we then move into a semi-autonomous mode (A2), where the goal-based planning is kept on ground and the plans are uploaded to the executive for execution. Finally, with A3, we move the planning on board. At this stage mission operations should be fully goal-based, and the operator's role should be limited to defining goals and monitoring whether or not they have been achieved.


Table 1. NETSAT autonomy levels.

Level  Operations                           Description
A1     Fully manual, completely under       Ground builds and uplinks TC
       ground control.                      sequences. No formation control.
A2     Semi-autonomous, ground defines      Plan built on the ground and uplinked
       goals and monitors plan execution.   to the onboard executive. Formation
                                            control planned on the ground.
A3     Fully autonomous, ground defines     Operator defines, uplinks and monitors
       and monitors goal execution.         goal execution. Fully autonomous
                                            formation control.

In addition, it should be possible for the operator to:

(1) Have a complete overview of the onboard reasoning and execution processes.

(2) In case a goal has not been achieved, trace down the culprit system, sub-system or event.

(3) Reattempt to execute a failed goal.

In the next sections we focus on planning and execution for a single satellite system. The extension to the NETSAT multi-satellite system domain is still work in progress.

3. PLANNING FOR TIME FLEXIBILITY

Timeline-based planning is an approach to temporal planning which has been applied to the solution of several space planning problems – e.g., [20], [21], [22], [23], [24], [25]. This approach pursues the general idea that planning and scheduling for controlling complex physical systems consist in the synthesis of a set of desired temporal behaviours, named timelines, for system features that vary over time.

One of the reasons for the use of this type of planning in space problems is the capability to enable, in a flexible way, the integration of planning and scheduling tasks. Moreover, various software development environments exist for rapid prototyping, test and synthesis of new planning and scheduling applications based on timeline planning (EUROPA [26], ASPEN [27]). ESA contributes to this area of advanced research by promoting the development of APSI [19] (Advanced Planning and Scheduling Initiative) and APSI-related activities [28], its use for on-board autonomy with GOAC [29] (Goal Oriented Autonomous Controller) and its application to teleoperations [30].

A plan in this approach is constituted by a set of timelines. A timeline is a sequence of values, a set of ordered transition points between the values and a set of distance constraints between transition points. When the transition points are bounded by the planning process (lower and upper bounds are given for them) instead of being exactly specified, as happens for instance with a least-commitment solving approach, we refer to the timeline as time-flexible and to the plan resulting from a set of flexible timelines as a flexible plan.

Modeling

The APSI Framework makes available two modelling primitives to specify the planning domain: state variables and resources.

State variables represent components that can take sequences of symbolic states subject to various (possibly temporal) transition constraints. This primitive allows the definition of timed automata. Here the automaton represents the constraints that specify the allowed logical and temporal transitions of a timeline. A timeline for a state variable is valid if it represents a timed word accepted by the automaton. The timed automaton (or, in the APSI case, the state variable) is a very powerful modelling primitive, widely studied [31], and for which different algorithms exist to find valid timelines.
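As a minimal illustration of this idea, a state variable can be reduced to a table of allowed value transitions plus duration bounds for each value, and a candidate timeline checked against both. The sketch below is our own simplification in C (generic value names, no parameters or synchronizations), not APSI code:

#include <stdbool.h>
#include <stddef.h>

/* A state variable reduced to its transition constraints: allowed[i][j] says
 * whether value j may directly follow value i, and dur[i] holds the [min, max]
 * duration bounds of value i. Values and numbers are purely illustrative. */
enum { VALUE_A, VALUE_B, VALUE_C, NUM_VALUES };

static const bool allowed[NUM_VALUES][NUM_VALUES] = {
    /* from A */ { false, true,  true  },
    /* from B */ { true,  false, true  },
    /* from C */ { true,  true,  false },
};

static const int dur[NUM_VALUES][2] = {
    { 10, 60 },   /* A may last between 10 and 60 time units */
    {  1, 600 },  /* B */
    {  5, 120 },  /* C */
};

/* A timeline (sequence of values with durations) is valid if every value
 * respects its duration bounds and every adjacent pair is an allowed transition. */
static bool timeline_is_valid(const int *values, const int *durations, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (durations[i] < dur[values[i]][0] || durations[i] > dur[values[i]][1])
            return false;
        if (i + 1 < n && !allowed[values[i]][values[i + 1]])
            return false;
    }
    return true;
}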

Resources are used to model any physical or virtual entity of limited availability, such that its profile represents its availability over time, whereas a decision on the resource models a quantitative use/production/consumption of the resource over a time interval. Three types of resources are currently available in the APSI Framework. Reusable resources abstract any real subsystem with a limited capacity, where an activity uses a quantity of resource during a limited interval and then releases it at the end. Consumable resources abstract any subsystem with a minimum capacity and a maximum capacity, where consumptions and productions consume and restore a quantity of the resource at specific time instants. The third type of resource is the linear reservoir resource. This resource does not have a stepwise constant consumption profile like the reusable and consumable ones; instead, the activities specify the amount of production/consumption per unit of time, namely the slope, resulting in a resource profile that is linear in time. As a consequence, the amount of resource available at each transition of the timeline depends on the duration of the time intervals over which this production or consumption has been performed (conversely, for the other types of resource the profile of the resource availability at each transition depends only on when and how much is produced/consumed, and not on the duration of the production/consumption). This in turn induces the need for integrated reasoning on time and data to guarantee the flexibility of the generated plan/schedule.

In timeline-based modelling, the physical and technical constraints that influence the interaction of the sub-systems (modelled either as state variables or resources) are represented by temporal and logical synchronizations among the values taken by the automata and/or resource allocations on the timelines. Conceptually, these constructs define the valid schemas of values allowed on timelines and link the values of the timelines with resource allocations. In particular, they allow the definition of Allen's relations, like quantitative temporal relations among time points and time intervals, as well as constraints on the parameters of the related values [32]. From a planning perspective, the synchronizations define the cause-effect relationships among the states of the system modelled, describing how a given status can be achieved. As an example to clarify the concept, let us present an excerpt from a simple logistics domain.

Let us suppose a UAV with the task of moving goods in a warehouse (the domain use case described in Section 6). In order to model the UAV, the following two subsystems are considered: the mobility system MS and a battery BAT. In the model, we assume the UAV is able to move goods between two points in space. Hence the mobility system can be modelled as a state variable taking the following values: TRANSPORT(?b, ?x, ?y, ?z) when the UAV is moving a block ?b to a point ⟨x, y, z⟩ in space, and WAIT(?x, ?y, ?z) when the UAV is standing at ⟨x, y, z⟩. In addition, since we model the battery consumption of the UAV, we need a status RECHARGE(?x, ?y, ?z) that models the UAV standing at a charging station (in this case ⟨x, y, z⟩ are the spatial coordinates of a battery charging station). A transition TRANSPORT(?b, ?x, ?y, ?z) → WAIT(?x, ?y, ?z) denotes a successful move of a block b to ⟨x, y, z⟩, and a transition WAIT(?x, ?y, ?z) → TRANSPORT(?b′, ?x′, ?y′, ?z′) denotes the UAV starting to move a block b′ from a point ⟨x, y, z⟩ to a point ⟨x′, y′, z′⟩. From both states it is possible for the UAV to fly to a recharging station. Hence we also have transitions from TRANSPORT and WAIT to RECHARGE and back.

The battery is modelled as a reservoir resource with a minimum charging value. When the UAV is moving a block, it consumes battery. We assume the consumption depends only on the time the UAV takes to move from one point to the other (4 units per second). Besides that, the UAV also consumes battery while it is waiting (1 unit per second), less than when it is moving, since it still needs to be operational. On the contrary, when the UAV is standing at a recharging station, it restores its battery (at 10 units per second), hence the resource is being produced instead of consumed.

This mix of causal and temporal relationships among the operations can be stated with the following synchronization:

SYNCHRONIZE ms.{uav0} {

    VALUE Transport(?b, ?x, ?y, ?z) {
        r1 bat.uav0.ACTIVITY(-4.0);
        EQUALS r1;
    }

    VALUE Wait(?x, ?y, ?z) {
        r1 bat.uav0.ACTIVITY(-1.0);
        EQUALS r1;
    }

    VALUE Recharge(?x, ?y, ?z) {
        r1 bat.uav0.ACTIVITY(10.0);
        EQUALS r1;
    }
}
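To make the linear reservoir behaviour concrete, consider the TRANSPORT value above, which consumes 4 units per second: if the planner leaves the transport duration flexible within [10, 20] seconds, a battery starting at 100 units ends anywhere between 20 and 60 units. The small C sketch below (our own illustration with made-up numbers, not part of APSI or TEX) computes such bounds, and shows why temporal and resource reasoning cannot be decoupled for this resource type:

#include <stdio.h>

/* Illustrative sketch: bounds on the level of a linear reservoir after one
 * activity whose duration is only known to lie in [lb, ub] seconds and whose
 * slope is given in units per second (negative = consumption, positive =
 * production). Names and numbers are ours, not taken from the NETSAT model. */
static void reservoir_bounds(double level, double slope, double lb, double ub,
                             double *min_level, double *max_level)
{
    double a = level + slope * lb;   /* level if the activity is as short as allowed */
    double b = level + slope * ub;   /* level if the activity is as long as allowed  */
    *min_level = (a < b) ? a : b;
    *max_level = (a < b) ? b : a;
}

int main(void)
{
    double lo, hi;
    /* TRANSPORT at -4 units/s with a flexible duration of [10, 20] s,
     * starting from a charge of 100 units, ends in [20, 60] units. */
    reservoir_bounds(100.0, -4.0, 10.0, 20.0, &lo, &hi);
    printf("battery after TRANSPORT: [%.1f, %.1f] units\n", lo, hi);
    return 0;
}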

Problem solving

In the APSI framework, an application, be it a generic planner and/or scheduler or a domain-specific deployed application, is designed as a collection of solvers. Based on the constraint-based paradigm, the search space of an APSI solver is made of planning and scheduling statements on timelines and of temporal and data relations among them. An application solves a conceptually more complex problem, like a planning problem for instance, or a global resource optimization problem. The main difference between a solver and an application is that a solver searches in the space of decisions and constraints, while an application searches in the space of solver solutions.

In this scenario we have used the PLASMA (PLAn Space Multi-solver Application) planner. It is made of a collection of solvers, activated on a flaw-detection basis: when a flaw is detected on a timeline, the planner activates the available list of solvers that can try to fix the problem.

PLASMA incorporates the principles of Partial Order Planning (POP, [33]), like a plan-space search space and the least-commitment approach. Starting from an initial partial plan made only of partially specified timelines and goals to be achieved, the planner iteratively refines it into a final plan that is compliant with all the requirements expressed by the goals. The PLASMA planner is grounded on different solvers to solve various flaws during the planning process:

• Partial Order Scheduler. It supports the scheduling process resulting from planning to guarantee temporal flexibility in the plan. Partial Order Schedules [34] support the planning process and guarantee temporal flexibility in the plan and schedule (necessary for execution). While the POS concept¹ was originally introduced to cope with uncertainty and provide robust solutions, a similar need is present when considering an integrated planning and scheduling approach where a planner and a scheduler are iteratively called to modify a shared representation. In fact, the schedule produced should be able to accommodate possible changes introduced by the planner while preserving sufficient flexibility.

• Resource Production Allocator. It makes sure that linear resources can be adequately managed by avoiding any over-consumption. The task of the Resource Production Allocator is to identify those time windows, among the available ones, which can be used to compensate for the over-consumptions on the different linear resources, that is, to produce resource availability.

• MaxFlow Resource Profile Bounder. Its task consists in bounding the position and duration of a set of activities so as to ensure that all the resource constraints and/or requirements are consistent.

4. EXECUTING TIME FLEXIBLE PLANS

The executive is the layer of software responsible for putting a plan into practice. While planning addresses the "what" and the "when", the executive defines the "how". The executive has to execute the plan in a dynamic environment, typically with hard time constraints and some level of uncertainty. Uncertainty at execution time can usually be attributed to either:

(a) Incomplete domain modelling, when for example an activity takes longer or a resource is consumed faster than anticipated.

(b) Unexpected changes to the domain, including for example failures in the system and unexpected changes to the environment in which the system operates.

As planning, and in particular batch planning, is a computationally expensive process, the executive should try to exploit the inherent flexibility of the plan to minimize the need for replanning. Two properties are typically desired in an executive [35]:

(a) Robustness.

(b) Effectiveness.

¹It is worth recalling that a POS is a set of activities partially ordered such that any possible complete order that is consistent with the initial partial order is a resource- and time-feasible schedule. Therefore the planner, when moving inside the space identified by the POS, will have the possibility of modifying the temporal allocation while preserving the feasibility of the overall solution.


Figure 1. Flexible system timeline as a discrete event system with temporal constraints.

Robustness reflects the ability to manage uncertainty so as to control the plan to the desired state in the presence of perturbations. A perturbation, in this context, is any deviation from the initially modelled domain or problem. To be robust, a plan must be flexible, in that it must encompass a set of possible alternative behaviours. This is provided by the inherent time flexibility in the plan. To be effective, an executive should only have to process constraints in the vicinity of the current execution step. That is, we do not want the executive to have to propagate, at runtime, over the full set of constraints to decide which action to take next, as this could introduce considerable delays in execution.

Approach

The problem of executing a plan in the form of a time-flexible timeline can be seen as that of controlling a Discrete Event System (DES) with time constraints. A DES is a system with piecewise constant state trajectories [36]. State transitions in our case are places in the plan trajectory of the system where at least one of the timelines changes its value. The role of the executive, or controller, is then to achieve a state and maintain it between transitions while respecting the temporal constraints (Figure 1).

As depicted in Figure 2, we split the problem of controlling such a system in two:

(1) Control the state of the system, guaranteeing that all timelines achieve the required state at the required time. This is implemented by the system controller.

(2) Control the state of a component, by guiding its behaviour to match that of the corresponding timeline. This is implemented by the component controller.

Observers provide an estimate of the state of the system or of a given component from acquired telemetry.

Splitting the concerns between system- and component-level controllers rather than following a centralised approach is mainly driven by two factors. First, the need to simplify the implementation and reduce the memory footprint, as detailed later in Section 5. Second, the possibilities such an approach offers to distribute the actual deployment and execution of the individual component controllers across the satellite sub-systems, or even across different satellites. This split, however, introduces a problem. Full knowledge about the domain is only held by the system controller. Knowledge within a component controller is local, in that it is limited to the component it controls. This implies that we need to define clear boundaries within which the component controller can operate, so as to guarantee that the decisions it takes do not interfere with the other timelines being controlled. To guarantee that the plan remains sound we must make sure that the component controller can only cross states that are either not synchronised at all with any other states of the timelines being controlled or, if synchronised, are only synchronised with a state currently being controlled for. In other words, the component controller cannot cross states that would move the other timelines away from their target states. This is addressed by introducing, at each transition and for each component controller, a set of forbidden states that cannot be traversed by the local component controller.

More formally, we define a flexible temporal plan by the tuple:

FP = (o, h, SC, CC, TN)

where:

• o is the plan origin.

• h is the plan horizon.

• SC is the set of System Controllers, with one controller per transition, SC = {sc0, ..., scn−1}, where n is the total number of transitions.

• CC is the set of Component Controllers, CC = {cc0, ..., cck−1}, with one controller per component type, where k is the number of different component types.

• TN is the temporal network.

The temporal network TN is encoded as a fully connected distance graph TN = (V, E) with vertices V = {v0, ..., vn−1} and edges E = {e0, ..., em}, where n is the total number of domain transitions. For a fully connected distance graph, m = n(n − 1). The temporal network holds the minimum and maximum temporal bounds between transitions.
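To give an idea of what such a plan can look like in memory, the following C sketch shows one possible fixed-size layout for the tuple FP and its fully connected distance graph. It is only an assumption about a plausible encoding; the names and the MAX_TRANSITIONS limit are hypothetical and not the actual TEX data structures:

#include <stdint.h>

/* Sketch of a statically sized flexible plan FP = (o, h, SC, CC, TN).
 * This is an assumed layout for illustration, not the TEX implementation. */

#define MAX_TRANSITIONS     16   /* hypothetical upper bound on n */
#define MAX_COMPONENT_TYPES  8   /* hypothetical upper bound on k */

typedef struct {
    int32_t lb;   /* minimum temporal distance between two transitions */
    int32_t ub;   /* maximum temporal distance between two transitions */
} temporal_bound_t;

typedef struct {
    uint32_t origin;            /* o, plan origin                   */
    uint32_t horizon;           /* h, plan horizon                  */
    uint8_t  num_transitions;   /* n, one system controller each    */
    uint8_t  num_components;    /* k, one component controller each */
    /* TN: edge labels of the fully connected distance graph,
     * bounds[i][j] holds the [lb, ub] constraint from transition i to j. */
    temporal_bound_t bounds[MAX_TRANSITIONS][MAX_TRANSITIONS];
    /* SC and CC: identifiers of the synthesised controller scripts,
     * resolvable through the onboard storage. */
    uint16_t system_controllers[MAX_TRANSITIONS];
    uint16_t component_controllers[MAX_COMPONENT_TYPES];
} flexible_plan_t;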

System controller

A System Controller (SC) is responsible for monitoring and controlling the state of the whole system by guaranteeing that all timelines reach and maintain the required state while respecting the temporal constraints. An SC continuously monitors the state of all timelines and triggers the execution of the corresponding Component Controller (CC) if the state of a timeline deviates from the expected one.

Figure 2. Goal-based planning and execution.

The system control procedure, presented as Controller 1, shows the control logic to achieve and maintain a target domain state St for the domain transition Ti, where:

• TL is the set of n timelines being controlled.

• St is the target system state for domain transition Ti, given as the set of n target timeline states, one per timeline being controlled.

• SF is the set of forbidden states for each of the n timelines. This is an empty set if there are no synchronisations between the timelines being controlled.

• PRECOND is a sub-procedure that checks that all preconditions are met before starting the execution.

• MAIN is the sub-procedure that implements the actual logic that controls the system timeline.

• EXIT is used as the exit point for the controller. It implements some clean-up functions and error checking.

• elt, lb and ub are the current plan's execution elapsed time and the controller's execution lower and upper temporal bounds, respectively.

The controller starts by running the PRECOND sub-procedure, which checks the temporal constraints and that the previous controller has achieved the required state. Once both conditions are met, it declares itself as the active controller and moves to the MAIN sub-procedure. This forces the previously running system controller to stop and exit. Given that only one system state is valid at any time, only one MAIN sub-procedure can be active at any given time. This approach makes it possible to launch several system controllers in parallel while guaranteeing that only one has control authority. This is required since, due to the temporal flexibility in the plan, the temporal bounds of the transitions will partially overlap. The controller then enters a loop that controls the timelines tlj to their target state sj by starting the corresponding component controller CCj. The set of forbidden states SFj and the execution upper bound ub are passed down to the component controller. SFj identifies the set of states that the controller must avoid while controlling towards sj, and ub is used as an execution deadline for the component controllers. Once the target domain state is achieved for the first time, the lower and upper temporal bounds for this transition are updated with the current execution elapsed time. The temporal network is then propagated. Note that after reaching St for the first time, and for as long as it remains active, the controller will continuously try to steer the system towards St. The temporal network, however, is only updated the first time St is reached.

Component controller

A Component Controller (CC) is responsible for monitoring and controlling the behaviour of a particular component type, as defined by the corresponding timeline. Several timelines can share the same controller. A CC can be defined as either:

• A Weighted Finite State Automaton (WFSA).

• A user-defined script.

The states modelled by a component controller are a superset of the states allowed by the corresponding component model used by the planner. Extra states can be added so as to monitor and control failure scenarios and unexpected states, making the state of the component fully observable. This inherently provides the capability to model failure identification and recovery mechanisms together with nominal behaviour. The option to specify a controller using a user-defined script is there to allow modelling controllers whose behaviour cannot be encoded as a WFSA. This is typically the case for many domain-specific controllers, like an attitude controller in a satellite. This type of controller will not be discussed further in this contribution.

The component control procedure, presented as Controller 2, shows the control logic to achieve and maintain an input target state st, where:

• st is the target state towards which the component must be controlled.


Controller 1. System controller for transition Ti

procedure SCi
    TL ← {tl0, ..., tln−1}i
    St ← {s0, ..., sn−1}i
    SF ← {{SFl, ..., SFm}0, ..., {SFl, ..., SFm}n−1}i

    procedure PRECOND
        while elt < lb do
            sleep for ∆t
        end while
        while elt < ub and state Ti−1 ≠ achieved do
            sleep for ∆t
        end while
        if elt ≥ ub then
            return error
        end if
        set SCi as the active controller
    end procedure

    procedure MAIN
        while elt < ub and active controller = SCi do
            for j ← 0, n−1 do
                if current state of tlj ≠ sj then
                    (result) ← CCj(sj, SFj, ub)
                end if
            end for
            if current system state = St then
                state ← achieved
                if temporal network not yet updated then
                    lb ← elt
                    ub ← elt
                    propagate the temporal network
                end if
            else
                state ← not achieved
            end if
        end while
    end procedure

    procedure EXIT
        clean up execution context
        if active controller = SCi then
            return error
        else
            return success
        end if
    end procedure
end procedure

• SF is the set of forbidden states that must be avoided while controlling the component towards st.

• ub is the temporal upper bound within which the controller has to achieve st.

• S is the set of all defined states for this component.

• Tw is the set of weighted transitions or edges between states. An edge is any combination of temporal and value guards and actions.

The controller starts by updating its state transition matrix to account for the forbidden states. It then evaluates the current state and derives the next step m in the control from the current state sc, the target state st and the available time for execution, given by the difference between the deadline and the current elapsed time (ub − elt).

Controller 2. Component controller for component j

Input: target state st, the set of forbidden states SF, the deadline or temporal upper bound ub to reach st, and some optional controller-specific arguments a

procedure CCj(st, SF, ub, [a])
    M ← {{S − SF}, Tw}
    evaluate current component state sc
    compute next step m
    while m ∈ M and sc ≠ st do
        if m ∈ Tw then
            (result, ∆t) ← execute m
            if result = success then
                p ← 1
                sleep for ∆t
            else
                p ← penalty
            end if
            mw ← mw + p
        end if
        evaluate current component state sc
        compute next step m
    end while
    if sc = st then
        return result
    else
        return error
    end if
end procedure

The next step is either an edge that must be executed or an intermediate state that must be traversed. If the state is not reachable, or is not reachable within the available time, m is set to a value such that m ∉ M, and the procedure exits with an error. The controller will loop until it reaches the target state, or exits with an error condition if the desired state is not reachable. Every time an edge is run, it increments its weight mw and returns a delay ∆t. The weight is incremented by one on success, or by a penalty value if the execution exited with an error. Weights are introduced to avoid repeating unsuccessful paths. By default the weights for all edges are initially set to one. The default weights can be overwritten by the user in the model so as to give priority to some paths over others. This could be used to make sure, for example, that the controller always tries the nominal path before attempting a recovery solution. Once the controller exits, either nominally or with a failure, the weights are reset to their initial values. The sleep is introduced to allow the system to settle before checking the state again. This allows us to account for all the delays between executing the transition and the corresponding state being updated in telemetry. Each edge can define its own expected delay.
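The next-step selection can be illustrated with a small C sketch. It is a simplified, hypothetical rendering of the rule described above: among the outgoing edges of the current state whose guard holds and whose destination is not forbidden, pick the one with the lowest accumulated weight. It omits the reachability and deadline checks, and the names are ours rather than the actual TEX implementation:

#include <stddef.h>

/* One weighted edge of the component controller's automaton. */
typedef struct {
    int from;               /* source state                              */
    int to;                 /* destination state                         */
    int weight;             /* accumulated weight mw (default 1)         */
    int (*guard)(void);     /* returns non-zero if the edge may be taken */
} edge_t;

/* Returns the index of the best outgoing edge of 'current', or -1 if none:
 * guard satisfied, destination not forbidden, lowest accumulated weight. */
static int next_step(const edge_t *edges, size_t n_edges, int current,
                     const int *forbidden, size_t n_forbidden)
{
    int best = -1;
    for (size_t i = 0; i < n_edges; ++i) {
        if (edges[i].from != current)
            continue;
        int blocked = 0;
        for (size_t f = 0; f < n_forbidden; ++f)
            if (edges[i].to == forbidden[f]) { blocked = 1; break; }
        if (blocked || (edges[i].guard && !edges[i].guard()))
            continue;
        if (best < 0 || edges[i].weight < edges[best].weight)
            best = (int)i;
    }
    return best;
}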

Temporal knowledge

The plan's temporal information is encoded as a Simple Temporal Network (STN) formulated as a distance graph. The vertices correspond to the (temporal) events and the edges to the temporal constraints between events. The edges are labelled with the lower (lb) and upper (ub) temporal bounds for the duration constraint between events [37]. At runtime the network must be checked for consistency to ensure the plan's consistency. The STN is fully path consistent as long as the distance graph does not contain negative cycles. Once a transition is confirmed, the corresponding lower and upper bounds in the STN are updated with the actual execution time. The STN is then propagated to adjust the temporal constraints of future events accordingly, while maintaining a consistent network. If, after a propagation, the network is found to be inconsistent, the plan is considered failed, triggering a replan. If the STN is consistent, then it is guaranteed that it is possible to pick any time point within the allowed time range for an event and still find valid times for all other events such that the plan is consistent [38]. We use the Floyd-Warshall (FW) all-pairs shortest path algorithm for constraint propagation and to verify consistency [39], as every all-pairs shortest path network is dispatchable [38]. This makes it, though, the largest dispatchable network, which is not optimal for execution if the number of vertices and edges is large [35]. Depending on the size of the network, propagation can be resource and time consuming and introduce latencies in the execution. More optimised solutions using a minimal dispatchable STN have been proposed by Muscettola [38] and Tsamardinos [35]. One advantage of fully propagating the network is that it allows us to detect, as early as possible, whether and which temporal constraint will be violated. From this we can derive which goal, or goals, will be affected and trigger a replanning as soon as possible.
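The propagation and consistency check can be sketched in C as a plain Floyd-Warshall pass over the distance matrix. This is an illustrative sketch rather than the flight code; the array size and names are ours:

#include <limits.h>

/* Minimal sketch of STN propagation: the network is stored as an n x n
 * distance matrix d, where d[i][j] is the maximum allowed time from event i
 * to event j (and d[j][i] is minus the minimum). Unconstrained entries are
 * initialised to STN_INF. Floyd-Warshall tightens all pairs; a negative
 * self-distance d[i][i] < 0 signals a negative cycle, i.e. an inconsistent
 * plan that needs replanning. */

#define STN_MAX_EVENTS 16
#define STN_INF        (INT_MAX / 4)   /* "no constraint", safe against overflow */

/* Returns 1 if the network is consistent after propagation, 0 otherwise. */
static int stn_propagate(int d[STN_MAX_EVENTS][STN_MAX_EVENTS], int n)
{
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];

    for (int i = 0; i < n; ++i)
        if (d[i][i] < 0)
            return 0;   /* negative cycle: temporal constraints violated */
    return 1;
}

Once a transition is confirmed, the executive would first tighten the bounds involving that event (for example d[0][i] and d[i][0] relative to the plan origin) and then re-run the propagation; a return value of 0 corresponds to the inconsistent-network case described above, which triggers a replan.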

5. TEX EXECUTIVE

The Tiny Executive (TEX) introduced in this section is an implementation of the controllers and of the control approach described in Section 4. TEX aims at maximising the decoupling between the domain and the underlying engines and algorithms, towards a domain-independent executive. The full control logic is modelled in the Tinytus Executive Language (TEL), an extension to the DDL language used in APSI, and the controllers are synthesised automatically at runtime. TEL already provides a large set of primitives that allow relatively complex behaviours to be modelled.

Requirements and architecture

Much of the design and implementation requirements on the executive are driven by the fact that it must run on an MCU on the main on-board computer of a pico-satellite, with very limited memory and processing power. The key driving requirements for the TEX design and implementation are:

• The executive shall be quickly reprogrammable with minimum modifications to the underlying system.

• It shall be possible to run the executive on a 16 or 32 bit microcontroller (MCU).

• The executive shall minimise its RAM footprint (down to a couple of KiByte).

• The executive shall have deterministic memory consumption at runtime.

• The operator, and the system, shall be able to monitor and control the execution of the controllers: start, stop, pause.

• As much of the code as possible shall run in a sandbox-like environment.

• It shall be written in C.

Figure 3 depicts the main components and interfaces.

The executive manager oversees the overall execution, interfaces with the planner to receive new plans and report the execution status, and interfaces with the onboard storage to store newly received plans and to load controllers as required.

A shared database is used to share runtime information among all running controllers and the executive manager. Such information includes the current plan origin and horizon, the current execution elapsed time, the current active transition and the status of the current system state (achieved / not achieved). The database is also used to post and monitor event messages, if any, that need to be passed between controllers.
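A minimal sketch of what such a shared database could look like in C is given below. The field names and sizes are assumptions for illustration, not the actual TEX layout; the point is that a single statically allocated structure gives the deterministic memory footprint required by the design:

#include <stdint.h>
#include <stdbool.h>

#define EVENT_QUEUE_LEN 8   /* hypothetical size of the inter-controller event queue */

/* Runtime information shared between the executive manager and all
 * running controllers (assumed content, for illustration only). */
typedef struct {
    uint32_t plan_origin;             /* o, mission elapsed time of plan start    */
    uint32_t plan_horizon;            /* h                                        */
    uint32_t elapsed_time;            /* elt, current execution elapsed time      */
    uint8_t  active_transition;       /* index of the currently active transition */
    bool     state_achieved;          /* whether the current system state holds   */
    uint16_t events[EVENT_QUEUE_LEN]; /* event messages posted between controllers */
    uint8_t  event_count;
} tex_shared_db_t;

static tex_shared_db_t g_shared_db;   /* single, statically allocated instance */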

The system and component controllers run on the Tinytus script engine. The controllers run in a sandbox environment and interact with the underlying system by issuing commands and monitoring telemetry.

The temporal network holds the plan's temporal information. The controllers use the network to retrieve the lower and upper temporal bounds for their transitions, and to update the temporal network with the actual transition time once it occurs.

Tinytus script language and interpreter

Tinytus is a script language and interpreter for embedded systems that provides an onboard sandbox environment for safe software execution [40]. The language and onboard interpreter were developed at the University of Würzburg following the experience gained with the operations of the UWE-3 picosatellite. UWE-3 was launched in 2013 and is the third satellite of the University Würzburg Experimental (UWE) satellite program. Updating the onboard software (OBSW) of a flying mission is always a critical procedure. In addition, the low uplink bandwidth of the radio-amateur transceiver flying on UWE-3, typical of most picosatellites, makes modifications to the OBSW cumbersome and time consuming. Given the educational nature of the UWE-3 project, it was important to give researchers and students the opportunity to experiment with new software. It was, however, essential that these new updates could be developed and uplinked quickly, and with no need for a full system regression test. Most importantly, the new software should be allowed to fail without compromising the rest of the system. The final goal was to provide an in-orbit sandbox-like environment to allow new experimental software to be easily tested and quickly iterated on, or, using Tom Peters' famous quote, to test fast, fail fast, adjust fast. The Tinytus interpreter has been running on the TI MSP430 16 bit MCU onboard UWE-3 since June 2014. It is still extensively used today to test, among other things, new attitude control algorithms.

The Tinytus script language uses Polish prefix notation and offers the basic constructs and expressiveness typical of an imperative programming language. This includes: (a) arithmetic and logic operations; (b) flow control primitives; (c) declaration, access and casting of numerical variables and arrays; (d) function calls. In addition, Tinytus offers the possibility to define a set of external system functions in C that can be invoked from within a script. This makes it possible for the script logic to interact with the underlying platform to dispatch commands, monitor telemetry or launch another script, for example. A Tinytus script must first be compiled into byte-code. The byte-code is then transferred to the target system, loaded on the Tinytus interpreter and executed. More than one script can be executed in parallel; the only limit is the RAM availability. The usage of an intermediate byte-code representation makes the interpreter simpler to implement and gives it a much smaller footprint.


Figure 3. TEX executive main components and interfaces.

This is particularly relevant as Tinytus targets embedded systems with severe limitations on computing power and available memory. A script can be started, stopped, paused and resumed at a later time by the operator. This is possible since Tinytus implements dedicated stack memories for each of the scripts. This has the added advantage that any memory access can be pre-validated by the interpreter. The system functions and memory areas that can be accessed from within a script are limited and controlled by the interpreter. Information such as the script execution status (running, paused, finished), the current execution step or any error code returned by the script is also available to the operator during execution [40].

Tinytus executive language

The Tinytus Executive Language, or TEL, extends the APSI DDL language with an executive grammar. TEL is our first attempt at harmonizing planning and execution modelling. TEL builds on the constructs provided by DDL and the Tinytus script language, providing the added capability to model:

• Component controllers.

• System controllers.

• Observers.

• Actions.

• Telemetry checks.

• Telecommands.

• Guards.

• Expressions.

A component controller, introduced in Section 4, is modelled as a Weighted Finite State Automaton. The automaton encodes the monitoring and control logic needed on board to move a component between states. The user defines: (a) the input parameters; (b) a set of allowed states (values) with the corresponding expressions allowing the current component state to be evaluated (the automaton is in a given state if the corresponding expression evaluates to true); (c) a set of transitions with the corresponding edges. An edge is any combination of actions and guards that tells the controller how to move between states. The values and transitions are a superset of the ones used by the planner. An additional Unknown state is usually added to the original model used by the planner as a catch-all for any behaviour that is inconsistent with all known/modelled behaviour. Other states to handle non-nominal behaviour could also be defined. Coming back to the domain already introduced in Section 3, and further detailed in Section 6, a simple controller for the UAV grip could be described as:

CONTROLLER COMPONENT TIMELINE ArmGrip.<uav0> {

    PARAMETERS {
        target : U8;
    }

    VALUES {
        Open()    : GripStatus() == 0;
        Close()   : GripStatus() == 1;
        Unknown() : GripStatus() != 0 &&
                    GripStatus() != 1;
    }

    TRANSITIONS {
        Open() TO Close()    : CloseGrip();
        Close() TO Open()    : OpenGrip();
        Unknown() TO Close() : CloseGrip();
    }
}

A system controller, introduced in Section 4, is simply a list of all the timelines that need to be actively monitored and controlled. For the UAV case this would be its position and the status of the grip used to move the goods:

CONTROLLER SYSTEM TIMELINE {

    VALUES {
        Position.<uav0>;
        ArmGrip.<uav0>;
    }
}

A state observer provides an estimate of the state of the component or system. An observer is any valid expression and wraps the logic required to derive the state of a component from telemetry. In its simplest form, an observer simply wraps a telemetry check:

OBSERVER U8 GripStatus() {
    tm(GRIP_STATUS);
}

OBSERVER FLT PositionError(x, y, z) {
    sqrt((tm(POSITION_X) - x)^2 +
         (tm(POSITION_Y) - y)^2 +
         (tm(POSITION_Z) - z)^2);
}

An action implements a control directive. It is any valid expression and wraps the logic required to implement the control directive in the target system. In its simplest form, an action simply wraps a telecommand:

ACTION OpenGrip() {
    tc(GRIP_GOTO(0));
}

A telemetry check allows the user to retrieve telemetry values from within the model. This could be used within an expression to check the value of a parameter, or within a guard to implement a value constraint:

tm(GRIP_STATUS);

A telecommand allows the user to invoke a telecommand on the target system from within the model:

tc(GRIP_GOTO(0));

A guard is used to model temporal and value guard conditions. Guards are used to dynamically enable or disable actions in a transition:

[AT == [10, 50]];
[tm(GRIP_STATUS) == 0];
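Our reading of these two guards, sketched below, is that the temporal guard constrains the time AT at which the guarded action may fire to the given window, while the value guard keeps the action disabled until the telemetry condition holds. The function names and this interpretation are assumptions made for illustration only.

    def temporal_guard_satisfied(at: float, lower: float = 10.0, upper: float = 50.0) -> bool:
        # [AT == [10, 50]]: the action may only fire while AT lies within [lower, upper].
        return lower <= at <= upper

    def value_guard_satisfied(grip_status: int) -> bool:
        # [tm(GRIP_STATUS) == 0]: the action is only enabled while the grip reads open.
        return grip_status == 0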

Expressions are any combination of arithmetic and logical expressions and can include observers, telemetry checks, actions and telecommands:

sqrt((tm(POSITION_X) - x)^2 +
     (tm(POSITION_Y) - y)^2 +
     (tm(POSITION_Z) - z)^2);

6. WAREHOUSE DOMAIN

Toy problems have a long tradition in the planning literature. They have long been used by the community to study particular types of reasoning in an abstract and controlled domain, as well as to evaluate and benchmark new general-purpose planning algorithms. Famous examples include the Travelling Salesman Problem (TSP) [41], the N-Queens problem [42] and the Blocks World (BW) [43]. Many other problems have been put forward by the community in forums such as the Competition of Distributed and Multiagent Planners (CODMAP) [44]. Such domains are typically of little practical interest and are, at best, very simplified versions of real-life problems. They are purposely designed to abstract and study particular features relevant for more realistic scenarios. While toy problems for planning have a long history, formulations and implementations of problems that combine both planning and execution are not common. Given the tight relation between planning and execution, and given that the integration of planning and execution is still today a crucial issue in real applications, we were prompted to develop a problem, and a corresponding environment, that would allow us to evaluate, in an integrated way, planning and execution engines. In our case, the use of a toy problem serves two main purposes:

(1) It forces the development of domain-independent planning and executive engines without falling into domain-specific simplifications.

(2) It allows us to evaluate, separately, how key domain features, like time and resources, affect the overall planning and execution.

We take the Blocks World problem as the basis to devise a more realistic domain that, while maintaining most of BW's original features, extends it with new ones relevant for our robotics domain. BW is among the most famous planning domains, and it is usually seen as the "Hello World" of planning. In its original formulation the domain comprises a finite number of blocks arranged in stacks on an infinite table. The problem consists in deriving the sequence of actions that takes the blocks from an initial configuration to a final goal configuration [43].
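As a purely illustrative example of such an instance (the encoding below is ours and not tied to any planner input format), consider three blocks that must end up in a single stack:

    # Illustrative Blocks World instance: each stack is a list read bottom-to-top.
    initial_state = [["b0", "b2"], ["b1"]]    # b2 sits on b0; b1 is alone on the table
    goal_state    = [["b0", "b1", "b2"]]      # goal: a single stack b0-b1-b2

    # One valid plan, as a sequence of moves:
    plan = [
        ("unstack", "b2", "b0"),  # take b2 off b0 and put it on the table
        ("stack",   "b1", "b0"),  # put b1 on b0
        ("stack",   "b2", "b1"),  # put b2 on b1
    ]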

The rest of this section presents our Warehouse domain as an extension of the BW problem. This new implementation is intended for the development and evaluation of domain-independent planning and execution engines.

Domain and problem definition

The Warehouse domain extends the original BW problem with:

• Time.

• Resources.

• Uncertainty at runtime.

The domain, depicted in Figure 4, is composed of one warehouse of finite dimensions containing:

• One storage area S = {s}

• One loading dock D = {d}

• One Unmanned Automated Vehicle U = {u}

• One charging station C = {c}

• A number n of blocks of the same shape and size: B = {b0, b1, ..., bn−1}

Figure 4. Warehouse planning domain, showing the storage area (S), the loading dock (D), the UAV (U), the charging station (C) and the stacks of blocks (B).

The warehouse storage area has pre-defined finite dimensions, constraining the maximum number of blocks that can be placed on the floor and the maximum number of blocks that can be stacked. A UAV is used to move the blocks from the storage area to the loading dock, for posterior loading for distribution. The UAV has a battery of limited capacity and an arm with a grip to pick up blocks. The battery depletes as the UAV moves and picks up blocks. The warehouse has a charging station used by the UAV to recharge its battery. Finally, the warehouse has in store a number n of blocks, up to a maximum limited by the warehouse dimensions, disposed in an initial configuration. Each block is identified by a tag from the finite set B = {b0, b1, ..., bn−1}.
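The battery behaviour can be captured by a simple charge/discharge model. The sketch below assumes linear rates tied to distance travelled, pick-up operations and time spent at the charging station; the class, its parameters and the numeric values are illustrative assumptions, not the exact model used by the environment.

    class Battery:
        """Illustrative linear battery model for the warehouse UAV."""

        def __init__(self, soc: float = 100.0,
                     discharge_per_metre: float = 0.05,  # % of capacity per metre flown
                     cost_per_pickup: float = 0.5,       # % of capacity per block picked up
                     charge_per_second: float = 1.0):    # % of capacity per second charging
            self.soc = soc
            self.discharge_per_metre = discharge_per_metre
            self.cost_per_pickup = cost_per_pickup
            self.charge_per_second = charge_per_second

        def move(self, distance: float) -> None:
            self.soc = max(0.0, self.soc - self.discharge_per_metre * distance)

        def pickup(self) -> None:
            self.soc = max(0.0, self.soc - self.cost_per_pickup)

        def charge(self, seconds: float) -> None:
            self.soc = min(100.0, self.soc + self.charge_per_second * seconds)

The charge and discharge rates are exactly the quantities that the runtime perturbations described later in this section can modify.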

The task at hand involves moving a given set of blocks from the storage area to the loading dock for transport within a given time window. As they arrive in the warehouse the blocks are first placed in the storage area. The blocks must then be moved to the loading dock and arranged in a specific configuration such as to facilitate their posterior loading and delivery. The initial number of blocks, their starting and final positions, as well as the time window allotted to move the blocks, are given by the initial problem. The UAV initial position and battery state-of-charge (SoC) are also set at the start.
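A problem instance therefore fixes the blocks, their initial and goal configurations, the time window and the UAV's initial state. One purely illustrative way to write such an instance down (field names and units are ours) is:

    problem = {
        "blocks": ["b0", "b1", "b2", "b3"],
        "initial_stacks": [["b0", "b1"], ["b2", "b3"]],  # storage area, bottom-to-top
        "goal_stacks":    [["b3", "b1"], ["b0", "b2"]],  # loading dock, bottom-to-top
        "time_window":    (0, 600),                      # seconds allotted for the task
        "uav": {"position": (0.0, 0.0, 1.0), "soc": 100.0, "grip": "open"},
    }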

On the execution side, we extend the initial domain to allow the injection of the following perturbations at runtime:

(a) Add or remove a block.

(b) Move a block to a different position.

(c) Switch the tag between two blocks.

(d) Modify the battery charge or discharge rates.

By adding and removing blocks we can simulate scenarios where activities are added or removed as a plan is being executed. We can evaluate, among other things, how the executive handles changes to the plan and how it interfaces with the planner and replanning algorithms. By moving a block from its initial position we introduce delays in execution and unplanned battery depletion. As the blocks are placed in the loading dock they are checked to verify that the correct block has been collected and that it has been put in the correct place for loading. By switching the tags on the blocks we can simulate scenarios where the outcome of an action is not as intended. By modifying the battery charge and discharge rates we can simulate scenarios where resources are not consumed or produced as modelled.

Figure 5. Warehouse planning and execution domain, showing the four types of perturbations that can be injected at runtime.
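In our setup these perturbations are injected into the running Unity environment. The sketch below shows one plausible way to encode and send them; the message fields, the JSON encoding and the port are assumptions made for illustration and do not reproduce the actual wire format of the environment.

    import json
    import socket

    # Hypothetical perturbation messages, one per perturbation type (a)-(d).
    PERTURBATIONS = [
        {"type": "add_block",    "tag": "b42", "position": [3.0, 1.0, 0.0]},
        {"type": "remove_block", "tag": "b7"},
        {"type": "move_block",   "tag": "b3",  "position": [5.0, 2.0, 0.0]},
        {"type": "switch_tags",  "tags": ["b1", "b2"]},
        {"type": "set_rates",    "charge_rate": 0.8, "discharge_rate": 0.07},
    ]

    def inject(perturbation: dict, host: str = "127.0.0.1", port: int = 9000) -> None:
        """Send a single perturbation to the environment as a UDP/JSON datagram."""
        payload = json.dumps(perturbation).encode("utf-8")
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))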

Note that, since UAV path planning is out of scope for our application, we settled for a very simplified, though representative, solution to the path planning problem: (i) move upwards above the highest block in the warehouse; (ii) move straight along the line of sight to the target horizontal position while maintaining the same height; (iii) once at the target position, move downwards to the target height.
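This three-phase strategy is easy to state procedurally; the following sketch implements it (the coordinate conventions and the clearance margin are our own assumptions):

    def plan_path(start, target, highest_block_z, clearance=0.5):
        """Simplified warehouse path planning: up, across, down.

        start, target: (x, y, z) tuples, with z the height.
        highest_block_z: height of the tallest stack currently in the warehouse.
        Returns the waypoints the UAV flies through, in order.
        """
        safe_z = max(start[2], highest_block_z + clearance)
        return [
            (start[0],  start[1],  safe_z),     # (i) climb above the highest block
            (target[0], target[1], safe_z),     # (ii) fly straight to the target x, y
            (target[0], target[1], target[2]),  # (iii) descend to the target height
        ]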

A similar domain, but limited to planning and using grunts with no battery resources to manipulate the blocks, has been introduced by Hamilton in [45].

Unity-based environment

The environment was implemented using the Unity game engine [46], which is very popular among game developers. The scenario is implemented in C# and interfaces with the executive through a UDP/IP socket.

The environment provides the following telemetry in real time to the executive:

• Simulation time.

• UAV battery state-of-charge.

• UAV position and grip status.

• Current position of each of the blocks.

• The assigned tag for each of the blocks.

The executive can control the UAV and its grip using the following commands:

• go-to(x,y,z) to command the UAV to move from its current position to a new position given by the coordinates (x,y,z).

• grip-open() to open the UAV arm grip such as to release a block.

• grip-close() to close the UAV arm grip such as to pick up a block.

Figure 6. Planning and execution setup with the Unity-based environment.

Figure 7. Unity-based environment showing the blocks and the UAV in their initial configuration.

Figure 8. Unity-based environment showing the blocks in their final configuration in the loading dock, ready for loading and transport.

Figure 7 and Figure 8 show two screenshots of the warehouse at the start and at the end of a simulation, respectively.
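To make the interface concrete, the sketch below shows a minimal exchange with the environment over the UDP/IP socket: one go-to command is sent and one telemetry datagram is read back. The addresses, ports, JSON encoding and field names are illustrative assumptions; the actual packet layout of the environment is not reproduced here.

    import json
    import socket

    ENV_ADDRESS = ("127.0.0.1", 9000)    # hypothetical environment endpoint
    EXEC_ADDRESS = ("127.0.0.1", 9001)   # hypothetical executive endpoint

    def send_goto(sock: socket.socket, x: float, y: float, z: float) -> None:
        """Command the UAV to move from its current position to (x, y, z)."""
        command = {"command": "go-to", "x": x, "y": y, "z": z}
        sock.sendto(json.dumps(command).encode("utf-8"), ENV_ADDRESS)

    def receive_telemetry(sock: socket.socket) -> dict:
        """Block until one telemetry datagram arrives and decode it."""
        data, _ = sock.recvfrom(4096)
        return json.loads(data.decode("utf-8"))

    if __name__ == "__main__":
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(EXEC_ADDRESS)
            send_goto(sock, 2.0, 3.0, 1.0)
            telemetry = receive_telemetry(sock)
            print(telemetry.get("uav_position"), telemetry.get("battery_soc"))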

7. ONGOING AND FUTURE WORK

Current and planned work to extend the planner, executive and evaluation environment includes:

• Extend the planner and executive implementations for multi-satellite systems.

• Extend the warehouse Unity-based environment to support scenarios involving multiple UAVs.

• Introduce a fitness function to evaluate planning-execution performance. Fitness will be based on total battery usage, total distance travelled and total time required (one possible formulation is sketched after this list).

• Port the warehouse Unity-based environment to OpenAI Gym [47]. Several toolkits have recently been introduced targeting the development and benchmarking of Reinforcement Learning (RL) algorithms. Among the most popular we count OpenAI Gym [48] from OpenAI and Project Malmo [49] from Microsoft. Extensions to OpenAI Gym have already been developed that allow simulation of more realistic robotic domains and integration with robotic hardware [50]. Porting the current Unity-based environment to OpenAI Gym will facilitate future development and integration of RL add-ons to the planning and execution engines.

• Extend the planner to include path planning. This would be particularly beneficial to better understand how to combine domain-independent and domain-specific planning, be it path planning for a UAV or trajectory optimization and control for a spacecraft.
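One possible formulation of the fitness function mentioned above, given here only as a sketch of the direction we have in mind (the weights and the normalisation references are assumptions, not design decisions), is a weighted sum of the three normalised quantities:

    F = w_b * (E_used / E_max) + w_d * (D_travelled / D_ref) + w_t * (T_required / T_window), with w_b + w_d + w_t = 1,

where lower values of F indicate better combined planning and execution performance.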

8. SUMMARY

We have introduced a model-based, domain-independent executive for the execution of time-flexible plans. The executive builds on the Tinytus script engine and is designed to run on 16- and 32-bit microcontrollers. We also introduced the Tinytus Executive Language, an extension of APSI's DDL and our first attempt to bring planning and execution modelling closer together.

As a way to evaluate the planner and the executive, we have also presented a new Warehouse domain and a corresponding runtime environment based on the Blocks World toy planning problem. This new environment is built on the Unity game engine, and is used for integrated testing and evaluation of planning and execution engines.

ACKNOWLEDGMENTS

This work has been co-funded by the European Space Agency Networking/Partnering Initiative (NPI) between ESA-ESOC and the Center for Telematics (Zentrum für Telematik e.V.), and by the European Research Council (ERC) Advanced Grant "NETSAT" under Grant Agreement No. 320377.

REFERENCES

[1] A. K. Jonsson, R. A. Morris, and L. Pedersen, "Autonomy in Space Exploration: Current Capabilities and Future Challenges," in IEEE Aerospace Conference, Big Sky, Montana, 2007, pp. 1–12.
[2] N. Muscettola, P. Nayak, B. Pell, and B. C. Williams, "Remote Agent: To Boldly Go Where No AI System Has Gone Before," Artificial Intelligence, vol. 103, no. 1-2, pp. 5–47, 1998.
[3] R. Sherwood, S. Chien, D. Tran, B. Cichy, R. Castano, A. Davies, and G. Rabideau, "Intelligent Systems in Space: The EO-1 Autonomous Sciencecraft," in Infotech@Aerospace. Reston, Virginia: American Institute of Aeronautics and Astronautics, Sep. 2005, pp. 1–11. [Online]. Available: http://arc.aiaa.org/doi/abs/10.2514/6.2005-6917
[4] S. Chien, J. Doubleday, D. R. Thompson, K. L. Wagstaff, J. Bellardo, C. Francis, E. Baumgarten, A. Williams, E. Yee, D. Fluitt, E. Stanton, and J. Puig-Suari, "Onboard Autonomy on the Intelligent Payload EXperiment (IPEX) CubeSat Mission: A Pathfinder for the Proposed HyspIRI Mission Intelligent Payload Module," in International Symposium on Artificial Intelligence, Robotics and Automation for Space, Montreal, Canada, 2014.
[5] R. Sherwood, S. Chien, M. Burl, R. Knight, G. Rabideau, B. Engelhardt, A. Davies, P. Zetocha, R. Wainright, P. Klupar, A. Force, V. Baker, and J. Doan, "The TechSat-21 Autonomous Sciencecraft Constellation Demonstration," in International Symposium on Artificial Intelligence, Robotics and Automation in Space, Montreal, Canada, 2001.
[6] E. Henrikson, "Three Corner Sat: Mission Review and Lessons Learned," in 19th Annual AIAA/USU Conference on Small Satellites. Logan, Utah: Arizona State University, 2005.
[7] D. Bernard, G. Dorais, C. Fry, E. Gamble, B. Kanefsky, J. Kurien, W. Millar, N. Muscettola, P. Nayak, B. Pell, K. Rajan, N. Rouquette, B. Smith, and B. Williams, "Design of the Remote Agent Experiment for Spacecraft Autonomy," in IEEE Aerospace Conference, Snowmass, Colorado, 1998.
[8] D. Bernard, E. Gamble, N. Rouquette, B. Smith, Y. Tung, N. Muscettola, G. Dorias, B. Kanefsky, J. Kurien, W. Millar, P. Nayak, K. Rajan, and W. Taylor, "Remote Agent Experiment - DS1 Technology Validation Report," Jet Propulsion Laboratory and Ames Research Center, Tech. Rep., 2000.
[9] B. D. Smith, M. S. Feather, and N. Muscettola, "Challenges and Methods in Testing the Remote Agent Planner," in International Conference on Artificial Intelligence Planning Systems (AIPS), Breckenridge, Colorado, 2000.
[10] K. Schilling, P. Bangert, S. Busch, S. Dombrovski, A. Freimann, A. Kramer, T. Nogueira, D. Ris, J. Scharnagl, and T. Tzschichholz, "NetSat: A Four Pico/Nano-Satellite Mission for Demonstration of Autonomous Formation Flying," in 66th International Astronautical Congress, Jerusalem, Israel, 2015.
[11] D. Bock, A. Kramer, P. Bangert, K. Schilling, and M. Tajmar, "NanoFEEP on UWE Platform - Formation Flying of CubeSats using Miniaturized Field Emission Electric Propulsion Thrusters," in Joint Conference of 30th International Symposium on Space Technology and Science, 34th International Electric Propulsion Conference and 6th Nano-satellite Symposium, Hyogo-Kobe, Japan, 2015, pp. 4–10.
[12] S. Busch, P. Bangert, S. Dombrovski, and K. Schilling, "UWE-3 In-Orbit Performance and Lessons Learned of a Modular and Flexible Satellite Bus for Future Pico-Satellite Formations," Acta Astronautica, vol. 117, pp. 73–89, 2015.
[13] T. Nogueira, J. Scharnagl, S. Kotsiaros, and K. Schilling, "NetSat-4G: A Four Nano-Satellite Formation for Global Geomagnetic Gradiometry," in 10th IAA Symposium on Small Satellites for Earth Observation, Berlin, Germany, 2015.
[14] T. Nogueira, S. Dombrovski, S. Busch, K. Schilling, K. Zaksek, and M. Hort, "Photogrammetric Ash Cloud Observations by Small Satellite Formations," in Proceedings of the 3rd IEEE Workshop on Metrology for Aerospace (MetroAeroSpace), Florence, Italy, 2016, pp. 450–455.
[15] T. A. Estlin, B. J. Bornstein, D. M. Gaines, R. C. Anderson, D. R. Thompson, M. Burl, R. Castano, and M. Judd, "AEGIS Automated Science Targeting for the MER Opportunity Rover," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 3, no. 3, pp. 1–19, 2012. [Online]. Available: http://doi.acm.org/10.1145/2168752.2168764
[16] R. Francis, T. Estlin, D. Gaines, G. Doran, O. Gasnault, S. Johnstone, S. Montano, V. Mousset, V. Verma, B. Bornstein, and M. Burl, "AEGIS Intelligent Targeting Deployed for the Curiosity Rover's ChemCam Instrument," in 47th Lunar and Planetary Science Conference, The Woodlands, Texas, 2016, p. 2487.
[17] S. Chien, R. Sherwood, D. Tran, B. Cichy, G. Rabideau, R. Castano, S. Frye, B. Trout, D. Boyer, and S. Christa, "The Autonomous Sciencecraft Embedded Systems Architecture," in 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, Hawaii, 2005, pp. 3927–3932.
[18] W. Halle, T. Terzibaschian, and K.-D. Rockwitz, "The DLR-Satellite BIROS for Fire-Detection and Technological Experiments," in 10th IAA Symposium on Small Satellites for Earth Observation, Berlin, Germany, 2015.
[19] S. Fratini and A. Cesta, "The APSI Framework: A Platform for Timeline Synthesis," in Proceedings of the 1st Workshop on Planning and Scheduling with Timelines at ICAPS-12, Atibaia, Sao Paulo, Brazil, 2012.
[20] N. Muscettola, "HSTS: Integrating Planning and Scheduling," in Intelligent Scheduling, M. Zweben and M. S. Fox, Eds. Morgan Kaufmann, 1994.


[21] A. Jonsson, P. Morris, N. Muscettola, K. Rajan, and B. Smith, "Planning in Interplanetary Space: Theory and Practice," in AIPS-00, Proc. of the Fifth Int. Conf. on Artificial Intelligence Planning and Scheduling, 2000, pp. 177–186.
[22] D. Smith, J. Frank, and A. Jonsson, "Bridging the Gap Between Planning and Scheduling," Knowledge Engineering Review, vol. 15, no. 1, pp. 47–83, 2000.
[23] J. Frank and A. Jonsson, "Constraint Based Attribute and Interval Planning," Journal of Constraints, vol. 8(4), pp. 339–364, 2003.
[24] A. Cesta, G. Cortellessa, S. Fratini, and A. Oddi, "MR-SPOCK: Steps in Developing an End-to-End Space Application," Computational Intelligence, vol. 27, no. 1, 2011.
[25] S. Chien, D. Tran, G. Rabideau, S. Schaffer, D. Mandl, and S. Frye, "Timeline-Based Space Operations Scheduling with External Constraints," in ICAPS-10, Proc. of the 20th International Conference on Automated Planning and Scheduling, 2010.
[26] EUROPA, "Europa Software Distribution Web Site," 2008, https://babelfish.arc.nasa.gov/trac/europa/.
[27] S. Chien, G. Rabideau, R. Knight, R. Sherwood, B. Engelhardt, D. Mutz, T. Estlin, B. Smith, F. Fisher, T. Barrett, G. Stebbins, and D. Tran, "ASPEN - Automated Planning and Scheduling for Space Mission Operations," in SpaceOps, 2000.
[28] A. Cesta, G. Cortellessa, S. Fratini, A. Oddi, and G. Bernardi, "Deploying Interactive Mission Planning Tools - Experiences and Lessons Learned," JACIII, vol. 15, no. 8, pp. 1149–1158, 2011. [Online]. Available: http://dblp.uni-trier.de/db/journals/jaciii/jaciii15.html#CestaCFOB11
[29] A. Ceballos, S. Bensalem, A. Cesta, L. de Silva, S. Fratini, F. Ingrand, J. Ocon, A. Orlandini, F. Py, K. Rajan, R. Rasconi, and M. van Winnendael, "A Goal-Oriented Autonomous Controller for Space Exploration," in Proceedings of ASTRA 2011, 11th Symposium on Advanced Space Technologies in Robotics and Automation, 2011.
[30] S. Fratini, S. Martin, N. Policella, and A. Donati, "Planning-Based Controllers for Increased Levels of Autonomous Operations," in ASTRA 2013, 12th Symposium on Advanced Space Technologies in Robotics and Automation, 2013.
[31] R. Alur and D. L. Dill, "A Theory of Timed Automata," Theoretical Computer Science, vol. 126, no. 2, pp. 183–235, 1994.
[32] J. Allen, "Maintaining Knowledge about Temporal Intervals," Communications of the ACM, vol. 26, no. 11, pp. 832–843, 1983.
[33] D. S. Weld, "An Introduction to Least Commitment Planning," AI Magazine, vol. 15, no. 4, pp. 27–61, 1994.
[34] N. Policella, A. Cesta, A. Oddi, and S. F. Smith, "From Precedence Constraint Posting to Partial Order Schedules," AI Communications, vol. 20, no. 3, pp. 163–180, 2007.
[35] I. Tsamardinos, N. Muscettola, and P. Morris, "Fast Transformation of Temporal Plans for Efficient Execution," in Proceedings of the 15th National Conference on Artificial Intelligence (AAAI'98), pp. 254–261, 1998. [Online]. Available: http://www.aaai.org/Papers/AAAI/1998/AAAI98-035.pdf
[36] P. Ramadge and W. Wonham, "The Control of Discrete Event Systems," Proceedings of the IEEE, vol. 77, no. 1, pp. 81–98, 1989.
[37] R. Dechter, "Temporal Constraint Networks," Artificial Intelligence, vol. 49, no. 1-3, pp. 61–95, 1991.
[38] N. Muscettola, P. H. Morris, and I. Tsamardinos, "Reformulating Temporal Plans for Efficient Execution," in 6th International Conference on Principles of Knowledge Representation and Reasoning (KR 98), pp. 444–452, 1998.
[39] M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory & Practice. Elsevier, 2004.
[40] S. Dombrovski and P. Bangert, "Introduction of a New Sandbox Interpreter Approach for Advanced Satellite Operations and Safe On-board Code Execution," in 66th International Astronautical Congress, Jerusalem, Israel, 2015.
[41] G. Laporte, "The Vehicle Routing Problem: An Overview of Exact and Approximate Algorithms," European Journal of Operational Research, vol. 59, no. 2, pp. 231–247, 1992.
[42] I. Rivin, I. Vardi, and P. Zimmerman, "The n-Queens Problem," The American Mathematical Monthly, vol. 2, no. 3, pp. 9–13, 1994. [Online]. Available: http://www.jstor.org/stable/2974691
[43] S. J. Russell, P. Norvig, J. F. Canny, J. M. Malik, and D. D. Edwards, Artificial Intelligence: A Modern Approach. Upper Saddle River: Prentice Hall, 2003.
[44] "CoDMAP - Competition of Distributed and Multiagent Planners." [Online]. Available: http://agents.fel.cvut.cz/codmap/
[45] P. A. Hamilton, "A Composite Architecture for a Realistic Blocks World Domain," University of Maryland, Baltimore, Maryland, Tech. Rep., 2009.
[46] "Unity." [Online]. Available: https://unity3d.com
[47] "OpenAI Gym." [Online]. Available: https://gym.openai.com
[48] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint, 2016. [Online]. Available: http://arxiv.org/abs/1606.01540
[49] M. Johnson, K. Hofmann, T. Hutton, and D. Bignell, "The Malmo Platform for Artificial Intelligence Experimentation," in International Joint Conference on Artificial Intelligence (IJCAI), 2016.
[50] I. Zamora, N. Gonzalez Lopez, V. M. Vilches, and A. Hernandez Cordero, "Extending the OpenAI Gym for Robotics: a Toolkit for Reinforcement Learning using ROS and Gazebo," arXiv preprint, 2016.


BIOGRAPHY

Tiago Nogueira received his degree in aerospace engineering from the Instituto Superior Tecnico in Lisbon in 2005, and pursued further studies in astrodynamics and satellite systems at TU Delft before joining ESA-ESOC in 2007. Since 2014 he has been a research assistant at the Center for Telematics in Würzburg, Germany, and is also pursuing a PhD at the University of Würzburg. His research interests include satellite operations, satellite formation flying, satellite onboard autonomy and automated planning and execution.

Simone Fratini is a Research Engineer in the Advanced Mission Concepts Team at the European Space Agency in ESOC. Simone received a M.Sc. (2002) and a PhD (2006) in Computer Science Engineering at the University of Rome La Sapienza. Simone's research activities apply to the use of Artificial Intelligence for modelling and solving planning and scheduling problems for controlling complex physical systems in various domains: spatial, robotics, industrial.

Klaus Schilling is currently the president of the Center for Telematics and professor and chair for robotics and telematics at the University of Würzburg. His research interests include autonomous and adaptive control strategies, telematics methods, sensorics, mechatronic systems, and control of distributed systems. These techniques are applied in the design and tele-operations of pico-satellites, industrial mobile robots, sensor systems, tele-education and medical systems.
