
A probabilistic control architecture for robust autonomy of an anthropomorphic service robot



Sven R. Schmidt-Rohr, Steffen Knoop, Martin Lösch, Rüdiger Dillmann

Institute of Computer Science and Engineering (CSE), University of Karlsruhe, Germany {srsr|knoop|loesch|dillmann}@ira.uka.de

Abstract— In this paper, we present a probabilistic control architecture which has been built around the concept of probabilistic decision making. By utilizing partially observable Markov decision processes (POMDPs) on an abstract level, the system is able to deal with imperfect multi-modal perception and stochastic environment dynamics in real-world settings. By compiling POMDP models from structured, symbolic background knowledge, handling of distinct superimposing stochastic properties of different modalities becomes feasible. The system has been implemented on a highly multi-modal, domestic service robot companion, and autonomous behavior has been evaluated in real-world scenarios against a baseline state machine controller.

I. INTRODUCTION

A recent trend in robotics is to build domestic robot companions which operate autonomously in complex indoor environments and interact or cooperate with humans in a natural way. This leads to the requirement of having a control system which enables autonomous behavior of the robot companion.

Real-world environments have properties which complicate autonomous decision making: because of sensor limitations, the environment is only partially observable; environment dynamics are not fully deterministic (e.g. human behavior); the environment is dynamic; and the course of events is sequential rather than episodic.

One probabilistic methodology enabling autonomous decision making in mission scenarios within such environments is the partially observable Markov decision process (POMDP). This paper presents a control architecture and an actual algorithmic system which utilizes POMDP models and algorithms for mission-level decision making in real, typical and highly multi-modal scenarios of a domestic robot companion.

Compared to classical control architectures, the presented work stands out by using the powerful POMDP methodology and by actually being deployed on a physical, multi-modal robot. Compared to related POMDP applications, it is distinct in utilizing POMDPs on a high, abstract level for all modalities.

Additionally, a methodology has been developed to compile POMDP models from individual, semantically meaningful building blocks, each describing a certain aspect of the scenario and environment dynamics.

This paper first describes related work in robot control architectures, POMDP foundations and applications. Subsequently, the system architecture, algorithmic procedures and the data flow are presented. Knowledge structures and the compilation of POMDP models are then discussed, followed by an analysis of experiments on the physical robot and the conclusions drawn.

II. STATE OF THE ART

Existing research concerning robot control architectures for multi-modal autonomous robots has centered mostly around hierarchical approaches. While for simple mobile robots reactive systems have been investigated successfully, more complex controllers for multi-modal robots often employ strongly hierarchical three-layer architectures [1]. In those architectures, low-level sensor processing and reactive actuator control take place on the first level, while the second level supervises procedural command sequences or hierarchical task network processing, and the third level creates sequences by classical planning methods. While these methods have proven suitable for simplified environment domains, they fail in partially observable and stochastic environments.

The shortcomings of such robot control systems could be solved by probabilistic techniques, lately popular in robotics because they are able to deal with uncertainty in observation and environment dynamics. Probabilistic decision theory deals with the reasoning of rational agents in the presence of uncertainty. A very promising framework within general probabilistic decision theory is the partially observable Markov decision process (POMDP) [2], especially the class of discrete, model-based POMDPs.

A POMDP is an abstract environment model for reasoning under uncertainty [3], [4]. A POMDP models a flow of events in discrete states and discrete time. A specific POMDP model is represented by the 8-tuple (S, A, M, T, R, O, γ, b0). S is a finite set of states, A is a discrete set of actions and M is a discrete set of measurements. The transition model T(s′, a, s) describes the probability of a transition from state s to s′ when the agent has performed action a. The observation model O(m, s) describes the probability of a measurement m when the intrinsic state is s. The reward model R(s, a) defines the numeric reward given to the agent when being in state s and executing action a. The parameter γ is the time discount factor for possible future events. The initial belief state is denoted by b0. As POMDPs handle partially observable environments, there exists only an indirect representation of the intrinsic state of the world. In POMDPs, the belief state, a discrete probability distribution over all states in a scenario model, forms this representation. At each time step, the belief state is updated by Bayesian forward-filtering.
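To make the update concrete, here is a minimal sketch in Python, assuming the transition model is stored as an array T[a, s, s'] = P(s' | s, a) and the observation model as O[m, s] = P(m | s); the array layout and names are our assumptions, not the paper's implementation:

```python
import numpy as np

def belief_update(b, a, m, T, O):
    """One step of Bayesian forward-filtering for a discrete POMDP.

    b: belief over states, shape (|S|,); a: executed action index;
    m: observed measurement index; T: shape (|A|, |S|, |S|) with
    T[a, s, s2] = P(s2 | s, a); O: shape (|M|, |S|) with O[m, s] = P(m | s).
    """
    predicted = T[a].T @ b                    # sum_s T(s', a, s) b(s)
    unnormalized = O[m] * predicted           # weight by P(m | s')
    return unnormalized / unnormalized.sum()  # normalization constant alpha
```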

A decision about which action is most favorable for the agent to execute next can be retrieved from a policy function, which contains information about the most favorable action for any possible belief distribution. The policy balances the probabilities of the future course of events against the accumulated reward, which has to be maximized.

Computing a policy is computationally challenging, and computing exact, optimal policies is intractable [5]. Recent approximate solutions such as point-based value iteration (PBVI) [6], discrete PERSEUS [7] or HSVI2 [8], however, are quite fast and yield good results for most mid-size scenarios.

While POMDP policy computation has made a lot of progress recently, application and usage are still in their infancy. POMDP decision making has already been applied to several different modalities in robotics, like autonomous navigation [9], dialog management [10] and grasping [11], however only for low-level control of one modality at a time.

What should be investigated now is building a complete robot control architecture around high-level POMDP decision making. Key topics which have to be tackled to solve this challenge are (i) abstraction and fusion of multi-modal perception to be usable by POMDP decision making, (ii) the layout of an autonomous runtime system, and (iii) a flexible system to derive POMDP mission models, including all modalities. In the following, our system, which deals with these challenges, is described.

III. SYSTEM ARCHITECTURE

The layout of the main system components follows two main axes: the rational agent cycle of perception – decision making – acting on the one hand, and knowledge processing on the other. Fig. 1 shows the agent cycle on the horizontal axis and knowledge processing on the vertical axis.

Low-level perceptive components process sensor readings and are the basis for situation assessment, which is an important input to probabilistic decision making. These low-level components already perform complex algorithmic procedures to present relevant environment properties in a more abstract way. In the system, all components deliver information about the uncertainty of their observations, to allow a more specific assessment of risks by decision making. Components therefore deliver probability distributions which are, depending on the underlying property, either continuous parametric or discrete non-parametric. Thus, the input to the decision system is a set P of parametric and non-parametric probability distributions which represent different environment properties:

$$\text{percept state} = \begin{pmatrix} p(c_1) : \text{non-parametric} \\ \vdots \\ p(c_n) : \text{continuous parametric} \end{pmatrix} \quad (1)$$

The perceptive components actually used on our robot are speech recognition, human body activity recognition and self-localization of the robot.

Speech recognition is realized using an onboard microphone and the Sphinx-4 [12] speech recognition engine, modified to deliver discrete probability distributions over a set of human utterances:

$$\text{percept human speech} = \{p(\text{utter}_1), p(\text{utter}_2), \ldots, p(\text{utter}_n)\}, \quad \sum_i^n p(\text{utter}_i) = 1,$$

e.g. p("Hello robot") = 0.2, p("Hold position") = 0.8.

Human body activity recognition is realized using a 3D time-of-flight camera, supported by color cameras, which deliver sensory input to the human body tracking method VooDoo, employing a cylinder model of the human body posture [13]. Classification methods, based on relevance criteria and support vector machine selection of body posture over time, label symbolic human activities with certain probabilities [14]. These probabilities do not sum up to one, as some activities may occur simultaneously, e.g. sitting and waving. Thus,

$$\text{percept human activity} = \{p(\text{act}_1), p(\text{act}_2), \ldots, p(\text{act}_n)\}, \quad \sum_i^n p(\text{act}_i) \geq 0,$$

e.g. p("Waving") = 0.7, p("Standing") = 0.8.

Mobility and navigation are realized using a mobile platform with laser-based self-localization on a known map. The platform can drive on preexisting topological graphs or plan dynamic graphs during runtime. Self-localization of the platform includes a Bayesian update and results in a trivariate normal distribution indicating current position and uncertainty: mean $\vec{\mu} = (x, y, \theta)$ and covariance $\Sigma$.

These sensor observations have to be processed into a common representation and finally fused into a single, discrete situation description for decision making based on the policy of a discrete POMDP. This task is performed by the feature filter (see Fig. 1). For components where no Bayesian update has been performed, it is done in the feature filter, based on a prediction model which is derived from the transition model of the POMDP. For speech recognition, for example, the perceptive information, including its uncertainty, can be improved by including prediction: S is a set of abstract states in which the dialog can be, reflecting human intention, U is a set of possible utterances of the robot, and M is a set of possible human utterances the robot can detect. T(s′, u, s) represents the prediction model for each robot utterance, while O(m, s) is the observation model mapping human utterances to states.

$$p'(s') = \alpha \left( \sum_m P(m)\, O(m, s') \right) \left( \sum_s T(s', u, s)\, p(s) \right) \quad (2)$$

Here P(m) is the probability for a specific utterance as delivered by the speech recognition module, and p(s) is the dialog state distribution.
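Equation (2) differs from the textbook update above in that the measurement enters as a whole distribution P(m) rather than a single symbol, i.e. as soft evidence. A sketch of this feature-filter step, under the same assumed array layout as before:

```python
import numpy as np

def dialog_filter_update(p, u, P_m, T, O):
    """Prediction plus soft-evidence correction as in Eq. (2).

    p: dialog state distribution, shape (|S|,); u: index of the robot
    utterance just executed; P_m: distribution over detected human
    utterances, shape (|M|,); T: shape (|U|, |S|, |S|); O: shape (|M|, |S|).
    """
    predicted = T[u].T @ p      # sum_s T(s', u, s) p(s)
    evidence = P_m @ O          # sum_m P(m) O(m, s'), shape (|S|,)
    p_new = evidence * predicted
    return p_new / p_new.sum()  # alpha
```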

Other components, e.g. self-localization, deliver observations that already include modality-specific Bayesian updates but need to be discretized. For self-localization this can be done, e.g., by a grid or region based approach, computing the cumulative distribution function (CDF) of the robot's position distribution for regions defined in the knowledge base with a numerical algorithm [15]:

$$p(r_i) = \int_{x_{r,1}}^{x_{r,2}} \int_{y_{r,1}}^{y_{r,2}} N(x, y;\, \vec{\mu}_{pos}, \Sigma_{pos}) \; dx\, dy \quad (3)$$
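Equation (3) can be evaluated with any bivariate-normal CDF routine; the sketch below uses SciPy instead of the algorithm of [15], expressing the rectangle probability by inclusion-exclusion on the CDF (names and signature are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def region_probability(mu, cov, x1, y1, x2, y2):
    """P(robot position in the axis-aligned rectangle [x1,x2] x [y1,y2]).

    mu: pose mean (x, y, theta); cov: 3x3 pose covariance. Only the
    (x, y) marginal of the trivariate normal is needed.
    """
    mvn = multivariate_normal(mean=mu[:2], cov=cov[:2, :2])
    # Rectangle probability via inclusion-exclusion on the bivariate CDF.
    return (mvn.cdf(np.array([x2, y2])) - mvn.cdf(np.array([x1, y2]))
            - mvn.cdf(np.array([x2, y1])) + mvn.cdf(np.array([x1, y1])))
```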

Fig. 1. The system architecture showing perception, decision and actuation from left to right and knowledge synthesis from top.

These processes finally lead to a set of discrete, Bayesian-filtered observation distributions (features), one for each modality, which represents the belief state of a discrete, factored POMDP:

$$\text{feature state} = \begin{pmatrix} p(c_{1,1}) & \ldots & p(c_{1,n_1}) \\ \vdots & & \vdots \\ p(c_{m,1}) & \ldots & p(c_{m,n_m}) \end{pmatrix} \quad (4)$$

As a Bayesian update has been performed on each modality, this is a valid belief representation, yet it includes specific information about the uncertainty of the current sensor measurement, which acts as a dynamic observation model. The factored feature state is finally expanded into a flat belief, matching the flat POMDP model:

$$\text{belief} = p(c_1) \times \ldots \times p(c_n) \quad (5)$$
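Since the modalities are independent in this representation, the expansion in Eq. (5) is simply an outer product of the per-modality distributions; a one-line sketch (the flat state ordering induced by np.kron is our assumption):

```python
import numpy as np
from functools import reduce

def expand_belief(modalities):
    """Flatten factored per-modality distributions into one flat belief.

    modalities: list of 1-D numpy arrays, one distribution per modality;
    the result has prod(len(m)) entries and still sums to 1.
    """
    return reduce(np.kron, modalities)

# e.g. expand_belief([p_location, p_dialog, p_activity])
```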

Depending on the POMDP model, some states may be redundant; these are merged into a single state to minimize the size of the state space.

The belief state is used by the decision core to query a POMDP policy for the currently optimal decision and perform a symbolic action: a = policy(belief). The policy is a piecewise linear and convex (PWLC) maximum of a set of |b|-dimensional linear functions, as calculated by an approximately optimal value iteration algorithm, e.g. PBVI [6] or an FRTDP/HSVI [8] variant. Each policy describes approximately optimal decision making in a predefined mission scenario and can be calculated online from POMDP mission models, or offline and stored in a policy knowledge database. The decision query is performed as often as the fastest low-level perception component delivers new measurements, which in our case is self-localization, leading to 20 Hz.
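Querying a PWLC policy amounts to maximizing over the stored linear functions (alpha-vectors), each carrying an associated action; this representation is standard for point-based solvers, though the paper does not spell out its storage format, so the sketch below is illustrative:

```python
import numpy as np

def query_policy(belief, alpha_vectors, vector_actions):
    """Return the action of the alpha-vector maximizing <alpha, belief>.

    alpha_vectors: shape (num_vectors, |S|); vector_actions: the action
    associated with each vector, length num_vectors.
    """
    return vector_actions[int(np.argmax(alpha_vectors @ belief))]
```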

The reasoning system decides on the global task to be carried out, while a sequencer performs the actual subtasks as Flexible Programs that reach the associated goal. Flexible Programs translate the symbolic action commands given by the decision core into actuator commands. The task is described as a hierarchical network of basic actions which is processed with a depth-first, left-to-right search strategy. A detailed description of Flexible Programs can be found in [16]. The task set in the task database for the presented experiments comprises the tasks DriveToPos, GraspObject, SpeakText, MonitorHumanActivity and PlaceObject. By decoupling the atomic sensor and actuator control from abstract reasoning, it is possible to reduce the POMDP decision state space to a computationally reasonable dimensionality.
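The depth-first, left-to-right processing of such a task network can be pictured as follows; this is a toy sketch, as Flexible Programs in [16] additionally handle conditions, parameters and dynamic expansion:

```python
def execute(node):
    """Depth-first, left-to-right execution of a hierarchical task network.

    node: ('action', name) for a basic action, or ('task', [children])
    for a composite node.
    """
    kind, payload = node
    if kind == 'action':
        print(f'executing basic action: {payload}')  # actuator command here
    else:
        for child in payload:  # left-to-right over the children
            execute(child)     # depth-first recursion

# A simplified DriveToPos task (structure is illustrative only):
execute(('task', [('action', 'plan_path'),
                  ('task', [('action', 'follow_path'),
                            ('action', 'dock')])]))
```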

As a standard POMDP model expects a symbolic action to terminate before a new one is executed, a new Flexible Program is only started when the previous one has terminated. The system may also run in a mode which allows Flexible Programs to be interrupted if the decision changes; however, it must then be guaranteed that the POMDP model from which the policy has been calculated accounts for the resulting, different dynamics in the transition and observation model.

IV. MODEL COMPILATION

In addition to the runtime part of the system, an important aspect is knowledge processing to create and manage the POMDP models the runtime process is based on (see Fig. 1, upper part). Apart from some modality-specific knowledge included in processes taking place in low-level components, all three parts of the decision system, namely the feature filter, the decision core and the Flexible Programs, access background knowledge, making it an integral part of the control architecture.

The most important and complex part is formed by the POMDP scenario models, which are made up of the set of states S, measurements M and actions A, as well as a stochastic transition model T(s′, a, s), an observation model O(m, s) describing average measurement uncertainty, and a reward model R(s, a) for mission objectives and constraints. In general, those models, containing matrices, grow quickly in size, e.g. |T| = |S|² · |A|, leading to a huge number of probabilities even for smaller state spaces. Additionally, in the multi-modal scenarios prevalent in the presented system, superimposing effects from different modalities are common throughout the models. For example, if the robot were in a state InRegionX-HumanFacingRobot-HumanInitiatedDialog, then an action GotoRegionY would include likelihoods in the stochastic transition model for reaching the target location or getting stuck, but also likelihoods that the human-robot interaction is interrupted by the robot movement, e.g. depending on how far away the target is located and whether the robot turned away.

These insights, the huge number of probabilities and the superimposing effects, lead to the necessity of a methodology to compile expanded POMDP models, usable for value iteration, from more fundamental, simple and flexible building blocks. The approach used in the presented system is a two-tiered compilation process which creates specific POMDP mission scenario models from abstract, symbolic and reusable entities representing environment characteristics in an ontology. Those entities are more suitable as scenario building blocks, both for human knowledge engineering and for symbolic machine learning processes.

The lower tier of the process compiles model matrices from compact, parameterized functions. Those functions perform arithmetic operations on tables which are precursors to the final model matrices in O, R or T. A table B has distinct entries for each modality, thus being a factored representation, yet for an expanded state space. The corresponding model D is created at the end of the process by forming the product over all modalities and scaling to 1.0 afterwards:

$$B : \{1, \ldots, k\} \times \{1, \ldots, m\} \times \{1, \ldots, n\} \mapsto \mathbb{R} \quad (6)$$

$$D : \{1, \ldots, k\} \times \{1, \ldots, m\} \mapsto \mathbb{R}, \qquad D(i, j) = \prod_{l=1}^{n} B(i, j, l) \quad (7)$$

With k, m being the row and column numbers of the corresponding model and n being the number of modalities. A function may modify a single entry, row, column or matrix of a table and is addressed accordingly: function(table, k, m, n), with wildcards where necessary. The application of such functions shall be illustrated with an example of those actually used in our system. The mobile platform with topological navigation on a graph is more likely not to reach the goal when driving long sections and over many nodes. It is more likely to get stuck at the origin or close to the goal than in between; most of the time, however, it reaches the goal successfully. Applied to a POMDP transition model T, this means that for all actions including a Goto command a_g, the transition probabilities between states which represent different locations have to include the aforementioned characteristics. This can be achieved by two functions. One realizes getting stuck at the origin:

$$b(i, j, loc) = \begin{cases} p(\text{stuck}), & \text{if } s_i = s'_j \\ 0, & \text{otherwise} \end{cases}$$

and one calculates the likelihood of ending up at the goal, p(g), and, less likely and depending on the distance d, somewhere close before it, with scaling φ(·):

$$b(i, j, loc) = \begin{cases} \max\left(0,\, p(g) - \phi(d)\right), & \text{if } s'_j \text{ on shortest path to goal from } s_i \\ 0, & \text{otherwise} \end{cases}$$

Thus, these functions calculate probabilities for many transitions in the model. Many other effects can be used, as many arithmetic functions to perform calculations on tables exist in the system, e.g. applying Gaussian functions or interpolations. While these functions are powerful tools to compile models with thousands of probabilities, they lack simplicity and a clear semantic meaning as a representation of environment characteristics.
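To make the first tier concrete, here is a minimal Python sketch of how such a function might fill the Goto entries of a table and how Eq. (7) then collapses the factored table into the final model; the function names, the linear φ and the array layout are our assumptions, not the paper's actual implementation:

```python
import numpy as np

def apply_goto(B_a, origin, path, p_stuck, p_goal, phi, modality):
    """First-tier function sketch: fill Goto transition entries for one
    modality slice of table B_a (shape (|S|, |S|, n_modalities)).

    path: state indices on the shortest path to the goal, goal last,
    origin excluded; phi: assumed per-step probability decay. Final
    scaling to 1.0 is left to the compilation step, as in Eq. (7).
    """
    B_a[origin, origin, modality] = p_stuck  # getting stuck at the origin
    for steps_short, s in enumerate(reversed(path)):
        # the goal gets p(g); states short of it get progressively less
        B_a[origin, s, modality] = max(0.0, p_goal - phi * steps_short)

def compile_model(B):
    """Eq. (7): product over the modality axis, then scaling rows to 1.0."""
    D = np.prod(B, axis=-1)
    return D / D.sum(axis=1, keepdims=True)

# e.g. four location states, one modality, driving from state 0 to 3:
B_goto = np.ones((4, 4, 1)) * 1e-6  # small probability floor, our choice
apply_goto(B_goto, origin=0, path=[1, 2, 3],
           p_stuck=0.1, p_goal=0.8, phi=0.3, modality=0)
T_goto = compile_model(B_goto)      # rows now sum to 1.0
```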

A knowledge base of clear building blocks to assemble specific scenario models, including different kinds of stochastic human behavior, is therefore desirable. This is realized by the second tier. Basically, these building blocks organize first-tier functions in an entity-relationship structure composed of classes C and instances I. Classes are organized in the knowledge base as a hierarchy, as they can inherit properties from their parent P. Properties of classes contain first-tier functions and further parameters, while instances usually contain just parameters. A second-tier program, from which the POMDP model is compiled, consists of several program directives, each of which makes use of a set of instances derived indirectly from different types of base classes. A typical set of used base classes is {action, target, robot, operator, environment, further parameter}. The specific action instance creates one POMDP action, or a part of one, e.g. drive or speak, and basic transitions, while target assigns those transitions to a specific state set. Robot, e.g. "cautious", and human operator, e.g. "familiar", describe more general scenario settings and contain corresponding reward and transition modifying functions. While a flexible conditional application of instances is also possible on this tier, more detailed descriptions are beyond the scope of this paper; the basic concept shall be explained with an actual example used in one of the experiments:

(”Interact0”,”LabRegC”,””,”UnknownOperator0”,””,””)

The instance of the class Interact0 loads typical human-robot interaction initiation transition probabilities and creates corresponding robot actions, while the location parameter restricts transitions to states where the location modality is LabRegC. The instance derived from operator modifies the basic interaction probabilities to take into account that the human is a new interaction peer.
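One way to picture such a directive is as a six-field record whose named instances each contribute their first-tier functions to the model under compilation; the sketch below is our reading of the mechanism, with invented names, not the paper's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class Directive:
    """One second-tier program directive (field names are illustrative)."""
    action: str       # e.g. "Interact0"
    target: str       # e.g. "LabRegC"; restricts the affected states
    robot: str        # e.g. "RobotCautious0"
    operator: str     # e.g. "UnknownOperator0"
    environment: str  # e.g. "Environment0"
    parameter: str    # e.g. "Cup0"

def apply_directive(d, knowledge_base, model):
    # Each non-empty field names an instance in the knowledge base; the
    # instance runs its first-tier table functions against the model.
    for name in (d.action, d.target, d.robot, d.operator,
                 d.environment, d.parameter):
        if name:
            knowledge_base[name].apply(model)  # hypothetical interface
```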

These semantically meaningful building blocks assemble first-tier arithmetic functions, which in turn calculate certain related parts of the transition, reward and observation model. Addressed entries in the model matrices may not be situated close to each other in the matrix, but they are related conceptually. Therefore, the presented concept makes it possible to operate on these huge, unstructured model matrices in a highly structured and organized way.

Fig. 2. Top: Robot and interacting human during different stages of the autonomous cup-serving experiment, in this case steered by POMDP decision making. A live visualization of the belief state and a fraction of the policy, as delivered over wireless network, can be seen projected in the background. Bottom: The Flexible Program for evaluation, expanded after one cup-serving task.

V. EXPERIMENTS AND RESULTS

As the system is intended for autonomous control of a multi-modal domestic robot companion, evaluation of the system has been performed in such a context. First, a typical scenario was picked for evaluation, with common aspects of domestic service robot scenarios: verbal and non-verbal human-robot interaction, navigation and object delivery. Although still somewhat limited in size, the scenario was designed to be mostly natural: the robot waits for potentially interested persons, engages in interaction while keeping awareness of body postures, and offers its services to persons who are considered interested. When requested, it fetches a cup from a fixed position, then addresses the human again to ask to which of two destinations it shall be delivered, expecting an instruction with both gesture and speech. If the robot is unsure about the perception, it may ask again; if it is sure enough, it delivers the cup and returns to a position, waiting for new "clients". Only onboard sensors are used: speech recognition uses an onboard microphone, activity recognition the onboard time-of-flight camera and navigation the onboard laser scanner. Thus, measurement uncertainties are high and POMDP reasoning makes perfect sense, modeling human behavior and navigation glitches as stochastic.

Grasping was handled within the sequential programs in this case; thus the modality domains utilized were navigation, dialog and human body activity. For navigation, 8 nodes were chosen as relevant states from the navigation graph; for dialog, 5 states were used, as for body activity. As not all interaction stages are performed in most nodes, there are many redundant product states, which were combined to speed up policy computation. In the end, this leads to 28 unique states and 11 unique actions in the POMDP.

The following second-tier rules with corresponding instances were used to assemble the scenario ("Lab*" are names of navigation nodes/states; the fields are {action, target, robot, operator, environment, further parameter}):

("Idle0","","","","","")
("Explore0","","RobotCautious0","","Environment0","","")
("Interact0","LabRegC","","UnknownOperator0","","")
("Interact1","LabRegD","","KnownOperator0","","")
("ConditionImportant","LabRegC/Say Bring cup/FaceRobot")
("PickUp0","LabRegD","","","","Cup0")
("ConditionImportant","LabRegD/Say To RegE/PointFront")
("PutDown0","LabRegE","","","","Cup0")
("ConditionImportant","LabRegD/Say To RegF/PointBack")
("PutDown0","LabRegF","","","","Cup0")

The instances derived from the Condition class are the main tasks in the scenario, setting positive rewards. While there is still quite some knowledge about probabilities, especially concerning interaction, stored in the knowledge base and merely used by these rules, it is quite easy to modify important characteristics of the scenario, which would not be possible on the POMDP model directly. In this case, the model has a reward model with 308 entries, an observation model with 784 probabilities and a transition model with 8624 probabilities.
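These counts are consistent with the model dimensions given above: |R| = |S| · |A| = 28 · 11 = 308 entries, |T| = |S|² · |A| = 28² · 11 = 8624 probabilities, and |O| = |S| · |M| = 784 probabilities, implying |M| = 28 measurement symbols.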

For evaluation purposes, the system was evaluated against a baseline state-machine approach. When used standalone in enhanced mode, Flexible Programs are capable of dynamic branching and recursive expansion and are thus able to process a full state automaton. The low-level observations were in this case processed with a fixed threshold, assuming the observation with the highest probability to be the correct one. The Flexible Program was designed to perform as well as possible while keeping to the POMDP rewards. Fig. 2 shows the evaluation Flexible Program expanded after one straight cup-serving task execution, to give an impression of its complexity.

Both approaches were evaluated on the physical robot (see Fig. 2), controlling it completely autonomously and relying solely on onboard sensors. Each time, the robot was given exactly half an hour to perform waiter duties while the interacting human behaved stochastically, but on average according to the assumed behavior model. A human supervisor recorded true states and the actual behavior of the robot.

The following tables show the most important aspects of the scenario, where the action requested by the user is compared to the actual robot behavior. Thus, entries on the main diagonal indicate desired behavior, while the first column indicates reassurance questions and actions of the robot.

a) State machine:

Req. \ Perf.   Reassure   Fetch cup   Put to A   Put to B
Other              0          1           0          1
Fetch cup          3          4           0          0
Put to A           5          0           2          0
Put to B           7          0           0          2

b) POMDP:

Req. \ Perf.   Reassure   Fetch cup   Put to A   Put to B
Other              0          0           0          1
Fetch cup          1          9           0          0
Put to A           3          0           5          1
Put to B           1          0           0          2

The results of two randomly picked runs of each control system make it obvious that, while the state machine does not make many big mistakes, it is conservative and annoys the human with a lot of reassurance questions, including during the initial interaction not shown in the tables.

When controlled by the POMDP approach, the robot can deliver more cups (7 correct deliveries, compared to 4) because it acts more quickly: it has a better risk assessment of when uncertain multi-modal observations are good enough to act on, the policy showing which risks are reasonable to take. This is an important aspect of POMDP decision making, and the results indicate that POMDP decision making for a domestic service robot is promising, even if it cannot avoid all errors resulting from imperfect real-world robot perception.

VI. CONCLUSION AND OUTLOOK

This paper has presented a probabilistic control system for autonomous, domestic service robots which enables robust behavior under the uncertainty present in realistic environments. It consists of a runtime system, which filters observations while preserving uncertainty to make decisions based on POMDPs, and a system to compile complex, numeric POMDP decision models from semantically meaningful building blocks. The system has been evaluated on a physical robot, acting completely autonomously in a realistic mission scenario, against a baseline state machine controlling the same robot in the same context. While not able to compensate for all errors introduced by imperfect sensing, the POMDP-based method has proven to be more efficient than the baseline method because of its inherent risk assessment capabilities in the face of imperfect knowledge about the environment. Creating a POMDP model for the examined scenario is only feasible with a compilation process as presented.

Further work shall build a larger knowledge base, making it possible to evaluate even more complex POMDP scenarios, and introduce scenario model learning methods using the existing knowledge structures as guidance. Additionally, POMDP-based object manipulation beyond simple grasping as in [11] should be investigated.

VII. ACKNOWLEDGEMENTS

This work was partially conducted within the EU Project COGNIRON under Contract FP6-IST-FET-002020.

REFERENCES

[1] E. Gat, "On three-layer architectures," in Artificial Intelligence and Mobile Robots. MIT/AAAI Press, 1997.

[2] K. J. Åström, "Optimal control of Markov decision processes with incomplete state estimation," Journal of Mathematical Analysis and Applications, vol. 10, 1965.

[3] E. J. Sondik, "The optimal control of partially observable Markov decision processes," Ph.D. dissertation, Stanford University, 1971.

[4] A. R. Cassandra, L. P. Kaelbling, and M. L. Littman, "Acting optimally in partially observable stochastic domains," in Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994.

[5] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains," Artificial Intelligence, vol. 101, no. 1-2, pp. 99–134, 1998.

[6] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), August 2003, pp. 1025–1032.

[7] M. Spaan and N. Vlassis, "Perseus: Randomized point-based value iteration for POMDPs," Journal of Artificial Intelligence Research, vol. 24, pp. 195–220, 2005.

[8] T. Smith and R. Simmons, "Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic," in Nat. Conf. on Artificial Intelligence (AAAI), 2006.

[9] A. Foka and P. Trahanias, "Real-time hierarchical POMDPs for autonomous robot navigation," Robotics and Autonomous Systems, vol. 55, no. 7, pp. 561–571, 2007.

[10] J. D. Williams, P. Poupart, and S. Young, "Using factored partially observable Markov decision processes with continuous observations for dialogue management," Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR.520, March 2005.

[11] K. Hsiao, L. P. Kaelbling, and T. Lozano-Pérez, "Grasping POMDPs," in ICRA, 2007, pp. 4685–4692.

[12] W. Walker, P. Lamere, P. Kwok, B. Raj, R. Singh, E. Gouvea, P. Wolf, and J. Woelfel, "Sphinx-4: A flexible open source framework for speech recognition," Sun Microsystems, Tech. Rep., 2004.

[13] S. Knoop, S. Vacek, and R. Dillmann, "Sensor fusion for 3D human body tracking with an articulated 3D body model," in Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA), Orlando, Florida, 2006.

[14] M. Lösch, S. Schmidt-Rohr, S. Knoop, S. Vacek, and R. Dillmann, "Feature set selection and optimal classifier for human activity recognition," in RO-MAN, Korea, Aug. 2007.

[15] A. Genz, "Numerical computation of rectangular bivariate and trivariate normal and t probabilities," Statistics and Computing, vol. 14, pp. 151–160, 2004.

[16] S. Knoop, S. R. Schmidt-Rohr, and R. Dillmann, "A Flexible Task Knowledge Representation for Service Robots," in The 9th International Conference on Intelligent Autonomous Systems (IAS-9), Kashiwa New Campus, The University of Tokyo, Tokyo, Japan, March 2006.