
Cogent Confabulation-based Hierarchical Behavior Planner for Task Performance

Se-Hyoung Cho2, Seung-Hwan Baek2, Deok-Hwa Kim1, Yong-Ho Yoo1, Sanghyun Cho2, and Jong-Hwan Kim1

1School of Electrical Engineering, 2Robotics Program, KAIST

291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea
{shcho, shbaek, dhkim, yhyoo, scho, johkim}@rit.kaist.ac.kr

Abstract—This paper proposes a novel hierarchical behavior planner with a multi-layered confabulation based behavior selection structure for robots to perform tasks. The proposed planner integrates a STRIPS based behavior selection approach and a cogent confabulation approach. The STRIPS based behavior selection approach is a goal tree search that induces goal-oriented sequences of behaviors, while the cogent confabulation approach, based on conditional probabilities between input symbols and target behaviors, aims to model the human thinking mechanism. Our planner is applied to a set of behaviors defined in a multi-layered structure to show that it can plan hierarchical sequences of behaviors to perform given tasks. The effectiveness and applicability of the proposed scheme are demonstrated through experiments with the robot Mybot, developed in the Robot Intelligence Technology Lab. at KAIST.

I. INTRODUCTION

Numerous studies have been performed regarding behavior selection and planning for robots with task intelligence. For this purpose, behaviors and resultant effects were associated with situational context as object-action-effect situations [2]. Sequences of behaviors were planned by STRIPS [3], [4] and tree search algorithms [5]. The concept that human action is biased by learning frequently occurring behaviors was used for behavior planning [6], [7]. Confabulation was used as a heuristic metric in tree search algorithms to describe human intelligence [8], [9]. Sequences of behaviors were planned [11], [12], [13], [14] based on the observation that behaviors can be intuitively understood in a hierarchical form [10]. On the other hand, an intelligence operating architecture was presented to realize the mechanism of human thought [1].

This paper proposes a cogency based behavior planning architecture for robots to perform given tasks. In the overall scheme, the cogency based behavior planner utilizes a hierarchical behavior semantic memory, from which the task planner takes context-dependent situated objects at each level of behavior planning. Given initial and goal states at the first abstract level, a modified A* search is devised to search for possible solutions. The concept of cogency ranking is introduced to calculate the most probable behaviors for given pre-conditions.

Also, reinforcement learning is employed to increase the probability that a once-selected solution will be selected again, thereby reducing the search space. In the experiment, the proposed planner's re-planning capability is analyzed by allowing user intervention.

This paper is organized as follows. Section II introduces preliminaries for the proposed architecture. Section III presents the development of the confabulation based hierarchical behavior planner, and Section IV describes the experiments performed with the proposed task planner and the discussion. Finally, concluding remarks follow in Section V.

II. PRELIMINARIES

A. Situated Affordance

Figure 1: Situated Affordance concept. (a) Template: Situation and Object afford an Action, which produces an Effect. (b) Example of pouring water into a cup: the situation "is empty" and the object "cup" afford the action "pour with water bottle", with the effect "become full of water".

Fig. 1 presents a template and an example of the Situated Affordance concept. The traditional concept of affordance associates behaviors with objects: for example, given a chair and a ball, it is more likely that you sit on the chair and bounce the ball. An affordance consists of objects, behaviors, and effects. Even though this previous version of the affordance concept is used widely in artificial intelligence research, it should be understood in the context or environment where the objects are present; the behavior "eating" is not appropriate for bread dropped on the floor. The existing affordance concept is therefore extended to consider the environmental context relevant to the objects, and the extended concept is named situated affordance. With situated affordance, a robot will move an empty cup on the table in the living room to the kitchen, but will fill an empty cup with a beverage if requested by a human.

B. Hierarchy of Behaviors

Even though the symbolic representation in task planning is understood intuitively and straightforwardly, for robots to actually perform given tasks the symbolic representation should be segmented into primitive behaviors that robots can execute. The problem becomes complicated if we try to solve it only at the primitive behavior level. Therefore, the abstract levels, which have a smaller number of symbolic behaviors, should be considered simultaneously to turn the problem into a simpler and easier one to plan. This requirement leads to the hierarchical structure of behaviors.

C. Automated Planner

The confabulation based task planning algorithm employs STRIPS for its task planning module. A STRIPS instance consists of an initial state, goal states, and a set of actions with pre- and post-conditions. A condition is defined by free variables that are substituted with actual objects or situation attributes (situated objects) in the real environment. Combinations of the situated objects are considered as states that describe the environment where robots are present. The states describing the environment are changed by actions whose pre-conditions are satisfied. As a search algorithm, A* is used to find the action sequence that reaches the goal state. In this sense, the behavior selection algorithm becomes goal-oriented by adopting STRIPS-like problem solvers. If b actions are combined to make a sequence of length n to solve a problem, the complexity becomes b^n. Therefore, if only primitive behaviors are used to solve the problem, a large number of solutions become available and the solution path becomes lengthy. This leads to the need for a hierarchical behavior structure, such as a Hierarchical Task Network (HTN), to downsize the search space.
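
To make the STRIPS-style state transition concrete, the following Python sketch models an action with pre- and post-conditions over situated-object facts. It is a minimal illustration; the fact tuples, names, and data layout are assumptions for this example, not the paper's implementation:

# A state is a set of situated-object facts, e.g. ("cup1", "LOCATION", "loc2").
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset   # facts that must hold before execution
    add_effects: frozenset     # facts added by execution
    del_effects: frozenset     # facts removed by execution

    def applicable(self, state: frozenset) -> bool:
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        return (state - self.del_effects) | self.add_effects

# Example: moving cup1 from location2 to location5.
move = Action(
    name="move_cup1_loc2_loc5",
    preconditions=frozenset({("cup1", "LOCATION", "loc2")}),
    add_effects=frozenset({("cup1", "LOCATION", "loc5")}),
    del_effects=frozenset({("cup1", "LOCATION", "loc2")}),
)
s0 = frozenset({("cup1", "LOCATION", "loc2"), ("cup1", "BEVERAGE", "empty")})
assert move.applicable(s0)
s1 = move.apply(s0)   # cup1 is now at loc5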

D. Confabulation Theory

Bayesian inference and cogent confabulation are formal logics for induction. Both methods assign reliability to conclusions. When premises a, b, c, d and a conclusion set E are given in induction, the reliability of a conclusion e for the Bayesian method is defined as the posterior probability p(e|abcd). The reliability of the conclusion in cogent confabulation is the cogency p(abcd|e). The cogency can be calculated as follows:

p(abcd|e) ≈ k · [p(a|e) · p(b|e) · p(c|e) · p(d|e)] (1)

Figure 2: Overall behavior planning architecture

Selecting the conclusion with the maximum cogency is the confabulation process, which is applied to the proposed behavior planning algorithm.
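
As a minimal numeric illustration of Eq. (1) and the selection step, the following sketch multiplies the conditional probabilities of the premises given each candidate conclusion and picks the maximum. The probability table is invented for illustration:

def cogency(premises, conclusion, p_cond, k=1.0):
    # approximate p(premises | conclusion) by the product of p(x | conclusion)
    result = k
    for premise in premises:
        result *= p_cond[(premise, conclusion)]
    return result

p_cond = {  # p(premise | conclusion), illustrative values only
    ("cup_is_empty", "pour"): 0.9, ("cup_on_table", "pour"): 0.7,
    ("cup_is_empty", "move"): 0.4, ("cup_on_table", "move"): 0.8,
}
premises = ["cup_is_empty", "cup_on_table"]
best = max(["pour", "move"], key=lambda e: cogency(premises, e, p_cond))
print(best)  # -> "pour" (0.63 versus 0.32)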

III. LAYERED CONFABULATION BASED BEHAVIOR PLANNING

The proposed layered confabulation architecture is initiated from confabulation for an abstract behavior and propagates through the hierarchical network down to more complicated behavior layers and finally to the primitive behavior layer. As shown in Fig. 2, the proposed architecture consists of the following modules. The perception module obtains information about the agent and the environment using internal and external sensors. With the semantic memory as reference, the task recognition module defines the current situation as a set of situated objects composed of situations related to the objects. The memory module is composed of the situated object semantic memory, the hierarchical behavior semantic memory, and the confabulation memory; the confabulation memory stores confabulation probabilities between situated objects and behaviors. The behavior planning module plans an appropriate behavior order using the obtained information and the confabulation process. Lastly, the actuator module executes the behavior.

A. Situated Object Semantic Memory

Since situated affordance is expressed as object, situation, behavior, and effect pairs, it is appropriate to construct the semantic memory based on an ontology that expresses classes, instances, properties, and relations. Behavior selection uses the environmental status as input and outputs behaviors, so it is appropriate to classify the four elements that compose situated affordance into an environmental status part, which includes object and situation, and a behavioral part, which includes behavior and effect.

Fig. 3 shows several situated objects present in the environmental status using the ontology based semantic memory. A situation type is a property of an object type and is also able to define instances, as the type is regarded as a class at the same time.


Figure 3: Situated object semantic memory

Figure 4: Example of hierarchical behavior pools in semantic memory

The environmental status is expressed with detailed instances of the situated objects. Thus, the actual instance needs to be substituted into the concept expressed abstractly with an object-situation pair, which is the quantification process (see the sketch of this substitution in Section III-D).
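
One plausible way to picture the ontology-style memory just described is a nesting of object types, their situation-type properties, and the instances each situation type (treated as a class) enumerates. The sketch below is an illustrative assumption about the layout, not the paper's data structure:

# Situated object semantic memory, sketched as nested mappings.
situated_object_memory = {
    "Cup": {                                                   # object type
        "LOCATION": ["loc1", "loc2", "loc5", "loc7", "loc8"],  # situation type -> instances
        "BEVERAGE": ["empty", "milk", "coke"],
    },
    "Bottle": {
        "LOCATION": ["loc3", "loc4"],
        "CONTENT":  ["beverage1", "beverage2"],
    },
}
# The situation type plays two roles at once: it is a property of the object
# type ("Cup has a LOCATION") and a class whose instances ("loc1", ...) can be
# substituted in during quantification.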

B. Hierarchical Behavior Semantic Memory

Fig. 4 depicts the behavior pools for hierarchical behavior planning. The planning task is at the top level, followed by several abstract behavior levels, and lastly the primitive behavior level. The behaviors are represented by their symbols and parameters as follows:

The nth level abstract behavior: B_n(p_1, p_2)
Primitive behavior: b_p(p_1, p_2)

where p_1 and p_2 are a parameter of the behavior and a free variable to be quantified, respectively.
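
The following sketch renders this parameterized, leveled behavior representation in Python; the field names and the convention that level 0 marks a primitive behavior are illustrative assumptions:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Behavior:
    symbol: str                 # e.g. "MOVE", "PICK", "REACH_HAND_OBJECT"
    level: int                  # n > 0: abstract level; 0: primitive
    p1: Optional[str] = None    # parameter, bound during quantification
    p2: Optional[str] = None    # free variable, bound during quantification

move = Behavior("MOVE", level=2)                 # unquantified abstract behavior
move_cup1 = Behavior("MOVE", 2, "cup1", "loc7")  # quantified with actual objects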

C. Confabulation Memory and Process

The behavior semantic memory expresses behaviors, parameters, preconditions, and effects. Similar to the situated objects in the semantic memory, a behavior is quantified and takes actual objects and situations as parameters.

The confabulation process uses assumed fact symbols, a conclusion symbol, and the probability values between the symbols. Fig. 5 depicts an example with eight quantified situated objects as assumed fact symbols and one quantified behavior as a conclusion symbol. Each assumed fact symbol has several instances and can express one of them. The conclusion symbol also has several instances, and each of them can have its own cogency. There is a probability value between each assumed fact and the conclusion. According to confabulation theory, by multiplying the probability values connected to a conclusion, an approximate cogency value for the conclusion can be obtained. In other words, in an expressed environment status, the probability of each behavior can be evaluated and used as a considerable element in behavior planning.

Figure 5: Confabulation memory and process (left: fact symbols, right: conclusion symbols)
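
A minimal sketch of such a confabulation memory follows: probabilities p(fact | behavior) keyed by (quantified situated object, quantified behavior) pairs, and a process that scores every candidate behavior for the current environment status. The keys, values, and the small floor used for unseen pairs are illustrative assumptions:

from math import prod

confab_memory = {  # p(fact | behavior), illustrative values only
    (("cup1", "BEVERAGE", "empty"), "pour_bottle2_cup1"): 0.8,
    (("cup1", "LOCATION", "loc5"),  "pour_bottle2_cup1"): 0.6,
    (("cup1", "BEVERAGE", "empty"), "move_cup1_loc7"):    0.5,
    (("cup1", "LOCATION", "loc5"),  "move_cup1_loc7"):    0.7,
}

def behavior_cogencies(environment_status, behaviors):
    # Multiply the probabilities of all expressed facts given each behavior;
    # unseen (fact, behavior) pairs get a small floor value by assumption.
    return {b: prod(confab_memory.get((fact, b), 1e-6)
                    for fact in environment_status)
            for b in behaviors}

status = [("cup1", "BEVERAGE", "empty"), ("cup1", "LOCATION", "loc5")]
print(behavior_cogencies(status, ["pour_bottle2_cup1", "move_cup1_loc7"]))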

D. Cogency-based Behavior Planner

The proposed behavior planner merges a goal directed behavior planning method and a memory based behavior selection method; the STRIPS style behavior planner is an example of the goal directed method. In order to handle the situational change of an object, the object is fixed as a constant and a symbol is assumed as a variable, so that the situated object symbol and instance can be expressed as follows:

<o_i, S_i^j>: situated object symbol
<o_i, s_i^{jk}>: situated object instance,

where
o_i: ith object
S_i^j: jth situation type related to the object instance o_i
s_i^{jk}: kth instance of S_i^j.

For example, when o_i = cup and S_i^j = BEVERAGE = {milk, coke}, the situated object symbol <cup1, BEVERAGE> can express the situated object instances <cup1, milk> and <cup1, coke>.
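
The quantification step, substituting each instance of a situation type into the symbol, can be sketched as follows; the situation-type table is the illustrative one from above:

situation_types = {"BEVERAGE": ["milk", "coke"]}

def quantify(obj, situation_type):
    # expand the symbol <obj, SITUATION_TYPE> into its instances <obj, s_jk>
    return [(obj, instance) for instance in situation_types[situation_type]]

print(quantify("cup1", "BEVERAGE"))
# -> [('cup1', 'milk'), ('cup1', 'coke')]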

The confabulation cogency value is calculated as follows:

cogency(b_m) = ∏_{i=1}^{I} ∏_{j=1}^{J_i} p(<o_i, S_i^j> | b_m),


where
b_m: mth behavior
cogency(b_m): cogency value of the mth behavior
I: number of object instances over all of the object types
J_i: number of situation types of object instance o_i
S_i^j: substitution of s_i^{jk} with the proper k, according to the current environmental state.

When a behavior is successfully planned and executed, reinforcement learning on the behavior is initiated to increase its chance of being selected at a later time. The reinforcement learning method is as follows:

Reinforce: p'(<o_i, S_i^j> | b_m) ← p(<o_i, S_i^j> | b_m) + λ
Normalization: p(<o_i, S_i^j> | b_m) ← p'(<o_i, S_i^j> | b_m) / ∑_{n=1}^{K_i^j} p'(<o_i, s_i^{jn}> | b_m)   (2)

where λ is the learning amount and K_i^j is the number of instances of the situation type S_i^j.
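
A sketch of the reinforcement update of Eq. (2) follows: add λ to the probability of the reinforced instance, then renormalize over the K_i^j instances of the situation type. The flat dictionary layout is an illustrative assumption:

def reinforce(p, obj, instances, chosen_instance, behavior, lam=0.1):
    # p[(obj, instance, behavior)] holds p(<o_i, s_i^jk> | b_m)
    p[(obj, chosen_instance, behavior)] += lam             # Reinforce
    total = sum(p[(obj, s, behavior)] for s in instances)
    for s in instances:                                     # Normalization
        p[(obj, s, behavior)] /= total

p = {("cup1", "milk", "pour"): 0.5, ("cup1", "coke", "pour"): 0.5}
reinforce(p, "cup1", ["milk", "coke"], "milk", "pour")
print(p)  # milk: 0.6/1.1 ≈ 0.545, coke: 0.5/1.1 ≈ 0.455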

The STRIPS style behavior planner uses a tree search algorithm such as A* to find an appropriate behavior order. Here, an algorithm that searches the tree using not only cost and heuristic but also a cogency score is proposed as follows:

f = g + h + c,   (3)

where
g: behavior cost from the initial state to the current state
h: estimated cost from the current state to the goal state
c: cogency score of the current state, considering the cogency of the previous state given the behavior that made the transition to the current state.

Since the cogency value is approximated by the multiplication of a large number of confabulation probabilities, its range of values is very wide. Thus, instead of using the values directly, the following cogency scores obtained from a ranking system are used:

c_m = Ranking(cogency(b_m)) / M,   (4)

where
b_m: mth behavior
M: number of quantified behaviors
c_m: cogency ranking score of b_m, normalized to [0, 1].
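
One plausible reading of Eq. (4), sketched below, is that the most cogent behavior receives rank 1 and thus the smallest score 1/M, so that adding c_m to f in Eq. (3) favors cogent behaviors (and the (c_m - 1) term used later in Eq. (5) gives them the largest reduction). This direction of the ranking is an assumption; the cogency values are invented:

def ranking_scores(cogencies):
    # rank 1 = most cogent; normalize ranks to (0, 1]
    ordered = sorted(cogencies, key=cogencies.get, reverse=True)
    M = len(ordered)
    return {b: (rank + 1) / M for rank, b in enumerate(ordered)}

print(ranking_scores({"pour": 1e-4, "move": 1e-7, "grasp": 1e-9}))
# -> {'pour': 0.333..., 'move': 0.666..., 'grasp': 1.0}
# (lower score = more cogent)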

Algorithm 1 is the pseudo-code for the proposed cogency-based behavior planning algorithm.
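
A compact, runnable Python rendering of the search in Algorithm 1 is sketched below, under simplifying assumptions: states are hashable, successors(state) yields (behavior, next_state, cost) triples, and heuristic() and cogency_ranking_score() are supplied by the caller. It is an illustration of the f = g + h + c search, not the authors' implementation:

import heapq
import itertools

def plan(start, goal, successors, heuristic, cogency_ranking_score):
    counter = itertools.count()               # tie-breaker for the heap
    open_heap = [(heuristic(start, goal), next(counter), start)]
    previous, taken = {}, {}
    g = {start: 0.0}
    closed = set()
    while open_heap:
        _, _, current = heapq.heappop(open_heap)
        if current == goal:
            sequence = []
            while current in taken:           # walk back to the start state
                sequence.append(taken[current])
                current = previous[current]
            return list(reversed(sequence))
        if current in closed:
            continue
        closed.add(current)
        for behavior, nxt, cost in successors(current):
            tentative_g = g[current] + cost
            if nxt in g and tentative_g >= g[nxt]:
                continue
            g[nxt] = tentative_g
            previous[nxt], taken[nxt] = current, behavior
            c = cogency_ranking_score(current, behavior)   # Eq. (4) score
            f = tentative_g + heuristic(nxt, goal) + c     # Eq. (3)
            heapq.heappush(open_heap, (f, next(counter), nxt))
    return None  # failure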

IV. EXPERIMENTS

A. Experiment Setting

With demonstrations of two scenarios, beverage serving and cereal-and-milk serving, with the real humanoid robot, the proposed architecture showed its effectiveness. To show the function of primitive behavior planning, a simple example environment was set with eight locations on a table.

Algorithm 1 Confabulation-based Behavior Planner

Procedure: Confabulation-based Behavior Planner(start, goal)
  Closed_Set = {}
  Open_Set = {start}
  previous_state[start] = empty
  g_score[start] = 0
  c_score[start] = 0
  f_score[start] = g_score[start] + heuristic_cost_estimate(start, goal)
  while Open_Set is not empty do
    current_state = the node in Open_Set having the lowest f_score[] value
    if current_state = goal then
      return reconstruct_behavior_sequence(previous_state, current_state)
    end if
    Open_Set.Remove(current_state)
    Closed_Set.Add(current_state)
    for each behavior and next_state from current_state do
      Open_Set.Add(next_state)
      b[next_state] = behavior
      previous_state[next_state] = current_state
      g_score[next_state] = g_score[current_state] + behavior_cost_between(current_state, next_state)
      h_score[next_state] = heuristic_cost_estimate(next_state, goal)
      c_score[next_state] = confabulation_cogency_ranking_score(current_state, behavior)
      f_score[next_state] = g_score[next_state] + h_score[next_state] + c_score[next_state]
    end for
  end while
  return failure
end procedure

Procedure: reconstruct_behavior_sequence(previous_state, current_state)
  total_sequence = [current_state]
  while current_state in previous_state.Keys do
    current_state = previous_state[current_state]
    total_sequence.append(current_state)
  end while
  return total_sequence
end procedure

A sudden problem change by user intervention was assumed, and a new solution, obtained by replanning after the user intervention, was shown (Fig. 6).

Table I shows the hierarchical plan of the given task. Each row starts with a Behavior ID number followed by the name of the behavior and its parameters.


Figure 6: Plans for beverage serving task. (a) Original plan on cup2 before user intervention. (b) Plan altered after user intervention. (c) Initial state of beverage serving. (d) Goal state of beverage serving. (Legend: B1: bottle1, B2: bottle2, C1: cup1, C2: cup2; table locations 1-8; behaviors 1: move, 2: pour.)

The hierarchical level of each behavior was denoted at the head of the behavior name: B1, B2, and B3 for abstract level behavior, manipulation level behavior, and primitive behavior, respectively.

User intervention was assumed during the execution of the plan. The user moved Cup2 to Location8, and the location of Cup2 in the goal state was also changed to Location8 by the user. Replanning was then performed, and the remaining plan was changed to use the right hand rather than the left hand, considering the reachable area of each hand.

The score calculation for the states of the cogency-based behavior planner and its parameter settings are as follows:

f = g + h + (c_m − 1) × k × w,   w = max(1 − n/N, 0),   (5)

where
g: one per each behavior
h: number of situated object symbols not matching the goal state
k = 3: gain constant of the cogency ranking score
N: anticipated solution length of the search tree
n: depth of the current node in the search tree.
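
A small numeric sketch of Eq. (5) with the stated settings follows; the values of g, h, c_m, n, and N are invented for illustration, and c_m uses the ranking convention assumed earlier (lower score = more cogent):

def f_score(g, h, c_m, n, N, k=3):
    w = max(1 - n / N, 0)          # cogency weight decays with search depth
    return g + h + (c_m - 1) * k * w

print(f_score(g=2, h=5, c_m=1/3, n=1, N=10))   # most cogent:  7 - 1.8 = 5.2
print(f_score(g=2, h=5, c_m=1.0, n=1, N=10))   # least cogent: 7.0
print(f_score(g=2, h=5, c_m=1/3, n=12, N=10))  # deep node: w = 0, so f = 7.0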

B. Experimental Results

This section presents the experimental results for the beverage serving task. The primitive behaviors consider the reachable area of each hand in the planning (Figs. 6 (a), (b)).

Table I enumerates the definitions of the planning problem for the beverage serving task. As indicated, the goal state was changed in the middle of the task by the user. Despite this, Mybot successfully performed beverage serving (Fig. 6 (d)). Referring to the initial state and the original goal state in Table I, the originally planned solution is shown in Fig. 7. Each row starts with a Behavior ID number followed by the name of the behavior and its parameters.

Task name: Beverage serving task
Initial state:
  Bottle1 is at Location4 and containing Beverage1
  Bottle2 is at Location3 and containing Beverage2
  Cup1 is at Location2 and empty
  Cup2 is at Location1 and empty
Goal state:
  Bottles are at their original location
  Cup1 is at Location7 and containing Beverage1
  Cup2 is at Location5 and containing Beverage2
Changed goal state:
  Bottles are at their original location
  Cup1 is at Location7 and containing Beverage1
  Cup2 is at Location8 and containing Beverage2
Abstract level behaviors:
  Move Object to Location
  Pour source Object to target Object
Manipulation behaviors:
  Pick Object with Hand
  Pour to target Object with object in Hand
  Place the object in Hand to destination Location
Primitive behaviors:
  Reach Hand to Location
  Reach Hand to Object
  Grasp with Hand
  Lift with Hand
  Put down with Hand
  Tilt with Hand
  Stand upright with Hand

Table I: Definitions of the planning problem for beverage serving task

The hierarchical level of each behavior was denoted at the head of the behavior name: B1, B2, and B3 for abstract level behavior, manipulation level behavior, and primitive behavior, respectively. In the original plan, Cup2 was moved from Location2 to Location5 and was supposed to be filled with the beverage in Bottle2. However, Cup2 was moved to Location8 by user intervention. In Figs. 7 and 8, the last behavior at the abstract level is ID number 39. Even though the high level behaviors are the same, the lower level behaviors are totally changed. <Video link: http://rit.kaist.ac.kr/home/Confabulation_based_Planner>

V. CONCLUSION

In this paper, the layered confabulation architecture for hierarchical behavior planning was proposed as a behavior selection method for an intelligent agent. As a method to consider the working environment status, the situated object concept was introduced. Also, the semantic memory, which expresses the situated objects as an ontology in order to represent the environmental status and behaviors in memory, and the quantification method for the semantic memory were presented. The confabulation memory and process could learn from past experiences and infer the probability and suitability of each behavior in a given situation. Then, the STRIPS based behavior planning method, which includes goal-oriented behavior planning features to achieve operation goals, was merged with the memory based inference method. The proposed method was applied to the task intelligence for two-arm operation of the humanoid robot, and three confabulation based behavior planning layers were constructed. With demonstrations of two scenarios, beverage serving and cereal-and-milk serving, with the real humanoid robot, the proposed architecture showed its effectiveness.


22: B1_MOVE, O_CUP1, S_OBJECT_LOCATION7
  2: B2_PICK, O_HAND_LEFT, O_CUP1
    2: B3_REACH_HAND_OBJECT, O_HAND_LEFT, O_CUP1
    28: B3_GRASP, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
  22: B2_PLACE, O_HAND_LEFT, S_OBJECT_LOCATION7
    18: B3_REACH_HAND_LOCATION, O_HAND_LEFT, S_OBJECT_LOCATION7
    30: B3_PUT_DOWN, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
34: B1_POUR, O_BOTTLE1, O_CUP1
  4: B2_PICK, O_HAND_RIGHT, O_BOTTLE1
    8: B3_REACH_HAND_OBJECT, O_HAND_RIGHT, O_BOTTLE1
    29: B3_GRASP, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
  14: B2_POUR, O_HAND_RIGHT, O_CUP1
    10: B3_REACH_HAND_OBJECT, O_HAND_RIGHT, O_CUP1
    35: B3_TILT, O_HAND_RIGHT
    37: B3_STAND_UPRIGHT, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
  27: B2_PLACE, O_HAND_RIGHT, S_OBJECT_LOCATION4
    23: B3_REACH_HAND_LOCATION, O_HAND_RIGHT, S_OBJECT_LOCATION4
    31: B3_PUT_DOWN, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
28: B1_MOVE, O_CUP2, S_OBJECT_LOCATION5
  3: B2_PICK, O_HAND_LEFT, O_CUP2
    7: B3_REACH_HAND_OBJECT, O_HAND_LEFT, O_CUP2
    28: B3_GRASP, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
  20: B2_PLACE, O_HAND_LEFT, S_OBJECT_LOCATION5
    16: B3_REACH_HAND_LOCATION, O_HAND_LEFT, S_OBJECT_LOCATION5
    30: B3_PUT_DOWN, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
39: B1_POUR, O_BOTTLE2, O_CUP2
  1: B2_PICK, O_HAND_LEFT, O_BOTTLE2
    5: B3_REACH_HAND_OBJECT, O_HAND_LEFT, O_BOTTLE2
    28: B3_GRASP, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
  11: B2_POUR, O_HAND_LEFT, O_CUP2
    7: B3_REACH_HAND_OBJECT, O_HAND_LEFT, O_CUP2
    34: B3_TILT, O_HAND_LEFT
    36: B3_STAND_UPRIGHT, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
  18: B2_PLACE, O_HAND_LEFT, S_OBJECT_LOCATION3
    14: B3_REACH_HAND_LOCATION, O_HAND_LEFT, S_OBJECT_LOCATION3
    30: B3_PUT_DOWN, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT

Figure 7: A solution for the original plan for the beverage serving scenario.

22: B1_MOVE, O_CUP1, S_OBJECT_LOCATION7
  2: B2_PICK, O_HAND_LEFT, O_CUP1
    2: B3_REACH_HAND_OBJECT, O_HAND_LEFT, O_CUP1
    28: B3_GRASP, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
  22: B2_PLACE, O_HAND_LEFT, S_OBJECT_LOCATION7
    18: B3_REACH_HAND_LOCATION, O_HAND_LEFT, S_OBJECT_LOCATION7
    30: B3_PUT_DOWN, O_HAND_LEFT
    32: B3_LIFT, O_HAND_LEFT
34: B1_POUR, O_BOTTLE1, O_CUP1
  4: B2_PICK, O_HAND_RIGHT, O_BOTTLE1
    8: B3_REACH_HAND_OBJECT, O_HAND_RIGHT, O_BOTTLE1
    29: B3_GRASP, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
  14: B2_POUR, O_HAND_RIGHT, O_CUP1
    10: B3_REACH_HAND_OBJECT, O_HAND_RIGHT, O_CUP1
    35: B3_TILT, O_HAND_RIGHT
    37: B3_STAND_UPRIGHT, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
  27: B2_PLACE, O_HAND_RIGHT, S_OBJECT_LOCATION4
    23: B3_REACH_HAND_LOCATION, O_HAND_RIGHT, S_OBJECT_LOCATION4
    31: B3_PUT_DOWN, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT

[User moved CUP2 from LOCATION5 to LOCATION8 at this moment; the remaining task was replanned and altered to use the right hand.]

39: B1_POUR, O_BOTTLE2, O_CUP2
  5: B2_PICK, O_HAND_RIGHT, O_BOTTLE2
    9: B3_REACH_HAND_OBJECT, O_HAND_RIGHT, O_BOTTLE2
    29: B3_GRASP, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
  15: B2_POUR, O_HAND_RIGHT, O_CUP2
    11: B3_REACH_HAND_OBJECT, O_HAND_RIGHT, O_CUP2
    35: B3_TILT, O_HAND_RIGHT
    37: B3_STAND_UPRIGHT, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT
  26: B2_PLACE, O_HAND_RIGHT, S_OBJECT_LOCATION3
    22: B3_REACH_HAND_LOCATION, O_HAND_RIGHT, S_OBJECT_LOCATION3
    31: B3_PUT_DOWN, O_HAND_RIGHT
    33: B3_LIFT, O_HAND_RIGHT

Figure 8: A solution for the altered plan for the beverage serving scenario.

ACKNOWLEDGMENT

This work was supported by the Technology Innovation Program, 10045252, Development of Robot Task Intelligence Technology, funded by the Ministry of Trade, Industry, and Energy (MOTIE), Korea.

REFERENCES

[1] J.-H. Kim, S.-H. Choi, I.-W. Park, and S. A. Zaheer, "Intelligence technology for robots that think [application notes]," IEEE Computational Intelligence Magazine, vol. 8, no. 3, pp. 70-84, 2013.

[2] M. Kammer, T. Schack, M. Tscherepanow, and Y. Nagai, "From affordances to situated affordances in robotics - why context is important," in Frontiers in Computational Neuroscience (Conference Abstract: IEEE ICDL-EPIROB 2011), vol. 5, no. 30, 2011.

[3] R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artificial Intelligence, vol. 2, no. 3-4, pp. 189-208, 1971.

[4] T. Bylander, "The computational complexity of propositional STRIPS planning," Artificial Intelligence, vol. 69, no. 1, pp. 165-204, 1994.

[5] J. Hoffmann, "FF: The fast-forward planning system," AI Magazine, vol. 22, no. 3, p. 57, 2001.

[6] A. Patt and R. Zeckhauser, "Action bias and environmental decisions," Journal of Risk and Uncertainty, vol. 21, no. 1, pp. 45-72, 2000.

[7] R. Bogacz, S. M. McClure, J. Li, J. D. Cohen, and P. R. Montague, "Short-term memory traces for action bias in human reinforcement learning," Brain Research, vol. 1153, pp. 111-121, 2007.


[8] J.-H. Kim, S.-H. Cho, Y.-H. Kim, and I.-W. Park, "Two-layered confabulation architecture for an artificial creature's behavior selection," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 6, pp. 834-840, 2008.

[9] S.-H. Cho, Y.-H. Kim, I.-W. Park, and J.-H. Kim, "Behavior selection and memory-based learning for artificial creature using two-layered confabulation," in Proc. 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2007), 2007, pp. 992-997.

[10] R. C. Arkin, Behavior-Based Robotics. MIT Press, 1998.

[11] D. Wilkins and A. I. Center, "SIPE-2: System for interactive planning and execution," 2000.

[12] K. Erol, J. Hendler, and D. S. Nau, "Complexity results for HTN planning," Annals of Mathematics and Artificial Intelligence, vol. 18, no. 1, pp. 69-93, 1996.

[13] R. Alford, P. Bercher, and D. W. Aha, "Tight bounds for HTN planning," in ICAPS, 2015, pp. 7-15.

[14] R. Alford, U. Kuter, and D. S. Nau, "Translating HTNs to PDDL: A small amount of domain knowledge can go a long way," in IJCAI, 2009, pp. 1629-1634.
