Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

Pat LangleyPat LangleyComputational Learning LaboratoryComputational Learning Laboratory

Center for the Study of Language and InformationCenter for the Study of Language and InformationStanford University, Stanford, CaliforniaStanford University, Stanford, California

http://cll.stanford.edu/http://cll.stanford.edu/

Learning Hierarchical Task NetworksLearning Hierarchical Task Networksfrom Problem Solvingfrom Problem Solving

Thanks to Dongkyu Choi, Kirstin Cummings, Seth Rogers, and Daniel Shapiro for Thanks to Dongkyu Choi, Kirstin Cummings, Seth Rogers, and Daniel Shapiro for contributions to this research, which was funded by Grant HR0011-04-1-0008 from contributions to this research, which was funded by Grant HR0011-04-1-0008 from DARPA IPTO and by Grant IIS-0335353 from NSF. DARPA IPTO and by Grant IIS-0335353 from NSF.

Classical Approaches to PlanningClassical Approaches to Planning

Generated plan

ClassicalPlanning

Operators Problem

??

Planning with Hierarchical Task NetworksPlanning with Hierarchical Task Networks

Generated plan

HTNPlanning

Problem

??

Task Network

Classical and HTN PlanningClassical and HTN Planning

Challenge: Challenge: Can we unify classical and HTN planning in a Can we unify classical and HTN planning in a single framework?single framework?

Challenge: Challenge: Can we use learning to gain the advantage of Can we use learning to gain the advantage of HTNs while avoiding the cost of manual construction?HTNs while avoiding the cost of manual construction?

Hypothesis: Hypothesis: The responses to these two challenges are The responses to these two challenges are closely intertwined. closely intertwined.

Mixed Classical / HTN PlanningMixed Classical / HTN Planning

Generated plan

Problem

??

Task Network

Operators

HTNPlanning

impasse?

ClassicalPlanning

yesyes

nono

Learning HTNs from Classical PlanningLearning HTNs from Classical Planning

Generated plan

Problem

??

Task Network

Operators

HTNPlanning

impasse?

ClassicalPlanning

yesyes

nono

HTNLearning

Four Contributions of the ResearchFour Contributions of the Research

Representation:Representation: A specialized class of hierarchical task nets. A specialized class of hierarchical task nets. Execution:Execution: A reactive controller that utilizes these structures. A reactive controller that utilizes these structures. Planning:Planning: A method for interleaving HTN execution with A method for interleaving HTN execution with

problem solving when impasses are encountered.problem solving when impasses are encountered. Learning:Learning: A method for creating new HTN methods from A method for creating new HTN methods from

successful solutions to these impasses.successful solutions to these impasses.

A New Representational FormalismA New Representational Formalism

Concepts:Concepts: A set of conjunctive relational inference rules; A set of conjunctive relational inference rules; Primitive skills:Primitive skills: A set of durative STRIPS operators; A set of durative STRIPS operators; Nonprimitive skills:Nonprimitive skills: A set of HTN methods which specify: A set of HTN methods which specify:

a head that indicates a goal the method achieves;a head that indicates a goal the method achieves; a single (typically defined) precondition;a single (typically defined) precondition; one or more ordered subskills for achieving the goal. one or more ordered subskills for achieving the goal.

This special class of hierarchical task networks can be executed This special class of hierarchical task networks can be executed reactively but in a goal-directed manner (Nilsson, 1994).reactively but in a goal-directed manner (Nilsson, 1994).

A A teleoreactive logic programteleoreactive logic program consists of three components: consists of three components:

Some Defined Concepts (Axioms)Some Defined Concepts (Axioms)(clear (?block)(clear (?block) :percepts :percepts ((block ?block))((block ?block)) :negatives :negatives ((on ?other ?block)))((on ?other ?block)))(hand-empty ( )(hand-empty ( ) :percepts :percepts ((hand ?hand status ?status))((hand ?hand status ?status)) :tests :tests ((eq ?status 'empty)))((eq ?status 'empty)))(unstackable (?block ?from)(unstackable (?block ?from) :percepts :percepts ((block ?block) (block ?from))((block ?block) (block ?from)) :positives :positives ((on ?block ?from) (clear ?block) (hand-empty)))((on ?block ?from) (clear ?block) (hand-empty)))(pickupable (?block ?from)(pickupable (?block ?from) :percepts :percepts ((block ?block) (table ?from))((block ?block) (table ?from)) :positives :positives ((ontable ?block ?from) (clear ?block) (hand-empty)))((ontable ?block ?from) (clear ?block) (hand-empty)))(stackable (?block ?to)(stackable (?block ?to) :percepts :percepts ((block ?block) (block ?to))((block ?block) (block ?to)) :positives :positives ((clear ?to) (holding ?block)))((clear ?to) (holding ?block)))(putdownable (?block ?to)(putdownable (?block ?to) :percepts :percepts ((block ?block) (table ?to))((block ?block) (table ?to)) :positives :positives ((holding ?block)))((holding ?block)))

Some Primitive Skills (Operators)Some Primitive Skills (Operators)

(unstack (?block ?from)(unstack (?block ?from) :percepts :percepts ((block ?block ypos ?ypos) (block ?from))((block ?block ypos ?ypos) (block ?from)) :start :start (unstackable ?block ?from)(unstackable ?block ?from) :actions :actions ((((**grasp ?block) (grasp ?block) (**vertical-move ?block (+ ?ypos 10)))vertical-move ?block (+ ?ypos 10))) :effects :effects ((clear ?from) (holding ?block)))((clear ?from) (holding ?block)))

(pickup (?block ?from)(pickup (?block ?from) :percepts :percepts ((block ?block) (table ?from height ?height))((block ?block) (table ?from height ?height)) :start :start (pickupable ?block ?from)(pickupable ?block ?from) :effects :effects ((holding ?block)))((holding ?block)))

(stack (?block ?to)(stack (?block ?to) :percepts :percepts ((block ?block) (block ?to xpos ?xpos ypos ?ypos height ?height))((block ?block) (block ?to xpos ?xpos ypos ?ypos height ?height)) :start :start (stackable ?block ?to)(stackable ?block ?to) :effects :effects ((on ?block ?to) (hand-empty)))((on ?block ?to) (hand-empty)))

(putdown (?block ?to)(putdown (?block ?to) :percepts :percepts ((block ?block) (table ?to xpos ?xpos ypos ?ypos height ?height))((block ?block) (table ?to xpos ?xpos ypos ?ypos height ?height)) :start :start (putdownable ?block ?to)(putdownable ?block ?to) :effects :effects ((ontable ?block ?to) (hand-empty)))((ontable ?block ?to) (hand-empty)))

Some NonPrimitive Recursive SkillsSome NonPrimitive Recursive Skills

(clear (?C)(clear (?C) :percepts :percepts ((block ?D) (block ?C))((block ?D) (block ?C)) :start :start (unstackable ?D ?C)(unstackable ?D ?C) :skills :skills ((unstack ?D ?C)))((unstack ?D ?C)))

(clear (?B)(clear (?B) :percepts :percepts ((block ?C) (block ?B))((block ?C) (block ?B)) :start :start [(on ?C ?B) (hand-empty)][(on ?C ?B) (hand-empty)] :skills :skills ((unstackable ?C ?B) (unstack ?C ?B)))((unstackable ?C ?B) (unstack ?C ?B)))

(unstackable (?C ?B)(unstackable (?C ?B) :percepts :percepts ((block ?B) (block ?C))((block ?B) (block ?C)) :start :start [(on ?C ?B) (hand-empty)][(on ?C ?B) (hand-empty)] :skills :skills ((clear ?C) (hand-empty)))((clear ?C) (hand-empty)))

(hand-empty ( )(hand-empty ( ) :percepts :percepts ((block ?D) (table ?T1))((block ?D) (table ?T1)) :start :start (putdownable ?D ?T1)(putdownable ?D ?T1) :skills :skills ((putdown ?D ?T1)))((putdown ?D ?T1)))

[Expanded for readability][Expanded for readability]

Teleoreactive logic programs Teleoreactive logic programs are executed in a top-down, are executed in a top-down, left-to-right manner, much as left-to-right manner, much as in Prolog but extended over in Prolog but extended over time, with a single path being time, with a single path being selected on each time step. selected on each time step.

Interleaving HTN Execution and Classical PlanningInterleaving HTN Execution and Classical Planning

Solve(G)Solve(G) Push the goal literal G onto the empty goal stack GS. Push the goal literal G onto the empty goal stack GS. On each cycle, On each cycle, If the top goal G of the goal stack GS is satisfied, If the top goal G of the goal stack GS is satisfied, Then pop GS. Then pop GS. Else if the goal stack GS does not exceed the depth limit, Else if the goal stack GS does not exceed the depth limit, Let S be the skill instances whose heads unify with G. Let S be the skill instances whose heads unify with G. If any applicable skill paths start from an instance in S, If any applicable skill paths start from an instance in S, Then select one of these paths and execute it. Then select one of these paths and execute it. Else let M be the set of primitive skill instances that have not already failed in which G is an effect. Else let M be the set of primitive skill instances that have not already failed in which G is an effect. If the set M is nonempty, If the set M is nonempty, Then select a skill instance Q from M. Then select a skill instance Q from M.

Push the start condition C of Q onto goal stack GS. Push the start condition C of Q onto goal stack GS. Else if G is a complex concept with the unsatisfied subconcepts H and with satisfied subconcepts F,Else if G is a complex concept with the unsatisfied subconcepts H and with satisfied subconcepts F, Then if there is a subconcept I in H that has not yet failed, Then if there is a subconcept I in H that has not yet failed, Then push I onto the goal stack GS. Then push I onto the goal stack GS. Else pop G from the goal stack GS and store information about failure with G's parent. Else pop G from the goal stack GS and store information about failure with G's parent. Else pop G from the goal stack GS. Else pop G from the goal stack GS. Store information about failure with G's parent. Store information about failure with G's parent.

This is traditional means-ends analysis, with three exceptions: (1) conjunctive This is traditional means-ends analysis, with three exceptions: (1) conjunctive goals must be defined concepts; (2) chaining occurs over both skills/operators goals must be defined concepts; (2) chaining occurs over both skills/operators and concepts/axioms; and (3) selected skills are executed whenever applicable. and concepts/axioms; and (3) selected skills are executed whenever applicable.

A Successful Planning TraceA Successful Planning Trace

(ontable A T)

(on B A)

(on C B)

(hand-empty)

(clear C)

(unst. C B) (unstack C B) (clear B)

(putdown C T)

(unst. B A) (unstack B A) (clear A)

(holding C) (hand-empty)

(holding B)

ABC B

A C

initial stateinitial state

goalgoal

Three Questions about HTN LearningThree Questions about HTN Learning

What is the hierarchical structure of the network?What is the hierarchical structure of the network? What are the heads of the learned clauses/methods?What are the heads of the learned clauses/methods? What are the conditions on the learned clauses/methods? What are the conditions on the learned clauses/methods?

The answers follow naturally from our representation and from The answers follow naturally from our representation and from our approach to plan generation. our approach to plan generation.

Recording Results for LearningRecording Results for Learning

Solve(G)Solve(G) Push the goal literal G onto the empty goal stack GS. Push the goal literal G onto the empty goal stack GS. On each cycle, On each cycle, If the top goal G of the goal stack GS is satisfied, If the top goal G of the goal stack GS is satisfied, Then pop GS Then pop GS and let New be Learn(G).and let New be Learn(G). If G's parent P involved skill chaining, If G's parent P involved skill chaining, Then store New as P's first subskill. Then store New as P's first subskill. Else if G's parent P involved concept chaining, Else if G's parent P involved concept chaining, Then store New as P's next subskill. Then store New as P's next subskill. Else if the goal stack GS does not exceed the depth limit, Else if the goal stack GS does not exceed the depth limit, Let S be the skill instances whose heads unify with G. Let S be the skill instances whose heads unify with G. If any applicable skill paths start from an instance in S, If any applicable skill paths start from an instance in S, Then select one of these paths and execute it. Then select one of these paths and execute it. Else let M be the set of primitive skill instances that have not already failed in which G is an effect. Else let M be the set of primitive skill instances that have not already failed in which G is an effect. If the set M is nonempty, If the set M is nonempty, Then select a skill instance Q from M, Then select a skill instance Q from M, store Q with goal G as its last subskill,store Q with goal G as its last subskill, Push the start condition C of Q onto goal stack GSPush the start condition C of Q onto goal stack GS, and mark goal G as involving skill chaining., and mark goal G as involving skill chaining. Else if G is a complex concept with the unsatisfied subconcepts H and with satisfied subconcepts F, Else if G is a complex concept with the unsatisfied subconcepts H and with satisfied subconcepts F, Then if there is a subconcept I in H that has not yet failed, Then if there is a subconcept I in H that has not yet failed, Then push I onto the goal stack GS, Then push I onto the goal stack GS, store F with G as its initially true subconcepts, store F with G as its initially true subconcepts, and mark goal G as involving concept chaining.and mark goal G as involving concept chaining. Else pop G from the goal stack GS and store information about failure with G's parent. Else pop G from the goal stack GS and store information about failure with G's parent. Else pop G from the goal stack GS. Else pop G from the goal stack GS. Store information about failure with G's parent. Store information about failure with G's parent.

The extended problem solver The extended problem solver calls on calls on LearnLearn to construct a to construct a new skill clause and stores the new skill clause and stores the information it needs in the goal information it needs in the goal stack generated during search.stack generated during search.


What is the hierarchical structure of the network?What is the hierarchical structure of the network? The structure is determined by the subproblems solved The structure is determined by the subproblems solved

during planning, which, because both operator conditions during planning, which, because both operator conditions and goals are single literals, form a semilattice.and goals are single literals, form a semilattice.

What are the heads of the learned clauses/methods?What are the heads of the learned clauses/methods? What are the conditions on the learned clauses/methods? What are the conditions on the learned clauses/methods?

(ontable A T)

(on B A)

(on C B)

(hand-empty)

(clear C)


(putdown C T)



(holding B)

ABC B

A C

1

skill chainingskill chaining

Constructing Skills from a TraceConstructing Skills from a Trace

(ontable A T)

(on B A)

(on C B)

(hand-empty)

(clear C)


(putdown C T)



(holding B)

ABC B

A C

1

2



(ontable A T)

(on B A)

(on C B)

(hand-empty)

(clear C)


(putdown C T)



(holding B)

ABC B

A C

1 3

2

concept chainingconcept chaining


(ontable A T)

(on B A)

(on C B)

(hand-empty)

(clear C)


(putdown C T)



(holding B)

ABC B

A C

1 3

2

4



Learned Skills After Structure DeterminedLearned Skills After Structure Determined

(<head> (?C)(<head> (?C) :percepts :percepts ((block ?D) (block ?C))((block ?D) (block ?C)) :start :start <condition><condition> :skills :skills ((unstack ?D ?C)))((unstack ?D ?C)))

(<head> (?B)(<head> (?B) :percepts :percepts ((block ?C) (block ?B))((block ?C) (block ?B)) :start :start <condition><condition> :skills :skills ((unstackable ?C ?B) (unstack ?C ?B)))((unstackable ?C ?B) (unstack ?C ?B)))

(<head> (?C ?B)(<head> (?C ?B) :percepts :percepts ((block ?B) (block ?C))((block ?B) (block ?C)) :start :start <condition><condition> :skills :skills ((clear ?C) (hand-empty)))((clear ?C) (hand-empty)))

(<head> ( )(<head> ( ) :percepts :percepts ((block ?D) (table ?T1))((block ?D) (table ?T1)) :start :start <condition><condition> :skills :skills ((putdown ?D ?T1)))((putdown ?D ?T1)))




What are the heads of the learned clauses/methods?What are the heads of the learned clauses/methods? The head of a learned clause is the goal literal that the The head of a learned clause is the goal literal that the

planner achieved for the subproblem that produced it.planner achieved for the subproblem that produced it. What are the conditions on the learned clauses/methods? What are the conditions on the learned clauses/methods?

Learned Skills After Heads InsertedLearned Skills After Heads Inserted

(clear (?C)(clear (?C) :percepts :percepts ((block ?D) (block ?C))((block ?D) (block ?C)) :start :start <condition><condition> :skills :skills ((unstack ?D ?C)))((unstack ?D ?C)))

(clear (?B)(clear (?B) :percepts :percepts ((block ?C) (block ?B))((block ?C) (block ?B)) :start :start <condition><condition> :skills :skills ((unstackable ?C ?B) (unstack ?C ?B)))((unstackable ?C ?B) (unstack ?C ?B)))

(unstackable (?C ?B)(unstackable (?C ?B) :percepts :percepts ((block ?B) (block ?C))((block ?B) (block ?C)) :start :start <condition><condition> :skills :skills ((clear ?C) (hand-empty)))((clear ?C) (hand-empty)))

(hand-empty ( )(hand-empty ( ) :percepts :percepts ((block ?D) (table ?T1))((block ?D) (table ?T1)) :start :start <condition><condition> :skills :skills ((putdown ?D ?T1)))((putdown ?D ?T1)))




What are the heads of the learned clauses/methods?What are the heads of the learned clauses/methods? The head of a learned clause is the goal literal that the The head of a learned clause is the goal literal that the

planner achieved for the subproblem that produced it.planner achieved for the subproblem that produced it. What are the conditions on the learned clauses/methods? What are the conditions on the learned clauses/methods?

If the subproblem involved skill chaining, they are the If the subproblem involved skill chaining, they are the conditions of the first subskill clause.conditions of the first subskill clause.

If the subproblem involved concept chaining, they are the If the subproblem involved concept chaining, they are the subconcepts that held at the outset of the subproblem. subconcepts that held at the outset of the subproblem.

Learned Skills After Conditions InferredLearned Skills After Conditions Inferred

(clear (?C)(clear (?C) :percepts :percepts ((block ?D) (block ?C))((block ?D) (block ?C)) :start :start (unstackable ?D ?C)(unstackable ?D ?C) :skills :skills ((unstack ?D ?C)))((unstack ?D ?C)))

(clear (?B)(clear (?B) :percepts :percepts ((block ?C) (block ?B))((block ?C) (block ?B)) :start :start [(on ?C ?B) (hand-empty)][(on ?C ?B) (hand-empty)] :skills :skills ((unstackable ?C ?B) (unstack ?C ?B)))((unstackable ?C ?B) (unstack ?C ?B)))

(unstackable (?C ?B)(unstackable (?C ?B) :percepts :percepts ((block ?B) (block ?C))((block ?B) (block ?C)) :start :start [(on ?C ?B) (hand-empty)][(on ?C ?B) (hand-empty)] :skills :skills ((clear ?C) (hand-empty)))((clear ?C) (hand-empty)))

(hand-empty ( )(hand-empty ( ) :percepts :percepts ((block ?D) (table ?T1))((block ?D) (table ?T1)) :start :start (putdownable ?D ?T1)(putdownable ?D ?T1) :skills :skills ((putdown ?D ?T1)))((putdown ?D ?T1)))

Learning an HTN Method from a Problem SolutionLearning an HTN Method from a Problem Solution

Learn(G)Learn(G) If the goal G involves skill chaining, If the goal G involves skill chaining, Then let S Then let S11 and S and S22 be G's first and second subskills. be G's first and second subskills. If subskill S If subskill S11 is empty, is empty, Then return the literal for clause S Then return the literal for clause S22.. Else create a new skill clause N with head G, Else create a new skill clause N with head G, with S with S11 and S and S22 as ordered subskills, and as ordered subskills, and with the same start condition as subskill S with the same start condition as subskill S11.. Return the literal for skill clause N. Return the literal for skill clause N. Else if the goal G involves concept chaining, Else if the goal G involves concept chaining, Then let {C Then let {Ck+1k+1, ..., C, ..., Cnn} be G's initially satisfied subconcepts.} be G's initially satisfied subconcepts. Let {C Let {C11, ..., C, ..., Ckk} be G's stored subskills.} be G's stored subskills. Create a new skill clause N with head G, Create a new skill clause N with head G, with {C with {Ck+1k+1, ..., C, ..., Cnn} as ordered subskills, and} as ordered subskills, and with the conjunction of {C with the conjunction of {C11, ..., C, ..., Ckk} as start condition.} as start condition. Return the literal for skill clause N. Return the literal for skill clause N.

Creating a Clause from Skill ChainingCreating a Clause from Skill Chaining

1

83X Z

2

81 2X ZYProblemProblemSolutionSolution

NewNewMethodMethod

Creating a Clause from Concept ChainingCreating a Clause from Concept Chaining

ProblemProblemSolutionSolution

NewNewMethodMethod

8

2

1

Y

BX

CZ

D

A

A

D#

B

83 Z

C

Important Features of Learning MethodImportant Features of Learning Method

it occurs incrementally from one experience at a time; it occurs incrementally from one experience at a time; it takes advantage of existing background knowledge; it takes advantage of existing background knowledge; it constructs the hierarchies in a it constructs the hierarchies in a cumulativecumulative manner. manner.

Our approach to learning HTNs has some important features: Our approach to learning HTNs has some important features:

In these ways, it is similar to explanation-based approaches to In these ways, it is similar to explanation-based approaches to learning from problem solving.learning from problem solving.

However, the method for finding conditions involves neither However, the method for finding conditions involves neither analytic or inductive learning in their traditional senses. analytic or inductive learning in their traditional senses.

An In-City Driving EnvironmentAn In-City Driving Environment

Our focus on learning for Our focus on learning for reactive control comes reactive control comes from an interest in complex from an interest in complex physical domains, such as physical domains, such as driving a vehicle in a city.driving a vehicle in a city.

To study this problem, we To study this problem, we have developed a realistic have developed a realistic simulated environment that simulated environment that can support many different can support many different driving tasks. driving tasks.

Skill Clauses Learning for In-City DrivingSkill Clauses Learning for In-City Driving

parked (?ME ?G1152)parked (?ME ?G1152) :percepts :percepts (( (lane-line ?G1152) (self ?ME))(lane-line ?G1152) (self ?ME)) :start :start ( )( ) :skills :skills (( (in-rightmost-lane ?ME ?G1152)(in-rightmost-lane ?ME ?G1152) (stopped ?ME))(stopped ?ME))

in-rightmost-lane (?ME ?G1152) in-rightmost-lane (?ME ?G1152) :percepts :percepts (( (self ?ME) (lane-line ?G1152))(self ?ME) (lane-line ?G1152)) :start :start (( (last-lane ?G1152))(last-lane ?G1152)) :skills :skills (( (driving-in-segment ?ME ?G1101 ?G1152))(driving-in-segment ?ME ?G1101 ?G1152))

driving-in-segment (?ME ?G1101 ?G1152)driving-in-segment (?ME ?G1101 ?G1152) :percepts :percepts (( (lane-line ?G1152) (segment ?G1101) (self ?ME))(lane-line ?G1152) (segment ?G1101) (self ?ME)) :start :start (( (steering-wheel-straight ?ME))(steering-wheel-straight ?ME)) :skills :skills (( (in-lane ?ME ?G1152)(in-lane ?ME ?G1152) (centered-in-lane ?ME ?G1101 ?G1152)(centered-in-lane ?ME ?G1101 ?G1152) (aligned-with-lane-in-segment ?ME ?G1101 ?G1152)(aligned-with-lane-in-segment ?ME ?G1101 ?G1152) (steering-wheel-straight ?ME))(steering-wheel-straight ?ME))

Learning Curves for In-City DrivingLearning Curves for In-City Driving

Transfer Studies of HTN LearningTransfer Studies of HTN Learning

Because we were interested in our method’s ability to transfer its Because we were interested in our method’s ability to transfer its learned skills to harder problems, we: learned skills to harder problems, we:

created concepts and operators for Blocks World and FreeCell;created concepts and operators for Blocks World and FreeCell;

let the system solve and learn from simple training problems;let the system solve and learn from simple training problems;

asked the system to solve and learning from harder test tasks;asked the system to solve and learning from harder test tasks;

recorded the number of steps taken and solution probability;recorded the number of steps taken and solution probability;

as a function of the number of transfer problems encountered; as a function of the number of transfer problems encountered;

averaged the results over many different problem orders. averaged the results over many different problem orders.

The resulting transfer curves revealed the system’s ability to take The resulting transfer curves revealed the system’s ability to take advantage of prior learning and generalize to new situations. advantage of prior learning and generalize to new situations.

Transfer Effects in the Blocks WorldTransfer Effects in the Blocks World

On 20-block tasks, there is no difference in solved problems. On 20-block tasks, there is no difference in solved problems.

20 blocks

Transfer Effects in the Blocks WorldTransfer Effects in the Blocks World

However, there is difference in the effort needed to solve them. However, there is difference in the effort needed to solve them.

20 blocks

FreeCell SolitaireFreeCell Solitaire

FreeCell is a full-information card game that, in most cases, can FreeCell is a full-information card game that, in most cases, can be solved by planning; it also has a highly recursive structure. be solved by planning; it also has a highly recursive structure.

Transfer Effects in FreeCellTransfer Effects in FreeCell

On 16-card FreeCell tasks, prior training aids solution probability.On 16-card FreeCell tasks, prior training aids solution probability.

16 cards


However, it also lets the system solve problems with less effort. However, it also lets the system solve problems with less effort.

16 cards


On 20-card tasks, the benefits of prior training are much stronger.On 20-card tasks, the benefits of prior training are much stronger.

20 cards


However, it also lets the system solve problems with less effort. However, it also lets the system solve problems with less effort.

20 cards

Where is the Utility Problem?Where is the Utility Problem?

Many previous studies of learning and planning found that: Many previous studies of learning and planning found that:

learned knowledge reduced problem-solving steps and searchlearned knowledge reduced problem-solving steps and search

but increased CPU time because it was specific and expensivebut increased CPU time because it was specific and expensive

We have not yet observed the utility problem, possibly because:We have not yet observed the utility problem, possibly because:

the problem solver does not chain off learned skill clauses; the problem solver does not chain off learned skill clauses;

our performance module does not attempt to eliminate search. our performance module does not attempt to eliminate search.

If we encounter it in future domains, we will collect statistics on If we encounter it in future domains, we will collect statistics on clauses to bias selection, like Minton (1988) and others. clauses to bias selection, like Minton (1988) and others.

Related Work on Planning and ExecutionRelated Work on Planning and Execution

problem-solving architectures like Soar and Prodigyproblem-solving architectures like Soar and Prodigy Nilsson’s (1994) notion of teleoreactive controllersNilsson’s (1994) notion of teleoreactive controllers execution architectures that use HTNs to structure knowledgeexecution architectures that use HTNs to structure knowledge Nau et al.’s encoding of HTNs for use in plan generationNau et al.’s encoding of HTNs for use in plan generation

Our approach to planning and execution bears similarities to: Our approach to planning and execution bears similarities to:

These mappings are valid but provide no obvious approach to These mappings are valid but provide no obvious approach to learning HTN structures from successful plans.learning HTN structures from successful plans.

Erol et al.’s (1994) complexity analysis of HTN planningErol et al.’s (1994) complexity analysis of HTN planning Barrett and Weld’s (1994) use of HTNs for plan parsingBarrett and Weld’s (1994) use of HTNs for plan parsing

Other mappings between classical and HTN planning come from:Other mappings between classical and HTN planning come from:

Related Research on LearningRelated Research on Learning

production composition (e.g., Neves & Anderson, 1981)production composition (e.g., Neves & Anderson, 1981) macro-operator formation (e.g., Iba, 1985)macro-operator formation (e.g., Iba, 1985) explanation-based learning (e.g., Mitchell et al., 1986)explanation-based learning (e.g., Mitchell et al., 1986) chunking in Soar (Laird, Rosenbloom, & Newell, 1986)chunking in Soar (Laird, Rosenbloom, & Newell, 1986)

Our learning mechanisms are similar to those in earlier work on:Our learning mechanisms are similar to those in earlier work on:

which also learned decomposition rules from problem solutions. which also learned decomposition rules from problem solutions.

Marsella and Schmidt’s (1993) REAPPRMarsella and Schmidt’s (1993) REAPPR Ruby and Kibler’s (1993) SteppingStoneRuby and Kibler’s (1993) SteppingStone Reddy and Tadepalli’s (1997) X-LearnReddy and Tadepalli’s (1997) X-Learn

But they do not rely on analytical schemes like goal regression, But they do not rely on analytical schemes like goal regression, and their creation of hierarchical structures is closer to that by:and their creation of hierarchical structures is closer to that by:

The IThe ICARUSCARUS Architecture Architecture

Long-TermLong-TermConceptualConceptual

MemoryMemory

Long-TermLong-TermSkill MemorySkill Memory

Short-TermShort-TermConceptualConceptual

MemoryMemory

Goal/SkillGoal/SkillStackStack

CategorizationCategorizationand Inferenceand Inference

SkillSkillExecutionExecution

PerceptionPerception

EnvironmentEnvironment

PerceptualPerceptualBufferBuffer

Problem SolvingProblem SolvingSkill LearningSkill Learning

MotorMotorBufferBuffer

SkillSkillRetrievalRetrieval

Hierarchical Structure of Long-Term MemoryHierarchical Structure of Long-Term Memory

conceptsconcepts

skillsskills

Each concept is defined in terms of other concepts and/or percepts.Each concept is defined in terms of other concepts and/or percepts.

Each skill is defined in terms of other skills, concepts, and percepts.Each skill is defined in terms of other skills, concepts, and percepts.

IICARUS CARUS organizes both concepts and skills in a hierarchical organizes both concepts and skills in a hierarchical manner.manner.

Interleaved Nature of Long-Term MemoryInterleaved Nature of Long-Term Memory

conceptsconcepts

skillsskills

For example, the skill highlighted here refers directly to For example, the skill highlighted here refers directly to the highlighted concepts.the highlighted concepts.

IICARUS CARUS interleaves its long-term memories for concepts and skills.interleaves its long-term memories for concepts and skills.

Recognizing Concepts and Selecting SkillsRecognizing Concepts and Selecting Skills

conceptsconcepts

skillsskills

Concepts are matched bottom up, starting from percepts.Concepts are matched bottom up, starting from percepts.

Skill paths are matched top down, starting from Skill paths are matched top down, starting from intentions.intentions.

IICARUS CARUS matches patterns to recognize concepts and select skills.matches patterns to recognize concepts and select skills.

Directions for Future WorkDirections for Future Work

Despite our initial progress on structure learning, we should still:Despite our initial progress on structure learning, we should still: evaluate approach on more complex planning domains;evaluate approach on more complex planning domains; extend method to support HTN planning rather than execution;extend method to support HTN planning rather than execution; generalize the technique to acquire partially ordered skills;generalize the technique to acquire partially ordered skills; adapt scheme to work with more powerful planners; adapt scheme to work with more powerful planners; extend method to chain backward off learned skill clauses;extend method to chain backward off learned skill clauses; add technique to learn recursive concepts for preconditions; add technique to learn recursive concepts for preconditions; examine and address the examine and address the utility problemutility problem for skill learning. for skill learning.

These should make our approach a more effective tool for learning These should make our approach a more effective tool for learning hierarchical task networks from classical planning. hierarchical task networks from classical planning.

relies on a new formalism – relies on a new formalism – teleoreactive logic programsteleoreactive logic programs – – that identifies heads with goals and has single preconditions;that identifies heads with goals and has single preconditions;

executes stored HTN methods when they are applicable but executes stored HTN methods when they are applicable but resorts to classical planning when needed;resorts to classical planning when needed;

caches the results of successful plans in new HTN methods caches the results of successful plans in new HTN methods using a simple learning technique; using a simple learning technique;

creates recursive, hierarchical structures from individual creates recursive, hierarchical structures from individual problems in an incremental and cumulative manner. problems in an incremental and cumulative manner.

We have described an approach to planning and execution that:We have described an approach to planning and execution that:

This approach holds promise for bridging the longstanding divide This approach holds promise for bridging the longstanding divide between two major paradigms for planning. between two major paradigms for planning.

Concluding RemarksConcluding Remarks

End of PresentationEnd of Presentation

Documents

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information