Learning Control Knowledge for Planning
Yi-Cheng Huang
Outline
I. Brief overview of planning
II. Planning with Control knowledge
III. Learning control knowledge
IV. Conclusion
I. Overview of Planning
Planning - a very general framework for many applications: robot control; airline scheduling; Hubble Space Telescope control.
Planning – find a sequence of actions that leads from an initial state to a goal state.
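To make the definition concrete, here is a minimal sketch of forward state-space search over STRIPS-style actions; the fact encoding, the action tuples, and the toy logistics problem are illustrative assumptions rather than anything from the slides.

```python
from collections import deque

def forward_search(initial, goal, actions):
    """Breadth-first forward search: return a sequence of action names
    leading from the initial state to a state that satisfies the goal."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # every goal fact holds
            return plan
        for name, pre, add, dele in actions:     # grounded STRIPS-style actions
            if pre <= state:                     # preconditions satisfied
                nxt = frozenset((state - dele) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

# Toy problem (invented): deliver package p1 from A to B with truck t1.
actions = [
    ("load",   {"at(p1,A)", "at(t1,A)"},  {"in(p1,t1)"}, {"at(p1,A)"}),
    ("drive",  {"at(t1,A)"},              {"at(t1,B)"},  {"at(t1,A)"}),
    ("unload", {"in(p1,t1)", "at(t1,B)"}, {"at(p1,B)"},  {"in(p1,t1)"}),
]
print(forward_search({"at(p1,A)", "at(t1,A)"}, {"at(p1,B)"}, actions))
# -> ['load', 'drive', 'unload']
```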
Planning Is Difficult – Abundance of Negative Complexity Results
Domain-independent planning: PSPACE-complete or worse (Chapman 1987; Bylander 1991; Backstrom 1993).
Domain-dependent planning: NP-complete or worse (Chenoweth 1991; Gupta and Nau 1992).
Approximate planning: NP-complete or worse (Selman 1994).
Recent State-of-the-art Planners
Constraint-based Planners – Graphplan, Blackbox.
Heuristic Search Planners – HSP, FF.
Both kinds of planners can solve problems in seconds or minutes that would take traditional planners hours or days.
Graphplan (Blum & Furst, 1995)
[Diagram: planning graph with alternating fact and action layers from time i to time i+1]
Search on the planning graph to find a plan.
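A rough sketch of the graph-expansion step, reusing the action format from the earlier search sketch; delete effects and Graphplan's mutex reasoning are omitted, so this shows only the relaxed skeleton of the idea, not Graphplan itself.

```python
def build_planning_graph(initial, goal, actions, max_levels=10):
    """Alternate fact and action layers until every goal fact appears,
    or the graph levels off (simplified, mutex-free Graphplan expansion)."""
    fact_layers = [frozenset(initial)]
    action_layers = []
    for _ in range(max_levels):
        facts = fact_layers[-1]
        if goal <= facts:
            # Goals are reachable; Graphplan would now search backward
            # through the layers to extract an actual plan.
            return fact_layers, action_layers
        applicable = [a for a in actions if a[1] <= facts]
        action_layers.append([a[0] for a in applicable])
        new_facts = set(facts)
        for _, _, add, _ in applicable:          # ignore delete effects here
            new_facts |= add
        if new_facts == facts:                   # leveled off without the goals
            return None
        fact_layers.append(frozenset(new_facts))
    return None
```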
Blackbox (Kautz & Selman, 1999)
[Diagram: planning problem → CNF encoding, e.g. (a ∨ b)(c ∨ d) … (x ∨ y ∨ z) → Satisfiability Tester (Chaff, WalkSat, Satz, RelSat, ...) → plan]
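The planning-as-satisfiability idea in miniature: a bounded-horizon planning problem becomes a CNF formula whose models correspond to plans. The three variables and four clauses below are invented for illustration, and satisfiability is checked by brute force instead of calling a real solver such as Chaff or WalkSat.

```python
from itertools import product

# Invented bounded-horizon encoding:
#   p0: precondition holds at step 0,  a0: action executes at step 0,  g1: goal holds at step 1
# Clauses: p0;  (not a0 or p0) precondition axiom;  (not g1 or a0) achiever axiom;  g1 goal.
variables = ["p0", "a0", "g1"]
clauses = [[("p0", True)],
           [("a0", False), ("p0", True)],
           [("g1", False), ("a0", True)],
           [("g1", True)]]

def satisfiable(variables, clauses):
    """Brute-force SAT check: try every assignment (fine for toy encodings)."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        # a clause is a list of (variable, sign) literals; one literal must match
        if all(any(model[v] == sign for v, sign in clause) for clause in clauses):
            return model
    return None

print(satisfiable(variables, clauses))   # {'p0': True, 'a0': True, 'g1': True}
```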
Heuristic Search Based Planning (Bonet & Geffner, ’97)
Use various heuristic functions to approximate the distance from the current state to the goal state based on the planning graph.
Use Best-First Search or A* search to find plans.
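A sketch of the search side: greedy best-first search guided by a goal-counting heuristic. The goal count is only a crude stand-in for the relaxed-plan heuristics HSP and FF actually derive; the action format is the same illustrative one used earlier.

```python
import heapq

def best_first_search(initial, goal, actions):
    """Greedy best-first search; h(state) counts unsatisfied goal facts."""
    h = lambda state: len(goal - state)
    start = frozenset(initial)
    frontier = [(h(start), 0, start, [])]
    visited = {start}
    counter = 1                                   # unique tie-breaker for the heap
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal <= state:
            return plan
        for name, pre, add, dele in actions:
            if pre <= state:
                nxt = frozenset((state - dele) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    heapq.heappush(frontier, (h(nxt), counter, nxt, plan + [name]))
                    counter += 1
    return None
```

Using the plan length plus the heuristic as the priority instead would give an A*-style search.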
II. Planning With Control
The general focus in planning: avoid search as much as possible.
Many real-world applications are tailored and simplified by domain-specific knowledge.
TLPlan is an efficient planner that uses control knowledge to guide a forward-chaining search (Bacchus & Kabanza 2000).
TLPlan
Temporal Logic Control Formula
A Simple Control Rule Example
Do NOT move an object at the goal location
always: (goal(at(obj, loc)) ∧ at(obj, loc)) → next at(obj, loc)
Temporal logic operators: “always” and “next”.
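One plausible way to act on such a rule inside a forward-chaining planner is to discard successor states that violate it. The fact representation below (facts as tuples, goals as a set of facts) is an assumption for illustration, not TLPlan's actual machinery.

```python
def violates_control_rule(state, successor, goal):
    """Enforce 'always (goal(at(obj,loc)) and at(obj,loc)) -> next at(obj,loc)':
    if an object sits at its goal location now, it must still be there next."""
    for fact in state:
        if fact[0] == "at" and fact in goal and fact not in successor:
            return True
    return False

# During forward search, any successor for which violates_control_rule(...)
# returns True is simply never added to the frontier.
```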
Question:
Can the same level of control be effectively incorporated into a constraint-based planner?
Control Rule Categories
I. Rules involving only static information.
II. Rules depending on the current state.
III. Rules depending on the current state and requiring dynamic user-defined predicates.
Category I Control Rules (only depend on the goal; toy example)
Do NOT unload a package from an airplane if the current location is not the package’s goal location.
[Diagram: airplane a at location L carrying a package whose goal location is elsewhere]
always: (in(p, a) ∧ at(a, l) ∧ ¬goal(at(p, l))) → next in(p, a)
Pruning the Planning Graph: Category I Rules
[Diagram: planning graph with fact and action layers; action nodes violating Category I rules are removed]
Effect of Graph Pruning
[Bar chart: number of nodes in the original vs. pruned planning graph for problems log-a through log-d]
Category II Control Rules
Do NOT move an airplane if there is an object in the airplane that needs to be unloaded at that location.
[Diagram: airplane a at location L carrying an object whose goal location is L]
always: ∀a, l: (at(a, l) ∧ ∃p (in(p, a) ∧ goal(at(p, l)))) → next at(a, l)
Control by Adding Constraints
[Diagram: Temporal Logic Control Rules → Constraint Clauses added to the Planning Formula]
Ground clauses of the form (¬x_i ∨ ¬y_i ∨ y_{i+1}) link fluents at time i to time i+1; the Category II rule above is compiled into such clauses for every airplane, location, and time step.
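A sketch of what such a compilation might look like for the Category I rule, emitting one group of clauses per time step. The literal format (fact, time, sign) and the exact clause shape are assumptions; Blackbox's real encoding differs in detail.

```python
def category1_rule_clauses(packages, airplanes, locations, goal_loc, horizon):
    """For each time step t: if package p is in airplane a, a is at l, and l is
    not p's goal location, then p must still be in a at t+1.  Clause form:
        not in(p,a,t)  or  not at(a,l,t)  or  in(p,a,t+1)"""
    clauses = []
    for t in range(horizon):
        for p in packages:
            for a in airplanes:
                for l in locations:
                    if goal_loc.get(p) != l:        # rule only applies away from p's goal
                        clauses.append([(("in", p, a), t, False),
                                        (("at", a, l), t, False),
                                        (("in", p, a), t + 1, True)])
    return clauses
```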
Rules Without Compact Encoding
[Diagram: map with cities NYC, SFO, ORL, DC, vehicles a and b, and a goal configuration]
Do NOT move a vehicle unless (a) there is an object that needs to be picked up, or (b) there is an object in the vehicle that needs to be unloaded.
Complex Encoding for Category III Rules
Need to define extra predicates:
need_to_move_by_airplane; need_to_unload_by_airplane. These introduce extra literals and clauses.
O(mn) ground literals; O(mn+km^2) clauses at each time step.
m: #cities, n: #objects, k: #airports
No easy encoding for category III rules. However, it appears category I & II rules do most of the work.
Blackbox with Control Knowledge (Logistics domain with hand-coded rules)
[Plot: run time in seconds (logarithmic scale) on problems log-a through log-e for blackbox, blackbox(I), blackbox(II), and blackbox(I&II)]
Comparison of Blackbox and TLPlan (Run Time)
[Plot: run time in seconds on problems log-a through log-e for TLPlan and Blackbox(I&II)]
Comparison of Blackbox and TLPlan (parallel plan length; “plan quality”)
[Plot: parallel plan length on problems log-c, log-d, log-e, log-1, log-2 for TLPlan, TLPlan-R, and Blackbox]
Summary: Adding Control Knowledge
We have shown how to add declarative control knowledge to a constraint-based planner by using temporal logic statements.
Adding such knowledge gives significant speedups (up to two orders of magnitude).
Pure heuristic search with control can still be faster, but with much lower plan quality.
III. Can we learn domain knowledge from example plans?
Motivation
Control Rules used in TLPlan and Blackbox are hand-coded.
Idea: learn control rules from a sequence of small problems solved by the planner.
Learning System Framework
[Diagram: Problem → Blackbox Planner → Plan Justification / Type Inference → ILP Learning Module / Verification → Control Rules]
Target Concepts for Actions
Action Select Rule: indicates conditions under which the action can be performed immediately.
Action Reject Rule: indicates conditions under which it must not be performed.
Basic Assumption on Learning Control
Plans found by the planner on simple problems are optimal or near-optimal.
Actions that appear in an optimal plan must be selected.
Actions that can be executed but do not appear in the plan must be rejected.
Definitions
Real action: an action that appears in the plan.
Virtual action: an action whose preconditions hold but which does not appear in the plan.
A Toy Planning Example
[Diagram: initial and goal states over airports BOS, NYC, SFO with packages a and b]
Real & Virtual Actions for UnloadAirplane
Time 1: LoadAirplane (P a BOS) [real]
Time 2: FlyAirplane (P BOS NYC) [real]; UnloadAirplane (P a BOS) [virtual]
Time 3: LoadAirplane (P b NYC) [real]; UnloadAirplane (P a NYC) [virtual]
Time 4: FlyAirplane (P NYC SFO) [real]; UnloadAirplane (P a NYC), UnloadAirplane (P b NYC) [virtual]
Time 5: UnloadAirplane (P a SFO), UnloadAirplane (P b SFO) [real]
Heuristics for Extracting Examples
Select Rule: positive examples = real actions, negative examples = virtual actions.
Reject Rule: positive examples = virtual actions, negative examples = real actions.
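A small sketch of this labeling step, assuming the plan actions and the per-step applicable actions have already been computed by the plan-justification stage; the data shapes (dicts keyed by time step, actions as tuples) are illustrative.

```python
def extract_examples(plan_steps, applicable_steps, action_name):
    """Label training examples for one action schema.
    Real    = executed in the plan at that step.
    Virtual = preconditions held, but the action was not executed.
    Select rule:  positives are real,    negatives are virtual.
    Reject rule:  positives are virtual, negatives are real."""
    select_pos, select_neg = [], []
    for t, applicable in applicable_steps.items():
        real = set(plan_steps.get(t, []))
        for act in applicable:
            if act[0] != action_name:
                continue
            (select_pos if act in real else select_neg).append((t, act))
    # Reject-rule examples are the same sets with the labels swapped.
    return select_pos, select_neg, select_neg, select_pos
```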
Rule Induction
Candidate literals: Xi = Xj, e.g., loc1 = loc2; P(X1, …, Xn), e.g., at(pkg, loc); goal(P(X1, …, Xn)), e.g., goal(at(pkg, loc)); and negations of the above.
Rule form: action(…) ← conjunction of literals.
Based on Quinlan’s FOIL (Quinlan 1990; 1996).
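A much-simplified, FOIL-flavored induction sketch: greedy specialization driven by plain coverage counts rather than FOIL's information gain, with candidate literals supplied as ready-made predicates. The example data mirrors the UnloadAirplane reject-rule tables on the following slides; everything else is an illustrative assumption.

```python
def learn_rule(positives, negatives, candidate_literals):
    """Add the literal that best separates positives from negatives,
    repeating until no negative example is covered (greedy specialization)."""
    body, pos, neg = [], list(positives), list(negatives)
    while neg:
        name, test = max(candidate_literals.items(),
                         key=lambda kv: sum(kv[1](e) for e in pos)
                                        - sum(kv[1](e) for e in neg))
        new_neg = [e for e in neg if test(e)]
        if len(new_neg) == len(neg):       # chosen literal removes no negatives: stop
            return None
        body.append(name)
        pos = [e for e in pos if test(e)]
        neg = new_neg
        if not pos:                        # over-specialized: no positives left
            return None
    return body

# Reject-rule examples for UnloadAirplane(pln, pkg, apt), as in the tables below:
pos = [{"apt": "BOS", "goal_loc": "SFO"}, {"apt": "NYC", "goal_loc": "SFO"}]
neg = [{"apt": "SFO", "goal_loc": "SFO"}]
literals = {"goal(at(pkg, loc)) and apt != loc": lambda e: e["apt"] != e["goal_loc"]}
print(learn_rule(pos, neg, literals))      # ['goal(at(pkg, loc)) and apt != loc']
```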
Reject Rule: UnloadAirplane
UnloadAirplane (pln pkg apt)
label time pln pkg apt
+ 2 P a BOS
+ 3 P a NYC
+ 4 P a NYC
+ 4 P a NYC
- 5 P a SFO
- 5 P a SFO
Reject Rule: UnloadAirplane
UnloadAirplane (pln pkg apt) ← goal(at (pkg loc))
label time pln pkg apt loc
+ 2 P a BOS SFO
+ 3 P a NYC SFO
+ 4 P a NYC SFO
+ 4 P a NYC SFO
- 5 P a SFO SFO
- 5 P a SFO SFO
Reject Rule: UnloadAirplane
UnloadAirplane (pln pkg apt) ← goal(at (pkg loc)) ∧ (apt != loc)
label time pln pkg apt loc
+ 2 P a BOS SFO
+ 3 P a NYC SFO
+ 4 P a NYC SFO
+ 4 P a NYC SFO
- 5 P a SFO SFO
- 5 P a SFO SFO
Learning Time
[Bar chart: learning time in seconds for logistics (10), briefcase (3), grid (6), gripper (2), mystery (6), and tireworld (5)]
Logistics Domain
[Plot: run time in seconds (logarithmic scale) per problem, with and without learned control rules]
Learned Logistics Control Rules
If an object’s goal location is in a different city, do NOT unload the object from the airplane:
always: (in(o, p) ∧ at(p, l) ∧ goal(at(o, m)) ∧ incity(l, c) ∧ ¬incity(m, c)) → next in(o, p)
Unload an object from a truck if the current location is an airport and it is not in the same city as the package’s goal location:
(in(o, t) ∧ at(t, a) ∧ airport(a) ∧ goal(at(o, l)) ∧ ¬(incity(l, c) ∧ incity(a, c))) → next at(o, a)
Briefcase Domain
[Plot: run time in seconds (logarithmic scale) per problem, with and without learned control rules]
Grid Domain
[Plot: run time in seconds (logarithmic scale) per problem, with and without learned control rules]
Gripper Domain
[Plot: run time in seconds (logarithmic scale) per problem, with and without learned control rules]
Mystery Domain
[Plot: run time in seconds (logarithmic scale) per problem, with and without learned control rules]
Tireworld Domain
[Plot: run time in seconds (logarithmic scale) per problem, with and without learned control rules]
Summary of Learning for Planning
Introduced inductive logic programming methodology into a constraint-based planning framework to obtain a “trainable planner”.
Demonstrated clear practical speedups on a range of benchmark problems.
IV. Single-agent vs. Multi-agent planning
Observations: heuristic planners degrade rapidly in multi-agent settings. They tend to assign all work to a single agent.
We studied this phenomenon by exploring different work-load distributions.
Force the Planners
There is no easy way to modify the heuristic search planners to find better quality plans.
Limit the number of actions each agent can perform, forcing the planners to find plans with the same level of participation from all agents.
Sokoban Domain
Restricted Sokoban Domain
[Bar chart: run time (logarithmic scale) for Blackbox, HSP, and FF on sokoban-1 (4,4,4), sokoban-2 (3,3,3), and sokoban-3 (5,5,5)]
Complexity Analysis on Restricted Domain
Domain    Complexity                                          Better suited
Sokoban   PSPACE-complete (Culberson, 1997)                   constraint-based planner (C.B.P.)
Rocket    NP-complete (reduction from feedback vertex set)    constraint-based planner (C.B.P.)
Grid      polynomial-time solvable                            heuristic planner (H.P.)
Elevator  polynomial-time solvable                            heuristic planner (H.P.)
Conclusions (a)
Demonstrated how performance of state-of-the-art general purpose planning systems can be boosted by incorporating control knowledge.
Knowledge encoded in purely declarative form using temporal logic formulas.
Obtained up to two orders of magnitude speedup on a series of benchmarks.
Conclusions (b)
Demonstrated feasibility of a “trainable” planning system: the system learns domain / control knowledge from many small example plans.
Based on concepts from inductive logic programming. Learned knowledge in temporal logic form.
First demonstration of practical speedups using learning in a planning system on realistic benchmarks.
Approach avoids learning “accidental truths” that can hurt system performance (a problem in earlier systems).
Conclusions (c)
Uncovered link between performance of planners and inherent complexity of planning task.
Heuristic search planners work well on problems solvable in polynomial time with specialized algorithms.
Constraint-based planners dominate on NP-complete planning tasks.
Conclusion
Comparison of constraint-based planner and heuristic search planner shows that they complement each other on different domains.
Hand-coded control knowledge can be effectively applied in constraint-based planners.
Conclusion (cont.)
Our learning system is simple and modular; learning time is short.
Learned rules are on par with hand-coded ones and were shown to improve performance by over two orders of magnitude.
Learned rules are in logic form and can be used on other planning systems.
Demonstrated a way to effectively learn domain knowledge from small example plans. Learned control knowledge boosts performance on larger problems. First clear demonstration of boosting planning system performance through learning.
Declarative, logic-based approach is general and fits wide range of planning applications.
The End