From high level goals to policies: a polynomial time algorithm for k-maintainable goals

Chitta Baral, Arizona State University

(joint work with Marcus Bjareland, Thomas Eiter, Mutsumi Nakamura, and Tran Son)

Quick overview of my research

Knowledge Representation and Reasoning
- Language design; theoretical building blocks; implementation; applications.

Action, change and histories
- Developing languages for representing actions, the structure of the world, and the effects of the actions on the world.
- Developing languages for expressing goals or directives.
- Developing ways to achieve goals.
- Formulating various kinds of reasoning (e.g. prediction, planning, explanation, diagnosis, counterfactuals).

Application of the above to modeling cell behavior
- Prediction: (side) effects of drugs
- Planning: drug design
- Explanation: explaining unusual behavior; medical diagnosis
- Others: hypothesis generation

Motivation: Parameterized maintainability goals

Always f, also written as □ f
- Too strong for many kinds of maintainability (e.g. maintaining the room clean).

Always Eventually f, also written as □ ◊ f
- Weak in the sense that it gives no estimate of when f will be made true.
- May not be achievable in the presence of continuous interference by belligerent agents.

□ f ------------------ □ ◊_k f -------------------------- □ ◊ f

□ ◊_3 f is a shorthand for □ ( f ∨ O f ∨ OO f ∨ OOO f ), where O is the "next" operator.

But if an external agent keeps interfering, how is one supposed to guarantee □ ◊_3 f?
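As a small illustration of the bounded operator, the sketch below checks □ ◊_k f on a finite trace of truth values for f. The function name and the finite-trace convention (positions whose k-step window runs past the end of the trace are skipped) are assumptions made for this example; the formula itself is defined over infinite runs.

```python
def always_eventually_within(trace, k):
    """Check that f holds somewhere in every window of k + 1 consecutive steps.

    `trace` is a list of booleans (True where f holds). Assumption: only
    positions with a complete window of k further steps are checked.
    """
    return all(any(trace[i:i + k + 1]) for i in range(len(trace) - k))


clean = [False, False, False, True, False, False, False, True]
print(always_eventually_within(clean, 3))
print(always_eventually_within(clean, 2))
```

The second call uses a tighter bound on the same trace, which shows how the parameter k moves the goal between □ f (k = 0) and the unbounded □ ◊ f.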

Motivation: a controller-agent transcript

Controller (to the agent/robot): Your goal is to maintain the room clean.

Robot/Agent: Can you be precise about what you mean by ‘maintain’? Also can I clean anytime or are there restrictions?

Controller: You can only clean when the room is unoccupied.

Controller: By ‘maintain’ I mean ALWAYS clean.

Robot/Agent: I won’t be able to guarantee that. What if someone makes it dirty while the room is occupied?

Controller: OK, I understand. How about ALWAYS EVENTUALLY clean?

Controller’s Boss: ‘Eventually’ is too lenient. We can’t have the room unclean for too long. We should put some bound.

Controller-agent transcript (cont)

Controller: Sorry, Sir. I should have made it more precise: ALWAYS EVENTUALLY_3 clean.

Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY_3 clean. What if the room is continuously being used, and you told me I cannot clean while it is being used?

Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time.

Robot/Agent: I think I understand you. But as you know I am a robot and not that good at understanding English. Can you please input it in a precise language?

Formulating k-maintainability: a system

A system is a quadruple A = (S, A, Ф, poss), where
– S is the set of system states;
– A is the set of actions, which is the union of the set of agent actions, A_ag, and the set of environmental actions, A_env;
– Ф : S × A → 2^S is a non-deterministic transition function that specifies how the state of the world changes in response to actions;
– poss : S → 2^A is a function that describes which actions are possible in which states.

A system

[Figure: transition diagram over states s1 to s7 with edges labeled a1 to a5]

S = {s1,s2,s3,s4,s5,s6,s7}

A = {a1, a2, a3, a4, a5}
Ф : as shown in the picture
poss(s1) = {a1, a2, a3}
poss(s4) = {a4}

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

S = {b, c, d, f, g, h}

A = {a, a’, e}

A_ag = {a, a’}

A_env = {e}

Ф : as shown in the picture

poss(b) = {a} when our policy dictates a to be executed at b.

Controls and super-controls

Given a system A = (S, A, Ф, poss) and a set A_ag ⊆ A of agent actions,

– a control policy for A w.r.t. A_ag is a partial function K : S → A_ag such that K(s) ∈ poss(s) whenever K(s) is defined;

– a super-control policy for A w.r.t. A_ag is a partial function K : S → 2^{A_ag} such that K(s) ⊆ poss(s) and K(s) ≠ { } whenever K(s) is defined.
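The two definitions can be checked mechanically. The sketch below is one possible representation, not the authors' implementation: the dictionary `PHI`, the helper `poss`, and the function names are assumptions for illustration, and the edges are reconstructed from the worked examples on the b, c, d, f, g, h system because the diagram is not reproduced in this text.

```python
# Transition function Φ as a dict: (state, action) -> set of successor states.
# Assumption: edges reconstructed from the examples in the slides.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
AGENT_ACTIONS = {"a", "a'"}  # A_ag; "e" is the only environmental action


def poss(state):
    """Actions possible in `state`: here, those with an outgoing edge."""
    return {action for (source, action) in PHI if source == state}


def is_control(K):
    """K maps some states to one agent action each (a partial function)."""
    return all(a in AGENT_ACTIONS and a in poss(s) for s, a in K.items())


def is_super_control(K):
    """K maps some states to a non-empty set of possible agent actions."""
    return all(
        len(acts) > 0 and acts <= AGENT_ACTIONS and acts <= poss(s)
        for s, acts in K.items()
    )


print(is_control({"b": "a", "c": "a", "d": "a"}))
print(is_super_control({"b": {"a", "a'"}}))
```

Representing a partial function as a dictionary keeps "K(s) is undefined" explicit: the state is simply absent from the keys.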

Reachable states and closure

Reachable states R(A, s): Given a system A = (S, A, Ф, poss) and a state s, R(A, s) ⊆ S is the smallest set of states that satisfies the following conditions:
(i) s is in R(A, s); and
(ii) if s’ is in R(A, s) and a is in poss(s’), then Ф(s’, a) ⊆ R(A, s).

Closure: Let A = (S, A, Ф, poss) be a system and let S be a subset of the system's states. Then the closure of A w.r.t. S, denoted by Closure(S, A), is defined by Closure(S, A) = ∪_{s ∈ S} R(A, s).

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

A = (S, A, Ф, poss)
R(A, d) = {d, h}
R(A, f) = {f, g, h}
Closure({d, f}, A) = {d, f, g, h}
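Reachability and closure amount to a graph search. The following is a minimal sketch under stated assumptions: the edge dictionary `PHI` is reconstructed from the example values above (the diagram is not available in this text), poss(s) is taken to be every action with an outgoing edge at s, and the function names are illustrative.

```python
# (state, action) -> set of successors; edges assumed from the slide examples.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}


def reachable(phi, start):
    """R(A, start): smallest set containing start and closed under Φ."""
    seen = {start}
    frontier = [start]
    while frontier:
        s = frontier.pop()
        for (source, _action), successors in phi.items():
            if source != s:
                continue
            for t in successors - seen:
                seen.add(t)
                frontier.append(t)
    return seen


def closure(phi, states):
    """Closure(states, A): union of R(A, s) over the given states."""
    result = set()
    for s in states:
        result |= reachable(phi, s)
    return result


print(sorted(reachable(PHI, "d")))
print(sorted(reachable(PHI, "f")))
print(sorted(closure(PHI, {"d", "f"})))
```

With the assumed edges, the three calls reproduce the sets listed on the slide.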

Unfold_k(s, A, K)

An element of Unfold_k(s, A, K) is a sequence of states of length at most k + 1 that the system may go through if it follows the control K starting from the state s. Formally:

Let A = (S, A, Ф, poss) be a system, let s ∈ S, and let K be a control for A. Then Unfold_k(s, A, K) is the set of all sequences σ = s_0, s_1, ..., s_l where l ≤ k and s_0 = s, such that K(s_j) is defined for all j < l, s_{j+1} ∈ Ф(s_j, K(s_j)), and if l < k, then K(s_l) is undefined.

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

Consider policy K : Do action a in states b, c, and d

Unfold_3(b, A, K) = { <b, c, d, h> }

Unfold_3(c, A, K) = { <c, d, h> }
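The unfolding can be enumerated recursively. This is a sketch under assumptions: `PHI` holds edges reconstructed from the closure and reachability examples in the slides (the diagram is not reproduced here), the control is a dictionary from states to actions, and `unfold` is an illustrative name.

```python
# (state, action) -> set of successors; edges assumed from the slide examples.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
K = {"b": "a", "c": "a", "d": "a"}  # do action a in states b, c, and d


def unfold(phi, control, start, k):
    """All sequences s_0..s_l (l <= k) obtained by following `control`.

    A sequence stops after k steps, or earlier at a state where the
    control is undefined.
    """
    sequences = []

    def extend(path):
        last = path[-1]
        if len(path) - 1 == k or last not in control:
            sequences.append(tuple(path))
            return
        for t in sorted(phi[(last, control[last])]):
            extend(path + [t])

    extend([start])
    return sequences


print(unfold(PHI, K, "c", 3))
print(unfold(PHI, K, "b", 1))
```

With a non-deterministic Φ, the inner loop branches once per possible successor, so a single control can yield several sequences from the same start state.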


Definition of k-maintainability: the parameters

1. a system A = (S, A, Ф, poss),
2. a set A_ag ⊆ A of agent actions,
3. a set of initial states S,
4. a set of desired states E that we want to maintain,
5. a maintainability parameter k,
6. a function exo : S → 2^{A_env} detailing exogenous actions, such that exo(s) ⊆ poss(s), and
7. a control K (mapping a relevant part of S to A_ag) such that K(s) ∈ poss(s).

Basic Idea

Ignoring interference:
- From any state under consideration, by following the control policy one should visit E within k steps.

Accounting for interference:
- Broaden the states under consideration from the initial states to all states that can be reached due to the control policy and the environment. (Use the notion of Closure.)
- When using Closure, take into account the control policy; ignore agent actions other than the one dictated by the control policy.
- Also, only consider the exogenous actions in exo(s).

Definition of k-maintainability

poss_{K,exo}(s) is the set {K(s)} ∪ exo(s).

A_{K,exo} = (S, A, Ф, poss_{K,exo})

Given a system A = (S, A, Ф, poss), a set of agent actions A_ag ⊆ A, and a specification of exogenous action occurrence exo, we say that a control K for A w.r.t. A_ag k-maintains a set of states S with respect to a set of states E (both subsets of the system's states), where k ≥ 0, if
- for each state s in Closure(S, A_{K,exo}) and each sequence σ = s_0, s_1, ..., s_r in Unfold_k(s, A, K) with s_0 = s, it holds that {s_0, s_1, ..., s_r} ∩ E ≠ { }.

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

Consider policy K: Do action a in states b, c, and d

poss(b) = {a, a’}
poss_{K,exo}(b) = {a}

Closure({b, c}, A) = {b, c, d, f, g, h}

Closure({b, c}, A_{K,exo}) = {b, c, d, h}
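The definition translates into a direct check: compute the closure in the restricted system, then verify that every unfolding from every closure state meets E. The sketch below is illustrative only. The edges in `PHI` are reconstructed from the slide examples, `EXO` assumes that e can occur only in state f, and the function name is hypothetical.

```python
# (state, action) -> set of successors; edges assumed from the slide examples.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}  # assumption: the exogenous action e occurs only in f


def k_maintains(phi, control, exo, init, goal, k):
    """True if `control` k-maintains `init` with respect to `goal`."""

    def successors(s):
        # poss_{K,exo}(s) = {K(s)} ∪ exo(s)
        actions = set(exo.get(s, set()))
        if s in control:
            actions.add(control[s])
        return set().union(*(phi.get((s, a), set()) for a in actions))

    # Closure(init, A_{K,exo}) by graph search.
    seen, frontier = set(init), list(init)
    while frontier:
        s = frontier.pop()
        for t in successors(s) - seen:
            seen.add(t)
            frontier.append(t)

    def all_unfoldings_hit_goal(s, steps_left):
        if s in goal:
            return True
        if steps_left == 0 or s not in control:
            return False  # the sequence ends here without visiting E
        return all(all_unfoldings_hit_goal(t, steps_left - 1)
                   for t in phi[(s, control[s])])

    return all(all_unfoldings_hit_goal(s, k) for s in seen)


K = {"b": "a", "c": "a", "d": "a"}
print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 3))
print(k_maintains(PHI, {"b": "a'", "f": "a"}, EXO, {"b"}, {"h"}, 3))
```

The second call routes through f, where the environment can move the system to g; the closure then contains a state from which E is never reached, so the shorter-looking route is rejected.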

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

Goal: 3-maintainable policy for S={b} w.r.t. E={h}

Such a policy: Do a in b, c, and d

[Figure: the example transition diagram with an additional edge labeled e]

Goal: 3-maintainable policy for S={b} w.r.t. E={h}

No such policy.

Constructing k-maintainable control policies: pre-formulation attempts

Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols.

Our initial motivation for formulating maintainability came from trying to formalize what a control module was doing.

Kaelbling and Rosenschein 1991: In the control rule “if condition c is satisfied then do action a”, the action a is the action that leads to the goal from any state where the condition c is satisfied.

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

Forward Search: If we use minimal paths or minimal cost paths we might pick a’; then we would have to backtrack.

Backward Search: Should we include both d and f?

Propositional Encoding of solutions

Input: An input I consists of a system A = (S, A, Φ, poss), a set of goal states E and a set of initial states S (both subsets of the system's states), a set A_ag ⊆ A, a function exo, and an integer k ≥ 0.

Output: A control K such that S is k-maintainable with respect to E (using the control K), if such a control exists. Otherwise the output is the answer that no such control exists.

AIM: Given an input I, we construct a SAT instance sat(I) in polynomial time such that sat(I) is satisfiable if and only if the input I allows for a k-maintainable control, and that the satisfying assignments for sat(I) encode possible such controls.

Propositional encoding: notation

s_i denotes that
- there is a path from state s to some state in E using only agent actions and at most i of them, to which we refer as “there is an a-path from s to E of length at most i,” and
- from each state s' reachable from s, there is an a-path from s' to E of length at most k.

The encoding sat(I)

(0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1}

(1) For all s ∈ E: s_0

(2) For all states s, t such that Φ(s, a) = t for some action a ∈ exo(s): s_k ⇒ t_k

(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}, where
PS(s) = {t ∈ S | ∃ a ∈ A_ag ∩ poss(s) : t = Φ(s, a)}

(4) For all initial states s not in E: s_k

(5) For all states s not in E: ¬s_0

Constructing policies from the models of sat(I)

Let M be a model of sat(I).

C_M = {s ∈ S | M ⊨ s_k}

L_M(s): the smallest index j such that M ⊨ s_j (i.e., s_0, s_1, ..., s_{j-1} are false and s_j is true), which we call the level of s w.r.t. M.

K(s) is defined iff s ∈ C_M \ E, and then K(s) ∈ {a ∈ A_ag | Φ(s, a) = t, t ∈ C_M, L_M(t) < L_M(s)}

Proposition. Let I consist of a system A = (S, A, Φ, poss), where Φ is deterministic, a set A_ag ⊆ A, sets of states E and S (both subsets of the system's states), an exogenous function exo, and an integer k. Then,

(i) S is k-maintainable w.r.t. E iff sat(I) is satisfiable.
(ii) Given any model M of sat(I), any control K constructed from the algorithm above k-maintains S w.r.t. E.

Reverse Encoding

Writing x’ for ¬x:
a ⇒ b
is equivalent to ¬a ∨ b
is equivalent to ¬(¬b) ∨ ¬a
is equivalent to ¬b ⇒ ¬a
is equivalent to b’ ⇒ a’
is equivalent to a’ ⇐ b’

Rearranging sat(I)

(0) For all states s and for all j, 0 ≤ j < k:
s_j ⇒ s_{j+1} becomes s’_j ⇐ s’_{j+1}

(1) For all s ∈ E: s_0 becomes ⊥ ⇐ s’_0

(2) For all states s, t such that Φ(s, a) = t for some action a ∈ exo(s): s_k ⇒ t_k becomes s’_k ⇐ t’_k

(3) For all states s not in E and all i, 1 ≤ i ≤ k:
s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1} becomes s’_i ⇐ ∧_{t ∈ PS(s)} t’_{i-1}
where
PS(s) = {t ∈ S | ∃ a ∈ A_ag ∩ poss(s) : t = Φ(s, a)}

(4) For all initial states s not in E: s_k becomes ⊥ ⇐ s’_k

(5) For all states s not in E: ¬s_0 becomes s’_0

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]

(6) b’_0, c’_0, d’_0, f’_0, g’_0 (From 5)
(7) g’_1, g’_2, g’_3 (From 3)
(8) b’_1, c’_1 (From 6 and 3)
(9) f’_3 (From 7 and 2)
(10) f’_2 (From 9 and 0)
(11) f’_1 (From 10 and 0)
(12) b’_2 (From 8, 11, and 3)

Thus M = {g’_3, g’_2, g’_1, g’_0, f’_3, f’_2, f’_1, f’_0, b’_2, b’_1, b’_0, c’_1, c’_0, d’_0}

L_M(b) = 3
L_M(c) = 2
L_M(d) = 1
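Reading a control off the model M can be sketched as follows. This is an illustration under assumptions: `PHI` holds edges reconstructed from the derivation above (the diagram is not reproduced in this text), `M_NEG` lists the primed atoms of M as (state, index) pairs, and the first qualifying action is picked where several exist.

```python
# (state, action) -> set of successors; edges assumed from the slide examples.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
AGENT_ACTIONS = {"a", "a'"}
STATES = ["b", "c", "d", "f", "g", "h"]
GOAL = {"h"}
BOUND = 3  # the maintainability parameter k

# Primed atoms true in M, as listed on the slide: (s, j) stands for s'_j.
M_NEG = {("g", 3), ("g", 2), ("g", 1), ("g", 0),
         ("f", 3), ("f", 2), ("f", 1), ("f", 0),
         ("b", 2), ("b", 1), ("b", 0),
         ("c", 1), ("c", 0), ("d", 0)}


def holds(s, j):
    """s_j is true exactly when its primed counterpart s'_j is not in M."""
    return (s, j) not in M_NEG


def level(s):
    """L_M(s): smallest j with s_j true (only called for states in C_M)."""
    return min(j for j in range(BOUND + 1) if holds(s, j))


C_M = {s for s in STATES if holds(s, BOUND)}

policy = {}
for s in sorted(C_M - GOAL):
    choices = [a for (src, a), succs in PHI.items()
               if src == s and a in AGENT_ACTIONS
               and all(t in C_M and level(t) < level(s) for t in succs)]
    if choices:
        policy[s] = choices[0]

print(sorted(C_M), policy)
```

Each chosen action strictly lowers the level, so following the policy from a state of level j reaches a level-0 state, that is a state in E, within j steps.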

Polynomial time generation of control policy and maximal control policy

Computing a model of a Horn theory is a well-known polynomial-time problem (Dowling & Gallier 84). Thus,

Theorem: Under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time.

Maximal Control

Each satisfiable Horn theory T has a least model, M_T, which is given by the intersection of all its models. The least model is computable in linear time in the size of the encoding. This model not only leads to a k-maintainable control, but also to a maximal control, in the sense that the control is defined on a greatest set of states outside E among all possible k-maintainable controls for S' w.r.t. E such that S is a subset of S'.
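The least model of the reversed (Horn) encoding can be computed by forward chaining. The sketch below uses a naive fixpoint loop rather than the linear-time procedure cited above, and it rests on assumptions: `PHI` holds edges reconstructed from the slide examples, `EXO` assumes e occurs only in f, an atom (s, j) stands for the primed proposition s'_j, and all function names are illustrative.

```python
# (state, action) -> set of successors; edges assumed from the slide examples.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
AGENT_ACTIONS = {"a", "a'"}
EXO = {"f": {"e"}}  # assumption: e can occur only in f
STATES = ["b", "c", "d", "f", "g", "h"]


def encode(k, goal):
    """Definite Horn rules (head, body) of the reversed encoding."""
    rules = []
    for s in STATES:
        for j in range(k):                       # group (0)
            rules.append(((s, j), {(s, j + 1)}))
        for a in EXO.get(s, ()):                 # group (2)
            for t in PHI.get((s, a), ()):
                rules.append(((s, k), {(t, k)}))
        if s in goal:
            continue
        rules.append(((s, 0), set()))            # group (5): a fact
        ps = set()                               # PS(s)
        for (src, a), succs in PHI.items():
            if src == s and a in AGENT_ACTIONS:
                ps |= succs
        for i in range(1, k + 1):                # group (3)
            rules.append(((s, i), {(t, i - 1) for t in ps}))
    return rules


def least_model(rules):
    """Naive forward chaining: fire rules until nothing new is derived."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and body <= model:
                model.add(head)
                changed = True
    return model


def maintainable(k, init, goal):
    """Check the constraints from groups (1) and (4) against the least model."""
    model = least_model(encode(k, goal))
    violated = any((s, 0) in model for s in goal) or any(
        (s, k) in model for s in init if s not in goal)
    return not violated, model


ok, model = maintainable(3, {"b"}, {"h"})
print(ok, len(model))
```

Because the least model makes as few primed atoms true as possible, it leaves as many s_k true as possible, which is why the control read off it is defined on a greatest set of states.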

Dealing with non-deterministic transition functions

Notation:

We say that there exists an a-path of length at most k ≥ 0 from a state s to a set of states S' if either s ∈ S', or s ∉ S', k > 0, and there is some action a ∈ A_ag ∩ poss(s) such that for every t ∈ Φ(s, a) there exists an a-path of length at most k-1 from t to S'.

s_a_i, i > 0, will denote that there is an a-path from s to E of length at most i starting with action a.

The encoding sat'(I) again has groups (0)-(5) of clauses, as follows:

(0), (1), (4) and (5) are the same as in sat(I).

(2) For any states s and t such that t ∈ Φ(s, a) for some action a ∈ exo(s): s_k ⇒ t_k

Dealing with non-deterministic transition functions (cont.)

(3) For every state s not in E and for all i, 1 ≤ i ≤ k:

(3.1) s_i ⇒ ∨_{a ∈ A_ag ∩ poss(s)} s_a_i;

(3.2) for every a ∈ A_ag ∩ poss(s) and t ∈ Φ(s, a): s_a_i ⇒ t_{i-1};

(3.3) for every a ∈ A_ag ∩ poss(s), if i < k: s_a_i ⇒ s_a_{i+1}.

A direct algorithm

Initialization
- For all states s not in E, make s’_0 true.
- For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 ... s’_k true.
- For all states s, if agent action a is not executable in s, make s_a’_0 ... s_a’_k true.

Repeat until no change or until s’_k is true for some initial state s:
- If s’_i is true then make s’_{i-1} true. If s_a’_i is true then make s_a’_{i-1} true.
- If t ∈ Φ(s, a) for some exogenous action a and t’_k is true then make s’_k true.
- For any state s not in E:
  - If t ∈ Φ(s, a) for some agent action a and t’_{i-1} is true then make s_a’_i true.
  - If s_a’_i is true for all agent actions a that are executable in s then make s’_i true.

A direct algorithm (cont.)

If s’_k is true for some initial state s, then the system is not k-maintainable; otherwise construct a super-control as follows:

For states s in E, K(s) is undefined, and for other states K(s) = {a : s_a’_k is not true}.
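The steps above can be sketched as a fixpoint computation over two sets of "true" primed atoms. This is an illustrative rendering, not the authors' code: the function and variable names are assumptions, the loop always runs to the fixpoint instead of stopping early, and states for which the slide's rule would give an empty action set are left undefined so that the result is a proper super-control. The example edges are reconstructed from the slides.

```python
def direct_super_control(phi, agent_actions, exo, states, init, goal, k):
    """Return a super-control (state -> set of actions), or None if none exists."""
    bad = set()      # (s, i): s'_i is true
    bad_act = set()  # (s, a, i): s_a'_i is true
    agent_poss = {s: {a for (src, a) in phi if src == s and a in agent_actions}
                  for s in states}
    # Initialization.
    for s in states:
        if s not in goal:
            bad.add((s, 0))
            if not agent_poss[s]:
                bad.update((s, i) for i in range(k + 1))
        for a in agent_actions - agent_poss[s]:
            bad_act.update((s, a, i) for i in range(k + 1))
    # Propagation until nothing changes.
    while True:
        size_before = len(bad) + len(bad_act)
        for (s, i) in list(bad):
            if i > 0:
                bad.add((s, i - 1))
        for (s, a, i) in list(bad_act):
            if i > 0:
                bad_act.add((s, a, i - 1))
        for s in states:
            for a in exo.get(s, ()):
                if any((t, k) in bad for t in phi.get((s, a), ())):
                    bad.add((s, k))
            if s in goal:
                continue
            for i in range(1, k + 1):
                for a in agent_poss[s]:
                    if any((t, i - 1) in bad for t in phi[(s, a)]):
                        bad_act.add((s, a, i))
                if all((s, a, i) in bad_act for a in agent_poss[s]):
                    bad.add((s, i))
        if len(bad) + len(bad_act) == size_before:
            break
    if any((s, k) in bad for s in init):
        return None  # not k-maintainable
    return {s: {a for a in agent_poss[s] if (s, a, k) not in bad_act}
            for s in states if s not in goal and (s, k) not in bad}


# Example edges assumed from the slides; e is exogenous and occurs only in f.
PHI = {("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
STATES = ["b", "c", "d", "f", "g", "h"]
print(direct_super_control(PHI, {"a", "a'"}, {"f": {"e"}}, STATES, {"b"}, {"h"}, 3))
```

Because an action is marked bad as soon as any one of its possible successors is bad, the same code handles non-deterministic transitions: replacing a successor set by a larger one requires no other change.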

Direct algorithm using counters

Idea: c[s] = i means s’_0 ... s’_i are true, and c[s_a] = i means s_a’_0 ... s_a’_i are true.

Initialization
- For all states s not in E, make s’_0 true: c[s] := 0.
- For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 ... s’_k true: c[s] := k.
- For all states s, if agent action a is not executable in s, make s_a’_0 ... s_a’_k true: c[s_a] := k.

The other steps are similar. The idea can then be extended to actions with durations (or costs).

Computational Complexity

k-maintainability is PTIME-complete (under log-space reduction). PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

k-maintainability is EXPTIME-complete when we have a compact representation. EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

Conclusion

- High level goal specification is important.
- Certain important goal specification notions cannot be expressed using existing goal representation languages.
- k-maintainability is an important notion.
- Finite-maintainability is a reinvention of Dijkstra's notion of self-stabilization.
  - There is a big research community on self-stabilization in distributed control and fault tolerance.
  - But they have not focused much on automatic generation of control (protocol, in their parlance).
  - They have focused more on proving correctness of hand-written protocols.
- Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification.
- Role 1 of k: length of the window of opportunity
- Role 2 of k: bound within which maintenance is guaranteed

Conclusion (cont.)

- SAT encoding to Horn logic program encoding: an interesting and novel approach to designing polynomial algorithms.
- One often does not think in terms of negative propositions.

THANK YOU!
