
Journal of Computational Information Systems 11: 15 (2015) 5353-5362
Available at http://www.Jofcis.com

A Linear Algorithm for Optimal Probabilistic Planning

Lijun WU 1,2,*, Kai WANG 1, Jiajun LI 1

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

2 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia

Abstract

The maximum probability path problem has been applied in many real-world fields. However, the problem of computing a maximum probability path is often transformed into a shortest path problem so that an existing shortest path algorithm, such as Dijkstra's algorithm, can be used. We propose a new algorithm for the maximum probability path problem that is linear in the size of the system and requires no such transformation. The algorithm mainly exploits a probability-first strategy, a probability ordered queue, and several FIFO (First In First Out) queues. We prove the algorithm's soundness and completeness with respect to the optimal probability problem, and use project application as an example to illustrate its use.

Keywords: Maximum Probability Path; Shortest Path Problem; Probability-first Strategy

1 Introduction

Probability-based methods have found increasingly wide application [1]. In particular, the computation of maximum probability paths has received growing attention [2-5]. However, these applications have the following shortcomings. First, they do not consider how to select the best path when there are multiple maximum probability paths; second, they do not exploit the inherent characteristics of maximum probability paths, but simply transform the maximum probability path problem into the shortest path problem; third, they do not associate the probabilistic transitions with the actions of agents.

In this paper, we propose a new algorithm for the maximum probability path problem in multi-agent systems, which is linear in the size of the system. The algorithm exploits a probability-first strategy, a probability ordered queue, and several FIFO queues. This strategy and these data structures give the algorithm good performance. We first consider the following example.

Example 1 (Project application).

Project supported by the National Nature Science Foundation of China (Nos. 61073033 and 61370072).
* Corresponding author. Email address: [email protected] (Lijun WU).

1553-9105 / Copyright 2015 Binary Information Press
DOI: 10.12733/jcis13602
August 1, 2015


Assume that the process of project application consists of three steps: topic selection, writing, and oral response to reviewers. In the topic selection step, the applicant collects related information and decides the topic of the project to be applied for; in the second step, the applicant writes the proposal according to the chosen topic and the collected information; finally, the applicant gives an oral response to the reviewers of the project. After topic selection, the applicant is in one of three states: good topic, fair topic, and poor topic, denoted by s11, s12 and s13, respectively; after writing, there are also three states: good proposal, fair proposal and poor proposal, denoted by s21, s22 and s23, respectively; similarly, after the last step, there are three states: good response, fair response and poor response, denoted by s31, s32 and s33, respectively. The start state is denoted by s0.

Now person A and person B plan to apply for a project jointly. Assume that, for each person, the probabilistic transitions of states are as listed in Tables 1(a) and 1(b).

Clearly, it is important to determine how they cooperate according to the two tables, namely, to decide for every step who performs the work of that step. The key criterion is to maximize the probability of reaching the state s31 from the start state s0, which is exactly a probabilistic planning problem. In the real world, there are many similar problems to be solved. Thus it is necessary to set up a general model and a probabilistic planning algorithm on that model, which is the focus of this paper.

Table 1: Probabilistic transitions of states

(a) Transitions for person A

       s11   s12   s13   s21   s22   s23   s31   s32   s33
s0     0.60  0.25  0.15  -     -     -     -     -     -
s11    -     -     -     0.55  0.30  0.15  -     -     -
s12    -     -     -     0.45  0.40  0.15  -     -     -
s13    -     -     -     0.40  0.45  0.15  -     -     -
s21    -     -     -     -     -     -     0.25  0.40  0.35
s22    -     -     -     -     -     -     0.46  0.40  0.14
s23    -     -     -     -     -     -     0.35  0.45  0.20

(b) Transitions for person B

       s11   s12   s13   s21   s22   s23   s31   s32   s33
s0     0.56  0.34  0.10  -     -     -     -     -     -
s11    -     -     -     0.45  0.35  0.20  -     -     -
s12    -     -     -     0.41  0.30  0.28  -     -     -
s13    -     -     -     0.30  0.40  0.30  -     -     -
s21    -     -     -     -     -     -     0.35  0.30  0.35
s22    -     -     -     -     -     -     0.35  0.40  0.25
s23    -     -     -     -     -     -     0.48  0.40  0.12

2 Related Work

2.1 Maximum probability path

The maximum probability path problem has been applied in many real-world fields, and there has been much related work [2-5]. A. Zalesky proposed a robust probabilistic fiber tracking algorithm [2] that overcomes the unreliability of locally greedy algorithms. His algorithm offers better tractability than probabilistic approaches, yields a single well-defined trajectory, and is guaranteed to yield the same trajectory between two points of interest irrespective of which point is used as a seed. In this fiber tracking algorithm, the problem of computing a maximum probability path is transformed into a shortest path problem so that an existing shortest path algorithm, such as Dijkstra's algorithm [6, 7], can be used.

A. Robles-Kelly and E. R. Hancock developed a graph-spectral method for path estimation [3]. The basic idea of the method is to use the steady-state random walk on the graph as an estimate of the maximum probability path across the graph, and to use this path for surface integration and height recovery.

The methods above transform the maximum probability path problem into the shortest path problem or another problem, and the resulting algorithms are non-linear in the size of the graph.
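To make the transformation mentioned above concrete, the following is a minimal sketch of the standard reduction: each edge probability p is replaced by the weight -log p, so that maximizing a product of probabilities becomes minimizing a sum of non-negative weights. It is not the method of [2] or [3]; the function name `max_prob_via_dijkstra`, the adjacency layout and the toy graph are assumptions made for illustration.

```python
import heapq
import math

def max_prob_via_dijkstra(edges, source, target):
    """Maximum-probability path via Dijkstra on weights -log(p).

    edges: dict mapping node -> list of (successor, probability), 0 < p <= 1.
    Returns (probability, path), or (0.0, []) if target is unreachable.
    """
    dist = {source: 0.0}          # best known sum of -log(p) per node
    pred = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue              # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, p in edges.get(u, []):
            nd = d - math.log(p)  # -log(p) >= 0 because p <= 1
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path

# Hypothetical toy graph: s->a->t has probability 0.45, s->b->t has 0.40.
toy = {"s": [("a", 0.9), ("b", 0.5)], "a": [("t", 0.5)], "b": [("t", 0.8)]}
prob, path = max_prob_via_dijkstra(toy, "s", "t")
```

With a binary heap this reduction costs O((N + M) log N) on a graph with N nodes and M edges, which is the kind of non-linear bound the paragraph above refers to.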

2.2 Probabilistic planning

In general, probabilistic planning problems can be divided into two types: fully observable (or with no observability) and partially observable.

Some research efforts adopt a planning model based on the fully observable (or non-observable) Markov decision process (MDP) [13-16]. A typical work is the algorithm proposed by Nicholas Kushmerick et al. They defined the probabilistic planning problem in terms of a probability distribution over states, a goal represented by a Boolean combination of propositions, a probability threshold, and actions. The aim of the algorithm is to find plans whose probability of success exceeds the threshold.

In recent years, partially observable MDPs (POMDPs) have received wide attention. A POMDP is a powerful probabilistic model with partially observable states and uncertain actions, which has many important applications, including clinical decision making, dialog management, and control policies for robots. Much related work has been performed [17-19]. However, POMDPs remain tractable only up to a limited problem size.

3 Definitions and Notations

In this section, we define the optimal probabilistic planning problem. The goal is expressed by a propositional logic formula. The objective states are those states in which the given goal is true. Here, we take the uncertainty of the environment into account. In the following, we first give some basic definitions.

3.1 Basic definitions

Definition 1 (Probabilistic transition of a multi-agent system) The probabilistic transition of a multi-agent system is modeled as a probabilistic function $P : S \times \Sigma \times S \to [0, 1]$, where $S$ is the set of all states in the multi-agent system and $\Sigma$ is the set of all actions of all agents in the multi-agent system. $P(s, a, s')$ is the probability that state $s$ transitions to state $s'$ after action $a$ of some agent is performed.

Definition 2 (Enriched path) An enriched path is defined as a transition sequence $s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n$, where $s_i \in S$ and $P(s_{i-1}, a_i, s_i) > 0$ for all $i \in \{1, 2, \ldots, n\}$. We say the action sequence $(a_i)_{i=1}^{n}$ is an induced action sequence of the enriched path, and the enriched path is an inducing path of $(a_i)_{i=1}^{n}$.

For convenience, we denote an action sequence $a_1 a_2 \cdots a_n$ by $(a_i)_{i=1}^{n}$. Let $r$ be an enriched path. We use $\pi(r)$ to denote the induced action sequence of $r$.
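The definitions above do not spell out the probability of an enriched path. A reading consistent with the algorithm of Section 4, where arrival probabilities are updated by multiplying with a transition probability, is the product of the transition probabilities along the path. The sketch below assumes that reading; the dictionary layout for $P$ and the helper names `path_probability` and `induced_actions` are illustrative assumptions.

```python
from math import prod

def path_probability(P, states, actions):
    """Product of P(s_{i-1}, a_i, s_i) along an enriched path.

    P: dict mapping (s, a, s_next) -> probability; missing keys mean 0.
    states: [s_0, ..., s_n]; actions: [a_1, ..., a_n].
    """
    if len(states) != len(actions) + 1:
        raise ValueError("an enriched path with n actions has n + 1 states")
    steps = zip(states, actions, states[1:])
    return prod(P.get(step, 0.0) for step in steps)

def induced_actions(path):
    """Induced action sequence of a path given as [s_0, a_1, s_1, ..., a_n, s_n]."""
    return path[1::2]

# Two transitions taken from Table 1(a): s0 -> s11 and s11 -> s21 for person A.
P = {("s0", "a1", "s11"): 0.60, ("s11", "a2", "s21"): 0.55}
p = path_probability(P, ["s0", "s11", "s21"], ["a1", "a2"])
acts = induced_actions(["s0", "a1", "s11", "a2", "s21"])
```

Here `p` evaluates to 0.60 × 0.55 = 0.33 up to floating-point rounding, and `acts` is the induced action sequence of the path.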

3.2 Probabilistic planning problem

Having given some basic definitions, we now define our key problem, namely, the probabilistic planning problem.

Definition 3 (Probabilistic planning problem) A probabilistic planning problem is a 5-tuple $\langle s_0, S, \Sigma, T, G \rangle$, where $s_0$ is the initial state, $S$ is the set of all states in a multi-agent system, $\Sigma$ is the set of all actions, $T$ is the probabilistic transition, and $G$ is the set of all objective states.

Definition 4 (Objective path) An objective path in a probabilistic planning problem $\langle s_0, S, \Sigma, T, G \rangle$ is an enriched path of the form $s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n$, where $s_0$ is the initial state and $s_n$ is an objective state.

Definition 5 (Maximum probability objective path) A maximum probability objective path is an objective path that has maximum probability among all objective paths.

Definition 6 (Optimal objective path) An optimal objective path is defined as an objective path that has the shortest length among all maximum probability objective paths.

Definition 7 (Solution) If $s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n$ is an objective path, then we say the action sequence $(a_i)_{i=1}^{n}$ is a solution of the probabilistic planning problem.

Definition 8 (Solution equivalence) We say two solutions are equivalent if their inducing paths are optimal objective paths.

Given a probabilistic planning problem, our aim is to design an algorithm that computes the action sequence induced by an optimal objective path. Thus we define an optimal solution for the probabilistic planning problem as follows.

Definition 9 (Optimal solution) Assume $\Pi = \langle s_0, S, \Sigma, T, G \rangle$ is a probabilistic planning problem, and $(a_i)_{i=1}^{n}$ is a sequence of actions. $(a_i)_{i=1}^{n}$ is said to be an optimal solution to $\Pi$ if and only if there is an optimal objective path that induces the action sequence $(a_i)_{i=1}^{n}$.

Clearly, two optimal solutions are equivalent.
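The two-level criterion in the definitions above (maximum probability first, shortest length second) can be sketched as a single lexicographic comparison. The example assumes the objective paths are already enumerated as (probability, action sequence) pairs, which is a simplification for illustration only; the function name `optimal_objective_path` and the sample data are hypothetical.

```python
def optimal_objective_path(objective_paths):
    """Pick a path of maximum probability and, among those, of minimum length.

    objective_paths: iterable of (probability, actions) pairs, where actions
    is the induced action sequence of one objective path.
    Returns one such pair, or None if there is no objective path.
    """
    candidates = list(objective_paths)
    if not candidates:
        return None
    # Lexicographic key: higher probability first, then fewer actions.
    return min(candidates, key=lambda pa: (-pa[0], len(pa[1])))

paths = [
    (0.25, ["a", "b"]),        # maximum probability, length 2
    (0.25, ["c", "d", "e"]),   # same probability, but longer
    (0.10, ["f"]),             # shorter, but less probable
]
best = optimal_objective_path(paths)
```

Note that the comparison relies on exact equality of probabilities; with floating-point values computed along different paths, a tolerance may be needed in practice.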

4 Optimal Probabilistic Planning Algorithm

In this section, we propose an optimal probabilistic planning algorithm with linear time complexity, called the OPP algorithm. This algorithm consists of the OPP top-level algorithm, the objective path computation algorithm, and the state extension algorithm.

We first describe the data structures used in the OPP algorithm.


4.1 Data structures

For every state, we add five fields: probability, distance, predecessor, objective, and visited. The initial values of the first four fields of every state are 0, except that the initial arrival probability of the initial state is 1 and the objective field of every objective state is 1. The probability field of a state records its current arrival probability; the distance field records the distance from the initial state to the state along the current path, where every step has distance 1; the predecessor field indicates the direct predecessor of the state in the current path; the objective field marks the state as an objective state if its value is 1, and as a non-objective state otherwise; the visited field shows whether the state has been visited or extended. If the value of the visited field is 0, the state has not been visited; if the value is 1, the state has been visited but not extended; if the value is 2, the state has been visited and extended.

OPP needs a state array, a probabilistic transition matrix, a probability ordered queue and several FIFO queues. We use the probabilistic transition matrix to describe the probabilistic transition function. Every element of the probabilistic transition matrix is a 4-tuple of the form $\langle s, t, a, p \rangle$, where $p > 0$ is the probability that the state $s$ transitions to $t$ under the action $a$, namely, $P(s, a, t) = p > 0$. Thus we also call $\langle s, t, a, p \rangle$ a probabilistic transition.

State array S: stores all states in the planning problem. All states are numbered from 1 to N and the array is ordered by these numbers, where N is the number of states of the planning problem.

The probabilistic transition matrix T: an adjacency matrix that stores all probabilistic transitions in the planning problem.

FIFO queues: every FIFO queue stores the visited and unextended states with the same arrival probability. We assume the basic operations over any FIFO queue, such as enqueue(), which inserts an element at the end of a queue, and dequeue(), which removes an element from the beginning of a queue.

The probability ordered queue: it stores the arrival probabilities of the visited and unextended states in increasing order of probability. Every element in the probability ordered queue points to a FIFO queue.
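One possible realization of the probability ordered queue and its FIFO queues is sketched below. It assumes a sorted Python list for the distinct probabilities and a dictionary from probability to `collections.deque`; the class and method names are illustrative, and the text does not prescribe this layout.

```python
import bisect
from collections import deque

class ProbabilityOrderedQueue:
    """Distinct probabilities in increasing order, each pointing to a FIFO queue."""

    def __init__(self):
        self.probs = []    # increasing order; the last entry is the maximum
        self.queues = {}   # probability -> deque of states

    def __bool__(self):
        return bool(self.probs)

    def insertqq(self, state, prob):
        """Append state to the FIFO queue for prob, creating it if needed."""
        if prob not in self.queues:
            bisect.insort(self.probs, prob)
            self.queues[prob] = deque()
        self.queues[prob].append(state)

    def end(self):
        """Return (maximum probability, its FIFO queue) without removing it."""
        p = self.probs[-1]
        return p, self.queues[p]

    def delqueue(self):
        """Remove the maximum probability together with its FIFO queue."""
        p = self.probs.pop()
        return self.queues.pop(p)

    def destate(self, state, prob):
        """Remove state from the queue for prob; drop the entry if it empties."""
        q = self.queues[prob]
        q.remove(state)
        if not q:
            del self.queues[prob]
            self.probs.pop(bisect.bisect_left(self.probs, prob))

U = ProbabilityOrderedQueue()
U.insertqq("s11", 0.60)
U.insertqq("s12", 0.34)
U.insertqq("s13", 0.15)
U.insertqq("x", 0.34)      # hypothetical state sharing probability 0.34
top_p, top_q = U.end()
```

In this sketch `bisect.insort` finds the position by binary search but shifts list elements on insertion, so a balanced tree or a bucket array indexed by the decimal probability would be needed to match the per-insertion bound assumed in Section 5.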

4.2 OPP algorithm: top-level

In this algorithm, to find an optimal objective path, we exploit the probability-first strategy. Thus, in every iteration of the while loop, OPP first selects the last element of the probability ordered queue U, written End(U), and the FIFO queue which that element points to, written End(U).queue. Clearly, among all states that have been visited but not extended at the current moment, the states in this FIFO queue share the maximum arrival probability.

If there are objective states in the FIFO queue End(U).queue, then OPP can find an optimal objective path from these objective states by invoking the function OPcompute(); otherwise, it extends this FIFO queue (V in Algorithm 1) by invoking the function Extend(). This process is described in Algorithm 1.

Insertqq(s, U) is a function that works as follows: if there is an element in U whose value is equal to s.probability, then s is added to the FIFO queue which that element points to. Otherwise, s.probability is inserted into the probability ordered queue U, a new FIFO queue is created, s.probability is made to point to it, and s is added to it.

The function delqueue(U) deletes the last element from the probability ordered queue U, together with the FIFO queue which that element points to.

Algorithm 1 OPP algorithm: top-level

input: a probabilistic planning problem $\langle s_0, S, \Sigma, T, G \rangle$;
output: an optimal objective path;
Var U: probability ordered queue;

function OPP()
Var t: state; Q: state set; V: FIFO queue;
begin
    Insertqq(s0, U);
    s0.probability ← 1;
    s0.visited ← 1;
    s0.distance ← 0;
    while U is not empty do
        V := End(U).queue;
        Q ← {t | t.objective = 1 and t ∈ V};
        if Q is not empty then
            OPcompute(Q);
        else
            Extend(V);
            delqueue(U);
        end if
    end while
    return FAIL;
end

function Insertqq(s, U)
Var t: real number variable;
begin
    if there is t ∈ U such that t = s.probability then
        (t.queue).enqueue(s);
    else
        insert(s.probability, U);
        (End(U).queue).enqueue(s);
    end if
end

4.3 Optimal objective path computation

In OPP's search, when the FIFO queue of visited but unextended states with the current maximum arrival probability contains objective states, the enriched paths from the initial state to these objective states are maximum probability objective paths, whereas the enriched paths from the initial state to other objective states cannot be maximum probability objective paths because of the probability-first strategy. Thus, from the set Q consisting of these objective states, we can find an optimal objective path by Algorithm 2.

To find an optimal solution, we traverse the set Q. During the traversal, the variable shortd records the end state of the current shortest path among the maximum probability objective paths found so far (see the comparison s.distance < shortd.distance and the subsequent update of shortd in Algorithm 2). When all objective states in Q have been traversed, the recorded shortest path is an optimal objective path. The function path(s0, shortd) computes the enriched path from the initial state s0 to the state shortd, namely the recorded shortest path. Note that for every state, its predecessor field has a unique value, so there is a unique recorded path from the initial state s0 to that state. Beginning from the state shortd and following the predecessor field of every state in turn, path(s0, shortd) obtains the recorded shortest path from s0 to shortd in reverse order, and its induced action sequence is an optimal solution of the probabilistic planning problem.

Algorithm 2 Optimal path computation

function OPcompute(Q)
/* Q is a set of objective states */
Var s, shortd: state;
begin
    shortd.distance ← 0;
    for s ∈ Q do
        if shortd.distance = 0 then
            /* the first maximum probability objective path */
            shortd ← s;
        else
            if s.distance < shortd.distance then
                shortd ← s;
            end if
        end if
    end for
    output (path(s0, shortd)); terminate;
end
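The selection and backtracking steps of Algorithm 2 can be sketched as follows. The sketch assumes the distance and predecessor fields are kept in dictionaries `dist` and `pred` rather than on state records; these names, `op_compute`, `path_from_predecessors` and the sample data are hypothetical.

```python
def path_from_predecessors(pred, s0, goal):
    """Follow predecessor links from goal back to s0, then reverse them."""
    path = [goal]
    while path[-1] != s0:
        path.append(pred[path[-1]])
    path.reverse()
    return path

def op_compute(Q, dist, pred, s0):
    """Among objective states Q (all with equal arrival probability), pick
    the one with the smallest distance and return the path leading to it."""
    shortd = min(Q, key=lambda s: dist[s])   # first minimum wins on ties
    return path_from_predecessors(pred, s0, shortd)

# Hypothetical search result in which g1 and g2 are both objective states.
pred = {"u": "s0", "v": "u", "g1": "v", "g2": "u"}
dist = {"s0": 0, "u": 1, "v": 2, "g1": 3, "g2": 2}
route = op_compute(["g1", "g2"], dist, pred, "s0")
```

Python's `min` returns the first minimal element, which mirrors the strict comparison in Algorithm 2: a later state replaces `shortd` only if its distance is strictly smaller.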

4.4 State extension

Given a FIFO queue V, state extension processes all possible successors of every state in V; these successors have arrival probabilities no larger than those of the elements of V. The procedure is outlined in Algorithm 3. For every state s in V, there are three cases for every successor t of s: in the first case, t has not been visited; in the second, t has been visited but not extended; in the third, t has been visited and extended.

The first case is simple: t receives its arrival probability, distance and predecessor, is marked as visited, and is inserted into U. For the second case, t.visited = 1 shows that t is a visited and unextended state, so there is already a unique recorded path R from the initial state to t, whose probability is t.probability. The condition t.probability < s.probability × p, where p = probability(s, t), means that there is now a new path from the initial state to s and then to t, and that this path has higher probability than R. It is therefore necessary to replace the old path R with the new path and to update the ordered queue accordingly by the functions Destate(t, U) and Insertqq(t, U). For the third case, because t has been visited and extended, the arrival probability of t is not smaller than s.probability × probability(s, t). Thus we do not need to make any change to t.

The function Destate(t, U) deletes t from the FIFO queue that the element with value t.probability in U points to, and deletes t.probability from U if t is the only state in that FIFO queue.

Algorithm 3 State extension algorithm

function Extend(V)
/* V is a FIFO queue */
Var s, t: state;
begin
    while V is not empty do
        s := V.dequeue();
        for all t ∈ successor(s) do
            if t.visited = 0 then
                t.distance ← (s.distance + 1);
                t.probability ← (s.probability × probability(s, t));
                t.visited ← 1;
                t.predecessor ← s;
                Insertqq(t, U);
            else
                if t.visited = 1 then
                    if (t.probability < s.probability × probability(s, t)) then
                        t.distance ← (s.distance + 1);
                        t.probability ← (s.probability × probability(s, t));
                        t.predecessor ← s;
                        Destate(t, U);
                        Insertqq(t, U);
                    end if
                end if
            end if
        end for
        s.visited ← 2;
        Destate(s, U);
    end while
end
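Putting Algorithms 1 to 3 together, the following is a compact, runnable sketch of the probability-first search. It is an illustration under several assumptions rather than the authors' implementation: state fields are kept in dictionaries, the probability ordered queue is a plain dictionary whose maximum key is found with `max` (which costs O(|U|) per step), and each FIFO queue is extended from a snapshot so that states enqueued with the same probability during extension are still checked for being objective states. The function name `opp`, the `transitions` layout and the toy problem are hypothetical.

```python
from collections import deque

def opp(s0, transitions, goals):
    """Probability-first search for an optimal objective path.

    transitions: dict state -> list of (successor, action, probability), p > 0.
    goals: set of objective states.
    Returns (probability, action list), or None if no objective path exists.
    """
    prob, dist, pred = {s0: 1.0}, {s0: 0}, {}
    extended = set()                  # states whose visited field would be 2
    U = {1.0: deque([s0])}            # probability -> FIFO queue

    while U:
        p_max = max(U)                # stand-in for End(U)
        V = U[p_max]
        Q = [t for t in V if t in goals]
        if Q:                         # OPcompute: shortest among the best
            t = min(Q, key=dist.get)
            actions = []
            while t != s0:
                t, a = pred[t]
                actions.append(a)
            return p_max, actions[::-1]
        batch = list(V)               # Extend: work on a snapshot of V
        V.clear()
        for s in batch:
            extended.add(s)
            for t, a, p in transitions.get(s, []):
                new_p = prob[s] * p
                # Third case (extended) or no improvement: leave t alone.
                if t in extended or prob.get(t, 0.0) >= new_p:
                    continue
                if t in prob:         # second case: move t to a better queue
                    old = U[prob[t]]
                    old.remove(t)
                    if not old:
                        del U[prob[t]]
                prob[t], dist[t], pred[t] = new_p, dist[s] + 1, (s, a)
                U.setdefault(new_p, deque()).append(t)
        if not V:                     # delqueue, unless V received new states
            del U[p_max]
    return None

# Hypothetical toy problem: the greedy first step (x) is not the best route.
toy = {
    "s0": [("x", "a", 0.5), ("y", "b", 0.4)],
    "x": [("g", "c", 0.2)],
    "y": [("g", "d", 0.9)],
}
result = opp("s0", toy, {"g"})
```

On the toy problem the state g is first reached through x with probability 0.1 and later re-queued through y with probability 0.36, which exercises the second case of the state extension.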

5 Complexity Analysis

The soundness and completeness of the OPP algorithm can be proved along the lines of [15]. Thus, in this section, we only discuss the complexity of our algorithm.

Lemma 1 The complexity of the optimal objective path computation algorithm is linear in the size of its input parameter.

Theorem 1 Suppose the number of states and the number of edges (transitions) of an optimal probabilistic planning problem are $N$ and $M$, respectively. Then the time complexity of the OPP algorithm is $O(N + M)$.

From Algorithm 1, the main factors affecting the complexity of the OPP algorithm are the functions OPcompute(), Extend(V) and delqueue(U). Because OPcompute() is linear in the size of its input parameter, the accumulated number of operations of OPcompute() over all iterations is $O(N)$. We assume that every probability is expressed as a pure decimal number with an accuracy of $c$ decimal digits. Then $U$ has at most $10^c$ elements. Because $U$ is a probability ordered queue, Insertqq(t, U) needs $O(\log(10^c)) = O(c)$ operations. Thus the total number of operations of Extend(V) over all iterations is $O(c \cdot M + N) = O(M + N)$, where $c$ is treated as a constant. Clearly, the function delqueue(U) needs $O(N)$ operations in total over all iterations. Thus the complexity of Algorithm 1 is $O(M + N)$, which is linear in the size of the system.
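The counting argument above can be laid out step by step as follows. This is only a restatement of the bounds given in the text, under the additional assumptions that the ordered queue supports logarithmic-time insertion (for example by binary search over a suitable structure) and that the precision $c$ is a constant independent of $N$ and $M$.

```latex
\begin{align*}
|U| &\le 10^{c}
  && \text{(distinct probabilities with $c$ decimal digits)} \\
\text{one call of Insertqq} &= O(\log 10^{c}) = O(c \log 10) = O(c) \\
\text{all calls of Extend} &= O(c \cdot M + N) = O(M + N)
  && \text{(at most one insertion per transition)} \\
\text{all calls of OPcompute and delqueue} &= O(N) + O(N) \\
\text{total} &= O(N) + O(M + N) + O(N) = O(M + N)
\end{align*}
```

If $c$ is allowed to grow with the input, the same argument gives $O(c \cdot M + N)$ instead.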

6 A Case Study

We again take Example 1 as a case study. The actions for topic selection, writing and response performed by person A are denoted by a1, a2 and a3, respectively. Similarly, the actions for topic selection, writing and response performed by person B are denoted by b1, b2 and b3, respectively.

Now we describe the process of extension. Beginning from the initial state s0, the arrival probabilities of states s11, s12 and s13 after the first extension are 0.60 (a1), 0.34 (b1), and 0.15 (a1), respectively. Selecting state s11, which has the current maximum probability, to carry out the second extension, the arrival probability of state s21 becomes 0.60 × 0.55 = 0.33 (a1a2). Selecting state s12 with the current maximum probability 0.34 (b1) to carry out the third extension, the arrival probability of state s21 remains 0.33 (a1a2) because 0.34 × 0.45 < 0.33. Selecting state s21 with the current maximum probability 0.33 (a1a2) to carry out the fourth extension, the arrival probability of state s31 becomes 0.33 × 0.35 = 0.1155 (a1a2b3).

Thus the enriched objective path with maximum arrival probability is $s_0 \xrightarrow{a_1} s_{11} \xrightarrow{a_2} s_{21} \xrightarrow{b_3} s_{31}$.
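As a cross-check of the case study, the sketch below enumerates every path from s0 to s31 and every assignment of persons to steps, using the values of Table 1 as printed. It is an exhaustive check, not the OPP algorithm itself; the dictionary layout and the variable names are assumptions for illustration.

```python
from itertools import product

# Rows of Table 1: probabilities of the three successors of each state.
A = {"s0": [0.60, 0.25, 0.15],
     "s11": [0.55, 0.30, 0.15], "s12": [0.45, 0.40, 0.15], "s13": [0.40, 0.45, 0.15],
     "s21": [0.25, 0.40, 0.35], "s22": [0.46, 0.40, 0.14], "s23": [0.35, 0.45, 0.20]}
B = {"s0": [0.56, 0.34, 0.10],
     "s11": [0.45, 0.35, 0.20], "s12": [0.41, 0.30, 0.28], "s13": [0.30, 0.40, 0.30],
     "s21": [0.35, 0.30, 0.35], "s22": [0.35, 0.40, 0.25], "s23": [0.48, 0.40, 0.12]}
T = {"a": A, "b": B}

# Each candidate is (probability, action sequence, state sequence).
# Index 0 in the last step selects the column of the objective state s31.
best = max(
    (T[x1]["s0"][i] * T[x2][f"s1{i + 1}"][j] * T[x3][f"s2{j + 1}"][0],
     (f"{x1}1", f"{x2}2", f"{x3}3"),
     ("s0", f"s1{i + 1}", f"s2{j + 1}", "s31"))
    for i, j in product(range(3), repeat=2)        # intermediate states
    for x1, x2, x3 in product("ab", repeat=3)      # who performs each step
)
best_prob, best_actions, best_states = best
```

With the Table 1 values as printed, the maximum is 0.60 × 0.55 × 0.35 = 0.1155, attained by the actions a1, a2, b3 along s0, s11, s21, s31.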

7 Conclusion

This paper's main contributions are listed as follows.

1. We proposed a linear algorithm for optimal probabilistic planning, namely the OPP algorithm. The algorithm is mainly based on a probability-first strategy, a probability ordered queue and several FIFO queues.

2. We defined and proved the OPP algorithm's soundness and completeness with respect to the optimal probabilistic planning problem.

3. We gave a proof of the linear complexity of the OPP algorithm.

4. We took project application as a case study to show the OPP algorithm's application.

References

[1] X. Qiu, X. Huang, L. Wu, Probabilistic Text Categorization using Sparse Topical Encoding, Journal of Computational Information Systems, 2009, 5(3): 1317-1329.

[2] A. Zalesky, DT-MRI fiber tracking: A shortest paths approach, IEEE Transactions on Medical Imaging 27 (2008) 1458-1471.

[3] A. Robles-Kelly and E. R. Hancock, Steady state random walks for path estimation, Proc. of the International Conference on Structural, Syntactic, and Statistical Pattern Recognition, (2004), pp. 143-152.

[4] J. G. David Forney, The Viterbi algorithm, Proceedings of the IEEE 61 (1973) 268-278.

[5] S. Lim, H. Balakrishnan, D. Gifford, S. Madden and D. Rus, Stochastic motion planning and applications to traffic, The International Journal of Robotics Research 30 (2011) 699-712.

[6] D. Bertsekas and R. Gallagher, Data Networks, 2nd ed (New York: Prentice Hall, 1992).

[7] R. K. Ahuja, T. L. Magnanti and J. B. Orlin, Network Flows: Theory, Algorithms and Applications (Englewood Cliffs, NJ: Prentice Hall, 1993).

[8] E. Dijkstra, A note on two problems in connexion with graphs, Numerical Mathematics 1 (1959) 269-271.

[9] D. Johnson, Efficient algorithms for shortest paths in sparse networks, Journal of the Association for Computing Machinery 24 (1977) 1-13.

[10] R. Raman, Recent results on single-source shortest paths problem, SIGACT News 28 (1997) 81-87.

[11] R. Ahuja, K. Mehlhorn, J. Orlin and R. Tarjan, Faster algorithms for the shortest path problem, Journal of the Association for Computing Machinery 37 (1990) 213-223.

[12] B. Cherkasky, A. Goldberg and T. Radzik, Shortest-paths algorithms: Theory and experimental evaluation, Mathematical Programming 73 (1996) 129-174.

[13] T. Dean, L. Kaelbling, J. Kirman and A. Nicholson, Planning with deadlines in stochastic domains, Proc. 11th Nat. Conf. on A.I., (1993).

[14] S. Koenig, Optimal probabilistic and decision-theoretic planning using Markovian decision theory, Proc. of UCB/CSD 92/685, Berkeley, (1992).

[15] N. Kushmerick, S. Hanks and D. S. Weld, An algorithm for probabilistic planning, Artificial Intelligence 76 (1995) 239-286.

[16] C. Domshlak and J. Hoffmann, Probabilistic planning via heuristic forward search and weighted model counting, Journal of Artificial Intelligence Research 30 (2007) 565-620.

[17] S. Sanner and K. Kersting, Symbolic dynamic programming for first-order POMDPs, Proc. of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), (2010), pp. 1140-1146.

[18] C. Wang and R. Khardon, Relational partially observable MDPs, Proc. of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), (2010), pp. 1153-1158.

[19] T. Smith and R. Simmons, Heuristic search value iteration for POMDPs, Proc. of the 20th Conference on Uncertainty in Artificial Intelligence, (2004), pp. 520-527.

[20] R. Bellman, A Markovian decision process, Journal of Mathematics and Mechanics 6 (1957) 679-684.
