Real-Time Search Algorithms
● Chintan Shah (06305013)
● Faraz Shahbazker (06305008)
● Khem Dhaval (06305021)
● Manish Aggarwal (06305005)
A Real-World Scenario
● A patient in the ICU is hooked to numerous monitoring and life-supporting devices
● A device senses some kind of trauma (shock) in the patient's condition
● What to do?
– Raise an alarm to call for a doctor?
– But ... in the time the doctor takes to arrive, the patient's condition may deteriorate
... continued
● If only ...
– a real-time control system (RTCS) could diagnose the cause of the shock and have suggestions ready
– or take some precautionary action, like reducing a particular gas in a ventilator
● Convergence of AI and Real-Time systems
Mars Pathfinder Problem
● The agent needs to navigate the terrain to collect samples
● It does not know the terrain in advance
● Visibility is limited by the strength of its sensors
● Hostile conditions
● Can you think of an algorithm that can be applied here?
Search in a 2D grid

[Figure: a 2D grid with Start and Goal cells; the agent's Current position advances toward the Goal]
Interactive Strategy Gaming
● In many real-time strategy games
– resources are scattered across the map
– the agent has to travel from a start state to some goal state
● Each cell of the map is either passable or occupied by a wall
● Commitment to moves in constant time
– e.g., moving an army in AoE
Observations
● Map may be known or unknown
● Real-time response is a must
● An optimal path is not really required
Agenda
● MiniMin
● Real-Time A* (RTA*)
● Learning-RTA* (LRTA*)
● Conclusion
Single Agent Search
● Examples:
– the 8-puzzle problem
– autonomous navigation in a network of roads
● Admissible heuristic search algorithms:
– A*, IDA* – find the optimal path
– A* requires an exponential amount of memory
– they fail when a quick, suboptimal, real-time answer is needed
Limitations

● They try to find the optimal solution
● They find the whole path before making even the first move
– They cannot work "fast enough"
● They cannot work with limited visibility
Two Player Games
● MiniMax
– moves in constant time
– limited search space
– suboptimal decisions
● Can we do something similar for single-agent search (SAS)?
MiniMin lookahead search
● A specialization of MiniMax for single-agent problems
● Limited search depth
– determined by available resources
– limited by visibility
● Planning mode:
– apply the heuristic function at the leaves
– back up the minimum value of the children for each node
● Execution mode:
– a single(?) move in the direction of the best child
MiniMin with nonuniform cost
● If the operators have nonuniform costs, use f(n) = g(n) + h(n)
● Back up the minimum value of f(n) (see the sketch after the figure below)
[Figure: example lookahead tree with nonuniform edge costs (g = 2 to 5) and leaf heuristics (h = 5 to 9); the leaf f-values f = g + h range from 12 to 16, and the minimum, f = 12, is backed up to the root]
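As an illustration, here is a minimal Python sketch of MiniMin lookahead with nonuniform costs. It is not from the original slides: successors(state), yielding (child, cost) pairs, the heuristic h, and is_goal are assumed, hypothetical interfaces.

    def minimin(state, depth, g, successors, h, is_goal):
        # Back up the minimum f = g(n) + h(n) from a depth-bounded lookahead.
        if is_goal(state):
            return g                        # exact cost, no heuristic needed
        if depth == 0:
            return g + h(state)             # frontier node: f = g + h
        return min((minimin(child, depth - 1, g + cost, successors, h, is_goal)
                    for child, cost in successors(state)),
                   default=float("inf"))    # dead end: treat as unreachable

    def best_child(state, depth, successors, h, is_goal):
        # Planning mode: pick the child whose subtree backs up the lowest f;
        # execution mode then makes a single move toward it.
        return min(successors(state),
                   key=lambda c: minimin(c[0], depth - 1, c[1],
                                         successors, h, is_goal))[0]

With unit edge costs this reduces to the plain MiniMin of the previous slide.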
Time Limited A*
● Run A* until time runs out
● Make a single move in the direction of the best node on the OPEN list (as sketched below)
● Time is exponential in the search depth
● It will exhaust memory if the time allotted per move is not very small
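A hedged sketch of the idea, under the same hypothetical interface as above; states are assumed hashable, and a "move" is represented by the first successor on the path to a node:

    import heapq
    import time
    from itertools import count

    def time_limited_astar(start, successors, h, is_goal, deadline):
        # Run A* until the deadline, then commit to a single move toward
        # the best node remaining on the OPEN list.
        tie = count()       # tie-breaker so the heap never compares states
        open_list = [(h(start), next(tie), 0, start, None)]
        seen = {start}
        while open_list and time.monotonic() < deadline:
            f, _, g, state, first = heapq.heappop(open_list)
            if is_goal(state):
                return first                # solved within the time limit
            for child, cost in successors(state):
                if child not in seen:
                    seen.add(child)
                    step = first if first is not None else child
                    heapq.heappush(open_list,
                                   (g + cost + h(child), next(tie),
                                    g + cost, child, step))
        if not open_list:
            return None                     # search space exhausted
        return min(open_list)[4]            # first step toward best OPEN node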
Generating a sequence of moves

● Run MiniMin to select the next state, and continue in this manner until the goal state is found
● Cycles?
– Exclude previously visited states
– But all successors of the current state may already have been visited ...
RTA* (Real Time A*)
● Let s be the current node
● h(n) – found using a depth-bounded search algorithm such as MiniMin, Time-Limited A*, etc.
● g(n) – the distance from s (and NOT the distance from the origin)
● Choose the best neighbour
● Store the f-value of the 2nd-best neighbour with s
[Figure: RTA* stepping through a small graph; neighbour h-values (5, 6, 10, 11, 12) are shown, and each departed node's stored value is replaced by its 2nd-best estimate (rising to 11 and 12)]
RTA*
1. s := S_start
2. If s ∈ G, then return success
3. a = one_of ( arg_min (h, succ (s)))
4. b = one_of ( arg_min_2 (h, succ (s)))
5. h(s) = 1+h(b)
6. Execute action for a
7. s = a
8. goto 2
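A direct Python rendering of this pseudocode: a sketch assuming unit edge costs, where successors(s) yields neighbour states and h0 is the initial heuristic (both hypothetical names).

    def rta_star(start, is_goal, successors, h0):
        h = {}                               # hash-table of learned h-values
        hv = lambda n: h.get(n, h0(n))       # fall back to the base heuristic
        s = start
        path = [s]
        while not is_goal(s):
            ranked = sorted(successors(s), key=hv)    # neighbours, best first
            a = ranked[0]                             # best neighbour: move there
            b = ranked[1] if len(ranked) > 1 else a   # 2nd-best neighbour
            h[s] = 1 + hv(b)                 # store the 2nd-best estimate with s
            s = a                            # execute the action for a
            path.append(s)
        return path

The lookahead here is only depth one; as the slide notes, h(n) could instead come from a deeper MiniMin or Time-Limited A* search.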
Observations
● Requires memory linear in the number of EXECUTED moves
● Lookahead requires constant time because of the limited search depth
● Execution is interleaved with planning
Learning and Search
● Learn ... "how to search?"
● Different problem instances in the same search space
● Different start states, same goal state
● Improve with each successful search
● How do we apply previously acquired knowledge (+ve/-ve) to find the goal state?
Learning and Search

● Naïve approach
– Remember all optimal paths found so far
– Once you come across a previously found optimal path, follow the saved trail to the Goal
– Learning or memorization?
● General approach
– Learn the heuristic function
– h(x) should become more precise (less approximate) with every successful search
– h(x) => h*(x)
Learning in vanilla A*

● Optimal path memorization
– Hash-table of known nodes
– Insert each node on the optimal path along with its next pointer (see the sketch below)
● A new data structure (the hash-table) is needed for persistent memory
● Can vouch only for nodes on the optimal path, but with full confidence
● No learning from explored nodes outside the optimal path – negative examples are ignored
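A minimal sketch of this memorization scheme (hypothetical names, hashable states):

    next_on_path = {}                    # persistent hash-table: node -> next node

    def remember(optimal_path):
        # After a successful search, insert each node with its next pointer.
        for node, nxt in zip(optimal_path, optimal_path[1:]):
            next_on_path[node] = nxt

    def replay(node):
        # On reaching a memorized node, follow the saved trail to the Goal.
        while node in next_on_path:
            node = next_on_path[node]
        return node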
Learning in vanilla A*

● Improve the heuristic ... but how?
– Apply ML techniques to learn h(x)
● each successful search adds to the training set by providing actual values of h*(x)
● use the revised heuristic for the next search
– Simple revision (sketched below)
● update h(x) to h*(x) for known paths
● danger: unknown siblings that under-approximate h*(x) appear more attractive
– Incremental updates
● step-by-step revision of h-values
● all h-values (globally) converge to h*(x)
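For instance, the "simple revision" step might look like this: a sketch assuming unit edge costs, so that h*(x) of a node on the solved path is just its remaining distance along that path.

    def simple_revision(h, solved_path):
        # Update h(x) to the exact h*(x) for every node on a known path.
        for distance_left, node in enumerate(reversed(solved_path)):
            h[node] = distance_left      # goal gets 0, start gets path length

As the slide warns, siblings off the path keep their old, possibly under-approximating values and may now look misleadingly attractive.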
Why is learning so important?

● It fits in well with our real-world notion: first survive, then thrive
– find a satisficing solution as soon as possible
– try to improve/optimize at leisure
● Most real-time search algorithms settle for non-optimal results
● Each subsequent execution has to pay a big price for the initial fast search
● With learning, this cost decreases over time
● It's easy ... at least for RTA*
Learning RTA*
● In RTA* we already have a hash-table
● We store all explored nodes with their (2nd-best) heuristic
● Make the hash-table persistent across searches
– remember previously visited nodes, and
– remember the heuristic calculated at the previous visit
● But ... reusing the 2nd-best heuristic for the next search can cause problems
– a good node may appear less attractive than its siblings, because 2nd-best > best
Consequence of using the 2nd-best h(n)

[Figure: start states S1, S2 and nodes a, b, c, d, e on the way to the Goal; with h(c)=8 (though h*(c)=20), h(d)=9 and h(e)=6, the stored value becomes h(b) = 9+1 = 10, against a true h*(b)=12]
What if we use the best h(n)?

[Figure: the same graph; storing the best value gives h(b) = 6+1 = 7, an even weaker approximation of h*(b)=12]
Learning RTA* - Algorithm
1. s := S_start
2. If s ∈ G, then return success
3. a = one_of ( arg_min (h, succ (s)))
4. h(s) = max (h(s), 1+h(a))
5. Execute action a
6. s := a
7. goto 2
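The same loop in Python (hypothetical interface as before, unit edge costs). The change from the RTA* sketch above is that we back up the BEST successor value, never decreasing h(s), and keep the table h alive across searches:

    def lrta_star(start, is_goal, successors, h0, h):
        # `h` is a dict owned by the caller, so learning persists across trials.
        hv = lambda n: h.get(n, h0(n))
        s = start
        path = [s]
        while not is_goal(s):
            a = min(successors(s), key=hv)   # best neighbour
            h[s] = max(hv(s), 1 + hv(a))     # step 4: never decrease h(s)
            s = a                            # execute action a
            path.append(s)
        return path

    # Repeated trials with the same dict keep refining h toward h*:
    # h = {}
    # for _ in range(10):
    #     path = lrta_star(start, is_goal, successors, h0, h)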
2nd Best vs. Best
2nd Best (RTA*):
● Aggressive approach
● Better approximation of h(n)
● Progressive and committed (moves forward)
● Reaches the solution quickly
● Learns less about the general terrain

Best (LRTA*):
● Conservative approach
● Bad approximation of h(n)
● Less commitment (prone to back-tracking)
● Reaches the solution slowly
● Wanders around unexplored areas and learns more about the terrain
Example of RTA*

[Figure: four snapshots of RTA* on a row of states with initial h-values 4, 5, 6, 10 leading "To Goal State"; as the agent moves forward, each departed state's h is overwritten with the 2nd-best estimate, rising to 11, 7, 8 along the path]
Example of LRTA*

[Figure: four snapshots of LRTA* on the same row of states (initial h-values 4, 5, 6, 10); the stored values rise only gradually (to 5, then 6), and the agent prefers to move back rather than forward]
Example of LRTA* (contd.)

[Figure: four more snapshots; h-values creep up to 7, 7, 6, 8, and a new path is explored here]
Example of LRTA* (contd.)

[Figure: two final snapshots; h-values reach 7, 7, 7, 8 and the "To Goal State" estimate rises from 6 to 7. The cycle is broken and the agent moves forward: we had gone backwards the last time we were here!]
Observations

[Figure: final state of RTA* – h-values 11, 7, 8, 6, 10 along the path "To Goal State"]
[Figure: final state of LRTA* – h-values 7, 7, 7, 8, 10, with the "To Goal State" estimate raised to 7]
RTA*:
● Reached the goal in 5 steps
● Did not explore the extra path
● Moves forward quickly
● h-values rise suddenly, and only along the exact path

LRTA*:
● Reached the goal in 10 steps
● Explored the extra path (shown in bold)
● Reluctant to move forward
● h-values rise slowly and uniformly all around the actual path => the algorithm "learns" h(n)
Conclusions

● Known search methods are inadequate for certain applications
– Key issues:
● limited visibility
● real-time action required
– Solutions:
● forfeit optimality
● make locally optimal decisions
● interleave search (planning) and movement
● MiniMin – a limited-depth planning phase
● RTA* – interleaves planning and execution
● LRTA* – learns from previous searches
References
● [BL06] Vadim Bulitko and Greg Lee. Learning in real-time search: A unifying framework. Journal of Artificial Intelligence Research (JAIR), 25:119-157, 2006.
● [KL06] Sven Koenig and Maxim Likhachev. Real-time adaptive A*. In AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 281-288, New York, NY, USA, 2006. ACM.
● [Koe01] Sven Koenig. Minimax real-time heuristic search. Artificial Intelligence, 129(1-2):165-197, 2001.
References (Contd.)
● [Koe04] Sven Koenig. A comparison of fast search methods for real-time situated agents. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 864-871. IEEE Computer Society, Washington, DC, USA, 2004.
● [Kor90] Richard E. Korf. Real-time heuristic search. Artificial Intelligence, 42(2-3):189-211, 1990.
● [MHA+94] David John Musliner, James Hendler, Ashok K. Agrawala, Edmund H. Durfee, Jay K. Strosnider, and C. J. Paul. The challenges of real-time AI. Technical Report CS-TR-3290, 1994.