Real-Time Search Algorithms
● Chintan Shah (06305013)
● Faraz Shahbazker (06305008)
● Khem Dhaval (06305021)
● Manish Aggarwal (06305005)
A Real-World Scenario
● A patient in the ICU is hooked to numerous monitoring and life-supporting devices
● A device senses some kind of trauma (shock) in the patient's condition
● What to do?
– Raise an alarm to call for a doctor?
– But ... in the time the doctor takes to arrive, the patient's condition may deteriorate
... continued
● If only ...
– a real-time control system (RTCS) could diagnose the cause of the shock and have suggestions ready
– or take some precautionary action, like reducing a particular gas in a ventilator
● Convergence of AI and Real-Time systems
Mars Pathfinder Problem
● The agent needs to navigate the terrain to collect samples
● It does not know the terrain in advance
● Visibility is limited by the strength of its sensors
● Hostile conditions
● Can you think of an algorithm that can be applied here?
Search in a 2D grid

[Figure: a 2D grid with Start and Goal cells; the agent's Current position advances toward the Goal]
Interactive Strategy Gaming
● In many real-time strategy games
– resources are scattered across the map
– the agent has to travel from a start state to some goal state
● Each cell of the map is either passable or occupied by a wall
● Commitment to moves in constant time
– e.g., moving an army in AoE
Observations
● Map may be known or unknown
● Real-time response is a must
● An optimal path is not really required
Agenda
● MiniMin
● Real-Time A* (RTA*)
● Learning-RTA* (LRTA*)
● Conclusion
Single Agent Search
● Examples:
– the 8-puzzle problem
– autonomous navigation in a network of roads
● Admissible heuristic search algorithms:
– A*, IDA* – find the optimal path
– A* requires an exponential amount of memory
– they fail when a quick, suboptimal, real-time answer is needed
Limitations

● They try to find the optimal solution
● They find the whole path before making even the first move
– They cannot work "fast enough"
● They cannot work with limited visibility
Two Player Games
● MiniMax
– moves in constant time
– limited search space
– suboptimal decisions
● Can we do something similar for single-agent search (SAS)?
MiniMin lookahead search
● A specialization of MiniMax for single-agent problems
● Limited search depth
– determined by available resources
– limited by visibility
● Planning mode:
– apply the heuristic function at the leaves
– back up the minimum value of the children for each node
● Execution mode:
– a single(?) move in the direction of the best child
MiniMin with nonuniform cost
● If the operators have nonuniform costs, use f(n) = g(n) + h(n)
● Back up the minimum value of f(n) (see the sketch after the figure below)
[Figure: example lookahead tree with nonuniform edge costs (g = 2 to 5) and leaf heuristics (h = 5 to 9); the leaf f-values f = g + h range from 12 to 16, and the minimum, f = 12, is backed up to the root]
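As an illustration, here is a minimal Python sketch of MiniMin lookahead with nonuniform costs. It is not from the original slides: successors(state), yielding (child, cost) pairs, the heuristic h, and is_goal are assumed, hypothetical interfaces.

    def minimin(state, depth, g, successors, h, is_goal):
        # Back up the minimum f = g(n) + h(n) from a depth-bounded lookahead.
        if is_goal(state):
            return g                        # exact cost, no heuristic needed
        if depth == 0:
            return g + h(state)             # frontier node: f = g + h
        return min((minimin(child, depth - 1, g + cost, successors, h, is_goal)
                    for child, cost in successors(state)),
                   default=float("inf"))    # dead end: treat as unreachable

    def best_child(state, depth, successors, h, is_goal):
        # Planning mode: pick the child whose subtree backs up the lowest f;
        # execution mode then makes a single move toward it.
        return min(successors(state),
                   key=lambda c: minimin(c[0], depth - 1, c[1],
                                         successors, h, is_goal))[0]

With unit edge costs this reduces to the plain MiniMin of the previous slide.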
Time Limited A*
● Run A* until time runs out
● Make a single move in the direction of the best node on the OPEN list (as sketched below)
● Time is exponential in the search depth
● It will exhaust memory if the time allotted per move is not very small
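A hedged sketch of the idea, under the same hypothetical interface as above; states are assumed hashable, and a "move" is represented by the first successor on the path to a node:

    import heapq
    import time
    from itertools import count

    def time_limited_astar(start, successors, h, is_goal, deadline):
        # Run A* until the deadline, then commit to a single move toward
        # the best node remaining on the OPEN list.
        tie = count()       # tie-breaker so the heap never compares states
        open_list = [(h(start), next(tie), 0, start, None)]
        seen = {start}
        while open_list and time.monotonic() < deadline:
            f, _, g, state, first = heapq.heappop(open_list)
            if is_goal(state):
                return first                # solved within the time limit
            for child, cost in successors(state):
                if child not in seen:
                    seen.add(child)
                    step = first if first is not None else child
                    heapq.heappush(open_list,
                                   (g + cost + h(child), next(tie),
                                    g + cost, child, step))
        if not open_list:
            return None                     # search space exhausted
        return min(open_list)[4]            # first step toward best OPEN node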
Generating a sequence of moves

● Run MiniMin to select the next state, and continue in this manner until the goal state is found
● Cycles?
– Exclude previously visited states
– But all successors of the current state may already have been visited ...
RTA* (Real Time A*)
● Let s be the current node
● h(n) – found using a depth-bounded search algorithm such as MiniMin, Time-Limited A*, etc.
● g(n) – the distance from s (and NOT the distance from the origin)
● Choose the best neighbour
● Store the f-value of the 2nd-best neighbour with s
[Figure: RTA* stepping through a small graph; neighbour h-values (5, 6, 10, 11, 12) are shown, and each departed node's stored value is replaced by its 2nd-best estimate (rising to 11 and 12)]
RTA*
1. s := S_start
2. If s ∈ G, then return success
3. a = one_of ( arg_min (h, succ (s)))
4. b = one_of ( arg_min_2 (h, succ (s)))
5. h(s) = 1+h(b)
6. Execute action for a
7. s = a
8. goto 2
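A direct Python rendering of this pseudocode: a sketch assuming unit edge costs, where successors(s) yields neighbour states and h0 is the initial heuristic (both hypothetical names).

    def rta_star(start, is_goal, successors, h0):
        h = {}                               # hash-table of learned h-values
        hv = lambda n: h.get(n, h0(n))       # fall back to the base heuristic
        s = start
        path = [s]
        while not is_goal(s):
            ranked = sorted(successors(s), key=hv)    # neighbours, best first
            a = ranked[0]                             # best neighbour: move there
            b = ranked[1] if len(ranked) > 1 else a   # 2nd-best neighbour
            h[s] = 1 + hv(b)                 # store the 2nd-best estimate with s
            s = a                            # execute the action for a
            path.append(s)
        return path

The lookahead here is only depth one; as the slide notes, h(n) could instead come from a deeper MiniMin or Time-Limited A* search.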
Observations
● Requires memory linear in the number of EXECUTED moves
● Lookahead requires constant time because of the limited search depth
● Execution is interleaved with planning
Learning and Search
● Learn ... "how to search?"
● Different problem instances in the same search space
● Different start states, same goal state
● Improve with each successful search
● How do we apply previously acquired knowledge (+ve/-ve) to find the goal state?
Learning and Search

● Naïve approach
– Remember all optimal paths found so far
– Once you come across a previously found optimal path, follow the saved trail to the Goal
– Learning or memorization?
● General approach
– Learn the heuristic function
– h(x) should become more precise (less approximate) with every successful search
– h(x) => h*(x)
Learning in vanilla A*

● Optimal path memorization
– Hash-table of known nodes
– Insert each node on the optimal path along with its next pointer (see the sketch below)
● A new data structure (the hash-table) is needed for persistent memory
● Can vouch only for nodes on the optimal path, but with full confidence
● No learning from explored nodes outside the optimal path – negative examples are ignored
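A minimal sketch of this memorization scheme (hypothetical names, hashable states):

    next_on_path = {}                    # persistent hash-table: node -> next node

    def remember(optimal_path):
        # After a successful search, insert each node with its next pointer.
        for node, nxt in zip(optimal_path, optimal_path[1:]):
            next_on_path[node] = nxt

    def replay(node):
        # On reaching a memorized node, follow the saved trail to the Goal.
        while node in next_on_path:
            node = next_on_path[node]
        return node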
Learning in vanilla A*

● Improve the heuristic ... but how?
– Apply ML techniques to learn h(x)
● each successful search adds to the training set by providing actual values of h*(x)
● use the revised heuristic for the next search
– Simple revision (sketched below)
● update h(x) to h*(x) for known paths
● danger: unknown siblings that under-approximate h*(x) appear more attractive
– Incremental updates
● step-by-step revision of h-values
● all h-values (globally) converge to h*(x)
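For instance, the "simple revision" step might look like this: a sketch assuming unit edge costs, so that h*(x) of a node on the solved path is just its remaining distance along that path.

    def simple_revision(h, solved_path):
        # Update h(x) to the exact h*(x) for every node on a known path.
        for distance_left, node in enumerate(reversed(solved_path)):
            h[node] = distance_left      # goal gets 0, start gets path length

As the slide warns, siblings off the path keep their old, possibly under-approximating values and may now look misleadingly attractive.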
Why is learning so important?

● It fits in well with our real-world notion: first survive, then thrive
– find a satisficing solution as soon as possible
– try to improve/optimize at leisure
● Most real-time search algorithms settle for non-optimal results
● Each subsequent execution has to pay a big price for the initial fast search
● With learning, this cost decreases over time
● It's easy ... at least for RTA*
Learning RTA*
● In RTA* we already have a hash-table
● We store all explored nodes with their (2nd-best) heuristic
● Make the hash-table persistent across searches
– remember previously visited nodes, and
– remember the heuristic calculated at the previous visit
● But ... reusing the 2nd-best heuristic for the next search can cause problems
– a good node may appear less attractive than its siblings, because 2nd-best > best
Consequence of using the 2nd-best h(n)

[Figure: start states S1, S2 and nodes a, b, c, d, e on the way to the Goal; with h(c)=8 (though h*(c)=20), h(d)=9 and h(e)=6, the stored value becomes h(b) = 9+1 = 10, against a true h*(b)=12]
What if we use the best h(n)?

[Figure: the same graph; storing the best value gives h(b) = 6+1 = 7, an even weaker approximation of h*(b)=12]
Learning RTA* - Algorithm
1. s := S_start
2. If s ∈ G, then return success
3. a = one_of ( arg_min (h, succ (s)))
4. h(s) = max (h(s), 1+h(a))
5. Execute action a
6. s := a
7. goto 2
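The same loop in Python (hypothetical interface as before, unit edge costs). The change from the RTA* sketch above is that we back up the BEST successor value, never decreasing h(s), and keep the table h alive across searches:

    def lrta_star(start, is_goal, successors, h0, h):
        # `h` is a dict owned by the caller, so learning persists across trials.
        hv = lambda n: h.get(n, h0(n))
        s = start
        path = [s]
        while not is_goal(s):
            a = min(successors(s), key=hv)   # best neighbour
            h[s] = max(hv(s), 1 + hv(a))     # step 4: never decrease h(s)
            s = a                            # execute action a
            path.append(s)
        return path

    # Repeated trials with the same dict keep refining h toward h*:
    # h = {}
    # for _ in range(10):
    #     path = lrta_star(start, is_goal, successors, h0, h)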
2nd Best vs. Best
2nd Best (RTA*):
● Aggressive approach
● Better approximation of h(n)
● Progressive and committed (moves forward)
● Reaches the solution quickly
● Learns less about the general terrain

Best (LRTA*):
● Conservative approach
● Bad approximation of h(n)
● Less commitment (prone to back-tracking)
● Reaches the solution slowly
● Wanders around unexplored areas and learns more about the terrain
Example of RTA*

[Figure: four snapshots of RTA* on a row of states with initial h-values 4, 5, 6, 10 leading "To Goal State"; as the agent moves forward, each departed state's h is overwritten with the 2nd-best estimate, rising to 11, 7, 8 along the path]
Example of LRTA*

[Figure: four snapshots of LRTA* on the same row of states (initial h-values 4, 5, 6, 10); the stored values rise only gradually (to 5, then 6), and the agent prefers to move back rather than forward]
Example of LRTA* (contd.)

[Figure: four more snapshots; h-values creep up to 7, 7, 6, 8, and a new path is explored here]
Example of LRTA* (contd.)

[Figure: two final snapshots; h-values reach 7, 7, 7, 8 and the "To Goal State" estimate rises from 6 to 7. The cycle is broken and the agent moves forward: we had gone backwards the last time we were here!]
Observations

[Figure: final state of RTA* – h-values 11, 7, 8, 6, 10 along the path "To Goal State"]
[Figure: final state of LRTA* – h-values 7, 7, 7, 8, 10, with the "To Goal State" estimate raised to 7]
RTA*:
● Reached the goal in 5 steps
● Did not explore the extra path
● Moves forward quickly
● h-values rise suddenly, and only along the exact path

LRTA*:
● Reached the goal in 10 steps
● Explored the extra path (shown in bold)
● Reluctant to move forward
● h-values rise slowly and uniformly all around the actual path => the algorithm "learns" h(n)
Conclusions

● Known search methods are inadequate for certain applications
– Key issues:
● limited visibility
● real-time action required
– Solutions:
● forfeit optimality
● make locally optimal decisions
● interleave search (planning) and movement
● MiniMin – a limited-depth planning phase
● RTA* – interleaves planning and execution
● LRTA* – learns from previous searches
References
● [BL06] Vadim Bulitko and Greg Lee. Learning in real-time search: A unifying framework. Journal of Artificial Intelligence Research (JAIR), 25:119-157, 2006.
● [KL06] Sven Koenig and Maxim Likhachev. Real-time adaptive A*. In AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 281-288, New York, NY, USA, 2006. ACM.
● [Koe01] Sven Koenig. Minimax real-time heuristic search. Artificial Intelligence, 129(1-2):165-197, 2001.
References (Contd.)
● [Koe04] Sven Koenig. A comparison of fast search methods for real-time situated agents. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 864-871. IEEE Computer Society, Washington, DC, USA, 2004.
● [Kor90] Richard E. Korf. Real-time heuristic search. Artificial Intelligence, 42(2-3):189-211, 1990.
● [MHA+94] David John Musliner, James Hendler, Ashok K. Agrawala, Edmund H. Durfee, Jay K. Strosnider, and C. J. Paul. The challenges of real-time AI. Technical Report CS-TR-3290, 1994.