External Memory Value Iteration Stefan Edelkamp, Shahid Jabbar Chair for Programming Systems, University of Dortmund, Germany Blai Bonet Departamento de Computacion Universidad Simon Bolivar, Caracas, Venezuela


Page 1: External Memory Value Iteration

External Memory Value Iteration

Stefan Edelkamp, Shahid Jabbar
Chair for Programming Systems, University of Dortmund, Germany

Blai Bonet
Departamento de Computacion, Universidad Simon Bolivar, Caracas, Venezuela

Page 2: External Memory Value Iteration


Edelkamp, Jabbar & Bonet 2

Motivation: Reinforcement Learning

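Aim: write a controller that acts successfully in the environment

Minimize cost / maximize rewards

[Figure: agent-environment loop. The agent sends action a_t to the environment; the environment returns state s_t and cost c_t to the agent.]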

Page 3: External Memory Value Iteration


Motivation: External Reinforcement Learning

Cover deterministic, non-deterministic, probabilistic environments (and games)

But what to do if the agent's state space or policy space is too large to be computed and stored in RAM?

Disk space is cheap (500 GB ≈ $100)

Answer: an external memory algorithm

Page 4: External Memory Value Iteration


Overview

Uniform Search Model

Internal Memory Value Iteration

Existing External Model and BFS

External Memory Value Iteration

Experimental Highlights

Summary & Outlook


Page 6: External Memory Value Iteration


Uniform Search Model: Deterministic

Non-Deterministic

Probabilistic
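
One common way to write the three cases is as Bellman-style equations. The notation here is an assumption, not taken from the slide: c(a, u) is the cost of applying action a in state u, A(u) is the set of applicable actions, a(u) is the successor (or successor set) of u under a, and P_a(v | u) is a transition probability.

```latex
\begin{align*}
\text{Deterministic:} \quad
  & h(u) = \min_{a \in A(u)} \bigl[\, c(a,u) + h(a(u)) \,\bigr] \\
\text{Non-deterministic:} \quad
  & h(u) = \min_{a \in A(u)} \bigl[\, c(a,u) + \max_{v \in a(u)} h(v) \,\bigr] \\
\text{Probabilistic:} \quad
  & h(u) = \min_{a \in A(u)} \bigl[\, c(a,u) + \sum_{v \in a(u)} P_a(v \mid u)\, h(v) \,\bigr]
\end{align*}
```

With unit costs, the deterministic case reduces to h(u) = 1 + min over the successors of u.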


Page 8: External Memory Value Iteration


Internal Memory Value Iteration: ε-optimal for solving MDPs, AND/OR trees…

Problem: it needs the whole state space in main memory.
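
To make the memory problem concrete, here is a minimal sketch of in-memory value iteration for a cost-minimizing MDP. The function name, its parameters, and the three-state chain are hypothetical illustrations, not the authors' implementation. Note that the dictionary `h` holds one entry per state, which is exactly what stops fitting in RAM for large state spaces.

```python
def value_iteration(states, actions, cost, trans, goals, eps=1e-4):
    """In-memory value iteration; trans(s, a) yields (probability, successor) pairs."""
    h = {s: 0.0 for s in states}          # the whole value table lives in RAM
    while True:
        residual = 0.0
        new_h = {}
        for s in states:
            if s in goals:
                new_h[s] = 0.0            # goal states cost nothing
                continue
            # Bellman backup: cheapest action in expectation.
            new_h[s] = min(
                cost(s, a) + sum(p * h[t] for p, t in trans(s, a))
                for a in actions(s)
            )
            residual = max(residual, abs(new_h[s] - h[s]))
        h = new_h
        if residual < eps:                # stop once no value changed by eps or more
            return h


# Hypothetical chain 0 -> 1 -> 2 (goal): each move succeeds with p = 0.9.
h = value_iteration(
    states=[0, 1, 2],
    actions=lambda s: ["go"],
    cost=lambda s, a: 1.0,
    trans=lambda s, a: [(0.9, s + 1), (0.1, s)],
    goals={2},
)
print(h)
```

In this toy chain the expected cost from state 1 approaches 1/0.9, since each attempt costs 1 and succeeds with probability 0.9.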

Page 9: External Memory Value Iteration


Why External Memory Algorithms?

Search algorithms perform well only as long as their data fits in RAM.

Relying on virtual memory slows them down considerably.

[Figure: virtual address space from 0x000…000 to 0xFFF…FFF, divided into memory pages; the illustrated access pattern costs 7 I/Os.]


Page 11: External Memory Value Iteration


External Memory Model [Vitter and Shriver, 94]

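If the input size is very large, the running time depends on the number of I/Os rather than on the number of instructions.

[Figure: internal memory of size M; data is transferred to and from disk in blocks of size B; the input of size N >> M resides on disk.]

$\mathrm{scan}(N) = O\!\left(\frac{N}{B}\right)$

$\mathrm{sort}(N) = O\!\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$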
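
The two bounds can be turned into rough I/O estimates. The helper names below are hypothetical, constant factors are ignored, and the sizes in the example are made-up values, so treat the numbers as order-of-magnitude guidance only.

```python
import math


def scan_ios(n, b):
    """Approximate I/Os to read n items sequentially in blocks of b items."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """Approximate I/Os for external merge sort: (N/B) * log_{M/B}(N/B)."""
    blocks = math.ceil(n / b)
    fan_in = max(2, m // b)               # how many blocks fit in memory at once
    passes, reach = 0, 1
    while reach < blocks:                 # integer logarithm, avoids float rounding
        reach *= fan_in
        passes += 1
    return blocks * max(1, passes)


# Assumed sizes: 10^9 items, memory for 10^6 items, blocks of 10^3 items.
print(scan_ios(10**9, 10**3))
print(sort_ios(10**9, 10**6, 10**3))
```

With these assumed sizes, sorting needs two passes over the data, so it costs about twice as many I/Os as a single scan.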

Page 12: External Memory Value Iteration


External Breadth-First Search (Munagala and Ranade, SODA’99)

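[Figure: External BFS on an example graph with nodes A, B, C, D, E. Open(0) = {A} and Open(1) = {B, C}. Expanding Open(1) yields the successors D, A, A, D, E. External sorting gives A, A, D, D, E; compaction gives A, D, E; removing duplicates w.r.t. the 2 previous layers gives Open(2) = {D, E}.]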

For undirected graphs, subtracting two layers is enough [Munagala & Ranade, 99].

For directed graphs, the longest back-edge has to be taken into account [Zhou & Hansen, 05].
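
The sort, compact, and subtract steps of one BFS layer can be sketched as follows. In-memory lists stand in for files on disk, and the function names are hypothetical. The input reproduces the example above, where the successors of Open(1) are D, A, A, D, E.

```python
def subtract_sorted(layer, older):
    """Drop from sorted `layer` every item of sorted `older` in one parallel scan."""
    out, j = [], 0
    for s in layer:
        while j < len(older) and older[j] < s:
            j += 1                        # advance the older layer up to s
        if j == len(older) or older[j] != s:
            out.append(s)                 # s was not seen in the older layer
    return out


def next_layer(successors, previous_layers):
    """Delayed duplicate detection for one BFS layer."""
    ordered = sorted(successors)          # stands in for an external sort
    compact = [s for i, s in enumerate(ordered)
               if i == 0 or s != ordered[i - 1]]   # one scan removes adjacent duplicates
    for older in previous_layers:         # e.g. the two previous layers
        compact = subtract_sorted(compact, sorted(older))
    return compact


open2 = next_layer(["D", "A", "A", "D", "E"], [["A"], ["B", "C"]])
print(open2)
```

Because every step is a sort or a sequential scan, no random access to previously visited states is needed.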

Page 13: External Memory Value Iteration


External Memory Algorithms for Implicit Graphs

Frontier Search [Korf, 03]

External A* [Edelkamp, Jabbar, Schrödl, 04]

Structured Duplicate Detection [Zhou & Hansen, 04]

Cost-Optimal External Planning [Edelkamp, Jabbar, 06]

Model Checking for Linear Temporal Logic
  [Jabbar & Edelkamp, 05] for safety error detection
  [Edelkamp & Jabbar, 06] for liveness detection (cycle)
  [Barnat, Brim, Simecek, 07] for liveness detection (cycle)

Real-Time Model Checking/Scheduling [Edelkamp, Jabbar, 06]


Page 15: External Memory Value Iteration


External Memory Algorithm for Value Iteration

What makes value iteration different from the usual external memory search algorithms?

Answer: Propagation of information from states to predecessors! Edges are more important than the states.

Ext-VI works on edges:

$(u, v, a, h(v))$, where $u \xrightarrow{a} v$

Page 16: External Memory Value Iteration


External Memory Value Iteration, Phase I: generate the edge space by External BFS.

Open(0) = Init; i = 1
while (Open(i-1) != empty)
    Open(i) = Succ(Open(i-1))
    Externally-Sort-and-Remove-Duplicates(Open(i))
    for loc = 1 to Locality(Graph)
        Open(i) = Open(i) \ Open(i - loc)      (remove previous layers)
    i++
endwhile

Merge all BFS layers into one edge list on disk:

Opent = Open(0) U Open(1) U … U Open(DIAM)
Temp = Opent
Sort Opent wrt. the successors; sort Temp wrt. the predecessors
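
Phase I can be sketched in memory as follows. Several details are assumptions that the slide does not spell out: every generated edge is kept while each state is expanded only once, all earlier layers are subtracted instead of only Locality(Graph) many, and the pseudo-predecessor Ø is written as 0. The successor table encodes the 10-state example graph from the Phase-II slide.

```python
def generate_edge_space(init, succ):
    """Layered BFS that records every edge (u, v) and returns both sort orders."""
    layers = [[init]]                     # Open(0)
    edges = [(0, init)]                   # pseudo-edge (Ø, init), with Ø written as 0
    while layers[-1]:
        raw = [(u, v) for u in layers[-1] for v in succ(u)]
        edges.extend(raw)                 # keep all edges for the update phase
        candidates = sorted({v for _, v in raw})      # sort and compact
        visited = set().union(*layers)    # simplification: subtract all earlier layers
        layers.append([v for v in candidates if v not in visited])
    temp = sorted(edges)                                 # sorted on predecessors
    opent = sorted(edges, key=lambda e: (e[1], e[0]))    # sorted on successors
    return opent, temp


GRAPH = {1: [2, 3, 4], 2: [3, 5], 3: [4, 8], 4: [6], 5: [6, 7],
         6: [9], 7: [8, 10], 8: [], 9: [8, 10], 10: []}
opent, temp = generate_edge_space(1, lambda u: GRAPH[u])
print(temp)
print(opent)
```

A disk-based version would replace the Python sets and `sorted` calls with external sorting and sequential scans over files.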

Page 17: External Memory Value Iteration


Working of Ext-VI, Phase II

[Figure: example graph with states 1 to 10; I marks the initial state 1 and T marks the two terminal states. Initial values: h(1) = 3; h(2) = h(3) = h(4) = 2; h(5) = h(6) = h(7) = h(9) = 1; h(8) = h(10) = 0.]

Temp: edge list on disk, sorted on predecessors

{(Ø,1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)}

h  = 3 2 2 2 2 1 2 0 1 1 1 1 0 0 0 0

Opent: edge list on disk, sorted on successors

{(Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)}

h  = 3 2 2 2 2 2 1 1 1 1 0 0 0 1 0 0

h' = 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0

$h'(u) = 1 + \min_{v \in \mathrm{Succ}(u)} h(v)$

Alternate sorting and update until residual < epsilon
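
One backward update can be sketched as a parallel scan over the two sorted edge lists. This is an illustrative in-memory version with hypothetical names, using unit costs and writing Ø as 0. The data is the example above, and the sketch reproduces its h' row.

```python
from itertools import groupby


def backward_update(opent, temp, cost=1):
    """One sweep over edge records (u, v, h_v).

    opent is sorted on the successor v, temp on the predecessor u.
    Returns opent with h_v replaced by h'(v) = cost + min over Succ(v).
    """
    out, t = [], 0
    for v, group in groupby(opent, key=lambda e: e[1]):
        # Skip temp records whose predecessor sorts before v.
        while t < len(temp) and temp[t][0] < v:
            t += 1
        best = None
        while t < len(temp) and temp[t][0] == v:    # records (v, w, h_w)
            h_w = temp[t][2]
            best = h_w if best is None else min(best, h_w)
            t += 1
        for u, _, old_h in group:
            # States without successors (terminals) keep their value.
            out.append((u, v, old_h if best is None else cost + best))
    return out


edges = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
h = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0}
temp = [(u, v, h[v]) for u, v in sorted(edges)]
opent = [(u, v, h[v]) for u, v in sorted(edges, key=lambda e: (e[1], e[0]))]
updated = backward_update(opent, temp)
print([h_new for _, _, h_new in updated])
```

For the next iteration, the updated records would be sorted on predecessors to serve as the new Temp, which is the alternation of sorting and updating described above.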

Page 18: External Memory Value Iteration


Complexity Analysis, Phase-I: External Memory Breadth-First Search

Expansion: scanning the red bucket: O(scan(|E|))

Duplicates removal: sorting the green bucket, which has one state for every edge from the red bucket, followed by scanning and compaction: O(sort(|E|))

Subtraction: removing the states of the blue buckets (duplicate-free) from the green one: O(l x scan(|E|)), where l is the locality of the graph

Complexity of Phase-I: O(l x scan(|E|) + sort(|E|)) I/Os

Page 19: External Memory Value Iteration


Complexity Analysis, Phase-II: Backward Update

Update: simple block-wise scanning. Scanning time for red and green files: O(scan(|E|)) I/Os

External sort: sorting the blue file with the updated values, to be used as the red file later: O(sort(|E|)) I/Os

Fast external sort: if |E| / M < max file pointers, O(scan(|E|)) I/Os

Total complexity of Phase-II: for tmax iterations, O(tmax x sort(|E|)) I/Os

With fast external sort: O(tmax x scan(|E|)) I/Os

[Figure: three files, labeled "sorted on preds", "sorted on states", and "updated h-values".]
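
The fast external sort condition can be read as follows: when the number of memory-sized sorted runs, |E| / M, does not exceed the number of files that can be open at once, a single merge pass suffices. The sketch below illustrates this with in-memory lists standing in for run files; the function name and the `memory_size` parameter are hypothetical.

```python
import heapq


def external_sort(records, memory_size, key=None):
    """Sort by building memory-sized runs, then merging all runs in one pass."""
    runs = [sorted(records[i:i + memory_size], key=key)    # each run fits in "RAM"
            for i in range(0, len(records), memory_size)]
    # One merge pass reads every run sequentially, like one open file per run.
    return list(heapq.merge(*runs, key=key))


edge_records = [(9, 8), (1, 2), (5, 7), (3, 4), (7, 10), (2, 5), (1, 4)]
by_pred = external_sort(edge_records, memory_size=2)
by_succ = external_sort(edge_records, memory_size=2, key=lambda e: e[1])
print(by_pred)
print(by_succ)
```

With more runs than available file pointers, the merge would need several passes, which brings back the logarithmic factor of sort(|E|).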


Page 21: External Memory Value Iteration


Experiments: 3x3 Sliding Tiles Puzzle

p = 1.0; heuristic = 0

Alg.     |S|/|E|    RAM    #Iterations   Time
VI       181,440    21M    27            6.3
Ext-VI   483,839    11M    32            71.5

p = 0.9; heuristic = Manhattan distance

Alg.     |S|/|E|    RAM    #Iterations   Time
VI       181,440    21M    35            8.3
Ext-VI   967,677    12M    43            237.4

The numbers of iterations differ!

Page 22: External Memory Value Iteration


3x4 Sliding Tile Puzzle with p=0.9 (state space: 12!/2 ≈ 239 x 10^6)

On 2 Gigabytes, VI could not generate the state space.

External VI finished:

Took 45 GB of disk space for the edges.

Total 1,357,171,197 edges.

Took 437 hours and 72 iterations to converge.

ε = 0.0001

RAM used: 1.4 Gigabytes

Page 23: External Memory Value Iteration


Race Track Domain Example

Alg.     150x300 RaceTrack
VI       Out of mem. > 2 GB
LRTDP    Out of mem. > 2 GB; 12 hours
LDFS     Out of time > 1.5 GB; 118 hours
Ext-VI   Converged! 1.6 GB; 91 hours


Page 25: External Memory Value Iteration


Summary

Achievements

First I/O-efficient disk-based algorithm for solving Markov Decision Processes.

I/O complexity analysis.

Features

General cost model.

Can pause and resume execution to add more hard disks.

Refinements

Disk space eaten by duplicate states: start "early" delayed duplicate detection.

Page 26: External Memory Value Iteration


Outlook

Application to Bellman-Ford

Parallel External Value Iteration: during the time of internal update, the hard disk is not in use.

Page 27: External Memory Value Iteration

Thank You! Questions?