Αποτίμηση Ερωτημάτων σε Συλλογές Διαδρομών (Evaluating Queries on Route Collections)

Αποτίμηση Ερωτημάτων σε Συλλογές Διαδρομών

(Evaluating Queries on Route Collections)

Παναγιώτης ΜπούροςΕνδιάμεση Κρίση

Επιβλέπων: Ι. Βασιλείου

Outline

• Introduction– Route collections, queries & frequent updates

• Existing work• Proposed framework• Experiments• Conclusions

Examples of route collections

• People visiting Athens– Use e-devices to track their

sightseeing– Create routes through

interesting places– Upload/propose routes to Web

sites like ShareMyRoutes.com• Query such route collection:

– Find a sequence of interesting places from Omonia Square to Cathedral

– Find a sequence of interesting places from Academy to Museum of Acropolis that passes through Zappeion

• Answers may involve nodes from more than one route





sites like ShareMyRoutes.com• Querying such route collection:



• Answers may involve nodes from more than one route





sites like ShareMyRoutes.com• Querying such route collection:



• Answers may involve nodes from more than one routes


• Courier company offering same day pickup and delivery services

• Route collection created from previous day

– Each route performed by a vehicle– Each node in routes is a point for picking-

up, delivering or just waiting– Each node in a route has time interval I

• Querying such collection:– Ad-hoc customer requests

• Pick-up parcel from Academy within IS• Deliver parcel at Zappeion within IT

• Serve ad-hoc request– Not add new route– Exploit one or more existing routes, pass

parcel among vehicles







• Pick-up parcel from Academy within IS

• Deliver parcel at Zappeion within IT























Passing parcelpossible










parcel among vehicles• Related to dynamic pickup and delivery

problem

Passing parcelimpossible

What’s all about…

• Consider route collections– Very large, stored in secondary storage– Frequently updated with new routes

• E.g. new routes are proposed or new vehicle routes are included• Evaluate queries that identify paths

– Sequences of distinct nodes from one or more routes– In the latter we use links, i.e., shared nodes– PATH(S,T)

• Find a sequence of interesting places from Omonia Square to Cathedral– FLSP(S,IS,T,IT)

• Moving cost m inside routes• Changing cost c between routes via links• Find the sequence of points to pick-up parcel from Cathedral and deliver it to

Zappeion that:– Abides with the temporal constraints imposed by time intervals– Minimizes primarily the changing cost and secondarily the moving cost



















• Find a sequence of interesting places from Omonia Square to Cathedral– FJP(S,IS,T,IT)



Outline

• Introduction• Existing work

– Graph-based solutions for• PATH queries• FJP queries

• Proposed framework• Experiments• Conclusions

Existing work

• Convert route collection to a graph– GPATH for PATH queries– GFJP for FJP queries

• Answer queries on graph following one of the following paradigms:– Direct evaluation

• Low storage and maintenance cost• Slow query evaluation

– Preprocessing• Fast query evaluation• High maintenance cost

Evaluating PATH queries

• Strongly related with REACH query

– Exists path between two nodes?• Direct evaluation

– Search algorithms• Depth-first or breadth-first

search• Preprocessing GPATH

– Directly exploiting transitive closure is not feasible

– Labeling and compression schemes

• Mainly focus on REACH• Some transform graph to DAG• 2-hop, HOPI, 3-hop, Dual

Labeling, Interval labeling, Path Cover, GRIPP etc

r1 (A,B,C,D,J)

r2 (A,F,D,Z,B,T)

r3 (Z,L,M)

r4 (D,Z,B)

r5 (A,F,K)

Evaluating FJP queries

• Shortest path on GFJP

• Direct evaluation– Dijkstra, A*

• Preprocessing– Compression scheme

• 2-hop, construction cost

– Graph embedding + A*• Assumes that graph

structure does not change

r1(A,B,C,D,J)

r2(A,F,D,Z,B,T)

r3(Z,L,M)

r4(D,Z,B)

r5(A,F,K)

Outline

• Introduction• Existing work• Proposed framework

– PATH & FJP queries– Indices, algorithms & methods for handling

updates

• Experiments• Conclusions

Proposed framework

• Evaluate PATH and FJP queries directly on route collections– Route are precomputed answer to queries

• Our framework involves the following:– A search algorithm

• Depth-first search for PATH queries• Uniform-Cost Search for FJP queries

– Indices on route collection for efficiently answering queries

• Reduce iterations of the search algorithm• P-Index, H-Index, L-Index

– Methods for handling frequent updates

Indexing route collections P-Index

• Associates each node of the collection with the routes containing it

• List routes[n]– <ri, oi>

– Sorted by route identifier

r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node routes list

A <r1:1>

B <r1:2>, <r2:3>, <r3:2>

C <r1:3>

F <r2:1>

L <r2:4>

M <r3:4>

N <r2:2>, <r3:3>

T <r3:1>

Indexing route collections P-Index

• Associates each node of the collection with the routes containing it

• List routes[n]– <ri, oi>

– Sorted by route identifier

r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node routes list

A <r1:1>

B <r1:2>, <r2:3>, <r3:2>

C <r1:3>

F <r2:1>

L <r2:4>

M <r3:4>

N <r2:2>, <r3:3>

T <r3:1>

Indexing route collections H-Index

• Captures all possible transitions between routes via links– Links are shared nodes

• List edges[r]– <ri, n:o:oi>– Sorted by primarily by

route identifier ri and then by position o

– Each node in r before n can reach every node after n in ri

r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node edges list

r1 <r2,B:2:3>, <r3,B:2:2>

r2 <r1,B:3:2>, <r3,B:3:2>, <r3,N:3:2>

r3 <r1,B:2:2>, <r2,B:2:3>, <r2,N:3:2>






r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node edges list

r1 <r2,B:2:3>, <r3,B:2:2>

r2 <r1,B:3:2>, <r3,B:3:2>, <r3,N:3:2>

r3 <r1,B:2:2>, <r2,B:2:3>, <r2,N:3:2>






r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node edges list

r1 <r2,B:2:3>, <r3,B:2:2>

r2 <r1,B:3:2>, <r3,B:3:2>, <r3,N:3:2>

r3 <r1,B:2:2>, <r2,B:2:3>, <r2,N:3:2>






node edges list

r1 <r2,B:2:3>, <r3,B:2:2>

r2 <r1,B:3:2>, <r3,B:3:2>, <r3,N:3:2>

r3 <r1,B:2:2>, <r2,B:2:3>, <r2,N:3:2>

r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

Indexing route collections L-Index

• Disadvantage of H-Index– Large edges lists– High storage and

maintenance cost– Notice: each edges list

involves a few distinct links

• Store links contained in each route

• List links[r]– <n:o>– Sorted by link identifier n

r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node links list

r1 <B:2>

r2 <B:3>, <N:2>

r3 <B:2>, <N:3>

Indexing route collections L-Index

• Disadvantage of H-Index– Large edges lists– High storage and

maintenance cost– Notice: each edges list

involves a few distinct links

• Store links contained in each route

• List links[r]– <n:o>– Sorted by link identifier n

r1 (A,B,C)

r2 (F,N,B,L)

r3 (T,B,N,M)

node links list

r1 <B:2>

r2 <B:3>, <N:2>

r3 <B:2>, <N:3>


• Basic idea– Traverse nodes similar to depth-first search– Exploit indices on route collection to terminate search

• P-Index => method pfsP• H-Index => method pfsH• L-Index => method pfsL

– At each iteration:• Pop current node n• Check termination condition• Access routes containing n, routes[n] from P-Index

– For each route, push nodes after n in search stack

Evaluating PATH queries cont.

• pfsP terminates search– When: current node n is contained in a route before target

t– How: joins routes[n] with routes[t] & stops after finding a

common route entry

SS TT




common route entry

SS TTnn…




common route entry

SS TTnn

r1

r2

r3

routes[n] = {<r1:o1>,<r2:o2>,<r3:o3>}

…




common route entry

SS TTnn

r1

r2

r3

routes[n] = {<r1:o1>,<r2:o2>,<r3:o3>} JOIN routes[T] = {…,<r2:oT>,…}

…




common route entry

SS TTnn

r1

r2

r3

…

o2 < oT

routes[n] = {<r1:o1>,<r2:o2>,<r3:o3>} JOIN routes[T] = {…,<r2:oT>,…}




common route entry

SS TTnn…

r1

r2

rk

…

Answer: path from S to n => part of r2 from n to T


• pfsH terminates search– When: current node n is contained in a route that is

connected with a route containing target t– How: for each route r containing n, joins edges[r] with

routes[t] & stops after finding a common route entry

SS TT





SS TTnn…





SS TTnn

r1

r2

r3

routes[n] = {<r1:o1>,<r2:o2>,<r3:o3>}

…





SS TTnn

r1

r2

r3

…

rX

rY

XX YY

routes[n] = {<r1:o1>,<r2:o2>,<r3:o3>}, edges[r2] = {<rX,X:oX2:oX>,<rY,Y:oY2:oY>}





SS nn

r1

r2

r3

…

rX

rY

XX YY

edges[r2] = {<rX,X:oX2:oX>,<rY,Y:oY2:oY>} JOIN routes[T] = {…,<rY:oT>,…}

TT





SS nn

r1

r2

r3

…

rX

rY

XX YY

TTo2 < oY2 AND oY < oT

edges[r2] = {<rX,X:oX2:oX>,<rY,Y:oY2:oY>} JOIN routes[T] = {…,<rY:oT>,…}





SS nn

r1

r2

r3

…

rX

rY

XX YY

TT

Answer: path from S to n => part of r2 from n to Y => part of rY from Y to T


• pfsL– Visits only the links in a route– Constructs list T of the links before target– Terminates search

• When: current node n is contained in a route before a link of T• How: for each route r containing n, joins links[r] with list T & stops after

finding a common link entry

SS

TT


SS

rY

YY

T = {<Y,(rY:oY:oT)>}

TT





SS nn…

rY

YY

TT

T = {<Y,(rY:oY:oT)>}





SS nn…

pY

YY

routes[n] = {<r1:o1>,<r2:o2>,<r3:o3>}, T = {<Y,(rY:oY:oT)>}

TT

r1

r2

rk





SS nn…

pY

YY

links[r2] = {…<X:oX>,<Y:oY2>} JOIN T = {<Y,(rY:oY:oT)>}

TT

r1

r2

r3





SS nn

r1

r2

r3

…

rY

XX YY

TTo2 < oY2

links[r2] = {…<X:oX>,<Y:oY2>} JOIN T = {<Y,(rY:oY:oT)>}





SS nn

r1

r2

rk

…

rY

XX YY…

TT

Answer: path from S to n => part of r2 from n to Y => part of rY from Y to T





• Basic idea– Traverse nodes similar to uniform-cost search– Compute lower bound of the answer, a candidate answer

• Using P-Index => method MP, L-Index => method ML

• Triggers early termination condition• Prunes search space

– At each iteration:• Pop current node n• Check termination condition against candidate answer• Update candidate answer through n • Access routes containing n, routes[n] from P-Index

– For each route, push nodes after n in search queue iff expanding them would give answer better than candidate

Handling frequent updates

• P-Index, H-Index, L-Index as inverted files on disk– Updates -> adding new routes– Not consider each new route separately– Batch updates, consider set of new routes

• Basic idea:– Build memory resident P-Index, H-Index, L-Index for

new routes– Merge disk-based indices with memory resident ones

Outline

• Introduction• Existing work• Proposed framework• Experiments

– Index creation– Query evaluation– Updates

• Conclusions

Setup

• Synthetic route collections– Number of routes |R| = {50k, 100k, 500k}– Average route length lavg = {5, 10, 50}– Number of distinct nodes |N| = {50k, 100k, 500k}– Number of links |L| = {10k, 30k, 50k, 80k, 100k}– Update factor U (% |R|) = {1%, 5%, 10%}

• Comparison– PATH queries

• dfs @ GPATH Vs pfsP, pfsH, pfsL– FJP queries

• Dijkstra @ GFJP Vs MP, ML

Index construction

Vary |R| Vary |N|


Vary |R|


Vary |R|

Updates

Outline

• Introduction• Existing work• Proposed framework• Experiments• Conclusions

– Summary– Future work

To sum up…

• Focus of my thesis:– Evaluating queries that find paths on large route

collections– Route collections are stored on disk and they are

frequently updated• Current work:

– A framework for evaluating PATH / FJP queries involving:• Depth-first / uniform-cost search• P-Index, H-Index, L-Index• Methods for handling frequent updates

Further work

• Three directions:① Address other kind of updates

– Remove routes– Insert nodes in routes – as dynamic pickup and delivery problem is

solved– Update time intervals of nodes

② Evaluate other types of queries– Trip planning or optimal sequence like queries

• Each node is an instance of a class in set C, e.g. {Museum,Stadium,Restaurant}

• Find a path passing through a Museum, then a Stadium and finally a Restaurant

③ Combine query evaluation with keyword search– Starting and ending node given as set of keywords– Find a path passing through a Restaurant relevant to “sea food,

lobster”

Thank you

• Evaluating queries on route collections:– P. Bouros, Fewer-Jumps Path Queries under Temporal Constraints, TR– P. Bouros, Evaluting Path Queries over Frequently Updated Routes, TR– P. Bouros, Evaluating Path Queries over Route Collections, ICDE’10 PhD

Workshop.– P. Bouros, S. Skiadopoulos, T. Dalamagas, D. Sacharidis and T. Sellis,

Evaluating reachability queries over path collections, SSDBM’09.– P. Bouros, T. Dalamagas, S. Skiadopoulos and T. Sellis, Evaluating “Find a

Path” Reachability Queries, ECAI’08 Workshop.• Other work:

– D. Sacharidis, P. Bouros, and T. Sellis, Caching Dynamic Skyline Queries, SSDBM’08.

– T. Dalamagas, P. Bouros, T. Galanis, M. Eirinaki and T. Sellis, Mining User Navigation Patterns for Personalizing Topic Directories, CIKM’07 Workshop

Outline

• Introduction• Existing work• Proposed framework• Experiments• Conclusions

Thank you

Documents

Αποτίμηση Ερωτημάτων σε Συλλογές Διαδρομών (Evaluating Queries on Route Collections)