2006-09-15 VLDB '2006 Haibo Hu (Hong Kong Baptist University, Hong Kong) Dik Lun Lee (Hong Kong...
Distance Indexing on Road Networks
VLDB 2006
Haibo Hu (Hong Kong Baptist University, Hong Kong)
Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong)
Victor Lee (City University of Hong Kong, Hong Kong)

- Objects
- Queries: large-degree nodes
- By Dijkstra's single-source shortest path algorithm:
  - Maintain nodes whose distances are not finalized
  - Pick the node with the shortest distance and finalize it
  - Relax all not-yet-finalized distances
- Limitations:
  - Running time O(N lg V)
  - Returns k nearest neighbors
  - Precomputed indexes are costly to store and update

Distance signature: the first general-purpose index on road networks that
- Categorizes the distances of a node to all objects
- Supports both rough and exact distance computation
- Accelerates processing of common query types
- Reduces the storage and maintenance cost
- Is orthogonal to other query optimization techniques

Distance Signature
Basic idea:
- Precomputing distances is a good trade-off between having no indexing and solution space indexing
- Maintain the approximate distance between objects and nodes
- How rough is the approximation?
  - Apply rough approximation to faraway objects
  - Queries are always interested in local objects
  - Faraway objects outnumber local objects
- We use an exponential sequence of categories
  - In the form of [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), ...
  - T and c are constant parameters
  - E.g., T = 3, c = 2 gives [0, 3), [3, 6), [6, 12), [12, 24), ...
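The exponential categories above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `category` and `category_range` are hypothetical (they do not come from the slides), and categories are numbered from 0 for [0, T).

```python
def category(dist, T=3.0, c=2.0):
    """Return the index of the exponential category that contains dist.

    Category 0 is [0, T); category i >= 1 is [c^(i-1) * T, c^i * T).
    The 0-based numbering is an assumption made for this sketch.
    """
    if dist < 0 or T <= 0 or c <= 1:
        raise ValueError("need dist >= 0, T > 0 and c > 1")
    index, upper = 0, T
    while dist >= upper:  # move outward until the upper bound exceeds dist
        upper *= c
        index += 1
    return index


def category_range(index, T=3.0, c=2.0):
    """Return the (lower, upper) bounds of a category, upper bound exclusive."""
    lower = 0.0 if index == 0 else T * c ** (index - 1)
    return lower, T * c ** index
```

With T = 3 and c = 2, a distance of 5 falls in category 1, i.e. [3, 6), and a distance of 12 falls in category 3, i.e. [12, 24).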
[Figure: distance boundaries at 3, 6, 12, 24]

Distance Signature (Cont'd)
- For each node n, signature component S(n)[i] denotes the category of dist(n, i)
- S(n)[i].link denotes the next node from n on the shortest path to i

Distance Operations on Signatures
- Principle: trace back the link until the distance range is accurate enough
- Exact / approximate distance
  - Trace back through the link from node to object
  - Terminate once the distance range does not partially overlap with the input
- Comparison (distances from node n to objects a and b)
  - Trace back until the two distance ranges don't overlap
- Sorting
  - First apply approximate sorting, then apply bubble sort using exact comparison
  - Quick sort using approximate comparison

[Figure: nodes n2, n3, n6 and objects p1, p2]

- Compare the distances of two objects based on one signature
  - Avoid accessing the signatures of other nodes
  - Used to get a rough result of distance sorting
- How?
  - Select an observer n3
  - n3 tells whether n2 or n6 is closer to n4
  - If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range S(n4)[n3]?

- Categories tell the approximate distances between q and other objects
- Get the k closest objects according to their category values
- If there is no need to know the distances or the order, return objects based on category ranges
- To find the ordering:
- To find exact distances:

Construction and Maintenance
- Exponential categories [0, T), [T, cT), [cT, c^2 T), ...
- How to determine c and T?
  - Factors: storage availability
  - Spreading is uniformly distributed
- Build a shortest path spanning tree for each object (Dijkstra)
- Fill in S(n)[i] when the tree of object i is spanned to node n
- Variable-length encoding
  - The number of objects in each category is not even
  - Number of objects 1 unit, 2 units, 3 units, ... away: 4, 8, 12, ...
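To see how uneven the categories become, the counts 4, 8, 12, ... can be summed per category. The sketch below assumes an unbounded unit grid with one object on every node (so exactly 4d objects lie at Manhattan distance d) and integer T and c; the function names are hypothetical.

```python
def objects_at_distance(d):
    """Objects exactly d >= 1 grid units away: 4, 8, 12, ... (assumed unit grid)."""
    return 4 * d


def objects_per_category(T=3, c=2, num_categories=4):
    """Count objects falling in [0, T), [T, cT), [cT, c^2 T), ... on the grid."""
    counts = []
    lower, upper = 0, T
    for _ in range(num_categories):
        # Distance 0 is the node itself, so counting starts at distance 1.
        counts.append(sum(objects_at_distance(d)
                          for d in range(max(lower, 1), upper)))
        lower, upper = upper, upper * c
    return counts
```

For T = 3 and c = 2 this yields 12, 48, 204 and 840 objects in the first four categories, so the far categories hold most of the objects.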
  - Use fewer bits for larger categories

- Under the assumptions "exponential partition", "grid topology", "uniform distance range of queries", and c > 1.5, this coding scheme is optimal
- Categories: [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), [c^3 T, ∞)
- Codes: 1, 01, 001, 0001, 0000
- Average code length is approximately: [formula missing in source]
- Fixed coding
- 1-bit flag
- Not compressed in memory

- The shortest path spanning trees of all objects
- A reverse index that records, for each edge, the trees that contain this edge
  - Limits the number of trees affected by a change to this edge
- How (suppose edge (a, b) is updated):
  - Find the affected spanning trees
  - For each affected tree of object c, check S(a)[c] or S(b)[c] (whichever is smaller)
  - Propagate to adjacent nodes until no more updates

- Page size: 4K bytes
- Network Voronoi Diagram (NVD) from VN3
- Tuning parameters

- Speed up general query processing
- Optimal choice of distance categories and category encoding
- Future work
  - The signatures of nearby nodes are similar
  - Derivation of optimal distance categories for a wider range of network topologies and object distributions

[Figure labels: Highway, Gas station, Query point]
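As a recap of the construction step described earlier (one Dijkstra run per object, filling in the signature as each tree reaches a node), here is a minimal Python sketch. It is not the authors' implementation: the adjacency-dict graph, the `(category, link)` tuple layout, and all function names are assumptions made for illustration, and the network is assumed undirected. Following the links until the object is reached recovers the exact distance.

```python
import heapq


def category(dist, T, c):
    """Index of the exponential category containing dist (0 is [0, T))."""
    index, upper = 0, T
    while dist >= upper:
        upper *= c
        index += 1
    return index


def build_signatures(graph, objects, T=3.0, c=2.0):
    """Return sig[node][obj] = (category, link) for every reachable node.

    graph:   {node: {neighbor: edge_weight}}, assumed undirected
    objects: {object_id: node where the object sits}
    """
    sig = {node: {} for node in graph}
    for obj, source in objects.items():
        dist, link = {source: 0.0}, {source: None}
        heap, done = [(0.0, source)], set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue  # stale heap entry
            done.add(u)
            # The tree of obj has reached u, so its distance is final.
            sig[u][obj] = (category(d, T, c), link[u])
            for v, w in graph[u].items():
                if v not in done and d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    link[v] = u  # next hop from v toward the object
                    heapq.heappush(heap, (d + w, v))
    return sig


def exact_distance(graph, sig, node, obj):
    """Follow the links from node to obj, summing edge weights on the way."""
    total = 0.0
    while sig[node][obj][1] is not None:
        nxt = sig[node][obj][1]
        total += graph[node][nxt]
        node = nxt
    return total


# Hypothetical four-node network with one object "p" located at node "a".
graph = {
    "a": {"b": 2.0, "c": 5.0},
    "b": {"a": 2.0, "c": 2.0},
    "c": {"a": 5.0, "b": 2.0, "d": 7.0},
    "d": {"c": 7.0},
}
sig = build_signatures(graph, {"p": "a"})
```

In this toy network node "d" stores category 2 (the range [6, 12)) with link "c" for object "p", and tracing the links d, c, b, a gives the exact distance 11.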