Transcript
Page 1: 2006-09-15 VLDB '2006 Haibo Hu (Hong Kong Baptist University, Hong Kong) Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong) Victor

2006-09-15

VLDB '2006

Haibo Hu (Hong Kong Baptist University, Hong Kong)
Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong)
Victor Lee (City University of Hong Kong, Hong Kong)

Distance Indexing on Road Networks

Page 2:

Modeling Road Networks

Network -> Undirected weighted graph
Road junction -> Vertex (node)
Road segment -> Edge
Distance -> Edge weight
Data object and query point -> On node only

[Figure: a highway, gas stations (objects), and a query point on a weighted road network; the gas station that is nearest in Euclidean space differs from the actual nearest one by network distance.]

Page 3:

Query Processing on Road Networks

Queries:
  Window query
  kNN, continuous kNN

Processing methods:
  Network Expansion [Papadias VLDB03]
    Use Euclidean distance for preliminary pruning
    Indexing the objects by spatial index
  Precomputed Index [Kolahdouzan VLDB04]
    Voronoi Network Nearest Neighbor (VN3)
    NN list: precompute and store the kNNs for some large-degree nodes

Page 4:

Problems and Disadvantages

Distance computation is still tough
  By Dijkstra's single-source shortest path algorithm:
    Maintain nodes whose distances are not finalized
    Pick the node with the shortest distance and finalize it
    Relax all not-yet-finalized distances
    Repeat until all distances are finalized
  Limitations:
    Must visit nodes in ascending order of distance
    Running time O(NlgV)

Precomputed indexes cannot suit all queries
  Return the k nearest neighbors
  Return the actual shortest path

Precomputed indexes are costly to store and update
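
The Dijkstra steps listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' code; the adjacency-list layout and the names `ROAD` and `dijkstra` are assumptions made for the example.

```python
import heapq

# Hypothetical toy road network: node -> list of (neighbor, edge weight).
ROAD = {
    "A": [("B", 2), ("C", 8)],
    "B": [("A", 2), ("C", 5)],
    "C": [("A", 8), ("B", 5), ("D", 4)],
    "D": [("C", 4)],
}

def dijkstra(graph, source):
    """Return the shortest network distance from source to every reachable node."""
    dist = {source: 0}
    finalized = set()
    heap = [(0, source)]  # nodes whose distances are not finalized yet
    while heap:
        d, u = heapq.heappop(heap)  # pick the node with the shortest distance
        if u in finalized:
            continue  # stale heap entry
        finalized.add(u)  # finalize it
        for v, w in graph[u]:  # relax the not-yet-finalized neighbors
            nd = d + w
            if v not in finalized and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Nodes leave the heap in ascending order of distance, which is the limitation named above: a far object cannot be reached without first finalizing everything closer.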

Page 5:

Our Solution at a Glance

Distance signature: the first general-purpose index on road networks that

  Categorizes the distances of a node to all objects
  Supports both rough and exact distance computation
  Accelerates processing of common query types
  Reduces the storage and maintenance cost
  Is orthogonal to other query optimization techniques

Page 6:

Roadmap

Background
Distance Signature Overview
Operations on Signatures
Query Processing on Signatures
Smart Choice of Distance Categories
Construction and Maintenance
Experimental Results
Conclusion

Page 7:

Distance Signature

Basic Idea:
  Precomputing distances is a good trade-off between having no indexing and solution space indexing
  Maintain the approximate distance between objects and nodes
  How rough is the approximation?
    Apply rough approximation to faraway objects
    Queries are always interested in local objects
    Faraway objects outnumber local objects

We use an exponential sequence of categories
  In the form of [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), ...
  T and c are constant parameters
  E.g., T = 3, c = 2, then [0, 3), [3, 6), [6, 12), [12, 24), ...

[Figure: number line with boundaries 3, 6, 12, 24 delimiting Cat 0, Cat 1, Cat 2, Cat 3]
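
A small sketch of how a distance maps to its category under the exponential sequence above. The function name `category` and its defaults (taken from the T = 3, c = 2 example) are illustrative assumptions.

```python
def category(dist, T=3.0, c=2.0):
    """Return the index of the category [0, T), [T, cT), [cT, c^2 T), ... containing dist."""
    if dist < 0:
        raise ValueError("distance must be non-negative")
    cat = 0
    upper = T  # exclusive upper bound of the current category
    while dist >= upper:
        upper *= c
        cat += 1
    return cat
```

With the defaults, `category(5)` falls in [3, 6) and `category(12)` falls in [12, 24). Multiplying the bound in a loop avoids floating-point surprises that a logarithm could introduce exactly at a category boundary.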

Page 8:

Distance Signature (Cont'd)

For each node n, signature component s(n)[i] denotes the category of dist(n,i)

s(n)[i].link denotes the next node from n on the shortest path to i

Signature s(n) is the whole set of components s(n)[i]

[Figure: example road network with nodes n1 to n7, objects, and weighted edges. Each node stores its adjacency list followed by its signature s(n), which holds a distance category and a link per object; s(n2), s(n2).link, s(n4) and s(n4).link are highlighted. Legend: node, object.]

Page 10:

Distance Operations on Signatures

Principle: trace back the link until the distance range is accurate enough

Retrieval (distance between node and object)
  Exact: Trace back through the link from node to object
  Approximate: Terminate once the distance range does not partially overlap with input

Comparison (distances from node n to objects a and b)
  Exact: Trace back until the two distance ranges don't overlap

Sorting
  Exact: First apply approximate sorting, then apply bubble sort using exact comparison
  Approximate: Quick sort using approximate comparison

[Figure: objects n2, n3 and n6, with p1 and p2 marking the possible positions of n4]
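
The two retrieval operations can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the toy path A-B-C-D, the tables `SIG` and `WEIGHT`, and the function names are all hypothetical, with T = 3 and c = 2 as in the earlier example.

```python
T, C = 3.0, 2.0

# Hypothetical data: one object on node D, path A -2- B -5- C -4- D.
# SIG[node][obj] = (distance category, link = next node toward obj).
SIG = {"A": {"D": (2, "B")}, "B": {"D": (2, "C")}, "C": {"D": (1, "D")}}
WEIGHT = {("A", "B"): 2.0, ("B", "C"): 5.0, ("C", "D"): 4.0}

def category_range(cat):
    """Bounds [lo, hi) of a category: [0, T), [T, cT), [cT, c^2 T), ..."""
    lo = 0.0 if cat == 0 else T * C ** (cat - 1)
    return lo, T * C ** cat

def exact_distance(node, obj):
    """Exact retrieval: follow the links all the way to the object."""
    total = 0.0
    while node != obj:
        _, nxt = SIG[node][obj]
        total += WEIGHT[(node, nxt)]
        node = nxt
    return total

def in_range(node, obj, q_lo, q_hi):
    """Approximate retrieval: is dist(node, obj) in [q_lo, q_hi)?
    Stops as soon as the known range is fully inside or fully outside."""
    travelled = 0.0
    while node != obj:
        cat, nxt = SIG[node][obj]
        lo, hi = category_range(cat)
        lo, hi = travelled + lo, travelled + hi
        if q_lo <= lo and hi <= q_hi:
            return True   # range entirely inside the input
        if hi <= q_lo or lo >= q_hi:
            return False  # range entirely outside the input
        travelled += WEIGHT[(node, nxt)]
        node = nxt
    return q_lo <= travelled < q_hi
```

For the input range [0, 9), the trace from A stops at C, where the known range [10, 13) no longer overlaps the input, one hop before the object itself is reached.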

Page 11:

Approximate Distance Comparison

What and Why?
  Compare the distances of two objects based on one signature
  Avoid accessing the signatures of other nodes
  Used to get a rough result of distance sorting

How?
  Example: compare dist(n4,n2) with dist(n4,n6)
    Select an observer n3
    Embed objects n2, n3, n6 into Euclidean space
    n3 tells if n2 or n6 is closer to n4
    If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range s(n4)[n3]?
  Let multiple observers vote

[Figure: n2, n3 and n6 embedded in Euclidean space, with p1 and p2 marking the possible positions of n4]

Page 12:

kNN Search on Signatures

Procedures
  Read signature s(q) of query node q
  Categories tell the approximate distances between q and other objects
  Get k closest objects according to their category values
  If no need to know the distances or order, return objects based on category ranges
  To find the ordering:
    Sort objects within each category
  To find exact distances:
    Perform exact distance retrieval for each kNN
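
A minimal sketch of the category-based kNN procedure above. The signature is simplified to a plain mapping from object to category, and `exact_dist` stands in for the exact distance retrieval operation; both are assumptions made for illustration.

```python
def knn_candidates(signature, k):
    """Split objects into those certainly among the k nearest (whole categories
    that fit) and the boundary category that still needs finer comparison."""
    by_cat = {}
    for obj, cat in signature.items():
        by_cat.setdefault(cat, []).append(obj)
    certain, boundary = [], []
    for cat in sorted(by_cat):  # closest categories first
        if len(certain) == k:
            break
        group = by_cat[cat]
        if len(certain) + len(group) <= k:
            certain.extend(group)
        else:
            boundary = group  # only part of this category can be returned
            break
    return certain, boundary

def knn(signature, k, exact_dist):
    """Return k nearest objects; only the boundary category is sorted exactly."""
    certain, boundary = knn_candidates(signature, k)
    need = k - len(certain)
    return certain + sorted(boundary, key=exact_dist)[:need]
```

The result is grouped by category but not ordered inside a category; a full ordering would additionally sort the objects within each category, as the slide notes.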

Page 14:

Smart Choice of Distance Categories

Exponential categories [0, T), [T, cT), [cT, c^2 T), ...

How to determine c and T?
Factors:
  Dataset density, distribution
  Query type, load (metric: spreading)
  Storage availability

Simplifications
  The road network is a uniform grid
  Spreading is uniformly distributed in [0, SP]
  Unlimited disk storage

Theorem
  The optimal c = e, T = (SP/e)^0.5
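
As a worked example of the theorem, reading the slide's T as the square root of SP/e, the parameters and the resulting category boundaries can be computed as below. The function names and the sample value SP = 100 are illustrative assumptions.

```python
import math

def optimal_parameters(sp):
    """Return (c, T) with c = e and T = sqrt(SP / e), per the theorem as read here."""
    return math.e, math.sqrt(sp / math.e)

def category_bounds(t, c, count):
    """Boundaries 0, T, cT, c^2 T, ... enclosing the first `count` categories."""
    return [0.0] + [t * c ** i for i in range(count)]

c, t = optimal_parameters(100.0)
bounds = category_bounds(t, c, 3)  # [0, T, cT, c^2 T]
```

For SP = 100 this gives T of roughly 6.07, so the first categories are about [0, 6.07), [6.07, 16.49), and [16.49, 44.82).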

Page 15:

Signature Construction

Basic procedures
  Allocate storage for signatures
  Build shortest path spanning tree for each object (Dijkstra)
  Fill in s(n)[i] when the tree of object i is spanned to node n

Variable length encoding
  Observation
    The number of objects in each category is not even
    # of objects 1 unit, 2 units, 3 units, ... away: 4, 8, 12, ...
  Use fewer bits for larger categories
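
The basic construction procedure can be sketched as one Dijkstra run per object, filling in the category and link of a node when the object's tree reaches it. This is an illustration only: objects are assumed to sit on nodes (as in the model slide), and `build_signatures`, `category`, and the toy graph are hypothetical names and data.

```python
import heapq

def category(dist, T, c):
    """Index of the exponential category containing dist."""
    cat, upper = 0, T
    while dist >= upper:
        upper *= c
        cat += 1
    return cat

def build_signatures(graph, objects, T=3.0, c=2.0):
    """Return sig[node][obj] = (category of dist(node, obj), link toward obj)."""
    sig = {n: {} for n in graph}  # allocate storage for signatures
    for obj in objects:
        dist, parent = {obj: 0.0}, {obj: None}
        done = set()
        heap = [(0.0, obj)]
        while heap:  # grow the shortest path spanning tree of obj
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            # The tree parent of u is the next node from u toward obj.
            sig[u][obj] = (category(d, T, c), parent[u])
            for v, w in graph[u]:
                nd = d + w
                if v not in done and nd < dist.get(v, float("inf")):
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
    return sig

# Hypothetical path network A -2- B -5- C -4- D with objects on A and D.
GRAPH = {
    "A": [("B", 2.0)],
    "B": [("A", 2.0), ("C", 5.0)],
    "C": [("B", 5.0), ("D", 4.0)],
    "D": [("C", 4.0)],
}
SIGS = build_signatures(GRAPH, ["A", "D"])
```

Because the graph is undirected, the parent of a node in the tree rooted at an object is also the first hop of that node's shortest path to the object, so it can be stored directly as the link.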

Page 16:

Variable Length Encoding

Reverse zero coding
  Based on Huffman encoding scheme
  Under assumptions "exponential partition", "grid topology", "uniform distance range of queries", and c > 1.5, this coding scheme is optimal

Categories: [0, T) [T, cT) [cT, c^2 T) [c^2 T, c^3 T) [c^3 T, ∞)
Reverse coding codewords: 1 01 001 0001 0000
Fixed coding codewords: 000 001 010 011 100

Average code length is approximately c^2 / (c^2 − 1) ≈ 1.2
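
The average code length can be checked numerically. The sketch rests on assumptions that the slide does not spell out: the farthest category gets the 1-bit codeword and each closer category gets one more leading zero, the number of objects within a radius grows with the square of that radius (grid topology), and the last category is truncated at c^(m-1) T instead of extending to infinity. All function names are illustrative.

```python
import math

def reverse_zero_codes(m):
    """codes[i] is the codeword of category i (assumed mapping): the farthest
    category m-1 gets '1', closer ones get more leading zeros, and category 0
    is all zeros. Requires m >= 2."""
    codes = []
    for i in range(m):
        zeros = m - 1 - i
        codes.append("0" * zeros if i == 0 else "0" * zeros + "1")
    return codes

def grid_category_fractions(c, m):
    """Fraction of objects per category when the object count within the
    upper bound of category i is proportional to c^(2i)."""
    cum = [c ** (2 * i) for i in range(m)]
    total = cum[-1]
    return [cum[0] / total] + [(cum[i] - cum[i - 1]) / total for i in range(1, m)]

def average_code_length(c, m):
    """Expected number of bits per signature component."""
    fracs = grid_category_fractions(c, m)
    return sum(f * len(code) for f, code in zip(fracs, reverse_zero_codes(m)))

CODES = reverse_zero_codes(5)
AVG = average_code_length(math.e, 5)
CLOSED_FORM = math.e ** 2 / (math.e ** 2 - 1)
```

Under these assumptions, with c = e and five categories, the computed average is within 0.001 of c^2 / (c^2 − 1), against 3 bits per component for the fixed coding.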

Page 17:

Signature Compression

Idea: Many objects share the same link

[Figure: node n and two objects u and v]

If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag

Not compressed in memory

Page 18:

Signature Update

Requirement
  The shortest path spanning trees of all objects
  A reverse index for each edge, listing the trees that contain this edge
    Limits the number of trees affected by a change to this edge

How (suppose edge (a,b) is updated):
  Find the affected spanning trees
  For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller)
  Propagate to adjacent nodes until no more updates
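
The reverse index and the first update step (finding the affected trees) might look like the sketch below. It covers only the lookup, not the propagation, and the layout `links[obj][node]` as well as the function names are assumptions for illustration.

```python
def build_reverse_index(links):
    """Map each undirected edge to the set of objects whose shortest path
    spanning tree contains it. links[obj][node] is the next node from node
    toward obj (None at the object itself)."""
    index = {}
    for obj, parent in links.items():
        for node, nxt in parent.items():
            if nxt is not None:
                index.setdefault(frozenset((node, nxt)), set()).add(obj)
    return index

def affected_objects(index, a, b):
    """Objects whose trees use edge (a, b) and so must be checked on update."""
    return index.get(frozenset((a, b)), set())

# Hypothetical trees on nodes A, B, C, D with objects on A and D.
LINKS = {
    "D": {"A": "B", "B": "C", "C": "D", "D": None},
    "A": {"A": None, "B": "A", "C": "B", "D": "C"},
}
INDEX = build_reverse_index(LINKS)
```

Edges are stored as `frozenset` pairs so that (a, b) and (b, a) resolve to the same entry on an undirected network.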

Page 20:

Experiment Settings

Statistics
  183K nodes
  351K edges
  Random edge weights from 1 to 10
  Page size: 4K bytes

kNN Competitors
  Signature indexing
  Full indexing (NN list for all nodes)
  Network Voronoi Diagram (NVD) from VN3

Tuning parameters
  p: object density
  T, c, k

Comparison metrics: page access (I/O cost), CPU time

Page 21:

Index Construction Cost

Good for medium and sparse datasets

Page 22:

kNN Search Performance

Moderate performance over various k

Page 23:

Robustness

The choice of parameters does not make a large difference

Page 24:

Conclusion

Our Contributions
  The first index for distance computation on road networks
  Speed up general query processing
  Optimal choice of distance categories and category encoding

Future work
  Cross-node signature compression
    The signatures of nearby nodes are similar
  Derivation of optimal distance categories for a wider range of network topologies and object distributions

