The University of Hong Kong 1 Capacity Constrained Assignment in Spatial Databases Authors: Leong Hou U, University of Hong Kong Man Lung Yiu, Aalborg


The University of Hong Kong

Capacity Constrained Assignment in Spatial Databases

Authors: Leong Hou U, University of Hong Kong; Man Lung Yiu, Aalborg University; Kyriakos Mouratidis, Singapore Management University; Nikos Mamoulis, University of Hong Kong


Outline

Motivation
Related Work
Assignment Problems
Solutions
Approximate Solutions
Conclusion


Motivation

Assume that:
Our system has a set of service providers (Q) which serve a set of customers (P)
Each service provider (q) can serve at most k customers simultaneously
For every provider-customer (q,p) pair, our central server knows the cost of assigning p to q

Our aim is to maximize our service utilization:
1. Maximize the number of served customers
2. Minimize the total sum of weights


Case Study I

Consider the case of wireless routers and laptops:
Each router can serve at most 3 users concurrently
The signal strength is measured by the Euclidean distance (a longer distance means a weaker signal)

Can it be solved by Nearest Neighbor Queries?

3-Nearest Neighbor Queries


Case Study I

Can it be solved by Reverse Nearest Neighbor Queries?

Reverse Nearest Neighbor Queries


Case Study I

Can it be solved by Closest Pair Queries?

6-Closest Pairs (2 routers * 3 capacities)



Case Study I

Can it be solved by Spatial Matching (Exclusive Closest Pair)?

ECP matching (each router's capacity is 3)

Find ECP between sets A and B:

1. Find the closest pair (a,b) from (A,B)

2. Report (a,b) as an exclusive closest pair and decrement both capacities: a.k=a.k-1, b.k=b.k-1 (k is the capacity value)

3. Remove a from A if a.k=0 and remove b from B if b.k=0; go to step 1 until A or B is empty
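The three ECP steps can be sketched in Python as follows. This is only an illustration under stated assumptions: points are 2-D tuples, the cost is the Euclidean distance, every customer has capacity 1, and the function name `ecp_matching` and its arguments are hypothetical. It rescans all pairs in every round instead of using a spatial index.

```python
from math import dist

def ecp_matching(providers, customers, capacity):
    """Greedy exclusive-closest-pair matching (brute-force illustration).

    providers, customers: lists of (x, y) points.
    capacity: capacity k of each provider; each customer has capacity 1.
    """
    remaining = list(capacity)           # a.k for every provider
    free = set(range(len(customers)))    # customers that are still unassigned
    pairs = []
    while free and any(k > 0 for k in remaining):
        # Step 1: closest pair among the providers and customers still available.
        _, a, b = min(
            (dist(providers[i], customers[j]), i, j)
            for i in range(len(providers)) if remaining[i] > 0
            for j in free
        )
        # Step 2: report the pair and decrement the capacities.
        pairs.append((a, b))
        remaining[a] -= 1
        # Step 3: the customer is used up; the provider drops out once a.k = 0.
        free.remove(b)
    return pairs

routers = [(0, 0), (10, 0)]
laptops = [(1, 0), (2, 0), (9, 0), (3, 0)]
print(ecp_matching(routers, laptops, [2, 1]))
```

With these made-up coordinates the last laptop stays unserved, because both routers are full by the time it is considered.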



Case Study I

Can it be solved by optimal assignment?

Optimal assignment

Optimal assignment tries to serve as many users as possible and also to minimize the total cost (distance)


Related Work

Optimal assignment is to compute the maximum size matching with minimum assignment cost

Two popular algorithms:
Hungarian Algorithm
Successive Shortest Path Algorithm (SSPA)

The time complexity of both algorithms is O(n^3) in the worst case, where n is the number of service providers or customers


Successive Shortest Path Algorithm (SSPA)

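1. Find the shortest path (SP) from source to destination

2. Reverse the direction of the edges on the SP

3. Repeat steps 1~2 until no more paths can be found

[Figure: flow graph with source S, service providers q1 and q2, customers p1 and p2, and destination D, with edge weights 0, 1, 1, 3; the edges on each SP are reversed and shown with negated weights (-1, -1)]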
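The SSPA steps on this slide can be sketched in Python as below. This is a minimal in-memory illustration, not the authors' implementation: the function name `sspa`, the cost-matrix input, and the use of Bellman-Ford for the shortest-path step are all assumptions. Bellman-Ford is chosen here only because it tolerates the negative weights of reversed edges. The 2x2 cost matrix in the usage line is one possible reading of the edge weights 0, 1, 1, 3 in the figure.

```python
from math import inf

def sspa(cost, capacity):
    """Min-cost maximum assignment by successive shortest paths (sketch).

    cost[i][j]: cost of assigning customer j to provider i.
    capacity[i]: number of customers provider i can serve.
    Returns (total_cost, sorted list of (provider, customer) pairs).
    """
    nq, npts = len(cost), len(cost[0])
    S, D, n = nq + npts, nq + npts + 1, nq + npts + 2
    edges, adj = [], [[] for _ in range(n)]   # edge = [to, residual cap, weight]

    def add_edge(u, v, cap, w):
        adj[u].append(len(edges)); edges.append([v, cap, w])
        adj[v].append(len(edges)); edges.append([u, 0, -w])   # reverse edge

    for i in range(nq):
        add_edge(S, i, capacity[i], 0)
        for j in range(npts):
            add_edge(i, nq + j, 1, cost[i][j])
    for j in range(npts):
        add_edge(nq + j, D, 1, 0)

    total = 0
    while True:
        # Step 1: shortest path from S to D over edges with residual capacity.
        d, prev = [inf] * n, [-1] * n
        d[S] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if d[u] == inf:
                    continue
                for e in adj[u]:
                    v, cap, w = edges[e]
                    if cap > 0 and d[u] + w < d[v]:
                        d[v], prev[v], changed = d[u] + w, e, True
            if not changed:
                break
        if d[D] == inf:          # Step 3: stop when no path is left
            break
        # Step 2: push one unit of flow, which reverses the edges on the path.
        v = D
        while v != S:
            e = prev[v]
            edges[e][1] -= 1
            edges[e ^ 1][1] += 1
            v = edges[e ^ 1][0]
        total += d[D]
    # Forward provider->customer edges have even indices; cap 0 means "used".
    pairs = sorted(
        (i, edges[e][0] - nq)
        for i in range(nq) for e in adj[i]
        if e % 2 == 0 and edges[e][1] == 0
    )
    return total, pairs

print(sspa([[0, 1], [1, 3]], [1, 1]))
```

The provider capacity only changes the capacity of the edge from S to each provider, which is why the same sketch also covers the capacity-constrained case.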


Successive Shortest Path Algorithm (SSPA)

SSPA is easy to extend with a capacity constraint

Assume that data set A is our routers, each with capacity 2, and data set B is our users

[Figure: four snapshots of the flow graph (S, q1, q2, p1, p2, D; edge weights 0, 1, 1, 3) while SSPA runs with capacities]


Preliminary Solution

In our problem setting, we have a set of service providers (Q) with capacity value k and a set of customers (P) which is indexed by an R-tree

Let us analyze the performance of SSPA in detail
Consider the case |Q|=|P| and k=1
For every q in Q, we need to find a SP (N times, where N=|Q|)
Finding a SP in the bipartite graph between Q and P takes time |Eall|, where Eall is the set of all edges between Q and P
So the time complexity is N*|Eall|

The algorithm should do better if the bipartite graph is smaller:

N*|Esub| << N*|Eall|, if |Esub| << |Eall|


Preliminary Solution

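A SP can be determined from a sub-graph, if the sub-graph is built in order

[Figure: flow graph over S, q1, q2, q3, p1, p2, p3 and D, shown in full and restricted to the edges with weight ≤ 1]

Only add edges with weight ≤ 1 into our graph

Full cost matrix:
     p1   p2   p3
q1   0    1    5
q2   1    12   14
q3   4    3    8

Costs known when only edges with weight ≤ 1 are added:
     p1   p2   p3
q1   0    1    >1
q2   1    >1   >1
q3   >1   >1   >1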
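Restricting the graph to edges with weight ≤ T can be illustrated with a few lines of Python. The sketch assumes the whole cost matrix is held in memory, and the helper name `edges_within` is hypothetical; in the setting of these slides the edges would instead come from range searches on the R-tree.

```python
def edges_within(cost, T):
    """Return the edges (q, p, weight) of the sub-graph Esub with weight <= T."""
    return [
        (q, p, w)
        for q, row in enumerate(cost)
        for p, w in enumerate(row)
        if w <= T
    ]

# Cost matrix from the slide (rows q1..q3, columns p1..p3), 0-based indices.
COST = [[0, 1, 5],
        [1, 12, 14],
        [4, 3, 8]]

esub = edges_within(COST, 1)
print(len(esub), "of", len(COST) * len(COST[0]), "edges are in Esub")
```

For this matrix only three of the nine edges have weight ≤ 1, so a shortest-path search over Esub touches a third of the edges of Eall.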


Solution - RIA

Range Incremental Algorithm (RIA) builds the bipartite graph incrementally, based on the last observation

Lemma 1
If all the edges with weight ≤ T have been added into the sub-graph (Esub), then a SP from Esub with weight ≤ T must also be a SP from EQxP

[Figure: flow graph over S, q1, q2, q3, p1, p2, p3 and D]

T=1: only edges with weight ≤ T are added into our graph

The weight of the SP is 2, which exceeds T

Increase the threshold, T=T+1 => T=2

PROBLEM: the larger threshold does not add any edge into the graph

Costs known at T=1:
     p1   p2   p3
q1   0    1    >1
q2   1    >1   >1
q3   >1   >1   >1

Costs known at T=2:
     p1   p2   p3
q1   0    1    >2
q2   1    >2   >2
q3   >2   >2   >2


Solution - NIA

Nearest Neighbor Incremental Algorithm (NIA) grows Esub one nearest-neighbor edge at a time

[Figure: flow graph over S, q1, q2, q3, p1, p2, p3 and D]

Heap H={(q1,p1,0), (q2,p1,1), (q3,p2,3)}

Heap H={(q1,p2,1), (q2,p1,1), (q3,p2,3)}

Lemma 2
If the weight of the SP ≤ the weight of H.top(), then it is also a SP in Eall
Otherwise, add a new edge from H to Esub

Heap H={(q2,p1,1), (q3,p2,3), (q1,p3,5)}

Costs known after (q1,p1,0) is added:
     p1   p2   p3
q1   0    ≥0   ≥0
q2   ≥0   ≥0   ≥0
q3   ≥0   ≥0   ≥0

Costs known after (q1,p2,1) is added:
     p1   p2   p3
q1   0    1    ≥1
q2   ≥1   ≥1   ≥1
q3   ≥1   ≥1   ≥1
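The heap states on this slide can be reproduced with a short Python sketch of the edge-feeding part of NIA. It assumes an in-memory cost matrix, where a sorted list per provider stands in for an incremental nearest-neighbor search on the R-tree; the generator name `incremental_edges` is hypothetical, and the Lemma 2 test against the SP weight is left out.

```python
import heapq

def incremental_edges(cost):
    """Yield edges (q, p, weight) in ascending weight order, one NN at a time."""
    # Per-provider list of customers sorted by cost (stand-in for NN search).
    nn_lists = [sorted((w, p) for p, w in enumerate(row)) for row in cost]
    cursor = [0] * len(cost)
    # The heap holds, for every provider, its next nearest neighbor not yet added.
    heap = [(nn[0][0], q, nn[0][1]) for q, nn in enumerate(nn_lists) if nn]
    heapq.heapify(heap)
    while heap:
        w, q, p = heapq.heappop(heap)
        yield (q, p, w)
        # Replace the popped entry with the same provider's next NN, if any.
        cursor[q] += 1
        if cursor[q] < len(nn_lists[q]):
            next_w, next_p = nn_lists[q][cursor[q]]
            heapq.heappush(heap, (next_w, q, next_p))

# Cost matrix from the earlier slide (rows q1..q3, columns p1..p3), 0-based.
COST = [[0, 1, 5],
        [1, 12, 14],
        [4, 3, 8]]

first_three = list(incremental_edges(COST))[:3]
print(first_three)
```

The first two edges come from q1 (to p1, then to p2), matching the two heap updates shown above; ties on weight are broken by provider index in this sketch.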


Solution - IDA

Lemma 3
If an object in Q (our service providers) is not reached from the source S, then it is not necessary to add its nearest neighbor into Esub

We develop a novel algorithm, the Incremental On-Demand Algorithm (IDA), which is based on this lemma

[Figure: flow graph over S, q1, q2, q3, p1, p2, p3 and D]

Heap H={(q1,p2,1), (q2,p1,1), (q3,p2,3)}

It is not necessary to add this edge in the current state, since it cannot help us find any new SP

Heap H={(q2,p1,1), (q3,p2,3), (q1,p3,5)}

Costs known at this point:
     p1   p2   p3
q1   0    1    ≥1
q2   ≥1   ≥1   ≥1
q3   ≥1   ≥1   ≥1


Solution - IDA

[Figure: flow graph over S, q1, q2, q3, p1, p2, p3 and D]

Heap H={(q1,p2,1), (q2,p1,1), (q3,p2,3)}

Heap H={(q2,p1,1), (q3,p2,3)}

Heap H={(q3,p2,3), (q2,p2,12)}

Heap H={(q1,p2,1), (q3,p2,3), (q2,p2,12)}

Heap H={(q3,p2,3), (q1,p3,5), (q2,p2,12)}

IDA only expands the graph when it is necessary
It is expected to have a smaller sub-graph (smaller Esub) when executing SP searches

Weight of SP is 1-0+1=2

Known costs as the graph expands:
     p1   p2   p3
q1   0    ≥1   ≥1
q2   ≥1   ≥1   ≥1
q3   ≥1   ≥1   ≥1

     p1   p2   p3
q1   0    ≥3   ≥3
q2   1    ≥3   ≥3
q3   ≥3   ≥3   ≥3

     p1   p2   p3
q1   0    1    ≥3
q2   1    ≥3   ≥3
q3   ≥3   ≥3   ≥3


Experiments

Number of Service Providers |Q| (in thousands): 0.25, 0.5, 1, 2.5, 5

Number of Customers |P| (in thousands): 25, 50, 100, 150, 200

Capacity k: 20, 40, 80, 160, 320

Both datasets were generated on the road map of San Francisco

Implemented in C++ and run on a Pentium D 3.0 GHz machine running Ubuntu 7.10


Experiments

First, we test the performance on a small dataset over different capacities k (|Q|=0.25, |P|=25, in thousands)

[Figure: CPU time (s), on a log scale from 0.1 to 10000, versus capacity k (20, 40, 80, 160, 320) for SSPA, RIA, NIA and IDA]




Approximate Solution

Time-critical applications could favor fast answers over exact matching

Our approximate solutions provide a tunable trade-off between result accuracy and response time, with theoretical guarantees for the assignment cost

Three phases of our general method:
Partitioning phase
Concise matching phase
Refinement phase

[Figure: a group of points and the centroid of the group]


Service provider Approximation (SA)

Service providers are sorted by Hilbert value and are grouped in this order

Each point q is inserted into an existing group G, provided that the diagonal of G's MBR does not exceed δ after the insertion

If no such group is found, then a new group is formed to contain q

The centroid of a group G is its geometric centroid weighted by capacity, e.g., for the x-coordinate:

sum( q.x*q.k ) / sum(q.k), where q in G

Theoretical error bound is 2 * (number of assignments) * δ
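The grouping rule and the centroid formula of SA can be sketched in Python as below. Several points are assumptions rather than details from the slides: the input is taken to be already sorted by Hilbert value, the first existing group that can absorb a point without exceeding δ is chosen, and the names `group_providers` and `centroid` are hypothetical.

```python
from math import hypot

def group_providers(providers, delta):
    """Greedy grouping sketch.

    providers: list of (x, y, k) tuples, assumed already in Hilbert order.
    delta: maximum allowed diagonal of a group's MBR.
    """
    groups = []   # each group: {"mbr": [xmin, ymin, xmax, ymax], "members": [...]}
    for x, y, k in providers:
        for g in groups:
            xmin, ymin, xmax, ymax = g["mbr"]
            new = [min(xmin, x), min(ymin, y), max(xmax, x), max(ymax, y)]
            # Join the first group whose enlarged MBR diagonal stays within delta.
            if hypot(new[2] - new[0], new[3] - new[1]) <= delta:
                g["mbr"] = new
                g["members"].append((x, y, k))
                break
        else:
            # No existing group can take the point: open a new one.
            groups.append({"mbr": [x, y, x, y], "members": [(x, y, k)]})
    return groups

def centroid(group):
    """Capacity-weighted centroid: sum(q.x*q.k)/sum(q.k), likewise for y."""
    total_k = sum(k for _, _, k in group["members"])
    cx = sum(x * k for x, _, k in group["members"]) / total_k
    cy = sum(y * k for _, y, k in group["members"]) / total_k
    return cx, cy, total_k

groups = group_providers([(0, 0, 1), (3, 4, 3), (10, 10, 2)], delta=5)
print(len(groups), centroid(groups[0]))
```

In this made-up input the first two providers share a group (their MBR diagonal is exactly 5), and the group is represented by one point carrying their combined capacity of 4.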




Customer Approximation (CA)

Unlike SA, CA can do the grouping in the R-tree

Theoretical error bound is (number of assignments) * δ

[Figure: R-tree over the customers with entries e1 to e7 (e1 and e2 above e3 to e7), grouped under the threshold δ]


Refinement Phase

In the refinement phase, SA and CA only need to solve some smaller assignment problems

We could run the exact algorithm for each of these smaller problems. This, however, is expensive

Therefore, two heuristic methods are proposed:

NN-based refinement: find the NN customer for each service provider in a round-robin fashion

Exclusive Closest Pair refinement: use ECP to make the assignment
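The NN-based refinement can be illustrated with the Python sketch below. It is a brute-force reading of the round-robin idea under assumptions not stated on the slide: points are 2-D tuples, distance ties are broken by customer index, and the name `nn_refinement` is hypothetical.

```python
from math import dist

def nn_refinement(providers, customers, capacity):
    """Round-robin heuristic: providers take turns picking their nearest
    unassigned customer until capacities or customers run out."""
    remaining = list(capacity)
    free = set(range(len(customers)))
    pairs = []
    progress = True
    while free and progress:
        progress = False
        for i, q in enumerate(providers):
            if remaining[i] == 0 or not free:
                continue
            # Nearest unassigned customer of provider i (ties: lowest index).
            j = min(free, key=lambda c: (dist(q, customers[c]), c))
            pairs.append((i, j))
            free.remove(j)
            remaining[i] -= 1
            progress = True
    return pairs

providers = [(0, 0), (10, 0)]
customers = [(1, 0), (2, 0), (9, 0), (3, 0)]
print(nn_refinement(providers, customers, [2, 2]))
```

Each pass gives every provider at most one pick, so no provider can grab all of its nearest customers before the others have had a turn.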



Experiments

Quality = (sum of approximate cost) / (sum of optimal cost)

[Figure: quality of SA and CA]




Conclusion

We proposed three algorithms which solve the CCA problem efficiently

All our methods try to:
Minimize I/O accesses
Minimize CPU time

Also, we proposed two approximate solutions which achieve a good trade-off between execution time and matching quality

Our next step is to investigate:
Incremental updates to the CCA solution
Continuous monitoring of CCA
Other types of matching (assignment) problems


Thank you!

Any questions?


Hungarian Algorithm

1. Find the smallest value in each row and subtract it from every element of that row

2. Find the smallest value in each column and subtract it from every element of that column

3. Find the minimum number of lines that cover all zeros

4. Find the smallest value among all uncovered elements and subtract it from every uncovered element (also, add it to each cell at the intersection of two covering lines)

5. Repeat steps 3~4 until the number of lines is equal to |A| or |B|

Cost matrix:
     b1   b2   b3
a1   0    1    9
a2   1    5    7
a3   8    7    8

After step 1 (row minima 0, 1, 7 subtracted):
     b1   b2   b3
a1   0    1    9
a2   0    4    6
a3   1    0    1

After step 2 (column minima 0, 0, 1 subtracted):
     b1   b2   b3
a1   0    1    8
a2   0    4    5
a3   1    0    0

After steps 3~4:
     b1   b2   b3
a1   0    0    7
a2   0    3    4
a3   2    0    0
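Steps 1 and 2 of the Hungarian algorithm are simple enough to check in a few lines of Python. The sketch below uses the 3x3 cost matrix as read from this slide's example (rows a1..a3, columns b1..b3); the function name `reduce_rows_and_columns` is hypothetical, and the line-covering steps 3 to 5 are not implemented.

```python
def reduce_rows_and_columns(matrix):
    """Steps 1 and 2: subtract each row's minimum, then each column's minimum."""
    rows = [[v - min(row) for v in row] for row in matrix]
    col_min = [min(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, col_min)] for row in rows]

COST = [[0, 1, 9],
        [1, 5, 7],
        [8, 7, 8]]

reduced = reduce_rows_and_columns(COST)
for row in reduced:
    print(row)
```

After both reductions every row and every column contains at least one zero, which is the starting point for the line-covering step.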


Hungarian Algorithm

The Hungarian algorithm does not work efficiently with a capacity constraint
Duplicating the rows/columns is not a good solution

The memory usage of Hungarian is very high: Sum(a.k) x Sum(b.k), where a in A, b in B

Step 3 of Hungarian (find the minimum number of lines to cover all zeros) is not easy to optimize further

Cost matrix with every row duplicated:
     b1   b2   b3
a1   0    1    9
a1   0    1    9
a2   1    5    7
a2   1    5    7
a3   8    7    8
a3   8    7    8

Reduced matrix:
     b1   b2   b3
a1   0    1    8
a2   0    4    5
a3   1    0    0


Optimization – Reducing Dijkstra Execution

Some optimizations to Dijkstra can be done:

Dijkstra stops searching when the weight of a potential SP is higher than the top value in heap H
Once a new edge is added into Esub, it only affects one vertex and its subsequent vertices

Notice that Dijkstra cannot run with negative weights on the edges, but potential values can be used to solve this problem
Each node has a potential value, which is changed when the graph is updated
The potential weight of an edge is computed from the edge weight and the potential values of its two vertices, and it is never negative

[Figure: vertices potentially affected by a newly added edge, and unaffected vertices]
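How potentials remove negative weights can be shown with a small Python sketch. It assumes the standard reduced-cost formulation w(u,v) + potential(u) - potential(v), with potentials set to shortest distances from S; the slide does not spell out its exact formula, so this is an interpretation. The graph, its weights, and the helper names are made up for illustration.

```python
from math import inf

def bellman_ford(nodes, edges, source):
    """Shortest distances from source; edges maps (u, v) -> weight."""
    d = {v: inf for v in nodes}
    d[source] = 0
    for _ in range(len(nodes) - 1):
        for (u, v), w in edges.items():
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    return d

def reduced_weights(edges, potential):
    """Reweight every edge as w + potential[u] - potential[v]."""
    return {(u, v): w + potential[u] - potential[v] for (u, v), w in edges.items()}

NODES = ["S", "q1", "q2", "p1", "p2", "D"]
# Hypothetical flow graph before any assignment.
INITIAL = {("S", "q1"): 0, ("S", "q2"): 0,
           ("q1", "p1"): 2, ("q1", "p2"): 3, ("q2", "p1"): 3, ("q2", "p2"): 5,
           ("p1", "D"): 0, ("p2", "D"): 0}
potential = bellman_ford(NODES, INITIAL, "S")

# Graph after augmenting S -> q1 -> p1 -> D: those edges are reversed and
# the weight of (q1, p1) is negated.
RESIDUAL = {("q1", "S"): 0, ("S", "q2"): 0,
            ("p1", "q1"): -2, ("q1", "p2"): 3, ("q2", "p1"): 3, ("q2", "p2"): 5,
            ("D", "p1"): 0, ("p2", "D"): 0}
reweighted = reduced_weights(RESIDUAL, potential)
print(min(reweighted.values()))
```

The reversed edge (p1, q1) has weight -2 in the residual graph but 0 after reweighting, so Dijkstra can be run on the reweighted edges.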


Optimization – Incremental All Nearest Neighbor

All three proposed algorithms invoke numerous range/NN search operations around the service providers on the R-tree that indexes the customers

To reduce the I/O cost, we employ an incremental all-nearest-neighbor technique
