The University of Hong Kong
Capacity Constrained Assignment in Spatial Databases
Authors:
  Leong Hou U, University of Hong Kong
  Man Lung Yiu, Aalborg University
  Kyriakos Mouratidis, Singapore Management University
  Nikos Mamoulis, University of Hong Kong

Outline
  Motivation
  Related Work
    Assignment Problems
  Solutions
  Approximate Solutions
  Conclusion

Motivation
Assume that:
  Our system has a set of service providers (Q) which serve a set of customers (P)
  Each service provider (q) can serve at most k customers simultaneously
  For every provider-customer pair (q,p), our central server knows the cost of assigning p to q
Our aim is to maximize our service utilization:
  1. Maximize the number of served customers
  2. Minimize the total sum of weights (assignment costs)

Case Study I
Consider the case of wireless routers and laptops:
  Each router can serve at most 3 users concurrently
  The signal strength is measured by the Euclidean distance (a longer distance means a weaker signal)
Can it be solved by Nearest Neighbor Queries?
[Figure: 3-Nearest Neighbor Queries]

Case Study I
Can it be solved by Reverse Nearest Neighbor Queries?
[Figure: Reverse Nearest Neighbor Queries]

Case Study I
Can it be solved by Closest Pair Queries?
[Figure: 6-Closest Pairs (2 routers × 3 capacities)]

Case Study I
Can it be solved by Spatial Matching (Exclusive Closest Pair, ECP)?
[Figure: ECP matching; each router’s capacity is 3]
Find ECP between sets A and B:
  1. Find the closest pair (a,b) from (A,B)
  2. Report (a,b) as an exclusive closest pair; set a.k=a.k-1 and b.k=b.k-1 (k is the remaining capacity)
  3. Remove a from A if a.k=0 and remove b from B if b.k=0; go to step 1 until A or B is empty
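The three ECP steps above can be sketched in Python as follows. This is only a brute-force illustration under stated assumptions, not the presenters' implementation: the function name ecp_matching, the dict-based input layout, and the sample coordinates are hypothetical, and a real system would locate closest pairs through a spatial index rather than by scanning every pair.

```python
from math import dist


def ecp_matching(providers, customers):
    """Greedy Exclusive Closest Pair matching.

    providers, customers: dict mapping a name to ((x, y), capacity).
    Returns the list of (provider, customer) pairs in the order found.
    """
    # Remaining capacity of every object that can still be matched.
    a_cap = {a: k for a, (_, k) in providers.items() if k > 0}
    b_cap = {b: k for b, (_, k) in customers.items() if k > 0}
    pairs = []
    while a_cap and b_cap:
        # Step 1: closest pair among the objects that still have capacity
        # (brute force here; an assumption made for brevity).
        a, b = min(
            ((a, b) for a in a_cap for b in b_cap),
            key=lambda ab: dist(providers[ab[0]][0], customers[ab[1]][0]),
        )
        pairs.append((a, b))
        # Step 2: each side of the pair uses up one unit of capacity.
        a_cap[a] -= 1
        b_cap[b] -= 1
        # Step 3: drop objects whose capacity is exhausted.
        if a_cap[a] == 0:
            del a_cap[a]
        if b_cap[b] == 0:
            del b_cap[b]
    return pairs
```

For example, with routers r1 at (0, 0) and r2 at (10, 0), each of capacity 1, and users u1 at (1, 0) and u2 at (2, 0), the sketch first pairs r1 with u1 and then r2 with u2. Because every choice is greedy, the result is not guaranteed to minimize the total distance.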
Case Study I
Can it be solved by optimal assignment?
[Figure: Optimal assignment]
Optimal assignment tries to serve as many users as possible while also minimizing the total cost (distance)

Related Work
Optimal assignment computes the maximum-size matching with minimum assignment cost
Two popular algorithms:
  Hungarian Algorithm
  Successive Shortest Path Algorithm (SSPA)
The worst-case time complexity of both algorithms is O(n^3), where n is the number of service providers or customers

Successive Shortest Path Algorithm (SSPA)
[Figure: SSPA example on a flow graph with source S, providers q1 and q2, customers p1 and p2, and destination D; edges on each shortest path found are reversed and their weights negated]
1. Find the shortest path (SP) from source to destination
2. Reverse the edge direction on the SP
3. Repeat steps 1-2 until no more paths can be found
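The loop above can be sketched as a small min-cost-flow routine. This is a minimal illustration, not the presenters' code: the function name sspa, the nested-dict cost layout, and the sample weights are assumptions, and Bellman-Ford is used for the shortest-path search only because it tolerates the negative weights of reversed edges without extra bookkeeping.

```python
def sspa(cost, k=1):
    """Successive shortest paths on the flow graph S -> providers -> customers -> D.

    cost[q][p] is the weight of assigning customer p to provider q;
    k is the capacity of every provider (each customer is served at most once).
    Returns (assignment, total_cost).
    """
    providers = list(cost)
    customers = sorted({p for q in cost for p in cost[q]})
    S, D = "S", "D"
    res = {}  # residual graph: (u, v) -> [remaining capacity, weight]

    def add_edge(u, v, cap, w):
        res[(u, v)] = [cap, w]
        res[(v, u)] = [0, -w]  # reversed edge, usable only after augmentation

    for q in providers:
        add_edge(S, ("q", q), k, 0)
        for p, w in cost[q].items():
            add_edge(("q", q), ("p", p), 1, w)
    for p in customers:
        add_edge(("p", p), D, 1, 0)

    nodes = [S, D] + [("q", q) for q in providers] + [("p", p) for p in customers]
    total = 0
    while True:
        # Step 1: shortest path from S to D over edges with spare capacity.
        dist = {v: float("inf") for v in nodes}
        dist[S] = 0
        prev = {}
        for _ in range(len(nodes) - 1):
            changed = False
            for (u, v), (cap, w) in res.items():
                if cap > 0 and dist[u] + w < dist[v]:
                    dist[v], prev[v], changed = dist[u] + w, u, True
            if not changed:
                break
        if dist[D] == float("inf"):
            break  # Step 3: no more paths can be found
        # Step 2: push one unit along the path, i.e. reverse its edges.
        v = D
        while v != S:
            u = prev[v]
            res[(u, v)][0] -= 1
            res[(v, u)][0] += 1
            v = u
        total += dist[D]

    assignment = [(q, p) for q in providers for p in cost[q]
                  if res[(("q", q), ("p", p))][0] == 0]
    return assignment, total
```

With the assumed weights cost = {"q1": {"p1": 0, "p2": 1}, "q2": {"p1": 1, "p2": 3}} and k=1, the first path found is S, q1, p1, D with weight 0, and the second path reroutes q1 to p2 so that q2 can take p1, at weight 1-0+1=2.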
Successive Shortest Path Algorithm (SSPA)
SSPA is easy to implement with a capacity constraint
Assume that data set A is our routers, each with capacity 2, and data set B is our users
[Figure: SSPA on the flow graph with source S, providers q1 and q2, customers p1 and p2, and destination D, where each provider has capacity 2]

Preliminary Solution
In our problem setting, we have a set of service providers (Q), each with capacity k, and a set of customers (P) indexed by an R-tree
Let us analyze SSPA performance in detail
  Consider the case |Q|=|P| and k=1
  For every q in Q, we need to find a SP (N searches, where N=|Q|)
  Each SP is found in the bipartite graph between Q and P (time=|Eall|, where Eall is the set of all edges between Q and P)
  So the time complexity is N*|Eall|
The algorithm should do better if the bipartite graph is smaller
  N*|Esub| << N*|Eall|, if |Esub| << |Eall|

Preliminary Solution
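A SP can be determined from a sub-graph, if the sub-graph is built in order
[Figure: flow graph with source S, providers q1, q2, q3, customers p1, p2, p3, and destination D, before and after adding edges]
Only add edges with weight ≤ 1 into our graph

All edge weights:
      p1    p2    p3
q1     0     1     5
q2     1    12    14
q3     4     3     8

Edge weights known to the sub-graph:
      p1    p2    p3
q1     0     1    >1
q2     1    >1    >1
q3    >1    >1    >1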
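The thresholded sub-graph can be illustrated with a few lines of Python. This is a sketch only: the name edges_within and the dict layout are assumptions, while the weights are copied from the table above.

```python
# Edge weights between providers (rows) and customers (columns), as in the table.
COST = {
    "q1": {"p1": 0, "p2": 1, "p3": 5},
    "q2": {"p1": 1, "p2": 12, "p3": 14},
    "q3": {"p1": 4, "p2": 3, "p3": 8},
}


def edges_within(cost, threshold):
    """Return the sub-graph Esub: all edges whose weight is at most threshold."""
    return {
        (q, p): w
        for q, row in cost.items()
        for p, w in row.items()
        if w <= threshold
    }
```

Calling edges_within(COST, 1) keeps only the three edges (q1,p1), (q1,p2), and (q2,p1), so a shortest-path search over this sub-graph touches 3 edges instead of 9.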
Solution - RIA
The Range Incremental Algorithm (RIA) uses the last observation to build the bipartite graph incrementally
Lemma 1
  If all the edges with weight ≤ T are added into the sub-graph (Esub), then a SP in Esub with weight ≤ T must also be a SP in EQxP
[Figure: flow graph with source S, providers q1, q2, q3, customers p1, p2, p3, and destination D]
T=1: only add those edges with weight ≤ T into our graph
Weight of SP is 2
Increase the threshold, T=T+1 => T=2; this does not add any edge into the graph
PROBLEM: a larger threshold does not necessarily add new edges

Known edge weights for T=1:
      p1    p2    p3
q1     0     1    >1
q2     1    >1    >1
q3    >1    >1    >1

Known edge weights for T=2:
      p1    p2    p3
q1     0     1    >2
q2     1    >2    >2
q3    >2    >2    >2

Solution - NIA
The Nearest Neighbor Incremental Algorithm (NIA) grows Esub one nearest-neighbor edge at a time
[Figure: flow graph with source S, providers q1, q2, q3, customers p1, p2, p3, and destination D]
Heap H={(q1,p1,0), (q2,p1,1), (q3,p2,3)}
Heap H={(q1,p2,1), (q2,p1,1), (q3,p2,3)}
Lemma 2
  If the weight of the SP ≤ the weight of H.top(), then it is also a SP in Eall
  Otherwise, add a new edge from H to Esub
Heap H={(q2,p1,1), (q3,p2,3), (q1,p3,5)}

Known edge weights in two successive states:
      p1    p2    p3
q1     0    ≥0    ≥0
q2    ≥0    ≥0    ≥0
q3    ≥0    ≥0    ≥0

      p1    p2    p3
q1     0     1    ≥1
q2    ≥1    ≥1    ≥1
q3    ≥1    ≥1    ≥1
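The heap maintenance shown above can be sketched as follows. This is an illustration under assumptions: the names nn_stream, build_heap, and pop_edge are hypothetical, and the sorted in-memory list stands in for the incremental nearest-neighbor search that the slides run on the R-tree. Tuples are stored as (weight, provider, customer) so that Python's heapq orders them by weight.

```python
import heapq

# Edge weights from the running example.
COST = {
    "q1": {"p1": 0, "p2": 1, "p3": 5},
    "q2": {"p1": 1, "p2": 12, "p3": 14},
    "q3": {"p1": 4, "p2": 3, "p3": 8},
}


def nn_stream(q, cost):
    """Yield (weight, q, p) for provider q in ascending weight order."""
    for p, w in sorted(cost[q].items(), key=lambda item: item[1]):
        yield (w, q, p)


def build_heap(cost):
    """Seed the heap with the first nearest neighbor of every provider."""
    streams = {q: nn_stream(q, cost) for q in cost}
    heap = [next(streams[q]) for q in cost]
    heapq.heapify(heap)
    return heap, streams


def pop_edge(heap, streams):
    """Remove the cheapest candidate edge and refill the heap with the
    next nearest neighbor of the same provider, if there is one."""
    w, q, p = heapq.heappop(heap)
    nxt = next(streams[q], None)
    if nxt is not None:
        heapq.heappush(heap, nxt)
    return (q, p, w)
```

Starting from build_heap(COST), the heap holds the same three edges as the first heap state above; two successive pop_edge calls return (q1,p1,0) and (q1,p2,1) and leave the heap in the second and third states shown.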
Solution - IDA
Lemma 3
  If an object in Q (a service provider) is not accessed from the source S, then it is not necessary to add its nearest neighbor into Esub
We develop a novel algorithm, the Incremental On-Demand Algorithm (IDA), which is based on this lemma
[Figure: flow graph with source S, providers q1, q2, q3, customers p1, p2, p3, and destination D]
Heap H={(q1,p2,1), (q2,p1,1), (q3,p2,3)}
It is not necessary to add this edge in the current state, since it cannot help us find any new SP
Heap H={(q2,p1,1), (q3,p2,3), (q1,p3,5)}

Known edge weights:
      p1    p2    p3
q1     0     1    ≥1
q2    ≥1    ≥1    ≥1
q3    ≥1    ≥1    ≥1

Solution - IDA
[Figure: flow graph with source S, providers q1, q2, q3, customers p1, p2, p3, and destination D]
Heap H={(q1,p2,1), (q2,p1,1), (q3,p2,3)}
Heap H={(q2,p1,1), (q3,p2,3)}
Heap H={(q3,p2,3), (q2,p2,12)}
Heap H={(q1,p2,1), (q3,p2,3), (q2,p2,12)}
Heap H={(q3,p2,3), (q1,p3,5), (q2,p2,12)}
IDA only expands the graph when it is necessary
It is expected to have a smaller sub-graph (smaller Esub) when executing SP searches
Weight of SP is 1-0+1=2

Known edge weights in three successive states:
      p1    p2    p3
q1     0    ≥1    ≥1
q2    ≥1    ≥1    ≥1
q3    ≥1    ≥1    ≥1

      p1    p2    p3
q1     0    ≥3    ≥3
q2     1    ≥3    ≥3
q3    ≥3    ≥3    ≥3

      p1    p2    p3
q1     0     1    ≥3
q2     1    ≥3    ≥3
q3    ≥3    ≥3    ≥3

Experiments
Parameters:
  Number of service providers |Q| (in thousands): 0.25, 0.5, 1, 2.5, 5
  Number of customers |P| (in thousands): 25, 50, 100, 150, 200
  Capacity k: 20, 40, 80, 160, 320
Both datasets were generated on the road map of San Francisco
Implemented in C++; experiments ran on a Pentium D 3.0 GHz machine running Ubuntu 7.10

Experiments
First, we test the performance on a small dataset over different capacities k (|Q|=0.25, |P|=25, in thousands)
[Figure: CPU time (s), on a log scale from 0.1 to 10000, versus capacity k (20, 40, 80, 160, 320) for SSPA, RIA, NIA, and IDA]

Approximate Solution
Time-critical applications could favor fast answers over exact matching
Our approximate solutions provide a tunable trade-off between result accuracy and response time, with theoretical guarantees for the assignment cost
Three phases of our general method:
  Partitioning phase
  Concise matching phase
  Refinement phase
[Figure: a group of points and the centroid of the group]

Service provider Approximation (SA)
Service providers are sorted by Hilbert value and are grouped in this order
Each point q is inserted into an existing group G, provided that the diagonal of G’s MBR does not exceed δ
If no such group is found, then a new group is formed to contain q
The centroid of a group G is the capacity-weighted geometric centroid, e.g., for the x-coordinate:
  sum(q.x * q.k) / sum(q.k), where q in G
The theoretical error bound is 2 * (number of assignments) * δ
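The grouping rule and the weighted centroid can be sketched as follows. Several details here are assumptions rather than statements from the slides: the input list is taken to be already sorted by Hilbert value, a point joins the first existing group that still satisfies the δ bound after insertion, and the names group_providers and centroid are hypothetical.

```python
from math import hypot


def group_providers(providers, delta):
    """Greedy grouping of providers given as (x, y, k) tuples.

    providers is assumed to be sorted by Hilbert value already.
    A point joins the first group whose MBR diagonal stays within delta
    after the insertion; otherwise it starts a new group.
    """
    groups = []
    for q in providers:
        for g in groups:
            xs = [m[0] for m in g] + [q[0]]
            ys = [m[1] for m in g] + [q[1]]
            # Diagonal of the minimum bounding rectangle of g plus q.
            if hypot(max(xs) - min(xs), max(ys) - min(ys)) <= delta:
                g.append(q)
                break
        else:
            groups.append([q])
    return groups


def centroid(group):
    """Capacity-weighted centroid: sum(q.x*q.k)/sum(q.k), same for y."""
    total_k = sum(k for _, _, k in group)
    cx = sum(x * k for x, _, k in group) / total_k
    cy = sum(y * k for _, y, k in group) / total_k
    return (cx, cy)
```

For instance, with providers (0, 0, k=1), (1, 0, k=3), and (10, 10, k=2) and δ=2, the first two fall into one group whose centroid is (0.75, 0.0), pulled toward the provider with the larger capacity, while the third forms its own group.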
Customer Approximation (CA)
Unlike SA, CA can do the grouping in the R-tree
The theoretical error bound is (number of assignments) * δ
[Figure: R-tree with entries e1 and e2 above entries e3 to e7, with the bound δ marked]

Refinement Phase
In the refinement phase, SA and CA only solve some smaller assignment problems
We could run the exact algorithm for each of these smaller problems. This, however, is expensive
Therefore, two heuristic methods are proposed:
  NN-based refinement
    Find the NN customer for each service provider in round-robin fashion
  Exclusive Closest Pair refinement
    Use ECP to make the assignment
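The NN-based heuristic can be sketched as below. This is one plausible reading of "round robin", stated as an assumption: providers take turns, and on each turn a provider with spare capacity takes its nearest still-unassigned customer. The name nn_round_robin and the input layout are hypothetical, and the linear scan stands in for an index-based NN search.

```python
from math import dist


def nn_round_robin(providers, customers):
    """providers: dict name -> ((x, y), capacity); customers: dict name -> (x, y).

    Returns (provider, customer) pairs in the order they were assigned.
    """
    remaining = {q: k for q, (_, k) in providers.items()}
    unassigned = dict(customers)
    assignment = []
    while unassigned and any(remaining.values()):
        # One round: every provider with spare capacity picks one customer.
        for q, (pt, _) in providers.items():
            if remaining[q] == 0 or not unassigned:
                continue
            p = min(unassigned, key=lambda c: dist(pt, unassigned[c]))
            assignment.append((q, p))
            del unassigned[p]
            remaining[q] -= 1
    return assignment
```

With q1 at (0, 0) with capacity 2, q2 at (10, 0) with capacity 1, and customers p1, p2, p3 at (1, 0), (2, 0), (9, 0), the first round assigns p1 to q1 and p3 to q2, and the second round assigns p2 to q1.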
Experiments
Quality = (sum of approximate cost) / (sum of optimal cost)
[Figure: quality of SA and CA]

Conclusion
We proposed three algorithms which solve the capacity constrained assignment (CCA) problem efficiently
All our methods try to:
  Minimize I/O accesses
  Minimize CPU time
We also proposed two approximate solutions which achieve a good trade-off between execution time and matching quality
Our next step is to investigate:
  Incremental updates to the CCA solution
  Continuous monitoring of CCA
  Other types of matching (assignment) problems

Thank you!
Any questions?

Hungarian Algorithm
1. Find the smallest value in each row, and subtract it from every element in that row
2. Find the smallest value in each column, and subtract it from every element in that column
3. Find the minimum number of lines that cover all zeros
4. Find the smallest value among all uncovered elements, and subtract it from every uncovered element (also, add it to each cell at the intersection of two covering lines)
5. Repeat steps 3-4 until the number of lines is equal to |A| or |B|
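
Original cost matrix:
      b1    b2    b3
a1     0     1     9
a2     1     5     7
a3     8     7     8

After step 1 (row reductions: a1 -0, a2 -1, a3 -7):
      b1    b2    b3
a1     0     1     9
a2     0     4     6
a3     1     0     1

After step 2 (column reductions: b1 -0, b2 -0, b3 -1):
      b1    b2    b3
a1     0     1     8
a2     0     4     5
a3     1     0     0

After step 4 (lines cover column b1 and row a3; the smallest uncovered value is 1):
      b1    b2    b3
a1     0     0     7
a2     0     3     4
a3     2     0     0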
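The reduction steps can be checked with a short Python sketch. The function names are hypothetical, and step 3 (finding the minimum line cover) is deliberately not implemented: the covered rows and columns are passed in by hand, which is an assumption made to keep the example short.

```python
def reduce_rows(m):
    """Step 1: subtract each row's minimum from every element of that row."""
    return [[v - min(row) for v in row] for row in m]


def reduce_cols(m):
    """Step 2: subtract each column's minimum from every element of that column."""
    mins = [min(col) for col in zip(*m)]
    return [[v - c for v, c in zip(row, mins)] for row in m]


def adjust_uncovered(m, covered_rows, covered_cols):
    """Step 4: subtract the smallest uncovered value from uncovered cells
    and add it to cells covered by both a row line and a column line."""
    d = min(
        v
        for i, row in enumerate(m) if i not in covered_rows
        for j, v in enumerate(row) if j not in covered_cols
    )
    out = []
    for i, row in enumerate(m):
        new_row = []
        for j, v in enumerate(row):
            if i not in covered_rows and j not in covered_cols:
                new_row.append(v - d)   # uncovered cell
            elif i in covered_rows and j in covered_cols:
                new_row.append(v + d)   # intersection of two lines
            else:
                new_row.append(v)       # covered by exactly one line
        out.append(new_row)
    return out
```

Applied to the matrix [[0, 1, 9], [1, 5, 7], [8, 7, 8]] with rows a1 to a3 and columns b1 to b3, reduce_rows and reduce_cols give the two reduced matrices of the example, and adjust_uncovered with row a3 and column b1 covered (indices 2 and 0) gives [[0, 0, 7], [0, 3, 4], [2, 0, 0]]. That matrix admits the zero-cost selection a1-b2, a2-b1, a3-b3, whose original cost is 1+1+8=10.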
Hungarian Algorithm
Hungarian is not easy to adapt efficiently to capacity constraints
  Duplicating the rows/columns is not a good solution
The memory usage of Hungarian is very high
  Sum(a.k) x Sum(b.k), where a in A, b in B
Step 3 of Hungarian is hard to optimize further
  Find the minimum number of lines to cover all zeros

Cost matrix with every row of A duplicated:
      b1    b2    b3
a1     0     1     9
a1     0     1     9
a2     1     5     7
a2     1     5     7
a3     8     7     8
a3     8     7     8

Reduced matrix:
      b1    b2    b3
a1     0     1     8
a2     0     4     5
a3     1     0     0

Optimization – Reducing Dijkstra Execution
Some optimizations to Dijkstra can be done:
  Dijkstra stops the search when the weight of a potential SP is higher than the top value in heap H
  Once a new edge is added into Esub, it only affects one vertex and its subsequent vertices
  Notice that Dijkstra cannot run with negative edge weights, but potential values can be used to solve this problem
    Each node has a potential value, which is changed when the graph is updated
    The potential weight of an edge is calculated from the edge weight and the potential values of its two vertices, and it is never negative
[Figure: vertices potentially affected by the newly added edge, and unaffected vertices]
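The potential-value idea can be illustrated with the standard reweighting w'(u,v) = w(u,v) + potential(u) - potential(v). This is a sketch under assumptions, not the presenters' update rule: the potentials are computed here by a plain Bellman-Ford pass, whereas the slides maintain them as the graph is updated, and the toy graph and all function names are made up for the example.

```python
import heapq


def bellman_ford(edges, source):
    """Shortest distances with possibly negative weights; used as potentials."""
    nodes = {source} | {u for u, _ in edges} | {v for _, v in edges}
    dist = {v: float("inf") for v in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        for (u, v), w in edges.items():
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist


def reduced_weights(edges, potential):
    """Reweight every edge so that no weight is negative."""
    return {(u, v): w + potential[u] - potential[v] for (u, v), w in edges.items()}


def dijkstra(edges, source):
    """Plain Dijkstra; valid only when all weights are non-negative."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Toy graph with one negative edge, like a reversed edge in SSPA.
EDGES = {("S", "a"): 2, ("a", "b"): -1, ("S", "b"): 3, ("b", "D"): 1}
```

On EDGES the potentials are S=0, a=2, b=1, D=2, every reduced weight is 0 or 2, and the true distance to D is recovered from the Dijkstra result on the reduced graph by adding potential(D) - potential(S).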
Optimization – Incremental All Nearest Neighbor
All three proposed algorithms invoke numerous range/NN search operations, centered at the service providers, on the R-tree that indexes the customers
To reduce the I/O cost, we employ an incremental all-nearest-neighbor technique