View
216
Download
2
Category
Tags:
Preview:
Citation preview
Lars Arge
I/O-algorithms
2
I/O-Model
• Parameters
N = # elements in problem instance
B = # elements that fits in disk block
M = # elements that fits in main memory
T = # output size in searching problem
• I/O: Movement of block between memory and disk
D
P
M
Block I/O
Lars Arge
I/O-algorithms
3
Fundamental Bounds [AV88]
Internal External
• Scanning: N
• Sorting: N log N
• Permuting
– List rank
• Searching: NBlog
BN
BN
BMlog
BN
log,minBN
BN
BMNN
N2log
Lars Arge
I/O-algorithms
4
Data Structures until now
• One-dimensional searching
– B-trees: Node degree (B) queries in
Rebalancing using split/fuse updates in
– Weight-balanced B-tress: Weight rather than degree constraint
Ω(w(v)) updates below v between rebalancing operations
– Persistent B-trees: N O(N/B) space
Update in current version in
Search in all previous versions in
)(log BT
B NO )(log NO B
)(log BT
B NO )(log NO B
Lars Arge
I/O-algorithms
5
Data structures until now
• Last week: “dimension 1.5” searching:
– Interval stabbing: External interval tree
O(N/B) space, query, update
• We utilized a number of tools/techniques
– B-trees, weight-balanced B-trees, persistent B-trees
– Logarithmic method
– Global rebuilding
– Construction using buffer technique
x
)(log NO B)(log BT
B NO
Lars Arge
I/O-algorithms
6
Interval Management• External interval tree:
– Fan-out weight-balanced B-tree on endpoints
– Intervals stored in O(B) secondary structure in each internal node
– Query efficiency using filtering
– Bootstrapping used to avoid O(B) search cost in each node
* Size O(B2) underflow structure in each node
* Constructed using sweep and persistent B-tree
* Dynamic using global rebuilding
$m$ blocksv)( B
v
)( B
Lars Arge
I/O-algorithms
7
3-Sided Range Searching• Interval management corresponds to simple form of 2d range search
• More general problem: Dynamic 3-sidede range searching
– Maintain set of points in plane such
that given query (q1, q2, q3), all points
(x,y) with q1 x q2 and y q3 can
be found efficiently
(x,x)
(x1,x2)
x
x1 x2
q3
q2q1
Lars Arge
I/O-algorithms
8
3-Sided Range Searching : Static Solution• Construction: Sweep top-down inserting x in persistent B-tree at (x,y)
– O(N/B) space
– I/O construction using buffer technique
• Query (q1, q2, q3): Perform range query with [q1,q2] in B-tree at q3
– I/Os
• Dynamic?
q3
q2q1
)(log BT
B NO
)log( NO BBN
Lars Arge
I/O-algorithms
9
• Base tree on x-coordinates with nodes augmented with points
• Heap on y-coordinates
– Decreasing y values on root-leaf path
– (x,y) on path from root to leaf holding x
– If v holds point then parent(v) holds point
Internal Priority Search Tree9
16.20
1619,9
1313,3
1920,3
45,6
59,4
11,2
201916139544,1
1
Lars Arge
I/O-algorithms
10
• Linear space
• Insert of (x,y) (assuming fixed x-coordinate set):
– Compare y with y-coordinate in root
– Smaller: Recursively insert (x,y) in subtree on path to x
– Bigger: Insert in root and recursively insert old point in subtree
O(log N) update
Internal Priority Search Tree9
16.20
1619,9
1313,3
1920,3
45,6
59,4
11,2
201916139544,1
1
Insert (10,21) 10,21
Lars Arge
I/O-algorithms
11
Internal Priority Search Tree
• Query with (q1, q2, q3) starting at root v:
– Report point in v if satisfying query
– Visit both children of v if point reported
– Always visit child(s) of v on path(s) to q1 and q2
O(log N+T) query
916.20
1619,9
1313,3
1920,3
45,6
59,4
11,2
201916139544,1
1
4
194
Lars Arge
I/O-algorithms
12
• Natural idea: Block tree
• Problem:
– I/Os to follow paths to to q1 and q2
– But O(T) I/Os may be used to visit other nodes (“overshooting”)
query
Externalizing Priority Search Tree9
16.20
1619,9
1313,3
1920,3
45,6
59,4
11,2
201916139544,1
1
)(log NO B
)(log TNO B
Lars Arge
I/O-algorithms
13
Externalizing Priority Search Tree
• Solution idea:
– Store B points in each node * O(B2) points stored in each supernode
* B output points can pay for “overshooting”
– Bootstrapping:
* Store O(B2) points in each supernode in static structure
916.20
1619,9
1313,3
1920,3
45,6
59,4
11,2
201916139544,1
1
Lars Arge
I/O-algorithms
14
External Priority Search Tree• Base tree: Weight-balanced B-tree with branching parameter 1/4B
and leaf parameter B on x-coordinates
• Points in “heap order”:
– Root stores B top points for each of the child slabs
– Remaining points stored recursively
• Points in each node stored in “O(B2)-structure”
– Persistent B-tree structure for static problem
Linear space
)(B
)(B
Lars Arge
I/O-algorithms
15
External Priority Search Tree
• Query with (q1, q2, q3) starting at root v:
– Query O(B2)-structure and report points satisfying query
– Visit child v if
* v on path to q1 or q2
* All points corresponding to v satisfy query
Lars Arge
I/O-algorithms
16
External Priority Search Tree• Analysis:
– I/Os used to visit node v
– nodes on path to q1 or q2
– For each node v not on path to q1 or q2 visited, B points reported in parent(v)
query
)1()(log 2B
TB
TB
vv OBO )(log NO B
)(log BT
B NO
Lars Arge
I/O-algorithms
17
External Priority Search Tree• Insert (x,y) (ignoring insert in base tree - rebalancing):
– Find relevant node v:
* Query O(B2)-structure to find
B points in root corresponding
to node u on path to x
* If y smaller than y-coordinates
of all B points then recursively
search in u
– Insert (x,y) in O(B2)-structure of v
– If O(B2)-structure contains >B points for child u, remove lowest point and insert recursively in u
• Delete: Similarly
u
Lars Arge
I/O-algorithms
18
• Analysis:
– Query visits nodes
– O(B2)-structure queried/updated in each node
* One query
* One insert and one delete
• O(B2)-structure analysis:
– Query:
– Update in O(1) I/Os using update
block and global rebuilding
I/Os
External Priority Search Tree
u
)(log NO B
)1()/(log 2 OBBBO B
)(log NO B
Lars Arge
I/O-algorithms
19
Dynamic Base Tree• Deletion:
– Delete point as previously
– Delete x-coordinate from base
tree using global rebuilding
I/Os amortized
• Insertion:
– Insert x-coordinate in base tree
and rebalance (using splits)
– Insert point as previously
• Split: Boundary in v becomes boundary in parent(v)
)(log NO B
v
v’’v’
Lars Arge
I/O-algorithms
20
Dynamic Base Tree• Split: When v splits B new points needed in parent(v)
• One point obtained from v’ (v’’) using “bubble-up” operation:
– Find top point p in v’
– Insert p in O(B2)-structure
– Remove p from O(B2)-structure of v’
– Recursively bubble-up point to v’
• Bubble-up in I/Os
– Follow one path from v to leaf
– Uses O(1) I/O in each node
Split in I/Os
v’’v’
))((log vwO B
))(())(log( vwOvwBO B
Lars Arge
I/O-algorithms
21
Dynamic Base Tree• O(1) amortized split cost:
– Cost: O(w(v))
– Weight balanced base tree: inserts below v between splits
• External Priority Search Tree
– Space: O(N/B)
– Query:
– Updates: I/Os amortized
• Amortization can be removed from update bound in several ways
– Utilizing lazy rebuilding
))(( vw
)(log NO B
)(log BT
B NO
v’’v’
Lars Arge
I/O-algorithms
22
Two-Dimensional Range Search• We have now discussed structures for special cases of two-
dimensional range searching
– Space: O(N/B)
– Query:
– Updates:
• Cannot be obtained for general 2d range searching:
– query requires space
– space requires query
q3
q2q1q
q
q3
q2q1
q4
)(loglog
logN
NBN
BB
BΩ)(log NO cB
)(BNO )( B
N
)(log NO B
)(log BT
B NO
Lars Arge
I/O-algorithms
23
• Base tree: Weight balanced tree with branching parameter
and leaf parameter B on x-coordinates
height
• Points below each node stored in 4 linear space secondary structures:
– “Right” priority search tree
– “Left” priority search tree
– B-tree on y-coordinates
– Interval tree
space
External Range Tree
)(loglog
logN
N
BB
BO
)(log NB
)(loglog
logN
NBN
BB
BO
)(log NB
Lars Arge
I/O-algorithms
24
• Secondary interval tree structure:
– Connect points in each slab in y-order
– Project obtained segments in y-axis
– Intervals stored in interval tree
* Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node
External Range Tree
)(log NB
Lars Arge
I/O-algorithms
25
• Query with (q1, q2, q3 , q4) answered in top node with q1 and q2 in different slabs v1 and v2
• Points in slab v1
– Found with 3-sided query in v1
using right priority search tree
• Points in slab v2
– Found with 3-sided query in v2
using left priority search tree
• Points in slabs between v1 and v2
– Answer stabbing query with q3 using interval tree
first point above q3 in each of the slabs
– Find points using y-coordinate B-tree in slabs
External Range Tree
)(log NO B
)(log NO B
)(log NB
v1 v2
Lars Arge
I/O-algorithms
26
External Range Tree• Query analysis:
– I/Os to find relevant node
– I/Os to answer two 3-sided queries
– I/Os to query interval tree
– I/Os to traverse B-trees
I/Os
)(log NO B
)(log BT
B NO )(log)(log log NONO BB
NB
B )(log B
TB NO )(log NO B
)(log BT
B NO )(log NB
v1 v2
Lars Arge
I/O-algorithms
27
External Range Tree• Insert:
– Insert x-coordinate in weight-balanced B-tree
* Split of v can be performed in I/Os
I/Os
– Update secondary structures in all nodes on one root-leaf path
* Update priority search trees
* Update interval tree
* Update B-tree
I/Os
• Delete:
– Similar and using global rebuilding
)(loglog
logN
N
BB
BO)(
logloglog2
NN
BB
BO
))(log)(( vwvwO B
)(loglog
log2
NN
BB
BO
)(log NB
v1 v2
Lars Arge
I/O-algorithms
28
Summary: External Range Tree• 2d range searching in space
– I/O query
– I/O update
• Optimal among query structures
)(loglog
log2
NN
BB
BO
)(log BT
B NO )(
logloglog
NN
BN
BB
BO
)(log BT
B NO
q3
q2q1
q4
Lars Arge
I/O-algorithms
29
kdB-tree
• kd-tree:
– Recursive subdivision of point-set into two half using vertical/horizontal line
– Horizontal line on even levels, vertical on uneven levels
– One point in each leaf
Linear space and logarithmic height
Lars Arge
I/O-algorithms
30
kd-Tree: Query
• Query
– Recursively visit nodes corresponding to regions intersecting query
– Report point in trees/nodes completely contained in query
• Query analysis
– Horizontal line intersect Q(N) = 2+2Q(N/4) = regions
– Query covers T regions I/Os worst-case
)( NO
)( TNO
Lars Arge
I/O-algorithms
31
kdB-tree
• KdB-tree:
– Blocking of kd-tree but with B point in each leaf
• Query as before
– Analysis as before except that each region now contains B points
I/O query)( B
TB
NO
Lars Arge
I/O-algorithms
32
Construction of kdB-tree
• Simple algorithm
– Find median of y-coordinates (construct root)
– Distribute point based on median
– Recursively build subtrees
• Idea in improved algorithm [AAPV01]
– Construct levels at a time using O(N/B) I/Os
– Recursively build subtrees
)log( 2 BN
BNO
)log(BN
BN
BMO
BMlog
Lars Arge
I/O-algorithms
33
Construction of kdB-tree• Sort N points by x- and by y-coordinates using I/Os
• Building levels ( nodes) in O(N/B) I/Os:
1. Construct by grid
with points in each slab
2. Count number of points in each
grid cell and store in memory
3. Find slab s with median x-coordinate
4. Scan slab s to find median x-coordinate and construct node
5. Split slab containing median x-coordinate and update counts
6. Recurse on each side of median x-coordinate using grid (step 3) Grid grows to during algorithm Each node constructed in I/Os
BMlog
)log(BN
BN
BMO
BM
BM
BM
N
BM
)( BM
BM
BM
BM
))/(( BNO BM
Lars Arge
I/O-algorithms
34
kdB-tree
• kdB-tree:
– Linear space
– Query in I/Os
– Construction in I/Os
• Dynamic?
– Deletes in I/Os using global rebuilding
– Insertions?
)( BT
BNO
)log()log( NOO BBN
BN
BN
BM
)(log NO B
Lars Arge
I/O-algorithms
35
Insertions using Logarithmic Method• Partition pointset S into subsets S0, S1, … Slog N, |Si| = 2i or |Si| = 0
• Build kdB-tree Di on Si
• Operations:
– Query: Query each Di
– Insert: Find first empty Di and construct Di out of
elements in S0,S1, … Si-1
– I/Os per moved point
– Point moved O(log N) times
I/Os amortized
– Delete: Delete from relevant Di (ignore in log-method)
..................................
0 2222 1 2 log N
iij
j 221 10
)()(log
02
BT
BN
BTN
i B OO ii
)2log( 2 iBB
iO )2log( 1 i
BBO
)(log)log2log( 21 NONO Bi
BB
)(log NO B
Lars Arge
I/O-algorithms
36
O-Tree Structure• O-tree:
– B-tree on vertical slabs
– B-tree on horizontal slabs in each vertical slab
– kdB-tree on points in each leaf
NBBN log
)log( NBBN
)log()( 2)log( 2 NB BN
NBB
N
NBBN 2log
)log( NBBN
NB B2log
Lars Arge
I/O-algorithms
37
O-Tree Query• Perform rangesearch with q1 and q2 in vertical B-tree
– Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query
– Perform rangesearch with q3 and q4 horizontal B-trees with x-interval spanned by query
* Query all kdB-trees with range intersected by query
NBBN log
NBBN 2log
NB B2log
Lars Arge
I/O-algorithms
38
O-Tree Query Analysis• Vertical B-tree query:
• Query of all kdB-trees in leaves of two horizontal B-trees:
• Query horizontal B-trees:
• Query kdB-trees not completely in query
• Query in kdB-trees completely
contained in query:
I/Os
)())log((log BN
BBN
B ONO
)()()log()log( 2BT
BN
BT
BBBN OOBNBONO
)log( NO BBN
)())log((log)log( BN
BBN
BBBN ONONO
)()()log()log(2 2BT
BN
BT
BBBN OOBNBONO
)(BTO
)(BT
BNO
)log(2 NO BBN
Lars Arge
I/O-algorithms
39
O-Tree Update• Insert:
– Search in vertical B-tree: I/Os
– Search in horizontal B-tree: I/Os
– Insert in kdB-tree: I/Os
• Use global rebuilding when structures grow too big/small
– B-trees not contain elements
– kdB-trees not contain elements
I/Os
• Deletes can be handled
in I/Os similarly
)log( NBBN
)log( 2 NB B
)(log NO B
)(log NO B
)(log))log((log 22 NONBO BBB
)(log NO B
)(log NO B
Lars Arge
I/O-algorithms
40
Summary: O-Tree• 2d range searching in linear space
– I/O query
– I/O update
• Optimal among structures
using linear space
• Can be extended to work in d-dimensions
with optimal query bound
q3
q2q1
q4
)(log NO B
)(BT
BNO
))((11
BT
BN dO
Lars Arge
I/O-algorithms
41
Summary: 3 and 4-sided Range Search• 3-sided 2d range searching: External priority search tree
– query, space, update
• General (4-sided) 2d range searching:
– External range tree: query, space,
update
– O-tree: query, space, update
q3
q2q1
q3
q2q1
q4
)(loglog
logN
NBN
BB
BO)(log BT
B NO
)(BNO)( B
TB
NO
)(log NO B)(log BT
B NO
)(log NO B
)(loglog
log2
NN
BB
BO
)(BNO
Lars Arge
I/O-algorithms
42
Techniques (one final time)• Tools:
– B-trees
– Persistent B-trees
– Buffer trees
– Logarithmic method
– Weight-balanced B-trees
– Global rebuilding
• Techniques:
– Bootstrapping
– Filtering
q3
q2q1
q3
q2q1
q4
(x,x)
Lars Arge
I/O-algorithms
43
Other results• Many other results for e.g.
– Higher dimensional range searching
– Range counting
– Halfspace (and other special cases) of range searching
– Structures for moving objects
– Proximity queries
– Structures for objects other than points (bounding rectangles)
• Many heuristic structures in database community
• Implementation efforts:
– LEDA-SM (MPI)
– TPIE (Duke/Aarhus)
Recommended