
Project Proposals

Simonas Šaltenis, Aalborg University

Nykredit Center for Database Research, Department of Computer Science, Aalborg University

WIM workshop, Gl. Vrå Slot, December 6-8, 2001 2

Outline

• An overview of the R-tree and the TPR-tree
• Project proposals:
  • Update-Efficient TPR-tree
  • Time-parameterized SS-tree


Spatial Indexing With the R-Tree

• Example

[Figure: an example R-tree. Root entries R1 and R2; second-level entries R3 to R7; leaf entries p1 to p13 with pointers to data tuples; a query rectangle is drawn over the data space.]


R-tree Properties

• Leaf entry = <n-dimensional point, rid>
• Non-leaf entry = <n-dimensional MBR, ptr to a child node>
  • MBR – a Minimum Bounding Rectangle of all points in the subtree pointed to by ptr
• R-tree is a balanced tree – all leaves are at the same depth from the root
• Through insertion and deletion algorithms, nodes are kept at least m% full (except the root)
  • m is usually chosen to be 40%.
  • m is the minimum fill factor; depending on the workload, the average fill factor is usually about 70%.


Grow-Post Trees

[Figure: a tree whose internal nodes hold bounding predicates BP1, BP2, ..., BPn, above a level of leaf nodes.]

Bounding predicate (BP) = something that describes entries in a subtree

Building blocks of algorithms:
• Consistent(BP, Q) – returns true if results of query Q can be under BP (in the R-tree, MBR intersects Q)
• Union( ) – computes a BP of a collection of entries (in the R-tree, computes an MBR – minimum and maximum in all dimensions)
• Penalty(BP, E) – returns an estimate of how much “worse” BP becomes if E is inserted under it
• PickSplit(node) – splits a page of entries into two groups

R-Tree – a Grow-Post tree
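As an illustration of the building blocks for the R-tree case, the following is a minimal Python sketch of Union, Consistent and Penalty over MBRs. The representation of an MBR as a pair of corner tuples and the function names are assumptions made for this example; PickSplit is omitted, and Penalty is taken to be plain area enlargement.

```python
from typing import Sequence, Tuple

# Assumed representation: an MBR is (lo, hi), two tuples with one value per
# dimension. A point is the degenerate MBR with lo == hi.
MBR = Tuple[Tuple[float, ...], Tuple[float, ...]]


def union(entries: Sequence[MBR]) -> MBR:
    """Smallest rectangle covering all entries: min and max in every dimension."""
    dims = range(len(entries[0][0]))
    lo = tuple(min(e[0][d] for e in entries) for d in dims)
    hi = tuple(max(e[1][d] for e in entries) for d in dims)
    return lo, hi


def consistent(bp: MBR, q: MBR) -> bool:
    """True if bp intersects q, so results of q may lie under bp."""
    return all(bp[0][d] <= q[1][d] and q[0][d] <= bp[1][d]
               for d in range(len(bp[0])))


def area(r: MBR) -> float:
    """Product of the side lengths of r."""
    result = 1.0
    for lo, hi in zip(r[0], r[1]):
        result *= hi - lo
    return result


def penalty(bp: MBR, e: MBR) -> float:
    """Area enlargement needed for bp to also cover e."""
    return area(union([bp, e])) - area(bp)
```

For instance, the union of the points (0, 0) and (2, 3) is the rectangle with corners (0, 0) and (2, 3), and inserting the point (4, 3) under it would enlarge its area from 6 to 12.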


Range Query in R-trees

• Answering range query Q in R-trees
  1. Start at the root
  2. If the current node is a non-leaf, for each entry <MBR, ptr>, if Consistent(MBR, Q), search the subtree identified by ptr
  3. If the current node is a leaf, for each entry <E, rid>, if E overlaps Q, rid identifies a point that overlaps Q

• Note: We may have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)
  • Worst-case performance is O(n)!
  • But in practice, R-trees exhibit good query performance for various data sets

• What about insertion and deletion?
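The three search steps can be sketched in Python as a short recursion. The `Node` class and the tuple layout of the entries are assumptions made for this illustration, not a prescribed page format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # assumed layout: (lo, hi)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles share at least one point."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d]
               for d in range(len(a[0])))


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (point, rid); non-leaf entries are (mbr, child Node).
    entries: List[tuple] = field(default_factory=list)


def range_query(node: Node, q: Rect) -> List[int]:
    """Return the rids of all points inside q, visiting every consistent subtree."""
    result = []
    for key, ref in node.entries:
        if node.is_leaf:
            if intersects((key, key), q):   # a point is a degenerate rectangle
                result.append(ref)
        elif intersects(key, q):            # Consistent(MBR, Q)
            result.extend(range_query(ref, q))
    return result
```

Because the loop descends into every entry whose MBR intersects Q, a query that touches many overlapping MBRs visits many subtrees, which is the source of the O(n) worst case.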


Insert Entry E<point, ptr>

• Insertion algorithm
  1. cn = root
  2. If cn is a leaf, go to 5.
  3. From all entries in cn choose the one e with the smallest Penalty(e.BP, E). (In R-trees, choose an entry whose MBR needs least enlargement to cover E; resolve ties by going to the smallest-area child)
  4. cn = e.ptr, go to 2.
  5. Insert E into cn. Call PropogateUp(cn).

• PropogateUp(cn)
  1. If cn is overfull, call PickSplit(cn) to produce cn1 and cn2, replace cn’s old entry in its parent by e1 = Union(cn1), e2 = Union(cn2), call PropogateUp on cn’s parent.
  2. Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e, call PropogateUp on cn’s parent.

• Create a new root with two entries whenever a root is split.
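The entry choice made on the way down the tree can be sketched as follows. This is a simplified Python illustration that covers only the R-tree rule (least area enlargement, ties broken by smallest area); the function names and the rectangle layout are assumptions, and splitting and upward propagation are left out.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # assumed layout: (lo, hi)


def area(r: Rect) -> float:
    """Product of the side lengths of r."""
    a = 1.0
    for lo, hi in zip(*r):
        a *= hi - lo
    return a


def enlarge(r: Rect, p: Tuple[float, ...]) -> Rect:
    """Smallest rectangle that covers both r and the point p."""
    lo = tuple(min(l, x) for l, x in zip(r[0], p))
    hi = tuple(max(h, x) for h, x in zip(r[1], p))
    return lo, hi


def choose_entry(mbrs: List[Rect], p: Tuple[float, ...]) -> int:
    """Index of the entry needing least enlargement; ties go to the smallest area."""
    return min(
        range(len(mbrs)),
        key=lambda i: (area(enlarge(mbrs[i], p)) - area(mbrs[i]), area(mbrs[i])),
    )
```

Sorting on the pair (enlargement, area) gives the tie-breaking rule directly: when two entries both already contain the point, the smaller one wins.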


Heuristics for Penalty

• Heuristics of least area enlargement and smallest area are used in the R-tree’s Penalty.

[Figure: the example R-tree (root entries R1 and R2; second-level entries R3 to R7; leaf entries p1 to p13 with pointers to data tuples) with a new point p14 being inserted.]


Deletion in R-trees

• Delete entry E
  1. Using the search procedure, find a leaf cn where entry E is located
  2. Remove E from cn. Call PropogateUp(cn).

• PropogateUp(cn)
  1. If cn is underfull, deallocate the node cn, remove cn’s entry in its parent, call PropogateUp on cn’s parent, and reinsert all cn’s entries or merge them into some other node
  2. Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e, call PropogateUp on cn’s parent.

• No additional heuristics are involved in Delete; underfull nodes are handled using Insert as a subroutine.


Modeling Continuous Movement

• In conventional databases, data is assumed constant unless explicitly modified.

• With continuous movement, this is problematic.
  • Too frequent updates
  • Outdated, inaccurate data


• Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions.
  • We use linear functions to capture the present and future positions: $x(t) = x(t_0) + v\,(t - t_0)$, where $t \ge \mathit{now}$.
  • Updates are necessary only when the parameters of the functions change.
  • For example, given $t_0$, the current and anticipated future position of a two-dimensional point can be described by four parameters: $x(t_0)$, $y(t_0)$, $v_x$, $v_y$.
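A minimal Python sketch of such a time-parameterized position for a two-dimensional point follows. The class and field names are assumptions chosen for this example; the four stored parameters and the reference time are the ones named above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MovingPoint:
    t0: float   # reference time at which the position was reported
    x0: float   # x(t0)
    y0: float   # y(t0)
    vx: float   # velocity along x
    vy: float   # velocity along y

    def position(self, t: float) -> Tuple[float, float]:
        """Linear extrapolation: x(t) = x(t0) + v * (t - t0) in each dimension."""
        dt = t - self.t0
        return (self.x0 + self.vx * dt, self.y0 + self.vy * dt)
```

As long as the object keeps its reported velocity, `position` answers for any later time without an update; only a change in velocity requires writing new parameters.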


Queries

• Type 1: objects that intersect a given rectangle at time $t$
• Type 2: objects that intersect a given rectangle sometime from $t_1$ to $t_2$
• Type 3: objects that intersect a given moving rectangle sometime between $t_1$ and $t_2$

[Figure: trajectories of objects o1 to o4 plotted in the (t, x) plane, with both axes marked from 1 to 6.]

• We can expect that most queries will be concentrated in the sliding window [CT, CT+W], i.e., CT <= t, t1, t2 <= CT + W
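For a single linearly moving point, a Type 2 query can be evaluated by intersecting, dimension by dimension, the time intervals during which the point lies between the rectangle's bounds. The Python sketch below is one way to do this; the function name and argument layout are assumptions. A Type 1 query is the special case t1 = t2, while Type 3 (a moving rectangle) is not covered here.

```python
from typing import Sequence


def intersects_during(x0: Sequence[float], v: Sequence[float], t0: float,
                      lo: Sequence[float], hi: Sequence[float],
                      t1: float, t2: float) -> bool:
    """True if x(t) = x0 + v * (t - t0) lies in [lo, hi] at some t in [t1, t2]."""
    start, end = t1, t2
    for xd, vd, ld, hd in zip(x0, v, lo, hi):
        if vd == 0:
            # Stationary in this dimension: inside for all times or for none.
            if not ld <= xd <= hd:
                return False
            continue
        # Times at which the point crosses the lower and upper bound.
        ta = t0 + (ld - xd) / vd
        tb = t0 + (hd - xd) / vd
        start = max(start, min(ta, tb))
        end = min(end, max(ta, tb))
        if start > end:
            return False
    return True
```

For a point starting at the origin with velocity (1, 1), the rectangle with corners (2, 2) and (3, 3) is occupied exactly during the times 2 to 3, so a query over [1, 2.5] succeeds while one over [0, 1] fails.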


Time-Parameterized Rectangles

• The TPR-tree is based on the R-tree.
• Moving points are bounded with time-parameterized rectangles.
  • The rectangles are bounding from now on.
  • The R-tree allows overlap.
• The tree employs conservative bounding rectangles.

Union(node):

$$x_i^{\min}(t_c) = \min_{o \in \mathit{node}} \{\, o.x_i(t_c) \,\}, \qquad x_i^{\max}(t_c) = \max_{o \in \mathit{node}} \{\, o.x_i(t_c) \,\}$$

$$v_i^{\min} = \min_{o \in \mathit{node}} \{\, o.v_i \,\}, \qquad v_i^{\max} = \max_{o \in \mathit{node}} \{\, o.v_i \,\}$$

$$x_i^{\min}(t) = x_i^{\min}(t_c) + v_i^{\min}\,(t - t_c), \qquad x_i^{\max}(t) = x_i^{\max}(t_c) + v_i^{\max}\,(t - t_c)$$

• At any $t > t_c$ we can get a valid R-tree: TPBR-tree(t) = R-tree
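The Union formulas can be sketched in Python as below. The tuple layout of a moving point (position at t_c, velocity) and the function names are assumptions for this illustration.

```python
from typing import List, Tuple

# Assumed layout: (position at t_c, velocity), one value per dimension each.
MovingPoint = Tuple[Tuple[float, ...], Tuple[float, ...]]


def tp_union(points: List[MovingPoint]):
    """Conservative time-parameterized bounding rectangle computed at time t_c."""
    dims = range(len(points[0][0]))
    x_min = tuple(min(p[0][i] for p in points) for i in dims)
    x_max = tuple(max(p[0][i] for p in points) for i in dims)
    v_min = tuple(min(p[1][i] for p in points) for i in dims)
    v_max = tuple(max(p[1][i] for p in points) for i in dims)
    return x_min, x_max, v_min, v_max


def bounds_at(tpbr, dt: float):
    """Rectangle (lo, hi) at time t_c + dt, for dt >= 0."""
    x_min, x_max, v_min, v_max = tpbr
    lo = tuple(x + v * dt for x, v in zip(x_min, v_min))
    hi = tuple(x + v * dt for x, v in zip(x_max, v_max))
    return lo, hi
```

The rectangle is conservative rather than tight: its lower edge moves with the smallest velocity and its upper edge with the largest, so it always contains the points but generally grows faster than their true MBR.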


Insertion: Grouping Points

• How to group moving points (Penalty and PickSplit)?
  • The R-tree’s algorithms minimize characteristics of MBRs such as area, overlap, and margin.
  • How does that work for moving points?

[Figure: several panels showing seven moving points, labeled 1 to 7.]


Insertion in the TPR-Tree

• The bounding rectangle characteristics (area, overlap, and margin) are functions of time.
• The goal is to minimize these for all time points from now to now+H.
  • Minimizing the characteristics for time now + H/2 does not work (e.g., the area of a conservative bounding rectangle is not linear).
• We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals:

$$\int_{\mathit{now}}^{\mathit{now}+H} A(t)\,dt, \quad \text{where } A(t) \text{ is, e.g., the area of an MBR}$$

• What H to use?
  • H depends on the update rate, and on how far queries may reach into the future (W).
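To make the integral concrete, consider a two-dimensional conservative bounding rectangle whose side lengths grow linearly. The Python sketch below assumes side lengths `w` at time now and growth rates `dv` (the difference between the maximum and minimum velocity in each dimension); these names, and the restriction to two dimensions, are choices made for this example.

```python
from typing import Sequence


def area_at(w: Sequence[float], dv: Sequence[float], tau: float) -> float:
    """Area at time now + tau: each side grows linearly, so the area is quadratic."""
    return (w[0] + dv[0] * tau) * (w[1] + dv[1] * tau)


def area_integral(w: Sequence[float], dv: Sequence[float], H: float) -> float:
    """Closed form of the integral of area_at over tau in [0, H]."""
    return (w[0] * w[1] * H
            + (w[0] * dv[1] + w[1] * dv[0]) * H ** 2 / 2
            + dv[0] * dv[1] * H ** 3 / 3)
```

With `w = (2, 1)`, `dv = (1, 1)` and `H = 3`, the integral is 28.5, whereas the area at the midpoint multiplied by H is only 26.25. Because the area is quadratic in time, the midpoint value is not a faithful stand-in for the whole horizon.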


Update-Efficient TPR-tree

• Handling hyper-dynamic data
  • 500,000 objects; on average, each object updates its positional info three times per hour => ~400 updates per second
• Update – deletion followed by an insertion
• Observations:
  • Usually an object’s positional information does not change too drastically in-between updates
  • Most of the update cost is due to the search phase of a deletion (several paths down the tree may be followed)
    • We assume that the object reports its previous positional information, so that we know what to delete.
  • We need to spend I/Os on making bounding predicates as “tight” as possible, although we may be willing to sacrifice query performance
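The update rate quoted for the hyper-dynamic workload follows from a short calculation, reproduced here in Python (the variable names are illustrative only):

```python
objects = 500_000                    # number of moving objects
updates_per_object_per_hour = 3      # average reporting frequency
seconds_per_hour = 3600

# Total updates arriving per second across all objects.
updates_per_second = objects * updates_per_object_per_hour / seconds_per_hour
```

The result is roughly 417 updates per second, which the slide rounds to ~400.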


In-place Updates

• Lazy Update R-tree (LUR-tree):
  • A hash table (on object ids) is used to access leaf pages directly (without the search phase of deletion).
  • Update is one operation:
    1. Go to the hash table with an object’s id, and get the pointer to the leaf page
    2. Update the object’s information in this page or, if the object’s information changed too “drastically”, insert it from the top of the tree using the normal insertion procedure
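The two update steps might be sketched in Python as follows. This is an illustration under stated assumptions, not the LUR-tree's actual code: the `Leaf` class, the `reinsert` callback (standing in for the normal top-down insertion and returning the leaf that receives the object), and the test for a "drastic" change (the new position falls outside the leaf's current MBR) are all hypothetical. Handling of underfull leaves is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]


@dataclass
class Leaf:
    lo: Point                                   # lower corner of the leaf's MBR
    hi: Point                                   # upper corner of the leaf's MBR
    points: Dict[int, Point] = field(default_factory=dict)


def update(index: Dict[int, Leaf], oid: int, new_pos: Point,
           reinsert: Callable[[int, Point], Leaf]) -> bool:
    """Update oid's position; return True if it could be done in place."""
    leaf = index[oid]                           # step 1: hash lookup, no tree search
    # Assumed criterion: the change is not "drastic" if the MBR still covers it.
    if all(l <= x <= h for l, x, h in zip(leaf.lo, new_pos, leaf.hi)):
        leaf.points[oid] = new_pos              # step 2a: overwrite in the leaf page
        return True
    del leaf.points[oid]                        # step 2b: remove from the old leaf
    index[oid] = reinsert(oid, new_pos)         # normal insertion from the root
    return False
```

In the in-place case no ancestor MBR has to change, since the leaf's rectangle already covers the new position; that is what makes the update cheap.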


Problems to Solve

• Problems (that you have to try to solve, refining and applying these ideas to the TPR-tree):
  • How do we update bounding rectangles in ancestor nodes?
    • Possible solution: hash table storing the full path from the root to the leaf
  • When do we do a real insertion and when an update in place?
  • What do we do when nodes are split/merged? (Can we spend so many I/Os maintaining our hash table?)
    • Possible solution: Lazy updating of the hash table and use of pointers to split-off nodes as in R-link trees.


Time-Parameterized SS-trees

• SS-tree – a Grow-Post tree, where bounding predicates are spheres:
  • Good for Nearest Neighbor queries
  • Compact description of a bounding predicate (independent of dimensionality)

• Project – explore time-parameterized SS-trees. Issues to be addressed:
  • Writing the Consistent method
  • Writing the Penalty method
  • Experimentally comparing with the TPR-tree for range queries and NN queries
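As a possible starting point for the Consistent method, the Python sketch below tests a static bounding sphere against an axis-aligned query rectangle. It covers only the non-moving case; the function name and argument layout are assumptions, and how the center and radius should evolve over time is left open, since that is the subject of the project.

```python
from typing import Sequence


def sphere_consistent(center: Sequence[float], radius: float,
                      q_lo: Sequence[float], q_hi: Sequence[float]) -> bool:
    """True if the sphere intersects the query rectangle [q_lo, q_hi]."""
    # Squared distance from the center to the closest point of the rectangle.
    d2 = 0.0
    for c, lo, hi in zip(center, q_lo, q_hi):
        nearest = min(max(c, lo), hi)   # clamp the center onto the rectangle
        d2 += (c - nearest) ** 2
    return d2 <= radius ** 2
```

Clamping the center coordinate by coordinate yields the rectangle's nearest point, so the test reduces to a single distance comparison.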