
Efficient Computation of Temporal Aggregates with Range Predicates

D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger**

* University of California, Riverside

** Philipps Universität Marburg, Germany

Outline

• Introduction & Motivation

• Problem Decomposition

• The MVSB-tree

• Performance Results

• Conclusions

Introduction & Motivation

• Consider a collection of temporal records.

• Each record: key k, value v, time interval [t1, t2].

• E.g.: employees and their salaries over time.

• Temporal Aggregation: aggregate values over time.

• Focus on SUM/COUNT/AVG.

[Figure: example temporal records drawn as intervals in the key-time plane, each labeled with its value.]

Previous Work

‘Given time t, aggregate over all records that contain t’. [Tum92, KS95, YK97, GHR+, MLI00]

E.g. the sum at t2 is 13.

‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2]’. (SB-tree [YW01])

E.g. the sum over [t1, t2] is 28.

[Figure: the example records in the key-time plane, with the query time t2 and the query interval [t1, t2] marked.]



Range-Temporal Aggregation (RTA)

‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2]’.

E.g. the RTA-sum over [k1, k2]x[t1, t2] is 19.

[Figure: the example records in the key-time plane, with the query rectangle [k1, k2]x[t1, t2] marked.]


• Find the AVG salary over the past ten years of all employees whose last names start with ‘B’.

• Previous approaches would need a separate index for each possible key range (inefficient).

• Alternative:

- index the records;

- selection query: ‘find all records intersecting [k1, k2]x[t1, t2]’;

- query time is O(n).

• Our solution: O(log_b n).
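The selection-based alternative can be sketched as a plain scan. This is only an illustration of why that approach costs O(n) per query: the `Record` type, the inclusive interval and key bounds, and the sample data are assumptions made for the example, not taken from the slides.

```python
from typing import NamedTuple, Optional, Tuple


class Record(NamedTuple):
    key: int
    start: int    # t1 of the record, inclusive (assumed convention)
    end: int      # t2 of the record, inclusive (assumed convention)
    value: float


def rta_scan(records, k1, k2, t1, t2) -> Tuple[float, int, Optional[float]]:
    """Return (SUM, COUNT, AVG) over records with key in [k1, k2]
    whose interval intersects [t1, t2]. Touches every record once."""
    total = 0.0
    count = 0
    for r in records:
        in_key_range = k1 <= r.key <= k2
        intersects = r.start <= t2 and r.end >= t1
        if in_key_range and intersects:
            total += r.value
            count += 1
    avg = total / count if count else None
    return total, count, avg


# Hypothetical sample data: (key, start, end, value).
records = [
    Record(10, 1, 5, 4.0),
    Record(20, 3, 8, 6.0),
    Record(30, 6, 9, 2.0),
    Record(40, 2, 4, 7.0),   # key outside the query range below
]
result = rta_scan(records, k1=10, k2=30, t1=4, t2=7)
print(result)
```

Even with a selection index in front of it, the number of qualifying records can be proportional to n, which is the cost the aggregate index is meant to avoid.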

Problem Decomposition

• Decompose RTA into LKST and LKLT queries.

LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t.

E.g. LKST(k2, t2)=11.

LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t.

E.g. LKLT(k2, t2)=20.

[Figure: the example records in the key-time plane with k1, k2, t1, t2 marked, illustrating LKST(k2, t2) and LKLT(k2, t2).]

[Figure: step-by-step construction of RTA([k1, k2]x[t1, t2]) from LKST and LKLT terms in the key-time plane.]

RTA([k1, k2]x[t1, t2]) = LKST(k2, t2) - LKST(k1, t2)
                       + LKLT(k2, t2) - LKLT(k1, t2)
                       - LKLT(k2, t1) + LKLT(k1, t1)
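The identity can be checked by brute force on random data. The sketch below fixes boundary conventions that the slides leave open, so treat them as assumptions: record intervals are half-open [start, end), the key range is [k1, k2) because LKST and LKLT use "keys less than k", and "ending before t" is read as end <= t. The data is randomly generated and purely illustrative.

```python
import random


def lkst(records, k, t):
    # Keys less than k, interval containing t.
    return sum(v for key, s, e, v in records if key < k and s <= t < e)


def lklt(records, k, t):
    # Keys less than k, interval ending before t (end <= t for [s, e)).
    return sum(v for key, s, e, v in records if key < k and e <= t)


def rta_direct(records, k1, k2, t1, t2):
    # Keys in [k1, k2), interval [s, e) intersecting [t1, t2].
    return sum(v for key, s, e, v in records
               if k1 <= key < k2 and s <= t2 and e > t1)


def rta_decomposed(records, k1, k2, t1, t2):
    return (lkst(records, k2, t2) - lkst(records, k1, t2)
            + lklt(records, k2, t2) - lklt(records, k1, t2)
            - lklt(records, k2, t1) + lklt(records, k1, t1))


rng = random.Random(0)
records = []
for _ in range(200):
    start = rng.randint(0, 50)
    records.append((rng.randint(0, 100),          # key
                    start,                        # interval start
                    start + rng.randint(1, 20),   # interval end (exclusive)
                    rng.randint(1, 9)))           # value

mismatches = 0
for _ in range(500):
    k1, k2 = sorted(rng.sample(range(0, 101), 2))
    t1, t2 = sorted(rng.sample(range(0, 71), 2))
    if rta_direct(records, k1, k2, t1, t2) != rta_decomposed(records, k1, k2, t1, t2):
        mismatches += 1
print("mismatches:", mismatches)
```

The first line of the identity counts records alive at t2; the remaining two lines add records that ended in (t1, t2], which together cover every record intersecting [t1, t2].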

• The RTA query is decomposed into LKST and LKLT queries.


• Both LKST and LKLT are point queries: ‘given k, t, return value’.

• An index for LKST and LKLT should:

- store points in key-time space;

- maintain a value for each point;

- support point queries.

Index Design

Model

• Assume updates come in increasing time order (transaction-time model).

• A record (k, [t1, t2], v) is inserted at t1 as (k, [t1, tmax], v) and updated at t2 to (k, [t1, t2], v).

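The LKST index

The effect of inserting record (k, [t1, t2], v):

- at t1: +v over the region [k, kmax] x [t1, tmax];

- at t2: -v over the region [k, kmax] x [t2, tmax].

[Figure: two panels in the key-time plane showing the regions updated at t1 (+v) and at t2 (-v).]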

The LKLT index

The effect of inserting record (k, [t1, t2], v):

- at t1: no update;

- at t2: +v over the region [k, kmax] x [t2, tmax].

[Figure: the key-time plane showing the region updated at t2 (+v).]

Update Operation

• Common update operation for both: insert (k, t):v.

• That is: add v to all points in [k, kmax] x [t, tmax].

• Conclusion: an index supporting point query and the above update can be used for LKLT and LKST.
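To make the shared interface concrete, here is a brute-force stand-in for such an index. It is not the MVSB-tree: the `DominanceSum` class, the helper names and the sample events are hypothetical. One assumption to note is that the key bound is inclusive here (a point query at key k sees inserts at key k); a strict "less than k" bound would only change the key comparison.

```python
class DominanceSum:
    """Brute-force stand-in for the index: insert (k, t):v adds v to every
    point (k', t') with k' >= k and t' >= t; a point query returns the
    accumulated value at one point."""

    def __init__(self):
        self._inserts = []

    def insert(self, k, t, v):
        self._inserts.append((k, t, v))

    def point_query(self, k, t):
        return sum(v for ik, it, v in self._inserts if ik <= k and it <= t)


lkst_index = DominanceSum()
lklt_index = DominanceSum()


def record_starts(k, t1, v):
    lkst_index.insert(k, t1, v)      # LKST: +v from t1 on; LKLT: no update


def record_ends(k, t2, v):
    lkst_index.insert(k, t2, -v)     # LKST: cancel the +v from t2 on
    lklt_index.insert(k, t2, v)      # LKLT: +v from t2 on


# Hypothetical events, applied in increasing time order.
record_starts(10, 1, 5)
record_starts(20, 2, 3)
record_ends(10, 4, 5)
record_starts(30, 5, 2)
record_ends(20, 6, 3)

alive_at_3 = lkst_index.point_query(25, 3)   # keys 10 and 20 are alive
alive_at_5 = lkst_index.point_query(25, 5)   # key 10 ended at 4
ended_by_5 = lklt_index.point_query(25, 5)   # only key 10 has ended
ended_by_7 = lklt_index.point_query(40, 7)   # keys 10 and 20 have ended
print(alive_at_3, alive_at_5, ended_by_5, ended_by_7)
```

Both indices use exactly the same two operations, which is why a single structure design can serve LKST and LKLT.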

The MVSB-tree

• A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

[Figure: an example MVSB-tree with three roots covering the time intervals root1: [1, 4), root2: [4, 10) and root3: [10, tmax).]

Insertion

[Figure: four panels showing a page in the key-time plane: the initial MVSB-tree; after inserting (20, 2):1; after inserting (10, 3):1 (conceptual view); and the logical splitting used instead.]

Insertion (cont.)

• To handle overflow, copy records with end=tmax to a new page.

• Strong overflow: limit the number of records in a new page.

[Figure: a page overflows after inserting (80, 4):1; the records with end=tmax are copied to a new page, giving root1: [1, 4) and root2: [4, tmax).]

Point Query (k, t)

• Follows a single path: the nodes containing (k, t).

• Aggregates the values found in this path.

• E.g.: PointQuery(23, 7) = 5+2 = 7.

[Figure: the example MVSB-tree with root1: [tmin, 4), root2: [4, 10) and root3: [10, tmax), used for the example query.]
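One way to picture the start of such a path: each root in the example covers a half-open time interval, so the root for a query time t can be located with a binary search over the interval start times. The sketch below covers only that first step, under the assumption that roots are kept in a sorted array; the list names and `find_root` are hypothetical, and descending inside the tree is not shown.

```python
from bisect import bisect_right

# Root intervals from the example: [tmin, 4), [4, 10), [10, tmax).
root_starts = [float("-inf"), 4, 10]     # tmin modeled as -infinity
root_names = ["root1", "root2", "root3"]


def find_root(t):
    """Return the name of the root whose time interval contains t."""
    return root_names[bisect_right(root_starts, t) - 1]


# The example query PointQuery(23, 7) has t = 7, which falls in [4, 10).
print(find_root(7))
```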

Efficiency

• Theorem: with 2 MVSBT indices, we achieve:

RTA query: O(log_b n);

Update: O(log_b K);

Space: O((n/b) * log_b K).

• n = number of updates;

• K = number of different keys;

• b = page capacity (in records).

Performance Results

• Sun Enterprise 250 Server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++;

• Datasets: created using the TimeIT [KS98] software and transformed to add record keys.

• Each dataset has a million records (10k unique keys; on average 100 intervals per key).

• Compare against the straightforward approach using the MVBT [BGO+96] as temporal index.

Index Sizes

[Figure: index sizes (#MB) of 2 MVSBT vs. naïve, varying page size (2KB, 4KB, 8KB).]

Query Speedup

[Figure: query performance of 2 MVSBT vs. naïve, varying query rectangle size (0.1%, 1%, 10%, 50%).]

• Query time is averaged over 100 queries of the same query rectangle size.

Conclusions

• We addressed the range-temporal aggregation (RTA) problem;

• New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs;

• Query time reduced from O(n) to O(log_b n) with small space overhead;

• Open problems: Min/Max range-temporal aggregation; Valid-time environment; Multi-dimensional aggregation over objects with extents.
