28
October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.1 S. Raskhodnikova and A. Smith LECTURE 23 Augmenting Data Structures Dynamic order statistics Methodology Interval trees Algorithms and Data Structures CSE 465

Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.1

S. Raskhodnikova and A. Smith

LECTURE 23Augmenting Data

Structures• Dynamic order statistics• Methodology• Interval trees

Algorithms and Data StructuresCSE 465

Page 2: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.2

Review Question

• Recall last lectures were on 2-3 trees “Special” search trees that guarantee O(log n) depth

of the tree Two kinds of nodes: “2-nodes” and “3-nodes”

• Question: Suppose we had “4-nodes” What fields would a 4-node have? What ranges would the keys of subtrees lie in?

• “B-trees” consist of nodes with between t-1 and2t-1 keys (where t is some parameter dependingon the application). See CLRS, Chapter 18.

Page 3: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.3

Dynamic order statistics

Dynamic set that supports standard Dictionaryoperations:• INSERT, DELETE, FIND,

as well as order statistic queries:• SELECT(i, S): returns the i th smallest element

in the dynamic set S.• RANK(x, S): returns the rank of x ∈ S in the

sorted order of S’s elements.

Abstract Data Type:

Page 4: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.4

First Attempts

• Keep data in binary search tree To Select ith largest element, just start from left-

most node in tree and skip to its successor i-1 times Running time: Θ(n) time in worst case

• We will see a data structure that implements alloperations in O(log n) time

Page 5: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.5

Augmenting a data structure

DESIGN TECHNIQUE: Start from standard datastructure, and augment it with additional info.

For order stats: Use a binary search tree for the setS, but keep subtree sizes in the nodes.

keysizeNotation for nodes:

Page 6: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.6

Example of an OS-tree

M9

C5

A1

F3

N1

Q1

P3

H1

D1

size[x] = size[left[x]] + size[right[x]] + 1

Page 7: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.7

Selection

OS-SELECT(x, i) ⊳ i th smallest element in thesubtree rooted at x

k ← size[left[x]] + 1 ⊳ k = rank(x)if i = k then return xif i < k

then return OS-SELECT( left[x], i )else return OS-SELECT( right[x], i – k )

(OS-RANK is in the textbook.)

Implementation trick: Use a sentinel (dummyrecord) for NIL such that size[NIL] = 0.

Running time: O(height)

Page 8: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.8

Example

M9

C5

A1

F3

N1

Q1

P3

H1

D1

OS-SELECT(root, 5)

i = 5k = 6

M9

C5

i = 5k = 2

i = 3k = 2

F3

i = 1k = 1

H1H1

Running time = O(h)

Page 9: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.9

Data structure maintenance

Q. Why not keep the ranks themselvesin the nodes instead of subtree sizes?

A. They are hard to maintain when thetree is modified.

Modifying operations: INSERT and DELETE.Strategy: Update subtree sizes wheninserting or deleting.

Page 10: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.10

Example of insertion

M9

C5

A1

F3

N1

Q1

P3

H1

D1

INSERT(“K”)M10

C6

F4

H2

K1

Page 11: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.11

Handling 2-3 trees

• A similar idea can be used in balanced searchtrees such as 2-3 trees. Tricky part is maintaining sizes correctly during

merging and splitting Can be done because size[x] can be computed from

information stored at children of x. Example: splitting an internal node (sizes in red)

8 9 3 7

5

8 9 3 7

530

3636

18 11

Page 12: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.12

Data-structure augmentation

Methodology: (e.g., order-statistics trees)1. Choose an underlying data structure

(2-3 trees).2. Determine additional information to be

stored in the data structure (subtree sizes).3. Verify that this information can be

maintained for modifying operations (INSERT,DELETE).

4. Develop new dynamic-set operations that usethe information (OS-SELECT and OS-RANK).

These steps are guidelines, not rigid rules.

Page 13: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.13

Interval trees

Goal: To maintain a dynamic set of intervals,such as time intervals.

low[i] = 7 10 = high[i]

i = [7, 10]

54 15 22

17118 18

1923

Query: For a given query interval i, find aninterval in the set that overlaps i.

Page 14: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.14

Following the methodology

1. Choose an underlying data structure.• Red-black tree keyed on low (left) endpoint.

intm

2. Determine additional information to bestored in the data structure.• Store in each node x the largest value m[x]

in the subtree rooted at x, as well as theinterval int[x] corresponding to the key.

Page 15: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.15

17,1923

Example interval tree

5,1118

4,88

15,1818

7,1010

22,2323

m[x] = maxhigh[int[x]]m[left[x]]m[right[x]]

Page 16: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.16

Modifying operations

3. Verify that this information can be maintainedfor modifying operations.• INSERT: Fix m’s on the way down.• (For 2-3 trees: recalculate m[x] for affected

nodes when a split is performed)

Total INSERT time = O(lg n); DELETE similar.

Page 17: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.17

New operations

4. Develop new dynamic-set operations that usethe information.INTERVAL-SEARCH(i)

x ← rootwhile x ≠ NIL and (low[i] > high[int[x]]

or low[int[x]] > high[i])do ⊳ i and int[x] don’t overlap

if left[x] ≠ NIL and low[i] ≤ m[left[x]]then x ← left[x]else x ← right[x]

return x

Page 18: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.18

Example 1: INTERVAL-SEARCH([14,16])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

x ← root[14,16] and [17,19] don’t overlap 14 ≤ 18 ⇒ x ← left[x]

Page 19: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.19

Example 1: INTERVAL-SEARCH([14,16])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

[14,16] and [5,11] don’t overlap 14 > 8 ⇒ x ← right[x]

Page 20: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.20

Example 1: INTERVAL-SEARCH([14,16])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

[14,16] and [15,18] overlap return [15,18]

Page 21: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.21

Example 2: INTERVAL-SEARCH([12,14])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

x ← root[12,14] and [17,19] don’t overlap 12 ≤ 18 ⇒ x ← left[x]

Page 22: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.22

Example 2: INTERVAL-SEARCH([12,14])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

[12,14] and [5,11] don’t overlap 12 > 8 ⇒ x ← right[x]

Page 23: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.23

Example 2: INTERVAL-SEARCH([12,14])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

[12,14] and [15,18] don’t overlap 12 > 10 ⇒ x ← right[x]

Page 24: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.24

Example 2: INTERVAL-SEARCH([12,14])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

x = NIL ⇒ no interval thatoverlaps [12,14] exists

Page 25: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.25

Analysis

Time = O(h) = O(lg n), since INTERVAL-SEARCHdoes constant work at each level as it follows asimple path down the tree.List all overlapping intervals:• Search, list, delete, repeat.• Insert them all again at the end.

This is an output-sensitive bound.Best algorithm to date: O(k + lg n).

Time = O(k lg n), where k is the total number ofoverlapping intervals.

Page 26: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.26

Correctness

Theorem. Let L be the set of intervals in theleft subtree of node x, and let R be the set ofintervals in x’s right subtree.• If the search goes right, then

{ i ′ ∈ L : i ′ overlaps i } = ∅.• If the search goes left, then

{i ′ ∈ L : i ′ overlaps i } = ∅⇒ {i ′ ∈ R : i ′ overlaps i } = ∅.

In other words, it’s always safe to take only 1of the 2 children: we’ll either find something,or nothing was to be found.

Page 27: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.27

Correctness proof

Proof. Suppose first that the search goes right.• If left[x] = NIL, then we’re done, since L = ∅.• Otherwise, the code dictates that we must have

low[i] > m[left[x]]. The value m[left[x]]corresponds to the high endpoint of someinterval j ∈ L, and no other interval in L canhave a larger high endpoint than high[ j].

Lhigh[ j] = m[left[x]]

ilow(i)

• Therefore, {i ′ ∈ L : i ′ overlaps i } = ∅.

j

Page 28: Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.28

Proof (continued)

Suppose that the search goes left, and assume that{i ′ ∈ L : i ′ overlaps i } = ∅.

• Then, the code dictates that low[i] ≤ m[left[x]] =high[ j] for some j ∈ L.

• Since j ∈ L, it does not overlap i, and hencehigh[i] < low[ j].

• But, the binary-search-tree property implies thatfor all i ′ ∈ R, we have low[ j] ≤ low[i ′].

• But then {i ′ ∈ R : i ′ overlaps i } = ∅.

L

i ji ′