Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E

October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E. Demaine and C.E. Leiserson L23.1

S. Raskhodnikova and A. Smith

LECTURE 23Augmenting Data

Structures• Dynamic order statistics• Methodology• Interval trees

Algorithms and Data StructuresCSE 465


Review Question

• Recall last lectures were on 2-3 trees “Special” search trees that guarantee O(log n) depth

of the tree Two kinds of nodes: “2-nodes” and “3-nodes”

• Question: Suppose we had “4-nodes” What fields would a 4-node have? What ranges would the keys of subtrees lie in?

• “B-trees” consist of nodes with between t-1 and2t-1 keys (where t is some parameter dependingon the application). See CLRS, Chapter 18.


Dynamic order statistics

Dynamic set that supports standard Dictionaryoperations:• INSERT, DELETE, FIND,

as well as order statistic queries:• SELECT(i, S): returns the i th smallest element

in the dynamic set S.• RANK(x, S): returns the rank of x ∈ S in the

sorted order of S’s elements.

Abstract Data Type:


First Attempts

• Keep data in binary search tree To Select ith largest element, just start from left-

most node in tree and skip to its successor i-1 times Running time: Θ(n) time in worst case

• We will see a data structure that implements alloperations in O(log n) time


Augmenting a data structure

DESIGN TECHNIQUE: Start from standard datastructure, and augment it with additional info.

For order stats: Use a binary search tree for the setS, but keep subtree sizes in the nodes.

keysizeNotation for nodes:


Example of an OS-tree

M9

C5

A1

F3

N1

Q1

P3

H1

D1

size[x] = size[left[x]] + size[right[x]] + 1


Selection

OS-SELECT(x, i) ⊳ i th smallest element in thesubtree rooted at x

k ← size[left[x]] + 1 ⊳ k = rank(x)if i = k then return xif i < k

then return OS-SELECT( left[x], i )else return OS-SELECT( right[x], i – k )

(OS-RANK is in the textbook.)

Implementation trick: Use a sentinel (dummyrecord) for NIL such that size[NIL] = 0.

Running time: O(height)


Example

M9

C5

A1

F3

N1

Q1

P3

H1

D1

OS-SELECT(root, 5)

i = 5k = 6

M9

C5

i = 5k = 2

i = 3k = 2

F3

i = 1k = 1

H1H1

Running time = O(h)


Data structure maintenance

Q. Why not keep the ranks themselvesin the nodes instead of subtree sizes?

A. They are hard to maintain when thetree is modified.

Modifying operations: INSERT and DELETE.Strategy: Update subtree sizes wheninserting or deleting.


Example of insertion

M9

C5

A1

F3

N1

Q1

P3

H1

D1

INSERT(“K”)M10

C6

F4

H2

K1


Handling 2-3 trees

• A similar idea can be used in balanced searchtrees such as 2-3 trees. Tricky part is maintaining sizes correctly during

merging and splitting Can be done because size[x] can be computed from

information stored at children of x. Example: splitting an internal node (sizes in red)

8 9 3 7

5

8 9 3 7

530

3636

18 11


Data-structure augmentation

Methodology: (e.g., order-statistics trees)1. Choose an underlying data structure

(2-3 trees).2. Determine additional information to be

stored in the data structure (subtree sizes).3. Verify that this information can be

maintained for modifying operations (INSERT,DELETE).

4. Develop new dynamic-set operations that usethe information (OS-SELECT and OS-RANK).

These steps are guidelines, not rigid rules.


Interval trees

Goal: To maintain a dynamic set of intervals,such as time intervals.

low[i] = 7 10 = high[i]

i = [7, 10]

54 15 22

17118 18

1923

Query: For a given query interval i, find aninterval in the set that overlaps i.


Following the methodology

1. Choose an underlying data structure.• Red-black tree keyed on low (left) endpoint.

intm

2. Determine additional information to bestored in the data structure.• Store in each node x the largest value m[x]

in the subtree rooted at x, as well as theinterval int[x] corresponding to the key.


17,1923

Example interval tree

5,1118

4,88

15,1818

7,1010

22,2323

m[x] = maxhigh[int[x]]m[left[x]]m[right[x]]


Modifying operations

3. Verify that this information can be maintainedfor modifying operations.• INSERT: Fix m’s on the way down.• (For 2-3 trees: recalculate m[x] for affected

nodes when a split is performed)

Total INSERT time = O(lg n); DELETE similar.


New operations

4. Develop new dynamic-set operations that usethe information.INTERVAL-SEARCH(i)

x ← rootwhile x ≠ NIL and (low[i] > high[int[x]]

or low[int[x]] > high[i])do ⊳ i and int[x] don’t overlap

if left[x] ≠ NIL and low[i] ≤ m[left[x]]then x ← left[x]else x ← right[x]

return x


Example 1: INTERVAL-SEARCH([14,16])

17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

x ← root[14,16] and [17,19] don’t overlap 14 ≤ 18 ⇒ x ← left[x]



17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

[14,16] and [5,11] don’t overlap 14 > 8 ⇒ x ← right[x]



17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

[14,16] and [15,18] overlap return [15,18]



17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

x ← root[12,14] and [17,19] don’t overlap 12 ≤ 18 ⇒ x ← left[x]



17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x




17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x




17,1923

5,1118

4,88

15,1818

7,1010

22,2323

x

x = NIL ⇒ no interval thatoverlaps [12,14] exists


Analysis

Time = O(h) = O(lg n), since INTERVAL-SEARCHdoes constant work at each level as it follows asimple path down the tree.List all overlapping intervals:• Search, list, delete, repeat.• Insert them all again at the end.

This is an output-sensitive bound.Best algorithm to date: O(k + lg n).

Time = O(k lg n), where k is the total number ofoverlapping intervals.


Correctness

Theorem. Let L be the set of intervals in theleft subtree of node x, and let R be the set ofintervals in x’s right subtree.• If the search goes right, then

{ i ′ ∈ L : i ′ overlaps i } = ∅.• If the search goes left, then

{i ′ ∈ L : i ′ overlaps i } = ∅⇒ {i ′ ∈ R : i ′ overlaps i } = ∅.

In other words, it’s always safe to take only 1of the 2 children: we’ll either find something,or nothing was to be found.


Correctness proof

Proof. Suppose first that the search goes right.• If left[x] = NIL, then we’re done, since L = ∅.• Otherwise, the code dictates that we must have

low[i] > m[left[x]]. The value m[left[x]]corresponds to the high endpoint of someinterval j ∈ L, and no other interval in L canhave a larger high endpoint than high[ j].

Lhigh[ j] = m[left[x]]

ilow(i)

• Therefore, {i ′ ∈ L : i ′ overlaps i } = ∅.

j


Proof (continued)

Suppose that the search goes left, and assume that{i ′ ∈ L : i ′ overlaps i } = ∅.

• Then, the code dictates that low[i] ≤ m[left[x]] =high[ j] for some j ∈ L.

• Since j ∈ L, it does not overlap i, and hencehigh[i] < low[ j].

• But, the binary-search-tree property implies thatfor all i ′ ∈ R, we have low[ j] ≤ low[i ′].

• But then {i ′ ∈ R : i ′ overlaps i } = ∅.

L

i ji ′

Documents

Algorithms and Data Structuresads22/cse465/S07/lecture-notes/CSE465-S...Algorithms and Data Structures CSE 465 October 24, 2005 S. Raskhodnikova and A. Smith. Based on slides by E