21
Static Dictionaries Collection of items. Each item is a pair. (key, element) Pairs have different keys. Operations are: initialize/create get (search)

Static Dictionaries

  • Upload
    casper

  • View
    67

  • Download
    1

Embed Size (px)

DESCRIPTION

Static Dictionaries. Collection of items. Each item is a pair. (key, element) Pairs have different keys. Operations are: initialize/create get (search). Hashing. Perfect hashing (no collisions). Minimal perfect hashing (space = n ). CHD algorithm. - PowerPoint PPT Presentation

Citation preview

Page 1: Static Dictionaries

Static Dictionaries• Collection of items.

• Each item is a pair. (key, element) Pairs have different keys.

• Operations are: initialize/create get (search)

Page 2: Static Dictionaries

Hashing• Perfect hashing (no collisions).

• Minimal perfect hashing (space = n).

• CHD (compress, hash, and displace) algorithm.

• O(n) time to construct the perfect or minimal perfect hash function.

• O(1) search time.

• Bothelo, Belazzougui & Dietzfelbinger. Compress, hash, and displace.  17th European Symposium on Algorithms, 2009.

Page 3: Static Dictionaries

Search Tree

• Hashing not efficient for extended operations such as range search and nearest match.

• Will examine a binary search tree structure for static dictionaries.

• Each item/key/element has an estimated access frequency (or probability).

Page 4: Static Dictionaries

ExampleKey Probability

a 0.8

b 0.1

c 0.1

a

b

c

a < b < c

Cost = 0.8 * 2 + 0.1 * 1 + 0.1 * 2

= 1.9

a

b

c

Cost = 0.8 * 1 + 0.1 * 2 + 0.1 * 3

= 1.3

Page 5: Static Dictionaries

Search Types

• Successful. Search for a key that is in the dictionary. Terminates at an internal node.

• Unsuccessful. Search for a key that is not in the dictionary. Terminates at an external/failure node.

f0

a

b

c

f1 f2 f3

Page 6: Static Dictionaries

Internal And External Nodes

• A binary tree with n internal nodes has n + 1 external nodes.

• Let s1, s2, …, sn be the internal nodes, in inorder.• key(s1) < key(s2) < … < key(sn).• Let key(s0) = –infinity and key(sn+1) = infinity.• Let f0, f1, …, fn be the external nodes, in inorder.• fi is reached iff key(si) < search key < key(si+1).

f0

a

b

c

f1 f2 f3

Page 7: Static Dictionaries

Cost Of Binary Search Tree

• Let pi = probability for key(si).

• Let qi = probability for key(si) < search key < key(si+1).

• Sum of ps and qs = 1.

• Cost of tree = 0 <= i <= n qi (level(fi) – 1)

+ <= i <= n pi * level(si)

• Cost = weighted path length.

f0

a

b

c

f1 f2 f3

Page 8: Static Dictionaries

Brute Force Algorithm

• Generate all binary search trees with n internal nodes.

• Compute the weighted path length of each.

• Determine tree with minimum weighted path length.

• Number of trees to examine is O(4n/n1.5).

• Brute force approach is impractical for large n.

Page 9: Static Dictionaries

Dynamic Programming• Keys are a1 < a2 < …< an.

• Let Ti j= least cost tree for ai+1, ai+2, …, aj.

• T0n= least cost tree for a1, a2, …, an.

a3

a5

a6

a7a4f2

f3

f5

f4

f7f6

• Ti j includes pi+1, pi+2, …, pj and qi, qi+1, …, qj.

T2,7

Page 10: Static Dictionaries

• Ti j= least cost tree for ai+1, ai+2, …, aj.

• ci j= cost of Ti j

= i <= u <= j qu (level(fu) – 1)

+ i < u <= j pu * level(su).

• ri j= root of Ti j.

• wi j= weight of Ti j

= sum of ps and qs in Ti j

= pi+1+ pi+2+ …+ pj + qi + qi+1 + … + qj

Terminology

a3

a5

a6

a7a4f2

f3

f5

f4

f7f6

T2,7

Page 11: Static Dictionaries

i = j• Ti j includes pi+1, pi+2, …, pj and qi, qi+1, …, qj.

• Ti i includes qi only.

fiTii

• ci i = cost of Ti i = 0.

• ri i = root of Ti i = 0.

• wi i = weight of Ti i

= sum of ps and qs in Ti i

= qi

Page 12: Static Dictionaries

i < j• Ti j= least cost tree for ai+1, ai+2, …, aj.

• Ti j includes pi+1, pi+2, …, pj and qi, qi+1, …, qj.

• Let ak, i < k <= j, be in the root of Ti j.ak

L R

Ti j

• L includes pi+1, pi+2, …, pk-1 and qi, qi+1, …, qk-1.

• R includes pk+1, pk+2, …, pj and qk, qk+1, …, qj.

a3

a5

a6

a7a4f2

f3

f5

f4

f7f6

Page 13: Static Dictionaries

cost(L)

• L includes pi+1, pi+2, …, pk-1 and qi, qi+1, …, qk-1.

• cost(L) = weighted path length of L when viewed as a stand alone binary search tree.

L

a3

a5

a6

a7a4f2

f3

f5

f4

f7f6

Page 14: Static Dictionaries

Contribution To cij

• ci j = i <= u <= j qu (level(fu) – 1)

+ i < u <= j pu * level(su).

• When L is viewed as a subtree of Ti j , the level of each node is 1 more than when L is viewed as a stand alone tree.

• So, contribution of L to cij is cost(L) + wi k-1.

L

ak

L R

Ti j

Page 15: Static Dictionaries

cij

• Contribution of L to cij is cost(L) + wi k-1.

• Contribution of R to cij is cost(R) + wkj.

• cij = cost(L) + wi k-1 + cost(R) + wkj + pk

= cost(L) + cost(R) + wij

ak

L R

Ti j

a3

a5

a6

a7a4f2

f3

f5

f4

f7f6

Page 16: Static Dictionaries

cij

• cij = cost(L) + cost(R) + wij

• cost(L) = cik-1

• cost(R) = ckj

• cij = cik-1 + ckj + wij

• Don’t know k.• cij = mini < k <= j{cik-1 + ckj} + wij

ak

L R

Ti j

a3

a5

a6

a7a4f2

f3

f5

f4

f7f6

Page 17: Static Dictionaries

cij

• cij = mini < k <= j{cik-1 + ckj} + wij

• rij = k that minimizes right side.

ak

L R

Ti j

Page 18: Static Dictionaries

Computation Of c0n And r0n

• Start with ci i = 0, ri i = 0, wi i = qi, 0 <= i <= n (zero-key trees).

• Use cij = mini < k <= j{cik-1 + ckj} + wij to compute cii+1, ri i+1, 0 <= i <= n – 1 (one-key trees).

• Now use the equation to compute cii+2, ri i+2, 0 <= i <= n – 2 (two-key trees).

• Now use the equation to compute cii+3, ri i+3, 0 <= i <= n – 3 (three-key trees).

• Continue until c0n and r0n(n-key tree) have been computed.

Page 19: Static Dictionaries

Computation Of c0n And r0n

cij, rij, i <= j

1 2 3 4

1

2

3

4

00

Page 20: Static Dictionaries

Complexity

• cij = mini < k <= j{cik-1 + ckj} + wij

• O(n) time to compute one cij.

• O(n2) cijs to compute.

• Total time is O(n3).

• May be reduced to O(n2) by using

cij = min ri,j-1 < k <= ri+1,j {cik-1 + ckj} + wij

Page 21: Static Dictionaries

Construct T0n

• Root is r0n.

• Suppose that r0n = 10.

a10 T0n

T09 T10,n

• Construct T09 and T10,n recursively.

• Time is O(n).