Hashing, Sets, Dictionaries
Code Cleaning
Expandable Array Stacks and Amortized Analysis
Hashing so far
To store 250 IP addresses in table:
• Pick prime just bigger than 250 (n = 257)
• Pick a1, …, a4 mod 257 (once and for all)
• To hash x = (x1, …, x4):
– Compute u = a1x1 + … + a4x4 mod 257
– Store x in a bucket at myArray[u]
Generalization
Old: To store 250 IP addresses in table
New: store n1 items, each between 0 and N
Generalization
To store n1 items between 0 and N:
• Pick a prime n just bigger than n1
• Let k = round_up(log_n N)
– Each “item” can be written as a k-digit number, base n
• Pick a1, …, ak mod n (once and for all)
• To hash x = (x1, …, xk):
– Compute u = a1x1 + … + akxk mod n
– Store x in a bucket at myArray[u]
Example
• Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 - 1 = 65535)
• Solution: pick the prime n = 11 (just bigger than 8)
• log_11 65535 = 4.625…, so we pick k = 5
• Pick 5 numbers a1, …, a5, mod 11: 3, 10, 0, 5, 2
Example (cont.)
• Multipliers: 3, 10, 0, 5, 2
• Typical “key”: 31905
• Convert to base 11:
– Mod(31905, 11) = 5; Div(31905, 11) = 2900
– Mod(2900, 11) = 7; Div(2900, 11) = 263 …
– 31905 (base 10) = 21A75 (base 11) [“A” means “10”]
• Hash = 3*2 + 10*1 + 0*A + 5*7 + 2*5 mod 11 = 61 mod 11 = 6.
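The base conversion and hash in this example can be checked mechanically. Below is a minimal Python sketch; the helper names `base_digits` and `digit_hash` are illustrative, not from the slides, and, like the example, it pairs a1 with the most significant digit.

```python
def base_digits(x: int, base: int, k: int) -> list:
    """Return the k base-`base` digits of x, most significant first."""
    digits = []
    for _ in range(k):
        x, d = divmod(x, base)   # Div and Mod from the slide, in one step
        digits.append(d)
    if x != 0:
        raise ValueError("x does not fit in k digits")
    return digits[::-1]          # digits were peeled off least significant first


def digit_hash(x: int, multipliers: list, n: int) -> int:
    """u = a1*x1 + ... + ak*xk mod n, where x1..xk are x's base-n digits."""
    digits = base_digits(x, n, len(multipliers))
    return sum(a * d for a, d in zip(multipliers, digits)) % n


a = [3, 10, 0, 5, 2]                # multipliers from the example
print(base_digits(31905, 11, 5))    # [2, 1, 10, 7, 5], i.e. 21A75 in base 11
print(digit_hash(31905, a, 11))     # 6
```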
In practice
• Usually items aren’t given as integers between 0 and some large number N
• Doing arithmetic (like “finding the digits”) on big numbers (larger than the language can represent natively) is a pain algorithmically
• Frequently we have an “identifier” that’s a few bytes long, often encoded as a string of characters
Practice, cont’d
• Assume objects have k-byte identifiers x
• Compute u = a1x1 + … + akxk mod n
• Put (x, object) into hashbucket u
• This works as long as n > 256 (a byte takes 256 distinct values, so each byte is then a valid digit mod n)
• Otherwise the assumption of uniformly distributed hash indexes is wrong
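For byte identifiers, no base conversion is needed: each byte already is a digit. A minimal Python sketch, assuming identifiers are `bytes` objects of a fixed length k; `random_multipliers` and `make_byte_hash` are hypothetical helper names.

```python
import random


def random_multipliers(k, n, seed=None):
    """Pick a1, ..., ak uniformly mod n, once and for all."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(k)]


def make_byte_hash(multipliers, n):
    """Hash k-byte identifiers x = (x1, ..., xk) to a bucket in range(n)."""
    if n < 257:
        raise ValueError("n should be a prime > 256 (the number of byte values)")
    k = len(multipliers)

    def h(identifier: bytes) -> int:
        if len(identifier) != k:
            raise ValueError(f"identifier must be exactly {k} bytes")
        # Iterating over bytes yields ints 0..255, i.e. the digits x1..xk.
        return sum(a * x for a, x in zip(multipliers, identifier)) % n

    return h


h = make_byte_hash(random_multipliers(4, 257, seed=1), 257)
print(h(bytes([192, 168, 0, 1])))   # some bucket index in range(257)
```

Returning a closure keeps the multipliers fixed for the table's lifetime, which is what "once and for all" requires.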
The SET Abstract Data Type
• create(n): creates a new set structure, initially empty but capable of holding up to n elements.
• empty(S): checks whether the set S is empty.
• size(S): returns the number of elements in S.
• element_of(x, S): checks whether the value x is in the set S.
• enumerate(S): yields the elements of S in some arbitrary order.
• add(S, x): adds the element x to S, if it is not there already.
• delete(S, x): removes the element x from S, if it is there.
Implementing sets
• Can use a hashtable:
– “create”, “empty”, and “size” are trivial
– “enumerate”: take all elements in all buckets
– “add” is just “insert” (after a “find” confirms x is not already there); “delete” is “delete”
– “element_of” is just “find”
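A minimal Python sketch of the SET operations on a chained hash table, assuming elements are non-negative integers no larger than a known `max_key`; the class name `HashSet`, the helper `next_prime`, and the `seed` parameter are hypothetical. The hash follows the base-n digit scheme above, pairing a1 with the least significant digit (any fixed pairing works).

```python
import math
import random


def next_prime(m: int) -> int:
    """Smallest prime >= m, by trial division (fine for table sizes)."""
    m = max(m, 2)
    while any(m % p == 0 for p in range(2, math.isqrt(m) + 1)):
        m += 1
    return m


class HashSet:
    """SET of non-negative integers up to max_key, in a chained hash table."""

    def __init__(self, capacity: int, max_key: int, seed=None):
        self.n = next_prime(capacity + 1)   # prime just bigger than capacity
        self.k = 1
        while self.n ** self.k <= max_key:  # base-n digits needed for max_key
            self.k += 1
        rng = random.Random(seed)
        self.a = [rng.randrange(self.n) for _ in range(self.k)]  # once and for all
        self.buckets = [[] for _ in range(self.n)]
        self.count = 0

    def _hash(self, x: int) -> int:
        u = 0
        for ai in self.a:                   # a1 pairs with the least significant digit
            x, digit = divmod(x, self.n)
            u += ai * digit
        return u % self.n

    def empty(self) -> bool:
        return self.count == 0

    def size(self) -> int:
        return self.count

    def element_of(self, x: int) -> bool:   # "find"
        return x in self.buckets[self._hash(x)]

    def enumerate(self):                    # every element of every bucket
        for bucket in self.buckets:
            yield from bucket

    def add(self, x: int) -> None:          # "insert", unless already present
        bucket = self.buckets[self._hash(x)]
        if x not in bucket:
            bucket.append(x)
            self.count += 1

    def delete(self, x: int) -> None:
        bucket = self.buckets[self._hash(x)]
        if x in bucket:
            bucket.remove(x)
            self.count -= 1


s = HashSet(capacity=8, max_key=65535, seed=42)
for x in (31905, 7, 7, 65535):
    s.add(x)
print(s.size())               # 3 (the second 7 is ignored)
s.delete(7)
print(sorted(s.enumerate()))  # [31905, 65535]
```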
DICTIONARY ADT
• Create, empty, size as in SET
• Still to do:
– Insert(key, value)
– Find(key)
• Sometimes called “store” and “fetch”
• A dictionary is sometimes called a “map”
– “key” is ‘mapped to’ “value”
• Closely related to a “database”
• May allow several values for one key
– Find(key) returns a list of values in this case
Implementing a dictionary
• Create(n)
– Build an array of prime size p a little more than n, each entry an empty list
– Pick k numbers, mod p, to handle keys of length k
• Insert(key, value)
– Let u = (a1·key1 + … + ak·keyk) mod p
– Insert (key, value) into array[u]
• Find(key)
– Let u = (a1·key1 + … + ak·keyk) mod p
– Search for (key, *) in array[u]
– If you find (key, val), return val
– Else return None
• (Modify as appropriate to return a list of vals)
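A minimal Python sketch of this dictionary, assuming keys are `bytes` of exactly k bytes; the class name `HashDict`, the `find_all` variant, the `seed` parameter, and the choice to keep the table size at least 257 (so each byte is a distinct digit) are assumptions of this sketch. Insert appends, so one key may hold several values.

```python
import math
import random


def next_prime(m: int) -> int:
    """Smallest prime >= m, by trial division (fine for table sizes)."""
    m = max(m, 2)
    while any(m % p == 0 for p in range(2, math.isqrt(m) + 1)):
        m += 1
    return m


class HashDict:
    """DICT for keys of exactly k bytes, using a chained hash table."""

    def __init__(self, n: int, k: int, seed=None):
        # Prime table size a little more than n; kept >= 257 so that every
        # byte value is a distinct digit mod the table size (an assumption).
        self.p = next_prime(max(n + 1, 257))
        self.k = k
        rng = random.Random(seed)
        self.a = [rng.randrange(self.p) for _ in range(k)]  # once and for all
        self.table = [[] for _ in range(self.p)]            # each entry an empty list
        self.count = 0

    def _index(self, key: bytes) -> int:
        if len(key) != self.k:
            raise ValueError(f"keys must be exactly {self.k} bytes")
        return sum(ai * b for ai, b in zip(self.a, key)) % self.p

    def size(self) -> int:
        return self.count

    def insert(self, key: bytes, value) -> None:
        self.table[self._index(key)].append((key, value))
        self.count += 1

    def find(self, key: bytes):
        for stored_key, value in self.table[self._index(key)]:
            if stored_key == key:
                return value
        return None

    def find_all(self, key: bytes) -> list:
        """Variant that returns every value stored under key."""
        return [v for stored_key, v in self.table[self._index(key)]
                if stored_key == key]


d = HashDict(n=100, k=3, seed=0)
d.insert(b"cat", 1)
d.insert(b"dog", 2)
d.insert(b"cat", 3)
print(d.find(b"cat"))      # 1
print(d.find_all(b"cat"))  # [1, 3]
print(d.find(b"cow"))      # None
```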
Summary
• We can now assume that we can create a SET or a DICT with O(1) expected insertion and lookup times whenever we need one
• After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance
Example Application: JUMBLE!
JUMBLE
• Input: list of all 5-letter words in English
• Each word represented as an array of five characters
• Output: all words for which no other permutation is a word
Solution
• Start with an empty dictionary
• Foreach word w
– Sort letters alphabetically to get wnew
– D.insert(wnew, w)
• Foreach word w
– Sort alphabetically again to get wnew
– If D(wnew) contains anything except w, skip w
– Else output w
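The two passes above translate almost line for line into Python. In this sketch Python's built-in dictionary (via `collections.defaultdict`) stands in for the DICT, with several values allowed per key; `jumble_words` is a hypothetical name.

```python
from collections import defaultdict


def jumble_words(words):
    """Return the words that have no other permutation in `words`."""
    d = defaultdict(list)                # sorted letters -> list of words
    for w in words:
        d["".join(sorted(w))].append(w)  # D.insert(wnew, w)
    result = []
    for w in words:
        wnew = "".join(sorted(w))
        if any(other != w for other in d[wnew]):
            continue                     # some other permutation is a word
        result.append(w)
    return result


print(jumble_words(["stare", "tears", "lemon", "melon", "pizza"]))  # ['pizza']
```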
Clean Your Code
• Errors per line ~ constant
– So shorter code means fewer errors overall!
• Easier to grade
– More likely to get credit
• Cleaner code = cleaner thinking
– Better understanding of material
LCA(u, v)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if udepth > vdepth then
        u = T.parent(u)
        udepth = udepth - 1
      else if vdepth > udepth then
        v = T.parent(v)
        vdepth = vdepth - 1
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if udepth > vdepth then
        u = T.parent(u)
        udepth = udepth - 1
      else if vdepth > udepth then
        v = T.parent(v)
        vdepth = vdepth - 1
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
Needlessly complex: udepth and vdepth just track what T.depth(u) and T.depth(v) already return
LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
Now irrelevant: udepth and vdepth are computed but never read
LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
Redundant: T.isroot already returns true or false, so “= true” adds nothing
LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
Once lca is set, it’s the answer; return it!
LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
      return lca
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca
Condition is irrelevant: the loop is now left only through a return, so the test (lca = null) never ends it
LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  repeat
    if (u = v) then
      lca = u
      return lca
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
lca is no longer needed: return the values directly!
LCA(u, v, T)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else
      u = T.parent(u)
      v = T.parent(v)
LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  repeat
[OOPS!]
    else
      u = T.parent(u)
      v = T.parent(v)
LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  else return LCA(T.parent(u), T.parent(v), T)
Not needed: after the two while loops u and v have the same depth, so v is the root exactly when u is
LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or (u = v) then
    return u
  else return LCA(T.parent(u), T.parent(v), T)
LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if (u = v) then
    return u
  else return LCA(T.parent(u), T.parent(v), T)
Called during recursion, but no effect: on every recursive call u and v already have the same depth, so both while loops do nothing
LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  return LCAsimple(u, v, T)

LCAsimple(u, v, T)
  # LCA for the case where u and v have the same depth
  if (u = v) then return u
  else return LCAsimple(T.parent(u), T.parent(v), T)
DONE!
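For experimenting, here is the final version transcribed into Python. The `Tree` class with `parent`, `is_root`, and `depth` is a hypothetical stand-in for the slides' T; its `depth` walks up to the root, so a real tree would store depths instead. The same-depth step is written as a loop rather than recursion, which avoids Python's recursion limit on deep trees but otherwise follows LCAsimple.

```python
class Tree:
    """Hypothetical rooted tree stored as parent pointers."""

    def __init__(self, parent_of: dict):
        self.parent_of = parent_of          # node -> parent; the root maps to None

    def parent(self, u):
        return self.parent_of[u]

    def is_root(self, u) -> bool:
        return self.parent_of[u] is None

    def depth(self, u) -> int:
        d = 0
        while not self.is_root(u):          # walk up to the root, counting edges
            u = self.parent(u)
            d += 1
        return d


def lca(u, v, t: Tree):
    # Lift the deeper node until u and v have the same depth.
    while t.depth(u) > t.depth(v):
        u = t.parent(u)
    while t.depth(v) > t.depth(u):
        v = t.parent(v)
    return lca_simple(u, v, t)


def lca_simple(u, v, t: Tree):
    # u and v have the same depth: step both up until they meet.
    while u != v:
        u, v = t.parent(u), t.parent(v)
    return u


# Tree: r is the root; a and b are children of r; c and d of a; e of c.
t = Tree({"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"})
print(lca("e", "d", t))   # a
print(lca("e", "b", t))   # r
print(lca("c", "e", t))   # c  (c is an ancestor of e)
```

The ancestor case is worth testing: after the depth-equalizing loops u and v are already equal, so the answer must be u itself, not its parent.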
STACK
• Stack operations:
– Push, pop, size, isEmpty()
• (Partial) Implementation:
– Array-based stack
ArrayStack
INIT:
  data = array[20]
  count = 0   // next empty space
-------------------------------------------------------------
Push(obj o):
  if count < 20
    data[count] = o
    count++
  else
    ERROR(“Overfull Stack”)
ArrayStack
pop():
  if count == 0
    ERROR(“Can’t pop from empty Stack”)
  else
    count--
    return data[count]
ArrayStack
size():
  return count
isEmpty():
  return count == 0
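A minimal Python sketch of the fixed-size ArrayStack, assuming a Python list of fixed length plays the role of `array[20]`; raising exceptions in place of `ERROR(...)` is this sketch's choice. After `count` is decremented, it indexes the top element, which is what `pop` returns.

```python
class ArrayStack:
    """Fixed-capacity array-based stack, following the pseudocode above."""

    def __init__(self, capacity: int = 20):
        self.data = [None] * capacity   # plays the role of array[20]
        self.count = 0                  # index of the next empty slot

    def push(self, o) -> None:
        if self.count < len(self.data):
            self.data[self.count] = o
            self.count += 1
        else:
            raise OverflowError("Overfull Stack")

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1                 # count now indexes the top element
        item = self.data[self.count]
        self.data[self.count] = None    # drop the stale reference
        return item

    def size(self) -> int:
        return self.count

    def is_empty(self) -> bool:
        return self.count == 0


s = ArrayStack()
for i in range(3):
    s.push(i)
print(s.pop(), s.pop(), s.size())   # 2 1 1
```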
Analysis
• push(o): O(1)
• pop(): O(1)
• size(): O(1)
• isEmpty(): O(1)
Summary
• Fast (every operation is O(1)) but not very useful: the stack can never hold more than 20 items
ExpandableArrayStack
INIT:
  data = array[20]
  count = 0   // next empty space
  capacity = 20
Push
Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[capacity+1]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = capacity + 1
    data = d2
    push(o)
Expandable Array Stack
• All other operations remain the same
Analysis
• In the worst case, the time taken is O(n)
• If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work 21 + 22 + … + (20+k) = (20+1) + (20+2) + … + (20+k) = 20k + (1 + 2 + … + k) = 20k + k(k+1)/2 = O(k^2)
• So the average time per operation is O(k) as well!
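The sum above can be confirmed by simulation. This Python sketch counts work under an assumed cost model (1 unit per element copied, 1 unit per element written), starting from a full stack of capacity 20; `grow_by_one_work` is a hypothetical name.

```python
def grow_by_one_work(k: int, initial_capacity: int = 20) -> int:
    """Total work for k pushes onto a full grow-by-one stack.

    Cost model (an assumption): copying one element costs 1 unit and
    writing the new element costs 1 unit.
    """
    capacity = count = initial_capacity   # the stack starts full
    work = 0
    for _ in range(k):
        if count == capacity:
            work += count     # copy every element into the new array
            capacity += 1
        work += 1             # write the new element
        count += 1
    return work


for k in (10, 100, 1000):
    print(k, grow_by_one_work(k), grow_by_one_work(k) / k)
```

The per-push average in the last column keeps growing with k, matching the O(k) average.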
Better: avoid frequent expansion
• Instead of adding a little space, add a lot!
• Double array size when it gets full
DoublingArrayStack: Push
Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[2*capacity]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = 2*capacity
    data = d2
    push(o)
Doubling Array Stack
• All other operations remain the same
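A minimal Python sketch of the doubling stack, assuming a fixed-length Python list stands in for the array; `DoublingArrayStack` and its `capacity()` method are hypothetical names. Instead of calling itself again after expanding, `push` falls through to the ordinary write, which has the same effect.

```python
class DoublingArrayStack:
    """Array-based stack that doubles its array when full (a sketch)."""

    def __init__(self, capacity: int = 20):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.data = [None] * capacity   # fixed-size "array"
        self.count = 0                  # index of the next empty slot

    def capacity(self) -> int:
        return len(self.data)

    def push(self, o) -> None:
        if self.count == len(self.data):
            # Full: copy every element into an array twice as large.
            d2 = [None] * (2 * len(self.data))
            for j in range(len(self.data)):   # j = 0 .. capacity - 1
                d2[j] = self.data[j]
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1                 # count now indexes the top element
        item = self.data[self.count]
        self.data[self.count] = None
        return item

    def size(self) -> int:
        return self.count

    def is_empty(self) -> bool:
        return self.count == 0


s = DoublingArrayStack()
for i in range(100):
    s.push(i)
print(s.size(), s.capacity())   # 100 160
```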
Analysis
• Push when count < capacity: O(1)
• Push when the array is full (copy every element, then push): O(n)
Analysis
• In the worst case, the time taken is O(n)
• But over the course of many operations, the average time per operation is O(1)
“Total Work Analysis”
• If we have an array with n elements
• …and do n operations
• …then total work is no more than 4n.
• Work per operation, on average, is 4.
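The total-work bound can be sanity-checked by counting. This Python sketch uses the same assumed cost model (1 unit per copy, 1 per write) and, for simplicity, starts from an empty stack of capacity 20 rather than from an array already holding n elements; `doubling_work` is a hypothetical name.

```python
def doubling_work(n: int, initial_capacity: int = 20) -> int:
    """Total work for n pushes onto an initially empty doubling stack.

    Cost model (an assumption): 1 unit per element copied,
    1 unit per element written.
    """
    capacity, count, work = initial_capacity, 0, 0
    for _ in range(n):
        if count == capacity:
            work += count     # copy everything into an array twice as large
            capacity *= 2
        work += 1             # write the new element
        count += 1
    return work


for n in (10, 100, 1000, 10_000):
    print(n, doubling_work(n), doubling_work(n) / n)
```

The copies form a geometric series 20 + 40 + 80 + … that stays below 2n, so the total stays below 3n, within the 4n bound.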
Alternative view
• “Amortized” analysis:
– For each operation that takes one unit of time
• Place two extra units of time “in the bank” (with doubling, the c/2 cheap pushes since the last expansion must pay for copying all c elements)
– By the time an expensive operation arrives
• Use your savings to pay for it
• Alternative view:
– When you do an expensive operation
• Pay one unit now
• Pay an extra unit for each of the next n operations
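A small simulation makes the banking argument concrete. This Python sketch assumes the same cost model (1 unit per element copied) and a doubling stack starting at capacity 20; every push deposits `deposit` extra units, each expansion withdraws one unit per copied element, and the function reports the lowest balance reached. `lowest_bank_balance` is a hypothetical name.

```python
def lowest_bank_balance(n: int, deposit: int, initial_capacity: int = 20) -> int:
    """Simulate the 'bank' over n pushes onto a doubling stack.

    Cost model (an assumption): copying one element costs one unit.
    Every push deposits `deposit` extra units; each expansion withdraws
    one unit per element copied. Returns the lowest balance reached.
    """
    capacity, count = initial_capacity, 0
    balance = lowest = 0
    for _ in range(n):
        if count == capacity:
            balance -= count        # pay for copying every element
            lowest = min(lowest, balance)
            capacity *= 2
        balance += deposit          # save for the next expansion
        count += 1
    return lowest


print(lowest_bank_balance(10_000, deposit=1))   # negative: savings run out
print(lowest_bank_balance(10_000, deposit=2))   # 0: the bank never goes negative
```

With this cost model, one extra unit per push eventually falls short, while two extra units always cover the next copy.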
Language
• For hashing: “the ‘find’ operation runs in expected O(1) time”
• For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”
Pixel boundaries (if time)