Hashing, Sets, Dictionaries; Code Cleaning; Expandable Array Stacks and Amortized Analysis


Hashing so far

To store 250 IP addresses in table:

• Pick prime just bigger than 250 (n = 257)

• Pick a1, …, a4 mod 257 (once and for all)

• To hash x = (x1, …, x4):

– Compute u = a1x1 + … + a4x4 mod 257

– Store x in a bucket at myArray[u]
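The recipe above can be sketched in a few lines of Python. This is only an illustration: the multipliers come from a seeded random generator (the seed is arbitrary), and the names `hash_ip`, `store` and `table` are made up for this sketch.

```python
import random

P = 257  # prime just bigger than 250, as on the slide

# Multipliers a1..a4, chosen at random once and then kept fixed.
_rng = random.Random(0)          # arbitrary seed, assumed for repeatability
A = [_rng.randrange(P) for _ in range(4)]

table = [[] for _ in range(P)]   # myArray: one bucket (list) per index


def hash_ip(ip):
    """Hash a dotted-quad IPv4 address to a bucket index in 0..P-1."""
    parts = [int(piece) for piece in ip.split(".")]   # x1..x4
    return sum(a * x for a, x in zip(A, parts)) % P


def store(ip):
    """Put the address into the bucket selected by its hash."""
    table[hash_ip(ip)].append(ip)


store("128.148.32.12")
```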

Generalization

Old: To store 250 IP addresses in table

New: store n1 items, each between 0 and N

Generalization

To store n1 items, each between 0 and N:

• Pick a prime n just bigger than n1

• Let k = round_up(log_n N)

– Each “item” can be written as a k-digit number, base n

• Pick a1, …, ak mod n (once and for all)

• To hash x = (x1, …, xk):

– Compute u = a1x1 + … + akxk mod n

– Store x in a bucket at myArray[u]

Example

• Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 – 1 = 65535)

• Solution: pick p = 11.

• log_11 65535 = 4.625…, so we pick k = 5

• Pick 5 numbers a1, …, a5, mod 11: 3, 10, 0, 5, 2

Example (cont.)

• Multipliers: 3, 10, 0, 5, 2

• Typical “key”: 31905

• Convert to base 11:

– Mod(31905, 11) = 5

– Div(31905, 11) = 2900

– Mod(2900, 11) = 7

– Div(2900, 11) = 263 …

– 31905 = 21A75 in base 11 [“A” means “10”]

• Hash = 3*2 + 10*1 + 0*10 + 5*7 + 2*5 mod 11 = 61 mod 11 = 6.
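The digit extraction and the weighted sum can be checked mechanically. The sketch below repeats the Mod/Div steps for the key 31905 with the multipliers 3, 10, 0, 5, 2; the function names `to_base` and `hash_key` are made up for illustration.

```python
def to_base(x, base, k):
    """Return the k base-`base` digits of x, most significant digit first."""
    digits = []
    for _ in range(k):
        digits.append(x % base)   # Mod(x, base) gives the next digit
        x //= base                # Div(x, base) drops that digit
    return digits[::-1]


def hash_key(x, multipliers, p):
    """Multiply each base-p digit by its multiplier and reduce mod p."""
    digits = to_base(x, p, len(multipliers))
    return sum(a * d for a, d in zip(multipliers, digits)) % p


digits = to_base(31905, 11, 5)                 # [2, 1, 10, 7, 5], i.e. 21A75
bucket = hash_key(31905, [3, 10, 0, 5, 2], 11)  # (6 + 10 + 0 + 35 + 10) mod 11 = 6
print(digits, bucket)
```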

In practice

• Usually items aren’t given as integers between 0 and some large number N

• Doing arithmetic (like “finding the digits”) for big numbers (larger than language can represent) is a pain algorithmically

• Frequently have an “identifier” that’s a few bytes long, often encoded as a string of characters

Practice, cont’d

• Assume objects have k-byte identifiers x

• Compute u = a1x1 + … + akxk mod n

• Put (x, object) into hashbucket u

• This works as long as n > 256 (the number of values one byte can take)

• Otherwise the assumption of uniformly distributed hash indexes is wrong
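For byte-string identifiers the same formula applies with each byte as one "digit". A minimal sketch, assuming fixed-length identifiers and a caller-supplied prime n; `make_byte_hash` and its `seed` parameter are hypothetical names, not part of the slides.

```python
import random


def make_byte_hash(k, n, seed=0):
    """Build a hash function for k-byte identifiers with indexes in 0..n-1.

    n is assumed to be a prime chosen by the caller; it must exceed 256
    so that every possible byte value is a valid digit mod n.
    """
    if n <= 256:
        raise ValueError("n must be greater than 256")
    rng = random.Random(seed)                   # arbitrary seed for repeatability
    a = [rng.randrange(n) for _ in range(k)]    # multipliers a1..ak, fixed once

    def h(identifier):
        if len(identifier) != k:
            raise ValueError("identifier must be exactly k bytes long")
        # Iterating over a bytes object yields integers in 0..255.
        return sum(ai * xi for ai, xi in zip(a, identifier)) % n

    return h


h = make_byte_hash(k=4, n=257)
u = h(b"ab12")   # bucket index for the 4-byte identifier "ab12"
```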

The SET Abstract Data Type

• create(n): creates a new set structure, initially empty but capable of holding up to n elements.

• empty(S): checks whether the set S is empty.

• size(S): returns the number of elements in S.

• element_of(x, S): checks whether the value x is in the set S.

• enumerate(S): yields the elements of S in some arbitrary order.

• add(S, x): adds the element x to S, if it is not there already.

• delete(S, x): removes the element x from S, if it is there.

Implementing sets

• Can use hashtable:

– “create”, “empty”, and “size” are trivial

– “enumerate”: take all elements in all buckets

– “add” is just “insert”; “delete” is “delete”

– “element_of” is just “find”
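A sketch of the SET operations on top of a bucketed hash table. To keep it short, Python's built-in `hash` stands in for the multiplier-based hash function from the earlier slides; that substitution, the default bucket count, and the class name `HashSet` are assumptions of this sketch.

```python
class HashSet:
    """SET ADT backed by an array of buckets (lists)."""

    def __init__(self, n_buckets=257, hash_fn=hash):
        self._buckets = [[] for _ in range(n_buckets)]
        self._hash = hash_fn      # stand-in for a1*x1 + ... + ak*xk mod n
        self._size = 0

    def _bucket(self, x):
        return self._buckets[self._hash(x) % len(self._buckets)]

    def empty(self):
        return self._size == 0

    def size(self):
        return self._size

    def element_of(self, x):
        return x in self._bucket(x)          # "find" within one bucket

    def enumerate(self):
        for bucket in self._buckets:         # all elements in all buckets
            yield from bucket

    def add(self, x):
        bucket = self._bucket(x)
        if x not in bucket:                  # only insert if not already there
            bucket.append(x)
            self._size += 1

    def delete(self, x):
        bucket = self._bucket(x)
        if x in bucket:
            bucket.remove(x)
            self._size -= 1


s = HashSet()
s.add("least")
s.add("least")        # second add has no effect
print(s.size())       # 1
```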

DICTIONARY ADT

• Create, empty, size as in SET

• Still to do:

– Insert(key, value)

– Find(key)

• Sometimes called “store” and “fetch”

• A dictionary is sometimes called a “map”

– “key” is ‘mapped to’ “value”

• Closely related to a “database”

• May allow several values for one key

– Find(key) returns a list of values in this case

Implementing a dictionary

• Create(n)

– Build an array of prime size p, a little more than n, each entry an empty list

– Pick k numbers a1, …, ak, mod p, to handle keys of length k

• Insert(key, value)

– Let u = (a1key1 + … + ak keyk) mod p

– Insert (key, value) into array[u]

• Find(key)

– Let u = (a1key1 + … + ak keyk) mod p

– Search for (key, *) in array[u]

– If you find (key, val), return val

– Else return None

• (Modify as appropriate to return list of vals)
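The three operations might look as follows in Python. This is a sketch under assumptions: keys are byte strings of length k, the table size is the next prime above max(n, 256) so that every byte is a valid digit, and `next_prime` and `HashDict` are illustrative names only.

```python
import random


def next_prime(n):
    """Smallest prime strictly greater than n (trial division, small n only)."""
    c = max(n + 1, 2)
    while any(c % d == 0 for d in range(2, int(c ** 0.5) + 1)):
        c += 1
    return c


class HashDict:
    def __init__(self, n, k, seed=0):
        # Create(n): prime-sized array of empty lists, plus k fixed multipliers.
        self.p = next_prime(max(n, 256))
        rng = random.Random(seed)            # arbitrary seed for repeatability
        self.a = [rng.randrange(self.p) for _ in range(k)]
        self.array = [[] for _ in range(self.p)]

    def _index(self, key):
        # u = (a1*key1 + ... + ak*keyk) mod p, one term per byte of the key
        return sum(ai * ki for ai, ki in zip(self.a, key)) % self.p

    def insert(self, key, value):
        self.array[self._index(key)].append((key, value))

    def find(self, key):
        for stored_key, value in self.array[self._index(key)]:
            if stored_key == key:
                return value
        return None


d = HashDict(n=250, k=5)
d.insert(b"hello", 1)
print(d.find(b"hello"), d.find(b"world"))   # 1 None
```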

Summary

• We can now assume that we can create a SET or a DICT with expected O(1) insertion and lookup times whenever we need one

• After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance

Example Application: JUMBLE!

JUMBLE

• Input: list of all 5-letter words in English

• Each word represented as an array of five characters

• Output: all words for which no other permutation is a word

Solution

• Start with an empty dictionary D (allowing several values per key)

• Foreach word w

– Sort letters alphabetically to get wnew

– D.insert(wnew, w)

• Foreach word w

– Sort alphabetically again to get wnew

– If D.find(wnew) contains anything except w, skip w

– Else output w
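The two passes translate almost directly into Python. In this sketch a `defaultdict(list)` plays the role of the multi-valued dictionary D, the input words are assumed to be distinct, and the tiny word list is made up for the demonstration.

```python
from collections import defaultdict


def jumble(words):
    """Return the words for which no other permutation is in the list."""
    d = defaultdict(list)
    for w in words:                       # first pass: D.insert(wnew, w)
        wnew = "".join(sorted(w))
        d[wnew].append(w)
    result = []
    for w in words:                       # second pass: look up wnew again
        wnew = "".join(sorted(w))
        if d[wnew] == [w]:                # nothing stored except w itself
            result.append(w)
    return result


print(jumble(["least", "slate", "stale", "which", "quick"]))
# ['which', 'quick']
```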

Clean Your Code

• Errors per line ~ constant

– Fewer errors overall!

• Easier to grade

– More likely to get credit

• Cleaner code = cleaner thinking

– Better understanding of material

LCA(u, v)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if udepth > vdepth then
                u = T.parent(u)
                udepth = udepth - 1
            else if vdepth > udepth then
                v = T.parent(v)
                vdepth = vdepth - 1
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca


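LCA(u, v, T)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if udepth > vdepth then
                u = T.parent(u)
                udepth = udepth - 1
            else if vdepth > udepth then
                v = T.parent(v)
                vdepth = vdepth - 1
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca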

Needlessly complex

LCA(u, v, T)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca

Now irrelevant

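LCA(u, v, T)
    lca = null
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca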


Redundant

LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca


it’s the answer; return it!

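LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca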

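LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    while (lca = null) do
        if (u = v) then
            lca = u
            return lca
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca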

Condition is irrelevant

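LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    repeat
        if (u = v) then
            lca = u
            return lca
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)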

lca is no longer used!

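LCA(u, v, T)
    if T.isroot(u) or T.isroot(v) then
        return T.root
    repeat
        if (u = v) then
            return u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)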


LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) then
        return T.root
    repeat
        if (u = v) then
            return u
        else
            u = T.parent(u)
            v = T.parent(v)

LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) or (u = v) then
        return u
    repeat
        [OOPS!]
    else
        u = T.parent(u)
        v = T.parent(v)

LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) or (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)

Not needed

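LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)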


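LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)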


Called during recursion, but no effect

LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    return LCAsimple(u, v, T)

LCAsimple(u, v, T)
    # LCA for case where u and v have same depth
    if (u = v) return u
    else return LCAsimple(T.parent(u), T.parent(v), T)

DONE!
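For reference, the cleaned-up algorithm can be run in Python. The `Tree` class below is a hypothetical stand-in for T (a parent map with `parent` and `depth` methods), and the sample tree is made up; the sketch first equalizes depths and then passes u and v themselves to the same-depth helper.

```python
class Tree:
    """Minimal rooted tree: maps each node to its parent (root maps to None)."""

    def __init__(self, parent_of):
        self._parent = parent_of

    def parent(self, u):
        return self._parent[u]

    def depth(self, u):
        d = 0
        while self._parent[u] is not None:
            u = self._parent[u]
            d += 1
        return d


def lca_simple(u, v, T):
    """LCA for the case where u and v have the same depth."""
    if u == v:
        return u
    return lca_simple(T.parent(u), T.parent(v), T)


def lca(u, v, T):
    while T.depth(u) > T.depth(v):     # lift the deeper node ...
        u = T.parent(u)
    while T.depth(v) > T.depth(u):     # ... until both depths match
        v = T.parent(v)
    return lca_simple(u, v, T)


#        a
#       / \
#      b   c
#     / \   \
#    d   e   f
T = Tree({"a": None, "b": "a", "c": "a", "d": "b", "e": "b", "f": "c"})
print(lca("d", "e", T), lca("d", "f", T), lca("d", "b", T))   # b a b
```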

STACK

• Stack operations:

– Push, pop, size, isEmpty()

• (Partial) Implementation:

– Array-based stack

ArrayStack

INIT:
    data = array[20]
    count = 0    // next empty space

Push(obj o):
    if count < 20
        data[count] = o
        count++
    else
        ERROR(“Overfull Stack”)

ArrayStack

pop():
    if count == 0
        ERROR(“Can’t pop from empty Stack”)
    else
        count--
        return data[count]    // count now indexes the old top element

ArrayStack

size():
    return count

isEmpty():
    return count == 0
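Put together, the fixed-capacity stack might look like this in Python. It is a sketch only: a preallocated list plays the role of the array, and the exception types are choices made here, not part of the slides.

```python
class ArrayStack:
    CAPACITY = 20

    def __init__(self):
        self.data = [None] * self.CAPACITY   # fixed-size backing array
        self.count = 0                       # index of the next empty space

    def push(self, o):
        if self.count >= self.CAPACITY:
            raise OverflowError("Overfull Stack")
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]         # the element that was on top

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0


s = ArrayStack()
s.push("a")
s.push("b")
print(s.pop(), s.size())   # b 1
```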

Analysis

ArrayStack

• Push: O(1)

• pop: O(1)

• size: O(1)

• isEmpty: O(1)

Summary

• Fast, but not very useful: the stack can never hold more than 20 items

ExpandableArrayStack

INIT:
    data = array[20]
    count = 0    // next empty space
    capacity = 20

Push

Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[capacity+1]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = capacity + 1
        data = d2
        push(o)

Expandable Array Stack

• All other operations remain the same

Analysis

• In the worst case, the time taken by a push is O(n)

• If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work 21+22+…+(20+k) = (20+1) + (20+2) + … + (20+k) = 20k + (1+2+…+k) = 20k + k(k+1)/2 = O(k^2)

• So average time per operation is O(k) as well!
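The sum can be confirmed by counting work directly. The sketch assumes the cost model used in the calculation above: one unit per element copied during an expansion plus one unit to write the new item, starting from a full array of 20. The function name `grow_by_one_work` is made up.

```python
def grow_by_one_work(k, initial_capacity=20):
    """Total work for k pushes onto a full stack that grows by one slot."""
    capacity = initial_capacity
    count = initial_capacity        # the array starts out full
    work = 0
    for _ in range(k):
        if count == capacity:
            work += count           # copy every existing element to d2
            capacity += 1
        work += 1                   # write the new element
        count += 1
    return work


k = 100
print(grow_by_one_work(k), 20 * k + k * (k + 1) // 2)   # 7050 7050
```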

Better: avoid frequent expansion

• Instead of adding a little space, add a lot!

• Double array size when it gets full

DoublingArrayStack: Push


Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[2*capacity]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = 2*capacity
        data = d2
        push(o)

Doubling Array Stack

• All other operations remain the same
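A runnable sketch of the doubling version, again using a preallocated Python list as the array. The class name and the exception type are assumptions; only push differs from the fixed-size stack.

```python
class DoublingArrayStack:
    def __init__(self, capacity=20):
        self.data = [None] * capacity
        self.count = 0                # index of the next empty space
        self.capacity = capacity

    def push(self, o):
        if self.count == self.capacity:
            d2 = [None] * (2 * self.capacity)   # twice the room
            for j in range(self.capacity):      # copy the old contents over
                d2[j] = self.data[j]
            self.capacity *= 2
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0


s = DoublingArrayStack()
for i in range(100):
    s.push(i)
print(s.size(), s.capacity, s.pop())   # 100 160 99
```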

Analysis


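Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[2*capacity]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = 2*capacity
        data = d2
        push(o)

• Array not full: O(1)

• Array full (copy into the larger array): O(n)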

Analysis

• In the worst case, the time taken is O(n)

• But over the course of many operations, average time per operation is O(1)

“Total Work Analysis”

• If we have an array with n elements

• …and do n operations

• …then total work is no more than 4n.

• Work per operation, on average, is at most 4, i.e. O(1).
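A quick count supports the bound. The sketch assumes one unit of work per element copied and one unit per element written, and starts from an empty stack of capacity 20; under that model the total for m pushes stays below 4m. The name `doubling_work` is made up.

```python
def doubling_work(pushes, initial_capacity=20):
    """Total work for `pushes` pushes onto an empty doubling array stack."""
    capacity, count, work = initial_capacity, 0, 0
    for _ in range(pushes):
        if count == capacity:
            work += count        # copy all elements into the doubled array
            capacity *= 2
        work += 1                # write the new element
        count += 1
    return work


for m in (20, 21, 1000, 10**6):
    w = doubling_work(m)
    print(m, w, w / m)           # the ratio w / m never exceeds 4
```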

Alternative view

• “Amortized” analysis:

– For each operation that takes one unit of time

    • Place an extra unit of time “in the bank”

– By the time an expensive operation arrives

    • Use your savings to pay for it

• Alternative view:

– When you do an expensive operation

    • Pay one unit now

    • Pay an extra unit for each of the next n operations
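A small simulation can make the banking picture concrete. It is only a sketch under an assumed cost model: each push deposits `extra` units, and each expansion withdraws one unit per element copied. With that way of counting, a deposit of two units per push keeps the balance from going negative while one unit does not; the exact constant depends on how the work is counted. The name `min_bank_balance` is made up.

```python
def min_bank_balance(pushes, extra, initial_capacity=20):
    """Lowest bank balance seen while pushing onto a doubling array stack."""
    capacity, count = initial_capacity, 0
    balance = lowest = 0
    for _ in range(pushes):
        if count == capacity:
            balance -= count             # pay for copying from savings
            capacity *= 2
            lowest = min(lowest, balance)
        balance += extra                 # deposit made by this cheap push
        count += 1
    return lowest


print(min_bank_balance(10_000, extra=2) >= 0)   # True: savings always suffice
print(min_bank_balance(10_000, extra=1) >= 0)   # False: one unit is too little here
```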

Language

• For hashing: “the ‘find’ operation runs in expected O(1) time”

• For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”

Pixel boundaries (if time)
