Hashing, Sets, Dictionaries Code Cleaning Expandable Array Stacks and Amortized Analysis




Hashing so far

To store 250 IP addresses in table:

• Pick prime just bigger than 250 (n = 257)

• Pick a1, …, a4 mod 257 (once and for all)

• To hash x = (x1, …, x4):

– Compute u = a1x1 + … + a4x4 mod 257

– Store x in a bucket at myArray[u]
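The steps above can be sketched in Python. This is a minimal illustration, not the lecture's code: the names make_multipliers and hash_ip, the fixed seed, and the sample address are assumptions made for the example.

```python
import random

P = 257  # prime just bigger than 250, as on the slide


def make_multipliers(k, p, seed=None):
    """Pick k multipliers mod p, once and for all."""
    rng = random.Random(seed)
    return [rng.randrange(p) for _ in range(k)]


def hash_ip(ip, multipliers, p=P):
    """Hash a dotted IPv4 string x = (x1, ..., x4) to a1*x1 + ... + a4*x4 mod p."""
    octets = [int(part) for part in ip.split(".")]
    return sum(a * x for a, x in zip(multipliers, octets)) % p


multipliers = make_multipliers(4, P, seed=16)
table = [[] for _ in range(P)]          # one bucket (a list) per array slot
ip = "128.148.32.12"                    # illustrative address only
table[hash_ip(ip, multipliers)].append(ip)
```

The multipliers are chosen once, before any item is stored, so the same address always lands in the same bucket.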


Generalization

Old: To store 250 IP addresses in table

New: store n1 items, each between 0 and N


Generalization

To store n1 items between 0 and N:

• Pick prime n just bigger than n1

• Let k = round_up(log_n N)

– Each “item” can be written as a k-digit number, base n

• Pick a1, …, ak mod n (once and for all)

• To hash x = (x1, …, xk):

– Compute u = a1x1 + … + akxk mod n

– Store x in a bucket at myArray[u]


Example

• Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 – 1 = 65535)

• Solution: pick the prime n = 11 (just bigger than 8).

• log_11 65535 = 4.625…, so we pick k = 5

• Pick 5 numbers a1, …, a5, mod 11: 3, 10, 0, 5, 2


Example (cont.)

• Multipliers: 3, 10, 0, 5, 2

• Typical “key”: 31905

• Convert to base 11:

– Mod(31905, 11) = 5

– Div(31905, 11) = 2900

– Mod(2900, 11) = 7

– Div(2900, 11) = 263 …

– 31905 = 21A75 in base 11 [“A” means “10”]

• Hash = 3*2 + 10*1 + 0*10 + 5*7 + 2*5 mod 11 = 61 mod 11 = 6.
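The digit extraction and the hash can be checked mechanically. The sketch below is one way to do it in Python; the function names to_digits and hash_key are assumptions for illustration. It uses repeated div/mod to get the base-11 digits, then takes the dot product with the multipliers.

```python
def to_digits(x, base, k):
    """Return the k base-`base` digits of x, most significant first."""
    digits = []
    for _ in range(k):
        x, r = divmod(x, base)   # r is the next least-significant digit
        digits.append(r)
    return digits[::-1]


def hash_key(x, multipliers, n):
    """Compute a1*x1 + ... + ak*xk mod n, where x1..xk are the base-n digits of x."""
    digits = to_digits(x, n, len(multipliers))
    return sum(a * d for a, d in zip(multipliers, digits)) % n


print(to_digits(31905, 11, 5))                 # [2, 1, 10, 7, 5], i.e. 21A75
print(hash_key(31905, [3, 10, 0, 5, 2], 11))   # 61 mod 11 = 6
```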


In practice

• Usually items aren’t given as integers between 0 and some large number N

• Doing arithmetic (like “finding the digits”) for big numbers (larger than language can represent) is a pain algorithmically

• Frequently have an “identifier” that’s a few bytes long, often encoded as a string of characters


Practice, cont’d

• Assume objects have k-byte identifiers x

• Compute u = a1x1 + … + akxk mod n

• Put (x, object) into hashbucket u

• This works as long as n > 256 (the number of possible byte values)

• Otherwise the assumption of uniformly distributed hash indexes is wrong


The SET Abstract Data Type

• create(n): creates a new set structure, initially empty but capable of holding up to n elements.

• empty(S): checks whether the set S is empty.

• size(S): returns the number of elements in S.

• element_of(x, S): checks whether the value x is in the set S.

• enumerate(S): yields the elements of S in some arbitrary order.

• add(S, x): adds the element x to S, if it is not there already.

• delete(S, x): removes the element x from S, if it is there.


Implementing sets

• Can use hashtable:

– “create”, “empty”, and “size” are trivial

– “enumerate”: take all elements in all buckets

– “add” is just “insert”; “delete” is “delete”

– “element_of” is just “find”
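A minimal Python sketch of a hashtable-backed set follows. It is an illustration only: the class name HashSet is hypothetical, and Python's built-in hash() stands in for the multiplier-based hash function from the earlier slides.

```python
class HashSet:
    """SET ADT backed by an array of buckets (lists)."""

    def __init__(self, n):
        # create(n): n buckets, all empty
        self._buckets = [[] for _ in range(max(1, n))]
        self._size = 0

    def _bucket(self, x):
        # Assumption: built-in hash() replaces the lecture's hash function
        return self._buckets[hash(x) % len(self._buckets)]

    def empty(self):
        return self._size == 0

    def size(self):
        return self._size

    def element_of(self, x):
        return x in self._bucket(x)          # "find" within one bucket

    def enumerate(self):
        for bucket in self._buckets:         # all elements in all buckets
            yield from bucket

    def add(self, x):
        bucket = self._bucket(x)
        if x not in bucket:                  # add only if not there already
            bucket.append(x)
            self._size += 1

    def delete(self, x):
        bucket = self._bucket(x)
        if x in bucket:                      # remove only if it is there
            bucket.remove(x)
            self._size -= 1
```

Keeping an explicit element count makes size and empty constant-time instead of scanning every bucket.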


DICTIONARY ADT

• Create, empty, size as in SET

• Still to do:

– Insert(key, value)

– Find(key)

• Sometimes called “store” and “fetch”

• A dictionary is sometimes called a “map”

– “key” is ‘mapped to’ “value”

• Closely related to a “database”

• May allow several values for one key

– Find(key) returns a list of values in this case


Implementing a dictionary

• Create(n)

– Build an array of prime size a little more than n, each entry an empty list

– Pick k numbers, mod n, to handle keys of length k


• Insert(key, value)

– Let u = (a1key1 + … + akkeyk) mod n

– Insert (key, value) into array[u]

• Find(key)

– Let u = (a1key1 + … + akkeyk) mod n

– Search for (key, *) in array[u]

– If you find (key, val), return val

– Else return None

• (Modify as appropriate to return list of vals)
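Create, Insert, and Find can be put together as below. This is a sketch under stated assumptions rather than the course's implementation: keys are Python bytes objects of fixed length k, the names HashDict and next_prime are hypothetical, the prime is found by simple trial division, and the multipliers are taken modulo the prime array size.

```python
import random


def next_prime(m):
    """Smallest prime strictly greater than m (trial division; fine for a sketch)."""
    candidate = max(m + 1, 2)
    while any(candidate % d == 0 for d in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate


class HashDict:
    def __init__(self, n, k, seed=None):
        self.p = next_prime(n)                        # prime array size a little more than n
        rng = random.Random(seed)
        self.a = [rng.randrange(self.p) for _ in range(k)]   # k multipliers mod p
        self.array = [[] for _ in range(self.p)]      # each entry an empty list

    def _index(self, key):
        # u = (a1*key1 + ... + ak*keyk) mod p; iterating bytes yields integers
        return sum(ai * xi for ai, xi in zip(self.a, key)) % self.p

    def insert(self, key, value):
        self.array[self._index(key)].append((key, value))

    def find(self, key):
        for stored_key, value in self.array[self._index(key)]:
            if stored_key == key:
                return value
        return None
```

To support several values per key, find could instead collect every matching value into a list.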


Summary

• We can now assume that we can create a SET or a DICT with expected O(1) insertion and lookup times whenever we need one

• After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance


Example Application: JUMBLE!


JUMBLE

• Input: list of all 5-letter words in English

• Each word represented as an array of five characters

• Output: all words for which no other permutation is a word


Solution

• Start with an empty dictionary

• Foreach word w

– Sort letters alphabetically to get wnew

– D.insert(wnew, w)

• Foreach word w

– Sort alphabetically again to get wnew

– If D(wnew) contains anything except w, skip w

– Else output w
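The two passes can be sketched in Python as follows. Python's built-in dict plays the role of the dictionary D, with several values allowed per key; the function name words_without_anagrams and the sample word list are assumptions for illustration.

```python
from collections import defaultdict


def words_without_anagrams(words):
    """Return the words for which no other permutation of the letters is in the list."""
    d = defaultdict(set)
    for w in words:                          # first pass: D.insert(wnew, w)
        d["".join(sorted(w))].add(w)
    result = []
    for w in words:                          # second pass: check D(wnew)
        if d["".join(sorted(w))] == {w}:     # nothing except w is stored under wnew
            result.append(w)
    return result


print(words_without_anagrams(["alert", "alter", "later", "zebra", "hello"]))
# ['zebra', 'hello']
```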


Clean Your Code

• Errors per line ~ constant

– Fewer errors overall!

• Easier to grade

– More likely to get credit

• Cleaner code = cleaner thinking

– Better understanding of material


LCA(u, v)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if udepth > vdepth then
                u = T.parent(u)
                udepth = udepth - 1
            else if vdepth > udepth then
                v = T.parent(v)
                vdepth = vdepth - 1
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca



LCA(u, v, T)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if udepth > vdepth then
                u = T.parent(u)
                udepth = udepth - 1
            else if vdepth > udepth then
                v = T.parent(v)
                vdepth = vdepth - 1
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca

Needlessly complex


LCA(u, v, T)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca

Now irrelevant



LCA(u, v, T)
    lca = null
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca

Redundant



LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca

it’s the answer; return it!


LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    while (lca = null) do
        if (u = v) then
            lca = u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca


LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    while (lca = null) do
        if (u = v) then
            lca = u
            return lca
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)
    return lca

Condition is irrelevant


LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    repeat
        if (u = v) then
            lca = u
            return lca
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)

lca is no longer used!


LCA(u, v, T)
    if T.isroot(u) or T.isroot(v) then
        return T.root
    repeat
        if (u = v) then
            return u
        else
            if T.depth(u) > T.depth(v) then
                u = T.parent(u)
            else if T.depth(v) > T.depth(u) then
                v = T.parent(v)
            else
                u = T.parent(u)
                v = T.parent(v)



LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) then
        return T.root
    repeat
        if (u = v) then
            return u
        else
            u = T.parent(u)
            v = T.parent(v)


LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) or (u = v) then
        return u
    repeat
        [OOPS!]
        else
            u = T.parent(u)
            v = T.parent(v)


LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) or (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)


Not needed

LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)


LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)

Called during recursion, but no effect


LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    return LCAsimple(u, v, T)

LCAsimple(u, v, T)
    # LCA for case where u and v have same depth
    if (u = v) return u
    else return LCAsimple(T.parent(u), T.parent(v), T)
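For comparison, here is a runnable Python sketch of the final two-function version. The Tree class, its parent-map representation, and the sample tree are assumptions made for the example; the slides only assume that T provides parent and depth. The sketch hands the leveled nodes themselves to the helper, so the case where one node is an ancestor of the other returns that ancestor.

```python
class Tree:
    """Tree stored as a map from each node to its parent (the root maps to None)."""

    def __init__(self, parent):
        self._parent = parent

    def parent(self, node):
        return self._parent[node]

    def depth(self, node):
        d = 0
        while self._parent[node] is not None:   # walk up to the root, counting edges
            node = self._parent[node]
            d += 1
        return d


def lca_simple(u, v, tree):
    """LCA for the case where u and v have the same depth."""
    if u == v:
        return u
    return lca_simple(tree.parent(u), tree.parent(v), tree)


def lca(u, v, tree):
    while tree.depth(u) > tree.depth(v):        # lift the deeper node first
        u = tree.parent(u)
    while tree.depth(v) > tree.depth(u):
        v = tree.parent(v)
    return lca_simple(u, v, tree)


t = Tree({"a": None, "b": "a", "c": "a", "d": "b", "e": "b", "f": "d"})
print(lca("f", "e", t))   # b
print(lca("f", "c", t))   # a
```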


DONE!


STACK

• Stack operations:

– Push, pop, size, isEmpty()

• (Partial) Implementation:

– Array-based stack


ArrayStack

INIT:
    data = array[20]
    count = 0    // next empty space

Push(obj o):
    if count < 20
        data[count] = o
        count++
    else
        ERROR(“Overfull Stack”)


ArrayStack

pop():
    if count == 0
        ERROR(“Can’t pop from empty Stack”)
    else
        count--
        return data[count]    // after the decrement, count indexes the top item


ArrayStack

size():
    return count

isEmpty():
    return count == 0
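Collected into one place, the fixed-size stack might look like this in Python. It is a sketch only: the class layout and the choice of exception types are assumptions, and a preallocated list stands in for the array.

```python
class ArrayStack:
    CAPACITY = 20

    def __init__(self):
        self._data = [None] * self.CAPACITY   # fixed-size backing array
        self._count = 0                       # index of the next empty space

    def push(self, obj):
        if self._count >= self.CAPACITY:
            raise OverflowError("Overfull Stack")
        self._data[self._count] = obj
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("Can't pop from empty Stack")
        self._count -= 1
        return self._data[self._count]        # top item sits just below the old count

    def size(self):
        return self._count

    def is_empty(self):
        return self._count == 0
```

Every operation touches a constant number of array slots, which is why each one runs in constant time.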


Analysis


ArrayStack

INIT:
    data = array[20]
    count = 0    // next empty space

Push(obj o):
    if count < 20
        data[count] = o
        count++
    else
        ERROR(“Overfull Stack”)

O(1)


ArrayStack

pop():
    if count == 0
        ERROR(“Can’t pop from empty Stack”)
    else
        count--
        return data[count]

O(1)


ArrayStack

size():
    return count        // O(1)

isEmpty():
    return count == 0   // O(1)


Summary

• Fast but not very useful


ExpandableArrayStack

INIT:
    data = array[20]
    count = 0    // next empty space
    capacity = 20


Push

Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[capacity+1]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = capacity + 1
        data = d2
        push(o)


Expandable Array Stack

• All other operations remain the same


Analysis

• In the worst case, the time taken is O(n)

• If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work 21 + 22 + … + (20+k) = (20+1) + (20+2) + … + (20+k) = 20k + (1 + 2 + … + k) = 20k + k(k+1)/2 = O(k^2)

• So average time per operation is O(k) as well!
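The sum can be checked by simulation. The Python sketch below is illustrative only: the function name and the cost model (one unit per element copied, one unit per element stored) are assumptions chosen to match the count on this slide.

```python
def grow_by_one_work(k, initial_capacity=20):
    """Total work for k pushes onto a full array that grows by one slot each time."""
    capacity = initial_capacity
    count = initial_capacity          # the array starts out full
    work = 0
    for _ in range(k):
        if count == capacity:
            work += capacity          # copy every existing element into the new array
            capacity += 1
        work += 1                     # store the new item
        count += 1
    return work


for k in (1, 5, 100):
    print(k, grow_by_one_work(k), 20 * k + k * (k + 1) // 2)   # the two totals agree
```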


Better: avoid frequent expansion

• Instead of adding a little space, add a lot!

• Double array size when it gets full


DoublingArrayStack: Push

Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[2*capacity]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = 2*capacity
        data = d2
        push(o)
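A Python version of the doubling stack is sketched below, with the remaining operations filled in. The class and attribute names are assumptions, and a preallocated list stands in for the array so that the explicit copy loop matches the pseudocode (Python lists already resize themselves in practice).

```python
class DoublingArrayStack:
    def __init__(self, capacity=20):
        self._data = [None] * capacity
        self._count = 0                       # index of the next empty space
        self._capacity = capacity

    def push(self, obj):
        if self._count == self._capacity:     # full: double before storing
            bigger = [None] * (2 * self._capacity)
            for j in range(self._capacity):   # copy indices 0 .. capacity - 1
                bigger[j] = self._data[j]
            self._data = bigger
            self._capacity *= 2
        self._data[self._count] = obj
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("Can't pop from empty Stack")
        self._count -= 1
        return self._data[self._count]

    def size(self):
        return self._count

    def is_empty(self):
        return self._count == 0
```

The sketch stores the item directly after expanding rather than calling push again; the effect is the same.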


Doubling Array Stack

• All other operations remain the same


Analysis

Push(obj o):
    if count < capacity              // this branch: O(1)
        data[count] = o
        count++
    else                             // this branch: O(n)
        d2 = new Array[2*capacity]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = 2*capacity
        data = d2
        push(o)


Analysis

• In the worst case, the time taken is O(n)

• But over the course of many operations, average time per operation is O(1)


“Total Work Analysis”

• If we have an array with n elements

• …and do n operations

• …then total work is no more than 4n.

• Work per operation, on average, is 4.
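One way to see a bound of this kind is to count the work directly. The Python sketch below takes one reading of the claim: start from an empty stack, perform a sequence of pushes, and charge one unit per element copied and one per element stored. The function name, the starting capacity of 1, and the cost model are assumptions for illustration.

```python
def doubling_work(pushes, initial_capacity=1):
    """Total work for a sequence of pushes onto an initially empty doubling array."""
    capacity = initial_capacity
    count = 0
    work = 0
    for _ in range(pushes):
        if count == capacity:
            work += capacity          # copy every element into the doubled array
            capacity *= 2
        work += 1                     # store the new item
        count += 1
    return work


for m in (1, 2, 3, 1000, 1024, 1025):
    assert doubling_work(m) <= 4 * m  # average work per push stays below 4
```

The copies form a geometric series 1 + 2 + 4 + …, which sums to less than twice the number of pushes; adding one unit per store keeps the total under 3 units per push in this model, comfortably within the bound of 4.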


Alternative view

• “Amortized” analysis:

– For each operation that takes one unit of time

    • Place an extra unit of time “in the bank”

– By the time an expensive operation arrives

    • Use your savings to pay for it

• Alternative view:

– When you do an expensive operation

    • Pay one unit now

    • Pay an extra unit for each of the next n operations


Language

• For hashing: “the ‘find’ operation runs in expected O(1) time”

• For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”


Pixel boundaries (if time)