Transcript
Page 1: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

Computer)Science)Fundamentals)107))Lecture)20)Contents)

Hashing(–(Why?(

Hash(Func0ons(–(folding,(mid7square(

Hash(Func0ons(–(keys(which(are(strings(

Load(Factor(

Collisions(and(Collision(Resolu0on(–(open(addressing(methods,(separate(chaining(

(

(

Textbook(pages(163(7(173(

(

2(Why)Look)for)Be>er)((If(our(collec0on(of(data(is(in(sorted(order(we(can(search(for(an(item(in(O(log(n)(0me(using(the(binary(search(algorithm.)

Can(we(improve(on(this(and(search(in(constant(0me(O(1)?(

Can(we(insert(and(deleted(items(in(constant(0me(O(1)?(

In(the(above(situa0on((items(are(sorted),(what(is(the(cost(of(inser0ng(and(dele0ng(items,(searching(for(items(which(are(not(in(the(collec0on?(

3(What)is)a)Hash)Table?)((A(collec0on(of(items(which(are(stored(in(such(a(way(making(the(items(easy(to(access.)

Ini0ally(every(slot(is(empty.(

Each(posi0on((each(slot)(in(the(hash(table(can(hold(one(item(and(is(named((indexed)(by(an(integer(value(star0ng(from(0.(

0""""""""

None

1""""""""

None

2""""""""

None

3""""""""

None

4""""""""

None

5""""""""

None

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

None

4(What)is)a)Hash)FuncCon?)((Every(item(which(is(added(to(the(hash(table(has(a(key((keys(must(be(unique)(and(a(corresponding(value.()

The(hash(func0on(is(the(mapping(between(an(item(key(and(the(slot(where(the(item(is(stored.(

To(add(an(item(to(the(hash(table(there(is(a(mapping(between(the(key(of(the(item(and(a(slot(in(the(table.(

0""""""""

None

1""""""""

None

2""""""""

None

3""""""""

None

4""""""""

None

5""""""""

None

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

None

Page 2: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

5(Mapping)the)Key)into)a)Hash)Table)Slot))((

can(be(mapped(as(follows:(

For(example,(using(a(hash(table(of(size(13(and(mapping(integer(keys((the(items(are(just(the(keys(in(this(example)(using(the(key(modulus(the(size(of(the(table,(the(following(keys(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

54 !26!94!17!77!31!

This(mapping(uses(the(remainder(method((i.e.,((key(%(13).(

6(Mapping)the)Key)into)a)Hash)Table)Slot))(( A(hash(func0on(takes(the(key((which(must(be(unique)(

and(returns(the(slot(number(in(the(hash(table.(

hash(item_key))=)…)%)m)

All(hash(func0ons(will(have(%(table_size((m)(as(part(of(the(formula(since(the(resul0ng(slot(number(must(be(within(the(range(of(the(table(size.(

The(result(of(applying(the(hash(func0on(to(the(key(is(an(index(into(the(table.(

7(Load)Factor)of)the)Hash)Table)((

The(load(factor((λ)(of(the(hash(table(is(the(number(of(items(in(the(table(divided(by(the(size(of(the(table.(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

λ)=)6)/)13)

For(the(example(hash(table(below(has(a(load(factor(of(

8(Mapping)the)Key)–)Collisions)((

If(I(now(wish(to(insert(the(key(value,(44,(there(is(a(problem(((hash(44)(gives(the(index(5).(

Using(the(hash(func0on:(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

This(problem(is(called(a('collision'((or('clash')(

hash(item_key))=)item_key)%)13)

6(items(are(mapped(into(the(table(below,((

Page 3: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

9(Perfect)Hash)FuncCons)((

One(way(to(achieve(this(is(to(have(a(hash(table(which(is(big(enough(to(accommodate(the(full(range(of(the(keys,(e.g.,(if(the(keys(were(the(eight(digit(student(ID(numbers(we(would(need(an(108(sized(table.(((

A(hash(func0on(which(uniformly(distributes(items(over(the(whole(hash(table(area(is(a(perfect(hash(func0on.(((

A(perfect(hash(func0on(is(able(to(map(m(dis0nct(keys(into(a(table(of(size,(m,(with(no(collisions.(

10(Perfect)Hash)FuncCons)((

This(technique(can(be(very(wasteful(as(the(number(of(students(to(be(stored((and(retrieved)(may(be(much(smaller(than(the(actual(table(size.(

We(need(some(sort(of(compression(from(the(full(range(of(the(keys(into(the(number(of(hash(table(slots.(

Possible(key(values(

Actual(key((values(

11(A)Good)Hash)FuncCons)((

Quick(and(easy(to(compute.(

Achieves(even(distribu0on(of(key(values((uniformity).(

Ideally(we(want(a(1:1(correspondence(of(the(key(values(to(the(size(of(the(hash(table.(

12(Hash)FuncCons)–)Folding)Method)((

For(example,(if(the(keys(are(eight(digit(phone(numbers,(e.g.,(468(23496((

This(type(of(hash(func0on(takes(different(chunks(of(the(key((in(different(order),(performs(some(computa0on(on(the(chunks,(and,(perhaps,(adds(the(results(together.(

The(number(may(be(split(into(groups(and(summed((really(simple(hash),(e.g.,((and(finally,(using(the(remainder(method((…(%(table_size((13(in(this(example))(the(result(is:( 5(

A(hash(func0on(tries(to(use(all(parts(of(the(key(in(the(calcula0on(7(in(case(some(parts(of(the(keys(are(very(similar((causing(lots(of(collisions).(

468( 234( 96(

Sum(is(798(

Page 4: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

13(Hash)FuncCons)–)Folding)Method)((

Some(hash(func0ons(use(shif7and7add(folding(algorithms,(e.g.,(shif(lef(2(places(and(then(add(the(next(bit(of(the(key.((Divisions(are(less(common(because(division(is(slower(than(mul0plica0on(and(addi0on.(

Hash(func0ons(need(to(use(very(quick(calcula0ons,(so,(ofen,(there(are(folding(methods(which(use(some(lef/right(bit(shifs(because(these(are(very(fast(opera0ons.(

13(

Shifing(lef(example:((shif(lef(2(bits(is(the(same(as(mul0plying(the(number(by(4(

0011012(

52( 1101002(

14(Hash)FuncCons)–)Mid)Square)Method)((

An(example(using(the(mid7square(method:((Square(the(key,(take(the(digits((except(the(lef7most(digit)(and,(finally,(modulus(of(the(table(size((13(in(this(example).(

Square(the(key(and(take(some(por0on(of(the(result.(

For(keys:( 654( 427716( 27716( 0(

653( 426409( 26409( 6(

655( 529025( 29025( 9(

key( key2( Remove(first( %(13(

15(Hash)FuncCons)–)Keys)which)are)Strings)((

The(ASCII(table(on(the(right(shows(the(numerical(representa0on(of(each(character.(

ord('a') is 97!

ord('b') is 98!

ord('c') is 99!

16(Hash)FuncCons)–)keys)which)are)strings)((

For(example:((add(the(ASCII(values(of(each(character(in(the(key(and(take(the(result(modulus(of(the(table(size.(

The(ASCII(values(of(the(characters(of(the(string(can(be(used(to(compute(the(slot(number(into(which(the(key(is(mapped.(

For(key:( "cat"( 99(+(97(+(116( 312( 0(key( Add(ord()s( (((((((Sum( %(13(

Page 5: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

17(Hash)FuncCons)–)keys)which)are)strings)((Hash(func0on:(add(the(ASCII(values(of(each(character(in(the(key(and(take(the(result(modulus(of(the(table(size.(

def hash1(key_word, table_size):!

!…???!

!

def main():!

print("table size is 13") !!

for key_wd in ["cat","dog","abracadabra","abraabracad"]:!

!print(key_wd, hash1(key_wd, 13))!

table size is 13!cat 0!dog 2!abracadabra 3!abraabracad 3!

Using(the(above(hashing(algorithm,(which(kind(of(keys(

will(cause(collisions?(

18(Hash)FuncCons)–)Keys)which)are)Strings)((Improve(the(previous(algorithm(by(adding(a(weigh0ng(to(each(character((1(for(the(first,(2(for(the(second,(…).(

table size is 13!cat 4!dog 7!abracadabra 9!abraabracad 1!

def hash2(a_string, table_size):!

!…???!

!

!

print("table size is 13") !!

for key_word in ["cat","dog","abracadabra","abraabracad"]:!

!print(key_word, hash2(key_word, 13))!

19(Hashing)–)Collision)ResoluCon)((Perfect(hash(func0ons(are(hard(to(come(by,(especially(if(you(do(not(know(the(input(keys(beforehand.(

You(will(always(need(a(systema0c(way(of(handling(collisions.(

One(method(is(to(systema0cally(find(an(empty(slot(in(the(table,(and(put(the(value(in(this(slot.((This(technique(is(called('open(addressing'.(((For(example,(look(sequen0ally(un0l(you(find(a(slot(which(is(empty.(

"open(addressing"(refers(to(the(fact(that(the(loca0on(("address")(of(the(item(is(not(determined(by(its(hash(value.(

20(Collision)ResoluCon)–)Linear)Probing)((

hash(key,(0)(=(key(%(m(((#may(be(a(different(hash(func0on(hash(key,(1)(=((hash(key,(0)(+(1)(%(m(hash(key,(2)(=((hash(key,(0)(+(2)(%(m(hash(key,(3)(=((hash(key,(0)(+(3)(%(m(…(hash(key,(i)(=((hash(key,(0)(+(i)(%(m(

One(method(of(collision(resolu0on(is(to(look(sequen0ally(un0l(you(find(a(slot(which(is(empty.(

The(number(of(probes(is(the(number(of(amempts(made(un0l(an(empty(slot(posi0on(is(found.(

The(probe(sequence(is(the(sequence(of(slots(which(are(checked(un0l(an(available(slot(is(found.(

Page 6: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

21(Collision)ResoluCon)–)Linear)Probing)((Look(sequen0ally(un0l(you(find(a(slot(which(is(empty,(e.g.,(using(the(following(table(and(the(hash(func0on(below:((

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

44(–(goes(into(slot(6((51(–(goes(into(slot(1((

0""""""""

26

1""""""""

51

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

44

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

Inser0ng(keys(44,(51(into(the(above(hash(table:(((

hash(key)(=(key(%(13(

22(Collision)ResoluCon)–)Clustering)((Clustering(happens(when(regions(of(the(table(become(very(full(and(there(are(long(runs(of(filled(slots.((Clustering(slows(down(performance.(

This(becomes(worse(and(worse(because(as(more(elements(are(hashed(to(the('clustered'(areas(these(increase(the(clustering(even(more.(

0""""""""

26

1""""""""

51

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

44

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

clustering(

Another('open(addressing'(approach:((instead(of(looking(for(an(empty(slot(sequen0ally,(we(can(skip(slots,(i.e.,(instead(of(looking(in(the(next(slot,(the(slot(afer(that,(etc.,(we(can(skip(slots,(e.g.,(skip(3.( hash(k,(i)(=((h(k,(0)(+(3)*)i)(%(m(

23(Collision)ResoluCon)–)QuadraCc)Probing)((Another(method(of(resolving(collisions(using('open(addressing'(is(to(add(the(square(of(1(to(the(slot(number((first(hash(result),(then(add(the(square(of(2,(and(so(on.(

hash(key(,(0)(=(key(%(m(((#may(be(different(hash(key(,(1)(=((hash(key,(0)(+(12)(%(m(hash(key(,(2)(=((hash(key(,(0)(+(22)(%(m(hash(key(,(3)(=((hash(key(,(0)(+(32)(%(m(…(hash(key(,(i)(=((hash(key(,(0)(+(i2)(%(m(

The(probe(sequence(is(the(sequence(of(slots(which(are(checked(un0l(an(available(slot(is(found.(

24(Collision)ResoluCon)–)QuadraCc)Probing)((Another(method(of('open(addressing'(is(to(add(the(square(of(1(to(the(slot(number,(then(add(the(square(of(2,(and(so(on.(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

Inser0ng(keys(43,(25(into(the(above(hash(table:(((

43(–(goes(into(slot(8((probe(sequence(is(4,(5,(8)((25(–(goes(into(slot(11((probe(sequence(is(12,(0,(3,((

8,(2,(11)(((((

The(clustering(s0ll(happens(but(in(different(regions(of(the(table.(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

43

9""""""""

None

10""""""""

None

11

25

12""""""""

77

hash(key)(=(key(%(13((

Page 7: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

25(Collision)ResoluCon)–)Double)Hashing)(( We(have(looked(at(sequen0al(linear(probing(where(you(look(sequen0ally(un0l(you(find(a(slot(which(is(empty.(((Then(we(looked(at(different(versions(of(this(where(we(skipped(some(slots(e.g.,("plus73"(and(quadra0c(probing.(((These('open(addressing'(versions(spread(the(clustering(in(a(more(even(manner.(

We(can,(instead,(apply(a(second(hash(func0on(to(the(key(and(use(the(resul0ng(value(as(our(skip(number.((((In(this(way(the(number(of(skips(is(dependent(on(the(key(and(will(be(different(for(different(keys.(

26(Collision)ResoluCon)–)Double)Hashing)((Using(the(two(hash(func0ons(on(the(table(below:(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

None

9""""""""

None

10""""""""

None

11

None

12""""""""

77

43(–(goes(into(slot(10((h2(43)(=(4(and(the(probe((sequence(is(4,(10)((

25(–(goes(into(slot(8((h2(25)(=(12(and(the(probe((sequence(is(12,(2,(5,(8)((

!

hash1(key)(=(key(%(13(hash2(key)(=(7(–(key(%(7(

0""""""""

26

1""""""""

None

2""""""""

54

3""""""""

94

4""""""""

17

5""""""""

31

6""""""""

None

7""""""""

None

8""""""""

25

9""""""""

None

10""""""""

43

11

None

12""""""""

77

Inser0ng(keys(43,(25(into(the(above(hash(table:(((

27(Collision)ResoluCon)–)Separate)Chaining)((Another(way(of(handling(collisions(is(to(use(chaining(where(every(element(of(the(hash(table(is(a(list(and(any(items(which(are(hashed(to(a(slot(are(added(to(the(list.(

If(the(hash(func0on(is(good(and(if(the(table(has(a(load(factor(which(is(reasonable,(the(lists(in(each(node(of(the(hash(table(will(be(quite(small.(Therefore(the(Big(O(for(inser0ng,(dele0ng(or(searching(for(an(item(will(be(reasonable((close(to(O(1)).(

Each(element(of(the(hash(table(could(be(a(linked(list(or(a(Python(list(object.(

Remember:(the(costs(of(maintaining(the(links(are(a(considera0on(especially(if(the(actual(associated(data(is(small.(

28(Collision)ResoluCon)–)Separate)Chaining)((For(example,(given(the(following(hash(table(which(uses(separate(chaining:(

inser0ng(the(keys:(((43,(69,(93,(56,(90((

0"""""""" 1"""""""" 2"""""""" 3"""""""" 4"""""""" 5"""""""" 6"""""""" 7"""""""" 8"""""""" 9"""""""" 10"""""""" 11 12""""""""

5426 93 17 7731

0"""""""" 1"""""""" 2"""""""" 3"""""""" 4"""""""" 5"""""""" 6"""""""" 7"""""""" 8"""""""" 9"""""""" 10"""""""" 11 12""""""""

5426 93 17 773143

69

93

56

90

hash(key)(=(key(%(13((

Page 8: Computer)Science)Fundamentals)107)) Why)Look)for)Be>er 2 ... · Computer)Science)Fundamentals)107)) Lecture)20)Contents) Hashing(–(Why?(Hash(Func0ons(–(folding,(mid7square(Hash(Func0ons(–(keys(which(are(strings

29(Python)Hash)FuncCon)Example)((((

print("Python hash values for the strings '15' to '19'")!for num in range(15, 20):!! print(num, "-", hash(str(num)))!

!print()!print("Python hash values for the integers 15 to 19") !!for num in range(15, 20):!! print(num, "-", hash(num)) ! ! !!Python(hash(values(for(the(strings('15'(to('19'(

15(7(3463324007810316761(16(7(74414834310162138642(17(7(7164126101736489360(18(7(6351382913715616161(19(7(73838919086090416598((Python(hash(values(for(the(integers(15(to(19(15(7(15(16(7(16(17(7(17(18(7(18(19(7(19)

30(Another)Hash)FuncCon)((Unix(opera0ng(system(system7level(hash(table(

def elf_hash(to_hash):!!hash_value = 0!!hex_number = 0xf0000000!!for pos in range(len(to_hash)):!! !hash_value = (hash_value << 4) + ord(to_hash[pos])!! !hi_bits = hash_value & hex_number!! !if hi_bits != 0:!! ! ! hash_value = hash_value ^ (hi_bits >> 26)!! !hash_value = hash_value & ~hi_bits!! !!!return hash_value!! !!

print("cat", elf_hash("cat"))!print("elephant", elf_hash("elephant"))!print("piglet", elf_hash("piglet"))!print("human", elf_hash("human"))!print("mig", elf_hash("mig")) ! ! ! !!

cat 27012!elephant 46589668!piglet 124773060!human 7324542!mig 29687!

~(complement(&(bit(and(

^(exclusive(or(<<(lef(shif,(>>(right(shif(


Recommended