308-203A Introduction to Computing II Lecture 11: Hashtables Fall Session 2000

308-203A Introduction to Computing II

Lecture 11: Hashtables

Fall Session 2000

Dictionary

An abstract class which defines data structures that support:

• void put(Object key, Object value)

• Object get(Object key)

• void remove(Object key)

Implementations for Dictionary?

If we use an unsorted linked list:

put     O( 1 )
get     O( n )
remove  O( n )

Naïve solution: must search all possibilities
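As a rough illustration of these costs, here is a minimal sketch of the dictionary operations on an unsorted linked list. The names SimpleDictionary, ListDictionary and Entry are made up for this example, and it assumes that put simply prepends without checking for an existing key (which is what makes it O(1)).

```java
import java.util.LinkedList;

// Hypothetical abstract class mirroring the three operations on the Dictionary slide.
abstract class SimpleDictionary {
    abstract void put(Object key, Object value);
    abstract Object get(Object key);
    abstract void remove(Object key);
}

// Unsorted linked-list implementation (illustrative names, not from the lecture).
class ListDictionary extends SimpleDictionary {
    private static final class Entry {
        final Object key;
        final Object value;
        Entry(Object key, Object value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry> entries = new LinkedList<>();

    // O(1): prepend without searching; a newer entry shadows an older one with the same key.
    @Override
    void put(Object key, Object value) {
        entries.addFirst(new Entry(key, value));
    }

    // O(n): may have to walk the whole list.
    @Override
    Object get(Object key) {
        for (Entry e : entries) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // O(n): removes every entry stored under the key.
    @Override
    void remove(Object key) {
        entries.removeIf(e -> e.key.equals(key));
    }
}
```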

Implementations for Dictionary?

If we use a binary search tree (assume depth = d):

put     O( d )
get     O( d )
remove  O( d )

Good when the tree is balanced (d = O( log n )), but an unbalanced tree can have d = O( n )…

Implementations for Dictionary?

If we use a heap:

put     O( log n )
get     O( n )
remove  O( n )

Insert is easy, but finding arbitrary elements is hard…

Implementations for Dictionary?

If we use a sorted array:

put     O( n )
get     O( log n )
remove  O( n )

Binary search is easy, but lots of copying is needed
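The trade-off can be seen in a small sketch, assuming plain int keys and a freshly allocated array on every insert. The class and method names are illustrative; Arrays.binarySearch and System.arraycopy are standard library calls.

```java
import java.util.Arrays;

// Sorted-array sketch: lookup is a binary search, insertion copies elements.
class SortedArrayKeys {
    // O(log n): returns the index of key, or a negative value if it is absent.
    static int find(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key);
    }

    // O(n): every element after the insertion point has to be copied.
    static int[] insert(int[] sorted, int key) {
        int pos = Arrays.binarySearch(sorted, key);
        if (pos >= 0) return sorted;          // already present
        int at = -(pos + 1);                  // insertion point encoded by binarySearch
        int[] out = new int[sorted.length + 1];
        System.arraycopy(sorted, 0, out, 0, at);
        out[at] = key;
        System.arraycopy(sorted, at, out, at + 1, sorted.length - at);
        return out;
    }
}
```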

Implementations for Dictionary?

If we use an array with enough space for every possible key (not realistic):

put     O( 1 )
get     O( 1 )
remove  O( 1 )

All operations are quick and easy, but this requires enormous (i.e. infinite) memory

Hashtables

We can try to patch this “perfect solution” so that it is feasible.

The “Perfect” Solution

If we had an array that was infinitely large and each key had its own slot, every access would be O( 1 ) [and we would waste a lot of space on null pointers]

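[Figure: an unbounded array with slots 1, 2, 3, 4, …, j-1, j, j+1, j+2, …; the entry with Key = 3 sits in slot 3 and the entry with Key = j sits in slot j]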

Hash Function

Definition: A hash function is a function which maps keys to a finite range of integers, called hashcodes:

f: keys → [ 0, (m-1) ]

Example

Let the keys be non-negative integers: { 0, 1, … }

Let the hash function be f(x) = x mod 7

For the keys (4, 15, 26):

f(4) = 4
f(15) = 1
f(26) = 5

Fits in an array of size 7:

slot:  0   1   2   3   4   5   6
key:   -   15  -   -   4   26  -
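The example can be checked with a few lines of Java. This is only a sketch: the class name ModHash and the constant M are illustrative, and Math.floorMod is used so that the result stays in [0, M-1] even if a negative key were passed in.

```java
// Sketch of the example hash function f(x) = x mod 7.
class ModHash {
    static final int M = 7; // number of slots in the example

    // Maps an integer key to a hashcode in [0, M-1].
    static int hash(int key) {
        return Math.floorMod(key, M);
    }

    public static void main(String[] args) {
        for (int key : new int[] {4, 15, 26}) {
            System.out.println("f(" + key + ") = " + hash(key));
        }
    }
}
```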

Collisions

Problem: When two or more keys hash to the same slot, a collision occurs.

Open-Addressing

• A simple way to handle collisions

• When a collision occurs look for an empty slot elsewhere

• Some elements may end up in the slot corresponding to a different hashcode

Linear Probing

Find an alternative slot after a collision by stepping sequentially through the slots. For example, starting from:

slot:  0   1   2   3   4   5   6
key:   -   15  -   -   4   26  -

Insert 18 : f(18) = 18 mod 7 = 4

• Slot 4: collision (it already holds 4)

• Slot 5: also taken (it holds 26)

• Slot 6: free, so 18 is stored there

slot:  0   1   2   3   4   5   6
key:   -   15  -   -   4   26  18
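The probing sequence above can be sketched as follows. This is a minimal illustration, not a full implementation: the class name LinearProbingTable is made up, keys are plain ints hashed with key mod m, and deletion is left out because it needs extra care (tombstones) in open addressing.

```java
// Minimal open-addressing table of int keys using linear probing.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int m) {
        slots = new Integer[m];
    }

    // Stores key and returns the slot used, or -1 if the table is full.
    int put(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);
        for (int step = 0; step < m; step++) {
            int i = (home + step) % m; // step sequentially, wrapping around
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    boolean contains(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);
        for (int step = 0; step < m; step++) {
            Integer s = slots[(home + step) % m];
            if (s == null) return false;
            if (s == key) return true;
        }
        return false;
    }
}
```

With m = 7, inserting 4, 15, 26 and then 18 places 18 in slot 6, as in the walk-through.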

Disadvantages

• In open-addressing, the table can fill up; we must have (n < m)

• Linear-probing leads to “primary clustering”: a run of filled slots is more likely to receive further collisions, so it tends to keep growing

• Although best-case access is O( 1 ), worst-case access is O( m )

Chaining

A (Better) Solution to Collisions:

Use the flexibility of the linked-list, but only when needed, i.e. within a single slot where collisions may occur.

Example (chaining)

slot:   0   1   2   3   4   5   6
chain:  -   15  -   -   4   26  -

Insert 39 into the previous hashtable:

f(39) = 39 mod 7 = 4: collision, so 39 joins the chain in slot 4

slot:   0   1   2   3   4      5   6
chain:  -   15  -   -   4, 39  26  -
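A minimal sketch of chaining, under the assumption that each slot holds a java.util.LinkedList of int keys and that new keys are appended to the end of the chain. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hash table of int keys where each slot holds a linked list (chain).
class ChainedTable {
    private final List<LinkedList<Integer>> slots = new ArrayList<>();

    ChainedTable(int m) {
        for (int i = 0; i < m; i++) slots.add(new LinkedList<>());
    }

    private LinkedList<Integer> chainFor(int key) {
        return slots.get(Math.floorMod(key, slots.size()));
    }

    // Adds key to its chain (if not already there) and returns the slot index.
    int put(int key) {
        LinkedList<Integer> chain = chainFor(key);
        if (!chain.contains(key)) chain.add(key);
        return Math.floorMod(key, slots.size());
    }

    boolean contains(int key) {
        return chainFor(key).contains(key);
    }

    // Removes by value; Integer.valueOf avoids the remove(int index) overload.
    boolean remove(int key) {
        return chainFor(key).remove(Integer.valueOf(key));
    }

    int chainLength(int slot) {
        return slots.get(slot).size();
    }
}
```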

Worst-Case

If all elements hash to the same entry, we get a single linked list.

Therefore put, get and remove are O( n ) worst-case.

Best-Case

[Figure: slots 0 to 6, with the elements distributed equally among them]

Equal distribution to each slot

Best Case

Definition: The load factor α for a hashtable with n elements hashed into m slots is the average number of elements per slot:

α = n / m

If every slot contains α = n / m elements (uniformly distributed hashing):

put, get and remove are O( 1 + α )

If the number of slots m is allowed to grow in proportion to n, i.e. m = Θ( n ), then the load factor α satisfies:

α = n / m = n / Θ( n ) = O( 1 )

put, get and remove are O( 1 )
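To make the load factor concrete, the sketch below hashes the keys 0 … n-1 with f(x) = x mod m, which is an assumed, perfectly uniform input, and reports α = n / m along with the chain length in each slot. The class and method names are illustrative.

```java
// Load factor and chain lengths for a perfectly uniform set of keys.
class LoadFactorDemo {
    // alpha = n / m, the average number of elements per slot.
    static double loadFactor(int n, int m) {
        return (double) n / m;
    }

    // Chain lengths when keys 0..n-1 are hashed with f(x) = x mod m.
    static int[] chainLengths(int n, int m) {
        int[] lengths = new int[m];
        for (int key = 0; key < n; key++) {
            lengths[key % m]++;
        }
        return lengths;
    }
}
```

For n = 21 and m = 7 every chain has length 3, matching α = 3. Doubling m along with n keeps α fixed, which is the sense in which the operations stay O( 1 ).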

Average-Case

More realistic analysis involves determining the statistics of the data and how well it will be hashed.

Example: hashing Olympic years by f(x) = x mod 4 would be a bad idea (they always hash to the same slot)
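A quick sketch of this effect, using five Summer Olympic years as sample keys (the choice of sample is an assumption for illustration). It counts how many keys land in each slot for a given table size m; the names are illustrative.

```java
// Counts how many keys fall into each slot under f(x) = x mod m.
class OlympicHash {
    static final int[] YEARS = {1984, 1988, 1992, 1996, 2000}; // sample keys

    static int[] slotCounts(int[] keys, int m) {
        int[] counts = new int[m];
        for (int key : keys) {
            counts[Math.floorMod(key, m)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(slotCounts(YEARS, 4)));
        System.out.println(java.util.Arrays.toString(slotCounts(YEARS, 7)));
    }
}
```

With m = 4 all five years land in slot 0, whereas with m = 7 they land in five different slots.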

Java Hashtable Class

• Constructor:

Hashtable(int initialCapacity, float loadFactor)

• Default: initialCapacity = 101, loadFactor = 0.75f (current JDK documentation gives a default capacity of 11)

• Collision resolution with chaining
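A short usage sketch of java.util.Hashtable with the two-argument constructor. It uses generics, which postdate this lecture, and the class name, keys and values are made up for illustration.

```java
import java.util.Hashtable;

// Basic put / get / remove on java.util.Hashtable.
class HashtableUsage {
    static Hashtable<String, Integer> build() {
        // Explicit initial capacity and load factor, as in the constructor above.
        Hashtable<String, Integer> table = new Hashtable<>(101, 0.75f);
        table.put("four", 4);
        table.put("fifteen", 15);
        table.put("twenty-six", 26);
        table.remove("four");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = build();
        System.out.println(table.get("fifteen")); // value stored under "fifteen"
        System.out.println(table.get("four"));    // null, because it was removed
    }
}
```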

Java Hashtable Class

Keys can be objects of any class provided the following are appropriately defined:

• hashCode(): defined in java.lang.Object

• equals(): assumed defined for the entries
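As an illustration, here is a hypothetical key class GridPoint that overrides both methods consistently, so that two equal points produce the same hashcode. Without these overrides a Hashtable would fall back on object identity and the lookup below would fail.

```java
import java.util.Hashtable;
import java.util.Objects;

// Hypothetical key class: equal coordinates mean equal keys.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Must agree with equals: equal points give equal hashcodes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class GridPointDemo {
    static String lookup() {
        Hashtable<GridPoint, String> table = new Hashtable<>();
        table.put(new GridPoint(1, 2), "treasure");
        // A different but equal object finds the same entry.
        return table.get(new GridPoint(1, 2));
    }
}
```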

Java Hashtable Class

Hashtables grow multiplicatively:

• put() checks if the hashtable contains more than (loadFactor × m) elements, and if so grows the table: m → 2m + 1

• Hashtables only grow, never shrink, no matter how many elements you delete
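The growth rule can be simulated without touching the real class. The sketch below assumes exactly the rule stated above (grow when the element count exceeds loadFactor × m, then m → 2m + 1); the actual java.util.Hashtable may differ in details such as how the threshold is rounded. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the sequence of capacities visited while inserting n elements.
class GrowthDemo {
    static List<Integer> capacities(int initialCapacity, float loadFactor, int n) {
        List<Integer> seen = new ArrayList<>();
        int m = initialCapacity;
        seen.add(m);
        for (int count = 1; count <= n; count++) {
            // Grow while the table holds more than loadFactor * m elements.
            while (count > loadFactor * m) {
                m = 2 * m + 1;
                seen.add(m);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacities(101, 0.75f, 1000));
    }
}
```

Starting from 101 slots with load factor 0.75, inserting 1000 elements takes the capacity through 101, 203, 407, 815 and 1631. Because the capacity roughly doubles each time, only a logarithmic number of rehashes is needed.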

Java Hashtable Class

Other Features:

elements() returns an enumeration of every value in the table.

This works by keeping references into the table rather than by copying the table itself.
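A small sketch of walking such an enumeration, assuming a table with Integer values. The class and method names are made up; elements(), hasMoreElements() and nextElement() are the standard Hashtable and Enumeration methods.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Sums all values in a table by walking the Enumeration from elements().
class ElementsDemo {
    static int sumValues(Hashtable<String, Integer> table) {
        int sum = 0;
        Enumeration<Integer> values = table.elements();
        while (values.hasMoreElements()) {
            sum += values.nextElement();
        }
        return sum;
    }
}
```

Since the enumeration refers to the live table rather than a copy, it is safest not to modify the table while iterating.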

Any questions?