
Hashes Dictionaries


8/2/2019 Hashes Dictionaries

http://slidepdf.com/reader/full/hashes-dictionaries 1/18

Heidi C. Ellis and Gerard C. Weatherby

Object Structures Hashes and Dictionaries.1

Hashing

• Hashing: Applies transformation to keys to arrive at address of an element.
* Performance is independent of table size.
• Perfect hash function: Each key is transformed into a unique storage location.
* Element could be stored and retrieved using same transformation.
• Imperfect hash function: Maps more than one key to the same storage location.
• Assume we have 10 elements to store with 4-digit keys between 0000 and 9999.
• Approach 1: Set up array size of 10,000 and use key value as an index into array.
* Wastes LOTS of space.
• Approach 2: Divide key by 13 and use remainder as index to access array of 13 elements with indices 0 to 12.
• Problem: 7423 and 0013 both map to same location.

Key     Transformed Key
1234    12
5021    3
7423    0
2000    11
9043    8
6296    4
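The transformed keys above can be reproduced with a few lines of Java. This is a minimal sketch of Approach 2 only; the class and method names are made up for illustration and do not come from the slides.

```java
public class RemainderHashDemo {
    static final int TABLE_SIZE = 13;

    // Approach 2: the remainder after dividing by 13 is the array index (0 to 12).
    static int hash(int key) {
        return key % TABLE_SIZE;
    }

    public static void main(String[] args) {
        // Key 0013 is written as 13: a leading zero would make a Java int literal octal.
        int[] keys = {1234, 5021, 7423, 2000, 9043, 6296, 13};
        for (int key : keys) {
            System.out.printf("%04d -> %d%n", key, hash(key));
        }
    }
}
```

Keys 7423 and 0013 both land on index 0, which is the collision named in the Problem bullet.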


Hashing

• Hashing: Transformation of key into storage location.
* Key has been converted to “hash.”
* Transformed key has no visible relation to original key.
* Cannot be used to construct original key.
• Collision: Two keys are hashed to same storage location.
• Hashing search method reduces to two problems:
1. Finding a hashing method that reduces collisions.
2. Resolving collisions when they occur.
• Goal of a good hash function: Spread hash keys evenly over entire table.
• Minimize clustering: The tendency for many keys to hash to the same or nearby locations.


• Hash functions should be easy and quick to compute.
• Hash function represented by H(K) where K is the key.
* Use modulo the array length to stay within the array.
• Mid-square method:
* Square K.
* Strip predetermined digits from front and rear.
* e.g., use thousands and ten thousands places.

K       K²          Hashed Key
3205    10272025    72
7148    51093904    93
2345    5499025     99
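A minimal Java sketch of the mid-square method, assuming non-negative keys and the digit choice from the example (thousands and ten thousands places of K²). The names `MidSquareDemo` and `midSquare` are hypothetical.

```java
public class MidSquareDemo {
    // Square the key, then keep the ten-thousands and thousands digits.
    static int midSquare(int key) {
        long squared = (long) key * key;       // long avoids int overflow for larger keys
        return (int) ((squared / 1000) % 100); // drop the last 3 digits, keep the next 2
    }

    public static void main(String[] args) {
        int[] keys = {3205, 7148, 2345};
        for (int key : keys) {
            System.out.println(key + " -> " + midSquare(key));
        }
    }
}
```

For example, 2345 squared is 5499025, and its ten-thousands and thousands digits give 99. A real table would still reduce the result modulo the array length.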


Hashing

• Folding method:
* Partition K into a number of parts.
* Each part has the same number of digits as the required address.
* Add parts together ignoring the last carry.

K       H(K)       Hashed Key
3205    32 + 05    37
7148    71 + 48    19 (119 with the carry dropped)
2345    23 + 45    68
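The folding steps can be sketched in Java as follows, assuming non-negative keys and two-digit addresses as in the example. The sketch peels parts off from the right, which gives the same sum as partitioning from the left for keys with an even number of digits; `FoldingDemo` and `fold` are illustrative names.

```java
public class FoldingDemo {
    // Split the key into two-digit parts, add them, and drop any carry past two digits.
    static int fold(int key) {
        int sum = 0;
        while (key > 0) {
            sum += key % 100; // take the lowest two digits as one part
            key /= 100;       // move on to the next part
        }
        return sum % 100;     // ignore the last carry
    }

    public static void main(String[] args) {
        int[] keys = {3205, 7148, 2345};
        for (int key : keys) {
            System.out.println(key + " -> " + fold(key));
        }
    }
}
```

For 7148 the parts sum to 119, and dropping the carry leaves 19.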


• Remainder method:
* H(K) = K mod M.
* Some choices of M are better than others.
* M should not be even since H(K) will be even when K is even and odd when K is odd.
* M should not be a power of the radix of the computer since (K mod M) would be the least significant digits of K.
- 1024004, 6424004, and 9324004 would collide (e.g., with M = 1000 in decimal, all three hash to 004).
* Prime numbers work fairly well.
* Select a prime that is about twice as large as the largest possible number of table elements.
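A small Java sketch comparing two choices of M on the three colliding keys. The moduli 1000 (a power of the decimal radix) and 997 (a nearby prime) are picked here purely for illustration; the slides do not name specific values. `Math.floorMod` is used instead of `%` so that negative keys would also map into 0..M-1, which is an addition beyond the slide.

```java
public class ModulusChoiceDemo {
    // Remainder method: H(K) = K mod M, with the result always in 0..M-1.
    static int hash(int key, int m) {
        return Math.floorMod(key, m);
    }

    public static void main(String[] args) {
        int[] keys = {1024004, 6424004, 9324004};
        for (int key : keys) {
            System.out.println(key + ": mod 1000 = " + hash(key, 1000)
                    + ", mod 997 = " + hash(key, 997));
        }
    }
}
```

With M = 1000 every key reduces to its last three digits, so all three collide; with the prime 997 they land on three different locations.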


Hashing

• How do we detect collisions?
* Initialize table to value that cannot occur as key.
* Inspect the location before inserting.
• Two major classes of collision resolution:
1. Open addressing:
* When collision occurs, use organized method to find next open space.
* Maximum number of elements equal to table size.
2. Chained addressing:
* Make linked list of all elements that hash to same location.
* Allows number of elements to exceed table size.


Linear Probing Hashing

Use linear probing hashing to insert the following keys: 103 8 108 208 308 10 9
• Deletion of an element takes more effort.
* Collision chain must be maintained if element is in middle of chain.
* Could reorganize whole chain.
* Could mark entry as deleted.
- Requires table to be reorganized periodically.
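A minimal Java sketch of linear probing with the "mark entry as deleted" option. The slides do not state the table size or hash function for the exercise, so the size passed to the constructor and H(K) = K mod size are assumptions, as are the class name and the sentinel values. Keys are assumed to be non-negative and distinct.

```java
import java.util.Arrays;

public class LinearProbingTable {
    private static final int EMPTY = -1;   // value that cannot occur as a key
    private static final int DELETED = -2; // marker left behind by remove()
    private final int[] slots;

    public LinearProbingTable(int size) {
        slots = new int[size];
        Arrays.fill(slots, EMPTY);
    }

    // Inserts the key at the first free slot at or after its home location; returns that index.
    public int insert(int key) {
        int home = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i) % slots.length;
            if (slots[index] == EMPTY || slots[index] == DELETED) {
                slots[index] = key;
                return index;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the slot holding the key, or -1 if it is absent.
    public int find(int key) {
        int home = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i) % slots.length;
            if (slots[index] == EMPTY) return -1; // end of the collision chain
            if (slots[index] == key) return index;
        }
        return -1;
    }

    // Marks the entry as deleted so later keys in the chain stay reachable.
    public boolean remove(int key) {
        int index = find(key);
        if (index < 0) return false;
        slots[index] = DELETED;
        return true;
    }
}
```

With an assumed table size of 10, the keys 8, 108, 208, and 308 all start at location 8 and spill into 9, 0, and 1, which in turn pushes 10 and 9 away from their home locations. Removing 108 leaves a marker at location 9, so a search for 208 still walks past it.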


Quadratic Hashing

• One way to avoid the clustering problem is to probe in some way other than successive table locations.
• Quadratic hashing: Probes the table at increasing distances from the home location, modulo the table size.
* H(K) + 1, H(K) + 4, H(K) + 9, H(K) + 16, . . .
* Increment by squares.
Use quadratic hashing to insert the following keys: 103 8 108 208 308 10 9
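A minimal Java sketch of quadratic probing for insertion only. As with the exercise itself, the table size and H(K) = K mod size are not given on the slide and are assumed here; the class name and the sentinel are illustrative. Keys are assumed to be non-negative.

```java
import java.util.Arrays;

public class QuadraticProbingTable {
    private static final int EMPTY = -1; // value that cannot occur as a key
    private final int[] slots;

    public QuadraticProbingTable(int size) {
        slots = new int[size];
        Arrays.fill(slots, EMPTY);
    }

    // Probes H(K), H(K) + 1, H(K) + 4, H(K) + 9, ... modulo the table size.
    public int insert(int key) {
        int home = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i * i) % slots.length;
            if (slots[index] == EMPTY) {
                slots[index] = key;
                return index;
            }
        }
        // Quadratic probing does not visit every slot, so this can happen before the table is full.
        throw new IllegalStateException("no open slot on the probe sequence");
    }
}
```

With an assumed table size of 10, the keys that collide at location 8 end up at 9, 2, and 7 rather than in one contiguous run.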


Quadratic Hashing

• Data is more widely distributed than with linear probing hashing.
• Using open address hashing, table size must be decided in advance.
* Not applicable to all applications.
* Best to guess high, but this wastes space.
• One resolution is adaptive hashing.
* Hash table automatically enlarged and reorganized when it gets too close to being full.

* Does not come to a full halt when the table fills up.
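A minimal Java sketch of the adaptive idea, built on linear probing. The 0.75 fullness threshold and the "double plus one" growth rule are assumptions chosen for illustration; the slides do not specify either. Keys are assumed to be non-negative and distinct.

```java
import java.util.Arrays;

public class AdaptiveTable {
    private static final int EMPTY = -1; // value that cannot occur as a key
    private int[] slots;
    private int count = 0;

    public AdaptiveTable(int initialSize) {
        slots = newTable(initialSize);
    }

    public int capacity() { return slots.length; }
    public int size() { return count; }

    public void insert(int key) {
        // Enlarge before the table gets too close to being full (assumed threshold: 75%).
        if (count + 1 > slots.length * 0.75) grow();
        place(slots, key);
        count++;
    }

    public boolean contains(int key) {
        int home = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int value = slots[(home + i) % slots.length];
            if (value == EMPTY) return false;
            if (value == key) return true;
        }
        return false;
    }

    // Reorganize: every key is rehashed because its location depends on the table size.
    private void grow() {
        int[] bigger = newTable(slots.length * 2 + 1);
        for (int key : slots) {
            if (key != EMPTY) place(bigger, key);
        }
        slots = bigger;
    }

    private static int[] newTable(int size) {
        int[] table = new int[size];
        Arrays.fill(table, EMPTY);
        return table;
    }

    // Linear probing; the threshold guarantees an empty slot exists.
    private static void place(int[] table, int key) {
        int index = key % table.length;
        while (table[index] != EMPTY) index = (index + 1) % table.length;
        table[index] = key;
    }
}
```

Starting from 5 slots, the fourth insertion triggers a rebuild into 11 slots, so seven keys fit without the caller having to guess the size in advance.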


Chained Hashing

• Chained hashing solves problem of size/reorganization.
• Index to hash table computed as before.
• Table now holds a pointer to a linked list of all elements that hash to that location.
• Chained hashing allows more elements to be stored than table locations.
• Used in cases where the number of entries is difficult to predict.
• Chained hashing requires fewer storage accesses.
* Don’t have to follow a series of addresses to find an element.


Chained Hashing

Use chained hashing to insert the following keys: 103 8 108 208 308 10 9
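A minimal Java sketch of chained hashing. The table size and H(K) = K mod size are again assumptions, since the slide does not state them, and `ChainedTable` is an illustrative name. Keys are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainedTable {
    // Each table location holds a linked list of all keys that hash to it.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    public ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Appends the key to the chain at its home location; returns that index.
    public int insert(int key) {
        int index = key % buckets.size();
        buckets.get(index).add(key);
        return index;
    }

    // Only the one chain at the key's home location is searched.
    public boolean contains(int key) {
        return buckets.get(key % buckets.size()).contains(key);
    }

    public List<Integer> chain(int index) {
        return buckets.get(index);
    }
}
```

With an assumed table size of 10, keys 8, 108, 208, and 308 share the chain at location 8 while 103, 10, and 9 each sit alone at locations 3, 0, and 9. No key is displaced from its home location.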


Hashing

• Hashing obtains search times independent of number of elements.
* Trade-off is wasted space.
• High speed can be compromised by clustering.
* Also compromised by excessive collision recovery operations.
• Chaining reduces severity of collisions.
* Requires extra space for nodes.
• Hashing is dynamic.
* Handles combinations of insertions, searches, and deletions.


Java Collection interface

• Java provides Map interface
* Associates pairs of values
* Each key maps to at most one value
* Keys are hashed for rapid lookup
* Map replaces obsolete Dictionary class
- Introduced with Java 1.2

Method                                  Description
void clear()                            Removes all mappings from this map (optional operation).
boolean containsKey(Object key)         Returns true if this map contains a mapping for the specified key.
boolean containsValue(Object value)     Returns true if this map maps one or more keys to the specified value.
Set entrySet()                          Returns a set view of the mappings contained in this map.
boolean equals(Object o)                Compares the specified object with this map for equality.
Object get(Object key)                  Returns the value to which this map maps the specified key.
int hashCode()                          Returns the hash code value for this map.
boolean isEmpty()                       Returns true if this map contains no key-value mappings.
Set keySet()                            Returns a set view of the keys contained in this map.
Object put(Object key, Object value)    Associates the specified value with the specified key in this map (optional operation).
void putAll(Map t)                      Copies all of the mappings from the specified map to this map (optional operation).
Object remove(Object key)               Removes the mapping for this key from this map if it is present (optional operation).
int size()                              Returns the number of key-value mappings in this map.
Collection values()                     Returns a collection view of the values contained in this map.
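A short Java sketch exercising several of the Map methods listed above. It uses generics (`Map<String, Integer>`), which were added in Java 5 and so go beyond the Object-based signatures shown in the table; the inventory data and the names `MapInterfaceDemo` and `buildInventory` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MapInterfaceDemo {
    static Map<String, Integer> buildInventory() {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apples", 3);
        stock.put("pears", 5);
        stock.put("apples", 7); // same key again: the old value 3 is replaced
        return stock;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = buildInventory();
        System.out.println("size: " + stock.size());
        System.out.println("apples: " + stock.get("apples"));
        System.out.println("has pears: " + stock.containsKey("pears"));
        System.out.println("has a count of 5: " + stock.containsValue(5));
        // entrySet() gives a view of the key-value pairs for iteration.
        for (Map.Entry<String, Integer> entry : stock.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
        stock.remove("pears");
        System.out.println("empty: " + stock.isEmpty());
    }
}
```

Because each key maps to at most one value, the second `put` for "apples" leaves the map with two entries, not three.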


Available implementations

• Hashtable - original implementation (JDK 1.0)
* nulls not allowed
* synchronized
• HashMap - newer implementation (JDK 1.2)
* nulls allowed
* not synchronized
• LinkedHashMap - subclass of HashMap
* maintains linked list so iteration returns values in insertion order
• Each implementation has initialCapacity and loadFactor attributes.
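The differences between the three implementations can be checked with a small Java sketch. The keys and values are made up, the helper method names are hypothetical, and the capacity 16 and load factor 0.75 passed to the constructor are example values only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapImplementationsDemo {
    // LinkedHashMap iterates in insertion order; initialCapacity and loadFactor are set explicitly.
    static List<String> insertionOrderKeys() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f);
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    // HashMap allows a null key.
    static boolean hashMapAcceptsNullKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 0);
        return map.containsKey(null);
    }

    // Hashtable rejects a null key with a NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, Integer> map = new Hashtable<>();
        try {
            map.put(null, 0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(insertionOrderKeys());
        System.out.println(hashMapAcceptsNullKey());
        System.out.println(hashtableRejectsNullKey());
    }
}
```

All three classes are used through the same Map interface, so switching implementations changes only the constructor call.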