Data Structures and Algorithms Hashing First Year

Data Structures and Algorithms: Hashing, First Year. M. B. Fayek, CUFE 2010. Topics: What is Hashing?, Problems in hashing, Collision Resolution Strategies. Hashing is a quick and efficient searching technique.

Data Structures and Algorithms

Hashing
First Year

M. B. Fayek

CUFE 2010

Hashing

1. What is Hashing?
2. Problems in hashing
3. Collision Resolution Strategies

1. What is Hashing?

Hashing is a quick and efficient searching technique.
So far, efficiency of search depended on the number of comparisons.
In hashing the keys themselves point directly to records by applying a hashing function.
All possible key values are mapped into the hash table.
The hashing function is used for search as well as for storing.

1. What is Hashing?

The hash table is sequential and contiguous.
Each slot is called a bucket.
Buckets may hold more than one key.

1. What is Hashing?

Hashing methods:
  Direct and Subtraction
  Modulo-division (or division remainder) using list size (prime, why?)
  Digit extraction
  Midsquare
  Folding (fold shift, fold boundary)
  Pseudo-random (seed)
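
As a rough illustration of three of the methods listed above, the sketch below implements modulo-division, midsquare and fold shift in Python. The function names, the list size of 13, the choice of three middle digits and the three-digit fold parts are all assumptions made for this example and do not come from the slides.

```python
def hash_modulo(key: int, list_size: int) -> int:
    """Modulo-division: the remainder of key / list_size is the address."""
    return key % list_size


def hash_midsquare(key: int, list_size: int) -> int:
    """Midsquare: square the key and use digits from the middle of the square."""
    squared = str(key * key)
    mid = len(squared) // 2
    # Take up to three digits around the middle (an illustrative choice).
    middle_digits = squared[max(0, mid - 1):mid + 2]
    return int(middle_digits) % list_size


def hash_fold_shift(key: int, list_size: int, part_len: int = 3) -> int:
    """Fold shift: cut the key's digits into equal parts and add the parts."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % list_size


LIST_SIZE = 13  # a small prime, assumed for illustration
for name, address in [
    ("modulo-division", hash_modulo(121267, LIST_SIZE)),
    ("midsquare", hash_midsquare(9452, LIST_SIZE)),
    ("fold shift", hash_fold_shift(123456789, LIST_SIZE)),
]:
    print(name, "->", address)
```

Every method ends with a modulo step here so that the address always falls inside the list; other ways of scaling the result to the list size are possible.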

Problems in Hashing

Collision occurs whenever a hash function maps two distinct keys to the same bucket.
The hashing function must generate bucket addresses quickly and efficiently, with minimum collisions.
As the domain of keys is usually larger than the number of buckets, collisions are very likely to happen no matter how efficient the hashing function is.
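
To make the collision problem concrete, here is a small sketch that hashes ten consecutive keys into seven buckets and reports which buckets receive more than one key. The helper name find_collisions, the modulo hash and the sample keys are assumptions chosen for the example.

```python
from collections import defaultdict


def find_collisions(keys, num_buckets):
    """Group keys by bucket address and keep only buckets holding 2+ keys."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_buckets].append(key)  # modulo-division hash
    return {addr: ks for addr, ks in buckets.items() if len(ks) > 1}


# Ten distinct keys cannot fit into seven buckets without sharing.
collisions = find_collisions(range(100, 110), 7)
for address, keys in sorted(collisions.items()):
    print("bucket", address, "holds", keys)
```

With more keys than buckets, at least one bucket must be shared, whichever hashing function is used.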

3. Collision Resolution Strategies

Definitions:
  Load factor = number of elements in list / list size
  Clustering (primary, secondary)
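
Taking the load factor in its usual sense, the number of stored elements divided by the list size, a minimal worked example looks like this. The function name and the sample numbers are assumptions for illustration.

```python
def load_factor(num_elements: int, list_size: int) -> float:
    """Fraction of the list that is occupied."""
    if list_size <= 0:
        raise ValueError("list_size must be positive")
    return num_elements / list_size


# Six keys stored in a list of twelve slots: the list is half full.
print(load_factor(6, 12))
```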

3. Collision Resolution Strategies

Open Addressing (using prime area):
  Probing (linear, quadratic)
  Double Hashing
  Pseudo-random
  Key offset
Linked Lists (Separate Chaining)
(Bucket Hashing)
Re-hashing

3. Collision Resolution Strategies

Open Addressing: Probing:
  Linear Probing: Search at constant intervals from collision (typically 1)
  Quadratic Probing: Search at quadratically increasing intervals, i.e. collision function f(i) = i^2; i.e. on collision searching the 1st, 4th, 9th, ... location
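
The two probing rules can be sketched as generators of candidate addresses, with one shared insert routine. The names linear_probes, quadratic_probes and insert, the modulo home address and the table of 11 slots are assumptions for this example.

```python
def linear_probes(home: int, list_size: int, step: int = 1):
    """Addresses at constant intervals from the collision: home+1, home+2, ..."""
    for i in range(1, list_size):
        yield (home + i * step) % list_size


def quadratic_probes(home: int, list_size: int):
    """Addresses at offsets f(i) = i^2 from the collision: home+1, home+4, home+9, ..."""
    for i in range(1, list_size):
        yield (home + i * i) % list_size


def insert(table, key, probes):
    """Store key at its home address, or at the first free probed address."""
    list_size = len(table)
    home = key % list_size
    if table[home] is None:
        table[home] = key
        return home
    for address in probes(home, list_size):
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found by this probe sequence")


# 22, 33 and 44 all hash to address 0 in a table of 11 slots.
linear_table = [None] * 11
quadratic_table = [None] * 11
for key in (22, 33, 44):
    print(key, insert(linear_table, key, linear_probes),
          insert(quadratic_table, key, quadratic_probes))
```

Note that quadratic probing does not visit every slot, so an insertion can fail even when the table still has free slots.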

Linear Probing

3. Collision Resolution Strategies: Open Addressing

Double Hashing: Apply a second hashing function and probe at offsets from the original address that are multiples of its value:
  hash2(x), 2*hash2(x), 3*hash2(x), ...
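
A minimal sketch of double hashing follows. The slides do not specify the second function; the form R - (key mod R) with R = 7 is a common textbook choice and is an assumption here, as are the function names and the table of 11 slots. The second hash must never return 0, otherwise the probe would not move.

```python
def hash1(key: int, list_size: int) -> int:
    """Primary hash: home address by modulo-division."""
    return key % list_size


def hash2(key: int, r: int = 7) -> int:
    """Secondary hash (assumed form): always in 1..r, never 0."""
    return r - (key % r)


def double_hash_insert(table, key):
    """Probe home, home + hash2, home + 2*hash2, ... until a free slot is found."""
    list_size = len(table)
    home = hash1(key, list_size)
    step = hash2(key)
    for i in range(list_size):
        address = (home + i * step) % list_size
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found")


table = [None] * 11  # prime size, so every step value reaches all slots
for key in (22, 33, 44):  # all three share home address 0
    print(key, "->", double_hash_insert(table, key))
```

Because the step depends on the key, keys that collide at the same home address usually follow different probe sequences.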

3. Collision Resolution Strategies

Linked lists (Separate Chaining):
  Separate chaining (may be modified by keeping the chain sorted!)
  Modified Hash Table (by eliminating the first probe, hence the hash table becomes an array of records instead of an array of pointers to records)
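
Separate chaining with sorted chains might be sketched as below. For brevity each chain is a Python list standing in for a linked list; that substitution, the class name ChainedHashTable and the list size of 7 are assumptions of this example.

```python
import bisect


class ChainedHashTable:
    """Hash table where each bucket holds a sorted chain of colliding keys."""

    def __init__(self, list_size: int = 7):
        self.buckets = [[] for _ in range(list_size)]

    def _chain(self, key: int) -> list:
        """Return the chain that the key hashes to (modulo-division)."""
        return self.buckets[key % len(self.buckets)]

    def insert(self, key: int) -> None:
        chain = self._chain(key)
        if key not in chain:
            bisect.insort(chain, key)  # insert while keeping the chain sorted

    def contains(self, key: int) -> bool:
        return key in self._chain(key)


table = ChainedHashTable()
for key in (15, 8, 1):  # all three hash to bucket 1
    table.insert(key)
print(table.buckets[1])
print(table.contains(8), table.contains(22))
```

Keeping a chain sorted lets an unsuccessful search stop as soon as it passes the position where the key would be, although this sketch uses a plain membership test.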

Linked List (Separate Chaining)

3. Collision Resolution Strategies

Rehashing:
  When the table becomes too full, operations will start taking too long.
  Solution: Build another hash table of about double the size, with an associated hashing function, then scan down the entire original hash table and insert each key into the new table.
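
The rehashing step can be sketched as follows, assuming an open-addressing table with linear probing. The new size of 2n + 1 is just one way to get "about double"; a real table would often pick the next prime instead. All names and sizes here are assumptions for illustration.

```python
def insert(table, key) -> bool:
    """Linear-probing insert; returns False when no slot is free."""
    size = len(table)
    for i in range(size):
        address = (key % size + i) % size
        if table[address] is None:
            table[address] = key
            return True
    return False


def rehash(old_table):
    """Build a table of about double the size and reinsert every stored key."""
    new_table = [None] * (2 * len(old_table) + 1)
    for key in old_table:  # scan down the entire original table
        if key is not None:
            insert(new_table, key)  # address is recomputed for the new size
    return new_table


small = [None] * 5
for key in (5, 10, 15, 3):
    insert(small, key)
print(small)
print(rehash(small))
```

Keys that crowded around address 0 in the small table spread out in the larger one, because their addresses are recomputed against the new list size.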

[Figure: successful search vs. unsuccessful search]

3. Collision Resolution Strategies

Rehashing: When is the table too full?
  Rehash when the table is half full
  Rehash when an insertion fails
  Rehash when the table reaches a certain load factor (best)

End of Hashing

Probing

Definition: Each calculation of an address and test for success is known as probing.

Key offset collision resolution

Offset = key / list size (integer division)
Address = (Offset + old address) % list size
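
The key offset formula can be tried out with a short sketch. The function name and the sample key and list size are assumptions for illustration; the division is taken as integer division.

```python
def key_offset_address(key: int, old_address: int, list_size: int) -> int:
    """Next address to probe after a collision at old_address."""
    offset = key // list_size  # integer quotient of key / list size
    return (offset + old_address) % list_size


# Key 166702 hashes to address 1 in a list of 307 slots (166702 % 307 == 1).
# If address 1 is occupied, the offset 166702 // 307 == 543 gives the next try.
print(key_offset_address(166702, old_address=1, list_size=307))
```

Since the offset depends on the key, two keys that collide at the same address generally continue to different addresses.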
