18
Hashing 1 Hashing … To ‘hash’ (is to ‘chop into pieces’ or to ‘mince’), is to make a ‘map’ or a ‘transform’ … Hashing provides a means for accessing data without the use of an index structure. Data is addressed on disk by computing a function on a search key instead. Math's A set of N keys, compute the index, then use an array of size N Key k at k -> direct address, now key k at h(k) -> hashing

Hashing

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Hashing

Hashing 1

Hashing …

To ‘hash’ (is to ‘chop into pieces’ or to ‘mince’), is to make a ‘map’ or a ‘transform’ …

Hashing provides a means for accessing data without the use of an index structure.

Data is addressed on disk by computing a function on a search key instead.

Math's …A set of N keys, compute the index, then use an array of size N

Key k at k -> direct address, now key k at h(k) -> hashing

Page 2: Hashing

Hashing 2

how??? ! A bucket in a hash file is unit of storage

(typically a disk block) that can hold one or more records.

The hash function, h, is a function from the set of all search-keys, K, to the set of all bucket addresses, B.

Insertion, deletion, and lookup are done in constant time*.

Page 3: Hashing

Hashing 3

Hash Table

Hash table is a data structure that support Finds, insertions, deletions (deletions

may be unnecessary in some applications)

The implementation of hash tables is called hashing A technique which allows the

executions of above operations in constant average time

Page 4: Hashing

Hashing 4

Implementation :P

The ideal hash table data structure is an array of some fixed size, containing the item

A search is performed based on key

Each key is mapped into some position in the range 0 to TableSize-1

Page 5: Hashing

Hashing 5

Practical Solution

T[k] corresponds to an element with key k

If the set contains no element with key k, then T[k]=NULL

Each position (slot) corresponds to a key in the universe of keys

Page 6: Hashing

Hashing 6

What’s the key FUNDA!!!

Usually, m << N

h(K i) = an integer in [0, …, m-1] called the hash

value of K i

Page 7: Hashing

Hashing 7

#@ Hash Function @# Ideally :-> The distribution should be

uniform. Hash function should assign the same

number of records in each bucket. Practically :-> The distribution can be

random. Regardless of the actual search-keys, the

each bucket has the same number of records on average

Hash values should not depend on any ordering or the search-keys

Page 8: Hashing

Hashing 8

Data Structure Incorporation

Hash indices are typically a prefix of the entire hash value.

More than one consecutive index can point to the same bucket. The indices have the same hash prefix which

can be shorter then the length of the index.

Page 9: Hashing

Hashing 9

Page 10: Hashing

Hashing 10

Example of Hash Index

Page 11: Hashing

Hashing 11 Hash Function – Strings of Characters

Folding Method:int h(String x, int D) {int i, sum;for (sum=0, i=0; i<x.length(); i++)

sum+= (int)x.charAt(i);return (sum%D);}

sums the ASCII values of the letters in the string

ASCII value for “A” =65; sum will be in range 650-900 for 10 upper-case letters; good when D around 100, for example

order of chars in string has no effect

Page 12: Hashing

Hashing 12 Hash Function – Strings of Characters

Much better: Cyclic Shiftstatic long hashCode(String key, int D) { int h=0; for (int i=0, i<key.length(); i++){

h = (h << 4) | ( h >> 27);h += (int) key.charAt(i);}

return h%D;}

Page 13: Hashing

Hashing 13

Use of Extendable Hash Structure: Example

Initial Hash structure, bucket size = 2

Page 14: Hashing

Hashing 14

Example (Cont.) Hash structure after insertion of one

Brighton and two Downtown records

Page 15: Hashing

Hashing 15

Example (Cont.)Hash structure after insertion of Mianus record

Page 16: Hashing

Hashing 16

Example (Cont.)

Hash structure after insertion of three Perryridge records

Page 17: Hashing

Hashing 17

Example (Cont.) Hash structure after insertion of

Redwood and Round Hill records

Page 18: Hashing

Hashing 18Comparison to Other Hashing Methods

Advantage: performance does not

decrease as the database size increases

Space is conserved by adding and removing

as necessary

Disadvantage: additional level of indirection for operations Complex implementation