HASHING Section 12.7 (P. 707-717). HASHING - have already seen binary and linear search and discussed when they might be useful (based on complexity)

HASHING

Section 12.7 (P. 707-717)

HASHING

• We have already seen binary and linear search and discussed when they might be useful (based on complexity):
    Linear: O(n)
    Binary: O(log n)
• This may not be fast enough in some cases.
    ex: a web server (like Yahoo!) handling millions of searches/sec
• Sometimes we need a structure that can allow very fast retrieval of records.
• A hash table is one such structure.

HASHING (Example)

Library card catalog:
• Suppose you only have a few books; you could make a list of the books by their catalog numbers.
• If you need to find a book, you search through the numbers till you find the one you're looking for.
• This is not efficient with a large library (Library of Congress).

Solution:
• Create an array with one slot for every catalog number (each book has a slot).
• Put in each slot: T - the book is there, F - it is not.
• Then, if you are looking for a book number, look in the array at that slot; if True, the book is there.
• Running time: O(1) - time to find 1 element.
• What is the problem? It is not practical: the array needs a slot for every possible catalog number, no matter how few books are actually stored.
• We need a structure that can give us the efficiency of O(1) as well as the efficient use of space.
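The one-slot-per-book idea above can be sketched in C++ as follows. This is only an illustration: the class name DirectCatalog, its member names, and the choice of std::vector<bool> are assumptions, not part of the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical direct-address catalog: one true/false slot for every
// catalog number in the range [0, maxCatalogNumber].
class DirectCatalog {
public:
    explicit DirectCatalog(std::size_t maxCatalogNumber)
        : present_(maxCatalogNumber + 1, false) {}

    // Mark a book as being in the library: O(1).
    void Add(std::size_t catalogNumber) {
        assert(catalogNumber < present_.size());
        present_[catalogNumber] = true;
    }

    // Look a book up by indexing straight into its slot: O(1).
    bool Contains(std::size_t catalogNumber) const {
        return catalogNumber < present_.size() && present_[catalogNumber];
    }

    // Number of slots reserved, whether or not a book uses them.
    std::size_t SlotCount() const { return present_.size(); }

private:
    std::vector<bool> present_;
};
```

A catalog built as DirectCatalog(999) reserves 1000 slots even if only one book is ever added, which is the space problem the slide points out.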

How Does “Hashing” Work?

• We can come up with an efficient way to handle the indexes (say, the last 2 digits).
• Then, you can search for the slot representing the last 2 digits.
• Problem: what if more than one book has the same last 2 digits? This is called a collision.
• One solution:
    • Create an array at least twice as big as what we are storing.
    • Handle duplicates through a technique called chaining: when a duplicate happens, link the items together like a linked list.
    [draw picture of book list, with last 2 digits as index]
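A minimal sketch of chaining with the last 2 digits as the index might look like this. The class ChainedCatalog and its members are hypothetical names, and std::list stands in for the hand-drawn linked list; the fixed 100 buckets follow from using two decimal digits.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chained table: 100 buckets, one per possible
// "last 2 digits" value, each holding a linked list of catalog numbers.
class ChainedCatalog {
public:
    ChainedCatalog() : buckets_(100) {}

    // The index is the last 2 decimal digits of the catalog number.
    static std::size_t Bucket(unsigned catalogNumber) {
        return catalogNumber % 100;
    }

    // Append to the bucket's chain; returns false if already stored.
    bool Insert(unsigned catalogNumber) {
        if (Contains(catalogNumber))
            return false;
        buckets_[Bucket(catalogNumber)].push_back(catalogNumber);
        return true;
    }

    // Only the chain in one bucket has to be searched.
    bool Contains(unsigned catalogNumber) const {
        for (unsigned stored : buckets_[Bucket(catalogNumber)])
            if (stored == catalogNumber)
                return true;
        return false;
    }

    // Length of the chain hanging off one bucket.
    std::size_t ChainLength(std::size_t bucket) const {
        assert(bucket < buckets_.size());
        return buckets_[bucket].size();
    }

private:
    std::vector<std::list<unsigned>> buckets_;
};
```

Inserting 1234 and 5634 puts both in bucket 34, so that bucket's chain has length 2 and a lookup of either number walks only that short chain.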

How Does Hashing Work? (cont)

• Another solution: place the duplicate in another location in the table.
• One way: linear probing:
    • Create a way to compute indexes, called a hashing function:
        index = number % sizeoftable
    • Add a mark indicating that a cell is occupied or unoccupied.
    • As you add an item, indicate that the cell is now occupied.
    • If collision:
        • start at the place where the collision occurs
        • keep moving until you find an empty cell
        • put the item in that cell and mark it occupied
    • Good strategy: if the table gets half-full, double the size and redo the hashing function to create new indexes.
    • Problem: collisions can group items together using this method.
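The probing steps above can be sketched as a single function. This is an illustrative version only: the name LinearProbeInsert, the sentinel kEmpty (-1 marks an unoccupied cell, so stored values are assumed non-negative), and the use of std::vector are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // assumed marker for an unoccupied cell

// Insert value with linear probing and return the index it landed in.
// Returns table.size() if every cell is already occupied.
std::size_t LinearProbeInsert(std::vector<int>& table, int value) {
    assert(!table.empty() && value >= 0);
    const std::size_t size = table.size();
    // Hashing function from the slide: index = number % sizeoftable.
    std::size_t pos = static_cast<std::size_t>(value) % size;
    for (std::size_t probes = 0; probes < size; ++probes) {
        if (table[pos] == kEmpty) {
            table[pos] = value;  // empty cell found: store the item here
            return pos;
        }
        pos = (pos + 1) % size;  // collision: move ahead one cell, wrapping
    }
    return size;  // table is full
}
```

With a table of 10 empty cells, inserting 13, 23, and 33 fills cells 3, 4, and 5; inserting 4 afterwards lands in cell 6 because its own cell was taken by the run. That growing run is the grouping problem mentioned in the last bullet.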

HASHING SOLUTIONS (cont.)

• Another way: quadratic probing
    • Uses a formula to determine the next available cell:
        index = (value + no_to_move_ahead^2) % tablesize
    • Then, if half-full, double the size of the table as before.
• Note: for hashing to be a good thing, duplicates must be minimized and the hashing function must be quick (have to find a balance between computation of indexes and search times).
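As a rough sketch of the quadratic probing formula, the function below tries the cells (value + 0^2) % size, (value + 1^2) % size, (value + 2^2) % size, and so on. The name QuadraticProbeInsert and the sentinel kFree are assumptions for the example, and stored values are assumed non-negative. Quadratic probing is not guaranteed to reach every cell, so the sketch gives up after as many attempts as there are cells.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kFree = -1;  // assumed marker for an unoccupied cell

// Insert value with quadratic probing and return the index it landed in.
// Returns table.size() if no free cell was found within size attempts.
std::size_t QuadraticProbeInsert(std::vector<int>& table, int value) {
    assert(!table.empty() && value >= 0);
    const std::size_t size = table.size();
    const std::size_t home = static_cast<std::size_t>(value) % size;
    for (std::size_t step = 0; step < size; ++step) {
        // index = (value + no_to_move_ahead^2) % tablesize
        const std::size_t pos = (home + step * step) % size;
        if (table[pos] == kFree) {
            table[pos] = value;
            return pos;
        }
    }
    return size;  // gave up: caller should grow the table and retry
}
```

With 10 empty cells, inserting 13, 23, and 33 fills cells 3, 4, and 7 rather than 3, 4, and 5, so keys that share a starting cell are spread out instead of forming one contiguous run.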

HASHING IMPLEMENTATION

Explanation:
• Use of a class containing a struct representing the items in the table
• Three functions: Insert(), Delete(), and Find()
• Write our hashing function as a member function: HashFunction()
• Define the following constants: Occupied, Unoccupied, Deleted

HASHING IMPLEMENTATION

    // status values for a table cell
    #define Occupied 0
    #define Unoccupied 1
    #define Deleted 2

    #define NotFound -1      // returned by Find() when the value is absent
    #define DefaultSize 30   // initial number of cells

    typedef int Item;

    struct TableItem {
        Item Value;
        int Status;
    };

Hashing Implementation (cont)

    class HashTable {
    public:
        bool Insert(const Item &Value);
        bool Delete(const Item &Value);
        int Find(const Item &Value);
        HashTable();
        ~HashTable();
        void Clear();
        int HashFunction(Item HashValue);
    private:
        int TableSize;       // number of cells in MyTable
        int CurrentSize;     // number of cells used so far
        TableItem * MyTable;
    };

HASHING MEMBER FUNCTIONS

• The Constructor: allocates memory and calls Clear to set the status of each cell

    HashTable :: HashTable() {
        TableSize = DefaultSize;
        MyTable = new TableItem[TableSize];
        Clear();
    }

• Clear:

    void HashTable :: Clear() {
        int TempIndex = TableSize - 1;
        while (TempIndex >= 0)
            MyTable[TempIndex--].Status = Unoccupied;
        // an empty table stores no items; without this line
        // CurrentSize is never initialized by the constructor
        CurrentSize = 0;
    }

HASHING MEMBER FUNCTIONS (cont.)

• Destructor - deallocates all memory allocated for the table

    HashTable :: ~HashTable() {
        delete [] MyTable;
    }

• Hashing Function - used to calculate the indexes of the table

    int HashTable :: HashFunction(Item HashValue) {
        // assumes HashValue is non-negative; in C++ a negative
        // operand would give a negative (invalid) index
        return HashValue % TableSize;
    }


• Delete:
    • finds the item using the Find() function
    • if NotFound is returned, then Delete returns false
    • if an index comes back, then the status of that cell is set to Deleted and Delete returns true

    bool HashTable :: Delete(const Item &Value) {
        int Pos = Find(Value);
        if (Pos == NotFound)
            return false;
        MyTable[Pos].Status = Deleted;
        return true;
    }


• Find:
    • Find calls HashFunction to find the first possible spot for the item
    • It then moves through the table until it reaches either the item or an unoccupied spot

    int HashTable :: Find(const Item &Value) {
        int Pos = HashFunction(Value);
        // probe forward, wrapping to the start at the end of the table
        while (MyTable[Pos].Status != Unoccupied &&
               MyTable[Pos].Value != Value)
            if (++Pos >= TableSize)
                Pos = 0;
        // an unoccupied or deleted cell means the value is not stored
        if (MyTable[Pos].Status == Unoccupied ||
            MyTable[Pos].Status == Deleted)
            return NotFound;
        else
            return Pos;
    }

Hashing Member Functions (cont.)

• Insert:
    1) find the initial spot where the value should be added
    2) use linear probing to find the first available location
    3) once found, if the value is already in that spot, return false
    4) otherwise, add the item and increase the current size of the table
    5) then check whether the table is now half full; if so, double the table and copy over the values, then delete the old array from memory

Hashing Member Functions (Insert)

    bool HashTable :: Insert(const Item &Value) {
        // find spot to add
        int Pos = HashFunction(Value);
        while (MyTable[Pos].Status != Unoccupied &&
               MyTable[Pos].Value != Value) {
            Pos++;
            // see if at the end of table
            if (Pos >= TableSize)
                Pos = 0;
        }

        // if value exists, return without inserting
        if (MyTable[Pos].Status == Occupied)
            return false;

        // add new item
        MyTable[Pos].Status = Occupied;
        MyTable[Pos].Value = Value;
        CurrentSize++;

        // see if now at least half full
        if (CurrentSize * 2 < TableSize)
            return true;

        // if it is half full or more, increase size
        TableItem * OldTable = MyTable;   // points to old array
        int OldTableSize = TableSize;

        // get space for new table; TableSize is doubled before Clear()
        // so that every cell of the new array is marked Unoccupied
        TableSize *= 2;
        MyTable = new TableItem[TableSize];
        Clear();
        CurrentSize = 0;

        // copy values from old table to new one
        for (int i = 0; i < OldTableSize; i++) {
            if (OldTable[i].Status == Occupied)
                Insert(OldTable[i].Value);
        }

        // delete the old table from memory
        delete [] OldTable;
        return true;
    }

Hashing (an Example)

EXAMPLE:

Given: a hash table with initial size 10 that uses linear probing, show the table after the following insertions:

    13 4 23 99 100 25 33

(be sure to double the size of the table when required to)
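One way to check an answer to this exercise is to simulate it. The sketch below is a stripped-down stand-in for the HashTable class, not the class itself: the names ProbingTable, PlaceValue, InsertValue, and RunExample are hypothetical, -1 is assumed to mark an unoccupied cell, and deletion is left out. It follows the same rule as Insert: after adding an item, double the table once it is at least half full and re-insert the old values in index order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmptyCell = -1;  // assumed marker for an unoccupied cell

struct ProbingTable {
    std::vector<int> cells;
    std::size_t count;  // number of stored values
    explicit ProbingTable(std::size_t size)
        : cells(size, kEmptyCell), count(0) {}
};

// Linear probing from value % size. Returns false if value is already
// stored. Assumes at least one empty cell, which the half-full rule
// in InsertValue guarantees.
bool PlaceValue(std::vector<int>& cells, int value) {
    assert(value >= 0 && !cells.empty());
    std::size_t pos = static_cast<std::size_t>(value) % cells.size();
    while (cells[pos] != kEmptyCell && cells[pos] != value)
        pos = (pos + 1) % cells.size();
    if (cells[pos] == value)
        return false;
    cells[pos] = value;
    return true;
}

// Insert, then double and rehash once the table is at least half full.
bool InsertValue(ProbingTable& table, int value) {
    if (!PlaceValue(table.cells, value))
        return false;
    ++table.count;
    if (table.count * 2 < table.cells.size())
        return true;
    std::vector<int> bigger(table.cells.size() * 2, kEmptyCell);
    for (int old : table.cells)
        if (old != kEmptyCell)
            PlaceValue(bigger, old);
    table.cells.swap(bigger);
    return true;
}

// Runs the insertions from the exercise on a table of size 10.
ProbingTable RunExample() {
    ProbingTable table(10);
    const int values[] = {13, 4, 23, 99, 100, 25, 33};
    for (int v : values)
        InsertValue(table, v);
    return table;
}
```

Tracing it by hand: 13 goes to cell 3, 4 to cell 4, 23 to cell 5 (cells 3 and 4 are taken), 99 to cell 9, and 100 to cell 0. The table now holds 5 of 10 cells, so it doubles to 20 and the values rehash to 100 at 0, 13 at 13, 4 at 4, 23 at 3, and 99 at 19. Then 25 goes to cell 5 and 33 to cell 14 (cell 13 is taken).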

Questions?

Read chapter on Sorting (Intro), P. 722 - 733 (Insertion, Selection, Bubble)
