COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables III

COSC 2007 Data Structures II

Chapter 13: Advanced Implementation of Tables III

2

Topics

Hashing
  Definition
  Hash function
  Key
  Hash value
  Collision

Open hashing

3

Common Problem

A common pattern in many programs is to store and look up data:
  Find a student record, given an ID number.
  Find a person's address, given a phone number.

Because this pattern is so common, many data structures for it have been investigated. How?

4

Phone Number Problem

Problem: a phone company wants to implement caller ID.

Given a phone number (the key), look up the person's name or address (the data).

There are lots of possible phone numbers (P = 10^7 - 1) in a given area code.

Only a small fraction of them are in use. Nobody has the phone number 000-0000 or 000-0001.

5

Comparison of Time Complexity (average)

Data structure       Insertion   Deletion    Search
Unsorted array       O(1)        O(n)        O(n)
Unsorted reference   O(1)        O(n)        O(n)
Sorted array         O(n)        O(n)        O(log n)
Sorted reference     O(n)        O(n)        O(n)
BST                  O(log n)    O(log n)    O(log n)

Can we do better than O(logn)?

6

Can we do better than O(log N)?

All previous searching techniques require a specified amount of time (O(log n) or O(n)).

The time usually depends on the number of elements (n) stored in the table.

In some situations searching should be almost instantaneous. How?

Examples:
  911 emergency system
  Air-traffic control system

7

Can we do better than O(log N)?

Answer: Yes, sort of, if we're lucky.

General idea: take the key of the data record you're inserting, and use that number directly as the item number (index) in a list (array).

Search is O(1), but a huge amount of space is wasted. How can we solve this?

[Figure: an array indexed directly by phone number, starting at 000-0000, 000-0001, 000-0002. Almost every entry is Null; entry 259-1623 holds "Xu" and entry 263-3049 holds "Sub".]
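The direct-indexing idea can be sketched in Java as follows. This is a minimal illustration, not code from the course: the class name `DirectAddressTable` and its methods are hypothetical, and it assumes keys are 7-digit phone numbers stored as `int`.

```java
// Direct addressing sketch: the key itself is the array index.
// Assumption: keys are 7-digit phone numbers, so the array needs 10^7 slots.
class DirectAddressTable {
    static final int KEY_RANGE = 10_000_000; // indices 0 .. 9,999,999

    // One slot per possible phone number; null means "not in use".
    private final String[] names = new String[KEY_RANGE];

    void insert(int phoneNumber, String name) {
        names[phoneNumber] = name;   // O(1): no searching at all
    }

    String lookup(int phoneNumber) {
        return names[phoneNumber];   // O(1); null if the number is unused
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable();
        table.insert(2591623, "Xu");
        table.insert(2633049, "Sub");
        System.out.println(table.lookup(2591623)); // Xu
        System.out.println(table.lookup(1));       // null
    }
}
```

Two stored names still cost ten million array slots, which is the wasted space the slide refers to.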

8

Hashing

Basic idea:
  Don't use the data value directly.
  Given an array of size B, use a hash function, h(x), which maps the given data record x to some (hopefully) unique index ("bucket") in the array.

[Figure: the hash function h maps a record x to bucket h(x) in an array indexed 0 to B-1.]

9

What is a Hash Table?

The simplest kind of hash table is an array of records.

This example has 101 records.

[Figure: an array of records, indexed [0] to [100].]

10

What is a Hash Table?

Each record has a special field, called its key. In this example, the key is a long integer field called Number.

[Figure: an array of records, indexed [0] to [100]. The record at [4] has Number 256-2879 and holds "Linda Kim, 8888 Queen St."]

11

What is a Hash Table?

The number is a person's phone number, and the rest of the record is that person's name or address.

[Figure: an array of records, indexed [0] to [100]. The record at [4] has Number 256-2879.]

12

What is a Hash Table?

When a hash table is in use, some spots contain valid records, and other spots are "empty".

[Figure: an array of records, indexed [0] to [100]. Four spots hold records (Number 506643548, Number 233667136, Number 281942902, Number 155778322); the others are empty.]

13

Inserting a New Record?

In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key.

[Figure: the array of records, indexed [0] to [100], with the new record Number 265-1556 waiting to be inserted.]

14

Inserting a New Record?

Typical way to create a hash value:

(Number mod 101)

What is (2651556 mod 101)?

[Figure: the array of records, indexed [0] to [100], with the new record Number 265-1556 waiting to be inserted.]

15

Inserting a New Record?

Typical way to create a hash value:

(Number mod 101)

What is (2651556 mod 101)? It is 3.

[Figure: the array of records, indexed [0] to [100], with the new record Number 265-1556 waiting to be inserted.]
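As a quick check of the arithmetic, the mod-based hash value can be computed in Java. This is a sketch only; the class and method names are illustrative, and it assumes non-negative keys so that `%` never yields a negative index.

```java
class HashValueDemo {
    static final int TABLE_SIZE = 101; // number of slots, as in the slide's example

    // Hash value = key mod table size (assumes a non-negative key).
    static int hash(long number) {
        return (int) (number % TABLE_SIZE);
    }

    public static void main(String[] args) {
        // 2651556 = 101 * 26253 + 3
        System.out.println(hash(2651556L)); // prints 3
    }
}
```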

16

Inserting a New Record?

The hash value is used for the location of the new record.

[Figure: the array of records, indexed [0] to [100]. The new record Number 265-1556 is directed to location [3].]

17

Inserting a New Record?

The hash value is used for the location of the new record.

[Figure: the array of records after the insertion, indexed [0] to [100], holding Number 506643548, Number 233667136, Number 281942902, Number 155778322 and Number 580625685.]

18

What is Hashing?

Each item has a unique key.
Use a large array called a hash table.
Use a hash function.

Hashing is like indexing in that it involves associating a key with a relative record address.

Hashing, however, is different from indexing in two important ways:
  With hashing, there is no obvious connection between the key and the location.
  With hashing, two different keys may be transformed to the same address.

A hash function is a function h(K) which transforms a key K into an address.

19

What is Hashing?

An address calculator (hash function) is used to determine the location of the item.

[Figure: a search key goes into the address calculator (hash function), which produces an index between 0 and N-1 into the array (hash table).]

20

What Can Be Hashed?

Anything! We can hash on numbers, strings, structures, etc. Java defines a hashing method for general objects, which returns an integer value.
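In Java this method is `Object.hashCode()`. The sketch below shows one way to turn its result into a table index. The helper `indexFor` and the table size 101 are assumptions for illustration, and `Math.floorMod` is used because `hashCode()` may return a negative number.

```java
import java.util.Arrays;

class HashCodeDemo {
    // Map any object's hashCode() to an index in 0 .. tableSize-1.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 101; // illustrative table size

        System.out.println(indexFor("Bill", tableSize));               // a String key
        System.out.println(indexFor(2562879, tableSize));              // an Integer key (autoboxed)
        System.out.println(indexFor(Arrays.asList(1, 2), tableSize));  // a structured key
    }
}
```

For an `Integer`, `hashCode()` is the integer's own value, so the second call is the same as computing 2562879 mod 101.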

21

Where do we use Hashing?

Databases (phone book, student name list). Spell checkers. Computer chess games. Compilers.

22

Hashing and Tables

Hashing gives us another implementation of the Table ADT.

Hashing operations:
  Initialize: all locations in the hash table are empty.
  Insert
  Search
  Delete

Hash the key; this gives an index; use it to find the value stored in the table in O(1). This is a great improvement over O(log n).

23

Hashing

Insert pseudocode

tableInsert(newItem)
    i = the array index that the address calculator gives you for the new item's search key
    table[i] = newItem

Retrieval pseudocode

tableRetrieve(searchKey)
    i = array index for searchKey given by the hash function
    if (table[i] is not empty and table[i].getKey() equals searchKey)
        return table[i]
    else
        return null

24

Hashing

Deletion pseudocode

tableDelete(searchKey)
    i = array index for searchKey given by the hash function
    success = (table[i] is not empty) and (table[i].getKey() equals searchKey)
    if (success)
        delete the item from table[i]
    return success
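The insert, retrieval and deletion pseudocode can be sketched in Java as below. This is an illustration under stated assumptions, not the course's implementation: the `Entry` type (a `long` key plus a name) and the mod-101 hash are hypothetical, and collisions are deliberately not handled, so inserting a second key with the same hash value overwrites the first.

```java
class SimpleHashTable {
    // Hypothetical record type: a phone-number key plus a name.
    static class Entry {
        final long key;
        final String name;

        Entry(long key, String name) {
            this.key = key;
            this.name = name;
        }
    }

    private final Entry[] table = new Entry[101]; // all slots start empty (null)

    // Address calculator: key mod table size (assumes non-negative keys).
    private int hashIndex(long key) {
        return (int) (key % table.length);
    }

    // Overwrites whatever is in the slot: collisions are NOT handled here.
    void tableInsert(Entry newItem) {
        table[hashIndex(newItem.key)] = newItem;
    }

    // Returns the entry with this key, or null if the slot is empty
    // or holds a different key.
    Entry tableRetrieve(long searchKey) {
        Entry e = table[hashIndex(searchKey)];
        return (e != null && e.key == searchKey) ? e : null;
    }

    // Empties the slot only if it holds exactly this key.
    boolean tableDelete(long searchKey) {
        int i = hashIndex(searchKey);
        boolean success = table[i] != null && table[i].key == searchKey;
        if (success) {
            table[i] = null;
        }
        return success;
    }
}
```

Each operation computes one index and touches one slot, which is where the O(1) cost comes from.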

25

Hash Tables

Table size
  Entries are numbered 0 to TSIZE-1.

Mapping
  Simple to compute
  Ideally 1-1: not possible
  Even distribution

Main problems
  Choosing the table size
  Choosing a good hash function
  What to do on collisions

26

How to choose the Table Size?

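H(Key) = Key mod TSIZE

TSIZE = 10, keys 20, 22, 54, 15, 26, 49:
  index 0: 20
  index 2: 22
  index 4: 54
  index 5: 15
  index 6: 26
  index 9: 49

TSIZE = 10, keys 110, 210, 320, 460, 520, 600:
  index 0: 110, 210, 320, 460, 520, 600 (every key is a multiple of 10, so all six collide)

TSIZE = 11, keys 110, 210, 320, 460, 520, 600:
  index 0: 110
  index 1: 210, 320
  index 3: 520
  index 6: 600
  index 9: 460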
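The effect of the table size can be reproduced with a few lines of Java. The sketch below groups keys by Key mod TSIZE; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TableSizeDemo {
    // Group keys by h(key) = key mod tableSize; the map is sorted by index.
    static Map<Integer, List<Integer>> distribute(int[] keys, int tableSize) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int key : keys) {
            buckets.computeIfAbsent(key % tableSize, k -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {110, 210, 320, 460, 520, 600};

        // TSIZE = 10: every key ends in 0, so all of them land in index 0.
        System.out.println(distribute(keys, 10));
        // {0=[110, 210, 320, 460, 520, 600]}

        // TSIZE = 11 (a prime): the same keys spread over five indices.
        System.out.println(distribute(keys, 11));
        // {0=[110], 1=[210, 320], 3=[520], 6=[600], 9=[460]}
    }
}
```

A table size that shares a factor with a pattern in the keys (here, multiples of 10) concentrates them in a few indices, which is one common argument for choosing a prime table size.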

27

How to choose a Hashing Function?

The hash function we choose depends on the type of the key field (the key we use to do our lookup). Finding a good one can be hard.

Rules: a good hash function should
  Be easy to calculate.
  Use all of the key.
  Spread the keys uniformly.

28

How to choose a Hashing Function?

Examples:

Student IDs (integers)
  h(idNumber) = idNumber % B
  e.g. h(678921) = 678921 % 100 = 21

Names (character strings)
  h(name) = (sum over the ASCII values) % B
  e.g. h("Bill") = (66+105+108+108) % 101 = 387 % 101 = 84
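Both example hash functions are short enough to write out in Java. This is a sketch with illustrative names (`hashId`, `hashName`), where B is passed as the `buckets` parameter. For "Bill" the character codes sum to 387, and 387 % 101 is 84.

```java
class HashFunctionDemo {
    // Integer keys: h(id) = id % B (assumes a non-negative id).
    static int hashId(int idNumber, int buckets) {
        return idNumber % buckets;
    }

    // String keys: sum of the character codes, then % B.
    static int hashName(String name, int buckets) {
        int sum = 0;
        for (int i = 0; i < name.length(); i++) {
            sum += name.charAt(i); // char widens to its numeric code
        }
        return sum % buckets;
    }

    public static void main(String[] args) {
        System.out.println(hashId(678921, 100));   // 21
        System.out.println(hashName("Bill", 101)); // 84
    }
}
```

The character-sum function uses all of the key but ignores character order, so any rearrangement of the same letters hashes to the same value. That is one reason it may not spread keys uniformly.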

29

Collision

Here is another new record to insert, with a hash value of 2.

[Figure: the array of records, indexed [0] to [100], holding Number 506643548, Number 233667136, Number 281942902, Number 155778322 and Number 580625685. The new record, Number 2641455, is labelled "My hash value is [2]."]

30

What to do on collisions?

Open hashing (separate chaining)

Closed hashing (open addressing)
  Linear probing
  Quadratic probing
  Double hashing

31

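Open hashing (separate chaining)

Keep a list of all elements that hash to the same value.

[Figure: the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 stored in a table with indices 0 to 9; the placement is consistent with h(key) = key mod 10.]
  index 0: 0
  index 1: 1, 81
  index 4: 4, 64
  index 5: 25
  index 6: 16, 36
  index 9: 9, 49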
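The chained table in the figure can be rebuilt with a short Java sketch. It assumes h(key) = key mod 10 and uses `java.util.LinkedList` for the chains instead of a hand-written node class; the names `build` and `contains` are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingDemo {
    // Build a table of chains; each key is appended to the chain at key mod buckets.
    static List<LinkedList<Integer>> build(int[] keys, int buckets) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            table.add(new LinkedList<>());
        }
        for (int key : keys) {
            table.get(key % buckets).add(key);
        }
        return table;
    }

    // Search only the one chain the key could be in.
    static boolean contains(List<LinkedList<Integer>> table, int key) {
        return table.get(key % table.size()).contains(key);
    }

    public static void main(String[] args) {
        int[] squares = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};
        List<LinkedList<Integer>> table = build(squares, 10);
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": " + table.get(i));
        }
        System.out.println(contains(table, 49)); // true
        System.out.println(contains(table, 50)); // false
    }
}
```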

32

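Open hashing (separate chaining)

Secondary data structure
  List
  Search tree
  Another hash table

We expect small collision lists
  Simple
  Small overhead

[Figure: the chained table with indices 0 to 9. Index 0: 0; index 1: 1, 81; index 4: 4, 64; index 5: 25; index 6: 16, 36; index 9: 9, 49.]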

33

Operations with Chaining

Insert with chaining
  Apply the hash function to get a position.
  Insert the key into the linked list at this position.

Search with chaining
  Apply the hash function to get a position.
  Search the linked list at this position.

34

Open hashing (separate chaining) 

public class ChainNode {
    private KeyedItem item;   // the record stored in this node
    private ChainNode next;   // next node in the same chain, or null

    public ChainNode(KeyedItem newItem, ChainNode nextNode) {
        item = newItem;
        next = nextNode;
    }

    // set and get methods
} // end of ChainNode

35

Open hashing (separate chaining) 

public class HashTable {
    private final int HASH_TABLE_SIZE = 101; // number of locations in the hash table
    private ChainNode[] table;               // hash table: one chain per location
    private int size;                        // number of items currently in the table

    public HashTable() {
        table = new ChainNode[HASH_TABLE_SIZE];
        size = 0;
    }

    public boolean tableIsEmpty() { return size == 0; }
    public int tableLength() { return size; }
    public void tableInsert(KeyedItem newItem) throws HashException { /* body omitted */ }
    public boolean tableDelete(Comparable searchKey) { /* body omitted */ }
    public KeyedItem tableRetrieve(Comparable searchKey) { /* body omitted */ }
} // end of HashTable

36

Open hashing (separate chaining)

tableInsert(newItem)
    if (table is not full) {
        searchKey = the search key of newItem
        i = hashIndex(searchKey)
        node = reference to a new node containing newItem
        node.setNext(table[i])
        table[i] = node
    }
    else // table full
        throw new HashException()

37

Open hashing (separate chaining)

tableRetrieve(searchKey)
    i = hashIndex(searchKey)
    node = table[i]
    while ((node != null) && (node.getItem().getKey() does not equal searchKey))
        node = node.getNext()
    if (node != null)
        return node.getItem()
    else
        return null
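The chaining pseudocode can be turned into a small runnable Java class. The sketch below is a simplified stand-in, not the course's HashTable: it assumes `int` keys and `String` values in place of KeyedItem, uses a private `Node` class in place of ChainNode, skips the "table is not full" check, and does not reject duplicate keys. It also fills in tableDelete, which the slides declare but do not show, so that part is one possible approach only.

```java
class ChainedHashTable {
    // Simplified stand-in for ChainNode/KeyedItem: an int key and a String value.
    private static class Node {
        final int key;
        final String value;
        Node next;

        Node(int key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[101]; // one chain per location
    private int size = 0;                       // number of items stored

    private int hashIndex(int key) {
        return Math.floorMod(key, table.length);
    }

    // Insert at the front of the chain, as in the pseudocode.
    void tableInsert(int key, String value) {
        int i = hashIndex(key);
        table[i] = new Node(key, value, table[i]);
        size++;
    }

    // Walk the chain until the key is found or the chain ends.
    String tableRetrieve(int searchKey) {
        Node node = table[hashIndex(searchKey)];
        while (node != null && node.key != searchKey) {
            node = node.next;
        }
        return node != null ? node.value : null;
    }

    // Unlink the first node with this key; returns false if no such node exists.
    boolean tableDelete(int searchKey) {
        int i = hashIndex(searchKey);
        Node prev = null;
        Node node = table[i];
        while (node != null && node.key != searchKey) {
            prev = node;
            node = node.next;
        }
        if (node == null) {
            return false;
        }
        if (prev == null) {
            table[i] = node.next;   // removing the head of the chain
        } else {
            prev.next = node.next;  // bypass the node in the middle or at the end
        }
        size--;
        return true;
    }

    int tableLength() {
        return size;
    }
}
```

For example, keys 3 and 104 both hash to location 3 in a table of 101 locations, and both remain retrievable because they share a chain.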

38

Evaluation of Chaining

Disadvantages of chaining
  More complex to implement.
  Search and delete are harder.
  We need to know: the number of elements in the table (N); the number of buckets (B); the quality of the hash function.
  Worst case for searching is O(n).

Advantages of chaining
  Insertion is easy and quick.
  Allows more records to be stored.
  The size of the table is dynamic.
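How N, B and the hash function interact can be seen by measuring chain lengths. The sketch below is illustrative only: it assumes the hash function key mod B, and the names `longestChain` and `averageChain` are hypothetical.

```java
class ChainLengthDemo {
    // Length of the longest chain when keys are hashed with key mod buckets.
    static int longestChain(int[] keys, int buckets) {
        int[] counts = new int[buckets];
        int longest = 0;
        for (int key : keys) {
            int i = Math.floorMod(key, buckets);
            counts[i]++;
            longest = Math.max(longest, counts[i]);
        }
        return longest;
    }

    // Average chain length over all buckets: N / B.
    static double averageChain(int n, int buckets) {
        return (double) n / buckets;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};

        // B = 10: every key hashes to bucket 0, so one chain holds all N keys.
        System.out.println(longestChain(keys, 10)); // 10
        // B = 11: each key gets its own bucket.
        System.out.println(longestChain(keys, 11)); // 1

        System.out.println(averageChain(keys.length, 10)); // 1.0
    }
}
```

With B = 10 the average chain length is still 1.0, yet a search may have to walk all 10 nodes. This is the O(n) worst case, and it depends on how well the hash function suits the keys rather than on N and B alone.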

39

Review

A(n) ______ maps the search key of a table item into a location that will contain the item.
  hash function
  hash table
  AVL tree
  heap

40

Review

A hash table is a(n) ______.
  stack
  queue
  array
  list

41

Review

The condition that occurs when a hash function maps two or more distinct search keys into the same location is called a(n) ______.
  disturbance
  collision
  rotation
  congestion

42

Review

______ is a collision-resolution scheme that searches the hash table sequentially, starting from the original location specified by the hash function, for an unoccupied location.
  Linear probing
  Quadratic probing
  Double hashing
  Separate chaining

43

Review

______ is a collision-resolution scheme that searches the hash table for an unoccupied location beginning with the original location that the hash function specifies and continuing at increments of 1^2, 2^2, 3^2, and so on.
  Linear probing
  Double hashing
  Quadratic probing
  Separate chaining

44

Review

______ is a collision-resolution scheme that uses an array of linked lists as a hash table.
  Linear probing
  Double hashing
  Quadratic probing
  Separate chaining

45

Review

The load factor of a hash table is calculated as ______.
  table size + current number of table items
  table size - current number of table items
  current number of table items * table size
  current number of table items / table size