CS 221 Analysis of Algorithms Data Structures Dictionaries, Hash Tables, Ordered Dictionary and Binary Search Trees



Reading material

Goodrich and Tamassia, 2002: Chapter 2, section 2.5, pages 114-137 (see also section 2.6)

Chapter 3, section 3.1, pages 141-151


Dictionaries

Remember:
- Stacks and queues: LIFO, FIFO
- Vectors and lists: ranks and positions


Dictionaries

Dictionaries: containers for storing, retrieving, and modifying collections of objects.

We call the objects items. An item is a (key, element) pair, written D(k, e), and access to an item is based on its key.

Can the element be the key? Yes: D(e, e).


Dictionary ADT

The dictionary ADT models a searchable collection of key-element items.

The main operations of a dictionary are searching, inserting, and deleting items. Multiple items with the same key are allowed.

Applications:
- address book
- credit card authorization
- mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)


Dictionary ADT

Dictionary ADT methods:
- findElement(k): if the dictionary has an item with key k, returns its element; else, returns the special element NO_SUCH_KEY
- insertItem(k, o): inserts item (k, o) into the dictionary
- removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element; else, returns the special element NO_SUCH_KEY
- size(), isEmpty()
- keys(), elements()


Dictionaries

Unordered dictionary implementations: log files (or audit trails) and hash tables


Dictionaries

Log files: a vector or list that stores items as an unsorted sequence. Think of error logs, text messages, …


Dictionaries: Log Files – Performance

insertItem(k, e): very good, since items are always inserted at the end (or beginning) of the sequence: O(1)

findElement(k): must scan the sequence to look for the key; the worst case is an unsuccessful search: O(n)

removeElement(k)?
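A minimal sketch of a log-file dictionary, assuming a Python list as the unsorted sequence. The camelCase method names mirror the slides' Dictionary ADT rather than Python convention, and the `NO_SUCH_KEY` sentinel object is an illustrative stand-in for the special element. It makes the costs above concrete: appending is O(1) amortized, while finding or removing a key requires a scan.

```python
# Sketch: a "log file" dictionary storing (key, element) items in an
# unsorted Python list. Method names follow the slides' Dictionary ADT.
NO_SUCH_KEY = object()  # sentinel returned when no item has the key


class LogFileDictionary:
    def __init__(self):
        self._items = []  # unsorted sequence of (key, element) pairs

    def size(self):
        return len(self._items)

    def isEmpty(self):
        return not self._items

    def insertItem(self, k, e):
        # O(1) amortized: always append at the end of the sequence
        self._items.append((k, e))

    def findElement(self, k):
        # O(n) worst case (unsuccessful search): scan the whole sequence
        for key, element in self._items:
            if key == k:
                return element
        return NO_SUCH_KEY

    def removeElement(self, k):
        # O(n) worst case: the item must be found by scanning first
        for i, (key, element) in enumerate(self._items):
            if key == k:
                # Order does not matter, so fill the hole with the last item
                self._items[i] = self._items[-1]
                self._items.pop()
                return element
        return NO_SUCH_KEY

    def keys(self):
        return [key for key, _ in self._items]

    def elements(self):
        return [element for _, element in self._items]
```

Because the sequence is unordered, moving the last item into the vacated slot avoids shifting, so removal costs only the scan.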


Dictionaries

Log files are good for applications like:
- small unstructured sequences, where findElement(k) and removeElement(k) times are trivial
- applications that almost always call insertItem and rarely call findElement or removeElement


Hash Tables

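It is convenient to think of keys as the addresses of items, in some sense.

In a sequence implemented as a vector, it would be nice if key = rank, but this often does not work.

Consider using a student ID # both as a key and as a rank: IDs are large numbers, so the vector would need a slot for every possible ID, and most slots would be empty.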


Hash Tables

Another strategy is to create a hash table

Essentially, a hash table is a container in which you can compute the address of an item from its key


Hash Tables

We need two things:
- a vector for holding items
- a hash function to compute the index into the vector based on the key

So, suppose we have a vector container A and a hash function h(k) on key k: items are not stored at A[k], but at A[h(k)]


Hash Functions and Hash Tables (§2.5.2)

A hash function h maps keys of a given type to integers in a fixed interval [0, N-1]

Example: h(x) = x mod N is a hash function for integer keys

The integer h(x) is called the hash value of key x

A hash table for a given key type consists of:
- a hash function h
- an array (called the table) of size N

When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)


Example: We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer

Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x

[Figure: table cells indexed 0 to 9999; 025-612-0001 is stored in cell 1, 981-101-0002 in cell 2, 451-229-0004 in cell 4, and 200-751-9998 in cell 9998.]
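A minimal sketch of this example, assuming SSNs are given as dashed strings like the ones in the figure. Taking the last four digits is the same as computing x mod 10,000. The helper names `insert_item` and `find_element` are hypothetical, and there is no collision handling yet: a second key with the same last four digits would overwrite the first.

```python
# Sketch of the slide's example: N = 10,000 cells, h(x) = last four digits of x.
N = 10_000


def h(ssn: str) -> int:
    """Hash an SSN string such as '451-229-0004' to its last four digits."""
    x = int(ssn.replace("-", ""))  # the integer key
    return x % N                   # last four digits == x mod 10,000


table = [None] * N  # one cell per possible hash value


def insert_item(ssn: str, name: str) -> None:
    # No collision handling: a key with the same last four digits
    # as an existing key overwrites it.
    table[h(ssn)] = (ssn, name)


def find_element(ssn: str):
    item = table[h(ssn)]
    if item is not None and item[0] == ssn:
        return item[1]
    return None  # empty cell, or the cell holds a different key
```

For the keys in the figure, `h` returns 1, 2, 4, and 9998, matching the cells shown.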


Hash Function

A hash function, then, is a function that maps a set of keys to a set of integers that represent the set of ranks (addresses) in our vector

…but this usually requires two steps


Hash Functions (§ 2.5.3)

A hash function is usually specified as the composition of two functions:
- Hash code map: h1: keys → integers
- Compression map: h2: integers → [0, N-1]

The hash code map is applied first, and the compression map is applied next on the result, i.e.,

h(x) = h2(h1(x))

The goal of the hash function is to “disperse” the keys in an apparently random way
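A small sketch of the composition h(x) = h2(h1(x)) for string keys. The particular maps are assumptions chosen for illustration: h1 is a polynomial hash code with base a = 33 (one common hash code map), h2 is the division method (code mod N), and N = 13 is an arbitrary prime table size.

```python
# Sketch of h(x) = h2(h1(x)) for string keys. The maps chosen here are
# illustrative, not the only options.
N = 13  # table size (assumed); a prime is a common choice


def h1(key: str, a: int = 33) -> int:
    """Hash code map: polynomial hash code, evaluated with Horner's rule."""
    code = 0
    for ch in key:
        code = code * a + ord(ch)
    return code


def h2(code: int) -> int:
    """Compression map (division method): integers -> [0, N-1]."""
    return code % N


def h(key: str) -> int:
    return h2(h1(key))  # hash code map first, then compression map
```

For example, h1("ab") = 97 · 33 + 98 = 3299, and h("ab") = 3299 mod 13 = 10.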


Hash functions

Hash code maps: there are many; see Goodrich and Tamassia


Hash functions

Hash compression map: maps the output of the hash code map (some set of integers) to integers in the range [0, …, N-1], which corresponds to the set of ranks in the vector container
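A sketch of two compression maps from integer hash codes to [0, N-1]. Besides the division method, it shows the multiply-add-and-divide (MAD) method, here in the variant ((A·code + B) mod P) mod N. The values of N, P, A, and B are assumptions picked for illustration; P should be a prime larger than N.

```python
# Sketch of two compression maps from integer hash codes to [0, N-1].
N = 10       # number of ranks in the vector (assumed)
P = 109      # a prime larger than N (assumed)
A, B = 7, 3  # MAD parameters with 0 < A < P and 0 <= B < P (assumed)


def division(code: int) -> int:
    """Division method: code mod N."""
    return code % N


def mad(code: int) -> int:
    """Multiply-add-and-divide: ((A*code + B) mod P) mod N."""
    # Python's % is non-negative for a positive modulus, so negative
    # hash codes are also mapped into [0, N-1].
    return ((A * code + B) % P) % N
```

For instance, mad(5) = (38 mod 109) mod 10 = 8, and mad(100) = (703 mod 109) mod 10 = 49 mod 10 = 9.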


Hash Tables

What happens when hash function values are not unique? Collisions.

A collision means that two items have the same hash value, which means that the two items will have the same location in the vector. Can we do that?


Hash Tables

Collision handling:
- chaining
- open addressing


Hash Tables – Collision Handling

Chaining: with a vector of simple items, we cannot insert an item where one already exists without destroying the existing one

So, what can we do?


Hash Tables – Collision Handling: Chaining

Suppose that each element of the dictionary, D[i], is itself an unordered sequence. The algorithm is:
1. compute the hash code h(k)
2. check hash table entry D[h(k)]
3. if it is not occupied, insertItem; if it is occupied, insertItem anyway, but after the existing items in the same location of D

This is called the Separate Chaining Rule
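A minimal sketch of separate chaining, assuming each cell D[i] is a Python list used as the unordered sequence and Python's built-in `hash` serves as the hash code map (for small integers it returns the integer itself). The class name and default table size are hypothetical; method names follow the Dictionary ADT.

```python
# Sketch of separate chaining: each cell D[i] is an unordered Python list
# of (key, element) pairs.
NO_SUCH_KEY = object()  # sentinel for a missing key


class ChainedHashTable:
    def __init__(self, N=11):
        self.N = N
        self.D = [[] for _ in range(N)]  # one initially empty chain per cell

    def _h(self, k):
        return hash(k) % self.N  # built-in hash, compressed with mod N

    def insertItem(self, k, e):
        # Occupied or not, append after any items already in D[h(k)]
        self.D[self._h(k)].append((k, e))

    def findElement(self, k):
        for key, element in self.D[self._h(k)]:  # scan only one chain
            if key == k:
                return element
        return NO_SUCH_KEY

    def removeElement(self, k):
        chain = self.D[self._h(k)]
        for i, (key, element) in enumerate(chain):
            if key == k:
                del chain[i]
                return element
        return NO_SUCH_KEY
```

With N = 13, keys 18 and 44 both hash to cell 5, so they end up in the same chain, and finding either one scans only that chain.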


Collision Handling (§ 2.5.5)

Collisions occur when different elements are mapped to the same cell

Chaining: let each cell in the table point to a linked list of elements that map there

Chaining is simple, but requires additional memory outside the table

[Figure: table cells 0 to 4; cell 1 holds 025-612-0001, and cell 4 points to a linked list holding 451-229-0004 and 981-101-0004.]


Example

You create a vector for storing WVU student SSN data: the vector has 100 slots (ranks 0-99), and you have 100 students with 9-digit SSNs. Your hash code mapping function is

h1(SSN) = int(left3Characters(SSN))

Your hash compression mapping function is

h2(h1(SSN)) = h1(SSN) mod 100
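A sketch of this example's hash function, useful for seeing how keys distribute. The SSNs below are made up for illustration. Because the slot depends only on the first three digits, keys that share a prefix always collide. For older SSNs those digits (the area number) were assigned by geographic region, so students from one state would tend to share only a few prefixes and pile up in a few slots.

```python
# Sketch of the slide's example: 100 slots, h1 = first three digits of
# the SSN, h2 = h1 mod 100. The SSNs below are made up for illustration.
from collections import Counter


def h1(ssn: str) -> int:
    return int(ssn[:3])  # int(left3Characters(SSN))


def h2(code: int) -> int:
    return code % 100


def h(ssn: str) -> int:
    return h2(h1(ssn))


ssns = ["232-11-0001", "233-42-1234", "232-90-5555", "234-10-2222", "233-01-9999"]
buckets = Counter(h(s) for s in ssns)  # slot -> number of keys hashed there
print(buckets)
```

Five keys land in only three slots (32, 33, and 34), even though 100 slots are available.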


Hash Tables – collision handling

So what happens if a lot of keys generate the same hash code? …they pile up

Loading factor: the number of items divided by the number of slots; we want to keep the loading factor low
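One common way to keep the loading factor low is to grow the table and rehash every item once the loading factor passes a threshold. The sketch below assumes a threshold `MAX_LOAD = 0.75` and doubles the table size. Both choices are illustrative rather than taken from the slides, and many implementations pick a prime near 2N instead of exactly 2N.

```python
# Sketch: a chained hash table that tracks its loading factor n/N and
# doubles its size when the factor exceeds MAX_LOAD (an assumed value).
MAX_LOAD = 0.75


class ResizingChainedTable:
    def __init__(self, N=4):
        self.N = N   # number of slots
        self.n = 0   # number of items stored
        self.D = [[] for _ in range(N)]

    def load_factor(self):
        return self.n / self.N

    def insertItem(self, k, e):
        self.D[hash(k) % self.N].append((k, e))
        self.n += 1
        if self.load_factor() > MAX_LOAD:
            self._rehash(2 * self.N)

    def _rehash(self, new_N):
        old_items = [item for chain in self.D for item in chain]
        self.N = new_N
        self.D = [[] for _ in range(new_N)]
        for k, e in old_items:  # every item's slot depends on N, so re-hash all
            self.D[hash(k) % new_N].append((k, e))
```

Starting with 4 slots, the fourth insertion raises the loading factor to 1.0, which triggers a rehash into 8 slots and brings it back down to 0.5.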


Hash Tables – Collision Handling: Open Addressing

Linear probing: the vector holds one item per rank

If rank D[h(k)] is occupied, check D[h(k)+1]; if it is unoccupied, insertItem; if not, check D[h(k)+2], and so on


Linear Probing (§2.5.5)

Open addressing: the colliding item is placed in a different cell of the table

Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell

Each table cell inspected is referred to as a “probe”

Colliding items lump together, so future collisions cause longer sequences of probes

Example: h(x) = x mod 13. Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order. The resulting table (- marks an empty cell):

index:  0  1  2   3  4  5   6   7   8   9   10  11  12
key:    -  -  41  -  -  18  44  59  32  22  31  73  -
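A minimal sketch of linear probing that reproduces the example, assuming integer keys and `None` for empty cells. The helper names are hypothetical. Removal is deliberately omitted: simply emptying a cell would cut off later items in the same lump, so a real implementation needs a special "deactivated" marker.

```python
# Sketch of open addressing with linear probing, h(x) = x mod 13 as in the
# example. Empty cells hold None; removal is omitted for brevity.
N = 13


def h(x: int) -> int:
    return x % N


def insert_item(table, k):
    i = h(k)
    for probe in range(N):      # at most N probes
        j = (i + probe) % N     # next cell, wrapping around circularly
        if table[j] is None:
            table[j] = k
            return j
    raise RuntimeError("hash table is full")


def find_element(table, k):
    i = h(k)
    for probe in range(N):
        j = (i + probe) % N
        if table[j] is None:    # an empty cell ends an unsuccessful search
            return None
        if table[j] == k:
            return j
    return None


table = [None] * N
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    insert_item(table, key)
print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Key 31 hashes to cell 5 but needs six probes (cells 5 through 10), which shows how the lump starting at cell 5 lengthens later probe sequences.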


Hash Tables – Collision Handling: Open Addressing

Linear probing causes lumping (clustering)

Alternatives:
- quadratic probing
- double hashing
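A sketch comparing the first few cells each scheme probes, assuming N = 13 and, for double hashing, the secondary hash d(k) = Q - (k mod Q) with an assumed prime Q = 7 < N. Because d(k) is never 0, double hashing always moves on to a new cell, and keys with the same h(k) usually follow different probe sequences. A caveat for quadratic probing: it is only guaranteed to find an empty cell when N is prime and the table is at most half full.

```python
# Sketch comparing the probe sequences of three open-addressing schemes.
N = 13  # table size (prime)
Q = 7   # prime smaller than N for the secondary hash (assumed)


def h(k):
    return k % N


def d(k):
    # Secondary hash for double hashing; always in [1, Q], never 0
    return Q - (k % Q)


def linear(k, j):
    return (h(k) + j) % N


def quadratic(k, j):
    return (h(k) + j * j) % N


def double_hash(k, j):
    return (h(k) + j * d(k)) % N


for name, f in [("linear", linear), ("quadratic", quadratic), ("double", double_hash)]:
    print(name, [f(31, j) for j in range(4)])
```

For key 31 (h = 5, d = 4), linear probing visits cells 5, 6, 7, 8, quadratic probing visits 5, 6, 9, 1, and double hashing visits 5, 9, 0, 4.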


Performance of Hashing

In the worst case, searches, insertions, and removals on a hash table take O(n) time

The worst case occurs when all the keys inserted into the dictionary collide

The load factor $\alpha = n/N$ affects the performance of a hash table

Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is $1/(1-\alpha)$

The expected running time of all the dictionary ADT operations in a hash table is O(1)

In practice, hashing is very fast provided the load factor is not close to 100%

Applications of hash tables: small databases, compilers, browser caches
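To see why the load factor matters, the sketch below evaluates the expected number of probes 1/(1 - α) for an open-addressing insertion at several load factors in a table with N = 100 cells. It relies on the same assumption as the formula: hash values behave like random numbers.

```python
# Expected probes for an open-addressing insertion, 1 / (1 - alpha),
# assuming hash values behave like random numbers.
def expected_probes(n: int, N: int) -> float:
    alpha = n / N  # load factor
    if alpha >= 1:
        raise ValueError("open addressing requires n < N")
    return 1 / (1 - alpha)


for n in (50, 75, 90, 99):
    print(f"alpha = {n / 100:.2f}: about {expected_probes(n, 100):.1f} probes")
```

The expected cost roughly doubles from α = 0.5 (2 probes) to α = 0.75 (4 probes), and it grows sharply near 100% (about 10 probes at α = 0.9 and 100 at α = 0.99). This is why hashing stays fast only while the load factor is kept well below 1.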
