
More on Data Structures in C

CS-2301 B-term 2008 1

More on Lists and Trees
Introduction to Hash Tables

CS-2301, System Programming for Non-majors

(Slides include materials from The C Programming Language, 2nd ed., by Kernighan and Ritchie and from C: How to Program, 5th ed., by Deitel and Deitel)


Linked List (review)

• Linear data structure

• Easy to grow and shrink

• Easy to add and delete items

• Time to search for an item – O(n)


“Big-O” notation means “order of”


Definition: Big-O

“Of the order of …”

• A characterization of the number of operations in an algorithm in terms of the number of data items involved

• O(n) means that the number of operations to complete the algorithm is proportional to n

• E.g., searching a list with n items requires, on average, n/2 comparisons with payloads
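The n/2 figure above can be checked by counting comparisons directly. The following is a minimal sketch, assuming an `int` payload and a hypothetical `search` function; neither is defined on the slides, which only name `struct listItem`.

```c
#include <stddef.h>

/* Hypothetical node layout; the int payload is an assumption. */
struct listItem {
    int payload;
    struct listItem *next;
};

/* Returns the first item whose payload equals key, or NULL if none.
   *comparisons receives the number of payload comparisons performed. */
struct listItem *search(struct listItem *head, int key, int *comparisons) {
    struct listItem *p;

    *comparisons = 0;
    for (p = head; p != NULL; p = p->next) {
        (*comparisons)++;
        if (p->payload == key)
            return p;
    }
    return NULL;
}
```

Finding the first item costs one comparison and finding the last costs n, so a key that is equally likely to be anywhere costs about n/2 on average. A key that is absent always costs n. Both grow in proportion to n, which is why the search is O(n).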


Big-O (continued)

• O(n): proportional to n, i.e., linear

• O(n^2): proportional to n^2, i.e., quadratic

• O(k^n): proportional to k^n, i.e., exponential

• …

• O(log n): proportional to log n, i.e., sublinear

• O(n log n): worse than O(n), better than O(n^2)

• O(1): independent of n, i.e., constant


Anecdote & Questions:–

• In the design of electronic adders, what is the order of the carry-propagation?

• What is the order of floating point divide?

• What is the order of floating point square root?

• What program have we studied in this course that is O(2^n), i.e., exponential?


Questions on Big-O?


Back to Linked List Review

• Linear data structure

• Easy to grow and shrink

• Easy to add and delete items

• Time to search for an item – O(n)


Linked List (continued)

struct listItem *head;

[Diagram: head points to a chain of four list items, each holding a payload and a next pointer]


Doubly-Linked List (review)

struct listItem *head, *tail;

[Diagram: four list items, each holding a payload with prev and next pointers; head points to the first item and tail to the last]


AddAfter(item *p, item *new)

Simple linked list

{
    new->next = p->next;
    p->next = new;
}

Doubly-linked list

{
    new->next = p->next;
    new->prev = p->next->prev;
    p->next = p->next->prev = new;
}


AddAfter(item *p, item *new)

Simple linked list

{
    new->next = p->next;
    p->next = new;
}

Doubly-linked list

{
    /* assumes p is not the last item, i.e., p->next != NULL */
    new->next = p->next;
    new->prev = p;
    p->next->prev = new;
    p->next = new;
}

[Diagram: three doubly-linked items, each holding a payload with prev and next pointers]


deleteNext(item *p)

Simple linked list

{
    if (p->next != NULL)
        p->next = p->next->next;
}

Doubly-linked list

• Complicated

• Easier to deleteItem


deleteItem(item *p)

Simple linked list

• Not possible without having a pointer to previous item!

Doubly-linked list

{
    if (p->next != NULL)
        p->next->prev = p->prev;
    if (p->prev != NULL)
        p->prev->next = p->next;
}

[Diagram: three doubly-linked items, each holding a payload with prev and next pointers]
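The doubly-linked operations can be put together into something that also handles the ends of the list. This is a sketch under assumptions: the `int` payload, the `typedef`, the parameter name `newItem`, and the `head`/`tail` parameters of `deleteItem` are illustrative additions, not part of the slides.

```c
#include <stddef.h>

/* Hypothetical item layout; the slides do not define item. */
typedef struct item {
    int payload;
    struct item *prev, *next;
} item;

/* Insert newItem immediately after p. Unlike the slide version, this
   also works when p is the last item (p->next == NULL). The caller is
   responsible for updating any tail pointer. */
void addAfter(item *p, item *newItem) {
    newItem->next = p->next;
    newItem->prev = p;
    if (p->next != NULL)
        p->next->prev = newItem;
    p->next = newItem;
}

/* Unlink p from the list. *head and *tail are updated when p is the
   first or last item. The item itself is not freed. */
void deleteItem(item **head, item **tail, item *p) {
    if (p->next != NULL)
        p->next->prev = p->prev;
    else
        *tail = p->prev;

    if (p->prev != NULL)
        p->prev->next = p->next;
    else
        *head = p->next;

    p->prev = p->next = NULL;
}
```

Passing `head` and `tail` by address is one design choice among several; another common one is a dummy sentinel node, which removes the NULL checks entirely.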


Special Cases of Linked Lists

• Queue:–
  – Items always added to tail
  – Items always removed from head

• Stack:–
  – Items always added to head
  – Items always removed from head
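Both special cases can be sketched on a singly-linked list. The names `queue`, `enqueue`, `dequeue`, `push`, and `pop`, and the `int` payload, are assumptions made for illustration; the slides do not give code for either structure.

```c
#include <stddef.h>

/* Hypothetical item layout. */
typedef struct item {
    int payload;
    struct item *next;
} item;

/* A queue keeps both ends so that adding at the tail is O(1). */
typedef struct {
    item *head, *tail;
} queue;

/* Queue: add at the tail. */
void enqueue(queue *q, item *it) {
    it->next = NULL;
    if (q->tail != NULL)
        q->tail->next = it;
    else
        q->head = it;          /* queue was empty */
    q->tail = it;
}

/* Queue: remove from the head; returns NULL if the queue is empty. */
item *dequeue(queue *q) {
    item *it = q->head;
    if (it != NULL) {
        q->head = it->next;
        if (q->head == NULL)
            q->tail = NULL;    /* queue became empty */
    }
    return it;
}

/* Stack: add at the head. */
void push(item **top, item *it) {
    it->next = *top;
    *top = it;
}

/* Stack: remove from the head; returns NULL if the stack is empty. */
item *pop(item **top) {
    item *it = *top;
    if (it != NULL)
        *top = it->next;
    return it;
}
```

Items come out of the queue in the order they went in, and out of the stack in the reverse order. Every operation here is O(1) because none of them walks the list.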


Bubble Sort a Linked List

item *BubbleSort(item *p) {
    if (p->next != NULL) {
        item *q = p->next, *qq = p;
        for (; q != NULL; qq = q, q = q->next)
            if (p->payload > q->payload) {
                /* swap p and q */
            }
        p->next = BubbleSort(p->next);
    };
    return p;
}


Bubble Sort a Linked List

item *BubbleSort(item *p) {
    if (p->next != NULL) {
        item *q = p->next, *qq = p;
        for (; q != NULL; qq = q, q = q->next)
            if (p->payload > q->payload) {
                item *temp = p->next;
                p->next = q->next;
                q->next = temp;
                qq->next = p;
                p = q;
            }
        p->next = BubbleSort(p->next);
    };
    return p;
}


Potential Exam Questions

• Analyze BubbleSort to determine if it is correct, and fix it if incorrect.

• Hint: you need to define “correct”

• Hint2: you need to define a loop invariant to convince yourself

• Draw a diagram showing the nodes, pointers, and actions of the algorithm


Observations:–

• What is the order of the Bubble Sort algorithm?

• Answer: O(n^2)

• Note that Quicksort is faster
  • Pages 87 & 110 in Kernighan and Ritchie

• Potential exam question:– why?
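The quadratic growth can be observed by counting comparisons. The sketch below is not the BubbleSort from the earlier slide: it is a simpler variant that swaps payloads instead of relinking nodes, and the function name, the `int` payload, and the comparison counter are all assumptions added for illustration.

```c
#include <stddef.h>

/* Hypothetical item layout. */
typedef struct item {
    int payload;
    struct item *next;
} item;

/* Sorts the list into ascending order by swapping payloads between
   adjacent items. Returns the number of payload comparisons made. */
long bubbleSortPayloads(item *head) {
    long comparisons = 0;
    item *end = NULL;   /* items from end onward are in final position */
    item *p;

    if (head == NULL)
        return 0;

    while (head->next != end) {
        /* One pass: the largest remaining payload bubbles up to the
           item just before end. */
        for (p = head; p->next != end; p = p->next) {
            comparisons++;
            if (p->payload > p->next->payload) {
                int tmp = p->payload;
                p->payload = p->next->payload;
                p->next->payload = tmp;
            }
        }
        end = p;
    }
    return comparisons;
}
```

For a list of n items the passes make (n-1) + (n-2) + … + 1 = n(n-1)/2 comparisons, so a 5-item list needs 10. Doubling n roughly quadruples the work, which is what O(n^2) predicts.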


Questions?


Binary Tree (review)

• A linked list but with two links per item

struct treeItem {
    type payload;
    struct treeItem *left;
    struct treeItem *right;
};

[Diagram: a binary tree of seven items, each holding a payload with left and right pointers]


Binary Trees (continued)

• Two-dimensional data structure

• Easy to grow and shrink

• Easy to add and delete items at leaves
  • More work needed to insert or delete branch nodes

• Search time is O(log n)
  • If tree is reasonably balanced
  • Degenerates to O(n) in worst case if unbalanced
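The difference between the balanced and the degenerate case shows up in the number of nodes a search has to visit. This sketch assumes the tree is kept ordered (a binary search tree) with an `int` payload; `insert`, `find`, and the step counter are hypothetical names, not functions from the slides.

```c
#include <stdlib.h>

struct treeItem {
    int payload;                 /* int is an assumption; slides use "type" */
    struct treeItem *left;
    struct treeItem *right;
};

/* Inserts value as a new leaf and returns the root of the subtree.
   Duplicates are ignored. If malloc fails, the tree is left unchanged. */
struct treeItem *insert(struct treeItem *root, int value) {
    if (root == NULL) {
        struct treeItem *n = malloc(sizeof *n);
        if (n != NULL) {
            n->payload = value;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (value < root->payload)
        root->left = insert(root->left, value);
    else if (value > root->payload)
        root->right = insert(root->right, value);
    return root;
}

/* Returns the node holding value, or NULL. *steps receives the number
   of nodes visited on the way. */
struct treeItem *find(struct treeItem *root, int value, int *steps) {
    *steps = 0;
    while (root != NULL) {
        (*steps)++;
        if (value == root->payload)
            return root;
        root = (value < root->payload) ? root->left : root->right;
    }
    return NULL;
}
```

Inserting 4, 2, 6, 1, 3, 5, 7 in that order produces a balanced tree in which finding 7 visits 3 nodes. Inserting 1 through 7 in sorted order produces a tree that is effectively a linked list, and finding 7 visits all 7 nodes.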


Order of Traversing Binary Trees

• In-order
  • Traverse left sub-tree (in-order)
  • Visit node itself
  • Traverse right sub-tree (in-order)

• Pre-order
  • Visit node first
  • Traverse left sub-tree
  • Traverse right sub-tree

• Post-order
  • Traverse left sub-tree
  • Traverse right sub-tree
  • Visit node last
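The three orders differ only in where the visit sits relative to the two recursive calls. In this sketch, "visiting" a node means appending its one-character payload to a buffer; the `char` payload, the function names, and the buffer convention are assumptions made so the orders can be compared side by side.

```c
#include <stddef.h>

struct treeItem {
    char payload;                /* char is an assumption for the demo */
    struct treeItem *left;
    struct treeItem *right;
};

/* Each function appends payloads to out in visiting order and returns
   the next free position in out. The caller must supply a buffer large
   enough for every node plus a terminator. */
char *inOrder(const struct treeItem *t, char *out) {
    if (t == NULL)
        return out;
    out = inOrder(t->left, out);
    *out++ = t->payload;         /* visit between the sub-trees */
    return inOrder(t->right, out);
}

char *preOrder(const struct treeItem *t, char *out) {
    if (t == NULL)
        return out;
    *out++ = t->payload;         /* visit first */
    out = preOrder(t->left, out);
    return preOrder(t->right, out);
}

char *postOrder(const struct treeItem *t, char *out) {
    if (t == NULL)
        return out;
    out = postOrder(t->left, out);
    out = postOrder(t->right, out);
    *out++ = t->payload;         /* visit last */
    return out;
}
```

For a tree with `+` at the root, a `*` node with leaves `a` and `b` on the left, and leaf `c` on the right, the three functions produce `a*b+c`, `+*abc`, and `ab*c+` respectively.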


Homework #5


Example of Binary Tree

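x = (a.real*b.imag - b.real*a.imag) / sqrt(a.real*b.real - a.imag*b.imag)

[Diagram: expression tree for this assignment. The root is =, with x on the left and / on the right; the operands of / are a - sub-tree and sqrt, and the leaves are structure members such as a.real and b.imag]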


Question

• What kind of traversal order is required for this expression?

• In-order?

• Pre-order?

• Post-order?


Binary Trees in Compilers

• Used to represent the structure of the compiled program

• Optimizations
  • Common sub-expression detection
  • Code simplification
  • Loop unrolling
  • Parallelization
  • Reductions in strength, e.g., substituting additions for multiplications, etc.
  • Many others


Questions about Trees?

(or about Homework 5?)


New Challenge

• What if we have a data structure that needs to be accessed by value in constant time?

• I.e., O(log n) is not good enough!

• Need to be able to add or delete items

• Total number of items unknown
  • But an approximate maximum might be known


Examples

• Anti-virus scanner

• Symbol table of compiler

• Virtual memory tables in operating system

• Bank account for an individual


Observation

• Arrays provide constant time access …

• … but you have to know which element you want!

• Also
  • Not easy to grow or shrink
  • Not open-ended

• Can we do better?


Answer – Hash Table

• Definition:– Hash Table
  • A data structure comprising an array (for constant time access)
  • A set of linked lists (one for each array element)
  • A hashing function to convert value to array index

• Definition:– Hashing function (or simply hash function)
  • A function that takes the value in question and “randomizes” it to produce an index
  • So that non-randomness of values does not cause concentration of too many elements around a few indices in array

• See §6.6 in Kernighan & Ritchie


Hash Table Structure

[Diagram: an array of items, where each array element heads a linked list of nodes holding data and a next pointer]


Guidelines for Hash Tables

• Lists from each item should be short
  • I.e., with short search time (approximately constant)

• Size of array should be based on expected # of entries
  • Err on large side if possible

• Hashing function
  • Should “spread out” the values relatively uniformly
  • Multiplication and division by prime numbers usually works well


Example Hashing Function

• P. 144 of K & R

#define HASHSIZE 101

unsigned int hash(char *s) {
    unsigned int hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;

    return hashval % HASHSIZE;
}


Note choice of prime numbers to “mix it up”
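A short worked example makes the role of 31 and 101 concrete. The sketch below follows the slide's function with two small changes that are assumptions of this example, not part of the original: the parameter is `const`, and each character is cast to `unsigned char` so that the result does not depend on whether plain `char` is signed.

```c
#define HASHSIZE 101

/* String hash in the style of the slide: multiply the running value by
   31, add the next character, and reduce modulo the table size. */
unsigned int hash(const char *s) {
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;

    return hashval % HASHSIZE;
}
```

With ASCII character codes, `hash("a")` is 97 % 101 = 97. For `"ab"` the running value is 98 + 31 × 97 = 3105, and 3105 % 101 = 75. The string `"ax"` gives 120 + 31 × 97 = 3127, and 3127 % 101 = 97, the same index as `"a"`. Collisions like this one are why each array element heads a linked list.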


Using a Hash Table

struct item *lookup(char *s) {
    struct item *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np; /* found */

    return NULL; /* not found */
}


Hash table is indexed by hash value of s


Traverse the linked list to find item s


Using a Hash Table (continued)

struct item *addItem(char *s, …) {
    struct item *np;
    unsigned int hv;

    if ((np = lookup(s)) == NULL) {
        np = malloc(sizeof(struct item));
        /* fill in s and data */
        np->next = hashtab[hv = hash(s)];
        hashtab[hv] = np;
    }

    return np;
}


Inserts new item at head of the list indexed by hash value
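The pieces from the last few slides can be assembled into one small table. This is a sketch under assumptions: the slides never define `struct item` or `hashtab`, so the layout here (a single `data` string per item, a file-scope array) is a guess for illustration, and `addItem` takes only the string because the remaining parameters are elided in the original.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

/* Hypothetical item layout: the key string doubles as the data. */
struct item {
    char *data;
    struct item *next;
};

/* One list head per array element; static storage starts out all NULL. */
static struct item *hashtab[HASHSIZE];

unsigned int hash(const char *s) {
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Returns the item holding s, or NULL if it is not in the table. */
struct item *lookup(const char *s) {
    struct item *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;
    return NULL;
}

/* Returns the existing item for s, or inserts a new one at the head of
   its list. Returns NULL only if memory allocation fails. */
struct item *addItem(const char *s) {
    struct item *np = lookup(s);
    unsigned int hv;

    if (np == NULL) {
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;
        np->data = malloc(strlen(s) + 1);   /* private copy of the key */
        if (np->data == NULL) {
            free(np);
            return NULL;
        }
        strcpy(np->data, s);
        hv = hash(s);
        np->next = hashtab[hv];
        hashtab[hv] = np;
    }
    return np;
}
```

Calling `addItem("a")` and then `addItem("ax")` places both items on the same list, since the two strings hash to the same index with ASCII codes; the newer item sits at the head. `lookup` still tells them apart because it compares the full string, not just the index.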


Hash Table Summary

• Widely used for constant time access

• Easy to build and maintain

• There is both an art and a science to the choice of hashing functions
  • Consult textbooks, web, etc.


Questions?