Ch8.Dictionaries Hashing.handouts

Embed Size (px)

Citation preview

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    1/7

    Dictionaries 2/19/08 14:

    2/19/08 14:15 Dictionaries 1

    Dictionaries & Hashing (Ch 8)

    01234 451-229-0004

    981-101-0002

    025-612-0001

    2/19/08 14:15 Dictionaries 2

    Outline and Reading

    Dictionary ADT (8.1)

    Hash Tables (8.2)

    Ordered Dictionaries (8.3)

    Binary search (8.3.3)

    Lookup table (8.3.2)

    Skip Lists (8.4)

    2/19/08 14:15 Dictionaries 3

    Dictionary ADT (8.1.1)The dictionary ADT models asearchable collection of key-element items

    The main operations of adictionary are searching,inserting, and deleting items

    Multiple items with the samekey are allowed

    Applications: address book

    credit card authorization

    mapping host names (e.g.,cs16.net) to internet addresses(e.g., 128.148.34.101)

    Dictionary ADT methods:

    find(k): if the dictionary has

    an item with key k, returnsthe position of this element,

    else, returns a null position.

    insertItem(k, o): inserts item(k, o) into the dictionary

    removeElement(k): if thedictionary has an item withkey k, removes it from thedictionary and returns its

    element. An error occurs ifthere is no such element.

    size(), isEmpty()

    keys(), Elements()

    2/19/08 14:15 Dictionaries 4

    Unordered Sequence- Log File (8.1.2)

    A log file is a dictionary implemented by means of an unsortedsequence

    We store the items of the dictionary in a sequence (based on a doubly-linked lists or a circular array), in arbitrary order

    Performance: insertItem takes O(1) time since we can insert the new item at the

    beginning or at the end of the sequence

    find and removeElement take O(n) time since in the worst case (the item isnot found) we traverse the entire sequence to look for an item with thegiven key

    Space - can be O(n), where n is the number of elements in the dictionary

    The log file is effective only for dictionaries of small size or fordictionaries on which insertions are the most common operations, whilesearches and removals are rarely performed (e.g., historical record oflogins to a workstation)

    2/19/08 14:15 Dictionaries 5

    Direct Address Table

    A direct address table is a dictionary in which

    The keys are in the range {0,1,2,,N}

    Stored in an array of size N - T[0,N]

    Item with key k stored in T[k]

    Performance: insertItem, find, and removeElement all take O(1) time

    Space - requires space O(N), independent of n, the number ofitems stored in the dictionary

    The direct address table is not space efficient unless the rangeof the keys is close to the number of elements to be stored inthe dictionary, I.e., unless n is close to N.

    2/19/08 14:15 Dictionaries 6

    Hash Tables

    Hashing

    Hash table (an array) of size N, H[0,N]

    Hash function h that maps keys to indices in H

    Issues Hash functions - need method to transform key to an index in H

    that will have nice properties.

    Collisions - some keys will map to the same index of H (otherwisewe have a Direct Address Table). Several methods to resolve thecollisions

    Chaining - put elts that hash to same location in a linked list

    Open addressing - if a collision occurs, have a method to select anotherlocation in the table.

    Probe sequences

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    2/7

    Dictionaries 2/19/08 14:

    2/19/08 14:15 Dictionaries 7

    Hash Functions andHash Tables (8.2)

    Ahash functionh maps keys of a given type to

    integers in a fixed interval [0,N1]Example:

    h(x) =xmodN

    is a hash function for integer keys

    The integer h(x) is called the hash value of key x

    A hash table for a given key type consists of

    Hash function h

    Array (called table) of size N

    When implementing a dictionary with a hash table, the goal is tostore item (k, o) at index i=h(k)

    2/19/08 14:15 Dictionaries 8

    Example

    We design a hash table for

    a dictionary storing items(SSN, Name), where SSN(social security number) is anine-digit positive integer

    Our hash table uses anarray of sizeN= 10,000 andthe hash functionh(x) = last four digits ofx

    0

    1234

    999799989999

    451-229-0004

    981-101-0002

    200-751-9998

    025-612-0001

    2/19/08 14:15 Dictionaries 9

    Hash Functions (8.2.2)

    A hash function isusually specified as the

    composition of twofunctions:

    Hash code map: h1:keysintegers

    Compression map: h2: integers [0,N1]

    The hash code map isapplied first, and thecompression map isapplied next on theresult, i.e.,

    h(x) = h2(h1(x))

    The goal of the hashfunction is todisperse the keys inan apparently randomway

    2/19/08 14:15 Dictionaries 10

    Hash Code Maps (8.2.3)Memory address: We reinterpret the memory

    address of the key object asan integer

    Good in general, except fornumeric and string keys

    Integer cast: We reinterpret the bits of the

    key as an integer

    Suitable for keys of lengthless than or equal to thenumber of bits of the integertype (e.g., char, short, intand float on many machines)

    Component sum: We partition the bits of

    the key into componentsof fixed length (e.g., 16or 32 bits) and we sumthe components(ignoring overflows)

    Suitable for numeric keysof fixed length greaterthan or equal to thenumber of bits of theinteger type (e.g., longand double on many

    machines)

    2/19/08 14:15 Dictionaries 11

    Hash Code Maps (cont.)

    Polynomial accumulation: We partition the bits of the key

    into a sequence of componentsof fixed length (e.g., 8, 16 or 32bits)

    a0 a1 an1 We evaluate the polynomial

    p(z)= a0+a1z +a2z2+

    +an1zn1

    at a fixed valuez, ignoringoverflows

    Especially suitable for strings(e.g., the choice z=33gives atmost 6 collisions on a set of50,000 English words)

    Polynomial p(z) can beevaluated in O(n) time

    using Horners rule:

    The followingpolynomials aresuccessively computed,each from the previousone in O(1) time

    p0(z)= an1

    pi(z)= ani1+zpi1(z)

    (i=1, 2, , n 1)

    We havep(z) = pn1(z)

    2/19/08 14:15 Dictionaries 12

    Compression Maps (8.2.4)

    Division: h2 (y) = y mod N

    The sizeNof thehash table is usuallychosen to be a prime

    The reason has to dowith number theory

    and is beyond thescope of this course

    Multiply, Add andDivide (MAD): h2 (y) =(ay + b)mod N

    a and b arenonnegative integerssuch thata mod N 0

    Otherwise, everyinteger would map tothe same value b

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    3/7

    Dictionaries 2/19/08 14:

    2/19/08 14:15 Dictionaries 13

    Collision Handling(8.2.5)

    Collisions occur whendifferent elements aremapped to the samecell

    Chaining: let eachcell in the table pointto a linked list ofelements that mapthere

    Chaining is simple,but requiresadditional memoryoutside the table

    0

    1234 451-229-0004 981-101-0004

    025-612-0001

    2/19/08 14:15 Dictionaries 14

    Exercise: chaining

    Assume you have a hash table H withN=9 slots (H[0,8]) and let the hashfunction be h(k)=k mod N.

    Demonstrate (by picture) the insertionof the following keys into a hash tablewith collisions resolved by chaining. 5, 28, 19, 15, 20, 33, 12, 17, 10

    2/19/08 14:15 Dictionaries 15

    Linear ProbingOpen addressing: thecolliding item is placed in adifferent cell of the table

    Linear probing handlescollisions by placing thecolliding item in the next(circularly) available table cell.So the i-th cell checked is: H(k,i) = (h(k)+i)mod N

    Each table cell inspected isreferred to as a probe

    Colliding items lump together,causing future collisions tocause a longer sequence of

    probes

    Example: h(x) = xmod13

    Insert keys 18, 41, 22,44, 59, 32, 31, 73, in thisorder

    0 1 2 3 4 5 6 7 8 9 10 11 12

    41 18445932223173

    0 1 2 3 4 5 6 7 8 9 10 11 12

    2/19/08 14:15 Dictionaries 16

    Search with Linear Probing

    Consider a hash tableAthat uses linear probing

    find(k) We start at cell h(k)

    We probe consecutivelocations until one of thefollowing occurs An item with key kis

    found, or

    An empty cell is found,or

    Ncells have beenunsuccessfully probed

    Algorithmfind(k)

    i h(k)

    p0

    repeat

    c A[i]

    ifc=

    returnPosition(null)

    else ifc.key () =k

    returnPosition(c)

    else

    i(i+1)mod N

    p p+1

    until p=N

    returnPosition(null)

    2/19/08 14:15 Dictionaries 17

    Updates with Linear ProbingTo handle insertions anddeletions, we introduce aspecial object, called

    AVAILABLE, which replacesdeleted elements

    removeElement(k) We search for an item with

    key k

    If such an item (k, o) isfound, we replace it with thespecial itemAVAILABLEand we return the position ofthis item

    Else, we return a nullposition

    insertItem(k, o) We throw an exception

    if the table is full

    We start at cell h(k)

    We probe consecutivecells until one of thefollowing occurs A cell i is found that is

    either empty or storesAVAILABLE, or

    Ncells have beenunsuccessfully probed

    We store item (k, o) incell i

    2/19/08 14:15 Dictionaries 18

    Exercise: open addressing &linear probing

    Assume you have a hash table H withN=11 slots (H[0,10]) and let the hashfunction be h(k)=k mod N.

    Demonstrate (by picture) the insertionof the following keys into a hash tablewith collisions resolved by linearprobing. 10, 22, 31, 4, 15, 28, 17, 88, 59

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    4/7

    Dictionaries 2/19/08 14:

    2/19/08 14:15 Dictionaries 19

    Double HashingDouble hashing uses asecondary hash function d(k)

    and handles collisions byplacing an item in the firstavailable cell of the seriesh(k,i) =(h(k)+id(k)) modNfor i= 0, 1, ,N1

    The secondary hash functiond(k) cannot have zero values

    The table size Nmust be aprime to allow probing of allthe cells

    Common choice of

    compression map for thesecondary hash function:

    d2(k) =qkmod q

    where q

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    5/7

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    6/7

    Dictionaries 2/19/08 14:

    2/19/08 14:15 Dictionaries 31

    What is a Skip ListAskip list for a set Sof distinct (key, element) items is a series of lists S0,S1 , ,Sh such that

    Each listSi contains the special keys + and ListS0 contains the keys ofSin nondecreasing order

    Each list is a subsequence of the previous one, i.e.,S0S1 Sh

    ListSh contains only the two special keys

    We show how to use a skip list to implement the dictionary ADT

    56 64 78 +31 34 44 12 23 26

    +

    +31

    64 +31 34 23

    S0

    S1

    S2

    S3

    2/19/08 14:15 Dictionaries 32

    SearchWe search for a keyxin a a skip list as follows:

    We start at the first position of the top list

    At the current positionp, we comparexwithy

    key(after(p))x= y: we return element(after(p))

    x> y: we scan forward

    x< y: we drop down

    If we try to drop down past the bottom list, we return NO_SUCH_KEY

    Example: search for 78

    +

    S0

    S1

    S2

    S3

    +31

    64 +31 34 23

    56 64 78 +31 34 44 12 23 26

    2/19/08 14:15 Dictionaries 33

    Randomized AlgorithmsArandomized algorithmperforms coin tosses (i.e.,uses random bits) to controlits execution

    It contains statements of thetype

    brandom()

    if b= 0

    do A

    else { b= 1}

    do B

    Its running time depends onthe outcomes of the cointosses

    We analyze the expectedrunning time of a randomizedalgorithm under the followingassumptions

    the coins are unbiased, and

    the coin tosses are independent

    The worst-case running time ofa randomized algorithm is oftenlarge but has very lowprobability (e.g., it occurs whenall the coin tosses give heads)

    We use a randomized algorithmto insert items into a skip list

    2/19/08 14:15 Dictionaries 34

    To insert an item (x, o) into a skip list, we use a randomizedalgorithm: We repeatedly toss a coin until we get tails, and we denote with i

    the number of times the coin came up heads

    Ifi h, we add to the skip list new lists Sh+1, ,Si+1, eachcontaining only the two special keys

    We search for xin the skip list and find the positions p0, p1 , ,piof the items with largest key less thanxin each list S0,S1, ,Si

    For j 0, , i, we insert item (x, o) into list Sj after positionpjExample: insert key 15, with i= 2

    Insertion

    + 10 36

    +

    23

    23 +

    S0

    S1

    S2

    +

    S0

    S1

    S2

    S3

    + 10 362315

    + 15

    + 2315p0

    p1

    p2

    2/19/08 14:15 Dictionaries 35

    Deletion

    To remove an item with key xfrom a skip list, we proceed asfollows:

    We search for xin the skip list and find the positions p0, p1 , ,piof the items with key x, where positionpj is in list Sj

    We remove positions p0, p

    1, ,p

    i

    from the listsS0,S

    1, ,S

    i

    We remove all but one list containing only the two special keys

    Example: remove key 34

    +4512

    +

    23

    23 +

    S0

    S1

    S2

    +

    S0

    S1

    S2

    S3

    +4512 23 34

    +34

    +23 34p0

    p1

    p2

    2/19/08 14:15 Dictionaries 36

    Implementation

    We can implement a skip listwith quad-nodes

    A quad-node stores:

    item

    link to the node before link to the node after

    link to the node below

    link to the node after

    Also, we define special keysPLUS_INF and MINUS_INF,and we modify the keycomparator to handle them

    x

    quad-node

  • 8/8/2019 Ch8.Dictionaries Hashing.handouts

    7/7

    Dictionaries 2/19/08 14:

    2/19/08 14:15 Dictionaries 37

    Space Usage

    The space used by a skip listdepends on the random bits

    used by each invocation of theinsertion algorithm

    We use the following two basicprobabilistic facts:Fact 1: The probability of getting i

    consecutive heads whenflipping a coin is 1/2i

    Fact 2: If each ofn items ispresent in a set withprobabilityp, the expected sizeof the set is np

    Consider a skip list with n items

    By Fact 1, we insert an item in

    list Si with probability 1/

    2i

    By Fact 2, the expected size oflist Si is n/2

    i

    The expected number of nodes

    used by the skip list is

    nnn

    h

    i

    i

    h

    i

    i2

    2

    1

    200