Design and Analysis of Algorithms - Chapter 7

Space-time tradeoffs

For many problems some extra space really pays off (extra space in tables - breathing room).

input enhancement
• non comparison-based sorting
• auxiliary tables (shift tables for pattern matching)

prestructuring
• hashing
• indexing schemes (e.g., B-trees)

tables of information that do all the work
• dynamic programming


Sorting by Counting

Algorithm ComparisonCountingSort(A[0..n-1])
for i ← 0 to n-1 do Count[i] ← 0
for i ← 0 to n-2 do
    for j ← i+1 to n-1 do
        if A[i] < A[j] then Count[j] ← Count[j]+1
        else Count[i] ← Count[i]+1
for i ← 0 to n-1 do S[Count[i]] ← A[i]

Example: 62 31 84 96 19 47

Efficiency: n(n-1)/2 key comparisons, i.e. Θ(n²), plus Θ(n) extra space for Count and S
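The pseudocode above can be sketched in Python roughly as follows. The function name and the choice to return the counts alongside the sorted list are illustrative assumptions, not part of the slides.

```python
def comparison_counting_sort(a):
    """Sort by counting, for each element, how many elements belong before it."""
    n = len(a)
    count = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1   # a[j] is larger, so it moves one slot right
            else:
                count[i] += 1   # a[i] is larger or equal, so it moves right
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]      # count[i] is the final position of a[i]
    return s, count


# The slide's example list
result, counts = comparison_counting_sort([62, 31, 84, 96, 19, 47])
print(counts)   # [3, 1, 4, 5, 0, 2]
print(result)   # [19, 31, 47, 62, 84, 96]
```

Each count equals the number of elements that must precede the element in sorted order, which is why it can be used directly as an index into S.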


Sorting by Counting (2)

Algorithm DistributionCountingSort(A[0..n-1], l, u)
// the elements of A are integers between l and u
for j ← 0 to u-l do D[j] ← 0
for i ← 0 to n-1 do D[A[i]-l] ← D[A[i]-l] + 1
for j ← 1 to u-l do D[j] ← D[j-1] + D[j]
for i ← n-1 downto 0 do
    j ← A[i]-l; S[D[j]-1] ← A[i]; D[j] ← D[j]-1

Example: 13 11 12 13 12 12

Efficiency: Θ(n + (u-l)) time, i.e. linear when the range of values is fixed; no key comparisons
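A minimal Python sketch of distribution counting, assuming the values are integers known to lie between `low` and `high` (these parameter names stand in for l and u and are chosen here for readability):

```python
def distribution_counting_sort(a, low, high):
    """Sort integers in the range low..high by distribution counting."""
    d = [0] * (high - low + 1)
    for x in a:                        # frequencies
        d[x - low] += 1
    for j in range(1, len(d)):         # running totals (distribution)
        d[j] += d[j - 1]
    s = [None] * len(a)
    for x in reversed(a):              # right-to-left scan keeps the sort stable
        j = x - low
        s[d[j] - 1] = x
        d[j] -= 1
    return s


# The slide's example list, with values between 11 and 13
print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
# [11, 12, 12, 12, 13, 13]
```

The extra array D is the space paid for avoiding comparisons altogether.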


String matching

pattern: a string of m characters to search for
text: a (long) string of n characters to search in

Brute force algorithm:
1. Align the pattern at the beginning of the text.
2. Moving from left to right, compare each character of the pattern to the corresponding character in the text until
   – all characters are found to match (successful search); or
   – a mismatch is detected.
3. While the pattern is not found and the text is not yet exhausted, realign the pattern one position to the right and repeat step 2.
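The three steps above might look like this in Python. This is a sketch; the function name and the convention of returning -1 on failure are assumptions made for the example.

```python
def brute_force_match(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    for i in range(n - m + 1):          # steps 1 and 3: each alignment in turn
        k = 0
        while k < m and pattern[k] == text[i + k]:   # step 2: left to right
            k += 1
        if k == m:                      # all m characters matched
            return i
    return -1


print(brute_force_match("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))   # 16
```

No extra space is used here, and the pattern always moves by exactly one position, which is what the table-based algorithms on the following slides improve on.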


String searching - History

1970: Cook shows (using finite-state machines) that the problem can be solved in time proportional to n+m.

1976: Knuth and Pratt find an algorithm based on Cook's idea; Morris independently discovers the same algorithm in an attempt to avoid "backing up" over the text.

At about the same time, Boyer and Moore find an algorithm that examines only a fraction of the text in most cases (by comparing characters in pattern and text from right to left, instead of left to right).

1980: Another algorithm, proposed by Rabin and Karp, virtually always runs in time proportional to n+m and has the advantage of extending easily to two-dimensional pattern matching and being almost as simple as the brute-force method.


Horspool's Algorithm

A simplified version of the Boyer-Moore algorithm that retains its key insights:
• compare pattern characters to text from right to left
• given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)


How far to shift?

Look at the first (rightmost) character in the text that was compared. Three cases:

• The character is not in the pattern

    .....c......................  (c not in pattern)
    BAOBAB

• The character is in the pattern (but not at the rightmost position)

    .....O......................  (O occurs once in pattern)
    BAOBAB

    .....A......................  (A occurs twice in pattern)
    BAOBAB

• The rightmost characters produced a match

    .....B......................
    BAOBAB

Shift table: stores the number of characters to shift by, depending on the first character compared.


Shift table

Constructed by scanning the pattern before the search begins.

All entries are initialized to the length of the pattern, m.

For each c occurring among the first m-1 characters of the pattern, update the table entry to the distance of the rightmost such occurrence of c from the end of the pattern.

Algorithm ShiftTable(P[0..m-1])
for i ← 0 to size-1 do Table[i] ← m
for j ← 0 to m-2 do Table[P[j]] ← m-1-j
return Table


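Shift table

Example for pattern BAOBAB:

Initially, every entry is set to the pattern length, 6:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

Then, after scanning the first five characters of the pattern:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6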
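As a rough illustration, the ShiftTable construction can be written in Python with a dictionary standing in for the array indexed by character. The `alphabet` parameter and its uppercase default are assumptions made for this example.

```python
import string


def shift_table(pattern, alphabet=string.ascii_uppercase):
    """Build Horspool's shift table for pattern over the given alphabet."""
    m = len(pattern)
    table = {c: m for c in alphabet}       # default shift: whole pattern length
    for j in range(m - 1):                 # the last character is skipped
        table[pattern[j]] = m - 1 - j      # later occurrences overwrite earlier
    return table


table = shift_table("BAOBAB")
print(table["A"], table["B"], table["O"], table["C"])   # 1 2 3 6
```

Scanning left to right means the rightmost occurrence of each character writes last, so no explicit "rightmost" search is needed.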


The Algorithm

Algorithm HorspoolMatching(P[0..m-1], T[0..n-1])
ShiftTable(P[0..m-1])
i ← m-1
while i ≤ n-1 do
    k ← 0
    while k ≤ m-1 and P[m-1-k] = T[i-k] do
        k ← k+1
    if k = m return i-m+1
    else i ← i + Table[T[i]]
return -1
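A Python sketch of the matching loop follows. It builds the shift table inline as a dictionary and treats any character missing from it as having shift m; that shortcut is an assumption of this example rather than something the slides specify.

```python
def horspool_matching(pattern, text):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    table = {}
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    i = m - 1                              # text index under the pattern's last char
    while i <= n - 1:
        k = 0                              # number of characters matched so far
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        i += table.get(text[i], m)         # shift by the entry for text[i]
    return -1


print(horspool_matching("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))   # 16
```

On this text the pattern is tried at only five alignments (ending at text positions 5, 11, 13, 19 and 21) before the match is found.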


Boyer-Moore algorithm

Based on the same two ideas:
• compare pattern characters to text from right to left
• given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)

Uses an additional shift table with the same idea applied to the number of matched characters


The bad-symbol shift

Based on the Horspool idea of using the extra table. However, the shift is computed differently:

• If c, the text character aligned with the last pattern character, is not in the pattern, then shift in the same way as Horspool, by m characters (here c itself is the bad symbol)
• If the mismatching character (the bad symbol) does not appear in the pattern, then shift the pattern just past it
• If the mismatching character (the bad symbol) appears in the pattern, then shift to align the bad symbol with the same character in the pattern (lying to the left of the mismatching position).


The bad-symbol shift - example

The bad symbol IS NOT in the pattern

    ......SER......................
       BARBER
           BARBER                     shift 4 positions

The bad symbol IS in the pattern

    ......AER......................
       BARBER
         BARBER                       shift 2 positions

This shift is given by d1 = max[t1(c)-k, 1], where t1 is the Horspool table, c is the bad symbol, and k is the number of matched characters (the distance of the mismatch from the end of the pattern).
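The formula can be checked against the two BARBER examples with a few lines of Python. The helper name `bad_symbol_shift` is made up for this sketch, and characters absent from the pattern are assumed to have t1 equal to m.

```python
def bad_symbol_shift(pattern, c, k):
    """d1 = max(t1(c) - k, 1), with t1 built as in Horspool's shift table."""
    m = len(pattern)
    t1 = {pattern[j]: m - 1 - j for j in range(m - 1)}   # rightmost occurrence wins
    return max(t1.get(c, m) - k, 1)


# Two characters (ER) matched before the mismatch, so k = 2
print(bad_symbol_shift("BARBER", "S", 2))   # 4  (S is not in the pattern: 6 - 2)
print(bad_symbol_shift("BARBER", "A", 2))   # 2  (t1(A) = 4: 4 - 2)
```

The max with 1 guards against a zero or negative value, which would otherwise stall the search or move the pattern backwards.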


The good-suffix shift - example

What happens if a matched suffix appears again in the pattern (e.g. ABRACADABRA)?

It is important to find another occurrence of the suffix that is preceded by a different character. Calculate the shift as the distance between the two occurrences of the suffix.

If there is no such occurrence, it is also important to find the longest prefix of size l<k that matches the suffix of size l. Calculate the shift as the distance between the suffix and the prefix.


The good-suffix shift - example (2)

Pattern ABCBAB (the last k characters form the matched suffix):

k   pattern   d2
1   ABCBAB    2
2   ABCBAB    4
3   ABCBAB    4
4   ABCBAB    4
5   ABCBAB    4

Pattern BAOBAB:

k   pattern   d2
1   BAOBAB    2
2   BAOBAB    5
3   BAOBAB    5
4   BAOBAB    5
5   BAOBAB    5
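The d2 values in these tables can be reproduced with a direct, unoptimized Python sketch of the two rules. It is written for clarity rather than speed (practical implementations build this table in linear time), and the function name is an assumption of this example.

```python
def good_suffix_table(pattern):
    """Return {k: d2} for k = 1..m-1 matched characters."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        start = m - k                       # where the matched suffix begins
        suffix = pattern[start:]
        shift = None
        # Rule 1: rightmost other occurrence of the suffix that is not
        # preceded by the same character as the suffix itself.
        for s in range(start - 1, -1, -1):
            different_prev = s == 0 or pattern[s - 1] != pattern[start - 1]
            if pattern[s:s + k] == suffix and different_prev:
                shift = start - s
                break
        # Rule 2: otherwise, longest prefix of size l < k equal to the
        # suffix of size l; if there is none, shift by the whole pattern.
        if shift is None:
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2


print(good_suffix_table("ABCBAB"))   # {1: 2, 2: 4, 3: 4, 4: 4, 5: 4}
print(good_suffix_table("BAOBAB"))   # {1: 2, 2: 5, 3: 5, 4: 5, 5: 5}
```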


Final rule for Boyer-Moore

Calculate the shift as

    d = d1             if k = 0
    d = max(d1, d2)    if k > 0

where d1 = max(t1(c)-k, 1)

Example

BESS_KNEW_ABOUT_BAOBABS
BAOBAB
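Putting the two shifts together, a compact Python sketch of the whole search could look like the following. The helper names are hypothetical, and the good-suffix table is built by the same brute-force rules as the tables on the previous slide, not by an efficient preprocessing step.

```python
def bad_symbol_table(pattern):
    m = len(pattern)
    return {pattern[j]: m - 1 - j for j in range(m - 1)}


def good_suffix_table(pattern):
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        start, shift = m - k, None
        for s in range(start - 1, -1, -1):      # another occurrence of the suffix
            if pattern[s:s + k] == pattern[start:] and (
                    s == 0 or pattern[s - 1] != pattern[start - 1]):
                shift = start - s
                break
        if shift is None:                       # fall back to a matching prefix
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2


def boyer_moore_matching(pattern, text):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    t1, d2 = bad_symbol_table(pattern), good_suffix_table(pattern)
    i = m - 1
    while i <= n - 1:
        k = 0
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        c = text[i - k]                         # the bad symbol
        d1 = max(t1.get(c, m) - k, 1)
        i += d1 if k == 0 else max(d1, d2[k])
    return -1


print(boyer_moore_matching("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))   # 16
```

Traced by hand on the slide's example, the shifts are 6 (k = 0, bad symbol K), then max(4, 5) = 5 (k = 2), then max(5, 2) = 5 (k = 1), after which the pattern matches at position 16.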


Hashing

A very efficient method for implementing a dictionary, i.e., a set with the operations:
– insert
– find
– delete

Applications:
– databases
– symbol tables


Hash tables and hash functions

Hash table: an array with indices that correspond to buckets

Hash function: determines the bucket for each record

Example: student records, key = SSN. Hash function:

    h(k) = k mod m

(k is a key and m is the number of buckets)
• if m = 1000, where is the record with SSN = 315-17-4251 stored?

Hash function must:
• be easy to compute
• distribute keys evenly throughout the table
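To make the SSN question concrete, here is a small Python sketch. It assumes the key is the nine-digit number obtained by dropping the hyphens, which the slide does not state explicitly; the function name is also illustrative.

```python
def hash_ssn(ssn, m=1000):
    """h(k) = k mod m, with k taken as the SSN read as one integer."""
    k = int(ssn.replace("-", ""))
    return k % m


print(hash_ssn("315-17-4251"))   # 251
```

With m = 1000 the bucket is simply the last three digits of the key, which is easy to compute but spreads keys evenly only if those digits are themselves evenly distributed.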


Collisions

If h(k1) = h(k2) for two different keys k1 and k2, then there is a collision.
Good hash functions result in fewer collisions.
Collisions can never be completely eliminated.
Two types of hashing handle collisions differently:

• Open hashing - bucket points to linked list of all keys hashing to it.

• Closed hashing – one key per bucket; in case of a collision, find another bucket for one of the keys

– linear probing: use next bucket

– double hashing: use second hash function to compute increment


Open hashing

If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m (the load factor).
Average number of probes: successful search S = 1 + α/2, unsuccessful search U = α.
Worst case is still linear!
Open hashing still works if n>m.
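A minimal sketch of open hashing in Python, assuming integer keys and h(k) = k mod m. Python lists stand in for the linked lists here, and the class name is made up for this example.

```python
class ChainedHashTable:
    """Open hashing: each bucket holds all keys that hash to it."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        bucket = self.buckets[key % self.m]
        if key not in bucket:
            bucket.append(key)

    def find(self, key):
        return key in self.buckets[key % self.m]

    def delete(self, key):
        bucket = self.buckets[key % self.m]
        if key in bucket:
            bucket.remove(key)


table = ChainedHashTable(3)
for key in range(10):            # n = 10 keys in m = 3 buckets still works
    table.insert(key)
print([len(b) for b in table.buckets])   # [4, 3, 3]
```

The chain lengths average n/m, so every operation stays cheap as long as the load factor is kept small.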


Closed hashing

Does not work if n>m.
Avoids pointers.
Deletions are not straightforward.
Number of probes to insert/find/delete a key depends on the load factor α = n/m (hash table density); for linear probing:
– successful search: (½)(1 + 1/(1-α))
– unsuccessful search: (½)(1 + 1/(1-α)²)

As the table gets filled (α approaches 1), the number of probes increases dramatically.
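The sketch below evaluates the two formulas for a few load factors and shows a bare-bones linear-probing table. Both the class and the function names are hypothetical, keys are assumed to be integers hashed with k mod m, and deletion is left out because, as noted above, it is not straightforward.

```python
def expected_probes(alpha):
    """Approximate probe counts for linear probing at load factor alpha < 1."""
    successful = 0.5 * (1 + 1 / (1 - alpha))
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return successful, unsuccessful


class LinearProbingTable:
    """Closed hashing: one key per bucket, collisions go to the next free bucket."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m
        self.n = 0

    def find(self, key):
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None:      # an empty bucket ends the search
                return False
            if self.slots[i] == key:
                return True
            i = (i + 1) % self.m           # linear probing: try the next bucket
        return False

    def insert(self, key):
        if self.find(key):
            return
        if self.n == self.m:
            raise OverflowError("table is full: closed hashing needs n <= m")
        i = key % self.m
        while self.slots[i] is not None:
            i = (i + 1) % self.m
        self.slots[i] = key
        self.n += 1


for alpha in (0.5, 0.75, 0.9):
    s, u = expected_probes(alpha)
    print(f"alpha={alpha}: successful {s:.1f}, unsuccessful {u:.1f}")
# alpha=0.5: successful 1.5, unsuccessful 2.5
# alpha=0.75: successful 2.5, unsuccessful 8.5
# alpha=0.9: successful 5.5, unsuccessful 50.5
```

Going from a half-full table to one that is 90% full multiplies the cost of an unsuccessful search by about twenty, which is why closed-hashing tables are usually kept well below full.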
