Design and Analysis of Algorithms - Chapter 7 1
Space-time tradeoffs
For many problems some extra space really pays off (extra space in tables, "breathing room"):
• input enhancement
  – non-comparison-based sorting
  – auxiliary tables (shift tables for pattern matching)
• prestructuring
  – hashing
  – indexing schemes (e.g., B-trees)
• tables of information that do all the work
  – dynamic programming

Sorting by Counting
Algorithm ComparisonCountingSort(A[0..n-1])
  for i ← 0 to n-1 do Count[i] ← 0
  for i ← 0 to n-2 do
    for j ← i+1 to n-1 do
      if A[i] < A[j] then Count[j] ← Count[j]+1
      else Count[i] ← Count[i]+1
  for i ← 0 to n-1 do S[Count[i]] ← A[i]
Example: 62 31 84 96 19 47
Efficiency: n(n-1)/2 key comparisons, i.e. Θ(n²), plus Θ(n) extra space for Count and S
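The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation; the function name `comparison_counting_sort` is an assumption made for this example.

```python
def comparison_counting_sort(a):
    """Sort a list by counting, for each element, how many elements are smaller."""
    n = len(a)
    count = [0] * n
    # Compare every pair once; the larger element (or, on ties, the earlier one)
    # gets its count increased.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1
            else:
                count[i] += 1
    # count[i] is now the final position of a[i] in the sorted list.
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]
    return s


print(comparison_counting_sort([62, 31, 84, 96, 19, 47]))
```

On the slide's example the counts come out as 3, 1, 4, 5, 0, 2, so 19 lands in position 0 and 96 in position 5.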

Sorting by Counting (2)
Algorithm DistributionCountingSort(A[0..n-1])
  (the elements of A are integers between l and u)
  for j ← 0 to u-l do D[j] ← 0
  for i ← 0 to n-1 do D[A[i]-l] ← D[A[i]-l] + 1
  for j ← 1 to u-l do D[j] ← D[j-1] + D[j]
  for i ← n-1 down to 0 do
    j ← A[i]-l; S[D[j]-1] ← A[i]; D[j] ← D[j]-1
Example: 13 11 12 13 12 12
Efficiency: Θ(n + u - l) time, which is linear when the range of values u - l is O(n)
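A minimal Python sketch of the distribution counting pseudocode above. The function name and the parameter names `lo` and `hi` (standing for the bounds l and u) are assumptions made for this example.

```python
def distribution_counting_sort(a, lo, hi):
    """Sort integers known to lie in the range lo..hi (inclusive)."""
    d = [0] * (hi - lo + 1)
    # Count the frequency of each value.
    for x in a:
        d[x - lo] += 1
    # Turn frequencies into a running distribution: d[j] is now the number
    # of elements with value <= lo + j.
    for j in range(1, hi - lo + 1):
        d[j] += d[j - 1]
    # Place elements from right to left so equal values keep their order.
    s = [None] * len(a)
    for x in reversed(a):
        j = x - lo
        s[d[j] - 1] = x
        d[j] -= 1
    return s


print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
```

For the slide's example the frequencies are 1, 3, 2 and the distribution values are 1, 4, 6, which give the last positions of 11, 12 and 13 in the sorted array.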

String matching
• pattern: a string of m characters to search for
• text: a (long) string of n characters to search in
Brute force algorithm:
1. Align pattern at beginning of text.
2. Moving from left to right, compare each character of pattern to the corresponding character in text until
   – all characters are found to match (successful search); or
   – a mismatch is detected.
3. While pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2.
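The three steps above can be sketched in Python as below. The function name `brute_force_match` and the sample strings are illustrative assumptions; the function returns the index of the first match, or -1 if there is none.

```python
def brute_force_match(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    # Try every alignment of the pattern against the text, left to right.
    for i in range(n - m + 1):
        k = 0
        # Compare characters left to right until a mismatch or a full match.
        while k < m and pattern[k] == text[i + k]:
            k += 1
        if k == m:
            return i
    return -1


print(brute_force_match("NOT", "NOBODY_NOTICED_HIM"))
```

In the worst case every alignment needs m comparisons, so the sketch makes on the order of n·m character comparisons.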

String searching - History
• 1970: Cook shows (using finite-state machines) that the problem can be solved in time proportional to n+m.
• 1976: Knuth and Pratt find an algorithm based on Cook’s idea; Morris independently discovers the same algorithm in an attempt to avoid “backing up” over the text.
• At about the same time, Boyer and Moore find an algorithm that examines only a fraction of the text in most cases (by comparing characters in pattern and text from right to left, instead of left to right).
• 1980: Another algorithm, proposed by Rabin and Karp, virtually always runs in time proportional to n+m, extends easily to two-dimensional pattern matching, and is almost as simple as the brute-force method.

Horspool’s Algorithm
A simplified version of the Boyer-Moore algorithm that retains its key insights:
• compare pattern characters to text from right to left
• given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)

How far to shift?
Look at the first (rightmost) character in the text that was compared. Three cases:
• The character is not in the pattern
  .....c......................   (c not in pattern)
  BAOBAB
• The character is in the pattern (but not at the rightmost position)
  .....O......................   (O occurs once in pattern)
  BAOBAB
  .....A......................   (A occurs twice in pattern)
  BAOBAB
• The rightmost characters produced a match
  .....B......................
  BAOBAB
Shift Table: stores the number of characters to shift by, depending on the first character compared

Shift table
• Constructed by scanning the pattern before the search begins.
• All entries are initialized to the length of the pattern.
• For c occurring in the pattern, update the table entry to the distance of the rightmost occurrence of c (among the first m-1 characters) from the end of the pattern.
Algorithm ShiftTable(P[0..m-1])
  for i ← 0 to size-1 do Table[i] ← m
  for j ← 0 to m-2 do Table[P[j]] ← m-1-j
  return Table
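The ShiftTable pseudocode above can be sketched in Python with a dictionary in place of an array indexed by character. The function name and the default `alphabet` argument (uppercase letters plus underscore) are assumptions made for this example.

```python
def shift_table(pattern, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ_"):
    """Build Horspool's shift table for pattern over the given alphabet."""
    m = len(pattern)
    # Every character starts with the full pattern length as its shift.
    table = {c: m for c in alphabet}
    # Scan all but the last pattern character, left to right, so the
    # rightmost occurrence of each character overwrites earlier ones.
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    return table


t = shift_table("BAOBAB")
print(t["A"], t["B"], t["O"], t["C"])
```

For the pattern BAOBAB this gives a shift of 1 for A, 2 for B, 3 for O, and 6 for every other character.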

Shift table
Example for pattern BAOBAB. Initially every entry is the pattern length, 6:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Then, after scanning the first five characters of the pattern:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6

The Algorithm
Algorithm HorspoolMatching(P[0..m-1], T[0..n-1])
  ShiftTable(P[0..m-1])
  i ← m-1
  while i <= n-1 do
    k ← 0
    while k <= m-1 and P[m-1-k] = T[i-k] do
      k ← k+1
    if k = m return i-m+1
    else i ← i + Table[T[i]]
  return -1
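A Python sketch of the matching pseudocode above. It is an illustration under two assumptions: the function names are chosen for this example, and the shift table is stored as a dictionary holding only the characters of the pattern, with the pattern length used as the default shift for any other character.

```python
def shift_table(pattern):
    """Shift for each character among the first m-1 pattern characters."""
    m = len(pattern)
    # Later positions overwrite earlier ones, so the rightmost occurrence wins.
    return {pattern[j]: m - 1 - j for j in range(m - 1)}


def horspool_match(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    table = shift_table(pattern)
    i = m - 1  # text index aligned with the last pattern character
    while i <= n - 1:
        k = 0  # number of characters matched so far, right to left
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift by the table entry of the text character aligned with the
        # last pattern character; unknown characters shift by m.
        i += table.get(text[i], m)
    return -1


print(horspool_match("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))
```

On this text the pattern is tried at only four alignments (ending at text indices 5, 11, 13 and 19) before the match ending at index 21 is found.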

Boyer-Moore algorithm
Based on the same two ideas:
• compare pattern characters to text from right to left
• given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)
Uses an additional shift table with the same idea applied to the number of matched characters.

The bad-symbol shift
Based on the Horspool idea of using the extra table. However, the shift is computed from it differently:
• If c, the text character corresponding to the last pattern character, is not in the pattern, then shift in the same way by m characters (c is actually a bad symbol).
• If the mismatching character (the bad symbol) does not appear in the pattern, then shift to overpass it.
• If the mismatching character (the bad symbol) appears in the pattern, then shift to align the bad symbol with the same character in the pattern (lying to the left of the mismatching position).

The bad-symbol shift - example
The bad symbol IS NOT in the pattern:
  ......SER......................
     BARBER
         BARBER      shift 4 positions
The bad symbol IS in the pattern:
  ......AER......................
     BARBER
       BARBER        shift 2 positions
This shift is given by d = max[t1(c)-k, 1], where t1 is the Horspool table and k is the distance of the bad symbol from the end of the pattern (the number of matched characters).
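The formula can be checked on the BARBER example with a short Python sketch. The helper names `horspool_table` and `bad_symbol_shift` are assumptions made for this illustration; characters missing from the table get the default shift m.

```python
def horspool_table(pattern):
    """Horspool shift table t1, storing only characters that occur in the pattern."""
    m = len(pattern)
    return {pattern[j]: m - 1 - j for j in range(m - 1)}


def bad_symbol_shift(t1, c, k, m):
    """d = max(t1(c) - k, 1), with t1(c) = m for characters not in the table."""
    return max(t1.get(c, m) - k, 1)


pattern = "BARBER"
t1 = horspool_table(pattern)
# Two characters (E and R) matched before the mismatch, so k = 2.
print(bad_symbol_shift(t1, "S", 2, len(pattern)))  # S is not in the pattern
print(bad_symbol_shift(t1, "A", 2, len(pattern)))  # A is in the pattern
```

For S the table value is 6, giving 6 - 2 = 4; for A it is 4, giving 4 - 2 = 2. The `max(..., 1)` guards against a zero or negative shift when t1(c) is not larger than k.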

The good-suffix shift - example
What happens if a matched suffix of size k appears again in the pattern (e.g., RA in ABRACADABRA)?
• Find another occurrence of the suffix that is preceded by a different character. Calculate the shift as the distance between the two occurrences of the suffix.
• If there is no such occurrence, find the longest prefix of size l<k that matches the suffix of size l. Calculate the shift as the distance between the suffix and the prefix.

The good-suffix shift – example (2)
(in each row the matched suffix is the last k characters of the pattern)
k   pattern   d2
1   ABCBAB    2
2   ABCBAB    4
3   ABCBAB    4
4   ABCBAB    4
5   ABCBAB    4

k   pattern   d2
1   BAOBAB    2
2   BAOBAB    5
3   BAOBAB    5
4   BAOBAB    5
5   BAOBAB    5
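The d2 values in both tables can be reproduced with a direct, brute-force Python sketch of the good-suffix rule. The function name `good_suffix_table` is an assumption made for this example; it favors clarity over speed and is not the preprocessing a production Boyer-Moore implementation would use.

```python
def good_suffix_table(pattern):
    """Return {k: d2} for every matched-suffix length k = 1..m-1."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        start = m - k                 # where the matched suffix begins
        suffix = pattern[start:]
        shift = None
        # Case 1: rightmost other occurrence of the suffix that is not
        # preceded by the same character as the suffix itself.
        for s in range(start - 1, -1, -1):
            if pattern[s:s + k] == suffix and (
                s == 0 or pattern[s - 1] != pattern[start - 1]
            ):
                shift = start - s
                break
        # Case 2: otherwise, the longest prefix of size l < k that matches
        # the suffix of size l; shift by the full length m if there is none.
        if shift is None:
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2


print(good_suffix_table("ABCBAB"))
print(good_suffix_table("BAOBAB"))
```

For ABCBAB with k=2, the suffix AB reappears at the start of the pattern, four positions to the left, so d2 = 4. For BAOBAB with k=2, AB does not reappear, and the longest matching prefix is the single character B, so d2 = 6 - 1 = 5.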

Final rule for Boyer-Moore
Calculate the shift as
  d = d1             if k=0
  d = max(d1, d2)    if k>0
where d1 = max(t1(c)-k, 1)
Example: search for the pattern BAOBAB in the text
  BESS_KNEW_ABOUT_BAOBABS
  BAOBAB

Hashing
A very efficient method for implementing a dictionary, i.e., a set with the operations:
– insert
– find
– delete
Applications:
– databases
– symbol tables

Hash tables and hash functions
Hash table: an array with indices that correspond to buckets
Hash function: determines the bucket for each record
Example: student records, key=SSN. Hash function:
  h(k) = k mod m
(k is a key and m is the number of buckets)
• if m=1000, where is the record with SSN=315-17-4251 stored?
Hash function must:
• be easy to compute
• distribute keys evenly throughout the table
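The SSN question can be worked through with a few lines of Python. This sketch assumes the SSN is read as the nine-digit integer 315174251 (hyphens dropped); the function name `h` mirrors the slide's notation.

```python
def h(k, m):
    """Hash function h(k) = k mod m for an integer key k and m buckets."""
    return k % m


# Assumption: the key is the SSN with its hyphens removed.
ssn = int("315-17-4251".replace("-", ""))
print(h(ssn, 1000))
```

With m=1000 the hash is just the last three digits of the key, so the record goes to bucket 251. Using only the low-order digits this way is easy to compute, but it distributes keys evenly only if those digits are themselves evenly distributed.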

Collisions
If h(k1) = h(k2) for two different keys k1 and k2, then there is a collision.
• Good hash functions result in fewer collisions.
• Collisions can never be completely eliminated.
Two types of hashing handle collisions differently:
• Open hashing: each bucket points to a linked list of all keys hashing to it.
• Closed hashing: one key per bucket; in case of collision, find another bucket for one of the keys
  – linear probing: use next bucket
  – double hashing: use second hash function to compute increment
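The two schemes can be contrasted in a small Python sketch. Everything here is an illustrative assumption: the class names, the use of h(k) = k mod m on integer keys, and Python lists standing in for linked lists. Deletion and double hashing are left out to keep the sketch short.

```python
class ChainedHashTable:
    """Open hashing: each bucket holds a list of all keys hashing to it."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        bucket = self.buckets[key % self.m]
        if key not in bucket:
            bucket.append(key)

    def find(self, key):
        return key in self.buckets[key % self.m]


class LinearProbingHashTable:
    """Closed hashing: one key per slot; on collision, try the next slot."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def insert(self, key):
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i  # slot where the key ended up
            i = (i + 1) % self.m  # wrap around at the end of the table
        raise OverflowError("table is full")

    def find(self, key):
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None:
                return False  # an empty slot ends the probe sequence
            if self.slots[i] == key:
                return True
            i = (i + 1) % self.m
        return False


open_table = ChainedHashTable(5)
closed_table = LinearProbingHashTable(5)
for key in (3, 8, 13):  # all three keys hash to bucket 3
    open_table.insert(key)
    closed_table.insert(key)
print(open_table.buckets[3], closed_table.slots)
```

With open hashing the three colliding keys share one bucket; with linear probing they spread into slots 3, 4 and 0, which is why a later search may have to probe several slots.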

Open hashing
• If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m (the load factor).
• Average number of probes: S = 1+α/2 for a successful search, U = α for an unsuccessful search.
• Worst case is still linear!
• Open hashing still works if n>m.

Closed hashing
• Does not work if n>m.
• Avoids pointers.
• Deletions are not straightforward.
• Number of probes to insert/find/delete a key depends on load factor α = n/m (hash table density)
  – successful search: (½) (1 + 1/(1-α))
  – unsuccessful search: (½) (1 + 1/(1-α)²)
• As the table gets filled (α approaches 1), number of probes increases dramatically: