siva-agora-karthikeyan
7/27/2019 String Matching Problem.ppt
http://slidepdf.com/reader/full/string-matching-problemppt 1/16
String Matching Problem
• Given a text string T of length n and a pattern string P of length m, the exact string-matching problem is to find all occurrences of P in T.
• Example: T = “AGCTTGA”, P = “GCT” (P occurs once in T, starting at the second character, i.e. at shift 1).
• Applications:
– Searching keywords in a file
– Search engines (like Google and Openfind)
– Database searching (GenBank)
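To make the problem statement concrete, here is a minimal Python sketch (not from the slides) that writes the definition out directly: it returns every 0-based shift s at which P occurs in T, including overlapping occurrences. The function name `occurrences` is hypothetical, and slicing is used for clarity rather than speed.

```python
def occurrences(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s such that text[s:s+len(pattern)] == pattern."""
    m = len(pattern)
    # Overlapping occurrences count too, so every shift 0..n-m is checked.
    return [s for s in range(len(text) - m + 1) if text[s:s + m] == pattern]


print(occurrences("AGCTTGA", "GCT"))  # [1]
```

Such a one-liner is also handy as a test oracle when checking faster matchers.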
What is pattern matching?
• Problem/issue
Finding occurrences of a pattern (string) ‘P’ in a string ‘S’, and also finding the positions in ‘S’ where the pattern matches.
Brute Force algorithm
The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T,
*until either a match is found, or
*all placements of the pattern have been tried.
(To find all occurrences, the search simply continues after each match until every placement has been tried.)
Brute-force
• Worst case: O(m*n)
• Best case: O(n)
algorithm brute-force:
  input: an array of characters, T (the string to be analyzed), length n
         an array of characters, P (the pattern to be searched for), length m
  for i := 0 to n-m do
    for j := 0 to m-1 do
      compare T[i+j] with P[j]; if not equal, exit the inner loop
    if all m characters matched, report an occurrence at shift i
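The corrected pseudocode translates almost line for line into Python. This is a minimal sketch, with `brute_force` as a hypothetical name; it reports every shift rather than stopping at the first match. Each of the n - m + 1 shifts costs at most m comparisons, which is where the O(m*n) worst case comes from.

```python
def brute_force(text: str, pattern: str) -> list[int]:
    """Naive matcher: try every shift i and compare character by character."""
    n, m = len(text), len(pattern)
    shifts = []
    for i in range(n - m + 1):          # every possible shift of P relative to T
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                      # characters agree, move to the next one
        if j == m:                      # inner loop ran to completion: full match
            shifts.append(i)
    return shifts


print(brute_force("AGCTTGA", "GCT"))  # [1]
```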
Example
Compare each character of P with S; if they match, continue, else shift P one position.
String S: a b c a b a a b c a b a c
Pattern p: a b a a
Step 1: compare p[1] with S[1] (a = a, match)
S: a b c a b a a b c a b a c
p: a b a a
Step 2: compare p[2] with S[2] (b = b, match)
S: a b c a b a a b c a b a c
p: a b a a
Step 3: compare p[3] with S[3]
S: a b c a b a a b c a b a c
p: a b a a
A mismatch occurs here (p[3] = a, S[3] = c).
“Since a mismatch is detected, shift ‘P’ one position to the right and perform steps analogous to steps 1 to 3. Whenever a mismatch is detected, shift ‘P’ one position to the right and repeat the matching procedure.”
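To see the whole walkthrough at once, this sketch (our own instrumentation; `brute_force_trace` is a hypothetical name) records, for each shift of p along S, the 1-based index in p of the first mismatch, or None when all of p matches.

```python
from typing import Optional


def brute_force_trace(text: str, pattern: str) -> list[tuple[int, Optional[int]]]:
    """For each shift, record the 1-based index in P of the first mismatch
    (None when the whole pattern matches), mirroring the slides' numbering."""
    n, m = len(text), len(pattern)
    trace = []
    for shift in range(n - m + 1):
        mismatch = None
        for j in range(m):
            if text[shift + j] != pattern[j]:
                mismatch = j + 1        # slides number characters from 1
                break
        trace.append((shift, mismatch))
    return trace


for shift, mismatch in brute_force_trace("abcabaabcabac", "abaa"):
    print(shift, "match" if mismatch is None else f"mismatch at p[{mismatch}]")
```

The first entry, (0, 3), is the Step 3 mismatch above; the pattern is eventually found at shift 3, i.e. starting at S[4] in the slides' 1-based numbering.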
The Knuth-Morris-Pratt Algorithm
Knuth, Morris and Pratt proposed a linear-time algorithm for the string matching problem.
A matching time of O(n) is achieved by avoiding comparisons with elements of ‘S’ that have previously been involved in a comparison with some element of the pattern ‘p’ to be matched, i.e., backtracking on the string ‘S’ never occurs. (Computing the prefix function of ‘p’ adds O(m) preprocessing, for O(n + m) in total.)
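The "no backtracking" claim can be checked by counting character comparisons. This sketch is our own instrumentation, not from the slides: it assumes a non-empty pattern and uses 0-based indexing, so `pi[q]` here corresponds to π[q + 1] in the slides. On a text of twenty a's and the pattern aaab, the naive method re-reads text characters at every shift, while the KMP scan makes at most 2n comparisons, because each comparison either consumes a text character or shrinks q.

```python
def brute_force_comparisons(text: str, pattern: str) -> int:
    """Count the character comparisons made by the naive matcher."""
    n, m = len(text), len(pattern)
    count = 0
    for i in range(n - m + 1):
        for j in range(m):
            count += 1                     # one comparison of T[i+j] with P[j]
            if text[i + j] != pattern[j]:
                break                      # mismatch: move on to shift i + 1
    return count


def kmp_comparisons(text: str, pattern: str) -> int:
    """Count text-versus-pattern comparisons made by a KMP scan.

    Assumes a non-empty pattern; pi is 0-based (pi[q] is the slides' pi[q + 1]).
    """
    m = len(pattern)
    pi = [0] * m
    k = 0
    for q in range(1, m):                  # build the prefix table (not counted)
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    count, q = 0, 0                        # q = pattern characters matched so far
    for c in text:                         # each text character is read once, never revisited
        while True:
            count += 1                     # compare P[q] with the current text character
            if pattern[q] == c:
                q += 1
                break
            if q == 0:
                break
            q = pi[q - 1]                  # fall back in the pattern, not in the text
        if q == m:                         # full match: continue from the longest border
            q = pi[q - 1]
    return count


text, pattern = "a" * 20, "aaab"
print(brute_force_comparisons(text, pattern), kmp_comparisons(text, pattern))  # 68 37
```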
Components of KMP algorithm
• The prefix function, Π
The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern ‘p’; in other words, it lets the matcher avoid backtracking on the string ‘S’.
• The KMP Matcher
With string ‘S’, pattern ‘p’ and prefix function ‘Π’ as inputs, it finds the occurrences of ‘p’ in ‘S’ and returns, for each one, the number of shifts of ‘p’ after which the occurrence is found.
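For reference, the prefix function can be stated formally as below. This is the standard textbook definition; the notation P_k for the first k characters of P and ⊐ for "is a suffix of" is our assumption, since the slides do not define them.

```latex
P_k = P[1..k], \qquad
\pi[q] = \max\{\, k : k < q \text{ and } P_k \sqsupset P_q \,\}, \qquad 1 \le q \le m
```

Because the empty prefix P_0 is a suffix of every string, the maximum always exists, and π[1] = 0. For example, with P = ababababca, the longest proper prefix of P_8 = abababab that is also a suffix of it is P_6 = ababab, so π[8] = 6.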
• Knuth-Morris-Pratt Algorithm
Compute-Prefix-Function(P)
1. m ← length[P]
2. π[1] ← 0
3. k ← 0
4. for q ← 2 to m
5.   do while k > 0 and P[k + 1] ≠ P[q]
6.        do k ← π[k]
7.      if P[k + 1] = P[q]    /* reached when k = 0 or P[k + 1] = P[q], i.e. on leaving the while loop */
8.        then k ← k + 1
9.      π[q] ← k
10. return π
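Compute-Prefix-Function can be transcribed into Python as follows. This sketch keeps the pseudocode's 1-based indices by padding the pattern with a dummy character at index 0; the padding trick and the function name are our choices, not part of the slides.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return pi as a 1-based table: pi[q] for q = 1..m (pi[0] is an unused placeholder)."""
    m = len(pattern)
    p = " " + pattern              # pad so that p[1..m] matches the slides' P[1..m]
    pi = [0] * (m + 1)             # pi[1] = 0
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k + 1] != p[q]:
            k = pi[k]              # fall back to the next shorter prefix that is also a suffix
        if p[k + 1] == p[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababababca")[1:])  # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
```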
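• Knuth-Morris-Pratt Algorithm
KMP-Matcher(T, P)
1. n ← length[T]
2. m ← length[P]
3. π ← Compute-Prefix-Function(P)
4. q ← 0    /* number of characters matched */
5. for i ← 1 to n    /* scan the text from left to right */
6.   do while q > 0 and P[q + 1] ≠ T[i]
7.        do q ← π[q]    /* next character does not match */
8.      if P[q + 1] = T[i]
9.        then q ← q + 1    /* next character matches */
10.     if q = m    /* is all of P matched? */
11.       then print “pattern occurs with shift” i - m
12.            q ← π[q]    /* look for the next match */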
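A runnable version of KMP-Matcher, again kept 1-based by padding, is sketched below. It collects the shifts instead of printing them and, as an added assumption, rejects an empty pattern; `kmp_matcher` is a hypothetical name, and the prefix function is repeated so the block stands alone.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """1-based prefix table: pi[q] for q = 1..m (pi[0] unused)."""
    p = " " + pattern
    pi = [0] * (len(pattern) + 1)
    k = 0
    for q in range(2, len(pattern) + 1):
        while k > 0 and p[k + 1] != p[q]:
            k = pi[k]
        if p[k + 1] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift (i - m, as in the slides) at which pattern occurs in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, m = len(text), len(pattern)
    t, p = " " + text, " " + pattern      # 1-based views of T and P
    pi = compute_prefix_function(pattern)
    q = 0                                  # number of characters matched so far
    shifts = []
    for i in range(1, n + 1):              # scan T from left to right, never backing up
        while q > 0 and p[q + 1] != t[i]:
            q = pi[q]                      # next character does not match: fall back
        if p[q + 1] == t[i]:
            q += 1                         # next character matches
        if q == m:                         # all of P matched
            shifts.append(i - m)
            q = pi[q]                      # look for the next (possibly overlapping) match
    return shifts


print(kmp_matcher("ababaababababca", "ababababca"))  # [5]
```

With the T and P used in the prefix-function example, the single occurrence is reported at shift 5.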
Compute prefix function
P = ababababca, T = ababaababababca
π[1] = 0
k = 0
q = 2, P[k + 1] = P[1] = a, P[q] = P[2] = b, P[k + 1] ≠ P[q]
π[q] ← k (π[2] ← 0)
q = 3, P[k + 1] = P[1] = a, P[q] = P[3] = a, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[3] ← 1)
k = 1
q = 4, P[k + 1] = P[2] = b, P[q] = P[4] = b, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[4] ← 2)
k = 2
q = 5, P[k + 1] = P[3] = a, P[q] = P[5] = a, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[5] ← 3)
k = 3
q = 6, P[k + 1] = P[4] = b, P[q] = P[6] = b, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[6] ← 4)
k = 4
q = 7, P[k + 1] = P[5] = a, P[q] = P[7] = a, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[7] ← 5)
k = 5
q = 8, P[k + 1] = P[6] = b, P[q] = P[8] = b, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[8] ← 6)
k = 6
q = 9, P[k + 1] = P[7] = a, P[q] = P[9] = c, P[k + 1] ≠ P[q]
k ← π[k] (k ← π[6] = 4)
P[k + 1] = P[5] = a, P[q] = P[9] = c, P[k + 1] ≠ P[q]
k ← π[k] (k ← π[4] = 2)
P[k + 1] = P[3] = a, P[q] = P[9] = c, P[k + 1] ≠ P[q]
k ← π[k] (k ← π[2] = 0)
k = 0, so the while loop exits
q = 9 (continued), P[k + 1] = P[1] = a, P[q] = P[9] = c, P[k + 1] ≠ P[q]
π[q] ← k (π[9] ← 0)
q = 10, P[k + 1] = P[1] = a, P[q] = P[10] = a, P[k + 1] = P[q]
k ← k + 1, π[q] ← k (π[10] ← 1)
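Only the q = 9 step needs the while loop. This instrumented variant of the prefix computation (our own; `prefix_function_trace` is a hypothetical name) records, for each q, the successive values of k whose P[k + 1] is compared with P[q], reproducing the fallback chain 6, 4, 2, 0 from the trace.

```python
def prefix_function_trace(pattern: str) -> dict[int, list[int]]:
    """For each q = 2..m, list the k values tried against P[q] (1-based, as in the trace)."""
    m = len(pattern)
    p = " " + pattern
    pi = [0] * (m + 1)
    k = 0
    tried = {}
    for q in range(2, m + 1):
        tried[q] = [k]                     # first candidate: the k left over from q - 1
        while k > 0 and p[k + 1] != p[q]:
            k = pi[k]
            tried[q].append(k)             # each fallback k <- pi[k]
        if p[k + 1] == p[q]:
            k += 1
        pi[q] = k
    return tried


print(prefix_function_trace("ababababca")[9])  # [6, 4, 2, 0]
```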
After prefix computation, the table is shown below:
P = ababababca
i      1  2  3  4  5  6  7  8  9  10
P[i]   a  b  a  b  a  b  a  b  c  a
π[i]   0  0  1  2  3  4  5  6  0  1
Prefixes of P that are also suffixes of P8 = abababab, found by following π from q = 8:
P8 = a b a b a b a b
P6 = a b a b a b    π[8] = 6
P4 = a b a b        π[6] = 4
P2 = a b            π[4] = 2
P0 = (empty)        π[2] = 0
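The point of this chain is that iterating π from q = 8 enumerates, longest first, every prefix of P that is also a suffix of P8. A minimal sketch of that iteration follows, with a check that each listed prefix really is a suffix; `border_chain` is a hypothetical helper name and recomputes the 1-based table itself.

```python
def border_chain(pattern: str, q: int) -> list[int]:
    """Follow q, pi[q], pi[pi[q]], ... down to 0 (1-based pi, as in the slides)."""
    p = " " + pattern
    pi = [0] * (len(pattern) + 1)
    k = 0
    for i in range(2, len(pattern) + 1):
        while k > 0 and p[k + 1] != p[i]:
            k = pi[k]
        if p[k + 1] == p[i]:
            k += 1
        pi[i] = k
    chain = [q]
    while chain[-1] > 0:
        chain.append(pi[chain[-1]])
    return chain


P = "ababababca"
for k in border_chain(P, 8):
    # every length in the chain is a prefix of P that is also a suffix of P[1..8]
    assert P[:8].endswith(P[:k])
    print(k, P[:k] or "(empty)")
```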
Another Example for KMP Algorithm
Phase 1: first finish the prefix computation.
Phase 2: next, the search phase computation.
f(4 - 1) + 1 = f(3) + 1 = 0 + 1 = 1
f(13 - 1) + 1 = f(12) + 1 = 4 + 1 = 5, matched