The Rabin-Karp Algorithm String Matching Jonathan M. Elchison 19 November 2004 CS-3410 Algorithms Dr. Shomper


Page 1: Rabin Karp Matching

The Rabin-Karp Algorithm: String Matching

Jonathan M. Elchison

19 November 2004

CS-3410 Algorithms

Dr. Shomper

Page 2: Rabin Karp Matching

Background

• String matching

• Naïve method
    • n ≡ size of input string
    • m ≡ size of pattern to be matched
    • O( (n-m+1)m )
    • Θ( n² ) if m = floor( n/2 )

• We can do better
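
As a baseline for comparison, the naïve method above can be sketched in Python. This is only an illustration: the function name `naive_match` and the 0-based shifts are assumptions of this sketch, not part of the original slides.

```python
def naive_match(text, pattern):
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible shifts.
    for s in range(n - m + 1):
        # The slice comparison costs up to m character comparisons.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    print(naive_match("abracadabra", "abra"))
```

Each of the n - m + 1 shifts may cost up to m comparisons, which is where the O( (n-m+1)m ) bound comes from.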

Page 3: Rabin Karp Matching

How it works

• Consider a hashing scheme
    • Each symbol in alphabet Σ can be represented by an ordinal value { 0, 1, 2, ..., d-1 }
    • |Σ| = d
    • “Radix-d digits”

Page 4: Rabin Karp Matching

How it works

• Hash pattern P into a numeric value
    • For illustration, let a string be represented by the sum of these digits
    • The full algorithm treats the string as a radix-d number, evaluated with Horner’s rule (§ 30.1)

• Example
    • { A, B, C, ..., Z } → { 0, 1, 2, ..., 25 }
    • BAN → 1 + 0 + 13 = 14
    • CARD → 2 + 0 + 17 + 3 = 22
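
The example on this slide can be reproduced with a short Python sketch. The helper names `ordinal`, `additive_hash`, and `radix_value` are hypothetical, and the mapping assumes uppercase A to Z only. `radix_value` shows, for contrast, how Horner's rule builds a radix-d number one digit at a time.

```python
def ordinal(ch):
    """Map 'A'..'Z' to 0..25 (assumes an uppercase ASCII letter)."""
    return ord(ch) - ord("A")


def additive_hash(s):
    """Sum of the ordinal values, as in the slide's simplified example."""
    return sum(ordinal(c) for c in s)


def radix_value(s, d=26):
    """Radix-d value of s, evaluated with Horner's rule."""
    value = 0
    for c in s:
        value = d * value + ordinal(c)
    return value


if __name__ == "__main__":
    print(additive_hash("BAN"), additive_hash("CARD"))  # 14 22
    print(radix_value("BAN"))  # 1*26**2 + 0*26 + 13 = 689
```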

Page 5: Rabin Karp Matching

Upper limits

• Problem
    • For long patterns, or for large alphabets, the number representing a given string may be too large to be practical

• Solution
    • Use MOD operation
    • When MOD q, values will be < q

• Example
    • BAN = 1 + 0 + 13 = 14
        • 14 mod 13 = 1
        • BAN → 1
    • CARD = 2 + 0 + 17 + 3 = 22
        • 22 mod 13 = 9
        • CARD → 9
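
A minimal Python sketch of the reduction step, using the same additive scheme and q = 13 as the example above. The name `hash_mod` is an assumption made for this illustration.

```python
def hash_mod(s, q=13):
    """Additive hash of an uppercase string, reduced mod q."""
    total = sum(ord(c) - ord("A") for c in s)
    # The result always lies in the range 0..q-1.
    return total % q


if __name__ == "__main__":
    print(hash_mod("BAN"))   # 14 mod 13 = 1
    print(hash_mod("CARD"))  # 22 mod 13 = 9
```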

Page 6: Rabin Karp Matching

Searching

Page 7: Rabin Karp Matching

Spurious Hits

• Question
    • Does a matching hash value mean that the strings match?

• Answer
    • No: these are called “spurious hits”

• Possible cases
    • MOD operation interfered with uniqueness of hash values
        • 14 mod 13 = 1
        • 27 mod 13 = 1
        • MOD value q is usually chosen as a prime such that dq just fits within 1 computer word (10q for decimal digits, where d = 10)
    • Information is lost in generalization (addition)
        • BAN → 1 + 0 + 13 = 14
        • CAM → 2 + 0 + 12 = 14
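
Both causes of spurious hits can be demonstrated with the additive scheme from the earlier slides. In this sketch the function name `additive_hash` and the string "ZC" (chosen only because its ordinals sum to 27) are assumptions for illustration.

```python
def additive_hash(s, q=None):
    """Sum of ordinals for 'A'..'Z'; reduced mod q when q is given."""
    total = sum(ord(c) - ord("A") for c in s)
    return total if q is None else total % q


if __name__ == "__main__":
    # Case 1: addition ignores symbol order and position.
    print(additive_hash("BAN"), additive_hash("CAM"))        # 14 14
    # Case 2: distinct sums (14 and 27) collide after mod 13.
    print(additive_hash("BAN", 13), additive_hash("ZC", 13))  # 1 1
```

In both cases the hashes agree while the strings differ, which is why a hash match must be confirmed by comparing the characters.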

Page 8: Rabin Karp Matching

Code

RABIN-KARP-MATCHER( T, P, d, q )
    n ← length[ T ]
    m ← length[ P ]
    h ← d^(m-1) mod q
    p ← 0
    t_0 ← 0
    for i ← 1 to m                          ► Preprocessing
        do p ← ( d*p + P[ i ] ) mod q
           t_0 ← ( d*t_0 + T[ i ] ) mod q
    for s ← 0 to n - m                      ► Matching
        do if p = t_s
              then if P[ 1..m ] = T[ s+1 .. s+m ]
                      then print “Pattern occurs with shift” s
           if s < n - m
              then t_(s+1) ← ( d * ( t_s - T[ s + 1 ] * h ) + T[ s + m + 1 ] ) mod q
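
The pseudocode above translates fairly directly into Python. The following is a sketch under stated assumptions rather than a definitive implementation: indices are 0-based, symbols are mapped to digits with `ord`, and the defaults d = 256 and q = 101 are arbitrary illustrative choices.

```python
def rabin_karp_matcher(text, pattern, d=256, q=101):
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # d^(m-1) mod q, weight of the leading digit
    p = 0                 # hash of the pattern
    t = 0                 # hash of the current text window
    # Preprocessing: Horner's rule, reduced mod q at every step.
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    # Matching: compare hashes first, then verify to rule out spurious hits.
    for s in range(n - m + 1):
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop the leading digit, shift left, append the next digit.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


if __name__ == "__main__":
    print(rabin_karp_matcher("abracadabra", "abra"))  # [0, 7]
```

Python's `%` returns a non-negative result for a positive modulus, so the subtraction in the rolling update needs no extra correction here; in languages where the remainder can be negative, q would have to be added back before reducing.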

Page 9: Rabin Karp Matching

Performance

• Preprocessing (computing the hash of the pattern and of the first text window)
    • Θ( m )

• Worst case running time
    • Θ( (n-m+1)m )
    • No better than naïve method

• Expected case
    • If we assume the number of hits is constant compared to n, we expect O( n )
    • Only hash “hits” are verified character by character, not all shifts
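
To make the worst case concrete, the sketch below counts how many shifts trigger a character-by-character verification. The function name `count_hits` and the defaults d = 256, q = 101 are assumptions for illustration only.

```python
def count_hits(text, pattern, d=256, q=101):
    """Return (hash_hits, valid_shifts) for one Rabin-Karp scan."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0, 0
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    hits = valid = 0
    for s in range(n - m + 1):
        if p == t:
            hits += 1  # each hit costs up to m character comparisons
            if text[s:s + m] == pattern:
                valid += 1
        if s < n - m:
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return hits, valid


if __name__ == "__main__":
    # Every shift is a valid match, so all 16 shifts are verified.
    print(count_hits("a" * 20, "a" * 5))  # (16, 16)
```

When every shift is a hit, as with a text and pattern made of one repeated symbol, the verification work is (n-m+1)m, matching the worst-case bound above.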

Page 10: Rabin Karp Matching

Demonstration

• http://www-igm.univ-mlv.fr/~lecroq/string/node5.html

Page 11: Rabin Karp Matching


Sources:
• Cormen, Thomas H., et al. Introduction to Algorithms. 2nd ed. Cambridge: MIT Press, 2001.
• Karp-Rabin algorithm. 15 Jan 1997. <http://www-igm.univ-mlv.fr/~lecroq/string/node5.html>.
• Shomper, Keith. “Rabin-Karp Animation.” E-mail to Jonathan Elchison. 12 Nov 2004.