Accelerating Boyer Moore Searches on Binary Texts
Shmuel Tomi Klein, Miri Kopel Ben-Nissan
Bar Ilan University, ISRAEL

Boyer Moore Searches on Binary Texts


Page 1: Boyer Moore Searches  on Binary Texts

Accelerating Boyer Moore Searches on Binary Texts

Shmuel Tomi Klein, Miri Kopel Ben-Nissan

Bar Ilan University, ISRAEL

Page 2: Boyer Moore Searches  on Binary Texts

Outline

Background and motivation

Boyer Moore algorithm

New binary variant

Analysis

Experiments

Summary

Page 3: Boyer Moore Searches  on Binary Texts

Important application of Automata:

PATTERN MATCHING

KMP BDM BM

Boyer & Moore

this-is-a-sample-text---

pattern

Match backwards!

Page 4: Boyer Moore Searches  on Binary Texts

Boyer – Moore Algorithm

Mismatch – case 1: delta1

b does not occur in x

[Figure: text y contains the mismatching character b followed by the matched suffix u; pattern x ends in a followed by u and contains no b, so x is shifted past b.]

Page 5: Boyer Moore Searches  on Binary Texts

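Boyer – Moore Algorithm

Mismatch – case 2: delta1

b occurs in x

[Figure: text y contains b followed by the matched suffix u; pattern x ends in a followed by u and contains a b, after which x contains no b; x is shifted so that this b lines up with the b in y.]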
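The two delta1 cases can be sketched in Python. This is a minimal illustration under assumptions, not the authors' code: the names delta1_table and delta1 are hypothetical, and delta1 is taken in the original Boyer-Moore form, the distance from the last occurrence of a character to the end of the pattern.

```python
def delta1_table(pattern):
    """Map each pattern character to the distance from its last occurrence to the pattern end."""
    m = len(pattern)
    table = {}
    for pos, ch in enumerate(pattern):
        table[ch] = m - 1 - pos  # later occurrences overwrite earlier ones
    return table


def delta1(table, ch, m):
    """Case 1: ch is absent from the pattern, use the full length m. Case 2: use the table."""
    return table.get(ch, m)


pattern = "example"
table = delta1_table(pattern)
print(table)
print(delta1(table, "s", len(pattern)))  # 's' does not occur in "example"
```

A character that does not occur in the pattern (case 1) gets the full pattern length; a character that does occur (case 2) gets the distance that lines up its last occurrence.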

Page 6: Boyer Moore Searches  on Binary Texts

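Boyer – Moore Algorithm

Mismatch – case 3: delta2

u reoccurs in x preceded by c ≠ a

[Figure: text y contains b followed by the matched suffix u; pattern x ends in a followed by u; x is shifted so that its earlier occurrence of u, preceded by c, lines up with u in y.]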

Page 7: Boyer Moore Searches  on Binary Texts

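Boyer – Moore Algorithm

Mismatch – case 4: delta2

Only a suffix v of u reoccurs in x

[Figure: text y contains b followed by the matched suffix u; pattern x ends in a followed by u; x is shifted so that its occurrence of v lines up with the suffix v of u in y.]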

Page 8: Boyer Moore Searches  on Binary Texts

Boyer – Moore Example

Pattern: example

delta1
char     a   e   l   m   p   x   rest
delta1   4   0   1   3   2   5   7

delta2
char     e   x   a   m   p   l   e
delta2   12  11  10  9   8   7   1

Text: here is a simple example

Frames (the highlighted text characters are bracketed):

here i[s] a simple example
here is a sim[p]le example
here is a s[i]mple example
here is a simple exam[p]le
here is a simple [example]
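The example can be replayed with a short Python sketch of the classic search loop. The two tables are copied from this slide (delta2 re-indexed from 0); the function name bm_search and the returned trace of mismatch positions are assumptions made for illustration.

```python
def bm_search(text, pattern, delta1, delta2, default):
    """Classic Boyer-Moore loop; both tables give increases of the text pointer."""
    m = len(pattern)
    trace = []                      # (text index, text character) at each mismatch
    i = m - 1                       # text pointer starts under the last pattern character
    while i < len(text):
        j = m - 1
        while j >= 0 and text[i] == pattern[j]:   # match backwards
            i -= 1
            j -= 1
        if j < 0:
            return i + 1, trace     # full match: the pattern starts at i + 1
        trace.append((i, text[i]))
        i += max(delta1.get(text[i], default), delta2[j])
    return -1, trace


delta1 = {"a": 4, "e": 0, "l": 1, "m": 3, "p": 2, "x": 5}   # "rest" is the default, 7
delta2 = [12, 11, 10, 9, 8, 7, 1]                            # for e x a m p l e
pos, trace = bm_search("here is a simple example", "example", delta1, delta2, 7)
print(pos)
print(trace)
```

The mismatches recorded in the trace fall on the characters s, p, i and p, in that order, before the match is found.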

Page 9: Boyer Moore Searches  on Binary Texts

Problems of Binary Boyer & Moore

delta1 useless

most work by delta1

0100101101011101000100110101001

1101100

this-is-a-sample-text---

pattern

Bit-level processing

Page 10: Boyer Moore Searches  on Binary Texts

Need for Binary Boyer & Moore

Compressed Matching

Given E(T) and P look for E(P) in E(T)

rather than P in D(E(T))

Suggested Solution:

BBBMM: Blocked Binary Boyer Moore Matching

Page 11: Boyer Moore Searches  on Binary Texts

BBBMM

[Figure: the text is accessed as blocks Text[i] and the pattern as blocks Pat[sh, j]; figure labels: k, sh, sl.]

Page 12: Boyer Moore Searches  on Binary Texts

ffghabdgttiocbsbgghj

0110001001101010

BBBMM

More information in binary case

ASCII

BINARY

Page 13: Boyer Moore Searches  on Binary Texts

BBBMM

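extended delta1

[Figure: text T with blocks i – 1, i and i + 1 above pattern P, with sample bit groups 101, 101, 101, 100 and 01.]

$\mathit{delta}_1[sl, B]$, with $1 \le sl \le k$ and $0 \le B < 2^{sl}$ (the slide also relates this entry to $m$; the relation symbol is lost in extraction)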

Page 14: Boyer Moore Searches  on Binary Texts

BBBMM

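Total size of delta1 tables:

$\sum_{sl=1}^{k} 2^{sl} = 2^{k+1} - 2$

If too large, use limit value K < k

[Figure: text T and pattern P, with labels sl, k and K.]

Size of delta1 tables reduced to [expression in K; surviving symbols: 1, 2, K]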
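As a worked instance, assuming the total above is the sum of $2^{sl}$ over $sl = 1, \dots, k$, and taking $k = 8$ as in the experiments:

```latex
\sum_{sl=1}^{8} 2^{sl} = 2^{9} - 2 = 510
```

Under that reading, the delta1 tables would hold 510 entries for 8-bit blocks.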

Page 15: Boyer Moore Searches  on Binary Texts

BBBMM

Original delta1: increase of text pointer

BBBMM delta1: shift size

[Figure: text T and pattern P.]

Mismatch not in last block

Correct[sh,j]

Page 16: Boyer Moore Searches  on Binary Texts

BBBMM

[Figure: text T and pattern P.]

delta2

$\mathit{delta}_2[m - \mathit{matchlen}]$

j           1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Pat[j]      1   0   1   0   1   0   1   0   1   1   1   0   1   1   0   1
delta2[j]   13  13  13  13  13  13  13  13  13  13  13  3   7   15  2   1
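A delta2 table of shift sizes can be checked with a brute-force Python sketch. It assumes that delta2[j] is the smallest shift that keeps the already matched suffix consistent and brings a different bit under the mismatch position, or slides the pattern past it (cases 3 and 4 of the Boyer-Moore slides). The function name good_suffix_shift is hypothetical and positions are 0-based.

```python
def good_suffix_shift(pattern, j):
    """Smallest shift s >= 1 after a mismatch at 0-based position j, with pattern[j+1:] matched."""
    m = len(pattern)
    for s in range(1, m + 1):
        # The matched suffix must agree with the shifted pattern wherever the two overlap.
        suffix_ok = all(i - s < 0 or pattern[i - s] == pattern[i]
                        for i in range(j + 1, m))
        # The character moved under the mismatch position must differ from the one that failed.
        mismatch_ok = j - s < 0 or pattern[j - s] != pattern[j]
        if suffix_ok and mismatch_ok:
            return s
    return m  # not reached: s = m always satisfies both conditions


pattern = "1010101011101101"
table = [good_suffix_shift(pattern, j) for j in range(len(pattern))]
print(table)
```

For the 16-bit pattern on this slide, the sketch yields 13 for positions 1 to 11 and then 3, 7, 15, 2, 1.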

Page 17: Boyer Moore Searches  on Binary Texts

Analysis

Assumption: random input

Reasonable for compressed text

Expected # comparisons till mismatch:

Bit-wise: $\sum_{j=1}^{m} j\,2^{-j} \approx 2$

Blocked: [formula garbled in extraction; surviving symbols: k, sl, m, t]
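The bit-wise figure can be checked numerically. The sketch below assumes the bit-wise sum is the sum of j / 2^j for j from 1 to m, which is what random bits give: exactly j comparisons are needed with probability 2^-j. The function name expected_comparisons is hypothetical.

```python
from fractions import Fraction


def expected_comparisons(m):
    """Exact value of the sum of j / 2^j for j = 1..m."""
    return sum(Fraction(j, 2**j) for j in range(1, m + 1))


for m in (10, 100, 500):
    print(m, float(expected_comparisons(m)))
```

The sum has the closed form 2 - (m + 2) / 2^m, so it stays below 2 and is already very close to 2 for the pattern lengths used in the experiments.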

Page 18: Boyer Moore Searches  on Binary Texts

Analysis

Expected # bits shifted after mismatch:

Bit-wise: M

Blocked: M'

$E(M) = \sum_{j=1}^{m} 2^{-j} \min(2^{j}, m) \approx \log m$

M' compared with M: [relation symbol lost in extraction]
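The growth of the bit-wise expected shift can be checked numerically. The sketch assumes that E(M) is the sum over j from 1 to m of 2^-j times min(2^j, m) and that the logarithm is base 2; both are readings of the garbled slide formula rather than confirmed details, and the function name expected_shift is hypothetical.

```python
import math


def expected_shift(m):
    """Sum over j = 1..m of 2^-j * min(2^j, m)."""
    return sum(min(2**j, m) / 2**j for j in range(1, m + 1))


for m in (10, 100, 500):
    print(m, round(expected_shift(m), 3), round(math.log2(m), 3))
```

Each term with 2^j at most m contributes 1, and the remaining terms form a geometric tail that sums to less than 2, so under these assumptions the sum exceeds log m by less than 2.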

Page 19: Boyer Moore Searches  on Binary Texts

Experiments

English Bible (2.5MB), World Factbook (1.5MB)

Text: Huffman encoded

Patterns: Random substrings of lengths 10 to 500

k = 8

Page 20: Boyer Moore Searches  on Binary Texts

Experiments: Average # comparisons between shifts

[Plot: x-axis is length of pattern (100 to 500); y-axis runs from 1.1 to 1.5; curves: Bit-wise, Blocked.]

Page 21: Boyer Moore Searches  on Binary Texts

Experiments: Average size of shifts

[Plot: x-axis is length of pattern (100 to 500); y-axis runs from 20 to 100; curves: Bit-wise, Blocked.]

Page 22: Boyer Moore Searches  on Binary Texts

Experiments: Average # comparisons for 1000 bits

[Plot: x-axis is length of pattern (100 to 500); y-axis runs from 100 to 500; curves: Bit-wise, Blocked, BDM.]

Page 23: Boyer Moore Searches  on Binary Texts

Experiments: Time to locate first occurrence (ms)

[Plot: x-axis is length of pattern (100 to 500); y-axis runs from 50 to 300 ms; curves: Bit-wise, Blocked, BDM, Turbo-BDM.]

Page 24: Boyer Moore Searches  on Binary Texts

Summary

Blocked variant of BM

Faster than alternatives, overhead 1-10 K

Extensions: ASCII, words instead of characters

Page 25: Boyer Moore Searches  on Binary Texts

Thank you!