Parametrized Matching Amir, Farach, Muthukrishnan Orgad Keller

Orgad Keller - Algorithms 2 - Recitation 9

Parametrized Match Relation

Definition: Two strings $S, S'$ over the alphabet $\Sigma \cup \Pi$ ($\Sigma \cap \Pi = \emptyset$), with $|S| = |S'| = n$, parametrized match (p-match) if the following 3 conditions apply for every $1 \le i \le n$:

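Conditions

1. $S_i \in \Sigma \iff S'_i \in \Sigma$
2. $S_i, S'_i \in \Sigma \Rightarrow S_i = S'_i$
3. $S_i, S'_i \in \Pi \Rightarrow \forall\, 1 \le j \le n:\ S_i = S_j \iff S'_i = S'_j$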

Example

$\Sigma = \{a, b, c\}$, $\Pi = \{x, y, z, w\}$

S  = b a x x y x b b z y z x b y z
S' = b a y y x y b b w x w y b x w

We can see it as a bijection $f : \Pi \to \Pi$:

$f(x) = y$, $f(z) = w$
$f(y) = x$, $f(w) = z$
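The three conditions can be checked directly by building the bijection one position at a time. The following is a minimal sketch rather than part of the original slides; the function name `p_match` and the representation of $\Sigma$ as a Python set are assumptions made for illustration.

```python
def p_match(s, s2, sigma):
    """Return True if s and s2 p-match, treating symbols in `sigma` as fixed
    and every other symbol as a parameter (hypothetical helper)."""
    if len(s) != len(s2):
        return False
    f, f_inv = {}, {}  # the bijection built so far, and its inverse
    for x, y in zip(s, s2):
        if x in sigma or y in sigma:
            # A fixed symbol must face the identical fixed symbol.
            if x != y:
                return False
        elif f.setdefault(x, y) != y or f_inv.setdefault(y, x) != x:
            # A parameter was already paired with a different symbol.
            return False
    return True


# The example strings from the slide, with Sigma = {a, b, c}.
S = "baxxyxbbzyzxbyz"
S2 = "bayyxybbwxwybxw"
print(p_match(S, S2, set("abc")))  # True
```

Keeping both `f` and its inverse is what enforces the "iff" in condition 3: two different parameters of $S$ may not be sent to the same parameter of $S'$.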

Parametrized Matching

Input: $T = t_1 \ldots t_n$, $P = p_1 \ldots p_m$, $\Sigma$, $\Pi$.
Output: All locations $i$ where $P$ p-matches $t_i \ldots t_{i+m-1}$.

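Observation

We can reduce the problem to the same problem with $\Sigma = \emptyset$ (m-match).

Given $T, P$ we’ll define $T_\Sigma, T_\Pi, P_\Sigma, P_\Pi$:

$$t^{\Sigma}_i = \begin{cases} t_i & t_i \in \Sigma \\ a & t_i \in \Pi \end{cases} \qquad t^{\Pi}_i = \begin{cases} t_i & t_i \in \Pi \\ b & t_i \in \Sigma \end{cases}$$

$$p^{\Sigma}_i = \begin{cases} p_i & p_i \in \Sigma \\ a & p_i \in \Pi \end{cases} \qquad p^{\Pi}_i = \begin{cases} p_i & p_i \in \Pi \\ b & p_i \in \Sigma \end{cases}$$

where $a \notin \Sigma$, $b \notin \Pi$.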

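Observation

Now $T_\Sigma, P_\Sigma$ are over $\Sigma \cup \{a\}$ and $T_\Pi, P_\Pi$ are over $\Pi \cup \{b\}$.

We get the algorithm for p-match:
1. Create $T_\Sigma, T_\Pi, P_\Sigma, P_\Pi$.
2. Find all the places $P_\Sigma$ appears in $T_\Sigma$ (using KMP). Call this set $L_1$.
3. Find all the places $P_\Pi$ m-matches in $T_\Pi$ (we’ll show later how). Call this set $L_2$.
4. Return $L_1 \cap L_2$.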
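The four steps can be sketched as follows. This is only an illustration of the reduction: both searches are done naively, window by window, as stand-ins for KMP and for the efficient m-match search, locations are 0-based, and the helper names and the marker characters `"#"` and `"$"` (playing the roles of $a$ and $b$, assumed not to occur in the input) are assumptions.

```python
def split_by_alphabet(s, sigma, a="#", b="$"):
    """Step 1: return (s_sigma, s_pi), masking parameters with `a`
    and fixed symbols with `b`."""
    s_sigma = [c if c in sigma else a for c in s]
    s_pi = [b if c in sigma else c for c in s]
    return s_sigma, s_pi


def m_match(u, v):
    """True if equal-length sequences u and v match under some bijection."""
    f, f_inv = {}, {}
    return all(f.setdefault(x, y) == y and f_inv.setdefault(y, x) == x
               for x, y in zip(u, v))


def p_match_locations(t, p, sigma):
    """All 0-based locations where p p-matches a window of t."""
    t_sigma, t_pi = split_by_alphabet(t, sigma)
    p_sigma, p_pi = split_by_alphabet(p, sigma)
    m = len(p)
    windows = range(len(t) - m + 1)
    # Step 2: exact occurrences of P_Sigma in T_Sigma (naive stand-in for KMP).
    l1 = {i for i in windows if t_sigma[i:i + m] == p_sigma}
    # Step 3: m-matches of P_Pi in T_Pi (naive stand-in).
    l2 = {i for i in windows if m_match(p_pi, t_pi[i:i + m])}
    # Step 4: intersect.
    return sorted(l1 & l2)


print(p_match_locations("baxxyxb", "ayyx", set("abc")))  # [1]
```

Intersecting with $L_1$ is what keeps the marker $b$ honest: $L_1$ forces fixed and parameter positions to line up, so in any surviving window $b$ can only be mapped to $b$.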

Exercise

Why is that enough? In other words: prove there is a p-match at location $i$ iff $i \in L_1 \cap L_2$.

We are left with the question: how do we solve step 3 efficiently?

M-match

Is m-match transitive?

We can use a KMP-like automaton method. For each index $i$ in the pattern, we want to find the longest proper suffix of $p_1 \ldots p_i$ that m-matches a prefix of the pattern. For instance:

$P = ababcbca$, $\Pi = \{a, b, c\}$

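Failure Links

Where should the failure link from $i$ go?

[Figure: pattern positions $p_j$, $p_i$, $p_{i+1}$, with the candidate link from $i$ to $j$ marked "?".]

In KMP it is simple: if $p_i = p_j$ then link to $j$. Otherwise go back again and repeat.

In our case:

If $p_j$ never appeared before, i.e. $p_j \notin \{p_1, \ldots, p_{j-1}\}$, we link if $p_i \notin \{p_{i-j+1}, \ldots, p_{i-1}\}$.

Otherwise, we link if for every $1 \le l < j$ such that $p_j = p_{j-l}$, it holds that $p_i = p_{i-l}$.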
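The two cases can be written out literally. The sketch below assumes, as the slide does implicitly, that $p_1 \ldots p_{j-1}$ already m-matches $p_{i-j+1} \ldots p_{i-1}$, and only decides whether $p_j$ and $p_i$ extend that match. The function name `extends` is hypothetical, and the pattern is shifted by one so that indices are 1-based as on the slides.

```python
def extends(p, i, j):
    """Literal form of the linking test, with 1-based i and j (j <= i).

    Assumes p[1..j-1] already m-matches p[i-j+1..i-1]; returns True if
    position j can be paired with position i.
    """
    P = [None] + list(p)  # shift so that P[1] is the first pattern symbol
    p_j_is_new = all(P[j] != P[j - l] for l in range(1, j))
    if p_j_is_new:
        # p_i must then be new inside its own window as well.
        return all(P[i] != P[i - l] for l in range(1, j))
    # Every earlier occurrence of p_j must be mirrored by p_i.
    return all(P[i] == P[i - l] for l in range(1, j) if P[j] == P[j - l])


P = "ababcbca"
print(extends(P, 4, 2))  # True:  "ab" m-matches p_3 p_4 = "ab"
print(extends(P, 5, 3))  # False: "aba" does not m-match p_3 p_4 p_5 = "abc"
print(extends(P, 7, 3))  # True:  "aba" m-matches p_5 p_6 p_7 = "cbc"
```

Written this way the test costs $O(j)$ per call, which is why a faster way of answering it is needed.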

Failure Links

Can we do this efficiently? We’ll build an array $A[1], \ldots, A[m]$:

$$A[i] = \begin{cases} i & p_i \notin \{p_1, \ldots, p_{i-1}\} \\ k & k < i,\ p_k = p_i,\ p_i \notin \{p_{k+1}, \ldots, p_{i-1}\} \end{cases}$$

So, if $A[j] = j$, we know $p_j$ hasn’t appeared before. Otherwise, we’ll know exactly where it last appeared.

Building the Array $A$

We’ll hold a balanced binary search tree for the symbols of the alphabet. Initially it will be empty.

We’ll go over the pattern. For each symbol $p_i$, if it isn’t in the tree, we’ll add it with its index and set $A[i] = i$. Otherwise, we know exactly where it last appeared, so we’ll set $A[i]$ to that index and then update the symbol in the tree with the new index.

Time: $O(m \log \sigma)$ where $\sigma = \min\{|\Pi|, m\}$.
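The construction of $A$ can be sketched in a few lines. Note the substitution: this sketch uses a Python `dict` (a hash table) in place of the balanced binary search tree, which gives expected constant time per lookup rather than the $O(\log \sigma)$ worst case of the tree. The function name `last_occurrence_array` is an assumption, and index 0 is left unused so that positions are 1-based as on the slides.

```python
def last_occurrence_array(p):
    """Return A with A[i] = i if p_i is new, else the index of the previous
    occurrence of p_i (1-based; A[0] is unused padding)."""
    last = {}                 # symbol -> most recent index (dict, not a BST)
    A = [0] * (len(p) + 1)
    for i, c in enumerate(p, start=1):
        A[i] = last.get(c, i)  # i itself when c has not been seen yet
        last[c] = i            # record the new most recent index
    return A


A = last_occurrence_array("ababcbca")
print(A[1:])  # [1, 2, 1, 2, 5, 4, 5, 3]
```

With $A$ in hand, the linking test only needs the single offset $l = j - A[j]$: since the shorter prefix already matches, checking the most recent earlier occurrence of $p_j$ is enough.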

The Matching Itself

We go forward in the automaton if either $p_j \notin \{p_1, \ldots, p_{j-1}\}$ and $t_i \notin \{t_{i-j+1}, \ldots, t_{i-1}\}$, or $p_j$ has appeared before and $t_i = t_{i-l}$ for every $1 \le l < j$ s.t. $p_j = p_{j-l}$.

We’ll hold and update a balanced BST as we go over the text as well. Time: $O(n \log \min\{|\Pi|, n\})$.

So the overall algorithm time is $O(n \log \min\{|\Pi|, n\})$. Can we improve this further?
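Putting the failure links and the scan together gives a KMP-like m-match search. The following is one possible sketch, not the slides' own code. It stores, for each position, the distance back to the previous occurrence of its symbol (equivalent to $j - A[j]$, with 0 meaning "new"), it uses a `dict` instead of a balanced BST, it reports 0-based locations, and all function names are assumptions.

```python
def prev_distances(s):
    """d[i] = distance from i back to the previous occurrence of s[i], 0 if new."""
    last, d = {}, []
    for i, c in enumerate(s):
        d.append(i - last[c] if c in last else 0)
        last[c] = i
    return d


def agrees(dist, pat_dist, k):
    """Can a symbol whose previous occurrence is `dist` back extend a match of
    length k whose next pattern symbol has distance `pat_dist`?"""
    if dist > k:   # previous occurrence lies outside the matched window
        dist = 0   # so the symbol counts as new
    return dist == pat_dist


def m_match_locations(t, p):
    """All 0-based locations where p m-matches a window of t."""
    m = len(p)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    dp, dt = prev_distances(p), prev_distances(t)

    # fail[j]: length of the longest proper suffix of p[:j] m-matching a prefix.
    fail = [0] * (m + 1)
    k = 0
    for j in range(1, m):
        while k > 0 and not agrees(dp[j], dp[k], k):
            k = fail[k]
        if agrees(dp[j], dp[k], k):
            k += 1
        fail[j + 1] = k

    out, k = [], 0
    for i in range(len(t)):
        # Follow failure links after a full match or on a mismatch.
        while k > 0 and (k == m or not agrees(dt[i], dp[k], k)):
            k = fail[k]
        if agrees(dt[i], dp[k], k):
            k += 1
        if k == m:
            out.append(i - m + 1)
    return out


print(m_match_locations("abab", "xyx"))    # [0, 1]
print(m_match_locations("aabbcc", "xxy"))  # [0, 2]
```

Following failure links is sound only because m-match is transitive: if a suffix of the matched text m-matches a suffix of the pattern prefix, and that suffix m-matches a shorter prefix, then the text suffix m-matches the shorter prefix too.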

The Trick

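We’ll split the text into $\frac{n}{m}$ overlapping segments of size $2m$, each starting $m$ positions after the previous one:

[Figure: a text of length $n$ covered by overlapping segments of length $2m$, next to a pattern of length $m$.]

So every match in the text must appear in whole in one of the segments.

We’ll run the algorithm for each such segment. Time: $O(m \log \sigma)$ where $\sigma = \min\{|\Pi|, m\}$.

Overall for all segments: $\frac{n}{m} \cdot O(m \log \sigma) = O(n \log \sigma)$.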
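The segmentation can be sketched as below. To stay self-contained, the per-segment search here is a naive m-match search used as a stand-in for the automaton-based algorithm; any function with the same `(t, p) -> locations` shape could be passed in its place. Names are hypothetical, locations are 0-based, and the pattern is assumed non-empty.

```python
def m_match(u, v):
    """True if equal-length sequences u and v match under some bijection."""
    f, f_inv = {}, {}
    return all(f.setdefault(x, y) == y and f_inv.setdefault(y, x) == x
               for x, y in zip(u, v))


def naive_m_match_locations(t, p):
    """Naive stand-in for the per-segment search."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if m_match(p, t[i:i + m])]


def segments(n, m):
    """Yield half-open (start, end) ranges of length at most 2m, one every m positions."""
    for start in range(0, max(n - m, 0) + 1, m):
        yield start, min(start + 2 * m, n)


def search_by_segments(t, p, search=naive_m_match_locations):
    """Run `search` on each segment and merge the shifted locations."""
    m = len(p)
    found = set()  # a match in an overlap may be reported twice; the set dedups
    for lo, hi in segments(len(t), m):
        found.update(lo + i for i in search(t[lo:hi], p))
    return sorted(found)


print(list(segments(10, 3)))                      # [(0, 6), (3, 9), (6, 10)]
print(search_by_segments("aabbccaabb", "xxy"))    # [0, 2, 4, 6]
```

A match starting at location $i$ lies inside the segment starting at $m \lfloor i/m \rfloor$, since it ends before that start plus $2m$. Each segment contains at most $2m$ symbols, which is why the number of distinct symbols seen per run is bounded in terms of $m$ rather than $n$.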