Faster 2-Dimensional Scaled Matching Amihood Amir and Eran Chencinski


Real Scaling

Given an $n \times n$ text T and an $m \times m$ pattern P, find all occurrences of P in T, scaled to any real scale.

Best known algorithm [Amir et al.]:
    Time: $O(nm^3 + n^2 m \log m)$
    Space: $O(nm^3 + n^2)$

Our algorithm:
    Time: $O(n^2 m)$
    Space: $O(n^2)$

Scaling – Geometric Definition

Scaling – Algebraic Definition

Rounding function:

Given a pattern P of size $m \times m$ and a scale $r$:
    The first row is scaled to $\| 1 \cdot r \|$ rows.
    The first 2 rows are scaled to $\| 2 \cdot r \|$ rows.
    ...
    The first m rows are scaled to $\| m \cdot r \|$ rows.

Similarly for the columns.
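
The row and column rule above can be illustrated with a small sketch. The slides do not preserve the definition of the rounding function, so the code assumes round-to-nearest with halves rounded up; the names `rnd` and `scale_pattern` are hypothetical and chosen only for this illustration.

```python
import math


def rnd(x):
    # ASSUMPTION: ||x|| rounds to the nearest integer, halves rounding up.
    # The slides do not show the actual definition.
    return math.floor(x + 0.5)


def scale_pattern(P, r):
    """Scale a square pattern P (list of lists) by a real scale r.

    The first k rows must occupy rnd(k * r) rows in total, so row k is
    repeated rnd(k * r) - rnd((k - 1) * r) times; columns are treated alike.
    """
    m = len(P)
    reps = [rnd(k * r) - rnd((k - 1) * r) for k in range(1, m + 1)]
    scaled = []
    for i in range(m):
        # Stretch row i horizontally, one column at a time.
        row = []
        for j in range(m):
            row.extend([P[i][j]] * reps[j])
        # Then repeat the stretched row vertically.
        for _ in range(reps[i]):
            scaled.append(list(row))
    return scaled


if __name__ == "__main__":
    for line in scale_pattern([["a", "b"], ["c", "d"]], 1.5):
        print("".join(line))
```

With scale 1.5 and this assumed rounding, the first row and column are doubled and the second ones are kept once, giving a 3 x 3 result.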

Inverse rounding function: suppose we know that K rows were scaled to L rows:
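
The formula from this slide is not preserved. As a hedged illustration only, if the rounding function is assumed to round to the nearest integer with halves rounded up, then K rows scale to L rows exactly when $L - 0.5 \le K \cdot r < L + 0.5$. The helper name `scale_range` is hypothetical.

```python
from fractions import Fraction


def scale_range(K, L):
    """Return (lo, hi) such that every scale r with lo <= r < hi maps K rows to L rows.

    ASSUMPTION: ||x|| rounds to nearest with halves up, so ||K * r|| == L
    holds exactly when L - 1/2 <= K * r < L + 1/2.
    """
    lo = Fraction(2 * L - 1, 2 * K)  # (L - 1/2) / K, inclusive bound
    hi = Fraction(2 * L + 1, 2 * K)  # (L + 1/2) / K, exclusive bound
    return lo, hi


if __name__ == "__main__":
    lo, hi = scale_range(2, 3)
    print(lo, hi)
```

Exact fractions are used so that the interval endpoints are not blurred by floating-point error.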

Subrow/Column Repetition Query

Query time: $O(1)$; preprocessing time: $O(n^2)$

Algorithm Layout

The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition

Each stage takes $O(n^2 m)$ time and $O(n^2)$ space.

Scale Elimination Stage

[Figures: the pivot; text location (i,j)]

$O(m)$ time for each location, $O(n^2 m)$ total, $O(n^2)$ space

Candidate Consistency Stage

[Figure: Case (a) and Case (b)]

Witness Table Construction

For each suffix: $O(m^2)$ time and $O(m)$ space

Pre-Dueling Step

For each candidate c in T:
    For each suffix s of P:
        Compare c's borders with the witness table borders of suffix s

If the borders are not the same, c is eliminated.

This can be done in $O(m)$ time for each candidate.

Performing a Duel

The Dueling Order

Each candidate performs at most $O(m)$ successful duels.
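
The slides do not preserve the details of the duel for scaled matching. As background only, the sketch below shows the classical witness-and-duel technique for exact matching in one dimension, not the authors' two-dimensional scaled version. The names `build_witness` and `duel` are hypothetical, and the quadratic witness construction is chosen for brevity rather than efficiency.

```python
def build_witness(P):
    """wit[d] is an index k with P[k] != P[k + d], or None if no such k exists."""
    m = len(P)
    wit = [None] * m
    for d in range(1, m):
        for k in range(m - d):
            if P[k] != P[k + d]:
                wit[d] = k
                break
    return wit


def duel(T, P, wit, a, b):
    """Duel two candidate start positions a < b in T with b - a < len(P).

    Returns the set of candidates that survive. When a witness exists, the
    two candidates expect different symbols at one text position, so a single
    O(1) comparison eliminates at least one of them.
    """
    d = b - a
    k = wit[d]
    if k is None:
        # The two candidates agree on their whole overlap; nothing to decide.
        return {a, b}
    t = b + k  # same text position as a + k + d
    survivors = set()
    if T[t] == P[k + d]:
        survivors.add(a)
    if T[t] == P[k]:
        survivors.add(b)
    return survivors


if __name__ == "__main__":
    P, T = "aab", "aaab"
    wit = build_witness(P)
    print(duel(T, P, wit, 0, 1))  # {1}
```

A surviving candidate is only consistent with the inspected symbol; it still has to be verified against the text afterwards.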

Witness table construction: $O(m^3)$ time, $O(m^2)$ space

Pre-dueling step: $O(n^2 m)$ time, $O(m^2)$ space

Number of duels: at most $O(n)$ unsuccessful, at most $O(n^2 m)$ successful, where each duel takes $O(1)$ time

Total: $O(n^2 m)$ time, $O(n^2)$ space

Candidate Verification Stage

For each location, find the maximal containing interval.

This can be solved in $O(n)$ time per row using the solution to the Maximal Interval Problem.

Once we find the largest interval, we:
    Verify each row in $O(m)$ time, using subcolumn repetition queries
    Save the longest matching length
    For each candidate, run a Range Minimum Query on the lengths

The pattern appears iff pattern size >= RMQ
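
A Range Minimum Query over the saved lengths can be sketched with a sparse table. This structure answers queries in $O(1)$ but needs $O(n \log n)$ preprocessing, which is more than the $O(n)$ per row stated in the slides; it is shown here only because it is short. The names `build_sparse_table` and `range_min` are hypothetical.

```python
def build_sparse_table(a):
    """table[j][i] holds min(a[i : i + 2**j]) for every block that fits in a."""
    n = len(a)
    table = [list(a)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        # A block of length 2**j is the union of two blocks of length 2**(j-1).
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def range_min(table, lo, hi):
    """Minimum of a[lo..hi], both ends inclusive, using two overlapping blocks."""
    k = (hi - lo + 1).bit_length() - 1  # largest power of two <= range length
    return min(table[k][lo], table[k][hi - (1 << k) + 1])


if __name__ == "__main__":
    lengths = [5, 3, 8, 6, 2, 7]
    table = build_sparse_table(lengths)
    print(range_min(table, 0, 2))  # 3
    print(range_min(table, 1, 5))  # 2
```

Because the minimum is idempotent, the two blocks may overlap without affecting the answer, which is what makes the constant-time query possible.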

Finding largest intervals: $O(n)$ time per row, $O(n^2)$ total

Verifying columns: $O(nm)$ time per row, $O(n^2 m)$ total

RMQ:
    Preprocessing: $O(n)$ time per row, $O(n^2)$ total
    Querying: $O(1)$ time per candidate, $O(n^2)$ total

Total: $O(n^2 m)$ time, $O(n^2)$ space

Occurrence Recognition Stage

Recall: Scale elimination stage returned

At most $O(m)$ steps per candidate

Total: $O(n^2 m)$ time

Conclusions

The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition

Each stage takes $O(n^2 m)$ time and $O(n^2)$ space.
