Fully compressed pattern matching by recompression

Artur Jeż
University of Wrocław

9 VII 2012

Artur Jeż, University of Wrocław - ii.uni.wroc.plaje/presentation/PresentationICALP2012.pdf

SLP

Definition (SLP: Straight Line Programme)
A CFG generating exactly one word, with rules Xi → XjXk or Xi → a.

Example
X0 = a, X1 = b, Xn+1 = XnXn−1
a, b, ba, bab, babba, babbabab, . . .

Relations to LZ and LZW
LZW: rules Xi → aXj, text is X1X2X3 . . .
LZ: converting LZ to an SLP increases the size from n to O(n log(N/n)).

many algorithms for SLPs
CPM for LZ [Gawrychowski ESA’11]
in theory (word equations, equations in groups, verification...)
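
The SLP example can be checked with a small sketch. It assumes the recurrence X_{n+1} = X_n X_{n-1}, which is what the listed words a, b, ba, bab, babba suggest; the names `expand` and `fibonacci_slp` and the dictionary representation of rules are illustrative choices, not part of the talk.

```python
def expand(rules, symbol, cache=None):
    """Return the word generated by `symbol` in a straight-line programme.

    `rules` maps a nonterminal index either to a single letter (rule X -> a)
    or to a pair of smaller indices (rule X -> Y Z).
    """
    if cache is None:
        cache = {}
    if symbol not in cache:
        body = rules[symbol]
        if isinstance(body, str):
            cache[symbol] = body
        else:
            left, right = body
            cache[symbol] = expand(rules, left, cache) + expand(rules, right, cache)
    return cache[symbol]


def fibonacci_slp(k):
    """Rules X0 -> a, X1 -> b, X(i+1) -> X(i) X(i-1) for i = 1..k-1."""
    rules = {0: "a", 1: "b"}
    for i in range(1, k):
        rules[i + 1] = (i, i - 1)
    return rules


words = [expand(fibonacci_slp(5), i) for i in range(6)]
print(words)  # ['a', 'b', 'ba', 'bab', 'babba', 'babbabab']
```

The grammar has k + 1 rules while the generated word grows like the Fibonacci numbers, which is why algorithms should work on the rules rather than on the expanded word.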

This talk

Definition (CPM, FCPM)
Compressed pattern matching: the text is compressed, the pattern is not.
Fully compressed pattern matching: both the text and the pattern are compressed.

Results
An O((n + m) log M) algorithm for FCPM for SLPs.
(Previously: O(nm²), [Lifshits, CPM’07].)

Different approach
A new technique: recompression.
decompresses text and pattern
compresses them again (in the same way)
in the end: pattern is a single symbol

Technique

Where it comes from
Mehlhorn, Gawry

Applicable to
Fully Compressed Membership Problem [∈ NP]
Word equations [alternative PSPACE algorithm]
Fully Compressed Pattern Matching [SLPs, LZ, O((n + m) log M log(n + m))]
construction of a grammar for a string [alternative log(N/n) approximation algorithm]
other?

Example

Equality of strings
How to test equality of strings?

[Animation: the same string over the letters a, b, c is shown twice and both copies are compressed in lockstep. The block aaa is replaced by a3, then the block bb by b2, then pairs of letters are replaced by fresh letters d and e.]

Iterate!

How to generalise?

Idea
For both strings
    replace pairs of letters
    replace (maximal) blocks of the same letter

When every letter is compressed, the length reduces by half in an iteration.

TODO
formalise
for SLPs
for pattern matching
running time

Formalisation

In one phase

L ← list of letters, P ← list of pairs of letters
for every letter a ∈ L do
    replace (maximal) blocks a^ℓ with a_ℓ
for every pair of letters ab ∈ P do
    replace pairs ab with c

It will shorten the strings by a constant factor.

Loop while nontrivial (O(log M) iterations).
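
On explicit (uncompressed) strings, one phase and the resulting equality test can be sketched as follows. This is only an illustration under assumptions of this sketch: fresh letters are represented as self-describing tuples such as ("block", a, ℓ) and ("pair", a, b), so both strings automatically get the same names, and the list P is taken in order of first appearance. The function names are hypothetical.

```python
def compress_blocks(word):
    """Replace every maximal block a^l with l >= 2 by one fresh symbol."""
    out, i = [], 0
    while i < len(word):
        j = i
        while j < len(word) and word[j] == word[i]:
            j += 1
        out.append(word[i] if j - i == 1 else ("block", word[i], j - i))
        i = j
    return out


def compress_pair(word, a, b):
    """Replace each occurrence of the pair ab (a != b) by one fresh symbol."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
            out.append(("pair", a, b))
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def phase(u, v):
    """One phase: block compression, then pair compression, on both strings."""
    u, v = compress_blocks(u), compress_blocks(v)
    # P: pairs of adjacent letters present after block compression.
    pairs = dict.fromkeys(p for w in (u, v) for p in zip(w, w[1:]))
    for a, b in pairs:
        u, v = compress_pair(u, a, b), compress_pair(v, a, b)
    return u, v


def equal_by_recompression(s, t):
    u, v = list(s), list(t)
    while len(u) > 1 or len(v) > 1:
        u, v = phase(u, v)
    return u == v


print(equal_by_recompression("aaababcabbabcab", "aaababcabbabcab"))  # True
print(equal_by_recompression("abab", "abba"))                        # False
```

Every phase strictly shortens any string of length at least two, so the loop terminates; because each fresh symbol records what it replaced, different strings can never be mapped to the same symbol.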

SLPs

Grammar form
More general rules: Xi → uXjvXkw, j, k < i.

Lemma
There are at most |G| + 4n different maximal lengths of blocks in G.

Proof.
blocks contained in explicit words: assign to explicit letters
blocks not contained in explicit words: at most 4 per rule

Lemma
There are at most |G| + 4n different pairs of letters in G.

Blocks compression

Compression of a
X1 → baaba, X2 → aaX1baX1baa (no problem)
X1 → a, X2 → aX1aX1a (problem)
X1 → abaaba, X2 → aX1aX1a (problem)

Definition (Crossing block)
a has a crossing block if one of its maximal blocks is contained in Xi but not in the explicit words in Xi’s rule.

When a has no crossing block
for all maximal blocks a^ℓ of a do
    let a_ℓ ∈ Σ be an unused letter
    replace each explicit maximal a^ℓ in rules’ bodies by a_ℓ
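
The three example grammars can be checked mechanically. The sketch below assumes a representation chosen for illustration only: a grammar is a list of rule bodies, letters are one-character strings, a nonterminal is the integer index of an earlier rule, and no rule generates the empty word. A block of a crosses a boundary exactly when some rule body contains two adjacent symbols, at least one of them a nonterminal, with a on both sides of the boundary.

```python
def first_last(rules):
    """First and last letter of the word generated by each nonterminal."""
    first, last = [], []
    for body in rules:
        f, l = body[0], body[-1]
        first.append(f if isinstance(f, str) else first[f])
        last.append(l if isinstance(l, str) else last[l])
    return first, last


def has_crossing_block(rules, a):
    """True if some maximal block of `a` spans a nonterminal boundary."""
    first, last = first_last(rules)
    for body in rules:
        for x, y in zip(body, body[1:]):
            if isinstance(x, str) and isinstance(y, str):
                continue  # both explicit: the block is visible in the rule
            end_x = x if isinstance(x, str) else last[x]
            start_y = y if isinstance(y, str) else first[y]
            if end_x == a and start_y == a:
                return True
    return False


# Index 0 plays the role of X1 and index 1 the role of X2.
no_problem = [list("baaba"), list("aa") + [0] + list("ba") + [0] + list("baa")]
problem_1 = [["a"], ["a", 0, "a", 0, "a"]]
problem_2 = [list("abaaba"), ["a", 0, "a", 0, "a"]]

print(has_crossing_block(no_problem, "a"))  # False
print(has_crossing_block(problem_1, "a"))   # True
print(has_crossing_block(problem_2, "a"))   # True
```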

What about crossing blocks?

Idea
change the rules
when Xi defines a^{ℓ_i} w a^{r_i}, change it so that it defines w
replace Xi in rules by a^{ℓ_i} Xi a^{r_i} (where Xi now defines w)

CutPrefSuff(a)
for i ← 1 to n do
    calculate and remove the a-prefix a^{ℓ_i} and the a-suffix a^{r_i} of Xi
    replace each Xi in rules’ bodies by a^{ℓ_i} Xi a^{r_i}

Lemma
After CutPrefSuff(a) letter a has no crossing block.

So a’s blocks can be easily compressed.

Parallelly for many letters!

What about crossing blocks?

Idea
change the rules
when Xi defines a^{ℓ_i} w b^{r_i}, change it so that it defines w
replace Xi in rules by a^{ℓ_i} Xi b^{r_i} (where Xi now defines w)

CutPrefSuff
for i ← 1 to n do
    let Xi begin with a and end with b
    calculate and remove the a-prefix a^ℓ and the b-suffix b^r of Xi
    replace each Xi in rules’ bodies by a^ℓ Xi b^r

Lemma
After CutPrefSuff no letter has a crossing block.

So all blocks can be easily compressed.

Parallelly for many letters!
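
A minimal sketch of CutPrefSuff, under assumptions made only for this illustration: a grammar is a list of rule bodies, letters are one-character strings, a nonterminal is the integer index of an earlier rule, no rule generates the empty word, and the last rule (the start symbol) is not popped. Prefixes and suffixes are stored as explicit letters here; since a^ℓ can be exponentially long in the size of the grammar, a real implementation would need a compact (for example run-length) representation.

```python
def expand(rules, i):
    """Decompress nonterminal i (letters are str, nonterminals are int)."""
    return "".join(x if isinstance(x, str) else expand(rules, x) for x in rules[i])


def cut_pref_suff(rules):
    """Pop the letter-prefix and letter-suffix of every non-start nonterminal."""
    rules = [list(body) for body in rules]
    n = len(rules)
    for i in range(n - 1):
        body = rules[i]
        lead = 0   # length of the leading run of the first letter
        while lead < len(body) and body[lead] == body[0]:
            lead += 1
        prefix, body = body[:lead], body[lead:]
        trail = 0  # length of the trailing run of the last letter
        while trail < len(body) and body[-1 - trail] == body[-1]:
            trail += 1
        suffix = body[len(body) - trail:]
        body = body[:len(body) - trail]
        rules[i] = body
        # X_i is dropped from the bodies if nothing is left of it.
        replacement = prefix + ([i] if body else []) + suffix
        for k in range(i + 1, n):
            rules[k] = [y for x in rules[k] for y in (replacement if x == i else [x])]
    return rules


before = [list("abaaba"), ["a", 0, "a", 0, "a"]]
after = cut_pref_suff(before)
print(after[0])  # ['b', 'a', 'a', 'b']
print(after[1])  # ['a', 'a', 0, 'a', 'a', 'a', 0, 'a', 'a']
```

The generated word is unchanged, and every maximal block of a now lies inside an explicit word of some rule, so the block compression for the non-crossing case applies.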

Pair compression

X1 → ababcab, X2 → abcbX1abX1a

compression of ab: easy
compression of ba: problem
pairs may overlap (problem: they must be compressed sequentially, not in parallel)

Crossing pairs

When ab has a ‘crossing’ appearance: aXi or Xib
if Xi defines bw, change it so that it defines w, and replace Xi by bXi
symmetrically for ending a

LeftPop(b)
for i = 1 to n do
    if the first symbol in Xi → α is b then
        remove this b
        replace Xi in productions by bXi

Lemma
After LeftPop(b) and RightPop(a) the pair ab is no longer crossing.

Can be done in parallel!

Crossing pairs

When ab ∈ Σ1Σ2 has a crossing appearance: aXi or Xib
if Xi defines bw, change it so that it defines w, and replace Xi by bXi
symmetrically for ending a

LeftPop
for i = 1 to n do
    if the first symbol in Xi → α is b ∈ Σ2 then
        remove this b
        replace Xi in productions by bXi

Lemma
After LeftPop and RightPop the pairs from Σ1Σ2 are no longer crossing.

Can be done in parallel!
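
The popping step and the pair compression that follows it can be sketched on the earlier example X1 → ababcab, X2 → abcbX1abX1a with the pair ba, that is Σ1 = {b} and Σ2 = {a}. The representation is an assumption of this sketch: a grammar is a list of rule bodies, letters are strings, a nonterminal is the integer index of an earlier rule, Σ1 and Σ2 are disjoint sets, the start symbol is not popped, and the fresh letter for a pair is the tuple of its two letters.

```python
def expand(rules, i):
    """Decompress nonterminal i into a list of letters."""
    out = []
    for x in rules[i]:
        out.extend(expand(rules, x) if isinstance(x, int) else [x])
    return out


def pop_letters(rules, sigma1, sigma2):
    """LeftPop letters of sigma2 and RightPop letters of sigma1."""
    rules = [list(body) for body in rules]
    n = len(rules)
    for i in range(n - 1):
        body = rules[i]
        left = [body.pop(0)] if body and body[0] in sigma2 else []
        right = [body.pop()] if body and body[-1] in sigma1 else []
        replacement = left + ([i] if body else []) + right
        for k in range(i + 1, n):
            rules[k] = [y for x in rules[k] for y in (replacement if x == i else [x])]
    return rules


def compress_pairs(rules, sigma1, sigma2):
    """Replace every explicit pair ab with a in sigma1, b in sigma2 by (a, b)."""
    out = []
    for body in rules:
        new, k = [], 0
        while k < len(body):
            if k + 1 < len(body) and body[k] in sigma1 and body[k + 1] in sigma2:
                new.append((body[k], body[k + 1]))
                k += 2
            else:
                new.append(body[k])
                k += 1
        out.append(new)
    return out


grammar = [list("ababcab"), list("abcb") + [0] + list("ab") + [0] + ["a"]]
popped = pop_letters(grammar, {"b"}, {"a"})
compressed = compress_pairs(popped, {"b"}, {"a"})
print(popped[0])                               # ['b', 'a', 'b', 'c', 'a']
print(expand(compressed, 1).count(("b", "a")))  # 6
```

Because Σ1 and Σ2 are disjoint, two occurrences of pairs from Σ1Σ2 cannot overlap, which is what allows all of them to be replaced in one pass.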

Running time

Blocks compression: O(|G|) time
non-crossing pairs: O(|G|) time
crossing pairs: O(n + m) time per partition (Σ1, Σ2)

Lemma
There are O(n + m) crossing pairs.

crossing pairs: O((n + m)²) time.

Running time
O(|G| + (n + m)²).

Shortening of the string

consider a pair ab in the text
if a = b: it is compressed
if a ≠ b: it is compressed unless a or b was compressed already
consider four consecutive symbols: something in them is compressed
text compresses by a constant factor in each phase
O(log M) phases

Overall running time and grammar size

Grammar size
In each phase the size of the grammar increases by O((n + m)²)
    CutPrefSuff
    LeftPop, RightPop
shortening G: the same analysis as for the pattern
    shortens by a constant factor in a phase
|G| is O((n + m)²)

Running time is O((n + m)² log M)

Can be reduced to O((n + m) log M)

Turning to the pattern matching

Problem with the ends
text: abababab, pattern: baba, compression of ab
text: abababab, pattern: aba, compression of ab
text: aaaaaaaa, pattern: aaa, compression of a blocks

Fixing the ends
Compress the starting and ending pair, if possible (so ba in the first case)
not possible when the first and last letter is the same, say a
replace the leading a by a_L, the ending a by a_R
spawn a into a_R a_L
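
The first problem case can be reproduced on explicit strings. The helper names and the fresh letters c and d below are hypothetical; the sketch only counts occurrences before and after compressing one pair in both the text and the pattern.

```python
def occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of pattern in text."""
    return sum(text.startswith(pattern, i)
               for i in range(len(text) - len(pattern) + 1))


def compress_pair(word, pair, fresh):
    """Replace the non-overlapping occurrences of `pair`, left to right."""
    return word.replace(pair, fresh)


text, pattern = "abababab", "baba"

before = occurrences(text, pattern)
# Compressing ab cuts through both ends of the pattern: the text becomes
# cccc, the pattern becomes bca, and every occurrence is lost.
after_ab = occurrences(compress_pair(text, "ab", "c"),
                       compress_pair(pattern, "ab", "c"))
# Compressing ba, the starting and ending pair of the pattern, keeps them:
# the text becomes adddb and the pattern becomes dd.
after_ba = occurrences(compress_pair(text, "ba", "d"),
                       compress_pair(pattern, "ba", "d"))

print(before, after_ab, after_ba)  # 2 0 2
```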

Questions?
Other applications?
