46
Compressed Membership in Automata with Compressed Labels Markus Lohrey Christian Mathissen University of Leipzig June 16, 2011 Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 1 / 12

Csr2011 june16 17_00_lohrey

  • View
    272

  • Download
    1

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Csr2011 june16 17_00_lohrey

Compressed Membership in Automata withCompressed Labels

Markus LohreyChristian Mathissen

University of Leipzig

June 16, 2011

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 1 / 12

Page 2: Csr2011 june16 17_00_lohrey

Motivation

Try to develop algorithms that directly work on compressed data.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12

Page 3: Csr2011 june16 17_00_lohrey

Motivation

Try to develop algorithms that directly work on compressed data.

Goal: Beat straightforward decompress and manipulate strategy.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12

Page 4: Csr2011 june16 17_00_lohrey

Motivation

Try to develop algorithms that directly work on compressed data.

Goal: Beat straightforward decompress and manipulate strategy.

In this talk: focus on compressed strings

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12

Page 5: Csr2011 june16 17_00_lohrey

Motivation

Try to develop algorithms that directly work on compressed data.

Goal: Beat straightforward decompress and manipulate strategy.

In this talk: focus on compressed strings

Applications:

◮ all domains, where massive string data arise and have to beprocessed, e.g. bioinformatics

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12

Page 6: Csr2011 june16 17_00_lohrey

Motivation

Try to develop algorithms that directly work on compressed data.

Goal: Beat straightforward decompress and manipulate strategy.

In this talk: focus on compressed strings

Applications:

◮ all domains, where massive string data arise and have to beprocessed, e.g. bioinformatics

◮ large (and highly compressible) strings often occur as intermediatedata structures (e.g. in computational group theory, program analysis,verification).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12

Page 7: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 8: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 9: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

A straight-line program (SLP) over the alphabet Γ is a sequence ofdefinitions A = (Ai := αi)1≤i≤n, where either αi ∈ Γ or αi = AjAk forsome j , k < i .

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 10: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

A straight-line program (SLP) over the alphabet Γ is a sequence ofdefinitions A = (Ai := αi)1≤i≤n, where either αi ∈ Γ or αi = AjAk forsome j , k < i .

Alternatively: a context-free grammar that generates exactly one string.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 11: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

A straight-line program (SLP) over the alphabet Γ is a sequence ofdefinitions A = (Ai := αi)1≤i≤n, where either αi ∈ Γ or αi = AjAk forsome j , k < i .

Alternatively: a context-free grammar that generates exactly one string.

We write val(A) for the unique word generated by A.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 12: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

A straight-line program (SLP) over the alphabet Γ is a sequence ofdefinitions A = (Ai := αi)1≤i≤n, where either αi ∈ Γ or αi = AjAk forsome j , k < i .

Alternatively: a context-free grammar that generates exactly one string.

We write val(A) for the unique word generated by A.

The size of an SLP A = (Ai := αi)1≤i≤n is |A| = n.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 13: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

A straight-line program (SLP) over the alphabet Γ is a sequence ofdefinitions A = (Ai := αi)1≤i≤n, where either αi ∈ Γ or αi = AjAk forsome j , k < i .

Alternatively: a context-free grammar that generates exactly one string.

We write val(A) for the unique word generated by A.

The size of an SLP A = (Ai := αi)1≤i≤n is |A| = n.

One may have |val(A)| = 2n.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 14: Csr2011 june16 17_00_lohrey

Straight-line programs

Dictionary-based compression algorithms (LZ77, LZ78) exploit textrepetition.

Straight-line programs are a general representation for compressed strings,which covers most dictionary-based algorithms.

A straight-line program (SLP) over the alphabet Γ is a sequence ofdefinitions A = (Ai := αi)1≤i≤n, where either αi ∈ Γ or αi = AjAk forsome j , k < i .

Alternatively: a context-free grammar that generates exactly one string.

We write val(A) for the unique word generated by A.

The size of an SLP A = (Ai := αi)1≤i≤n is |A| = n.

One may have |val(A)| = 2n.

Thus, an SLP A can be seen as a compressed representation of val(A).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12

Page 15: Csr2011 june16 17_00_lohrey

Straight-line programs

Example: A = (A1 := b, A2 := a, Ai := Ai−1Ai−2 for 3 ≤ i ≤ 7).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12

Page 16: Csr2011 june16 17_00_lohrey

Straight-line programs

Example: A = (A1 := b, A2 := a, Ai := Ai−1Ai−2 for 3 ≤ i ≤ 7).

Then val(A) = abaababaabaab:

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12

Page 17: Csr2011 june16 17_00_lohrey

Straight-line programs

Example: A = (A1 := b, A2 := a, Ai := Ai−1Ai−2 for 3 ≤ i ≤ 7).

Then val(A) = abaababaabaab:

A3 = A2A1 = ab

A4 = A3A2 = aba

A5 = A4A3 = abaab

A6 = A5A4 = abaababa

A7 = A6A5 = abaababaabaab

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12

Page 18: Csr2011 june16 17_00_lohrey

Straight-line programs

Example: A = (A1 := b, A2 := a, Ai := Ai−1Ai−2 for 3 ≤ i ≤ 7).

Then val(A) = abaababaabaab:

A3 = A2A1 = ab

A4 = A3A2 = aba

A5 = A4A3 = abaab

A6 = A5A4 = abaababa

A7 = A6A5 = abaababaabaab

Relationship to dictionary-based compression (Rytter):

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12

Page 19: Csr2011 june16 17_00_lohrey

Straight-line programs

Example: A = (A1 := b, A2 := a, Ai := Ai−1Ai−2 for 3 ≤ i ≤ 7).

Then val(A) = abaababaabaab:

A3 = A2A1 = ab

A4 = A3A2 = aba

A5 = A4A3 = abaab

A6 = A5A4 = abaababa

A7 = A6A5 = abaababaabaab

Relationship to dictionary-based compression (Rytter):

◮ From an SLP A one can compute in polynomial time LZ77(val(A)).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12

Page 20: Csr2011 june16 17_00_lohrey

Straight-line programs

Example: A = (A1 := b, A2 := a, Ai := Ai−1Ai−2 for 3 ≤ i ≤ 7).

Then val(A) = abaababaabaab:

A3 = A2A1 = ab

A4 = A3A2 = aba

A5 = A4A3 = abaab

A6 = A5A4 = abaababa

A7 = A6A5 = abaababaabaab

Relationship to dictionary-based compression (Rytter):

◮ From an SLP A one can compute in polynomial time LZ77(val(A)).

◮ From LZ77(w) one can compute in polynomial time an SLP A withval(A) = w .

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12

Page 21: Csr2011 june16 17_00_lohrey

Algorithms on SLP-compressed strings

The mother of all algorithms on SLP -compressed strings:

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12

Page 22: Csr2011 june16 17_00_lohrey

Algorithms on SLP-compressed strings

The mother of all algorithms on SLP -compressed strings:

Plandowski 1994

The following problem can be solved in polynomial time:

INPUT: SLPs A, B

QUESTION: val(A) = val(B)?

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12

Page 23: Csr2011 june16 17_00_lohrey

Algorithms on SLP-compressed strings

The mother of all algorithms on SLP -compressed strings:

Plandowski 1994

The following problem can be solved in polynomial time:

INPUT: SLPs A, B

QUESTION: val(A) = val(B)?

Note: The decompress-and-compare strategy does not work here.We cannot compute val(A) and val(B) in polynomial time.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12

Page 24: Csr2011 june16 17_00_lohrey

Algorithms on SLP-compressed strings

The mother of all algorithms on SLP -compressed strings:

Plandowski 1994

The following problem can be solved in polynomial time:

INPUT: SLPs A, B

QUESTION: val(A) = val(B)?

Note: The decompress-and-compare strategy does not work here.We cannot compute val(A) and val(B) in polynomial time.

Plandowski’s algorithm uses combinatorics on words, in particular thePeriodicity-Lemma of Fine and Wilf.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12

Page 25: Csr2011 june16 17_00_lohrey

Improvements of Plandowski’s result

Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara,Takeda (mid 90’s)

The following problem can be solved in polynomial time(fully compressed pattern matching):

INPUT: SLPs P, T

QUESTION: Is val(P) a factor of val(T), i.e., ∃x , y : val(T) = x val(P) y?

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 6 / 12

Page 26: Csr2011 june16 17_00_lohrey

Improvements of Plandowski’s result

Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara,Takeda (mid 90’s)

The following problem can be solved in polynomial time(fully compressed pattern matching):

INPUT: SLPs P, T

QUESTION: Is val(P) a factor of val(T), i.e., ∃x , y : val(T) = x val(P) y?

The best known algorithm has a running time of O(|P| · |T|2)(Lifshits 2006).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 6 / 12

Page 27: Csr2011 june16 17_00_lohrey

Generalization: Compressed automata

A compressed automata is an ordinary finite automaton, where everytransition is labelled with an SLP.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12

Page 28: Csr2011 june16 17_00_lohrey

Generalization: Compressed automata

A compressed automata is an ordinary finite automaton, where everytransition is labelled with an SLP.

Example: The compressed automaton

q qA

Σ Σ

accepts all words that have val(A) as a factor.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12

Page 29: Csr2011 june16 17_00_lohrey

Generalization: Compressed automata

A compressed automata is an ordinary finite automaton, where everytransition is labelled with an SLP.

Example: The compressed automaton

q qA

Σ Σ

accepts all words that have val(A) as a factor.

The size |A| of the compressed automaton A(with the set ∆ of transition triples) is

|A| =∑

(p,A,q)∈∆

|A|.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12

Page 30: Csr2011 june16 17_00_lohrey

Compressed membership for compressed automata

Compressed membership for compressed automata:

INPUT: A compressed automaton A and an SLP B.QUESTION: val(B) ∈ L(A)?

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12

Page 31: Csr2011 june16 17_00_lohrey

Compressed membership for compressed automata

Compressed membership for compressed automata:

INPUT: A compressed automaton A and an SLP B.QUESTION: val(B) ∈ L(A)?

Plandowski, Rytter 1999

◮ Compressed membership for compressed automata belongs toPSPACE.

◮ Compressed membership for compressed automata over a unaryalphabet is NP-complete.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12

Page 32: Csr2011 june16 17_00_lohrey

Compressed membership for compressed automata

Compressed membership for compressed automata:

INPUT: A compressed automaton A and an SLP B.QUESTION: val(B) ∈ L(A)?

Plandowski, Rytter 1999

◮ Compressed membership for compressed automata belongs toPSPACE.

◮ Compressed membership for compressed automata over a unaryalphabet is NP-complete.

Plandowski and Rytter conjectured that compressed membership forcompressed automata is NP-complete (for every alphabet size).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12

Page 33: Csr2011 june16 17_00_lohrey

Some combinatorics on words

The period of the word w ∈ Σ∗ is the smallest number p such thatw [k + p] = w [k] for all 1 ≤ k ≤ |w | − p(period(w) = |w | if such a p does not exist).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12

Page 34: Csr2011 june16 17_00_lohrey

Some combinatorics on words

The period of the word w ∈ Σ∗ is the smallest number p such thatw [k + p] = w [k] for all 1 ≤ k ≤ |w | − p(period(w) = |w | if such a p does not exist).

Let order(w) = ⌊ |w |period(w)⌋.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12

Page 35: Csr2011 june16 17_00_lohrey

Some combinatorics on words

The period of the word w ∈ Σ∗ is the smallest number p such thatw [k + p] = w [k] for all 1 ≤ k ≤ |w | − p(period(w) = |w | if such a p does not exist).

Let order(w) = ⌊ |w |period(w)⌋.

Example: Let w = abbabbab = (abb)2ab.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12

Page 36: Csr2011 june16 17_00_lohrey

Some combinatorics on words

The period of the word w ∈ Σ∗ is the smallest number p such thatw [k + p] = w [k] for all 1 ≤ k ≤ |w | − p(period(w) = |w | if such a p does not exist).

Let order(w) = ⌊ |w |period(w)⌋.

Example: Let w = abbabbab = (abb)2ab.

Then period(w) = 3 and order(w) = 2.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12

Page 37: Csr2011 june16 17_00_lohrey

Some combinatorics on words

The period of the word w ∈ Σ∗ is the smallest number p such thatw [k + p] = w [k] for all 1 ≤ k ≤ |w | − p(period(w) = |w | if such a p does not exist).

Let order(w) = ⌊ |w |period(w)⌋.

Example: Let w = abbabbab = (abb)2ab.

Then period(w) = 3 and order(w) = 2.

For a compressed automaton A, let:

order(A) = max{order(val(A)) | A occurs as a label in A}

period(A) = max{period(val(A)) | A occurs as a label in A}

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12

Page 38: Csr2011 june16 17_00_lohrey

First main result

Theorem 1

For a compressed automaton A and an SLP B, we can checkval(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 10 / 12

Page 39: Csr2011 june16 17_00_lohrey

First main result

Theorem 1

For a compressed automaton A and an SLP B, we can checkval(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).

We use the following combinatorial fact:

Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at mostorder(u) many occurrences of u in v that “touch” position p.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 10 / 12

Page 40: Csr2011 june16 17_00_lohrey

Second main result

Theorem 2

For a compressed automaton A and an SLP B, we can checkval(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,and (iii) period(A).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12

Page 41: Csr2011 june16 17_00_lohrey

Second main result

Theorem 2

For a compressed automaton A and an SLP B, we can checkval(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,and (iii) period(A).

Generalizes the result of Plandowski & Rytter for a unary alphabet(NP-completeness of compressed membership for compressed automata).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12

Page 42: Csr2011 june16 17_00_lohrey

Second main result

Theorem 2

For a compressed automaton A and an SLP B, we can checkval(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,and (iii) period(A).

Generalizes the result of Plandowski & Rytter for a unary alphabet(NP-completeness of compressed membership for compressed automata).

A word w = an has period 1!

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12

Page 43: Csr2011 june16 17_00_lohrey

Second main result

Theorem 2

For a compressed automaton A and an SLP B, we can checkval(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|,and (iii) period(A).

Generalizes the result of Plandowski & Rytter for a unary alphabet(NP-completeness of compressed membership for compressed automata).

A word w = an has period 1!

The theorem is proven by a reduction to the case of a unary alphabet.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12

Page 44: Csr2011 june16 17_00_lohrey

Open problems and a conjecture

Conjecture 1

Compressed membership for compressed automata is NP-complete.

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12

Page 45: Csr2011 june16 17_00_lohrey

Open problems and a conjecture

Conjecture 1

Compressed membership for compressed automata is NP-complete.

Conjecture 2

If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then thereexists an accepting run of A on val(B) (viewed as a word over the set oftransition triples of A), which can be generated by an SLP of sizepoly(|B|, |A|).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12

Page 46: Csr2011 june16 17_00_lohrey

Open problems and a conjecture

Conjecture 1

Compressed membership for compressed automata is NP-complete.

Conjecture 2

If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then thereexists an accepting run of A on val(B) (viewed as a word over the set oftransition triples of A), which can be generated by an SLP of sizepoly(|B|, |A|).

Conjecture 2 implies Conjecture 1:

◮ Guess nondeterministically an SLP C of polynomial size.

◮ Check (in deterministic polynomial time), whether val(C) is anaccepting run of A on val(B).

Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12