30
DEGefWb5wiGH2XgYMEzUKajEmtCDUsRQ4d 1 A nice paper relevant to this course is titled “The Glory of the Past” 2 NICTA Researcher, Adjunct at the Australian National University and Griffith University -Closure, Kleene’s Theorem, and Kleene’s Algebra Charles Gretton 2 24 February 2014 NICTA Funding and Supporting Members and Partners Kleene’s Theorem Copyright NICTA 2014 Charles Gretton 2 1/30

Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

DEGefWb5wiGH2XgYMEzUKajEmtCDUsRQ4d1A nice paper relevant to this course is titled “The Glory of the Past”2NICTA Researcher,

Adjunct at the Australian National University and Griffith University

ε-Closure, Kleene’s Theorem,and Kleene’s Algebra

Charles Gretton2

24 February 2014

NICTA Funding and Supporting Members and Partners

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 1/30

Page 2: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

First, a Quick Summary of ε-Closure

• I’ve previously waived my hands about ε-labelled arcs.

• Recall, ε is our notation for the empty string.

• How can we formally treat ε-labelled arcs in an NFA?

• Using a function called ECLOSE...

• A construction similar to that for NFA←→ DFA is used to proveε-NFA←→ DFA

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 2/30

Page 3: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Quick Summary of ε-Closure

• ε-NFA is 〈Q,Σ, δ, q0,F 〉.

• ECLOSE : Q → 2Q – i.e. gives an element in the powerset of states.

• It is defined inductively, as follows.

• BASIS: q ∈ ECLOSE(q) – i.e. the argument is in the basis.

• INDUCTION: Suppose p ∈ ECLOSE(q) and r ∈ δ(p, ε), thenr ∈ ECLOSE(q).

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 3/30

Page 4: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Quick Example of ε-Closure

• ECLOSE : Q → 2Q – i.e. gives an element in the powerset of states.

• It is defined inductively, as follows.

• BASIS: q ∈ ECLOSE(q) – i.e. the argument is in the basis.

• INDUCTION: Suppose p ∈ ECLOSE(q) and r ∈ δ(p, ε), thenr ∈ ECLOSE(q).

???

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 4/30

Page 5: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Quick Example of ε-Closure

• ECLOSE : Q → 2Q – i.e. gives an element in the powerset of states.

• It is defined inductively, as follows.

• BASIS: q ∈ ECLOSE(q) – i.e. the argument is in the basis.

• INDUCTION: Suppose p ∈ ECLOSE(q) and r ∈ δ(p, ε), thenr ∈ ECLOSE(q).

ECLOSE(1) = {1, 2, 3, 4, 6}

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 5/30

Page 6: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

ε-Closure Subset Construction

• For DFA −→ ε-NFA, just add δ(q, ε) = ∅ transitions to all states –i.e. ε is not an alphabet symbol.

• ε-NFA −→ DFA

• Same as for NFA −→ DFA,

• Each DFA state corresponds to a unique set of ε-NFA states.

• Only take the ε-closure at ε-NFA transitions.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 6/30

Page 7: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Summarizing

• NFA←→ DFA

• The equivalent NFA can be exponentially smaller than thecorresponding DFA.

• ε-NFA←→ DFA

• From transitivity: ε-NFA←→ NFA

• All formalisms so far are expressively equivalent.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 7/30

Page 8: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Languages and Regular Expressions

• Regular expression implementations in many UNIX tools: grep,sed, awk, emacs and vi

• Following Kleene’s seminal work, we deal with regularity formally.

• We describe the language intended by a given regular expression.

• We consider the class of languages that can be expressed.

• We examine algebraic laws which inform re-writing rules forsimplifying expressions.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 8/30

Page 9: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Operations on Languages

• L1,L2 ⊆ Σ∗

• product of languages

L1.L2 = L1L2 = {WX |W ∈ L1,X ∈ L2}

• inductive i th product

L0 = {ε} ∀i > 0, Li = LLi−1

L∗ =⋃

i≥0 Li

• positive closure

L+ = LL∗

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 9/30

Page 10: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Regular Expressions

• A regular expression is a well formed word over a given alphabet Σ

• We use the following symbols:

Σ ∪ {ε ∅ ( ) + . ∗}

• alphabet• empty string• empty set• open parenthesis• close parenthesis• “union” or “choice”• concatenation – X.Y often written as XY• Kleene closure – the “star” of this show

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 10/30

Page 11: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Regular Expressions

• A regular expression is a well formed word over a given alphabet Σ

• We use the following symbols:

Σ ∪ {ε ∅ ( ) + . ∗}

• “symbol” :: alphabet• constant (or 0-ary operator) :: empty string• constant (or 0-ary operator) :: empty set• punctuation :: open parenthesis• punctuation :: close parenthesis• infix binary operator :: “union” or “choice”• infix binary operator :: concatenation – X.Y often written as XY• postfix unary operator :: Kleene closure – the “star” of this show

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 11/30

Page 12: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Regular Expressions

• We use the following symbols:

Σ ∪ {ε ∅ ( ) + . ∗}

• Taking a ∈ Σ ∪ {ε, ∅}, regular expressions satisfy the followinggrammar:

regxp ::= a|(regxp)|regxp + regxp|regxp.regxp|regxp∗

• The Kleene closure operator, *, has the highest precedence.• Concatenation, . , is usually represented by juxtaposition – e.g.

concatenation a.b is written ab• Concatenation has the next highest precedence.• Union/choice has the least precedence.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 12/30

Page 13: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Regular Expression Examples

• Taking a ∈ Σ ∪ {ε, ∅}, regular expressions satisfy the followinggrammar:

regxp ::= a|(regxp)|regxp + regxp|regxp.regxp|regxp∗

• So, which of the following is correct?

ab∗ + a = (a(b∗ + a))ab∗ + a = (ab)∗ + aab∗ + a = (a(b∗)) + a

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 13/30

Page 14: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Regular Expression Examples

• Taking a ∈ Σ ∪ {ε, ∅}, regular expressions satisfy the followinggrammar:

regxp ::= a|(regxp)|regxp + regxp|regxp.regxp|regxp∗

• So, which of the following is correct?

ab∗ + a = (a(b∗ + a))ab∗ + a = (ab)∗ + aab∗ + a = (a(b∗)) + a

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 14/30

Page 15: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Semantics of Regular Expressions

• BASISLanguage of Expression SemanticsL(ε) = {ε}L(∅) = ∅L(a), a ∈ Σ = {a}

• INDUCTION

• Below, L1 and L2 are well formed regular expressions.

Language of Expression SemanticsL(L1 + L2) = L(L1) ∪ L(L2)L(L1L2) = L(L1)L(L2)L(L∗1) = L(L1)∗

L((L1)) = L(L1)

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 15/30

Page 16: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Every Finite Language is Regular

• A regular language is a language that can be given by a regularexpression.

• A finite language is a finite set of strings:

{X1,X2, ..,Xn}

• The regular expression that gives that finite language is:

X1 + X2 + ..+ Xn

• QED

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 16/30

Page 17: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Algebra and Regular Expressions

• Commutativity laws:

L1 + L2 = L2 + L1

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 17/30

Page 18: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Algebra and Regular Expressions

• Cancellation laws:• Identities

L + ∅ = L∅+ L = LεL = LLε = L

• Annihilators

L∅ = ∅∅L = ∅

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 18/30

Page 19: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Algebra and Regular Expressions

• Associative laws:

L1 + (L2 + L3) = (L1 + L2) + L3

L1(L2L3) = (L1L2)L3

• Power associativity

L1(L1L1) = (L1L1)L1

... ... ...(L1L1)(L1L1) = (L1(L1L1))L1

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 19/30

Page 20: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Algebra and Regular Expressions

• Left and right distributive laws:

L1(L2 + L3) = L1L2 + L1L3

(L2 + L3)L1 = L2L1 + L3L1

• Idempotence:

L + L = L

• Closure laws – There are exactly two languages whose closures arefinite. What are they?

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 20/30

Page 21: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Algebra and Regular Expressions

• Left and right distributive laws:

L1(L2 + L3) = L1L2 + L1L3

(L2 + L3)L1 = L2L1 + L3L1

• Idempotence:

L + L = L

• Closure laws

ε∗ = ε∅∗ = ε(L∗)∗ = L∗

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 21/30

Page 22: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Application of Algebraic Laws

• a + ab∗

• precedence: (a + (a(b∗)))

• annihilators: (aε+ (a(b∗)))

• left distributive law: a(ε+ b∗)

• cancellation law: a(b∗)

• CONCLUSION: ab∗ = a + ab∗

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 22/30

Page 23: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Relating Concrete and Lifted Expressions

• When I write (L∗)∗ = L∗, I intended that L could be any regularexpression.

• For example, the following concrete expressions are instances ofthis identity.

1 ((a)∗)∗ = a∗

2 ((ε(a + bab + ∅) + bab)∗)∗ = (ε(a + bab + ∅) + bab)∗

• The above are called concrete regular expressions, because theycontain no variables.

• Our textbook talks about a specific class of concrete expression(above, Type 1), where each variable symbol from the liftedexpression is replaced with an alphabet constant.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 23/30

Page 24: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• Let E be a regular expression with variable symbols Li , i ∈ [1..m].

• For example taking m = 2, E could be (L1 + L2)∗, (L1L∗2)∗, etc

• Let E ′ be an expression where each variable Li is replaced by aconstant ai .

• Every X ∈ L(E) can be written as X1X2..Xk , where each substringXj is in one of the languages L(Li).

• Writing L(Xj) = i if Xj is from the i th language,

• aL(X1)aL(X2)..aL(Xk )∈ L(E ′)

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 24/30

Page 25: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• BASIS: E ∈ {ε, ∅, L1}.

• The cases E ∈ {ε, ∅} are trivial, because the concrete and liftedexpressions are identical.

• Taking E = L1, then L(E) = L(L1). Also E ′ = a1 andL(E ′) = L(a1) = {a1}.

• Above, the L1/a1 correspondence is clearly satisfied.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 25/30

Page 26: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• INDUCTION: E ∈ {E1 + E2,E1E2,E∗1}.

• E ′1 and E ′2 are constructed so that if a variable appears in bothsubexpressions, then that is substituted with the same concretesymbol.

• For example, take E = L1L2 + L2L1.

• Then a valid concrete expression is: E = a1a2 + a2a1.

• An invalid concrete expression would be: E = a1a2 + a3a4.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 26/30

Page 27: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• INDUCTION: E ∈ {E1 + E2,E1E2,E∗1}.

• E ′1 and E ′2 are constructed so that if a variable appears in bothsubexpressions, then that is substituted with the same concretesymbol.

• L(E ′) = L(E ′1 + E ′2) = L(E ′1) ∪ L(E ′2)

• L(E) = L(E1 + E2) = L(E1) ∪ L(E2)

• Our inductive hypothesis gives us the correspondence for strings inL(E1)/L(E ′1).

• Also for strings in L(E2)/L(E ′2).

• Following the definition of union, we have proved this step case.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 27/30

Page 28: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• INDUCTION: E ∈ {E1 + E2,E1E2,E∗1}.• E ′1 and E ′2 are constructed so that if a variable appears in both

subexpressions, then that is substituted with the same concretesymbol.

• L(E ′) = L(E ′1.E′2) = L(E ′1).L(E ′2)

• L(E) = L(E1.E2) = L(E1).L(E2)

• Our inductive hypothesis gives us the correspondence for strings inL(E1)/L(E ′1).

• Also for strings in L(E2)/L(E ′2).

• Following the definition of concatenation, we have proved this stepcase.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 28/30

Page 29: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• INDUCTION: E ∈ {E1 + E2,E1E2,E∗1}.

• L((E ′1)∗) = L(ε) ∪ L(E ′1)+

• L(E∗1 ) = L(ε) ∪ L(E1)+

• Our inductive hypothesis gives us the correspondence for strings inL(E1)/L(E ′1).

• Because closure yields zero or more occurrences of the languagestrings, we also have a correspondence between L(E1)∗/L(E ′1)∗.

• QED

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 29/30

Page 30: Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

Theorem Relating Concrete and LiftedExpressions

• What if we added to the step cases:E ∈ {E1 ∩ E2,E1 + E2,E1E2,E∗1}?

• Let’s propose a law L1 ∩ L2 ∩ L3 = L1 ∩ L2

• The law holds for a concretization: L(a ∩ b ∩ c) = L(a ∩ b) = ∅

• But fails generally: L(a ∩ a ∩ ∅) 6= L(a ∩ a)

• So, the theorem we just proved is rather specific to regularexpressions.

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 30/30