Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs

DEGefWb5wiGH2XgYMEzUKajEmtCDUsRQ4d1A nice paper relevant to this course is titled “The Glory of the Past”2NICTA Researcher,

Adjunct at the Australian National University and Griffith University

ε-Closure, Kleene’s Theorem,and Kleene’s Algebra

Charles Gretton2

24 February 2014

NICTA Funding and Supporting Members and Partners

Kleene’s Theorem Copyright NICTA 2014 Charles Gretton2 1/30

First, a Quick Summary of ε-Closure

• I’ve previously waived my hands about ε-labelled arcs.

• Recall, ε is our notation for the empty string.

• How can we formally treat ε-labelled arcs in an NFA?

• Using a function called ECLOSE...

• A construction similar to that for NFA←→ DFA is used to proveε-NFA←→ DFA


Quick Summary of ε-Closure

• ε-NFA is 〈Q,Σ, δ, q0,F 〉.

• ECLOSE : Q → 2Q – i.e. gives an element in the powerset of states.

• It is defined inductively, as follows.

• BASIS: q ∈ ECLOSE(q) – i.e. the argument is in the basis.

• INDUCTION: Suppose p ∈ ECLOSE(q) and r ∈ δ(p, ε), thenr ∈ ECLOSE(q).


Quick Example of ε-Closure





???


Quick Example of ε-Closure





ECLOSE(1) = {1, 2, 3, 4, 6}


ε-Closure Subset Construction

• For DFA −→ ε-NFA, just add δ(q, ε) = ∅ transitions to all states –i.e. ε is not an alphabet symbol.

• ε-NFA −→ DFA

• Same as for NFA −→ DFA,

• Each DFA state corresponds to a unique set of ε-NFA states.

• Only take the ε-closure at ε-NFA transitions.


Summarizing

• NFA←→ DFA

• The equivalent NFA can be exponentially smaller than thecorresponding DFA.

• ε-NFA←→ DFA

• From transitivity: ε-NFA←→ NFA

• All formalisms so far are expressively equivalent.


Languages and Regular Expressions

• Regular expression implementations in many UNIX tools: grep,sed, awk, emacs and vi

• Following Kleene’s seminal work, we deal with regularity formally.

• We describe the language intended by a given regular expression.

• We consider the class of languages that can be expressed.

• We examine algebraic laws which inform re-writing rules forsimplifying expressions.


Operations on Languages

• L1,L2 ⊆ Σ∗

• product of languages

L1.L2 = L1L2 = {WX |W ∈ L1,X ∈ L2}

• inductive i th product

L0 = {ε} ∀i > 0, Li = LLi−1

L∗ =⋃

i≥0 Li

• positive closure

L+ = LL∗


Regular Expressions

• A regular expression is a well formed word over a given alphabet Σ

• We use the following symbols:

Σ ∪ {ε ∅ ( ) + . ∗}

• alphabet• empty string• empty set• open parenthesis• close parenthesis• “union” or “choice”• concatenation – X.Y often written as XY• Kleene closure – the “star” of this show


Regular Expressions

• A regular expression is a well formed word over a given alphabet Σ


Σ ∪ {ε ∅ ( ) + . ∗}

• “symbol” :: alphabet• constant (or 0-ary operator) :: empty string• constant (or 0-ary operator) :: empty set• punctuation :: open parenthesis• punctuation :: close parenthesis• infix binary operator :: “union” or “choice”• infix binary operator :: concatenation – X.Y often written as XY• postfix unary operator :: Kleene closure – the “star” of this show


Regular Expressions


Σ ∪ {ε ∅ ( ) + . ∗}

• Taking a ∈ Σ ∪ {ε, ∅}, regular expressions satisfy the followinggrammar:

regxp ::= a|(regxp)|regxp + regxp|regxp.regxp|regxp∗

• The Kleene closure operator, *, has the highest precedence.• Concatenation, . , is usually represented by juxtaposition – e.g.

concatenation a.b is written ab• Concatenation has the next highest precedence.• Union/choice has the least precedence.


Regular Expression Examples



• So, which of the following is correct?

ab∗ + a = (a(b∗ + a))ab∗ + a = (ab)∗ + aab∗ + a = (a(b∗)) + a


Regular Expression Examples



• So, which of the following is correct?

ab∗ + a = (a(b∗ + a))ab∗ + a = (ab)∗ + aab∗ + a = (a(b∗)) + a


Semantics of Regular Expressions

• BASISLanguage of Expression SemanticsL(ε) = {ε}L(∅) = ∅L(a), a ∈ Σ = {a}

• INDUCTION

• Below, L1 and L2 are well formed regular expressions.

Language of Expression SemanticsL(L1 + L2) = L(L1) ∪ L(L2)L(L1L2) = L(L1)L(L2)L(L∗1) = L(L1)∗

L((L1)) = L(L1)


Every Finite Language is Regular

• A regular language is a language that can be given by a regularexpression.

• A finite language is a finite set of strings:

{X1,X2, ..,Xn}

• The regular expression that gives that finite language is:

X1 + X2 + ..+ Xn

• QED


Algebra and Regular Expressions

• Commutativity laws:

L1 + L2 = L2 + L1



• Cancellation laws:• Identities

L + ∅ = L∅+ L = LεL = LLε = L

• Annihilators

L∅ = ∅∅L = ∅



• Associative laws:

L1 + (L2 + L3) = (L1 + L2) + L3

L1(L2L3) = (L1L2)L3

• Power associativity

L1(L1L1) = (L1L1)L1

... ... ...(L1L1)(L1L1) = (L1(L1L1))L1



• Left and right distributive laws:

L1(L2 + L3) = L1L2 + L1L3

(L2 + L3)L1 = L2L1 + L3L1

• Idempotence:

L + L = L

• Closure laws – There are exactly two languages whose closures arefinite. What are they?



• Left and right distributive laws:

L1(L2 + L3) = L1L2 + L1L3

(L2 + L3)L1 = L2L1 + L3L1

• Idempotence:

L + L = L

• Closure laws

ε∗ = ε∅∗ = ε(L∗)∗ = L∗


Application of Algebraic Laws

• a + ab∗

• precedence: (a + (a(b∗)))

• annihilators: (aε+ (a(b∗)))

• left distributive law: a(ε+ b∗)

• cancellation law: a(b∗)

• CONCLUSION: ab∗ = a + ab∗


Relating Concrete and Lifted Expressions

• When I write (L∗)∗ = L∗, I intended that L could be any regularexpression.

• For example, the following concrete expressions are instances ofthis identity.

1 ((a)∗)∗ = a∗

2 ((ε(a + bab + ∅) + bab)∗)∗ = (ε(a + bab + ∅) + bab)∗

• The above are called concrete regular expressions, because theycontain no variables.

• Our textbook talks about a specific class of concrete expression(above, Type 1), where each variable symbol from the liftedexpression is replaced with an alphabet constant.


Theorem Relating Concrete and LiftedExpressions

• Let E be a regular expression with variable symbols Li , i ∈ [1..m].

• For example taking m = 2, E could be (L1 + L2)∗, (L1L∗2)∗, etc

• Let E ′ be an expression where each variable Li is replaced by aconstant ai .

• Every X ∈ L(E) can be written as X1X2..Xk , where each substringXj is in one of the languages L(Li).

• Writing L(Xj) = i if Xj is from the i th language,

• aL(X1)aL(X2)..aL(Xk )∈ L(E ′)



• BASIS: E ∈ {ε, ∅, L1}.

• The cases E ∈ {ε, ∅} are trivial, because the concrete and liftedexpressions are identical.

• Taking E = L1, then L(E) = L(L1). Also E ′ = a1 andL(E ′) = L(a1) = {a1}.

• Above, the L1/a1 correspondence is clearly satisfied.



• INDUCTION: E ∈ {E1 + E2,E1E2,E∗1}.

• E ′1 and E ′2 are constructed so that if a variable appears in bothsubexpressions, then that is substituted with the same concretesymbol.

• For example, take E = L1L2 + L2L1.

• Then a valid concrete expression is: E = a1a2 + a2a1.

• An invalid concrete expression would be: E = a1a2 + a3a4.




• E ′1 and E ′2 are constructed so that if a variable appears in bothsubexpressions, then that is substituted with the same concretesymbol.

• L(E ′) = L(E ′1 + E ′2) = L(E ′1) ∪ L(E ′2)

• L(E) = L(E1 + E2) = L(E1) ∪ L(E2)

• Our inductive hypothesis gives us the correspondence for strings inL(E1)/L(E ′1).

• Also for strings in L(E2)/L(E ′2).

• Following the definition of union, we have proved this step case.



• INDUCTION: E ∈ {E1 + E2,E1E2,E∗1}.• E ′1 and E ′2 are constructed so that if a variable appears in both

subexpressions, then that is substituted with the same concretesymbol.

• L(E ′) = L(E ′1.E′2) = L(E ′1).L(E ′2)

• L(E) = L(E1.E2) = L(E1).L(E2)


• Also for strings in L(E2)/L(E ′2).

• Following the definition of concatenation, we have proved this stepcase.




• L((E ′1)∗) = L(ε) ∪ L(E ′1)+

• L(E∗1 ) = L(ε) ∪ L(E1)+


• Because closure yields zero or more occurrences of the languagestrings, we also have a correspondence between L(E1)∗/L(E ′1)∗.

• QED



• What if we added to the step cases:E ∈ {E1 ∩ E2,E1 + E2,E1E2,E∗1}?

• Let’s propose a law L1 ∩ L2 ∩ L3 = L1 ∩ L2

• The law holds for a concretization: L(a ∩ b ∩ c) = L(a ∩ b) = ∅

• But fails generally: L(a ∩ a ∩ ∅) 6= L(a ∩ a)

• So, the theorem we just proved is rather specific to regularexpressions.


Documents

Closure, Kleene’s Theorem, and Kleene’s Algebrausers.cecs.anu.edu.au/~charlesg/COMP3630_Lecture3.pdf · Regular expression implementations in many UNIX tools: grep, sed,awk emacs