29
Computational Linguistics II: Parsing Formal Languages: Regular Languages II Frank Richter & Jan-Philipp S ¨ ohn [email protected], [email protected] Computational Linguistics II: Parsing – p.1

Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Computational Linguistics II:Parsing

Formal Languages: Regular Languages II

Frank Richter & Jan-Philipp Sohn

[email protected], [email protected]

Computational Linguistics II: Parsing – p.1

Page 2: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Reminder: The Big Picturehierarchy grammar machine othertype 3 reg. grammar DFA reg. expressions

NFAdet. cf. LR(k) grammar DPDAtype 2 CFG PDAtype 1 CSG LBAtype 0 unrestricted Turing

grammar machine

DFA: Deterministic finite state automaton

(D)PDA: (Deterministic) Pushdown automaton

CFG: Context-free grammar

CSG: Context-sensitive grammarLBA: Linear bounded automaton

Computational Linguistics II: Parsing – p.2

Page 3: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Form of Grammars of Type 0–3For i ∈ {0, 1, 2, 3}, a grammar 〈N,T, P, S〉 of Type i, with N

the set of non-terminal symbols, T the set of terminalsymbols (N and T disjoint, Σ = N ∪ T ), P the set ofproductions, and S the start symbol (S ∈ N ), obeys thefollowing restrictions:

T3: Every production in P is of the form A → aB or A → ǫ,with B,A ∈ N , a ∈ T .

T2: Every production in P is of the form A → x, with A ∈ N

and x ∈ Σ∗.

T1: Every production in P is of the form x1Ax2 → x1yx2, withx1, x2 ∈ Σ∗, y ∈ Σ+, A ∈ N and the possible exception ofC → ǫ in case C does not occur on the righthand side ofa rule in P .

T0: No restrictions.Computational Linguistics II: Parsing – p.3

Page 4: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Regular Languages

Regular grammars,

Computational Linguistics II: Parsing – p.4

Page 5: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Regular Languages

Regular grammars,

deterministic finite state automata,

Computational Linguistics II: Parsing – p.4

Page 6: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Regular Languages

Regular grammars,

deterministic finite state automata,

nondeterministic finite state automata, and

Computational Linguistics II: Parsing – p.4

Page 7: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Regular Languages

Regular grammars,

deterministic finite state automata,

nondeterministic finite state automata, and

regular expressions

Computational Linguistics II: Parsing – p.4

Page 8: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Regular Languages

Regular grammars,

deterministic finite state automata,

nondeterministic finite state automata, and

regular expressions

characterize the same class of languages, viz. Type 3languages.

Computational Linguistics II: Parsing – p.4

Page 9: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Reminder: DFA

Definition 1 (DFA) A deterministic FSA (DFA) is aquintuple (Σ, Q, i, F, δ) where

Σ is a finite set called the alphabet,

Q is a finite set of states,

i ∈ Q is the initial state,

F ⊆ Q the set of final states, and

δ is the transition function from Q × Σ to Q.

Computational Linguistics II: Parsing – p.5

Page 10: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Reminder: Acceptance

Definition 3 (Acceptance)

Given a DFA M = (Σ, Q, i, F, δ), the language L(M)accepted by M is

L(M) = {x ∈ Σ∗|δ(i, x) ∈ F}.

Computational Linguistics II: Parsing – p.6

Page 11: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Nondeterministic Finite-state Automata

Definition 4 (NFA) A nondeterministic finite-stateautomaton is a quintuple (Σ, Q, S, F, δ) where

Σ is a finite set called the alphabet,

Q is a finite set of states,

S ⊆ Q is the set of initial states,

F ⊆ Q the set of final states, and

δ is the transition function from Q × Σ to Pow(Q).

Computational Linguistics II: Parsing – p.7

Page 12: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Theorem (Rabin/Scott)

For every language accepted by an NFA there is a DFAwhich accepts the same language.

Computational Linguistics II: Parsing – p.8

Page 13: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Regular Expressions

Given an alphabet Σ of symbols the following are all andonly the regular expressions over the alphabetΣ ∪ {Ø, 0, |, ∗, [, ]}:

Ø empty set

0 the empty string (ǫ, [])

σ for all σ ∈ Σ

[α | β] union (for α, β reg.ex.) (α ∪ β, α + β)

[α β] concatenation (for α, β reg.ex.)

[α*] Kleene star (for α reg.ex.)

Computational Linguistics II: Parsing – p.9

Page 14: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Meaning of Regular Expressions

L(Ø) = ∅ the empty language

L(0) = {0} the empty-string language

L(σ) = {σ}

L([α | β]) = L(α) ∪ L(β)

L([α β]) = L(α) ◦ L(β)

L([α∗]) = (L(α))*

Σ∗ is called the universal language. Note that the universallanguage is given relative to a particular alphabet.

Computational Linguistics II: Parsing – p.10

Page 15: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Theorem (Kleene)

The set of languages which can be described by regularexpressions is the set of regular languages.

Computational Linguistics II: Parsing – p.11

Page 16: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Pumping Lemma for Regular Languages

uvw theorem:

For each regular language L there is an integer n such thatfor each x ∈ L with |x| ≥ n there are u, v, w with x = uvw

such that

1. |v| ≥ 1,

2. |uv| ≤ n,

3. for all i ∈ IN0: uviw ∈ L.

Computational Linguistics II: Parsing – p.12

Page 17: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

A Non-regular Language

CorollaryLet Σ be {a,b}.L = {anbn | n ∈ IN} is not regular.

ProofAssume k ∈ IN. For each akbk = uvw with v 6= ǫ

1. v = al, 0< l ≤ k, or

2. v = al1bl2, 0< l1, l2 ≤ k, or

3. v = bl, 0< l ≤ k, or

In each case we have uv2w 6∈ L. The result follows with thePumping Lemma.

Computational Linguistics II: Parsing – p.13

Page 18: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Natural and Regular Languages

Corollary German is not a regular language.

Proof ConsiderL1={Ein Spion (der einen Spion)k observiertl wird meist

selbst observiert}L1 is regular.

L1 ∩ Deutsch ={Ein Spion (der einen Spion)k observiertk wird meist selbstobserviert}

is not regular.

Computational Linguistics II: Parsing – p.14

Page 19: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Theorem (Myhill/Nerode)

The following three statements are equivalent:

1. The set L ⊆ Σ∗ is accepted by some DFA.

2. L is the union of some of the equivalence classes of aright invariant equivalence relation of finite index.

3. Let equivalence relation RL be defined by: xRLy iff forall z ∈ Σ∗, xz ∈ L iff yz ∈ L. Then RL is of finite index.

Computational Linguistics II: Parsing – p.15

Page 20: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Minimization

For every nondeterministic finite-state automaton thereexists an equivalent deterministic automaton with a minimalnumber of states.

Computational Linguistics II: Parsing – p.16

Page 21: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Closure Properties of Regular Languages

Regular languages are closed under

union

intersection

complement

product

Kleene star

Computational Linguistics II: Parsing – p.17

Page 22: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Closure Properties of Regular Languages

Regular languages are closed under

union (regular expression)

intersection

complement

product (regular expression)

Kleene star (regular expression)

Computational Linguistics II: Parsing – p.17

Page 23: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Closure Properties of Regular Languages

Regular languages are closed under

union (regular expression)

intersection (e.g. constructive)

complement

product (regular expression)

Kleene star (regular expression)

Computational Linguistics II: Parsing – p.17

Page 24: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Closure Properties of Regular Languages

Regular languages are closed under

union (regular expression)

intersection (e.g. constructive)

complement (DFA)

product (regular expression)

Kleene star (regular expression)

Computational Linguistics II: Parsing – p.17

Page 25: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Decidable Problems for Reg. Languages

1. Word problem

Computational Linguistics II: Parsing – p.18

Page 26: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Decidable Problems for Reg. Languages

1. Word problem

2. Emptiness

Computational Linguistics II: Parsing – p.18

Page 27: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Decidable Problems for Reg. Languages

1. Word problem

2. Emptiness

3. Finiteness

Computational Linguistics II: Parsing – p.18

Page 28: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Decidable Problems for Reg. Languages

1. Word problem

2. Emptiness

3. Finiteness

4. Intersection

Computational Linguistics II: Parsing – p.18

Page 29: Computational Linguistics II: Parsing · Computational Linguistics II: Parsing – p.2. Form of Grammars of Type 0–3 For i ∈ {0,1,2,3}, a grammar hN,T,P,Si of Type i, with N the

Decidable Problems for Reg. Languages

1. Word problem

2. Emptiness

3. Finiteness

4. Intersection

5. Equivalence

Computational Linguistics II: Parsing – p.18