39
Lecture 09: Theory of Automata:08 Context Free Grammars

38142_Lec 09-b Context Free Grammar

Embed Size (px)

DESCRIPTION

automata

Citation preview

Lecture 09: Theory of Automata:08

Context Free Grammars

Lecture 09: Theory of Automata:08

2

The Total Language Tree

• It is possible to depict the generation of all the words in the language of a CFG simultaneously in one big (possibly infinite) tree.

Definition:• For a given CFG, we define a tree with the start symbol

S as its root and whose nodes are working strings of terminals and nonterminals. The descendants of each node are all the possible results of applying every applicable production to the working string, one at a time. A string of all terminals is a terminal node in the tree. The resultant tree is called the total language tree of the CFG.

Lecture 09: Theory of Automata:08

3

Example

• Consider the CFG:

S → aa | bX |aXX

X → ab |b• The total language tree is

Lecture 09: Theory of Automata:08

4

• The above total language has only 7 different words.

• Four of its words (abb, aabb, abab, aabab) have two different derivations because they appear as terminal nodes in two different places.

• However, these words are NOT generated by two different derivation trees. Hence, the CFG is unambiguous. For example,

Lecture 09: Theory of Automata:08

5

Example

• Consider the CFG:

S → aSb | bS | a• The language of this CFG is infinite, so is the

total language tree: The tree may get arbitrary wide as well as infinitely long.

• Can you draw the beginning part of this total language tree?

Lecture 09: Theory of Automata:08

6

Semi Word

• For a given CFG, semiword is a string of terminals (may be none) concatenated with with exactly one non-terminal (on the right).

• In general semiword has the shape

(terminal) (terminal)….(terminal) (Non-Terminal)

e.g. aaaX abcY bbY

A word is a string of terminals only (zero or more terminals)

Lecture 09: Theory of Automata:08

7

Regular Grammar

Given an FA, there is a CFG that generates exactly the language accepted by the FA.

– In other words, all regular languages are CFLs

CFL

Regular

Lecture 09: Theory of Automata:08

8

Creating a CFG from an FA

Step-1 The Non-terminals in CFG will be all names of the states in the FA with the start state renamed S.

Step-2 For every edge

Create productions XaY or XaX

Do the same for b-edges

Step-3 For every final-state X, create the production

X ya

x

a

Lecture 09: Theory of Automata:08

9

Example

S aMS bSM aFM bSF aFF bFF Λ

S- M

a

b

b

x

a,b

a

Note: It is not necessary that each CFG has a corresponding FA. But each FA has an equivalent CFG.

Lecture 09: Theory of Automata:08

10

Regular Grammar

Theorem 22:

If all the productions in a given CFG fit one of the two forms: Non-terminal semiword

or Non-terminal word

(Where the word may be a Λ or string of terminal), then the language generated by the CFG is Regular.

Proof:

For a CFG to be regular is by constructing a TG from the given CFG.

Lecture 09: Theory of Automata:08

11

Proof contd.• Let us consider a general CFG in this form

N1 w1N2 N7 w10

N1 w2N3 N18 w23

N2 w3N4 ---------------

--------------- ---------------

Where N’s are non-terminal and w’s are the string of terminal and part wyNz are semiwords.

Let N1=S. Draw a small circle for each N and one extra circle labelled +, the circle for S we label (-)

-SN3

N2

Nx

N13

+…… ……

Lecture 09: Theory of Automata:08

12

Proof contd.

• For each production of the form Nx wyNz, draw a directed edge from state Nx to Nz with label wy.

• If Nx = Nz, the path is a loop• For every production of the form Np wq, draw a directed edge

from Np to + and label it with wq even if wq = Λ.

• Any path in TG form – to + corresponds to a word in the language of TG (by concatenating symbols) and simultaneously corresponds to sequence of productions on the CFG generating words.

• Conversely every production of the word in the CFG: S wN wwN wwwN ….. wwwwwCorresponds to a path in this TG.

Nx Nzwy

Np +wq

Lecture 09: Theory of Automata:08

13

Example

• Consider the CFG S aaS | bbS | Λ

• The regular expression is given by (aa + bb)*.• Consider the CFG

SaaS | bbS | abX | baX | Λ

X aaX | bbX | abS | baS

-

aa

+bb Λ

-

aa,bb

XΛ+

ab, ba

ab, ba aa,bb

• Language accepted?

• EVEN-EVEN

Lecture 09: Theory of Automata:08

14

Killing Λ-Productions

Λ-Productions:

In a given CFG, we call a non-terminal N null able

if there is a production N Λ, or there is a

derivation that starts at N and lead to a Λ.

N ……… Λ• Λ-Productions are undesirable.• We can replace Λ-production with appropriate

non-Λ productions.

Lecture 09: Theory of Automata:08

15

Theorem 23

• If L is CFL generated by a CFG having Λ-productions, then there is a different CFG that has no Λ-production and still generates either the whole language L (if L does not include Λ) or else generate the language of all the words in L other than Λ.

• Replacement Rule.1. Delete all Λ-Productions.2. Add the following productions:

For every production of the X old stringAdd new production of the form X .., where right side willaccount for every modification of the old string that can beformed by deleting all possible subsets of null-ableNon-Terminals, except that we do not allow X Λ, to beformed if all the character in old string are null-able

Lecture 09: Theory of Automata:08

16

Example

Old nullable New

Production Production

X Y nothing

X Λ nothing

Y X nothing

S Xb S b

S aYa S aa

So the new CFG is

S a | Xb | aa | aYa |b

X Y

Y b | X

Consider the CFGS a | Xb | aYaX Y | ΛY b | X

Lecture 09: Theory of Automata:08

17

Example

Old nullable New

Production Production

S Xa S a

X aX X a

X bX X b

So the new CFG is

S a | Xa

X aX | bX | a | b

Consider the CFGS Xa X aX | bX | Λ

Lecture 09: Theory of Automata:08

18

Example

S XY X ZbY bW Z ABW Z A aA | bA | ΛB Ba | Bb | Λ

• Null-able Non-terminals are?

• A, B, Z and W

Lecture 09: Theory of Automata:08

19

Example Contd.

Old nullable New Production ProductionX Zb X bY bW Y bZ AB Z A and Z

BW Z Nothing newA aA A aA bA A bB Ba B aB Bb B b

So the new CFG is

S XY

X Zb | b

Y bW | b

Z AB | A | B

W Z

A aA | bA | a | b

B Ba | Ba | a | b

S XY X ZbY bW Z ABW Z A aA | bA | ΛB Ba | Bb | Λ

Lecture 09: Theory of Automata:08

20

Killing unit-productions

• Definition: A production of the form• Nonterminal one Nonterminal

is called a unit production.• The following theorem allows us to get rid of unit

productions:

Theorem 24:

If there is a CFG for the language L that has no

Λ-productions, then there is also a CFG for L with

no Λ-productions and no unit productions.

Lecture 09: Theory of Automata:08

21

Proof of Theorem 24

• This is another proof by constructive algorithm.• Algorithm: For every pair of nonterminals A and B, if the

CFG has a unit production A B, or if there is a chain

A X1 X2 … B

where X1, X2, ... are nonterminals, create new productions

as follows:• If the non-unit productions from B are

B s1 | s2| …

where s1, s2, ... are strings, we create the productions

A s1| s2| …

Lecture 09: Theory of Automata:08

22

Example

• Consider the CFGS A| bbA B | bB S | a

• The non-unit productions areS bb, A b ,B a

• And unit productions areS AA BB S

Lecture 09: Theory of Automata:08

23

Example contd.

• Let’s list all unit productions and their sequences and create new productions:S A gives S bS A B gives S aA B gives A aA B S gives A bbB S gives B bbB S A gives B b

• Eliminating all unit productions, the new CFG isS bb | b | aA b | a | bbB a | bb | b

• This CFG generates a finite language since there are no nonterminals in any strings produced from S.

Lecture 09: Theory of Automata:08

24

Useless Symbols

• Let a CFG G. A symbol X ε (V U ∑) is useful if there is a derivation

Where U and V ε (V U ∑) and w ε ∑*. A symbol that is not useful is useless• A terminal is useful if it occurs in a string of the language of G.• A variable is useful if it occurs in a derivation that begins from S and

generates a terminal string

For a variable to be useful two conditions must be satisfied.

1. The variable must occur in a sentential form of the grammar2. There must be a derivation of a terminal string from the variable.• A variable that occurs in a sentential form is said to be reachable from

S.• A two part procedure is presented to eliminate useless symbols.

* *

G GS UxV w

Lecture 09: Theory of Automata:08

28

Useless Productions

aAA

AS

S

aSbS

aAaaaaAaAAS

Some derivations never terminate...

Useless Production

Lecture 09: Theory of Automata:08

29

bAB

A

aAA

AS

Another grammar:

Not reachable from S

Useless Production

Lecture 09: Theory of Automata:08

30

In general:

if wxAyS

then variable is usefulA

otherwise, variable is uselessA

)(GLw

contains only terminals

Lecture 09: Theory of Automata:08

31

A production is useless if any of its variables is useless

xA

DC

CB

aAA

AS

S

aSbS

Productions

useless

useless

useless

useless

Variables

useless

useless

useless

Lecture 09: Theory of Automata:08

32

Removing Useless Productions

Example Grammar:

aCbC

aaB

aA

CAaSS

||

Lecture 09: Theory of Automata:08

33

First: find all variables that can producestrings with only terminals

aCbC

aaB

aA

CAaSS

|| },{ BA

AS

},,{ SBA

Round 1:

Round 2:

Lecture 09: Theory of Automata:08

34

Keep only the variablesthat produce terminal symbols:

aCbC

aaB

aA

CAaSS

||

},,{ SBA

aaB

aA

AaSS

|

(the rest variables are useless)

Remove useless productions

Lecture 09: Theory of Automata:08

35

Second:Find all variablesreachable from

aaB

aA

AaSS

|

S A B

Use a Dependency Graph

notreachable

S

Lecture 09: Theory of Automata:08

36

Keep only the variablesreachable from S

aaB

aA

AaSS

|

aA

AaSS

|

Final Grammar(the rest variables are useless)

Remove useless productions

Lecture 09: Theory of Automata:08

37

Set of variables that Derive terminal symbols

• Input = CFG (V, ∑, P , S)

• TERM = { A | there is a rule Aw ε P with w ε ∑*

• repeat– PREV = TERM– For each variable in A ε V do

• If there is a rule A w and w ε (PREV U ∑)* then

TERM = TERM U {A}

• Until PREV = TERM

Lecture 09: Theory of Automata:08

38

Example

• Consider following CFG

G: S AC | BS | B

A aA | aF

B CF | b

C cC | D

D aD | BD | C

E aA | BSA

F bB | b

Lecture 09: Theory of Automata:08

39

S AC | BS | BA aA | aFB CF | bC cC | DD aD | BD | CE aA | BSAF bB | b

{B, F, A, S, E}{B, F, A, S, E}3

{B, F, A, S}{B, F, A, S, E}2

{B, F}{B, F, A, S}1

{}{B, F}0

PREVTERMIteration

• New Grammar from TERM will beGT:S BS | BA aA | aFB bE aA | BSAF bB | b

Lecture 09: Theory of Automata:08

40

Construction of set of reachable Variables

• Input = CFG (V, ∑, P , S)• REACH = {S}

1. PREV = null

2. repeati. NEW = REACH – PREV

ii. PREV = REACH

iii. For each variable A in NEW doi. For each rule A w do add all variables in w to

REACH

3. Until REACH = PREV

Lecture 09: Theory of Automata:08

41

{S, B}{S, B}3

{S, B}{S, B}2

{S}{S, B}1

{}{S}0

PREVREACHIteration

GT:S BS | BA aA | aFB bE aA | BSAF bB | b

NEW

{S}

{B}

Lecture 09: Theory of Automata:08

42

Removing All

• Step 1: Remove Nullable Variables

• Step 2: Remove Unit-Productions

• Step 3: Remove Useless Variables