Section 12.4 Context-Free Language Topics

Algorithm. Remove Λ-productions from grammars for languages without Λ.

1. Find the nonterminals that derive Λ.

2. For each production A → w, construct all productions A → w′ where w′ is obtained from w by removing one or more occurrences of the nonterminals from Step 1.

3. Combine the original productions with those of Step 2 and eliminate any Λ-productions.

Example. Remove Λ-productions from the grammar

    S → ABc
    A → aA | Λ
    B → bB | Λ.

Solution.

Step 1: The nonterminals A and B derive Λ.

Step 2: From the production S → ABc we construct S → Bc | Ac | c. From the production A → aA we construct A → a. From the production B → bB we construct B → b.

Step 3:

    S → ABc | Bc | Ac | c
    A → aA | a
    B → bB | b.

Quiz. Remove Λ-productions from

    S → ABc | Ab | c
    A → ABa | Λ
    B → Bbc | Λ.

Solution.

    S → ABc | Ab | c | Bc | Ac | b
    A → ABa | Ba | Aa | a
    B → Bbc | bc.
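The three steps of the Λ-removal algorithm can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes every grammar symbol is a single character, uppercase letters are nonterminals, and a grammar is a dict mapping each nonterminal to a set of right-side strings, with the empty string standing for Λ. The function names are hypothetical.

```python
from itertools import product

def nullable_nonterminals(grammar):
    """Step 1: find every nonterminal that derives the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A right side derives Λ when all of its symbols are nullable
            # (the empty right side "" qualifies trivially).
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_lambda_productions(grammar):
    """Steps 2 and 3: add the shortened productions, then drop A -> Λ."""
    nullable = nullable_nonterminals(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each occurrence of a nullable nonterminal is either kept or removed.
            choices = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for pick in product(*choices):
                new_bodies.add("".join(pick))
        new_bodies.discard("")  # Step 3: eliminate Λ-productions
        result[head] = new_bodies
    return result

# The example grammar: S -> ABc, A -> aA | Λ, B -> bB | Λ
example = {"S": {"ABc"}, "A": {"aA", ""}, "B": {"bB", ""}}
# The quiz grammar: S -> ABc | Ab | c, A -> ABa | Λ, B -> Bbc | Λ
quiz = {"S": {"ABc", "Ab", "c"}, "A": {"ABa", ""}, "B": {"Bbc", ""}}
```

Calling `remove_lambda_productions(example)` and `remove_lambda_productions(quiz)` yields the production sets given in the two solutions above. Sets are used so that duplicates such as the second copy of S → c in the quiz collapse automatically.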

Chomsky Normal Form. Productions have one of the following forms:

    A → a (a is a terminal)
    A → BC
    S → Λ (if Λ is in the language).

Advantages: Parse trees are binary, which makes them easy to represent. Any string of length n > 0 can be derived in 2n – 1 steps.

Algorithm. Transform to Chomsky normal form (with the additional property that the start symbol does not occur on the right side of any production).

1. If the start symbol S occurs on some right side, create a new start symbol S′ and a new production S′ → S.

2. Remove A → Λ (if A ≠ S) by the previous algorithm. (If S → Λ is removed, add it back.)

3. Remove unit productions (i.e., A → B): If A → B or A ⇒+ B, then construct productions A → w where B → w is not a unit production. Now remove all unit productions.

4. For each production whose right side has two or more symbols, replace all occurrences of each terminal a with a new nonterminal A and also add the new production A → a.

5. Replace each production B → C1…Cn with n > 2 by B → C1D, where D → C2…Cn. Repeat this step until all right sides have length at most two.

Example. Construct a Chomsky normal form for the grammar

    S → aSb | D
    D → Dc | Λ.

Solution.

Step 1: Add the production S′ → S.

Step 2:

    S′ → S | Λ
    S → aSb | ab | D
    D → Dc | c.

Step 3:

    S′ → aSb | ab | Dc | c | Λ
    S → aSb | ab | Dc | c
    D → Dc | c.

Step 4:

    S′ → ASB | AB | DC | c | Λ
    S → ASB | AB | DC | c
    D → DC | c
    A → a
    B → b
    C → c.

Step 5: Replace S′ → ASB and S → ASB by

    S′ → AE, S → AE, and E → SB.
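Step 3 of the algorithm (removing unit productions) can be sketched as follows. This is only an illustration under stated assumptions: symbols are single characters, the nonterminals are exactly the dict keys, the empty string stands for Λ, and the new start symbol S′ is written as `Z` because the sketch cannot represent a primed symbol. The function name is hypothetical.

```python
def remove_unit_productions(grammar):
    """For each A, collect the non-unit productions of every B with A => B
    through unit productions only, then drop all unit productions."""

    def is_unit(body):
        return len(body) == 1 and body in grammar

    result = {}
    for head in grammar:
        # Nonterminals reachable from head using unit productions alone.
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        result[head] = {
            body
            for nonterminal in reachable
            for body in grammar[nonterminal]
            if not is_unit(body)
        }
    return result

# The grammar after Step 2 of the example, with Z standing in for S′.
after_step_2 = {"Z": {"S", ""}, "S": {"aSb", "ab", "D"}, "D": {"Dc", "c"}}
after_step_3 = remove_unit_productions(after_step_2)
```

Here `after_step_3["Z"]` contains aSb, ab, Dc, c, and Λ, which agrees with the Step 3 line for S′ in the solution.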

Quiz. Construct a Chomsky normal form for the grammar

    S → aTbb | U | Λ
    T → cT | c
    U → Ud | d.

Solution.

Step 1: No change.

Step 2: No change.

Step 3: Remove the unit production S → U to obtain

    S → aTbb | Ud | d | Λ
    T → cT | c
    U → Ud | d.

Step 4: Transform right sides of length at least two into strings of nonterminals.

    S → ATBB | UD | d | Λ
    T → CT | c
    U → UD | d
    A → a
    B → b
    C → c
    D → d.

Step 5: Replace S → ATBB with the productions

    S → AE, E → TF, F → BB.
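Steps 4 and 5 of the quiz can be reproduced with a short Python sketch. It is illustrative only and rests on assumptions not made in the slides: symbols are single characters, lowercase letters are terminals, the nonterminal introduced for a terminal is its uppercase letter (which must not already be in use), and fresh nonterminals for Step 5 are taken from the unused uppercase letters in alphabetical order. The function name is hypothetical.

```python
from string import ascii_uppercase

def chomsky_steps_4_and_5(grammar):
    """Apply Steps 4 and 5 to a grammar given as {nonterminal: set of strings}."""
    result = {head: set() for head in grammar}

    # Step 4: in right sides of length >= 2, replace each terminal a by a
    # new nonterminal A and add the production A -> a.
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) >= 2:
                for sym in body:
                    if sym.islower():
                        # Assumption: the uppercase letter is not yet a nonterminal.
                        assert sym.upper() not in grammar
                        result.setdefault(sym.upper(), set()).add(sym)
                body = body.upper()
            result[head].add(body)

    # Step 5: replace B -> C1 C2...Cn (n > 2) by B -> C1 D and D -> C2...Cn.
    fresh = (c for c in ascii_uppercase if c not in result)
    pending = [(h, b) for h, bodies in result.items() for b in bodies if len(b) > 2]
    while pending:
        head, body = pending.pop()
        new = next(fresh)
        result[head].remove(body)
        result[head].add(body[0] + new)
        result[new] = {body[1:]}
        if len(body) > 3:  # the new right side is still too long
            pending.append((new, body[1:]))
    return result

# The quiz grammar after Step 3 ("" stands for Λ).
after_step_3 = {"S": {"aTbb", "Ud", "d", ""}, "T": {"cT", "c"}, "U": {"Ud", "d"}}
cnf = chomsky_steps_4_and_5(after_step_3)
```

For this input the sketch introduces E and F exactly as in the solution. Unlike the worked example with S′ → AE and S → AE, it creates a separate fresh nonterminal for every long production rather than sharing one, which is simpler but can produce more nonterminals than necessary.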

Greibach Normal Form. Productions have one of the following forms:

    A → b (b is a terminal)
    A → bD1…Dk
    S → Λ (if Λ is in the language).

Advantage: Any string of length n > 0 can be derived in n steps.

Algorithm (idea). Transform a context-free grammar to Greibach normal form.

1. Perform steps 1, 2, and 3 of the Chomsky algorithm.

2. Remove all left-recursion, including indirect left-recursion, without adding Λ-productions.

3. Make substitutions to transform the grammar into the proper form.

Example. Put the following grammar into Greibach normal form.

    S → AB | Ac | d
    A → aA | a
    B → Ab | c.

Solution: Steps 1 (Chomsky steps 1, 2, and 3) and 2 are not needed.

Step 3: Replace A in S → AB | Ac | d with aA | a to obtain

    S → aAB | aB | aAc | ac | d.

Replace A in B → Ab | c with the right side of A → aA | a to obtain

    B → aAb | ab | c.

Now add the new productions C → c and D → b and make the appropriate replacements to obtain the proper form:

    S → aAB | aB | aAC | aC | d
    A → aA | a
    B → aAD | aD | c
    C → c
    D → b.
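The substitutions in Step 3 of this example can be sketched in Python. This is a hedged illustration, not a general Greibach conversion: it assumes single-character symbols, a grammar with no left recursion and no Λ-productions (so the substitution loop terminates), and a caller-supplied mapping from terminals to the new nonterminal names. The function names and the `names` parameter are hypothetical.

```python
def substitute_leading_nonterminals(grammar):
    """Replace a leading nonterminal by each of its right sides until every
    right side starts with a terminal. Assumes no left recursion."""
    result = {head: set(bodies) for head, bodies in grammar.items()}
    changed = True
    while changed:
        changed = False
        for head in result:
            for body in list(result[head]):
                if body and body[0].isupper():
                    result[head].remove(body)
                    result[head].update(alt + body[1:] for alt in result[body[0]])
                    changed = True
    return result

def lift_tail_terminals(grammar, names):
    """Replace each terminal after the first symbol by the nonterminal given
    in names, and add the matching production, e.g. C -> c."""
    result = {}
    used = set()
    for head, bodies in grammar.items():
        result[head] = set()
        for body in bodies:
            tail = "".join(names.get(sym, sym) for sym in body[1:])
            used.update(sym for sym in body[1:] if sym in names)
            result[head].add(body[:1] + tail)
    for terminal in used:
        result[names[terminal]] = {terminal}
    return result

grammar = {"S": {"AB", "Ac", "d"}, "A": {"aA", "a"}, "B": {"Ab", "c"}}
substituted = substitute_leading_nonterminals(grammar)
# The slides choose C for c and D for b.
greibach = lift_tail_terminals(substituted, {"c": "C", "b": "D"})
```

On the example grammar, `substituted` matches the intermediate productions for S and B, and `greibach` matches the final grammar above.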

Properties of Context-Free Languages

When we know some properties of context-free languages, they can help us argue, by way of contradiction (BWOC), that certain languages are not context-free.

The Pumping Lemma

If L is an infinite context-free language, then any grammar for L must be recursive, so there must be derivations of the following form, where u, v, w, x, and y are terminal strings:

    S ⇒+ uNy
    N ⇒+ vNx (where v and x are not both Λ)
    N ⇒+ w.

These derivations lead to derivations like

    S ⇒+ uNy ⇒+ uvNxy ⇒+ u v^2 N x^2 y ⇒+ u v^k N x^k y ⇒+ u v^k w x^k y ∈ L for all k ∈ N.

This is the basis for the Pumping Lemma:

There is an integer m > 0 such that if z ∈ L and |z| ≥ m, then z has the form z = uvwxy where 1 ≤ |vx| ≤ |vwx| ≤ m and u v^k w x^k y ∈ L for all k ∈ N.

Note: The number m depends on the grammar, as we’ll see in the following example.

Example. Suppose we have the following grammar for {Λ, bbc} ∪ {abc^n d | n ∈ N}.

    S → aNd | bbc | Λ
    N → Nc | b.

Here are a few derivations:

    S ⇒ aNd ⇒ abd
    S ⇒ aNd ⇒ aNcd ⇒ abcd
    S ⇒ aNd ⇒ aNcd ⇒ aNccd ⇒ abccd
    S ⇒+ abc^k d for any k ∈ N.

For this grammar m = 4 can be used in the pumping lemma because any derivation of a string z with |z| ≥ 4 must use the recursive production N → Nc. For example, if |z| = 8 and z = abcccccd, then the pumping lemma factors z = abcccccd = uvwxy where 1 ≤ |vx| ≤ |vwx| ≤ 4 and u v^k w x^k y ∈ L for all k ∈ N. In this case let u = a, v = Λ, w = b, x = c, and y = ccccd.

Example. The language L = {a^n b^n c^(n+k) | k, n ∈ N} is not context-free.

Proof: Assume, BWOC, that L is context-free. L is infinite, so the pumping lemma applies. Choose z = a^m b^m c^m, where m is the positive integer from the lemma. Then z = a^m b^m c^m = uvwxy where 1 ≤ |vx| ≤ |vwx| ≤ m and u v^k w x^k y ∈ L for all k ∈ N.

Observe that neither v nor x can contain distinct letters. For example, if v = …a…b…, then v^2 = …a…b……a…b…, which can’t appear as a substring of any string in L. So v and x must be strings of repeated occurrences of a single letter.

Now since |vwx| ≤ m, there are two possible places in a^m b^m c^m where v and x must occur:

(1) v and x occur in a^m b^m.

(2) v and x occur in b^m c^m.

But we obtain the following contradictions because v and x are not both Λ.

(1) Let k = 2 to obtain u v^2 w x^2 y = a^(m+i) b^(m+j) c^m, where i > 0 or j > 0. Either the numbers of a’s and b’s differ, or they are equal and exceed the number of c’s. So u v^2 w x^2 y ∉ L.

(2) Let k = 0 to obtain uwy = a^m b^(m-i) c^(m-j), where i > 0 or j > 0. Either there are fewer b’s than a’s, or there are fewer c’s than a’s. So uwy ∉ L.

These contradictions imply that L is not context-free. QED.
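The case analysis for L = {a^n b^n c^(n+k) | k, n ∈ N} can be sanity-checked by brute force for one small value of m. The sketch below is only an experiment, not a proof: it enumerates every factorization z = uvwxy allowed by the lemma and checks that none of them stays in L for both k = 0 and k = 2. The function names and the choice m = 4 are assumptions made for illustration.

```python
import re

def in_L(s):
    """Membership in L = {a^n b^n c^(n+k) | k, n in N}."""
    match = re.fullmatch(r"(a*)(b*)(c*)", s)
    if match is None:
        return False
    a, b, c = (len(group) for group in match.groups())
    return a == b and c >= a

def decompositions(z, m):
    """Yield every (u, v, w, x, y) with z = uvwxy and 1 <= |vx| <= |vwx| <= m."""
    n = len(z)
    for start in range(n):
        for end in range(start + 1, min(start + m, n) + 1):
            # vwx is z[start:end]; i and j are the cut points inside it.
            for i in range(start, end + 1):
                for j in range(i, end + 1):
                    v, w, x = z[start:i], z[i:j], z[j:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]

def pump(u, v, w, x, y, k):
    """Build the string u v^k w x^k y."""
    return u + v * k + w + x * k + y

def survivors(z, m, member, ks=(0, 2)):
    """Factorizations whose pumped strings are members for every k in ks."""
    return [d for d in decompositions(z, m) if all(member(pump(*d, k)) for k in ks)]

m = 4
z = "a" * m + "b" * m + "c" * m
no_survivors = survivors(z, m, in_L) == []
```

For m = 4, `no_survivors` is True: every admissible factorization of a^4 b^4 c^4 is refuted by k = 0 or k = 2, matching cases (1) and (2) of the proof. An empty result is conclusive for that z and m; a non-empty result would only mean that more values of k need to be tried.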

Example/Quiz. Prove that the language L = {ss | s ∈ {a, b}*} is not context-free.

Proof: Assume, BWOC, that L is context-free. L is infinite, so the pumping lemma applies. Choose z = a^m b^m a^m b^m, where m is the positive integer from the lemma. Then z = a^m b^m a^m b^m = uvwxy where 1 ≤ |vx| ≤ |vwx| ≤ m and u v^k w x^k y ∈ L for all k ∈ N.

Now since |vwx| ≤ m, there are three possible places in a^m b^m a^m b^m where v and x must occur:

(1) v and x occur in a^m b^m (on the left of z).

(2) v and x occur in b^m a^m (in the center of z).

(3) v and x occur in a^m b^m (on the right of z).

Notice that v and x can consist only of repetitions of a single letter. For example, in case (1) suppose v = a^i b^j for some i > 0 and j > 0 and x = b^n for some n ≥ 0. Then, letting k = 0, we would obtain uwy = a^(m-i) b^(m-j-n) a^m b^m, which cannot be in L. The argument is similar for the other cases. So v and x must consist only of repetitions of a single letter.

We need to find a contradiction in each of the three cases. We’ll do it by using k = 0. The lemma tells us that uwy ∈ L. But we obtain the following contradictions because v and x are not both Λ.

(1) uwy = a^(m-i) b^(m-j) a^m b^m where either i > 0 or j > 0. So uwy ∉ L.

(2) uwy = a^m b^(m-i) a^(m-j) b^m where either i > 0 or j > 0. So uwy ∉ L.

(3) uwy = a^m b^m a^(m-i) b^(m-j) where either i > 0 or j > 0. So uwy ∉ L.

Therefore L is not context-free. QED.

Remark: Be careful that the choice of z is not in a context-free sublanguage of L. For example, if we chose z = (ab)^m (ab)^m in the preceding example, we would not get any contradictions.
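The remark can be illustrated with a small search, sketched here in Python for m = 4 only. It looks for one factorization z = uvwxy, within the lemma's length bounds, whose pumped strings remain in L for the first few values of k. The function names, the value of m, and the range of k tested are assumptions for illustration; finding such a factorization for finitely many k does not prove anything about all k, it only shows that the chosen z gives no contradiction in that range.

```python
def is_square(s):
    """Membership in L = {ss | s in {a, b}*}."""
    half, odd = divmod(len(s), 2)
    return not odd and set(s) <= {"a", "b"} and s[:half] == s[half:]

def pumpable_decomposition(z, m, member, ks=range(4)):
    """Return one (u, v, w, x, y) with 1 <= |vx| <= |vwx| <= m whose pumped
    strings u v^k w x^k y are members for every k in ks, or None."""
    n = len(z)
    for start in range(n):
        for end in range(start + 1, min(start + m, n) + 1):
            for i in range(start, end + 1):
                for j in range(i, end + 1):
                    u, v, w = z[:start], z[start:i], z[i:j]
                    x, y = z[j:end], z[end:]
                    if (v or x) and all(
                        member(u + v * k + w + x * k + y) for k in ks
                    ):
                        return u, v, w, x, y
    return None

m = 4
good_choice = "a" * m + "b" * m + "a" * m + "b" * m  # z from the proof
poor_choice = "ab" * m + "ab" * m                    # z from the remark

found_for_good = pumpable_decomposition(good_choice, m, is_square)
found_for_poor = pumpable_decomposition(poor_choice, m, is_square)
```

With these inputs `found_for_good` is `None`, because k = 0 already refutes every factorization as in cases (1) to (3), while `found_for_poor` is a factorization that keeps pumping inside L. The reason is that (ab)^m (ab)^m lies in the regular sublanguage {(abab)^n | n ∈ N} of L, where removing or repeating a block such as abab always yields another square.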