Grammars Automata

Embed Size (px)

Citation preview

  • 8/20/2019 Grammars Automata

    1/88

    Introduction to Formal Languages,Automata and Computability

    K. Krithivasan and R. Rama

    Introduction to Formal Languages, Automata and Computability – p.1/74

  • 8/20/2019 Grammars Automata

    2/88

    Language

    Strings are defined over an alphabet which is finite.Alphabet may vary depending upon the application.

    Elements of an alphabet are called symbols. Usuallywe denote the basic alphabet set either as Σ or T . Forexample, the following are a few examples of analphabet set.

    Σ1   =   {a, b}

    Σ2   =   {0, 1, 2}

    Σ3   =   {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Introduction to Formal Languages, Automata and Computability – p.2/74

  • 8/20/2019 Grammars Automata

    3/88

    contd.

    A string or word is a finite sequence of symbols from

    the alphabet, usually written as concatenated symbols

    and not separated by gaps or commas. For example if 

    Σ =  {a, b}, a string abbab  is a string or word over Σ.

    If  w  is a string over an alphabet  Σ, then the length of w written as len(w) or |w| is the number of symbols it

    contains. If  |w|  = 0

    , then w

     is called as empty string

    denoted either as λ or .

    Introduction to Formal Languages, Automata and Computability – p.3/74

  • 8/20/2019 Grammars Automata

    4/88

    contd.

    For any word   w,   w   =   w   =   w. For any string

    w   =   a1 . . . an   of length  n, the reverse of  w   is writ-

    ten as wR which is the string anan−1 . . . a1, where each

    symbol ai  belongs to the basic alphabet Σ. A string z 

    that is appearing consecutively within another string wis called a substring or subword of w. For example aab

    is a substring of  baabb

    .

    Introduction to Formal Languages, Automata and Computability – p.4/74

  • 8/20/2019 Grammars Automata

    5/88

    contd.

    The set of all strings over an alphabet Σ is denoted byΣ∗ which includes the empty string . For example for

    Σ = {0, 1}, Σ∗

    = {, 0, 1, 00, 01, 10, 11, . . . }. Notethat Σ∗ is a countably infinite set. Also Σn denotes theset of all strings over Σ whose length is n. Hence

    Σ∗

    = Σ0

    ∪ Σ1

    ∪ Σ2

    ∪ Σ3

    . . .   andΣ+ = Σ1 ∪ Σ2 ∪ Σ3 . . . . Subsets of  Σ∗ are calledlanguages. For example if  Σ = {a, b}

    L1   =   {,a,b}

    L2   =   {ab, aabb, aaabbb, . . . }

    L3   =   {w ∈ Σ∗/|w|a  = |w|b}

    In the above example, L1 is finite, L2 and L3 areinfinite languages. φ denotes an empty language.

    Introduction to Formal Languages, Automata and Computability – p.5/74

  • 8/20/2019 Grammars Automata

    6/88

    contd.

    DefinitionA set is an unordered collection of objects.

    Introduction to Formal Languages, Automata and Computability – p.6/74

  • 8/20/2019 Grammars Automata

    7/88

    contd.

    DefinitionA set is an unordered collection of objects.Example Let W  denote the set of well formed

    parentheses. It can be defined inductively as follows:Basis clause : [ ] ∈ W Inductive clause : if  x, y ∈ W , xy ∈ W   and [x] ∈ W 

    Extremal clause : No object is a member of  W   unlessits being so follows from a finite number of applica-

    tions of the basis and the inductive clauses.

    Introduction to Formal Languages, Automata and Computability – p.6/74

  • 8/20/2019 Grammars Automata

    8/88

    Language

    Definition Let  Σ be any alphabet set.  Σ+ is a set of nonempty

    strings over  Σ defined as follows:

    1.   Basis :  If  a  ∈  Σ, then  a  ∈  Σ+.

    2.   Induction :  If  α  ∈  Σ+ and  a  ∈  Σ,  αa, aα are in  Σ+.

    3. No other element belong to Σ

    +

    .Clearly the set  Σ+ contains all strings of length  n,  n  ≥  1.

    Introduction to Formal Languages, Automata and Computability – p.7/74

  • 8/20/2019 Grammars Automata

    9/88

    Language

    Definition Let  Σ be any alphabet set.  Σ+ is a set of nonempty

    strings over  Σ defined as follows:

    1.   Basis :  If  a  ∈  Σ, then  a  ∈  Σ+.

    2.   Induction :  If  α  ∈  Σ+ and  a  ∈  Σ,  αa, aα are in  Σ+.

    3. No other element belong to Σ

    +

    .Clearly the set  Σ+ contains all strings of length  n,  n  ≥  1.

    Definition Let  Σ be any alphabet set.  Σ∗ is defined as follows:

    1.   Basis :    ∈  Σ∗.

    2.   Induction :  If  α  ∈  Σ∗,  a  ∈  Σ, then  aα, αa  ∈  Σ∗.

    3. No other element is in Σ∗.

    Introduction to Formal Languages, Automata and Computability – p.7/74

  • 8/20/2019 Grammars Automata

    10/88

    contd.

    Since languages are sets, one can define the set-

    theoretic operations of union, intersection, difference,

    complement in the usual fashion. The following oper-

    ations are also defined for languages. If  x = a1 . . . an,

    y  =  b1 . . . bm, the concatenation of  x  and y  is definedas   xy   =   a1 . . . anb1 . . . bm. The catenation (or con-

    catenation) of two languages L1

     and L2

     is defined by,

    L1L2   =   {w1w2/w1   ∈   L1 and w2   ∈   L2}. Note that

    concatenation of languages is associative because con-

    catenation of strings is associative. Also L0 = {} and

    L   =   L =   , L = L = L.   Introduction to Formal Languages, Automata and Computability – p.8/74

  • 8/20/2019 Grammars Automata

    11/88

    contd.

    The concatenation closure (Kleene closure) of alanguage L, in symbols L∗ is defined to be the union

    of all powers of  L:

    L∗ =∞

    i=0

    Li

    Also L

    +

    =

    i=1 L

    i

    .

    Introduction to Formal Languages, Automata and Computability – p.9/74

  • 8/20/2019 Grammars Automata

    12/88

    contd.

    The right quotient and right derivative are thefollowing sets respectively.

    L1\L2 = {y|yz  ∈ L1 for some z  ∈ L2}

    ∂ rzL = L/{z } = {y/yz  ∈ L}

    Similarly left quotient of a language L1 by a languageL2 is defined by

    L2/L1 = {z |yz  ∈ L1 for some y  ∈ L2}.

    The left derivative of a language  L  with respect to a

    word y is denoted as ∂ yL which is equal to {z |yz  ∈ L}.Introduction to Formal Languages, Automata and Computability – p.10/74

  • 8/20/2019 Grammars Automata

    13/88

    contd.

    The mirror image (or reversal) of a language is thecollection of the mirror images of its words and

    mir(L) = {mir(w)/w ∈ L} or LR

    = {wR

    /w ∈ L}.The operations substitution and homomorphism aredefined as follows.

    For each symbol a of an alphabet Σ, let σ(a) be alanguage over Σa. Also σ() = , σ(αβ ) = σ(α).σ(β )

    for α, β  ∈ Σ+. σ is a mapping from Σ∗ to 2V   ∗

    where

    V  is the union of the alphabets Σa, is called asubstitution. For a language L over Σ, we define

    σ(L) = {α/α ∈ σ(β ) for some β  ∈ L}.

    Introduction to Formal Languages, Automata and Computability – p.11/74

  • 8/20/2019 Grammars Automata

    14/88

    contd.

    A substitution is -free if and only if none of thelanguage σ(a) contains . A family of languages is

    closed under substitution if and only if whenever L  isin the family and σ  is a substitution such that σ(a) isin the family, then σ(L) is also in the family.

    A substitution σ such that σ(a) consists of a singleword wa is called a homomorphism. It is called -freehomomorphism if none of  σ(a) is .

    Algebraically, one can see that Σ∗ is a free semigroupwith   as its identity.

    Introduction to Formal Languages, Automata and Computability – p.12/74

  • 8/20/2019 Grammars Automata

    15/88

    contd.

    The homomorphism which is defined above agreeswith the customary definition of homomorphism of 

    one semigroup into another.Inverse homomorphism can be defined as follows:

    h−1(w) = {x|h(x) = w}

    h−1(L) = {x|h(x) is in L}.

    It should be noted that h(h−1

    (L)) need not be equal toL. Generally h(h−1(L)) ⊆ L and h−1(h(L)) ⊇ L.

    Introduction to Formal Languages, Automata and Computability – p.13/74

  • 8/20/2019 Grammars Automata

    16/88

    Grammar

    Definition A phrase-structure grammar or a type 0

    grammar is a 4-tuple G  = (N , T , P , S  ), where N   is a

    finite set of nonterminal symbols called thenonterminal alphabet, T  is a finite set of terminalsymbols called the terminal alphabet, S  ∈ N  is thestart symbol and P  is a set of productions (also calledproduction rules or simply rules) of the form u  → v,where u ∈ (N  ∪ T )∗N (N  ∪ T )∗ and v  ∈ (N  ∪ T )∗.Derivations are defined as follows:

    If  αuβ  is a string in (N  ∪ T )∗ and u  →  v  is a rule in

    P , from  αuβ   we get  αvβ  by replacing  u  by  v. This

    is denoted as   αuβ   ⇒   αvβ .   ⇒   is read as ‘directly

    derives’.   Introduction to Formal Languages, Automata and Computability – p.14/74

  • 8/20/2019 Grammars Automata

    17/88

    contd.

    If  α1 ⇒ α2, α2 ⇒ α3, . . . , αn−1 ⇒ αn, the derivation

    is denoted as α1 ⇒ α2 ⇒ · · · ⇒ αn or α1∗

    ⇒ αn.  ∗⇒ is

    the reflexive, transitive closure of ⇒.Definition The language generated by a grammarG = (N , T , P , S  ) is the set of terminal strings

    derivable in the grammar from the start symbol.

    L(G) = {w/w ∈ T ∗, S   ∗⇒ w}

    Introduction to Formal Languages, Automata and Computability – p.15/74

  • 8/20/2019 Grammars Automata

    18/88

    Example

    Consider the grammar G = (N , T , P , S  ) whereN  = {S, A}, T   = {a,b,c}, production rules in P   are

    S  → aSc,   S  → aAc,   A → bA typical derivation in the grammar is

    S    ⇒   aSc

    ⇒   aaScc

    ⇒   aaaAccc

    ⇒   aaabccc

    The language generated is L(G) = {anbcn/n ≥ 1}.

    Introduction to Formal Languages, Automata and Computability – p.16/74

  • 8/20/2019 Grammars Automata

    19/88

    Lengths

    DefinitionIf the rules are of the form αAβ  → αγβ ,α, β  ∈ (N  ∪ T )∗, A ∈ N , γ  ∈ (N  ∪ T )+, the

    grammar is called context-sensitive grammar.

    Introduction to Formal Languages, Automata and Computability – p.17/74

  • 8/20/2019 Grammars Automata

    20/88

    Lengths

    DefinitionIf the rules are of the form αAβ  → αγβ ,α, β  ∈ (N  ∪ T )∗, A ∈ N , γ  ∈ (N  ∪ T )+, the

    grammar is called context-sensitive grammar.DefinitionIf in the rule u → v, |u| |v|, the grammaris called length increasing grammar.

    Introduction to Formal Languages, Automata and Computability – p.17/74

  • 8/20/2019 Grammars Automata

    21/88

    Lengths

    DefinitionIf the rules are of the form αAβ  → αγβ ,α, β  ∈ (N  ∪ T )∗, A ∈ N , γ  ∈ (N  ∪ T )+, the

    grammar is called context-sensitive grammar.DefinitionIf in the rule u → v, |u| |v|, the grammaris called length increasing grammar.

    Example Let G = (N , T , P , S  ) where N  = {S, B},T   = {a,b,c},P  has the following rules:

    1.   S  → aSBc2.   S  → abc

    3.   cB  → Bc

    4.   bB  → bbIntroduction to Formal Languages, Automata and Computability – p.17/74

  • 8/20/2019 Grammars Automata

    22/88

    contd..

    Let us consider the language generated. The numberappearing above ⇒ denotes the rule being used.

    S    2⇒   abc;   here abc ∈ L(G)

    S   1⇒   aSBc2⇒   aabcBc3

    ⇒   aabBcc4

    ⇒   aabbcc, a2b2c2 ∈ L(G)

    Introduction to Formal Languages, Automata and Computability – p.18/74

  • 8/20/2019 Grammars Automata

    23/88

    contd.

    Similarly

      1

    ⇒   aSBc1⇒   aaSBcBc2

    ⇒   aaabcBcBc3⇒   aaabBccBc3

    ⇒   aaabBcBcc

    3⇒   aaabBBccc4

    ⇒   aaabbBccc4

    ⇒   aaabbbccc, a3

    b3

    c3

    ∈ L(G)Introduction to Formal Languages, Automata and Computability – p.19/74

  • 8/20/2019 Grammars Automata

    24/88

    contd.

    In general any string of the form anbncn will begenerated.

    S    ∗⇒   an−1S (Bc)n−1 (by applying rule 1 (n-1) times)

    ⇒   anbc(Bc)n−1 (rule 2 once)

    ∗⇒   anbBn−1cn (by applying rule 3   n(n − 1)2

      times)

    ∗⇒   anbncn (by applying rule 4,(n − 1) times)

    Hence  L(G) =   {anbncn/n   ≥   1}. This is a type 1

    language.Introduction to Formal Languages, Automata and Computability – p.20/74

  • 8/20/2019 Grammars Automata

    25/88

    Type2 language

    Definition If in a grammar, the production rules are of the form,  A  → α,

    where A ∈ N  and α ∈ (N  ∪ T )∗, the grammar is called a type 2 grammar or

    context-free grammar. The language generated is called a type 2 language or

    context-free language.

    Introduction to Formal Languages, Automata and Computability – p.21/74

    T 2 l

  • 8/20/2019 Grammars Automata

    26/88

    Type2 language

    Definition If in a grammar, the production rules are of the form,  A  → α,

    where A ∈ N  and α ∈ (N  ∪ T )∗, the grammar is called a type 2 grammar or

    context-free grammar. The language generated is called a type 2 language or

    context-free language.

    Definition If the rules are of the form  A  →  αB,A  →  β, A,B  ∈  N, α, β  ∈

    , the grammar is called a right linear grammar or type 3 grammar and the

    language generated is called a type 3 language or regular set. We can even

    put the restriction that the rules can be of the form  A  →  aB,A  →   b,  where

    A,B  ∈ N, a ∈ T, b ∈ T  ∪ . This is possible because a rule  A  → a1 . . . akB

    can be split into A  →  a1B1, B1  →  a2B2, . . . , Bk−1  →  akB  by introducing

    new nonterminals B1, . . . , Bk.

    Introduction to Formal Languages, Automata and Computability – p.21/74

    E l

  • 8/20/2019 Grammars Automata

    27/88

    Example

    Let G = (N,T,P,S  ) where N   = {S }, T   = {a, b}

    P  consists of the following rules.

    1.   S  → aS 

    2.   S  → bS 3.   S  → 

    This grammar generates all strings in T ∗. For example, the string abbaab is generated as follows:

    S    ⇒   aS  (rule 1)

    ⇒   abS  (rule 2)

    ⇒   abbS  (rule 2)

    ⇒   abbaS  (rule 1)

    ⇒   abbaaS  (rule 1)

    ⇒   abbaabS  (rule 2)

    ⇒   abbaab (rule 3)

    Introduction to Formal Languages, Automata and Computability – p.22/74

  • 8/20/2019 Grammars Automata

    28/88

    Derivation tree

    We have considered the definition of a grammar andderivation. Each derivation can be represented by atree called a derivation tree (sometimes called parsetree). A derivation tree for the derivation consideredin previous example with grammarS  → aSc, S  → aAc, A → b is

    a S   c

    S

    Sa

    a

    c

    c

    b

    A

    Introduction to Formal Languages, Automata and Computability – p.23/74

    E l

  • 8/20/2019 Grammars Automata

    29/88

    Example

    Consider the following CFG, G  = (N , T , P , S  ),N  = {S,A,B}, T   = {a, b}. P  consists of the

    following productions1.   S  → aB

    2.   B  → b

    3.   B  → bS 

    4.   B  → aBB

    5.   S  → bA6.   A → a

    7.   A → aS 

    8.   A → bAAIntroduction to Formal Languages, Automata and Computability – p.24/74

  • 8/20/2019 Grammars Automata

    30/88

    contd..

    The derivation tree for aaabbb is as follows,

    a B

    S

    Ba

    a

    B

    b

    B   B

    b

    b

    Introduction to Formal Languages, Automata and Computability – p.25/74

    E l

  • 8/20/2019 Grammars Automata

    31/88

    Example

    Consider the grammarG = ({S }, {a, b}, {S  → SaSbS , S  → SbSaS ,

    S  → }, S ). The language generated by this grammaris the same as the language generated by the grammarG = (N , T , P , S  ), N  = {S,A,B}, T   = {a, b}, P consists of the following productionsS  → aB, B  → b, B  → bS, B  → aBB, S  → bA, A →a, A → aS, A → bAA, except that , the empty stringis also generated here.

    Introduction to Formal Languages, Automata and Computability – p.26/74

    E ample

  • 8/20/2019 Grammars Automata

    32/88

    Example

    Consider the grammarG = ({S }, {a, b}, {S  → SaSbS , S  → SbSaS ,

    S  → }, S ). The language generated by this grammaris the same as the language generated by the grammarG = (N , T , P , S  ), N  = {S,A,B}, T   = {a, b}, P consists of the following productionsS  → aB, B  → b, B  → bS, B  → aBB, S  → bA, A →a, A → aS, A → bAA, except that , the empty stringis also generated here.

    0

    1

    2

    −1

    −2

    X

    Introduction to Formal Languages, Automata and Computability – p.26/74

    td

  • 8/20/2019 Grammars Automata

    33/88

    contd..

    Consider a string w having equal number of  a’s andb’s. We use induction.Basis|w| = 0, S  ⇒ |w| = 2, it is either ab or ba, then

    S  ⇒ SaSbS 

      ∗

    ⇒ abS  ⇒ SbSaS   ∗⇒ ba.

    Introduction to Formal Languages, Automata and Computability – p.27/74

    td

  • 8/20/2019 Grammars Automata

    34/88

    contd..

    Induction  Assume that the result holds up to stringsof length k − 1. Prove that the result holds for stringsof length k. Draw a graph where the x  axis representsthe length of the prefixes of the given string.  y axisrepresents the number of  a’s - number of  b’s. For thestring aabbabba, the graph will look as given in theprevious figure. For a given string w  with equalnumber of  a’s and b’s there are 3 possibilities.

    1. The string begins with ‘a’ and ends with ‘b’.

    2. The string begins with ‘b’ and ends with ‘a’.

    3. Other two cases (begins with a  and ends with a,

    begins with b  and ends with b)Introduction to Formal Languages, Automata and Computability – p.28/74

    td

  • 8/20/2019 Grammars Automata

    35/88

    contd..

    In the first case w = aw1b and w1 has equal numbersof  a’s and b’s. So we have

    S  ⇒ SaSbS  ⇒ aSbS  ⇒ aSb

      ∗

    ⇒ aw1b as S 

      ∗

    ⇒ w1 byinductive hypotheses. A similar argument holds forcase 2. In case 3, the graph mentioned above willcross the x  axis.Consider w  = w1w2 where w1, w2 have equal numberof  a’s and b’s. Let us say w1 begins with a, and w1corresponds to the portion where the graph touches

    the x  axis for the first time. In the above examplew1 = aabb and w2 = abba.

    Introduction to Formal Languages, Automata and Computability – p.29/74

    td

  • 8/20/2019 Grammars Automata

    36/88

    contd..

    In this case we can have a derivation as follows:

    S   1⇒   SaSbS 3

    ⇒   aSbS ∗

    ⇒   aw1bS (w1  = aw1b)

    ∗⇒   aw1bw2∗

    ⇒   w1w2.

    S    ∗⇒ w follows from inductive hypothesis.

    Introduction to Formal Languages, Automata and Computability – p.30/74

    CS

  • 8/20/2019 Grammars Automata

    37/88

    CS

    Theorem Every context-sensitive language is length increasing and

    conversely.

    That every context-sensitive language is length increasing can be seen

    from definitions.

    Every length increasing language is context-sensitive can be seen from

    the following construction.

    Let L be a length increasing language generated by G = (N , T , P , S  ).

    Without loss of generality, one can assume that the productions in P 

    are of the form X  → a, X  → X 1 . . . X  m, X 1 . . . X  m  → Y 1 . . . Y  n,

    2 ≤ m ≤ n, X, X 1, . . . , X  m, Y 1, . . . , Y  n ∈ N , a ∈ T . Productions in

    P  which are already context-sensitive productions are not modified.

    Hence consider, a production of the form

    X 1 . . . X  m → Y 1 . . . Y  n,   2 ≤ m ≤ nIntroduction to Formal Languages, Automata and Computability – p.31/74

    contd

  • 8/20/2019 Grammars Automata

    38/88

    contd..

    It is replaced by the following set of context-sensitiveproductions:

    X 1 . . . X  m   →   Z 1X 2 . . . X  mZ 1X 2 . . . X  m   →   Z 1Z 2X 3 . . . X  m

    ..

    .Z 1Z 2 . . . Z  m−1X m   →   Z 1Z 2 . . . Z  mY m+1 . . . Y  nZ 1Z 2 . . . Z  mY m+1 . . . Y  n   →   Y 1Z 2 . . . Z  mY m+1 . . . Y  n

    ...

    Y 1Y 2 . . . Y  m−1Z mY m+1 . . . Y  n   →   Y 1Y 2 . . . Y  mY m+1 . . . Y  n

    where Z k, 1 ≤ k ≤ m are new nonterminals.Introduction to Formal Languages, Automata and Computability – p.32/74

    contd

  • 8/20/2019 Grammars Automata

    39/88

    contd..

    Each production that is not context-sensitive is to bereplaced by a set of context-sensitive productions asmentioned above. Application of this set of rules hasthe same effect as applying X 1 . . . X  m → Y 1 . . . Y  n.Hence a new grammar G thus obtained iscontext-sensitive that is equivalent to G.Example Let L = {anbmcndm/n,m ≥ 1}.The type 1 grammar generating this CSL is given byG = (N , T , P , S  ) with N  = {S,A,B,X,Y },

    T   = {a,b,c,d}

    Introduction to Formal Languages, Automata and Computability – p.33/74

    contd

  • 8/20/2019 Grammars Automata

    40/88

    contd..

    and P   =

    S    →   aAB|aB

    A   →   aAX |aX 

    B   →   bBd|bY d

    Xb   →   bX XY    →   Y c

    Y    →   c.

    Introduction to Formal Languages, Automata and Computability – p.34/74

    contd

  • 8/20/2019 Grammars Automata

    41/88

    contd..

    Sample Derivations

    S  ⇒ aB  ⇒ abY d   ⇒   abcd.

    S  ⇒ aB  ⇒ abBd   ⇒   abbY dd ⇒ ab2cd2.S  ⇒ aAB  ⇒ aaXB   ⇒   aaXbY d

    ⇒   aabXY d

    ⇒   aabY cd

    ⇒   a2bc2d.

    Introduction to Formal Languages, Automata and Computability – p.35/74

    contd

  • 8/20/2019 Grammars Automata

    42/88

    contd..

    S  ⇒ aAB   ⇒   aaAXB

    ⇒   aaaXXB⇒   aaaXXbY d

    ⇒   aaaXbXY d

    ⇒   aaabXXY d⇒   aaabXY cd

    ⇒   aaabY ccd

    ⇒   aaabcccd.

    Introduction to Formal Languages, Automata and Computability – p.36/74

    Exercise

  • 8/20/2019 Grammars Automata

    43/88

    Exercise

    Consider the length increasing grammar withproductions P   =S  → aSBc, S  → abc, cB  → Bc, bB  → bb. All rulesexcept the rule cB  → Bc are context-sensitive. Thefollowing grammar is a context-sensitive grammarequivalent to the above grammar.

    S    →   aSBC 

    S    →   abc

    C    →   cCB   →   DB

    DB   →   DC 

    DC    →   BC.Introduction to Formal Languages, Automata and Computability – p.37/74

    Ambiguity

  • 8/20/2019 Grammars Automata

    44/88

    Ambiguity

    Definition Let G = (N , T , P , S  ) be a CFG. A word win L(G) is said to be ambiguously derivable in G, if it

    has two or more different derivation trees in G.Since the correspondence between derivation treesand leftmost derivations is a bijection, an equivalentdefinition in terms of leftmost derivations can be

    given.

    Introduction to Formal Languages, Automata and Computability – p.38/74

    Ambiguity

  • 8/20/2019 Grammars Automata

    45/88

    Ambiguity

    Definition Let G = (N , T , P , S  ) be a CFG. A word win L(G) is said to be ambiguously derivable in G, if ithas two or more different derivation trees in

     G.

    Since the correspondence between derivation treesand leftmost derivations is a bijection, an equivalentdefinition in terms of leftmost derivations can be

    given.DefinitionLet G = (N , T , P , S  ) be a CFG. A word win L(G) is said to be ambiguously derivable in G, if ithas two or more different leftmost derivations in G.

    Introduction to Formal Languages, Automata and Computability – p.38/74

    Ambiguity

  • 8/20/2019 Grammars Automata

    46/88

    g y

    Definition Let G = (N , T , P , S  ) be a CFG. A word win L(G) is said to be ambiguously derivable in G, if ithas two or more different derivation trees in

     G.

    Since the correspondence between derivation treesand leftmost derivations is a bijection, an equivalentdefinition in terms of leftmost derivations can be

    given.DefinitionLet G = (N , T , P , S  ) be a CFG. A word win L(G) is said to be ambiguously derivable in G, if ithas two or more different leftmost derivations in G.

    DefinitionA CFG is said to be ambiguous if there is a

    word w  in L(G) which is ambiguously derivable. Oth-erwise it is unambiguous.   Introduction to Formal Languages, Automata and Computability – p.38/74

    Example

  • 8/20/2019 Grammars Automata

    47/88

    p

    Consider the grammar G with rules S  → SS, S  → awhere S  is the nonterminal and a is the terminalsymbol. L(G) = {an/n ≥ 1}.This grammar isambiguous as a3 has two different derivation trees asfollows

    Introduction to Formal Languages, Automata and Computability – p.39/74

    Example

  • 8/20/2019 Grammars Automata

    48/88

    p

    Consider the grammar G with rules S  → SS, S  → awhere S  is the nonterminal and a is the terminalsymbol. L(G) = {an/n ≥ 1}.This grammar isambiguous as a3 has two different derivation trees asfollows

    S

    S

    a

    a

    S

    S

    S

    a

    S

    S

    a

    a

    S

    S

    S

    a

    Introduction to Formal Languages, Automata and Computability – p.39/74

    Ambiguity contd..

  • 8/20/2019 Grammars Automata

    49/88

    g y

    Definition A CFL L  is said to be inherently ambiguous if all the grammars

    generating it are ambiguous or in other words, there is no unambiguous

    grammar generating it.

    Example L  = {an

    bn

    c p

    /n,m,p ≥ 1, n = m or m = p}.This can be looked at as L  = L1 ∪ L2

    L1   =   {anbnc p/n,p ≥ 1}

    L2   =   {anbmcm/n,m ≥ 1}.

    L1  and L2  can be generated individually by unambiguous grammars, but any

    grammar generating  L1  ∪ L2   will be ambiguous. Since strings of the form

    anbncn will have two different derivation trees, one corresponding to L1  and

    another corresponding to L2. Hence L  is inherently ambiguous.

    Introduction to Formal Languages, Automata and Computability – p.40/74

    Ambiguity contd..

  • 8/20/2019 Grammars Automata

    50/88

    g y

    Definition A CFL L  is said to be inherently ambiguous if all the grammars

    generating it are ambiguous or in other words, there is no unambiguous

    grammar generating it.

    Example L  = {an

    bn

    c p

    /n,m,p ≥ 1, n = m or m = p}.This can be looked at as L  = L1 ∪ L2

    L1   =   {anbnc p/n,p ≥ 1}

    L2   =   {anbmcm/n,m ≥ 1}.

    L1  and L2  can be generated individually by unambiguous grammars, but any

    grammar generating  L1  ∪ L2   will be ambiguous. Since strings of the form

    anbncn will have two different derivation trees, one corresponding to L1  and

    another corresponding to L2. Hence L  is inherently ambiguous.

    Introduction to Formal Languages, Automata and Computability – p.40/74

    contd..

  • 8/20/2019 Grammars Automata

    51/88

    co td..

    Theorem It is undecidable to determine whether agiven CFG G is ambiguous or not.

    Introduction to Formal Languages, Automata and Computability – p.41/74

    contd..

  • 8/20/2019 Grammars Automata

    52/88

    Theorem It is undecidable to determine whether agiven CFG G is ambiguous or not.Theorem It is undecidable to determine whether a

    given CFL L is ambiguous or not.

    Introduction to Formal Languages, Automata and Computability – p.41/74

    contd..

  • 8/20/2019 Grammars Automata

    53/88

    Theorem It is undecidable to determine whether agiven CFG G is ambiguous or not.Theorem It is undecidable to determine whether a

    given CFL L is ambiguous or not.DefinitionA CFL L is bounded if there exists stringsw1, . . . , wk  such that L  ⊆ w

    ∗1w

    ∗2 . . . w

    ∗k.

    Introduction to Formal Languages, Automata and Computability – p.41/74

    contd..

  • 8/20/2019 Grammars Automata

    54/88

    Theorem It is undecidable to determine whether agiven CFG G is ambiguous or not.Theorem It is undecidable to determine whether a

    given CFL L is ambiguous or not.DefinitionA CFL L is bounded if there exists stringsw1, . . . , wk  such that L  ⊆ w

    ∗1w

    ∗2 . . . w

    ∗k.

    TheoremThere exists an algorithm to find out whethera given bounded CFL is inherently ambiguous or not.

    Introduction to Formal Languages, Automata and Computability – p.41/74

    contd..

  • 8/20/2019 Grammars Automata

    55/88

    Theorem It is undecidable to determine whether agiven CFG G is ambiguous or not.Theorem It is undecidable to determine whether a

    given CFL L is ambiguous or not.DefinitionA CFL L is bounded if there exists stringsw1, . . . , wk  such that L  ⊆ w

    ∗1w

    ∗2 . . . w

    ∗k.

    TheoremThere exists an algorithm to find out whethera given bounded CFL is inherently ambiguous or not.

    DefinitionLet G = (N , T , P , S  ) be a CFG then the de-

    gree of ambiguity of   G   is the maximum number of 

    derivation trees a string w ∈ L(G) can have in G.

    Introduction to Formal Languages, Automata and Computability – p.41/74

    contd.

  • 8/20/2019 Grammars Automata

    56/88

    We can also use the idea of power series and find outthe number of different derivation trees a string canhave. Consider the grammar with rules

    S  → SS, S  → a write an equation S  = SS  + aInitial solution is S  = a, S 1 = aUse this in the equation for S  on the right-hand side

    S 2   =   S 1S 1 + a

    =   aa + a.

    S 3   =   S 2S 2 + a= (aa + a)(aa + a) + a

    =   a4 + 2a3 + a2 + a.

    Introduction to Formal Languages, Automata and Computability – p.42/74

    contd.

  • 8/20/2019 Grammars Automata

    57/88

    S 4   =   S 3S 3 + a

    = (a

    4

    + 2a

    3

    + a

    2

    + a)

    2

    + a=   a8 + 4a7 + 6a6 + 6a5 + 5a4 + 2a3 + a2 + a.

    We can proceed like this using S i = S i−1S i−1 + aIn  S i, upto strings of length   i, the coefficient of the

    string will give the number of different derivation trees

    it can have in G. For example, in S 4 coefficient of a4 is

    5 and a4 has 5 different derivation trees in G. The co-

    efficient of  a3 is 2 - the number of different derivationtrees for a3 is 2 in G.   Introduction to Formal Languages, Automata and Computability – p.43/74

    Simplification

  • 8/20/2019 Grammars Automata

    58/88

    Definition Let G = (N , T , P , S  ) be a CFG. A variable X   in N  is said

    to be useful if and only if there is at least a string α  ∈ L(G) such that

      ∗

    ⇒ α1Xα2∗

    ⇒ α,

    where α1, α2  ∈ (N  ∪ T )∗ i.e., X  is useful because it appears in at least

    one derivation from S  to a word α in L(G). Otherwise X  is useless.

    Consequently the production involving X  is useful.

    One can understand the ‘useful symbol’ concept in two steps.

    Step 1   For a symbol  X   ∈   N   to be useful it should occur in some

    derivation starting from S , i.e., S   ∗⇒ α1Xα2.

    Introduction to Formal Languages, Automata and Computability – p.44/74

    contd.

  • 8/20/2019 Grammars Automata

    59/88

    Step 2   Also X  has to derive a string α  ∈ T ∗ i.e.,

    X   ∗⇒ α

    These two conditions are necessary. But they are notsufficient. These two conditions may be satisfied stillα1 or α2 may contain a nonterminal from which aterminal string cannot be derived. So the usefulness of 

    a symbol has to be tested in two steps as above.

    Lemma   Let   G   = (N , T , P , S  )   be a CFG such that

    L(G)   =   φ. Then there exists an equivalent context-free grammar G = (N , T , P , S )  that does not con-

    tain any useless symbol or productions.

    Introduction to Formal Languages, Automata and Computability – p.45/74

    contd.

  • 8/20/2019 Grammars Automata

    60/88

    The context-free grammar G is obtained by thefollowing elimination proceduresI. First eliminate all those symbols X  such that X does not derive any string over T ∗. LetG2 = (N 2, T , P  2, S ) be the grammar thus modified.As L(G) = φ, S  will not be eliminated. The followingalgorithm identifies symbols not to be eliminated.Algorithm GENERATINGStep 1   Let GEN  = T ;

    Step 2   If  A → α and every symbol of  α  belongs toGEN , then add A  to GEN .

    Remove from N  all those symbols that are not in the

    set GEN  and all the rules using them.Introduction to Formal Languages, Automata and Computability – p.46/74

    contd.

  • 8/20/2019 Grammars Automata

    61/88

    Let the resultant grammar be G2  = (N 2, T , P  2, S ).

    II. Now eliminate all symbols in the grammar G2  that are not occurring

    in any derivation from ‘S ’. i.e., S ∗

    ⇒ α1Xα2.

    Algorithm REACHABLE

    Let REACH  = {S }.

    If A ∈ REACH  and A → α ∈ P  then every symbol of α is in

    REACH .

    The above algorithm terminates as any grammar has only finite set of 

    rules. It collects all those symbols which are reachable from S  through

    derivations from S .

    Now in   G2   remove all those symbols that are not in  REACH   and

    also productions involving them. Hence one gets the modified gram-

    mar G = (N , T , P , S ), a new CF G.   Introduction to Formal Languages, Automata and Computability – p.47/74

    contd.

  • 8/20/2019 Grammars Automata

    62/88

    without useless symbols and productions using them.The equivalence of  G with G can be easily seen sinceonly symbols and productions leading to derivation of 

    terminal strings from S  are present in G. HenceL(G) = L(G).Theorem Given a CFG G = (N , T , P , S  ). Procedure Iof the previous lemma is executed to getG2 = (N 2, T , P  2, S ) and procedure II of previouslemma is executed to get G = (N , T , P , S ). Then

    G contains no useless symbols.

    Suppose G contains a symbol  X   (say) which is use-

    less. It is easily seen that N 

    ⊆ N 2, T 

    ⊆ T , P 

    ⊆ P 2.Introduction to Formal Languages, Automata and Computability – p.48/74

    contd.

  • 8/20/2019 Grammars Automata

    63/88

    Since X  is obtained after execution of II,

    S   ∗⇒ α1Xα2, α1, α2 ∈ (N 

    ∪ T )∗. Every symbol of 

    is also in N 2

    . Since G2

     is obtained by execution of I, it is possible to get a terminal string from every

    symbol of  N 2 and hence from N  : α1

    ∗⇒ w1 and

    α2∗

    ⇒ w2, w1, w2 ∈ T ∗

    . ThusS   ∗⇒ α1Xα2

    ∗⇒ w1Xw2

    ∗⇒ w1ww2.

    Clearly S  is not useless as supposed. Hence G

    contains only useful symbols.

    Introduction to Formal Languages, Automata and Computability – p.49/74

    Example

  • 8/20/2019 Grammars Automata

    64/88

    Let G  = (N , T , P , S  ) be a CFG withN  = {S,A,B,C }T   = {a, b} andP   = {S  → Sa|A|C , A  → a, B  → bb, C  → aC |B}First ‘GEN ’ set will be {S,A,B,a,b}. ThenG

    2 = (N 

    2, T , P  

    2, S ) where N 

    2 = {S,A,B}

    P   = {S  → Sa|A, A → a, B  → bb}In G2, REACH  set will be {S,A,a}. Hence

    G

    = (N 

    , T 

    , P 

    , S ) whereN  = {S, A}T  = {a}

    = {S  → Sa|A, A → a}L(G) = L(G) = {an|n ≥ 1}.   Introduction to Formal Languages, Automata and Computability – p.50/74

    -rule Elimination

  • 8/20/2019 Grammars Automata

    65/88

    Definition Any production of the form A →  is called

    a -rule. If  A  ∗⇒ , then we call A a nullable symbol.

    Theorem Let G = (N , T , P , S  )

     be a CFG such that

    /∈ L(G). Then there exists a CFG without -rulesgenerating L(G).Before modifying the grammar one has to identify theset of null-able symbols of  G. This is done by thefollowing procedure.Algorithm NULL

    1. Let N U LL := φ,

    2. If  A →  ∈ P , then A  ∈ N U LL,

    3. If  A → B1 . . . Bt ∈ P , and each Bi  is in N U LL,then A ∈ N U LL   Introduction to Formal Languages, Automata and Computability – p.51/74

    contd.

  • 8/20/2019 Grammars Automata

    66/88

    Run algorithm N U LL for G and get the N U LL set.The modification of  G  to get G = (N , T , P  , S ) withrespect to N U LL is given below.If  A → A1 . . . At ∈ P , t ≥ 1 and if n (n < t) of theseAis are in N U LL. Then P  will contain 2

    n rules wherethe variables in N U LL are either present or absent in

    all possible combinations. If  n = t then removeA →  from P . The grammar G = (N , T , P  , S ) thusobtained is -free. To prove that a word w  ∈ L(G) if 

    and only if  w ∈ L(G). As G and G do not differ inN  and T , one can equivalently show that,

    A  ∗

    ⇒G wIntroduction to Formal Languages, Automata and Computability – p.52/74

    contd.

  • 8/20/2019 Grammars Automata

    67/88

    if and only if 

    A  ∗⇒G  w

    for  A   ∈   N . Clearly  w   =   . Let  A   n⇒G   w.   n   =

    1,   A  ∗⇒G   w   then  A   →   w   is in  P   and hence in  P 

    .

    Then A  ∗

    ⇒G   w. Assume the result to be true for allderivations from  A   of length  n  − 1. Let  A

      n⇒G   w

    i.e., A  ⇒G

    Y 1, Y 2, . . . , Y  kn−1

    =⇒G   w. Let w  =  w1 . . . wk

    and let X 1, X 2, . . . , X  m be those Y  j’s in order such that

    Y  j∗

    ⇒   w j, w j   =   . Clearly  k   ≥   1  as  w   =   . Hence

    A → X 1 . . . X  m is a rule in G.Introduction to Formal Languages, Automata and Computability – p.53/74

    contd.

  • 8/20/2019 Grammars Automata

    68/88

    We can see that X 1 . . . X  m∗⇒G w as some Y  j  derive

    only . Since each Y  j∗

    ⇒ w j, w j  takes fewer than n

    steps, by induction, Y  j ∗⇒ w j, for w j  = . HenceA ⇒G X 1 . . . X  m

    ∗⇒ w.

    Conversely if  A  ∗⇒

    G  w. Then to show that A

      ∗⇒

    G w

    also. Again the proof is by induction on the number of derivation steps.Basis   If  A ⇒

    G

    w, then A → w ∈ G. By the

    construction of  G, one can see that there exists a ruleA → α in G  such that α  and w  differ only in nullable

    symbols. Hence A ⇒G α  ∗

    ⇒G w where for α  ∗

    ⇒ w only-rules are used.   Introduction to Formal Languages, Automata and Computability – p.54/74

    contd.

  • 8/20/2019 Grammars Automata

    69/88

    Induction

    Assume A  n⇒G  w n > 1. Then let

    A ⇒G

    Y 1

    . . . Y  k

    ∗⇒

    G w. For A ⇒

    GY 1

    . . . Y  k

     the

    corresponding equivalent derivation by G will be

    A ⇒

    G

    X 1 . . . X  m∗

    ⇒G Y 1 . . . Y  k  as some of the X i  are

    nullable. Hence A  ∗⇒G Y 1 . . . Y  k

    ∗⇒G w1 . . . wk  = w

    by induction hypothesis. Hence A  ∗⇒G w. Hence

    L(G) = L(G

    ).

    Introduction to Formal Languages, Automata and Computability – p.55/74

    Example

  • 8/20/2019 Grammars Automata

    70/88

    Consider G = (N , T , P , S  ) whereN  = {S,A,B,C,D}, T   = {0, 1} andP   = {S  → AB0C , A → BC , B  → 1|, C  → D|,D → }The set N U LL = {A,B,C,D}.Then G will be withP  = {S  → AB0C |AB0|A0C |B0C |A0|B0|0C |0,

    A → BC |B|C, B  → 1, C  → D}.

    Introduction to Formal Languages, Automata and Computability – p.56/74

    Procedure to Eliminate Unit Rules

  • 8/20/2019 Grammars Automata

    71/88

    Definition Any rule of the form X  → Y , X, Y   ∈ N   iscalled a unit rule. Note that A → a with a ∈ T  is nota unit rule.

    In simplification of context-free grammars anotherimportant step is to make the given context-freegrammar unit rule free i.e., to eliminate unit rules.

    This is essential for the application in compilers. Asthe compiler spends time on each rule used in parsingby generating semantic routines, having unnecessary

    unit rules will increase compiler time.Lemma Let G be a CFG, then there exists a CFG G

    without unit rules such that L(G) = L(G).

    Introduction to Formal Languages, Automata and Computability – p.57/74

    contd.

  • 8/20/2019 Grammars Automata

    72/88

    Let G = (N , T , P , S  ) be a CFG. First find the sets of pairs of 

    nonterminals (A, B) in G such that A  ∗⇒ B by unit rules. Such a pair

    is called a unit-pair.

    Algorithm UNIT-PAIR

    1.   (A, A) ∈ U N I T   − PAIR, for every variable A ∈ N .

    2. If  (A, B) ∈ U N I T   − PAIR and B  → C  ∈ P   then(A, C ) ∈ U N I T   − PAIR

    Now construct G

    = (N , T , P  

    , S ) as follows. Remove all unit produc-tions. For every unit-pair (X, Y ), if  Y   → α ∈ P  is a nonunit rule, add

    to P , X  → α. Thus G has no unit rules.

    Introduction to Formal Languages, Automata and Computability – p.58/74

    contd.

  • 8/20/2019 Grammars Automata

    73/88

    Let us consider leftmost derivations in G and G. If w ∈ L(G), then

    S   ∗⇒G  w. Clearly there exists a derivation of  w from S  by G where

    zero or more applications of unit rules are used. Hence S   ∗⇒G  w whose

    length may be different from S    ∗⇒G  w.

    If  w   ∈   L(G), then one can consider  S   ∗⇒G   w  by G. A sequence of 

    unit-productions applied in this derivation is to be written by a nonunit

    production. Since this is a leftmost derivation, for such sequence there

    exists one rule in G doing the same job. Hence one can see the simula-

    tion of a derivation of G with G.

    Introduction to Formal Languages, Automata and Computability – p.59/74

    contd.

  • 8/20/2019 Grammars Automata

    74/88

    Hence L(G) = L(G).Example Let G = (N , T , P , S  ) be a CFG whereN  = {X, Y }, T   = {a, b} andP   = {X  → aX |Y |b, Y   → bK |K |b, K  → a}UNIT  − PAIR ={(X, X ), (Y, Y  ), (K, K ), (X, Y  ), (Y, K ), (X, K )}Then G = (N , T , P  , S ) whereP  = {X  → aX |bK |b|a, Y   → bK |b|a, K  → a}.

    Remark   Since removal of  -rule can introduce unitproductions, to get a simplified CFG to generate

    L(G)  − {}, the following steps have to be used in

    the order given.Introduction to Formal Languages, Automata and Computability – p.60/74

    contd.

  • 8/20/2019 Grammars Automata

    75/88

    1. Remove -rules

    2. Remove unit-rules

    3. Remove useless symbols.(i) Remove symbols not deriving terminal

    strings.

    (ii) Remove symbols not reachable from S .

    Example It is essential that steps 3(i) and 3(ii) have tobe executed in that order. If step 3(ii) is executed firstand then step 3(i), we may not get the requiredreduced grammar. Consider the CFGG = (N , T , P , S  ) where N  = {S,A,B,C },

    T   = {a, b} and P   = {S  → ABC |a, A  → a, C  → b}=   Introduction to Formal Languages, Automata and Computability – p.61/74

    contd.

  • 8/20/2019 Grammars Automata

    76/88

    Applying step 3(ii) first removes nothing. Then applystep 3(i) which removes B  and S  → ABC   leavingS  → a, A → a, C  → b. Though A and C  do not

    contribute to the set L(G), they are not removed.On the other hand applying step 3(i) first, removes B,

    S   →   ABC . Afterwards apply step 3(ii), removesA,C,A  →  a, C   →  b. Hence S   →  a  is the only rule

    left which is the required result.

    Introduction to Formal Languages, Automata and Computability – p.62/74

    Normal form

  • 8/20/2019 Grammars Automata

    77/88

    The most popular normal forms are Weak ChomskyNormal Form, Chomsky Normal Form, StrongChomsky Normal Form, Greibach Normal Form.

    Definition Let G = (N , T , P , S  ) be a CFG. If eachrule in P  is of the form A → ∆, A  → a or A → ,where A ∈ N , ∆ ∈ N +, a ∈ T , then G  is said to be in

    Weak Chomsky Normal Form (WCNF).Example Let G = (N , T , P , S  ) be a CFG whereN  = {S,A,B}, T   = {a, b} and

    P   = {S  → ASB|AB, A → a, B  → b}. G is inWCNF.

    Introduction to Formal Languages, Automata and Computability – p.63/74

    contd.

  • 8/20/2019 Grammars Automata

    78/88

    Theorem For any CFG G  = (N , T , P , S  ) there exists aCFG G in WCNF such that L(G) = L(G).Let G  = (N , T , P , S  ) be a CFG. One can construct anequivalent CFG in WCNF as below. LetG = (N , T , P , S ) be an equivalent CFG whereN  = N  ∪ {Aa/a ∈ T }, none of  Aa’s belong to N .P  = {A → ∆|A → α ∈ P  and every occurrence of asymbol from T  present in α is replaced by Aa, giving∆} ∪ {Aa → a|a ∈ T }. Clearly ∆ ∈ N 

    + and P  getsthe required form. G is in WCNF. That G and Gequivalent can be seen easily.

    Introduction to Formal Languages, Automata and Computability – p.64/74

    Chomsky Normal Form

  • 8/20/2019 Grammars Automata

    79/88

    Definition Let /∈ L(G) and G = (N , T , P , S  ) be a CFG. G is said to

    be in Chomsky Normal Form (CNF) if all its productions are of the

    form A → BC  or A → a, A,B,C  ∈ N, a ∈ T .

    Example The following CFGs are in CNF.

    1. G1 = (N , T , P , S  ) where N  = {S,A,B,C }, T   = {0, 1} and

    P   = {S  → AB|AC |SS , C  → SA, A → 0, B  → 1}.

    2. G2 = (N , T , P , S  ) where N  = {S,A,B,C }, T   = {a, b} andP   = {S  → AS |SB, A → AB|a, B → b}.

    No CFG in CNF can generate . If  is to be added to L(G), then a new

    start symbol S 

    is to be taken and S 

    →  should be added. For everyrule S  → α, S  → α should be added which make sure that the new

    start symbol does not appear on the right-hand side of any production.

    Introduction to Formal Languages, Automata and Computability – p.65/74

    contd.

  • 8/20/2019 Grammars Automata

    80/88

    Theorem Given CFG G, there exists an equivalent CFG G in CNF.

    Let G = (N , T , P , S  ) be a CFG without rules, unit-rules and useless

    symbols and also /∈ L(G). Modify G to G such that

    G = (N , T , P  , S ) is in WCNF. Let A → ∆ ∈ P . If |∆| = 2, such

    rules need not be modified. If |∆| ≥ 3, the modification is as below:

    If A → ∆ = A1A2A3, the new set of equivalent rules will be:

    A → A1B1

    B1 → A2A3.

    Similarly if A → A1A2 . . . An ∈ P , it is replaced by

    Introduction to Formal Languages, Automata and Computability – p.66/74

    contd.

  • 8/20/2019 Grammars Automata

    81/88

    A → A1B1

    B1 → A2B2

    ...

    Bn−2 → An−1An.

    Let P  be the collection of modified rules andG = (N , T , P  , S ) be the modified grammar whichis clearly in CNF. Also L(G) = L(G).

    Introduction to Formal Languages, Automata and Computability – p.67/74

    contd.

  • 8/20/2019 Grammars Automata

    82/88

    Example Let G = (N , T , P , S  ) be a CFG whereN  = {S,A,B}, T   = {a, b}P   = {S  → SAB|AB|SBC , A → AB|a,B  → BAB|b, C  → b}.Clearly G is not in CNF but in WCNF. Hence themodification of rules in P  are as below:

    For S  → SAB, the equivalent rules areS  → SB1, B1 → AB.For S  → SBC , the equivalent rules are

    S  → SB2, B2 → BC .For B  → BAB, the equivalent rules areB  → BB3, B3 → AB.

    Introduction to Formal Languages, Automata and Computability – p.68/74

    contd.

  • 8/20/2019 Grammars Automata

    83/88

    Hence G = (N , T , P  , S ) will be withN  = {S,A,B,C,B1, B2, B3}, T   = {a, b}

    P  ={S  → SB1|SB2|SA,B1 → AB, B2 → BC, B3 → AB,A → AB|a, B  → BB3|b, C  → b}.

    Clearly G is in CNF.

    Introduction to Formal Languages, Automata and Computability – p.69/74

    Strong Chomsky Normal Form

  • 8/20/2019 Grammars Automata

    84/88

    Definition A CFG G = (N , T , P , S  ) is said to be inStrong Chomsky Normal Form (SCNF) when rules inP  are only of the forms A → a, A → BC  where

    A,B,C  ∈ N , a ∈ T  subject to the followingconditions:(i) if  A → BC  ∈ P , then B  = C .

    (ii) if  A  → BC  ∈ P , then for each ruleX  → DE  ∈ P , we have E  = B  and D  = C .

    Introduction to Formal Languages, Automata and Computability – p.70/74

    contd.

  • 8/20/2019 Grammars Automata

    85/88

    Theorem For every CFG G = (N , T , P , S  ) thereexists an equivalent CFG in SCNF.

    Let G  = (N , T , P , S  ) be a CFG in CNF. One can

    construct an equivalent CFG.G = (N , T , P  , S ) in SCNF as below.N  = {S } ∪ {AL, AR|A ∈ N }

    T   = T 

    Introduction to Formal Languages, Automata and Computability – p.71/74

    contd.

  • 8/20/2019 Grammars Automata

    86/88

    P  ={AL → BLC R, AR  → BLC R|A → BC  ∈ P }

    ∪ {S 

    → X LY R|S  → XY   ∈ P }∪ {S  → a|S  → a ∈ P }

    ∪ {AL → a, AR → a|A → a ∈ P, a ∈ T }.

    Clearly L(G) = L(G) and G is in SCNF.

    Introduction to Formal Languages, Automata and Computability – p.72/74

    contd.

    E l L ( ) b CFG h

  • 8/20/2019 Grammars Automata

    87/88

    Example Let G = (N , T , P , S  ) be a CFG whereN  = {S,A,B}, T   = {0, 1} andP   = {S  → AB|0, B  → BA|1, A → AB|0}.Then G = (N , T , P  , S ) in SCNF will be with

    N  =   {S , S L, S R, AL, AR, BL, BR}

    T    =   {0, 1}P  =   {S  → ALBR|0, S L → ALBR|0,

    S R → ALBR|0, AL → ALBR|0,AR → ALBR|0, BL → BLAR|1,

    BR → BLAR|1}.

    Introduction to Formal Languages, Automata and Computability – p.73/74

    Greibach Normal Form

    D fi i i L / L(G) d G (N T P S) b

  • 8/20/2019 Grammars Automata

    88/88

    Definition Let  /∈ L(G) and G = (N , T , P , S  ) be aCFG. G is said to be in Greibach Normal Form(GNF), if each rule in P  rewrites a variable into a

    word in T N ∗ i.e., each rule will be of the formA → aα, a ∈ T , α ∈ N ∗.

    Introduction to Formal Languages, Automata and Computability – p.74/74