Parsing. Calculate the grammatical structure of a program, like diagramming sentences, where: Tokens = "words", Programs = "sentences"


  • Slide 1
  • Parsing
  • Slide 2
  • Parsing: Calculate the grammatical structure of a program, like diagramming sentences, where tokens = words and programs = sentences. For further information, read: Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools (a.k.a. the Dragon Book)
  • Slide 3
  • Outline of coverage: Context-free grammars; Parsing (tabular parsing methods, one pass, top-down, bottom-up); Yacc
  • Slide 4
  • Parser: extracts the grammatical structure of a program. [Figure: parse tree of a small program, with nodes labeled function-def, name, arguments, stmt-list, stmt, expression, operator, variable and string, and leaves such as main and cout]
  • Example program fragments: if e then if b then d else f; { int x; y = 0; }; A.b.c = d;
  • Example production: Id → s | s.id
  • Example derivation: E ⇒ E + T ⇒ E + T + T ⇒ T + T + T ⇒ id + T + T ⇒ id + T * id + T ⇒ id + id * id + T ⇒ id + id * id + id
  • Slide 13
  • Ambiguity: Ambiguity is a function of the grammar rather than the language. Certain ambiguous grammars may have equivalent unambiguous ones.
  • Slide 14
  • Grammar Transformations: Grammars can be transformed without affecting the language generated. Three transformations are discussed next: eliminating ambiguity, eliminating left recursion (i.e., productions of the form A → Aα), and left factoring.
  • Slide 15
  • Eliminating Ambiguity: Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity. For example, expressions involving additions and products can be written as follows:
      E → E + T | T
      T → T * id | id
    The language generated by this grammar is the same as that generated by the grammar on transparency 11. Both generate id(+id | *id)*. However, this grammar is not ambiguous.
  • Slide 16
  • Eliminating Ambiguity (Cont.) One advantage of this grammar is that it represents the precedence between operators. In the parse tree, products appear nested within additions. [Figure: parse tree with nodes E, T, id, + and *]
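The nesting of products within additions can be illustrated with a minimal sketch. It assumes the input is already split into the tokens `id`, `+` and `*`, and it handles the left-recursive rules E → E + T and T → T * id with loops rather than recursion; the function name `parse_expression` and the tuple representation of the tree are illustrative choices, not part of the slides.

```python
def parse_expression(tokens):
    """Parse E -> E + T | T and T -> T * id | id into nested tuples."""
    pos = 0

    def parse_term():
        # T -> T * id | id, read as: id followed by any number of "* id"
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "id":
            raise SyntaxError(f"expected id at position {pos}")
        node = "id"
        pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            if pos + 1 >= len(tokens) or tokens[pos + 1] != "id":
                raise SyntaxError(f"expected id at position {pos + 1}")
            node = ("*", node, "id")  # left-associative product
            pos += 2
        return node

    # E -> E + T | T, read as: T followed by any number of "+ T"
    node = parse_term()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        node = ("+", node, parse_term())  # left-associative sum
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return node


print(parse_expression("id + id * id".split()))
```

In the resulting tree the product is a child of the sum, which is the precedence the grammar encodes.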
  • Slide 17
  • Eliminating Ambiguity (Cont.) An example of ambiguity in a programming language is the dangling else. Consider
      S → if expr then S else S | if expr then S | other
  • Slide 18
  • Eliminating Ambiguity (Cont.) When there are two nested ifs and only one else, the statement has two parse trees: one attaches the else to the outer if, the other attaches it to the inner if. [Figure: the two parse trees]
  • Slide 19
  • Eliminating Ambiguity (Cont.) In most languages (including C++ and Java), each else is assumed to belong to the nearest if that is not already matched by an else. This association is expressed in the following (unambiguous) grammar:
      S → Matched | Unmatched
      Matched → if expr then Matched else Matched | other
      Unmatched → if expr then S | if expr then Matched else Unmatched
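A recursive-descent parser gets the nearest-if rule almost for free by consuming an else greedily. The sketch below is an illustration under assumptions, not the grammar-based method of the slides: every condition and every non-if statement is taken to be a single token, and the name `parse_statement` is hypothetical.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    An else is attached to the nearest unmatched if, because the innermost
    active call sees the else token first and consumes it.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        if tokens[pos] in ("then", "else"):
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        return tokens[pos], pos + 1  # any other single-token statement
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError(f"expected 'then' at position {pos + 2}")
    cond = tokens[pos + 1]
    then_branch, pos = parse_statement(tokens, pos + 3)
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos


tree, _ = parse_statement("if e then if b then d else f".split())
print(tree)
```

For the input `if e then if b then d else f`, the else ends up inside the inner if, matching the convention described above.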
  • Slide 20
  • Eliminating Ambiguity (Cont.) Ambiguity is a property of the grammar. It is undecidable whether a context-free grammar is ambiguous. The proof is done by reduction from Post's correspondence problem. Although there is no general algorithm, it is possible to isolate certain constructs in productions which lead to ambiguous grammars.
  • Slide 21
  • Eliminating Ambiguity (Cont.) For example, a grammar containing the production A → AA | α would be ambiguous, because the substring ααα has two parses. [Figure: the two parse trees for ααα] This ambiguity disappears if we use the productions A → AB | B and B → α, or the productions A → BA | B and B → α.
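The two-parses claim can be checked by counting parse trees. The sketch below assumes α is a single terminal written `a`, represents a grammar as a dictionary from nonterminal to a list of right-hand-side tuples, and only supports grammars without ε-productions or unit-production cycles; `count_parses` is a hypothetical helper name.

```python
from functools import lru_cache


def count_parses(grammar, start, text):
    """Count the parse trees of `text` rooted at `start`.

    Assumes no epsilon-productions and no cycles of unit productions,
    so every symbol in a right-hand side covers at least one character.
    """

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of trees with root `symbol` covering text[i:j].
        if symbol not in grammar:  # terminal symbol
            return 1 if j - i == 1 and text[i] == symbol else 0
        return sum(derive_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def derive_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` covers text[i:j].
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        total = 0
        # The first symbol covers text[i:k]; leave room for the rest.
        for k in range(i + 1, j - len(rhs) + 2):
            left = derive(rhs[0], i, k)
            if left:
                total += left * derive_seq(rhs[1:], k, j)
        return total

    return derive(start, 0, len(text))


ambiguous = {"A": [("A", "A"), ("a",)]}
unambiguous = {"A": [("A", "B"), ("B",)], "B": [("a",)]}
print(count_parses(ambiguous, "A", "aaa"), count_parses(unambiguous, "A", "aaa"))
```

With A → AA | a the string `aaa` has two trees, while the rewritten grammar A → AB | B, B → a yields exactly one.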
  • Slide 22
  • Eliminating Ambiguity (Cont.) Examples of ambiguous productions:
      A → AαA
      A → αA | Aβ
      A → αA | αAβA
    A language generated by an ambiguous CFG is inherently ambiguous if it has no unambiguous CFG. An example of such a language is L = {a^i b^j c^m | i = j or j = m}, which can be generated by the grammar:
      S → AB | DC
      A → aA | ε
      C → cC | ε
      B → bBc | ε
      D → aDb | ε
  • Slide 23
  • Elimination of Left Recursion: A grammar is left recursive if it has a nonterminal A and a derivation A ⇒+ Aα for some string α. Top-down parsing methods (to be discussed shortly) cannot handle left-recursive grammars, so a transformation to eliminate left recursion is needed. Immediate left recursion (productions of the form A → Aα) can be easily eliminated. We group the A-productions as
      A → Aα_1 | Aα_2 | ... | Aα_m | β_1 | β_2 | ... | β_n
    where no β_i begins with A. Then we replace the A-productions by
      A → β_1 A′ | β_2 A′ | ... | β_n A′
      A′ → α_1 A′ | α_2 A′ | ... | α_m A′ | ε
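The replacement rule for immediate left recursion can be sketched in a few lines. The representation is an assumption: each right-hand side is a tuple of symbols, the empty tuple stands for ε, and the new nonterminal is named by appending an apostrophe. The sketch also assumes every α is non-empty.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Returns a dict mapping each nonterminal to its list of right-hand sides.
    """
    # alphas: the tails of the left-recursive alternatives A -> A alpha
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # betas: the alternatives that do not begin with A (including epsilon)
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: list(productions)}
    new = nonterminal + "'"
    return {
        nonterminal: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```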
  • Slide 24
  • Elimination of Left Recursion (Cont.) The previous transformation, however, does not eliminate left recursion involving two or more steps. For example, consider the grammar
      S → Aa | b
      A → Ac | Sd | ε
    S is left recursive because S ⇒ Aa ⇒ Sda, but it is not immediately left recursive.
  • Slide 25
  • Elimination of Left Recursion (Cont.) Algorithm: eliminate left recursion.
      Arrange the nonterminals in some order A_1, A_2, ..., A_n
      for i = 1 to n {
        for j = 1 to i-1 {
          replace each production of the form A_i → A_j γ
          by the productions A_i → δ_1 γ | δ_2 γ | ... | δ_k γ,
          where A_j → δ_1 | δ_2 | ... | δ_k are all the current A_j-productions
        }
        eliminate the immediate left recursion among the A_i-productions
      }
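A compact sketch of this algorithm follows, under the same assumed representation as before: a dictionary from nonterminal to a list of right-hand-side tuples, with the empty tuple standing for ε. The function name and the apostrophe naming of new nonterminals are illustrative. It is applied to the two-step example S → Aa | b, A → Ac | Sd | ε.

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion, processing nonterminals in the given order."""
    g = {nt: list(rhss) for nt, rhss in grammar.items()}
    for i, a_i in enumerate(order):
        # Substitute earlier nonterminals that appear in leading position.
        for a_j in order[:i]:
            expanded = []
            for rhs in g[a_i]:
                if rhs and rhs[0] == a_j:
                    expanded.extend(delta + rhs[1:] for delta in g[a_j])
                else:
                    expanded.append(rhs)
            g[a_i] = expanded
        # Eliminate immediate left recursion among the a_i-productions.
        alphas = [rhs[1:] for rhs in g[a_i] if rhs and rhs[0] == a_i]
        if alphas:
            betas = [rhs for rhs in g[a_i] if not rhs or rhs[0] != a_i]
            new = a_i + "'"
            g[a_i] = [beta + (new,) for beta in betas]
            g[new] = [alpha + (new,) for alpha in alphas] + [()]
    return g


example = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
for nt, rhss in eliminate_left_recursion(example, ["S", "A"]).items():
    print(nt, "->", " | ".join(" ".join(rhs) or "epsilon" for rhs in rhss))
```

For this input the sketch leaves S unchanged and produces A → b d A' | A' together with A' → c A' | a d A' | ε.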
  • Slide 26
  • Elimination of Left Recursion (Cont.) To show that the previous algorithm actually works, all we need to notice is that iteration i only changes productions with A_i on the left-hand side, and that afterwards m > i in all productions of the form A_i → A_m α. Induction proof: Clearly true for i = 1. If it is true for all i < k, then in iteration k each pass of the inner loop replaces a leading A_j (j < k) by right-hand sides whose leading nonterminal has a higher index, so after the inner loop m ≥ k; eliminating immediate left recursion then removes the case m = k. So, at the end of the algorithm, all productions of the form A_i → A_m α will have m > i, and therefore left recursion is not possible.
  • Slide 27
  • Left Factoring: Left factoring helps transform a grammar for predictive parsing. For example, if we have the two productions
      S → if expr then S else S
        | if expr then S
    on seeing the input token if, we cannot immediately tell which production to choose to expand S. In general, if we have A → αβ_1 | αβ_2 and the input begins with α, we do not know (without looking further) which production to use to expand A.
  • Slide 28
  • Left Factoring (Cont.) However, we may defer the decision by expanding A to αA′
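Deferring the decision amounts to pulling the common prefix out into one production and moving the differing tails to a new nonterminal. The sketch below performs a single factoring pass under assumptions: right-hand sides are tuples of symbols, the empty tuple stands for ε, `cond` is a placeholder token for the if condition, and new nonterminals are named by appending apostrophes. Tails that still share a prefix would need another pass.

```python
def left_factor(nonterminal, productions):
    """Factor the longest common prefix out of alternatives that share
    their first symbol (one pass only)."""
    groups = {}
    for rhs in productions:
        groups.setdefault(rhs[:1], []).append(rhs)

    result = {nonterminal: []}
    fresh = nonterminal
    for first, group in groups.items():
        if len(group) == 1 or not first:
            result[nonterminal].extend(group)  # nothing to factor
            continue
        # Length of the longest prefix shared by every alternative.
        n = 0
        while all(len(rhs) > n and rhs[n] == group[0][n] for rhs in group):
            n += 1
        fresh += "'"
        result[nonterminal].append(group[0][:n] + (fresh,))
        result[fresh] = [rhs[n:] for rhs in group]  # () means epsilon
    return result


productions = [
    ("if", "cond", "then", "S", "else", "S"),
    ("if", "cond", "then", "S"),
]
print(left_factor("S", productions))
```

After factoring, a predictive parser can expand S on seeing `if` and postpone the choice between `else S` and ε until the shared prefix has been consumed.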