CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis

Embed Size (px)

Text of CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to...

  • Slide 1

CS412/413 Introduction to Compilers and Translators Spring 99 Lecture 3: Introduction to Syntactic Analysis Slide 2 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 2 Outline Review of lexical analysis Context-Free Grammars (CFGs) Derivations and parse trees Top-down vs. bottom-up parsing Ambiguous grammars Slide 3 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 3 Administration Group preference handouts due today Must turn in questionnaire to be considered part of class! Handout: Homework 1 - due next Friday Slide 4 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 4 Where we are Source code (character stream) Lexical analysis Syntactic Analysis Token stream Abstract syntax tree (AST) Semantic Analysis if (b == 0) a = b; if(b)a=b;0== if == b0 = ab if ; Slide 5 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 5 What is Syntactic Analysis? Source code (token stream) { if (b == (0)) a = b; while (a != 1) { stdio.print(a); a = a - 1; } if_stmt bin_op variable b constant 0 block while_stmt bin_op == !=variableconstant block expr_stmt Abstract Syntax Tree 1acall. stdio print variable a =... Slide 6 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 6 Overview of Syntactic Analysis Input: stream of tokens Output: abstract syntax tree Implementation: Parse token stream to traverse concrete syntax During traversal, build abstract syntax tree Slide 7 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 7 Parsing Parsing: recognizing whether a program (or sentence) is grammatically well- formed & identifying the function of each component. I gave him the book sentence subject: Iverb:gaveobject: him indirect object noun phrase article: the noun: book Slide 8 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 8 What Parsing doesnt do Doesnt check many things: type agreement, variables initialized int x = true; int y; z = f(y); Deferred until semantic analysis Slide 9 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 9 Specifying Language Syntax First problem: how to describe language syntax precisely and conveniently Last time: can describe tokens using regular expressions Regular expressions easy to implement, efficient (by converting to DFA) Why not use regular expressions (on tokens) to specify programming language syntax? Slide 10 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 10 Limits of REs Programming languages are not regular -- cannot be described by regular exprs Consider: language of all strings that contain balanced parentheses (easier than PLs) () (()) ()()() (())()((()())) (( )( ()) (()() Problem: need to keep track of number of parentheses seen so far: unbounded counting Slide 11 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 11 Need more power! RE = DFA DFA has only finite number of states; cannot perform unbounded counting ( ( (( ( ))))) maximum depth: 5 parens Slide 12 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 12 Context-Free Grammars A specification of the balanced- parenthesis language: S ( S ) S S The definition is recursive This is a context-free grammar More expressive than regular expressions S = (S) = ((S) S) = (( ) ) = (()) Slide 13 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 13 RE is subset of CFG Regular Expressions digit [0-9] posint digit+ int -? posint real int ( | (. posint)) Symbolic names are shorthand All symbol names can be fully expanded: not recursive real -? [0-9]+ ( | (. [0-9]+ )) Slide 14 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 14 Definition of CFG Terminals Symbols for strings or tokens (also ) Non-terminals Syntactic variables Start symbol A special nonterminal is designated Production Specifies how non-terminals may be expanded to form strings LHS: single non-terminal, RHS: string of terminals and non-terminals S ( S ) S | Slide 15 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 15 Sum grammar S S + E | E E number | ( S ) accepts (1+2+ (3+4))+5 S S + E S E E number E (S) 4 productions 2 non-terminals (S, E) 4 terminals: (, ), +, number start symbol S Slide 16 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 16 Derivations S S + E | E E number | ( S ) If a grammar accepts a string, there is a derivation of that string using the productions of the grammar Slide 17 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 17 Derivation Example S S + E | E E number | ( S ) Derive (1+2+ (3+4))+5: S S + E E + E (S )+E (S + E)+E (S + E + E)+E (E + E + E)+E (1 + E + E)+E (1 + 2+ E)+E (1 + 2+ (3+4))+E (1 + 2+ (3+4))+5 replacement string non-terminal being expanded Slide 18 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 18 Constructing a derivation Start from start symbol (S) Productions are used to derive a sequence of tokens from the start symbol For arbitrary strings , and and a production A A single step of derivation is A i.e., substitute for an occurrence of A (S + E) + E (S + E + E)+E A = S, = S + E Slide 19 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 19 Derivation -> Parse Tree S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+(3+4))+E (1+2+(3+4))+5 S S + E E ( S ) 5 S + E ( S ) S + E E E 1 2 3 4 Tree representation of the derivation Leaves of tree are terminals; in-order traversal yields string Internal nodes: non-terminals Loses information about order of derivation steps Parse Tree Derivation Slide 20 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 20 Parse Tree Also called concrete syntax S S + E E ( S ) 5 S + E ( S ) S + E E E 1 2 3 4 + 5+ ++ 3412 parse tree/ concrete syntax abstract syntax tree (Contains information needed to evaluate program) Slide 21 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 21 Derivation order Can choose to apply productions in any order; select any non-terminal A A Two standard orders: left- and right-most -- useful for automatic parsing Leftmost derivation In the string, find the left-most non- terminal and apply a production to it Slide 22 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 22 Example Left-most derivation S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+(S))+E (1+2+(S+E))+E (1+2+(E+E))+E (1+2+(3+E))+E (1+2+(3+4))+E (1+2+(3+4))+5 Right-most derivation S S+E E+5 (S)+5 (S+E)+5 (S+(S))+5 (S+(S+E))+5 (S+(S+4))+5 (S+(E+4))+5 (S+(3+4))+5 (S+E+(3+4))+5 (S+2+(3+4))+5 (E+2+(3+4))+5 (1+2+(3+4))+5 Same parse tree: same productions chosen, diff. order S S + E | E E number | ( S ) Slide 23 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 23 Top-down vs. Bottom-up Parsing We normally scan in tokens from left to right Left-most derivation reflects top-down parsing Start with the start symbol End with the string of tokens S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+(S))+E (1+2+(S+E))+E (1+2+(E+E))+E (1+2+(3+E))+E (1+2+(3+4))+E (1+2+(3+4))+5 Slide 24 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 24 Top-down parsing S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E... Entire tree above a token (2) has been expanded when encountered S S + E E ( S ) 5 S + E ( S ) S + E E E 1 2 3 4 Slide 25 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 25 Bottom-up Parsing Right-most derivation reflects bottom-up parsing Start with the tokens End with the start symbol (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 (S+(3+4))+5 (S+(E+4))+5 (S+(S+4))+5 (S+(S+E))+5 (S+(S))+5 (S+E)+5 (S)+5 E+5 S+E S Slide 26 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 26 Bottom-up parsing (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 Advantage of bottom-up parsing: can select productions based on more information S S + E E ( S ) 5 S + E S + ES + E ( S ) S + E E E 1 2 3 4 Slide 27 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 27 Ambiguous Grammars In example grammar, left-most and right-most derivations produced identical parse trees + operator associates to left in parse tree regardless of derivation order + 5+ ++ 3412 (1+2+(3+4))+5 Slide 28 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 28 An Ambiguous Grammar + associates to left because of production S S + E Consider another grammar: S S + S | S * S | number Different derivations produce different parse trees: ambiguous grammar Slide 29 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 29 Differing Parse Trees Consider expression 1 + 2 * 3 Derivation 1: S S + S 1 + S 1 + S * S 1 + 2 * S 1 + 2 * 3 Derivation 2: S S * S S * 3 S + S * 3 S + 2 * 3 1 + 2 * 3 S S + S | S * S | number 123 * + 123 * 1 2 + Slide 30 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 30 Impact of Ambiguity 123 * + 123 = 7 = 9 + Different parse trees correspond to different evaluations! Meaning of program ill-defined * Slide 31 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 31 Eliminating Ambiguity Often can eliminate ambiguity by adding non-terminals S S + T | T T T * num | num Enforces precedence, left-associativity S S + T T * 3 T 1 2 Slide 32 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andr