CS412/413
Introduction to Compilers and Translators
Spring ’99
Lecture 3: Introduction to Syntactic Analysis
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Outline
• Review of lexical analysis
• Context-Free Grammars (CFGs)
• Derivations and parse trees
• Top-down vs. bottom-up parsing
• Ambiguous grammars
Administration
• Group preference handouts due today
• Must turn in questionnaire to be considered part of class!
• Handout: Homework 1 - due next Friday
Where we are
Source code (character stream): if (b == 0) a = b;
↓ Lexical analysis
Token stream: if ( b == 0 ) a = b ;
↓ Syntactic analysis
Abstract syntax tree (AST): an if node whose children are == (operands b, 0) and = (operands a, b)
↓ Semantic analysis
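What is Syntactic Analysis?
Source code (token stream):
{
  if (b == (0)) a = b;
  while (a != 1) {
    stdio.print(a); a = a - 1;
  }
}
Abstract syntax tree:
[Figure: AST for the code above. A block contains an if_stmt, whose condition is a bin_op == over variable b and constant 0, and a while_stmt, whose condition is a bin_op != over variable a and constant 1 and whose body is a block with an expr_stmt calling stdio.print on variable a. Remaining subtrees are elided as "...".]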
Overview of Syntactic Analysis
• Input: stream of tokens
• Output: abstract syntax tree
• Implementation:
– Parse token stream to traverse concrete syntax
– During traversal, build abstract syntax tree
Parsing
• Parsing: recognizing whether a program (or sentence) is grammatically well-formed, and identifying the function of each component.
“I gave him the book”
sentence
– subject: I
– verb: gave
– indirect object: him
– object: noun phrase (article: the, noun: book)
What Parsing doesn’t do
• Doesn’t check many things: type agreement, variables initialized
int x = true;
int y; z = f(y);
• Deferred until semantic analysis
Specifying Language Syntax
• First problem: how to describe language syntax precisely and conveniently
• Last time: can describe tokens using regular expressions
• Regular expressions easy to implement, efficient (by converting to DFA)
• Why not use regular expressions (on tokens) to specify programming language syntax?
Limits of REs
• Programming languages are not regular -- cannot be described by regular exprs
• Consider: language of all strings that contain balanced parentheses (easier than PLs)
balanced: ()   (())   ()()()   (())()((()()))
not balanced: ((   )(   ())   (()()
• Problem: need to keep track of number of parentheses seen so far: unbounded counting
Need more power!
• RE = DFA
• DFA has only finite number of states; cannot perform unbounded counting
[Figure: DFA drawn as a chain of states, moving forward on each "(" and back on each ")"; maximum depth: 5 parens]
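The counting argument can be made concrete with a small sketch. It is not from the lecture: the function names `is_balanced` and `bounded_depth_check` are hypothetical, and the second function only imitates a DFA by capping the counter at a fixed number of states.

```python
def is_balanced(s: str) -> bool:
    """Check balanced parentheses using an unbounded counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ")" with no matching "("
                return False
    return depth == 0


def bounded_depth_check(s: str, max_depth: int = 5) -> bool:
    """Imitate a DFA whose states are the nesting depths 0..max_depth.

    Any string nested deeper than max_depth is rejected, even if it is
    balanced, because the machine has no state left to remember the depth.
    """
    state = 0
    for ch in s:
        if ch == "(":
            state += 1
            if state > max_depth:  # ran out of states
                return False
        elif ch == ")":
            state -= 1
            if state < 0:
                return False
    return state == 0
```

Whatever value is chosen for `max_depth`, a balanced string nested one level deeper is wrongly rejected, which is the sense in which a finite number of states is not enough.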
Context-Free Grammars
• A specification of the balanced-parenthesis language:
S → ( S ) S
S → ε
• The definition is recursive
• This is a context-free grammar
– More expressive than regular expressions
– S = (S) ε = ((S) S) ε = ((ε) ε) ε = (())
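Because the grammar is recursive, a recognizer for it can be recursive too. The following is a minimal sketch rather than the lecture's implementation; `parse_S` and `in_language` are hypothetical names, and the input is assumed to contain only parenthesis characters.

```python
def parse_S(s: str, i: int = 0) -> int:
    """Match S -> ( S ) S | ε starting at index i.

    Returns the index just past the matched text.
    Raises ValueError if a "(" is opened but never closed.
    """
    if i < len(s) and s[i] == "(":
        i = parse_S(s, i + 1)            # the S inside the parentheses
        if i >= len(s) or s[i] != ")":
            raise ValueError("expected ')'")
        return parse_S(s, i + 1)         # the S that follows ")"
    return i                             # S -> ε matches nothing


def in_language(s: str) -> bool:
    """True if the whole string is derivable from S."""
    try:
        return parse_S(s) == len(s)
    except ValueError:
        return False
```

Each call to `parse_S` corresponds to one use of a production, so the call stack supplies the unbounded memory that a DFA lacks.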
RE is subset of CFG
Regular Expressions
digit → [0-9]
posint → digit+
int → -? posint
real → int (ε | (. posint))
• Symbolic names are shorthand
• All symbol names can be fully expanded: not recursive
real → -? [0-9]+ (ε | (. [0-9]+))
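As an illustration, the fully expanded definition of real can be written directly in Python's regular expression syntax. This is a sketch under the assumption that "." in the slide means a literal decimal point; the names `REAL` and `is_real` are hypothetical.

```python
import re

# real -> -? [0-9]+ (ε | (. [0-9]+))
# The literal "." must be escaped; "(...)?" plays the role of "ε | (...)".
REAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def is_real(text: str) -> bool:
    """True if the entire string matches the 'real' pattern."""
    return REAL.fullmatch(text) is not None
```

No recursion is needed here: every symbolic name has been substituted away, which is exactly what cannot be done for the balanced-parenthesis grammar.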
Definition of CFG
• Terminals
– Symbols for strings or tokens (also ε)
• Non-terminals
– Syntactic variables
• Start symbol
– A special non-terminal is designated
• Production
– Specifies how non-terminals may be expanded to form strings
– LHS: single non-terminal, RHS: string of terminals and non-terminals
S → ( S ) S | ε
Sum grammar
S → S + E | E
E → number | ( S )
accepts (1+2+(3+4))+5
Equivalently, one production per line:
S → S + E
S → E
E → number
E → ( S )
4 productions
2 non-terminals (S, E)
4 terminals: (, ), +, number
start symbol S
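One possible way to hold this grammar in a program is a list of (left-hand side, right-hand side) pairs. The representation below is an assumption made for illustration, not something the lecture prescribes; it recomputes the counts stated on the slide.

```python
# Each production is (lhs, rhs), where rhs is a tuple of grammar symbols.
PRODUCTIONS = [
    ("S", ("S", "+", "E")),
    ("S", ("E",)),
    ("E", ("number",)),
    ("E", ("(", "S", ")")),
]
START = "S"

# Non-terminals are the symbols that appear on some left-hand side.
nonterminals = {lhs for lhs, _ in PRODUCTIONS}

# Terminals are all other symbols that appear on a right-hand side.
terminals = {sym for _, rhs in PRODUCTIONS for sym in rhs} - nonterminals

print(len(PRODUCTIONS), sorted(nonterminals), sorted(terminals))
```

Deriving the terminal set from the productions works only when every non-terminal has at least one production, which holds for this grammar.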
Derivations
S → S + E | E
E → number | ( S )
If a grammar accepts a string, there is a derivation of that string using the productions of the grammar
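Derivation Example
S → S + E | E
E → number | ( S )
Derive (1+2+(3+4))+5:
S ⇒ S + E ⇒ E + E ⇒ (S) + E ⇒ (S + E) + E ⇒ (S + E + E) + E ⇒ (E + E + E) + E
⇒ (1 + E + E) + E ⇒ (1 + 2 + E) + E ⇒ … ⇒ (1 + 2 + (3+4)) + E
⇒ (1 + 2 + (3+4)) + 5
[Figure callouts: in each step, one label marks the non-terminal being expanded and another marks the replacement string]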
Constructing a derivation
• Start from start symbol (S)
• Productions are used to derive a sequence of tokens from the start symbol
• For arbitrary strings α, β and γ and a production A → β, a single step of derivation is
αAγ ⇒ αβγ
– i.e., substitute β for an occurrence of A
– (S + E) + E ⇒ (S + E + E) + E: A = S, β = S + E
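A single derivation step is a list splice: keep what is to the left and right of the chosen non-terminal and put the production's right-hand side in between. The helper below is a hypothetical sketch (`derive_step` is not a name from the lecture), with sentential forms held as Python lists of symbols.

```python
def derive_step(form, index, production):
    """Apply one derivation step: alpha A gamma => alpha beta gamma.

    form       -- list of symbols (the current sentential form)
    index      -- position of the non-terminal A to expand
    production -- pair (A, beta), with beta a list of symbols
    """
    lhs, beta = production
    if form[index] != lhs:
        raise ValueError(f"symbol at {index} is {form[index]!r}, not {lhs!r}")
    alpha, gamma = form[:index], form[index + 1:]
    return alpha + list(beta) + gamma


# The slide's example: (S + E) + E  =>  (S + E + E) + E, with A = S, beta = S + E
before = ["(", "S", "+", "E", ")", "+", "E"]
after = derive_step(before, 1, ("S", ["S", "+", "E"]))
print(" ".join(after))
```

In this call alpha is the single symbol "(" and gamma is everything after the expanded S.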
Derivation -> Parse Tree
Derivation:
S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ … ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
Parse tree:
[Figure: parse tree for (1+2+(3+4))+5, with root S expanded as S + E]
• Tree representation of the derivation
• Leaves of tree are terminals; in-order traversal yields string
• Internal nodes: non-terminals
• Loses information about order of derivation steps
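Parse Tree
• Also called “concrete syntax”
[Figure: parse tree (concrete syntax) for (1+2+(3+4))+5 beside the corresponding abstract syntax tree. In the AST the root + has children + and 5, and that inner + has children 1 + 2 and 3 + 4.]
• Abstract syntax tree contains the information needed to evaluate the program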
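The relationship between the two trees can be sketched in code. The encoding below is an assumption made for illustration: a parse-tree node is a pair (label, children), a terminal is a plain string, and the helper names `S`, `E`, `leaves`, `to_ast` and `evaluate` are hypothetical.

```python
def S(*kids):
    return ("S", kids)


def E(*kids):
    return ("E", kids)


# Parse tree for (1+2+(3+4))+5 under S -> S + E | E, E -> number | ( S )
inner = S(S(S(E("1")), "+", E("2")), "+", E("(", S(S(E("3")), "+", E("4")), ")"))
parse_tree = S(S(E("(", inner, ")")), "+", E("5"))


def leaves(node):
    """In-order traversal of the leaves; yields the original tokens."""
    if isinstance(node, str):
        return [node]
    _, kids = node
    return [tok for kid in kids for tok in leaves(kid)]


def to_ast(node):
    """Drop parentheses and single-child chains; keep operators and numbers."""
    if isinstance(node, str):
        return int(node)
    _, kids = node
    if len(kids) == 1:                    # S -> E or E -> number
        return to_ast(kids[0])
    if kids[0] == "(":                    # E -> ( S )
        return to_ast(kids[1])
    return ("+", to_ast(kids[0]), to_ast(kids[2]))   # S -> S + E


def evaluate(ast):
    """Evaluate an AST made of ints and ("+", left, right) tuples."""
    if isinstance(ast, int):
        return ast
    _, left, right = ast
    return evaluate(left) + evaluate(right)
```

The parentheses disappear in `to_ast` because their only job, grouping, is already recorded by the shape of the tree.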
Derivation order
• Can choose to apply productions in any order; select any non-terminal A: αAγ ⇒ αβγ
• Two standard orders: left- and right-most -- useful for automatic parsing
• Leftmost derivation
– In the string, find the left-most non-terminal and apply a production to it
Example
S → S + E | E
E → number | ( S )
• Left-most derivation
S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ (1+2+(S))+E ⇒ (1+2+(S+E))+E ⇒ (1+2+(E+E))+E ⇒ (1+2+(3+E))+E ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
• Right-most derivation
S ⇒ S+E ⇒ S+5 ⇒ E+5 ⇒ (S)+5 ⇒ (S+E)+5 ⇒ (S+(S))+5 ⇒ (S+(S+E))+5 ⇒ (S+(S+4))+5 ⇒ (S+(E+4))+5 ⇒ (S+(3+4))+5 ⇒ (S+E+(3+4))+5 ⇒ (S+2+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5
• Same parse tree: same productions chosen, diff. order
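To see that both orders come from one parse tree, the derivation can be generated from the tree itself. This is a sketch with hypothetical names (`render`, `derivation`); a tree node is assumed to be a pair (label, children) and a terminal a plain string. The small tree for 1+2 is used to keep the output short.

```python
def render(form):
    """Show a sentential form: node labels for non-terminals, text for terminals."""
    return "".join(x if isinstance(x, str) else x[0] for x in form)


def derivation(tree, leftmost=True):
    """List the sentential forms of the left- or right-most derivation of a tree."""
    form = [tree]
    steps = [render(form)]
    while True:
        positions = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not positions:                  # only terminals remain
            return steps
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = form[i][1]         # replace the node by its children
        steps.append(render(form))


# Parse tree for 1+2: S -> S + E, S -> E, E -> 1, E -> 2
tree = ("S", (("S", (("E", ("1",)),)), "+", ("E", ("2",))))

print(" => ".join(derivation(tree, leftmost=True)))
print(" => ".join(derivation(tree, leftmost=False)))
```

Both calls expand exactly the same nodes; only the order in which they are visited differs.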
Top-down vs. Bottom-up Parsing
• We normally scan in tokens from left to right
• Left-most derivation reflects top-down parsing
– Start with the start symbol
– End with the string of tokens
S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ (1+2+(S))+E ⇒ (1+2+(S+E))+E ⇒ (1+2+(E+E))+E ⇒ (1+2+(3+E))+E ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
Top-down parsing
S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ ...
• Entire tree above a token (2) has been expanded when encountered
[Figure: parse tree for (1+2+(3+4))+5]
Bottom-up Parsing
• Right-most derivation reflects bottom-up parsing
– Start with the tokens
– End with the start symbol
(1+2+(3+4))+5 ⇐ (E+2+(3+4))+5 ⇐ (S+2+(3+4))+5 ⇐ (S+E+(3+4))+5 ⇐ (S+(3+4))+5 ⇐ (S+(E+4))+5 ⇐ (S+(S+4))+5 ⇐ (S+(S+E))+5 ⇐ (S+(S))+5 ⇐ (S+E)+5 ⇐ (S)+5 ⇐ E+5 ⇐ S+5 ⇐ S+E ⇐ S
(read from right to left, this is the right-most derivation)
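The sequence above can be reproduced by a naive shift-reduce loop. The sketch below is hand-written for this one grammar and is not a general bottom-up parsing algorithm; `shift_reduce` is a hypothetical name, tokens are assumed to be single characters, and numbers are assumed to be single digits. It records every intermediate form, including S+5.

```python
def shift_reduce(tokens):
    """Naive shift-reduce recognizer for S -> S + E | E, E -> number | ( S ).

    Returns the sentential forms seen, ending in "S" on success.
    Raises ValueError if the input cannot be reduced to S.
    """
    stack, forms = [], []

    def reduce_once():
        if stack and stack[-1].isdigit():
            stack[-1:] = ["E"]                     # E -> number
        elif stack[-3:] == ["(", "S", ")"]:
            stack[-3:] = ["E"]                     # E -> ( S )
        elif stack[-3:] == ["S", "+", "E"]:
            stack[-3:] = ["S"]                     # S -> S + E
        elif stack[-1:] == ["E"] and stack[-2:-1] in ([], ["("]):
            stack[-1:] = ["S"]                     # S -> E (start of a sum)
        else:
            return False
        return True

    forms.append("".join(tokens))
    for i, tok in enumerate(tokens):
        stack.append(tok)                          # shift
        while reduce_once():                       # reduce as long as possible
            forms.append("".join(stack) + "".join(tokens[i + 1:]))
    if stack != ["S"]:
        raise ValueError("input is not in the language")
    return forms


for form in shift_reduce(list("(1+2+(3+4))+5")):
    print(form)
```

The stack holds the part of the input already reduced, so each recorded form is the stack followed by the tokens not yet read.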
Bottom-up parsing
• (1+2+(3+4))+5 ⇐ (E+2+(3+4))+5 ⇐ (S+2+(3+4))+5 ⇐ (S+E+(3+4))+5 ⇐ …
• Advantage of bottom-up parsing: can select productions based on more information
[Figure: parse tree for (1+2+(3+4))+5]
Ambiguous Grammars
• In example grammar, left-most and right-most derivations produced identical parse trees
• + operator associates to left in parse tree regardless of derivation order
[Figure: abstract syntax tree for (1+2+(3+4))+5. The root + has children + and 5; that inner + has children 1 + 2 and 3 + 4.]
An Ambiguous Grammar
• + associates to left because of production S → S + E
• Consider another grammar:
S → S + S | S * S | number
• Different derivations produce different parse trees: ambiguous grammar
Differing Parse Trees
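S → S + S | S * S | number
• Consider expression 1 + 2 * 3
• Derivation 1: S ⇒ S + S ⇒ 1 + S ⇒ 1 + S * S ⇒ 1 + 2 * S ⇒ 1 + 2 * 3
• Derivation 2: S ⇒ S * S ⇒ S * 3 ⇒ S + S * 3 ⇒ S + 2 * 3 ⇒ 1 + 2 * 3
[Figure: two parse trees. Derivation 1 puts + at the root with children 1 and 2 * 3; derivation 2 puts * at the root with children 1 + 2 and 3.]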
Impact of Ambiguity
[Figure: the two trees for 1 + 2 * 3. With + at the root, 1 + (2 * 3) = 7; with * at the root, (1 + 2) * 3 = 9.]
• Different parse trees correspond to different evaluations!
• Meaning of program ill-defined
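The two results can be checked by evaluating each tree directly. In this sketch the trees are written by hand as nested tuples (op, left, right); that encoding and the name `evaluate` are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree given as an int or a tuple (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


tree_1 = ("+", 1, ("*", 2, 3))   # derivation 1: + at the root
tree_2 = ("*", ("+", 1, 2), 3)   # derivation 2: * at the root

print(evaluate(tree_1), evaluate(tree_2))   # prints: 7 9
```

The evaluator is the same in both cases; only the tree handed to it differs, so the ambiguity has to be resolved in the grammar or the parser.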
Eliminating Ambiguity
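• Often can eliminate ambiguity by adding non-terminals
S → S + T | T
T → T * num | num
• Enforces precedence, left-associativity
[Figure: parse tree for 1 + 2 * 3. The root S expands to S + T; the left S expands to T and then 1; the right T expands to T * 3, whose inner T is 2.]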
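A small hand-written parser shows the effect of the extra non-terminal. This is a sketch, not the parser developed in the course: the left-recursive rules are handled with loops instead of direct recursion, error handling is minimal, the tokenizer silently skips characters it does not recognize, and the names `tokenize` and `parse` are hypothetical.

```python
import re


def tokenize(text):
    """Split into number and operator tokens (anything else is skipped)."""
    return re.findall(r"[0-9]+|[+*]", text)


def parse(text):
    """Build a tree for S -> S + T | T and T -> T * num | num.

    Each loop folds operands to the left, which yields the same
    left-leaning trees that the left-recursive productions describe.
    """
    tokens = tokenize(text)
    pos = 0

    def parse_T():
        nonlocal pos
        node = int(tokens[pos])                  # T -> num
        pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, int(tokens[pos])) # T -> T * num
            pos += 1
        return node

    def parse_S():
        nonlocal pos
        node = parse_T()                         # S -> T
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, parse_T())        # S -> S + T
        return node

    tree = parse_S()
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return tree


print(parse("1 + 2 * 3"))
```

Because * can only appear inside a T, and a T is always a complete operand of +, multiplication groups first without any special-case code.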
Summary
• Context-free grammars allow concise specification of programming languages
• More powerful than regular expressions
• CFG converts token stream to parse tree
• Read Appel 3.1, 3.2
• Next time: implementing a top-down parser