Transcript

CS412/413

Introduction toCompilers and Translators

Spring ’99

Lecture 3: Introduction to Syntactic Analysis

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

2

Outline

• Review of lexical analysis• Context-Free Grammars (CFGs)• Derivations and parse trees• Top-down vs. bottom-up parsing• Ambiguous grammars

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

3

Administration

• Group preference handouts due today

• Must turn in questionnaire to be considered part of class!

• Handout: Homework 1 - due next Friday

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

4

Where we areSource code(character stream)

Lexical analysis

Syntactic Analysis

Token stream

Abstract syntax tree(AST)

Semantic Analysis

if (b == 0) a = b;

if ( b ) a = b ;0==

if==

b 0

=

a b

if

;

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

5

What is Syntactic Analysis?

Source code(token stream)

{if (b == (0)) a = b;while (a != 1) {

stdio.print(a); a = a - 1;

}}

if_stmtbin_op

variable

b

constant

0

blockwhile_stmtbin_op

== != variable constant

block

expr_stmt

Abstract SyntaxTree

1a call.

stdio print variable

a

=...

...

...

...

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

6

Overview of Syntactic Analysis

• Input: stream of tokens• Output: abstract syntax tree

• Implementation:– Parse token stream to traverse

concrete syntax– During traversal, build abstract

syntax tree

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

7

Parsing• Parsing: recognizing whether a

program (or sentence) is grammatically well-formed & identifying the function of each component.“I gave him the book”

sentence

subject: I verb:gave object: himindirect object

noun phrase

article: the noun: book

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

8

What Parsing doesn’t do

• Doesn’t check many things: type agreement, variables initializedint x = true;int y; z = f(y);

• Deferred until semantic analysis

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

9

Specifying Language Syntax

• First problem: how to describe language syntax precisely and conveniently

• Last time: can describe tokens using regular expressions

• Regular expressions easy to implement, efficient (by converting to DFA)

• Why not use regular expressions (on tokens) to specify programming language syntax?

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

10

Limits of REs

• Programming languages are not regular -- cannot be described by regular exprs

• Consider: language of all strings that contain balanced parentheses (easier than PLs)() (()) ()()() (())()((()()))(( )( ()) (()()

• Problem: need to keep track of number of parentheses seen so far: unbounded counting

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

11

Need more power!

• RE = DFA• DFA has only finite number of

states; cannot perform unbounded counting( ( ( ( (

)))))

maximum depth: 5 parens

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

12

Context-Free Grammars

• A specification of the balanced-parenthesis language:S ( S ) SS

• The definition is recursive• This is a context-free grammar

– More expressive than regular expressions

– S = (S) = ((S) S) = (() ) = (())

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

13

RE is subset of CFGRegular Expressions

digit [0-9]posint digit+int -? posintreal int (ε | (. posint))

• Symbolic names are shorthand• All symbol names can be fully

expanded: not recursivereal -? [0-9]+ (ε | (. [0-9]+))

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

14

Definition of CFG• Terminals

– Symbols for strings or tokens (also ε)• Non-terminals

– Syntactic variables• Start symbol

– A special nonterminal is designated• Production

– Specifies how non-terminals may be expanded to form strings

– LHS: single non-terminal, RHS: string of terminals and non-terminals

S ( S ) S |

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

15

Sum grammar

S S + E | EE number | ( S )

accepts (1+2+ (3+4))+5

S S + ES EE numberE (S)

4 productions2 non-terminals (S, E)4 terminals: (, ), +, numberstart symbol S

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

16

Derivations

S S + E | EE number | ( S )

If a grammar accepts a string, there is a derivation of that string using the productions of the grammar

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

17

Derivation ExampleS S + E | E

E number | ( S )

Derive (1+2+ (3+4))+5:

S S + E E + E (S )+E (S + E)+E (S + E + E)+E (E + E + E)+E

(1 + E + E)+E (1 + 2+ E)+E … (1 + 2+ (3+4))+E

(1 + 2+ (3+4))+5replacement string

non-terminal being expanded

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

18

Constructing a derivation

• Start from start symbol (S)• Productions are used to derive a sequence of

tokens from the start symbol

• For arbitrary strings , and and a production A A single step of derivation is

A – i.e., substitute for an occurrence of A

– (S + E) + E (S + E + E)+E A = S, = S + E

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

19

Derivation -> Parse Tree

S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E

… (1+2+(3+4))+E (1+2+(3+4))+5

SS + E

E

( S )

5

S + E

S + E ( S )S + EE

E12

34

• Tree representation of the derivation

• Leaves of tree are terminals; in-order traversal yields string

• Internal nodes: non-terminals• Loses information about order

of derivation steps

ParseTree

Derivation

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

20

Parse Tree• Also called “concrete syntax”

SS + E

E

( S )

5

S + E

S + E ( S )S + EE

E12

34

+5+

++

3 41 2

parse tree/concrete syntax

abstractsyntax tree

(Contains information needed to evaluate

program)

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

21

Derivation order

• Can choose to apply productions in any order; select any non-terminal AA

• Two standard orders: left- and right-most -- useful for automatic parsing

• Leftmost derivation– In the string, find the left-most non-

terminal and apply a production to it

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

22

Example• Left-most derivationS S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+

(S))+E (1+2+(S+E))+E (1+2+(E+E))+E(1+2+(3+E))+E (1+2+(3+4))+E (1+2+(3+4))

+5

• Right-most derivationS S+E E+5 (S)+5 (S+E)+5 (S+(S))+5

(S+(S+E))+5 (S+(S+4))+5 (S+(E+4))+5 (S+(3+4))+5 (S+E+(3+4))+5 (S+2+(3+4))+5

(E+2+(3+4))+5 (1+2+(3+4))+5

• Same parse tree: same productions chosen, diff. order

S S + E | EE number | ( S )

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

23

Top-down vs. Bottom-up Parsing

• We normally scan in tokens from left to right• Left-most derivation reflects top-down

parsing– Start with the start symbol– End with the string of tokens

S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+

(S))+E (1+2+(S+E))+E (1+2+(E+E))+E(1+2+(3+E))+E (1+2+(3+4))+E (1+2+(3+4))+5

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

24

Top-down parsing

S S+E E+E (S)+E (S+E)+E (S+E+E)+E

(E+E+E)+E (1+E+E)+E(1+2+E)+E ...

• Entire tree above a token (2) has been expanded when encountered

SS + E

E

( S )

5

S + E

S + E ( S )S + EE

E12

34

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

25

Bottom-up Parsing

• Right-most derivation reflects bottom-up parsing– Start with the tokens– End with the start symbol

(1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 (S+(3+4))+5 (S+(E+4))+5 (S+(S+4))+5 (S+(S+E))+5 (S+(S))+5 (S+E)+5 (S)+5 E+5 S+E S

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

26

Bottom-up parsing

• (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 …

• Advantage of bottom-up parsing: can select productions based on more information

SS + E

E

( S )

5

S + E

S + E ( S )S + EE

E12

34

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

27

Ambiguous Grammars

• In example grammar, left-most and right-most derivations produced identical parse trees

• + operator associates to left in parse tree regardless of derivation order +

5+++

3 41 2

(1+2+(3+4))+5

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

28

An Ambiguous Grammar• + associates to left because of

production S S + E• Consider another grammar:

S S + S | S * S | number

• Different derivations produce different parse trees: ambiguous grammar

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

29

Differing Parse Trees

• Consider expression 1 + 2 * 3• Derivation 1: S S + S 1 + S 1 +

S * S 1 + 2 * S 1 + 2 * 3

• Derivation 2: S S * S S * 3 S + S * 3 S + 2 * 3 1 + 2 * 3

S S + S | S * S | number

1 2 3*

+

1 2 3

*1 2+

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

30

Impact of Ambiguity

1 2 3*

+

1 2 3= 7 = 9+

• Different parse trees correspond to different evaluations!

• Meaning of program ill-defined

*

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

31

Eliminating Ambiguity

• Often can eliminate ambiguity by adding non-terminalsS S + T | TT T * num | num

• Enforces precedence, left-associativity

S

S + T

T * 3T

1 2

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

32

Summary• Context-free grammars allow concise

specification of programming languages• More powerful than regular expressions• CFG converts token stream to parse

tree• Read Appel 3.1, 3.2• Next time: implementing a top-down

parser


Recommended