COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars

COS 320 Compilers

David Walker

last time

• context free grammars (Appel 3.1)
  – terminals, non-terminals, rules
  – derivations & parse trees
  – ambiguous grammars

• recursive descent parsers (Appel 3.2)
  – parse LL(k) grammars
  – easy to write as ML programs
  – algorithms for automatic construction from a CFG

non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:

1. S ::= IF E THEN S ELSE S
2.     | BEGIN S L
3.     | PRINT E
4. L ::= END
5.     | ; S L
6. E ::= NUM = NUM

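datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

(* getToken : unit -> token and error : unit -> 'a are assumed to be
   supplied by the lexer and the error-reporting code *)
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if (!tok = t) then advance () else error ()

(* one function per non-terminal; the next token selects the rule *)
fun S () = case !tok of
    IF => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
  | BEGIN => (eat BEGIN; S (); L ())
  | PRINT => (eat PRINT; E ())
  | _ => error ()

and L () = case !tok of
    END => eat END
  | SEMI => (eat SEMI; S (); L ())
  | _ => error ()

and E () = (eat NUM; eat EQ; eat NUM)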
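The parser above depends on a lexer (getToken) and an error routine that the slides do not show. As a minimal sketch, assuming the tokens arrive as an ML list, the same parser can be made self-contained as follows. The EOF constructor, the SyntaxError exception and the parse wrapper are assumptions added for illustration and are not part of the slides.

```sml
datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ
               | EOF   (* assumed end-of-input marker *)

exception SyntaxError   (* assumed error signal *)

(* parse : token list -> bool; true when the whole list is one statement S *)
fun parse (tokens : token list) : bool =
  let
    val rest = ref tokens
    (* hand out the tokens one at a time, then EOF forever *)
    fun getToken () =
      case !rest of
          [] => EOF
        | t :: ts => (rest := ts; t)
    val tok = ref (getToken ())
    fun advance () = tok := getToken ()
    fun error () = raise SyntaxError
    fun eat t = if !tok = t then advance () else error ()

    fun S () =
      case !tok of
          IF => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
        | BEGIN => (eat BEGIN; S (); L ())
        | PRINT => (eat PRINT; E ())
        | _ => error ()
    and L () =
      case !tok of
          END => eat END
        | SEMI => (eat SEMI; S (); L ())
        | _ => error ()
    and E () = (eat NUM; eat EQ; eat NUM)
  in
    (S (); !tok = EOF) handle SyntaxError => false
  end

(* BEGIN PRINT NUM = NUM ; PRINT NUM = NUM END is accepted *)
val ok = parse [BEGIN, PRINT, NUM, EQ, NUM, SEMI, PRINT, NUM, EQ, NUM, END]
(* a statement list must continue with ; or END, so this one is rejected *)
val bad = parse [BEGIN, PRINT, NUM, EQ, NUM, PRINT]
```

Each function still looks only at the current token to choose a rule, which is what makes the grammar suitable for recursive descent.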

Constructing RD Parsers

• To construct an RD parser, we need to know what rule to apply when
  – we have seen a non-terminal X
  – we see the next terminal a in the input

• We apply rule X ::= s when
  – a is the first symbol that can be generated by string s, OR
  – s can derive the empty string (is nullable) and a is the first symbol in any string that can follow X

Computing Nullable Sets

• Non-terminal X is Nullable only if the following constraints are satisfied (computed using iterative analysis)
  – base case:
    • if (X ::= ) then X is Nullable
  – iterative/inductive case:
    • if (X ::= ABC...) and A, B, C, ... are all Nullable then X is Nullable
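The iteration can be sketched in ML as below. This is only an illustration: the encoding of rules as (left-hand side, right-hand side) pairs of strings and the names oneRound and nullableSet are assumptions, and the grammar is the small Z/Y/X example used in these slides.

```sml
(* A rule X ::= s is a pair (X, s); symbols are plain strings (assumed encoding). *)
type rule = string * string list

val grammar : rule list =
  [("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
   ("Y", ["c"]),           ("Y", []),
   ("X", ["a"]),           ("X", ["b", "Y", "e"])]

fun member (x : string, xs : string list) = List.exists (fn y => y = x) xs

(* One round: add every X that has a rule whose right-hand side is all Nullable.
   The base case is covered because List.all holds trivially for []. *)
fun oneRound (rules : rule list, known : string list) : string list =
  List.foldl
    (fn ((x, rhs), acc) =>
        if not (member (x, acc)) andalso List.all (fn s => member (s, acc)) rhs
        then x :: acc
        else acc)
    known rules

(* Repeat rounds until nothing changes (a fixed point). *)
fun nullableSet (rules : rule list) : string list =
  let
    fun loop known =
      let val next = oneRound (rules, known)
      in if length next = length known then known else loop next end
  in
    loop []
  end

val nullable = nullableSet grammar
```

For this grammar only Y ends up in the result, since Y is the only non-terminal with an empty right-hand side and no other rule consists solely of nullable symbols.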

Computing First Sets

• First(X) is computed iteratively
  – base case:
    • if T is a terminal symbol then First (T) := {T}
    • otherwise (T is a non-terminal) First (T) := { }
  – iterative/inductive case:
    • if X is a non-terminal and (X ::= ABC...) then
      – First (X) := First (X) U First (ABC...) where First (ABC...) = F1 U F2 U F3 U ... and
        » F1 = First (A)
        » F2 = First (B), if A is Nullable
        » F3 = First (C), if A is Nullable & B is Nullable
        » ...
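A possible ML rendering of this computation is sketched below, again on the Z/Y/X example grammar from these slides. The string encoding of symbols, the association-list table and all function names are assumptions made for the sketch; the Nullable set is taken as given rather than recomputed.

```sml
type rule = string * string list

val grammar : rule list =
  [("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
   ("Y", ["c"]),           ("Y", []),
   ("X", ["a"]),           ("X", ["b", "Y", "e"])]
val nonterminals = ["Z", "Y", "X"]
val nullable = ["Y"]   (* taken as given *)

fun member (x : string, xs) = List.exists (fn y => y = x) xs
(* set union on lists, keeping the order of first appearance *)
fun union (xs, ys) =
  List.foldl (fn (y, acc) => if member (y, acc) then acc else acc @ [y]) xs ys

(* First of one symbol: a terminal T has First(T) = {T} *)
fun firstOf (table, sym) =
  if member (sym, nonterminals)
  then (case List.find (fn (x, _) => x = sym) table of
            SOME (_, set) => set
          | NONE => [])
  else [sym]

(* First of a string ABC...: keep going while the prefix is Nullable *)
fun firstOfString (_, []) = []
  | firstOfString (table, sym :: rest) =
      if member (sym, nullable)
      then union (firstOf (table, sym), firstOfString (table, rest))
      else firstOf (table, sym)

(* One round: First(X) := First(X) U First(s) for every rule X ::= s *)
fun oneRound table =
  map (fn (x, set) =>
          (x, List.foldl
                (fn ((lhs, rhs), acc) =>
                    if lhs = x then union (acc, firstOfString (table, rhs)) else acc)
                set grammar))
      table

fun total table = List.foldl (fn ((_, set), n) => n + length set) 0 table

(* Sets only grow, so an unchanged total size means a fixed point *)
fun fixedPoint table =
  let val next = oneRound table
  in if total next = total table then table else fixedPoint next end

val first = fixedPoint (map (fn x => (x, [])) nonterminals)
```

Each round reads only the table from the previous round, so the intermediate tables are {d}, {c}, {a,b} after round 1 and {d,a,b}, {c}, {a,b} after round 2, with no change in round 3.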

Computing Follow Sets

• Follow(X) is computed iteratively
  – base case:
    • initially, we assume nothing in particular follows X
      – Follow (X) := { } for all X
  – inductive case:
    • if (Y ::= s1 X s2) for any strings s1, s2 then
      – Follow (X) := First (s2) U Follow (X)
    • if (Y ::= s1 X s2) for any strings s1, s2 then
      – Follow (X) := Follow (Y) U Follow (X), if s2 is Nullable
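The two inductive rules can be sketched in ML as follows, using the Z/Y/X example grammar from these slides. The encoding and the function names are assumptions for illustration, and the Nullable and First sets are supplied as constants instead of being recomputed.

```sml
type rule = string * string list

val grammar : rule list =
  [("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
   ("Y", ["c"]),           ("Y", []),
   ("X", ["a"]),           ("X", ["b", "Y", "e"])]
val nonterminals = ["Z", "Y", "X"]
val nullable = ["Y"]                                   (* taken as given *)
val first = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (y, acc) => if member (y, acc) then acc else acc @ [y]) xs ys
fun lookup (table, sym) =
  case List.find (fn (x, _) => x = sym) table of
      SOME (_, set) => set
    | NONE => []
fun firstOf sym = if member (sym, nonterminals) then lookup (first, sym) else [sym]
fun firstOfString [] = []
  | firstOfString (sym :: rest) =
      if member (sym, nullable)
      then union (firstOf sym, firstOfString rest)
      else firstOf sym
fun allNullable s = List.all (fn sym => member (sym, nullable)) s

(* What one rule lhs ::= s1 x s2 adds to Follow(x), over every occurrence of x:
   First(s2), plus Follow(lhs) when s2 is Nullable *)
fun contribution (follow, x, (lhs, rhs)) =
  let
    fun scan [] = []
      | scan (sym :: s2) =
          let
            val here =
              if sym = x
              then union (firstOfString s2,
                          if allNullable s2 then lookup (follow, lhs) else [])
              else []
          in
            union (here, scan s2)
          end
  in
    scan rhs
  end

fun oneRound follow =
  map (fn (x, set) =>
          (x, List.foldl (fn (r, acc) => union (acc, contribution (follow, x, r)))
                         set grammar))
      follow

fun total table = List.foldl (fn ((_, set), n) => n + length set) 0 table

fun fixedPoint table =
  let val next = oneRound table
  in if total next = total table then table else fixedPoint next end

val follow = fixedPoint (map (fn x => (x, [])) nonterminals)
```

With this grammar the result is Follow(Z) = { }, Follow(Y) = {d, a, b, e} and Follow(X) = {c, d, a, b}; the lists are sets, so the order of elements carries no meaning.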

building a predictive parser

Grammar:

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

For each of Z, Y and X we compute three columns: nullable, first and follow.

Nullable:

      base case   round 1
Z     no          no
Y     yes         yes
X     no          no

after one round of induction, we realize we have reached a fixed point

First:

      base case   round 1   round 2   round 3
Z     { }         d         d,a,b     d,a,b
Y     { }         c         c         c
X     { }         a,b       a,b       a,b

round 1, no fixed point
round 2, no fixed point
round 3, no more changes ==> fixed point

Follow:

      base case   round 1    round 2
Z     { }         { }        { }
Y     { }         e,d,a,b    e,d,a,b
X     { }         c,d,a,b    c,d,a,b

after one round of induction, no fixed point
after two rounds, fixed point (in general, the number of rounds needed can depend on the order in which the sets are updated)

Result:

      nullable   first    follow
Z     no         d,a,b    { }
Y     yes        c        e,d,a,b
X     no         a,b      c,d,a,b

Grammar:

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed Sets:

      nullable   first    follow
Z     no         d,a,b    { }
Y     yes        c        e,d,a,b
X     no         a,b      c,d,a,b

Build parsing table where row X, col T tells parser which clause to execute in function X with next-token T:

• if T ∈ First(s) then enter (X ::= s) in row X, col T
• if s is Nullable and T ∈ Follow(X) then enter (X ::= s) in row X, col T

Entering the rules one at a time (Z ::= X Y Z, then Z ::= d, Y ::= c, Y ::= , X ::= a, X ::= b Y e) gives:

      a            b              c          d          e
Z     Z ::= XYZ    Z ::= XYZ                 Z ::= d
Y     Y ::=        Y ::=          Y ::= c    Y ::=      Y ::=
X     X ::= a      X ::= b Y e

What are the blanks? --> syntax errors

Is it possible to put 2 grammar rules in the same box? Yes. If we add the rule Z ::= d e to the grammar, the computed sets stay the same, but row Z, col d now holds two rules:

      a            b              c          d            e
Z     Z ::= XYZ    Z ::= XYZ                 Z ::= d
                                             Z ::= d e
Y     Y ::=        Y ::=          Y ::= c    Y ::=        Y ::=
X     X ::= a      X ::= b Y e

predictive parsing tables

• if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)
  – Left-to-right parse, Left-most derivation, 1 symbol lookahead

• if not, the grammar is not LL(1)

• in an LL(k) parsing table, columns include every k-length sequence of terminals:

aa ab ba bb ac ca ...
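The LL(1) check can be sketched in ML as below: build the table entries from the two rules for filling the table and look for a cell that receives two rules. The encoding and the function names are assumptions for illustration, and the Nullable, First and Follow sets of the Z/Y/X example are supplied as constants.

```sml
type rule = string * string list

val nonterminals = ["Z", "Y", "X"]
val nullable = ["Y"]
val first  = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]
val follow = [("Z", []), ("Y", ["e", "d", "a", "b"]), ("X", ["c", "d", "a", "b"])]

val ll1Grammar : rule list =
  [("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
   ("Y", ["c"]),           ("Y", []),
   ("X", ["a"]),           ("X", ["b", "Y", "e"])]
(* the extra rule Z ::= d e leaves the three sets above unchanged *)
val conflictGrammar = ll1Grammar @ [("Z", ["d", "e"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (y, acc) => if member (y, acc) then acc else acc @ [y]) xs ys
fun lookup (table, sym) =
  case List.find (fn (x, _) => x = sym) table of
      SOME (_, set) => set
    | NONE => []
fun firstOf sym = if member (sym, nonterminals) then lookup (first, sym) else [sym]
fun firstOfString [] = []
  | firstOfString (sym :: rest) =
      if member (sym, nullable)
      then union (firstOf sym, firstOfString rest)
      else firstOf sym
fun allNullable s = List.all (fn sym => member (sym, nullable)) s

(* Columns in which rule X ::= s is entered:
   First(s), plus Follow(X) when s is Nullable *)
fun columns (x, s) =
  union (firstOfString s, if allNullable s then lookup (follow, x) else [])

(* All table entries as (row, column, rule) triples *)
fun entries (rules : rule list) =
  List.concat (map (fn (r as (x, _)) => map (fn t => (x, t, r)) (columns r)) rules)

(* Cells (row, column) that receive more than one rule *)
fun conflicts rules =
  let
    fun go [] = []
      | go ((x, t, _) :: rest) =
          if List.exists (fn (x', t', _) => x' = x andalso t' = t) rest
          then (x, t) :: go rest
          else go rest
  in
    go (entries rules)
  end

fun isLL1 rules = null (conflicts rules)

val fine = isLL1 ll1Grammar
val clash = conflicts conflictGrammar
```

Here fine is true, and clash reports the single cell (row Z, col d), which is the box that holds both Z ::= d and Z ::= d e.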

another trick

• Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases

• the example non-LL(1) grammar we just saw:

Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e

• how do we fix it?

• solution here is left-factoring:

Z ::= X Y Z
Z ::= d W
Y ::= c
Y ::=
X ::= a
X ::= b Y e
W ::=
W ::= e
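As a rough sketch of why the factored grammar is easier to parse, here is a recursive descent recognizer for it written over token lists. The constructors A through E stand for the terminals a through e, and the exception and function names are assumptions made for the example. Each function returns the input that remains after its non-terminal.

```sml
(* A, B, C, D, E stand for the terminals a, b, c, d, e (assumed encoding) *)
datatype token = A | B | C | D | E

exception SyntaxError

fun parseZ (D :: rest) = parseW rest                       (* Z ::= d W   *)
  | parseZ (ts as (A :: _)) = parseZ (parseY (parseX ts))  (* Z ::= X Y Z *)
  | parseZ (ts as (B :: _)) = parseZ (parseY (parseX ts))  (* Z ::= X Y Z *)
  | parseZ _ = raise SyntaxError
and parseW (E :: rest) = rest                              (* W ::= e     *)
  | parseW rest = rest                                     (* W ::=       *)
and parseY (C :: rest) = rest                              (* Y ::= c     *)
  | parseY rest = rest                                     (* Y ::=       *)
and parseX (A :: rest) = rest                              (* X ::= a     *)
  | parseX (B :: rest) =                                   (* X ::= b Y e *)
      (case parseY rest of
           E :: rest' => rest'
         | _ => raise SyntaxError)
  | parseX _ = raise SyntaxError

(* accept only when Z consumes the whole input *)
fun accepts ts = (null (parseZ ts)) handle SyntaxError => false

val r1 = accepts [D]             (* d       *)
val r2 = accepts [D, E]          (* d e     *)
val r3 = accepts [A, C, D, E]    (* a c d e *)
val r4 = accepts [A]             (* a alone is not a Z *)
```

After seeing d, the parser no longer has to choose between Z ::= d and Z ::= d e; the choice is postponed to W, where the next token decides it.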

summary of RD parsing

• CFGs are good at specifying programming language structure

• parsing general CFGs is expensive so we define parsers for simpler classes of CFG
  – LL(k), LR(k)

• we can build a recursive descent parser for LL(k) grammars by:
  – computing nullable, first and follow sets
  – constructing a parse table from the sets
  – checking for duplicate entries, which indicates failure
  – creating an ML program from the parse table

• if parser construction fails we can
  – rewrite the grammar (left factoring, eliminating left recursion) and try again
  – try to build a parser using some other method...such as using a bottom-up parsing technique

Bottom-up (Shift-Reduce) Parsing

shift-reduce parsing

• shift-reduce parsing
  – aka: bottom-up parsing
  – aka: LR(k): Left-to-right parse, Rightmost derivation, k-token lookahead

• more powerful than LL(k) parsers

• LALR variant:
  – the basis for parsers for most modern programming languages
  – implemented in tools such as ML-Yacc

Shift-reduce algorithm

• Parser keeps track of
  – position in current input (what input to read next)
  – a stack of terminal & non-terminal symbols representing the “parse so far”

• Based on next input symbol & stack, parser table indicates
  – shift: push next input on to top of stack
  – reduce R:
    • top of stack should match RHS of rule
    • replace top of stack with LHS of rule
  – error
  – accept (we shift EOF & can reduce what remains on stack to start symbol)

shift-reduce parsing example

Grammar:

A ::= S EOF
S ::= ( L )
S ::= id = num
L ::= L ; S
L ::= S

Input from lexer:

( id = num ; id = num ) EOF

Each row shows the action chosen by the parsing table, the state of the parse so far (the stack) after that action, and the input yet to read:

action                   state of parse so far   yet to read
(start)                                          ( id = num ; id = num ) EOF
SHIFT                    (                       id = num ; id = num ) EOF
SHIFT                    ( id                    = num ; id = num ) EOF
SHIFT                    ( id =                  num ; id = num ) EOF
SHIFT                    ( id = num              ; id = num ) EOF
REDUCE S ::= id = num    ( S                     ; id = num ) EOF
REDUCE L ::= S           ( L                     ; id = num ) EOF
SHIFT                    ( L ;                   id = num ) EOF
SHIFT SHIFT SHIFT        ( L ; id = num          ) EOF
REDUCE S ::= id = num    ( L ; S                 ) EOF
REDUCE L ::= L ; S       ( L                     ) EOF
SHIFT                    ( L )                   EOF
REDUCE S ::= ( L )       S                       EOF
SHIFT                    S EOF
REDUCE A ::= S EOF       A
ACCEPT                   A

A successful parse! Is this grammar LL(1)?
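The trace above can be reproduced with a small hand-written shift-reduce loop. This is a naive sketch, not the table-driven algorithm used by real LR parsers: instead of a parsing table it pattern-matches on the top of the stack, and the datatype and function names are assumptions made for the example.

```sml
(* Terminals and non-terminals share one type so both can sit on the stack. *)
datatype symbol = LPAREN | RPAREN | ID | EQ | NUM | SEMI | EOF   (* terminals *)
                | A | S | L                                      (* non-terminals *)

datatype action = Shift | Reduce of string | Accept

exception SyntaxError

(* The stack is a list with its top at the head, so right-hand sides
   appear reversed in the patterns. Returns (action, new stack, new input). *)
fun step (stack, input) =
  case (stack, input) of
      (NUM :: EQ :: ID :: rest, _)       => (Reduce "S ::= id = num", S :: rest, input)
    | (RPAREN :: L :: LPAREN :: rest, _) => (Reduce "S ::= ( L )", S :: rest, input)
    | (S :: SEMI :: L :: rest, _)        => (Reduce "L ::= L ; S", L :: rest, input)
    | (S :: LPAREN :: rest, _)           => (Reduce "L ::= S", L :: LPAREN :: rest, input)
    | ([EOF, S], [])                     => (Reduce "A ::= S EOF", [A], [])
    | ([A], [])                          => (Accept, stack, input)
    | (_, t :: ts)                       => (Shift, t :: stack, ts)
    | (_, [])                            => raise SyntaxError

(* Run until Accept and return the list of actions taken, in order. *)
fun parse input =
  let
    fun loop (stack, input, trace) =
      case step (stack, input) of
          (Accept, _, _) => rev (Accept :: trace)
        | (act, stack', input') => loop (stack', input', act :: trace)
  in
    loop ([], input, [])
  end

val trace = parse [LPAREN, ID, EQ, NUM, SEMI, ID, EQ, NUM, RPAREN, EOF]
```

For this input the trace contains ten shifts, six reductions and the final Accept, in the same order as in the walkthrough. Hard-coding the reductions as patterns only works because this grammar is tiny; the table and automaton described next are what make the approach general.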

Shift-reduce algorithm

• Reinterpreting the entire stack on every iteration would be very slow
  – O(averageStackSize * input)
  – need an optimized algorithm that only looks at the top of the stack (plus the parsing table) to figure out what to do: O(input)

Shift-reduce algorithm (details)

• The parser summarizes the current “parse state” using an integer
  – the integer is actually a state in a finite automaton
  – the current parse state can be computed by running the automaton over the current parse stack

• Revised algorithm: Based on next input symbol & the parse state (as opposed to the entire stack), parser table indicates
  – shift s:
    • push next input on to top of stack and move automaton into state s
  – reduce R & goto s:
    • top of stack should match RHS of rule
    • replace top of stack with LHS of rule
    • move automaton into state s
    • build parse tree corresponding to reduction R
  – error
  – accept


shift-reduce parsing

Like LL parsing, shift-reduce parsing does not always work. What sort of grammar rules make shift-reduce parsing impossible?

Grammar: ????
Input from lexer: ???? z ??? EOF
State of parse so far: ?? c d

• Shift-Reduce errors: can’t decide whether to Shift z or Reduce c d by a rule

• Reduce-Reduce errors: can’t decide whether to Reduce by R1 or R2

shift-reduce errors

Grammar:

A ::= S EOF
S ::= S + S
S ::= S * S
S ::= id

notice, this is an ambiguous grammar – we are always going to need some mechanism for resolving the outstanding ambiguity before parsing

Input from lexer:

id + id * id EOF

When the state of the parse so far is S + S and the next input is *, the parser can either

• reduce by rule (S ::= S + S) or
• shift the * ???

Reduce by rule (S ::= S + S):

action                 state of parse so far
(start)                S + S
reduce S ::= S + S     S
shift                  S *
shift                  S * id
reduce S ::= id        S * S
reduce S ::= S * S     S

parse tree: (id + id) * id

Alternative parse, shift the *:

action                 state of parse so far
(start)                S + S
shift                  S + S *
shift                  S + S * id
reduce S ::= id        S + S * S
reduce S ::= S * S     S + S
reduce S ::= S + S     S

parse tree: id + (id * id)
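To see why the choice matters, the two parse trees can be written as ML values and evaluated. This is only an illustration: the exp datatype, the eval function and the values 1, 2, 3 substituted for the three ids are all assumptions made for the example.

```sml
datatype exp = Id of int
             | Plus of exp * exp
             | Times of exp * exp

fun eval (Id n) = n
  | eval (Plus (l, r)) = eval l + eval r
  | eval (Times (l, r)) = eval l * eval r

(* id + id * id, with the ids standing for 1, 2 and 3 *)
val reduceFirst = Times (Plus (Id 1, Id 2), Id 3)   (* reduce S ::= S + S, then handle * *)
val shiftFirst  = Plus (Id 1, Times (Id 2, Id 3))   (* shift *, reduce S ::= S * S first *)

val a = eval reduceFirst   (* (1 + 2) * 3 = 9 *)
val b = eval shiftFirst    (* 1 + (2 * 3) = 7 *)
```

The two trees give different results, so the parser needs a rule for resolving the conflict. Shifting the * here corresponds to the usual convention that * binds tighter than +.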


shift-reduce errors

Grammar:

A ::= S id EOF
S ::= E ;
E ::= E ; E
E ::= id

Input from lexer: id ; id EOF

State of parse so far: E ;

• reduce by rule (S ::= E ;) or

• shift the id

input might be this, making shifting correct:

id ; id ; id EOF


reduce-reduce errors

Grammar:

A ::= S EOF
S ::= ( E )
S ::= E
E ::= ( E )
E ::= E + E
E ::= id

Input from lexer: ( id ) EOF

State of parse so far: ( E )

• reduce by rule ( S ::= ( E ) ) or

• reduce by rule ( E ::= ( E ) )

Summary

• Top-down Parsing
  – simple to understand and implement
  – you can code it yourself using nullable, first, follow sets
  – excellent for quick & dirty parsing jobs

• Bottom-up Parsing
  – more complex: uses stack & table
  – more powerful
  – Bonus: tools do the work for you ==> ML-Yacc
    • but you need to understand how shift-reduce & reduce-reduce errors can arise
