View
4
Download
0
Category
Preview:
Citation preview
CS415 Compilers
LR Parsing & Error Recovery
These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice
University
cs415, spring 14 Lecture 13 2
Review: LR(k) items
The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1) parser
An LR(k) item is a pair [P, δ], where P is a production A→β with a • at some position in the rhs δ is a lookahead string of length ≤ k (words or EOF)
The • in an item indicates the position of the top of the stack LR(1): [A→•βγ,a] means that the input seen so far is consistent with the use
of A →βγ immediately after the symbol on top of the stack [A →β•γ,a] means that the input seen so far is consistent with the use
of A →βγ at this point in the parse, and that the parser has already recognized β.
[A →βγ•,a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A.
cs415, spring 14 Lecture 14 3
Review - Computing Closures
Closure(s) adds all the items implied by items already in s • Any item [A→β•Bδ,a] implies [B→•τ,x] for each production
with B on the lhs, and each x ∈ FIRST(δa) – for LR(1) item
The algorithm
Closure( s ) while ( s is still changing ) ∀ items [A → β •Bδ,a] ∈ s ∀ productions B → τ ∈ P ∀ b ∈ FIRST(δa) // δ might be ε if [B → • τ,b] ∉ s then add [B→ • τ,b] to s
Ø Classic fixed-point method Ø Halts because s ⊂ ITEMS
Closure “fills out” a state
cs415, spring 14 Lecture 14 4
Review - Computing Gotos
Goto(s,x) computes the state that the parser would reach if it recognized an x while in state s • Goto( { [A→β•Xδ,a] }, X ) produces [A→βX•δ,a] (easy part) • Should also includes closure( [A→βX•δ,a] ) (fill out the state)
The algorithm
Goto( s, X ) new ←Ø ∀ items [A→β•Xδ,a] ∈ s new ← new ∪ [A→βX•δ,a] return closure(new)
Ø Not a fixed-point method! Ø Straightforward computation Ø Uses closure ( )
Goto() moves forward
cs415, spring 14 Lecture 14 5
Review - Building the Canonical Collection
Start from s0 = closure( [S’→S,EOF ] ) Repeatedly construct new states, until all are found
The algorithm cc0 ← closure ( [S’→ •S, EOF] ) CC ← { cc0 } while ( new sets are still being added to CC) for each unmarked set ccj ∈ CC mark ccj as processed for each x following a • in an item in ccj temp ← goto(ccj, x) if temp ∉ CC then CC ← CC ∪ { temp } record transitions from ccj to temp on x
Ø Fixed-point computation (worklist version) Ø Loop adds to CC Ø CC ⊆ 2ITEMS, so CC is finite
cs415, spring 14 Lecture 14 6
High-level overview 1 Build the canonical collection of sets of LR(1) Items, I
a Begin in an appropriate state, s0 ♦ Assume: S’ →S, and S’ is unique start symbol that does
not occur on any RHS of a production (extended CFG - ECFG)
♦ [S’ →•S,EOF], along with any equivalent items ♦ Derive equivalent items as closure( s0 )
b Repeatedly compute, for each sk, and each X, goto(sk,X) ♦ If the set is not already in the collection, add it ♦ Record all the transitions created by goto( )
This eventually reaches a fixed point
2 Fill in the table from the collection of sets of LR(1) items The canonical collection completely encodes the
transition diagram for the handle-finding DFA
Review – LR(1) Table Construction
cs415, spring 14 Lecture 14 7
Review: Example (building the collection)
Initialization Step
s0 ← closure( { [Goal → •Expr , EOF] } ) = {[Goal → •Expr , EOF], [Expr à • Term – Expr, EOF], [Expr à •
Term, EOF], [Term à • Factor * Term, -], [Term à • Factor, -], [Term à •
Factor * Term, EOF], [Term à • Factor, EOF], [Factor à •ident, *], [Factor à •ident, -], [Factor à •ident, EOF]}
S ← { S0 }
1: Goal → Expr 2: Expr → Term – Expr 3: Expr → Term 4: Term → Factor * Term 5: Term → Factor 6: Factor → ident
Symbol FIRSTGoal { ident }Expr { ident }Term { ident }
Factor { ident }– { – }* { * }
ident { ident }
cs415, spring 14 Lecture 14 8
Example (building the collection)
Iteration 1 s1 ← goto(s0 , Expr) s2 ← goto(s0 , Term) s3 ← goto(s0 , Factor) s4 ← goto(s0 , ident )
s0 ← closure( { [Goal → •Expr , EOF] } ) { [Goal → • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }
cs415, spring 14 Lecture 14 9
Example (building the collection)
Iteration 1 s1 ← goto(s0 , Expr) = { [Goal → Expr •, EOF] } s2 ← goto(s0 , Term) = { [Expr → Term • – Expr , EOF], [Expr → Term •,
EOF] }
s3 ← goto(s0 , Factor) = { [Term → Factor • * Term , EOF],[Term →
Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }
s4 ← goto(s0 , ident ) = { [Factor → ident •, EOF],[Factor → ident •, –],
[Factor → ident •, *] }
s0 ← closure( { [Goal → •Expr , EOF] } ) { [Goal → • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }
cs415, spring 14 Lecture 14 10
Iteration 1 s1 ← goto(s0 , Expr) = { [Goal → Expr •, EOF] } s2 ← goto(s0 , Term) = { [Expr → Term • – Expr , EOF], [Expr → Term •,
EOF] }
s3 ← goto(s0 , Factor) = { [Term → Factor • * Term , EOF],[Term →
Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }
s4 ← goto(s0 , ident ) = { [Factor → ident •, EOF],[Factor → ident •, –],
[Factor → ident •, *] }
Iteration 2 s5 ← goto(s2 , – ) s6 ← goto(s3 , * )
Example (building the collection)
cs415, spring 14 Lecture 14 11
Iteration 1 s1 ← goto(s0 , Expr) = { [Goal → Expr •, EOF] } s2 ← goto(s0 , Term) = { [Expr → Term • – Expr , EOF], [Expr → Term •,
EOF] }
s3 ← goto(s0 , Factor) = { [Term → Factor • * Term , EOF],[Term →
Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }
s4 ← goto(s0 , ident ) = { [Factor → ident •, EOF],[Factor → ident •, –],
[Factor → ident •, *] }
Iteration 2
s5 ← goto(s2 , – ) = { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , –], [Term → • Factor * Term , EOF], [Term → • Factor , EOF], [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }
s6 ← goto(s3 , * ) = … see next page
Example (building the collection)
cs415, spring 14 Lecture 14 12
Iteration 2
s6 ← goto(s3 , * ) = { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }
Example (building the collection)
s5 ← goto(s2 , – ) = { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , –], [Term → • Factor * Term , EOF], [Term → • Factor , –], [Term → • Factor , EOF], [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }
Iteration 3 s7 ← goto(s5 , Expr ) = { ? } s8 ← goto(s6 , Term ) = { ? } s2 ← goto(s5, Term), s3 ← goto(s5, factor) , s4 ← goto(S5,
ident), s3 ← goto(s6, Factor), s4 ← goto(S6, ident)
cs415, spring 14 Lecture 14 13
Iteration 2
s6 ← goto(s3 , * ) = { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }
Example (building the collection)
s5 ← goto(s2 , – ) = { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , –], [Term → • Factor * Term , EOF], [Term → • Factor , –], [Term → • Factor , EOF], [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }
Iteration 3 s7 ← goto(s5 , Expr ) = { [Expr → Term – Expr •, EOF] } s8 ← goto(s6 , Term ) = { [Term → Factor * Term •, EOF], [Term →
Factor * Term •, –] }
cs415, spring 14 Lecture 14 14
Example (Summary)
S0 : { [Goal → • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor→ • ident, *] } S1 : { [Goal → Expr •, EOF] } S2 : { [Expr → Term • – Expr , EOF], [Expr → Term •, EOF] }
S3 : { [Term → Factor • * Term , EOF],[Term → Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }
S4 : { [Factor → ident •, EOF],[Factor → ident •, –], [Factor → ident •, *] }
S5 : { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , –], [Term → • Factor * Term , EOF], [Term → • Factor , EOF], [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }
cs415, spring 14 Lecture 14 15
Example (Summary)
S6 : { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }
S7: { [Expr → Term – Expr •, EOF] }
S8 : { [Term → Factor * Term •, EOF], [Term → Factor * Term •, –] }
cs415, spring 14 Lecture 14 16
Example (DFA)
s0 s4 s5
s1 s2 s7
s6 s3
s8
ident
term
factor
-
term
expr ident
factor
expr
*
factor term
ident State Ident - * Expr Term Factor
0 4 1 2 3
1
2 5
3 6 4
5 4 7 2 3
6 4 8 3
7 8
The State Transition Table
cs415, spring 14 Lecture 14 17
Example (DFA)
s0 s4 s5
s1 s2 s7
s6 s3
s8
ident
term
factor
-
term
expr ident
factor
expr
*
factor term
ident State Ident - * Expr Term Factor
0 4 1 2 3
1
2 5
3 6 4
5 4 7 2 3
6 4 8 3
7 8
The State Transition Table
cs415, spring 14 Lecture 14 18
Filling in the ACTION and GOTO Tables
The algorithm
∀ set sx ∈ S ∀ item i ∈ sx if i is [A→β •ad,b] and goto(sx,a) = sk, a ∈ T then ACTION[x,a] ← “shift k” else if i is [S’→S •,EOF] then ACTION[x , EOF] ← “accept” else if i is [A→β •,a] then ACTION[x,a] ← “reduce A→β” ∀ n ∈ NT if goto(sx ,n) = sk then GOTO[x,n] ← k
Many items generate no table entry
cs415, spring 14 Lecture 14 19
Example (Filling in the tables) The algorithm produces LR(1) parse table
ACTION GOTO
Ident - * EOF Expr Term Factor0 s 4 1 2 31 acc2 s 5 r 33 r 5 s 6 r 54 r 6 r 6 r 65 s 4 7 2 36 s 4 8 37 r 28 r 4 r 4
Plugs into the skeleton LR(1) parser
State Ident - * Expr Term Factor
0 4 1 2 3
1
2 5
3 6 4
5 4 7 2 3
6 4 8 3
7 8
Remember the state transition table?
cs415, spring 14
An Example for Table Filling Practice
Lecture 15 20
A Parse Table Filling Example
For pdf lecture notes readers, see attached LR(1) parse table example file
cs415, spring 14 Lecture 14 21
What can go wrong?
What if set s contains [A→β•aγ,b] and [B→β•,a] ? • First item generates “shift”, second generates “reduce” • Both define ACTION[s,a] — cannot do both actions • This is a fundamental ambiguity, called a shift/reduce error • Modify the grammar to eliminate it (if-then-else)
What if set s contains [A→γ•, a] and [B→γ•, a] ? • Each generates “reduce”, but with a different production • Both define ACTION[s,a] — cannot do both reductions • This fundamental ambiguity is called a reduce/reduce error • Modify the grammar to eliminate it
In either case, the grammar is not LR(1)
EaC includes a worked example
cs415, spring 14 Lecture 14 22
Shrinking the Tables
Three options: • Combine terminals such as number & identifier, + & -, * & /
→ Directly removes a column, may remove a row → For expression grammar, 198 (vs. 384) table entries
• Combine rows or columns (table compression) → Implement identical rows once & remap states → Requires extra indirection on each lookup → Use separate mapping for ACTION & for GOTO
• Use another construction algorithm → Both LALR(1) and SLR(1) produce smaller tables → Implementations are readily available
cs415, spring 14 Lecture 14 23
LR(0) versus SLR(1) versus LR(1)
LR(0) ? -- set of LR(0) items as states LR(1) ? -- set of LR(1) items as states, different states
compared to LR(0) SLR(1) ? -- LR(0) items and canonical sets, same as LR(0) SLR(1): add FOLLOW(A) to each LR(0) item [A→γ•] as its
second component: [A→γ•, a], ∀a ∈FOLLOW(A)
cs415, spring 14 Lecture 15 24
LR(0) versus SLR(1) versus LR(1)
Example: LR(0) ? LR(1) ? SLR(1) ?
S’ → S S → S ; a | a
cs415, spring 14 Lecture 15 25
LR(0) versus LR(1) versus SLR(1)
s0 = Closure({[S’ → .S]}) = {[S’ -> .S], [S -> .S; a], [S -> .a] } s1 = Closure( GoTo (s0, S)) = {[S’ → S. ], [S → S.; a] } s3 = Closure( GoTo (s1, ;)) = {[S → S; . a]}
s2 = Closure( GoTo (s0, a)) = {[S → a.]} s4 = Closure( GoTo (s3, a)) = {[S → S;a .] }
Grammar is not LR(0), but LR(1) and SLR(1)
s0 = Closure({[S’ → .S,eof]}) = {[S’ -> .S,eof], [S -> .S; a,eof], [S -> .a,;] } s1 = Closure( GoTo (s0, S)) = {[S’ → S. eof], [S → S.; a,eof] } s3 = Closure( GoTo (s1, ;)) = {[S → S; . a,eof]}
LR(0) States
s2 = Closure( GoTo (s0, a)) = {[S → a.,;]} s4 = Closure( GoTo (s3, a)) = {[S → S;a ., eof] }
LR(1) States
cs415, spring 14 Lecture 14 26
LALR(1) versus LR(1)
LALR(1) ? LR(1) items, State -> Grouped LR(1) states LALR(1): Merge two sets of LR(1) items (states), if they have the same core. core of set of LR(1) items: set of LR(0) items derived by ignoring the lookahead symbols
FACT: collapsing LR(1) states into LALR(1) states cannot introduce shift/reduce conflicts
cs415, spring 14 Lecture 15 27
LALR(1) versus LR(1)
s0 = Closure({[S’ → .S, eof]}) s1 = Closure( GoTo (s0, a)) = {[S → a . Ad, eof], [S → a . Be, eof], [A → .c, d], [ B → .c, e]} s2 = Closure( GoTo (s0, b)) = {[S → b . Ae, eof], [S → b . Bd, eof], [A → .c, e], [B → .c, d]} There are other states that are not listed here!
s3 = Closure( GoTo (s1, c)) = {[A → c. , d], [B → c. , e]} s4 = Closure( GoTo (s2, c)) = {[A → c. , e], [B → c. , d]}
Grammar is LR(1), but not LALR(1), since collapsing s3 and s4 (same core) will introduce reduce-reduce conflict.
cs415, spring 14 Lecture 16 28
Hierarchy of Context-Free Grammars
The inclusion hierarchy for context-free grammars
• Operator precedence includes some ambiguous grammars
• LL(1) is a subset of SLR(1)
Context-free grammars
Unambiguous CFGs
Operator Precedence
Floyd-Evans Parsable
LR(k)
LR(1)
LALR(1)
SLR(1)
LR(0)
LL(k)
LL(1)
Ref Book: Michael Sipser, “Introduction to the Theory of Computation”, 3rd Edition
cs415, spring 14 Lecture 15 29
Error Recovery in Shift-Reduce Parsers
The problem: parser encounters an invalid token Goal: Want to parse the rest of the file Basic idea (panic mode):
→ Assume something went wrong while trying to find handle for nonterminal A
→ Pretend handle for A has been found; pop “handle”, skip over input to find terminal that can follow A
Restarting the parser (panic mode): → find a restartable state on the stack (has transition for
nonterminal A) → move to a consistent place in the input (token that can follow A) → perform (error) reduction (for nonterminal A) → print an informative message
cs415, spring 14 Lecture 15 30
Error Recovery in YACC
Yacc’s (bison’s) error mechanism (note: version dependent!) • designated token error • used in error productions of the form A → error α // basic case • α specifies synchronization points When error is discovered • pops stack until it finds state where it can shift the error token • resumes parsing to match α special cases:
→ α = w, where w is string of terminals: skip input until w has been read → α = ε : skip input until state transition on input token is defined
• error productions can have actions
cs415, spring 14 Lecture 15 31
Error Recovery in YACC
cmpdstmt: BEG stmt_list END stmt_list : stmt | stmt_list ‘;’ stmt | error { yyerror(“\n***Error: illegal statement\n”);} This should • throw out the erroneous statement • synchronize at “;” or “end” (implicit: α = ε) • writes message “***Error: illegal statement” to stderr Example: begin a & 5 | hello ; a := 3 end ↑ ↑ resume parsing ***Error: illegal statement
cs415, spring 14 Lecture 16 32
Project #2 (see “lex & yacc”, Levine et al., O’Reilly)
→ You do have to (slightly) change the scanner (scan.l)
→ How to specify and use attributes in YACC?
§ Define attributes as types in attr.h
typedef struct info_node {int a; int b} infonode; § Include type attribute name in %union in parse.y
%union {tokentype token; infonode myinfo; … } § Assign attributes in parse.y to
– Terminals: %token <token> ID ICONST – Non-terminals: %type <myinfo> block variables procdecls cmpdstmt
§ Accessing attribute values in parse.y – use $$, $1, $2 … etc. notation: block : variables procdecls {$2.b = $1.b + 1;} cmpdstmt { $$.a = $1.a + $2.a + $3.b;}
cs415, spring 14 Lecture 16 33
YACC : parse.y %{ #include <stdio.h> #include "attr.h" int yylex(); void yyerror(char * s); #include "symtab.h" %} %union {tokentype token; } %token PROG PERIOD PROC VAR ARRAY RANGE OF %token INT REAL DOUBLE WRITELN THEN ELSE IF %token BEG END ASG NOT %token EQ NEQ LT LEQ GEQ GT OR EXOR AND DIV NOT %token <token> ID CCONST ICONST RCONST %start program %% program : PROG ID ';' block PERIOD { }
; block : BEG ID ASG ICONST END { }
; %% void yyerror(char* s) { fprintf(stderr,"%s\n",s); } int main() { printf("1\t"); yyparse(); return 1; }
parse.y : Will be included verbatim in parse.tab.c
Rules with semantic actions
Main program and “helper” functions; may contain initialization code of global structures. Will be included verbatim in parse.tab.c
List and assign attributes
cs415, spring 14 Lecture 16 34
Project #2 : Things to do
• Learn/Review the C programming language
• Add error productions (syntax errors)
• Define and assign attributes to non-terminals
• Implement single-level symbol table
• Perform type checking and produce required error messages; note: actions may occur at “any” location on right-hand side (implicit use of marker productions)
cs415, spring 14 Lecture 15 35
Ad-hoc, syntax directed translation schemes,type checking
Read EaC: Chapters 4.1- 4.4
Next two classes
Recommended