1
Theory of Compilation 236360
Erez Petrank
Lecture 2: Syntax Analysis, Top-Down Parsing
2
You are here
[Compiler pipeline diagram: Source text (txt) → Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis → Intermediate Representation (IR) → Code Generation → Executable code (exe)]
3
Last Week: from characters to tokens (Using Regular Expressions)
x = b*b – 4*a*c
txt
<ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS>
<INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>
Token Stream
4
The Lex Tool
• Lex automatically generates a lexical analyzer from a declaration file.
• Advantages: easy to produce a lexical analyzer from a short declaration; easily verified; easily modified and maintained.
• Intuitively: Lex builds a DFA; the analyzer simulates the DFA on a given input.

[Diagram: a Lex declaration file is fed to Lex, which produces a Lexical Analysis phase mapping characters to tokens]
5
Today: from tokens to AST
[Pipeline: Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Representation → Code Generation]

<ID,"b"> <MULT> <ID,"b"> <MINUS> <INT,4> <MULT> <ID,"a"> <MULT> <ID,"c">

[Syntax tree: the token stream parsed into expressions, terms, and factors, with MULT and MINUS as interior nodes and the identifiers and the constant 4 as leaves]
6
Syntax Analysis (Parsing)
• Goal: discover the program structure.
  – For example, a C program is built of functions, each function is built from declarations and instructions, each instruction is built from expressions, etc.
  – Is a sequence of tokens a valid program in the language?
  – Construct a structured representation of the input text.
  – Error detection and reporting.
An Example Structure of a Program
[Tree diagram: a program consists of a main function and more functions; each function is a { } block of declarations and statements; each declaration is a Type and an Id followed by ';'; each statement is, e.g., id = expr followed by ';'.]
8
Syntax Analysis (Parsing)
• Context-free grammars: a simple and accurate method for describing a program structure.
• We will look at families of grammars that can be efficiently parsed.
• The parser will read the token series, make sure the tokens are derivable in the grammar (or report an error), and construct the derivation tree.
9
Context free grammars
• V – non-terminals
• T – terminals (tokens, for us)
• P – production rules
  – Each rule is of the form V ➞ (T ∪ V)*
• S ∈ V – the initial symbol
G = (V,T,P,S)
10
Why do we need context-free grammars?

• Important program structures cannot be expressed by regular expressions, e.g., balanced parentheses: S ➞ SS; S ➞ (S); S ➞ ()
• Anything expressible as a regular expression is expressible by a CFG. Why use regular expressions at all?
  – Separation, modularity, simplification.
  – There is no point in using strong (and less efficient) tools on easily analyzable regular structures.
• Regular expressions describe lexical structures like identifiers, constants, keywords, etc.
• Grammars describe nested structures like balanced parentheses, matching begin-end, if-then-else, etc.
11
Example
S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

V = { S, E }
T = { id, ':=', ';', '+', '*', '(', ')' }
S is the initial variable.
Derivation Example
12
Input: x := z ; y := x + z

Grammar:
S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

S
⇒ S ; S
⇒ id := E ; S
⇒ id := id ; S
⇒ id := id ; id := E
⇒ id := id ; id := E + E
⇒ id := id ; id := E + id
⇒ id := id ; id := id + id

Rules applied: S ➞ S;S, S ➞ id := E, E ➞ id, S ➞ id := E, E ➞ E + E, E ➞ id, E ➞ id
Derivation Example
13
Input token stream: <id,"x"> <ASS> <id,"z"> <SEMI> <id,"y"> <ASS> <id,"x"> <PLUS> <id,"z">

Grammar:
S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

S
⇒ S ; S
⇒ id := E ; S
⇒ id := id ; S
⇒ id := id ; id := E
⇒ id := id ; id := E + E
⇒ id := id ; id := E + id
⇒ id := id ; id := id + id

Rules applied: S ➞ S;S, S ➞ id := E, E ➞ id, S ➞ id := E, E ➞ E + E, E ➞ id, E ➞ id
14
Terminology
• Derivation: a sequence of replacements of non-terminals using the production rules.
• Language: the set of strings of terminals derivable from the initial symbol.
• Sentential form – the result of a partial derivation, in which non-terminals may still appear.
15
Parse Tree

Input: x := z ; y := x + z

S
⇒ S ; S
⇒ id := E ; S
⇒ id := id ; S
⇒ id := id ; id := E
⇒ id := id ; id := E + E
⇒ id := id ; id := E + id
⇒ id := id ; id := id + id

[Parse tree: root S with children S, ';', S; the left S derives id := E with E ⇒ id; the right S derives id := E with E ⇒ E + E, where each E ⇒ id.]
16
Questions
• How did we know which rule to apply at every step?
• Does it matter?
• Would we always get the same result?
17
Ambiguity
Input: x := y + z * w

S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

Two parse trees for the same input:
[Tree 1: S ⇒ id := E, E ⇒ E + E, with the right E ⇒ E * E — '+' applied last]
[Tree 2: S ⇒ id := E, E ⇒ E * E, with the left E ⇒ E + E — '*' applied last]
18
Leftmost/rightmost Derivation
• Leftmost derivation – always expand the leftmost non-terminal.
• Rightmost derivation – always expand the rightmost non-terminal.
• Allows us to describe a derivation by listing the sequence of rules only – we always know which non-terminal each rule is applied to.
• Note that this does not necessarily resolve ambiguity (e.g., the previous slide).
• These are the orders of derivation applied in our parsers (coming soon).
Leftmost Derivation
19
Input: x := z ; y := x + z

Grammar:
S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

S
⇒ S ; S
⇒ id := E ; S
⇒ id := id ; S
⇒ id := id ; id := E
⇒ id := id ; id := E + E
⇒ id := id ; id := id + E
⇒ id := id ; id := id + id

Rules applied: S ➞ S;S, S ➞ id := E, E ➞ id, S ➞ id := E, E ➞ E + E, E ➞ id, E ➞ id
20
Rightmost Derivation
Input: x := z ; y := x + z

Grammar:
S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

S
⇒ S ; S
⇒ S ; id := E
⇒ S ; id := E + E
⇒ S ; id := E + id
⇒ S ; id := id + id
⇒ id := E ; id := id + id
⇒ id := id ; id := id + id

Rules applied: S ➞ S;S, S ➞ id := E, E ➞ E + E, E ➞ id, E ➞ id, S ➞ id := E, E ➞ id
21
Bottom-up Example
Input: x := z ; y := x + z

Grammar:
S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

id := id ; id := id + id
⇒ id := E ; id := id + id
⇒ S ; id := id + id
⇒ S ; id := E + id
⇒ S ; id := E + E
⇒ S ; id := E
⇒ S ; S
⇒ S

Rules applied: E ➞ id, S ➞ id := E, E ➞ id, E ➞ id, E ➞ E + E, S ➞ id := E, S ➞ S;S

Bottom-up, reducing the leftmost alternative at every step; read in reverse, this is exactly the rightmost derivation obtained when going top-down.
22
Parsing
• A context-free language can be recognized by a non-deterministic pushdown automaton
  – But not necessarily by a deterministic one…
• Parsing can be seen as a search problem
  – Can you find a derivation from the start symbol to the input word?
  – Easy (but very expensive) to solve with backtracking
• The Cocke-Younger-Kasami (CYK) parser can parse any context-free language, but has complexity O(n³)
  – Imagine a program with hundreds of thousands of lines of code.
• We want efficient parsers
  – Linear in input size
  – Deterministic pushdown automata
  – We will sacrifice generality for efficiency
23
“Brute-force” Parsing
Input: x := z ; y := x + z

S ➞ S ; S
S ➞ id := E
E ➞ id | E + E | E * E | ( E )

id := id ; id := id + id

Applying E ➞ id at every possible position yields many candidates:
  id := E ; id := id + id
  id := id ; id := E + id
  …

(This is not a parse tree… it is a search for the parse tree by exhaustively applying all rules.)
24
Efficient Parsers
• Top-down (predictive)
  – Construct the leftmost derivation
  – Apply rules “from left to right”
  – Predict which rule to apply based on the current nonterminal and the next token
• Bottom-up (shift-reduce)
  – Construct the rightmost derivation
  – Apply rules “from right to left”
  – Reduce a right-hand side of a production to its nonterminal
25
Efficient Parsers
• Top-down (predictive parsing)
• Bottom-up (shift-reduce)

[Diagram: the input is split into an already-read prefix and a to-be-read suffix.]
26
Top-down Parsing
• Given a grammar G = (V,T,P,S) and a word w
• Goal: derive w using G
• Idea
  – Apply a production to the leftmost nonterminal
  – Pick the production rule based on the next input token
• General grammar
  – More than one option for choosing the next production based on a token
• Restricted grammars (LL)
  – Know exactly which single rule to apply
  – May require some lookahead to decide
27
An Easily Parse-able Grammar

E ➞ LIT | ( E OP E ) | not E
LIT ➞ true | false
OP ➞ and | or | xor

Input: not ( not true or false )

E ⇒ not E
  ⇒ not ( E OP E )
  ⇒ not ( not E OP E )
  ⇒ not ( not LIT OP E )
  ⇒ not ( not true OP E )
  ⇒ not ( not true or E )
  ⇒ not ( not true or LIT )
  ⇒ not ( not true or false )

The production to apply is known from the next input token: at any stage, looking at the current variable and the next input token, the rule can be easily determined.
29
Recursive Descent Parsing
• Define a function for every nonterminal
• Every function simulates the derivation of the variable it represents:
  – Find the applicable production rule
  – A terminal is checked for a match with the next input token
  – A nonterminal is handled by (recursively) calling its function
• If there are several applicable productions for a nonterminal, use lookahead
30
Matching tokens
• The variable current holds the current input token

void match(token t) {
  if (current == t)
    current = next_token();
  else
    error();
}
31
Functions for nonterminals

E ➞ LIT | (E OP E) | not E
LIT ➞ true | false
OP ➞ and | or | xor

void E() {
  if (current ∈ {TRUE, FALSE}) {    // E → LIT
    LIT();
  } else if (current == LPAREN) {   // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {      // E → not E
    match(NOT); E();
  } else
    error();
}
32
functions for nonterminals
void LIT() {
  if (current == TRUE)
    match(TRUE);
  else if (current == FALSE)
    match(FALSE);
  else
    error();
}

E ➞ LIT | (E OP E) | not E
LIT ➞ true | false
OP ➞ and | or | xor
33
functions for nonterminals
void OP() {
  if (current == AND)
    match(AND);
  else if (current == OR)
    match(OR);
  else if (current == XOR)
    match(XOR);
  else
    error();
}

E ➞ LIT | (E OP E) | not E
LIT ➞ true | false
OP ➞ and | or | xor
34
Overall: Functions for Grammar
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor

void E() {
  if (current ∈ {TRUE, FALSE}) {
    LIT();
  } else if (current == LPAREN) {
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {
    match(NOT); E();
  } else
    error();
}

void LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error();
}

void OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error();
}
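The functions above can be transcribed into a small runnable recognizer. The following is a minimal Python sketch, not part of the slides: token handling (a whitespace-split token list with a "$" end marker) is an assumption of this sketch.

```python
# Recursive-descent recognizer for the Boolean-expression grammar:
#   E -> LIT | ( E OP E ) | not E ;  LIT -> true | false ;  OP -> and | or | xor
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, text):
        # Assumed tokenization: whitespace-separated tokens, "$" marks the end.
        self.tokens = text.split() + ["$"]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, t):
        # The match() of the slides: consume the token or report an error.
        if self.current == t:
            self.pos += 1
        else:
            raise ParseError("expected %s, got %s" % (t, self.current))

    def E(self):
        if self.current in ("true", "false"):   # E -> LIT
            self.LIT()
        elif self.current == "(":               # E -> ( E OP E )
            self.match("("); self.E(); self.OP(); self.E(); self.match(")")
        elif self.current == "not":             # E -> not E
            self.match("not"); self.E()
        else:
            raise ParseError("unexpected " + self.current)

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError("expected literal")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError("expected operator")

def accepts(text):
    p = Parser(text)
    try:
        p.E()
        return p.current == "$"   # accept only if all input was consumed
    except ParseError:
        return False
```

For example, accepts("not ( not true or false )") succeeds, while accepts("not or") fails.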
35
Adding semantic actions
• An action to perform on each production rule can be added simply by executing it in the corresponding branch of the function.
• For example, we can build the parse tree:
  – Every function returns an object of type Node
  – Every Node maintains a list of children
  – Function calls add new children
36
Building the parse tree

Node E() {
  result = new Node();
  result.name = "E";
  if (current ∈ {TRUE, FALSE}) {            // E → LIT
    result.addChild(LIT());
  } else if (current == LPAREN) {           // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  } else if (current == NOT) {              // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  } else
    error();
  return result;
}
37
Getting Back to the Example
• Input = “( not true and false )”;
  Node treeRoot = E();

[Resulting tree: root E with children '(', E, OP, E, ')'; the first inner E derives not LIT (true), OP derives and, and the second inner E derives LIT (false).]
38
Recursive Descent
• How do you pick the right A-production?
• Generally – try them all and use backtracking (costly).
• In our case – use lookahead.

In its basic form, each variable has a procedure that looks like:

void A() {
  choose an A-production, A -> X1 X2 … Xk;
  for (i = 1; i <= k; i++) {
    if (Xi is a nonterminal)
      call procedure Xi();
    else if (Xi == current)
      advance input;
    else
      report error;
  }
}
39
Recursive Descent: a problem
• With lookahead 1, the function for indexed_elem will never be tried…
  – What happens for input of the form ID [ expr ]?

term ➞ ID | indexed_elem
indexed_elem ➞ ID [ expr ]
40
Recursive Descent: Another Problem
S ➞ A a b
A ➞ a | ε

Bool S() {
  return A() && match(token('a')) && match(token('b'));
}
Bool A() {
  if (current == 'a')
    return match(token('a'));
  else
    return true;
}

What happens for input “ab”? What happens if you flip the order of the alternatives and try “aab”?
41
Recursive descent: a third problem
E ➞ E – term | term

Bool E() {
  return E() && match(token('–')) && term()
      || ID();
}

What happens with this procedure? Recursive descent parsers cannot handle left-recursive grammars.
42
3 Bad Examples for Recursive Descent
Can we make it work?

term ➞ ID | indexed_elem
indexed_elem ➞ ID [ expr ]

S ➞ A a b
A ➞ a | ε

E ➞ E – term | term
43
The “FIRST” Sets
• To formalize the property (of a grammar) that we can determine a rule using a single lookahead, we define the FIRST sets.
• For every production rule A ➞ 𝞪
  – FIRST(𝞪) = all terminals that 𝞪 can start with
  – i.e., every token that can appear first under some derivation from 𝞪
• No intersection between the FIRST sets of a nonterminal’s alternatives => we can pick a single rule
• In our Boolean expressions example
  – FIRST(LIT) = { true, false }
  – FIRST( ( E OP E ) ) = { ‘(‘ }
  – FIRST( not E ) = { not }

E ➞ LIT | (E OP E) | not E
LIT ➞ true | false
OP ➞ and | or | xor
44
The “FIRST” Sets
• No intersection between FIRST sets => can pick a single rule
• If the FIRST sets intersect, a longer lookahead may be needed
  – LL(k) = the class of grammars in which the production rule can be determined using a lookahead of k tokens
  – LL(1) is an important and useful class
45
The FOLLOW Sets
• FIRST is not enough when variables can be nullified.
• Consider: S ➞ AB | c ; A ➞ a | ε ; B ➞ b
• We need to know what can come afterwards to select the right production.
• For any non-terminal A
  – FOLLOW(A) = the set of tokens that can immediately follow A in some sentential form
• We can select the rule N ➞ 𝞪 with lookahead “b” if
  – b ∈ FIRST(𝞪), or
  – 𝞪 may be nullified and b ∈ FOLLOW(N).
46
LL(k) Grammars
• A grammar is in the class LL(k) when it can be derived via:
  – Top-down derivation
  – Scanning the input from left to right (L)
  – Producing the leftmost derivation (L)
  – With a lookahead of k tokens (k)
• A language is said to be LL(k) when it has an LL(k) grammar
47
Back to our 1st example
• FIRST(ID) = { ID }
• FIRST(indexed_elem) = { ID }
• FIRST/FIRST conflict
• This grammar is not in LL(1). Can we “fix” it?

term ➞ ID | indexed_elem
indexed_elem ➞ ID [ expr ]
48
Left factoring
• Rewrite into an equivalent grammar that is in LL(1)

term ➞ ID | indexed_elem
indexed_elem ➞ ID [ expr ]

becomes

term ➞ ID after_ID
after_ID ➞ [ expr ] | ε

Intuition: just like factoring x*y + x*z into x*(y+z)
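This transformation can be mechanized for alternatives that share the same leading symbol. A toy sketch, not from the slides; the grammar encoding and the fresh-name scheme (appending a prime) are assumptions:

```python
# One step of left factoring: if several alternatives of a nonterminal share
# the same first symbol, pull it out and move the differing tails into a
# fresh nonterminal. The empty tuple () stands for the empty alternative (ε).
def left_factor(grammar, nt):
    byhead = {}
    for rhs in grammar[nt]:
        byhead.setdefault(rhs[:1], []).append(rhs)
    new_alts, out = [], dict(grammar)
    for head, group in byhead.items():
        if len(group) == 1 or head == ():
            new_alts.extend(group)            # nothing to factor here
        else:
            fresh = nt + "'"                  # assumed fresh-name scheme
            out[fresh] = [rhs[1:] for rhs in group]
            new_alts.append(head + (fresh,))
    out[nt] = new_alts
    return out
```

Applied to the slide's example after substituting indexed_elem (term ➞ ID | ID [ expr ]), it produces term ➞ ID term', term' ➞ ε | [ expr ].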
49
Left factoring – another example
S ➞ if E then S else S | if E then S | T

becomes

S ➞ if E then S S’ | T
S’ ➞ else S | ε
50
Back to our 2nd example
S ➞ A a b
A ➞ a | ε

• Which rule should we select for A with ‘a’ in the lookahead: (1) A ➞ a or (2) A ➞ ε?
• (1) ‘a’ ∈ FIRST(a) = { ‘a’ } (and a cannot be nullified).
• (2) FIRST(ε) = ∅, but ε can (must) be nullified, and ‘a’ ∈ FOLLOW(A) = { ‘a’ }.
• FIRST/FOLLOW conflict.
• The grammar is not in LL(1).
51
An Equivalent Grammar via Substitution
S ➞ A a b
A ➞ a | ε

Substituting A in S gives:

S ➞ a a b | a b

Left factoring then gives:

S ➞ a after_a
after_a ➞ a b | b
52
So Far
• We have tools to determine whether a grammar is in LL(1):
  – The FIRST and FOLLOW sets.
  – In tutorials: algorithms for computing and using them.
• We have some techniques for modifying a grammar into an equivalent LL(1) grammar:
  – Left factoring,
  – Substitution.
• Now let’s look at the 3rd example and present one more such technique.
53
Back to our 3rd example
E ➞ E – term | term

• Left recursion cannot be handled with a bounded lookahead.
• What can we do?
• Any grammar with left recursion has an equivalent grammar with no left recursion.
54
Left Recursion Elimination
G1: N ➞ Nα | β
G2: N ➞ βN’ ; N’ ➞ αN’ | ε

• L(G1) = β, βα, βαα, βααα, …
• L(G2) = the same

For our 3rd example:
E ➞ E – term | term
becomes
E ➞ term TE
TE ➞ – term TE | ε
Left-Recursion Elimination

Eliminating immediate recursion: we replace the rules
• A → Aα1 | Aα2 | ··· | Aαn | β1 | β2 | ··· | βn
with the rules
• A → β1A’ | β2A’ | ··· | βnA’
• A’ → α1A’ | α2A’ | ··· | αnA’ | ε
Note that the method does not work if some αi is empty, and it may create indirect left recursion if some βi is empty:
• If αi is empty, a left recursion of A’ is created.
• If βi is empty, indirect left recursion is possible when some αj starts with A: we get A → A’… and also A’ → A….
Left-Recursion Elimination (continued)

Indirect recursion must also be handled. For example:
• S → Aa | b
• A → Ac | Sd | ε
The algorithm for this case is slightly more involved.
An Algorithm for Eliminating (Direct and Indirect) Left Recursion from a Grammar

• Input: a grammar G, possibly with left recursion, with no cycles and no ε-rules.
• Output: an equivalent grammar with no left recursion.
• An example of an ε-rule: A → ε.
• An example of a cycle: A → B; B → A.
• ε-rules and cycles can be eliminated from a grammar (automatically).
• The idea of the algorithm: order the variables in some order A1, A2, …, An. Go over the variables in order, and for each Ai ensure that every rule of the form Ai → Ajβ has j > i.
• Why is this enough?
An Algorithm for Left-Recursion Elimination
• Input: grammar G, possibly left-recursive, with no cycles and no ε-productions.
• Output: an equivalent grammar with no left recursion.
• Method: arrange the nonterminals in some order A1, A2, …, An

for i := 1 to n do begin
  for s := 1 to i-1 do begin
    replace each production of the form Ai → Asβ
      by the productions Ai → d1β | d2β | … | dkβ,
      where As → d1 | d2 | … | dk are all the current As-productions;
  end
  eliminate immediate left recursion among the Ai-productions
end
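The immediate-elimination step used inside this algorithm (N ➞ Nα | β becomes N ➞ βN’, N’ ➞ αN’ | ε) is easy to code. A sketch, not from the slides; the grammar encoding and the fresh-name scheme are assumptions:

```python
# Eliminate immediate left recursion for one nonterminal.
# Rules  A -> A a1 | ... | A an | b1 | ... | bm   become
#        A -> b1 A' | ... | bm A'
#        A' -> a1 A' | ... | an A' | ε   (ε encoded as the empty tuple)
def eliminate_immediate(grammar, a):
    rec  = [rhs[1:] for rhs in grammar[a] if rhs[:1] == (a,)]   # the alphas
    base = [rhs for rhs in grammar[a] if rhs[:1] != (a,)]       # the betas
    if not rec:
        return grammar                  # no immediate left recursion: no change
    out = dict(grammar)
    fresh = a + "'"                     # assumed fresh-name scheme
    out[a] = [rhs + (fresh,) for rhs in base]
    out[fresh] = [rhs + (fresh,) for rhs in rec] + [()]
    return out
```

On the running example E ➞ E – term | term this yields E ➞ term E’ and E’ ➞ – term E’ | ε, matching the transformation shown above.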
Analysis of the Algorithm

• We show that when the algorithm ends, every derivation rule of the form Ak → Atβ satisfies t > k.
• Invariant 1: when the inner loop finishes for some s (within iteration i of the outer loop), all derivation rules of Ai begin with terminals or with variables Aj for which j > s.
• Invariant 2: when we finish handling variable Ai, all of its derivation rules begin with variables Aj for which j > i, or with terminals.
• Both invariants are proved together by induction on i and s.
• Conclusion: when the algorithm ends there is no left recursion among the original variables (direct or indirect). This follows from invariant 2.
• As for the new variables, they always appear rightmost, and therefore will never be involved in left recursion.
60
LL(k) Parsers
• Recursive descent
  – Manual construction
  – Uses recursion
• Wanted
  – A parser that can be generated automatically
  – Does not use recursion
61
LL(k) parsing with pushdown automata
• The pushdown automaton uses
  – A stack
  – The input stream
  – A transition table: nonterminals × tokens → production rules
    • The entry indexed by nonterminal N and token t contains the rule of N that must be used when the current input starts with t
• The initial state:
  – The input stream holds the input ($ marks its end).
  – The stack starts with “S$” for the initial variable S.
62
LL(k) parsing with pushdown automata
• Two possible moves
  – Prediction: when the top of the stack is a nonterminal N and the next token is t, pop N and look up table[N,t]. If table[N,t] is not empty, push the right-hand side of the rule onto the prediction stack; otherwise – syntax error.
  – Match: when the top of the prediction stack is a terminal T and the next token is t: if t == T, pop T and consume t; if t ≠ T, syntax error.
• Parsing terminates when the prediction stack is empty. If the input is empty at that point, success; otherwise, syntax error.
Stack During the Run:

Stack (top on left): if ( E ) then Stmt else Stmt ; Stmts ; } $
Remaining input: if ( id < id ) then id = id + num else break; id = id * id; …
64
Example transition table
Rules:
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor

        (    )    not   true  false  and   or    xor   $
E       2         3     1     1
LIT                     4     5
OP                                   6     7     8

Rows are nonterminals, columns are input tokens; each entry says which rule should be used.
65
Simple Example
Grammar: A ➞ aAb | c
Input: aacbb$

Table:    a          b    c
A         A ➞ aAb         A ➞ c

Input suffix | Stack content | Move
aacbb$       | A$            | predict(A,a) = A ➞ aAb
aacbb$       | aAb$          | match(a,a)
acbb$        | Ab$           | predict(A,a) = A ➞ aAb
acbb$        | aAbb$         | match(a,a)
cbb$         | Abb$          | predict(A,c) = A ➞ c
cbb$         | cbb$          | match(c,c)
bb$          | bb$           | match(b,b)
b$           | b$            | match(b,b)
$            | $             | match($,$) – success

(Stack top on left)
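The predict/match loop behind this trace fits in a few lines. A sketch, not from the slides, for the A ➞ aAb | c grammar; the table here is written out by hand rather than derived from FIRST/FOLLOW:

```python
# Table-driven LL(1) parsing: a stack, an input stream, and a transition table.
TABLE = {                        # (nonterminal, lookahead) -> right-hand side
    ("A", "a"): ("a", "A", "b"),
    ("A", "c"): ("c",),
}
NONTERMINALS = {"A"}

def ll1_parse(word, start="A"):
    tokens = list(word) + ["$"]
    stack = [start, "$"]                 # stack top on the left (index 0)
    pos = 0
    while stack:
        top = stack.pop(0)
        if top in NONTERMINALS:          # prediction move
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:
                return False             # empty table entry: syntax error
            stack[0:0] = list(rhs)       # push the right-hand side
        elif top == tokens[pos]:         # match move (including $ against $)
            pos += 1
        else:
            return False                 # terminal mismatch: syntax error
    return pos == len(tokens)            # success iff all input was consumed
```

ll1_parse("aacbb") reproduces the successful trace above; ll1_parse("abcbb") hits the empty table entry predict(A,b), as on the "bad word" slide.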
66
The Transition Table
• Constructing the transition table is not hard.
  – It builds on FIRST and FOLLOW.
• You will construct FIRST, FOLLOW, and the table in the tutorials.
67
Simple Example on a Bad Word
Grammar: A ➞ aAb | c
Input: abcbb$

Table:    a          b    c
A         A ➞ aAb         A ➞ c

Input suffix | Stack content | Move
abcbb$       | A$            | predict(A,a) = A ➞ aAb
abcbb$       | aAb$          | match(a,a)
bcbb$        | Ab$           | predict(A,b) = ERROR
68
Error Handling
• Types of errors:
  – Lexical errors (typos)
  – Syntax errors (e.g., imbalanced parentheses)
  – Semantic errors (e.g., type mismatch)
  – Logical errors (an infinite loop, but also use of ‘=’ instead of ‘==’)
• Requirements:
  – Report the error clearly.
  – Recover and continue, so that more errors can be discovered.
  – Be reasonably efficient.
69
Error Handling and Recovery

x = a * (p+q * ( -b * (r-s);

• Where should we report the error?
• The valid-prefix property
• Recovery is tricky
  – Heuristics for dropping tokens, skipping to a semicolon, etc.
70
Error Handling in LL Parsers
S ➞ a c | b S
Input: c$

Table:    a          b
S         S ➞ a c    S ➞ b S

Input suffix | Stack content | Move
c$           | S$            | predict(S,c) = ERROR

• Now what?
  – Predict bS anyway: “missing token b inserted in line XXX”
71
Error Handling in LL Parsers
S ➞ a c | b S

Table:    a          b
S         S ➞ a c    S ➞ b S

Input suffix | Stack content | Move
bc$          | S$            | predict(S,b) = S ➞ b S
bc$          | bS$           | match(b,b)
c$           | S$            | Looks familiar?

• Result: an infinite loop
72
Error Handling
• Requires more systematic treatment
• Some examples:
  – Panic mode (or the acceptable-set method): drop tokens until reaching a synchronizing token, like a semicolon, a right parenthesis, the end of file, etc.
  – Phrase-level recovery: attempt local changes: replace “,” with “;”, eliminate or add a “;”, etc.
  – Error productions: anticipate errors and handle them automatically by adding them to the grammar.
  – Global correction: find the minimum modification to the program that will make it derivable in the grammar.
    • Not a practical solution…
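Panic mode is the simplest of these to implement: on a syntax error, record it, discard tokens until a synchronizing token appears, and resume. A sketch, not from the slides; the token encoding and the choice of sync set are assumptions:

```python
# Panic-mode recovery: skip ahead to a synchronizing token (';' or the end
# marker '$') so that parsing can resume and later errors are still found.
SYNC = {";", "$"}

def recover(tokens, pos, errors, message):
    errors.append("%s at token %d" % (message, pos))
    while tokens[pos] not in SYNC:   # drop tokens up to the sync point
        pos += 1
    if tokens[pos] == ";":
        pos += 1                     # consume the synchronizing ';'
    return pos
```

A parser would call this from its error branch instead of aborting; parsing then restarts at the statement after the ';'.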
73
Summary
• Lexical analysis: tokens.
• Parsing: understand the program structure.
• Context-free grammars.
• Top-down or bottom-up.
• Recursive descent: recursion, a function for each variable.
• General grammars are hard to parse.
• LL(k) grammars (with small k’s): efficient.
• Use pushdown automata.
• Non-LL(k) grammars may sometimes be “fixed”:
  – left-recursion elimination, left factoring, and substitution.
74
Coming up next time
• Bottom-Up Parsing.