Chapter 4 Syntax Analysis

Chapter 4 Syntax Analysis. Syntax Error Handling Example: 1. program prmax(input,output) 2. var 3. x,y:integer; 4. function max(i:integer, j:integer)





Syntax Error Handling

• Example:

    1.  program prmax(input,output)
    2.  var
    3.      x,y:integer;
    4.  function max(i:integer , j:integer) : integer;
    5.  {return maximum of integers i and j}
    6.  begin
    7.      if I > j then max := I ;
    8.      else max := j
    9.  end;
    10.
    11.     readln (x,y);
    12.     writelin (max(x,y))
    13. end.


Common Punctuation Errors

• Using a comma instead of a semicolon in the argument list of a function declaration (line 4).

• Leaving out a mandatory semicolon at the end of a line (line 1, after the program heading).

• Using an extraneous semicolon before an else (line 7).

• Common operator error: using = instead of := (as could happen on line 7 or 8).

• Misspelled keyword: writelin instead of writeln (line 12).

• Missing begin or end: the begin of the main program is missing (line 10). Such errors are usually difficult to repair.


Error Reporting

• A common technique is to print the offending line with a pointer to the position of the error.

• The parser might add a diagnostic message like “semicolon missing at this position” if it knows what the likely error is.
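The caret-style report described above takes only a few lines of Python. The function name `format_syntax_error` and the convention of a 1-based line number with a 0-based column are assumptions for illustration; a real parser would take both from the positions its lexical analyzer records for each token.

```python
def format_syntax_error(source: str, line_no: int, column: int, message: str) -> str:
    """Build a diagnostic that echoes the offending line and points at the error.

    Hypothetical helper: line_no is 1-based, column is 0-based.
    """
    offending = source.splitlines()[line_no - 1]
    pointer = " " * column + "^"  # caret under the error position
    return f"line {line_no}: {message}\n{offending}\n{pointer}"


program = "program prmax(input,output)\nvar\n    x,y:integer;"
heading = "program prmax(input,output)"
# The missing semicolon belongs just after the program heading on line 1.
print(format_syntax_error(program, 1, len(heading), "semicolon missing at this position"))
```

The caret lands one column past the closing parenthesis, which is where the semicolon is expected.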


How to Handle Syntax Errors

• Error recovery: the parser should try to recover from an error quickly so that subsequent errors can be reported. If the parser does not recover correctly, it may report spurious errors. Common recovery strategies are:

• Panic mode

• Phase-level Recovery

• Error Productions


Panic-mode Recovery

• Discard input tokens until a synchronizing token (such as ; or end) is found.

• Simple, but it may skip a considerable amount of input before checking for errors again.

• Guaranteed not to go into an infinite loop.
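A minimal sketch of panic mode, assuming a hypothetical toy language whose only statement is `id := id ;` and whose statement list ends with `end` (the function names are illustrative, not taken from any particular compiler). On an error the parser reports it, discards tokens up to the next synchronizing token, and resumes, so a later error in the same input is still reported. Every recovery step consumes at least one token, which is why the loop cannot run forever.

```python
SYNC_TOKENS = {";", "end"}  # synchronizing tokens, as in the text


def is_identifier(tok: str) -> bool:
    return tok.isidentifier() and tok != "end"


def parse_statements(tokens: list[str]) -> list[str]:
    """Parse a toy language of `id := id ;` statements terminated by `end`.

    Uses panic-mode recovery and returns the error messages (empty if none).
    """
    errors: list[str] = []
    pos = 0
    while pos < len(tokens) and tokens[pos] != "end":
        if (
            pos + 3 < len(tokens)
            and is_identifier(tokens[pos])
            and tokens[pos + 1] == ":="
            and is_identifier(tokens[pos + 2])
            and tokens[pos + 3] == ";"
        ):
            pos += 4  # a well-formed statement
            continue
        errors.append(f"syntax error at token {pos} ({tokens[pos]!r})")
        # Panic mode: discard tokens until a synchronizing token is found.
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1  # consume ';' and resume with the next statement
    return errors


print(parse_statements("x := y ; y = x ; z := ; w := x ; end".split()))
```

Both malformed statements are reported (at tokens 4 and 8) because the parser resynchronizes at each `;`, and the final statement is then parsed normally.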


Phase-level Recovery

• Perform local corrections on the remaining input.

• Replace a prefix of the remaining input with some string that allows the parser to continue. Examples: replace a comma with a semicolon, delete an extraneous semicolon, or insert a missing semicolon.

• Replacements must be chosen carefully to avoid an infinite loop; for example, always inserting a symbol ahead of the current input symbol can loop forever.
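The same hypothetical toy language (`id := id` statements, terminated by `end`) can illustrate phase-level recovery. Instead of skipping input, this sketch makes the local corrections listed above at the point between two statements. It assumes each statement is already a well-formed `id := id` and that the input ends with `end`; the helper name is illustrative. A `;` is inserted only when another statement follows, and that statement is consumed right away, so the insertion cannot repeat forever at the same input position.

```python
def repair_separators(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Phase-level recovery for a toy language: `id := id` statements separated
    by ';' and terminated by `end`.

    Only the separator after each statement is repaired; statements are assumed
    well-formed and the input is assumed to end with `end`.
    Returns (repaired tokens, messages).
    """
    out: list[str] = []
    messages: list[str] = []
    pos = 0
    while tokens[pos] != "end":
        out.extend(tokens[pos:pos + 3])  # copy `id := id`
        pos += 3
        tok = tokens[pos]
        if tok == ";":
            out.append(";")
            pos += 1
            while tokens[pos] == ";":  # delete extraneous semicolons
                messages.append(f"token {pos}: extraneous ';' deleted")
                pos += 1
        elif tok == ",":  # replace a comma with a semicolon
            messages.append(f"token {pos}: ',' replaced by ';'")
            out.append(";")
            pos += 1
        elif tok != "end":
            # Insert a missing ';'. The next iteration consumes a statement,
            # so this cannot loop at the same position.
            messages.append(f"token {pos}: missing ';' inserted")
            out.append(";")
    out.append("end")
    return out, messages


fixed, notes = repair_separators("a := b , c := d e := f ; ; end".split())
print(" ".join(fixed))  # a := b ; c := d ; e := f ; end
print(notes)
```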


Recovery with Error Productions

• Augment the grammar with productions that generate common errors.

• Example:

    parameter_list → identifier_list : type
                   | parameter_list ; identifier_list : type
                   | parameter_list , {error; writeln(“comma should be a semicolon”)} identifier_list : type
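In a hand-written recursive-descent parser, this error production can be encoded by accepting a comma where the semicolon belongs and running the production's action. The sketch below (a hypothetical helper; identifiers and type names are single tokens) parses the parameter list from line 4 of the example program. The left-recursive parameter_list is handled with a loop, and the action records a message rather than calling writeln.

```python
def parse_parameter_list(tokens: list[str]):
    """Recursive-descent sketch of

        parameter_list  -> identifier_list : type
                         | parameter_list ; identifier_list : type
                         | parameter_list , identifier_list : type   (error production)
        identifier_list -> id | identifier_list , id

    Returns (groups, messages); each group is (names, type). Input is assumed
    otherwise well-formed.
    """
    groups: list[tuple[list[str], str]] = []
    messages: list[str] = []
    pos = 0

    def group() -> None:  # identifier_list : type
        nonlocal pos
        names = [tokens[pos]]
        pos += 1
        while tokens[pos] == ",":
            names.append(tokens[pos + 1])
            pos += 2
        if tokens[pos] != ":":
            raise SyntaxError(f"expected ':' at token {pos}")
        groups.append((names, tokens[pos + 1]))
        pos += 2

    group()
    while pos < len(tokens) and tokens[pos] in (";", ","):
        if tokens[pos] == ",":
            # Semantic action attached to the error production.
            messages.append("comma should be a semicolon")
        pos += 1
        group()
    return groups, messages


print(parse_parameter_list("i : integer , j : integer".split()))
# ([(['i'], 'integer'), (['j'], 'integer')], ['comma should be a semicolon'])
```

A comma inside an identifier list (as in `x , y : integer`) is still legal; only a comma that follows a type triggers the error production.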


Recovery with Global Corrections

• Find the minimum number of changes to correct the erroneous input stream.

• Too costly in time and space to implement.

• Currently only of theoretical interest.


Context Free Grammars

• A CFG consists of
  – terminals,
  – nonterminals,
  – a start symbol, and
  – productions.


Context Free Grammars

• Terminals: the basic symbols from which strings are formed; these are the tokens.

• Nonterminals: syntactic variables denoting sets of strings.

• Start symbol: one of the nonterminals. The set of strings it denotes is the language defined by the grammar. By convention, the productions for the start symbol are listed first.

• Productions: specify the manner in which terminals and nonterminals can be combined to form strings.


Example

• A grammar for simple arithmetic expressions. The productions are:

    expr → expr op expr
    expr → ( expr )
    expr → - expr
    expr → id
    op → +
    op → -
    op → *
    op → /
    op → ↑

• The terminals are: id + - * / ↑ ( )

• The nonterminals are: expr, op

• The start symbol is: expr
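One way to hold the four components of this CFG in a program is sketched below. The dictionary layout (nonterminal mapped to a list of right sides, each a list of symbols) is an assumption chosen for illustration, not a standard representation. Terminals are computed as the symbols that occur on right sides but never on a left side.

```python
# Productions of the expression grammar: nonterminal -> list of right sides.
GRAMMAR: dict[str, list[list[str]]] = {
    "expr": [["expr", "op", "expr"], ["(", "expr", ")"], ["-", "expr"], ["id"]],
    "op": [["+"], ["-"], ["*"], ["/"], ["↑"]],
}
START = "expr"  # the first nonterminal listed

nonterminals = set(GRAMMAR)  # every symbol that has productions
terminals = {s for alts in GRAMMAR.values() for alt in alts for s in alt} - nonterminals

print("nonterminals:", sorted(nonterminals))  # nonterminals: ['expr', 'op']
print("terminals:", sorted(terminals))
for head, alts in GRAMMAR.items():
    for alt in alts:
        print(head, "→", " ".join(alt))
```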


Notational Conventions

• Terminals are
  – lower-case letters early in the alphabet (a, b, c),
  – operator symbols (+, -),
  – punctuation symbols and digits, and
  – boldface strings (id, if).

• Nonterminals are
  – upper-case letters early in the alphabet (A, B, C),
  – lower-case italic strings (expr, stmt).
  – The letter S, when it appears, is the start symbol.

• Upper-case letters late in the alphabet (X, Y, Z) are grammar symbols (either terminals or nonterminals).


Notational Conventions

• Lower-case letters late in the alphabet (x, y, z) are strings of terminals.

• Lower-case Greek letters (α, β, γ) are strings of grammar symbols.

• A vertical bar | separates alternative productions: A → α | β | γ means that A → α, A → β, and A → γ are all productions.

• Example: the grammar of the earlier example can be written:

    E → E A E | ( E ) | - E | id
    A → + | - | * | / | ↑


• Derivations: the double arrow ⇒ means “derives”.

• In general, αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols.

• A leftmost derivation always replaces the leftmost nonterminal of a sentential form.

• A rightmost derivation always replaces the rightmost nonterminal of a sentential form.

• A sentential form is any string of grammar symbols derivable from the start symbol; a sentence is a sentential form with no nonterminals.


Classification of Parsers

• An LR parser reads the input from left to right (the L) and constructs a rightmost derivation in reverse (the R).

• An LL parser reads the input from left to right (the first L) and constructs a leftmost derivation (the second L).

• Example: in the expression grammar, the sentence id + id * id has two distinct leftmost derivations:

    E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

    E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
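Both leftmost derivations can be replayed mechanically. In this sketch the function names are hypothetical, E is the only nonterminal, and each step names the right side of the E-production to apply. Because the helper always rewrites the leftmost E, the two different production sequences produce exactly the two derivations shown above.

```python
NONTERMINALS = {"E"}


def leftmost_step(form: list[str], body: list[str]) -> list[str]:
    """Replace the leftmost nonterminal of `form` (assumed to be E) by `body`."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            return form[:i] + body + form[i + 1:]
    raise ValueError("no nonterminal left: the form is already a sentence")


def leftmost_derivation(bodies: list[list[str]]) -> str:
    """Apply E-productions (given by their right sides) in leftmost order from E."""
    form = ["E"]
    forms = [form]
    for body in bodies:
        form = leftmost_step(form, body)
        forms.append(form)
    return " ⇒ ".join(" ".join(f) for f in forms)


PLUS, TIMES, ID = ["E", "+", "E"], ["E", "*", "E"], ["id"]
first = leftmost_derivation([PLUS, ID, TIMES, ID, ID])
second = leftmost_derivation([TIMES, PLUS, ID, ID, ID])
print(first)   # E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
print(second)  # E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
```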


The corresponding parse trees are:

Left tree:

            E
          / | \
         E  +  E
         |   / | \
        id  E  *  E
            |     |
            id    id

Right tree:

              E
            / | \
           E  *  E
         / | \   |
        E  +  E  id
        |     |
        id    id

• The left tree reflects the customary precedence of + and * (* binds more tightly).

• The right tree does not.

• The grammar is ambiguous.
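Ambiguity can be checked concretely by counting parse trees. The sketch below counts the trees that the grammar E → E + E | E * E | id assigns to a token sequence: for each subrange it tries every operator as the root and multiplies the counts for the two sides (a memoized dynamic-programming count; the function name is hypothetical).

```python
from functools import lru_cache


def count_parse_trees(tokens: tuple[str, ...]) -> int:
    """Number of parse trees for tokens under E -> E + E | E * E | id."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:  # trees for E deriving tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):  # operator at position k is the root
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(("id", "+", "id", "*", "id")))  # 2
```

A count greater than 1 for some sentence is exactly what makes the grammar ambiguous; with three operators the count grows to 5.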


Ambiguity

• A grammar is ambiguous if it can produce more than one parse tree for some sentence; equivalently, some sentence has more than one leftmost derivation or more than one rightmost derivation.

• An unambiguous grammar is desirable. Otherwise the parser needs disambiguating rules to throw away the undesired parse trees.

• Regular expressions vs. grammars: every construct that can be described by a regular expression can also be described by a grammar. The converse is not true.

• We could therefore use a CFG instead of regular expressions to describe the tokens for a lexical analyzer.


• We use regular expressions because:
  - Regular expressions are easier to understand.
  - Lexical analysis is simpler than syntax analysis and does not need a grammar.
  - A more efficient analyzer can be constructed automatically from regular expressions than from a grammar.
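To make the division of labour concrete, here is a sketch of a lexical analyzer specified only by regular expressions, using Python's standard `re` module. The token names and patterns are assumptions for the expression language of this chapter. Note that Python's `re` is a backtracking matcher, not the automaton a lexer generator would build, so this illustrates the specification style rather than the efficiency point.

```python
import re

# Token classes as regular expressions (names and patterns are illustrative).
TOKEN_SPEC = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("num", r"[0-9]+"),
    ("op", r"[-+*/()]"),
    ("skip", r"\s+"),      # white space is discarded
    ("mismatch", r"."),    # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens: list[tuple[str, str]] = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "skip":
            continue
        if kind == "mismatch":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        tokens.append((kind, m.group()))
    return tokens


print(tokenize("rate * (x1 + 60)"))
```

The parser then works on the (kind, text) pairs and never sees individual characters.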


• We could combine lexical analysis with syntax analysis for a simple language by using a single grammar whose terminals are source characters instead of tokens.

• Separating the two functions is better because it modularizes the front end into two components of manageable size.


Removing common Ambiguity

• Eliminating the “dangling-else” ambiguity. Some languages allow both if-then and if-then-else statements:

    stmt → if expr then stmt
         | if expr then stmt else stmt

• This grammar is ambiguous, since the string if E1 then if E2 then S1 else S2 has two parse trees:


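Parse tree 1 (else matched with the second then):

    stmt
    ├── if
    ├── expr ── E1
    ├── then
    └── stmt
        ├── if
        ├── expr ── E2
        ├── then
        ├── stmt ── S1
        ├── else
        └── stmt ── S2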


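Parse tree 2 (else matched with the first then):

    stmt
    ├── if
    ├── expr ── E1
    ├── then
    ├── stmt
    │   ├── if
    │   ├── expr ── E2
    │   ├── then
    │   └── stmt ── S1
    ├── else
    └── stmt ── S2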


• Programming languages that allow both forms of conditional statement use the disambiguating rule that each else is matched with the closest previously unmatched then, so the first parse tree is the one used.

• The dangling-else ambiguity can also be eliminated by rewriting the grammar so that each else is matched with the closest previously unmatched then.


Removing common Ambiguity

    stmt → matched_stmt
         | unmatched_stmt

    matched_stmt → if expr then matched_stmt else matched_stmt
                 | other

    unmatched_stmt → if expr then stmt
                   | if expr then matched_stmt else unmatched_stmt
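Hand-written recursive-descent parsers usually obtain the same effect as this rewritten grammar more simply: when an else appears right after a then-branch, it is consumed immediately, so it attaches to the innermost unmatched then. The sketch below shows that greedy rule; it is not a direct encoding of matched_stmt and unmatched_stmt. Conditions and simple statements are single placeholder tokens such as E1 and S1, an assumption made to keep it short.

```python
def parse_stmt(tokens: list[str], pos: int = 0):
    """Parse stmt -> if E then stmt [else stmt] | S; return (tree, next position).

    Conditions and simple statements are single tokens (assumption).
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError(f"expected 'then' at token {pos + 2}")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy choice: an else here belongs to the innermost unmatched then.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1


tree, _ = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)  # ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The result corresponds to parse tree 1: the else ends up inside the inner if.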


Removing common Ambiguity

Eliminating left recursion: a grammar is left-recursive if there is a derivation A ⇒⁺ Aα (in one or more steps) for some nonterminal A and string α. Top-down parsers cannot handle left-recursive grammars (they go into an infinite loop), so such a grammar must be transformed to eliminate the left recursion.

• Eliminating immediate left recursion: immediate left recursion occurs if the grammar has a production of the form A → Aα. First group all the productions for A:


Eliminating left recursion

    A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

where no βi begins with A. Then add a new nonterminal A′ and replace the A-productions with:

    A → β1A′ | β2A′ | … | βnA′
    A′ → α1A′ | α2A′ | … | αmA′ | ε

• A grammar may have no immediate left recursion but still be left-recursive: the productions for two or more nonterminals can combine to give left recursion.


Eliminating left recursion

• For example: A → Aa | b

• The nonterminal A is left-recursive because A ⇒ Aa (and A ⇒ Aa ⇒ Aaa ⇒ …). Applying the transformation with β1 = b and α1 = a gives:

    A → bA′
    A′ → aA′ | ε
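The transformation is mechanical enough to code directly. This sketch (names are illustrative) represents each right side as a list of symbols, uses an empty list for ε, and appends a prime to form the new nonterminal (written A' in the code). It handles only immediate left recursion; the indirect case mentioned above needs the more general algorithm that first arranges the nonterminals in an order.

```python
def eliminate_immediate_left_recursion(
    head: str, alternatives: list[list[str]]
) -> dict[str, list[list[str]]]:
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn as
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε
    Right sides are symbol lists; ε is the empty list.
    """
    new_head = head + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}  # no immediate left recursion
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


print(eliminate_immediate_left_recursion("A", [["A", "a"], ["b"]]))
# {'A': [['b', "A'"]], "A'": [['a', "A'"], []]}
```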


Left factoring

• Left factoring: a grammar transformation useful for predictive parsing. Suppose a nonterminal A has two alternative productions, A → αβ1 | αβ2, that begin with the same nonempty string α. A predictive parser does not know which production to choose until it has processed α and read the next token.

• Replace A → αβ1 | αβ2 with A → αA′ and A′ → β1 | β2. Now the parser can expand A → αA′ immediately and does not have to make a decision until it reaches the start of β1 or β2.
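A sketch of one round of left factoring (function names are hypothetical). Alternatives are grouped by their first symbol, and each group of two or more is replaced by its longest common prefix α followed by a new nonterminal whose alternatives are the remaining suffixes; an empty suffix stands for ε. If the suffixes themselves share a prefix, the round can simply be repeated on the new nonterminal.

```python
def common_prefix(seqs: list[list[str]]) -> list[str]:
    """Longest prefix shared by every symbol list in seqs."""
    prefix: list[str] = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head: str, alternatives: list[list[str]]) -> dict[str, list[list[str]]]:
    """One round: A -> αβ1 | αβ2 | γ  becomes  A -> αA' | γ,  A' -> β1 | β2."""
    groups: dict[str, list[list[str]]] = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else "", []).append(alt)
    result: dict[str, list[list[str]]] = {head: []}
    primes = 0
    for first, group in groups.items():
        if first == "" or len(group) < 2:
            result[head].extend(group)  # nothing shared: keep as is
            continue
        alpha = common_prefix(group)
        primes += 1
        new_head = head + "'" * primes
        result[head].append(alpha + [new_head])
        result[new_head] = [alt[len(alpha):] for alt in group]  # suffixes; [] is ε
    return result


cond = [["if", "expr", "then", "stmt"], ["if", "expr", "then", "stmt", "else", "stmt"]]
print(left_factor("cond_stmt", cond))
# {'cond_stmt': [['if', 'expr', 'then', 'stmt', "cond_stmt'"]], "cond_stmt'": [[], ['else', 'stmt']]}
```

Applied to the conditional statement, the new nonterminal cond_stmt' plays the role of an else_part with alternatives ε and else stmt.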


Example of Left - Factoring

• The alternative productions

    cond_stmt → if expr then stmt
              | if expr then stmt else stmt

  can be left-factored to get:

    cond_stmt → if expr then stmt else_part
    else_part → else stmt | ε

• Some syntactic constructs cannot be described by context-free grammars. The syntax analyzer cannot make these checks, so they are postponed to the semantic analysis phase.

• For example, we cannot write a CFG that checks that each variable is declared before it is used, or that the number of arguments in a procedure (function) call agrees with the number of parameters in the definition.


Bottom-Up Parsing

• Shift-reduce parsing: reduce the input string to the start symbol in a series of reduction steps. Each step replaces a substring that matches the right side of a production with the nonterminal on the left side.

• Example: the grammar is

    S → a A B e
    A → A b c | b
    B → d

  The sentence a b b c d e can be reduced to S as follows:

    a b b c d e    reduce the first b by A → b    (right sides present: A → b, B → d)
    a A b c d e    reduce by A → A b c            (right sides present: A → A b c, A → b, B → d)
    a A d e        reduce by B → d
    a A B e        reduce by S → a A B e
    S


• Writing these strings in reverse gives a rightmost derivation of a b b c d e:

    S ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e

• An LR parser is a kind of shift-reduce parser (operator-precedence parsers are another).

• Definition: a handle of a string is a substring that (1) matches the right side of a production and (2) can be reduced to the nonterminal on the left side of that production as one step along the reverse of a rightmost derivation.
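The reduction sequence above can be found mechanically by a brute-force shift-reduce search. The sketch below (names are illustrative) tries every reduction whose right side is on top of the stack, shifts otherwise, and backtracks when a choice leads to a dead end, so it discovers the handles by trial and error. This is exponential in general and is not an LR parser; an LR parser recognizes each handle deterministically without backtracking. The search also relies on the grammar having no ε-productions.

```python
# Grammar from the example; productions are (head, right side). No ε-productions,
# which this search relies on (an empty right side would match every stack).
PRODUCTIONS = [("S", ["a", "A", "B", "e"]), ("A", ["A", "b", "c"]), ("A", ["b"]), ("B", ["d"])]


def reduce_to_start(tokens: list[str], start: str = "S"):
    """Backtracking shift-reduce search.

    Returns the productions used, in reduction order, or None if `tokens` cannot
    be reduced to `start`. Exponential in general: a sketch, not an LR parser.
    """

    def search(stack: list[str], pos: int):
        if stack == [start] and pos == len(tokens):
            return []  # accept
        for head, body in PRODUCTIONS:  # try each reduction matching the stack top
            if stack[-len(body):] == body:
                rest = search(stack[: -len(body)] + [head], pos)
                if rest is not None:
                    return [(head, body)] + rest
        if pos < len(tokens):  # otherwise shift the next input symbol
            return search(stack + [tokens[pos]], pos + 1)
        return None  # dead end: backtrack

    return search([], 0)


for head, body in reduce_to_start(list("abbcde")):
    print(head, "→", " ".join(body))
# A → b
# A → A b c
# B → d
# S → a A B e
```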


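Example

• The grammar is E → E + E | E * E | ( E ) | id.

• A rightmost derivation of id1 + id2 * id3 is:

    E ⇒ E + E
      ⇒ E + E * E
      ⇒ E + E * id3
      ⇒ E + id2 * id3
      ⇒ id1 + id2 * id3

• The handle of each right-sentential form is the substring introduced by the step that produced it:

    right-sentential form    handle
    id1 + id2 * id3          id1
    E + id2 * id3            id2
    E + E * id3              id3
    E + E * E                E * E
    E + E                    E + E

• The grammar is ambiguous, and there is another rightmost derivation:

    E ⇒ E * E
      ⇒ E * id3
      ⇒ E + E * id3
      ⇒ E + id2 * id3
      ⇒ id1 + id2 * id3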


Shift-Reduce Parsing

• In this second derivation, the handle of E + E * id3 is E + E instead of id3.

• Shift-reduce parser actions:
  – Shift: the next input symbol is shifted onto the top of the stack.
  – Reduce: replace the handle (on the top of the stack) with a nonterminal.
  – Accept: announce successful completion of parsing.
  – Error: call an error recovery routine if a syntax error is discovered.


Operator-Precedence Parsing

• Operator grammar: no production right side is ε or has two adjacent nonterminals. It is easy to build an efficient shift-reduce parser for these grammars.

• For example, the following grammar is not an operator grammar, because E A E has two adjacent nonterminals:

    E → E A E | ( E ) | - E | id
    A → + | - | * | / | ↑

  but it can be modified into an operator grammar by substituting for A:

    E → E + E | E - E | E * E | E / E | E ↑ E | ( E ) | - E | id

• Define three disjoint precedence relations between certain pairs of terminals:

    a <• b    a “yields precedence to” b
    a = b     a “has the same precedence as” b
    a •> b    a “takes precedence over” b


• Use $ to mark the ends of the string, and define $ <• b and b •> $ for all terminals b.

• Operator precedence relations (row = terminal on the left, column = terminal on the right):

           id    +     *     $
    id           •>    •>    •>
    +      <•    •>    <•    •>
    *      <•    •>    •>    •>
    $      <•    <•    <•


• Note that id takes precedence over + and *. The + and * operators are both left-associative, so + •> + and * •> *.

• Finding the handle in a sentential form: the string never has two adjacent nonterminals. Ignore the nonterminals and insert the precedence relations between adjacent terminals. Scan the string from the left end until the first •> is found. Then scan backwards (to the left) until a <• is found. The handle contains everything between the <• and the •>, including any intervening or surrounding nonterminals.


Example

• Parse $ id + id * id $. Inserting the precedence relations between adjacent terminals gives:

    $ <• id •> + <• id •> * <• id •> $

  The first handle is the first id. It is reduced by E → id to form:

    $ E + id * id $        relations between terminals: $ <• + <• id •> * <• id •> $

  The second handle is the next id. It is reduced by E → id to form:

    $ E + E * id $         relations between terminals: $ <• + <• * <• id •> $

  The third handle is the last id. It is reduced by E → id to form:

    $ E + E * E $          relations between terminals: $ <• + <• * •> $

  The fourth handle is E * E. It is reduced by E → E * E to form:

    $ E + E $              relations between terminals: $ <• + •> $

  The last handle is E + E. It is reduced by E → E + E to form:

    $ E $

• A shift-reduce parser can work on an operator grammar as follows. It keeps track of the topmost terminal on the stack and keeps shifting input onto the stack until a •> holds between that terminal and the next input symbol. Then it goes back down the stack until a <• is found. The symbols between the <• and the •> form the handle, which it reduces to a nonterminal.
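The parser just described can be sketched in Python using the id, +, * precedence table from this section. The encoding ('<' for <•, '>' for •>) and the use of E as the single nonterminal pushed on every reduction are assumptions of this sketch. It records the handles instead of building a tree, so the output can be compared with the worked example.

```python
# Precedence relations from the table: PREC[a][b] is "<" for <• and ">" for •>
# ("=" would mean equal precedence). A missing entry means a syntax error.
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}
TERMINALS = set(PREC)  # E is the only nonterminal (an assumption of this sketch)


def top_terminal(stack: list[str]) -> str:
    return next(s for s in reversed(stack) if s in TERMINALS)


def op_precedence_parse(tokens: list[str]) -> list[str]:
    """Shift-reduce parse driven by PREC; returns the handles in the order reduced."""
    stack, inp, i, handles = ["$"], tokens + ["$"], 0, []
    while True:
        a, b = top_terminal(stack), inp[i]
        if a == "$" and b == "$":
            return handles  # accept
        rel = PREC[a].get(b)
        if rel == "<" or rel == "=":
            stack.append(b)  # shift
            i += 1
        elif rel == ">":
            # Pop until the top terminal is related by <• to the terminal most recently popped.
            handle: list[str] = []
            while True:
                sym = stack.pop()
                handle.insert(0, sym)
                if sym in TERMINALS and PREC[top_terminal(stack)].get(sym) == "<":
                    break
            if stack[-1] not in TERMINALS:  # include a nonterminal just left of the handle
                handle.insert(0, stack.pop())
            handles.append(" ".join(handle))
            stack.append("E")  # reduce the handle to E
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")


print(op_precedence_parse(["id", "+", "id", "*", "id"]))
# ['id', 'id', 'id', 'E * E', 'E + E']
```

The handles appear in the same order as in the worked example. With the input reordered to id * id + id, the product is reduced before the second id is even shifted, which is how the table enforces the precedence of * over +.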


• The hyphen may be a binary infix operator, as in 3-2, or a unary prefix operator, as in -4. The token on its left tells us which it is.

• A hyphen is a binary infix operator if the token on its left is id or ).

• A hyphen is a unary prefix operator if the token on its left is +, -, *, /, (, or $.

• The two uses of the hyphen should be represented by two different tokens.

• The lexical analyzer can remember the previous token it generated and assign the correct token to the hyphen.

• Alternatively, the syntax analyzer can assign the correct token as it scans the tokens from the lexical analyzer.
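A sketch of the "remember the previous token" approach. The token name `uminus` for the unary minus is a hypothetical choice, and the input is assumed to be an already tokenized list. A hyphen becomes `uminus` when the token on its left is an operator, a left parenthesis, or the start of the input; a unary minus on the left counts as an operator too, so `- - x` is handled. Any other left neighbour (an operand or `)`) leaves the hyphen as binary `-`.

```python
# Tokens after which a '-' must be unary ("$" marks the start of the input).
UNARY_CONTEXT = {"+", "-", "*", "/", "(", "$", "uminus"}


def classify_hyphens(tokens: list[str]) -> list[str]:
    """Rename each unary '-' to the hypothetical token 'uminus'; binary '-' stays '-'."""
    result: list[str] = []
    previous = "$"
    for tok in tokens:
        if tok == "-" and previous in UNARY_CONTEXT:
            tok = "uminus"  # left neighbour is an operator, '(' or start of input
        result.append(tok)
        previous = tok
    return result


print(classify_hyphens(["id", "-", "-", "id", "*", "(", "-", "id", ")"]))
# ['id', '-', 'uminus', 'id', '*', '(', 'uminus', 'id', ')']
```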


• The unary minus sign has higher precedence than any operator (unary or binary) on its left. It has lower precedence than id, (, or any unary operator on its right.

• Error handling: the table below is a condensed precedence table showing the error entries.

           id    (     )     $
    id     e3    e3    •>    •>
    (      <•    <•    =     e4
    )      e3    e3    •>    •>
    $      <•    <•    e2    e1


• e1: insert id onto the input. Message: “missing operand”.

• e2: delete ) from the input. Message: “unbalanced right parenthesis”.

• e3: insert + onto the input. Message: “missing operator”.

• e4: pop ( from the stack. Message: “missing right parenthesis”.


• When a handle is found on the stack, a reduction is called for. If nonterminals are missing from the handle, issue a “missing operand” message and do the reduction anyway.

• Example: the input is id + ) :

    Stack      Input       Action
    $          id + ) $    shift
    $ id       + ) $       reduce by E → id
    $ E        + ) $       shift
    $ E +      ) $         reduce by E → E + E; issue “missing operand” message
    $ E        ) $         error e2; issue “unbalanced right parenthesis” message
    $ E        $           accept


• Write an operator-precedence parser for Pascal expressions. Treat the following productions:

    expression → simple_expression | simple_expression relop simple_expression
    simple_expression → term | sign term | simple_expression addop term
    term → factor | term mulop factor
    factor → id | id ( expression_list ) | num | ( expression ) | not factor
    sign → + | -
    expression_list → expression | expression_list , expression


where

    relop → = | <> | < | <= | >= | >
    addop → + | - | or
    mulop → * | / | div | mod | and

• Note that not is a unary operator, and + and - may be unary or binary.

• Note that id ( expression_list ) is a function call. Thus the entry for id followed by ( is id <• ( instead of being an error.